file_suffix flag #22
Comments
In addition, may I ask whether compressed FASTA files (.fna.gz) can be used directly by GUNC?
gunc will take everything before the first occurrence of
No, but the sample names in your output will then contain the suffix. Providing the suffix only serves to let gunc remove it from the input filename.
Yes. If you have any other questions, let me know!
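The suffix handling described above can be illustrated with a small sketch. It assumes that the sample name is whatever precedes the first occurrence of the suffix string, which is how the replies read; `sample_name` is a hypothetical helper written for this illustration, not GUNC's actual code, and the second filename is made up.

```python
def sample_name(filename: str, suffix: str) -> str:
    """Hypothetical helper: keep everything before the first occurrence of suffix."""
    if not suffix:
        # No suffix given: the full filename is used as the sample name.
        return filename
    # str.partition splits at the FIRST occurrence of the separator and
    # returns the whole string as the head when the separator is absent.
    head, _, _ = filename.partition(suffix)
    return head


print(sample_name("GCF_902109435.1_40087_F01_genomic.fna", ".fna"))
# GCF_902109435.1_40087_F01_genomic
print(sample_name("sample.fna_v2.fna", ".fna"))
# sample
```

Under this assumption, a filename that contains the suffix in the middle would be cut at that earlier position, so it is worth checking the sample names in the output for such files.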
Hi, many thanks for your reply. So the function of --file_suffix is just to set the file name shown in the output, and it has no effect on the detection of chimerism in genomes. Do I understand correctly?
Correct! :)
Okay, many thanks.
Sorry, a further question about the databases: between proGenomes and GTDB, which one is generally recommended for the detection of chimerism in genomes?
Hi @Biofarmer! Both databases work fine; we found little difference in accuracy. However, since the GUNC db based on proGenomes is smaller, it is faster to run, so we use it by default.
Okay, that is good to know. Thanks.
Hi, I am running GUNC on 10000 genomes with 5 threads. It has been running for 8 days and is now at the "Running Diamond" stage. There is no "diamond_output" folder in the output directory. Is that normal?
There is currently no way of seeing the progress of diamond, and the run time can vary depending on the input. Maybe you just want to run them in smaller batches? If you are running so many genomes at once, I would increase both CPUs and memory.
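One way to prepare smaller batches is sketched below. It only writes text files listing genome paths and does not call GUNC itself; `write_batches`, the batch size, and the `batch_0000.txt` naming are all assumptions made for this illustration.

```python
from pathlib import Path


def write_batches(genome_dir, out_dir, suffix=".fna", batch_size=500):
    """Write one text file of genome paths per batch and return the list files."""
    # Collect the genome files in a stable order so batches are reproducible.
    genomes = sorted(Path(genome_dir).glob(f"*{suffix}"))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    batch_files = []
    for start in range(0, len(genomes), batch_size):
        batch = genomes[start:start + batch_size]
        list_file = out / f"batch_{start // batch_size:04d}.txt"
        # One absolute path per line.
        list_file.write_text("".join(f"{p.resolve()}\n" for p in batch))
        batch_files.append(list_file)
    return batch_files
```

Each list file could then be given to a separate GUNC run, for example through the `--input_file` option mentioned in the original post, assuming that option accepts one genome path per line.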
Hi, thanks for your reply. In addition, if I understand correctly, the genecall files from all input genomes are merged. If so, should the label of each contig (the text after ">" but before the first space, which prodigal uses as the gene ID) be unique across all input genomes, or does it not matter? Thanks
It shouldn't matter. They are merged but are tagged with the name of the genome file, so they can be separated after diamond has run.
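The idea of tagging merged gene calls with the genome name can be sketched as follows. This is not GUNC's implementation: the separator `_-_` and the helper names `tag` and `split_by_genome` are hypothetical, chosen only to show why identical contig labels in different genomes need not collide.

```python
from collections import defaultdict

SEP = "_-_"  # hypothetical separator, chosen only for this illustration


def tag(genome: str, gene_id: str) -> str:
    """Prefix a gene ID with the name of the genome it came from."""
    return f"{genome}{SEP}{gene_id}"


def split_by_genome(tagged_ids):
    """Group tagged gene IDs back into their genomes."""
    groups = defaultdict(list)
    for tagged in tagged_ids:
        # Split at the first separator: genome name on the left, gene ID on the right.
        genome, _, gene_id = tagged.partition(SEP)
        groups[genome].append(gene_id)
    return dict(groups)


merged = [
    tag("genomeA", "contig_1_1"),
    tag("genomeB", "contig_1_1"),  # same contig label, different genome
    tag("genomeA", "contig_1_2"),
]
print(split_by_genome(merged))
# {'genomeA': ['contig_1_1', 'contig_1_2'], 'genomeB': ['contig_1_1']}
```

In this sketch the scheme only works if the genome name itself does not contain the separator.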
Okay, that's great. Thanks. The merged genecall file is intermediate and is deleted once the run has finished, so it cannot be seen, right?
In addition, the --temp_dir directory is by default the current working directory. If I submit several jobs at once from the same working directory, each with a different output directory, will GUNC pick the right temporary files for each job? Do the temporary files of each job have different names? Thanks
Hi, may I ask for answers to the questions above?
I think it would be fine, but try it out to be sure.
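If sharing a temporary directory between concurrent jobs is a concern, one cautious option is to give each job its own directory via `--temp_dir`. The sketch below only builds the argument list and does not run anything. `build_job` is a hypothetical helper, and the `gunc run` subcommand and the `--out_dir` flag name are assumptions that should be checked against `gunc run --help`; `--input_dir`, `--file_suffix`, and `--temp_dir` are the options named in this thread.

```python
import tempfile


def build_job(input_dir, out_dir, tmp_root, suffix=".fna"):
    """Return an argument list for one job with its own temporary directory."""
    # mkdtemp creates a new, uniquely named directory under tmp_root,
    # so two jobs built this way never share temporary files.
    job_tmp = tempfile.mkdtemp(prefix="gunc_tmp_", dir=tmp_root)
    return [
        "gunc", "run",              # assumed subcommand
        "--input_dir", str(input_dir),
        "--file_suffix", suffix,
        "--out_dir", str(out_dir),  # assumed flag name
        "--temp_dir", job_tmp,
    ]
```

The returned list could be passed to `subprocess.run` or written into a job script, one call per batch.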
Hi, many thanks for your confirmation.
Hi, I just used GUNC to check a genome from NCBI. GUNC ran on genome GCF_902109435.1_40087_F01_genomic.fna, but there were no values in the output. Is that due to the small size of the genome? Thanks
Another question: there is a slight difference in n_genes_mapped in the output between running one genome individually (--input_fasta genome.fna) and running a few genomes together (--input_dir genome_folder/ --file_suffix .fna). Is this normal, and why does it happen?
samples with |
Can you give an example of where the output differs, so I can look more closely?
Thanks for the reply.
Hi, take the genomes GCF_902703415.1 and GCF_900232175.1 from NCBI as an example: n_genes_mapped is 5145 and 6693 when they are tested together, and 5157 and 6740 respectively when they are tested individually. The overall conclusion is almost the same.
this will be fixed in the next version of gunc
Yes, the output will also be amended to include more decimal places in the next version.
Dear GUNC Team,
I am using GUNC v1.0.5 and have a question about --file_suffix. The suffix of my input files is .fna, and some genomes from NCBI may contain .fna in the middle of their names. If I provide --input_dir and --file_suffix .fna, will GUNC handle genomes that contain .fna in the middle of their names correctly? For now I am providing --input_file with the path of each genome instead. Is --file_suffix .fna still needed when --input_file is provided? Or do you have any other suggestions?
Many thanks
Wang