file_suffix flag #22

Biofarmer · 2022-04-06T09:50:22Z

Dear Gunc Team,

I am using GUNC v1.0.5 and have a question about --file_suffix. The suffix of my input files is .fna, and some genomes from NCBI may contain .fna in the middle of the genome name. If I provide --input_dir and --file_suffix .fna, will GUNC correctly handle genomes that contain .fna in the middle of their names? For now I provide --input_file with the path of each genome; is --file_suffix .fna still needed when --input_file is provided? Or do you have any other suggestions?

Many thanks
Wang

Biofarmer · 2022-04-07T02:30:59Z

In addition, can compressed FASTA (.fna.gz) be used directly by GUNC?
Thanks

fullama · 2022-04-07T08:17:29Z

> If I provide --input_dir and --file_suffix .fna, will GUNC correctly handle genomes that contain .fna in the middle of their names?

GUNC takes everything before the first occurrence of .fna in the file name and uses that as the sample name in the output.

> Is --file_suffix .fna still needed when --input_file is provided?

No, but the sample names in your output will then contain the suffix. The suffix is only provided so that GUNC can remove it from the input file name.
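
As an illustration of the naming behaviour described above, here is a minimal Python sketch. It is not GUNC's actual code: the helper name `sample_name` is hypothetical, and the sketch only assumes the rule stated in this thread (keep everything before the first occurrence of the suffix; keep the full file name when no suffix is given).

```python
from pathlib import Path
from typing import Optional


def sample_name(path: str, file_suffix: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the rule described in this thread.

    Keeps everything before the FIRST occurrence of file_suffix in the
    file name. Without a suffix, the full file name is returned.
    """
    filename = Path(path).name
    if not file_suffix:
        return filename
    # str.split returns the whole name as the first element if the suffix is absent
    return filename.split(file_suffix)[0]


print(sample_name("genomes/GCF_000001.1_genomic.fna", ".fna"))  # suffix removed
print(sample_name("genomes/strain.fna_v2.fna", ".fna"))         # cut at the first ".fna"
print(sample_name("genomes/GCF_000001.1_genomic.fna"))          # suffix kept
```

Under this rule, a file name with .fna in the middle is cut at that first occurrence, so two such files could end up with the same sample name. The analysis itself is unaffected; only the label in the output changes.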

> Can compressed FASTA (.fna.gz) be used directly by GUNC?

Yes

If you have any other questions, let me know!

Biofarmer · 2022-04-07T09:18:11Z

Hi, many thanks for your reply. So the function of --file_suffix is just to set the sample name in the output, and it has no effect on the detection of chimerism in genomes. Do I understand correctly?

fullama · 2022-04-07T09:19:29Z

Correct! :)

Biofarmer · 2022-04-07T09:20:13Z

Okay, many thanks.

Biofarmer · 2022-04-08T06:38:38Z

Sorry, a further question about the databases: which of proGenomes and GTDB is generally recommended for the detection of chimerism in genomes?
Thanks

defleury · 2022-04-08T07:19:16Z

Hi @Biofarmer !

Both databases work fine; we found little difference in accuracy. However, since the GUNC db based on proGenomes is smaller, it is faster to run, so we use it by default.

Biofarmer · 2022-04-08T07:49:04Z

Okay, it is good to know. Thanks

Biofarmer · 2022-04-18T01:07:02Z

Hi, I am running GUNC on 10000 genomes with 5 threads. It has been running for 8 days and is now at the "Running Diamond" stage. There is no "diamond_output" folder in the output directory; is that normal?
In addition, is it possible to check the progress of diamond? I want to know how much longer I need to wait. If it will take much longer, I may kill the job and rerun it with more threads, or split the genomes into parts.
PS: I previously ran another 10000 genomes with 10 threads, which finished in 3 days.
Thanks

fullama · 2022-04-19T08:05:14Z

There is currently no way of seeing the progress of diamond, and the run time can vary depending on the input. But maybe you just want to run them in smaller batches? If you are running so many genomes at once, I would increase both CPUs and memory.
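
One way to follow the "smaller batches" suggestion is to split the genome list into several list files and run GUNC once per file. The sketch below is only an illustration: the helper name `write_batches` and the file naming are hypothetical, and it assumes that `--input_file` takes a text file with one genome path per line.

```python
from pathlib import Path
from typing import List, Sequence


def write_batches(genome_paths: Sequence[str], batch_size: int, out_dir: str) -> List[Path]:
    """Write genome paths into list files holding at most batch_size paths each.

    Assumption: each list file (one path per line) can be passed to
    --input_file in a separate GUNC run.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    batch_files = []
    for start in range(0, len(genome_paths), batch_size):
        batch = genome_paths[start:start + batch_size]
        batch_file = target / f"batch_{start // batch_size:03d}.txt"
        batch_file.write_text("\n".join(batch) + "\n")
        batch_files.append(batch_file)
    return batch_files
```

For example, `write_batches(paths, 1000, "batches")` on 10000 paths would produce ten list files. Smaller batches also mean that a failed or killed job costs less time to rerun.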

Biofarmer · 2022-04-19T08:40:50Z

Hi, thanks for your reply. In addition, if I understand correctly, the gene call files from all input genomes are merged. If so, should the label of each contig (the text after ">" and before the first space, which prodigal uses as the gene ID) be unique across all input genomes? Or does it not matter? Thanks

fullama · 2022-04-19T08:56:41Z

It shouldn't matter. They are merged, but they are tagged with the name of the genome file so they can be separated after diamond has run.

Biofarmer · 2022-04-19T09:16:24Z

Okay, that's great. Thanks. The merged gene call file is intermediate and is deleted once the run has finished, so it cannot be seen, right?

Biofarmer · 2022-04-19T12:54:18Z

In addition, the --temp_dir directory defaults to the current working directory. If I submit several jobs at once from the same working directory, each with a different output directory, will GUNC pick the right temporary files for each job? Do the temporary files of each job have different names? Thanks

Biofarmer · 2022-05-12T03:18:56Z

Hi, may I ask for answers to the questions above?

fullama · 2022-05-12T21:03:46Z

I think it would be fine, but try it out to be sure.
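
If you would rather not rely on that, one option is to give every job its own temporary directory through --temp_dir. The sketch below is an assumption-laden illustration, not a tested recipe: the helper name `build_job` is hypothetical, the command assumes the `gunc run` subcommand, and the output directory option is left out because this thread does not name it. The command is only built, not executed.

```python
import tempfile
from pathlib import Path
from typing import List


def build_job(input_file: str, work_dir: str, job_name: str) -> List[str]:
    """Build a GUNC command line that uses a private temporary directory.

    tempfile.mkdtemp creates a new, uniquely named directory, so jobs
    started from the same working directory cannot share temp files.
    """
    base = Path(work_dir)
    base.mkdir(parents=True, exist_ok=True)
    temp_dir = tempfile.mkdtemp(prefix=f"{job_name}_tmp_", dir=str(base))
    # Assumption: the `gunc run` subcommand; only flags named in this thread are used.
    return ["gunc", "run", "--input_file", str(input_file), "--temp_dir", temp_dir]
```

The returned list could be handed to a job scheduler or to `subprocess.run` once the remaining options for your setup have been added.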

Biofarmer · 2022-05-13T10:30:13Z

Hi, many thanks for your confirmation.

Biofarmer · 2022-05-26T10:03:19Z

Hi, I just used GUNC to check a genome from NCBI, GCF_902703415.1_Combinated_assembly_ONT_-_Illumina_genomic.fna, and it failed. However, when I changed the genome name so that it no longer contained _-_, GUNC worked with every one of the following names:
GCF_902703415.1_Combinated_assembly_ONT_---_Illumina_genomic.fna
GCF_902703415.1_Combinated_assembly_ONT_--_Illumina_genomic.fna
GCF_902703415.1_Combinated_assembly_ONT__Illumina_genomic.fna
GCF_902703415.1_Combinated_assembly_ONT-Illumina_genomic.fna
GCF_902703415.1_Combinated_assembly_ONT_Illumina_genomic.fna
May I ask why only the genome name containing _-_ did not work with GUNC?

GUNC also ran for genome GCF_902109435.1_40087_F01_genomic.fna, but there were no values in the output. Is that due to the small size of the genome?

Thanks

Biofarmer · 2022-05-27T04:09:54Z

Another question: there is a slight difference in the number of n_genes_mapped in the output when running one genome individually (--input_fasta genome.fna) or running a few genomes together (--input_dir genome_folder/ --file_suffix .fna). May I ask whether it is normal and why?
Thanks

fullama · 2022-05-30T08:31:49Z

Samples with _-_ don't work because internally GUNC uses it as a delimiter to label sequences with the sample's name when merging them together. I didn't think anyone would ever use _-_ in a sample name. I'll see if I can change it for the next version.
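
To show why such a name breaks, here is a minimal Python sketch of delimiter-based tagging. It is not GUNC's actual code: the function names and the label layout (sample name, then delimiter, then sequence ID) are assumptions made for illustration. Only the `_-_` delimiter itself comes from this thread.

```python
from typing import Tuple

DELIMITER = "_-_"  # delimiter named in this thread


def tag(sample: str, seq_id: str) -> str:
    """Label a sequence with its sample name (assumed layout)."""
    return f"{sample}{DELIMITER}{seq_id}"


def untag(label: str) -> Tuple[str, str]:
    """Recover (sample, seq_id) by splitting at the first delimiter."""
    sample, seq_id = label.split(DELIMITER, 1)
    return sample, seq_id


def is_safe_name(sample: str) -> bool:
    """A sample name is unambiguous only if it does not contain the delimiter."""
    return DELIMITER not in sample


good = "GCF_902703415.1_Combinated_assembly_ONT_Illumina_genomic"
bad = "GCF_902703415.1_Combinated_assembly_ONT_-_Illumina_genomic"

print(untag(tag(good, "contig_1")))  # sample name is recovered intact
print(untag(tag(bad, "contig_1")))   # sample name is cut at its own "_-_"
print(is_safe_name(good), is_safe_name(bad))
```

Until a release changes the delimiter, renaming the affected files (or checking names with something like `is_safe_name` beforehand) avoids the problem.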

fullama · 2022-05-30T08:44:12Z

> Another question: there is a slight difference in the number of n_genes_mapped in the output when running one genome individually (--input_fasta genome.fna) or running a few genomes together (--input_dir genome_folder/ --file_suffix .fna). May I ask whether it is normal and why? Thanks

Can you give an example of where the output differs, so I can look more closely?

Biofarmer · 2022-05-30T10:11:29Z

> Samples with _-_ don't work because internally GUNC uses it as a delimiter to label sequences with the sample's name when merging them together. I didn't think anyone would ever use _-_ in a sample name. I'll see if I can change it for the next version.

Thanks for reply.

Biofarmer · 2022-05-30T10:24:56Z

> > Another question: there is a slight difference in the number of n_genes_mapped in the output when running one genome individually (--input_fasta genome.fna) or running a few genomes together (--input_dir genome_folder/ --file_suffix .fna). May I ask whether it is normal and why? Thanks
>
> Can you give an example of where the output differs, so I can look more closely?

Hi, taking the genomes from NCBI (GCF_902703415.1 and GCF_900232175.1) as an example, n_genes_mapped is 5145 and 6693 when they are tested together, and 5157 and 6740 respectively when they are tested individually. The overall conclusion is almost the same.
By the way, may I ask whether the result of pass.GUNC is based on the third digit of clade_separation_score after the decimal point? I see that genomes reported with a clade_separation_score of 0.45 are sometimes given False and sometimes True for pass.GUNC. So I am wondering whether clade_separation_score is just reported with two digits after the decimal point, while the judgement for pass.GUNC is based on at least three digits.

fullama · 2023-05-31T13:04:23Z

> Hi, taking the genomes from NCBI (GCF_902703415.1 and GCF_900232175.1) as an example, n_genes_mapped is 5145 and 6693 when they are tested together, and 5157 and 6740 respectively when they are tested individually. The overall conclusion is almost the same.

This will be fixed in the next version of GUNC.

> May I ask whether the result of pass.GUNC is based on the third digit of clade_separation_score after the decimal point?

Yes. The output will also be amended to include more decimal places in the next version.
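
The effect can be reproduced with a small Python sketch. The cutoff value of 0.45 and the direction of the comparison (pass when the score is at or below the cutoff) are assumptions for illustration, and the two scores are made-up examples, not GUNC output.

```python
CSS_CUTOFF = 0.45  # assumed cutoff, for illustration only


def passes(css: float) -> bool:
    """Decide on the full-precision score, not on the rounded display value."""
    return css <= CSS_CUTOFF


for css in (0.4451, 0.4549):  # made-up scores
    # Both scores are displayed as 0.45, but only the first one passes.
    print(f"{css:.2f}", passes(css))
```

Both rows show 0.45 in a two-decimal report, yet one is True and the other False, which matches the behaviour described in the question.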

fullama closed this as completed Apr 7, 2022

fullama reopened this May 30, 2022

fullama closed this as completed May 31, 2023
