Roary not generating pan_genome_reference.fa #223

swlong · 2016-01-13T21:52:13Z

Moving on from my earlier work, I am attempting to generate a "panplasmidome" using reference plasmid sequences from GenBank. Of course, as these plasmids are diverse, there is no real "core" to speak of; however, having a "pan_genome_reference" could be useful for mapping short reads to look for known plasmid gene content.

I've run roary several times but the pan_genome_reference.fa isn't being generated. It appears the other usual output files are being generated including gene_presence_absence.csv and all of the Rtab files. No errors or warnings are being generated.

Altering the threshold for calling genes "core" to a low percentage, so that some core genes are called, does not appear to change the script's behavior. Is there an inherent problem with using Roary to generate a pan-plasmid reference? I've tried GFF files converted from GenBank (gb) files, as well as fresh FASTA files annotated by Prokka, with the same result. If this is in fact an error, and not a known limitation, I'll upload sample data.

Best,
S. Wesley Long

andrewjpage · 2016-01-13T21:57:57Z

Sounds like an interesting project, although plasmids vary so much that you'd need to select them carefully to begin with. Do you get any 'core' genes in the summary stats file? Make sure to use -e -n as parameters.
Andrew

andrewjpage · 2016-01-13T22:07:06Z

Also, maybe turn off splitting paralogs, since the gene order varies so much and people don't set the start consistently.

swlong · 2016-01-14T00:40:07Z

Andrew,

Thanks for the help! No core genes were found in the summary_stats using the defaults, which was no surprise. I hadn't run it with -e -n because I wasn't expecting a core gene alignment (as there were no core genes). However, running Roary with -e -n does generate pan_genome_reference.fa as expected. Oddly enough, the post-analysis step still takes a while to run; I say odd because there should be no core genes to align. I apologize; it wasn't clear to me that -e -n were necessary to generate the pan_genome_reference.

Running with -s to avoid splitting paralogs made no difference to whether pan_genome_reference.fa was generated.

Best,
S. Wesley Long
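
For anyone who wants to script the run described above, here is a minimal Python sketch. It only assembles a Roary command line with -e -n (plus -s on request) and then reports which expected files are missing from the output directory. The helper names, the example GFF file names, and the use of -f to name the output directory are assumptions for illustration and do not come from this thread. Whether -e -n is the only route to pan_genome_reference.fa in every Roary version is not something this thread confirms.

```python
from pathlib import Path
from typing import List, Sequence


def build_roary_command(gff_files: Sequence[str], out_dir: str,
                        dont_split_paralogs: bool = False) -> List[str]:
    """Assemble (but do not run) a Roary command that requests a core gene alignment.

    -e -n are the flags recommended in this thread; -f (output directory) is an
    assumption added here so the outputs can be checked afterwards.
    """
    cmd = ["roary", "-e", "-n", "-f", str(out_dir)]
    if dont_split_paralogs:
        cmd.append("-s")  # -s: do not split paralogs, as suggested above
    cmd.extend(str(g) for g in gff_files)
    return cmd


def missing_outputs(out_dir: str,
                    expected: Sequence[str] = ("pan_genome_reference.fa",
                                               "gene_presence_absence.csv")) -> List[str]:
    """Return the expected file names that are not present in out_dir."""
    out = Path(out_dir)
    return [name for name in expected if not (out / name).is_file()]


if __name__ == "__main__":
    # Hypothetical input files; replace with your own GFF annotations.
    command = build_roary_command(["plasmid_a.gff", "plasmid_b.gff"], "roary_out")
    print(" ".join(command))
    # To actually run it: subprocess.run(command, check=True)
    print("missing:", missing_outputs("roary_out"))
```

Checking for the file after the run is a cheap guard, since the original report was that Roary finished without errors or warnings while pan_genome_reference.fa was silently absent.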

andrewjpage · 2016-01-15T11:51:55Z

Sorry if it wasn't clear in the documentation; it's on my list to improve it somewhat.
Andrew

andrewjpage closed this as completed Jul 25, 2016

matinnuhamunada mentioned this issue Mar 4, 2022

Roary expansion and fixes NBChub/bgcflow#122