Roary not generating pan_genome_reference.fa #223

Closed
swlong opened this issue Jan 13, 2016 · 4 comments

Comments

@swlong

swlong commented Jan 13, 2016

Moving on from my earlier work, I am attempting to generate a "panplasmidome" using reference plasmid sequences from Genbank. Of course, as these plasmids are diverse, there is no real "core" to speak of, however having a "pan_genome_reference" could be useful for mapping short reads looking for known plasmid gene content.

I've run roary several times but the pan_genome_reference.fa isn't being generated. It appears the other usual output files are being generated including gene_presence_absence.csv and all of the Rtab files. No errors or warnings are being generated.

Altering the threshold for calling genes "core" to a low percentage so that some core genes are called does not appear to change the script behavior. Is there an inherent problem with using Roary to generate a pan-plasmid reference? I've tried gff converted from gb files, as well as fresh FASTA files annotated by Prokka. Same result. If this is in fact an error, and not a known limitation, I'll upload sample data.

Best,
S. Wesley Long

@andrewjpage
Member

Sounds like an interesting project, although plasmids vary so much you'd
need to select them carefully to begin with. Do you get any 'core' genes in
the summary stats file? Make sure to use -e -n as parameters.
Andrew
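
One way to answer the "any core genes?" question without opening the file by hand is sketched below. It assumes the summary statistics file is tab-separated with the category in the first column and the count in the last; the function name `read_summary_counts` and the numbers in `example` are made up for illustration and are not taken from this run.

```python
import csv
import io


def read_summary_counts(text):
    """Return {category: count} parsed from tab-separated summary text.

    Assumption: each data row looks like
    'Core genes<TAB>(99% <= strains <= 100%)<TAB>0'.
    """
    counts = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        label, value = row[0].strip(), row[-1].strip()
        if value.isdigit():
            counts[label] = int(value)
    return counts


# Illustrative input only (hypothetical numbers).
example = (
    "Core genes\t(99% <= strains <= 100%)\t0\n"
    "Soft core genes\t(95% <= strains < 99%)\t0\n"
    "Shell genes\t(15% <= strains < 95%)\t12\n"
    "Cloud genes\t(0% <= strains < 15%)\t340\n"
    "Total genes\t(0% <= strains <= 100%)\t352\n"
)
counts = read_summary_counts(example)
print(counts.get("Core genes"))
```

For a real run, the text would come from reading the summary statistics file in the Roary output directory instead of the `example` string.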

@andrewjpage
Member

Also, maybe turn off splitting paralogs, since the gene order varies so much
and people don't set the start consistently.

@swlong
Author

swlong commented Jan 14, 2016

Andrew,

Thanks for the help! No core genes were found in the summary_stats using the defaults, which was no surprise. I hadn't run it with -e -n because I wasn't expecting a core gene alignment (as there were no core genes). However, running Roary with -e -n does generate pan_genome_reference.fa as expected. Oddly enough, the post-analysis step is still taking a while to run - I say odd because there should be no core genes to align. I apologize that it wasn't clear to me that -e -n were necessary to generate the pan_genome_reference.

Running with -s to avoid splitting paralogs did not change whether pan_genome_reference.fa was generated.
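
For anyone scripting this, here is a minimal sketch of the invocation described above, with a check for the file afterwards. The helper names, the input file names, and the output directory `roary_out` are hypothetical; `-f` as the output-directory option is an assumption worth confirming against `roary -h`. The sketch only builds and prints the command, it does not run Roary.

```python
from pathlib import Path
import shlex


def build_roary_command(gff_files, out_dir, no_split_paralogs=False):
    """Build a Roary command line that includes -e -n.

    -e -n: the flags reported in this thread as needed for
           pan_genome_reference.fa to be written.
    -s:    optional, do not split paralogs.
    -f:    output directory (assumption, check `roary -h`).
    """
    cmd = ["roary", "-e", "-n", "-f", str(out_dir)]
    if no_split_paralogs:
        cmd.append("-s")
    cmd.extend(str(f) for f in gff_files)
    return cmd


def has_pan_genome_reference(out_dir):
    """True if pan_genome_reference.fa exists in out_dir."""
    return (Path(out_dir) / "pan_genome_reference.fa").is_file()


# Hypothetical input files and output directory.
cmd = build_roary_command(["plasmid1.gff", "plasmid2.gff"], "roary_out")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` and then calling `has_pan_genome_reference("roary_out")` would confirm whether the reference was written.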

Best,
S. Wesley Long

@andrewjpage
Member

Sorry if it wasn't clear in the documentation; it's on my list to improve it somewhat.
Andrew
