roary_plots.py generating flawed plots #221
Something seems to have gone quite wrong indeed, sorry about that. How many core genes do you get in your summary statistics file? Where did the input files come from (PROKKA?), and does each one have a unique prefix so that the IDs of each gene are unique to the set?
Summary stats file:
This was run with default settings for blastp. The input files were downloaded from GenBank as .gb files with full sequence, then converted to GFF3 using the bp_genbank2gff3.pl script. The input files have unique prefixes; a few of them were "fixed" by Roary for having duplicate gene IDs. I was going to upload a smaller sample run to see if I could replicate the problems, but IT forced a reboot of my system overnight and killed my run. Hopefully I will have some actual data files to share later today or tomorrow. An additional oddity: pangenome_matrix.png reports 51 strains in the tree even though only 48 strains were used to generate the dataset.
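Since a few inputs needed their gene IDs "fixed" by Roary, it can help to scan the GFF3 files for clashing IDs before running it. The following is a minimal sketch, not part of Roary: `feature_ids` and `duplicate_ids` are hypothetical names, and it assumes standard GFF3 with the ID stored as `ID=...` in column 9 and an optional embedded `##FASTA` section. Note that GFF3 lets one multi-line feature (e.g. a split CDS) repeat its ID within a single file, so repeats inside one file are not always errors.

```python
"""Sketch: report gene IDs that are reused within or across GFF3 files."""
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def feature_ids(lines: Iterable[str]) -> List[str]:
    """Return every ID= value from the feature lines of one GFF3 file."""
    ids = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed 9-column feature line
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key.strip() == "ID" and value:
                ids.append(value.strip())
    return ids


def duplicate_ids(paths: Iterable[Path]) -> Dict[str, List[str]]:
    """Map each ID seen more than once to the files it occurs in."""
    seen = defaultdict(list)
    for path in paths:
        with open(path) as handle:
            for fid in feature_ids(handle):
                seen[fid].append(str(path))
    return {fid: files for fid, files in seen.items() if len(files) > 1}


if __name__ == "__main__":
    import sys

    # Usage: python check_gff_ids.py *.gff   (script name is hypothetical)
    for fid, files in sorted(duplicate_ids(Path(p) for p in sys.argv[1:]).items()):
        print(fid, ", ".join(files))
```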
I have replicated the issues with a smaller dataset (5 Cdiff genomes), and they appear to be the same. The tar.gz of the directory is too large to upload directly here, so I created a repository for easy access. I'm hoping to get to the bottom of this, as Roary is a very useful tool. The directory containing all files and output can be found here:
In short, here was my workflow:
The issues, including the Bio::Root::Exception error during the post-analysis step and the appearance of the roary_plots output, remain the same. I'm hoping this data helps find a solution. Let me know if I can be of any further help. Best,
P.S. Summary stats for the Cdiff dataset (only 5 genomes):
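For the 51-versus-48 strain mismatch reported for pangenome_matrix.png, one check is whether the tree and the Roary CSV actually contain the same set of strain names. This is a sketch under stated assumptions: the CSV is assumed to have 14 metadata columns ("Gene" through "Avg group size nuc", as in recent Roary versions; older versions may differ) before the per-strain columns, and the Newick reader is a deliberately simplified regex that ignores quoted labels and comments. The helper names are hypothetical.

```python
"""Sketch: list strains that appear in only one of the tree or the Roary CSV."""
import csv
import re
import sys
from typing import List, Set, Tuple

# Assumption: 14 metadata columns precede one column per strain.
N_META_COLS = 14


def csv_strains(csv_path: str) -> List[str]:
    """Strain names taken from the header of gene_presence_absence.csv."""
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle))
    return header[N_META_COLS:]


def newick_tips(newick: str) -> Set[str]:
    """Tip labels: unquoted names directly after '(' or ','.

    Internal-node labels (e.g. FastTree support values) follow ')' and are
    therefore skipped; quoted labels and [comments] are not handled.
    """
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def compare(tree_path: str, csv_path: str) -> Tuple[List[str], List[str]]:
    """Return (names only in the tree, names only in the CSV)."""
    with open(tree_path) as handle:
        tips = newick_tips(handle.read())
    strains = set(csv_strains(csv_path))
    return sorted(tips - strains), sorted(strains - tips)


if __name__ == "__main__" and len(sys.argv) == 3:
    only_tree, only_csv = compare(sys.argv[1], sys.argv[2])
    print("only in tree:", only_tree)
    print("only in CSV:", only_csv)
```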
Thanks for the data. It looks like Roary ran to completion. I reran the roary_plots script and it produced a proper tree, so I suspect there's an issue with the versions of the Python dependencies for this script (Phylo). We'll take a look to see if we can track it down. JSCandy should be able to show you the same information (it's an experimental interactive viewer) if you want to give it a shot. Once you load your data, you can change the viewing mode by clicking on the JSCandy logo. There's also the roary2svg.pl script, which gives a similar view (but not against a tree).
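Since the suspicion is a dependency-version problem, a useful first step is to record which versions are installed in the interpreter that runs roary_plots.py. The sketch below uses `importlib.metadata` (Python 3.8+). The package list is an assumption based on what the script appears to import (Bio.Phylo ships with biopython, plus pandas, matplotlib, seaborn and numpy); adjust it to match the script's own import lines.

```python
"""Sketch: print installed versions of the packages roary_plots.py relies on."""
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumption: PyPI distribution names for the script's imports.
PACKAGES = ("biopython", "pandas", "matplotlib", "seaborn", "numpy")


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this interpreter
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:12} {ver or 'not installed'}")
```

Running this both on a machine where the plots look right and on one where they do not gives a direct diff of the versions involved.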
Andrew, thanks for the help. I agree that Roary seems to be running correctly, though I'm not sure what to make of the downstream Bio::Root::Exception. Thanks also for the JSCandy suggestion; it generates a similar matrix plot and may have other uses as well. Best,
Hi Wesley, I believe the latest version of the script fixes all the problems you've seen:
There is also a new `--labels` option to add sample names to the tree. I agree that JSCandy is very cool and useful. Marco
Hi everybody
Hi there, from the look of the error, it seems that the script is not in the directory you are calling it from. Please download it from here (<https://raw.githubusercontent.com/sanger-pathogens/Roary/master/contrib/roary_plots/roary_plots.py>) and place it in your working directory. Also, keep in mind that the script doesn't expect the input files to be named in any particular way; just run the following command, changing the inputs to match your files:
`python roary_plots.py YOUR_TREE.nwk YOUR_ROARY_OUTPUT.csv`
Hope this helps,
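Because the error came from the script not being where Python looked for it, a small wrapper can check for it up front and fail with a clearer message. This is a hypothetical helper (`roary_plots_cmd` is not part of Roary); it assumes roary_plots.py sits in the current working directory and takes the tree and the Roary CSV as its two positional arguments, as in `python roary_plots.py YOUR_TREE.nwk YOUR_ROARY_OUTPUT.csv`.

```python
"""Sketch: verify roary_plots.py and its inputs exist before building the command."""
import sys
from pathlib import Path
from typing import List


def roary_plots_cmd(tree: str, csv_file: str, script: str = "roary_plots.py") -> List[str]:
    """Return an argv list for roary_plots.py, or raise if a file is missing."""
    script_path = Path.cwd() / script
    if not script_path.is_file():
        raise FileNotFoundError(
            f"{script_path} not found; download it into the working directory first"
        )
    for path in (tree, csv_file):
        if not Path(path).is_file():
            raise FileNotFoundError(path)
    # Use the current interpreter so the same installed packages are picked up.
    return [sys.executable, str(script_path), tree, csv_file]
```

The resulting list can be passed to `subprocess.run(..., check=True)`; using `sys.executable` keeps the plotting run on the same Python environment whose packages you checked.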
Dear Marco,
Thanks for your email, but please send me a clear script to draw plots of
the pan genome.
Regards,
On Mon, Feb 26, 2018 at 4:52 AM, Marco Galardini wrote:
Hi there,
from the look of the error, it seems that you do not have the script in
the same directory where you are calling it. Please download it from here
<https://raw.githubusercontent.com/sanger-pathogens/Roary/master/contrib/roary_plots/roary_plots.py>
and place it in your working directory.
Also, please keep in mind that the script doesn't necessarily expect
you to have the input files named in a certain way; just run the command
below, changing the inputs to match your input files:
`python roary_plots.py YOUR_TREE.nwk YOUR_ROARY_OUTPUT.csv`
Hope this helps,
Marco
--
Ali Chenari Bouket
Ph.D. in Plant Pathology
Hi, I'm not sure I understood your last message: if you look again at my previous reply you'll see a link to the
Hope this helps,
Hi all, I ran roary_plots.py on 12 GFFs and the trees were drawn correctly, but no labels were shown.
Hi, did you add the `--labels` option?
Hi @mgalardini. I was confused about the --labels option, so I did not include it. Below is the command I used:
I see; retry it with the `--labels` option.
Thanks @mgalardini, the --labels option worked. I noticed that some of my labels were truncated.
I believe you could add the `--format svg` option, so that the output files
are saved in that format (e.g. pangenome_matrix.svg), which can then be
edited in Inkscape or Illustrator. I believe the full labels are
there, just hidden below the presence/absence matrix.
Hope this helps.
On Wed, Jun 17, 2020 at 2:49 PM vincentappiah wrote:
Thanks @mgalardini, the --labels option worked. I
noticed that some of my labels were truncated.
The labels with 4 characters were okay but those longer (such as
mycobacterium_ulcerans_strain) were truncated to Mycobacter.
Is there a way to show the full names?
--
Marco Galardini
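Putting the two options from this exchange together (`--labels` for tip names, `--format svg` for an editable figure), the command line can be assembled as below. This is a sketch: `roary_plots_args` and the file names are hypothetical, and only the flags mentioned in this thread are used.

```python
"""Sketch: assemble a roary_plots.py command line with the options discussed here."""
import shlex
from typing import List, Optional


def roary_plots_args(tree: str, csv_file: str, labels: bool = False,
                     fmt: Optional[str] = None) -> List[str]:
    """Build the argv list; options go before the two positional inputs."""
    args = ["python", "roary_plots.py"]
    if labels:
        args.append("--labels")  # add sample names to the tree
    if fmt:
        args += ["--format", fmt]  # e.g. "svg" for editing in Inkscape/Illustrator
    args += [tree, csv_file]
    return args


if __name__ == "__main__":
    # Hypothetical file names; substitute your own tree and Roary CSV.
    cmd = roary_plots_args("my_tree.newick", "gene_presence_absence.csv",
                           labels=True, fmt="svg")
    print(shlex.join(cmd))
    # prints: python roary_plots.py --labels --format svg my_tree.newick gene_presence_absence.csv
```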
Thanks @mgalardini, I added the --format svg option and can now edit the output in Inkscape.
Great, glad it worked!
Hi @mgalardini, is there a way to change the color of the output graph?
Yes, see this line and change the `plt.cm.Blues` part to use a different color for the heatmap.
Hi there,
Thanks for your reply, I am going to check it out.
Greetings, Julio
On Fri, Jul 9, 2021, 8:21 PM Marco Galardini wrote:
Yes, see this line
<https://github.com/sanger-pathogens/Roary/blob/12a726e9ef87bb73a19ed4d22fe7e6b3551d6da1/contrib/roary_plots/roary_plots.py#L119>
and change the plt.cm.Blues part to have a different color for the
heatmap.
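To preview a different colormap before editing the script, the toy example below draws a small 0/1 matrix with matplotlib's `matshow` and a configurable `cmap`. It only imitates the heatmap panel; the exact call in roary_plots.py may differ, and `draw_matrix` is a hypothetical helper. Any matplotlib colormap (e.g. `plt.cm.Reds`, `plt.cm.Greens`) can be substituted for `plt.cm.Blues`.

```python
"""Toy presence/absence heatmap showing the effect of swapping the colormap."""
import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import numpy as np


def draw_matrix(matrix: np.ndarray, out_path: str, cmap=plt.cm.Blues) -> str:
    """Draw a 0/1 matrix (rows = strains, columns = genes), save it, return the cmap name."""
    fig, ax = plt.subplots(figsize=(4, 3))
    image = ax.matshow(matrix, cmap=cmap, vmin=0, vmax=1,
                       aspect="auto", interpolation="none")
    ax.set_xlabel("genes")
    ax.set_ylabel("strains")
    fig.savefig(out_path)
    plt.close(fig)  # free the figure when drawing many variants
    return image.get_cmap().name
```

For example, `draw_matrix(np.array([[1, 1, 0], [1, 0, 1]]), "matrix_reds.png", cmap=plt.cm.Reds)` writes a red-toned version of the same matrix for comparison.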
I am running roary_plots.py after successfully running Roary as well as FastTree similar to the instructions provided, and the following occurs:
A warning is generated:
```
FutureWarning: order is deprecated. use sort_values(...)
  idx = roary.sum(axis=1).order(ascending=False).index
```
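The FutureWarning comes from pandas: `Series.order()` was deprecated in favour of `Series.sort_values()` and later removed, so newer pandas versions raise an AttributeError at that line instead of warning. The change below is a sketch of the replacement call, not a patch to the actual script; `genes_by_frequency` is a hypothetical name, and the table is assumed to be a 0/1 presence/absence matrix with genes as rows.

```python
"""Sketch: the sort_values() replacement for the deprecated Series.order()."""
import pandas as pd


def genes_by_frequency(roary: pd.DataFrame) -> pd.Index:
    """Gene IDs ordered from most to least widespread.

    `roary` is assumed to hold 0/1 values, genes as rows, strains as columns.
    """
    # Old pandas API (now removed): roary.sum(axis=1).order(ascending=False).index
    return roary.sum(axis=1).sort_values(ascending=False).index


if __name__ == "__main__":
    table = pd.DataFrame({"s1": [1, 0, 1], "s2": [1, 0, 0]},
                         index=["g1", "g2", "g3"])
    print(list(genes_by_frequency(table)))  # ['g1', 'g3', 'g2']
```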
The three plots are generated but they are all erroneous in one way or another.
During Roary's post-analysis step, the following exception is also reported:
```
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open pan_genome_sequences/group_16429.fa.aln: No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:486
STACK: Bio::Root::IO::_initialize_io /usr/share/perl5/Bio/Root/IO.pm:351
STACK: Bio::SeqIO::_initialize /usr/share/perl5/Bio/SeqIO.pm:491
STACK: Bio::SeqIO::fasta::_initialize /usr/share/perl5/Bio/SeqIO/fasta.pm:87
STACK: Bio::SeqIO::new /usr/share/perl5/Bio/SeqIO.pm:372
STACK: Bio::SeqIO::new /usr/share/perl5/Bio/SeqIO.pm:413
STACK: Bio::Roary::SortFasta::_input_seqio /usr/local/share/perl/5.18.2/Bio/Roary/SortFasta.pm:27
STACK: Bio::Roary::SortFasta::sort_fasta /usr/local/share/perl/5.18.2/Bio/Roary/SortFasta.pm:68
STACK: Bio::Roary::CommandLine::GeneAlignmentFromNucleotides::run /usr/local/share/perl/5.18.2/Bio/Roary/CommandLine/GeneAlignmentFromNucleotides.pm:107
STACK: /usr/local/bin/protein_alignment_from_nucleotides:14
```
This seems to happen for some (but not all) clusters, yet core_gene_alignment.aln is still generated and contains data.
Has anyone else seen this problem? I will attach example data momentarily. I am running Roary on a Biolinux 8 box.
Best,
S. W. Long
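To gauge how many clusters are affected by the missing `pan_genome_sequences/*.fa.aln` files, the sketch below lists every gene in gene_presence_absence.csv that has no matching alignment file. It assumes, based only on the path in the error message (`group_16429.fa.aln`), that alignments are named `<Gene>.fa.aln` after the CSV's first column; Roary may rename some genes when writing files, so treat hits as candidates to inspect rather than confirmed failures. `missing_alignments` is a hypothetical helper.

```python
"""Sketch: list Roary clusters that have no alignment file on disk."""
import csv
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def missing_alignments(csv_path: PathLike,
                       seq_dir: PathLike = "pan_genome_sequences") -> List[str]:
    """Gene names from the CSV lacking <seq_dir>/<Gene>.fa.aln.

    Assumption: alignment files are named after the first ("Gene") column.
    """
    missing = []
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        for row in reader:
            if row and not (Path(seq_dir) / f"{row[0]}.fa.aln").is_file():
                missing.append(row[0])
    return missing
```

Running `len(missing_alignments("gene_presence_absence.csv"))` from the Roary output directory gives a quick count to compare against the total number of clusters.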