Could not obtain pan_genome_sequences #426

pedrorvc · 2018-10-17T14:34:27Z

Hello, I am using Roary to generate a pan-genome with 274 .gff files obtained from Prokka.
The Roary command I'm using is roary -e -f roary_intermediates -n -p 4 -z *.gff, in order to obtain the nucleotide sequences of the results.
Roary completes the job, but gives the following warning:

Attribute (fasta_file) does not pass the type constraint because: Validation failed for 'Str' with value undef at reader Bio::Roary::Output::GroupsMultifastaNucleotide::fasta_file (defined at /usr/local/share/perl/5.18.2/Bio/Roary/Output/GroupsMultifastaNucleotide.pm line 29) line 15
	Bio::Roary::Output::GroupsMultifastaNucleotide::fasta_file('Bio::Roary::Output::GroupsMultifastaNucleotide=HASH(0x17b208700)') called at /usr/local/share/perl/5.18.2/Bio/Roary/Output/GroupsMultifastaNucleotide.pm line 43
	Bio::Roary::Output::GroupsMultifastaNucleotide::_build__input_seqio('Bio::Roary::Output::GroupsMultifastaNucleotide=HASH(0x17b208700)') called at reader Bio::Roary::Output::GroupsMultifastaNucleotide::_input_seqio (defined at /usr/local/share/perl/5.18.2/Bio/Roary/Output/GroupsMultifastaNucleotide.pm line 30) line 7
	Bio::Roary::Output::GroupsMultifastaNucleotide::_input_seqio('Bio::Roary::Output::GroupsMultifastaNucleotide=HASH(0x17b208700)') called at /usr/local/share/perl/5.18.2/Bio/Roary/Output/GroupsMultifastaNucleotide.pm line 53
	Bio::Roary::Output::GroupsMultifastaNucleotide::populate_files('Bio::Roary::Output::GroupsMultifastaNucleotide=HASH(0x17b208700)') called at /usr/local/share/perl/5.18.2/Bio/Roary/Output/GroupsMultifastasNucleotide.pm line 65
	Bio::Roary::Output::GroupsMultifastasNucleotide::create_files('Bio::Roary::Output::GroupsMultifastasNucleotide=HASH(0x17abef948)') called at /usr/local/share/perl/5.18.2/Bio/Roary/PostAnalysis.pm line 131
	Bio::Roary::PostAnalysis::run('Bio::Roary::PostAnalysis=HASH(0x2312168)') called at /usr/local/share/perl/5.18.2/Bio/Roary/CommandLine/RoaryPostAnalysis.pm line 128
	Bio::Roary::CommandLine::RoaryPostAnalysis::run('Bio::Roary::CommandLine::RoaryPostAnalysis=HASH(0x22ebc20)') called at /usr/local/bin/pan_genome_post_analysis line 14

When I checked the results, the pan_genome_sequences directory was empty.
Does this warning mean that some of my .gff files do not contain the fasta sequence?

P.S.: Here is the output of roary -a and one of my .gff files (renamed to .txt only so that it could be uploaded here).

9_Escherichia_coli_FAP1_CP009578.1_Netherlands.txt

2018/10/17 15:27:49 Looking for 'Rscript' - found /usr/bin/Rscript
2018/10/17 15:27:49 Determined Rscript version is 3.4
2018/10/17 15:27:49 Looking for 'awk' - found /usr/bin/awk
2018/10/17 15:27:49 Looking for 'bedtools' - found /usr/bin/bedtools
2018/10/17 15:27:49 Determined bedtools version is 2.17
2018/10/17 15:27:49 Looking for 'blastp' - found /home/geo1/SW/ncbi-blast-2.7.1+/bin/blastp
2018/10/17 15:27:49 Determined blastp version is 2.7.1
2018/10/17 15:27:49 Looking for 'grep' - found /bin/grep
2018/10/17 15:27:49 Optional tool 'kraken' not found in your $PATH
2018/10/17 15:27:49 Optional tool 'kraken-report' not found in your $PATH
2018/10/17 15:27:49 Looking for 'mafft' - found /usr/bin/mafft
2018/10/17 15:27:49 Determined mafft version is 7.392
2018/10/17 15:27:49 Looking for 'makeblastdb' - found /home/geo1/SW/ncbi-blast-2.7.1+/bin/makeblastdb
2018/10/17 15:27:49 Determined makeblastdb version is 2.7.1
2018/10/17 15:27:49 Looking for 'mcl' - found /usr/bin/mcl
2018/10/17 15:27:49 Determined mcl version is 12-135
2018/10/17 15:27:49 Looking for 'parallel' - found /usr/bin/parallel
2018/10/17 15:27:49 Determined parallel version is 20130922
2018/10/17 15:27:49 Looking for 'prank' - found /usr/bin/prank
2018/10/17 15:27:49 Determined prank version is 140110
2018/10/17 15:27:49 Looking for 'sed' - found /bin/sed
2018/10/17 15:27:49 Looking for 'cdhit' - found /usr/bin/cdhit
2018/10/17 15:27:49 Determined cdhit version is 4.6
2018/10/17 15:27:49 Looking for 'fasttree' - found /usr/bin/fasttree
2018/10/17 15:27:49 Determined fasttree version is 2.1
2018/10/17 15:27:49 Roary version 3.12.0

tseemann · 2018-10-17T23:42:21Z

@pedrorvc type tail -n 1 *.gff and see if they all have DNA sequence on the last line. If not, you can't use them.
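
A minimal shell sketch of the check suggested above, extended to also look for the ##FASTA directive. It assumes Prokka-style GFF3 files, in which the contig sequences follow a ##FASTA line at the end of the file. The function name check_gff is hypothetical and not part of Roary, and treating a letters-only last line as "sequence" is only a heuristic, so anything it flags deserves a manual look rather than automatic removal.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not part of Roary): flag GFF files from which
# nucleotide sequences probably cannot be extracted. Assumes Prokka-style GFF3,
# where contig sequences follow a '##FASTA' line at the end of the file.

# check_gff FILE: prints a message and returns 1 if FILE looks unusable.
check_gff() {
  local f="$1" last
  if ! grep -q '^##FASTA' "$f"; then
    echo "no ##FASTA section: $f"
    return 1
  fi
  # Heuristic: the last line should be raw sequence (letters only, no tabs).
  last=$(tail -n 1 "$f")
  if ! printf '%s\n' "$last" | grep -Eq '^[A-Za-z]+$'; then
    echo "last line is not sequence: $f"
    return 1
  fi
  return 0
}

# Check every .gff in the current directory and count the suspicious ones.
bad=0
for f in *.gff; do
  [ -e "$f" ] || continue   # skip the literal pattern when no .gff files exist
  check_gff "$f" || bad=$((bad + 1))
done
echo "$bad problematic GFF file(s) found"
```

Note that a sequence ending in runs of N (as in the attached file) still counts as sequence under this heuristic.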

pedrorvc · 2018-10-17T23:57:36Z

@tseemann Thank you very much!
I followed your suggestion and verified that 8 files have sequences ending with NN, one of which is the file I attached.
Would this trigger the warning?

pedrorvc · 2018-10-20T23:30:25Z

I have been trying to understand what causes this error and, after various attempts, the problem seems to lie in the number of .gff files being processed.
One of the things I tried was running the same command on batches of 50 files (and afterwards 100 files); these runs produced the pan_genome_sequences successfully, which led me to believe that some sort of file limit could be causing the warning.
At the time of posting, the largest set for which I could successfully obtain the pan_genome_sequences was 235 .gff files.
If a file limit exists, can it trigger this warning?

pedrorvc · 2018-10-20T23:31:38Z

Apologies for closing the issue due to a misclick.

tseemann · 2018-10-21T00:53:23Z

Maybe you are running out of RAM?

pedrorvc · 2018-10-21T01:26:45Z

I don't think so. I monitored the RAM usage during my most recent failed run and still had about 6 GB of RAM available.

maesaar · 2018-10-21T05:39:17Z

@pedrorvc I have used Roary with more than 1000 files and have not had any issues.

tseemann · 2018-10-22T02:19:48Z

What does ulimit -a say for your account?

pedrorvc · 2018-10-22T09:55:02Z

For my account ulimit -a says this:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 64019
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 64019
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

andrewjpage · 2018-10-22T10:53:47Z

Your allowable number of open files is too low. For example, on my system it is:
open files (-n) 1048576

I would recommend you increase it (your system administrator can assist if you don't know how to do it).
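
A minimal sketch of checking and raising the limit from a Linux shell. It assumes bash (or another shell whose ulimit builtin supports -S/-H) and that the administrator has already set a sufficiently high hard limit, since an unprivileged user can only raise the soft limit up to the hard limit.

```shell
#!/usr/bin/env bash
# Sketch: inspect and raise the open-file limit (RLIMIT_NOFILE) of the current
# shell before launching Roary from it. Assumes Linux.

# The soft limit is what processes actually get; the hard limit is the ceiling
# an unprivileged user may raise the soft limit to.
echo "open files: soft=$(ulimit -Sn) hard=$(ulimit -Hn)"

# Raise the soft limit to the hard limit. This only affects this shell and the
# processes it starts afterwards (e.g. a roary run launched from here).
ulimit -Sn "$(ulimit -Hn)"

# Show the limit a child process inherits (Linux-specific /proc interface).
grep 'Max open files' /proc/self/limits
```

Because ulimit changes only the current shell and its children, Roary has to be started from the same session after the change. A persistent change is usually made by the administrator, for example through nofile entries in /etc/security/limits.conf (applied by pam_limits at login); the exact mechanism varies by distribution.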

andrewjpage · 2018-10-22T10:56:00Z

Additionally, you appear to be running quite old versions of software, which would suggest you are running a very old version of Linux?

pedrorvc · 2018-10-23T15:30:16Z

@andrewjpage I increased the number of open files and it solved the problem, thank you very much!

JinxiangChenHome · 2019-03-11T00:24:02Z

Hi andrewjpage,

I changed the value of open files to 1048576, but the problem is still not solved.

I used 1800 .gff files to produce the pan-genome. My machine: 16 vCPUs, 64 GB RAM, instances: 2, 5T.

I don't know how to solve this problem; I get the same error message as he did.

My error output is as follows:

please cite:

O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence this citation notice: run 'parallel --citation'.

Attribute (fasta_file) does not pass the type constraint because: Validation failed for 'Str' with value undef at reader Bio::Roary::Output::GroupsMultifastaNucleotide::fasta_file (defined at /usr/local/share/perl/5.22.1/Bio/Roary/Output/GroupsMultifastaNucleotide.pm line 29) line 15
Bio::Roary::Output::GroupsMultifastaNucleotide::fasta_file('Bio::Roary::Output::GroupsMultifastaNucleotide=HASH(0x96345d5d8)') called at /usr/local/share/perl/5.22.1/Bio/Roary/Output/GroupsMultifastaNucleotide.pm line 43
Bio::Roary::Output::GroupsMultifastaNucleotide::_build__input_seqio('Bio::Roary::Output::GroupsMultifastaNucleotide=HASH(0x96345d5d8)') called at reader Bio::Roary::Output::GroupsMultifastaNucleotide::_input_seqio (defined at /usr/local/share/perl/5.22.1/Bio/Roary/Output/GroupsMultifastaNucleotide.pm line 30) line 8
Bio::Roary::Output::GroupsMultifastaNucleotide::_input_seqio('Bio::Roary::Output::GroupsMultifastaNucleotide=HASH(0x96345d5d8)') called at /usr/local/share/perl/5.22.1/Bio/Roary/Output/GroupsMultifastaNucleotide.pm line 53
Bio::Roary::Output::GroupsMultifastaNucleotide::populate_files('Bio::Roary::Output::GroupsMultifastaNucleotide=HASH(0x96345d5d8)') called at /usr/local/share/perl/5.22.1/Bio/Roary/Output/GroupsMultifastasNucleotide.pm line 65
Bio::Roary::Output::GroupsMultifastasNucleotide::create_files('Bio::Roary::Output::GroupsMultifastasNucleotide=HASH(0x900572c78)') called at /usr/local/share/perl/5.22.1/Bio/Roary/PostAnalysis.pm line 131
Bio::Roary::PostAnalysis::run('Bio::Roary::PostAnalysis=HASH(0x4df69a8)') called at /usr/local/share/perl/5.22.1/Bio/Roary/CommandLine/RoaryPostAnalysis.pm line 128
Bio::Roary::CommandLine::RoaryPostAnalysis::run('Bio::Roary::CommandLine::RoaryPostAnalysis=HASH(0x10fa1f0)') called at /usr/local/bin/pan_genome_post_analysis line 14

Looking forward to your reply. Thank you!

Best regards,

Jinxiang

xianggx01 · 2022-04-20T13:14:40Z

Hi, I used 2472 .gff files to produce a pan-genome and encountered this problem too, although the software ran successfully with 232 .gff files. This is my error output:

"Please cite Roary if you use any of the results it produces:
Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill,
"Roary: Rapid large-scale prokaryote pan genome analysis", Bioinformatics, 2015 Nov 15;31(22):3691-3693
doi: http://doi.org/10.1093/bioinformatics/btv421
Pubmed: 26198102

Use of uninitialized value in require at /usr/local/lib64/perl5/Moose/Meta/TypeConstraint.pm line 60.
Attribute (fasta_file) does not pass the type constraint because: Validation failed for 'Str' with value undef at reader Bio::Roary::Output::GroupsMultifastaNucleotide::fasta_file (defined at /usr/local/share/perl5/Bio/Roary/Output/GroupsMultifastaNucleotide.pm line 29) line 15
Bio::Roary::Output::GroupsMultifastaNucleotide::fasta_file('Bio::Roary::Output::GroupsMultifastaNucleotide=HASH(0x10ac17f350)') called at /usr/local/share/perl5/Bio/Roary/Output/GroupsMultifastaNucleotide.pm line 43
Bio::Roary::Output::GroupsMultifastaNucleotide::_build__input_seqio('Bio::Roary::Output::GroupsMultifastaNucleotide=HASH(0x10ac17f350)') called at reader Bio::Roary::Output::GroupsMultifastaNucleotide::_input_seqio (defined at /usr/local/share/perl5/Bio/Roary/Output/GroupsMultifastaNucleotide.pm line 30) line 7
Bio::Roary::Output::GroupsMultifastaNucleotide::_input_seqio('Bio::Roary::Output::GroupsMultifastaNucleotide=HASH(0x10ac17f350)') called at /usr/local/share/perl5/Bio/Roary/Output/GroupsMultifastaNucleotide.pm line 53
Bio::Roary::Output::GroupsMultifastaNucleotide::populate_files('Bio::Roary::Output::GroupsMultifastaNucleotide=HASH(0x10ac17f350)') called at /usr/local/share/perl5/Bio/Roary/Output/GroupsMultifastasNucleotide.pm line 65
Bio::Roary::Output::GroupsMultifastasNucleotide::create_files('Bio::Roary::Output::GroupsMultifastasNucleotide=HASH(0xfd12db478)') called at /usr/local/share/perl5/Bio/Roary/PostAnalysis.pm line 131
Bio::Roary::PostAnalysis::run('Bio::Roary::PostAnalysis=HASH(0x4ba5ec8)') called at /usr/local/share/perl5/Bio/Roary/CommandLine/RoaryPostAnalysis.pm line 128
Bio::Roary::CommandLine::RoaryPostAnalysis::run('Bio::Roary::CommandLine::RoaryPostAnalysis=HASH(0x1d99188)') called at /usr/local/bin/pan_genome_post_analysis line 14"

Looking forward to any reply. Thank you!

Best regards,
Guoxiu Xiang

pedrorvc closed this as completed Oct 20, 2018

pedrorvc reopened this Oct 20, 2018

andrewjpage closed this as completed Oct 22, 2018
