Better error message when RepeatMasker fails #298

Closed
hyphaltip opened this issue Jun 26, 2019 · 4 comments

@hyphaltip
Collaborator

hyphaltip commented Jun 26, 2019

Are you using the latest release?
Yes

Describe the bug

When RepeatMasker fails because of user error (e.g. specifying a library that does not exist; in the case below, using -l instead of -s), or when there is a runtime problem, the failure is not reported. Instead we get a Python error because the expected file for the next step (e.g. file.masked.fasta) cannot be opened. It would be helpful if the code checked for this first and gave the user a better error message.

What command did you issue?

funannotate mask -i FungiDB-43_LprolificansJHH5317_Genome.fasta --cpus 8 -l fungi --out LprolificansJHH5317_Genome.masked.fasta

[04:19 PM]: OS: linux2, 64 cores, ~ 528 GB RAM. Python: 2.7.12
[04:19 PM]: Running funannotate v1.5.2
[04:19 PM]: Soft-masking: running RepeatMasker with custom library
Traceback (most recent call last):
  File "/bigdata/operations/pkgadmin/opt/linux/centos/7.x/x86_64/pkgs/funannotate/1.5.2-30c1166/bin/funannotate-mask.py", line 72, in <module>
    with open(args.out, 'rU') as input:
IOError: [Errno 2] No such file or directory: 'LprolificansJHH5317_Genome.masked.fasta'

Logfiles

Here's the logfile, which clearly points to a user error.

[06/26/19 16:19:17]: /bigdata/operations/pkgadmin/opt/linux/centos/7.x/x86_64/pkgs/funannotate/1.5.2-30c1166/bin/funannotate-mask.py -i FungiDB-43_LprolificansJHH5317_Genome.fasta --cpus 8 -l fungi --out LprolificansJHH5317_Genome.masked.fasta

[06/26/19 16:19:18]: OS: linux2, 64 cores, ~ 528 GB RAM. Python: 2.7.12
[06/26/19 16:19:18]: Running funannotate v1.5.2
[06/26/19 16:19:19]: Soft-masking: running RepeatMasker with custom library
RepeatMasker version open-4.0.7
Search Engine: NCBI/RMBLAST [ 2.6.0+ ]
RepeatMasker::setspecies: Could not find user specified library /bigdata/stajichlab/shared/projects/Scedosporium/annotation/test/fungi.

A separate example is the error seen when the installed Perl version is mismatched (e.g. we have conda, package, and local installs of Perl, which sometimes get loaded when they shouldn't). Here is that error from the logfile:

/bigdata/operations/pkgadmin/opt/linux/centos/7.x/x86_64/pkgs/funannotate/1.5.2-30c1166/bin/funannotate-mask.py -i FungiDB-43_LprolificansJHH5317_Genome.fasta --cpus 8 -s fungi --out LprolificansJHH5317_Genome.masked.fasta

[06/26/19 16:19:03]: OS: linux2, 64 cores, ~ 528 GB RAM. Python: 2.7.12
[06/26/19 16:19:04]: Running funannotate v1.5.2
[06/26/19 16:19:04]: Soft-masking: running RepeatMasker with custom library
Cwd.c: loadable library and perl binaries are mismatched (got handshake key 0xdb80080, needed 0xde00080)

OS/Install Information

Checking dependencies for funannotate v1.5.2

You are running Python v 2.7.12. Now checking python packages...
biopython: 1.70
goatools: 0.8.12
matplotlib: 2.1.1
natsort: 6.0.0
numpy: 1.12.1
pandas: 0.24.1
psutil: 5.5.1
requests: 2.21.0
scikit-learn: 0.20.1
scipy: 1.1.0
seaborn: 0.9.0
All 11 python packages installed

You are running Perl v 5.026002. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.41
DBD::SQLite: 1.62
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.852
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.36
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking external dependencies...
RepeatMasker: RepeatMasker 4.0.7
RepeatModeler: RepeatModeler 1.0.11
Trinity: 2.8.4
augustus: 3.3
bamtools: bamtools 2.4.0
bedtools: bedtools v2.27.1
blat: BLAT v36x2
diamond: diamond 0.9.22
emapper.py: emapper-06a477e
ete3: 3.1.1
exonerate: exonerate 2.2.0
fasta: no way to determine
gmap: 2019-03-04
gmes_petap.pl: 4.38
hisat2: 2.1.0
hmmscan: HMMER 3.2.1 (June 2018)
hmmsearch: HMMER 3.2.1 (June 2018)
java: 1.8.0_45
kallisto: 0.45.0
mafft: v7.427 (2019/Mar/29)
makeblastdb: makeblastdb 2.2.30+
minimap2: 2.17-r941
nucmer: 4.0.0beta2
pslCDnaFilter: no way to determine
rmblastn: rmblastn 2.6.0+
samtools: samtools 1.9
stringtie: 1.3.5
tRNAscan-SE: 1.3.1 (January 2012)
tbl2asn: unknown, likely 25.3
tblastn: tblastn 2.2.30+
trimal: trimAl v1.4.rev15 build[2013-12-17]
ERROR: CodingQuarry not installed
Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/linux/centos/7.x/x86_64/pkgs/funannotate/share
$PASAHOME=/rhome/jstajich/.pasa
$TRINITYHOME=/opt/linux/centos/7.x/x86_64/pkgs/trinity-rnaseq/2.8.4
$EVM_HOME=/opt/linux/centos/7.x/x86_64/pkgs/EVM/1.1.1-live
$AUGUSTUS_CONFIG_PATH=/opt/linux/centos/7.x/x86_64/pkgs/augustus/3.3/config
$GENEMARK_PATH=/opt/linux/centos/7.x/x86_64/pkgs/genemarkESET/4.38
$BAMTOOLS_PATH=/opt/linux/centos/7.x/x86_64/pkgs/bamtools/2.4.0/bin
All 7 environmental variables are set

@nextgenusfs
Owner

Okay, so we simply need to check whether the -l, --repeatmodeler_lib argument is a valid file. Try d869dae and see if that gives the desired behavior.
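For readers following along, the kind of check described here could look roughly like the sketch below. It is not the code from d869dae; the function name `validate_repeat_library` and the wording of the messages are assumptions made for illustration. It only relies on the fact, stated in this thread, that -l expects a library file while -s takes a species name.

```python
import os


def validate_repeat_library(path):
    """Return the absolute path of a custom repeat library, or raise ValueError.

    Hypothetical helper: the name and messages are illustrative only.
    """
    if not os.path.isfile(path):
        # The failure in this issue: a species name was passed where a file was expected.
        raise ValueError(
            "Repeat library '{}' is not a file. If you meant a species name, "
            "use -s instead of -l.".format(path)
        )
    if os.path.getsize(path) == 0:
        raise ValueError("Repeat library '{}' is empty.".format(path))
    return os.path.abspath(path)
```

Calling `validate_repeat_library("fungi")` in a directory with no file named `fungi` would raise a `ValueError` before RepeatMasker is ever launched, so the user sees the hint rather than the later IOError.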

@nextgenusfs
Owner

nextgenusfs commented Jun 27, 2019

The Perl library problem is not quite as easy to handle, I guess. RepeatModeler has been taken down from Conda because the build keeps failing and apparently isn't functional (I don't know why). Coupled with the new RepBase licensing issues, a better/simpler long-term solution would be to replace RepeatModeler/RepeatMasker. Any ideas? I tried out the Red program a while back, but it tended to overmask some genomes and undermask others, and I couldn't get the parameters fine-tuned. Maybe just a simple trf + tantan + any other repeat detection would suffice as a "stock" simple solution?

@hyphaltip
Collaborator Author

Validating. I don't think we can try to guess about the Perl problem, just that we could try to detect a RepeatMasker failure?
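One way to detect a failed run without diagnosing its cause is to check both the exit status and the expected output file, and to surface the tool's own output when either check fails. The sketch below is a generic illustration, not funannotate's code: `run_and_check` and `ExternalToolError` are hypothetical names, and it is written for Python 3 (`subprocess.run`), whereas the logs above show Python 2.7. Whether RepeatMasker exits non-zero in the two cases reported here is not established in this thread, which is why the output file is checked as well.

```python
import os
import subprocess
import sys


class ExternalToolError(RuntimeError):
    """Raised when an external tool exits non-zero or leaves no output file."""


def run_and_check(cmd, expected_output):
    """Run cmd (a list of arguments) and return its combined stdout/stderr.

    Raises ExternalToolError, including the tool's own messages, if the
    command fails or the expected output file is missing or empty.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        raise ExternalToolError(
            "{} exited with status {}:\n{}".format(cmd[0], proc.returncode, proc.stdout)
        )
    if not os.path.isfile(expected_output) or os.path.getsize(expected_output) == 0:
        raise ExternalToolError(
            "{} finished but {} is missing or empty:\n{}".format(
                cmd[0], expected_output, proc.stdout
            )
        )
    return proc.stdout
```

With a wrapper like this, messages such as "Could not find user specified library" or the Perl handshake mismatch would reach the user as part of the error text, rather than appearing only in the logfile.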

@hyphaltip
Collaborator Author

I do think a stock solution might still be okay as an alternative for groups that will not be able to install full RM.
