Fatal Error: Failed to open the database file #2

Closed
oushujun opened this issue May 31, 2017 · 15 comments

@oushujun
Owner

@mcsimenc I moved your last bug report to this new thread.
Forwarded report:

OK, the program runs for a little while and now gives this error:

ERROR: No such file or directory at /home/joshd/software/LTR_retriever/bin/cleanup.pl line 50.
ERROR: No such file or directory at /home/joshd/software/LTR_retriever/bin/cleanup.pl line 50.

Fatal Error:
Failed to open the database file
Program halted !!

Can't open Salvinia_cucullata_v1.1.fa.LTRlib.clust: No such file or directory.
ERROR: This script is written to convert fasta files into a prettier format.
Usage: fasta-reformat.pl input-fasta-file number-of-positions-per-line
ERROR: No such file or directory at /home/joshd/software/LTR_retriever/bin/annotate_gff.pl line 12.

Salvinia_cucullata_v1.1.fa.LTRlib.clust doesn't exist, and I ran LTR_retriever with -v

@oushujun
Owner Author

This looks like a CD-HIT error. Please update your CD-HIT package if possible. Please also check the file "Salvinia_cucullata_v1.1.fa.LTRlib", or attach it to this thread so I can check it.

Shujun

@mcsimenc

I have the most current version of CD-HIT installed. The file Salvinia_cucullata_v1.1.fa.LTRlib doesn't exist. Here's an ls -lh of the working directory:

-rw-r--r-- 1 derstudent derlab  691 May 30 18:29 call_ltrretriever.qsub
-rw-r--r-- 1 derstudent derlab  915 May 30 18:52 debug
drwxr-xr-x 2 derstudent derlab 4.0K May 30 18:29 input
-rw-r--r-- 1 derstudent derlab    0 May 30 15:51 LTR_retriever.err
-rw-r--r-- 1 derstudent derlab 2.7K May 30 18:52 LTR_retriever.out
drwxr-xr-x 2 derstudent derlab   73 May 30 18:50 RM_31919.TueMay301850262017
drwxr-xr-x 2 derstudent derlab   75 May 30 18:52 RM_3321.TueMay301852152017
drwxr-xr-x 2 derstudent derlab    6 May 30 18:52 RM_3358.TueMay301852192017
-rw------- 1 derstudent derlab  788 May 30 19:08 SacuLTRHarv.LTRretr.e1258
-rw------- 1 derstudent derlab    0 May 30 19:08 SacuLTRHarv.LTRretr.o1258
-rwxr-x--- 1 derstudent derlab 223M May 28 16:23 Salvinia_cucullata_v1.1.fa
-rw-r--r-- 1 derstudent derlab 1.1M May 30 18:50 Salvinia_cucullata_v1.1.fa.defalse
-rw-r--r-- 1 derstudent derlab    0 May 30 18:52 Salvinia_cucullata_v1.1.fa.LTRanno.gff
-rw-r--r-- 1 derstudent derlab    0 May 30 18:52 Salvinia_cucullata_v1.1.fa.LTRlib.fa
-rw-r--r-- 1 derstudent derlab 619K May 30 18:52 Salvinia_cucullata_v1.1.fa.ltrTE
-rw-r--r-- 1 derstudent derlab 619K May 30 18:52 Salvinia_cucullata_v1.1.fa.ltrTE.clust
-rw-r--r-- 1 derstudent derlab 9.0K May 30 18:52 Salvinia_cucullata_v1.1.fa.ltrTE.clust.clstr
-rw-r--r-- 1 derstudent derlab  36M May 30 18:30 Salvinia_cucullata_v1.1.fa.ltrTE.fa
-rw-r--r-- 1 derstudent derlab 159K May 30 18:35 Salvinia_cucullata_v1.1.fa.ltrTE.fa.cleanup
-rw-r--r-- 1 derstudent derlab 1.9M May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.mask.lib
-rw-r--r-- 1 derstudent derlab  56K May 30 18:52 Salvinia_cucullata_v1.1.fa.ltrTE.nmtf
-rw-r--r-- 1 derstudent derlab 1.3M May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.pass
-rw-r--r-- 1 derstudent derlab  39K May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.pass.clust.clstr
-rw-r--r-- 1 derstudent derlab  30K May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.pass.list
-rw-r--r-- 1 derstudent derlab 4.2K May 30 18:52 Salvinia_cucullata_v1.1.fa.ltrTE.pass.nmtf.list
-rw-r--r-- 1 derstudent derlab  25M May 30 18:35 Salvinia_cucullata_v1.1.fa.ltrTE.stg1
-rw-r--r-- 1 derstudent derlab 619K May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.stg2
-rw-r--r-- 1 derstudent derlab 619K May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.cln
-rw-r--r-- 1 derstudent derlab 619K May 30 18:51 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.cln.clean
-rw-r--r-- 1 derstudent derlab   47 May 30 18:52 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.cln.clean.exclude.list
-rw-r--r-- 1 derstudent derlab   47 May 30 18:51 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.cln.exclude.list
-rw-r--r-- 1 derstudent derlab    0 May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.dna.out
-rw-r--r-- 1 derstudent derlab    0 May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.line.out
-rw-r--r-- 1 derstudent derlab    0 May 30 18:51 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.otherTE.out
-rw-r--r-- 1 derstudent derlab  70K May 30 18:52 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.plantP.out
-rw-r--r-- 1 derstudent derlab 3.5M May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.trunc
-rw-r--r-- 1 derstudent derlab    0 May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.trunc.cln
-rw-r--r-- 1 derstudent derlab  37K May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.trunc.list
-rw-r--r-- 1 derstudent derlab    0 May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.trunc.masked.cleanup
-rw-r--r-- 1 derstudent derlab 7.7K May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.veryfalse
-rw-r--r-- 1 derstudent derlab 1.3M May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.veryfalse.fa
-rw-r--r-- 1 derstudent derlab  17K May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.veryfalse.list
-rw-r--r-- 1 derstudent derlab  609 May 30 18:52 Salvinia_cucullata_v1.1.fa.nmtf.LTRlib.fa
-rw-r--r-- 1 derstudent derlab 4.3K May 30 18:52 Salvinia_cucullata_v1.1.fa.nmtf.pass.list
-rw-r--r-- 1 derstudent derlab  56K May 30 18:52 Salvinia_cucullata_v1.1.fa.nmtf.prelib
-rw-r--r-- 1 derstudent derlab  30K May 30 18:52 Salvinia_cucullata_v1.1.fa.pass.list
-rw-r--r-- 1 derstudent derlab 331K May 30 18:52 Salvinia_cucullata_v1.1.fa.pass.list.gff3
-rw-r--r-- 1 derstudent derlab 620K May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib
-rw-r--r-- 1 derstudent derlab 597K May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.INT
-rw-r--r-- 1 derstudent derlab    0 May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.INT.cln
-rw-r--r-- 1 derstudent derlab 6.1K May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.INT.list
-rw-r--r-- 1 derstudent derlab    0 May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.INT.masked.cleanup
-rw-r--r-- 1 derstudent derlab  24K May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.LTR
-rw-r--r-- 1 derstudent derlab  24K May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.LTR.clust
-rw-r--r-- 1 derstudent derlab 2.7K May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.LTR.clust.clstr
-rw-r--r-- 1 derstudent derlab 2.6K May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.LTR.list
-rw-r--r-- 1 derstudent derlab 761K May 30 18:52 Salvinia_cucullata_v1.1.fa.retriever.all.scn.adj
-rw-r--r-- 1 derstudent derlab 6.3K May 30 18:52 Salvinia_cucullata_v1.1.fa.retriever.all.scn.adj.list
-rw-r--r-- 1 derstudent derlab 429K May 30 18:29 Salvinia_cucullata_v1.1.fa.retriever.scn
-rw-r--r-- 1 derstudent derlab 761K May 30 18:50 Salvinia_cucullata_v1.1.fa.retriever.scn.adj
-rw-r--r-- 1 derstudent derlab  78K May 30 18:50 Salvinia_cucullata_v1.1.fa.retriever.scn.adj.list
-rw-r--r-- 1 derstudent derlab 234K May 30 18:35 Salvinia_cucullata_v1.1.fa.retriever.scn.extend
-rw-r--r-- 1 derstudent derlab  25M May 30 18:35 Salvinia_cucullata_v1.1.fa.retriever.scn.extend.fa
-rw-r--r-- 1 derstudent derlab  51M May 30 18:36 Salvinia_cucullata_v1.1.fa.retriever.scn.extend.fa.aa
-rw-r--r-- 1 derstudent derlab 252K May 30 18:37 Salvinia_cucullata_v1.1.fa.retriever.scn.extend.fa.aa.anno
-rw-r--r-- 1 derstudent derlab 3.9M May 30 18:37 Salvinia_cucullata_v1.1.fa.retriever.scn.extend.fa.aa.scn
-rw-r--r-- 1 derstudent derlab 1.5M May 30 18:37 Salvinia_cucullata_v1.1.fa.retriever.scn.extend.fa.aa.tbl
-rw-r--r-- 1 derstudent derlab 357K May 30 18:30 Salvinia_cucullata_v1.1.fa.retriever.scn.full
-rw-r--r-- 1 derstudent derlab 672K May 30 18:30 Salvinia_cucullata_v1.1.fa.retriever.scn.list
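
Several files in the listing above are 0 bytes, which is often the quickest hint about which step produced no output. A minimal sketch for pulling those out of a working directory follows. The function name `empty_outputs` is hypothetical and not part of LTR_retriever; it only inspects file sizes.

```python
from pathlib import Path


def empty_outputs(workdir, prefix):
    """Return sorted names of zero-byte regular files in workdir
    whose names start with prefix (e.g. the genome file name)."""
    return sorted(
        p.name
        for p in Path(workdir).iterdir()
        if p.is_file() and p.name.startswith(prefix) and p.stat().st_size == 0
    )
```

For example, `empty_outputs(".", "Salvinia_cucullata_v1.1.fa")` would list the empty intermediate files for this run.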

@oushujun
Owner Author

oushujun commented Jun 2, 2017

Hi,

Sorry for the delay; our server was down for 2 days and I had to take care of it first.
Thanks for providing the detailed listing of output files. It looks like RepeatMasker is not running correctly. If you have it installed, this could be caused by the "long sequence name" issue: sequence names longer than 15 characters may not be recognized by RepeatMasker and can cause the program to halt.
I have developed a new module to deal with the long sequence name issue and pushed it to GitHub. Please download the latest version and see if it works for your genome. I still suggest shortening the sequence names yourself rather than letting the program do it for you, since it may not be clever enough to make a decent conversion. Please let me know if this is not your case.
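
To check a genome for this before running anything, something like the sketch below may help. It takes the 15-character limit from the comment above as its default, treats the first whitespace-delimited word of each `>` line as the sequence name, and uses a hypothetical function name (`long_sequence_names`); it is not the module shipped with LTR_retriever.

```python
def long_sequence_names(fasta_lines, max_len=15):
    """Return sequence names longer than max_len characters.

    fasta_lines is any iterable of lines (an open file works).
    The name is taken as the first word after '>' on a header line.
    """
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if len(name) > max_len:
                names.append(name)
    return names
```

Usage would be along the lines of `with open("genome.fa") as fh: print(long_sequence_names(fh))`, where `genome.fa` stands in for your own file; an empty list means no name exceeds the limit.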

Thank you!
Shujun

@mcsimenc

mcsimenc commented Jun 2, 2017

Hi Shujun, no worries, I'm glad you're helping work it out. I think LTR_retriever will be very useful for our analyses! We're running RepeatMasker version open-4.0.7.

All of the sequence names in the genome are of this format (it turns out they are exactly 15 characters):

>Sacu_v1.1_s0001
>Sacu_v1.1_s0002
>Sacu_v1.1_s0123

I downloaded the new release and it is running right now. I may not be able to get back to you with the result until Monday.

@oushujun
Owner Author

oushujun commented Jun 7, 2017

Hi,

I pushed some updates to the repository that may fix some of the problems you had previously. Please update the code and try again. Thanks!

Shujun

@oushujun oushujun added the bug label Jun 7, 2017
@mcsimenc

mcsimenc commented Jun 8, 2017

Hi Shujun,

I just ran the updated program and got an error from hmmpress when it is called by RepeatMasker. The hmmpress log file describes an error with sequence headers:

Error: File format problem in trying to open HMM file /home/joshd/data/salvinia/repeat_lib/LTR_retriever/RM_5827.WedJun72213492017/Salvinia_cucullata_v1.1.fa.mod.ltrTE.mask.lib.
Format tag is '>Sacu_v1.1_s0001:1488510..1488827|LTR_1': unrecognized.
Current H3 format is 'HMMER3/f'. Previous H2/H3 formats 

From LTR_retriever:

ERROR: RepeatMasker is not running properly!
        Please check the file Salvinia_cucullata_v1.1.fa.mod.ltrTE.mask.lib and Salvinia_cucullata_v1.1.fa.mod.ltrTE.trunc and test run:
                RepeatMasker -q -pa 40 -no_is -norna -nolow -div 40 -lib Salvinia_cucullata_v1.1.fa.mod.ltrTE.mask.lib -cutoff 225 Salvinia_cucullata_v1.1.fa.mod.ltrTE.trunc

@oushujun
Owner Author

oushujun commented Jun 8, 2017

Hi,

This is very helpful information! Did you install RepeatMasker with HMMER as the primary search engine? Basically, the first error says that the program expects an HMM file, but the input file "Salvinia_cucullata_v1.1.fa.mod.ltrTE.mask.lib" is not recognized as one (because it is a FASTA file!). The second error comes from a new check I implemented, and evidently it found that the expected result was not there.
Please do a test run and see what errors RepeatMasker reports:
RepeatMasker -q -pa 40 -no_is -norna -nolow -div 40 -lib Salvinia_cucullata_v1.1.fa.mod.ltrTE.mask.lib -cutoff 225 Salvinia_cucullata_v1.1.fa.mod.ltrTE.trunc
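
To confirm which kind of file the library actually is, a rough check on its first non-blank line is usually enough. The sketch below assumes a FASTA file starts with `>` and an HMMER3 profile file starts with `HMMER3/` (the tag quoted in the hmmpress error above). The function name `sniff_library_format` is hypothetical.

```python
def sniff_library_format(path):
    """Classify a library file as 'fasta', 'hmmer3', 'unknown', or 'empty'
    by looking only at its first non-blank line."""
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                return "fasta"
            if line.startswith("HMMER3/"):
                return "hmmer3"
            return "unknown"
    return "empty"
```

If this reports `fasta` for the `*.mask.lib` file while RepeatMasker is configured to use HMMER, that mismatch would be consistent with the hmmpress error.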

Regards,
Shujun

@mcsimenc

mcsimenc commented Jun 8, 2017

Yes, I had installed RepeatMasker with HMMER as the default search engine. I changed it to NCBI BLAST+ and LTR_retriever ran without errors. However, I'm unsure if it finished completely. I noticed that the number of elements in *defalse plus the number in *pass.list.gff3 is only a little more than half the elements in the input from LTRharvest. I'm also surprised that only ~3% of the input elements made it into *pass.list.gff3. How can I make sure everything finished? Thanks!
Matt

@mcsimenc

mcsimenc commented Jun 8, 2017

Also I did try the -no_is flag with RepeatMasker and I saw the same error.

@oushujun
Owner Author

oushujun commented Jun 8, 2017

Dear Matt,

If there is no error or warning message, LTR_retriever has probably run correctly!
It's very normal that the majority of input candidates do not make it all the way to the pass.list, because a lot of them (I can't name a number, but half would not be surprising!) are false positives or truncated LTRs. Before the structural analysis (*defalse), the program has already done several filtering steps, such as gap filtering, tandem repeat filtering, and length balance filtering. The structural analysis then tries to recover the structural information of each candidate and decides whether it is a real LTR or not.
Note that the purpose of this program is to confidently and sensitively identify intact LTRs, and then generate a library (exemplars). Intact LTRs represent the most recent LTR amplifications and have clear structural information, hence we can confidently say that each is an LTR. Using this confident set, we can then identify most, if not all, LTRs in the genome.
It may be hard to believe that 3% of the input is all you need, but in our experience this is true. You can also find our benchmark data in the manuscript: http://biorxiv.org/content/early/2017/05/12/137141.article-metrics
LTR_retriever is highly specific and accurate, and its sensitivity is as high as the input allows. So basically you have just removed all false LTRs and retained all true LTRs with the program. You can now check the genome.out and genome.tbl files for the whole-genome LTR annotation.
Let me know if the data still do not look correct.

Best,
Shujun Ou

@oushujun
Owner Author

oushujun commented Jun 8, 2017

Regarding your last comment, could you describe in more detail what data and command you used?

Thanks,
Shujun

@mcsimenc

mcsimenc commented Jun 9, 2017

What I meant by the comment about the -no_is flag is that I tried the command you suggested:

RepeatMasker -q -pa 40 -no_is -norna -nolow -div 40 -lib Salvinia_cucullata_v1.1.fa.mod.ltrTE.mask.lib -cutoff 225 Salvinia_cucullata_v1.1.fa.mod.ltrTE.trunc

and I saw the hmmer error:

Error: File format problem in trying to open HMM file

but that was with RepeatMasker running HMMER as the default search engine.

I am unsure whether it finished because only 67% of the input elements are mentioned in either *defalse or *pass.list. Maybe I'm misunderstanding what these files contain.
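
For reference, the kind of tally behind that percentage can be sketched as below. This assumes each file holds one element per non-blank line, with `#` marking comment lines, and that no element is listed in both files; I have not verified either assumption against the actual file formats. The function names are hypothetical.

```python
def count_records(path):
    """Count non-blank lines that do not start with '#'."""
    with open(path) as handle:
        return sum(
            1 for line in handle if line.strip() and not line.startswith("#")
        )


def fraction_accounted(n_input, n_defalse, n_pass):
    """Fraction of input candidates that appear in either downstream file."""
    if n_input == 0:
        raise ValueError("no input candidates")
    return (n_defalse + n_pass) / n_input
```

The three counts would come from calling `count_records` on the LTRharvest input, the *defalse file, and the *pass.list file respectively.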

@oushujun
Owner Author

oushujun commented Jun 9, 2017

Hi Matt,

Yes, you are right. Not all input elements reach the *defalse or *pass.list steps. Before these steps, there is a prescreening step that screens out candidates with sequencing gaps, tandem repeats, etc. Such candidates are highly unlikely to be true LTRs and thus are not passed to the next step (i.e., *defalse). That's why you only see part of them show up in *defalse.
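
To illustrate the idea of prescreening on sequencing gaps, here is a minimal sketch. It is not LTR_retriever's actual code or criteria: the threshold `max_gap_fraction`, its default of 0.0, and both function names are hypothetical, and gaps are simply taken to be runs of `N`.

```python
def gap_fraction(seq):
    """Fraction of bases in seq that are N (upper or lower case)."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def prescreen(candidates, max_gap_fraction=0.0):
    """Split a {name: sequence} dict into (kept, dropped) name lists.

    A candidate is dropped when its gap fraction exceeds max_gap_fraction,
    so it would never reach a later step.
    """
    kept, dropped = [], []
    for name, seq in candidates.items():
        if gap_fraction(seq) <= max_gap_fraction:
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped
```

Candidates that land in `dropped` here are the ones that would not appear in any downstream file, which is why the downstream counts add up to less than the input.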

Best,
Shujun Ou

@oushujun oushujun closed this as completed Jun 9, 2017
@Suchithra-V

Suchithra-V commented Feb 12, 2018

Hi. I am working on 16S data and was using the latest release of cd-hit-otu, specifically for MiSeq. The QC and OTU shell scripts were generated successfully, but after running the OTU script I got this error after some time. Please help me resolve this.
[Image: screenshot from 2018-02-12 13 32 11]

These are the result files obtained after running otu script.

[Image: otu_results]

@oushujun
Owner Author

oushujun commented Feb 12, 2018 via email
