Error when searching nucleotide DB #3

Closed
simone-pignotti opened this issue Mar 5, 2019 · 17 comments

Comments

@simone-pignotti

I am using the docker version of the webserver, and I have successfully built and searched the example uniclust DB and another custom protein DB.
I have also built a DNA DB, but searching it produces an error. Both the building and the searching logs are attached, and the issue seems to be a wrong path:

mmseqs-web-worker_1     | splitsequence /opt/mmseqs-web/databases/mydb_nt.idx /opt/mmseqs-web/jobs/UE3fNh-OopaZXL7T5LF-ZwnmWa4PbEOPQiOu8w/tmp/13815044513268057608/search_tmp/9906939544243059259/target_seqs_split --max-seq-len 10000 --sequence-overlap 0 --threads 24 --compressed 0 -v 3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:               43a10ce5f7e83f421af19b7a5cc1792e9c3d3bbd
mmseqs-web-worker_1     | Max. sequence length          10000
mmseqs-web-worker_1     | Overlap between sequences     0
mmseqs-web-worker_1     | Threads                       24
mmseqs-web-worker_1     | Compressed                    0
mmseqs-web-worker_1     | Verbosity                     3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | Could not open data file /opt/mmseqs-web/databases/mydb_nt.idx_h!
mmseqs-web-worker_1     | Error: Split sequence died
mmseqs-web-worker_1     | Error: Search died
mmseqs-web-worker_1     | 2019/03/05 13:54:03 Execution Error: exit status 1

I believe the splitsequence command is supposed to take mydb_nt and not mydb_nt.idx as its argument, so it should be an easy fix.
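
To illustrate the diagnosis above: the file the log fails to open is the index path with `_h` appended. The following is a minimal sketch, assuming the header file name is formed by appending `_h` to whatever database path is passed in (which matches the error message). `header_path` and `strip_index_suffix` are hypothetical helpers for illustration, not MMseqs2 functions.

```python
def header_path(db_path: str) -> str:
    """Assumed convention: the header data file is the database path plus '_h'."""
    return db_path + "_h"


def strip_index_suffix(db_path: str) -> str:
    """Return the base database path if an '.idx' index path was passed."""
    if db_path.endswith(".idx"):
        return db_path[: -len(".idx")]
    return db_path


idx = "/opt/mmseqs-web/databases/mydb_nt.idx"
# Path built from the index name, as seen in the error message.
print(header_path(idx))
# Path built from the base database name instead.
print(header_path(strip_index_suffix(idx)))
```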

I hope this is useful, let me know if you can't reproduce the issue.
Simone
createdb.log
search.log

@milot-mirdita
Member

Thanks for trying the MMseqs2 webserver!

I've triggered rebuilding the docker images based on the latest MMseqs2 image. The issue should be fixed there. I can do more detailed testing in a couple days though.

@simone-pignotti
Author

Thank you for the quick fix, I'll test it on my instance and will let you know if the problem persists.

@simone-pignotti
Author

I tried the latest mmseqs-app-* docker images (running docker-compose pull and docker-compose up --build), and I got a new error (see full log for details).

mmseqs-web-worker_1     | Unrecognized parameter --index-type
mmseqs-web-worker_1     | Did you mean "--search-type"?
mmseqs-web-worker_1     | 2019/03/05 15:21:17 Execution Error: exit status 1

It is triggered by the createindex command, but I can't find the specific command in the log.
dockerup.log

@milot-mirdita
Member

I will look at the problem in detail in the evening. I know what's wrong, but it's a somewhat larger fix.

@milot-mirdita
Member

Update: The docker images should work again. I split the search field in the .params file into two separate fields, index and search: one for the indexing step and one for the actual search.

There is one more minor issue that I'll resolve soon: MMseqs2 currently does not clean up after itself correctly and leaves unnecessary files behind for each job. I'll fix that as soon as I can.
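
As a rough illustration of what a separate index field means in practice, the sketch below appends the contents of that field to a createindex call. The fixed arguments mirror the createindex calls in the worker logs further down this thread. How the worker actually assembles its command line is an assumption, and `createindex_args` is a hypothetical helper.

```python
import shlex
from typing import List


def createindex_args(db_path: str, tmp_dir: str, index_params: str) -> List[str]:
    """Build a createindex argument list, appending the 'index' params field.

    Assumption: the extra flags are simply appended after the fixed ones.
    """
    base = [
        "createindex", db_path, tmp_dir,
        "--remove-tmp-files", "true",
        "--check-compatible", "true",
    ]
    # shlex.split turns "--search-type 3" into ["--search-type", "3"].
    return base + shlex.split(index_params)


print(" ".join(createindex_args("/opt/mmseqs-web/databases/test", "/tmp", "--search-type 3")))
```

An empty index field leaves the fixed arguments unchanged, so protein databases would not need any extra configuration under this scheme.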

@simone-pignotti
Author

simone-pignotti commented Mar 11, 2019

I keep getting errors with the updated image:

mmseqs-web-worker_1     | 2019/03/11 12:41:25 MMseqs2 worker
mmseqs-web-worker_1     | Program call:
mmseqs-web-worker_1     | createdb /opt/mmseqs-web/databases/test.fasta /opt/mmseqs-web/databases/test
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:               efbd8d3b2f808c43c4e1629d8e74eb72cc8e92ba
mmseqs-web-worker_1     | Max. sequence length          65535
mmseqs-web-worker_1     | Split Seq. by len             true
mmseqs-web-worker_1     | Database type                 0
mmseqs-web-worker_1     | Do not shuffle input database true
mmseqs-web-worker_1     | Offset of numeric ids         0
mmseqs-web-worker_1     | Compressed                    0
mmseqs-web-worker_1     | Verbosity                     3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | Assuming DNA database, forcing parameter --dont-split-seq-by-len true
mmseqs-web-worker_1     | .......Time for merging files: 0h 0m 0s 139ms
mmseqs-web-worker_1     | Time for merging files: 0h 0m 0s 613ms
mmseqs-web-worker_1     | Time for merging files: 0h 0m 0s 10ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 3s 889ms
mmseqs-web-worker_1     | Program call:
mmseqs-web-worker_1     | createindex /opt/mmseqs-web/databases/test /tmp --remove-tmp-files true --check-compatible true
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:               efbd8d3b2f808c43c4e1629d8e74eb72cc8e92ba
mmseqs-web-worker_1     | Seed Substitution Matrix      PAM30.out
mmseqs-web-worker_1     | K-mer size                    0
mmseqs-web-worker_1     | Alphabet size                 21
mmseqs-web-worker_1     | Compositional bias            1
mmseqs-web-worker_1     | Max. sequence length          65535
mmseqs-web-worker_1     | Mask Residues                 1
mmseqs-web-worker_1     | Spaced Kmer                   1
mmseqs-web-worker_1     | Spaced k-mer pattern
mmseqs-web-worker_1     | Sensitivity                   7.5
mmseqs-web-worker_1     | K-score                       0
mmseqs-web-worker_1     | Check Compatible              true
mmseqs-web-worker_1     | Search type                   0
mmseqs-web-worker_1     | Split DB                      0
mmseqs-web-worker_1     | Split Memory Limit            0
mmseqs-web-worker_1     | Threads                       24
mmseqs-web-worker_1     | Verbosity                     3
mmseqs-web-worker_1     | Min codons in orf             30
mmseqs-web-worker_1     | Max codons in length          98202
mmseqs-web-worker_1     | Max orf gaps                  2147483647
mmseqs-web-worker_1     | Contig start mode             2
mmseqs-web-worker_1     | Contig end mode               2
mmseqs-web-worker_1     | Orf start mode                1
mmseqs-web-worker_1     | Forward Frames                1,2,3
mmseqs-web-worker_1     | Reverse Frames                1,2,3
mmseqs-web-worker_1     | Translation Table             1
mmseqs-web-worker_1     | Use all table starts          false
mmseqs-web-worker_1     | Offset of numeric ids         0
mmseqs-web-worker_1     | Compressed                    0
mmseqs-web-worker_1     | Add Orf Stop                  false
mmseqs-web-worker_1     | Overlap between sequences     0
mmseqs-web-worker_1     | Strand selection              1
mmseqs-web-worker_1     | Remove Temporary Files        true
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 0ms
mmseqs-web-worker_1     | Database /opt/mmseqs-web/databases/test is a nucleotide database.
mmseqs-web-worker_1     | Please provide the parameter --search-type 2 (translated) or 3 (nucleotide)
mmseqs-web-worker_1     | 2019/03/11 12:41:29 Execution Error: exit status 1

Adding "index":"--search-type 3" to the params dictionary results in:

mmseqs-web-worker_1     | Program call:
mmseqs-web-worker_1     | createdb /opt/mmseqs-web/databases/test.fasta /opt/mmseqs-web/databases/test
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:               efbd8d3b2f808c43c4e1629d8e74eb72cc8e92ba
mmseqs-web-worker_1     | Max. sequence length          65535
mmseqs-web-worker_1     | Split Seq. by len             true
mmseqs-web-worker_1     | Database type                 0
mmseqs-web-worker_1     | Do not shuffle input database true
mmseqs-web-worker_1     | Offset of numeric ids         0
mmseqs-web-worker_1     | Compressed                    0
mmseqs-web-worker_1     | Verbosity                     3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | Assuming DNA database, forcing parameter --dont-split-seq-by-len true
mmseqs-web-worker_1     | .......Time for merging files: 0h 0m 0s 204ms
mmseqs-web-worker_1     | Time for merging files: 0h 0m 0s 563ms
mmseqs-web-worker_1     | Time for merging files: 0h 0m 0s 10ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 3s 906ms
mmseqs-web-worker_1     | Program call:
mmseqs-web-worker_1     | createindex /opt/mmseqs-web/databases/test /tmp --remove-tmp-files true --check-compatible true --search-type 3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:               efbd8d3b2f808c43c4e1629d8e74eb72cc8e92ba
mmseqs-web-worker_1     | Seed Substitution Matrix      PAM30.out
mmseqs-web-worker_1     | K-mer size                    0
mmseqs-web-worker_1     | Alphabet size                 21
mmseqs-web-worker_1     | Compositional bias            1
mmseqs-web-worker_1     | Max. sequence length          65535
mmseqs-web-worker_1     | Mask Residues                 1
mmseqs-web-worker_1     | Spaced Kmer                   1
mmseqs-web-worker_1     | Spaced k-mer pattern
mmseqs-web-worker_1     | Sensitivity                   7.5
mmseqs-web-worker_1     | K-score                       0
mmseqs-web-worker_1     | Check Compatible              true
mmseqs-web-worker_1     | Search type                   3
mmseqs-web-worker_1     | Split DB                      0
mmseqs-web-worker_1     | Split Memory Limit            0
mmseqs-web-worker_1     | Threads                       24
mmseqs-web-worker_1     | Verbosity                     3
mmseqs-web-worker_1     | Min codons in orf             30
mmseqs-web-worker_1     | Max codons in length          98202
mmseqs-web-worker_1     | Max orf gaps                  2147483647
mmseqs-web-worker_1     | Contig start mode             2
mmseqs-web-worker_1     | Contig end mode               2
mmseqs-web-worker_1     | Orf start mode                1
mmseqs-web-worker_1     | Forward Frames                1,2,3
mmseqs-web-worker_1     | Reverse Frames                1,2,3
mmseqs-web-worker_1     | Translation Table             1
mmseqs-web-worker_1     | Use all table starts          false
mmseqs-web-worker_1     | Offset of numeric ids         0
mmseqs-web-worker_1     | Compressed                    0
mmseqs-web-worker_1     | Add Orf Stop                  false
mmseqs-web-worker_1     | Overlap between sequences     0
mmseqs-web-worker_1     | Strand selection              1
mmseqs-web-worker_1     | Remove Temporary Files        true
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | Program call:
mmseqs-web-worker_1     | splitsequence /opt/mmseqs-web/databases/test /tmp/9366001766242878652/nucl_split_seq --max-seq-len 65535 --sequence-overlap 0 --threads 24 --compressed 0 -v 3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:               efbd8d3b2f808c43c4e1629d8e74eb72cc8e92ba
mmseqs-web-worker_1     | Max. sequence length          65535
mmseqs-web-worker_1     | Overlap between sequences     0
mmseqs-web-worker_1     | Threads                       24
mmseqs-web-worker_1     | Compressed                    0
mmseqs-web-worker_1     | Verbosity                     3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | .......Time for merging files: 0h 0m 0s 21ms
mmseqs-web-worker_1     | Time for merging files: 0h 0m 0s 316ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 648ms
mmseqs-web-worker_1     | Program call:
mmseqs-web-worker_1     | extractframes /tmp/9366001766242878652/nucl_split_seq /tmp/9366001766242878652/nucl_split_seq_rev --forward-frames 1 --threads 24 --compressed 0 -v 3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:       efbd8d3b2f808c43c4e1629d8e74eb72cc8e92ba
mmseqs-web-worker_1     | Forward Frames        1
mmseqs-web-worker_1     | Reverse Frames        1,2,3
mmseqs-web-worker_1     | Threads               24
mmseqs-web-worker_1     | Compressed            0
mmseqs-web-worker_1     | Verbosity             3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | .......Time for merging files: 0h 0m 0s 22ms
mmseqs-web-worker_1     | Time for merging files: 0h 0m 0s 315ms
mmseqs-web-worker_1     | Time for processing: 0h 0m 0s 686ms
mmseqs-web-worker_1     | Program call:
mmseqs-web-worker_1     | indexdb /tmp/9366001766242878652/nucl_split_seq_rev.dbtype /opt/mmseqs-web/databases/test --seed-sub-mat PAM30.out -k 0 --alph-size 21 --comp-bias-corr 1 --max-seq-len 65535 --mask 1 --spaced-kmer-mode 1 -s 7.5 --k-score 0 --check-compatible 1 --search-type 3 --split 0 --split-memory-limit 0 --threads 24 -v 3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | MMseqs Version:               efbd8d3b2f808c43c4e1629d8e74eb72cc8e92ba
mmseqs-web-worker_1     | Seed Substitution Matrix      PAM30.out
mmseqs-web-worker_1     | K-mer size                    0
mmseqs-web-worker_1     | Alphabet size                 21
mmseqs-web-worker_1     | Compositional bias            1
mmseqs-web-worker_1     | Max. sequence length          65535
mmseqs-web-worker_1     | Mask Residues                 1
mmseqs-web-worker_1     | Spaced Kmer                   1
mmseqs-web-worker_1     | Spaced k-mer pattern
mmseqs-web-worker_1     | Sensitivity                   7.5
mmseqs-web-worker_1     | K-score                       0
mmseqs-web-worker_1     | Check Compatible              true
mmseqs-web-worker_1     | Search type                   3
mmseqs-web-worker_1     | Split DB                      0
mmseqs-web-worker_1     | Split Memory Limit            0
mmseqs-web-worker_1     | Threads                       24
mmseqs-web-worker_1     | Verbosity                     3
mmseqs-web-worker_1     |
mmseqs-web-worker_1     | Could not open index file /tmp/9366001766242878652/nucl_split_seq_rev.dbtype.index!
mmseqs-web-worker_1     | Error: indexdb died
mmseqs-web-worker_1     | 2019/03/11 12:46:21 Execution Error: exit status 1

UPDATE: using "index":"--search-type 2" in params works fine (forget the previous edit, my bad)

@simone-pignotti
Author

I tried the latest soedinglab/mmseqs2 docker image and ran into the same issue when running:
docker run -v `pwd`:`pwd` -w `pwd` soedinglab/mmseqs2 mmseqs createindex databases/test /tmp --remove-tmp-files true --check-compatible true --search-type 3
Therefore this has nothing to do with the web server. Should I open a new issue on the main repo?

@milot-mirdita
Member

Yes, please do that. Sorry for the slow support. We have a deadline approaching soon :/

@simone-pignotti
Author

simone-pignotti commented Mar 14, 2019

No problem, it's done. Thank you again!
UPDATE: see soedinglab/MMseqs2#175

@simone-pignotti
Author

Hi,
The issue with mmseqs has been solved and I have tested the nucleotide indexing and searching successfully. Could you please re-trigger the backend build?
Version 9 should be good enough, but I believe there was another commit with bug fixes for the nucleotide search after that, so maybe latest is better (or a previous commit including at least 0e3fbac011481fd6291b92a0b48adce98fc0f007, d9a44e89721ae3348246997bf1f009671ff58a83, d1e25ae7f4e921c041022d93b69f16fc324339f9).

Thank you!

@simone-pignotti
Author

simone-pignotti commented May 24, 2019

I tested the mmseqs executables from the latest mmseqs2 docker image with the web app (by copying them manually into the web app docker image using docker cp) and I can confirm that it works now!

EDIT: it works only after setting the index and search params to --search-type 3 [--strand 2] (--strand is only needed if you want to search both strands, which I believe should be the default for nucleotide search). I think setting only index would work, as search is normally set to auto and should detect the type from the index, but I haven't tested it. This is an example of #2, and it would be nice to have a full example of a working nucleotide index in addition to the params specification.
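
A minimal sketch of such a params entry, generated in Python for illustration. The field names are taken from the database config posted later in this thread; the name, version and order values are placeholders, and `nucleotide_db_entry` is a hypothetical helper. Setting both index and search follows the tested setup described above, not a verified minimal configuration.

```python
import json


def nucleotide_db_entry(name: str, version: str, path: str, both_strands: bool = False) -> dict:
    """Build a params entry for a nucleotide database (illustrative only)."""
    flags = "--search-type 3"
    if both_strands:
        # Only needed to search both strands, per the comment above.
        flags += " --strand 2"
    return {
        "name": name,
        "version": version,
        "path": path,
        "default": False,
        "order": 1,
        "index": flags,   # used for the indexing step
        "search": flags,  # used for the actual search
    }


entry = nucleotide_db_entry("My nucleotide DB", "2019-05-24", "mydb_nt", both_strands=True)
print(json.dumps(entry, indent=2))
```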

@milot-mirdita
Member

Sorry for not rebuilding the containers sooner. Once Docker Hub finishes, you should have MMseqs2 r9 in them.

Also a caveat: The nucleotide-nucleotide search is still in development by @martin-steinegger. There is no manuscript or exhaustive benchmarks yet.

@simone-pignotti
Author

No problem, that's great. I will keep an eye on the nt-nt search development :) I guess this issue can be closed, if you agree.

@milot-mirdita
Member

Okay :) we’d be happy for feedback if you run into any issues with the nt-nt search.

@simone-pignotti
Author

Sure! I haven't yet, but I must say I only use it occasionally. Thanks again.

@josemduarte

I've found this thread very useful in debugging problems with nucleotide searches. Thanks @simone-pignotti and @milot-mirdita

I have now managed to get DNA and RNA databases up and running in mmseqs-app. However, the search doesn't work, and it looks like purely an API problem: according to the server logs, mmseqs ran without errors.

This is my config for the DNA and RNA databases:

{
  "databases": [
    {
      "name": "PDB DNA sequence (seqres)",
      "version": "2020-02-18",
      "path": "pdb_dna_sequence",
      "default": false,
      "order": 1,
      "index": "--search-type 3",
      "search": "--max-seqs 2000 --search-type 3"
    },
    {
      "name": "PDB RNA sequence (seqres)",
      "version": "2020-02-18",
      "path": "pdb_rna_sequence",
      "default": false,
      "order": 2,
      "index": "--search-type 3",
      "search": "--max-seqs 2000 --search-type 3"
    }
  ]
}

And what I get from the API ticket endpoint (e.g. api/ticket/LwDKtIlhXr4oSa7w-zTpgwxHMXikC-FInHXmvg) is:

{
  "id": "LwDKtIlhXr4oSa7w-zTpgwxHMXikC-FInHXmvg",
  "status": "COMPLETE"
}

So all looks ok there. However, from the result endpoint (e.g. api/result/LwDKtIlhXr4oSa7w-zTpgwxHMXikC-FInHXmvg/0) I get this (with an HTTP 400 status code):

record on line 3: wrong number of fields

Any ideas what can be wrong?
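
One way to narrow down an error like this is to check whether every row of a result file has the same number of fields. The sketch below is a generic check, assuming the result files are tab-separated and that the reader expects each row to match the field count of the first row. Both points are assumptions, and `inconsistent_rows` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def inconsistent_rows(lines: Iterable[str], sep: str = "\t") -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for rows that differ from the first row."""
    expected = None
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        if expected is None:
            # The first row defines the expected number of fields.
            expected = len(fields)
        elif len(fields) != expected:
            bad.append((number, len(fields)))
    return bad


# Made-up rows for illustration: the third row is missing a field.
sample = ["q\tt\t0.9\n", "q\tu\t0.8\n", "q\tv\n"]
print(inconsistent_rows(sample))  # [(3, 2)]
```

The same function accepts an open file object, so it could be run over each result file in the job directory.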

@milot-mirdita
Member

Please execute the following command from within the docker-compose directory and upload the output.

head jobs/LwDKtIlhXr4oSa7w-zTpgwxHMXikC-FInHXmvg/alis_*

Can you please answer in a new issue so we don't spam simone's email with notifications?
