Why is classify-consensus-vsearch so slow? #80

BenKaehler · 2017-07-12T00:32:50Z

classify-consensus-vsearch was 50 times slower than classify-consensus-blast in the runtime analysis for the paper, and users have noticed.

We should double check that there isn't anything strange going on.


BenKaehler · 2017-07-21T00:14:41Z

The plot thickens. It would appear that longish sequences make vsearch very slow. That may not be surprising, given that we are performing global alignments. I have attached the data files here and here.

This command works OK, with an average reference sequence length of 230 nt.

$ time vsearch --usearch_global query.fasta --id 0.97 --strand both --maxaccepts 1 --maxrejects 0 --output_no_hits --db ref.fasta --threads 1 --blast6out out
vsearch v2.0.3_linux_x86_64, 1007.8GB RAM, 56 cores
https://github.com/torognes/vsearch

Reading file ref.fasta 100%  
11493263 nt in 50020 seqs, min 32, max 250, avg 230
WARNING: 604 sequences shorter than 32 nucleotides discarded.
Masking 100%  
Counting unique k-mers 100%  
Creating index of unique k-mers 100%  
Searching 100%  
Matching query sequences: 5292 of 6184 (85.58%)

real	23m35.282s
user	16m24.472s
sys	0m0.148s

This command does not work OK, with an average sequence length of 1428 nt. I got sick of waiting and killed it after more than 10 hours, when the search was only 30% complete.

$ cat not_works_ok/command 
$ time vsearch --usearch_global 10000.fna --id 0.8 --strand both --maxaccepts 1 --maxrejects 0 --output_no_hits --db 10000.fna --threads 1 --blast6out out
vsearch v2.0.3_linux_x86_64, 1007.8GB RAM, 56 cores
https://github.com/torognes/vsearch

Reading file 10000.fna 100%  
14284110 nt in 10000 seqs, min 1274, max 2353, avg 1428
Masking 100%  
Counting unique k-mers 100%  
Creating index of unique k-mers 100%  
Searching 30%^C
real	613m24.420s
user	524m9.868s
sys	0m10.192s
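
As a rough sanity check on the global-alignment explanation, the sketch below compares the size of the dynamic-programming matrix for one pairwise alignment at the two average lengths reported above. It is a back-of-envelope estimate only: it assumes a full alignment costs about (query length x target length) cell updates, assumes the queries in the first run are about as long as the references, and ignores vsearch's k-mer prefiltering and how many candidate targets get aligned per query. `dp_cells` is a hypothetical helper, not part of vsearch.

```shell
# Back-of-envelope estimate: cells in the dynamic-programming matrix
# for one global alignment, taken as query length x target length.
# This is an assumption about cost, not a measurement of vsearch.
dp_cells() {
    # $1 = query length, $2 = target length
    echo $(( $1 * $2 ))
}

short=$(dp_cells 230 230)     # avg length reported for ref.fasta
long=$(dp_cells 1428 1428)    # avg length reported for 10000.fna

echo "cells per alignment: short=$short long=$long"
echo "ratio (integer division): $(( long / short ))x"
```

Under these assumptions each alignment of the long sequences fills roughly 38 times as many cells, so per-alignment cost alone could account for a large slowdown before counting any difference in the number of alignments performed.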

nbokulich · 2017-08-07T14:46:00Z

@BenKaehler can we close this issue? It seems that we resolved this — vsearch (unlike, say, BLAST+) experiences a dramatic runtime increase when very long sequences are used.

However, vsearch works fine (with runtimes approximately equivalent to BLAST+) when amplicon sequences are used.

The issue is with vsearch itself, not with how we're wrapping it.
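
Since the slowdown depends on sequence length, it may help to check the lengths in a reference file before running the classifier. The sketch below is one way to do that with awk; `fasta_len_stats` is a hypothetical helper, not part of vsearch or the plugin, and the thread does not establish a specific length at which runtime becomes a problem.

```shell
# Hypothetical helper: print the number of sequences and their
# min, max and (truncated) mean length for a FASTA file.
# Sequences wrapped over several lines are handled.
fasta_len_stats() {
    awk '
        # Fold the sequence just finished into the running statistics.
        function record() {
            if (have) {
                n++
                total += len
                if (n == 1 || len < min) min = len
                if (n == 1 || len > max) max = len
            }
        }
        /^>/ { record(); have = 1; len = 0; next }   # header starts a new record
        { len += length($0) }                         # sequence line
        END {
            record()
            if (n > 0)
                printf "seqs=%d min=%d max=%d avg=%d\n", n, min, max, total / n
        }
    ' "$1"
}
```

For example, `fasta_len_stats ref.fasta` prints a one-line summary comparable to the "min ... max ... avg" line that vsearch reports when it reads the database.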

jairideout · 2017-09-06T22:56:50Z

@BenKaehler @nbokulich is this safe to close?

nbokulich · 2017-09-07T00:26:00Z

Yes, I think it is safe to close. It is not, in any case, a bug: vsearch just runs much more slowly on full-length sequences. I don't have write privileges, so I cannot close it. Thanks @jairideout!

BenKaehler assigned nbokulich Jul 21, 2017

jairideout added the type:bug label Aug 31, 2017

thermokarst closed this as completed Sep 7, 2017
