Expected Behavior
The analysis finishes in minutes on the MMseqs2 MSA server used by ColabFold.
Current Behavior
A local mmseqs run always stalls for hours without generating any output.
Steps to Reproduce (for bugs)
Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.
I am using colab_search, which calls mmseqs as follows: search search_results/qdb db/uniref30_2103_db search_results/res search_results/tmp --num-iterations 3 --db-load-mode 2 -a -s 8 -e 0.1 --max-seqs 10000 --split 8. The query contains 4 amino acid sequences, each 493 amino acids long.
Note: when I removed --split 8, mmseqs still stalled at a certain stage.
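The reproduction can be sketched as a small script. It is a minimal sketch that assumes the mmseqs binary is on PATH and that the paths above are relative to the current directory. The run_search helper and the DRY_RUN variable are hypothetical names introduced here for illustration and are not part of mmseqs or colab_search.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: runs the search from this report.
# Any arguments are used as a command prefix, so "run_search echo"
# only prints the command instead of executing it.
run_search() {
  "$@" mmseqs search \
    search_results/qdb db/uniref30_2103_db \
    search_results/res search_results/tmp \
    --num-iterations 3 --db-load-mode 2 -a -s 8 -e 0.1 \
    --max-seqs 10000 --split 8
}

# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to actually run.
if [ "${DRY_RUN:-1}" = "0" ]; then
  # Start from a newly recreated, empty tmp folder.
  rm -rf search_results/tmp
  mkdir -p search_results/tmp
  run_search
else
  run_search echo
fi
```

Running it with DRY_RUN=0 deletes and recreates search_results/tmp before launching the search, so each attempt starts from an empty tmp folder.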
MMseqs Output (for bugs)
search search_results/qdb db/uniref30_2103_db search_results/res search_results/tmp --num-iterations 3 --db-load-mode 2 -a -s 8 -e 0.1 --max-seqs 10000 --split 8
MMseqs Version: b768f48f0bd73688b6a68132159a97f7b6310f03
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace true
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.1
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 2
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 72
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 8
k-mer length 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 10000
Split database 8
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Gap pseudo count 10
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 3
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false
prefilter search_results/qdb db/uniref30_2103_db.idx search_results/tmp/12005814431969335264/pref_0 --sub-mat aa:blosum62.out,nucl:nucleotide.out --seed-sub-mat aa:VTML80.out,nucl:nucleotide.out -s 8 -k 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 10000 --split 8 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 2 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 72 --compressed 0 -v 3
Index version: 16
Generated by: b768f48f0bd73688b6a68132159a97f7b6310f03
ScoreMatrix: VTML80.out
Query database size: 190 type: Aminoacid
Estimated memory consumption: 148G
Target database size: 29291635 type: Aminoacid
Process prefiltering step 1 of 1
k-mer similarity threshold: 96
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 190
Target db start 1 to 29291635
^CTraceback (most recent call last): ] 37.57% 72 eta 0s
I had to stop it because mmseqs ran for hours without making progress.
Context
I am not sure how to debug this.
The machine is part of our cluster, so there is enough disk space and memory.
I checked the process status: it is always in the D state (uninterruptible sleep) with 100-200% CPU usage, according to htop.
I am not sure how to speed things up at this stage.
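The process-state check can also be done without htop. The following is a minimal sketch that assumes a Linux system with /proc mounted and a process named mmseqs; the proc_state helper is a hypothetical name introduced here. A state of D is uninterruptible sleep, which on Linux usually means the process is waiting on I/O.

```shell
#!/bin/sh

# Hypothetical helper: prints the one-letter state (R, S, D, ...) of a PID,
# read from the "State:" line of /proc/<pid>/status.
proc_state() {
  awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# Report the state of every process whose name is exactly "mmseqs".
for comm in /proc/[0-9]*/comm; do
  [ "$(cat "$comm" 2>/dev/null)" = "mmseqs" ] || continue
  pid="${comm#/proc/}"
  pid="${pid%/comm}"
  echo "pid=$pid state=$(proc_state "$pid")"
done
```

Running the loop several times a few seconds apart shows whether the process stays in D or only passes through it briefly.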
Your Environment
Include as many relevant details about the environment you experienced the bug in.
- Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters): b768f48
- Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.): self-compiled
- For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation: gcc 6.1
- Server specifications (especially CPU support for AVX2/SSE and amount of system memory): Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, supports AVX2/SSE, 503 GB total memory (free -g)
- Operating system and version: Red Hat Enterprise Linux Server release 7.6 (Maipo)