
uniprotdb.index file is reported as Generic type #887

@vineethvintu

Description

Expected Behavior

I ran the following command:
mmseqs expandaln ./base/qdb ./uniprot/uniprotdb.index ./base/res ./uniprot/uniprotdb.index ./base/res_exp --db-load-mode 2 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95 --threads 124

I created uniprotdb with the mmseqs createdb command, so the uniprotdb.index file was generated along with it.

Current Behavior

However, when I run the expandaln command, it fails with an error saying that uniprotdb.index has the Generic type:
Input database "./uniprot/uniprotdb.index" has the wrong type (Generic)
Allowed input:

  • Index
  • Nucleotide
  • Profile
  • Aminoacid
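One thing I have not ruled out yet: as far as I understand the MMseqs2 database layout, createdb writes a data file (uniprotdb) plus companion files such as uniprotdb.index and uniprotdb.dbtype, and commands take the database by its base name. If MMseqs2 reads the type from <path>.dbtype, then passing ./uniprot/uniprotdb.index would make it look for ./uniprot/uniprotdb.index.dbtype, which does not exist, and that would be consistent with the Generic error above. This is an assumption, not a confirmed diagnosis. The sketch below uses a hypothetical helper, mmseqs_db_base (not part of MMseqs2), that strips a trailing .index and checks that the matching .dbtype file exists:

```shell
# Hypothetical helper (not an MMseqs2 command): map a path such as
# ./uniprot/uniprotdb.index back to the database base name.
# Assumes the createdb layout <name>, <name>.index and <name>.dbtype.
mmseqs_db_base() {
    # Drop a trailing ".index" suffix, if any; other paths are left unchanged
    base="${1%.index}"
    # Assumption: a real MMseqs2 database has a <name>.dbtype file next to it
    if [ ! -f "${base}.dbtype" ]; then
        printf 'no %s.dbtype found; %s may not be an MMseqs2 database\n' "$base" "$base" >&2
        return 1
    fi
    printf '%s\n' "$base"
}
```

If ./uniprot/uniprotdb.dbtype exists, mmseqs_db_base ./uniprot/uniprotdb.index prints ./uniprot/uniprotdb, which is the form of the argument I would try in place of ./uniprot/uniprotdb.index in the expandaln command.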

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.
MMSEQS="$1"
QUERY="$2"
BASE="$4"
DB1="$5"
DB2="$6"
DB3="$7"
USE_ENV="$8"
USE_TEMPLATES="$9"
FILTER="${10}"
TAXONOMY="${11}"
M8OUT="${12}"
EXPAND_EVAL=inf
ALIGN_EVAL=10
DIFF=3000
QSC=-20.0
MAX_ACCEPT=1000000
if [ "${FILTER}" = "1" ]; then
# 0.1 was not used in benchmarks due to POSIX shell bug in line above
EXPAND_EVAL=0.1
ALIGN_EVAL=10
QSC=0.8
MAX_ACCEPT=100000
fi
export MMSEQS_CALL_DEPTH=1
SEARCH_PARAM="--num-iterations 3 --db-load-mode 2 -a --k-score 'seq:96,prof:80' -e 0.1 --max-seqs 10000"
FILTER_PARAM="--filter-min-enable 1000 --diff ${DIFF} --qid 0.0,0.2,0.4,0.6,0.8,1.0 --qsc 0 --max-seq-id 0.95"
EXPAND_PARAM="--expansion-mode 0 -e ${EXPAND_EVAL} --expand-filter-clusters ${FILTER} --max-seq-id 0.95"
mkdir -p "${BASE}"
"${MMSEQS}" createdb "${QUERY}" "${BASE}/qdb"
"${MMSEQS}" search "${BASE}/qdb" "${DB1}" "${BASE}/res" "${BASE}/tmp1" $SEARCH_PARAM
"${MMSEQS}" mvdb "${BASE}/tmp1/latest/profile_1" "${BASE}/prof_res"
"${MMSEQS}" lndb "${BASE}/qdb_h" "${BASE}/prof_res_h"
mmseqs expandaln ./base/qdb ./uniprot/uniprotdb.index ./base/res ./uniprot/uniprotdb.index ./base/res_exp --db-load-mode 2 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95 --threads 124

I got stuck at the above command.

Next, I plan to run:
"${MMSEQS}" align "${BASE}/prof_res" "${DB1}.idx" "${BASE}/res_exp" "${BASE}/res_exp_realign" --db-load-mode 2 -e ${ALIGN_EVAL} --max-accept ${MAX_ACCEPT} --alt-ali 10 -a
"${MMSEQS}" filterresult "${BASE}/qdb" "${DB1}.idx" "${BASE}/res_exp_realign" "${BASE}/res_exp_realign_filter" --db-load-mode 2 --qid 0 --qsc $QSC --diff 0 --max-seq-id 1.0 --filter-min-enable 100
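EXPAND_PARAM is defined in the script, but the expandaln line above is written out by hand. With FILTER=0 (the value assumed in this sketch), EXPAND_EVAL stays at inf, and because $EXPAND_PARAM is expanded unquoted the shell splits it on whitespace into separate arguments. The hypothetical print_args helper below only prints each argument on its own line so the splitting is visible:

```shell
# print_args is a hypothetical helper: print each argument on its own line.
print_args() {
    for arg in "$@"; do
        printf '%s\n' "$arg"
    done
}

# Same definitions as in the script, assuming FILTER=0
FILTER=0
EXPAND_EVAL=inf
EXPAND_PARAM="--expansion-mode 0 -e ${EXPAND_EVAL} --expand-filter-clusters ${FILTER} --max-seq-id 0.95"

# Unquoted on purpose, as in the script: the shell splits on whitespace
print_args $EXPAND_PARAM
```

This prints --expansion-mode, 0, -e, inf, --expand-filter-clusters, 0, --max-seq-id and 0.95, one per line, which are the same flags that follow --db-load-mode 2 in the manual expandaln command.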

MMseqs Output (for bugs)

Please make sure to also post the complete output of MMseqs. You can use gist.github.com for large output.
$ time mmseqs expandaln ./base/qdb ./uniprot/uniprotdb.index ./base/res ./uniprot/uniprotdb.index ./base/res_exp --db-load-mode 2 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95 --threads 124
expandaln ./base/qdb ./uniprot/uniprotdb.index ./base/res ./uniprot/uniprotdb.index ./base/res_exp --db-load-mode 2 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95 --threads 124

MMseqs Version: GITDIR-NOTFOUND
Expansion mode 0
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Max sequence length 65535
Score bias 0
Compositional bias 1
Compositional bias 1
E-value threshold inf
Seq. id. threshold 0
Coverage threshold 0
Coverage mode 0
Pseudo count mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Expand filter clusters 0
Use filter only at N seqs 0
Maximum seq. id. threshold 0.95
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Preload mode 2
Compressed 0
Threads 124
Verbosity 3

Input database "./uniprot/uniprotdb.index" has the wrong type (Generic)
Allowed input:

  • Index
  • Nucleotide
  • Profile
  • Aminoacid

Context

I am trying to get the MMseqs2 output in MSA format so that I can feed it into AlphaFold to predict the protein structure.

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters):
    MMseqs2 (Many against Many sequence searching) is an open-source software suite for very fast,
    parallelized protein sequence searches and clustering of huge protein sequence data sets.

Please cite: M. Steinegger and J. Soding. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, doi:10.1038/nbt.3988 (2017).

MMseqs2 Version: GITDIR-NOTFOUND
© Martin Steinegger (martin.steinegger@snu.ac.kr)

usage: mmseqs []

Easy workflows for plain text input/output
easy-search Sensitive homology search
easy-cluster Slower, sensitive clustering
easy-linclust Fast linear time cluster, less sensitive clustering
easy-taxonomy Taxonomic classification
easy-rbh Find reciprocal best hit

Main workflows for database input/output
search Sensitive homology search
map Map nearly identical sequences
rbh Reciprocal best hit search
linclust Fast, less sensitive clustering
cluster Slower, sensitive clustering
clusterupdate Update previous clustering with new sequences
taxonomy Taxonomic classification

Input database creation
databases List and download databases
createdb Convert FASTA/Q file(s) to a sequence DB
createindex Store precomputed index on disk to reduce search overhead
convertmsa Convert Stockholm/PFAM MSA file to a MSA DB
msa2profile Convert a MSA DB to a profile DB

Format conversion for downstream processing
convertalis Convert alignment DB to BLAST-tab, SAM or custom format
createtsv Convert result DB to tab-separated flat file
convert2fasta Convert sequence DB to FASTA format
taxonomyreport Create a taxonomy report in Kraken or Krona format

An extended list of all modules can be obtained by calling 'mmseqs -h'.

Bash completion for modules and parameters can be installed by adding "source MMSEQS_HOME/util/bash-completion.sh" to your "$HOME/.bash_profile".
Include the location of the MMseqs2 binary in your "$PATH" environment variable.

  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.):
    $ which mmseqs
    ~/MMseqs2-71dd32ec43e3ac4dabf111bbc4b124f1c66a85f1/build/bin/mmseqs
  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
  • Operating system and version:
    macOS 15
