Add biorxiv citation
milot-mirdita committed Feb 9, 2022
1 parent 4927694 commit 3c64211
Showing 4 changed files with 32 additions and 22 deletions.
20 changes: 13 additions & 7 deletions README.md
@@ -1,10 +1,16 @@
# Foldseek
Software suite for searching and clustering protein structures.
Foldseek enables fast and sensitive comparisons of large structure sets. It reaches sensitivities similar to state-of-the-art structural aligners while being at least 20,000 times faster.

<p align="center"><img src="https://github.com/steineggerlab/foldseek/blob/master/.github/foldseek.png" height="250"/></p>

## Publications

[van Kempen M, Kim S, Tumescheit C, Mirdita M, Söding J, and Steinegger M. Foldseek: fast and accurate protein structure search. bioRxiv, doi:10.1101/2022.02.07.479398 (2021)](https://www.biorxiv.org/content/10.1101/2022.02.07.479398)

## Webserver
Search your protein structures against [AlphaFold DBs](https://alphafold.ebi.ac.uk/) and [PDB](https://www.rcsb.org/) in seconds using our Foldseek webserver. 🚀 [search.foldseek.com](https://search.foldseek.com)
Search your protein structures against the [AlphaFoldDB](https://alphafold.ebi.ac.uk/) and [PDB](https://www.rcsb.org/) in seconds using our Foldseek webserver:

[🚀search.foldseek.com](https://search.foldseek.com)

## Installation

@@ -21,7 +27,7 @@ Precompiled binaries for other architectures (ARM64, PPC64LE) and very old AMD/I

### Quick start

`easy-search` can search single or multiple queries formatted in pdb/mcif format (flat or gz) against a target database (`example/`) of protein structures. It outputs a tab separated file of the alignments (`.m8`) the fields are `query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits`.
`easy-search` can search single or multiple queries formatted in PDB/mmCIF format (flat or `.gz`) against a target database (`example/`) of protein structures. It outputs a tab-separated file of the alignments (`.m8`) the fields are `query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits`.

foldseek easy-search example/d1asha_ example/ aln.m8 tmpFolder
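
The `.m8` file written by the command above has one alignment per line, with the twelve tab-separated columns listed in the Quick start. As a minimal sketch (the `M8Hit` struct and the `splitTabs`/`parseM8Line` functions are hypothetical helpers, not part of Foldseek), a line in that default column order could be parsed in C++ as follows. It assumes no custom output columns were requested.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one line of the default 12-column .m8 output.
struct M8Hit {
    std::string query, target;
    double fident = 0.0;  // fraction of identical matches
    int alnlen = 0, mismatch = 0, gapopen = 0;
    int qstart = 0, qend = 0, tstart = 0, tend = 0;
    double evalue = 0.0;
    double bits = 0.0;
};

// Split a line on tab characters.
std::vector<std::string> splitTabs(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '\t')) {
        fields.push_back(field);
    }
    return fields;
}

// Parse one line in the order
// query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits.
// Returns std::nullopt if the column count is wrong or a number fails to parse.
std::optional<M8Hit> parseM8Line(const std::string& line) {
    const std::vector<std::string> f = splitTabs(line);
    if (f.size() != 12) {
        return std::nullopt;
    }
    try {
        M8Hit h;
        h.query = f[0];
        h.target = f[1];
        h.fident = std::stod(f[2]);
        h.alnlen = std::stoi(f[3]);
        h.mismatch = std::stoi(f[4]);
        h.gapopen = std::stoi(f[5]);
        h.qstart = std::stoi(f[6]);
        h.qend = std::stoi(f[7]);
        h.tstart = std::stoi(f[8]);
        h.tend = std::stoi(f[9]);
        h.evalue = std::stod(f[10]);
        h.bits = std::stod(f[11]);
        return h;
    } catch (const std::logic_error&) {
        // std::stoi/std::stod throw std::invalid_argument or std::out_of_range,
        // both of which derive from std::logic_error.
        return std::nullopt;
    }
}
```

`parseM8Line` returns an empty `std::optional` for a row with the wrong number of columns or a non-numeric field, so a caller can skip malformed rows instead of aborting on the whole file.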

@@ -47,7 +53,7 @@ The target database can be pre-processed by `createdb`. This make sense if searc


### Databases
Setup the PDB or AlphaFold using the `databases` module.
Setup the PDB or AlphaFoldDB using the `databases` module.

# pdb
foldseek databases PDB pdb tmp
@@ -78,7 +84,7 @@ In case of the alignment type (`--alignment-type 1`) tmalign we sort the results
foldseek easy-search example/d1asha_ example/ aln tmp --alignment-type 1
```

It is possible to compute the TMscores for the any kind of alignment output (e.g. 3Di/AA) using the following commands:
It is possible to compute TMscores for the kind of alignment output (e.g. 3Di/AA) using the following commands:
```
foldseek createdb example/ targetDB
foldseek createdb example/ queryDB
@@ -102,9 +108,9 @@ Compiling `foldseek` from source has the advantage of system-specific optimizati
make install
export PATH=$(pwd)/foldseek/bin/:$PATH

:exclamation: If you want to compile `foldseek` on macOS, please install and use `gcc` from Homebrew. The default macOS `clang` compiler does not support OpenMP and `foldseek` will not be able to run multi-threaded. Adjust the `cmake` call above to:
:exclamation: If you want to compile `foldseek` on macOS, please install and use `gcc` from Homebrew. The default macOS `clang` compiler does not support OpenMP (by default) and `foldseek` will not be able to run multi-threaded. Adjust the `cmake` call above to:

CC="$(brew --prefix)/bin/gcc-10" CXX="$(brew --prefix)/bin/g++-10" cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. ..
CC="$(brew --prefix)/bin/gcc-11" CXX="$(brew --prefix)/bin/g++-11" cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. ..
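
Whether a compiler actually enabled OpenMP can be checked at build time. Compilers define the standard `_OPENMP` macro only when OpenMP is turned on (for example with `g++ -fopenmp`). The sketch below is a hypothetical diagnostic, not taken from the Foldseek sources. It reports which case a binary was built with.

```cpp
#include <cassert>
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical diagnostic: reports whether this translation unit was
// compiled with OpenMP enabled (the compiler defines _OPENMP in that case).
std::string openmpStatus() {
#ifdef _OPENMP
    // omp_get_max_threads() is the upper bound on threads for a parallel region.
    return "OpenMP enabled, up to " + std::to_string(omp_get_max_threads()) + " threads";
#else
    return "OpenMP disabled, single-threaded build";
#endif
}
```

When the code is compiled without OpenMP support, which is the case for the default macOS `clang` setup described above, the function takes the `#else` branch.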


## Hardware requirements
4 changes: 3 additions & 1 deletion src/commons/LocalParameters.cpp
@@ -105,9 +105,11 @@ LocalParameters::LocalParameters() :
chainNameMode = 0;
tmAlignFast = 1;
nsample = 5000;
citations.emplace(CITATION_FOLDSEEK, "van Kempen M, Kim S,Tumescheit C, Mirdita M, Söding J, and Steinegger M. Foldseek: fast and accurate protein structure search. bioRxiv, doi:10.1101/2022.02.07.479398 (2021)");
}
std::vector<int> FoldSeekDbValidator::tmscore = {LocalParameters::DBTYPE_TMSCORE};
std::vector<int> FoldSeekDbValidator::cadb = {LocalParameters::DBTYPE_CA_ALPHA};
std::vector<int> FoldSeekDbValidator::flatfileStdinAndFolder = {LocalParameters::DBTYPE_FLATFILE, LocalParameters::DBTYPE_STDIN,LocalParameters::DBTYPE_DIRECTORY};
std::vector<int> FoldSeekDbValidator::flatfileAndFolder = {LocalParameters::DBTYPE_FLATFILE, LocalParameters::DBTYPE_DIRECTORY};
std::vector<int> FoldSeekDbValidator::flatfileAndFolder = {LocalParameters::DBTYPE_FLATFILE, LocalParameters::DBTYPE_DIRECTORY};
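
The constructor now registers the Foldseek reference under the `CITATION_FOLDSEEK` key. This diff does not show how the inherited `citations` container is declared or consumed. The following is a hypothetical sketch that assumes a map from single-bit flags to reference strings. It shows how the references for a command's citation mask (such as `CITATION_TAXONOMY|CITATION_FOLDSEEK` in the `databases` entry) could be collected. `CitationRegistry` and `citationsFor` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical registry: each key is a single-bit citation flag.
using CitationRegistry = std::map<unsigned int, std::string>;

// Collect the reference strings whose flag bit is set in `mask`,
// in ascending flag order (std::map iterates keys in sorted order).
std::vector<std::string> citationsFor(unsigned int mask, const CitationRegistry& registry) {
    std::vector<std::string> out;
    for (const auto& entry : registry) {
        if ((mask & entry.first) != 0) {
            out.push_back(entry.second);
        }
    }
    return out;
}
```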
2 changes: 2 additions & 0 deletions src/commons/LocalParameters.h
@@ -3,6 +3,8 @@

#include <Parameters.h>

const int CITATION_FOLDSEEK = CITATION_END;

struct FoldSeekDbValidator : public DbValidator {
static std::vector<int> tmscore;
static std::vector<int> cadb;
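
The header reuses `CITATION_END`, which is defined upstream in MMseqs2's `Parameters.h`, as Foldseek's own citation flag. This is only safe if `CITATION_END` is an unused single bit past the last upstream flag. Assuming that holds, a compile-time check could look like the sketch below. The flag values are illustrative stand-ins, not copied from MMseqs2, and `isSingleBit` and `UPSTREAM_MASK` are hypothetical names.

```cpp
#include <cassert>

// Illustrative stand-ins for upstream citation flags (values are assumptions).
constexpr unsigned int CITATION_MMSEQS2 = 1U << 0;
constexpr unsigned int CITATION_LINCLUST = 1U << 3;
constexpr unsigned int CITATION_SERVER = 1U << 5;
constexpr unsigned int CITATION_TAXONOMY = 1U << 6;
// Assumed: CITATION_END is the first bit after the last upstream flag.
constexpr unsigned int CITATION_END = CITATION_TAXONOMY << 1;
constexpr unsigned int CITATION_FOLDSEEK = CITATION_END;

// True if x has exactly one bit set.
constexpr bool isSingleBit(unsigned int x) { return x != 0 && (x & (x - 1)) == 0; }

constexpr unsigned int UPSTREAM_MASK =
    CITATION_MMSEQS2 | CITATION_LINCLUST | CITATION_SERVER | CITATION_TAXONOMY;

static_assert(isSingleBit(CITATION_FOLDSEEK), "citation flag must be a single bit");
static_assert((CITATION_FOLDSEEK & UPSTREAM_MASK) == 0,
              "citation flag must not overlap upstream flags");
```

Because each flag is a distinct bit, a command entry can combine several of them with `|`, as the entries in `src/foldseek.cpp` do (for example, `CITATION_TAXONOMY|CITATION_FOLDSEEK`).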
28 changes: 14 additions & 14 deletions src/foldseek.cpp
@@ -5,9 +5,9 @@

const char* binary_name = "foldseek";
const char* tool_name = "foldseek";
const char* tool_introduction = "Protein Structure Search and Clustering.";
const char* main_author = "Michel van Kempen, Stephanie Kim, Charlotte Tumescheit, Martin Steinegger";
const char* show_extended_help = NULL;
const char* tool_introduction = "Foldseek enables fast and sensitive comparisons of large structure sets. It reaches sensitivities similar to state-of-the-art structural aligners while being at least 20,000 times faster.\n\nPlease cite: van Kempen M, Kim S,Tumescheit C, Mirdita M, Söding J, and Steinegger M. Foldseek: fast and accurate protein structure search. bioRxiv, doi:10.1101/2022.02.07.479398 (2021)";
const char* main_author = "Michel van Kempen, Stephanie Kim, Charlotte Tumescheit, Milot Mirdita, Johannes Söding, Martin Steinegger";
const char* show_extended_help = "1";
const char* show_bash_info = NULL;
const char* index_version_compatible = "fs1";
bool hide_base_commands = true;
@@ -31,7 +31,7 @@ std::vector<struct Command> commands = {
"Convert PDB/mmCIF files to an db.",
"Martin Steinegger <martin.steinegger@snu.ac.kr>",
"<i:PDB|mmCIF[.gz]> ... <i:PDB|mmCIF[.gz]> <o:sequenceDB>",
CITATION_MMSEQS2, {{"PDB|mmCIF[.gz]|stdin", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA | DbType::VARIADIC, &DbValidator::flatfileStdinAndGeneric },
CITATION_FOLDSEEK, {{"PDB|mmCIF[.gz]|stdin", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA | DbType::VARIADIC, &DbValidator::flatfileStdinAndGeneric },
{"sequenceDB", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &DbValidator::flatfile }}},
{"easy-search", easystructuresearch, &localPar.easystructuresearchworkflow, COMMAND_EASY,
"Sensitive homology search",
@@ -43,7 +43,7 @@ std::vector<struct Command> commands = {
"foldseek easy-search examples/d1asha_ examples/ result.m8 tmp --alignment-type 1\n\n",
"Martin Steinegger <martin.steinegger@snu.ac.kr>",
"<i:PDB|mmCIF[.gz]> ... <i:PDB|mmCIF[.gz]>|<i:stdin> <i:targetFastaFile[.gz]>|<i:targetDB> <o:alignmentFile> <tmpDir>",
CITATION_SERVER | CITATION_MMSEQS2,{{"fastaFile[.gz|.bz2]", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA|DbType::VARIADIC, &FoldSeekDbValidator::flatfileStdinAndFolder },
CITATION_FOLDSEEK, {{"fastaFile[.gz|.bz2]", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA|DbType::VARIADIC, &FoldSeekDbValidator::flatfileStdinAndFolder },
{"targetDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &FoldSeekDbValidator::flatfileAndFolder },
{"alignmentFile", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &DbValidator::flatfile },
{"tmpDir", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &DbValidator::directory }}},
@@ -54,7 +54,7 @@ std::vector<struct Command> commands = {
"foldseek convertalis queryDB targetDB resultDB result.m8\n\n",
"Martin Steinegger <martin.steinegger@snu.ac.kr>",
"<i:queryDB> <i:targetDB> <o:alignmentDB> <tmpDir>",
CITATION_MMSEQS2, {{"queryDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::sequenceDb },
CITATION_FOLDSEEK, {{"queryDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::sequenceDb },
{"targetDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::sequenceDb },
{"alignmentDB", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &DbValidator::alignmentDb },
{"tmpDir", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &DbValidator::directory }}},
@@ -74,15 +74,15 @@ std::vector<struct Command> commands = {
"mmseqs cluster sequenceDB clusterDB tmp --cluster-reassign\n",
"Martin Steinegger <martin.steinegger@snu.ac.kr> & Lars von den Driesch",
"<i:sequenceDB> <o:clusterDB> <tmpDir>",
CITATION_LINCLUST|CITATION_MMSEQS1|CITATION_MMSEQS2, {{"sequenceDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::sequenceDb },
CITATION_FOLDSEEK|CITATION_MMSEQS2, {{"sequenceDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::sequenceDb },
{"clusterDB", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &DbValidator::clusterDb },
{"tmpDir", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &DbValidator::directory }}},
{"tmalign", tmalign, &localPar.tmalign, COMMAND_ALIGNMENT,
"Compute tm-score ",
NULL,
"Martin Steinegger <martin.steinegger@snu.ac.kr>",
"<i:queryDB> <i:targetDB> <i:prefilterDB> <o:resultDB>",
CITATION_MMSEQS2, {{"queryDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::sequenceDb },
CITATION_FOLDSEEK, {{"queryDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::sequenceDb },
{"targetDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::sequenceDb },
{"resultDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::resultDb },
{"alnDB", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &FoldSeekDbValidator::alignmentDb }}},
@@ -91,7 +91,7 @@ std::vector<struct Command> commands = {
NULL,
"Charlotte Tumescheit <ch.tumescheit@gmail.com> & Martin Steinegger <martin.steinegger@snu.ac.kr>",
"<i:queryDB> <i:targetDB> <i:prefilterDB> <o:resultDB>",
CITATION_MMSEQS2, {{"queryDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::sequenceDb },
CITATION_FOLDSEEK, {{"queryDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::sequenceDb },
{"targetDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::sequenceDb },
{"resultDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::resultDb },
{"alnDB", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &FoldSeekDbValidator::alignmentDb }}},
@@ -100,7 +100,7 @@ std::vector<struct Command> commands = {
NULL,
"Martin Steinegger <martin.steinegger@snu.ac.kr>",
"<i:queryDB> <i:targetDB> <i:alnDB> <o:resultDB>",
CITATION_MMSEQS2, {{"queryDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &FoldSeekDbValidator::cadb },
CITATION_FOLDSEEK, {{"queryDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &FoldSeekDbValidator::cadb },
{"targetDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &FoldSeekDbValidator::cadb },
{"alignmentDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::alignmentDb },
{"tmDB", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &FoldSeekDbValidator::tmscore }}},
@@ -109,15 +109,15 @@ std::vector<struct Command> commands = {
NULL,
"Martin Steinegger <martin.steinegger@snu.ac.kr>",
"<i:queryDB> <i:targetDB> <o:resultDB>",
CITATION_MMSEQS2, {{"queryDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &FoldSeekDbValidator::sequenceDb },
CITATION_FOLDSEEK, {{"queryDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &FoldSeekDbValidator::sequenceDb },
{"targetDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &FoldSeekDbValidator::sequenceDb },
{"tmDB", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &FoldSeekDbValidator::genericDb }}},
{"databases", databases, &localPar.databases, COMMAND_DATABASE_CREATION,
"List and download databases",
NULL,
"Milot Mirdita <milot@mirdita.de>",
"<name> <o:sequenceDB> <tmpDir>",
CITATION_TAXONOMY|CITATION_MMSEQS2, {{"selection", 0, DbType::ZERO_OR_ALL, &DbValidator::empty },
CITATION_TAXONOMY|CITATION_FOLDSEEK, {{"selection", 0, DbType::ZERO_OR_ALL, &DbValidator::empty },
{"sequenceDB", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &DbValidator::sequenceDb },
{"tmpDir", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &DbValidator::directory }}},
{"createindex", structureindex, &localPar.createindex, COMMAND_DATABASE_CREATION,
@@ -126,7 +126,7 @@ std::vector<struct Command> commands = {
"mmseqs createindex sequenceDB tmp\n",
"Martin Steinegger <martin.steinegger@snu.ac.kr>",
"<i:sequenceDB> <tmpDir>",
CITATION_SERVER | CITATION_MMSEQS2,{{"sequenceDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA|DbType::NEED_HEADER, &DbValidator::sequenceDb },
CITATION_SERVER | CITATION_FOLDSEEK,{{"sequenceDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA|DbType::NEED_HEADER, &DbValidator::sequenceDb },
{"tmpDir", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &DbValidator::directory }}},
{"mmcreateindex", createindex, &localPar.createindex, COMMAND_HIDDEN,
NULL,
@@ -140,7 +140,7 @@ std::vector<struct Command> commands = {
NULL,
"",
"",
CITATION_MMSEQS2, {{"",DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, NULL}}}
CITATION_FOLDSEEK, {{"",DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, NULL}}}
};

#include "structdatabases.sh.h"