Add capability for localcolabfold to run vs a locally hosted database (FSx Lustre or local web-server) #19

nfg-interline · 2022-05-18T15:02:29Z

Description of feature

Currently, the localcolabfold.nf script relies on a colabfold_batch command that queries the public web server hosted by the ColabFold developers (https://github.com/sokrypton/ColabFold). This may not be obvious to first-time users of colabfold, because the workflow checks for af2 native database files in the database path. The workflow does not use those database files if the mode is specified as colabfold.

The issue with using the public web server is that it is a limited resource. After submitting anywhere from 10 to 100 jobs, additional jobs get killed by the web server (see the attached file command-2.err.txt), and IP addresses can remain blocked for some time (1-2 days). This limit is necessary to keep the webserver operational and serving the community.

A scalable solution for running colabfold would be highly beneficial to all nf-core/proteinfold users. Two options exist:

Solution 1: Query against a locally installed colabfold db. The colabfold sequence dbs, which are distinct from the af2 native sequence dbs because colabfold uses mmseqs2 rather than jackhmmer for MSA generation, could be installed locally on an FSx Lustre filesystem. AWS developers have provided such a solution using FSx. NOTE: this would also require the issue "Split colabfold_batch command in localcolabfold.nf into two separate, colabfold_search + colabfold_batch, commands" to be resolved, since colabfold_batch will not run against a file system, only a webserver.

Solution 2: Provide an option for users to pass the URL of their own locally hosted webserver to the workflow. The local webserver can be set up following instructions here.

Of the two solutions, Solution 1 is preferable for most users. This is because it only requires that a database be maintained in a file system. Solution 2 would require a webserver to be running constantly, or to be configured and launched as part of the pipeline execution (overly complex). The webserver requires at least 2 TB of RAM, which amounts to a considerable overhead if running constantly.
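To make the difference between the two solutions concrete, here is a minimal sketch of the command lines each one would produce. It is an illustration under assumptions, not the pipeline's implementation: the helper name `build_colabfold_commands`, the output layout (`msas`, `predictions`), and all paths are hypothetical, and the `colabfold_search` positional arguments and the `colabfold_batch --host-url` option are written as I understand them from the ColabFold README, so they should be checked against the installed version.

```python
import shlex


def build_colabfold_commands(fasta, out_dir, db_dir=None, host_url=None):
    """Return the commands for one of the two proposed modes (hypothetical helper).

    db_dir   -> Solution 1: search a local database, then predict from the MSAs.
    host_url -> Solution 2: let colabfold_batch query a self-hosted webserver.
    Neither  -> current behaviour: colabfold_batch queries the public server.
    """
    if db_dir and host_url:
        raise ValueError("choose either a local database or a webserver URL, not both")

    predictions = f"{out_dir}/predictions"

    if db_dir:
        # Assumed argument order: query, database directory, MSA output directory.
        msa_dir = f"{out_dir}/msas"
        return [
            ["colabfold_search", fasta, db_dir, msa_dir],
            ["colabfold_batch", msa_dir, predictions],
        ]

    cmd = ["colabfold_batch"]
    if host_url:
        # Assumed option name for pointing at a custom MSA server.
        cmd += ["--host-url", host_url]
    cmd += [fasta, predictions]
    return [cmd]


if __name__ == "__main__":
    # Placeholder paths and URL, for illustration only.
    for command in build_colabfold_commands("input.fasta", "results", db_dir="/fsx/colabfold_db"):
        print(shlex.join(command))
    for command in build_colabfold_commands("input.fasta", "results", host_url="http://localhost:8888"):
        print(shlex.join(command))
```

The sketch shows why Solution 1 depends on splitting the command: the database path is only ever passed to the search step, while the prediction step consumes the resulting MSAs.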

nfg-interline added the enhancement (Improvement for existing functionality) label May 18, 2022

nfg-interline mentioned this issue May 18, 2022

Split colabfold_batch command in localcolabfold.nf into two separate, colabfold_search + colabfold_batch, commands #20

Closed

athbaltzis mentioned this issue Jun 21, 2022

Split colabfold workflow into two modes (colabfold_local and colabfold_webserver) and enable the usage of a custom webserver #23

Merged

6 tasks

athbaltzis closed this as completed in #23 Jun 28, 2022
