Databases used in the MMseqs2 search, local version #20

Open
guilhemfaure opened this issue Aug 9, 2021 · 8 comments

Comments

@guilhemfaure

Hello,
I would like to run the MSA-building step of the Colab notebook locally, using the exact same set of databases, so that I can compare the results against other databases.
Is it possible to get access to the set of databases the MMseqs2 server uses, as well as the MMseqs2 version and the specific command lines executed on the server?
In the slides you presented (awesome presentation!), you mentioned that you use a database clustered at 30% sequence identity, built from SMAG, MGnify, BFD, and MetaEuk. Is a downloadable version of this master 30% sequence identity database available somewhere?

Thanks a lot!

@milot-mirdita
Collaborator

We are working on preparing the preprint and will make the databases available then. This should hopefully happen very soon.

@guilhemfaure
Author

Thanks a lot! Looking forward to reading your paper!

@avilella

I am also interested in running ColabFold (MMseqs2 works great for me) on a local installation, or in a way to call it programmatically for 10E4-10E5 molecules. Looking forward to a solution one way or another, and to reading the details in a preprint.

@fstrozzi

fstrozzi commented Sep 15, 2021

Hello,
The preprint has come out (https://www.biorxiv.org/content/10.1101/2021.08.15.456425v1.full.pdf), but it doesn't seem to mention a direct way to download the clustered database used for the MMseqs2 search. Would it be possible to provide a direct link to it?

Thanks for all this work, ColabFold is just great!

@martin-steinegger
Collaborator

We are so sorry for the delay. The database is ready, but our FTP storage space is limited. We have asked our IT department to increase the quota. Once that is approved, we will upload the database along with scripts showing how to build and run it.

@fstrozzi

@martin-steinegger nothing to be sorry about, you are doing a fantastic job with this project!

And thanks for the quick answer. Have you also thought about storing these datasets and the database in the cloud, e.g. in the AWS Open Dataset repository (and/or the equivalent on Google Cloud)?

@martin-steinegger
Collaborator

@fstrozzi thank you! We would be happy to host our databases in an open dataset repository, but our applications to Google and AWS have never been successful.

@milot-mirdita
Collaborator

We have uploaded the ColabFold databases to https://colabfold.mmseqs.com. Instructions on how to create MMseqs2 databases from these archives are in the MMseqs2 wiki.
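
As a rough illustration of the download-and-convert step, here is a minimal Python sketch that only assembles the commands and does not run them. The archive name `example_db.tar.gz` is a placeholder, not a real file name from the server. The sketch also assumes that the archives unpack to TSV dumps which `mmseqs tsv2exprofiledb` converts into a database, and that `mmseqs createindex` is then used to index it. The MMseqs2 wiki remains the authoritative source for the actual steps.

```python
import shlex
from pathlib import PurePosixPath

# Download location named in this thread.
BASE_URL = "https://colabfold.mmseqs.com"


def build_setup_commands(archive_name, workdir="."):
    """Return the commands (as argument lists) to fetch and convert one archive.

    archive_name is a hypothetical placeholder; look up the real archive
    names on the download page before running anything.
    """
    suffix = ".tar.gz"
    if not archive_name.endswith(suffix):
        raise ValueError("expected a .tar.gz archive name")
    stem = archive_name[: -len(suffix)]
    work = PurePosixPath(workdir)
    archive = work / archive_name
    database = work / (stem + "_db")
    return [
        # Fetch the archive into the working directory.
        ["wget", "-O", str(archive), BASE_URL + "/" + archive_name],
        # Unpack it next to the archive.
        ["tar", "-xzvf", str(archive), "-C", str(work)],
        # Assumption: the unpacked TSV files share the archive's stem.
        ["mmseqs", "tsv2exprofiledb", str(work / stem), str(database)],
        # Assumption: a precomputed index is wanted for repeated searches.
        ["mmseqs", "createindex", str(database), str(work / "tmp")],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for command in build_setup_commands("example_db.tar.gz"):
        print(shlex.join(command))
```

Printing the commands first makes it easy to compare them against the wiki before committing disk space and compute to the download and indexing.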

We have also finished merging all the MMseqs2 changes back into the main repository (it should work starting from commit soedinglab/MMseqs2@f651879).
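
One way to check whether a local MMseqs2 checkout already contains that commit is `git merge-base --is-ancestor`. The sketch below is only an illustration: the helper names and the repository path are hypothetical, and it assumes `git` is on the `PATH` and that the checkout has the relevant history.

```python
import subprocess

# Commit mentioned in this thread as the first one that should work.
REQUIRED_COMMIT = "f651879"


def ancestor_check_command(repo_dir, commit=REQUIRED_COMMIT):
    """Build the git command that tests whether commit is an ancestor of HEAD."""
    return ["git", "-C", repo_dir, "merge-base", "--is-ancestor", commit, "HEAD"]


def checkout_includes_commit(repo_dir, commit=REQUIRED_COMMIT):
    """Return True if HEAD in repo_dir contains commit.

    git exits with 0 when the commit is an ancestor and 1 when it is not.
    Any other exit code (for example an unknown commit or a path that is
    not a repository) is also reported as False here.
    """
    result = subprocess.run(
        ancestor_check_command(repo_dir, commit),
        capture_output=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # "path/to/MMseqs2" is a placeholder for a local clone.
    print(checkout_includes_commit("path/to/MMseqs2"))
```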

We will make running everything easier as soon as possible; in the meantime, you should be able to get a local ColabFold installation running.
We haven't finished the setup procedures for the template search databases yet. Hopefully we will manage that in the next few days too.


5 participants