How to search sequences in bulk? #341

Closed
1185307269 opened this issue Nov 5, 2022 · 6 comments

@1185307269

Dear all:
I have a question: how can I search sequences in bulk in the web version?
Or can I download the data through an API?

@tomsercu (Contributor) commented Nov 7, 2022

Unfortunately, we don't support bulk sequence search; you should use MMseqs2 directly. We'll upload a FASTA file and a pre-computed MMseqs2 database.
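A minimal sketch of what "use MMseqs2 directly" could look like from Python, assuming the `mmseqs` binary is installed and on `PATH`. It wraps the `mmseqs easy-search` workflow, which takes a query FASTA (with any number of sequences), a target FASTA or pre-built MMseqs2 database, an output file, and a temporary directory. The function names and default file names are illustrative assumptions, not part of the ESM repository.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def easy_search_cmd(query_fasta: str, target: str, out_m8: str, tmp_dir: str) -> list[str]:
    """Build an `mmseqs easy-search` command line (hypothetical helper).

    `target` may be a FASTA file or a pre-built MMseqs2 database.
    """
    return ["mmseqs", "easy-search", query_fasta, target, out_m8, tmp_dir]


def run_bulk_search(query_fasta: str, target: str,
                    out_m8: str = "results.m8", tmp_dir: str = "tmp") -> None:
    """Search every sequence in `query_fasta` against `target` in one run."""
    # Fail early with a clear message if MMseqs2 is not installed.
    if shutil.which("mmseqs") is None:
        raise RuntimeError("mmseqs not found on PATH; install MMseqs2 first")
    Path(tmp_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if mmseqs exits with an error.
    subprocess.run(easy_search_cmd(query_fasta, target, out_m8, tmp_dir), check=True)
```

For example, `run_bulk_search("my_queries.fasta", "highquality_clust30.fasta")` would write the hits for all queries, one tab-separated alignment per line (BLAST-style tabular output by default), to `results.m8`.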

@1185307269 (Author)

Thank you very much for your reply!
Where can I download the fasta file and the pre-computed mmseqs db?

@ebetica (Contributor) commented Nov 11, 2022

I uploaded the high-quality FASTA here:

s3://dl.fbaipublicfiles.com/esmatlas/v0/highquality_clust30/highquality_clust30.fasta

And mgnify90 here:

s3://dl.fbaipublicfiles.com/esmatlas/v0/full/mgnify90.fasta
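These files can be very large (the full atlas FASTA mentioned later in this thread has 617,051,007 records), so any local inspection or subsetting is best done by streaming. MMseqs2 can use a FASTA file directly as the search target, so this is only needed if you want to look inside the file yourself. A minimal, dependency-free sketch of a streaming FASTA reader; `read_fasta` is a hypothetical helper, not an ESM or MMseqs2 function.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time.

    Works on an open text file or any iterable of lines, so the
    whole FASTA never has to fit in memory.
    """
    header: str | None = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading '>'
            chunks = []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)
```

Typical use would be `with open("highquality_clust30.fasta") as fh:` followed by `for header, seq in read_fasta(fh): ...`, assuming the file has already been downloaded under that name.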

ebetica closed this as completed Nov 11, 2022
@1185307269 (Author)

Thank you very much for your reply! But I have one more basic question: what is the difference between these two databases? If I have some sequences to search, which database should I search against?

@ebetica (Contributor) commented Nov 11, 2022

The high-quality set is a subset of mgnify90 that is redundancy-reduced and well predicted by ESMFold (both pTM and pLDDT > 0.7).
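The confidence part of that definition can be expressed as a simple filter. The sketch below assumes per-structure pTM and pLDDT values are available on a 0 to 1 scale (consistent with the 0.7 cutoff quoted above); the `Prediction` record and its field names are hypothetical, not the atlas schema. It does not reproduce the redundancy reduction, which was a separate clustering step.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical record layout; field names are illustrative only.
    seq_id: str
    ptm: float
    plddt: float  # assumed on a 0-1 scale to match the 0.7 cutoff


def is_high_quality(p: Prediction, cutoff: float = 0.7) -> bool:
    # Both confidence scores must strictly exceed the cutoff.
    return p.ptm > cutoff and p.plddt > cutoff


def high_quality_ids(preds: list[Prediction], cutoff: float = 0.7) -> list[str]:
    """Return IDs of predictions that pass the confidence filter, in order."""
    return [p.seq_id for p in preds if is_high_quality(p, cutoff)]
```

In practice this means searching the high-quality FASTA only finds hits among confidently predicted, non-redundant structures, while mgnify90 covers the full sequence set.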

tomsercu added a commit that referenced this issue Nov 17, 2022
document the fasta paths from #341 and `pip install` fix
@tomsercu (Contributor)
We now provide https://dl.fbaipublicfiles.com/esmatlas/v0/full/atlas.fasta, with 617,051,007 records that match stats.parquet exactly.

See #366 for more context
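After downloading, a quick sanity check is to count the FASTA records and compare against the stated total and against the row count of `stats.parquet`. A minimal sketch, assuming both files are saved locally and that `stats.parquet` has one row per FASTA record (as "match exactly" suggests); the helper names are hypothetical, and `pyarrow` is only needed for the Parquet check.

```python
from __future__ import annotations

from typing import Iterable

# Record count stated in this thread for atlas.fasta.
EXPECTED_RECORDS = 617_051_007


def count_fasta_records(lines: Iterable[bytes]) -> int:
    """Count FASTA records by counting header lines that start with b'>'."""
    return sum(1 for line in lines if line.startswith(b">"))


def parquet_row_count(path: str) -> int:
    """Return the row count stored in a Parquet file's footer metadata."""
    # Imported lazily so count_fasta_records works without pyarrow installed.
    import pyarrow.parquet as pq

    return pq.ParquetFile(path).metadata.num_rows
```

Opening the FASTA in binary mode (`with open("atlas.fasta", "rb") as fh: n = count_fasta_records(fh)`) avoids text decoding over hundreds of millions of lines; `n` can then be compared with `EXPECTED_RECORDS` and with `parquet_row_count("stats.parquet")`.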

andersoncarlosfs pushed a commit to andersoncarlosfs/esm that referenced this issue Mar 17, 2023