How to run the notebooks over a .fasta file? #5
Unfortunately, Google Colab is not designed for production runs; it is intended for interactive sessions. If we provided the ability to iterate through many proteins (with minimal interactive input from the user), the user would be heavily penalized (losing priority for good GPUs) in any future Google Colab runs. That said, we could provide non-Colab, non-notebook examples for production runs.
Thank you so much. I use Colab Pro. Do you think the same issue would still be a problem for Pro users? Also, please provide the non-Colab/non-notebook examples. I have a FASTA file with 964 sequences, and my task is to get model representations for all of them.
We built a FASTA parser on top of this project, which you can check out here: https://github.com/wells-wood-research/alphafold2-multiprocessing. The idea is that you provide a FASTA file with multiple sequences and the code runs each of them through AlphaFold. We've also added multiprocessing to run multiple structures at once. It is intended to be run with a local copy of AlphaFold, but I'm sure you could adapt it to run on Colab.
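The sketch below is not taken from that repository. It is a minimal, generic Python FASTA reader illustrating the first step of such a pipeline: splitting a multi-record file into (header, sequence) pairs that can each be handed to a separate run. The function name `parse_fasta` is hypothetical, and the actual prediction call is deliberately left out.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a multi-record FASTA."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


if __name__ == "__main__":
    example = ">seq1 first protein\nMKTAYIAK\nQRQISFVK\n>seq2\nGSHMLEDP\n"
    for name, sequence in parse_fasta(example.splitlines()):
        # A per-sequence prediction call would go here (not shown).
        print(name, len(sequence))
```

With a real file, the open handle can be passed directly, e.g. `with open("sequences.fasta") as fh:` followed by `for name, seq in parse_fasta(fh): ...` (the file name is only an example). A generator keeps memory use flat even for files with thousands of records.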
I would ask you not to use automation to submit jobs to the MMseqs2 API for now. We don't currently implement any prioritization, so you would block the queue for everyone. We could implement a prioritization scheme, and the API should be fast enough to handle a few thousand automated jobs. Right now, however, it would result in a bad user experience for Colab notebook users.
The job system is implemented here: We will also release the script to run MMseqs2 locally soon (we are still improving MSA quality).
I had to add rate limiting to the MSA submission endpoint. If you want a couple hundred MSAs, please submit a single job with multiple queries in one FASTA file:
You'll eventually get two a3m files (UniRef and environmental), each containing multiple MSAs separated by null bytes. However, the order of the MSAs is random (due to threading), so you'll have to look at the first line of each entry. The same applies to the templates M8 file: the order of the query blocks is random, and you'll have something like:
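A minimal Python sketch of how the returned files could be split back into per-query pieces. It assumes each null-byte-separated a3m entry begins with a `>` header naming its query, and that the query identifier is the first tab-separated column of each M8 line (as in the BLAST tabular format). Both function names are hypothetical and not part of ColabFold.

```python
from typing import Dict, List


def split_a3m_by_query(a3m_text: str) -> Dict[str, str]:
    """Split a combined a3m into one MSA per query, keyed by the query header.

    Entries are separated by null bytes and may arrive in any order, so each
    entry is identified by its first line (the header, without the '>').
    """
    msas: Dict[str, str] = {}
    for entry in a3m_text.split("\x00"):
        entry = entry.strip()
        if not entry:
            continue  # skip empty chunks, e.g. after a trailing null byte
        header = entry.splitlines()[0].lstrip(">").strip()
        msas[header] = entry + "\n"
    return msas


def group_m8_by_query(m8_text: str) -> Dict[str, List[str]]:
    """Group template hits (M8 lines) by query id, assumed to be column 1."""
    groups: Dict[str, List[str]] = {}
    for line in m8_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        query_id = line.split("\t", 1)[0]
        groups.setdefault(query_id, []).append(line)
    return groups


if __name__ == "__main__":
    combined = ">101\nMKV\n>hitA\nMKI\n\x00>100\nGGS\n>hitB\nGAS\n"
    for query, msa in sorted(split_a3m_by_query(combined).items()):
        print(query, msa.count(">"), "sequences")
```

Keying by header rather than by position makes the result independent of the random output order; the same idea is applied to the M8 blocks.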
Hi, thank you so much for your help. I am thinking about computing the MSA separately for each of my sequences and then using them as input to 'custom MSA'. Could you please share your thoughts on this? I do not wish to cause problems for other users.
Hi @sokrypton @milot-mirdita, I figured out the aforementioned issue. However, I would now like to extract the representations learned by RoseTTAFold. Any ideas on how I can extract them? Thanks
Is there an example that illustrates how the FASTA file should be formatted for a homo-/heterooligomer? And is running it with standalone AlphaFold any different from conventional runs?
Could you please tell me how to run the notebooks over a FASTA file? I want to loop through the FASTA file and generate .pdb files.