
download the model weights to local #174

Closed
lzhangUT opened this issue Mar 8, 2022 · 7 comments



lzhangUT commented Mar 8, 2022

Hi @rmrao,
I am interested in using the ESM-1v models (1-5), but it takes a long time to download the model weights every time I run a single sequence. Can I download the model weights (or the model itself) into my local workspace, e.g. on Azure Databricks, so the model loads quickly when I need to run many sequences? If so, what would the code for that look like?
Thank you.


rmrao commented Mar 9, 2022

By default, the checkpoints are downloaded to and cached in the directory f"{torch.hub.get_dir()}/checkpoints". If you modify the torch hub cache directory (see the torch.hub documentation) before the model is downloaded, it will be downloaded to the new cache directory, and future runs will check that directory first.

If the model is being downloaded every time, the default cache probably points to an ephemeral directory associated with a particular Azure instance. Pointing it at a persistent storage directory should solve the issue.

If for any reason that doesn't work, the URLs for all the models are of the form f"https://dl.fbaipublicfiles.com/fair-esm/models/{model_name}.pt". You can manually download the model and load it with esm.pretrained.load_model_and_alphabet_local(<path/to/file>).
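For example, something along these lines should work (a rough sketch, not tested on Databricks; the paths are placeholders, and esm1v_t33_650M_UR90S_1 is just one of the five ESM-1v checkpoints):

```python
import torch
import esm

# Redirect the torch hub cache to persistent storage *before* any download.
# The path is a placeholder; on Databricks a DBFS mount would be typical.
torch.hub.set_dir("/dbfs/models/torch_hub")

# The first run downloads into f"{torch.hub.get_dir()}/checkpoints";
# subsequent runs load from that cache.
model, alphabet = esm.pretrained.esm1v_t33_650M_UR90S_1()

# Alternatively, download the .pt file once yourself from
# https://dl.fbaipublicfiles.com/fair-esm/models/esm1v_t33_650M_UR90S_1.pt
# and load it directly from the local path:
model, alphabet = esm.pretrained.load_model_and_alphabet_local(
    "/dbfs/models/esm1v_t33_650M_UR90S_1.pt"
)
```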


lzhangUT commented Mar 10, 2022 via email


rmrao commented Mar 10, 2022

You can ignore the warning. (No regression weights are provided for ESM-1v because it's not designed for contact prediction. We should probably just throw an error if you try to do contact prediction with ESM-1v models.)

As for the speed: after the first time it downloads the model, it should never download it again. If you're still seeing slowdowns, you could switch to the local loading version (esm.pretrained.load_model_and_alphabet_local(<path/to/file>)) to ensure that it's not downloading anything.

It's possible when using a cloud setup that the transfer of the model weights from storage actually takes quite a while (the weights for each individual model are ~10GB). I'm not familiar enough with Azure to suggest solutions if this is the issue.

lzhangUT commented

Thank you, @rmrao,
everything works now.
I do have another question:
when I use the ESM-1v models (1-5) for validation/test predictions, the paper mentions that the average of the five model predictions and/or the ensemble of the five models is compared with the experimental data. The average of the five models is easy to picture, but what is the ensemble value? Is there a separate model for the ensemble, or a different calculation based on the five models?
Thanks

lzhangUT commented

Another question, about the bootstrap.
The paper says: 'To compute bootstraps for the pointplots (figure 3), we randomly resample each deep mutational scan (with replacement) and compute the Spearman ρ between the experimental data and model predictions.'

Figure 3 also says: 'Points are mean ± std of 20 bootstrapped samples.'
I'm a little confused here: is 20 the sample size drawn from each dataset (that would be too small, wouldn't it?), or the number of bootstrap repetitions? If the latter, what is the sample size for each bootstrap (the number of observations in the dataset itself)?
Thanks


rmrao commented Mar 16, 2022

As far as I'm aware, the ensemble prediction is the average of the five models. If there's a part of the paper that implies something different, let me know and I can take a look.

I actually didn't run the bootstrapping experiment so I'm not sure of the details right now. @robert-verkuil, do you happen to know? If not I'll try to figure it out but it may take a week or so.

tomsercu commented

Hi @lzhangUT , thanks for your interest in our models!
The follow-up questions would belong better in the discussion forums, but let me quickly answer:

  • Average of models: compute a Spearman ρ for each of the five models, then average the ρ values. Ensemble: average the predictions first, which gives a different ranking, then compute a single Spearman ρ.
  • Bootstrap: for a given protein, the DMS size is N (typically on the order of thousands of mutations). The bootstrap sample size is N, sampled with replacement. 'Mean ± std of 20 bootstrapped samples' means we repeat this bootstrap sampling 20 times, compute ρ for each of the 20 bootstrap samples, and report the mean and std. (See the sketch below.)
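To make both bullets concrete, here is a rough sketch with synthetic stand-in data (preds holds one prediction vector per ESM-1v model, y the experimental DMS values; all names and numbers are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

# Synthetic stand-ins: 5 model prediction vectors over N mutations,
# plus experimental DMS values. Real data would replace these.
rng = np.random.default_rng(0)
N = 2000
y = rng.normal(size=N)
preds = y + rng.normal(scale=0.5, size=(5, N))

# Average of models: one Spearman rho per model, then average the rhos.
avg_of_rhos = np.mean([spearmanr(p, y)[0] for p in preds])

# Ensemble: average the predictions first (a different ranking),
# then compute a single rho.
ensemble = preds.mean(axis=0)
ensemble_rho = spearmanr(ensemble, y)[0]

# Bootstrap: resample the N mutations with replacement, 20 times;
# report mean ± std of the 20 resampled rhos.
boot_rhos = []
for _ in range(20):
    idx = rng.integers(0, N, size=N)  # sample size N, with replacement
    boot_rhos.append(spearmanr(ensemble[idx], y[idx])[0])
print(f"ensemble rho = {np.mean(boot_rhos):.3f} ± {np.std(boot_rhos):.3f}")
```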

Hope that helps! If anything is unclear, you can open a GitHub discussion and we can follow up there.
