
Provide pre-training code? #30

Closed
Jacoberts opened this issue Feb 5, 2021 · 14 comments

Comments

@Jacoberts
Contributor

Hi there!

I'm trying to compare ESM to UniRep, the embedding from the Church lab, for variant function prediction. There are a few proteins our lab would eventually like to optimize, and ESM has some advantages over UniRep. I need to "evolutionarily fine-tune" ESM, as the Church lab does for UniRep: refine the global model's weights by continuing training on a small neighborhood (~100k sequences) around the target protein.

Could y'all provide any of the code you used in the pre-training task? E.g., your implementations of noising/masking, your loss function, or your gradient descent function?

Thank you, I think ESM is super cool!
Best,
Jacob

@Jacoberts
Contributor Author

Jacoberts commented Feb 5, 2021

Just saw the closed issue #11, which is very similar! If you're still not planning on providing any of your fairseq code, then I understand closing this out as a duplicate. I'd really appreciate it if you could provide the code, though!

@joshim5
Contributor

joshim5 commented Feb 5, 2021

Hi @Jacoberts, thanks for your interest! In our experiments, we didn't see much of an improvement from evolutionary fine-tuning. For example, see Figure 15 in the appendix of our recent paper at ICLR 2021. We don't have plans to release any pre-training code at this time, but I would encourage you to try using ESM even without evolutionary fine-tuning. You may be surprised by the results!

@Jacoberts
Contributor Author

Jacoberts commented Feb 7, 2021

Hi @joshim5, thanks for your reply! I'm finding that ESM underperforms UniRep and eUniRep on the prediction task defined by Alley et al. Honestly, your results make sense to me: I wouldn't expect evotuning to do much. But the Church lab saw a phenomenal increase in recall on the generalization set from evotuning! I think I'll try to whip up a fairseq config for ESM and see if eESM does any better.


@gokceneraslan

Could y'all provide any of the code you used in the pre-training task? E.g., your implementations of noising/masking, your loss function, or your gradient descent function?

@joshim5 Do you mind commenting on this part of the issue too? Thanks :)

@joshim5
Contributor

joshim5 commented Feb 25, 2021

@gokceneraslan these details are listed in our pre-print. See page 21 "Pre-training task." If you find anything missing, feel free to start a new discussion and we're happy to clarify any details.
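
For anyone landing here: the pre-training task described in that section is standard BERT-style masked language modeling. Below is a minimal sketch of that masking scheme and loss, not the authors' fairseq implementation; the function names, mask index, and alphabet size are placeholders.

```python
import torch
import torch.nn.functional as F

def bert_style_mask(tokens, mask_idx, vocab_size, mask_prob=0.15):
    """Select ~15% of positions; of those, replace 80% with <mask>,
    10% with a random token, and leave 10% unchanged."""
    targets = tokens.clone()
    noised = tokens.clone()

    selected = torch.rand(tokens.shape) < mask_prob        # positions scored by the loss
    targets[~selected] = -100                               # all other positions are ignored

    to_mask = selected & (torch.rand(tokens.shape) < 0.8)   # ~80% of selected -> <mask>
    noised[to_mask] = mask_idx

    to_random = selected & ~to_mask & (torch.rand(tokens.shape) < 0.5)  # ~10% -> random token
    noised[to_random] = torch.randint(vocab_size, tokens.shape)[to_random]
    # the remaining ~10% of selected positions are left unchanged

    return noised, targets

def masked_lm_loss(logits, targets):
    # cross-entropy computed only over the selected positions
    return F.cross_entropy(logits.view(-1, logits.size(-1)),
                           targets.view(-1), ignore_index=-100)
```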

@gokceneraslan

@joshim5 Thanks for the reply; sorry, I missed the explanation that there is no plan to release any pre-training code at this time.

I hugely appreciate the quality of the released code, but I think not releasing the training code (which is obviously non-trivial for a model of this complexity) severely hinders the overall reproducibility of the paper and is very bad practice, especially in the compbio domain. For those wondering how it should be done, here is a good example: https://github.com/kundajelab/bpnet-manuscript by @Avsecz.

@hussius

hussius commented Oct 12, 2021

@Jacoberts In case you managed to whip up a fairseq config, I'd be very grateful if you could share it!

@michaelalb

@Jacoberts (or anyone else who has created pre-training code with fairseq or another framework and can share it): it would be a big help.

@hussius

hussius commented Mar 21, 2022

Since ESM-1b is now available on Hugging Face (https://huggingface.co/facebook/esm-1b), you should be able to use the Hugging Face tooling for evolutionary fine-tuning/pre-training.
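
For anyone who wants to try this, here is a minimal sketch of continued masked-LM training ("evotuning") with the Hugging Face Trainer. It assumes the checkpoint linked above loads with AutoModelForMaskedLM and that its tokenizer defines a mask token; the sequence list, hyperparameters, and output directory are placeholders, not a recipe from the authors.

```python
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

checkpoint = "facebook/esm-1b"  # model ID from the link above; adjust to the current Hub name if needed
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Replace with your ~100k neighborhood sequences around the target protein.
sequences = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"]
dataset = Dataset.from_dict({"sequence": sequences}).map(
    lambda ex: tokenizer(ex["sequence"], truncation=True),
    remove_columns=["sequence"],
)

# Dynamic BERT-style masking on each batch.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="evotuned-esm",
                           per_device_train_batch_size=2,
                           num_train_epochs=1,
                           learning_rate=1e-5),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
trainer.save_model("evotuned-esm")
```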

@ulupo

ulupo commented Apr 6, 2022

Along these lines, I have the following question (tangentially relevant to #143): it isn't 100% clear to me, having read the MSA Transformer paper, whether the initial token embedding weights were also randomly initialised and learnt as part of the overall MLM pre-training, or whether pre-computed embeddings (trained separately in some other way) were fed to the model at pre-training time. I imagine the former was the case, but would appreciate the clarification. Thanks!

@tomsercu
Contributor

tomsercu commented Apr 6, 2022

initial token embedding weights were also randomly initialised

Correct, this is how it was done; there is no change w.r.t. fairseq's TransformerSentenceEncoder self.embed_tokens.
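
To illustrate the point (this is not the authors' code): the embedding table is an ordinary, randomly initialised nn.Embedding trained end-to-end with the MLM objective; no pre-computed embeddings are fed in. The sizes below are placeholders.

```python
import torch.nn as nn

class TinyMaskedLM(nn.Module):
    def __init__(self, vocab_size=33, dim=768, layers=2, heads=12):
        super().__init__()
        # randomly initialised and updated by backprop like every other weight
        self.embed_tokens = nn.Embedding(vocab_size, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True),
            num_layers=layers,
        )
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):  # tokens: (batch, seq_len) integer indices
        return self.lm_head(self.encoder(self.embed_tokens(tokens)))
```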

@ulupo

ulupo commented Apr 6, 2022

Thanks @tomsercu, really appreciate the fast replies.

@ulupo

ulupo commented Apr 6, 2022

@tomsercu just one last thing: you probably meant to also quote "and learnt as part of the overall MLM pre-training", right?

@tomsercu
Contributor

tomsercu commented Apr 6, 2022

Yes, they're just regular model weights of the MSA Transformer, all being trained.
