This repository has been archived by the owner on Aug 1, 2024. It is now read-only.

Extracting per-residue/per-protein embeddings on GPU #2

Closed
ptynecki opened this issue Sep 5, 2020 · 1 comment

Comments

ptynecki commented Sep 5, 2020

Hey,

Thank you for doing this research, which is needed to address many biotech problems.

Is there any plan to add support for extracting per-residue embeddings on GPU (multi-GPU)?

...

# Extract per-residue embeddings (on CPU)
with torch.no_grad():
    results = model(batch_tokens, repr_layers=[34])

...

I have another question: how can I use the ESM embeddings to get a per-protein vector?
Is it enough to apply mean(dim=0)?

Thanks,
Piotr

joshim5 (Contributor) commented Sep 6, 2020

Hi Piotr, thanks for your interest and these great questions!
Yes, you can certainly extract per-residue embeddings on GPU. It's as easy as calling model.cuda() before extracting the representations. Here's a short tutorial explaining this in more detail.
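
As a rough illustration of the GPU step, the sketch below moves both the model and the token batch to the same device before the forward pass. The helper `extract_per_residue` and the `TinyStandIn` module are hypothetical names introduced here; the stand-in only mimics the `model(batch_tokens, repr_layers=[...])` call from the question so the example runs without downloading ESM weights. It falls back to CPU when no GPU is available.

```python
import torch


def extract_per_residue(model, batch_tokens, layer, device=None):
    """Run `model` on `device` and return that layer's representations on CPU."""
    if device is None:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device).eval()         # same effect as model.cuda() on a GPU
    batch_tokens = batch_tokens.to(device)  # inputs must be on the model's device
    with torch.no_grad():
        results = model(batch_tokens, repr_layers=[layer])
    return results["representations"][layer].cpu()


class TinyStandIn(torch.nn.Module):
    """Hypothetical stand-in mimicking the ESM forward signature; not part of esm."""

    def __init__(self, vocab_size=33, dim=8):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, dim)

    def forward(self, tokens, repr_layers=()):
        hidden = self.embed(tokens)
        return {"representations": {layer: hidden for layer in repr_layers}}


tokens = torch.randint(0, 33, (2, 10))  # 2 sequences, 10 token positions each
reps = extract_per_residue(TinyStandIn(), tokens, layer=34)
print(reps.shape)  # torch.Size([2, 10, 8])
```

With a real ESM model, the same helper would be called with the loaded model and the `batch_tokens` produced by its batch converter.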

To answer your second question, you can get per-protein vectors by averaging the per-residue representations. It's a little more complicated than applying mean(dim=0) because it's important to (a) drop the initial beginning-of-sentence token; and (b) remove all padding tokens. You can use the provided extract.py script with --include mean to do this automatically. Here's the relevant line of code that applies the mean pooling.
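
A minimal sketch of that pooling, assuming the layout described above: position 0 of each row is the beginning-of-sentence token, the residues follow, and everything past the sequence length is padding. The function name `mean_pool` and the toy tensor are made up for illustration; this is not the code from extract.py.

```python
import torch


def mean_pool(token_reps, seq_lens):
    """Average residue embeddings per protein, skipping BOS (index 0) and padding."""
    pooled = []
    for i, n in enumerate(seq_lens):
        # Positions 1..n hold the residues; 0 is BOS, anything past n is padding.
        pooled.append(token_reps[i, 1 : n + 1].mean(dim=0))
    return torch.stack(pooled)


# Toy batch: proteins of length 3 and 1, padded to 1 + 3 positions, embedding dim 2.
# The [9, 9] rows stand for BOS and the [7, 7] rows for padding.
token_reps = torch.tensor([
    [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[9.0, 9.0], [2.0, 4.0], [7.0, 7.0], [7.0, 7.0]],
])
per_protein = mean_pool(token_reps, seq_lens=[3, 1])
print(per_protein)  # rows are [3., 4.] and [2., 4.]
```

A plain `mean(dim=0)` over each row would instead mix the BOS and padding vectors into the average, which is the pitfall the reply warns about.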

I'm closing out this issue, but feel free to reopen if you have any more questions.

joshim5 closed this as completed Sep 6, 2020
tomsercu added a commit that referenced this issue Nov 1, 2022
commit dc668778e747f2f9759bac3ac0c77d0056e118d3
Merge: 92cde0e ca8a710
Author: Tom Sercu <tom.sercu@gmail.com>
Date:   Mon Oct 31 22:17:48 2022 -0400

    Merge remote-tracking branch 'public/main' into release_staging

commit 92cde0ead44afff919f13c85f4ec3890630a983b
Merge: a4996ab 13235b5
Author: Tom Sercu <tom.sercu@gmail.com>
Date:   Mon Oct 31 21:28:08 2022 -0400

    Merge pull request #13 from tomsercu/tsercu/writing

    small edits to readme, docstring

commit 13235b54f0305f8fdf8585dfdeb3bf898f7f05b4
Author: Tom Sercu <tom.sercu@gmail.com>
Date:   Mon Oct 31 21:26:57 2022 -0400

    address roshan comment esmfold_v0

commit 47c1d26e6df7f762542298c26096e2d0d344fc57
Author: Tom Sercu <tom.sercu@gmail.com>
Date:   Mon Oct 31 18:45:18 2022 -0400

    small edits to readme, docstring

commit a4996abb967016b4931369da9eeefb2196097cb1
Merge: 9f6cf02 ef946bd
Author: Tom Sercu <tom.sercu@gmail.com>
Date:   Mon Oct 31 18:48:11 2022 -0400

    Merge pull request #10 from tomsercu/update-requirements

    Update requirements

commit ef946bd3fdde9b6f7be0332ef764bd93fa195a5c
Author: Tom Sercu <tom.sercu@gmail.com>
Date:   Mon Oct 31 18:43:54 2022 -0400

    pin deepspeed, add scipy, fix test

commit e5fdf19761e50ddf82c9bd67b1802d376236523b
Author: Tom Sercu <tom.sercu@gmail.com>
Date:   Mon Oct 31 16:59:30 2022 -0400

    Update README.md

commit 9f6cf02d47f8474d187a9970e528747ea1124da0
Merge: 3b30d05 580acd3
Author: Tom Sercu <tom.sercu@gmail.com>
Date:   Mon Oct 31 16:54:09 2022 -0400

    Merge pull request #11 from tomsercu/set-chunk-size

    add chunk size option

commit 580acd317f8ff9e367cf567d58ecc935d266b4b2
Author: Roshan Rao <rmrao@fb.com>
Date:   Mon Oct 31 08:10:21 2022 -0700

    add chunk size option

commit 57b34dd65c2ce33396aabb859d7eba621faa6127
Author: Roshan Rao <rmrao@fb.com>
Date:   Mon Oct 31 07:51:40 2022 -0700

    make note about openfold install failures

commit 373d64f6b1bb826c205ca62dbfc7d3bd767c54b3
Author: Roshan Rao <rmrao@fb.com>
Date:   Mon Oct 31 07:49:31 2022 -0700

    updated requirements to support easier pip install

commit 3b30d05d4f9c1bcb55b10a690b9b43c4486bcf3c
Merge: 9daebe3 5d58359
Author: Tom Sercu <tom.sercu@gmail.com>
Date:   Mon Oct 31 10:08:45 2022 -0400

    Merge pull request #8 from tomsercu/tsercu/updates

    README changes, version bump, random touchups

commit 5d58359b2733eb44be37d05f1bf518ecf1eb942e
Author: Tom Sercu <tom.sercu@gmail.com>
Date:   Mon Oct 31 09:40:54 2022 -0400

    fixtest

commit edcf8d59f7776a7db25edfa21a092f6f10d0221e
Author: Tom Sercu <tom.sercu@gmail.com>
Date:   Sun Oct 30 23:47:45 2022 -0400

    README changes, version bump, random touchups

commit 9daebe36b02850ba010397076055867c1f74b1dd
Author: Nikita Smetanin <nikitozzz.pl@gmail.com>
Date:   Mon Oct 31 11:12:53 2022 +0000

    Add esmfold_v0 checkpoint (#9)

commit e9297b7625f7868b08b35cf7cf75af1d7ad70e86
Merge: 64b3d98 22dade4
Author: Tom Sercu <tom.sercu@gmail.com>
Date:   Sun Oct 30 23:24:43 2022 -0400

    Merge pull request #4 fromupdate-README

    Update readme, add batched inference

commit 22dade46abcfe6a3a39d51af8fd1cdc481f256d6
Author: Roshan Rao <rmrao@fb.com>
Date:   Sat Oct 29 15:25:56 2022 -0700

    change paramters to match alphafold

commit f5316e99ff1ffdeaca216104ba17915798f978d2
Author: Roshan Rao <rmrao@fb.com>
Date:   Sat Oct 29 15:25:41 2022 -0700

    import torch in example

commit d254595de411db35e55cdd8d182732de5061eadb
Author: Roshan Rao <rmrao@fb.com>
Date:   Fri Oct 28 10:21:10 2022 -0700

    update README with structure prediction examples

commit 24bae1ffa6c9553522adc0fc4649b65db9a2c162
Author: Roshan Rao <rmrao@fb.com>
Date:   Fri Oct 28 10:08:59 2022 -0700

    batch inference working

commit 64b3d98282995f5e99e66ed817d6b05550544984
Merge: 010f1a8 4dcb94f
Author: Roshan Rao <rmrao@fb.com>
Date:   Thu Oct 27 16:37:33 2022 -0400

    Merge pull request #3 from tomsercu/support-recycles-and-multimers

    Add support for recycling and for multimer prediction

commit 4dcb94fb23dbe5672bf1e441c46a8b6a2699d955
Author: Roshan Rao <rmrao@fb.com>
Date:   Thu Oct 27 13:36:36 2022 -0700

    fix bug in encoding + make recycles default to cfg.trunk.max_recycles

commit 795502f0c7f6afc8317df8000b9453d9910fd6be
Author: Roshan Rao <rmrao@fb.com>
Date:   Thu Oct 27 07:54:23 2022 -0700

    refactor + fix openfold numpy version bug

commit 948b410cb744fd8f34b026cc689c1ae48b3e0770
Author: Roshan Rao <rmrao@fb.com>
Date:   Thu Oct 27 07:53:53 2022 -0700

    change model naming to esmfold_v1, add it into esm.pretrained

commit 77ba2924d980f5b0f39f27cfa873dc687f3916e7
Author: Roshan Rao <rmrao@fb.com>
Date:   Wed Oct 26 17:02:32 2022 -0700

    Add support for recycling and for multimer prediction

commit 010f1a8fc530b6b3297f0252e0f7f91248a23a71
Merge: bf28b26 566d6f4
Author: Roshan Rao <rmrao@fb.com>
Date:   Wed Oct 26 19:49:19 2022 -0400

    Merge pull request #2 from tomsercu/simplified-release

    simplify model code

commit 566d6f4b7cbc93a92258a445eabbd8cba2711184
Author: Roshan Rao <rmrao@fb.com>
Date:   Wed Oct 26 08:35:28 2022 -0700

    simplify model code

commit bf28b266f5235a3de094bcd50b0baba6130908ab
Author: Nikita Smetanin <nikitozzz.pl@gmail.com>
Date:   Tue Oct 25 08:59:03 2022 -0700

    Inference script

commit a13a05c08c21cd2cf69dbcc4c7067fcbccc7b4ef
Author: Nikita Smetanin <nikitozzz.pl@gmail.com>
Date:   Tue Oct 25 08:45:08 2022 -0700

    Inference script

commit 54e5290c7e8c5d2a7b1c75cddd35a0f3a67ecc4c
Author: Nikita Smetanin <nikitozzz.pl@gmail.com>
Date:   Tue Oct 25 07:45:37 2022 -0700

    pretrained model loading

commit 596f7e53c7598badbf025ceb73a30f2f5fe31deb
Author: Nikita Smetanin <nikitozzz.pl@gmail.com>
Date:   Tue Oct 25 06:51:28 2022 -0700

    Simplify code

commit afc818701f09e3eae86e4355c3b9504a7d6b0234
Author: Nikita Smetanin <nikitozzz.pl@gmail.com>
Date:   Tue Oct 25 05:38:24 2022 -0700

    Simplify code

commit 8482a5e803697885c7448290139dcc1c1cc01099
Author: Nikita Smetanin <nikitozzz.pl@gmail.com>
Date:   Tue Oct 4 08:14:02 2022 -0700

    ESMFold public release

commit ddc0ce179ac4714d55927ebcf17815362d63880c
Author: Nikita Smetanin <nikitozzz.pl@gmail.com>
Date:   Tue Oct 4 07:55:10 2022 -0700

    ESMFold public release

commit 8f53b2fbaf64b576538c6e91eb50224c7a460875
Author: Nikita Smetanin <nikitozzz.pl@gmail.com>
Date:   Tue Oct 4 07:21:39 2022 -0700

    ESMFold public release

Co-authored-by: Zeming Lin <ebetica0@gmail.com>
Co-authored-by: Nikita Smetanin <nikitasmetanin@meta.com>
Co-authored-by: Roshan Rao <rmrao@fb.com>
Co-authored-by: Tom Sercu <tsercu@meta.com>
harryhaemin pushed a commit to harryhaemin/esm that referenced this issue Feb 24, 2023