v1.0.6 (#69)

* updating (#65)
* adding plotting function to color by pLDDT
* moving functions to af2rank class
* adding option to specify custom params
* typo
* remove recycle-dimension, in prep for multimer support
* missed a few edits to remove recycle dims
* adding multimer files
* cleaning up the multimer code
* initial experiment towards adding multimer support
* adding missing geometry files
* typos
* bugfixes
* dropout_scale support for multimer
* all_atom_masks -> all_atom_mask (to be consistent with multimer)
* debugging template injection for multimers
* fixing multimer nan bug see: google-deepmind/alphafold#513
* adding multimers support for binder-hallucination
* moving multimer feature creation to prep.py
* fixing config
* bugfix; adding multimer support for other protocols
* cleanup
* cleaning up the prep options
* fixing crop typo thanks @hunarbatra
* v1.0.6-alpha
* typo
* main updates (#68)
* adding plotting function to color by pLDDT
* moving functions to af2rank class
* fixed crop_feats error
* fixing colab link
* adding "seq" to inputs (for custom loss)
  Co-authored-by: Hunar Batra <i@hunarbatra.com>
* cleanup, add iptm
* typos
* standardizing the template-specific options
* cleaning template update code
* minor edit
* adding fape support for multimers
* multimer fape loss bugfix
* splitting fape/i_fape
* change order of verbose print
* rewriting fape function to accept number of homo-oligomeric copies
* cleaning up the code
* adding weights to fape loss
* adding option to control fape_cutoff (aka clamp)
* adding seq_ent loss
* stabilize entropy calculation
* removing seq-ent for now
* disabling stats correction by default
* Update .gitignore
* cleaning up the pair loss
* adding experimental copies support to partial protocol (for Possu)
* bugfixes involving partial+copies
* adding homo-oligomeric support to rewire
* undo last commit
* bugfix
* cleanup
* adding seq_ent loss
* correcting entropy compute based fix_seq
* rescaling entropy loss based on number of fixed positions
* cleanup
* Update loss.py
* cleaning up the code
* adding alphafold-multimer support to AF2Rank
* typo
* adding support for repeat/homooligomers for partial hallucination
* bugfixes for partial homo-oligomeric support
* minor edits (for future)
* refactoring prep_pdb, adding option to offset and extend length
* updating binder contact loss to include binder2target and target2binder contacts
* adding i_con back (as (tb_con+bt_con)/2)
* typos
* removing i_con
* cleanup
* design.py - pull _apply_gradient() out of step()
* updating defaults
  - fixbb confidence loss set to zero and is over all positions
  - binder only positions in PDB are loaded, missing density ignored
  - adding num_tot option to control number of total contacts to optimize for
* rename num_tot to num_pos
* bugfix
* fix_seq option replaced with fix_pos to allow control which positions are fixed
* refactoring
* adding experimentally resolved loss on CA
* setting default exp_res weight
* adding mlm
* rename
* bugfix
* adding option to disable mlm
* remove target_feat
* temp, broken, updating use_crop
* Update design.py
* bugfix, crop_feat, remove add_batch
* fixing crop options
* adding helper functions
* fixing typos
* adding i_pae for binder design
* bugfix, removing 2stage_binder_hallucination for now
* cleanup
* revert
* updating default mlm_dropout default
* cleanup
* fixing backprop option for multimer model
* update
  - adding "first" recycle mode, thanks @whitead
  - adding "hard" annealing step to 3 stage design
  - adding "ramp_recycles" option to 3 stage design
* partial revert
* revert latest experiments
* typos
* refactoring recycle code to reduce compile time
* typos; changing recycle_mode default to last
* improving num_recycles control
* adding small plddt loss to binder hallucination default
* updating readme
* cleanup
* Update design.ipynb
* cleaning up recycle code, moving experimental crop functions to crop.py
* crop.py import fix
* cleaning up predict()
* bugfix in partial hallucination protocol -[pos]itions not defined if copies not defined
* debugging...
* bugfix: fix_pos in partial
* bugfix for partial fix_pos option
* adding fix_pos option to trRosetta
* Update joint_model.py
* revert

Co-authored-by: Hunar Batra <i@hunarbatra.com>
sokrypton · Sep 9, 2022 · c192460
1 parent cb9ef26
commit c192460
Showing 47 changed files with 5,575 additions and 1,840 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1 +1,2 @@
-**/.DS_Store
+**/.DS_Store
+*.pyc
diff --git a/af/README.md b/af/README.md
@@ -1,4 +1,4 @@
-# AfDesign (v1.0.5)
+# AfDesign (v1.0.6)
 ### Google Colab
 <a href="https://colab.research.google.com/github/sokrypton/ColabDesign/blob/main/af/design.ipynb">
   <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
@@ -16,14 +16,15 @@ Minor changes changes include renaming intra_pae/inter_con to pae/con and inter_
 - **11July2022** - v1.0.3 - Improved homo-oligomeric support. RMSD and dgram losses have been refactored to automatically save aligned coordinates. Multimeric coordinates now saved with chain identifiers.
 - **23July2022** - v1.0.4 - Adding support for openfold weights. To enable set `mk_afdesign_model(..., use_openfold=True)`.
 - **31July2022** - v1.0.5 - Refactoring to add support for swapping batch features without recompile. Allowing for implementation of [AF2Rank](https://github.com/sokrypton/ColabDesign/blob/main/af/examples/AF2Rank.ipynb)!
+- **19Aug2022** - v1.0.6 - Adding support for alphafold-multimer. To enable set `mk_afdesign_model(..., use_multimer=True)`. For multimer mode, multiple recycles may be needed!
 
 ### setup
 ```bash
 pip install git+https://github.com/sokrypton/ColabDesign.git
 
 # download alphafold weights
 mkdir params
-curl -fsSL https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar | tar x -C params
+curl -fsSL https://storage.googleapis.com/alphafold/alphafold_params_2022-03-02.tar | tar x -C params
 
 # download openfold weights (optional)
 for W in openfold_model_ptm_1 openfold_model_ptm_2 openfold_model_no_templ_ptm_1
@@ -97,14 +98,15 @@ model.opt["weights"]["pae"] = 0.0
 #### How do I control number of recycles used during design?
 ```python 
 model = mk_afdesign_model(num_recycles=1, recycle_mode="average")
-# if recycle_mode in ["average","last","sample"] the number of recycles can change during optimization
+# if recycle_mode in ["average","last","sample","first"] the number of recycles can change during optimization
 model.set_opt(num_recycles=1)
 ```
 - `num_recycles` - number of recycles to use during design (for denovo proteins we find 0 is often enough)
 - `recycle_mode` - optimizing across all recycles can be tricky, we experiment with a couple of ways:
-  - *last* - use loss from last recycle. (Not recommended, unless you increase number optimization)
-  - *sample* - Same as *last* but each iteration a different number of recycles are used. (Previous default).
-  - *average* - compute loss at each recycle and average gradients. (Default; Recommended).
+  - *last* - use loss from last recycle. (Default)
+  - *average* - compute loss at each recycle and average gradients. (Previous default, as of v1.0.5)
+  - *sample* - same as *last*, but a different number of recycles is used at each iteration.
+  - *first* - use loss from first recycle.
   - *add_prev* - average the outputs (dgram, plddt, pae) across all recycles before computing loss.
   - *backprop* - use loss from last recycle, but backprop through all recycles.
 
@@ -122,17 +124,6 @@ model.set_opt(num_models=1)
 #### Can I use OpenFold model params for design instead of AlphaFold?
 ```python
 model = mk_afdesign_model(use_openfold=True, use_alphafold=False)
-# OR
-model.set_opt(use_openfold=True, use_alphafold=False)
-```
-#### How is contact defined? How do I change it?
-By default, 2 [con]tacts per positions are optimized to be within cβ-cβ < 14.0Å and sequence seperation ≥ 9. This can be changed with:
-```python
-model.set_opt(con=dict(cutoff=8, seqsep=5, num=1))
-```
-For interface:
-```python
-model.set_opt(i_con=dict(...))
 ```
 #### For binder hallucination, can I specify the site I want to bind?
 ```python
@@ -142,12 +133,6 @@ model.prep_inputs(..., hotspot="1-10,15,3")
 ```python
 model.prep_inputs(..., chain="A,B")
 ```
-#### Can I design homo-oligomers?
-```python
-model.prep_inputs(..., copies=2)
-# specify interface specific contact and/or pae loss
-model.set_weights(i_con=1, i_pae=0)
-```
 #### For fixed backbone design, how do I force the sequence to be the same for homo-dimer optimization?
 ```python
 model.prep_inputs(pdb_filename="6Q40.pdb", chain="A,B", copies=2, homooligomer=True)
@@ -168,14 +153,13 @@ model.restart(seed=0)
   - `design_hard()` - optimize *one_hot(logits)* inputs (discrete)
 
 - For complex topologies, we find directly optimizing one_hot encoded sequence `design_hard()` to be very challenging. 
-To get around this problem, we propose optimizing in 2 or 3 stages.
-  - `design_2stage()` - *soft* → *hard*
+To get around this problem, we propose optimizing in 3 stages.
   - `design_3stage()` - *logits* → *soft* → *hard*
+
 #### What are all the different losses being optimized?
 - general losses
   - *pae*       - minimizes the predicted alignment error
   - *plddt*     - maximizes the predicted LDDT
-  - *msa_ent*   - minimize entropy for MSA design (see example at the end of notebook)
   - *pae* and *plddt* values are between 0 and 1 (where lower is better for both)
 
 - fixbb specific losses
@@ -184,18 +168,26 @@ To get around this problem, we propose optimizing in 2 or 3 stages.
   - we find *dgram_cce* loss to be more stable for design (compared to *fape*)
 
 - hallucination specific losses
-  - *con*       - maximize number of contacts. (We find just minimizing *plddt* results in single long helix, 
-and maximizing *pae* results in a two helix bundle. To encourage compact structures we add a `con` term)
+  - *con*       - maximize `1` contacts per position. `model.set_opt("con",num=1)`
 
 - binder specific losses
-  - *i_pae* - minimize PAE interface of the proteins
-  - *pae* - minimize PAE within binder
-  - *i_con* - maximize number of contacts at the interface of the proteins
-  - *con* - maximize number of contacts within binder
+  - *pae* - minimize PAE at interface and within binder
+  - *con* - maximize `2` contacts per binder position, within binder. `model.set_opt("con",num=2)`
+  - *i_con* - maximize `1` contact per binder position, at the interface. `model.set_opt("i_con",num=1)`
 
 - partial hallucination specific losses
   - *sc_fape* - sidechain-specific fape
 
+#### How is contact defined? How do I change it?
+By default, 2 [con]tacts per position are optimized to be within cβ-cβ < 14.0Å and sequence separation ≥ 9. This can be changed with:
+```python
+model.set_opt(con=dict(cutoff=8, seqsep=5, num=1))
+```
+For interface:
+```python
+model.set_opt(i_con=dict(...))
+```
+
 # Advanced FAQ
 #### loss during Gradient descent is too jumpy, can I do some kind of greedy search towards the end?
 Gradient descent updates multiple positions each iteration, which can be a little too aggressive during hard (discrete) mode.
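
Note on the contact definition in the README hunk above: the criterion it states (cβ-cβ distance below a cutoff, sequence separation at or above a threshold) can be illustrated with a small standalone sketch. This is only the geometric criterion applied to fixed coordinates. It does not attempt to reproduce how ColabDesign computes its `con` loss during optimization, and the function name `count_contacts` and the toy coordinates are hypothetical, chosen for illustration.

```python
import math

def count_contacts(cb_coords, cutoff=14.0, seqsep=9):
    """Count, for each position, the partners that satisfy the contact rule.

    A pair (i, j) counts as a contact when the two points are closer than
    `cutoff` (in Angstrom) and are at least `seqsep` positions apart in
    sequence. Defaults mirror the values quoted in the README (14.0, 9).
    """
    n = len(cb_coords)
    counts = [0] * n
    for i in range(n):
        for j in range(n):
            far_in_sequence = abs(i - j) >= seqsep
            close_in_space = math.dist(cb_coords[i], cb_coords[j]) < cutoff
            if far_in_sequence and close_in_space:
                counts[i] += 1
    return counts

# Toy input (hypothetical): 12 points on a straight line, 1.5 Angstrom apart.
coords = [(1.5 * i, 0.0, 0.0) for i in range(12)]

default_counts = count_contacts(coords)                      # cutoff=14.0, seqsep=9
tighter_counts = count_contacts(coords, cutoff=8, seqsep=5)  # values from the README example
```

With the defaults, only pairs exactly 9 positions apart qualify on this toy line (13.5 Å), which shows how the two thresholds interact: raising `seqsep` or lowering `cutoff` both shrink the set of eligible pairs.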

diff --git a/af/design.ipynb b/af/design.ipynb
@@ -16,7 +16,7 @@
         "id": "OA2k3sAYuiXe"
       },
       "source": [
-        "#AfDesign (v1.0.5)\n",
+        "#AfDesign (v1.0.6)\n",
         "Backprop through AlphaFold for protein design.\n",
         "\n",
         "**WARNING**\n",
@@ -42,7 +42,7 @@
         "  ln -s /usr/local/lib/python3.7/dist-packages/colabdesign colabdesign\n",
         "  # download params\n",
         "  mkdir params\n",
-        "  curl -fsSL https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar | tar x -C params\n",
+        "  curl -fsSL https://storage.googleapis.com/alphafold/alphafold_params_2022-03-02.tar | tar x -C params\n",
         "  for W in openfold_model_ptm_1 openfold_model_ptm_2 openfold_model_no_templ_ptm_1\n",
         "  do wget -qnc https://files.ipd.uw.edu/krypton/openfold/${W}.npz -P params; done\n",
         "fi"
@@ -246,7 +246,9 @@
       },
       "source": [
         "# binder hallucination\n",
-        "For a given protein target and protein binder length, generate/hallucinate a protein binder sequence AlphaFold thinks will bind to the target structure. To do this, we minimize PAE and maximize number of contacts at the interface and within the binder, and we maximize pLDDT of the binder."
+        "For a given protein target and protein binder length, generate/hallucinate a protein binder sequence AlphaFold thinks will bind to the target structure.\n",
+        "To do this, we minimize PAE and maximize number of contacts at the interface and within the binder, and we maximize pLDDT of the binder.\n",
+        "By default, AlphaFold-ptm with residue index offset hack is used. To enable AlphaFold-multimer set: mk_afdesign_model(use_multimer=True).\n"
       ]
     },
     {
@@ -275,12 +277,6 @@
       "outputs": [],
       "source": [
         "af_model.restart()\n",
-        "\n",
-        "# settings we find work best for helical peptide binder hallucination\n",
-        "af_model.set_weights(plddt=0.1, pae=0.1, i_pae=1.0, con=0.1, i_con=0.5)\n",
-        "af_model.set_opt(con=dict(binary=True, cutoff=21.6875, num=af_model._binder_len, seqsep=0))\n",
-        "af_model.set_opt(i_con=dict(binary=True, cutoff=21.6875, num=af_model._binder_len))\n",
-        "\n",
         "af_model.design_3stage(100,100,10)"
       ]
     },
@@ -373,8 +369,7 @@
         "af_model.prep_inputs(pdb_filename=get_pdb(\"6MRR\"),\n",
         "                     chain=\"A\",\n",
         "                     pos=\"3-30,33-68\",  # define positions to constrain\n",
-        "                     length=100,        # total length if different from input pdb\n",
-        "                     fix_seq=False)     # set True to constrain sequence in the specified positions\n",
+        "                     length=100)          # total length if different from input pdb\n",
         "\n",
         "af_model.rewire(loops=[36]) # set loop length between segments                     "
       ],
@@ -452,4 +447,4 @@
   },
   "nbformat": 4,
   "nbformat_minor": 0
-}
+}
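
Note on the `recycle_mode` options listed in the README diff: the difference between *last*, *first*, *average* and *sample* can be pictured as different ways of reducing a list of per-recycle losses to one scalar. The sketch below is a toy illustration under that assumption, not ColabDesign's implementation. The function name `combine_recycle_losses` and the example loss values are hypothetical, and the *add_prev* and *backprop* modes are not covered.

```python
import random

def combine_recycle_losses(losses, mode="last", rng=None):
    """Reduce per-recycle losses to one value, by mode.

    `losses[k]` is taken to be the loss after k recycles, so index 0 is
    the first pass and index -1 the final recycle.
    """
    if not losses:
        raise ValueError("need at least one loss value")
    if mode == "last":
        return losses[-1]
    if mode == "first":
        return losses[0]
    if mode == "average":
        # Averaging losses gives the same gradient as averaging the
        # per-recycle gradients, because differentiation is linear.
        return sum(losses) / len(losses)
    if mode == "sample":
        # Like "last", but the number of recycles is drawn at random
        # each time the function is called (i.e. each iteration).
        rng = rng or random.Random()
        k = rng.randint(0, len(losses) - 1)
        return losses[k]
    raise ValueError(f"unsupported mode: {mode}")

# Hypothetical losses after 0, 1, 2 and 3 recycles.
per_recycle = [4.0, 2.0, 1.0, 0.5]
last_loss = combine_recycle_losses(per_recycle, "last")
first_loss = combine_recycle_losses(per_recycle, "first")
mean_loss = combine_recycle_losses(per_recycle, "average")
sampled_loss = combine_recycle_losses(per_recycle, "sample", rng=random.Random(0))
```

Seen this way, *last* and *first* look at a single pass, *average* weights every pass equally, and *sample* is *last* with a recycle count that varies from call to call.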