#**RFdiffusion aa**
RFdiffusion aa is a method for structure generation, with or without conditional information (a motif, target etc). It can perform a whole range of protein design challenges as we have outlined in the RFdiffusion [manuscript](https://www.science.org/doi/10.1126/science.adl2528).

**<font color="red">NOTE:</font>** This notebook is in development, we are still working on adding all the options from the manuscript above.

For **instructions**, see end of Notebook.

See [diffusion_foldcond](https://colab.research.google.com/github/engelberger/ColabDesign/blob/max/rf/examples/diffusion.ipynb) for fold conditioning functionality.

See [original version](https://colab.research.google.com/github/sokrypton/ColabDesign/blob/main/rf/examples/diffusion_ori.ipynb) of this notebook (from 31Mar2023).



In [1]:
#@title COLAB ONLY setup **RFdiffusion All Atom** (~2m)
#%%time
import os, time, signal
import sys, random, string, re

import os
import sys



if not os.path.isdir("params"):
  os.system("apt-get && apt-get install aria2")
  os.system("apt-get install openbabel libopenbabel-dev")
  os.system("apt-get remove swig")
  os.system("apt-get install swig3.0")
  os.system("ln -s /usr/bin/swig3.0 /usr/bin/swig")
  os.system("ln -s /usr/include/openbabel3 /usr/local/include/openbabel3")
  os.system("mkdir params")
  # send param download into background
  os.system("(\
  aria2c -q -x 16 https://files.ipd.uw.edu/krypton/schedules.zip; \
  aria2c -q -x 16 http://files.ipd.uw.edu/pub/RFdiffusion/6f5902ac237024bdd0c176cb93063dc4/Base_ckpt.pt; \
  aria2c -q -x 16 http://files.ipd.uw.edu/pub/RFdiffusion/e29311f6f1bf1af907f9ef9f44b8328b/Complex_base_ckpt.pt; \
  aria2c -q -x 16 http://files.ipd.uw.edu/pub/RFdiffusion/f572d396fae9206628714fb2ce00f72e/Complex_beta_ckpt.pt; \
  aria2c -q -x 16 https://storage.googleapis.com/alphafold/alphafold_params_2022-12-06.tar; \
  aria2c -q -x 16 http://files.ipd.uw.edu/pub/RF-All-Atom/weights/RFDiffusionAA_paper_weights.pt; \
  tar -xf alphafold_params_2022-12-06.tar -C params; \
  touch params/done.txt) &")

if not os.path.isdir("RFdiffusion"):
  print("installing RFdiffusion...")
  os.system("git clone --branch max https://github.com/engelberger/RFdiffusion.git")
  # RoseTTAFold-All-Atom
  os.system("git clone --recurse-submodules --branch colab_march_2024 https://github.com/engelberger/rf_diffusion_all_atom.git")
  os.system("pip -q install jedi omegaconf hydra-core icecream pyrsistent assertpy deepdiff fire openbabel omegaconf")
  os.system("pip install dgl -f https://data.dgl.ai/wheels/cu121/repo.html")
  os.system("cd RFdiffusion/env/SE3Transformer; pip -q install --no-cache-dir -r requirements.txt; pip -q install .")
  os.system("wget -qnc https://files.ipd.uw.edu/krypton/ananas")
  os.system("chmod +x ananas")
  os.system("wget https://raw.githubusercontent.com/RosettaCommons/tools/8099121a5655572e3375dd6d3e9fc4f2edcd2ed3/protein_tools/scripts/clean_pdb.py")
  os.system("wget https://raw.githubusercontent.com/RosettaCommons/tools/8099121a5655572e3375dd6d3e9fc4f2edcd2ed3/protein_tools/scripts/amino_acids.py")
if not os.path.isdir("colabdesign"):
  print("installing ColabDesign...")
  os.system("pip -q install git+https://github.com/sokrypton/ColabDesign.git")
  os.system("ln -s /usr/local/lib/python3.*/dist-packages/colabdesign colabdesign")

if not os.path.isdir("RFdiffusion/models"):
  print("downloading RFdiffusion params...")
  os.system("mkdir RFdiffusion/models")
  models = ["Base_ckpt.pt","Complex_base_ckpt.pt","Complex_beta_ckpt.pt"]
  for m in models:
    while os.path.isfile(f"{m}.aria2"):
      time.sleep(5)
  os.system(f"mv {' '.join(models)} RFdiffusion/models")
  os.system("unzip schedules.zip; rm schedules.zip")


if not os.path.exists("rf_diffusion_all_atom/RFDiffusionAA_paper_weights.pt"):
  print("downloading RFdiffusion all params...")

  models = ["RFDiffusionAA_paper_weights.pt"]
  for m in models:
    while os.path.isfile(f"{m}.aria2"):
      time.sleep(5)
  os.system(f"mv {' '.join(models)} rf_diffusion_all_atom") # It looks this is not working as expected


In [2]:
## Colab setup
if "RFdiffusion" not in sys.path:
    os.environ["DGLBACKEND"] = "pytorch"
    sys.path.append("/workspaces/all_atom_binder_diffusion/RFdiffusion")
    sys.path.append("/content/RFdiffusion")
    sys.path.append("/content/RoseTTAFold-All-Atom/rf2aa")
    sys.path.append("/content/rf_diffusion_all_atom")
    
import sys, os
## DevContainer setup
if "RFdiffusion" not in sys.path:
    os.environ["DGLBACKEND"] = "pytorch"
    sys.path.append("/workspaces/all_atom_binder_diffusion/RFdiffusion")
    sys.path.append("/workspaces/all_atom_binder_diffusion/RFdiffusion")
    sys.path.append("/workspaces/all_atom_binder_diffusion/RoseTTAFold-All-Atom/rf2aa")
    sys.path.append("/workspaces/all_atom_binder_diffusion/rf_diffusion_all_atom")


In [2]:
%cd rf_diffusion_all_atom

/workspaces/all_atom_binder_diffusion/rf_diffusion_all_atom


In [None]:
import subprocess
from google.colab import files
import os
import requests
import random
import string

def download_pdb(pdb_code, output_dir="/content/input"):
    """
    Download a PDB file given a PDB code.
    """
    url = f"https://files.rcsb.org/download/{pdb_code}.pdb"
    response = requests.get(url)
    if response.status_code == 200:
        os.makedirs(output_dir, exist_ok=True)
        pdb_path = os.path.join(output_dir, f"{pdb_code}.pdb")
        with open(pdb_path, 'w') as file:
            file.write(response.text)
        return pdb_path
    else:
        raise ValueError(f"Failed to download PDB file for {pdb_code}")

def handle_pdb_input(pdb_input_type, pdb_code=None, output_dir="/content/input"):
    """
    Handle PDB input by either uploading a file or downloading it using a PDB code.
    """
    if pdb_input_type == "upload":
        uploaded = files.upload()
        pdb_filename = next(iter(uploaded))
        pdb_path = os.path.join(output_dir, pdb_filename)
        with open(pdb_path, 'wb') as file:
            file.write(uploaded[pdb_filename])
        return pdb_path
    elif pdb_input_type == "pdb_code":
        return download_pdb(pdb_code, output_dir)
        
    else:
        raise ValueError("Invalid PDB input type")

def run_rfdiffusion_all_atom(input_pdb, contigs, contig_length, ligand, num_designs=1, design_startnum=0, output_prefix="output/ligand_protein_motif/sample", deterministic=True, T=200):
    """
    Wrapper function to run rfdiffusion all atom with specified options.
    """
    # Generate a unique output path to avoid overwriting
    unique_suffix = ''.join(random.choices(string.ascii_lowercase + string.digits, k=5))
    output_prefix = f"{output_prefix}_{unique_suffix}"

    # Convert contigs list to string format
    contigs_str = ",".join([f"'{contig}'" for contig in contigs])
    # Here we should add the "']","\']" at the end of the last string
    contigs_str = contigs_str[:-1] + "\\'"
    print(contigs_str)
    # weights path
    weights_path = "./rf_diffusion_all_atom/RFDiffusionAA_paper_weights.pt"
    cmd = [
        "python", "/workspaces/all_atom_binder_diffusion/rf_diffusion_all_atom/run_inference.py",
        f"inference.deterministic={str(deterministic).lower()}",
        f"diffuser.T={T}",
        f"inference.output_prefix={output_prefix}",
        f"inference.input_pdb={input_pdb}",
        #f"inference.input_pdb=/workspaces/all_atom_binder_diffusion/rf_diffusion_all_atom/input/1haz.pdb",
        f"contigmap.contigs=[\{contigs_str}]",
        f'contigmap.length="{contig_length}"',
        f"inference.ligand={ligand}",
        f"inference.num_designs={num_designs}",
        f"inference.design_startnum={design_startnum}",
        f"inference.ckpt_path={weights_path}"
    ]

    print(f"{' '.join(cmd)}")

    # Execute the command, handle the cases where it fails
    cmd = f"{' '.join(cmd)}"
    
    # Run the command with os.system
    os.system(cmd)
    

# Interface for specifying PDB input
#@title ### Small molecule binder design
pdb_input_type = "upload" #@param ["upload", "pdb_code", "manual_path"]
pdb_code = "1haz" #@param {type:"string"}

if pdb_input_type == "pdb_code":
    input_pdb = handle_pdb_input(pdb_input_type, pdb_code)
else:
    print("Please upload your PDB file:")
    # if colab is false
    colab = False
    if colab:
        input_pdb = handle_pdb_input(pdb_input_type)
    else:
        input_pdb = "./rf_diffusion_all_atom/input/1haz.pdb"
contigs = "10-120,A84-87,10-120" #@param {type:"string"}
contig_length = "150-150" #@param {type:"string"}
ligand = "CYC" #@param {type:"string"}
num_designs = 1 #@param {type:"integer"}
design_startnum = 0 #@param {type:"integer"}
output_prefix = "output/ligand_protein_motif/sample" #@param {type:"string"}
deterministic = True #@param {type:"boolean"}
T = 200 #@param {type:"integer"}

# Split contigs string into list
contigs_list = contigs.split(',')

# Call the wrapper function with the specified options
run_rfdiffusion_all_atom(input_pdb, contigs_list, contig_length, ligand, num_designs, design_startnum, output_prefix, deterministic, T)

In [6]:
import subprocess
from google.colab import files
import os
import requests
import random
import string

def download_pdb(pdb_code, output_dir="/content/input"):
    """
    Download a PDB file given a PDB code.
    """
    url = f"https://files.rcsb.org/download/{pdb_code}.pdb"
    response = requests.get(url)
    if response.status_code == 200:
        os.makedirs(output_dir, exist_ok=True)
        pdb_path = os.path.join(output_dir, f"{pdb_code}.pdb")
        with open(pdb_path, 'w') as file:
            file.write(response.text)
       
        return pdb_path
    else:
        raise ValueError(f"Failed to download PDB file for {pdb_code}")

def handle_pdb_input(pdb_input_type, pdb_code=None, output_dir="/content/input"):
    """
    Handle PDB input by either uploading a file or downloading it using a PDB code.
    """
    if pdb_input_type == "upload":
        uploaded = files.upload()
        pdb_filename = next(iter(uploaded))
        pdb_path = os.path.join(output_dir, pdb_filename)
        with open(pdb_path, 'wb') as file:
            file.write(uploaded[pdb_filename])
        return pdb_path
    elif pdb_input_type == "pdb_code":
        return download_pdb(pdb_code, output_dir)

    else:
        raise ValueError("Invalid PDB input type")

def run_rfdiffusion_all_atom(input_pdb, contigs, contig_length, ligand, num_designs=1, design_startnum=0, output_prefix="output/ligand_protein_motif/sample", deterministic=True, T=200):
    """
    Wrapper function to run rfdiffusion all atom with specified options.
    """
    # Generate a unique output path to avoid overwriting
    unique_suffix = ''.join(random.choices(string.ascii_lowercase + string.digits, k=5))
    output_prefix = f"{output_prefix}_{unique_suffix}"

    # Convert contigs list to string format
    contigs_str = ",".join([f"'{contig}'" for contig in contigs])
    # Here we should add the "']","\']" at the end of the last string
    contigs_str = contigs_str[:-1] + "\\'"
    print(contigs_str)
    # weights path
    weights_path = "./rf_diffusion_all_atom/RFDiffusionAA_paper_weights.pt"
    cmd = [
        "python", "./rf_diffusion_all_atom/run_inference.py",
        f"inference.deterministic={str(deterministic).lower()}",
        f"diffuser.T={T}",
        f"inference.output_prefix={output_prefix}",
        f"inference.input_pdb={input_pdb}",
        f"contigmap.contigs=[\{contigs_str}]",
        #f'contigmap.length="{contig_length}"',
        f"inference.ligand={ligand}",
        f"inference.num_designs={num_designs}",
        f"inference.design_startnum={design_startnum}",
        f"inference.ckpt_path={weights_path}"
    ]

    print(f"{' '.join(cmd)}")

    # Execute the command, handle the cases where it fails
    cmd = f"{' '.join(cmd)}"
    
    # Run the command with os.system
    os.system(cmd)
    
#@title ### Small molecule binder design with protein motif
# Interface for specifying PDB input
pdb_input_type = "pdb_code" #@param ["upload", "pdb_code", "manual_path"]
pdb_code = "7v11" #@param {type:"string"}

if pdb_input_type == "pdb_code":
    input_pdb = handle_pdb_input(pdb_input_type, pdb_code)
elif pdb_input_type == "upload":   
    print("Please upload your PDB file:")
    # if colab is false
    colab = False
    if colab:
        input_pdb = handle_pdb_input(pdb_input_type)
elif pdb_input_type == "manual_path":   
    input_pdb = pdb_code 

contigs = "150-150" #@param {type:"string"}
contig_length = "" #@param {type:"string"}
ligand = "OQO" #@param {type:"string"}
num_designs = 1 #@param {type:"integer"}
design_startnum = 0 #@param {type:"integer"}
output_prefix = "output/ligand_protein_motif/sample" #@param {type:"string"}
deterministic = True #@param {type:"boolean"}
T = 100 #@param {type:"integer"}

# Split contigs string into list
contigs_list = contigs.split(',')

# Call the wrapper function with the specified options
run_rfdiffusion_all_atom(input_pdb, contigs_list, contig_length, ligand, num_designs, design_startnum, output_prefix, deterministic, T)

'150-150\'
python ./rf_diffusion_all_atom/run_inference.py inference.deterministic=true diffuser.T=100 inference.output_prefix=output/ligand_protein_motif/sample_dey4v inference.input_pdb=/content/input/7v11.pdb contigmap.contigs=[\'150-150\'] inference.ligand=OQO inference.num_designs=1 inference.design_startnum=0 inference.ckpt_path=./rf_diffusion_all_atom/RFDiffusionAA_paper_weights.pt




[2024-03-10 20:54:06,547][inference.model_runners][INFO] - Reading checkpoint from ./rf_diffusion_all_atom/RFDiffusionAA_paper_weights.pt
loading ./rf_diffusion_all_atom/RFDiffusionAA_paper_weights.pt
loaded ./rf_diffusion_all_atom/RFDiffusionAA_paper_weights.pt
OVERRIDING: You are changing diffuser.T from the value this model was trained with.
[2024-03-10 20:54:11,567][inference.model_runners][INFO] - Loading checkpoint.
[2024-03-10 20:54:11,759][diffusion][INFO] - Using cached IGSO3.
[2024-03-10 20:54:11,774][__main__][INFO] - Making design output/ligand_protein_motif/sample_dey4v_0
[2024-03-10 20:54:11,774][__main__][INFO] - making design 0 of 0:1
[2024-03-10 20:54:11,774][inference.model_runners][INFO] - Using contig: ['150-150']
With this beta schedule (linear schedule, beta_0 = 0.02, beta_T = 0.14), alpha_bar_T = 0.0002225016796728596
Using cached chi_beta_T dictionary.
Done calculating chi_beta_T, chi_alphas_T, and chi_abars_T dictionaries.
[2024-03-10 20:54:15,675][inference.mo

In [16]:
import subprocess
from google.colab import files
import os
import requests
import random
import string
import yaml

def download_pdb(pdb_code, output_dir="/content/input"):
    """
    Download a PDB file given a PDB code.
    """
    url = f"https://files.rcsb.org/download/{pdb_code}.pdb"
    response = requests.get(url)
    if response.status_code == 200:
        os.makedirs(output_dir, exist_ok=True)
        pdb_path = os.path.join(output_dir, f"{pdb_code}.pdb")
        with open(pdb_path, 'w') as file:
            file.write(response.text)
       
        return pdb_path
    else:
        raise ValueError(f"Failed to download PDB file for {pdb_code}")

def handle_pdb_input(pdb_input_type, pdb_code=None, output_dir="/content/input"):
    """
    Handle PDB input by either uploading a file or downloading it using a PDB code.
    """
    if pdb_input_type == "upload":
        uploaded = files.upload()
        pdb_filename = next(iter(uploaded))
        pdb_path = os.path.join(output_dir, pdb_filename)
        with open(pdb_path, 'wb') as file:
            file.write(uploaded[pdb_filename])
        return pdb_path
    elif pdb_input_type == "pdb_code":
        return download_pdb(pdb_code, output_dir)

    else:
        raise ValueError("Invalid PDB input type")


def _run_rfdiffusion_all_atom(input_pdb, contigs, contig_length, ligand, num_designs=1, design_startnum=0, output_prefix="output/ligand_protein_motif/sample", deterministic=True, T=200):
    """
    Wrapper function to run rfdiffusion all atom with specified options, using a YAML configuration file.
    """
    # Generate a unique output path to avoid overwriting
    unique_suffix = ''.join(random.choices(string.ascii_lowercase + string.digits, k=5))
    output_prefix_with_suffix = f"{output_prefix}_{unique_suffix}"

    # Convert contigs list to string format for YAML
    contigs_yaml = [f"{contig}" for contig in contigs]
    # If contig_length is not empty, if it is a empty string, it will be None
    contig_length = contig_length if contig_length else None
    # Define the configuration dictionary
    config = {
        "inference": {
            "deterministic": deterministic,
            "output_prefix": output_prefix_with_suffix,
            "input_pdb": input_pdb,
            "ligand": f'{ligand}',
            "num_designs": num_designs,
            "design_startnum": design_startnum,
            "ckpt_path": "./rf_diffusion_all_atom/RFDiffusionAA_paper_weights.pt",
            "model_runner" : "NRBStyleSelfCond"
        },
        "diffuser": {
            "T": T
        },
        "contigmap": {
            "contigs": contigs_yaml,
            "length": contig_length
        },
        "model" : { "freeze_track_motif" : "True"},
        "defaults": ["aa"]
        
    }

    # Write the configuration to a YAML file
    config_filename = f"config_{unique_suffix}.yaml"
    with open(config_filename, 'w') as file:
        yaml.dump(config, file)

    # Construct the command to run the inference script with the YAML config file
    cmd = [
        "python", "./rf_diffusion_all_atom/run_inference.py",
        f"--config-name={config_filename}",
        "--config-dir=."
    ]

    print(f"Running command: {' '.join(cmd)}")

    # Execute the command
    subprocess.run(cmd)    
    
 
import datetime

def run_rfdiffusion_all_atom(input_pdb, contigs, contig_length, ligand, num_designs=1, design_startnum=0, output_prefix="output/ligand_protein_motif/sample", deterministic=True, T=200):
    """
    Wrapper function to run rfdiffusion all atom with specified options, using a YAML configuration file.
    """
    # Generate a unique directory name based on the current timestamp
    unique_dir_name = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    unique_output_dir = os.path.join(output_prefix, unique_dir_name)
    os.makedirs(unique_output_dir, exist_ok=True)

    # Update the output_prefix to include the unique directory
    output_prefix_with_suffix = os.path.join(unique_output_dir, "sample")

    # Convert contigs list to string format for YAML
    contigs_yaml = [f"{contig}" for contig in contigs]
    # If contig_length is not empty, if it is a empty string, it will be None
    contig_length = contig_length if contig_length else None
    # Define the configuration dictionary
    config = {
        "inference": {
            "deterministic": deterministic,
            "output_prefix": output_prefix_with_suffix,
            "input_pdb": input_pdb,
            "ligand": f'{ligand}',
            "num_designs": num_designs,
            "design_startnum": design_startnum,
            "ckpt_path": "./rf_diffusion_all_atom/RFDiffusionAA_paper_weights.pt",
            "model_runner" : "NRBStyleSelfCond"
        },
        "diffuser": {
            "T": T
        },
        "contigmap": {
            "contigs": contigs_yaml,
            "length": contig_length
        },
        "model" : { "freeze_track_motif" : "True"},
        "defaults": ["aa"]
        
    }

    # Write the configuration to a YAML file within the unique directory
    config_filename = os.path.join(unique_output_dir, "config.yaml")
    with open(config_filename, 'w') as file:
        yaml.dump(config, file)

    # Construct the command to run the inference script with the YAML config file
    cmd = [
        "python", "./rf_diffusion_all_atom/run_inference.py",
        f"--config-name={config_filename}",
        "--config-dir=."
    ]

    print(f"Running command: {' '.join(cmd)}")


    # Execute the command
    subprocess.run(cmd)

    
#@title ### Small molecule binder design with protein motif
# Interface for specifying PDB input
pdb_input_type = "pdb_code" #@param ["upload", "pdb_code", "manual_path"]
pdb_code = "7v11" #@param {type:"string"}

if pdb_input_type == "pdb_code":
    input_pdb = handle_pdb_input(pdb_input_type, pdb_code)
elif pdb_input_type == "upload":   
    print("Please upload your PDB file:")
    # if colab is false
    colab = False
    if colab:
        input_pdb = handle_pdb_input(pdb_input_type)
elif pdb_input_type == "manual_path":   
    input_pdb = pdb_code 

contigs = "150-150" #@param {type:"string"}
contig_length = "" #@param {type:"string"}
ligand = "OQO" #@param {type:"string"}
num_designs = 1 #@param {type:"integer"}
design_startnum = 0 #@param {type:"integer"}
output_prefix = "output/ligand_protein_motif/sample" #@param {type:"string"}
deterministic = True #@param {type:"boolean"}
T = 100 #@param {type:"integer"}

# Split contigs string into list
contigs_list = contigs.split(',')

# Call the wrapper function with the specified options
run_rfdiffusion_all_atom(input_pdb, contigs_list, contig_length, ligand, num_designs, design_startnum, output_prefix, deterministic, T)

Running command: python ./rf_diffusion_all_atom/run_inference.py --config-name=output/ligand_protein_motif/sample/20240310_212026/config.yaml --config-dir=.


In 'output/ligand_protein_motif/sample/20240310_212026/config.yaml': Could not load 'output/ligand_protein_motif/sample/20240310_212026/aa'.

Config search path:
	provider=hydra, path=pkg://hydra.conf
	provider=main, path=file:///workspaces/all_atom_binder_diffusion/rf_diffusion_all_atom/config/inference
	provider=command-line, path=file:///workspaces/all_atom_binder_diffusion
	provider=schema, path=structured://

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.


contigmap.contigs=[\'10-120,A84-87,10-120\']

In [None]:
#@title run **RFdiffusion** to generate a backbone
# - `contigs='E6-155:70-100'` `pdb='5KQV'` `hotspot='E64,E88,E96'`
name = "CyclicBinderTest6" #@param {type:"string"}
contigs = "12-12:E6-155" #@param {type:"string"}
pdb = "5KQV" #@param {type:"string"}
iterations = 50 #@param ["25", "50", "100", "150", "200"] {type:"raw"}
hotspot = "E64,E88,E96" #@param {type:"string"}
num_designs = 1 #@param ["1", "2", "4", "8", "16", "32"] {type:"raw"}
visual = "image" #@param ["none", "image", "interactive"]
#@markdown ---
#@markdown **symmetry** settings
#@markdown ---
symmetry = "none" #@param ["none", "auto", "cyclic", "dihedral"]
order = 1 #@param ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"] {type:"raw"}
chains = "" #@param {type:"string"}
add_potential = True #@param {type:"boolean"}
#@markdown - `symmetry='auto'` enables automatic symmetry dectection with [AnAnaS](https://team.inria.fr/nano-d/software/ananas/).
#@markdown - `chains="A,B"` filter PDB input to these chains (may help auto-symm detector)
#@markdown - `add_potential` to discourage clashes between chains
#@markdown ---
#@markdown **advanced** settings
#@markdown ---
partial_T = "auto" # @param ["auto", "10", "20", "40", "60", "80"]
#@markdown - specify number of noising steps (only used for the partial diffusion protocol)
use_beta_model = False #@param {type:"boolean"}
#@markdown - if you are seeing lots of helices, switch to the "beta" params for a better SSE balance.
cyclic_peptide = True
# determine where to save
path = name
while os.path.exists(f"outputs/{path}_0.pdb"):
  path = name + "_" + ''.join(random.choices(string.ascii_lowercase + string.digits, k=5))

flags = {"contigs":contigs,
         "pdb":pdb,
         "order":order,
         "iterations":iterations,
         "symmetry":symmetry,
         "hotspot":hotspot,
         "path":path,
         "chains":chains,
         "add_potential":add_potential,
         "num_designs":num_designs,
         "use_beta_model":use_beta_model,
         "visual":visual,
         "partial_T":partial_T,
         "cyclic_peptide":cyclic_peptide}

for k,v in flags.items():
  if isinstance(v,str):
    flags[k] = v.replace("'","").replace('"','')

contigs, copies = run_diffusion(**flags)

mode: fixed
output: outputs/CyclicBinderTest6_yqu8s
contigs: ['12-12', 'E6-155']
/app/RFdiffusion/run_inference.py inference.output_prefix=outputs/CyclicBinderTest6_yqu8s inference.num_designs=1 inference.cyclic_peptide=True inference.input_pdb=outputs/CyclicBinderTest6_yqu8s/input.pdb diffuser.T=50 ppi.hotspot_res='[E64,E88,E96]' 'contigmap.contigs=[12-12 E6-155]' inference.dump_pdb=True inference.dump_pdb_path='/dev/shm'


VBox(children=(FloatProgress(value=0.0, bar_style='info', description='running', max=1.0), Output()))

In [None]:
# @title Display 3D structure {run: "auto"}
animate = "interactive"  # @param ["none", "movie", "interactive"]
color = "chain"  # @param ["rainbow", "chain", "plddt"]
denoise = True
dpi = 100  # @param ["100", "200", "400"] {type:"raw"}
from colabdesign.shared.plot import pymol_color_list
from colabdesign.rf.utils import get_ca, get_Ls, make_animation
from string import ascii_uppercase, ascii_lowercase

alphabet_list = list(ascii_uppercase + ascii_lowercase)


def plot_pdb(num=0):
    if denoise:
        pdb_traj = f"outputs/traj/{path}_{num}_pX0_traj.pdb"
    else:
        pdb_traj = f"outputs/traj/{path}_{num}_Xt-1_traj.pdb"
    if animate in ["none", "interactive"]:
        hbondCutoff = 4.0
        view = py3Dmol.view(js="https://3dmol.org/build/3Dmol.js")
        if animate == "interactive":
            pdb_str = open(pdb_traj, "r").read()
            view.addModelsAsFrames(pdb_str, "pdb", {"hbondCutoff": hbondCutoff})
        else:
            pdb = f"outputs/{path}_{num}.pdb"
            pdb_str = open(pdb, "r").read()
            view.addModel(pdb_str, "pdb", {"hbondCutoff": hbondCutoff})
        if color == "rainbow":
            view.setStyle({"cartoon": {"color": "spectrum"}})
        elif color == "chain":
            for n, chain, c in zip(
                range(len(contigs)), alphabet_list, pymol_color_list
            ):
                view.setStyle({"chain": chain}, {"cartoon": {"color": c}})
        else:
            view.setStyle(
                {
                    "cartoon": {
                        "colorscheme": {
                            "prop": "b",
                            "gradient": "roygb",
                            "min": 0.5,
                            "max": 0.9,
                        }
                    }
                }
            )
        view.zoomTo()
        if animate == "interactive":
            view.animate({"loop": "backAndForth"})
        view.show()
    else:
        Ls = get_Ls(contigs)
        xyz, bfact = get_ca(pdb_traj, get_bfact=True)
        xyz = xyz.reshape((-1, sum(Ls), 3))[::-1]
        bfact = bfact.reshape((-1, sum(Ls)))[::-1]
        if color == "chain":
            display(HTML(make_animation(xyz, Ls=Ls, dpi=dpi, ref=-1)))
        elif color == "rainbow":
            display(HTML(make_animation(xyz, dpi=dpi, ref=-1)))
        else:
            display(HTML(make_animation(xyz, plddt=bfact * 100, dpi=dpi, ref=-1)))


if num_designs > 1:
    output = widgets.Output()

    def on_change(change):
        if change["name"] == "value":
            with output:
                output.clear_output(wait=True)
                plot_pdb(change["new"])

    dropdown = widgets.Dropdown(
        options=[(f"{k}", k) for k in range(num_designs)],
        value=0,
        description="design:",
    )
    dropdown.observe(on_change)
    display(widgets.VBox([dropdown, output]))
    with output:
        plot_pdb(dropdown.value)
else:
    plot_pdb()

In [None]:
%%time
#@title run **ProteinMPNN** to generate a sequence and **AlphaFold** to validate
#@markdown ProteinMPNN Settings
num_seqs = 32 #@param ["1", "2", "4", "8", "16", "32", "64"] {type:"raw"}
mpnn_sampling_temp = 0.1 #@param ["0.0001", "0.1", "0.15", "0.2", "0.25", "0.3", "0.5", "1.0"] {type:"raw"}
rm_aa = "C" #@param {type:"string"}
use_solubleMPNN = False #@param {type:"boolean"}
#@markdown - `mpnn_sampling_temp` - control diversity of sampled sequences. (higher = more diverse).
#@markdown - `rm_aa='C'` - do not use [C]ysteines.
#@markdown - `use_solubleMPNN` - use weights trained only on soluble proteins. See [preprint](https://www.biorxiv.org/content/10.1101/2023.05.09.540044v2).
#@markdown
#@markdown AlphaFold Settings
initial_guess = False #@param {type:"boolean"}
#@markdown - soft initialization with desired coordinates, see [paper](https://www.nature.com/articles/s41467-023-38328-5).
num_recycles = 12 #@param ["0", "1", "2", "3", "6", "12"] {type:"raw"}
#@markdown - for **binder** design, we recommend `initial_guess=True num_recycles=3`
use_multimer = True #@param {type:"boolean"}
#@markdown - `use_multimer` - use AlphaFold Multimer v3 params for prediction.

if not os.path.isfile("/app/params/done.txt"):
  print("downloading AlphaFold params...")
  while not os.path.isfile("params/done.txt"):
    time.sleep(5)
sys.path.append("/app/params")
contigs_str = ":".join(contigs)
opts = [f"--pdb=outputs/{path}_0.pdb",
        f"--loc=outputs/{path}",
        f"--contig={contigs_str}",
        f"--copies={copies}",
        f"--num_seqs={num_seqs}",
        f"--num_recycles={num_recycles}",
        f"--rm_aa={rm_aa}",
        f"--mpnn_sampling_temp={mpnn_sampling_temp}",
        f"--num_designs={num_designs}"]
if initial_guess: opts.append("--initial_guess")
if use_multimer: opts.append("--use_multimer")
if use_solubleMPNN: opts.append("--use_soluble")
if cyclic_peptide: opts.append("--cyclic_peptide")
opts = ' '.join(opts)
print(opts)
!python ../../colabdesign/rf/designability_test.py {opts}

In [None]:
# @title Display best result
import py3Dmol


def plot_pdb(num="best"):
    if num == "best":
        with open(f"outputs/{path}/best.pdb", "r") as f:
            # REMARK 001 design {m} N {n} RMSD {rmsd}
            info = f.readline().strip("\n").split()
        num = info[3]
    hbondCutoff = 4.0
    view = py3Dmol.view(js="https://3dmol.org/build/3Dmol.js")
    pdb_str = open(f"outputs/{path}_{num}.pdb", "r").read()
    view.addModel(pdb_str, "pdb", {"hbondCutoff": hbondCutoff})
    pdb_str = open(f"outputs/{path}/best_design{num}.pdb", "r").read()
    view.addModel(pdb_str, "pdb", {"hbondCutoff": hbondCutoff})

    view.setStyle(
        {"model": 0}, {"cartoon": {}}
    )  #: {'colorscheme': {'prop':'b','gradient': 'roygb','min':0,'max':100}}})
    view.setStyle(
        {"model": 1},
        {
            "cartoon": {
                "colorscheme": {"prop": "b", "gradient": "roygb", "min": 0, "max": 100}
            }
        },
    )
    view.zoomTo()
    view.show()


if num_designs > 1:

    def on_change(change):
        if change["name"] == "value":
            with output:
                output.clear_output(wait=True)
                plot_pdb(change["new"])

    dropdown = widgets.Dropdown(
        options=["best"] + [str(k) for k in range(num_designs)],
        value="best",
        description="design:",
    )
    dropdown.observe(on_change)
    output = widgets.Output()
    display(widgets.VBox([dropdown, output]))
    with output:
        plot_pdb(dropdown.value)
else:
    plot_pdb()

In [None]:
#@title Package and download results
#@markdown If you are having issues downloading the result archive,
#@markdown try disabling your adblocker and run this cell again.
#@markdown  If that fails click on the little folder icon to the
#@markdown  left, navigate to file: `name.result.zip`,
#@markdown  right-click and select \"Download\"
#@markdown (see [screenshot](https://pbs.twimg.com/media/E6wRW2lWUAEOuoe?format=jpg&name=small)).
!zip -r {path}.result.zip outputs/{path}* outputs/traj/{path}*
files.download(f"{path}.result.zip")

**Instructions**
---
---

Use `contigs` to define continious chains. Use a `:` to define multiple contigs and a `/` to define mutliple segments within a contig.
For example:

**unconditional**
- `contigs='100'` - diffuse **monomer** of length 100
- `contigs='50:100'` - diffuse **hetero-oligomer** of lengths 50 and 100
- `contigs='50'` `symmetry='cyclic'` `order=2` - make two copies of the defined contig(s) and add a symmetry constraint, for **homo-oligomeric** diffusion.

**binder design**
- `contigs='A:50'` `pdb='4N5T'` - diffuse a **binder** of length 50 to chain A of defined PDB.
- `contigs='E6-155:70-100'` `pdb='5KQV'` `hotspot='E64,E88,E96'` - diffuse a **binder** of length 70 to 100 (sampled randomly) to chain E and defined hotspot(s).

**motif scaffolding**
 - `contigs='40/A163-181/40'` `pdb='5TPN'`
 - `contigs='A3-30/36/A33-68'` `pdb='6MRR'` - diffuse a loop of length 36 between two segments of defined PDB ranges.

**partial diffusion**
- `contigs=''` `pdb='6MRR'` - noise all coordinates
- `contigs='A1-10'` `pdb='6MRR'` - keep first 10 positions fixed, noise the rest
- `contigs='A'` `pdb='1SSC'` - fix chain A, noise the rest

*hints and tips*
- `pdb=''` leave blank to get an upload prompt
- `contigs='50-100'` use dash to specify a range of lengths to sample from