# Introduction

Welcome to the 2021 UCSF Biophysics Onboarding tutorial on protein modelling with pyRosetta! This tutorial was inspired by a pyRosetta tutorial published on YouTube by Professor Sari Sabban of King Abdulaziz University.


Before getting started, make sure that you have activated the pyRosetta conda environment and launched Jupyter Notebook with the following command:

`sudo jupyter notebook --allow-root`

Let's begin by importing pyRosetta into this Python session and initializing the module.

In [1]:
# load some native Python modules
import os
# load pyRosetta module
from pyrosetta import *
from pyrosetta.toolbox import *
# initialize pyRosetta state (e.g. random number generator seed)
init()

PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python36.Release 2021.33+release.21c4761a87a1193dca5c6c2e1047681a200715d4 2021-08-14T17:47:22] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
[0mcore.init: {0} [0mChecking for fconfig files in pwd and ./rosetta/flags
[0mcore.init: {0} [0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python36.Release r293 2021.33+release.21c4761a87a 21c4761a87a1193dca5c6c2e1047681a200715d4 http://www.pyrosetta.org 2021-08-14T17:47:22
[0mcore.init: {0} [0mcommand: PyRosetta -ex1 -ex2aro -database /opt/anaconda3/envs/pyrosetta_onboarding/lib/python3.6/site-packages/pyrosetta/database
[0mbasic.random.init_random_generator: {0} [0m'RNG device' seed mode, using '/dev/urandom', seed=-327240177 seed_offset=0 real_seed=-327240177 thread_index=0
[0mbasic.random.init_random_generator: {0} [0mRandomGenerator:init: Norma

### pyRosetta Poses from the RCSB and from PDB Files

pyRosetta is a useful tool for modelling and manipulating protein structures, so in order to make use of it we will need to load in a protein! For this exercise we will use the <i>de novo</i> designed protein Co-LOCKR from Lajoie et al. (2020), which exhibits switching behavior between two states. The authors have deposited the structure to the RCSB (the U.S.-based host of the Protein Data Bank) with the four-letter accession code 7JH5. We will load it using the pyRosetta function `pose_from_rcsb`.

In [2]:
pose = pose_from_rcsb('7jh5')

core.chemical.GlobalResidueTypeSet: {0} Finished initializing fa_standard residue type set.  Created 983 residue types
core.chemical.GlobalResidueTypeSet: {0} Total time to initialize 1.1212 seconds.
core.import_pose.import_pose: {0} File '7JH5.clean.pdb' automatically determined to be of type PDB






core.pack.pack_missing_sidechains: {0} packing residue number 1 because of missing atom number 6 atom name  OG
core.pack.pack_missing_sidechains: {0} packing residue number 3 because of missing atom number 6 atom name  OG
core.pack.pack_missing_sidechains: {0} packing residue number 4 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: {0} packing residue number 7 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 52 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: {0} packing residue number 53 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 59 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 64 because of missing atom number 8 atom name  CE
core.pack.pack_missing_sidechains: {0} 

core.pack.pack_missing_sidechains: {0} packing residue number 143 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 151 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 158 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: {0} packing residue number 162 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: {0} packing residue number 163 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 170 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 176 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 178 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidec

core.pack.task: {0} Packer task: initialize from command line()
core.scoring.ScoreFunctionFactory: {0} SCOREFUNCTION: ref2015
core.scoring.etable: {0} Starting energy table calculation
core.scoring.etable: {0} smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: {0} smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: {0} smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: {0} Finished calculating energy tables.
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref201

You will notice that the call above produced a good deal of output. In addition to downloading the structure from the RCSB in PDB format (creating the file `7JH5.pdb` in the working directory), `pose_from_rcsb` performed a series of operations to "clean" the PDB file, such as removing solvent and other non-protein atoms (creating the file `7JH5.clean.pdb`), and inferred the sidechain conformations of residues whose sidechain atoms were missing from the initial PDB file. This latter step is why the log mentions "packer tasks" and the "Dunbrack rotamer library": pyRosetta scans a library of rotamers, or statistically preferred sidechain conformers, to find the set that gives the best packing of the protein fold in the regions where sidechains need to be inferred.

The appearance of these additional PDB files can be noted using `os.listdir` on the current working directory.

In [3]:
os.listdir('.')

['poseA_relaxed.pdb',
 'Rosetta_Intro.ipynb',
 '7JH5.pdb',
 '7JH5.clean.pdb',
 '.ipynb_checkpoints',
 'poseA.pdb']
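
If you would rather check for the two files programmatically than read the listing by eye, something along these lines should work. This is only a minimal sketch: the helper name `missing_files` is our own invention and is not part of pyRosetta.

```python
from pathlib import Path

def missing_files(directory, expected):
    """Return the names in `expected` that are not present in `directory`."""
    present = {p.name for p in Path(directory).iterdir()}
    return [name for name in expected if name not in present]

# File names taken from the listing above; point `directory` wherever you ran the notebook.
print(missing_files('.', ['7JH5.pdb', '7JH5.clean.pdb']))
```

An empty list means that both the raw and the cleaned PDB files are present.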

pyRosetta can also read in protein structures from PDB files directly, which can be useful when dealing with output from other software such as PyMOL and AlphaFold. An example of this follows:

In [4]:
pose = pose_from_pdb('7JH5.clean.pdb')

core.import_pose.import_pose: {0} File '7JH5.clean.pdb' automatically determined to be of type PDB






core.pack.pack_missing_sidechains: {0} packing residue number 1 because of missing atom number 6 atom name  OG
core.pack.pack_missing_sidechains: {0} packing residue number 3 because of missing atom number 6 atom name  OG
core.pack.pack_missing_sidechains: {0} packing residue number 4 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: {0} packing residue number 7 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 52 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: {0} packing residue number 53 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 59 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 64 because of missing atom number 8 atom name  CE
core.pack.pack_missing_sidechains: {0} 

core.pack.pack_missing_sidechains: {0} packing residue number 158 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: {0} packing residue number 162 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: {0} packing residue number 163 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 170 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 176 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 178 because of missing atom number 7 atom name  CD
core.pack.pack_missing_sidechains: {0} packing residue number 189 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: {0} packing residue number 192 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidec

core.pack.pack_rotamers: {0} Requesting all available threads for interaction graph computation.
core.pack.interaction_graph.interaction_graph_factory: {0} Instantiating DensePDInteractionGraph
core.pack.rotamer_set.RotamerSets: {0} Completed interaction graph pre-calculation in 1 available threads (1 had been requested).


### Pose from Sequence

pyRosetta also supports building poses from sequence. The pose returned by this function is built as an extended chain with idealized geometry, not as a folded structure, so a realistic conformation will have to be modelled at a later stage. Here we will load a pose from the sequence of a single "key" helix designed to pair with the Co-LOCKR protein.

In [5]:
seq_key = 'SGGSDEARKAIARVKRESKRIVEDAERLIREAAAASEKISREAERLIRGG'
pose_key = pose_from_sequence(seq_key)

### Sequence from Pose

We know now how to get a pose from a sequence, but pyRosetta also lets us get a sequence from a pose. Let's get the sequence of that original pose we read in from the RCSB.

In [6]:
pose.sequence()

'SGSELARKLLEASTKLQRLNIRLAEALLEAIARLQELNLELVYLAVELTDPKRIRDEIKEVKDKSKEIIRRAEKEIDDAAKESEKILEEAREAISGSGSELAKLLLKAIAETQDLNLRAAKAFLEAAAKLQELNIRAVELLVKLTDPATIREALEHAKRRSKEIIDEAERAIRAAKRESERIIEEARRLIEKGSELARELLRAHAQLQRLNLELLRELLRALAQLQELNLDLLRLASELTDPDEARKAIARVKRESNAYYADAERLIREAAAASEKISREAERLILARKLLEASTKLQRLNIRLAEALLEAIARLQELNLELVYLAVELTDPKRIRDEIKEVKDKSKEIIRRAEKEIDDAAKESEKILEEAREAISGSGSELAKLLLKAIAETQDLNLRAAKAFLEAAAKLQELNIRAVELLVKLTDPATIREALEHAKRRSKEIIDEAERAIRAAKRESERIIEEARRLIESELARELLRAHAQLQRLNLELLRELLRALAQLQELNLDLLRLASELTDPDEARKAIARVKRESNAYYADAERLIREAAAASEKISREAERLI'

### Residue Indexing

Let's now get some information about a particular residue in the protein, including its name (amino acid three-letter code) and its chain (indexed by a capital letter, 'A' or 'B' for 7JH5).

In [7]:
print('total_residues :', pose.total_residue())

resn = 42 # feel free to change this value and execute the next three cells as many times as you'd like

assert resn >= 1 and resn <= pose.total_residue() # determine whether resn falls within its range
                                                  # note that pyRosetta indexes from 1 here, not from 0

res = pose.residue(resn).name()
chain = pose.pdb_info().chain(resn)

print('residue :', '\t', res)
print('chain :', '\t', chain)

total_residues : 564
residue : 	 VAL
chain : 	 A


pyRosetta assigns each residue an integer pose index. Pose indices start at 1 and ascend in unbroken succession from the N-terminus of the first chain to the C-terminus of the last; they do not restart for each chain. They often disagree with the residue numbers in PDB files, which may account for residues that are not resolved in or have been truncated from the structure. To convert between pose indices and PDB residue numbers, the following methods are used:

In [8]:
idx = pose.pdb_info().number(resn)
py_idx = pose.pdb_info().pdb2pose(chain, idx)

print(idx, py_idx)

assert py_idx == resn # ensure that calculation of py_idx has taken us full circle

40 42
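
To make the relationship between the two numbering schemes concrete, here is a pure-Python sketch of the bookkeeping involved. It does not call pyRosetta: the function `build_pdb2pose` is a hypothetical stand-in of our own, and the toy PDB numbers are invented for illustration rather than read from 7JH5.

```python
def build_pdb2pose(chains):
    """Map (chain, pdb_number) -> pose index.

    `chains` is an ordered list of (chain_id, [pdb_numbers]) pairs.
    Pose indices start at 1 and keep counting across chains.
    """
    mapping = {}
    pose_index = 0
    for chain_id, pdb_numbers in chains:
        for pdb_number in pdb_numbers:
            pose_index += 1
            mapping[(chain_id, pdb_number)] = pose_index
    return mapping

# Invented toy numbering: chain A starts at -1, chain B starts at 4.
toy = [('A', [-1, 0, 1, 2, 3]), ('B', [4, 5, 6])]
pdb2pose = build_pdb2pose(toy)

print(pdb2pose[('A', 1)])   # third residue of chain A -> 3
print(pdb2pose[('B', 4)])   # first residue of chain B continues the count -> 6
```

The same PDB number can map to very different pose indices depending on the chain, which is why `pdb2pose` takes the chain letter as well as the residue number.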


### Residue Dihedral Angles

It is also easy to check the values (in degrees) of the phi, psi, and omega backbone dihedrals for a given residue. (Recall that omega, the dihedral within the peptide moiety, is almost always close to 180 degrees.)

In [9]:
phi = pose.phi(resn)
psi = pose.psi(resn)
omega = pose.omega(resn)
print('phi :', phi)
print('psi :', psi)
print('omega :', omega)

phi : -67.49233070639391
psi : -38.348442514645946
omega : 178.13342184026885
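
For readers who would like to see what a dihedral angle is geometrically, the sketch below computes one from four points using only the standard library. For a residue's phi these points would be the preceding residue's C, then N, CA, and C; for psi they would be N, CA, C, and the next residue's N. The coordinates here are invented, and `dihedral_deg` is our own helper, not a pyRosetta function.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Invented coordinates: the central bond runs along the z axis.
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # cis arrangement
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans arrangement
```

The cis arrangement gives an angle of 0 degrees and the trans arrangement an angle of magnitude 180 degrees, which is the situation omega is usually close to.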


### Number of Chains

We can check the number of chains in a pose object, as well as split one into a list of poses, one for each chain.

In [10]:
num_chains = pose.num_chains()
print('number of chains :', num_chains)

poseA, poseB = pose.split_by_chain()

number of chains : 2


### Secondary Structure

It can be useful to think about the secondary structure of a protein (e.g. alpha helices and beta strands). pyRosetta has a simple built-in method for assigning secondary structure to each residue using the DSSP algorithm, which relies chiefly on backbone hydrogen-bonding patterns. In this scheme, 'H' denotes a helical residue, 'E' a residue in a beta strand or sheet, and 'L' a residue in an unstructured loop. It can be seen that the chains of 7JH5 each consist of six helices connected by short loops.

In [11]:
for chain_pose in [poseA, poseB]:
    chain_pose.display_secstruct()

1       9       17      25      33      41      49      57      65      73      
SGSELARKLLEASTKLQRLNIRLAEALLEAIARLQELNLELVYLAVELTDPKRIRDEIKEVKDKSKEIIRRAEKEIDDAA
LHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHLLLHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH


81      89      97      105     113     121     129     137     145     153     
KESEKILEEAREAISGSGSELAKLLLKAIAETQDLNLRAAKAFLEAAAKLQELNIRAVELLVKLTDPATIREALEHAKRR
HHHHHHHHHHHHHHHLLLLHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHLLLLLHHHHHHHHHHHHHH


161     169     177     185     193     201     209     217     225     233     
SKEIIDEAERAIRAAKRESERIIEEARRLIEKGSELARELLRAHAQLQRLNLELLRELLRALAQLQELNLDLLRLASELT
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHLLLLHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHLL


241     249     257     265     273     281     
DPDEARKAIARVKRESNAYYADAERLIREAAAASEKISREAERLI
LHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHL


1       9       17      25      33      41      49      57      65      73      
LARKLLEASTKLQRLNIRLAEALLEAIARLQELNLELVYLA
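
The per-residue strings printed above are easy to summarise in plain Python. The sketch below collapses such a string into runs and counts the helices; the helper `ss_segments` and the short example string are our own illustrations and are not taken from pyRosetta or from 7JH5.

```python
from itertools import groupby

def ss_segments(ss):
    """Collapse a per-residue string into (code, start, end) runs, 1-based."""
    segments = []
    position = 1
    for code, run in groupby(ss):
        length = len(list(run))
        segments.append((code, position, position + length - 1))
        position += length
    return segments

# Short invented string, not taken from 7JH5.
example = 'LHHHHHLLLHHHHL'
segments = ss_segments(example)
helices = [s for s in segments if s[0] == 'H']
print(segments)
print('number of helices :', len(helices))
```

Pasting the full 'H'/'L' string for chain A in place of `example` would let you confirm the helix count for yourself.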

# Viewing, Scoring, and Altering Poses

### Viewing Poses with PyMOL

PyMOL, like Chimera, is software for viewing the conformations of biomolecular systems such as proteins. The developers of pyRosetta have enabled pyRosetta and PyMOL to be linked together. After adding the correct lines to your .pymolrc file and starting PyMOL in another terminal window, the following will enable you to visualize your structures from a pyRosetta session:

In [12]:
pymover = PyMOLMover()
pymover.apply(poseA)

### Scoring the Conformation

Central to Rosetta is its score function, which serves as a (very rough) approximation of the free energy (up to a constant) of a folded protein. The most commonly used (and default) score function is `ref2015`. This function can be loaded in as the "full-atom" score function and the weights of each term can be visualized by printing the score function object:

In [13]:
scorefxn = get_fa_scorefxn()
print(scorefxn)

core.scoring.ScoreFunctionFactory: {0} SCOREFUNCTION: ref2015
ScoreFunction::show():
weights: (fa_atr 1) (fa_rep 0.55) (fa_sol 1) (fa_intra_rep 0.005) (fa_intra_sol_xover4 1) (lk_ball_wtd 1) (fa_elec 1) (pro_close 1.25) (hbond_sr_bb 1) (hbond_lr_bb 1) (hbond_bb_sc 1) (hbond_sc 1) (dslf_fa13 1.25) (omega 0.4) (fa_dun 0.7) (p_aa_pp 0.6) (yhh_planarity 0.625) (ref 1) (rama_prepro 0.45)
energy_method_options: EnergyMethodOptions::show: aa_composition_setup_files: 
EnergyMethodOptions::show: mhc_epitope_setup_files: 
EnergyMethodOptions::show: netcharge_setup_files: 
EnergyMethodOptions::show: aspartimide_penalty_value: 25
EnergyMethodOptions::show: etable_type: FA_STANDARD_DEFAULT
analytic_etable_evaluation: 1
EnergyMethodOptions::show: method_weights: ref 1.32468 3.25479 -2.14574 -2.72453 1.21829 0.79816 -0.30065 2.30374 -0.71458 1.66147 1.65735 -1.34026 -1.64321 -1.45095 -0.09474 -0.28969 1.15175 2.64269 2.26099 0.58223
EnergyMethodOptions::show: method_weights: free_res

This score function object can be called like a standard Python function: it takes a pose as input and returns a single float, the score of that pose.

In [14]:
score = scorefxn(poseA)
print('score =', score)

score = 52.11623257281714


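We can also see a more detailed breakdown of the various contributions to the score. Note that the cell below passes the full two-chain `pose` rather than `poseA`, so the breakdown covers both chains.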

In [15]:
scorefxn.show(pose)

core.scoring.ScoreFunction: {0} 
------------------------------------------------------------
 Scores                       Weight   Raw Score Wghtd.Score
------------------------------------------------------------
 fa_atr                       1.000   -3424.573   -3424.573
 fa_rep                       0.550     678.439     373.141
 fa_sol                       1.000    2555.199    2555.199
 fa_intra_rep                 0.005    1311.184       6.556
 fa_intra_sol_xover4          1.000     144.810     144.810
 lk_ball_wtd                  1.000     -82.772     -82.772
 fa_elec                      1.000    -895.657    -895.657
 pro_close                    1.250      14.688      18.360
 hbond_sr_bb                  1.000    -479.242    -479.242
 hbond_lr_bb                  1.000       0.000       0.000
 hbond_bb_sc                  1.000      -5.434      -5.434
 hbond_sc                     1.000     -71.111     -71.111
 dslf_fa13                    1.250       0.000       0.
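
Each entry in the weighted-score column is simply the term's weight multiplied by its raw score. The short sketch below reproduces that arithmetic for a handful of rows copied from the table above; it covers only this subset, so the sum it computes is not the total score of the pose.

```python
# (weight, raw score) pairs copied from rows visible in the table above.
terms = {
    'fa_atr':       (1.000, -3424.573),
    'fa_rep':       (0.550,   678.439),
    'fa_sol':       (1.000,  2555.199),
    'fa_intra_rep': (0.005,  1311.184),
    'pro_close':    (1.250,    14.688),
}

weighted = {name: weight * raw for name, (weight, raw) in terms.items()}

for name, value in weighted.items():
    print(f'{name:<14}{value:>12.3f}')

# Sum over this subset only; it is not the pose's total score.
partial_sum = sum(weighted.values())
```

Changing a weight in the score function rescales the corresponding term's contribution in exactly this way.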

### Altering Dihedral Angles

pyRosetta allows you to alter the conformation of a protein one dihedral angle at a time: changing a backbone dihedral swings the remainder of the protein on the C-terminal side around the axis of that dihedral. An example follows:

In [16]:
py_idx = pose.pdb_info().pdb2pose('A', 242) # residue 242 is the central residue of the last loop in 7JH5
phi = pose.phi(py_idx)

poseA.set_phi(py_idx, phi - 90)

new_score = scorefxn(poseA)
print('score =', new_score)

pymover.apply(poseA)

score = 221.51467413477562


You will notice the score went up considerably after this dihedral was changed. This is to be expected, since a large amount of hydrophobic surface area was unburied when the helix swung upwards.
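
The geometric idea of swinging everything downstream of a bond around that bond's axis can be sketched with Rodrigues' rotation formula. This is an illustration of the geometry only, under our own assumptions: the function `rotate_about_axis` and the coordinates are invented, and the sketch is not meant to describe how Rosetta updates coordinates internally.

```python
import math

def rotate_about_axis(point, origin, axis, angle_deg):
    """Rotate `point` by `angle_deg` about the line through `origin` along `axis`."""
    norm = math.sqrt(sum(c * c for c in axis))
    k = tuple(c / norm for c in axis)
    v = tuple(p - o for p, o in zip(point, origin))
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])
    # Rodrigues' rotation formula.
    rotated = tuple(v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
                    for i in range(3))
    return tuple(r + o for r, o in zip(rotated, origin))

# Invented coordinates: atoms "downstream" of a bond that lies along the z axis.
downstream = [(1.0, 0.0, 1.0), (1.0, 0.0, 2.0)]
moved = [rotate_about_axis(p, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), -90.0) for p in downstream]
print(moved)
```

Every downstream atom moves by the same rotation, which is why a single 90-degree change in one loop residue is enough to swing an entire helix away from the rest of the protein.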

### Per-Residue Scores

It is possible to see the (unweighted) contributions of each term to a particular residue's score.

In [17]:
poseA.energies().show(py_idx)

core.scoring.Energies: {0} E               fa_atr        fa_rep        fa_sol  fa_intra_repfa_intra_sol_x   lk_ball_wtd       fa_elec     pro_close   hbond_sr_bb   hbond_lr_bb   hbond_bb_sc      hbond_sc     dslf_fa13         omega        fa_dun       p_aa_pp yhh_planarity           ref   rama_prepro
core.scoring.Energies: {0} E(i) 240         -1.91         65.55          1.49          1.37          0.07         -0.04          0.50          0.00          0.00          0.00          0.00          0.00          0.00          0.01          2.13          1.11          0.00          1.15          6.04


### Mutating Residues

pyRosetta makes it easy to mutate one residue to another. The mutation is accompanied by a round of repacking, which helps to accommodate the new residue in a low-energy conformation.

In [18]:
print('score =', scorefxn(poseA))
print()
print('sequence :', poseA.sequence()[py_idx-5:py_idx+5])
print()

mutate_residue(poseA, py_idx, 'Y')

print()
print('score =', scorefxn(poseA))
print()
print('sequence :', poseA.sequence()[py_idx-5:py_idx+5])

pymover.apply(poseA) # visualize residue 242 in PyMOL again; it is now a tyrosine

score = 221.51467413477562

sequence : ASELTDPDEA

core.scoring.ScoreFunctionFactory: {0} SCOREFUNCTION: ref2015
core.pack.task: {0} Packer task: initialize from command line()
core.pack.pack_rotamers: {0} built 4 rotamers at 1 positions.
core.pack.pack_rotamers: {0} Requesting all available threads for interaction graph computation.
core.pack.interaction_graph.interaction_graph_factory: {0} Instantiating PDInteractionGraph
core.pack.rotamer_set.RotamerSets: {0} Completed interaction graph pre-calculation in 1 available threads (1 had been requested).

score = 146.27684396629286

sequence : ASELYDPDEA


We notice that the total score has become more favorable with this mutation, dropping from about 221.5 to about 146.3. This change is not enough, though, to offset the vast energy increase associated with rotating the alpha helix upward.
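
One detail worth keeping in mind when printing sequence windows: pose indices are 1-based, whereas Python strings are 0-based, so the slice used in the cell above places the mutated residue fifth out of ten rather than in the middle. If you prefer a window centred on the residue, a helper along these lines would do it. The name `residue_window` is our own and is not part of pyRosetta.

```python
def residue_window(sequence, pose_index, flank=5):
    """Return the residues within `flank` positions of a 1-based index."""
    centre = pose_index - 1                  # convert to a 0-based string index
    start = max(0, centre - flank)
    return sequence[start:centre + flank + 1]

# Ten-residue fragment copied from the output above; the 'T' is the fifth letter.
fragment = 'ASELTDPDEA'
print(residue_window(fragment, 5, flank=2))
```

With the real pose you would pass `poseA.sequence()` and `py_idx` in place of the fragment and the literal index.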

### Exporting a Structure

An altered structure can be exported to a PDB file, which makes pyRosetta quite useful as a tool for manipulating PDB files.

In [19]:
poseA.dump_pdb('poseA.pdb')

True

# Relaxing a Pose

### The FastRelax Mover

Though the score has increased considerably, we will see if we can relax the structure in order to bring the score back down. In other words, we want to find a low-score (low free energy) conformation of the protein near the one we have imposed upon it. This is done using a class called a "mover" to make changes to the structure. Specifically, we will use a mover that carries out the "FastRelax" protocol, which by default alternates rounds of sidechain repacking with minimization of both backbone and sidechain torsions:

In [20]:
relax = rosetta.protocols.relax.FastRelax()

protocols.relax.RelaxScriptManager: {0} Reading relax scripts list from database.
core.scoring.ScoreFunctionFactory: {0} SCOREFUNCTION: ref2015
protocols.relax.RelaxScriptManager: {0} Looking for MonomerRelax2019.txt
protocols.relax.RelaxScriptManager: {0} repeat %%nrepeats%%
protocols.relax.RelaxScriptManager: {0} coord_cst_weight 1.0
protocols.relax.RelaxScriptManager: {0} scale:fa_rep 0.040
protocols.relax.RelaxScriptManager: {0} repack
protocols.relax.RelaxScriptManager: {0} scale:fa_rep 0.051
protocols.relax.RelaxScriptManager: {0} min 0.01
protocols.relax.RelaxScriptManager: {0} coord_cst_weight 0.5
protocols.relax.RelaxScriptManager: {0} scale:fa_rep 0.265
protocols.relax.RelaxScriptManager: {0} repack
protocols.relax.RelaxScriptManager: {0} scale:fa_rep 0.280
protocols.relax.RelaxScriptManager: {0} min 0.01
protocols.relax.RelaxScriptManager: {0} coor

We would now apply the mover to the pose in order to relax it, then see the improvement to the score that results. Since this takes some time, I have run it on my own machine and included the result in the GitHub repository.

In [21]:
# relax.apply(poseA)
# poseA.dump_pdb('poseA_relaxed.pdb')

poseA = pose_from_pdb('poseA_relaxed.pdb')

print()
print('score =', scorefxn(poseA))

pymover.apply(poseA)

core.import_pose.import_pose: {0} File 'poseA_relaxed.pdb' automatically determined to be of type PDB

score = -988.5384984918431


We note that the score has considerably improved, and upon viewing the pose in PyMOL, we find that the hinge has again closed, assuming a pose similar to that in the RCSB.

# Protein Modelling with AlphaFold2

We will now turn to some protein modelling using AlphaFold2, a deep learning-based protein structure prediction method that produces state-of-the-art models for many proteins. To do so, we will work with the ColabFold notebook, developed by Milot Mirdita, Sergey Ovchinnikov, and Martin Steinegger from the original Google Colab notebook released by DeepMind (the developers of AlphaFold2). We will begin by again printing out the sequence for Co-LOCKR:

In [24]:
poseA.sequence()

'SGSELARKLLEASTKLQRLNIRLAEALLEAIARLQELNLELVYLAVELTDPKRIRDEIKEVKDKSKEIIRRAEKEIDDAAKESEKILEEAREAISGSGSELAKLLLKAIAETQDLNLRAAKAFLEAAAKLQELNIRAVELLVKLTDPATIREALEHAKRRSKEIIDEAERAIRAAKRESERIIEEARRLIEKGSELARELLRAHAQLQRLNLELLRELLRALAQLQELNLDLLRLASELYDPDEARKAIARVKRESNAYYADAERLIREAAAASEKISREAERLI'

We can fold this sequence at the standard ColabFold Notebook, found at:
https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb

We will also print out the sequence for another modelling target: Co-LOCKR without its C-terminal helix, but in the presence of the key helix. This will be a complex of two non-identical proteins, or a hetero-complex. Recall that we stored the integer index of the loop residue between the fifth and sixth helices as `py_idx`, and so we can acquire the sequence of the first five helices by slicing up to this value. We also stored the sequence of the key as `seq_key`. For hetero-complex modelling ColabFold takes as input the non-contiguous protein sequences separated by a colon. Let's make this sequence:

In [25]:
seq_hetero = poseA.sequence()[:py_idx] + ':' + seq_key
print(seq_hetero)

SGSELARKLLEASTKLQRLNIRLAEALLEAIARLQELNLELVYLAVELTDPKRIRDEIKEVKDKSKEIIRRAEKEIDDAAKESEKILEEAREAISGSGSELAKLLLKAIAETQDLNLRAAKAFLEAAAKLQELNIRAVELLVKLTDPATIREALEHAKRRSKEIIDEAERAIRAAKRESERIIEEARRLIEKGSELARELLRAHAQLQRLNLELLRELLRALAQLQELNLDLLRLASELY:SGGSDEARKAIARVKRESKRIVEDAERLIREAAAASEKISREAERLIRGG


We can fold this sequence at the advanced ColabFold Notebook, found at:
https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb
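
Before pasting a multi-chain query into ColabFold, it can be worth checking that every chain contains only the twenty standard one-letter residue codes. The sketch below does this and then joins the chains with a colon, as described above. The function `colabfold_query` is a hypothetical helper of our own, not part of ColabFold or pyRosetta, and the short first chain is a stand-in for the real Co-LOCKR fragment.

```python
VALID_RESIDUES = set('ACDEFGHIKLMNPQRSTVWY')

def colabfold_query(*chains):
    """Join chain sequences with ':' after checking the residue letters."""
    for chain in chains:
        unknown = set(chain) - VALID_RESIDUES
        if not chain or unknown:
            raise ValueError(f'invalid chain sequence: {chain!r}')
    return ':'.join(chains)

seq_key = 'SGGSDEARKAIARVKRESKRIVEDAERLIREAAAASEKISREAERLIRGG'
# 'ASELY' stands in for the much longer Co-LOCKR fragment built above.
print(colabfold_query('ASELY', seq_key))
```

In the notebook you would pass `poseA.sequence()[:py_idx]` and `seq_key` as the two chains.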

# Working with C++ Rosetta

Hailey will now walk you through some of the functionality of Rosetta using the original C++ build of Rosetta on Wynton.