# Mover Lab
In this lab, you will learn to use Movers to manipulate poses. 

In [1]:
import pyrosetta

pyrosetta.init()

PyRosetta-4 2020 [Rosetta PyRosetta4.conda.mac.python37.Release 2020.02+release.22ef835b4a2647af94fcd6421a85720f07eddf12 2020-01-05T17:31:56] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: {0} Checking for fconfig files in pwd and ./rosetta/flags
core.init: {0} Rosetta version: PyRosetta4.conda.mac.python37.Release r242 2020.02+release.22ef835b4a2 22ef835b4a2647af94fcd6421a85720f07eddf12 http://www.pyrosetta.org 2020-01-05T17:31:56
core.init: {0} command: PyRosetta -ex1 -ex2aro -database /Users/paul/anaconda3/envs/pyrosetta_env/lib/python3.7/site-packages/pyrosetta/database
basic.random.init_random_generator: {0} 'RNG device' seed mode, using '/dev/urandom', seed=-487033356 seed_offset=0 real_seed=-487033356 thread_index=0
basic.random.init_random_generator: {0} RandomGenerator:init: Normal mode, seed=-487033356 RG_type=mt19937


Let's load the structure 2WPT, which is a complex of the protein Im2 and the colicin E9 DNase. Researchers have introduced various mutations at the interface to study the resulting changes in binding free energy.

In [2]:
pose = pyrosetta.rosetta.core.import_pose.pose_from_file('2wpt.pdb')

core.chemical.GlobalResidueTypeSet: {0} Finished initializing fa_standard residue type set.  Created 980 residue types
core.chemical.GlobalResidueTypeSet: {0} Total time to initialize 1.21869 seconds.
core.import_pose.import_pose: {0} File '2wpt.pdb' automatically determined to be of type PDB
core.chemical.GlobalResidueTypeSet: {0} Loading (but possibly not actually using) 'GOL' from the PDB components dictionary for residue type 'pdb_GOL'
core.chemical.GlobalResidueTypeSet: {0} Loading (but possibly not actually using) 'NO3' from the PDB components dictionary for residue type 'pdb_NO3'


Open a PyMOL window. Initialize a PyMOLMover and let it send the pose to the PyMOL session. As its name suggests, the PyMOLMover is a mover because it is derived from the Mover class. However, it is a special one: it does not change the pose but sends it to PyMOL. In PyMOL, if you color the structure by chain, you can see that there are two proteins.

In [3]:
pmm = pyrosetta.PyMOLMover()
pmm.apply(pose)

## Backbone movers
Let's try to modify the protein backbone. The simplest way to sample backbone conformations is to introduce random perturbations. The SmallMover makes small, independent random perturbations of the phi and psi torsion angles of random residues. It uses the rama score to ensure that only favorable backbone torsion angles are selected. Let's initialize a SmallMover and let it introduce 10 random perturbations.

In [4]:
small_mover = pyrosetta.rosetta.protocols.simple_moves.SmallMover()
small_mover.nmoves(10)
small_mover.apply(pose)

pmm.pymol_name('small_moved')
pmm.apply(pose)

core.scoring.ramachandran: {0} shapovalov_lib::shap_rama_smooth_level of 4( aka highest_smooth ) got activated.
basic.io.database: {0} Database file opened: scoring/score_functions/rama/shapovalov/kappa25/all.ramaProb
basic.io.database: {0} Database file opened: scoring/score_functions/rama/flat/avg_L_rama.dat
core.scoring.ramachandran: {0} Reading custom Ramachandran table from scoring/score_functions/rama/flat/avg_L_rama.dat.
basic.io.database: {0} Database file opened: scoring/score_functions/rama/flat/sym_all_rama.dat
core.scoring.ramachandran: {0} Reading custom Ramachandran table from scoring/score_functions/rama/flat/sym_all_rama.dat.
basic.io.database: {0} Database file opened: scoring/score_functions/rama/flat/sym_G_rama.dat
core.scoring.ramachandran: {0} Reading custom Ramachandran table from scoring/score_functions/rama/flat/sym_G_rama.dat.
basic.io.database: {0} Database file opened: scoring/score_funct

In PyMOL, compare the structures before and after perturbation. Do you notice anything odd? The C-terminus moves much more than the N-terminus. This is called the lever-arm effect in backbone sampling: a torsion change at one residue propagates to all of its downstream residues. Because of the lever-arm effect, backbone perturbations are not local, and bad contacts are easily introduced.
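The lever-arm effect can be reproduced in a toy model. The sketch below is not PyRosetta code: it assumes a hypothetical 2D chain of unit-length bonds in which each joint has a single bend angle, bends one joint near the start of the chain, and measures how far each atom moves.

```python
import math

def chain_coords(bend_angles, bond_length=1.0):
    """Return 2D atom positions; bend_angles[i] turns the chain before bond i."""
    x, y, heading = 0.0, 0.0, 0.0
    coords = [(x, y)]
    for bend in bend_angles:
        heading += bend
        x += bond_length * math.cos(heading)
        y += bond_length * math.sin(heading)
        coords.append((x, y))
    return coords

def displacements(before, after):
    """Distance moved by each atom between two conformations."""
    return [math.dist(p, q) for p, q in zip(before, after)]

angles = [0.0] * 50                 # straight toy chain with 50 bonds
perturbed = list(angles)
perturbed[5] += math.radians(10)    # bend a single joint near the start

shift = displacements(chain_coords(angles), chain_coords(perturbed))
print(shift[5], shift[6], shift[50])
```

In this toy chain, atoms up to the bent joint do not move at all, while the displacement of downstream atoms grows with their distance from that joint, which is the same pattern seen at the C-terminus after the SmallMover.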

The ShearMover reduces the lever-arm effect. Instead of sampling backbone torsions independently, it changes torsions of two consecutive residues together so that the downstream lever-arm effect is reduced. Let's import a fresh pose, initialize a ShearMover, and let it introduce 100 perturbations.

In [5]:
pose = pyrosetta.rosetta.core.import_pose.pose_from_file('2wpt.pdb')
shear_mover = pyrosetta.rosetta.protocols.simple_moves.ShearMover()
shear_mover.nmoves(100)
shear_mover.apply(pose)

pmm.pymol_name('shear_moved')
pmm.apply(pose)

core.import_pose.import_pose: {0} File '2wpt.pdb' automatically determined to be of type PDB


Now you should see that the lever-arm effect is reduced, but not completely gone. 
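A toy model suggests why a compensating move helps but cannot remove the effect entirely. The sketch below is not PyRosetta code: it assumes a hypothetical 2D chain of unit-length bonds and mimics a shear-style move by applying equal and opposite bends at two consecutive joints, then compares it with bending a single joint.

```python
import math

def chain_coords(bend_angles, bond_length=1.0):
    """Return 2D atom positions; bend_angles[i] turns the chain before bond i."""
    x, y, heading = 0.0, 0.0, 0.0
    coords = [(x, y)]
    for bend in bend_angles:
        heading += bend
        x += bond_length * math.cos(heading)
        y += bond_length * math.sin(heading)
        coords.append((x, y))
    return coords

def displacements(before, after):
    """Distance moved by each atom between two conformations."""
    return [math.dist(p, q) for p, q in zip(before, after)]

angles = [0.0] * 50
delta = math.radians(10)

single = list(angles)
single[5] += delta                  # independent move: one joint only

shear = list(angles)
shear[5] += delta                   # shear-style move: bend one joint...
shear[6] -= delta                   # ...and counter-bend the next one

start = chain_coords(angles)
single_shift = displacements(start, chain_coords(single))
shear_shift = displacements(start, chain_coords(shear))
print(single_shift[50], shear_shift[50])
```

In the toy chain the counter-bend restores the downstream direction, so downstream atoms are only translated by a small, constant amount instead of swinging through an arc. In a real 3D backbone the compensation is only approximate, which is consistent with the residual lever-arm effect observed here.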

"Backrub" is one method that achieves truly local sampling. The trade-off is that backbone bond angles change slightly. Initialize a BackrubMover and apply it 100 times.

In [6]:
pose = pyrosetta.rosetta.core.import_pose.pose_from_file('2wpt.pdb')
br_mover = pyrosetta.rosetta.protocols.backrub.BackrubMover()
for i in range(100):
    br_mover.apply(pose)

pmm.pymol_name('backrub_moved')
pmm.apply(pose)

core.import_pose.import_pose: {0} File '2wpt.pdb' automatically determined to be of type PDB
core.mm.MMBondAngleLibrary: {0} MM bond angle sets added fully assigned: 603; wildcard: 0 and 1 virtual parameter.
basic.io.database: {0} Database file opened: sampling/branch_angle/branch_angle_1.txt
basic.io.database: {0} Database file opened: sampling/branch_angle/branch_angle_2.txt
protocols.backrub.BackrubMover: {0} Segment lengths: 3-34 atoms
protocols.backrub.BackrubMover: {0} Main chain pivot atoms: CA
protocols.backrub.BackrubMover: {0} Adding backrub segments for residues 1-200
protocols.backrub.BackrubMover: {0} Total Segments Added: 1778


Now you can see that the perturbations are evenly distributed throughout the structure.

## Mutate residues
Protein designers constantly explore the conformation and sequence spaces of proteins. You have already learned methods to sample backbone conformation space; now it's time to consider introducing mutations.

A previous study showed that the N34V and R38T mutations on chain A change the binding free energy by -2.60 kcal/mol. Let's introduce these two mutations into our structure. Again, import a fresh pose.
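For reference, a change in binding free energy upon mutation is conventionally written as the difference between the mutant and wild-type binding free energies. The expression below assumes that usual sign convention, under which a negative value means the mutant complex binds more tightly; the subscripts naming the two partners are labels chosen here for illustration.

$$
\Delta\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{bind}}^{\mathrm{mut}} - \Delta G_{\mathrm{bind}}^{\mathrm{wt}},
\qquad
\Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - G_{\mathrm{Im2}} - G_{\mathrm{E9}}
$$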

In [7]:
pose = pyrosetta.rosetta.core.import_pose.pose_from_file('2wpt.pdb')

core.import_pose.import_pose: {0} File '2wpt.pdb' automatically determined to be of type PDB


In Rosetta, residues in a pose are numbered from 1 to N, where N is the total number of residues. This indexing differs from the numbering in a PDB file. For example, the first lysine in our structure has Rosetta index 1, but its PDB index is A4. To introduce mutations, we first need to find the Rosetta indices of the residues of interest. As before, we turn to the PDBInfo object attached to the pose.
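To make the two numbering schemes concrete, here is a toy mapping written in plain Python. It is not the PDBInfo implementation: the helper `build_pdb2pose` and the segment boundaries are hypothetical, and the sketch assumes that chain A is numbered contiguously starting at A4.

```python
def build_pdb2pose(segments):
    """Map (chain, pdb_number) to a 1-based pose index for contiguous segments."""
    mapping = {}
    pose_index = 1
    for chain, first, last in segments:
        for pdb_number in range(first, last + 1):
            mapping[(chain, pdb_number)] = pose_index
            pose_index += 1
    return mapping

# Hypothetical segment: chain A numbered 4..40 with no gaps (first residue is A4).
pdb2pose = build_pdb2pose([("A", 4, 40)])
print(pdb2pose[("A", 4)], pdb2pose[("A", 34)], pdb2pose[("A", 38)])  # 1 31 35
```

Real structures can contain gaps, insertion codes, and several chains, which is why the lookup should go through the pose's PDBInfo object rather than through arithmetic on residue numbers.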

In [8]:
print(pose.pdb_info().pdb2pose('A', 34))
print(pose.pdb_info().pdb2pose('A', 38))

31
35


Use the MutateResidue mover to introduce mutations N34V R38T.

In [9]:
mutater = pyrosetta.rosetta.protocols.simple_moves.MutateResidue()

mutater.set_target(31)
mutater.set_res_name('VAL')
mutater.apply(pose)

mutater.set_target(35)
mutater.set_res_name('THR')
mutater.apply(pose)

pmm.pymol_name('mutated')
pmm.apply(pose)

Now you should be able to see these mutations in PyMOL. You have now learned movers that help you explore backbone and sequence space. You may have noticed that the side-chain conformations, which are very important, are not sampled. Side-chain sampling will be covered in later labs.

## Exercises
1. Use the functions you learned in the previous lecture to score the poses before and after mutation. What is the change in the score? Does it match the experimentally measured -2.60 kcal/mol? Which score terms change significantly? Which 10 residues' scores change the most? Do their changes make sense?

(Hint: it is possible to take the difference between two `EMapVector`s, but currently the functionality is half broken. If you have `emap1` and `emap2`, you can calculate the difference as follows:
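```
from pyrosetta.rosetta.core.scoring import EMapVector

diff_emap = EMapVector(emap1)  # copy emap1 so the original is not modified
temp_emap = diff_emap  # create a reference to the same object
temp_emap -= emap2  # subtract emap2 in place
print(temp_emap)  # temp_emap is now None. This is the "half broken" part
print(diff_emap)  # diff_emap has been modified and holds emap1 - emap2
```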

As of 5/27/2019, I have fixed this code in Rosetta's C++ version, but the fix will not make it out to your pyrosetta download in time for the 6/6/2019 code school.)
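The behaviour described in the hint follows from how Python handles augmented assignment: `a -= b` calls `a.__isub__(b)` and then rebinds the name `a` to whatever that call returns. The toy class below is not the PyRosetta class; it is a hypothetical stand-in that assumes the in-place operator modifies the object but returns nothing, which reproduces the same symptoms.

```python
class ToyEMap:
    """Toy stand-in for an energy map; NOT the PyRosetta class."""

    def __init__(self, values):
        self.values = list(values)

    def __isub__(self, other):
        # Subtract in place but return nothing (the assumed faulty behaviour).
        self.values = [a - b for a, b in zip(self.values, other.values)]
        # No `return self`, so Python rebinds the left-hand name to None.

emap1 = ToyEMap([1.0, 2.0, 3.0])
emap2 = ToyEMap([0.5, 0.5, 0.5])

diff_emap = ToyEMap(emap1.values)  # copy, so emap1 stays untouched
temp_emap = diff_emap              # second name for the same object
temp_emap -= emap2

print(temp_emap)         # None
print(diff_emap.values)  # [0.5, 1.5, 2.5]
```

Keeping a second name for the object, as the hint does, means the subtracted values stay reachable even though the name used on the left of `-=` is lost.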

In [10]:
from pyrosetta import *
sfxn = get_score_function()

core.scoring.ScoreFunctionFactory: {0} SCOREFUNCTION: ref2015
core.scoring.etable: {0} Starting energy table calculation
core.scoring.etable: {0} smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: {0} smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: {0} smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: {0} Finished calculating energy tables.
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/DonStrength.csv
basic.io.database: {0} Database file op

In [11]:
# Initialize poses before and after mutation
orig_pose = pyrosetta.toolbox.pose_from_rcsb('2wpt')
pose = Pose(orig_pose)
mut_pose = Pose(orig_pose)

core.import_pose.import_pose: {0} File '2WPT.clean.pdb' automatically determined to be of type PDB


In [48]:
# Apply mutation
mutater31 = pyrosetta.rosetta.protocols.simple_moves.MutateResidue()

mutater31.set_target(31)
mutater31.set_res_name('VAL')
mutater31.apply(mut_pose)

mutater35 = pyrosetta.rosetta.protocols.simple_moves.MutateResidue()
mutater35.set_target(35)
mutater35.set_res_name('THR')
mutater35.apply(mut_pose)

In [13]:
# YOUR CODE HERE PROBLEM 1
pose_before_score = sfxn(pose)
pose_mutated_score = sfxn(mut_pose)
print("pose_before score:", pose_before_score)
print("pose_mutated score:", pose_mutated_score)
change_score_val = pose_mutated_score - pose_before_score
print('ddG =', change_score_val) # experimental result -2.6 kcal/mol

basic.io.database: {0} Database file opened: scoring/score_functions/elec_cp_reps.dat
core.scoring.elec.util: {0} Read 40 countpair representative atoms
core.pack.dunbrack.RotamerLibrary: {0} shapovalov_lib_fixes_enable option is true.
core.pack.dunbrack.RotamerLibrary: {0} shapovalov_lib::shap_dun10_smooth_level of 1( aka lowest_smooth ) got activated.
core.pack.dunbrack.RotamerLibrary: {0} Binary rotamer library selected: /Users/paul/anaconda3/envs/pyrosetta_env/lib/python3.7/site-packages/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin
core.pack.dunbrack.RotamerLibrary: {0} Using Dunbrack library binary file '/Users/paul/anaconda3/envs/pyrosetta_env/lib/python3.7/site-packages/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin'.
core.pack.dunbrack.RotamerLibrary: {0} Dunbrack 2010 library took 0.338424 seconds to load from binary
pose_before score: -40.64878956423769
pose_mutated score: 

In [14]:
total_number_of_residues = pose.size()  # number of residues in the pose

In [15]:
sfxn = get_score_function()

core.scoring.ScoreFunctionFactory: {0} SCOREFUNCTION: ref2015


In [18]:
diff_list = []

for i in range(1, total_number_of_residues + 1):
    orig_emap = pose.energies().residue_total_energies(i)
    mut_emap = mut_pose.energies().residue_total_energies(i)

    # Copy the mutant energy map, then subtract the original in place.
    # Read the result from `diff` itself, because the name on the left of
    # `-=` may be rebound to None (see the hint above).
    diff = pyrosetta.rosetta.core.scoring.EMapVector(mut_emap)
    temp_diff = diff
    temp_diff -= orig_emap

    # Weighted sum of the per-term differences for this residue
    diffe = diff.dot(sfxn.weights())

    diff_list.append((i, pose.pdb_info().pose2pdb(i), abs(diffe)))

print(diff_list)

[(1, '4 A ', 0.0), (2, '5 A ', 0.0), (3, '6 A ', 0.0), (4, '7 A ', 0.0), (5, '8 A ', 0.0), (6, '9 A ', 0.0), (7, '10 A ', 0.0), (8, '11 A ', 0.0), (9, '12 A ', 0.0), (10, '13 A ', 0.0), (11, '14 A ', 0.0), (12, '15 A ', 0.0), (13, '16 A ', 0.0), (14, '17 A ', 0.0), (15, '18 A ', 1.9688592001188e-06), (16, '19 A ', 0.0), (17, '20 A ', 0.0), (18, '21 A ', 0.0), (19, '22 A ', 0.0), (20, '23 A ', 0.0), (21, '24 A ', 0.0), (22, '25 A ', 0.0), (23, '26 A ', 0.0), (24, '27 A ', 7.772449350795796e-12), (25, '28 A ', 0.0), (26, '29 A ', 0.003092827512702234), (27, '30 A ', 1.7464103769968033), (28, '31 A ', 0.1716769465586177), (29, '32 A ', 0.018197432164958227), (30, '33 A ', 0.2825448673479201), (31, '34 A ', 7.964237127748127), (32, '35 A ', 7.673420950866843), (33, '36 A ', 0.18157094803399543), (34, '37 A ', 0.13637798833519252), (35, '38 A ', 7.749985616447871), (36, '39 A ', 0.5153210603025428), (37, '40 A ', 0.008786480769913119), (38, '41 A ', 0.10767258848141525), (39, '42 A ', 0.354

In [27]:
# Sort diff_list for 10 highest values
diff_list.sort(key=lambda x:x[2], reverse=True)
for i in range(10):
    print(diff_list[i])

(31, '34 A ', 7.964237127748127)
(35, '38 A ', 7.749985616447871)
(32, '35 A ', 7.673420950866843)
(27, '30 A ', 1.7464103769968033)
(36, '39 A ', 0.5153210603025428)
(172, '97 B ', 0.412098976504391)
(170, '95 B ', 0.35920944353042794)
(39, '42 A ', 0.3543232494303874)
(30, '33 A ', 0.2825448673479201)
(161, '86 B ', 0.27589585454304877)


2. Redo the mutagenesis and ddG calculation on backbone-perturbed structures. How much do the results change? Why?

In [34]:
# YOUR CODE HERE PROBLEM 2
# Initialize poses before and after mutation
orig_pose = pyrosetta.toolbox.pose_from_rcsb('2wpt')
pose = Pose(orig_pose)
mut_pose = Pose(orig_pose)

core.import_pose.import_pose: {0} File '2WPT.clean.pdb' automatically determined to be of type PDB


In [35]:
# Backbone perturbation: apply 100 BackrubMover moves to mut_pose only
br_mover = pyrosetta.rosetta.protocols.backrub.BackrubMover()
for i in range(100):
    br_mover.apply(mut_pose)

basic.io.database: {0} Database file opened: sampling/branch_angle/branch_angle_1.txt
basic.io.database: {0} Database file opened: sampling/branch_angle/branch_angle_2.txt
protocols.backrub.BackrubMover: {0} Segment lengths: 3-34 atoms
protocols.backrub.BackrubMover: {0} Main chain pivot atoms: CA
protocols.backrub.BackrubMover: {0} Adding backrub segments for residues 1-195
protocols.backrub.BackrubMover: {0} Total Segments Added: 1778


In [37]:
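# Score difference between the unperturbed pose and the backrub-perturbed mut_pose.
# No MutateResidue mover was applied to mut_pose in this problem, so this value
# reflects the backbone perturbation rather than the N34V/R38T mutations.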
pose_before_score = sfxn(pose)
pose_mutated_score = sfxn(mut_pose)
print("pose_before score:", pose_before_score)
print("pose_mutated score:", pose_mutated_score)
change_score_val = pose_mutated_score - pose_before_score
print('ddG =', change_score_val)

pose_before score: -40.64878956423769
pose_mutated score: 6471.843893733009
ddG = 6512.492683297247
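A score difference isolates the effect of a mutation only when both poses share the same backbone. One way to arrange that is to perturb first, copy the perturbed pose, mutate the copy, and score both. The sketch below shows that ordering with a hypothetical helper, `ddg_on_shared_backbone`, whose arguments are plain callables; the toy pose, scores, and values are made up for illustration and are not PyRosetta output.

```python
import copy

def ddg_on_shared_backbone(pose, perturb, mutate, score, clone=copy.deepcopy):
    """Perturb once, then score wild type and mutant on that same backbone."""
    wild_type = clone(pose)
    perturb(wild_type)             # e.g. apply backbone moves
    mutant = clone(wild_type)      # mutant starts from the perturbed backbone
    mutate(mutant)                 # e.g. apply the mutation movers
    return score(mutant) - score(wild_type)

# Toy demonstration: a dict stands in for a pose (all numbers are made up).
toy_pose = {"strain": 0.0, "sequence": "NR"}

def toy_perturb(p):
    p["strain"] += 100.0           # a large backbone penalty

def toy_mutate(p):
    p["sequence"] = "VT"

def toy_score(p):
    return p["strain"] + (-2.6 if p["sequence"] == "VT" else 0.0)

toy_ddg = ddg_on_shared_backbone(toy_pose, toy_perturb, toy_mutate, toy_score)
print(round(toy_ddg, 6))
```

Because the backbone penalty appears in both scores, it cancels in the difference. With the objects from this lab, one might pass `br_mover.apply` as `perturb`, a function that applies both MutateResidue movers as `mutate`, `sfxn` as `score`, and `clone=lambda p: p.clone()`.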


3. Generate a backbone ensemble of 20 structures with your favorite backbone sampling method. Redo the mutagenesis and ddG calculation on each structure and take the mean/median/minimal score. How much do the results change? Why?

In [52]:
# YOUR CODE HERE PROBLEM 3
# Load the starting structure
pose = pyrosetta.rosetta.core.import_pose.pose_from_file('2wpt.pdb')

# Make 20 independent copies to perturb
pose_list = [pose.clone() for _ in range(20)]

pose_before_scores = []
pose_after_scores = []

br_mover = pyrosetta.rosetta.protocols.backrub.BackrubMover()

# Apply one backrub move to each of the 20 structures
for structure in pose_list:
    br_mover.apply(structure)

# Score each perturbed structure before and after the two mutations
# (mutater31 and mutater35 were created in Problem 1)
for structure in pose_list:
    pose_before_scores.append(sfxn(structure))
    mutater31.apply(structure)
    mutater35.apply(structure)
    pose_after_scores.append(sfxn(structure))

# Per-structure ddG: mutated score minus perturbed wild-type score
pose_ddG_list = [after - before
                 for before, after in zip(pose_before_scores, pose_after_scores)]

core.import_pose.import_pose: {0} File '2wpt.pdb' automatically determined to be of type PDB
basic.io.database: {0} Database file opened: sampling/branch_angle/branch_angle_1.txt
basic.io.database: {0} Database file opened: sampling/branch_angle/branch_angle_2.txt
protocols.backrub.BackrubMover: {0} Segment lengths: 3-34 atoms
protocols.backrub.BackrubMover: {0} Main chain pivot atoms: CA
protocols.backrub.BackrubMover: {0} Adding backrub segments for residues 1-200
protocols.backrub.BackrubMover: {0} Total Segments Added: 1778


In [54]:
import numpy as np
pose_ddG_array = np.array(pose_ddG_list)
pose_before = np.array(pose_before_scores)
pose_after = np.array(pose_after_scores)
print(pose_before.mean())
print(pose_after.mean())
print('Mean score =', pose_ddG_array.mean())
print('Median score =', np.median(pose_ddG_array))
print('Min score =', pose_ddG_array.min())

191.18826046693317
214.4956563689754
Mean score = 23.30739590204225
Median score = 24.228007767580117
Min score = -1.238406967161609


4. The above ddG analysis is very crude and inaccurate. What improvements should be introduced to make it better?

In [None]:
# YOUR CODE HERE PROBLEM 4
# The prediction protocol should use Monte Carlo sampling: apply a random mover
# and rescore. If the energy has gotten worse, reject the move; if the energy
# has gotten better, accept it.
# Outline: change -> score -> mutate -> score
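The accept/reject idea can be sketched on a toy problem. The code below is plain Python, not PyRosetta: the function `metropolis`, the one-dimensional energy, and the proposal are all hypothetical. It also uses the common Metropolis variant, which is an assumption beyond the note above: downhill moves are always accepted, and uphill moves are accepted with probability exp(-dE/kT) so that sampling can escape local minima.

```python
import math
import random

def metropolis(state, energy, propose, n_steps, kT=1.0, rng=random):
    """Run Metropolis Monte Carlo and return the lowest-energy state seen."""
    current, current_e = state, energy(state)
    best, best_e = current, current_e
    for _ in range(n_steps):
        trial = propose(current, rng)          # random move
        trial_e = energy(trial)                # rescore
        delta = trial_e - current_e
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e

def toy_energy(x):
    return (x - 3.0) ** 2                      # single minimum at x = 3

def toy_propose(x, rng):
    return x + rng.uniform(-0.5, 0.5)          # small random step

rng = random.Random(0)                         # fixed seed for reproducibility
best_x, best_e = metropolis(10.0, toy_energy, toy_propose,
                            n_steps=2000, kT=0.1, rng=rng)
print(best_x, best_e)
```

In a ddG protocol the state would be a pose, the proposal a mover, and the energy a score function, with the same sampling applied to both the wild-type and the mutant structures. PyRosetta ships a MonteCarlo object intended for this kind of bookkeeping.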