# Hallucinating Scaffold 101

Authors: Angelica Lam

Last Updated: Aug. 24, 2021

### Introduction

The Baker Lab has developed a deep learning method to scaffold functional sites of proteins, including active sites. The method starts from random protein sequences and uses trRosetta's structure predictions to guide sequence design. The Baker Lab demonstrated that a de novo protein generated by their method binds the substrate of its native enzyme. However, it is unclear whether that de novo protein retains the catalytic activity of the native enzyme.

Our team wants to determine if we can use the Baker Lab’s method to hallucinate scaffolds for enzymatic active sites that support catalytic activity. Moreover, we want to determine how much of our native target enzyme needs to be in the de novo protein in order for the de novo protein to exhibit the catalytic activity of the native enzyme.

### Proposed Plan

1. First, we need to pick an enzyme. We restrict ourselves to relatively simple enzymes that have a single active site and whose activity is easy to assay. After choosing an enzyme, we need to identify its active site and catalytic residues (amino acids). Then we need to select “sets” of residues to scaffold. These sets differ in how much of the native enzyme they include. For instance, we may have one set of residues within 2 Å of the substrate, another within 5 Å, and so on.

2. Next, we need to use the Baker Lab’s deep learning method to scaffold our sets of residues. Their method will output the amino acid sequences of multiple de novo proteins that should hold our input sets of residues in the correct geometry.

3. Then we need to computationally validate the de novo protein sequences to find the best candidates for testing in the wet lab. We use Rosetta forward folding, both with and without the substrate, to independently confirm that the sequences fold into the desired de novo proteins. We also use Rosetta's docking protocols, in particular ligand docking, to confirm that the de novo proteins can bind the desired substrate.

4. Finally, we test the de novo proteins in the wet lab.

### Where are we now?

We have selected TEM-1 beta-lactamase and identified its catalytic residues. TEM-1 beta-lactamase has a single active site and is easy to assay because it provides E. coli with resistance to beta-lactam antibiotics (by hydrolyzing the antibiotic's beta-lactam ring).

Think ahead: We want our de novo proteins to have the same catalytic function as TEM-1 beta-lactamase. How might we test that these proteins function as expected in the wet lab?

Answer:

Activity 1: Look at the [enzyme mechanism for beta-lactamase](https://www.ebi.ac.uk/thornton-srv/m-csa/entry/2/). Name at least 5 catalytic residues.

Answer:

See the end of this notebook for answers.

### Next Steps

* Find PDB (Protein Data Bank) entries for crystal structures of TEM-1 beta-lactamase in complex with a beta-lactam substrate (e.g., ampicillin, penicillin).
* Write a script to identify the "sets" of residues mentioned in Step 1.
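A minimal Python sketch of the residue-selection script, assuming the complex is a standard fixed-column PDB file in which protein atoms are `ATOM` records and the substrate is `HETATM` records with a known three-letter residue name (passed as `ligand_resname`; the right name depends on the entry you pick). It measures each residue's closest atom-to-atom distance to the substrate and returns one set per cutoff. The function names and default cutoffs are hypothetical choices for illustration, not an existing tool.

```python
import math


def parse_pdb_atoms(lines):
    """Yield (record, resname, chain, resseq, xyz) for ATOM/HETATM lines in fixed-column PDB format."""
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        resname = line[17:20].strip()
        chain = line[21]
        resseq = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        yield record, resname, chain, resseq, xyz


def residue_sets(lines, ligand_resname, cutoffs=(2.0, 5.0, 8.0)):
    """Map each cutoff (in Å) to the sorted protein residues with any atom within that distance of the ligand."""
    atoms = list(parse_pdb_atoms(lines))
    ligand = [xyz for rec, res, _, _, xyz in atoms if rec == "HETATM" and res == ligand_resname]
    if not ligand:
        raise ValueError(f"no HETATM records with residue name {ligand_resname!r}")
    # Closest approach of each protein residue (chain, number, name) to any ligand atom
    closest = {}
    for rec, res, chain, seq, xyz in atoms:
        if rec != "ATOM":
            continue  # skips HETATM records: waters, ions, and the ligand itself
        d = min(math.dist(xyz, lig) for lig in ligand)
        key = (chain, seq, res)
        closest[key] = min(d, closest.get(key, math.inf))
    return {c: sorted(k for k, d in closest.items() if d <= c) for c in cutoffs}
```

Calling `residue_sets(fh, "XXX")` on an open file handle (with `XXX` standing in for the substrate's residue name) returns the 2, 5, and 8 Å sets. Because every residue within a smaller cutoff is also within a larger one, the sets are nested, which matches the idea of including progressively more of the native enzyme.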

### Technical Skills

In future steps, we will likely use trRosetta, Rosetta forward (ab initio) folding, and Rosetta docking. Here, we provide some simple exercises to give you a feel for the software. We are also completely new to Rosetta, so these exercises are not comprehensive. Feel free to do your own exploring!

### Rosetta forward (ab initio) folding

Rosetta's ab initio protocol uses a Monte Carlo algorithm to fold a given amino acid sequence. The algorithm repeatedly makes random changes to the protein's conformation and evaluates each new conformation with an energy scoring function; lower-energy conformations are favored. For more about the biophysics behind the energy scoring function and ab initio structure prediction, see [this JHU YouTube series](https://www.youtube.com/playlist?list=PLHn7WmALbthnAwbJ4mWw5gk8dgqsjRL87). Notes on a few of the videos are [here](https://docs.google.com/document/d/1p6WEDLxbioRpTyY_r37ZXsduWJFtOspgrqCvyq_3jVw/edit?usp=sharing).
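To make the Monte Carlo idea concrete, here is a toy Python sketch of Metropolis Monte Carlo minimization. It is not Rosetta: the "conformation" is just a list of angles, `toy_energy` is a hypothetical stand-in for Rosetta's score function, and perturbing one angle at random stands in for Rosetta's fragment insertion moves (which are what the 3-mer and 9-mer fragment files are for). Downhill moves are always accepted, and uphill moves are accepted with probability exp(-ΔE/kT), which lets the search climb out of local minima.

```python
import math
import random


def toy_energy(angles, target):
    """Hypothetical stand-in for a score function: 0 when every angle equals its target, at most 2 per angle."""
    return sum(1.0 - math.cos(a - t) for a, t in zip(angles, target))


def metropolis_fold(target, n_steps=20000, kT=0.5, max_move=math.radians(60), seed=0):
    """Minimize toy_energy by Metropolis Monte Carlo; return the lowest-energy angles seen and their energy."""
    rng = random.Random(seed)
    angles = [rng.uniform(-math.pi, math.pi) for _ in target]  # random starting "conformation"
    energy = toy_energy(angles, target)
    best_angles, best_energy = list(angles), energy
    for _ in range(n_steps):
        trial = list(angles)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-max_move, max_move)  # random local move (stand-in for a fragment insertion)
        trial_energy = toy_energy(trial, target)
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill moves, accept uphill ones with probability exp(-delta/kT)
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            angles, energy = trial, trial_energy
            if energy < best_energy:
                best_angles, best_energy = list(angles), energy
    return best_angles, best_energy
```

For example, `metropolis_fold([1.0, -2.0, 0.5, 3.0])` starts from random angles and returns a low-energy set close to the targets. Raising `kT` makes uphill moves more likely (broader search); lowering it makes the search greedier.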

Rosetta can be run locally on the command line or on a public server.

#### Command line

The compiled Rosetta source code is massive, so we are using a "pretend" version to give you a sense of what it is like to use Rosetta. The "pretend" version has no actual Rosetta code.

Please watch [this tutorial](https://youtu.be/y6-1UUEf4Pw?t=657) by Sari Sabban from 11:00 to 28:00. You will be following along with what he does.

As shown in the video, follow these steps:

1. Download the 1ELW PDB file and FASTA file from the 1ELW entry on the PDB website.

2. Edit the FASTA file. In the 3D View of 1ELW on the website, the first residue, methionine (M), is grayed out and does not appear in the 3D structure, so remove it from the sequence. Also remove the sequences for chains C and D.

3. Fetch the 3-mer fragment, 9-mer fragment, and secondary structure files. I have already submitted a job to Robetta's fragment server (which has moved to old.robetta.org since the video was published), so you can download the files [here](http://old.robetta.org/downloads/fragments/73276/).

4. Update the paths to the input files in the flags file, which is provided under badge-hs. For organization's sake, keep the input files within badge-hs.

5. Run ab initio, then extract the PDB file from the resulting silent file using the commands shown in the video.

6. Visualize your PDB file with RCSB PDB's [3D View tool](https://www.rcsb.org/3d-view). Then [superpose](https://www.rcsb.org/docs/3d-viewers/mol*/faqs-scenarios#how-do-i-compare-superpose-multiple-structures) your structure with 1ELW, by chains, and record the RMSD below.
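The FASTA edits described above (removing the leading methionine and the chain C and D sequences) can also be scripted. A minimal Python sketch, assuming RCSB-style FASTA headers that list chains in the second `|`-separated field (for example `XXXX_1|Chains A, B|...`); that header format is an assumption, so check it against your own download. The function names are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def chains_in_header(header):
    """Chain IDs from an RCSB-style header such as 'XXXX_1|Chains A, B|...' (assumed format)."""
    fields = header.split("|")
    if len(fields) < 2:
        return set()
    field = fields[1]
    for prefix in ("Chains ", "Chain "):
        if field.startswith(prefix):
            field = field[len(prefix):]
            break
    return {chain.strip() for chain in field.split(",") if chain.strip()}


def clean_fasta(lines, drop_chains=("C", "D")):
    """Drop records listing any unwanted chain; strip a leading Met from the rest."""
    out = []
    for header, seq in read_fasta(lines):
        if chains_in_header(header) & set(drop_chains):
            continue  # skip the record for the unwanted chains
        if seq.startswith("M"):
            seq = seq[1:]  # the leading Met does not appear in 1ELW's 3D structure
        out.append(f">{header}\n{seq}\n")
    return "".join(out)
```

Passing an open file handle to `clean_fasta` returns the cleaned text, which you can write to whatever FASTA path your flags file points to.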

RMSD = 
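The 3D View tool computes the RMSD for you, but it helps to know what the number means: the root-mean-square distance between corresponding atoms after the two structures have been optimally superposed. Below is a minimal NumPy sketch of that calculation using the Kabsch algorithm. It assumes you have already extracted matched coordinates (for example, the C-alpha atoms of the same residues) from both structures as (N, 3) arrays. Mol* chooses its own atoms and alignment, so the value it reports may differ from this sketch.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition (Kabsch algorithm).

    Row i of P and row i of Q must be the same atom in the two structures.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both structures on their centroids
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # flip one axis if needed so R is a proper rotation, not a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P_c @ R.T  # rotate P onto Q
    return float(np.sqrt(np.mean(np.sum((P_rot - Q_c) ** 2, axis=1))))
```

A structure compared with a rotated and translated copy of itself gives an RMSD of (numerically) zero, which is why superposition comes before the distance calculation.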

#### Robetta

Robetta (new) is a protein structure prediction server developed by the Baker Lab at the University of Washington. It offers five options for structure prediction: (1) a deep-learning-based method, RoseTTAFold; (2) a deep-learning-based method, trRosetta; (3) Rosetta Comparative Modeling (RosettaCM); (4) Rosetta Ab Initio (RosettaAB); and (5) a fully automated pipeline that first predicts domains as independent folding units, models each unit with (3) or (4), and then assembles them into full-chain models.<sup>[1](https://robetta.bakerlab.org/faqs.php)</sup>

Robetta's queue seems to give low priority to ab initio jobs, so [here](https://robetta.bakerlab.org/results.php?id=116737) is one I've already submitted. You can see the user input and results. The only input provided was the protein sequence (minus methionine). The job will be deleted on 10/28/2021.

Using the 3D View tool, superpose the five models from Robetta with 1ELW, by chains, and answer the questions below.

Which model had the lowest RMSD? What was that RMSD?

Model __

RMSD =

Which structures had lower RMSD values: the ones from Robetta or the one from running Rosetta on the command line? Why do you think this is the case? Hint: The ab initio job on Robetta took hours, while the command-line ab initio run took under 5 minutes.

Answer:

### trRosetta

As previously stated, trRosetta is a deep learning based structure prediction method. For more information, see [About trRosetta](https://yanglab.nankai.edu.cn/trRosetta/help/) and the [Baker Lab's paper](https://www.pnas.org/content/117/3/1496).

If you want, you can try Robetta's trRosetta option on the protein sequence of 1ELW (or on a different protein sequence), but it can take a few hours to complete, so [here](https://robetta.bakerlab.org/results.php?id=116921) is one I've already submitted. The job will be deleted on 10/26/2021.

Using the results of the already submitted job, answer the questions below.

Which model had the lowest RMSD? What was that RMSD?

Model __

RMSD =

What do you notice is different between the models from the ab initio job and the trRosetta job? This question is very open-ended because I am also trying to figure out what to make of the differences.

Answer:

### Rosetta docking

Rosetta docking encompasses protein-protein, peptide-protein, and ligand-protein docking. Here we focus on ligand docking because we'll be testing our de novo proteins against small-molecule beta-lactam antibiotics such as [ampicillin](https://en.wikipedia.org/wiki/Ampicillin). To learn more about Rosetta ligand docking, see [this Meiler Lab tutorial video](https://www.youtube.com/watch?v=-4wgIuMfr_w).

ROSIE is a public server that offers Rosetta docking protocols. If you want, you can try its ligand docking option on PDB entry 3PBL, which shows the human dopamine D3 receptor in complex with eticlopride. To submit a job, you need the PDB file of the protein and the SDF file of the eticlopride ligand. Both files can be downloaded from the 3PBL entry of the PDB website, but you need to add hydrogens to the SDF file. If you have Chem3D or another molecule editor, you can do this by opening the SDF file, making sure that hydrogens are shown explicitly, and saving the structure to a new SDF file. To avoid the hassle of dealing with a molecule editor, [here](https://rosie.graylab.jhu.edu/ligand_docking/viewjob/96197) is a job I've already submitted. I chose to generate a maximum of 50 ligand conformers with the BCL, use the starting coordinates in the SDF, and generate 10 structures.

Using the results of the already submitted job, answer the questions below.

Which model had the lowest interface score? What was that interface score?

Model-__

interface_delta =
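If you download the scores instead of reading them off the results page, the lowest interface score can be picked out programmatically. A minimal Python sketch, assuming a Rosetta-style score file in which a `SCORE:` header row names the columns and each later `SCORE:` row holds one model's values, with the model name in the `description` column. Whether ROSIE's download uses exactly this layout is an assumption. RosettaLigand typically labels the interface score by the ligand's chain (for example `interface_delta_X`), so the sketch matches any column whose name starts with `interface_delta`. Both function names are hypothetical.

```python
def parse_scorefile(lines):
    """Parse Rosetta-style score-file lines into one dict per model (layout assumed, see above)."""
    header, rows = None, []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SCORE:":
            continue  # ignore non-score lines such as SEQUENCE:
        values = fields[1:]
        if header is None:
            header = values
            continue
        if values == header:
            continue  # skip repeated header rows
        row = {}
        for name, value in zip(header, values):
            try:
                row[name] = float(value)
            except ValueError:
                row[name] = value  # text columns, e.g. description
        rows.append(row)
    return rows


def best_by(rows, prefix="interface_delta"):
    """Return (description, value) for the row with the lowest value in the first column starting with prefix."""
    columns = [name for name in rows[0] if name.startswith(prefix)]
    if not columns:
        raise ValueError(f"no column starting with {prefix!r}")
    column = columns[0]
    best = min(rows, key=lambda row: row[column])
    return best.get("description"), best[column]
```

Lower (more negative) interface scores indicate more favorable predicted binding, which is why the sketch takes the minimum.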

Use the 3D View tool to compare one of the models to 3PBL. Did RosettaLigand detect the correct binding site?

Answer:

### More links to Rosetta resources

This is an unorganized hodgepodge of links to Rosetta resources. Feel free to add more!

[trRosetta Github page](https://github.com/gjoni/trRosetta)<br/>
[Ab initio Relax Documentation](https://new.rosettacommons.org/docs/latest/application_documentation/structure_prediction/abinitio-relax)<br/>
[Small-molecule ligand docking into comparative models with Rosetta](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5750396/)

### Answers

Answers may vary.

Think ahead: Use Golden Gate or Gibson assembly to build a plasmid with the de novo protein sequence, transform the plasmid into bacteria, and then test whether the bacteria become resistant to the antibiotic.

Activity 1: Ser70, Lys73, Ser130, Glu166, Lys234, and Ala237

Answers for the Technical Skills section are hidden. We can discuss these together.