# Introduction to molecular visualization

In this tutorial, we will outline some basic methods of visualizing proteins, which will be crucial to better understand the context of the machine learning analysis that we will conduct and discuss over the course of this workshop.

# Setting up PyMOL

PyMOL is a molecular visualization program that is simple to download and use. Installation instructions can be found at https://pymol.org/dokuwiki/doku.php?id=installation. When you first load the software, if everything has installed correctly, you should see the interface in Figure 1. By default, you should get a 30-day free trial of this software, so don’t worry if you are prompted for a license upon initialization.

The top part of the screen functions similarly to a terminal; that is, if you type

In [None]:
PyMOL> pwd
PyMOL> ls

you should see the path to your working directory followed by a list of the files in it. Because we will be writing and saving images in this tutorial, I recommend changing directories; by default, PyMOL starts in your home directory. You can use basic Linux navigation commands within this interface, but you won’t be able to create or modify directories and files here. Create a directory that you want to work in (using your terminal or file viewer) and then change into that directory by typing:

In [None]:
PyMOL> cd /path/to/your/directory

![Screenshot 2024-10-21 at 10.32.13 AM.png](attachment:f646df80-af78-474b-bb1f-d92f6d9010aa.png)

Figure 1: Annotated PyMOL user interface.

Table 1: Main PyMOL functions

![Screenshot 2024-10-21 at 10.38.33 AM.png](attachment:a09d45e5-d672-4722-82ff-994fbfd2d696.png)

# Loading your first protein

As a basic introduction to PyMOL, we offer instructions to analyze a very basic protein. In the PyMOL command line, type

In [None]:
PyMOL> fetch 1AKI

This will load the structure of hen egg-white lysozyme, a very simple enzyme that we will look at before moving on to a more complicated viral system. This enzyme is (conveniently) quite short and catalyzes hydrolysis, i.e., the cleavage of a chemical bond by water. Once the structure is loaded, a gray bar labeled with the Protein Data Bank (PDB) ID of the protein (in this case 1AKI) will appear on the right side of the screen. Each letter within this bar is associated with a different function (Table 1).

When loaded, the protein will be in a ‘cartoon’ visualization, where α-helices and β-sheets are shown as spirals and arrows, respectively. These are key components of protein secondary structure and are thus crucial to the structure and functionality of the molecule. For the sake of this tutorial, we will not go into more depth regarding the construction of these structural components, but it is important to note that amino acids can interact with each other via bonded and nonbonded interactions, and that these interactions depend on the structure of the protein components.

Before we continue, it is important that you can easily manipulate proteins within the PyMOL interface. The bottom right of the screen (Figure 1) shows the current mouse mode and the options for editing or moving. Selecting Mouse -> 1 button viewing in the top toolbar gives simpler mouse controls that suit laptop trackpads. Otherwise, 3-button mouse features can often be reproduced on a trackpad by using the control, command, shift, and alt keys. With your first protein now loaded, experiment with navigating the visual area using combinations of these keys in 3-button viewing, and switch to 1-button viewing if necessary. In 1-button viewing, clicking anywhere and dragging rotates the protein around its center of rotation. You can change this center at any time by clicking anywhere on the protein and selecting A -> origin or A -> zoom. Holding control (PC/Linux) or command (Mac) while clicking and dragging zooms in and out. Holding alt (PC/Linux) or option (Mac) while dragging translates the protein across the screen.

Click C -> by ss -> choose any option here. This will distinctly color the secondary structures and give you a better idea about where these different components begin and end. Your protein should now look like Figure 2. We can take a picture of this image by typing the following commands into the command window:

In [None]:
PyMOL> ray 1000,1000
PyMOL> png lysozyme.png

This will save a picture of the structure within the visual window as lysozyme.png in your previously chosen directory. The ray command renders an image of the specified width and height; in this case, we specified dimensions of 1000x1000 pixels. These values can be adjusted to change the size of the output image, but note that a very large image could take multiple minutes to render.
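The same two commands can also be issued through PyMOL's Python API, which may be convenient if you want to script several renders. The following is a minimal sketch, assuming that the `pymol` Python module is importable in your environment (for example, when the script is run from inside PyMOL) and that a structure is already loaded. The helper name `save_render` and its default values are illustrative only and are not part of PyMOL.

```python
def save_render(filename="lysozyme.png", width=1000, height=1000):
    """Ray-trace the current PyMOL view and write it to a PNG file.

    `save_render` is a hypothetical helper. cmd.ray and cmd.png are the
    API counterparts of the `ray` and `png` commands used above.
    """
    # Imported inside the function so that defining the helper does not
    # require PyMOL; calling it does.
    from pymol import cmd

    cmd.ray(width, height)  # render at width x height pixels
    cmd.png(filename)       # write the image to the working directory
    return filename
```

Calling `save_render("lysozyme_surface.png")` after changing the representation would then save a second image without retyping both commands.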

Click S -> as -> licorice -> sticks. This explicitly shows the molecular structure of the amino acids within the protein, and therefore reveals which atoms are close to each other within the protein structure. This representation, however, does not contain any hydrogens (crystal structures typically do not include hydrogens). We can add these back in via the command A -> hydrogens -> add. You may notice that this command also makes it easier to see the water molecules in which the protein is solvated. To change back to the original view, click S -> as -> cartoon. Lastly, to see a more holistic view of the entire protein, we can choose S -> as -> surface. This representation allows us to see the full globular form of the protein but isn’t particularly useful for analyzing specific residue or atomic interactions.

![GTQBioS-Tutorial1-May15-pymol.jpg](attachment:ace1ed78-5dc8-4954-8e7e-2c26fc3fd785.jpg)

Figure 2: Hen egg white lysozyme (PDB: 1AKI).

# Challenge Problem 1: Exploring multiple ways to visualize a protein

Generate three distinct visual representations of lysozyme. Use the Settings tab in the toolbar to explore various options, rather than relying solely on default settings. Consider different ways to depict the enzyme’s shape, including any binding pockets (easier to see in surface or mesh representation), as well as more detailed atomic representations. If needed, you can include the earlier cartoon representation as one of the three images.

# Challenge Problem 1 Solution
It is important to be able to see both the shape of the protein and the approximate proportion of atoms. For instance, it is clear that the majority of the protein consists of nitrogen, oxygen, and carbon atoms, but sulfur atoms are also scattered throughout, albeit sparsely. This would not be visible when looking only at the cartoon view.

![GTQBioS-Tutorial1-May15-1.jpg](attachment:ec0d8b53-9875-465c-b7ca-c079be34cc89.jpg)


Click on the S shown in Figure 3, or go to Display -> sequence in the top toolbar. This will show you the amino acid sequence of the protein. Highlighting any part of this sequence will immediately highlight the corresponding residues within the visual interface. Highlighted residues will be temporarily stored in the variable ‘sele’, which can be manipulated in a similar way to the full protein.

# Challenge Problem 2: Observing Amino Acids

Identify and select the amino acids located within the beta-sheet of the hen lysozyme protein, represented by the two arrows in the cartoon depiction. Once you have visually identified your residues of interest, click on them to temporarily store them in the ‘sele’ variable. Copy these residues into a separate object (on your sele object, click A -> copy to object), display them in licorice format, and color them by element (C -> by element). Additionally, add hydrogen atoms to the structure and show any hydrogen bonds between the residues. To reveal the hydrogen bonds, navigate to the A -> Find -> Polar Contacts -> Intra-Main Chain option on your created object. Use the measurement wizard located in the top toolbar to measure the length of the hydrogen bonds present in this section of the protein structure.

# Tips for using the measurement wizard

The measurement wizard is useful for measuring distances, angles, dihedrals, and more. To measure the distance between two atoms, click Wizard -> measurement -> (click on the first atom) -> (click on the second atom). This will generate a new ‘measurement’ object, saved as ‘measure01’, which shows a dashed line between the two chosen atoms and its length in angstroms. This object can be manipulated in a similar way to your protein; click the object name to hide the measurement, or delete it completely by clicking A -> delete.
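The number reported by the wizard is simply the straight-line (Euclidean) distance between the two atomic positions. As a minimal sketch of that calculation, the snippet below uses made-up coordinates in angstroms; the function name `atom_distance` and the two positions are hypothetical and are not taken from the lysozyme structure.

```python
import math


def atom_distance(a, b):
    """Euclidean distance between two (x, y, z) positions.

    The result is in the same units as the input coordinates.
    """
    return math.dist(a, b)


# Hypothetical coordinates in angstroms, chosen for illustration only.
donor = (12.0, 4.0, 7.0)
acceptor = (14.0, 5.0, 9.0)

print(f"{atom_distance(donor, acceptor):.2f} angstroms")
```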

![GTQBioS-Tutorial1-May15-2.jpg](attachment:fb6c7cde-17cb-4d92-8eae-4c47ad9b19fb.jpg)

Figure 3: PyMOL mouse option descriptions. The red circle indicates the location of the sequence viewer. Clicking the button which says ‘Residues’ will also allow you to change the default selection from residues to atoms, chains, etc. Clicking the button next to ‘mouse mode’ will quickly allow you to change between 3-button viewing, 3-button editing, and 1-button viewing, among other settings.

# Challenge Problem 2 Solution

This can be accomplished manually by clicking on residues to select them (which can be good for visual understanding), but it can also be done quickly via the command ‘select chain A and ss S’ (it is convenient to use logical operators within PyMOL selections). Although understanding hydrogen bonding is outside the scope of this tutorial, this exercise provides an example of how to visualize potential interactions between residues.

![GTQBioS-Tutorial1-May15-3.jpg](attachment:b49573bb-d47b-4287-8521-94ef0a4987d2.jpg)

We will finish up this section by manipulating the representation of the enzyme around a certain point of interest. In doing so, it will also be very useful to learn how to manipulate temporary selections. Click on any residue within the protein. This will temporarily store the selection into the variable ‘sele’. Then click A -> modify -> around -> residues within 12 angstroms. Once done, a much larger number of residues should be selected. You can then change the representation of this selection. This technique will be very useful when analyzing amino acids near particular ligands, ions, or residues.

Throughout the first part of this tutorial, we have covered basic visual manipulation of proteins using PyMOL along with some brief basics on protein structure. The next sections will continue this trend and allow you to gain an intuition for observing potential intermolecular interactions and differences between proteins with similar structures.

# Adding Some Complexity

We will now investigate a protein in a complex with a substrate. This tutorial will help you better understand intermolecular interactions that can occur between proteins or within a protein-substrate complex.

Create a new PyMOL window (or delete everything in your current session). In the PyMOL command line, type

In [None]:
PyMOL> fetch 4I8N

![GTQBioS-Tutorial1-May15-4.jpg](attachment:9bb5711d-4517-4fcc-a056-fffe7f6e8df4.jpg)

Figure 4: PTP1B protein (PDB: 4I8N).

Color by secondary structure and observe your protein, which should look like Figure 4 (note: the colors used here are different from the default; feel free to explore different color palettes for your protein structure). This protein is called protein tyrosine phosphatase 1B (PTP1B), and it removes phosphate groups from phosphorylated tyrosine residues on other proteins. For the sake of this tutorial, it is important only to know that this protein can be inhibited, and that an inhibitor molecule is also found within this PDB file. Typically, when loading a PDB file containing both a protein and a ligand, the protein will be automatically visualized in ‘cartoon’ format and the ligand in ‘licorice’ format. This makes it easy to identify the ligand when the molecule is first loaded. PTP1B is an interesting protein because we can directly compare it to other proteins within the same PTP family. For instance, type into the command line

In [None]:
PyMOL> fetch 1L8K
PyMOL> align 4I8N, 1L8K

Here we have fetched another protein, a T-cell protein tyrosine phosphatase (TC-PTP), which is known to be homologous to PTP1B, with very similar secondary and tertiary structure [1]. We have also aligned the two proteins so that we can better see their similarity. It is easier to view the similarities by first hiding any ligands and then making sure that each protein is colored one color (rather than having differentiated colors for secondary structure, atom types, etc.). Your alignment should look like Figure 5. When we align these proteins, PyMOL immediately quantifies how close the objects are by calculating their root-mean-square deviation (RMSD), a measure of the average distance between corresponding atoms; the lower this number, the closer the two structures. The RMSD for this alignment is 0.556 Å (0.0556 nm), which is very small, as we would expect. It is a useful exercise to pick out which components of the proteins appear the most dissimilar. With proteins that have high sequence conservation, discrepancies are often due to missing residues within the PDB file (these will show as greyed letters within the sequence view in PyMOL). For instance, there is an α-helix contained within the PDB file for PTP1B (4I8N) that is not present within the TC-PTP structure (1L8K). Can you identify this helix?
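To make the RMSD value more concrete, here is a minimal sketch of the underlying formula, RMSD = sqrt((1/N) Σ |r_i − r'_i|²), applied to two lists of already paired and already superposed coordinates. It is not a reimplementation of PyMOL's `align`, which also pairs atoms by sequence, superposes the structures, and can discard outliers. The function names and toy coordinates are assumptions made for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) positions, assumed to be paired atom by atom and already
    superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def angstrom_to_nm(value):
    """Convert a length from angstroms to nanometers (1 nm = 10 angstroms)."""
    return value / 10.0


# Hypothetical toy coordinates in angstroms: every atom is shifted by 1 along z.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

print(rmsd(structure_a, structure_b))
print(angstrom_to_nm(0.556))
```

Because the deviations are squared before averaging, a few badly matched atoms (such as a loop that adopts a different conformation) raise the RMSD more than many small deviations do.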

When looking at an enzyme, we often care about the residues within the active site which are able to interact with associated ligand(s). Although interaction distances can vary across proteins due to potential catalytic movements, we can make a general estimate of how close a residue needs to be in order to have a potential interaction. For the sake of this tutorial, we will claim that any protein residue within 4 Å (1 Å = 10⁻¹⁰ m) of the ligand could potentially interact, and then we will visually investigate to see which ones are more likely candidates.
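The distance criterion above can be sketched in a few lines: a residue counts as a candidate if at least one of its atoms lies within the cutoff of at least one ligand atom. The snippet below is a simplified illustration under that assumption; the function name `residues_within`, the residue labels, and all coordinates are hypothetical and do not come from 4I8N.

```python
import math


def residues_within(ligand_atoms, protein_atoms, cutoff=4.0):
    """Return the sorted labels of residues that have at least one atom
    within `cutoff` (same units as the coordinates) of any ligand atom.

    ligand_atoms:  list of (x, y, z) positions
    protein_atoms: list of (residue_label, (x, y, z)) pairs
    """
    close = set()
    for residue, position in protein_atoms:
        if any(math.dist(position, lig) <= cutoff for lig in ligand_atoms):
            close.add(residue)
    return sorted(close)


# Hypothetical toy data in angstroms, for illustration only.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = [
    ("RES1", (3.0, 0.0, 0.0)),   # 1.5 from the second ligand atom
    ("RES1", (9.0, 0.0, 0.0)),   # far away, but RES1 already qualifies
    ("RES2", (0.0, 5.0, 0.0)),   # 5.0 from the nearest ligand atom
    ("RES3", (0.0, 0.0, -3.9)),  # 3.9 from the first ligand atom
]

print(residues_within(ligand, protein))
```

Raising the cutoff (for example to 6.0) would also pick up RES2 in this toy data, which illustrates why the chosen threshold should be followed by a visual inspection.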

![GTQBioS-Tutorial1-May15-5.jpg](attachment:3ac9738b-dd49-48b3-9e15-eca3b07c8d85.jpg)

Figure 5: Alignment of PTP1B (green) and TC-PTP (yellow).

# Challenge Problem 3: Isolating a ligand

Copy the inhibitor (the residues shown in sticks when you initially fetch the structure) to a new object and rename it. Change the color of this ligand and identify the amino acids within the PTP1B protein that could potentially interact with it by selecting the ligand and then using A -> modify -> around. Change the representation of these amino acids to licorice format. Find several residues that appear closest to the ligand (they will be on the inside of the protein). Use PyMOL to find any nearby polar contacts within the selection. Measure distances among the nearest neighbors. Save a png image of the ligand, its closest neighbors, and the corresponding atom distances.

# Challenge Problem 3 Solution

![GTQBioS-Tutorial1-May15-6.jpg](attachment:478a6797-be86-41dd-8d45-3df7c6b92cda.jpg)

Once we have a list of residues that may be interacting with the ligand, we should inspect them more thoroughly to determine what sorts of interactions could occur. Different amino acids exhibit different chemical properties. Although a more thorough explanation of chemical interactions is outside the scope of this tutorial, it is worth noting specifically that hydrophobic species generally do not interact favorably with polar or charged species. Polar species are more likely to form hydrogen bonds, while charged species can form electrostatic interactions. Determine what types of interactions are possible within the PTP1B protein-ligand complex. A figure containing amino acid properties is provided for your reference (Figure 6).

# Mini project: Understanding the SARS main protease

Now that you have sufficient skills in molecular visualization and a better idea of how atoms can interact within a larger protein, we will apply these skills to a viral protein.

We will be looking at the main SARS-CoV protease. This enzyme is responsible for processing the majority of the replication proteins generated from SARS-CoV [2]. As a result, inhibiting the functionality of this enzyme could serve as a method for mitigating SARS-CoV. Through this mini project, you will analyze this protein and associated mutants to make observations on how single point mutations can alter the functionality of this key enzyme.

![GTQBioS-Tutorial1-May15-7.jpg](attachment:7868ff01-855c-4ec0-a9fa-19b64450f849.jpg)

Figure 6: Venn diagram illustrating key chemical characteristics of amino acids. This figure can be used as a reference when looking into protein-protein interactions.

# Challenge Problem 4: Characterizing mutants

Fetch the PDB entry 6Y2E and download the mutants of this protein, chosen previously [2], from here. Find the mutations by comparing the sequences to the original structure, and use any PyMOL skills that you desire to determine how these mutations could affect the functionality of the protein. As a tip, the easiest way to quickly identify the differences visually is to manipulate the coloring and representation of each protein, ensuring that all structures are aligned. Take a picture of each mutated residue aligned with the original structure (there is only one per structure). Can you determine where potential active sites may be located? If so, take a picture of the potential sites (it may be easier to change the protein to a surface representation for this). Label each mutant object in the format AYYYB, where residue A at position YYY within the wild-type species is replaced by a mutated residue, B (e.g., A611V would mean the alanine at position 611 was mutated to a valine).
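Comparing sequences by eye becomes tedious for longer proteins, so the labeling convention can also be sketched in code. The snippet below is a minimal illustration that assumes two one-letter sequences of equal length (substitutions only, no insertions or deletions); the function name `point_mutations` and the short example sequences are hypothetical and are not the protease sequences.

```python
def point_mutations(wild_type, mutant, first_position=1):
    """Compare two equal-length one-letter sequences and return labels in
    the AYYYB format (wild-type residue, position, mutated residue).

    `first_position` is the residue number of the first letter, which is
    useful when a structure does not start at residue 1.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length (no insertions or deletions)")
    labels = []
    for offset, (wt, mt) in enumerate(zip(wild_type, mutant)):
        if wt != mt:
            labels.append(f"{wt}{first_position + offset}{mt}")
    return labels


# Hypothetical sequences, for illustration only.
wild_type_seq = "MKTAYIAKQR"
mutant_seq = "MKTVYIAKQR"

print(point_mutations(wild_type_seq, mutant_seq))
```

If the same fragment started at residue 608, calling `point_mutations(wild_type_seq, mutant_seq, first_position=608)` would label the substitution A611V, matching the example format above.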

Now that we have a grasp of how our mutants differ from each other, we will look at how the protease (and its mutants) interacts with an inhibitor, nirmatrelvir, the key component of the drug Paxlovid.

# Challenge Problem 5: Visualizing different interactions

Fetch the PDB entries 8DZ2, 7U28, 7U29, and 7TLL. Determine which mutants from the last challenge problem these entries correspond to. Examine interactions within the active site between each protein and the inhibitor and determine whether it is possible to see an immediate difference in the potential effectiveness of the inhibitor across the different proteins. Then, expand your gaze across the protein. These structures differ more than the previous ones, as they have been determined experimentally rather than generated solely using PyMOL mutations. Take pictures of any differences among the mutants that are of interest to you. In summary, after aligning the structures, find areas that do not appear the same and determine whether these differences can be directly traced to an amino acid mutation or whether they occur despite identical residue sequences at these locations.

Lastly, we will explore the interactions between this protease and nirmatrelvir along with the mechanistic impact of the mutations that we have chosen.

# Challenge Problem 6: Putting it together

Do some research and find how SARS interacts with nirmatrelvir and how mutating specific amino acids can alter these interactions (a good place to begin would be a recent study by Ullrich et al. [2]). Go back into the PDB entries from earlier and try to pinpoint these locations/residues of interest. Do these locations line up with the discrepancies that you imaged for the last problem?

💡🤔 Please take the quiz below to test your knowledge of PyMOL!

In [None]:
pip install jupyterquiz

In [5]:
from IPython.display import IFrame
IFrame('quizzes/Quiz1_a.html', width=800, height=400)