# **Submodule 3.3 - Peptide-based drug design**

## **Learning Objectives:**
Understand and apply the basic steps involved in peptide-based drug design.

## **Prerequisites:**
- Basics of protein and peptide structures.
- Understanding of biochemical pathways.
- Familiarity with content from Submodules 1, 2, 3.1, and 3.2.
- Basic understanding of PyMOL

## **Introduction**
This section explores the systematic process of peptide-based drug design, from initial target identification to final therapeutic optimization. Using PyMOL as our primary visualization platform, along with computational tools like AlphaFold and Cluspro, we'll investigate how to analyze protein structures and design effective peptide therapeutics. Through detailed examples, you'll learn to visualize protein-peptide interactions and apply various screening techniques including terminal truncation, peptide fragmentation, alanine scanning, and sequence shuffling to optimize peptide sequences. Building on these computational and experimental approaches, we'll examine how pharmacological and antiproliferative assays are used to evaluate drug candidates, using real-world decision-making scenarios to demonstrate the progression from initial design to final drug selection. You'll learn to interpret structural data and assay results, understand structure-activity relationships, and appreciate how this information guides the development of effective peptide-based drugs.

## **Peptide-based drug design**

As in the previous drug-design cases, the first step is to identify the biochemical pathway responsible for the disease state. Next, we identify the proteins that can be targeted to modulate that pathway. If the target protein is an enzyme, identify its binding site and a peptide that can occupy and inhibit it. In this case, the peptide must bind without undergoing the enzymatic reaction, so that no product is formed (see the section on enzyme-based drug design). If the target is an interaction between two proteins, we need the structural details of how the two proteins interact.

- **Scenario 1:** Known Protein-Protein Complex Structure
    - Analyze binding surface interface between proteins
    - Design peptide inhibitors based on interaction surface
    - Focus on key binding regions

- **Scenario 2:**  Unknown 3D Structure(s)
    - Generate protein models using AlphaFold for unknown structures
    - Use Cluspro to predict protein complex formation
    - Analyze predicted interaction interfaces


## **Introduction to Assays**
Assays are standardized laboratory tests used to measure the biological or biochemical activity of a substance, particularly in drug discovery and development. They provide crucial quantitative data about how potential drug compounds interact with their targets and affect biological systems. In drug development, multiple assays are typically performed in sequence to make informed decisions. For instance, when developing a new cancer therapeutic, researchers might first conduct a binding assay to measure target affinity, followed by cell-based assays to evaluate efficacy. Consider a case where three compound candidates show similar binding affinity to a cancer target protein, but cell-based assays reveal that only one compound effectively reduces cancer cell proliferation while sparing healthy cells. This compound would then progress to more detailed pharmacological and toxicity testing, demonstrating how assay data guides critical decision-making in drug development.

### **Pharmacological Assays**
Pharmacological assays are comprehensive, specialized tests designed to evaluate how drug compounds interact with and affect biological systems at multiple levels. These assays are crucial in early drug development for understanding not only how effectively a drug binds to its target, but also its broader biological impact, safety profile, and potential clinical viability. Through a systematic series of tests, researchers can build a complete profile of a drug's behavior, from molecular interactions to whole-system effects.

The assays include:
- <u>Binding assays</u>: Measure how well a drug binds to its target (e.g., receptors, enzymes)
- <u>Functional assays</u>: Assess the biological response triggered by drug-target interaction
- <u>ADME assays</u>: Evaluate drug absorption, distribution, metabolism, and excretion
- <u>Toxicity assays</u>: Determine potential harmful effects of drug compounds

Key parameters measured:
- Kd (dissociation constant)
- Ki (inhibition constant)
- EC50 (half maximal effective concentration)
- Bioavailability
- LD50 (lethal dose, 50%)
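
To make the meaning of Kd concrete, the sketch below computes the fraction of target occupied at a given ligand concentration. It assumes simple 1:1 equilibrium binding with the ligand in excess, so that occupancy equals [L] / ([L] + Kd). The function name and the concentrations are illustrative choices, not values from an assay.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Fractional target occupancy for 1:1 binding: [L] / ([L] + Kd)."""
    return ligand_nM / (ligand_nM + kd_nM)


# Compare two hypothetical binders (Kd = 1 nM and Kd = 2 nM)
for kd in (1.0, 2.0):
    for conc in (1.0, 10.0, 100.0):
        occupancy = fraction_bound(conc, kd)
        print(f"Kd = {kd} nM, [L] = {conc} nM -> {occupancy:.2f} of target bound")
```

When the ligand concentration equals Kd, half of the target is occupied. At concentrations well above Kd, a two-fold difference in Kd changes occupancy only slightly.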

#### Decision-Making Using Pharmacological Assays
Let's consider an example where we are developing a new receptor antagonist for neurological disorders. Three drug candidates (X, Y, and Z) were evaluated using multiple pharmacological assays. Here's a comparison of the results for Compound X and Compound Y:

| Compound | Binding Assay (Kd) | Functional Assay (Receptor Inhibition) | ADME Profile | Toxicity (LD50) |
|---|---|---|---|---|
| X | 2 nM | 85% | 60% oral bioavailability, 8-hour half-life | Well above therapeutic dose |
| Y | 1 nM | 90% | 20% oral bioavailability, 2-hour half-life | Well above therapeutic dose |


**Why Compound X is preferred over Compound Y:**

While Compound Y exhibits a slightly better binding affinity (Kd = 1 nM) and higher receptor inhibition (90%) compared to Compound X, it has significantly lower oral bioavailability (20% vs. 60%) and a shorter half-life (2 hours vs. 8 hours). These factors would likely result in less effective drug delivery to the target site and the need for more frequent dosing, respectively. Considering the overall pharmacological profile, Compound X emerges as the more promising candidate despite its marginally lower binding affinity and receptor inhibition. This example highlights the importance of considering multiple pharmacological parameters beyond simple binding affinity when selecting drug candidates for further development.
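
The effect of bioavailability and half-life can be illustrated with a rough calculation. The sketch below assumes equal oral doses, instantaneous absorption, and first-order elimination in a single compartment. These are simplifying assumptions made for illustration and not part of the assay data. Only the bioavailability and half-life values come from the table above.

```python
def fraction_remaining(hours: float, half_life_h: float) -> float:
    """First-order elimination: fraction of drug left after `hours`."""
    return 0.5 ** (hours / half_life_h)


def relative_level(bioavailability: float, half_life_h: float, hours: float) -> float:
    """Fraction of the oral dose still in circulation after `hours`."""
    return bioavailability * fraction_remaining(hours, half_life_h)


# (oral bioavailability, half-life in hours) taken from the comparison table
candidates = {"X": (0.60, 8.0), "Y": (0.20, 2.0)}

for name, (bioavailability, half_life) in candidates.items():
    levels = [relative_level(bioavailability, half_life, t) for t in (0, 4, 8)]
    print(name, [round(level, 3) for level in levels])
```

Under these assumptions, Compound X still has 30% of the dose in circulation after 8 hours, while Compound Y has dropped to 5% after only 4 hours.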


### **Antiproliferative Activity Assay**
Antiproliferative activity assays are specialized biological tests designed to quantify a compound's ability to inhibit cell growth or division. These assays are particularly critical in cancer drug development, where the primary goal is often to selectively stop or slow the growth of cancer cells while minimizing effects on healthy cells. By employing multiple complementary methods, researchers can build a comprehensive understanding of how a compound affects cell proliferation over different time scales and through various mechanisms.

Common assays include:
- <u>MTT assay</u>: Measures cell viability through metabolic activity
- <u>BrdU assay</u>: Detects DNA synthesis in proliferating cells
- <u>Colony formation assay</u>: Evaluates long-term growth inhibition

Key parameters measured:
- IC50 (half maximal inhibitory concentration)
- Cell growth inhibition percentage
- Time-dependent effects

#### Decision-Making Using Antiproliferative Activity Assays
During the development of a breast cancer drug, three compounds (Z, A, and B) were evaluated using multiple antiproliferative assays. Here's a comparison of the results for Compound Z and Compound A:

| Compound | MTT Assay (IC50) | BrdU Assay (DNA Synthesis Reduction) | Colony Formation Assay (Inhibition) | Effect on Normal Cells |
|---|---|---|---|---|
| Z | 200 nM | 75% at 500 nM | Complete at 1 µM after 14 days | Minimal |
| A | 150 nM | 80% at 400 nM | Complete at 800 nM after 14 days | Significant |


**Why Compound Z is preferred over Compound A:**

While Compound A exhibits a slightly better potency in the MTT assay (IC50 = 150 nM) and BrdU assay, it also demonstrates a significant effect on normal cells, indicating potential toxicity. Compound Z, on the other hand, shows minimal effects on normal cells, making it a more selective and safer candidate. Although Compound Z has a slightly higher IC50 in the MTT assay, its superior selectivity for cancer cells and comparable antiproliferative effects in the other assays make it the preferred candidate for further development. This illustrates how multiple antiproliferative assays provide crucial information for better decision-making in cancer drug development, considering not only potency but also selectivity and safety.


## **Screening Techniques**
Screening techniques in peptide-based drug design are systematic methods used to optimize peptide sequences and understand structure-activity relationships. These approaches are essential for developing more effective and drug-like peptides by identifying critical structural elements and improving pharmaceutical properties.

### Terminal Truncation
Terminal truncation is a fundamental peptide optimization strategy that systematically removes amino acids from either end of a peptide sequence. This method helps identify the minimal bioactive sequence, potentially improving drug-like properties.

Key aspects include:
- Systematic N-terminal or C-terminal amino acid removal
- Analysis of activity retention after each truncation
- Identification of minimal active sequence
- Optimization of peptide length

Example:

Full sequence:     YGRKKRRQRRR<br>
N-term truncation: -GRKKRRQRRR<br>
                   --RKKRRQRRR<br>
                   ---KKRRQRRR<br>
C-term truncation: YGRKKRRQR--<br>
                   YGRKKRRQ---<br>
                   YGRKKRR----<br>
Found minimum active sequence: GRKKRRQ
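
The truncation series above can be generated programmatically. The following is a minimal sketch that removes one residue at a time from either terminus. The function names and the `min_len` cutoff are illustrative choices.

```python
def n_terminal_truncations(seq: str, min_len: int = 1) -> list[str]:
    """Remove residues one at a time from the N-terminus."""
    return [seq[i:] for i in range(1, len(seq) - min_len + 1)]


def c_terminal_truncations(seq: str, min_len: int = 1) -> list[str]:
    """Remove residues one at a time from the C-terminus."""
    return [seq[:j] for j in range(len(seq) - 1, min_len - 1, -1)]


full_sequence = "YGRKKRRQRRR"

print("N-terminal series:")
for peptide in n_terminal_truncations(full_sequence, min_len=7):
    print("  " + peptide.rjust(len(full_sequence), "-"))

print("C-terminal series:")
for peptide in c_terminal_truncations(full_sequence, min_len=7):
    print("  " + peptide.ljust(len(full_sequence), "-"))
```

Each truncated peptide would then be synthesized and assayed. The code only enumerates the candidates and says nothing about their activity.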

#### Decision-Making Using Terminal Truncation
Consider a 15-residue peptide inhibitor:

Results:
- Full sequence shows IC50 = 100 nM
- N-terminal truncation retains activity as long as residue 4 is present (residues 1-3 can be removed)
- C-terminal truncation retains activity as long as residue 12 is present (residues 13-15 can be removed)
- Minimum active sequence: residues 4-12

Based on these results, we create a truncated 9-residue peptide (residues 4-12), which exhibits IC50 = 80 nM. The shorter peptide would be selected for further development because it retains activity and is easier to synthesize.

### Peptide fragmentation
Peptide fragmentation is a systematic method that breaks down peptides into overlapping segments to identify regions crucial for biological activity. This approach helps map the functional domains within larger peptide sequences.

Key aspects include:
- Generation of overlapping peptide segments
- Evaluation of each fragment's activity
- Identification of bioactive regions
- Mapping of functional domains

Example:

Original sequence: FLPVLAQFVLL (11-residue peptide)

Fragment 1 (1-6):  FLPVLA-----<br>
Fragment 2 (3-9):  --PVLAQFV--<br>
Fragment 3 (5-11): ----LAQFVLL<br>
Most active fragment identified: PVLAQFV
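
Overlapping fragments like these can be enumerated with a sliding window. The sketch below uses a fixed window length of 7 and a step of 2, both illustrative choices. With a fixed length, the first window is the seven-residue FLPVLAQ and not a six-residue fragment.

```python
def sliding_fragments(seq: str, length: int, step: int) -> list[tuple[int, int, str]]:
    """Return (start, end, fragment) tuples with 1-based inclusive positions."""
    return [
        (i + 1, i + length, seq[i:i + length])
        for i in range(0, len(seq) - length + 1, step)
    ]


parent = "FLPVLAQFVLL"
for start, end, fragment in sliding_fragments(parent, length=7, step=2):
    # Pad with dashes so each fragment lines up under the parent sequence
    aligned = "-" * (start - 1) + fragment + "-" * (len(parent) - end)
    print(f"Fragment ({start}-{end}): {aligned}")
```

A smaller step gives more overlap and finer mapping of the active region, at the cost of more peptides to synthesize and test.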

#### Decision-Making Using Peptide fragmentation
Consider a 20-residue peptide inhibitor:
- Full sequence shows IC50 = 50nM
- Five overlapping fragments (10 residues each) were generated
- Fragment 2 (residues 5-14) shows IC50 = 75nM

The other fragments show minimal activity, so fragment 2 was identified as the primary bioactive region and selected for further optimization.

### Alanine Scanning
Alanine scanning is a precise mutational analysis technique where each amino acid is systematically replaced with alanine to determine its contribution to peptide activity.

Key aspects include:
- Sequential alanine substitution
- Activity measurement after each substitution
- Identification of critical residues
- Structure-function mapping

Example:

Original:     KLWVRIPKLL<br>
Position 1:   ALWVRIPKLL (K→A)<br>
Position 2:   KAWVRIPKLL (L→A)<br>
Position 3:   KLAVRIPKLL (W→A)<br>

#### Decision-Making Using Alanine Scanning
Consider a 10-residue peptide antagonist:
- Original sequence shows Kd = 25nM
- Alanine substitution at positions 3, 6, and 8 causes >10-fold activity loss
- Positions 1, 2, and 10 tolerate substitution

The other positions show moderate effects. These results identify three essential residues (positions 3, 6, and 8) for maintaining activity and guide further optimization efforts.

### Shuffled Sequence
Shuffled sequence analysis examines how amino acid order affects peptide activity while maintaining the same composition, helping optimize sequence arrangement. Additionally, shuffled sequences serve as valuable controls in peptide studies, as they maintain identical amino acid composition but typically lack biological activity, helping validate sequence-specific effects of the original peptide.

Key aspects include:

- Systematic sequence rearrangement
- Activity comparison
- Stability assessment
- Structure-activity correlation
- Control sequence validation

Example:

Original: DFKNLRPVWY<br>
Variant 1: KNDFWPRVLY<br>
Variant 2: WVPDFKNLRY<br>
Variant 3: RPWDFKNLVY<br>
Best variant: RPWDFKNLVY (improved stability)<br>
Control: YWVPRNFKLD (inactive scrambled sequence)
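
A shuffled variant or control is only valid if it has exactly the same amino acid composition as the original. The sketch below checks this for the sequences listed above. The helper name `same_composition` is an illustrative choice.

```python
from collections import Counter


def same_composition(seq_a: str, seq_b: str) -> bool:
    """True if both sequences contain the same residues in the same counts."""
    return Counter(seq_a) == Counter(seq_b)


original = "DFKNLRPVWY"
candidates = {
    "Variant 1": "KNDFWPRVLY",
    "Variant 2": "WVPDFKNLRY",
    "Variant 3": "RPWDFKNLVY",
    "Control": "YWVPRNFKLD",
}

for label, sequence in candidates.items():
    status = "same composition" if same_composition(original, sequence) else "MISMATCH"
    print(f"{label}: {sequence} -> {status}")
```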

### Decision-Making Example Using Multiple Screening Techniques
During optimization of a therapeutic peptide:
1. Alanine scanning identified three critical residues
2. Terminal truncation reduced sequence from 20 to 12 residues
3. Shuffled variants explored alternative arrangements

Optimized peptide outcome:
- 2-fold improved activity
- 3-fold better stability
- Reduced synthesis costs

This comprehensive screening approach led to an optimized candidate with improved drug-like properties while maintaining biological activity.

-------------------

# 📊 Tutorials
In these tutorials, we will use PyMOL and AutoDock to work through <u>**five**</u> applied activities to:
- <u>Activities 1-2</u>: Design a peptide-based drug to inhibit the CD2-CD58 protein-protein interaction for rheumatoid arthritis
- <u>Activities 3-5</u>: Analyze the 3D structure of a peptide, optimize its sequence using screening techniques, and evaluate its potential as a therapeutic agent.

## Before you begin:
- Run PyMOL GUI by following the directions provided in the Submodule 0 notebook, provided here: [pymol_notebook](../submodule0_pymol_setup/pymol_notebook.ipynb)
  

## 🌟 **Activity 1: Visualizing CD2-CD58 Interaction Using PyMOL**
Rheumatoid arthritis is an autoimmune disease in which the immune system attacks cells in the joints, causing inflammation and deformation of the synovial membrane. This results in pain, swelling, and difficulty moving the joints. The disease is known to start with the presence of rheumatoid factor in the body. Rheumatoid factor induces production of antibodies against collagen, and the joints are then attacked by T cells and antibodies, causing inflammation. When T cells recognize antigen-presenting cells, several adhesion molecules take part in generating the immune response. In the first step, the protein CD2 on T cells binds to CD58 on antigen-presenting cells. This protein-protein interaction signals into the T cell via the cytoplasmic tail of CD2, leading to production of inflammatory cytokines and hence to joint inflammation and an immune response. CD58 is known to be highly expressed in the joints of arthritis patients. If we can inhibit the CD2-CD58 interaction, we can modulate the immune response, reduce inflammation, and hence slow the progression of arthritis.

<details>
  <summary>Click to see Reference</summary>
  Wang, J.-H., Smolyar, A., Tan, K., Liu, J.-H., Kim, M., Sun, Z.J., Wagner, G., Reinherz, E.L. Structure of a Heterophilic Adhesion Complex Between the Human CD2 and CD58(LFA-3) Counter-Receptors. (1999) <i>Cell</i> 97: 791-803.  

</details>

### **Objective:** <br>
Analyze the interaction between *CD2* and *CD58 (LFA-3)* in PyMOL to explore their structural and polar interactions. In this particular case the crystal structure of the complex of CD2-CD58 is available.

### **Steps to Complete this Activity**:

#### Step 1. **Fetch the structure**:<br>
   `fetch 1QA9`

#### Step 2. **Inspect the asymmetric unit**:
1. The asymmetric unit contains *two complexes* of CD2 and CD58, giving four chains in total (*A*, *B*, *C*, and *D*); each complex consists of two chains.
     - Subunits *A* and *B* == CD2 and CD58 of one complex.
     - Subunits *C* and *D* == CD2 and CD58 of the second complex.

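#### Step 3. **Remove one of the two complexes**:
1. Remove chains *C* and *D* (`remove` deletes the selected atoms, whereas `delete` acts on whole objects and named selections):<br>
     `remove chain C`<br>
     `remove chain D`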

#### Step 4. **Focus on chains A and B**:
1. Select chains *A* and *B* to work with the remaining complex:<br>
```select complex, chain A+B```

#### Step 5. **Identify polar contacts**:
1. Use the GUI to find polar contacts within the selection:<br>
     `A > find > polar contacts > within selection`

#### Step 6. **Visualize the interactions**:
1. Display the polar contacts as lines:<br>
     `S > lines`

#### Step 7. **Observations**:
While exploring the polar contacts in PyMOL, focus on the interaction between **CD2 (chain A)** and **CD58 (chain B)**. Pay close attention to the following:

- <u>Structural Interface</u>: Observe how the two chains interact at the molecular level, identifying regions of close proximity that contribute to the stability of the interaction.

- <u>Key Residues Involved</u>: Identify the specific amino acid residues from both chains that participate in the polar interactions and contribute to binding.

To analyze these interactions effectively:  

1. Use the *distance measurement tool* in PyMOL to quantify key interactions.  
2. Utilize the `find polar contacts` feature to highlight hydrogen bonds and other interactions.  
3. Adjust visualization settings (e.g., transparency, coloring) to better observe the binding interface.
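
The GUI steps above can also be written as PyMOL command-line equivalents. The sketch below only builds and prints the command strings so they can be pasted into the PyMOL command line. The object name `polar_AB` and the helper function are illustrative, and `mode=2` in `distance` restricts the measurement to polar contacts between the two selections.

```python
def interface_commands(pdb_id: str = "1QA9", keep: str = "A+B",
                       drop: str = "C+D", name: str = "polar_AB") -> list[str]:
    """Build PyMOL commands that isolate one complex and show its polar contacts."""
    first_chain, second_chain = keep.split("+")
    return [
        f"fetch {pdb_id}",
        f"remove chain {drop}",               # keep a single CD2-CD58 complex
        f"select complex, chain {keep}",
        # mode=2 limits the distance object to polar contacts between chains
        f"distance {name}, chain {first_chain}, chain {second_chain}, mode=2",
        "zoom complex",
    ]


commands = interface_commands()
print("\n".join(commands))
```

Measuring contacts between chain A and chain B directly highlights only the interface, whereas the GUI option `within selection` also reports contacts inside each chain.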

<details>
  <summary>Result</summary>
  We have generated the interface of CD2-CD58 interaction using PyMol here. <br>
Blue sticks are amino acid residues from CD2 protein; magenta sticks are amino acid residues from CD58. Yellow lines represent hydrogen bonds.<br>
    <center><img src="images/submodule3.3_activity1.png" width=500 /></center><br><br>
  
</details>  

------------
## 🌟 **Activity 2: Analyzing Peptide Candidates**
To inhibit this complex, we can design a small peptide that binds to either the CD2 or the CD58 surface and prevents the CD2-CD58 complex from forming. If we identify the key interactions in the complex, a peptide containing the corresponding residues can be selected as the drug candidate. Because the protein-protein interface is large, identifying the key interactions can be difficult. One way to identify the key amino acids is to mutate individual residues and measure how each mutation affects complex formation. Such mutations were performed, and the results are reported in the paper "Structure of a Heterophilic Adhesion Complex Between the Human CD2 and CD58(LFA-3) Counter-Receptors". To simplify that table, the figure below shows the amino acids of the CD2 and CD58 adhesion domains. The mutated amino acids are marked, with font size and bolding indicating the effect of the mutation. Amino acids that are not bolded are those whose mutation had no effect on binding.

<center><img src="images/Arthritis.png" width=500 /></center><br><br>


Based on the interactions above, a peptide made of residues 30 to 48 of CD2 can be designed to mimic CD2.
Such a peptide can bind to CD58 and block the CD2 binding site, so the CD2-CD58 interaction is inhibited.
However, the peptide has 18 amino acids and a beta-sheet structure, so we have to shorten it by looking at the 3D structural details.
In PyMOL, select residues 30 to 48 of CD2 and remove all the other parts of the protein.



<center><img src="images/large_peptide.png" width=200 /></center><br>

Now look at the secondary structure of the binding region of CD2 shown above. It has a beta-sheet structure with a beta-turn at the bottom of the figure. The shortened design must include the amino acids that are important for binding while keeping the beta-sheet structure: residues 31-37 and 42-49.


<center><img src="images/small_peptide.png" width=400 /></center><br>
    
This produces two separate chains that need to be connected. Connecting them with a Pro-Gly or Pro-Pro sequence introduces a beta-turn between the two strands. By introducing either Pro-Pro or Pro-Gly between K37 and K43, a cyclized peptide can be created, resulting in the structure below.
    
<details>
  <summary>References</summary>
    1) Raghothama S, Awasthi S. β-Hairpin nucleation by Pro-Gly β-turns. Comparison of D-Pro-Gly and L-Pro-Gly sequences in an apolar octapeptide. Journal of the Chemical Society, Perkin Transactions 2. 1998(1):137-44.   https://pubs.rsc.org/en/content/articlelanding/1998/p2/a703331a

    2) Kumar S, Borish K, Dey S, Nagesh J, Das A. Sequence dependent folding motifs of the secondary structures of Gly-Pro and Pro-Gly containing oligopeptides. Physical Chemistry Chemical Physics. 2022;24(30):18408-18.
    
    https://pubmed.ncbi.nlm.nih.gov/35880873/     DOI: 10.1039/d2cp01306a  
</details>  

<center><img src="images/pro_gly.png" width=400 /></center><br>

Several viable peptide sequences can be generated this way. The one we will use going forward is the following: <br><br>
Cyclo(DDIKWEKKIAQFRKPG)    


### **Objective:**
Now that we have a peptide candidate, we must confirm that it is able to inhibit the protein-protein complex.

#### **Step 1: Generate 3-D structures using AlphaFold**
1. Go to the AlphaFold Server: https://alphafoldserver.com/fold/7c80de223b02907d
2. Continue with Google and accept the terms.
3. Provide the peptide sequence DDIKWEKKIAQFRKPG (enter the linear one-letter sequence of Cyclo(DDIKWEKKIAQFRKPG)).
4. Continue to preview the job and submit it.
5. Once the job is completed, its date and time are shown. Click that entry and use the download option to download the zip folder.
6. The downloaded folder, named fold-xxxx, contains several structures named model 0, model 1, etc. You can check the quality of a structure using a Ramachandran plot.

#### **Step 2: Dock the peptide to CD58 and observe the binding.**
1. Create receptor file <br>
    `PyMOL>fetch 1QA9`<br>
    `PyMOL>remove chain A`<br>
    `PyMOL>remove chain C`<br>
    `PyMOL>remove chain D`<br>
    `PyMOL>File>Export Molecule>Save>CD58receptor.pdb`<br>
2. Grid Creation<br>
    * Load and save files for AutoDock<br>
    `Autodock>File>Read Molecule>CD58receptor.pdb`<br>
    `Autodock>Grid>Select Molecule>Save Molecule>CD58receptor.pdbqt`<br><br>
    * Create Gridbox <br>
    `Autodock>Grid>Set Map Types>Directly>Accept`<br>
    `Autodock>Grid>Gridbox` <br>
    Create the grid box around the entire protein and save as CD58receptor.gpf<br><br>
    * Preparing the Ligand (pepligand1.pdb is the peptide model from Step 1, saved in PDB format) <br>
    `Autodock>Ligand>Input>Open>pepligand1.pdb` <br>
    `Autodock>Ligand>Torsion Tree>Choose torsions` <br>
    Save the file as pepligand1.pdbqt <br>
    `Autodock>Grid>Macromolecule>Open CD58receptor.pdbqt` <br>
    A message asking whether to keep the existing charges or use new charges will appear; keep the old charges.<br>
    `Autodock>Grid>Open gpf>Select CD58receptor.gpf` <br><br>
    * Calculating Grid <br>
    `Autodock>Run>Run Autogrid>Select autogrid4` <br>
    The input file is CD58receptor.gpf<br>
    The output is automatically set to CD58receptor.glg<br>
    Launch the Autogrid job (can take between 1 and 30 minutes)<br><br>
   
3. Docking <br>
     `Autodock>Docking>Macromolecule>Set Rigid Filename CD58receptor.pdbqt`<br>
     `Autodock>Docking>Ligand>Open pepligand1.pdbqt`<br> 
     Set search parameters: 10 GA runs and 2.5 million max evaluations <br>
     `Autodock>Docking>Output>Lamarckian GA`<br>
     Save the file as CD58receptordock.dpf  (dpf = docking parameter file)<br>
     Run AutoDock by selecting autodock4, the parameter file CD58receptordock.dpf, and the output file CD58receptordock.dlg, then launch
     
#### **Step 3: Docking Analysis**
1. Open the docking output file CD58receptordock.dlg
2. Open the CD58receptor.pdbqt file
3. `Autodock>Analyze>Conformations>Play, ranked by energy`
4. Select Show Info. By playing through the list or entering 1, 2, 3, ..., the low-energy docked conformations can be displayed one at a time.
5. Write the selected conformation to the pdbqt file pep1ligdock1.pdbqt 

#### **Step 4: Create control peptide** <br>
Next, we will evaluate the peptide's inhibitory activity. These experiments need a control, which is generated by randomly shuffling the peptide sequence. Execute the code cell below to generate a random sequence. The cell shuffles the example sequence DFKNLRPVWY; set `original_sequence` to the designed peptide to generate its control.


In [2]:
import random

def generate_shuffled_sequences(input_string, n_sequences=5):
    # Convert string to list of characters
    char_list = list(input_string)
    shuffled_sequences = set()  # Use set to ensure uniqueness
    max_attempts = n_sequences * 10  # Prevent infinite loop
    attempts = 0

    # Add original sequence to prevent it from appearing in shuffled versions
    shuffled_sequences.add(input_string)

    while len(shuffled_sequences) < n_sequences + 1 and attempts < max_attempts:
        # Create a copy of the character list and shuffle it
        temp_list = char_list.copy()
        random.shuffle(temp_list)
        shuffled = ''.join(temp_list)
        shuffled_sequences.add(shuffled)
        attempts += 1

    # Remove the original sequence
    shuffled_sequences.remove(input_string)

    # Convert to list and return only n sequences
    return list(shuffled_sequences)[:n_sequences]

original_sequence = "DFKNLRPVWY"
n = 1
shuffled_sequences = generate_shuffled_sequences(original_sequence, n)

# Print results
print(f"Original sequence: {original_sequence}\n")
print("Shuffled sequences:")
for i, sequence in enumerate(shuffled_sequences, 1):
    print(f"Variant {i}: {sequence}")


Original sequence: DFKNLRPVWY

Shuffled sequences:
Variant 1: YPRVFNWLKD


#### **Step 5: Evaluate the peptide's ability to inhibit the protein-protein complex**
Here, we demonstrate a cell adhesion assay for inhibition of the CD2-CD58 interaction by the designed peptides. In this assay, adhesion between T cells and either epithelial cells or human fibroblast-like synoviocytes from rheumatoid arthritis (HFLS-RA cells) is monitored, and inhibition of this adhesion is used to evaluate the pharmacological activity of the designed molecules. T cells are labeled with a fluorescent label, and HFLS-RA cells or epithelial cells are coated on the plates. When T cells are added to the HFLS-RA or epithelial cells, they adhere to one another. If the added peptides are inhibitors, adhesion between the two cell types is reduced, and this reduction is monitored by fluorescence intensity. The data provided are the log concentration of the peptide and the percentage of cell adhesion inhibition. Plot the graph, obtain the IC50, and compare the IC50 values of the two compounds.




In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV files
df1 = pd.read_csv('data/Plot3.2.1Data.csv')
df2 = pd.read_csv('data/Plot3.2.2Data.csv')

# Create a figure with two subplots side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Inhibition by the designed peptide (Peptide A)
ax1.scatter(df1['Log con'], df1['Inhibition activity'], label = 'Peptide A')
ax1.set_xlabel('Log(Concentration)')
ax1.set_ylabel('Relative % of Cell Adhesion Inhibition')
ax1.set_title('Cell Adhesion Inhibition vs log(Concentration)')
ax1.legend()
ax1.grid(True)

# Plot 2: Inhibition by the control peptide
ax2.scatter(df2['log Con'], df2['Inhibition actiity'], label='Control Peptide')
ax2.set_xlabel('Log(Concentration)')
ax2.set_ylabel('Relative % of Cell Adhesion Inhibition')
ax2.set_title('Cell Adhesion Inhibition vs log(Concentration)')
ax2.legend()
ax2.grid(True)

# Adjust the spacing between subplots
plt.tight_layout()

# Show the plots
plt.show()
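
Once the curve is plotted, the IC50 can be estimated as the concentration at which inhibition crosses 50%. The sketch below uses simple linear interpolation on the log-concentration axis instead of a full sigmoidal curve fit. The data points are synthetic and for illustration only, and the concentrations are assumed to be log10 of molar values. Replace them with the columns read from the CSV files.

```python
import math


def ic50_from_curve(log_conc, inhibition, level=50.0):
    """Interpolate the concentration at which inhibition crosses `level` percent.

    Returns the concentration in the same units as 10**log_conc,
    or None if the curve never crosses the requested level.
    """
    points = sorted(zip(log_conc, inhibition))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        crosses = (y0 - level) * (y1 - level) <= 0
        if crosses and y0 != y1:
            log_ic50 = x0 + (level - y0) * (x1 - x0) / (y1 - y0)
            return 10 ** log_ic50
    return None


# Synthetic example data (NOT assay results): log10(M) vs % inhibition
log_conc = [-8.0, -7.5, -7.0, -6.5, -6.0, -5.5, -5.0]
inhibition = [5.0, 12.0, 28.0, 45.0, 65.0, 82.0, 93.0]

ic50 = ic50_from_curve(log_conc, inhibition)
print(f"log10(IC50) = {math.log10(ic50):.3f}, IC50 = {ic50 * 1e6:.2f} µM")
```

A control peptide that never reaches 50% inhibition returns `None`, which is itself a useful result: its IC50 lies above the highest concentration tested.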

#### **Step 6: Identify key Amino Acids via Alanine Scanning**
Execute the code cell below to generate the sequences needed for alanine scanning.

In [None]:
# Alanine screening
def alanine_scan(peptide_sequence):
    """
    Perform alanine scanning mutagenesis on a peptide sequence

    Args:
        peptide_sequence (str): Original peptide sequence

    Returns:
        list: List of peptide sequences with each amino acid replaced by Alanine
    """
    # List to store all alanine-substituted sequences
    alanine_substitutions = []

    # Iterate through each position in the peptide
    for position in range(len(peptide_sequence)):
        # Create a list from the original sequence
        mutated_sequence = list(peptide_sequence)

        # Replace the amino acid at current position with Alanine
        mutated_sequence[position] = 'A'

        # Convert back to string
        alanine_substitution = ''.join(mutated_sequence)

        # Add to list of substitutions
        alanine_substitutions.append({
            'position': position + 1,  # 1-based indexing
            'original_aa': peptide_sequence[position],
            'substituted_sequence': alanine_substitution
        })

    return alanine_substitutions
example = alanine_scan("DDIKWEKKIAQFRKPG")
for i in example:
    print(i)

| Peptide sequence   | Cell adhesion inhibition IC50 (µM)    |
|--------------------|---------------------------------------|
| DDIKWEKKIAQFRKPG   | 0.5                                   |
| ADIKWEKKIAQFRKPG   | >50                                   |
| DAIKWEKKIAQFRKPG   | 0.2                                   |
| DDAKWEKKIAQFRKPG   | >50                                   |
| DDIAWEKKIAQFRKPG   | >50                                   |
| DDIKAEKKIAQFRKPG   | >50                                   |
| DDIKWAKKIAQFRKPG   | >50                                   |
| DDIKWEAKIAQFRKPG   | >50                                   |
| DDIKWEKAIAQFRKPG   | 5                                     |
| DDIKWEKKAAQFRKPG   | 2                                     |
| DDIKWEKKIAQFRKPG   | 0.5                                   |
| DDIKWEKKIAAFRKPG   | 0.6                                   |
| DDIKWEKKIAQARKPG   | 5                                     |
| DDIKWEKKIAQFAKPG   | 1                                     |
| DDIKWEKKIAQFRAPG   | 0.5                                   |
| DDIKWEKKIAQFRKAG   | 1                                     |
| DDIKWEKKIAQFRKPA   | 0.8                                   |

Alanine Scanning Outcome:
* Replacement of N-terminal amino acids (positions 1 and 3-7) with Ala results in loss of activity (IC50 > 50 µM).
* Replacement of the proline that forms the beta-turn also results in a slight loss of activity (IC50 of 1 µM vs. 0.5 µM).
* Replacement of the second amino acid, Asp, by alanine results in an increase in activity (IC50 of 0.2 µM).
* Most C-terminal amino acid residues do not contribute significantly to the activity; the largest change in this region is for Phe12 (IC50 of 5 µM).
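
These outcomes can be checked by expressing each result as a fold change relative to the parent peptide (IC50 = 0.5 µM). The sketch below uses the IC50 values from the table above. Values reported as ">50" are treated as a lower bound of 50 µM, and the classification thresholds (10-fold, 2-fold, and 0.5-fold) are illustrative assumptions, not part of the original analysis.

```python
REFERENCE_IC50_UM = 0.5  # parent peptide DDIKWEKKIAQFRKPG

# (position, original residue, IC50 in µM, True if reported as ">50")
SCAN = [
    (1, "D", 50.0, True), (2, "D", 0.2, False), (3, "I", 50.0, True),
    (4, "K", 50.0, True), (5, "W", 50.0, True), (6, "E", 50.0, True),
    (7, "K", 50.0, True), (8, "K", 5.0, False), (9, "I", 2.0, False),
    (10, "A", 0.5, False), (11, "Q", 0.6, False), (12, "F", 5.0, False),
    (13, "R", 1.0, False), (14, "K", 0.5, False), (15, "P", 1.0, False),
    (16, "G", 0.8, False),
]


def classify(fold_change, large=10.0, moderate=2.0, gain=0.5):
    """Label a fold change in IC50 relative to the parent peptide."""
    if fold_change >= large:
        return "large loss"
    if fold_change >= moderate:
        return "moderate loss"
    if fold_change <= gain:
        return "gain"
    return "tolerated"


summary = {}
for position, residue, ic50_um, is_lower_bound in SCAN:
    fold = ic50_um / REFERENCE_IC50_UM
    label = classify(fold)
    summary.setdefault(label, []).append(f"{residue}{position}")
    prefix = ">" if is_lower_bound else ""
    print(f"{residue}{position}A: {prefix}{fold:.1f}-fold -> {label}")

for label, residues in summary.items():
    print(f"{label}: {', '.join(residues)}")
```

Position 10 is already alanine in the parent peptide, so its entry is the unchanged sequence and serves as an internal reference.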


**Activity Outcome:** <br>
In this activity, we designed peptides to inhibit protein-protein interactions of CD2 and CD58 based on the 3D structures and mutation analysis data of two important proteins in the immune response.  Starting from the interface residues that are important in binding, we identified beta-sheet secondary structure peptides and applied conformational constraints to the peptide using Pro-Gly and Pro-Pro sequences. The cyclic peptide designed exhibited inhibition activity with IC50 of 0.5 µM. Further analysis was done using alanine scanning, and the most potent peptide was designed.

---------------
## 🌟 **Activity 3: Obtain the 3D Structure and Secondary Structure Information of a Peptide Using Google Colab**


### **Objective:** Obtain and analyze the protein with the sequence: <br>
RDIVSSDFLSNMSMDFQNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGCTGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYVVTDHGSCVRA  

#### **Step 1: Obtain PDB File**<br>
1. Go to the AlphaFold Server: https://alphafoldserver.com/fold/7c80de223b02907d
2. Select "Continue with Google" and accept the terms.
3. Provide protein sequence: Cyclo(DDIKWEKKIAQFRKPG) 
4. Continue to preview job and submit.
5. Once the job is completed, it is listed with its date and time. Click that entry and use the download option to download the zip folder.
6. The downloaded folder, named fold-xxxx, contains several structures named model 0, model 1, etc. You can check the quality of a structure using a Ramachandran plot.
<details>
  <summary>Click to see protein</summary>
    <center><img src="images/alphafold_activity3.png" width=400 /></center><br><br>
</details>  
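
A Ramachandran plot is a scatter of the backbone dihedral angles phi and psi for each residue. The sketch below shows, as a minimal illustration, how one such angle can be computed from four atom positions using only the standard library. The function name `dihedral` and the coordinates are made up for this example; in practice the four points would be backbone atoms read from a downloaded model (C, N, CA, C for phi and N, CA, C, N for psi).

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)

    # Unit vector along the central bond
    norm = math.sqrt(dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)

    # Components of the outer bonds perpendicular to the central bond
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: a planar trans arrangement and a 90 degree twist
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
twist = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(f"trans: {abs(trans):.1f} degrees, twist: {twist:.1f} degrees")
```

Applying the same function to every residue of a model gives the phi and psi pairs that a Ramachandran plot displays.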


-----------
## 🌟 **Activity 4: Peptide Design and Evaluation**

### **Objective**:
In activity 1 you obtained the 3-D structure of a peptide. However, this peptide is too large and we need to shorten it using the screening techniques we have learned.
#### **Step 1: Peptide Fragmentation**
Original Sequence: RDIVSSDFLSNMSMDFQNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGCTGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYVVTDHGSCVRA  

The example code below splits a peptide into three overlapping fragments.

In [None]:
# Peptide Fragmentation
def generate_overlapping_peptides(parent_peptide, n_peptides):
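    peptide_length = len(parent_peptide)
    if n_peptides < 2:
        # Nothing to split; returning early also avoids dividing by zero below
        return [parent_peptide]

    # Calculate fragment size and overlap size
    # Fragment size is the ceiling of peptide_length / n_peptides
    fragment_size = (peptide_length + (n_peptides - 1)) // n_peptides
    # Overlap size spreads any surplus residues over the junctions between fragments
    overlap_size = (fragment_size * n_peptides - peptide_length) // (n_peptides - 1)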

    peptides = []

    for i in range(n_peptides):
        # Calculate start and end positions for each fragment
        start = max(0, i * (fragment_size - overlap_size))
        end = min(peptide_length, start + fragment_size)

        fragment = parent_peptide[start:end]
        peptides.append(fragment)

    return peptides

# Peptide Sequence
parent_peptide = "APLLRTYWESDFGKNVVQEATRDDFYILLNPGTKLLT"
n_peptides = 3

fragments = generate_overlapping_peptides(parent_peptide, n_peptides)
# Visualize overlaps by indenting each fragment to its position in the parent
print("\nOverlap visualization:")
for i, peptide in enumerate(fragments):
    start = parent_peptide.find(peptide)
    padding = " " * start
    print(f"Fragment {i+1}: {padding}{peptide}")

#### **Step 2: Selecting Peptide Fragment**
We are going to compare the following three peptide fragments:<br>
Peptide 1: RDIVSSDFLSNMSMDFQNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCA <br>
Peptide 2: KIICAQQCSGRCRGKSPSDCCHNQCAAGCTGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGAT<br>              
Peptide 3: ESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYVVTDHGSCVRA   <br><br>

At this stage, you have to develop an assay to evaluate the activity of the designed peptides. It should be simple, rapid, and reliable. For example, if you are designing a peptide for anticancer activity, develop an assay, or use an existing assay, that measures antiproliferative activity in cell culture. The assay evaluates how well the peptide kills the cells or reduces their growth, and the IC50 value is determined from the results. The original sequence was taken from EGFR, which is involved in cell growth and development, so we look for the ability of the peptides to reduce the growth of cancer cells. Typically, the CellTiter-Glo assay (Tolliday N. Curr Protoc Chem Biol. 2010 Sep 1;2(3):153-61. doi: 10.1002/9780470559277.ch100045) or the MTT (3-[4,5-dimethylthiazol-2-yl]-2,5-diphenyltetrazolium bromide) assay (van Meerloo et al. Methods Mol Biol. 2011;731:237-45. doi: 10.1007/978-1-61779-080-5_20) is used. <br><br>
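
To make the IC50 determination concrete, the sketch below estimates an IC50 by log-linear interpolation between the two concentrations that bracket 50% viability. This is a simplified stand-in for the curve fitting normally applied to such assay data; the function name `estimate_ic50` is an assumption, and the dose-response values are invented for illustration.

```python
import math


def estimate_ic50(concentrations, viabilities):
    """Estimate IC50 by interpolating on log10(concentration) across the 50% crossing.

    concentrations: ascending doses (µM); viabilities: percent viable cells at each dose.
    Returns None when viability never crosses 50% within the tested range.
    """
    points = list(zip(concentrations, viabilities))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 50.0 > v_hi:
            fraction = (v_lo - 50.0) / (v_lo - v_hi)
            log_ic50 = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Invented dose-response data (not from any experiment)
doses_um = [1, 10, 100, 1000]
viability_pct = [95, 80, 40, 10]

ic50 = estimate_ic50(doses_um, viability_pct)
print(f"Estimated IC50: {ic50:.1f} µM")
```

A lower estimated IC50 means less peptide is needed to halve cell viability, which is why the comparison between peptides is made on this value.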

The following hypothetical values are obtained from the cell growth inhibition assay on cancer cells (also called antiproliferative activity).

| Peptide      | Antiproliferative activity IC50 (µM) |
|--------------|--------------------------------------|
| Peptide 1    | 1000                                 |
| Peptide 2    | 800                                  |
| Peptide 3    | 5000                                 |

Peptide 2 has the lowest IC50 and is therefore the most active of the three fragments. By doing two more peptide fragmentation steps, we can end up with the sequence:
CKDTCPPLMLYNPTTYQM
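
The selection above can be checked with a short script. The sketch below, written for illustration only, picks the fragment with the lowest IC50 from the hypothetical table and locates the final 18-residue sequence inside it. The variable names are assumptions.

```python
# Fragment sequences and hypothetical IC50 values (µM) from the table above
fragments = {
    "Peptide 1": ("RDIVSSDFLSNMSMDFQNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCA", 1000),
    "Peptide 2": ("KIICAQQCSGRCRGKSPSDCCHNQCAAGCTGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGAT", 800),
    "Peptide 3": ("ESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYVVTDHGSCVRA", 5000),
}

# Lower IC50 means higher activity, so the best fragment has the minimum value
best_name = min(fragments, key=lambda name: fragments[name][1])
best_sequence = fragments[best_name][0]

core = "CKDTCPPLMLYNPTTYQM"
start = best_sequence.find(core)  # 0-based index, -1 if the core is absent

print(f"Most active fragment: {best_name} (IC50 = {fragments[best_name][1]} µM)")
print(f"Core sequence found at residues {start + 1}-{start + len(core)} of {best_name}")
```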


#### **Step 3: Terminal Truncation**
Now that we have obtained a shorter peptide using peptide fragmentation, we will shorten it further using terminal truncation. The code below generates the truncated sequences from both the N-terminus and the C-terminus.

In [None]:
### Terminal truncation
def terminal_truncation(sequence):
    n_terminal = []
    c_terminal = []

    # N-terminal truncation: drop i residues from the start, padding with '-' to keep alignment
    for i in range(len(sequence)):
        truncated = sequence[i:]
        n_terminal.append(f"{'-' * i}{truncated}")

    # C-terminal truncation: drop i residues from the end, padding with '-' to keep alignment
    for i in range(len(sequence)):
        truncated = sequence[:-i] if i > 0 else sequence
        c_terminal.append(f"{truncated}{'-' * i}")

    return {
        "original": sequence,
        "n_terminal": n_terminal,
        "c_terminal": c_terminal
    }

# Example usage
sequence = "CKDTCPPLMLYNPTTYQM"
results = terminal_truncation(sequence)

# Print results
print(f"Original sequence: {results['original']}\n")
print("N-terminal truncation (first 3):")
for seq in results["n_terminal"][:3]:
    print(seq)
print("\nC-terminal truncation (first 3):")
for seq in results["c_terminal"][:3]:
    print(seq)

#### **Step 4: Selecting Smallest Peptide**
To select the smallest peptide fragment as our drug candidate, we can carry out terminal truncation and alanine scanning.

Terminal Truncation Assay Results:
| Peptide | Peptide sequence                | IC50 (µM) in cancer cells MTT assay |
|---------|---------------------------------|-------------------------------------|
| a       | CKDTCPPLMLYNPTTYQM              | 45                                  |
| b       | KDTCPPLMLYNPTTYQM               | 250                                 |
| c       | DTCPPLMLYNPTTYQM                | 300                                 |
| d       | TCPPLMLYNPTTYQM                 | 320                                 |
| e       | CPPLMLYNPTTYQM                  | 500                                 |
| f       | PPLMLYNPTTYQM                   | 600                                 |
| g       | PLMLYNPTTYQM                    | 800                                 |
| h       | LMLYNPTTYQM                     | 850                                 |
| i       | MLYNPTTYQM                      | 800                                 |
| j       | LYNPTTYQM                       | >1000                               |
| k       |                                 |                                     |
| l       | PTTYQM                          | >1000                               |
| m       | CKDTCPPLMLYNPTTYQM              | 45                                  |
| n       | CKDTCPPLMLYNPTTYQ               | 45                                  |
| o       | CKDTCPPLMLYNPTTY                | 50                                  |
| p       | CKDTCPPLMLYNPTT                 | 45                                  |
| q       | CKDTCPPLMLYNPT                  | 45                                  |
| r       | CKDTCPPLMLYNP                   | 45                                  |
| s       | CKDTCPPLMLYN                    | 55                                  |
| t       | CKDTCPPLML                      | 60                                  |
| u       | CKDTCPPL                        | 75                                  |
| v       | CKDTCPP                         | 100                                 |
| w       | CKDTC                           | 150                                 |
| x       | Cyclic(CKDTCPPLMLYNPTTYQM)      | 35                                  |

Alanine Scanning Assay Results:

| Peptide sequence      | IC50 (µM) in cancer cells MTT assay |
|----------------------|-------------------------------------|
| CKDTCPPLMLYNPTTYQM    | 45                                 |
| AKDTCPPLMLYNPTTYQM    | 200                                |
| CADTCPPLMLYNPTTYQM    | 100                                |
| CKATCPPLMLYNPTTYQM    | 80                                 |
| CKDACPPLMLYNPTTYQM    | 90                                 |
| CKDTAPPLMLYNPTTYQM    | 300                                |
| CKDTCAPLMLYNPTTYQM    | 500                                |
| CKDTCPALMLYNPTTYQM    | 400                                |
| CKDTCPPAMLYNPTTYQM    | 100                                |
| CKDTCPPLALYNPTTYQM    | 150                                |
| CKDTCPPLMAYNPTTYQM    | 150                                |
| CKDTCPPLMLANPTTYQM    | 450                                |
| CKDTCPPLMLYAPTTYQM    | 50                                 |
| CKDTCPPLMLYNATTYQM    | 50                                 |
| CKDTCPPLMLYNPATYQM    | 45                                 |
| CKDTCPPLMLYNPTAYQM    | 45                                 |
| CKDTCPPLMLYNPTTAQM    | 45                                 |
| CKDTCPPLMLYNPTTYQA    | 45                                 |

Observations
1. From the peptide truncation experiments, the IC50 values indicate that removing the N-terminal part of the sequence (peptides b to l) reduced the activity of the peptide (the lower the IC50 value, the higher the activity).
2. When C-terminal amino acids were removed (peptides n to s), activity did not change significantly.
3. The N-terminal part of the peptide is important for activity.
4. The peptide has two Cys residues that might form a disulfide bond. When a Cys was removed, activity was reduced, so the disulfide bond may be important for activity.
5. The peptide has two prolines. Removal of the prolines reduced the activity. Pro-Pro sequences in peptides introduce beta turns, so the beta turn might be important for activity.

From alanine scanning

1. When Cys was replaced with Ala, activity was reduced.
2. When either residue of the Pro-Pro sequence was replaced with Ala, activity was reduced.
3. When C-terminal amino acids were replaced with Ala, activity was not affected.
4. Cyclization of the peptide (peptide x in the truncation table) improved the activity.
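
These alanine scanning observations can also be tabulated programmatically. The sketch below is illustrative only: it finds the substituted position in each variant by comparing it with the parent sequence and ranks the substitutions by fold change in IC50. The helper name `find_substitution` is an assumption, and the values are the hypothetical ones from the alanine scanning table (which has no entry for position 17).

```python
PARENT = "CKDTCPPLMLYNPTTYQM"
PARENT_IC50 = 45  # µM, hypothetical value from the table

# (variant sequence, IC50 in µM), copied from the alanine scanning table
SCAN = [
    ("AKDTCPPLMLYNPTTYQM", 200), ("CADTCPPLMLYNPTTYQM", 100),
    ("CKATCPPLMLYNPTTYQM", 80), ("CKDACPPLMLYNPTTYQM", 90),
    ("CKDTAPPLMLYNPTTYQM", 300), ("CKDTCAPLMLYNPTTYQM", 500),
    ("CKDTCPALMLYNPTTYQM", 400), ("CKDTCPPAMLYNPTTYQM", 100),
    ("CKDTCPPLALYNPTTYQM", 150), ("CKDTCPPLMAYNPTTYQM", 150),
    ("CKDTCPPLMLANPTTYQM", 450), ("CKDTCPPLMLYAPTTYQM", 50),
    ("CKDTCPPLMLYNATTYQM", 50), ("CKDTCPPLMLYNPATYQM", 45),
    ("CKDTCPPLMLYNPTAYQM", 45), ("CKDTCPPLMLYNPTTAQM", 45),
    ("CKDTCPPLMLYNPTTYQA", 45),
]


def find_substitution(parent, variant):
    """Return a label such as 'C1A' for the single position where variant differs."""
    diffs = [i for i, (p, v) in enumerate(zip(parent, variant)) if p != v]
    if len(parent) != len(variant) or len(diffs) != 1:
        raise ValueError("expected exactly one substitution")
    i = diffs[0]
    return f"{parent[i]}{i + 1}{variant[i]}"


fold_changes = {find_substitution(PARENT, seq): ic50 / PARENT_IC50 for seq, ic50 in SCAN}

# Largest fold change first: these positions are the most sensitive to substitution
ranked = sorted(fold_changes.items(), key=lambda item: item[1], reverse=True)
for label, fold in ranked:
    print(f"{label:>5}: {fold:.1f}x")
```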

Based on these observations one can choose the peptide sequence CKDTCPPLMLYNP for further modification.
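
One way to formalize this choice, offered here as a sketch rather than the method used above, is to take the shortest C-terminally truncated peptide whose IC50 stays within a tolerance of the parent value. The 10% tolerance and the function name `shortest_active` are assumptions; the IC50 values are the hypothetical ones from the truncation table.

```python
PARENT_IC50 = 45  # µM, hypothetical value for CKDTCPPLMLYNPTTYQM

# C-terminal truncations and IC50 (µM), copied from the truncation table
C_TERMINAL = {
    "CKDTCPPLMLYNPTTYQM": 45,
    "CKDTCPPLMLYNPTTYQ": 45,
    "CKDTCPPLMLYNPTTY": 50,
    "CKDTCPPLMLYNPTT": 45,
    "CKDTCPPLMLYNPT": 45,
    "CKDTCPPLMLYNP": 45,
    "CKDTCPPLMLYN": 55,
    "CKDTCPPLML": 60,
    "CKDTCPPL": 75,
    "CKDTCPP": 100,
    "CKDTC": 150,
}


def shortest_active(results, reference_ic50, tolerance=0.10):
    """Return the shortest peptide whose IC50 is within tolerance of the reference."""
    limit = reference_ic50 * (1 + tolerance)
    active = [seq for seq, ic50 in results.items() if ic50 <= limit]
    return min(active, key=len) if active else None


choice = shortest_active(C_TERMINAL, PARENT_IC50)
print(f"Shortest peptide retaining activity: {choice} ({len(choice)} residues)")
```

A looser tolerance would admit shorter peptides, so the cutoff should reflect the variability of the assay.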

At this stage, you can introduce conformational constraints, such as secondary structure, into the peptide by using different functional groups, replacing amino acids, or cyclizing the peptide. This is illustrated in the example of CD2-CD58 peptide design described in this module.


------------------------
# 📖 **Submodule 3 QUIZ**

In [None]:
#Render Quiz: Q1
from IPython.display import IFrame
IFrame('quiz/submodule3_quiz.html', width=1000, height=1000)

---------------
## **Conclusions**
This module provided a step-by-step guide to designing a peptide for therapeutic purposes based on knowledge of the biochemical pathways. It covered dissecting the peptide, developing an assay to evaluate peptides, and identifying the amino acid residues responsible for activity using alanine scanning and truncation. It also showed how, once a peptide sequence is chosen, conformational constraints can be imposed on the peptide using cyclization, and how a possible 3D structure of the peptide can be predicted using Google Colab. After completing this module and assignment, you should be able to visualize a protein-protein interaction and propose a possible peptide-based lead compound for synthesis and pharmacological evaluation of activity.

## **Clean Up**
<div class="alert alert-block alert-warning"> <b>Attention:</b> Remember to shut down the VM and delete any relevant resources. </div>