# **Submodule 3.3 - Peptide-based drug design**

## **Learning Objectives:**
<mark> ADD </mark>

## **Prerequisites:** 
<mark> ADD </mark>

## **Introduction**
This section explores the systematic process of peptide-based drug design, from initial target identification to final therapeutic optimization. Using PyMOL as our primary visualization platform, along with computational tools like AlphaFold and ClusPro, we'll investigate how to analyze protein structures and design effective peptide therapeutics. Through detailed examples, you'll learn to visualize protein-peptide interactions and apply screening techniques, including terminal truncation, peptide fragmentation, alanine scanning, and sequence shuffling, to optimize peptide sequences. Building on these computational and experimental approaches, we'll examine how pharmacological and antiproliferative assays are used to evaluate drug candidates, using real-world decision-making scenarios to demonstrate the progression from initial design to final drug selection. You'll learn to interpret structural data and assay results, understand structure-activity relationships, and appreciate how this information guides the development of effective peptide-based drugs.

## **Peptide-based drug design** 

As in the previous drug design case, the first step is to identify the biochemical pathway responsible for the disease state. Next, we identify the proteins that can be targeted to modulate that pathway. If the target protein is an enzyme, we identify its binding site and a peptide that can block it. In this case, the peptide must bind without undergoing the enzymatic reaction, so that no product is formed (see the section on enzyme-based drug design). If the target is an interaction between two proteins, we need the structural details of how the two proteins interact.

- **Scenario 1:** Known Protein-Protein Complex Structure
    - Analyze binding surface interface between proteins
    - Design peptide inhibitors based on interaction surface
    - Focus on key binding regions

- **Scenario 2:**  Unknown 3D Structure(s)
    - Generate protein models using AlphaFold for unknown structures
    - Use ClusPro to predict protein complex formation
    - Analyze predicted interaction interfaces


## **Introduction to Assays**
Assays are standardized laboratory tests used to measure the biological or biochemical activity of a substance, particularly in drug discovery and development. They provide crucial quantitative data about how potential drug compounds interact with their targets and affect biological systems. In drug development, multiple assays are typically performed in sequence to make informed decisions. For instance, when developing a new cancer therapeutic, researchers might first conduct a binding assay to measure target affinity, followed by cell-based assays to evaluate efficacy. Consider a case where three compound candidates show similar binding affinity to a cancer target protein, but cell-based assays reveal that only one compound effectively reduces cancer cell proliferation while sparing healthy cells. This compound would then progress to more detailed pharmacological and toxicity testing, demonstrating how assay data guides critical decision-making in drug development.

### **Pharmacological Assays**
Pharmacological assays are comprehensive, specialized tests designed to evaluate how drug compounds interact with and affect biological systems at multiple levels. These assays are crucial in early drug development for understanding not only how effectively a drug binds to its target, but also its broader biological impact, safety profile, and potential clinical viability. Through a systematic series of tests, researchers can build a complete profile of a drug's behavior, from molecular interactions to whole-system effects.

The assays include:
- <u>Binding assays</u>: Measure how well a drug binds to its target (e.g., receptors, enzymes)
- <u>Functional assays</u>: Assess the biological response triggered by drug-target interaction
- <u>ADME assays</u>: Evaluate drug absorption, distribution, metabolism, and excretion
- <u>Toxicity assays</u>: Determine potential harmful effects of drug compounds

Key parameters measured:
- Kd (dissociation constant)
- Ki (inhibition constant)
- EC50 (half maximal effective concentration)
- Bioavailability
- LD50 (lethal dose, 50%)
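
To make the dissociation constant more concrete, the following minimal sketch estimates the fraction of target bound at a given ligand concentration. It assumes simple 1:1 equilibrium binding with the free ligand concentration approximately equal to the total, which real systems may not satisfy. The function name and the concentrations are illustrative only.

```python
def fractional_occupancy(ligand_nM, kd_nM):
    """Fraction of target bound at equilibrium, assuming simple 1:1 binding."""
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("ligand_nM must be >= 0 and kd_nM must be > 0")
    return ligand_nM / (ligand_nM + kd_nM)


# Illustrative concentrations around a Kd of 2 nM
for conc in (0.2, 2.0, 20.0):
    occupancy = fractional_occupancy(conc, kd_nM=2.0)
    print(f"[L] = {conc:>4} nM -> occupancy = {occupancy:.2f}")
```

When the ligand concentration equals Kd, half of the target is occupied, which is why a lower Kd is read as higher affinity.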

#### Decision-Making Using Pharmacological Assays
Let's use an example where we are developing a new receptor antagonist for neurological disorders and three drug candidates were evaluated using multiple pharmacological assays. Compound X showed promising initial results where:

- Binding assay: Kd = 2nM (high affinity)
- Functional assay: 85% receptor inhibition
- ADME profile: 60% oral bioavailability, 8-hour half-life
- Toxicity assay: LD50 well above therapeutic dose

<mark> I think this would be more effective if we showed the results for Compound Y in the same format as above and then described why compound X is better </mark><br>
Despite Compound Y having a slightly better binding affinity (Kd = 1nM), its poor bioavailability (20%) and shorter half-life (2 hours) made Compound X the better candidate for further development. This demonstrates how integrated pharmacological data guides compound selection beyond simple target affinity.

### **Antiproliferative Activity Assay**
Antiproliferative activity assays are specialized biological tests designed to quantify a compound's ability to inhibit cell growth or division. These assays are particularly critical in cancer drug development, where the primary goal is often to selectively stop or slow the growth of cancer cells while minimizing effects on healthy cells. By employing multiple complementary methods, researchers can build a comprehensive understanding of how a compound affects cell proliferation over different time scales and through various mechanisms.

Common assays include:
- <u>MTT assay</u>: Measures cell viability through metabolic activity
- <u>BrdU assay</u>: Detects DNA synthesis in proliferating cells
- <u>Colony formation assay</u>: Evaluates long-term growth inhibition

Key parameters measured:
- IC50 (half maximal inhibitory concentration)
- Cell growth inhibition percentage
- Time-dependent effects
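
As a rough illustration of how IC50 relates to percent growth inhibition, the sketch below evaluates a Hill-type dose-response curve. The Hill model, the default Hill coefficient of 1, and the function name are assumptions made for illustration; measured curves are normally fitted to experimental data rather than computed this way.

```python
def percent_inhibition(conc_nM, ic50_nM, hill=1.0):
    """Percent inhibition (0-100) predicted by a Hill-type dose-response curve."""
    if conc_nM < 0 or ic50_nM <= 0:
        raise ValueError("conc_nM must be >= 0 and ic50_nM must be > 0")
    if conc_nM == 0:
        return 0.0
    ratio = (conc_nM / ic50_nM) ** hill
    return 100.0 * ratio / (1.0 + ratio)


# Illustrative concentrations for a compound with IC50 = 200 nM
for conc in (50, 200, 1000):
    print(f"{conc:>5} nM -> {percent_inhibition(conc, ic50_nM=200):.1f}% inhibition")
```

By definition, the predicted inhibition is 50% when the concentration equals the IC50.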

#### Decision-Making Using Antiproliferative Activity Assays
During the development of a breast cancer drug, three compounds were evaluated using multiple antiproliferative assays. The results for Compound Z showed:

- MTT assay: IC50 = 200nM in cancer cells vs. minimal effect in normal cells
- BrdU assay: 75% reduction in DNA synthesis at 500nM
- Colony formation: Complete inhibition of colony formation at 1µM after 14 days

<mark>same comment as above</mark><br>
While another compound showed a lower IC50 in the MTT assay (150nM), it also significantly affected normal cells. Compound Z was selected for further development due to its superior selectivity for cancer cells and sustained antiproliferative effect demonstrated in the colony formation assay. This illustrates how multiple antiproliferative assays provide complementary data for better decision-making in cancer drug development.

## **Screening Techniques**
Screening techniques in peptide-based drug design are systematic methods used to optimize peptide sequences and understand structure-activity relationships. These approaches are essential for developing more effective and drug-like peptides by identifying critical structural elements and improving pharmaceutical properties.

### Terminal Truncation
Terminal truncation is a fundamental peptide optimization strategy that systematically removes amino acids from either end of a peptide sequence. This method helps identify the minimal bioactive sequence, potentially improving drug-like properties.

Key aspects include:
- Systematic N-terminal or C-terminal amino acid removal
- Analysis of activity retention after each truncation
- Identification of minimal active sequence
- Optimization of peptide length

Example:

Full sequence:     YGRKKRRQRRR<br>
N-term truncation: -GRKKRRQRRR<br>
                   --RKKRRQRRR<br>
                   ---KKRRQRRR<br>
C-term truncation: YGRKKRRQR--<br>
                   YGRKKRRQ---<br>
                   YGRKKRR----<br>
Found minimum active sequence: GRKKRRQ

#### Decision-Making Using Terminal Truncation
Consider a 15-residue peptide inhibitor:

Results:
- Full sequence shows IC50 = 100nM
- N-terminal truncation series maintains activity until residue 4
- C-terminal truncation maintains activity until residue 12
- Minimum active sequence: residues 4-12

Based on these results we create a truncated 9-residue peptide (residues 4-12), which exhibits IC50 = 80nM. The optimized shorter peptide would be selected for further development due to maintained activity and improved synthetic feasibility.
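
The boundary logic in this example can be sketched in a few lines. The activity flags below are hypothetical values chosen to match the scenario, and the helper name is illustrative. The sketch assumes activity is lost monotonically as residues are removed and that the two ends behave independently, so the combined peptide still has to be confirmed experimentally.

```python
def minimal_active_window(n_term_active, c_term_active):
    """Return (start, end), 1-based, of the shortest window that stays active.

    n_term_active maps the first retained residue to True/False (active or not).
    c_term_active maps the last retained residue to True/False (active or not).
    """
    start = max(pos for pos, active in n_term_active.items() if active)
    end = min(pos for pos, active in c_term_active.items() if active)
    return start, end


# Hypothetical assay calls for a 15-residue peptide
n_series = {1: True, 2: True, 3: True, 4: True, 5: False, 6: False}
c_series = {15: True, 14: True, 13: True, 12: True, 11: False, 10: False}

start, end = minimal_active_window(n_series, c_series)
print(f"Minimum active sequence: residues {start}-{end} ({end - start + 1} residues)")
```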

### Peptide fragmentation
Peptide fragmentation is a systematic method that breaks down peptides into overlapping segments to identify regions crucial for biological activity. This approach helps map the functional domains within larger peptide sequences.

Key aspects include:
- Generation of overlapping peptide segments
- Evaluation of each fragment's activity
- Identification of bioactive regions
- Mapping of functional domains

Example:

Original sequence: FLPVLAQFVLL (11-residue peptide)

Fragment 1 (1-6):  FLPVLA-----<br>
Fragment 2 (3-9):  --PVLAQFV--<br>
Fragment 3 (5-11): ----LAQFVLL<br>
Most active fragment identified: PVLAQFV

#### Decision-Making Using Peptide fragmentation
Consider a 20-residue peptide inhibitor:
- Full sequence shows IC50 = 50nM
- Five overlapping fragments (10 residues each) were generated
- Fragment 2 (residues 5-14) shows IC50 = 75nM

The other fragments show minimal activity, so Fragment 2 was identified as the primary bioactive region and selected for further optimization.

### Alanine Scanning
Alanine scanning is a precise mutational analysis technique where each amino acid is systematically replaced with alanine to determine its contribution to peptide activity.

Key aspects include:
- Sequential alanine substitution
- Activity measurement after each substitution
- Identification of critical residues
- Structure-function mapping

Example:

Original:     KLWVRIPKLL<br>
Position 1:   ALWVRIPKLL (K→A)<br>
Position 2:   KAWVRIPKLL (L→A)<br>
Position 3:   KLAVRIPKLL (W→A)<br>

#### Decision-Making Using Alanine Scanning
Consider a 10-residue peptide antagonist:
- Original sequence shows Kd = 25nM
- Alanine substitution at positions 3, 6, and 8 causes >10-fold activity loss
- Positions 1, 2, and 10 tolerate substitution

The remaining positions show moderate effects. These results identify three essential residues (positions 3, 6, and 8) for maintaining activity, guiding further optimization efforts.
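
One way to turn such a scan into a residue classification is sketched below. The mutant Kd values are hypothetical numbers chosen to be consistent with the scenario above (wild-type Kd = 25 nM), and the 2-fold cutoff for "tolerated" is an assumption; only the >10-fold criterion comes from the example.

```python
def classify_alanine_scan(wild_type_kd, mutant_kd, critical_fold=10.0, tolerated_fold=2.0):
    """Label each position by the fold change in Kd after alanine substitution."""
    labels = {}
    for position, kd in sorted(mutant_kd.items()):
        fold = kd / wild_type_kd
        if fold > critical_fold:
            label = "critical"
        elif fold < tolerated_fold:
            label = "tolerated"
        else:
            label = "moderate"
        labels[position] = (fold, label)
    return labels


# Hypothetical Kd values (nM) for each alanine mutant of a 10-residue peptide
hypothetical_kd = {1: 27, 2: 30, 3: 400, 4: 80, 5: 100,
                   6: 300, 7: 60, 8: 500, 9: 75, 10: 26}

classified = classify_alanine_scan(25, hypothetical_kd)
for position, (fold, label) in classified.items():
    print(f"Position {position:>2}: {fold:5.1f}-fold change -> {label}")
```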

### Shuffled Sequence
Shuffled sequence analysis examines how amino acid order affects peptide activity while maintaining the same composition, helping optimize sequence arrangement. Additionally, shuffled sequences serve as valuable controls in peptide studies, as they maintain identical amino acid composition but typically lack biological activity, helping validate sequence-specific effects of the original peptide.

Key aspects include:

- Systematic sequence rearrangement
- Activity comparison
- Stability assessment
- Structure-activity correlation
- Control sequence validation

Example:

Original: DFKNLRPVWY<br> 
Variant 1: KNDFWPRVLY<br> 
Variant 2: WVPDFKNLRY<br> 
Variant 3: RPWDFKNLVY<br>
Best variant: RPWDFKNLVY (improved stability)<br> 
Control: YWVPRNFKLD (inactive scrambled sequence)
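
A quick sanity check for shuffled variants and scrambled controls is that they keep exactly the same amino acid composition as the original. The short sketch below checks this for the sequences listed above; the helper name is illustrative.

```python
from collections import Counter


def same_composition(reference, candidate):
    """True if both sequences contain the same residues in the same counts."""
    return Counter(reference) == Counter(candidate)


original = "DFKNLRPVWY"
variants = {
    "Variant 1": "KNDFWPRVLY",
    "Variant 2": "WVPDFKNLRY",
    "Variant 3": "RPWDFKNLVY",
    "Control": "YWVPRNFKLD",
}

for name, seq in variants.items():
    ok = same_composition(original, seq) and seq != original
    print(f"{name}: {seq} -> valid shuffle: {ok}")
```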

### Decision-Making Example Using Multiple Screening Techniques
During optimization of a therapeutic peptide:
1. Alanine scanning identified three critical residues
2. Terminal truncation reduced sequence from 20 to 12 residues
3. Shuffled variants explored alternative arrangements

Optimized peptide outcome:
- 2-fold improved activity
- 3-fold better stability
- Reduced synthesis costs

This comprehensive screening approach led to an optimized candidate with improved drug-like properties while maintaining biological activity.

-------------------

# 📊 Tutorials
In these tutorials, we will use PyMOL and AutoDock to work through <u>**five**</u> applied activities:
- Activities 1-3: Design a peptide-based drug to inhibit the CD2-CD58 protein-protein interaction in rheumatoid arthritis
- <mark> Add in text related to activity 4/5. </mark>

## Before you begin:
- Run the PyMOL GUI by following the directions provided in the Submodule 0 notebook, provided here: [start_gui-server](../submodule0_pymol_setup/start-gui-server.ipynb)
  

## 🌟 **Activity 1: Visualizing CD2-CD58 Interaction Using PyMOL**
Rheumatoid arthritis is an autoimmune disease in which the immune system attacks cells in the joints, causing inflammation and deformation of the synovial membrane. This results in pain, swelling, and difficulty moving the joints. The disease is known to start with the presence of rheumatoid factor in the body. Rheumatoid factor induces the production of antibodies against collagen, and the joints are then attacked by T cells and antibodies, causing inflammation. When T cells recognize antigen-presenting cells, several adhesion molecules are responsible for generating the immune response. In the first step, the protein CD2 on T cells binds to CD58 on antigen-presenting cells. This protein-protein interaction triggers signaling in the T cell via the cytoplasmic tail of CD2, leading to the production of inflammatory cytokines, inflammation of the joints, and an immune response. CD58 is known to be highly expressed in the joints of arthritis patients. If we can inhibit the CD2-CD58 interaction, we can modulate the immune response, reduce inflammation, and slow the progression of arthritis.

<details>
  <summary>Click to see Reference</summary>
  Wang, J.-H., Smolyar, A., Tan, K., Liu, J.-H., Kim, M., Sun, Z.J., Wagner, G., Reinherz, E.L. Structure of a Heterophilic Adhesion Complex Between the Human CD2 and CD58(LFA-3) Counter-Receptors. (1999) <i>Cell</i> 97: 791-803.  

</details>

### **Objective:** <br>
Analyze the interaction between *CD2* and *CD58 (LFA-3)* in PyMOL to explore their structural and polar interactions. In this particular case the crystal structure of the complex of CD2-CD58 is available.

### **Steps to Complete this Activity**:

#### Step 1. **Fetch the structure**:<br>
   `fetch 1QA9`

#### Step 2. **Inspect the asymmetric unit**:
1. The asymmetric unit contains *two complexes* of CD2 and CD58, for a total of four subunits (*A*, *B*, *C*, and *D*).
     - Subunits *A* and *B*: CD2 and CD58 of one complex.
     - Subunits *C* and *D*: CD2 and CD58 of the second complex.

#### Step 3. **Delete one set of complexes**:
1. Remove chains *C* and *D*:<br>
     `remove chain C`<br>
     `remove chain D`

#### Step 4. **Focus on chains A and B**:
1. Select chains *A* and *B* to work with the remaining complex:<br>
     `select complex_AB, chain A+B`

#### Step 5. **Identify polar contacts**:
1. Use the GUI to find polar contacts within the selection:<br>
     `A > find > polar contacts > within selection`

#### Step 6. **Visualize the interactions**:
1. Display the polar contacts as lines:<br>
     `S > lines`

### **Key Observation**:
<mark> Turn these into notecards - from JP I kind of thing this is better as an observation within PyMOL rather than making a flashcard</mark>
- Explore the **polar contacts** between CD2 (chain A) and CD58 (chain B) and note: 
    + The structural interface.
    + The key residues involved in the interaction.

-----------
<mark> I don't see the below as an activity as it is really just guiding the user through images and text and no interaction with tools/software. I think we either need to build in some interactivity or convert it from an activity to a guide. There was a PML script (see below) and originall there were some PyMOL commands. Seems like there is enough here to build it into a short activity.</mark>
## 🌟 **Activity 2: Identifying Peptide Drug Candidate**

### **Objective:** 
In Activity 1 we looked at the CD2-CD58 complex associated with Rheumatoid arthritis, and now we are going to identify a peptide to inhibit this complex.

It is hard to visualize many interactions.  You can look at the published paper and see the beta strands in CD2 bind to CD58 and mutation data indicates the amino acid residues that are important in binding.  To simplify the exercise, we have created a figure with CD2 and CD58 adhesion domain amino acids. Amino acids that were mutated are shown.  The size of the font and bold letters indicate the effect of mutation. Amino acids that are not in bold indicate that mutation of that residue did not have any effect on binding of CD2 and CD58 proteins.  

<center><img src="images/Arthritis.png" width=500 /></center><br><br>

Based on this data, select E25 to S47 from CD2 and K43 to K89 from CD58.

If we design a peptide from this CD2 region, it can bind to CD58 and block the site where CD2 would bind. Thus, the CD2-CD58 interaction is inhibited.
However, the peptide has 18 amino acids and a beta-sheet structure. We have to shorten the peptide by looking at the 3D structural details.
We will select residues 30 to 48 and delete the rest of the protein.

<center><img src="images/large_peptide.png" width=200 /></center><br>

Now look at the secondary structure of the binding region of CD2. It has a beta-sheet structure with a beta turn at the bottom of the figure above. For the design, we need to include the amino acids that are important for binding while keeping the beta-sheet structure: residues 31-37 and 42-49.

However, this leaves us with two separate chains, so we now apply conformational constraints. To stabilize a sheet structure in a peptide, we need to introduce beta-turn-inducing amino acids. Pro-Gly and Pro-Pro sequences are known to induce beta turns. On one side we introduce Pro-Gly or Pro-Pro, and on the other side we cyclize the peptide to obtain the structures below.

Similarly, we can choose two other beta strands in the CD2 structure: residues 31-37 and 84-90.

<center><img src="images/small_peptides.png" width=400 /></center><br>

Note that the peptides are designed based on the 3D structure, where the two beta strands are not connected but lie in proximity. The strands are joined by peptide bonds and a beta-turn inducer (Pro-Gly or Pro-Pro). The direction of the sequence must be considered when designing the peptide. In the example above, the chains run antiparallel. Hence, the designed peptides have the sequences:

`Cyclo(DDIKWEKKIAQFRKPG)   Pro-Gly for beta turn` <br>
`Cyclo(DDIKWEKSIYDTKGPG)` <br>
`Cyclo(DDIKWEKKIAQFRKPP)   Pro-Pro for beta turn` <br>
`Cyclo(DDIKWEKSIYDTKGPP)`
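
The four candidates above follow a simple pattern: one of two core sequences followed by one of two turn motifs. The sketch below rebuilds the list from those parts. The core strings are read off the listed sequences by removing the final two-residue turn motif; how each core divides into its individual beta strands is not stated here, so the sketch does not attempt that split. The function name is illustrative.

```python
from itertools import product


def build_cyclic_candidates(cores, turns):
    """Combine each turn motif with each core sequence into Cyclo(...) notation."""
    return [f"Cyclo({core}{turn})" for turn, core in product(turns, cores)]


cores = ["DDIKWEKKIAQFRK", "DDIKWEKSIYDTKG"]  # taken from the sequences listed above
turns = ["PG", "PP"]                          # Pro-Gly and Pro-Pro beta-turn inducers

candidates = build_cyclic_candidates(cores, turns)
for candidate in candidates:
    print(candidate)
```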

### **Alternative Method for Activity 2: Write and Load PML Script**

In [None]:
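# Write the PyMOL script to disk
import os
os.makedirs("pml_scripts", exist_ok=True)  # create the output folder if it does not exist
with open("pml_scripts/3.3 Cd2_Cd58_Interaction.pml", "w") as scriptout: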
    # Step 1: Fetch the PDB structure of CD2-CD58 complex
    scriptout.write("fetch 1QA9, cd2_cd58, async=0\n")  # Fetch the PDB structure and name it cd2_cd58
    scriptout.write("hide everything, all\n")  # Hide all default representations
    scriptout.write("show cartoon, all\n")  # Show the structure as cartoon
    scriptout.write("zoom all\n")  # Fit the entire structure to view

    # Step 2: Remove Chains C and D
    scriptout.write("remove chain C+D\n")  # Remove chains C and D

    # Step 3: Define Chains A and B as Separate Selections
    scriptout.write("select chainA, chain A\n")  # Select chain A (CD2)
    scriptout.write("select chainB, chain B\n")  # Select chain B (CD58)
    scriptout.write("zoom chainA or chainB\n")  # Focus on chains A and B

    # Step 4: Identify and Display Polar Contacts Between Chains A and B
    scriptout.write("distance polar_contacts, chainA, chainB, 5.0, mode=2\n")  # mode=2 restricts the measurement to polar contacts between chains A and B
    scriptout.write("set dash_width, 1.5\n")  # Adjust the dashed line width
    scriptout.write("color yellow, polar_contacts\n")  # Color the polar contacts yellow

    # Step 5: Highlight Key Residues in Chains A and B
    scriptout.write("select key_residues_cd2, chainA and resi 25-47\n")  # Residues 25-47 in CD2 (chain A)
    scriptout.write("select key_residues_cd58, chainB and resi 43-89\n")  # Residues 43-89 in CD58 (chain B)
    scriptout.write("show sticks, key_residues_cd2\n")  # Display CD2 residues as sticks
    scriptout.write("show sticks, key_residues_cd58\n")  # Display CD58 residues as sticks
    scriptout.write("color cyan, key_residues_cd2\n")  # Color CD2 residues cyan
    scriptout.write("color magenta, key_residues_cd58\n")  # Color CD58 residues magenta
    scriptout.write("zoom key_residues_cd2 or key_residues_cd58\n")  # Focus on key interacting residues

    # Step 6: Design a Peptide From Key CD2 Residues (30-48)
    scriptout.write("select peptide_design, chainA and resi 30-48\n")  # Select residues 30-48 from chain A
    scriptout.write("create cd2_peptide, peptide_design\n")  # Create a new object for the peptide
    scriptout.write("remove not cd2_peptide\n")  # Remove all other parts except the peptide
    scriptout.write("show cartoon, cd2_peptide\n")  # Show the peptide in cartoon representation
    scriptout.write("color green, cd2_peptide\n")  # Color the peptide green
    scriptout.write("zoom cd2_peptide\n")  # Focus on the designed peptide

    # Step 7: Save Outputs
    scriptout.write("png cd2_cd58_interaction_fixed.png, dpi=300\n")  # Save a PNG image
    scriptout.write("save cd2_cd58_interaction_session_fixed.pse\n")  # Save the session file

    # Notes for the user
    scriptout.write("# This script visualizes the CD2-CD58 interaction, measures polar contacts,\n")
    scriptout.write("# highlights key interacting residues, and designs a peptide from residues 30-48.\n")

------------
## 🌟 **Activity 3: Analyzing Peptide Candidates**

### **Objective:** 
Now that we have the peptide candidates, we must confirm that they are able to inhibit the protein-protein complex.

<mark> steps 1 and 2 need a lot more detail and I don't think we should assume that step 2 can be completed simply by referring to submod 2. We should provide the step by step process specific to the given example/activity. </mark>
#### **Step 1: Generate 3-D structures using AlphaFold**

#### **Step 2: Dock the peptide to CD58 and observe the binding.** 
1. <mark> add docking steps </mark><br>
    - You can also review Submodule 2 for a docking refresher.
2. Compare the binding site of peptide to CD2 binding site.<br>
3. Ligand: use the peptide 3D structure <mark>need more details</mark><br> 
4. Receptor: Use 1QA9, separate one unit of CD58 structure, and use it as receptor pdb file <mark>need more details</mark><br>
<mark> Provide docking output </mark>

#### **Step 3: Compare the binding site of peptide to CD2 binding site.**
<mark> Add text for this step </mark>
    
#### **Step 4: Create control peptide** <br>
Going forward, we will evaluate the peptides' ability to inhibit the interaction. These experiments need a control, which we generate by randomly shuffling the sequence. Execute the code cell below to generate a shuffled sequence.
 

In [None]:
import random

def generate_shuffled_sequences(input_string, n_sequences=5):
    # Convert string to list of characters
    char_list = list(input_string)
    shuffled_sequences = set()  # Use set to ensure uniqueness
    max_attempts = n_sequences * 10  # Prevent infinite loop
    attempts = 0
    
    # Add original sequence to prevent it from appearing in shuffled versions
    shuffled_sequences.add(input_string)
    
    while len(shuffled_sequences) < n_sequences + 1 and attempts < max_attempts:
        # Create a copy of the character list and shuffle it
        temp_list = char_list.copy()
        random.shuffle(temp_list)
        shuffled = ''.join(temp_list)
        shuffled_sequences.add(shuffled)
        attempts += 1
    
    # Remove the original sequence
    shuffled_sequences.remove(input_string)
    
    # Convert to list and return only n sequences
    return list(shuffled_sequences)[:n_sequences]

original_sequence = "DFKNLRPVWY"
n = 1
shuffled_sequences = generate_shuffled_sequences(original_sequence, n)

# Print results
print(f"Original sequence: {original_sequence}\n")
print("Shuffled sequences:")
for i, sequence in enumerate(shuffled_sequences, 1):
    print(f"Variant {i}: {sequence}")


#### **Step 5: Evaluate the peptides' ability to inhibit the protein-protein complex**
<mark> Add text expanding on this idea talking about how this can be done and provide an example of the results to analyze </mark>

#### **Step 6: Identify key Amino Acids via Alanine Scanning**
Execute the code cell below to generate the sequences needed for alanine scanning.

In [None]:
# Alanine scanning
def alanine_scan(peptide_sequence):
    """
    Perform alanine scanning mutagenesis on a peptide sequence
    
    Args:
        peptide_sequence (str): Original peptide sequence
    
    Returns:
        list: List of peptide sequences with each amino acid replaced by Alanine
    """
    # List to store all alanine-substituted sequences
    alanine_substitutions = []
    
    # Iterate through each position in the peptide
    for position in range(len(peptide_sequence)):
        # Create a list from the original sequence
        mutated_sequence = list(peptide_sequence)
        
        # Replace the amino acid at current position with Alanine
        mutated_sequence[position] = 'A'
        
        # Convert back to string
        alanine_substitution = ''.join(mutated_sequence)
        
        # Add to list of substitutions
        alanine_substitutions.append({
            'position': position + 1,  # 1-based indexing
            'original_aa': peptide_sequence[position],
            'substituted_sequence': alanine_substitution
        })
    
    return alanine_substitutions
example = alanine_scan("ESDFG")
for i in example:
    print(i)

<mark> Add in results to analyze alanine screening</mark>

In [None]:
# Flashcard/quiz for alanine scanning results with explanation

**Experiment Outcome:** <br>
<mark> Add text overviewing activity and results </mark>

---------------
## 🌟 **Activity 4: Obtain the 3D Structure and Secondary Structure Information of a Peptide Using Google Colab**
<mark> For activity 4/5 if we picked a specific example can we tie in a story like peptide is to inhibit *** for ****. Then tie in a specific pharmalogical assay</mark><br>

### **Objective:** Obtain and analyze the protein with the sequence: <br>
<mark> Need protein sequence</mark>

#### **Step 1: Obtain PDB File**<br>
Use the AlphaFold Server to obtain the 3-D structure. Go to Activity 4 in Submodule 1.3 if you need a reminder of how to do this.
 
#### **Step 2: Use PyMOL to identify secondary structures**<br>
    
    
<mark> Turn these observations into flashcards. Should get the amino acid sequence for different secondary structures </mark>    
Observation
The peptide structure will include:
  - Helical structures.
  - Beta sheets.
  - Loops or unordered structures.   
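
Once a per-residue secondary structure assignment is available (for example, PyMOL stores it in the `ss` atom property as `H`, `S`, or `L`), it can be grouped into contiguous segments to read off the sequence of each helix, strand, or loop. The sketch below shows only the grouping step; the sequence and assignment string are hypothetical placeholders, not results for the peptide in this activity.

```python
from itertools import groupby

SS_NAMES = {"H": "helix", "S": "beta strand", "L": "loop"}


def secondary_structure_segments(sequence, ss_string):
    """Group residues into (type, first, last, subsequence) runs, 1-based positions."""
    if len(sequence) != len(ss_string):
        raise ValueError("sequence and ss_string must have the same length")
    segments = []
    index = 0
    for code, run in groupby(ss_string):
        length = len(list(run))
        name = SS_NAMES.get(code, "loop")  # treat unassigned residues as loop
        segments.append((name, index + 1, index + length, sequence[index:index + length]))
        index += length
    return segments


# Hypothetical 10-residue example, not real assignment data
segments = secondary_structure_segments("ACDEFGHIKL", "LHHHHLLSSS")
for name, first, last, subseq in segments:
    print(f"{name:<12} residues {first}-{last}: {subseq}")
```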

Image of peptide Delete when added to flashcard<br>
<center><img src="images/peptide.png" width=400 /></center><br>


In [None]:
# Flash cards cell (One card is an image of the structure)

-----------
## 🌟 **Activity 5: Peptide Design and Evaluation**

### **Objective**:
In Activity 4 you obtained the 3-D structure of a peptide. However, this peptide is too large, and we need to shorten it using the screening techniques we have learned. <mark> If there is a specific receptor/condition associated talk about choice in assay/metrics to evaluate drug efficacy</mark>

#### **Step 1: Peptide Fragmentation**
Original Sequence: <mark> Add sequence Here</mark>

In [None]:
# Peptide Fragmentation
import math

def generate_overlapping_peptides(parent_peptide, n_peptides, overlap=3):
    """Split a peptide into n_peptides equal-length fragments that overlap.

    Returns a list of (start_index, fragment) tuples. The default overlap of
    3 residues is an illustrative choice; adjust it as needed.
    """
    peptide_length = len(parent_peptide)
    if n_peptides < 1 or not 0 <= overlap < peptide_length:
        raise ValueError("need n_peptides >= 1 and 0 <= overlap < peptide length")
    if n_peptides == 1:
        return [(0, parent_peptide)]

    # Fragment size that lets n fragments with the requested overlap span the parent
    fragment_size = math.ceil((peptide_length + (n_peptides - 1) * overlap) / n_peptides)
    step = fragment_size - overlap

    fragments = []
    for i in range(n_peptides):
        # Shift the final fragment back if needed so it ends at the C-terminus
        start = min(i * step, peptide_length - fragment_size)
        fragments.append((start, parent_peptide[start:start + fragment_size]))
    return fragments

# Peptide Sequence
parent_peptide = "APLLRTYWESDFGKNVVQEATRDDFYILLNPGTKLLT"
fragments = generate_overlapping_peptides(parent_peptide, n_peptides=5)  # 5 fragments is an example value

# Visualize overlaps
print("\nOverlap visualization:")
print(f"Parent:     {parent_peptide}")
for i, (start, peptide) in enumerate(fragments, 1):
    padding = " " * start
    print(f"Fragment {i}: {padding}{peptide}")

#### **Step 2: Selecting Peptide Fragment**
Now that we have the candidate fragments, an assay was conducted and the following activities were measured for each fragment:<br>
<mark> Add table of activity for each fragment </mark>

In [None]:
### Add in quiz or flashcard for what peptide to pick based on assay with explanation.


#### **Step 3: Terminal Truncation**
Now that we have obtained a shorter peptide using peptide fragmentation, we will shorten it further using terminal truncation. The code below generates the sequences for terminal truncation from both the N-terminus and C-terminus.

In [None]:
### Terminal truncation
def terminal_truncation(sequence):
    n_terminal = []
    c_terminal = []
    
    # N-terminal truncation
    for i in range(len(sequence)):
        truncated = sequence[i:]
        n_terminal.append(f"{''.join(['-']*i)}{truncated}")
    
    # C-terminal truncation
    for i in range(len(sequence)):
        truncated = sequence[:-i] if i > 0 else sequence
        c_terminal.append(f"{truncated}{''.join(['-']*i)}")
    
    return {
        "original": sequence,
        "n_terminal": n_terminal,
        "c_terminal": c_terminal
    }

# Example usage
sequence = "YGRKKRRQRRR"
results = terminal_truncation(sequence)

# Print results
print(f"Original sequence: {results['original']}\n")
print("N-terminal truncation (first 3):")
for seq in results["n_terminal"][:3]:
    print(seq)
print("\nC-terminal truncation (first 3):")
for seq in results["c_terminal"][:3]:
    print(seq)

#### **Step 4: Selecting Smallest Peptide**
Now that we have the truncated sequences, an assay is conducted to measure each peptide's activity. Below is a table of the activity for each peptide.<br>
<mark>Add table of activity for the different peptides</mark>

In [None]:
# Flashcard for picking best peptide with explanation 

------------------------
# 📖 **Submodule 3 QUIZ**

In [2]:
#Render Quiz: Q1
from IPython.display import IFrame
IFrame('quiz/submodule3_quiz.html', width=1000, height=1000)

---------------
## **Conclusions**
<mark> ADD CONCLUSIONS </mark>

## **Clean Up**
<div class="alert alert-block alert-warning"> <b>Attention:</b> Remember to shut down the VM and delete any relevant resources. </div>