# GenAI × Drug Discovery, Module 0.2: Bio basics

This Colab notebook is for beginners.  
You’ll learn the minimum biology intuition needed to understand how drugs work:

- **Proteins** (what they are and why they matter)
- **Domains** (protein “sub-modules”)
- **Active sites** (where chemistry or binding happens)
- **Kinetics**: **Km**, **Ki**, **IC50** (the most common numbers in drug discovery)

> 🎯 Goal: By the end, you can explain what a protein target is, what an active site means,
> and interpret Km / Ki / IC50 at a high level.

# 0) Setup
We’ll use:
- **matplotlib + numpy** for simple plots (kinetics + inhibition curves)
- **py3Dmol** (optional) to show a real protein structure in 3D

If you ever see an install error:
- `Runtime > Restart runtime`, then re-run the setup cell.

In [None]:
!pip -q install numpy matplotlib py3Dmol requests

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import requests
import py3Dmol

plt.rcParams["figure.dpi"] = 120

# 1) Proteins: the “machines” in your cells

A **protein** is a large biological molecule made of a chain of building blocks called **amino acids**.
When the chain folds, it becomes a 3D shape that can **do jobs**.

Think of proteins as **tiny machines** that can:
- speed up reactions (enzymes)
- send signals (receptors)
- transport things (channels/transporters)
- hold structures together (structural proteins)

### Drug discovery in one sentence
A drug often works by **binding to a protein** and changing what it does:
- blocking it
- activating it
- stabilizing a certain shape

✅ **Big intuition:** proteins are **targets**, drugs are **binders/modulators**.

### Show a real protein in 3D
We’ll download a small protein structure from the PDB (Protein Data Bank) and display it.

- The PDB code **1UBQ** is ubiquitin (a common demo structure).
- You can swap the code later to show other proteins.

In [None]:
def fetch_pdb(pdb_id="1UBQ"):
    url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
    r = requests.get(url, timeout=20)
    r.raise_for_status()
    return r.text

def show_protein_3d(pdb_text, style="cartoon"):
    view = py3Dmol.view(width=650, height=420)
    view.addModel(pdb_text, "pdb")
    if style == "cartoon":
        view.setStyle({"cartoon": {}})
    else:
        view.setStyle({"stick": {}})
    view.zoomTo()
    return view

pdb_text = fetch_pdb("1UBQ")
show_protein_3d(pdb_text, style="cartoon")
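Because a protein is a chain of amino acids, one quick sanity check on a downloaded structure is to count its residues. The sketch below assumes the text follows the standard fixed-column PDB layout and counts one alpha-carbon (`CA`) atom per residue; `count_residues` is a hypothetical helper, not part of py3Dmol or any PDB library.

```python
def count_residues(pdb_text):
    """Count amino-acid residues by collecting unique alpha-carbon (CA) atoms."""
    residues = set()
    for line in pdb_text.splitlines():
        # ATOM records describe the protein chain; HETATM records
        # (waters, ions, ligands) do not start with "ATOM" and are skipped.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21:22]
            residue_number = line[22:27]  # residue number plus insertion code
            residues.add((chain_id, residue_number))
        elif line.startswith("ENDMDL"):
            break  # only count the first model in multi-model files
    return len(residues)
```

In this notebook it could be called as `count_residues(pdb_text)` on the text returned by `fetch_pdb`. Ubiquitin is a 76-residue protein, so the count for 1UBQ should come out at or near 76.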

# 2) Domains: protein “modules”

Many proteins are not one single chunk.  
They’re built from **domains** (like LEGO modules), each with a job.

Examples of domain jobs:
- binding DNA or RNA
- binding a small molecule (a drug)
- acting as a switch (turning signaling on/off)
- catalyzing a reaction (enzyme domain)

✅ **Big intuition:** a protein can be a **multi-tool** with different parts doing different things.

**Why this matters for GenAI:**  
When you see “protein target”, it might mean:
- the whole protein
- a specific **domain**
- or even a smaller **active site** pocket inside a domain

# 3) Active sites: where action happens

An **active site** is the specific region of a protein where:
- a reaction happens (for enzymes), or
- binding happens (for enzymes and receptors alike; in receptors this region is usually called a **binding site**)

A useful picture:
- Protein = a 3D object
- Active site = a **pocket** or **groove**
- Drug = a **key** that fits the pocket

Active sites are made of specific amino acids that:
- shape the pocket (geometry)
- provide charges or H-bonds (chemistry)

✅ **Big intuition:** small molecular changes can matter because binding is **shape + chemistry matching**.

# 4) Kinetics (Km): “how fast does the enzyme work?”

When a protein is an **enzyme**, it converts a **substrate** into a **product**.

A common model is **Michaelis–Menten** kinetics:
- **Vmax** = maximum speed (when enzyme is saturated)
- **Km** = substrate concentration where speed is **half of Vmax**

Interpretation (non-technical):
- Low **Km** → enzyme reaches half-speed at **low substrate** → it “grabs substrate” effectively
- High **Km** → needs more substrate to reach half-speed

✅ **Big intuition:** **Km** tells you how “easily” the enzyme gets going as substrate increases.

In [None]:
def michaelis_menten(S, Vmax=1.0, Km=1.0):
    return (Vmax * S) / (Km + S)

S = np.linspace(0, 10, 400)
Km_list = [0.5, 1.5, 4.0]

plt.figure()
for Km in Km_list:
    plt.plot(S, michaelis_menten(S, Vmax=1.0, Km=Km), label=f"Km={Km}")
plt.xlabel("Substrate concentration [S]")
plt.ylabel("Reaction rate v (relative)")
plt.title("Michaelis–Menten curves: effect of Km")
plt.ylim(0, 1.05)
plt.legend()
plt.show()

### Explain the Km plot
- All curves approach the same top speed (**Vmax**) on the right.
- The curve with **lower Km** rises earlier → half-speed happens sooner.
- The curve with **higher Km** rises later → needs more substrate.

So **Km shifts the curve left/right**.
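The definition of Km can be checked numerically with the same formula used for the plot. The short sketch below re-declares `michaelis_menten` so it runs on its own, then evaluates the rate at [S] = Km and at [S] = 10 × Km for the three Km values plotted above.

```python
def michaelis_menten(S, Vmax=1.0, Km=1.0):
    # Same Michaelis-Menten rate law as in the plotting cell.
    return (Vmax * S) / (Km + S)

Vmax = 1.0
for Km in [0.5, 1.5, 4.0]:
    v_at_Km = michaelis_menten(Km, Vmax=Vmax, Km=Km)
    v_at_10Km = michaelis_menten(10 * Km, Vmax=Vmax, Km=Km)
    print(f"Km={Km}: v([S]=Km) = {v_at_Km:.2f}, v([S]=10*Km) = {v_at_10Km:.2f}")
```

Whatever the Km, the rate at [S] = Km is exactly half of Vmax, and even at ten times Km the enzyme is only at about 91% of Vmax, which is why the curves flatten out slowly on the right.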

# 5) Inhibitors and Ki: “how strongly does a drug block?”

Many drugs are **inhibitors**: they reduce a protein’s activity.

**Ki** is an inhibitor’s *binding strength* number:
- Lower **Ki** means **tighter binding** → usually stronger inhibition
- Higher **Ki** means weaker binding

✅ **Big intuition:** **Ki** is a binding/affinity-like number for inhibition.

> Note: Ki is not always identical to Kd, but the “lower = stronger” intuition is safe here.

In [None]:
def activity_vs_inhibitor(I, Ki=1.0):
    # Toy hyperbolic inhibition curve: activity fraction decreases with inhibitor.
    return 1 / (1 + I / Ki)

I = np.logspace(-3, 2, 400)  # inhibitor concentration (log scale)
Ki_list = [0.03, 0.3, 3.0]

plt.figure()
for Ki in Ki_list:
    plt.plot(I, activity_vs_inhibitor(I, Ki=Ki), label=f"Ki={Ki}")
plt.xscale("log")
plt.xlabel("Inhibitor concentration [I] (log scale)")
plt.ylabel("Remaining activity (fraction)")
plt.title("Toy inhibition curves: lower Ki = stronger inhibitor")
plt.ylim(-0.02, 1.02)
plt.legend()
plt.show()

### Explain the Ki plot
- The x-axis is **log scale** (each step is 10× more inhibitor).
- Curves that drop earlier are **stronger inhibitors**.
- Lower **Ki** shifts the curve **left** (less inhibitor needed to reduce activity).
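The toy curve ignores the substrate entirely. To connect Ki back to the Michaelis–Menten picture, here is a minimal sketch that assumes one specific mechanism, competitive inhibition (the notebook itself does not commit to a mechanism). Under that assumption the inhibitor leaves Vmax unchanged and raises the apparent Km by a factor of (1 + [I]/Ki). The function name `competitive_rate` is made up for this illustration.

```python
def competitive_rate(S, I, Vmax=1.0, Km=1.0, Ki=1.0):
    # Assumption: competitive inhibition. The inhibitor competes with the
    # substrate for the active site, so only the apparent Km changes.
    Km_apparent = Km * (1 + I / Ki)
    return (Vmax * S) / (Km_apparent + S)

S = 1.0  # fixed substrate concentration, equal to Km in this example
for I in [0.0, 1.0, 10.0]:
    v = competitive_rate(S, I, Vmax=1.0, Km=1.0, Ki=1.0)
    print(f"[I]={I}: rate = {v:.3f}")
```

With [S] held fixed, adding inhibitor lowers the rate, and a smaller Ki makes the same inhibitor concentration inflate the apparent Km more.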

# 6) IC50: “how much drug to cut activity in half?”

**IC50** is the inhibitor concentration where the observed activity is reduced by **50%**.

- Lower IC50 → more potent in that assay
- Higher IC50 → less potent

Important note:
- **IC50 depends on assay conditions**, especially substrate concentration and enzyme amount.
- **Ki** is more “intrinsic” for binding inhibition, while IC50 is more “measurement-dependent”.

✅ **Big intuition:** IC50 is a **practical potency number**, not always a universal truth.
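One way to see the substrate dependence concretely is the Cheng-Prusoff relationship, which holds under the assumption of a competitive inhibitor and Michaelis–Menten kinetics: IC50 = Ki × (1 + [S]/Km). The sketch below uses made-up values for Ki and Km and a hypothetical helper name, `ic50_from_ki`. It covers only the substrate effect, not the enzyme-amount effect.

```python
def ic50_from_ki(Ki, S, Km):
    # Cheng-Prusoff relationship. Assumption: competitive inhibition,
    # so more substrate means more inhibitor is needed to reach 50%.
    return Ki * (1 + S / Km)

Ki, Km = 0.3, 1.5  # illustrative values only, in arbitrary concentration units
for S in [0.15, 1.5, 15.0]:
    print(f"[S]={S}: IC50 = {ic50_from_ki(Ki, S, Km):.2f} (Ki = {Ki})")
```

The same inhibitor, with the same Ki, shows a larger IC50 as the assay's substrate concentration goes up. At very low substrate the IC50 approaches Ki, and at [S] = Km it is exactly twice Ki.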

In [None]:
def hill_inhibition(I, IC50=1.0, hill=1.0):
    # Standard dose-response inhibition curve (fraction activity remaining)
    return 1 / (1 + (I/IC50)**hill)

I = np.logspace(-3, 2, 400)
IC50_list = [0.01, 0.1, 1.0]

plt.figure()
for IC50 in IC50_list:
    plt.plot(I, hill_inhibition(I, IC50=IC50, hill=1.0), label=f"IC50={IC50}")
plt.xscale("log")
plt.xlabel("Drug concentration (log scale)")
plt.ylabel("Remaining activity (fraction)")
plt.title("Dose-response curves: IC50 shifts left/right")
plt.ylim(-0.02, 1.02)
plt.legend()
plt.show()

### Read IC50 from the plot
- Find y = 0.5 (half activity).
- The x-value at that point is the **IC50**.
- Lower IC50 curves are **left-shifted** (need less drug to reach 50% inhibition).
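The same reading can be done numerically rather than by eye. This is a minimal sketch, assuming the curve falls steadily with dose and crosses 0.5 somewhere inside the sampled range; `estimate_ic50` is a hypothetical helper that interpolates on the log-concentration axis, and `hill_inhibition` is re-declared so the block runs on its own.

```python
import numpy as np

def hill_inhibition(I, IC50=1.0, hill=1.0):
    # Same dose-response curve as in the plotting cell.
    return 1 / (1 + (I / IC50) ** hill)

def estimate_ic50(conc, activity):
    """Estimate where activity crosses 0.5, interpolating in log10(concentration)."""
    conc = np.asarray(conc, dtype=float)
    activity = np.asarray(activity, dtype=float)
    # np.interp needs increasing x-values; activity falls as dose rises,
    # so both arrays are reversed before interpolating.
    log_ic50 = np.interp(0.5, activity[::-1], np.log10(conc)[::-1])
    return 10 ** log_ic50

I = np.logspace(-3, 2, 400)
estimate = estimate_ic50(I, hill_inhibition(I, IC50=0.1))
print(f"Estimated IC50: {estimate:.3f}")
```

On these noise-free curves the estimate should land very close to the IC50 used to generate them. Real assay data is noisy and sparsely sampled, so in practice IC50 is usually obtained by fitting a dose-response model rather than by simple interpolation.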

# 7) Mini recap
If you only remember 4 things:

1) **Proteins** are 3D machines; drugs work by binding/modulating them.  
2) **Domains** are modular parts of proteins, often with specific functions.  
3) **Active sites** are pockets where binding or catalysis happens.  
4) **Km / Ki / IC50** are common “numbers”:
   - Km: substrate level for half-speed
   - Ki: inhibitor binding strength (lower = stronger)
   - IC50: inhibitor amount for 50% effect (assay-dependent)

# 8) Next (I will explain)

To connect this more directly to real drug discovery and datasets, the next topics to cover are:

- **Levels of protein structure**: primary → secondary → tertiary → quaternary
- **Amino acids in one slide**: hydrophobic vs polar vs charged
- **Binding affinity (Kd)** and **occupancy** (how much target is bound at a given dose)
- **EC50 vs IC50** (activation vs inhibition)
- **Hill slope / cooperativity** (why some dose-response curves are steeper)
- **Allosteric sites** (binding away from active site that still changes function)
- **Selectivity** (binding the intended target vs off-target proteins)
- **Units & log scales** (nM, µM, pM; why plots are log concentration)
- **Assay types**: biochemical vs cell-based (why results differ)