# MÍMIR: Target-Conditioned Peptide Design

Train the MÍMIR model on Google Colab using ESM-3 and LoRA for target-specific peptide generation.

In [None]:
# Check GPU availability
# Look for 'Tesla T4' (Free Tier, 16GB VRAM) or 'A100' (Pro, 40GB+ VRAM).
# On a T4, reduce --batch_size (set in step 5) if you hit out-of-memory (OOM) errors.
!nvidia-smi
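You may want the GPU name and total memory as values you can use in code, for example to choose a batch size. The sketch below calls `nvidia-smi` with its `--query-gpu` CSV output and parses the result. The helper names `parse_gpu_query` and `query_gpus` are illustrative and are not part of the MÍMIR repository.

```python
import shutil
import subprocess

# Standard nvidia-smi query flags: one CSV line per GPU, memory in MiB.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_query(output: str) -> list[tuple[str, int]]:
    """Parse `name, memory.total` CSV lines into (name, total MiB) tuples."""
    gpus = []
    for line in output.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so the memory column is always isolated.
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_gpus() -> list[tuple[str, int]]:
    """Run nvidia-smi if it is on PATH; return an empty list otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    return parse_gpu_query(result.stdout)


for name, mem_mib in query_gpus():
    print(f"{name}: {mem_mib / 1024:.1f} GiB total")
```

On a free-tier T4 this typically reports about 15 GiB (15360 MiB) rather than the nominal 16 GB.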

## 1. Clone Repository & Install Dependencies
We clone the MÍMIR repository and install the specific dependencies required for ESM-3 and LoRA fine-tuning.

In [None]:
# Repository URL
REPO_URL = "https://github.com/pmall/mimir.git"

!git clone $REPO_URL
%cd mimir

# Install dependencies from pyproject.toml
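# The database (psycopg2) and environment-loading (python-dotenv) dependencies are omitted because the dataset is uploaded manually in step 4.
# Each requirement is quoted: an unquoted `>=` is parsed by the shell as an output redirection, which silently drops the version constraints.
!pip install --upgrade "biopython>=1.86" "esm>=3.2.1.post1" "peft>=0.18.1" "torch>=2.5.1" "httpx>=0.28.1" "huggingface_hub>=0.36.0" "transformers>=4.48.1" "accelerate>=1.12.0" "bitsandbytes>=0.45.0"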
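An unquoted `>=` in a shell command is treated as an output redirection, so it is worth confirming which versions pip actually installed. The sketch below uses only the standard library's `importlib.metadata`. The package list mirrors the `pip install` line above, and the helper name is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution names as they appear in the pip command above.
PACKAGES = [
    "biopython", "esm", "peft", "torch", "httpx", "huggingface_hub",
    "transformers", "accelerate", "bitsandbytes",
]


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: dict[str, Optional[str]] = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found


for pkg, ver in installed_versions(PACKAGES).items():
    print(f"{pkg:>16}: {ver or 'NOT INSTALLED'}")
```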

## 2. Authenticate with Hugging Face
ESM-3 weights are gated on the Hugging Face Hub, so you must authenticate before downloading them. Make sure the account behind your token has been granted access to `evolutionaryscale/esm3-sm-open-v1`.

In [None]:
!hf auth login
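`hf auth login` prompts for a token interactively. A non-interactive alternative reads the token from a Colab secret through `google.colab.userdata`, or from the `HF_TOKEN` environment variable, and passes it to `huggingface_hub.login`. The secret name `HF_TOKEN` and both helper functions are assumptions for illustration. The notebook must also be granted access to the secret in Colab's Secrets panel.

```python
import os
from typing import Optional


def resolve_hf_token(secret_name: str = "HF_TOKEN") -> Optional[str]:
    """Return a Hugging Face token from Colab Secrets or the environment."""
    try:
        # Only importable inside Colab; userdata.get raises if the secret
        # is missing or the notebook has not been granted access to it.
        from google.colab import userdata
        token = userdata.get(secret_name)
        if token:
            return token
    except Exception:
        pass
    return os.environ.get(secret_name)


def login_to_hub(secret_name: str = "HF_TOKEN") -> bool:
    """Log in with huggingface_hub.login if a token is available."""
    token = resolve_hf_token(secret_name)
    if token is None:
        print(f"No token found under {secret_name!r}; use `hf auth login` instead.")
        return False
    from huggingface_hub import login
    login(token=token)
    return True
```

Call `login_to_hub()` in a cell. If it returns `False`, fall back to the interactive `hf auth login`.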

## 3. Model Setup & Verification
We download the ESM-3 weights and perform a quick load test to ensure the environment is correctly configured before starting the heavy training loop.

In [None]:
# Download ESM-3 weights
!python scripts/download_weights.py

# Verify that the weights load
import gc

import esm
import torch
from esm.models.esm3 import ESM3

print(f"ESM Version: {esm.__version__}")
try:
    model = ESM3.from_pretrained("esm3_sm_open_v1")
    print("✅ ESM-3 model loaded successfully. Ready for training.")
    # Release the weights: scripts/train.py runs in a separate process and needs the GPU memory.
    del model
    gc.collect()
    torch.cuda.empty_cache()
except Exception as e:
    print(f"❌ Failed to load ESM-3: {e}")
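Any model loaded in the notebook kernel stays in memory while `scripts/train.py` runs as a separate process on the same GPU. A back-of-the-envelope estimate of the weight memory shows why that matters on a 16 GB T4. This sketch assumes roughly 1.4 billion parameters for `esm3-sm-open-v1` and counts weights only, not activations, optimizer state, or the CUDA context.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the raw weights, ignoring activations and buffers."""
    return n_params * bytes_per_param / 1024**3


# Assumption: about 1.4B parameters for esm3-sm-open-v1.
ESM3_OPEN_PARAMS = 1.4e9

for dtype, nbytes in [("float32", 4), ("bfloat16", 2)]:
    print(f"{dtype}: ~{weight_memory_gib(ESM3_OPEN_PARAMS, nbytes):.1f} GiB")
```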

## 4. Upload Dataset
Upload your `dataset.csv` file containing the peptide-target pairs. The cell below saves it as `data/dataset.csv`; training cannot run without it.

In [None]:
from google.colab import files
import os
import shutil

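os.makedirs('data', exist_ok=True)

print("Please upload your 'dataset.csv' file:")
uploaded = files.upload()

if not uploaded:
    raise RuntimeError("No file was uploaded; training needs data/dataset.csv.")
if len(uploaded) > 1:
    print(f"Received several files, only the first is used: {list(uploaded)}")

# files.upload() writes files to the current directory (the repo root after %cd)
filename = next(iter(uploaded))
target_path = 'data/dataset.csv'
shutil.move(filename, target_path)
print(f"Moved {filename} to {target_path}")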
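A quick sanity check on the uploaded file can catch formatting problems before a long run. This sketch uses the standard `csv` module. The column names `peptide` and `target` are hypothetical because the notebook does not state the expected header. The length limit of 20 comes from the peptide length quoted in the hyperparameter notes. Adjust both to match what `scripts/train.py` actually reads.

```python
import csv
from pathlib import Path

# Hypothetical column names: change these to match your dataset.csv header.
EXPECTED_COLUMNS = ("peptide", "target")
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_rows(lines, columns=EXPECTED_COLUMNS, peptide_col="peptide", length_limit=20):
    """Return a list of problems found in CSV lines (empty means it looks OK)."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [c for c in columns if c not in header]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        pep = (row.get(peptide_col) or "").strip().upper()
        bad = set(pep) - AMINO_ACIDS
        if not pep:
            problems.append(f"row {row_no}: empty peptide")
        elif bad:
            problems.append(f"row {row_no}: non-standard residues {sorted(bad)}")
        elif len(pep) >= length_limit:
            problems.append(f"row {row_no}: peptide has {len(pep)} residues")
    return problems


def check_dataset(path="data/dataset.csv", **kwargs):
    """Open the CSV and run check_rows on it."""
    with open(path, newline="") as fh:
        return check_rows(fh, **kwargs)


if Path("data/dataset.csv").exists():
    for problem in check_dataset() or ["no problems found"]:
        print(problem)
```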

## 5. Fine-Tune ESM-3

We can now start training.

### Hyperparameters
- **`epochs`**: **500**. Provides robust coverage for the combinatorial space.
- **`batch_size`**: **256**. Because our peptides are short (<20 amino acids), a large batch usually fits even on a T4 and keeps training efficient; lower it if you hit out-of-memory errors.
- **`lr`**: **1e-4**. A common learning rate for LoRA fine-tuning.
- **`masking_boost_ratio`**: **0.5**. Boosts gradient for difficult, heavily masked samples.
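To gauge the length of a run, the number of optimizer steps follows directly from `epochs`, `batch_size`, and the dataset size. This sketch assumes `scripts/train.py` takes one optimizer step per batch with no gradient accumulation. The 10,000-pair dataset in the example is hypothetical.

```python
def steps_per_epoch(n_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Batches per epoch; the last partial batch counts unless drop_last is set."""
    if drop_last:
        return n_samples // batch_size
    return (n_samples + batch_size - 1) // batch_size  # ceiling division


def total_steps(n_samples: int, batch_size: int = 256, epochs: int = 500,
                drop_last: bool = False) -> int:
    """Total optimizer steps, assuming one step per batch."""
    return steps_per_epoch(n_samples, batch_size, drop_last) * epochs


# Hypothetical dataset of 10,000 peptide-target pairs.
print(total_steps(10_000))                  # → 20000
print(total_steps(10_000, batch_size=128))  # → 39500
```

Halving `batch_size` to avoid OOM roughly doubles the steps per epoch, so the same number of epochs performs about twice as many updates.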

In [None]:
!python scripts/train.py --epochs 500 --batch_size 256 --lr 1e-4 --masking_boost_ratio 0.5