# MÍMIR: Target-Conditioned Peptide Design

Train the MÍMIR model on Google Colab using ESM-3 and LoRA for target-specific peptide generation.

In [None]:
# Check GPU availability
# Look for 'Tesla T4' (Free Tier, 16GB VRAM) or 'A100' (Pro, 40GB+ VRAM).
# If you have a T4, you might need to reduce batch_size if you hit OOM.
!nvidia-smi
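
To read the GPU model and total memory programmatically instead of by eye, the `nvidia-smi` query output can be parsed in Python. This is a minimal sketch: `parse_gpu_info` and `query_gpus` are hypothetical helper names and are not part of the MÍMIR repository.

```python
import shutil
import subprocess


def parse_gpu_info(csv_text):
    """Parse 'name, memory.total' rows such as 'Tesla T4, 15360 MiB'."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so GPU names containing commas stay intact
        name, memory = (part.strip() for part in line.rsplit(",", 1))
        # Memory is reported as '<number> MiB'; keep only the number
        gpus.append((name, int(memory.split()[0])))
    return gpus


def query_gpus():
    """Return [(name, total_mib), ...], or [] when nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_info(result.stdout) if result.returncode == 0 else []


for name, total_mib in query_gpus():
    print(f"{name}: {total_mib / 1024:.1f} GiB")
```

On a runtime without a GPU the loop prints nothing, which is itself a signal to switch the Colab runtime type before continuing.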

## 1. Clone Repository & Install Dependencies
We clone the MÍMIR repository and install the specific dependencies required for ESM-3 and LoRA fine-tuning.

In [None]:
# Repository URL
REPO_URL = "https://github.com/pmall/mimir.git"

!git clone $REPO_URL
%cd mimir

# Install dependencies from pyproject.toml
!pip install -e . --upgrade

## 2. Authenticate with Hugging Face
ESM-3 weights are gated, so you must authenticate to download them. Before logging in, accept the model's terms on its Hugging Face page and make sure your token can read `evolutionaryscale/esm3-sm-open-v1`.

In [None]:
!hf auth login

## 3. Model Setup & Verification
We download the ESM-3 weights and perform a quick load test to ensure the environment is correctly configured before starting the heavy training loop.

In [None]:
# Download ESM-3 weights
!python scripts/download_weights.py

# Verify successful load
import esm
from esm.models.esm3 import ESM3
import torch
import gc

print(f"ESM Version: {esm.__version__}")
try:
    # Load model to verify weights are present
    model = ESM3.from_pretrained("esm3_sm_open_v1")
    print("✅ ESM-3 model loaded successfully.")
    
    # CRITICAL: Delete model and clear cache to free GPU memory for the training script
    del model
    gc.collect()
    torch.cuda.empty_cache()
    print("🧹 GPU memory cleared. Ready for training.")
except Exception as e:
    print(f"❌ Failed to load ESM-3: {e}")
    # Re-raise so "Run all" stops here instead of starting training without weights
    raise

## 4. Upload Dataset
Upload your `mapping_dataset.csv` file containing the peptide-target pairs. The training command in step 5 reads it from `data/mapping_dataset.csv`, so the cell below stores the upload under that name.

In [None]:
from google.colab import files
import os
import shutil

os.makedirs('data', exist_ok=True)

print("Please upload your 'mapping_dataset.csv' file:")
uploaded = files.upload()

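if not uploaded:
    raise RuntimeError("No file was uploaded; re-run this cell and select mapping_dataset.csv.")

# Only the first uploaded file is used; it is stored under the name the training script expects
filename = next(iter(uploaded))
print(f'Received file "{filename}"')
target_path = 'data/mapping_dataset.csv'
shutil.move(filename, target_path)
print(f"Moved {filename} to {target_path}")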
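
Since a long training run depends on this file, a quick sanity check after the upload can catch an empty or truncated CSV early. The sketch below only reports the header and the number of data rows; it makes no assumption about the column names the training script expects, and `summarize_csv` is a hypothetical helper.

```python
import csv
import os


def summarize_csv(path):
    """Return (header, data_row_count) for a CSV file; raise if it is empty."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            raise ValueError(f"{path} is empty")
        # Count non-blank rows after the header
        row_count = sum(1 for row in reader if row)
    return header, row_count


if os.path.exists("data/mapping_dataset.csv"):
    header, rows = summarize_csv("data/mapping_dataset.csv")
    print(f"Columns: {header}")
    print(f"Data rows: {rows}")
```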

## 5. Fine-Tune ESM-3

Start the training run with the configuration below.

### Training Configuration
- **`epochs`**: **100**. Target run length for this session.
- **`batch_size`**: **64**. Chosen for the Colab T4; lower it if training runs out of GPU memory.
- **`masking_boost_ratio`**: **0.5**. Boosts gradient for difficult samples.
- **`lr`**: **1e-4**. Standard LoRA learning rate.


In [None]:
!python scripts/train.py --epochs 100 --batch_size 64 --masking_boost_ratio 0.5 --lr 1e-4 --dataset data/mapping_dataset.csv
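
If the run above hits an out-of-memory error on a T4, one option is to retry with successively smaller batch sizes. The sketch below reuses the flags from the command above; `build_train_command`, `halving_schedule`, and `train_with_fallback` are hypothetical helpers, the minimum of 8 is an arbitrary assumption, and the loop retries on any non-zero exit code, not only on OOM.

```python
import subprocess
import sys


def build_train_command(batch_size, epochs=100, masking_boost_ratio=0.5,
                        lr=1e-4, dataset="data/mapping_dataset.csv"):
    """Assemble the scripts/train.py call with the flags used in this notebook."""
    return [
        sys.executable, "scripts/train.py",
        "--epochs", str(epochs),
        "--batch_size", str(batch_size),
        "--masking_boost_ratio", str(masking_boost_ratio),
        "--lr", str(lr),
        "--dataset", dataset,
    ]


def halving_schedule(start, minimum=8):
    """Batch sizes to try in order: start, start/2, ... down to minimum."""
    sizes = []
    size = start
    while size >= minimum:
        sizes.append(size)
        size //= 2
    return sizes


def train_with_fallback(start_batch_size=64):
    """Retry training with smaller batches until one run exits cleanly."""
    for batch_size in halving_schedule(start_batch_size):
        print(f"Trying batch_size={batch_size}")
        if subprocess.run(build_train_command(batch_size)).returncode == 0:
            return batch_size
    raise RuntimeError("Training failed at every batch size tried")
```

Calling `train_with_fallback()` from a cell would replace the single `!python` command. Check the training log after a failed attempt, since a non-OOM error will not be fixed by a smaller batch.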

## 6. Download Model
Download the **best** model (lowest average true loss) saved during training.

In [None]:
# Zip the BEST model
!zip -r mimir_best_model.zip checkpoints/best_model/

from google.colab import files
files.download('mimir_best_model.zip')
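
A quick check that the archive is intact and not empty can be run before or after the download. This is a minimal sketch using the standard-library `zipfile` module; `list_archive` is a hypothetical helper, and the files it lists depend on what the training script wrote to `checkpoints/best_model/`.

```python
import os
import zipfile


def list_archive(path):
    """Return the names of regular files stored in a zip archive."""
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the first corrupt member, or None if all CRCs match
        bad = archive.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"Corrupt member: {bad}")
        return [info.filename for info in archive.infolist() if not info.is_dir()]


if os.path.exists("mimir_best_model.zip"):
    names = list_archive("mimir_best_model.zip")
    print(f"{len(names)} file(s) in archive")
    for name in names:
        print(name)
```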