# 🎯 Mistral 7B Fine-Tuning for French Football Commentary

**Goal**: Fine-tune Mistral 7B to generate human-quality French football commentary

**Dataset**: 255 L'Équipe commentary examples from CAN 2025 (Africa Cup of Nations) & Europa League

**Hardware**: Google Colab T4 GPU (15GB VRAM)

**Estimated Time**: well under 6 hours (255 examples × 3 epochs at an effective batch size of 16 is only ~48 optimizer steps)

---

## ⚙️ Setup

**IMPORTANT**: Enable the GPU runtime
1. Runtime → Change runtime type
2. Hardware accelerator → T4 GPU
3. Save

In [None]:
# Step 1: Install dependencies
!pip install -q unsloth transformers datasets trl peft accelerate bitsandbytes

In [None]:
# Step 2: Mount Google Drive (to save checkpoints)
from google.colab import drive
drive.mount('/content/drive')

import os
checkpoint_dir = '/content/drive/MyDrive/mistral-commentary-checkpoints'
os.makedirs(checkpoint_dir, exist_ok=True)
print(f"✅ Checkpoints will be saved to: {checkpoint_dir}")

## 📁 Upload Training Data

Upload `mistral_training.jsonl` from your local machine:
1. Click the folder icon on the left
2. Click the upload icon
3. Select `mistral_training.jsonl` (136 KB)

In [None]:
# Step 3: Load and verify training data
import json

# Load dataset (one JSON object per line; blank lines are skipped)
with open('mistral_training.jsonl', 'r', encoding='utf-8') as f:
    training_data = [json.loads(line) for line in f if line.strip()]

print(f"✅ Loaded {len(training_data)} training examples")
print(f"\n📝 Sample example:")
print(json.dumps(training_data[0], indent=2, ensure_ascii=False)[:500])
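
`format_chat` in Step 4 reads `messages[0]`, `messages[1]` and `messages[2]` by position, so every line of the JSONL file needs three turns in a fixed order. A minimal stdlib sketch of a pre-flight check follows. It assumes OpenAI-style `role` values (`system`, `user`, `assistant`), which is an assumption about how `mistral_training.jsonl` was produced, and the helper name `validate_example` is hypothetical.

```python
import json

# Assumed role order; adjust if mistral_training.jsonl uses different role names
EXPECTED_ROLES = ["system", "user", "assistant"]


def validate_example(example):
    """Return a list of problems found in one training example (empty if it looks fine)."""
    messages = example.get("messages")
    if not isinstance(messages, list):
        return ["missing 'messages' list"]
    problems = []
    if len(messages) < len(EXPECTED_ROLES):
        problems.append(f"expected {len(EXPECTED_ROLES)} messages, got {len(messages)}")
    # Check role order and non-empty content for the turns that are present
    for i, (message, role) in enumerate(zip(messages, EXPECTED_ROLES)):
        if message.get("role") != role:
            problems.append(f"message {i}: expected role {role!r}, got {message.get('role')!r}")
        if not str(message.get("content", "")).strip():
            problems.append(f"message {i}: empty content")
    return problems


# In the notebook you would run it over training_data, e.g.:
#   bad = {i: p for i, ex in enumerate(training_data) if (p := validate_example(ex))}
sample_line = json.dumps({"messages": [
    {"role": "system", "content": "Tu es un commentateur sportif..."},
    {"role": "user", "content": "Minute: 67'"},
    {"role": "assistant", "content": "Le ballon circule..."},
]}, ensure_ascii=False)
print(validate_example(json.loads(sample_line)))
```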

In [None]:
# Step 4: Format data for Mistral
from datasets import Dataset

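def format_chat(example):
    """Format one example into Mistral's [INST] chat format.

    Assumes `messages` holds three turns in this order: system prompt,
    user request, assistant commentary. Mistral has no separate system
    role, so the system text is prepended to the user turn.
    """
    messages = example['messages']
    system, user, assistant = (m['content'] for m in messages[:3])
    
    # Mistral chat template: <s>[INST] {instruction} [/INST] {answer}</s>
    text = f"""<s>[INST] {system}

{user} [/INST] {assistant}</s>"""
    
    return {'text': text}

# Convert to HuggingFace Dataset
dataset = Dataset.from_list(training_data)
dataset = dataset.map(format_chat, remove_columns=['messages'])

print(f"✅ Formatted {len(dataset)} examples")
print(f"\n📝 Sample formatted text:")
print(dataset[0]['text'][:300])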
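
The inference prompt in Step 10 only works if it reproduces the training text exactly up to and including `[/INST] `; the model then continues with the commentary it learned to write. A small sketch of that invariant, using hypothetical helpers `build_prompt` and `build_training_text` that re-create the template from `format_chat`:

```python
def build_prompt(system, user):
    """Prefix the model sees at inference time (ends right after '[/INST] ')."""
    return f"<s>[INST] {system}\n\n{user} [/INST] "


def build_training_text(system, user, assistant):
    """Full training string, same layout as format_chat in Step 4."""
    return build_prompt(system, user) + f"{assistant}</s>"


system = "Tu es un commentateur sportif professionnel pour L'Équipe."
user = "Minute: 45'+2\nType d'événement: But"
text = build_training_text(system, user, "Quelle frappe !")

# The inference prompt must be an exact prefix of the training text
assert text.startswith(build_prompt(system, user))
print(text)
```

Any drift between the two (an extra newline, a missing space after `[/INST]`) puts the model in a context it never saw during training.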

## 🤖 Load Mistral 7B Model

In [None]:
# Step 5: Load Mistral 7B with 4-bit quantization
from unsloth import FastLanguageModel
import torch

max_seq_length = 512  # Sufficient for commentary
dtype = None  # Auto-detect
load_in_4bit = True  # 4-bit quantization for T4 GPU

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-v0.3-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

print("✅ Model loaded successfully")
print(f"Model size: {model.get_memory_footprint() / 1e9:.2f} GB")

In [None]:
# Step 6: Configure LoRA (Low-Rank Adaptation)
model = FastLanguageModel.get_peft_model(
    model,
    r=128,  # High rank for capturing writing style
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # Attention layers
        "gate_proj", "up_proj", "down_proj"      # MLP layers
    ],
    use_rslora=True,  # Rank-stabilized LoRA
    use_gradient_checkpointing="unsloth"  # Memory optimization
)

print("✅ LoRA configuration applied")
model.print_trainable_parameters()  # Prints the counts itself and returns None
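
With `r=128` on all seven projection types, the adapter is large relative to a 255-example dataset. Each adapted linear layer of shape `d_in → d_out` gains `r·(d_in + d_out)` trainable weights, and with `use_rslora=True` PEFT scales the update by `lora_alpha / sqrt(r)` instead of `lora_alpha / r`. The sketch below does that arithmetic, assuming the published Mistral 7B dimensions (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers). Treat its result as an estimate to compare against `print_trainable_parameters()`, not as that method's output.

```python
import math

# Assumed Mistral 7B dimensions
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads x head_dim 128 (grouped-query attention)
NUM_LAYERS = 32

# (d_in, d_out) for each LoRA target module in one decoder layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(r):
    """LoRA adds A (r x d_in) and B (d_out x r) per adapted layer: r * (d_in + d_out)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


def lora_scale(alpha, r, rslora):
    """Scaling applied to the LoRA update: alpha/sqrt(r) with rsLoRA, alpha/r otherwise."""
    return alpha / math.sqrt(r) if rslora else alpha / r


print(f"Trainable LoRA params at r=128: {lora_trainable_params(128):,}")
print(f"Scale with rsLoRA:   {lora_scale(64, 128, rslora=True):.3f}")
print(f"Scale without rsLoRA: {lora_scale(64, 128, rslora=False):.3f}")
```

At these settings the rsLoRA scale (about 5.66) is roughly 11 times the 0.5 that plain LoRA would apply, which is worth keeping in mind when tuning `learning_rate` alongside `r` and `lora_alpha`.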

## 🏋️ Training Configuration

In [None]:
# Step 7: Configure training parameters
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    # Output
    output_dir=checkpoint_dir,
    
    # Training schedule
    num_train_epochs=3,  # 3 passes over the data
    per_device_train_batch_size=2,  # Small batch for T4 GPU
    gradient_accumulation_steps=8,  # Effective batch size: 16
    
    # Learning rate
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,  # 10% warmup
    
    # Optimization
    optim="adamw_8bit",  # Memory-efficient optimizer
    fp16=True,  # Mixed precision training
    
    # Logging & checkpointing
    logging_steps=1,  # Only ~48 optimizer steps in total, so log every step
    save_strategy="epoch",  # Checkpoint after every epoch (Colab can disconnect)
    save_total_limit=3,  # Keep only last 3 checkpoints
    
    # Evaluation
    eval_strategy="no",  # No validation set (small dataset)
)

print("✅ Training configuration:")
print(f"   Epochs: {training_args.num_train_epochs}")
print(f"   Batch size: {training_args.per_device_train_batch_size}")
print(f"   Gradient accumulation: {training_args.gradient_accumulation_steps}")
print(f"   Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print(f"   Learning rate: {training_args.learning_rate}")
print(f"   Checkpoints: {checkpoint_dir}")
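
With `warmup_ratio=0.1` and `lr_scheduler_type="cosine"`, the learning rate ramps linearly from 0 to `2e-4` and then follows a half-cosine back toward 0. The stdlib sketch below reproduces that shape for this dataset's step count. It mirrors the formula of transformers' `get_cosine_schedule_with_warmup`, with the warmup step count rounded up the way `TrainingArguments` converts `warmup_ratio`, but it is an illustration rather than the library code.

```python
import math

NUM_EXAMPLES = 255
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8
EPOCHS = 3
PEAK_LR = 2e-4
WARMUP_RATIO = 0.1

# Optimizer steps: micro-batches per epoch, grouped by gradient accumulation
steps_per_epoch = math.ceil(NUM_EXAMPLES / PER_DEVICE_BATCH) // GRAD_ACCUM  # 128 // 8
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def cosine_lr(step):
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(f"steps/epoch={steps_per_epoch}, total={total_steps}, warmup={warmup_steps}")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>2}: lr = {cosine_lr(step):.2e}")
```

With only ~48 steps, warmup lasts just 5 of them, so the schedule is short and the peak learning rate is reached almost immediately.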

In [None]:
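# Step 8: Initialize trainer
import math

# Passing tokenizer/max_seq_length/dataset_text_field to SFTTrainer follows older TRL
# releases (as in Unsloth's notebooks); newer TRL expects them in trl.SFTConfig
# and renames `tokenizer` to `processing_class`.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=training_args,
    max_seq_length=max_seq_length,
    dataset_text_field="text",
)

# trainer.state.max_steps stays 0 until trainer.train() runs, so estimate the schedule here
batches_per_epoch = math.ceil(len(dataset) / training_args.per_device_train_batch_size)
steps_per_epoch = max(batches_per_epoch // training_args.gradient_accumulation_steps, 1)
total_steps = steps_per_epoch * math.ceil(training_args.num_train_epochs)

print("✅ Trainer initialized")
print(f"   Training examples: {len(dataset)}")
print(f"   Steps per epoch: {steps_per_epoch}")
print(f"   Total training steps: {total_steps}")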

## 🚀 Start Training

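**Expected duration**: well under 6 hours on a T4 GPU; 3 epochs over 255 examples is only ~48 optimizer steps

**Monitor**:
- Loss should decrease from ~2.0 to ~0.5
- Checkpoints are written to Google Drive (`checkpoint_dir`) on the schedule set in Step 7
- If Colab disconnects, re-run the setup cells and resume from the last checkpoint with `trainer.train(resume_from_checkpoint=True)`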
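
Resuming relies on the `checkpoint-<step>` folders the `Trainer` writes into `output_dir`. transformers ships `transformers.trainer_utils.get_last_checkpoint` for this lookup, and `trainer.train(resume_from_checkpoint=True)` picks the latest folder automatically. The stdlib sketch below performs the same kind of lookup so you can check what is actually on Drive before resuming; the helper name `latest_checkpoint` is hypothetical.

```python
import re
import tempfile
from pathlib import Path

CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> folder with the highest step, or None if there is none."""
    candidates = []
    for path in Path(output_dir).iterdir():
        match = CHECKPOINT_RE.match(path.name)
        if path.is_dir() and match:
            candidates.append((int(match.group(1)), path))
    return max(candidates)[1] if candidates else None


# Demo on a throwaway directory; in the notebook you would pass checkpoint_dir
with tempfile.TemporaryDirectory() as tmp:
    for name in ("checkpoint-16", "checkpoint-32", "checkpoint-8", "runs"):
        (Path(tmp) / name).mkdir()
    print(latest_checkpoint(tmp).name)
```

Steps are compared as integers, so `checkpoint-32` correctly beats `checkpoint-8` even though it sorts earlier as a string. `trainer.train(resume_from_checkpoint=...)` also accepts an explicit path if you want to resume from a specific folder.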

In [None]:
# Step 9: Train the model!
import time

start_time = time.time()

print("🚀 Starting training...\n")
print("=" * 70)

trainer.train()

elapsed = time.time() - start_time
print("\n" + "=" * 70)
print(f"✅ Training complete!")
print(f"   Duration: {elapsed / 3600:.2f} hours")
print("=" * 70)

## 🧪 Test the Model

Generate sample commentary to verify quality.

In [None]:
# Step 10: Test generation
FastLanguageModel.for_inference(model)  # Enable fast inference

def generate_commentary(minute, event_type):
    """Generate French football commentary for a match minute and event type"""
    
    event_type_fr = {
        'goal': 'But',
        'commentary': 'Commentaire général',
        'substitution': 'Remplacement',
        'penalty': 'Pénalty'
    }.get(event_type, event_type)
    
    # Should match the system/user wording used in mistral_training.jsonl
    prompt = f"""<s>[INST] Tu es un commentateur sportif professionnel pour L'Équipe, spécialisé dans le football. Ton style est vif, précis, émotionnel mais jamais sensationnaliste. Tu varies ton vocabulaire et ta structure de phrases.

Génère un commentaire de match pour:

Minute: {minute}
Type d'événement: {event_type_fr}

Commentaire: [/INST] """
    
    # The prompt already begins with <s>, so stop the tokenizer from adding a second BOS token
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")
    
    outputs = model.generate(
        **inputs,
        max_new_tokens=120,
        temperature=0.9,
        top_p=0.95,
        top_k=50,
        repetition_penalty=1.15,
        do_sample=True
    )
    
    # Decode only the newly generated tokens, i.e. everything after the prompt
    prompt_length = inputs["input_ids"].shape[1]
    commentary = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
    
    return commentary

# Test examples
test_cases = [
    ("45'+2", "goal"),
    ("67'", "commentary"),
    ("82'", "substitution"),
]

print("🧪 Testing model...\n")
print("=" * 70)

for minute, event_type in test_cases:
    commentary = generate_commentary(minute, event_type)
    print(f"\n{minute} - {event_type}")
    print(f"➡️  {commentary}")
    print("-" * 70)
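
`temperature`, `top_k` and `top_p` in Step 10 reshape the next-token distribution before sampling: temperature divides the logits, top-k keeps the k most likely tokens, and top-p (nucleus sampling) keeps the smallest set of those whose probabilities add up to at least `top_p`. The toy sketch below applies the three steps in that order to a hand-made logit vector. It is a simplified illustration, not the transformers implementation, which works on tensors and also handles `repetition_penalty`; the function name `filtered_distribution` is hypothetical.

```python
import math


def filtered_distribution(logits, temperature=0.9, top_k=50, top_p=0.95):
    """Return {token: probability} left after temperature, top-k and top-p filtering."""
    # Temperature: divide logits, then softmax (subtract the max for numerical stability)
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((e / total, tok) for tok, e in exps.items()), reverse=True)

    # Top-k: keep the k most likely tokens
    ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for prob, tok in ranked:
        kept.append((prob, tok))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalise so the surviving tokens sum to 1
    mass = sum(prob for prob, _ in kept)
    return {tok: prob / mass for prob, tok in kept}


toy_logits = {"But": 3.0, "Frappe": 2.0, "Corner": 1.0, "Pause": -2.0}
for token, prob in filtered_distribution(toy_logits, top_k=3, top_p=0.9).items():
    print(f"{token:>7}: {prob:.3f}")
```

Raising `temperature` flattens the distribution so more tokens survive the top-p cut, which is the usual lever for more varied (and riskier) commentary.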

## 💾 Save Model

In [None]:
# Step 11: Save the fine-tuned model
from pathlib import Path

output_dir = "/content/drive/MyDrive/mistral-commentary-final"

# Save LoRA adapter (not the full model) plus tokenizer
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

saved_mb = sum(f.stat().st_size for f in Path(output_dir).rglob('*') if f.is_file()) / 1e6
print(f"✅ Model saved to: {output_dir}")
print(f"   Size: {saved_mb:.1f} MB")

## 📦 Export to GGUF (Optional)

For deployment with Ollama.

In [None]:
# Step 12: Merge LoRA into the base model (GGUF conversion is done afterwards with llama.cpp)

# Merge LoRA adapter into base model (written to Colab's local disk, not Google Drive)
merged_dir = "/content/mistral-commentary-merged"
model.save_pretrained_merged(merged_dir, tokenizer, save_method="merged_16bit")

print(f"✅ Merged model saved to: {merged_dir}")

# Convert to GGUF (requires a llama.cpp checkout; not performed in this notebook)
print("\n⚠️  GGUF conversion:")
print("   1. Download merged model to local machine")
print("   2. Run llama.cpp's convert_hf_to_gguf.py to create a GGUF file, then llama-quantize to q4_k_m")
print("   3. Upload to server: scp mistral-commentary-q4.gguf root@159.223.103.16:~/models/")

## ✅ Training Complete!

### Next Steps:
1. **Download model** from Google Drive: `mistral-commentary-final/`
2. **Test locally** before deploying to production
3. **Deploy to server** with A/B testing framework
4. **Monitor quality** metrics (repetition, length variance)

### Output Files:
- Google Drive `mistral-commentary-checkpoints/` - Training checkpoints
- Google Drive `mistral-commentary-final/` - Fine-tuned LoRA adapter
- Colab local disk `/content/mistral-commentary-merged/` - Merged 16-bit model (deleted when the runtime resets, so download it first)

### Expected Results:
- ✅ Natural French commentary (90-95% human-like)
- ✅ Little phrase repetition (repetition score < 0.25)
- ✅ Varied sentence length and structure
- ✅ Proper tense mixing (present tense, passé composé)
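
The notebook does not define how the repetition score or length variance is computed. One possible definition, offered purely as a hypothetical sketch: the share of word trigrams, pooled across a batch of generated commentaries, that repeat an earlier trigram (0 means every trigram is unique), plus the standard deviation of word counts as a rough measure of length variety. The function names and the trigram choice are assumptions, so calibrate a threshold such as `< 0.25` against whatever metric you actually adopt.

```python
import statistics
from collections import Counter


def repetition_score(texts, n=3):
    """Hypothetical metric: fraction of pooled word n-grams that repeat an earlier one.

    0.0 = every n-gram unique; values near 1.0 = highly repetitive output.
    """
    ngrams = Counter()
    for text in texts:
        words = text.lower().split()
        ngrams.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    total = sum(ngrams.values())
    if total == 0:
        return 0.0
    return 1.0 - len(ngrams) / total


def length_spread(texts):
    """Population standard deviation of word counts (needs at least one text)."""
    return statistics.pstdev([len(text.split()) for text in texts])


samples = [
    "Quelle frappe du numéro 10 !",
    "Quelle frappe du capitaine, le gardien est battu.",
    "Changement côté marocain.",
]
print(f"repetition score: {repetition_score(samples):.2f}")
print(f"length spread: {length_spread(samples):.2f} words")
```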

---

**Training time**: well under 6 hours on a T4 GPU (~48 optimizer steps)  
**Model size**: ~4GB (GGUF q4_k_m)  
**Dataset**: 255 L'Équipe examples