# Fine-tune TinyLlama for Gen Z Slang Explanation - V2 CORRECTED

**Assignment**: Implementing Autoregressive Models (ARM)

**Model**: TinyLlama-1.1B-Chat-v1.0

**Technique**: LoRA (Low-Rank Adaptation) Fine-tuning

**Dataset**: 1,779 Gen Z slang terms

---

## V2 Changes:
- Removed template placeholders from training data
- Simplified format to match inference exactly
- The model now learns actual content instead of copying templates!

## Steps:
1. Install dependencies
2. Upload training data (V2!)
3. Load and prepare data (new format)
4. Load TinyLlama base model
5. Configure LoRA
6. Set the training configuration
7. Train (3 epochs, ~30-60 min)
8. Test the model
9. Save the LoRA adapters
10. Download the adapters

## Step 1: Install Dependencies

In [None]:
# Install required libraries
!pip install -q transformers peft datasets accelerate

## Step 2: Upload Training Data V2

**IMPORTANT**: Upload `genz_slang_training_v2.jsonl` to Colab:
- Click the folder icon on the left
- Click the upload button
- Select `genz_slang_training_v2.jsonl` (NOT the old v1 file!)

In [None]:
# Verify the V2 file is uploaded
import os
if os.path.exists('genz_slang_training_v2.jsonl'):
    print("[OK] Training data V2 found!")
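    with open('genz_slang_training_v2.jsonl', 'r', encoding='utf-8') as f:
        # Count non-empty lines; each one is a single JSON training example
        num_examples = sum(1 for line in f if line.strip())
    print(f"  {num_examples} training examples found")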
else:
    print("[ERROR] Training data not found!")
    print("Please upload genz_slang_training_v2.jsonl")
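
Because V1 failed on template placeholders left in the training text, it can help to scan the uploaded file before training. The sketch below is only an illustration: `find_bad_lines` and the `PLACEHOLDER` pattern are hypothetical names, and the pattern assumes placeholders look like the V1 ones (`<one or two sentences>`, `<one sentence>`).

```python
import json
import re

# Assumption: V1-style placeholders are multi-word, lowercase, in angle brackets,
# e.g. "<one or two sentences>". Single tokens such as "</s>" are not flagged.
PLACEHOLDER = re.compile(r"<[a-z]+( [a-z]+)+>")

def find_bad_lines(lines):
    """Return (line_number, reason) pairs for lines that would hurt training."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped, not reported
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "invalid JSON"))
            continue
        text = record.get("text") if isinstance(record, dict) else None
        if not isinstance(text, str) or not text.strip():
            problems.append((number, "missing or empty 'text' field"))
        elif PLACEHOLDER.search(text):
            problems.append((number, "template placeholder left in text"))
    return problems

# Small in-memory sample; in the notebook, pass the open file object instead.
sample = [
    '{"text": "Task: Explain the internet slang.\\nTerm: w\\n\\nDefinition: Shorthand for win"}',
    '{"text": "Definition: <one or two sentences>"}',
    'not json',
]
print(find_bad_lines(sample))
```

An empty result for the real file suggests every line parses and no V1-style placeholder survived.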

## Step 3: Load and Prepare Data (V2 Format)

In [None]:
import json
from datasets import Dataset

# Load JSONL data - V2 already has proper text format
data = []
with open('genz_slang_training_v2.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line:  # skip blank lines (e.g. a trailing newline at end of file)
            data.append(json.loads(line))

print(f"Loaded {len(data)} examples")

# Show first example to verify format
print("\nFirst training example:")
print("=" * 70)
print(data[0]['text'])
print("=" * 70)

# Create HuggingFace dataset
dataset = Dataset.from_list(data)

# Split into train/validation (90/10)
dataset = dataset.train_test_split(test_size=0.1, seed=42)

print(f"\nTraining examples: {len(dataset['train'])}")
print(f"Validation examples: {len(dataset['test'])}")

## Step 4: Load TinyLlama Model

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model

# Model name
MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

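print("Loading model...")
# Note: the weights are loaded in fp16 and Step 6 also sets fp16=True. Some older
# PEFT releases keep the LoRA weights in fp16 in this setup, and training can then
# fail with "Attempting to unscale FP16 gradients". If that happens, loading with
# torch_dtype=torch.float32 is a common workaround.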
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

print("[OK] Model loaded successfully!")
print(f"  Model size: {model.num_parameters() / 1e6:.1f}M parameters")

## Step 5: Configure LoRA

In [None]:
# LoRA configuration
lora_config = LoraConfig(
    r=16,                      # Rank of LoRA matrices
    lora_alpha=32,             # Scaling factor
    target_modules=[           # Which modules to apply LoRA to
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
    ],
    lora_dropout=0.05,         # Dropout for regularization
    bias="none",
    task_type="CAUSAL_LM"      # Causal Language Modeling (Autoregressive)
)

# Apply LoRA to model
model = get_peft_model(model, lora_config)

# Print trainable parameters
model.print_trainable_parameters()
print("\n[OK] LoRA configured!")
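
To sanity-check the number that `print_trainable_parameters()` reports, the LoRA parameter count can be estimated by hand: each adapted linear layer gains two small matrices, `A` (r x in) and `B` (out x r). The sketch below assumes the TinyLlama-1.1B shapes written in the constants (hidden size 2048, 22 layers, 32 attention heads, 4 key/value heads); these are assumptions to confirm against `model.config`, and the helper names are hypothetical.

```python
# Assumed TinyLlama-1.1B shapes; confirm against model.config before relying on them.
HIDDEN_SIZE = 2048
NUM_LAYERS = 22
NUM_HEADS = 32
NUM_KV_HEADS = 4                      # grouped-query attention: k_proj/v_proj are narrower
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS   # 64
KV_DIM = NUM_KV_HEADS * HEAD_DIM      # 256

def lora_params(in_features, out_features, r):
    """LoRA adds A (r x in) and B (out x r) next to one frozen weight matrix."""
    return r * (in_features + out_features)

def total_lora_params(r):
    """Sum the LoRA parameters over the four attention projections in every layer."""
    per_layer = (
        lora_params(HIDDEN_SIZE, HIDDEN_SIZE, r)    # q_proj
        + lora_params(HIDDEN_SIZE, KV_DIM, r)       # k_proj
        + lora_params(HIDDEN_SIZE, KV_DIM, r)       # v_proj
        + lora_params(HIDDEN_SIZE, HIDDEN_SIZE, r)  # o_proj
    )
    return per_layer * NUM_LAYERS

print(total_lora_params(16))  # 4505600 under the assumed shapes
```

Under these assumptions the adapters add about 4.5M trainable parameters, well under 1% of the base model. With `r=16` and `lora_alpha=32`, the adapter update is scaled by `lora_alpha / r = 2`.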

## Step 6: Training Configuration

In [None]:
# Training arguments
training_args = TrainingArguments(
    output_dir="./tinyllama-lora-genz-slang-v2",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    save_strategy="epoch",
    logging_steps=10,
    warmup_steps=50,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    report_to="none",
)

print("Training configuration:")
print(f"  Epochs: {training_args.num_train_epochs}")
print(f"  Batch size: {training_args.per_device_train_batch_size}")
print(f"  Learning rate: {training_args.learning_rate}")

## Step 7: Train the Model

**This will take 30-60 minutes on Colab free GPU**

In [None]:
from transformers import Trainer, DataCollatorForLanguageModeling

# Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=256,
        padding="max_length"
    )

# Tokenize all data
print("Tokenizing training data...")
tokenized_train = dataset['train'].map(tokenize_function, batched=True, remove_columns=["text"])
tokenized_test = dataset['test'].map(tokenize_function, batched=True, remove_columns=["text"])

# Data collator for language modeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # Causal LM, not masked LM
)

# Create trainer
trainer = Trainer(
    model=model,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    args=training_args,
    data_collator=data_collator,
)

print("\nStarting training...")
print("This will take 30-60 minutes.")
print("="*70)

# Train!
trainer.train()

print("\n" + "="*70)
print("[OK] Training complete!")
print("="*70)
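
The Trainer receives `eval_dataset`, but the training arguments above do not set an evaluation schedule, so no validation loss is reported during training. One option is to call `trainer.evaluate()` after training and convert the returned `eval_loss` into perplexity. The helper below is a minimal sketch with a hypothetical name; the commented lines show how it might be used in this notebook.

```python
import math

def perplexity(eval_loss):
    """Perplexity is exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(eval_loss)

# In the notebook this would be fed from the trainer, for example:
#   metrics = trainer.evaluate()
#   print(f"Validation perplexity: {perplexity(metrics['eval_loss']):.2f}")
print(perplexity(0.0))  # a loss of 0 corresponds to a perplexity of 1.0
```

Lower is better; comparing the value between V1 and V2 runs is only meaningful if both use the same validation split.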

## Step 8: Test the Model (IMPORTANT!)

In [None]:
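# Test with the V2 format (matches inference exactly!)
model.eval()  # switch off LoRA dropout, which is still active after training

def test_model(term):
    # This matches inference.py exactly
    prompt = f"""Task: Explain the internet slang.
Term: {term}

Definition:"""

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=80,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
        repetition_penalty=1.2,
    )
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Decode only the newly generated tokens; slicing the decoded string by
    # len(prompt) can be off if decoding does not reproduce the prompt exactly
    prompt_len = inputs["input_ids"].shape[1]
    generated = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()
    return result, generated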

# Test on slang terms
test_terms = ["w", "dank", "rizz", "fr", "bet"]

print("Testing the fine-tuned model:")
print("="*70)

for term in test_terms:
    print(f"\nTERM: {term}")
    print("─"*70)
    full, generated = test_model(term)
    print("Generated:")
    print(generated[:200])  # First 200 chars
    print("─"*70)
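
Greedy decoding with `max_new_tokens=80` can run past the answer and start a new `Task:` block. The sketch below shows one way to trim the output and split it into fields. `parse_generation` is a hypothetical helper, and it assumes the model continues the prompt as `<definition>`, a newline, then `Example: <example>`, as in the V2 training format.

```python
def parse_generation(generated):
    """Split generated text into definition/example, dropping any run-on text."""
    # Assumption: anything after a blank line or a new "Task:" header is run-on.
    for stop in ("\n\n", "\nTask:"):
        generated = generated.split(stop, 1)[0]
    definition, _, example = generated.partition("Example:")
    example = example.strip()
    return {
        "definition": definition.strip(),
        "example": example.splitlines()[0] if example else "",
    }

sample = (
    "Shorthand for win\n"
    "Example: Got the job today, big W!\n\n"
    "Task: Explain the internet slang.\nTerm: l"
)
print(parse_generation(sample))
```

In the test loop above, this could be applied to `generated` before printing, in place of the 200-character cut.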

## Step 9: Save the LoRA Adapters

In [None]:
from datetime import datetime

# Create output directory with timestamp
timestamp = datetime.now().strftime("%Y-%m-%d")
output_dir = f"tinyllama-lora@{timestamp}"

# Save only the LoRA adapter (not the full model)
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

print(f"[OK] LoRA adapters saved to: {output_dir}")
print("\nContents:")
!ls -lh {output_dir}
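
Before downloading, a quick check that the adapter folder holds the two files named in Step 10 can save a wasted round trip. This is a minimal sketch: `missing_adapter_files` is a hypothetical helper, and the file names are taken from the download list in this notebook (other PEFT versions may write different names).

```python
from pathlib import Path

# File names taken from the download list in Step 10 of this notebook.
REQUIRED_FILES = ("adapter_config.json", "adapter_model.safetensors")

def missing_adapter_files(adapter_dir):
    """Return the required adapter files that are absent from adapter_dir."""
    folder = Path(adapter_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]

# A directory that does not exist is reported as missing both files.
print(missing_adapter_files("no-such-directory"))
```

In the notebook, `missing_adapter_files(output_dir)` returning an empty list means both files are in place.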

## Step 10: Download the Adapters

**IMPORTANT**: Download these files to your computer:

1. Click the folder icon (left sidebar)
2. Navigate to `tinyllama-lora@YYYY-MM-DD/`
3. Right-click each of these files and choose Download:
   - `adapter_config.json`
   - `adapter_model.safetensors`

Then replace the files in your local project:
```
memeorigin/services/slang-explainer/models/adapters/tinyllama-lora@YYYY-MM-DD/
```

And update `inference.py` line 6 with the new date!

In [None]:
# Optional: Zip the adapter folder for easier download
import shutil

shutil.make_archive(output_dir, 'zip', output_dir)
print(f"[OK] Created {output_dir}.zip")
print("\nDownload this file and extract it to your project!")

---

## Summary - V2 Improvements

✅ Fixed training format (no more template placeholders!)

✅ Fine-tuned TinyLlama-1.1B on 1,779 Gen Z slang terms (90% for training, 10% held out for validation)

✅ Used LoRA for efficient fine-tuning

✅ Format matches inference exactly

### What Changed:

**OLD (V1):**
```
Task: Explain...
Term: w
Format:
Definition: <one or two sentences>  ← Model copied this
Example: <one sentence>              ← And this
```

**NEW (V2):**
```
Task: Explain the internet slang.
Term: w

Definition: Shorthand for win        ← Actual content!
Example: Got the job today, big W!   ← Actual content!
```
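
One way to keep training and inference in sync is to build both strings from the same function, so the training text is always the inference prompt plus the real answer. The sketch below illustrates the idea; `build_prompt` and `build_training_text` are hypothetical names, and the exact spacing after `Definition:` is an assumption based on the V2 example above.

```python
def build_prompt(term):
    """Prompt used at inference time; ends right after 'Definition:'."""
    return f"Task: Explain the internet slang.\nTerm: {term}\n\nDefinition:"

def build_training_text(term, definition, example):
    """Training text = inference prompt + the real answer (assumed layout)."""
    return f"{build_prompt(term)} {definition}\nExample: {example}"

text = build_training_text("w", "Shorthand for win", "Got the job today, big W!")
print(text)

# The training text must begin with the exact inference prompt.
assert text.startswith(build_prompt("w"))
```

If the prompt wording ever changes, only `build_prompt` needs editing, and the training data can be regenerated from it.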

### Next Steps:
1. Download the adapter files
2. Replace in your project
3. Update `inference.py` line 6
4. Test with `demo_with_fallback.py`
5. Model should generate REAL definitions now!

**Good luck! 🎉**