# Cortex Compliance AI - Fine-Tuning with TinyLlama

Uses TinyLlama (1.1B) - simple setup, no complex dependencies.

## Steps:
1. Set the runtime to **T4 GPU** (Runtime → Change runtime type)
2. Run all cells in order

In [None]:
# Step 1: Clean install (removes problematic packages)
import shutil
import site
import subprocess
import sys
from pathlib import Path

# Uninstall bitsandbytes and peft; neither is needed in this notebook
subprocess.run([sys.executable, "-m", "pip", "uninstall", "-y", "bitsandbytes", "peft"], capture_output=True)
# Delete any leftover bitsandbytes folder from every site-packages directory
# (instead of hardcoding /usr/local/lib/python3.12/dist-packages)
for site_dir in site.getsitepackages():
    shutil.rmtree(Path(site_dir) / "bitsandbytes", ignore_errors=True)

# Install only what we need
!pip install -q transformers==4.36.0 accelerate datasets huggingface_hub sentencepiece

print("✅ Done! Continue to Step 2")
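
Because Step 1 runs `pip uninstall` with `capture_output=True`, a failure there is silent. A minimal sketch for confirming the environment before moving on, using only the standard library's `importlib.metadata`; the helpers `installed_versions` and `check_environment` are hypothetical, and the package list simply mirrors Step 1.

```python
# Hypothetical sanity check (not part of the original notebook), standard library only.
from importlib import metadata
from typing import Dict, Iterable, List, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def check_environment(required: Dict[str, Optional[str]], forbidden: Iterable[str]) -> List[str]:
    """List problems; an empty list means the environment matches Step 1.

    `required` maps a package to an exact expected version, or None for "any version".
    """
    problems: List[str] = []
    found = installed_versions(required)
    for name, expected in required.items():
        version = found[name]
        if version is None:
            problems.append(f"missing: {name}")
        elif expected is not None and version != expected:
            problems.append(f"unexpected version: {name}=={version} (wanted {expected})")
    for name, version in installed_versions(forbidden).items():
        if version is not None:
            problems.append(f"still installed: {name}=={version}")
    return problems


problems = check_environment(
    required={"transformers": "4.36.0", "accelerate": None, "datasets": None,
              "huggingface_hub": None, "sentencepiece": None},
    forbidden=["bitsandbytes", "peft"],
)
print("✅ Environment looks right" if not problems else "\n".join(problems))
```

`importlib.metadata` reads what is installed on disk, so it reflects the result of Step 1 even before `transformers` is imported.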

In [None]:
# Step 2: Login to Hugging Face
from huggingface_hub import notebook_login
notebook_login()

In [None]:
# Step 3: Load dataset from HuggingFace (already uploaded!)
from datasets import load_dataset

dataset = load_dataset("maaninder/cortex-compliance-data", split="train")
print(f"✅ Loaded {len(dataset)} training examples from Hugging Face")
print(dataset[0]["text"][:200] + "...")
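
Step 7 prompts the model with an `### Instruction:` / `### Response:` template, so fine-tuning only lines up with that prompt if the dataset's `text` field uses the same layout. The sketch below assumes exactly that template; this is an inference from the Step 7 prompt, not something the notebook confirms. `split_example` and `count_malformed` are hypothetical helpers.

```python
# Hypothetical format check. ASSUMPTION: every training "text" uses the same
# "### Instruction:\n...\n\n### Response:\n..." layout as the Step 7 prompt.
from typing import Iterable, Optional, Tuple

INSTRUCTION_TAG = "### Instruction:\n"
RESPONSE_TAG = "\n\n### Response:\n"


def split_example(text: str) -> Optional[Tuple[str, str]]:
    """Return (instruction, response), or None if text does not follow the template."""
    if not text.startswith(INSTRUCTION_TAG):
        return None
    instruction, sep, response = text[len(INSTRUCTION_TAG):].partition(RESPONSE_TAG)
    return (instruction, response) if sep else None


def count_malformed(texts: Iterable[str]) -> int:
    """Count examples whose layout would not match the Step 7 prompt."""
    return sum(1 for text in texts if split_example(text) is None)


# In the notebook: print(count_malformed(dataset["text"]))
```

A non-zero count suggests either the template assumption is wrong or some rows need cleaning before training.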

In [None]:
# Step 4: Load TinyLlama (1.1B - fits easily on T4)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

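# Load weights in float32: with fp16=True in Step 5, the gradient scaler refuses
# to unscale float16 gradients ("Attempting to unscale FP16 gradients"), so the
# trainable weights must stay fp32. 1.1B params x 4 bytes is ~4.4 GB, fine on a T4.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float32,
    device_map="auto",
)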

# Freeze everything, then unfreeze the last 2 decoder layers and lm_head (a simple alternative to LoRA)
for param in model.parameters():
    param.requires_grad = False
for param in model.model.layers[-2:].parameters():
    param.requires_grad = True
for param in model.lm_head.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print("✅ TinyLlama loaded")
print(f"Trainable: {trainable:,} / {total:,} ({100*trainable/total:.2f}%)")
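
The trainable-parameter printout from Step 4 can be sanity-checked by hand. This sketch assumes the published TinyLlama-1.1B shape (hidden size 2048, MLP size 5632, 22 layers, 32 attention heads with 4 key/value heads, 32,000-token vocabulary, `lm_head` not tied to the input embeddings); compare these with `model.config` before relying on them. It also gives a rough float32 memory figure for weights, gradients and AdamW state, ignoring activations and CUDA overhead.

```python
# Back-of-the-envelope check for Step 4. The shape numbers are ASSUMED from the
# published TinyLlama-1.1B config; compare them with model.config first.
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LlamaShape:
    hidden: int = 2048
    intermediate: int = 5632
    layers: int = 22
    heads: int = 32
    kv_heads: int = 4
    vocab: int = 32000


def layer_params(c: LlamaShape) -> int:
    """Parameters in one decoder layer (Llama layers have no bias terms)."""
    head_dim = c.hidden // c.heads
    kv_dim = c.kv_heads * head_dim
    attention = 2 * c.hidden * c.hidden + 2 * c.hidden * kv_dim  # q/o + k/v projections
    mlp = 3 * c.hidden * c.intermediate  # gate, up and down projections
    norms = 2 * c.hidden  # input and post-attention RMSNorm weights
    return attention + mlp + norms


def param_counts(c: LlamaShape, trainable_layers: int = 2) -> Tuple[int, int]:
    """(trainable, total) when only the last layers and lm_head are unfrozen."""
    embed = c.vocab * c.hidden
    lm_head = c.vocab * c.hidden  # assumed untied from the input embeddings
    total = c.layers * layer_params(c) + embed + lm_head + c.hidden  # + final norm
    trainable = trainable_layers * layer_params(c) + lm_head
    return trainable, total


def training_memory_gb(trainable: int, total: int, bytes_per_weight: int = 4) -> float:
    """fp32 weights for all params, plus fp32 grads (4 B) and AdamW moments (8 B)
    for trainable ones; activations and CUDA overhead are not included."""
    return (total * bytes_per_weight + trainable * (4 + 8)) / 1e9


# Separate names so the notebook's own `trainable` / `total` are not overwritten
est_trainable, est_total = param_counts(LlamaShape())
print(f"Estimated trainable: {est_trainable:,} / {est_total:,} ({100 * est_trainable / est_total:.2f}%)")
print(f"~{training_memory_gb(est_trainable, est_total):.1f} GB before activations")
```

If the assumed config is right, these estimates should agree with what Step 4 prints; a smaller total usually means the embeddings are tied, since `model.parameters()` counts a shared tensor once.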

In [None]:
# Step 5: Train with basic Trainer
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

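# Tokenize dataset. No padding here: the data collator pads each batch to its
# longest example, which is cheaper than always padding to 512 tokens.
def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

# Drop every original column so only token fields reach the collator
tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)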

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./output",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        num_train_epochs=3,
        learning_rate=5e-5,
        fp16=True,
        logging_steps=10,
        save_strategy="no",
        report_to="none",
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

print("üöÄ Training...")
trainer.train()
print("‚úÖ Done!")
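
Two details of Step 5 are easy to miss. `DataCollatorForLanguageModeling(mlm=False)` copies `input_ids` into `labels` and replaces every pad-token position with -100 so the loss ignores it; because Step 4 sets `pad_token = eos_token`, genuine end-of-sequence tokens are masked as well. And with `per_device_train_batch_size=4` and `gradient_accumulation_steps=4`, each optimizer update sees 16 examples. The plain-Python sketch below mirrors both behaviours; it is not the transformers code, and the helper names, token ids and the 1,000-example figure are hypothetical.

```python
# Plain-Python sketch mirroring (not reproducing) what Step 5 relies on.
import math
from typing import List

IGNORE_INDEX = -100  # label value PyTorch's cross-entropy ignores by default


def causal_lm_labels(input_ids: List[int], pad_token_id: int) -> List[int]:
    """Copy input_ids to labels, hiding pad positions from the loss.

    The model shifts labels by one position internally, so no shift happens here.
    With pad_token = eos_token (Step 4), real EOS tokens are hidden as well.
    """
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


def optimizer_steps(num_examples: int, batch_size: int, grad_accum: int, epochs: int) -> int:
    """Estimate optimizer updates on a single GPU (floor per epoch, at least one)."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return epochs * updates_per_epoch


# Hypothetical ids: EOS id 2 doubles as the pad id, so both trailing 2s are masked
print(causal_lm_labels([1, 450, 3030, 2, 2], pad_token_id=2))

# Effective batch per update = 4 x 4 = 16; replace 1000 with len(dataset)
print(optimizer_steps(num_examples=1000, batch_size=4, grad_accum=4, epochs=3))
```

Treat `optimizer_steps` as an estimate: how a final partial accumulation window is handled can differ between transformers versions. The EOS masking is worth remembering if generations in Step 7 tend to run all the way to `max_new_tokens`.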

In [None]:
# Step 6: Save to Hugging Face (reuses the login from Step 2)
REPO = "maaninder/cortex-compliance-tinyllama"
model.push_to_hub(REPO)
tokenizer.push_to_hub(REPO)
print(f"✅ Saved to: https://huggingface.co/{REPO}")

In [None]:
# Step 7: Test generation with the fine-tuned model
prompt = "### Instruction:\nGenerate a contract for ООО Тест\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

model.eval()
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,  # avoids the missing pad_token_id warning
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))
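
For decoder-only models, `generate()` returns the prompt followed by the continuation, so the Step 7 printout repeats the instruction. A hedged sketch of two hypothetical helpers: `build_prompt` reproduces the Step 7 template for any request, and `extract_response` keeps only the text after the first `### Response:` marker, cutting off any new `### Instruction:` turn the model starts.

```python
# Hypothetical helpers for Step 7; they assume the same
# "### Instruction: / ### Response:" template as the Step 7 prompt.
RESPONSE_MARKER = "### Response:\n"


def build_prompt(instruction: str) -> str:
    """Format a request exactly like the hand-written Step 7 prompt."""
    return f"### Instruction:\n{instruction}\n\n{RESPONSE_MARKER}"


def extract_response(decoded: str) -> str:
    """Keep only the model's answer from prompt + continuation text.

    Falls back to the whole string when no marker is found, and stops at a new
    "### Instruction:" if the model starts another turn.
    """
    _, sep, tail = decoded.partition(RESPONSE_MARKER)
    answer = tail if sep else decoded
    return answer.split("### Instruction:", 1)[0].strip()


# In the notebook:
# print(extract_response(tokenizer.decode(out[0], skip_special_tokens=True)))
```

Decoding may not reproduce the prompt's whitespace exactly, so a more robust alternative is to decode only the new tokens: `tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`.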

## Done!

Model: https://huggingface.co/maaninder/cortex-compliance-tinyllama

**Simple approach:**
- TinyLlama 1.1B (small enough that no quantization is needed)
- No bitsandbytes, PEFT, or Triton setup
- Freeze all but the last 2 decoder layers and `lm_head`, then fine-tune