## Step 1: Install Dependencies

Install the required libraries for training with LoRA on CUDA GPUs.

In [None]:
%%capture
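# Install required packages (this takes ~2 minutes)
# Quote each specifier so the shell does not treat ">" as output redirection.
# transformers 4.41+ is needed for the `eval_strategy` argument used in Step 8.
!pip install -q "transformers>=4.41.0" "peft>=0.5.0" "datasets>=2.14.0"
!pip install -q "bitsandbytes>=0.41.0" "accelerate>=0.23.0"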
!pip install -q sentencepiece protobuf
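
To confirm that the install picked up suitable versions, a check along these lines could be run in a fresh cell. It is a minimal sketch using only the standard library; the helper names `parse_version` and `check_versions` are made up for this example, and the simple parser ignores pre-release and post-release suffixes.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.41.0' into (0, 41, 0); non-numeric suffixes are ignored."""
    numbers = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 3:  # pad '2.1' to (2, 1, 0) so tuples compare cleanly
        numbers.append(0)
    return tuple(numbers)


def check_versions(minimums):
    """Map each package name to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if parse_version(installed) >= parse_version(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


# Minimum versions taken from the install cell in this step.
print(check_versions({"peft": "0.5.0", "datasets": "2.14.0", "accelerate": "0.23.0"}))
```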

## Step 2: Verify GPU Access

Make sure you have a GPU allocated. If this shows "No GPU", go to Runtime → Change runtime type → Select GPU.

In [None]:
import torch

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"✓ GPU: {gpu_name}")
    print(f"✓ VRAM: {gpu_memory:.1f} GB")
    
    if "T4" in gpu_name:
        print("\n📝 Note: T4 is the free tier GPU. Training will take ~45-60 minutes.")
    elif "V100" in gpu_name:
        print("\n📝 Note: V100 detected. Training will take ~20-30 minutes.")
    elif "A100" in gpu_name:
        print("\n📝 Note: A100 detected. Training will take ~15-20 minutes.")
else:
    print("❌ No GPU detected!")
    print("Go to: Runtime → Change runtime type → Hardware accelerator → GPU")
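
The GPU check above prints nothing extra for cards other than T4, V100, and A100. One way to make that case explicit is a small lookup table, sketched below. The function name `describe_gpu` is hypothetical, and the time estimates are simply the ones quoted in this notebook, not measurements.

```python
# Estimates copied from the messages in the GPU check cell.
TRAINING_TIME_BY_GPU = {
    "T4": "~45-60 minutes",
    "V100": "~20-30 minutes",
    "A100": "~15-20 minutes",
}


def describe_gpu(gpu_name, total_memory_bytes):
    """Summarize a GPU from its name and its total memory in bytes."""
    estimate = "unknown"
    for family, minutes in TRAINING_TIME_BY_GPU.items():
        if family in gpu_name:
            estimate = minutes
            break
    return {
        "name": gpu_name,
        "vram_gb": round(total_memory_bytes / 1024**3, 1),  # bytes -> GiB
        "estimate": estimate,
    }


print(describe_gpu("Tesla T4", 16 * 1024**3))
```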

## Step 3: Authenticate with Hugging Face

You need a Hugging Face token to download models and datasets.

Get your token here: https://huggingface.co/settings/tokens (select "Read" access)

Then click the 🔑 icon on the left sidebar and add it as a secret named `HF_TOKEN`.

In [None]:
from huggingface_hub import login
from google.colab import userdata

# Get token from Colab secrets
try:
    hf_token = userdata.get('HF_TOKEN')
    login(token=hf_token)
    print("✓ Authenticated with Hugging Face")
except Exception as e:
    print("❌ Failed to get HF_TOKEN from secrets.")
    print("\nPlease:")
    print("1. Get your token from https://huggingface.co/settings/tokens")
    print("2. Click the 🔑 icon on the left sidebar")
    print("3. Add a secret named 'HF_TOKEN' with your token")
    print("\nOr manually login:")
    !huggingface-cli login
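
If the notebook may also run outside Colab, the token lookup could fall back to an environment variable. The sketch below assumes a hypothetical helper `resolve_hf_token`; it tries the Colab secret first and then `os.environ`, returning `None` if neither is set.

```python
import os


def resolve_hf_token(secret_name="HF_TOKEN"):
    """Return a token from Colab secrets, else the environment, else None."""
    try:
        from google.colab import userdata  # only importable inside Colab
        token = userdata.get(secret_name)
        if token:
            return token
    except Exception:
        # Not running in Colab, or the secret is missing or not shared
        # with this notebook; fall through to the environment variable.
        pass
    return os.environ.get(secret_name)
```

The result can be passed to `login(token=...)` when it is not `None`.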

## Step 4: Load and Format Dataset

Download the RISC-V instruction dataset and format it for training.

Each example consists of:
- **Input**: Natural language description of an operation
- **Output**: RISC-V assembly instruction

In [None]:
from datasets import load_dataset
from transformers import AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.3"
DATASET_NAME = "davidpirkl/riscv-instruction-specification"

print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

# Set pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

print("\nLoading dataset...")
dataset = load_dataset(DATASET_NAME, split="train")
print(f"✓ Loaded {len(dataset)} examples")

# Show a sample
print("\n📋 Sample example:")
sample = dataset[0]
print(f"Description: {sample['description']}")
print(f"Instruction: {sample['instructions']}")

In [None]:
# Format dataset with chat template
print("Formatting dataset...")

def format_example(example):
    """Convert to chat format."""
    user_content = f"Write the RISC-V assembly instruction for the following operation:\n{example['description']}"
    assistant_content = example['instructions']
    
    messages = [
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": assistant_content}
    ]
    
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    return {"text": text}

dataset = dataset.map(format_example, remove_columns=dataset.column_names)

# Split train/validation (90/10)
dataset = dataset.train_test_split(test_size=0.1, seed=42)
train_dataset = dataset["train"]
valid_dataset = dataset["test"]

print(f"✓ Training examples: {len(train_dataset)}")
print(f"✓ Validation examples: {len(valid_dataset)}")

# Show formatted example
print("\n📋 Formatted example (first 300 chars):")
print(train_dataset[0]["text"][:300] + "...")
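
To see roughly what `apply_chat_template` produces for a user/assistant pair, the following pure-Python sketch imitates the Mistral instruct layout. It is an approximation only: the exact whitespace and special tokens come from the tokenizer's own template, the function name `approximate_mistral_chat` is made up for this example, and the sample instruction text is hypothetical rather than taken from the dataset.

```python
def approximate_mistral_chat(messages, bos="<s>", eos="</s>"):
    """Approximate the [INST] ... [/INST] layout used by Mistral instruct models."""
    text = bos
    for message in messages:
        if message["role"] == "user":
            text += f"[INST] {message['content']} [/INST]"
        elif message["role"] == "assistant":
            # The assistant reply is closed with EOS so the model learns to stop.
            text += f" {message['content']}{eos}"
        else:
            raise ValueError(f"unsupported role: {message['role']}")
    return text


example = [
    {"role": "user", "content": "Adds rs1 and rs2"},
    {"role": "assistant", "content": "add rd, rs1, rs2"},  # hypothetical output
]
print(approximate_mistral_chat(example))
```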

## Step 5: Load Model with 4-bit Quantization

Load Mistral-7B with 4-bit quantization (QLoRA) to fit in GPU memory.

**What is quantization?**
- Reduces model weights from 16-bit to 4-bit precision
- Saves ~4x memory with minimal quality loss
- Makes 7B models trainable on consumer GPUs
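
The "~4x" figure follows from simple arithmetic on the weight storage. The sketch below uses the 7.2B parameter count quoted in Step 6 and counts only the weights; quantization constants, activations, gradients, and optimizer state are ignored, so real memory use will be higher.

```python
def weight_memory_gb(num_parameters, bits_per_weight):
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_parameters * bits_per_weight / 8 / 1e9


params = 7.2e9  # parameter count quoted in this notebook
fp16_gb = weight_memory_gb(params, 16)
nf4_gb = weight_memory_gb(params, 4)

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f" 4-bit weights: {nf4_gb:.1f} GB")
print(f"Reduction: {fp16_gb / nf4_gb:.0f}x")
```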

In [None]:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

print("Configuring 4-bit quantization...")

# Use bfloat16 where the GPU supports it natively (e.g. A100);
# otherwise fall back to float16 (e.g. T4). This matches the bf16/fp16 choice in Step 8.
compute_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

# Quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=True,
)

print("Loading model (this takes ~2 minutes on first run)...")
print("The model is ~14GB and will be cached for future runs.\n")

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=compute_dtype,
)

print("✓ Model loaded with 4-bit quantization")

## Step 6: Configure LoRA Adapters

Apply LoRA (Low-Rank Adaptation) to the model.

**What is LoRA?**
- Freezes the 7.2B base parameters (they don't change)
- Adds small "adapter" matrices (about 42M parameters at rank 16, roughly 0.6% of the model)
- Only trains these adapters, making fine-tuning much faster and cheaper
- At inference, adapters modify the base model's behavior on-the-fly
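
The adapter size can be estimated by hand: each adapted linear layer with `in_features` inputs and `out_features` outputs gains two matrices with `r * (in_features + out_features)` parameters in total. The sketch below applies that formula to the seven target modules used in this step. The layer shapes are assumptions taken from the published Mistral-7B configuration, not values read from the loaded model, and `lora_parameter_count` is a name made up for this example.

```python
# Layer shapes assumed from the published Mistral-7B config.
HIDDEN = 4096
KV_DIM = 1024          # 8 key/value heads x 128 dims (grouped-query attention)
INTERMEDIATE = 14336
NUM_LAYERS = 32

MODULE_SHAPES = {      # module name -> (in_features, out_features)
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_parameter_count(rank, target_modules):
    """Total LoRA parameters: rank * (in + out) per module, summed over all layers."""
    per_layer = sum(
        rank * (MODULE_SHAPES[name][0] + MODULE_SHAPES[name][1])
        for name in target_modules
    )
    return per_layer * NUM_LAYERS


all_modules = list(MODULE_SHAPES)
print(f"rank 16: {lora_parameter_count(16, all_modules):,}")
print(f"rank  8: {lora_parameter_count(8, all_modules):,}")
```

Under these assumptions, rank 16 gives about 42M adapter parameters and rank 8 about 21M.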

In [None]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

print("Preparing model for training...")
model = prepare_model_for_kbit_training(model)

print("Configuring LoRA...")

lora_config = LoraConfig(
    r=16,                           # LoRA rank
    lora_alpha=16,                  # Scaling factor
    target_modules=[                # Which layers get adapters
        "q_proj", "k_proj", "v_proj",  # Attention
        "o_proj",
        "gate_proj", "up_proj", "down_proj",  # Feed-forward
    ],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)

# Show parameter counts
# Note: bitsandbytes stores 4-bit weights packed two per byte, so numel() undercounts
# the frozen base weights and the percentage below overstates the trainable share.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())

print("\n✓ LoRA applied")
print(f"  Trainable: {trainable:,} parameters ({100*trainable/total:.3f}%)")
print(f"  Frozen: {total-trainable:,} parameters")
print(f"  Total: {total:,} parameters")

## Step 7: Tokenize Dataset

Convert text to token IDs that the model can process.

In [None]:
from transformers import DataCollatorForLanguageModeling

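def tokenize_function(examples):
    """Tokenize and prepare for training."""
    # No "labels" column is created here: the data collator below copies
    # input_ids into labels after padding, and unpadded label lists of
    # different lengths cannot be stacked into a batch.
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=512,
        padding=False,
        # The chat template already inserted the BOS token, so don't add a second one
        add_special_tokens=False,
    )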

print("Tokenizing datasets...")
train_dataset = train_dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=["text"],
)

valid_dataset = valid_dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=["text"],
)

print("✓ Tokenization complete")

# Data collator handles padding
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
)
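
With `mlm=False`, the collator pads each batch to its longest sequence and builds the labels from the input IDs, replacing pad positions with `-100` so the loss skips them. The pure-Python sketch below imitates that behavior under the assumption of right-side padding; `collate_causal_lm` is a hypothetical name, not part of `transformers`.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores


def collate_causal_lm(batch_input_ids, pad_token_id):
    """Pad to the longest sequence and mask pad positions in the labels."""
    longest = max(len(ids) for ids in batch_input_ids)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch_input_ids:
        padding = longest - len(ids)
        padded = ids + [pad_token_id] * padding
        input_ids.append(padded)
        attention_mask.append([1] * len(ids) + [0] * padding)
        # Every position equal to the pad token is masked, not only the padding.
        labels.append([IGNORE_INDEX if tok == pad_token_id else tok for tok in padded])
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


batch = collate_causal_lm([[5, 6, 7], [8, 9]], pad_token_id=2)
print(batch["labels"])
```

One consequence worth knowing: because this notebook reuses the EOS token as the pad token when none is defined, genuine EOS tokens in the labels are masked as well.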

## Step 8: Train the Model

Train for 600 steps (~2 epochs on this dataset).

**Expected time:**
- T4 (free tier): ~45-60 minutes
- V100: ~20-30 minutes  
- A100: ~15-20 minutes

**What to watch:**
- **Loss** should decrease from ~4.0 to ~1.0
- Training will save checkpoints every 100 steps
- Evaluation runs every 50 steps
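
The relationship between steps, batch size, and epochs is plain arithmetic, sketched below for a single GPU. The batch size, accumulation, and step intervals are the ones used in the training cell; the training-set size of 600 is a hypothetical value chosen for illustration, so substitute `len(train_dataset)` for a real figure. The helper name `training_schedule` is made up for this example.

```python
def training_schedule(max_steps, per_device_batch, grad_accum,
                      num_train_examples, save_steps, eval_steps):
    """Derive effective batch size, epochs, and checkpoint steps (single GPU)."""
    effective_batch = per_device_batch * grad_accum
    examples_seen = max_steps * effective_batch
    return {
        "effective_batch": effective_batch,
        "examples_seen": examples_seen,
        "epochs": examples_seen / num_train_examples,
        "checkpoints": list(range(save_steps, max_steps + 1, save_steps)),
        "evaluations": max_steps // eval_steps,
    }


schedule = training_schedule(
    max_steps=600,
    per_device_batch=1,
    grad_accum=2,
    num_train_examples=600,  # hypothetical; use len(train_dataset)
    save_steps=100,
    eval_steps=50,
)
print(schedule)
```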

☕ Grab a coffee while this runs!

In [None]:
from transformers import TrainingArguments, Trainer

OUTPUT_DIR = "./adapters_riscv"

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    
    # Training
    max_steps=600,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # Effective batch = 2
    
    # Optimization  
    learning_rate=1e-5,
    lr_scheduler_type="constant",
    warmup_steps=0,
    
    # Precision
    bf16=torch.cuda.is_bf16_supported(),
    fp16=not torch.cuda.is_bf16_supported(),
    
    # Logging & eval
    logging_steps=50,
    eval_strategy="steps",
    eval_steps=50,
    
    # Checkpointing
    save_strategy="steps",
    save_steps=100,
    save_total_limit=6,
    
    # Misc
    report_to="none",
    load_best_model_at_end=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
    data_collator=data_collator,
)

print("Starting training...\n")
print("Progress will be shown below:")
print("- Step X/600: Current progress")
print("- Loss: Should decrease from ~4.0 to ~1.0")
print("- Checkpoints saved every 100 steps\n")

# Train!
trainer.train()

print("\n✅ Training complete!")

## Step 9: Save the Fine-Tuned Adapters

Save the LoRA adapters so you can use them later.

In [None]:
import os

final_dir = os.path.join(OUTPUT_DIR, "final")

print("Saving adapters...")
model.save_pretrained(final_dir)
tokenizer.save_pretrained(final_dir)

print(f"\n✓ Adapters saved to: {final_dir}")
print("\nFiles saved:")
print("  - adapter_model.safetensors (roughly 170MB at rank 16 in float32) - The LoRA weights")
print("  - adapter_config.json - Configuration")
print("  - tokenizer files")

# Show file sizes
!ls -lh {final_dir}
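
Before sharing the adapters, it can help to confirm that the saved configuration matches what was trained. The sketch below reads `adapter_config.json` with the standard library and reports a few fields; `summarize_adapter` is a hypothetical helper, and it assumes the file uses the usual PEFT keys (`r`, `lora_alpha`, `target_modules`, `base_model_name_or_path`).

```python
import json
import os


def summarize_adapter(adapter_dir):
    """Report the key LoRA settings and weight file size for a saved adapter."""
    config_path = os.path.join(adapter_dir, "adapter_config.json")
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)

    weights_path = os.path.join(adapter_dir, "adapter_model.safetensors")
    weights_mb = None
    if os.path.exists(weights_path):
        weights_mb = os.path.getsize(weights_path) / 1e6

    return {
        "base_model": config.get("base_model_name_or_path"),
        "rank": config.get("r"),
        "alpha": config.get("lora_alpha"),
        "target_modules": sorted(config.get("target_modules") or []),
        "weights_mb": weights_mb,
    }
```

In this notebook it would be called as `summarize_adapter(final_dir)`.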

## Step 10: Test the Fine-Tuned Model

Try out your model! It should now generate RISC-V assembly instructions.

**Important:** Use placeholder register names (rs1, rs2, rd) not specific ones (t0, s1, a0).

In [None]:
print("Loading model for inference...")

# Merge adapters for faster inference. Merging into 4-bit weights introduces
# rounding error, so outputs can differ slightly from the unmerged adapter model.
model = model.merge_and_unload()
model.eval()

print("✓ Model ready for inference")

def generate(query, max_tokens=100):
    """Generate RISC-V instruction from natural language."""
    # Format query
    full_query = f"Write the RISC-V assembly instruction for the following operation:\n{query}"
    messages = [{"role": "user", "content": full_query}]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    
    # Generate (the chat template already added BOS, so don't add special tokens again)
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            do_sample=False,
            pad_token_id=tokenizer.pad_token_id,
        )
    
    # Decode
    response = tokenizer.decode(
        outputs[0][inputs.input_ids.shape[1]:],
        skip_special_tokens=True
    )
    
    return response.strip()

# Test examples
test_queries = [
    "Adds the values in rs1 and rs2, stores the result in rd",
    "Subtracts the value in rs2 from rs1, stores the result in rd",
    "Loads a word from memory at address rs1 into rd",
    "Multiplies the values in two registers (rs1, rs2) and stores the result in rd",
]

print("\n" + "="*60)
print("Testing Fine-Tuned Model")
print("="*60)

for i, query in enumerate(test_queries, 1):
    print(f"\n{i}. {query}")
    result = generate(query)
    print(f"   → {result}")

print("\n" + "="*60)
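
Reading the outputs by eye works for four queries; for a larger set, a rough automatic check could compare the opcode of each generated instruction with the one expected. The sketch below assumes the model's answer begins with the mnemonic (for example `add rd, rs1, rs2`), which may not hold for every dataset entry. The helper names and the expected mnemonics are illustrative assumptions, not part of the dataset.

```python
def first_mnemonic(assembly):
    """Return the lowercase opcode of the first non-empty line, or ''."""
    for line in assembly.splitlines():
        tokens = line.strip().split()
        if tokens:
            return tokens[0].lower()
    return ""


def mnemonic_accuracy(outputs, expected_mnemonics):
    """Fraction of outputs whose first opcode matches the expected mnemonic."""
    matches = sum(
        first_mnemonic(output) == expected
        for output, expected in zip(outputs, expected_mnemonics)
    )
    return matches / len(expected_mnemonics)


# Hypothetical expectations for the four test queries in this step.
expected = ["add", "sub", "lw", "mul"]
sample_outputs = ["add rd, rs1, rs2", "sub rd, rs1, rs2", "lw rd, 0(rs1)", "mulh rd, rs1, rs2"]
print(mnemonic_accuracy(sample_outputs, expected))
```

With the real model, `sample_outputs` would be replaced by `[generate(q) for q in test_queries]`.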

## Step 11: Interactive Testing

Try your own queries! Type natural language descriptions and get RISC-V instructions.

Type 'quit' to stop.

In [None]:
print("Interactive RISC-V Assembly Generator")
print("="*60)
print("\nTip: Use placeholder names (rs1, rs2, rd, imm)")
print("Example: 'Branches to label if rs1 equals rs2'\n")

while True:
    query = input("\nYour query (or 'quit'): ").strip()
    
    if query.lower() in ['quit', 'exit', 'q']:
        print("\nDone!")
        break
    
    if not query:
        continue
    
    print("\nGenerating...")
    result = generate(query)
    print(f"\n{'='*60}")
    print(result)
    print(f"{'='*60}")

## Step 12: Download Adapters to Your Computer

Download the trained adapters to use them locally or share with others.

In [None]:
# Create a zip file of the adapters
!zip -r adapters_riscv.zip {final_dir}

print("\n✓ Adapters packaged")
print("\nDownload options:")
print("1. Click the 📁 icon on the left sidebar")
print("2. Find 'adapters_riscv.zip' and download it")
print("\nOr save to Google Drive:")

try:
    from google.colab import drive
    drive.mount('/content/drive')
    !cp -r {final_dir} /content/drive/MyDrive/
    print(f"\n✓ Adapters copied to Google Drive: MyDrive/{os.path.basename(final_dir)}")
except Exception:
    print("\n(Mount Google Drive manually if you want to save there)")
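
Outside a notebook, where the `!zip` shell escape is not available, the same archive can be built with the standard library. This is a minimal sketch; `zip_directory` is a hypothetical helper that wraps `shutil.make_archive` and keeps the folder name as the top-level entry inside the zip.

```python
import os
import shutil


def zip_directory(source_dir, archive_stem):
    """Create <archive_stem>.zip containing source_dir as a folder; return its path."""
    source_dir = os.path.abspath(source_dir)
    parent = os.path.dirname(source_dir)
    folder = os.path.basename(source_dir)
    # root_dir is the directory to archive from; base_dir is the folder to include.
    return shutil.make_archive(archive_stem, "zip", root_dir=parent, base_dir=folder)
```

In this notebook it would be called as `zip_directory(final_dir, "adapters_riscv")`.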

## 🎉 Congratulations!

You've successfully fine-tuned a 7B parameter language model!

**What you learned:**
- ✅ Parameter-efficient fine-tuning with LoRA
- ✅ 4-bit quantization for memory optimization
- ✅ Complete training pipeline from data to inference
- ✅ How to adapt foundation models for specialized tasks

**Next steps:**
1. Try fine-tuning on your own dataset
2. Experiment with different hyperparameters
3. Share your adapters on Hugging Face Hub
4. Deploy your model in an application

**Resources:**
- Full project: https://github.com/jschroeder-mips/llm_training_mlx
- PEFT docs: https://huggingface.co/docs/peft
- LoRA paper: https://arxiv.org/abs/2106.09685