# GPU-Aware Fine-Tuning Test Notebook

This notebook tests GPU detection and fine-tuning with either unsloth (NVIDIA/AMD/Intel) or unsloth-mlx (Apple Silicon).

**What this notebook does:**
1. Detects your hardware platform (Apple Silicon vs CUDA GPU)
2. Installs the appropriate fine-tuning library
3. Loads a 4-bit Llama 3.2 1B model
4. Runs baseline inference BEFORE fine-tuning
5. Fine-tunes on 3 simple examples
6. Runs inference AFTER fine-tuning to show the difference

## Cell 1: GPU Detection

Detect the hardware platform to determine which library to use.

In [1]:
import platform
import sys

def detect_hardware():
    """
    Detect the hardware platform for fine-tuning.
    
    Returns:
        str: 'apple_silicon', 'cuda', or 'cpu'
    """
    # Check for Apple Silicon (M1/M2/M3/M4/M5)
    if platform.system() == 'Darwin' and platform.machine() == 'arm64':
        return 'apple_silicon'
    
    # CUDA detection needs torch, so it happens after this function returns.
    # 'unknown' means "not Apple Silicon"; the code below resolves it to 'cuda' or 'cpu'.
    return 'unknown'

# Detect platform
platform_type = detect_hardware()

print("=" * 60)
print("HARDWARE DETECTION")
print("=" * 60)
print(f"System: {platform.system()}")
print(f"Machine: {platform.machine()}")
print(f"Processor: {platform.processor()}")
print(f"Python: {sys.version}")
print(f"\nDetected platform: {platform_type.upper()}")
print("=" * 60)

# Now check for CUDA if not Apple Silicon
if platform_type == 'unknown':
    try:
        import torch
        if torch.cuda.is_available():
            platform_type = 'cuda'
            print(f"\n✓ NVIDIA GPU detected: {torch.cuda.get_device_name(0)}")
            print(f"  CUDA version: {torch.version.cuda}")
        else:
            platform_type = 'cpu'
            print("\n⚠ No CUDA GPU found")
    except ImportError:
        print("\n⚠ PyTorch not installed yet, will install with unsloth")
        platform_type = 'cpu'

print(f"\nWill use: {'unsloth-mlx' if platform_type == 'apple_silicon' else 'unsloth'}")

HARDWARE DETECTION
System: Darwin
Machine: arm64
Processor: arm
Python: 3.13.9 (main, Oct 14 2025, 21:10:40) [Clang 20.1.4 ]

Detected platform: APPLE_SILICON

Will use: unsloth-mlx
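The two-stage check above (platform first, PyTorch second) can be folded into a single function. The sketch below is a variant under stated assumptions, not the notebook's method: it also labels AMD GPUs as `'rocm'` by checking `torch.version.hip`, which PyTorch sets only on ROCm builds. Either version reports `x86_64`, not Apple Silicon, when Python runs under Rosetta on an M-series Mac.

```python
import platform


def detect_hardware() -> str:
    """Return 'apple_silicon', 'cuda', 'rocm', or 'cpu' in one pass.

    Sketch only: PyTorch may or may not be installed, and a ROCm build of
    PyTorch (torch.version.hip is set) is assumed to mean an AMD GPU.
    """
    # Apple Silicon: native arm64 Python on macOS
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "apple_silicon"

    try:
        import torch
    except ImportError:
        # Without PyTorch there is no way to probe for a GPU here
        return "cpu"

    if torch.cuda.is_available():
        # ROCm builds of PyTorch reuse the torch.cuda API for AMD GPUs
        return "rocm" if getattr(torch.version, "hip", None) else "cuda"
    return "cpu"


print(f"Detected platform: {detect_hardware()}")
```

The later cells only distinguish `'apple_silicon'` from everything else, so a `'rocm'` result would take the unsloth path.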


## Cell 2: Install Dependencies

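Install the fine-tuning library that matches the detected hardware. In the recorded run below, `!pip` failed with `zsh:1: command not found: pip` while the cell still printed its success messages; the imports in Cell 3 worked only because the packages were already present in the kernel's environment.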

In [2]:
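import importlib
import importlib.util

print("Installing dependencies...\n")

# %pip installs into the environment of the running kernel; a plain `!pip`
# runs in a shell that may not find pip at all ("command not found: pip").
if platform_type == 'apple_silicon':
    print("Installing unsloth-mlx for Apple Silicon...")
    %pip install -q unsloth-mlx mlx datasets
    required = ["unsloth_mlx", "mlx", "datasets"]
else:
    print("Installing unsloth for CUDA/CPU...")
    %pip install -q "unsloth[colab-new]" datasets
    required = ["unsloth", "datasets"]

# Report success only if every package is actually importable
importlib.invalidate_caches()
missing = [name for name in required if importlib.util.find_spec(name) is None]
if missing:
    print(f"\n⚠ Not importable: {', '.join(missing)}")
else:
    print("\nDependencies installed successfully!")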

Installing dependencies...

Installing unsloth-mlx for Apple Silicon...
zsh:1: command not found: pip
✓ unsloth-mlx installed

Dependencies installed successfully!


## Cell 3: Import Libraries & Configure

Import the appropriate library and configure model settings.

In [3]:

print("Importing libraries...\n")

if platform_type == 'apple_silicon':
    from unsloth_mlx import FastLanguageModel
    model_name = "mlx-community/Llama-3.2-1B-Instruct-4bit"
    print(f"✓ Using unsloth-mlx with model: {model_name}")
else:
    from unsloth import FastLanguageModel
    model_name = "unsloth/llama-3.2-1b"
    print(f"✓ Using unsloth with model: {model_name}")

from datasets import Dataset

print("\nLibraries imported successfully!")

Importing libraries...

✓ Using unsloth-mlx with model: mlx-community/Llama-3.2-1B-Instruct-4bit

Libraries imported successfully!


## Cell 4: Load Base Model

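Load Llama 3.2 1B with 4-bit quantization. On Apple Silicon this is the pre-quantized `mlx-community/Llama-3.2-1B-Instruct-4bit` checkpoint (an instruction-tuned variant, not the base model); on CUDA it is `unsloth/llama-3.2-1b`.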

In [4]:
print("Loading base model...\n")
print(f"Model: {model_name}")
print(f"Max sequence length: 512")
print(f"Quantization: 4-bit\n")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=512,
    load_in_4bit=True,
)

print("✓ Base model loaded successfully!")
print(f"  Model type: {type(model).__name__}")
print(f"  Tokenizer vocab size: {tokenizer.vocab_size}")

Loading base model...

Model: mlx-community/Llama-3.2-1B-Instruct-4bit
Max sequence length: 512
Quantization: 4-bit



✓ Base model loaded successfully!
  Model type: MLXModelWrapper
  Tokenizer vocab size: 128000
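As a rough sanity check on why 4-bit loading matters, weight memory is approximately parameter count × bits per weight ÷ 8. The sketch below assumes roughly 1.24 billion parameters for Llama 3.2 1B and models group-wise 4-bit quantization as 4.5 effective bits per weight (a 16-bit scale and a 16-bit bias per group of 64 weights). That group layout is an assumption about the checkpoint's quantization settings, not something this notebook reports. Under these assumptions the weights need about 2.5 GB at 16 bits versus about 0.7 GB at 4 bits, in the same range as the 0.802 GB peak memory reported during training in Cell 8.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: ~1.24e9 parameters for Llama 3.2 1B (approximate figure)
N_PARAMS = 1.24e9
# Assumption: a 16-bit scale plus a 16-bit bias per group of 64 weights
# adds 32 / 64 = 0.5 bits on top of the 4 stored bits
BITS_4BIT = 4 + 32 / 64

print(f"16-bit weights: {weight_memory_gb(N_PARAMS, 16):.2f} GB")
print(f"4-bit weights:  {weight_memory_gb(N_PARAMS, BITS_4BIT):.2f} GB")
```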


## Cell 5: Baseline Inference (BEFORE Fine-tuning)

Test the base model with sample prompts to see its default behavior.

In [6]:
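print("=" * 60)
print("BASELINE INFERENCE (Before Fine-tuning)")
print("=" * 60)

# Switch the model to inference mode
FastLanguageModel.for_inference(model)

# Test prompts
test_prompts = [
    "Complete this sentence: The sky is",
    "Complete this sentence: Water is",
]

for prompt in test_prompts:
    print(f"\nPrompt: {prompt}")
    if platform_type == 'apple_silicon':
        # The unsloth-mlx wrapper takes the prompt string directly
        response = model.generate(prompt=prompt, max_tokens=20)
    else:
        # Hugging Face models take token IDs and return prompt + continuation
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output_ids = model.generate(**inputs, max_new_tokens=20)
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    print(f"Response: {response}")

print("\n" + "=" * 60)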

BASELINE INFERENCE (Before Fine-tuning)
Inference mode enabled with KV caching

Prompt: Complete this sentence: The sky is
Response: _______ blue today.

A) A deep shade of blue
B) A light blue
C

Prompt: Complete this sentence: Water is
Response: life, but it's not the only thing that makes us whole.

What does this sentence mean?





## Cell 6: Create Simple Training Dataset

Create a minimal dataset with 3 instruction-response pairs for quick testing.

In [7]:
print("Creating training dataset...\n")

# Simple training data in an Alpaca-style template (Instruction / Input / Response)
training_data = [
    {
        "text": """### Instruction:
Complete the sentence

### Input:
The sky is

### Response:
blue and beautiful, stretching endlessly above us."""
    },
    {
        "text": """### Instruction:
Complete the sentence

### Input:
Water is

### Response:
essential for all life on Earth."""
    },
    {
        "text": """### Instruction:
Complete the sentence

### Input:
AI technology is

### Response:
rapidly advancing and transforming our world."""
    },
]

# Create HuggingFace dataset
train_dataset = Dataset.from_list(training_data)

print(f"✓ Created dataset with {len(train_dataset)} examples")
print("\nSample training example:")
print("-" * 60)
print(train_dataset[0]['text'])
print("-" * 60)

Creating training dataset...

✓ Created dataset with 3 examples

Sample training example:
------------------------------------------------------------
### Instruction:
Complete the sentence

### Input:
The sky is

### Response:
blue and beautiful, stretching endlessly above us.
------------------------------------------------------------
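The three records above share one template, so they can be generated from (input, completion) pairs. The `format_example` helper below is hypothetical and not part of unsloth or `datasets`. Called with an empty response, the same function produces a prompt in the training format, which the plain "Complete this sentence:" prompts in Cells 5 and 9 do not use.

```python
# Hypothetical helper: builds the "### Instruction / ### Input / ### Response"
# layout used in training_data. With an empty response it yields an inference
# prompt that ends exactly where the model should continue.
def format_example(instruction: str, input_text: str, response: str = "") -> str:
    return (
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{input_text}\n\n"
        f"### Response:\n{response}"
    )


pairs = [
    ("The sky is", "blue and beautiful, stretching endlessly above us."),
    ("Water is", "essential for all life on Earth."),
    ("AI technology is", "rapidly advancing and transforming our world."),
]

# Same texts as the hand-written training_data list
training_records = [
    {"text": format_example("Complete the sentence", start, completion)}
    for start, completion in pairs
]

# A prompt in the training format, for before/after comparisons
prompt = format_example("Complete the sentence", "The sky is")
print(prompt)
```

`training_records` could be passed to `Dataset.from_list` exactly like `training_data`.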


## Cell 7: Apply LoRA Adapters

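Add LoRA (Low-Rank Adaptation) adapters to the attention projections so that only small low-rank matrices are trained while the base weights stay frozen. With unsloth-mlx this call only records the configuration (`LoRA configuration set` in the output below); the adapters are applied to the model's 16 layers when training starts in Cell 8.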

In [8]:
print("Applying LoRA adapters...\n")

model = FastLanguageModel.get_peft_model(
    model,
    r=8,  # LoRA rank (lower for quick testing)
    lora_alpha=16,  # Scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Attention layers
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=42,
)

print("✓ LoRA adapters applied")
print(f"  LoRA rank (r): 8")
print(f"  LoRA alpha: 16")
print(f"  Target modules: q_proj, k_proj, v_proj, o_proj")

Applying LoRA adapters...

LoRA configuration set: rank=8, alpha=16, modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'], dropout=0.05
✓ LoRA adapters applied
  LoRA rank (r): 8
  LoRA alpha: 16
  Target modules: q_proj, k_proj, v_proj, o_proj
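The NumPy sketch below shows what a LoRA adapter computes, using the standard LoRA formulation with `B` initialized to zero; it illustrates the technique, not unsloth-mlx internals. The forward pass uses toy dimensions, and the parameter count assumes the published Llama 3.2 1B shapes (hidden size 2048, 8 key-value heads of dimension 64, 16 layers).

```python
import numpy as np

# LoRA keeps a frozen weight W (d_out x d_in) and learns A (r x d_in) and
# B (d_out x r); the adapted layer computes x W^T + (alpha / r) * x A^T B^T.
r, alpha = 8, 16          # values passed to get_peft_model above
scale = alpha / r         # 2.0, matching "scale: 2.0" in the training log

# Toy dimensions for the forward-pass demo
rng = np.random.default_rng(42)
d_in, d_out = 64, 64
W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init


def lora_forward(x: np.ndarray) -> np.ndarray:
    """Base output plus the scaled low-rank update, without forming B @ A."""
    return x @ W.T + scale * (x @ A.T) @ B.T


x = rng.standard_normal((1, d_in))
# With B = 0 the adapter contributes nothing, so training starts from the base model
print("Adapter is a no-op at init:", np.allclose(lora_forward(x), x @ W.T))


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable values for one adapted projection: A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


# Assumed Llama 3.2 1B shapes: hidden size 2048, k_proj/v_proj output 512
# (8 key-value heads x head dim 64), 16 decoder layers
per_layer = (
    lora_params(2048, 2048, r)    # q_proj
    + lora_params(2048, 512, r)   # k_proj
    + lora_params(2048, 512, r)   # v_proj
    + lora_params(2048, 2048, r)  # o_proj
)
print(f"Trainable LoRA values: {16 * per_layer:,}")
```

Under these assumed shapes the adapters hold about 1.7 million trainable values, a small fraction of the frozen weights. The `Trainable LoRA parameters: 128` line in the Cell 8 log appears to count adapter matrices instead (16 layers × 4 projections × 2 matrices = 128).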




## Cell 8: Fine-tune the Model

Train for 15 steps with batch size 1. With only 3 examples, that is 5 passes over the dataset.

In [9]:
print("=" * 60)
print("FINE-TUNING")
print("=" * 60)

# Import trainer based on platform
if platform_type == 'apple_silicon':
    # TRL's SFTTrainer trains PyTorch models, so it cannot stand in for the
    # unsloth-mlx trainer here; fail with a clear message instead.
    try:
        from unsloth_mlx import SFTTrainer, SFTConfig
    except ImportError as err:
        raise ImportError("unsloth-mlx's SFTTrainer is required on Apple Silicon") from err
    TrainingArgsClass = SFTConfig
    print("Using unsloth-mlx SFTTrainer\n")
else:
    # Recent TRL releases take options such as dataset_text_field and packing
    # via SFTConfig rather than as SFTTrainer keyword arguments; if the trainer
    # call below fails, check the installed TRL version.
    from trl import SFTTrainer
    from transformers import TrainingArguments
    TrainingArgsClass = TrainingArguments
    print("Using standard SFTTrainer\n")

# Training configuration
training_args = TrainingArgsClass(
    output_dir="./test_output",
    per_device_train_batch_size=1,
    max_steps=15,  # Very short for quick testing
    learning_rate=2e-4,
    logging_steps=5,
    warmup_steps=2,
    save_strategy="no",  # Don't save checkpoints for this test
    report_to="none",  # Disable wandb/tensorboard
)

# Create trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=512,
    args=training_args,
    packing=False,
)

print("Training configuration:")
print(f"  Batch size: 1")
print(f"  Max steps: 15")
print(f"  Learning rate: 2e-4")
print(f"  Dataset size: {len(train_dataset)} examples")
print("\nStarting training...\n")

# Train!
trainer.train()

print("\n" + "=" * 60)
print("✓ Fine-tuning complete!")
print("=" * 60)

FINE-TUNING
Using unsloth-mlx SFTTrainer

Trainer initialized:
  Output dir: test_output
  Adapter path: adapters
  Learning rate: 0.0002
  Iterations: 15
  Batch size: 1
  LoRA r=8, alpha=16
  Native training: True
  LR scheduler: cosine
  Grad checkpoint: False
Training configuration:
  Batch size: 1
  Max steps: 15
  Learning rate: 2e-4
  Dataset size: 3 examples

Starting training...

Starting Fine-Tuning

[Using Native MLX Training]

Applying LoRA adapters...
Applying LoRA to 16 layers: {'rank': 8, 'scale': 2.0, 'dropout': 0.05, 'keys': ['self_attn.q_proj', 'self_attn.k_proj', 'self_attn.v_proj', 'self_attn.o_proj']}
✓ LoRA applied successfully to 16 layers
  Trainable LoRA parameters: 128
Preparing training data...
  Detected format: text
✓ Prepared 3 training samples
  Saved to: test_output/train.jsonl
✓ Created validation set (copied from train)

Training configuration:
  Iterations: 15
  Batch size: 1
  Learning rate: 0.0002
  LR scheduler: cosine
  Grad checkpoint: True
  Ada

Calculating loss...: 100%|██████████| 3/3 [00:00<00:00,  8.43it/s]

Iter 1: Val loss 4.820, Val took 0.360s





Iter 5: Train loss 2.917, Learning Rate 1.669e-04, It/sec 1.724, Tokens/sec 46.537, Trained Tokens 135, Peak mem 0.802 GB
Iter 10: Train loss 0.663, Learning Rate 6.910e-05, It/sec 14.200, Tokens/sec 369.200, Trained Tokens 265, Peak mem 0.802 GB


Calculating loss...: 100%|██████████| 3/3 [00:00<00:00, 36.63it/s]

Iter 15: Val loss 0.423, Val took 0.085s
Iter 15: Train loss 0.431, Learning Rate 2.185e-06, It/sec 14.156, Tokens/sec 382.206, Trained Tokens 400, Peak mem 0.802 GB
Saved final weights to adapters/adapters.safetensors.

Training Complete!
  Adapters saved to: adapters

✓ Fine-tuning complete!
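The log reports `LR scheduler: cosine`. The three logged learning rates appear consistent with a plain cosine decay from 2e-4 to 0 over all 15 steps, with `Iter n` corresponding to step n − 1 and no visible warmup, even though `warmup_steps=2` was passed. This is an observation from three data points, not a statement about how unsloth-mlx implements its scheduler; the sketch below reproduces the values under that assumption.

```python
import math

# Assumption: cosine decay from PEAK_LR to 0 over MAX_STEPS, no warmup,
# and the log's "Iter n" corresponding to step n - 1
PEAK_LR = 2e-4
MAX_STEPS = 15


def cosine_lr(iteration: int) -> float:
    step = iteration - 1
    return PEAK_LR * 0.5 * (1 + math.cos(math.pi * step / MAX_STEPS))


# Learning rates reported in the training log above
logged = {5: 1.669e-04, 10: 6.910e-05, 15: 2.185e-06}
for it, lr in logged.items():
    print(f"Iter {it}: logged {lr:.3e}, cosine sketch {cosine_lr(it):.3e}")
```

If warmup matters for a real run, comparing the logged rates against this curve is a quick way to check whether it was applied.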





## Cell 9: Post-Training Inference (AFTER Fine-tuning)

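Run the same prompts through the fine-tuned model. Because these prompts are not wrapped in the `### Instruction / ### Input / ### Response` template used for training, the model echoes parts of that template and reproduces the training responses verbatim, a sign that 15 steps on 3 examples mostly memorizes them.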

In [11]:
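print("=" * 60)
print("POST-TRAINING INFERENCE (After Fine-tuning)")
print("=" * 60)

# Switch the model back to inference mode after training
FastLanguageModel.for_inference(model)

# Test the same prompts as before
for prompt in test_prompts:
    print(f"\nPrompt: {prompt}")
    if platform_type == 'apple_silicon':
        # The unsloth-mlx wrapper takes the prompt string directly
        response = model.generate(prompt=prompt, max_tokens=20)
    else:
        # Hugging Face models take token IDs and return prompt + continuation
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output_ids = model.generate(**inputs, max_new_tokens=20)
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    print(f"Response: {response}")

print("\n" + "=" * 60)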

POST-TRAINING INFERENCE (After Fine-tuning)
Inference mode enabled with KV caching

Prompt: Complete this sentence: The sky is
Response: 

## Input:
The sky is

## Response:
blue and beautiful, stretching endlessly above us.

Prompt: Complete this sentence: Water is
Response: 

Answer: essential for all life on Earth.



## Cell 10: Summary & Comparison

Summary of what we tested and next steps.

In [12]:
print("=" * 60)
print("TEST SUMMARY")
print("=" * 60)
print(f"\n✓ Platform detected: {platform_type.upper()}")
print(f"✓ Library used: {'unsloth-mlx' if platform_type == 'apple_silicon' else 'unsloth'}")
print(f"✓ Base model loaded: {model_name}")
print(f"✓ Fine-tuning completed: 15 steps on 3 examples")
print(f"✓ Inference tested: Before and after fine-tuning")

print("\n" + "=" * 60)
print("NEXT STEPS")
print("=" * 60)
print("\nIf fine-tuning worked successfully:")
print("1. Update src/finetune.py with GPU detection logic")
print("2. Update pyproject.toml with platform-specific dependencies")
print("3. Test with your actual blog data")
print("4. Deploy the fine-tuned model to Ollama")
print("\nIf there were errors:")
print("- Review error messages above")
print("- Check unsloth-mlx compatibility with your macOS version")
print("- Consider using Google Colab for CUDA-based fine-tuning")

TEST SUMMARY

✓ Platform detected: APPLE_SILICON
✓ Library used: unsloth-mlx
✓ Base model loaded: mlx-community/Llama-3.2-1B-Instruct-4bit
✓ Fine-tuning completed: 15 steps on 3 examples
✓ Inference tested: Before and after fine-tuning

NEXT STEPS

If fine-tuning worked successfully:
1. Update src/finetune.py with GPU detection logic
2. Update pyproject.toml with platform-specific dependencies
3. Test with your actual blog data
4. Deploy the fine-tuned model to Ollama

If there were errors:
- Review error messages above
- Check unsloth-mlx compatibility with your macOS version
- Consider using Google Colab for CUDA-based fine-tuning
