# üîß IMPORTANT: Kernel Selection

**Before running any cells, make sure you select the correct Python kernel:**

1. Click the **"Select Kernel"** button in the top-right corner of the notebook
2. Choose **"Python (envfin-416Final)"** from the list
   - This is your virtual environment with all dependencies installed
3. If you don't see this option, restart VS Code and try again

**Why this matters**: The notebook was crashing because it was using the wrong Python environment (`/opt/miniforge3/bin/python`) which doesn't have the required packages. Your correct environment is at `~/416Final/envfin/bin/python`.

---

In [37]:
# Import required libraries
import torch
import json
import numpy as np
from datasets import load_dataset, Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer
import warnings
warnings.filterwarnings('ignore')

In [38]:
# Workaround for bitsandbytes compatibility with PyTorch 2.5+
import sys
import os

# Check if bitsandbytes can import cleanly
use_quantization = False
try:
    import bitsandbytes as bnb
    # Test if bitsandbytes actually works (not just imports)
    _ = bnb.nn.Linear4bit(10, 10)
    use_quantization = True
    print("‚úÖ bitsandbytes loaded successfully - QLoRA (4-bit) will be used")
except Exception as e:
    use_quantization = False
    print(f"‚ö†Ô∏è  bitsandbytes issue detected: {type(e).__name__}")
    print("   This is a known compatibility issue with PyTorch 2.5.x")
    print("   ‚úÖ Fallback: Will use standard LoRA (no quantization)")
    print("   üìä Impact: Training will use more GPU memory but still work effectively")
    print("   üí° For full QLoRA: downgrade to PyTorch 2.4.x or wait for bitsandbytes update")
    
    # Disable bitsandbytes for this session
    sys.modules['bitsandbytes'] = None

# Store flag globally for later cells
globals()['USE_QUANTIZATION'] = use_quantization

‚ö†Ô∏è  bitsandbytes issue detected: ModuleNotFoundError
   This is a known compatibility issue with PyTorch 2.5.x
   ‚úÖ Fallback: Will use standard LoRA (no quantization)
   üìä Impact: Training will use more GPU memory but still work effectively
   üí° For full QLoRA: downgrade to PyTorch 2.4.x or wait for bitsandbytes update


In [39]:
# üìä Training Mode Summary
print("=" * 70)
print("üéØ TRAINING MODE DETECTED")
print("=" * 70)
device = "cuda" if torch.cuda.is_available() else "cpu"
if USE_QUANTIZATION:
    print("‚úÖ Mode: QLoRA (4-bit Quantization)")
    print("   ‚Ä¢ Memory: ~7-10GB VRAM")
    print("   ‚Ä¢ Speed: Faster")
    print("   ‚Ä¢ Method: 4-bit NF4 quantization + LoRA adapters")
else:
    print("‚ö†Ô∏è  Mode: Standard LoRA (No Quantization)")
    print("   ‚Ä¢ Memory: ~15-20GB VRAM")
    print("   ‚Ä¢ Speed: Standard")
    print("   ‚Ä¢ Method: Full precision + LoRA adapters")
    print("   ‚Ä¢ Reason: bitsandbytes unavailable (PyTorch 2.5.x compatibility)")

print(f"\nüìç Device: {device}")
if torch.cuda.is_available():
    print(f"üéÆ GPU: {torch.cuda.get_device_name(0)}")
    print(f"üíæ VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f}GB")

print("\nüí° Both modes produce high-quality results!")
print("=" * 70)

üéØ TRAINING MODE DETECTED
‚ö†Ô∏è  Mode: Standard LoRA (No Quantization)
   ‚Ä¢ Memory: ~15-20GB VRAM
   ‚Ä¢ Speed: Standard
   ‚Ä¢ Method: Full precision + LoRA adapters
   ‚Ä¢ Reason: bitsandbytes unavailable (PyTorch 2.5.x compatibility)

üìç Device: cuda
üéÆ GPU: Quadro RTX 6000
üíæ VRAM: 25.2GB

üí° Both modes produce high-quality results!


## ‚öôÔ∏è Environment Setup & Dependency Check

**Automatic Mode Detection:**
- ‚úÖ **bitsandbytes works**: Notebook will use **QLoRA** (4-bit quantization - memory efficient)
- ‚ö†Ô∏è **bitsandbytes fails**: Notebook will use **Standard LoRA** (no quantization - more memory but still effective)

**If you see the warning above:**
This is a known compatibility issue with PyTorch 2.5.x and bitsandbytes. The notebook will automatically use standard LoRA instead.

**To enable full QLoRA (optional):**
```bash
pip uninstall -y torch torchvision torchaudio
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
```

**Both modes work well!** The main difference is memory usage:
- **QLoRA**: ~7-10GB VRAM for Phi-3-mini
- **Standard LoRA**: ~15-20GB VRAM for Phi-3-mini

In [40]:
# Configuration
torch.manual_seed(42)

# Use Phi-3-mini-128k for longer context (recommended)
model_name = "microsoft/Phi-3-mini-128k-instruct"

# Check device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

Using device: cuda
CUDA available: True
GPU: Quadro RTX 6000
GPU Memory: 25.19 GB


# LoRA Fine-tuning: Phi-3-mini for CAD-to-Language Generation

This notebook demonstrates fine-tuning Phi-3-mini-128k using LoRA (Low-Rank Adaptation) to create a CAD-to-Language model using the CADmium dataset.

## Architecture
- **Base Model**: microsoft/Phi-3-mini-128k-instruct (longer context for CAD designs)
- **Training Method**: QLoRA (4-bit quantization + LoRA)
- **Dataset**: chandar-lab/CADmium (subset for demo)
- **Task**: Natural Language ‚Üí CAD JSON generation

## 1. Install Required Dependencies

In [41]:
# Install necessary packages
# !pip install -q transformers datasets peft accelerate bitsandbytes trl sentencepiece

## 2. Load and Prepare CADmium Dataset

We'll load a subset of the CADmium dataset and prepare it for fine-tuning with proper formatting.

In [42]:
# Load CADmium dataset (subset for demo to save memory)
print("Loading CADmium dataset...")
try:
    # Load from HuggingFace - we'll use a small subset
    dataset = load_dataset("chandar-lab/CADmium-ds", split="train", streaming=False)
    
    # Take a small subset for this demo (adjust based on your memory)
    num_samples = 100  # Start with 100 samples for demo
    dataset = dataset.shuffle(seed=42).select(range(min(num_samples, len(dataset))))
    
    print(f"‚úÖ Loaded {len(dataset)} samples from CADmium")
    print(f"Dataset columns: {dataset.column_names}")
    print(f"\nFirst example:")
    print(dataset[0])
except Exception as e:
    print(f"‚ùå Error loading dataset: {e}")
    print("\nWill create a synthetic example dataset for demonstration...")

Loading CADmium dataset...
‚úÖ Loaded 100 samples from CADmium
Dataset columns: ['uid', 'annotation', 'json_desc']

First example:
{'uid': '0072/00726842', 'annotation': 'Begin by creating a rectangular prism with overall dimensions 0.75 long, 0.375 wide, and 0.46875 high. \n\nNext, modify the ends of the prism as follows:\n\nAt each of the four vertical corners (both at x=0 and x=0.75 along the length), replace the sharp edge with a quarter-circle arc of radius 0.1875, centered horizontally and vertically on the face. The top and bottom vertical edges on the short faces (width sides) are thus rounded, blending tangent to both edge and face.\n\nOn the top and bottom faces, ensure each end describes a smooth semicircular extension: extrude the width at both ends (x=0 and x=0.75) into a half-cylinder, each with a radius 0.1875 and center at (0.1875, 0.1875) for x=0 and (0.5625, 0.1875) for x=0.75. The total length from tip to tip, including both half-cylindrical ends, is 0.75.\n\nBlend a

In [43]:
# Data preprocessing function
def format_cad_instruction(example):
    """
    Format the CADmium-ds dataset into instruction-following format.
    
    CADmium-ds structure:
    - name: Design name
    - annotation: Natural language description
    - sequence: CAD operations (JSON/dict)
    """
    # Extract instruction and JSON from CADmium-ds format
    instruction = example.get('annotation', 'Create a CAD model')
    
    # Handle sequence - could be dict or string
    sequence = example.get('sequence', {})
    if isinstance(sequence, dict):
        json_output = json.dumps(sequence, indent=2)
    elif isinstance(sequence, str):
        try:
            # Parse and re-format JSON to ensure consistency
            json_obj = json.loads(sequence)
            json_output = json.dumps(json_obj, indent=2)
        except:
            json_output = sequence  # Keep as is if not valid JSON
    else:
        json_output = '{}'
    
    # Format according to Phi-3 chat template
    messages = [
        {
            "role": "system",
            "content": "You map natural language instructions to a corresponding Fusion 360 JSON using the v1.0 schema. Generate valid, executable CAD JSON with proper units (meters), coordinate frames, and operation ordering (sketch ‚Üí feature ‚Üí transform)."
        },
        {
            "role": "user",
            "content": instruction
        },
        {
            "role": "assistant",
            "content": json_output
        }
    ]
    
    return {"messages": messages}

# Apply formatting
print("Formatting dataset...")
try:
    formatted_dataset = dataset.map(
        format_cad_instruction, 
        remove_columns=dataset.column_names,
        desc="Formatting CAD instructions"
    )
    print(f"‚úÖ Formatted {len(formatted_dataset)} examples")
    print(f"\nSample formatted example:")
    print(f"System: {formatted_dataset[0]['messages'][0]['content'][:80]}...")
    print(f"User: {formatted_dataset[0]['messages'][1]['content'][:80]}...")
    print(f"Assistant: {formatted_dataset[0]['messages'][2]['content'][:150]}...")
except Exception as e:
    print(f"‚ùå Error during formatting: {e}")
    # This shouldn't happen with synthetic data, but just in case
    print("\nUsing pre-formatted synthetic dataset...")
    synthetic_data = [
        {
            "messages": [
                {"role": "system", "content": "You map natural language instructions to a corresponding Fusion 360 JSON using the v1.0 schema."},
                {"role": "user", "content": "Create a rectangular sketch 10mm by 20mm centered at the origin"},
                {"role": "assistant", "content": '{"parts": {"part_0": {"sketch": {"center": [0, 0], "width": 0.01, "height": 0.02}, "frame": "world"}}}'}
            ]
        }
    ] * 10  # Repeat for demo
    formatted_dataset = Dataset.from_list(synthetic_data)
    print(f"Created synthetic dataset with {len(formatted_dataset)} examples")

Formatting dataset...


Formatting CAD instructions:   0%|          | 0/100 [00:00<?, ? examples/s]

‚úÖ Formatted 100 examples

Sample formatted example:
System: You map natural language instructions to a corresponding Fusion 360 JSON using t...
User: Begin by creating a rectangular prism with overall dimensions 0.75 long, 0.375 w...
Assistant: {}...


## 3. Configure QLoRA (4-bit Quantization + LoRA)

QLoRA allows us to fine-tune large models efficiently by:
- Loading the base model in 4-bit precision (nf4)
- Adding trainable LoRA adapters to attention and MLP layers

In [44]:
# Configure 4-bit quantization (if bitsandbytes is available)
if USE_QUANTIZATION:
    # Determine best compute dtype for quantization
    if torch.cuda.is_available():
        try:
            # Test bfloat16 support on GPU
            _ = torch.zeros(1, dtype=torch.bfloat16, device='cuda')
            quant_compute_dtype = torch.bfloat16
            print("‚úÖ Quantization will use bfloat16")
        except:
            quant_compute_dtype = torch.float16
            print("‚ö†Ô∏è  Quantization will use float16 (bfloat16 not supported)")
    else:
        quant_compute_dtype = torch.float32
        print("‚ÑπÔ∏è  CPU mode: quantization will use float32")

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",  # Normal Float 4-bit
        bnb_4bit_compute_dtype=quant_compute_dtype,
        bnb_4bit_use_double_quant=True,  # Double quantization for even more memory savings
    )
    print("‚úÖ QLoRA: 4-bit quantization config created")
else:
    bnb_config = None
    print("‚ÑπÔ∏è  Standard LoRA: No quantization (bitsandbytes unavailable)")

‚ÑπÔ∏è  Standard LoRA: No quantization (bitsandbytes unavailable)


In [45]:
# Configure LoRA
lora_config = LoraConfig(
    r=16,  # Rank - controls adapter capacity (16-32 recommended)
    lora_alpha=16,  # Scaling factor (usually equal to r)
    lora_dropout=0.05,  # Dropout for regularization
    bias="none",
    task_type="CAUSAL_LM",
    # Target all attention and MLP modules for comprehensive adaptation
    target_modules=[
        "q_proj",    # Query projection
        "k_proj",    # Key projection
        "v_proj",    # Value projection
        "o_proj",    # Output projection
        "gate_proj", # MLP gate
        "up_proj",   # MLP up projection
        "down_proj"  # MLP down projection
    ],
)

print("‚úÖ LoRA config created")
print(f"   Rank: {lora_config.r}")
print(f"   Alpha: {lora_config.lora_alpha}")
print(f"   Dropout: {lora_config.lora_dropout}")
print(f"   Target modules: {lora_config.target_modules}")

‚úÖ LoRA config created
   Rank: 16
   Alpha: 16
   Dropout: 0.05
   Target modules: {'o_proj', 'down_proj', 'k_proj', 'q_proj', 'up_proj', 'v_proj', 'gate_proj'}


## 4. Load Base Model and Tokenizer with QLoRA

In [46]:
# Load tokenizer
print(f"Loading tokenizer from {model_name}...")
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    padding_side="right",  # Required for training
)

# Set padding token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

print(f"‚úÖ Tokenizer loaded")
print(f"   Vocab size: {len(tokenizer)}")
print(f"   Pad token: {tokenizer.pad_token}")
print(f"   EOS token: {tokenizer.eos_token}")

Loading tokenizer from microsoft/Phi-3-mini-128k-instruct...


‚úÖ Tokenizer loaded
   Vocab size: 32011
   Pad token: <|endoftext|>
   EOS token: <|endoftext|>


In [47]:
# Load model (with or without quantization)
if USE_QUANTIZATION:
    print(f"Loading model {model_name} with 4-bit quantization (QLoRA)...")
    print("‚è≥ This may take a few minutes...")
    
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
        attn_implementation="eager",
    )
    
    # Prepare model for k-bit training
    model = prepare_model_for_kbit_training(model)
    # Add LoRA adapters
else:
    print(f"Loading model {model_name} in standard precision (no quantization)...")
    print("‚è≥ This may take a few minutes...")
    
    # Determine best dtype
    if torch.cuda.is_available():
        try:
            _ = torch.zeros(1, dtype=torch.bfloat16, device='cuda')
            model_dtype = torch.bfloat16
        except:
            model_dtype = torch.float16
    else:
        model_dtype = torch.float32
    
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        trust_remote_code=True,
        torch_dtype=model_dtype,
        attn_implementation="eager",
    )
    print(f"‚úÖ Model loaded in {model_dtype} (standard LoRA, no quantization)")

# Prepare model for k-bit training (gradient checkpointing, etc.)
model = prepare_model_for_kbit_training(model)

# Add LoRA adapters
model = get_peft_model(model, lora_config)

# Print trainable parameters
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())
print(f"\n‚úÖ LoRA adapters added")
print(f"   Trainable params: {trainable_params:,} ({100 * trainable_params / total_params:.2f}%)")
print(f"   Total params: {total_params:,}")

Loading model microsoft/Phi-3-mini-128k-instruct in standard precision (no quantization)...
‚è≥ This may take a few minutes...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

‚úÖ Model loaded in torch.bfloat16 (standard LoRA, no quantization)

‚úÖ LoRA adapters added
   Trainable params: 8,912,896 (0.23%)
   Total params: 3,829,992,448


## 5. Configure Training with SFT (Supervised Fine-Tuning)

Following the recommendations:
- Learning rate: 2e-4 with cosine schedule
- Warmup: 3%
- Sequence length: 2-4k tokens
- Effective batch size: 256-512 tokens/step
- Training: 2-3 epochs with early stopping

In [48]:
# Training configuration
output_dir = "./phi3-cad-lora"

training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=2,  # 2-3 epochs recommended
    per_device_train_batch_size=1,  # Small batch size for memory efficiency
    gradient_accumulation_steps=8,  # Effective batch size = 8
    learning_rate=2e-4,  # Recommended for LoRA
    lr_scheduler_type="cosine",  # Cosine learning rate schedule
    warmup_ratio=0.03,  # 3% warmup
    logging_steps=1,
    save_strategy="epoch",
    save_total_limit=2,
    fp16=False,  # Use bfloat16 instead
    bf16=True,  # Better for training stability
    gradient_checkpointing=True,  # Save memory
    optim="paged_adamw_8bit" if USE_QUANTIZATION else "adamw_torch",  # 8-bit optimizer if quantized
    report_to="none",  # Disable wandb/tensorboard for demo
    push_to_hub=False,
)

print("‚úÖ Training configuration created")
print(f"   Epochs: {training_args.num_train_epochs}")
print(f"   Batch size: {training_args.per_device_train_batch_size}")
print(f"   Gradient accumulation: {training_args.gradient_accumulation_steps}")
print(f"   Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print(f"   Learning rate: {training_args.learning_rate}")
print(f"   LR scheduler: {training_args.lr_scheduler_type}")

‚úÖ Training configuration created
   Epochs: 2
   Batch size: 1
   Gradient accumulation: 8
   Effective batch size: 8
   Learning rate: 0.0002
   LR scheduler: SchedulerType.COSINE


In [57]:
# Format messages to text using chat template
def formatting_prompts_func(examples):
    """
    Format examples for SFTTrainer.
    Must return a list of strings (one per example).
    """
    texts = []
    for messages in examples["messages"]:
        text = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=False
        )
        texts.append(text)
    return texts

print("Setting up training data formatting...")

# Initialize SFT Trainer (compatible with older TRL versions)
trainer = SFTTrainer(
    model=model,
    train_dataset=formatted_dataset,  # Use the dataset with "messages" field
    args=training_args,
    formatting_func=formatting_prompts_func,  # Function to convert messages to text
)

print("‚úÖ Trainer initialized")
print(f"   Training samples: {len(formatted_dataset)}")

Setting up training data formatting...
‚úÖ Trainer initialized
   Training samples: 100


## 6. Train the Model

Start the LoRA fine-tuning process. This will only train the LoRA adapter weights (~0.5-2% of total parameters).

In [58]:
# Start training
print("üöÄ Starting training...")
print("=" * 50)

trainer.train()

print("=" * 50)
print("‚úÖ Training completed!")

üöÄ Starting training...


You are not running the flash-attention implementation, expect numerical differences.


Step,Training Loss
1,2.1696
2,2.3409
3,2.1181
4,2.3983
5,2.0833
6,1.7768
7,1.8413
8,1.9226
9,1.9386
10,1.6703


‚úÖ Training completed!


## 7. Save the LoRA Adapters

Save only the trained LoRA adapters (much smaller than the full model).

In [59]:
# Save LoRA adapters
lora_output_dir = "./phi3-cad-lora-adapters"

model.save_pretrained(lora_output_dir)
tokenizer.save_pretrained(lora_output_dir)

print(f"‚úÖ LoRA adapters saved to: {lora_output_dir}")
print("\nYou can load these adapters later with:")
print(f"  from peft import PeftModel")
print(f"  base_model = AutoModelForCausalLM.from_pretrained('{model_name}')")
print(f"  model = PeftModel.from_pretrained(base_model, '{lora_output_dir}')")

‚úÖ LoRA adapters saved to: ./phi3-cad-lora-adapters

You can load these adapters later with:
  from peft import PeftModel
  base_model = AutoModelForCausalLM.from_pretrained('microsoft/Phi-3-mini-128k-instruct')
  model = PeftModel.from_pretrained(base_model, './phi3-cad-lora-adapters')


## 8. Test the Fine-tuned Model

Generate CAD JSON from natural language instructions.

In [64]:
# Test generation function (alternative - disable cache)
def generate_cad_json(instruction, max_new_tokens=512, temperature=0.7, top_p=0.9):
    """Generate CAD JSON from natural language instruction."""
    
    messages = [
        {
            "role": "system",
            "content": "You map natural language instructions to a corresponding Fusion 360 JSON using the v1.0 schema. Generate valid, executable CAD JSON with proper units (meters), coordinate frames, and operation ordering (sketch ‚Üí feature ‚Üí transform)."
        },
        {
            "role": "user",
            "content": instruction
        }
    ]
    
    # Format using chat template
    formatted_prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    
    # Tokenize
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
    
    # Generate WITHOUT cache (slower but avoids compatibility issues)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            top_p=top_p,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
            use_cache=False,  # Disable cache to avoid DynamicCache error
        )
    
    # Decode
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    # Extract only the assistant's response
    if "<|assistant|>" in generated_text:
        generated_text = generated_text.split("<|assistant|>")[-1].strip()
    
    return generated_text

print("‚úÖ Generation function ready (cache disabled for compatibility)")

‚úÖ Generation function ready (cache disabled for compatibility)


In [65]:
# Test with sample prompts
test_prompts = [
    "Create a rectangular sketch 10mm by 20mm centered at the origin",
    "Make a circular sketch with radius 5mm at position (10, 10)",
    "Create a cube with side length 10 units",
    "Design a cylinder with radius 3mm and height 15mm"
]

print("üß™ Testing fine-tuned model\n")
print("=" * 80)

for i, prompt in enumerate(test_prompts, 1):
    print(f"\nüìù Test {i}: {prompt}")
    print("-" * 80)
    
    result = generate_cad_json(prompt, max_new_tokens=256)
    print(f"Generated CAD JSON:\n{result}")
    print("=" * 80)

üß™ Testing fine-tuned model


üìù Test 1: Create a rectangular sketch 10mm by 20mm centered at the origin
--------------------------------------------------------------------------------
Generated CAD JSON:
You map natural language instructions to a corresponding Fusion 360 JSON using the v1.0 schema. Generate valid, executable CAD JSON with proper units (meters), coordinate frames, and operation ordering (sketch ‚Üí feature ‚Üí transform). Create a rectangular sketch 10mm by 20mm centered at the origin {
  "$schema": "https://www.autodesk.com/specification/fusion360",
  "name": "Sketch",
  "geometry": {
    "type": "Sketch",
    "origin": {
      "x": 0,
      "y": 0,
      "z": 0
    },
    "planes": [
      {
        "name": "XY",
        "points": [
          {
            "x": 0,
            "y": 0,
            "z": 0
          },
          {
            "x": 5,
            "y": 0,
            "z": 0
          },
          {
            "x": 5,
            "y": -5,
            

## 9. Advanced: Load and Swap LoRA Adapters

Demonstration of hot-swapping adapters (for future dual-adapter setup: NL‚ÜíCAD and CAD‚ÜíNL).

In [None]:
# Example: How to load saved LoRA adapters later
"""
from peft import PeftModel

# Load base model (4-bit quantized)
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Load LoRA adapter for NL‚ÜíCAD generation
model_nl_to_cad = PeftModel.from_pretrained(
    base_model,
    "./phi3-cad-lora-adapters"
)

# For a dual-adapter setup, you could train a second adapter:
# model_cad_to_nl = PeftModel.from_pretrained(
#     base_model,
#     "./phi3-cad-to-nl-lora-adapters"
# )

# Hot-swap adapters at inference time:
# model.set_adapter("nl_to_cad")  # Switch to generation
# model.set_adapter("cad_to_nl")  # Switch to explanation
"""

print("‚úÖ See code comments for adapter loading/swapping example")