# Nemotron Next 8B LoRA Fine-tuning with FiQA Dataset

This notebook demonstrates LoRA fine-tuning of [NVIDIA Nemotron Next 8B](https://huggingface.co/nvidia/Nemotron-Next-8B) on the [FiQA dataset](https://huggingface.co/datasets/explodinggradients/fiqa) for financial question answering.

## Table of Contents

1. [Setup & Environment](#setup)
2. [Model Loading](#model-loading)
3. [Dataset Loading & Preprocessing](#dataset)
4. [Baseline Evaluation](#baseline)
5. [LoRA Configuration](#lora-config)
6. [LoRA Training](#training)
7. [Fine-tuned Evaluation](#evaluation)
8. [Visualization & Analysis](#visualization)

---

## GPU Requirements

⚠️ **This notebook requires a GPU with 24GB+ VRAM** (A100, H100, or RTX 4090 recommended)

| Phase | GPU Required | Time Estimate |
|-------|--------------|---------------|
| Model Loading | ✅ Yes | 2-5 min |
| Dataset Prep | ✅ Yes | 10-25 min |
| Baseline Eval | ✅ Yes | 30-60 min |
| LoRA Training | ✅ Yes | 3-6 hours |
| Final Eval | ✅ Yes | 30-60 min |
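
The 24GB+ figure is consistent with a rough back-of-the-envelope estimate. The sketch below assumes exactly 8 billion parameters stored in bfloat16 (2 bytes each), which only approximates the model's real size. It counts weights alone and ignores activations, the KV cache, and LoRA optimizer state. The helper name `weight_memory_gb` is hypothetical.

```python
# Rough weight-memory estimate (assumption: exactly 8e9 parameters, bf16 storage).
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Return the memory needed for model weights alone, in decimal GB."""
    return num_params * bytes_per_param / 1e9

weights_gb = weight_memory_gb(8e9)   # bf16 uses 2 bytes per parameter
headroom_gb = 24 - weights_gb        # what a 24 GB card has left for everything else

print(f"Weights: {weights_gb:.1f} GB, headroom on a 24 GB GPU: {headroom_gb:.1f} GB")
```

Under these assumptions the weights take about two thirds of a 24 GB card, which is why batch size and sequence length are kept small in the configuration.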


<a name="setup"></a>
## 1. Setup & Environment

First, let's verify our environment and import required libraries.


In [None]:
# GPU REQUIRED - Verify CUDA availability
import torch

print("=" * 50)
print("Environment Check")
print("=" * 50)
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    raise RuntimeError("❌ CUDA not available! This notebook requires a GPU.")

print("=" * 50)
print("✅ GPU environment verified!")


In [None]:
# Core imports
import os
import json
import time
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# Data processing
import pandas as pd
from datasets import load_dataset, Dataset, DatasetDict
from tqdm.auto import tqdm

# NeMo AutoModel imports
try:
    import nemo_automodel
    from nemo_automodel._transformers import NeMoAutoModelForCausalLM
    from nemo_automodel.components._peft.lora import PeftConfig, apply_lora_to_linear_modules
    print("✅ NeMo AutoModel imported successfully")
except ImportError as e:
    print(f"⚠️ NeMo AutoModel not found: {e}")
    print("Please install: cd Automodel && uv pip install -e .")
    raise

# Transformers for tokenizer
from transformers import AutoTokenizer, AutoProcessor

# Evaluation
import evaluate

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Set plotting style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

print("✅ All imports successful!")


In [None]:
# Configuration
CONFIG = {
    # Model
    "model_name": "nvidia/Nemotron-Next-8B",
    "torch_dtype": torch.bfloat16,
    
    # Dataset
    "dataset_name": "explodinggradients/fiqa",
    "train_split_ratio": 0.8,  # 80% train, 20% validation from original train
    "max_length": 512,
    
    # LoRA
    "lora_rank": 8,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    
    # Training
    "learning_rate": 2e-4,
    "batch_size": 4,
    "num_epochs": 3,
    "gradient_accumulation_steps": 4,
    "warmup_ratio": 0.1,
    
    # Paths
    "output_dir": "./outputs",
    "checkpoint_dir": "./checkpoints",
}

# Create output directories
Path(CONFIG["output_dir"]).mkdir(parents=True, exist_ok=True)
Path(CONFIG["checkpoint_dir"]).mkdir(parents=True, exist_ok=True)

print("✅ Configuration loaded")
print(f"   Model: {CONFIG['model_name']}")
print(f"   Dataset: {CONFIG['dataset_name']}")
print(f"   LoRA rank: {CONFIG['lora_rank']}, alpha: {CONFIG['lora_alpha']}")


---

<a name="model-loading"></a>
## 2. Model Loading

**⏱️ Time Estimate: 2-5 minutes** | **GPU REQUIRED**

Load Nemotron Next 8B using NeMo AutoModel APIs.


In [None]:
# GPU REQUIRED - Load Nemotron Next 8B
# ⏱️ Time: 2-5 minutes

print("=" * 50)
print("Loading Nemotron Next 8B...")
print("=" * 50)

start_time = time.time()

# Load tokenizer
print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(
    CONFIG["model_name"],
    trust_remote_code=True,
)

# Ensure padding token is set
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

print("✅ Tokenizer loaded")
print(f"   Vocab size: {tokenizer.vocab_size}")
print(f"   Pad token: {tokenizer.pad_token}")

# Load model using NeMo AutoModel
print("\nLoading model (this may take a few minutes)...")
model = NeMoAutoModelForCausalLM.from_pretrained(
    CONFIG["model_name"],
    torch_dtype=CONFIG["torch_dtype"],
    trust_remote_code=True,
    device_map="auto",
)

load_time = time.time() - start_time

print(f"\n✅ Model loaded successfully in {load_time:.1f} seconds")
print(f"   Model type: {type(model).__name__}")
print(f"   Dtype: {model.dtype}")
print(f"   Device: {next(model.parameters()).device}")


In [None]:
# Print model architecture summary
def count_parameters(model):
    """Count trainable and total parameters."""
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total_params, trainable_params

total, trainable = count_parameters(model)

print("=" * 50)
print("Model Summary")
print("=" * 50)
print(f"Total parameters: {total:,} ({total/1e9:.2f}B)")
print(f"Trainable parameters: {trainable:,} ({trainable/1e9:.2f}B)")
print(f"Trainable %: {100 * trainable / total:.2f}%")


In [None]:
# Test basic inference to verify model works
print("=" * 50)
print("Testing Model Inference")
print("=" * 50)

# Simple financial question to test
test_prompt = """You are a helpful financial assistant.

Question: What is the difference between a stock and a bond?

Answer:"""

# Tokenize
inputs = tokenizer(test_prompt, return_tensors="pt").to(model.device)

# Generate
print("Generating response...")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )

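# Decode only the newly generated tokens; slicing the decoded string by
# len(test_prompt) breaks if decoding does not reproduce the prompt exactly
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)

print("\n" + "=" * 50)
print("Test Prompt:")
print("=" * 50)
print(test_prompt)
print("\n" + "=" * 50)
print("Model Response:")
print("=" * 50)
print(response.strip())
print("\n✅ Model inference test passed!")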


---

<a name="dataset"></a>
## 3. Dataset Loading & Preprocessing

**⏱️ Time Estimate: 10-25 minutes** | **GPU REQUIRED**

Load FiQA dataset and format for instruction fine-tuning.

### FiQA Dataset Info
- **Source**: [explodinggradients/fiqa](https://huggingface.co/datasets/explodinggradients/fiqa)
- **Splits**: Train (5,500), Validation (500), Test (648)
- **Format**: Question + Ground Truth Answers


In [None]:
# Load FiQA dataset
# ⏱️ Time: ~1 minute

print("=" * 50)
print("Loading FiQA Dataset")
print("=" * 50)

# Load dataset with 'main' config (name taken from CONFIG to avoid drift)
fiqa_dataset = load_dataset(CONFIG["dataset_name"], "main")

print("✅ Dataset loaded!")
print(f"\nSplits:")
for split_name, split_data in fiqa_dataset.items():
    print(f"  {split_name}: {len(split_data):,} samples")

print(f"\nColumns: {fiqa_dataset['train'].column_names}")


In [None]:
# Explore sample data
print("=" * 50)
print("Sample Data")
print("=" * 50)

sample = fiqa_dataset['train'][0]
print(f"\nQuestion:\n{sample['question']}")
print(f"\nGround Truth Answer (first of {len(sample['ground_truths'])}):")
print(sample['ground_truths'][0][:500] + "..." if len(sample['ground_truths'][0]) > 500 else sample['ground_truths'][0])


In [None]:
# Format data for instruction fine-tuning
def format_for_training(example: Dict) -> Dict:
    """
    Format FiQA sample for instruction fine-tuning.
    
    Input format:
        - question: str
        - ground_truths: List[str]
    
    Output format:
        - text: formatted instruction string
        - input_text: question only (for inference)
        - target_text: answer only (for evaluation)
    """
    question = example['question']
    # Use first ground truth answer (most relevant)
    answer = example['ground_truths'][0] if example['ground_truths'] else ""
    
    # Truncate very long answers to avoid context length issues
    max_answer_len = 1024
    if len(answer) > max_answer_len:
        answer = answer[:max_answer_len] + "..."
    
    # Format as instruction-following
    instruction = f"""You are a helpful financial assistant. Answer the following question about finance.

Question: {question}

Answer: {answer}"""
    
    # Also store components separately for evaluation
    input_prompt = f"""You are a helpful financial assistant. Answer the following question about finance.

Question: {question}

Answer:"""
    
    return {
        "text": instruction,
        "input_text": input_prompt,
        "target_text": answer,
        "question": question,
    }

print("✅ Formatting function defined")


In [None]:
# Apply formatting to all splits
# ⏱️ Time: ~1 minute

print("=" * 50)
print("Formatting Dataset")
print("=" * 50)

formatted_dataset = fiqa_dataset.map(
    format_for_training,
    remove_columns=['ground_truths'],  # Keep only new columns
    desc="Formatting samples"
)

print("\n✅ Dataset formatted!")
print(f"New columns: {formatted_dataset['train'].column_names}")

# Show formatted example
print("\n" + "=" * 50)
print("Formatted Sample")
print("=" * 50)
print(formatted_dataset['train'][0]['text'][:800] + "...")


In [None]:
# Tokenize dataset
# ⏱️ Time: 5-15 minutes

def tokenize_function(examples: Dict) -> Dict:
    """Tokenize examples for training."""
    # Tokenize the full text (question + answer)
    tokenized = tokenizer(
        examples['text'],
        truncation=True,
        max_length=CONFIG['max_length'],
        padding='max_length',
        return_tensors=None,
    )
    
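    # For causal LM, labels mirror input_ids, except that padding positions
    # are set to -100 so the loss ignores them
    tokenized['labels'] = [
        [tok if keep else -100 for tok, keep in zip(ids, mask)]
        for ids, mask in zip(tokenized['input_ids'], tokenized['attention_mask'])
    ]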
    
    return tokenized

print("=" * 50)
print("Tokenizing Dataset")
print("=" * 50)

tokenized_dataset = formatted_dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=['text', 'input_text', 'target_text', 'question'],
    desc="Tokenizing"
)

print("\n✅ Dataset tokenized!")
print(f"Columns: {tokenized_dataset['train'].column_names}")
print(f"Sample input_ids length: {len(tokenized_dataset['train'][0]['input_ids'])}")
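
Because every sequence is padded to `max_length`, it matters which positions contribute to the loss. The toy sketch below shows the usual convention of setting labels to `-100` (the default `ignore_index` of PyTorch's cross-entropy loss) for padding, and optionally for the prompt tokens so that only the answer is learned. The helper `mask_labels`, the token ids, and the assumption of right-side padding are illustrative and are not taken from this notebook.

```python
from typing import List

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

def mask_labels(input_ids: List[int], attention_mask: List[int], prompt_len: int = 0) -> List[int]:
    """Copy input_ids into labels, ignoring padding and the first prompt_len tokens."""
    labels = []
    for pos, (tok, keep) in enumerate(zip(input_ids, attention_mask)):
        if keep == 0 or pos < prompt_len:
            labels.append(IGNORE_INDEX)
        else:
            labels.append(tok)
    return labels

# Toy example: 3 prompt tokens, 2 answer tokens, 2 right-padding tokens (ids are made up)
toy_ids = [11, 12, 13, 21, 22, 0, 0]
toy_mask = [1, 1, 1, 1, 1, 0, 0]

pad_masked = mask_labels(toy_ids, toy_mask)
answer_only = mask_labels(toy_ids, toy_mask, prompt_len=3)
print(pad_masked)    # [11, 12, 13, 21, 22, -100, -100]
print(answer_only)   # [-100, -100, -100, 21, 22, -100, -100]
```

Masking the prompt is a design choice: it focuses training on the answer text, at the cost of needing the prompt's token length for each example.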


In [None]:
# Dataset summary
print("=" * 50)
print("Dataset Preparation Complete")
print("=" * 50)

print("\n📊 Dataset Statistics:")
print(f"   Train samples: {len(tokenized_dataset['train']):,}")
print(f"   Validation samples: {len(tokenized_dataset['validation']):,}")
print(f"   Test samples: {len(tokenized_dataset['test']):,}")
print(f"   Max sequence length: {CONFIG['max_length']}")

# Keep the formatted dataset for evaluation later
eval_dataset = formatted_dataset  # Has input_text and target_text

print("\n✅ Phase 3: Dataset Loading & Preprocessing complete!")


---

<a name="baseline"></a>
## 4. Baseline Evaluation

**⏱️ Time Estimate: 30-60 minutes** | **GPU REQUIRED**

Evaluate the base model on FiQA test set before fine-tuning.

> 🚧 **TODO: Phase 4** - Implement baseline evaluation
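
Until this phase is implemented, the following sketch shows one plausible way to score a generated answer against a reference, using SQuAD-style exact match and token-level F1 (two of the metrics listed in the Results table). The function names and the normalization rules are assumptions for illustration, not the notebook's final evaluation code. BLEU is left out here.

```python
import string
from collections import Counter
from typing import List

def normalize(text: str) -> List[str]:
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Made-up strings, only to exercise the functions
em = exact_match("Stocks are equity.", "stocks are equity")
f1 = token_f1("Bonds are debt", "A bond is debt.")
print(f"EM = {em}, F1 = {f1:.4f}")
```

For long free-form answers such as those in FiQA, exact match will almost always be zero, so token F1 and BLEU are likely to be the more informative columns.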


---

<a name="lora-config"></a>
## 5. LoRA Configuration

**⏱️ Time Estimate: ~5 minutes** | **GPU REQUIRED**

Configure and apply LoRA adapter to the model.

> 🚧 **TODO: Phase 5** - Implement LoRA configuration
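
To give a feel for what the `lora_rank` and `lora_alpha` settings imply, here is a small arithmetic sketch. LoRA replaces the update to a frozen weight matrix with a low-rank product scaled by `alpha / rank`. The 4096 x 4096 layer size is a hypothetical example, not the actual shape of any layer in the model, and `lora_param_count` is an illustrative helper.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)

rank, alpha = 8, 32      # values from CONFIG
d_in = d_out = 4096      # hypothetical layer size, for illustration only

scaling = alpha / rank                        # factor applied to the low-rank update B @ A
full = d_in * d_out                           # weights in the frozen dense layer
added = lora_param_count(d_in, d_out, rank)   # weights LoRA trains instead

print(f"scaling = {scaling}")
print(f"dense: {full:,}  lora: {added:,}  ratio: {added / full:.4%}")
```

Doubling the rank doubles the adapter size, and with `alpha` held fixed it also halves the scaling factor, so the two settings are usually tuned together.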


---

<a name="training"></a>
## 6. LoRA Training

**⏱️ Time Estimate: 3-6 hours** | **GPU REQUIRED**

Train the LoRA adapter on FiQA training data.

> 🚧 **TODO: Phase 6** - Implement LoRA training
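
The training settings in `CONFIG` determine the number of optimizer steps before any training code is written. The sketch below works through that arithmetic for the 5,500 training samples, assuming a single GPU and that a final partial batch still counts as a step. The linear warmup and decay schedule in `lr_at` is an assumed choice, since the notebook does not yet specify a scheduler. Both function names are hypothetical.

```python
import math

def training_schedule(num_samples, batch_size, grad_accum, num_epochs, warmup_ratio):
    """Derive optimizer-step counts from the batch settings (single GPU assumed)."""
    effective_batch = batch_size * grad_accum
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * num_epochs
    warmup_steps = int(total_steps * warmup_ratio)
    return effective_batch, steps_per_epoch, total_steps, warmup_steps

def lr_at(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup to peak_lr, then linear decay to zero (an assumed schedule)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# batch_size=4, gradient_accumulation_steps=4, num_epochs=3, warmup_ratio=0.1
schedule = training_schedule(5500, 4, 4, 3, 0.1)
print(schedule)  # (16, 344, 1032, 103)

effective_batch, steps_per_epoch, total_steps, warmup_steps = schedule
print(lr_at(warmup_steps, 2e-4, warmup_steps, total_steps))  # peak learning rate
```

Actual trainers may round the last accumulation window differently, so the step counts should be read as estimates.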


---

<a name="evaluation"></a>
## 7. Fine-tuned Evaluation

**⏱️ Time Estimate: 30-60 minutes** | **GPU REQUIRED**

Evaluate the fine-tuned model and compare with baseline.

> 🚧 **TODO: Phase 7** - Implement fine-tuned evaluation
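
Once both evaluations exist, the comparison itself is simple bookkeeping. The sketch below computes the absolute and relative change per metric. `compare_metrics` is a hypothetical helper, and the numbers fed to it are placeholders chosen only to exercise the code. They are not measured results.

```python
def compare_metrics(baseline: dict, finetuned: dict) -> dict:
    """Return absolute and relative change for every metric present in both runs."""
    report = {}
    for name in sorted(baseline.keys() & finetuned.keys()):
        base, tuned = baseline[name], finetuned[name]
        delta = tuned - base
        # Relative change is undefined when the baseline is zero
        relative = delta / base if base else None
        report[name] = {"baseline": base, "finetuned": tuned,
                        "delta": delta, "relative": relative}
    return report

# Placeholder numbers only, NOT measured results
placeholder_baseline = {"f1": 0.20, "exact_match": 0.0}
placeholder_finetuned = {"f1": 0.25, "exact_match": 0.0}

report = compare_metrics(placeholder_baseline, placeholder_finetuned)
for name, row in report.items():
    print(name, row)
```

Reporting both the absolute and the relative change avoids overstating gains on metrics that start near zero.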


---

<a name="visualization"></a>
## 8. Visualization & Analysis

Create visualizations comparing baseline vs fine-tuned performance.

> 🚧 **TODO: Phase 8** - Implement visualization and analysis


---

## Summary

This notebook demonstrated LoRA fine-tuning of Nemotron Next 8B on the FiQA financial QA dataset.

### Results

| Metric | Baseline | Fine-tuned | Improvement |
|--------|----------|------------|-------------|
| Exact Match | TBD | TBD | TBD |
| F1 Score | TBD | TBD | TBD |
| BLEU | TBD | TBD | TBD |

### Next Steps

- Experiment with different LoRA ranks
- Try longer training
- Evaluate on additional financial QA datasets
