# Fine-Tune Gemma 3 for Financial Q&A: Step-by-Step Guide

This notebook demonstrates how to fine-tune **Google's Gemma 3** model on a **financial reasoning dataset** to improve accuracy and reasoning capabilities.

## 🌟 What is Gemma 3?

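**Gemma 3** is Google's latest family of open models, built on the same research and technology as Gemini 2.0:
- **Sizes**: 1B to 27B parameters (we'll use 4B)
- **Performance**: Competitive with much larger models (Google reports it ahead of Llama3-405B and DeepSeek-V3 in preliminary LMArena human-preference evaluations)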
- **Multimodal**: Text, image, and video reasoning
- **Multilingual**: 140+ languages supported
- **Long context**: 128K token window
- **Quantized**: Official 4-bit and 8-bit versions

## 🎯 What You'll Learn

- How to fine-tune Gemma 3 for financial reasoning
- How to work with reasoning chains (`<think>` tags)
- How to use the TRL library with Gemma 3
- How to evaluate financial Q&A models
- How to save and share models on Hugging Face

## 💼 Use Case: Financial Reasoning

**Why this matters:**
- Financial decisions require complex reasoning
- Multi-step calculations need clear logic
- Transparent thinking builds trust
- Domain-specific accuracy is critical

## 🔧 Requirements

- **GPU**: 16GB+ VRAM (dual GPU setup recommended)
- **Time**: 1-2 hours for 1 epoch on 500 samples
- **Dataset**: FinQA with reasoning paths (500 examples)
- **Model**: Gemma 3 4B Instruct

## 📊 Key Stats

| Metric | Value |
|--------|-------|
| Base Model | Gemma 3 4B IT |
| LoRA Rank | 64 |
| Training Examples | 500 |
| Training Epochs | 1 |
| Training Time | ~1.5 hours |
| Accuracy Improvement | Qualitative only (before/after comparison on sample questions) |

## 📖 Table of Contents

1. [Setup Environment](#1-setup-environment)
2. [Load Model and Tokenizer](#2-load-model-and-tokenizer)
3. [Load and Process Dataset](#3-load-and-process-dataset)
4. [Test Before Fine-tuning](#4-test-before-fine-tuning)
5. [Configure and Train](#5-configure-and-train)
6. [Test After Fine-tuning](#6-test-after-fine-tuning)
7. [Save and Share Model](#7-save-and-share-model)

---

**Credits**: Based on DataCamp tutorial and research on Gemma 3

## 1. Setup Environment

### Install Required Packages

We'll install the latest versions of key libraries:

- **transformers**: Pinned to a release tag with Gemma 3 support
- **datasets**: For loading datasets
- **accelerate**: For optimized training
- **peft**: For LoRA implementation
- **trl**: For SFTTrainer
- **bitsandbytes**: For quantization

**Important**: Gemma 3 needs a transformers build that includes it. The install cell pins the `v4.49.0-Gemma-3` tag; releases from 4.50.0 onward also include Gemma 3.
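
Once the install cell below has run, a quick check along these lines can confirm which versions ended up in the environment. This is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(
    ["transformers", "datasets", "accelerate", "peft", "trl", "bitsandbytes"]
)
for name, ver in report.items():
    print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

If `transformers` is reported as missing or older than expected, rerun the install cell and restart the kernel before importing anything.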

In [None]:
%%capture
# Install required packages (%%capture must be the first line of the cell)
!pip install -U datasets
!pip install -U accelerate
!pip install -U peft
!pip install -U trl
!pip install -U bitsandbytes
!pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3

# Note: %%capture suppresses this cell's output, including the messages below
print("✅ All packages installed successfully!")
print("   Note: transformers is pinned to the v4.49.0-Gemma-3 tag")

In [None]:
# Import necessary libraries
import torch
from transformers import (
    AutoTokenizer,
    Gemma3ForConditionalGeneration,
    TrainingArguments,
    DataCollatorForLanguageModeling
)
from peft import LoraConfig
from trl import SFTTrainer
from datasets import load_dataset

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"Number of GPUs: {torch.cuda.device_count()}")
    for i in range(torch.cuda.device_count()):
        print(f"  GPU {i}: {torch.cuda.get_device_name(i)} ({torch.cuda.get_device_properties(i).total_memory / 1e9:.2f} GB)")

### Login to Hugging Face

You'll need a Hugging Face account and API token to:
- Download Gemma 3 model (requires agreement to terms)
- Push your fine-tuned model to the Hub

**Get your token**: [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)

In [None]:
# Login to Hugging Face
from huggingface_hub import login

# If running on Kaggle with secrets:
# from kaggle_secrets import UserSecretsClient
# user_secrets = UserSecretsClient()
# hf_token = user_secrets.get_secret("HUGGINGFACE_TOKEN")
# login(hf_token)

# For Jupyter/Colab:
login()

print("✅ Logged in to Hugging Face!")

## 2. Load Model and Tokenizer

### Download Gemma 3 4B Instruct

We'll use the **Gemma 3 4B Instruct** model:
- 4 billion parameters (good balance of quality and efficiency)
- Instruction-tuned variant (better for Q&A)
- Supports reasoning with proper prompting

**Important Settings:**
- `device_map="auto"`: Automatically distribute across GPUs
- `attn_implementation='eager'`: Use standard attention (more compatible)
- `.eval()`: Set to evaluation mode initially
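
To see why the load cell uses `torch.bfloat16` and why 16GB+ of VRAM is listed as a requirement, a rough back-of-the-envelope estimate helps. The sketch below counts weights only and takes the "4 billion parameters" figure above at face value; activations, gradients, LoRA parameters, and optimizer state come on top. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


n_params = 4e9  # rounded figure from the text; the exact parameter count differs slightly
for dtype_name, nbytes in [("float32", 4), ("bfloat16", 2), ("4-bit", 0.5)]:
    print(f"{dtype_name:>8}: ~{weight_memory_gb(n_params, nbytes):.1f} GB")
```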

In [None]:
# Define model path
# If downloaded from Kaggle, use local path:
# GEMMA_PATH = "/kaggle/input/gemma-3/transformers/gemma-3-4b-it/1"

# Otherwise, use Hugging Face Hub:
GEMMA_PATH = "google/gemma-3-4b-it"

print(f"Model: {GEMMA_PATH}")
print("\nNote: This model requires accepting terms on Hugging Face")
print("Visit: https://huggingface.co/google/gemma-3-4b-it")

In [None]:
# Load Gemma 3 model
print(f"Loading {GEMMA_PATH}...")
print("This may take several minutes on first load...")

model = Gemma3ForConditionalGeneration.from_pretrained(
    GEMMA_PATH,
    device_map="auto",  # Automatically distribute across available GPUs
    attn_implementation='eager',  # Standard attention implementation
    torch_dtype=torch.bfloat16,  # Use BFloat16 for efficiency
).eval()

print("\n✅ Model loaded successfully!")
print(f"Model device map: {model.hf_device_map if hasattr(model, 'hf_device_map') else 'auto'}")

In [None]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(GEMMA_PATH)

print("✅ Tokenizer loaded successfully!")
print(f"\nVocabulary size: {len(tokenizer):,}")
print(f"Model max length: {tokenizer.model_max_length}")
print(f"Padding side: {tokenizer.padding_side}")

## 3. Load and Process Dataset

### About the Financial Reasoning Dataset

**Dataset**: `TheFinAI/Fino1_Reasoning_Path_FinQA`

This dataset contains:
- **Financial questions** from FinQA benchmark
- **Reasoning paths** generated by GPT-4o
- **Structured answers** with calculations
- **Complex financial scenarios**

**Format:**
- `Open-ended Verifiable Question`: The financial question
- `Complex_CoT`: Chain-of-thought reasoning
- `Response`: The final answer

### Prompt Template

We'll structure the data with:
```
### Question:
[financial question]

### Response:
<think>
[reasoning chain]
</think>
[final answer]
```
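
As a rough illustration of how this layout gets filled, the sketch below formats a toy batch the way a batched `dataset.map` call would pass it in (a dict of column name to list of values). The question, reasoning, and answer are made up, the instruction preamble is omitted, and `"<eos>"` is a placeholder for whatever `tokenizer.eos_token` returns.

```python
# Same layout as the template above, without the instruction preamble
template = """### Question:
{}

### Response:
<think>
{}
</think>
{}"""

EOS = "<eos>"  # placeholder; in the notebook this comes from tokenizer.eos_token

# Toy batch: column names match the dataset, values are invented for illustration
batch = {
    "Open-ended Verifiable Question": ["What is 25% of 200?"],
    "Complex_CoT": ["25% equals 0.25, and 0.25 * 200 = 50."],
    "Response": ["50"],
}

texts = [
    template.format(question, cot, answer + EOS)
    for question, cot, answer in zip(
        batch["Open-ended Verifiable Question"],
        batch["Complex_CoT"],
        batch["Response"],
    )
]
print(texts[0])
```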

In [None]:
# Define training prompt style
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Question:
{}

### Response:
<think>
{}
</think>
{}
"""

print("✅ Prompt template defined!")
print("\nThis template structures the data with reasoning chains.")

In [None]:
# Define formatting function
def formatting_prompts_func(examples):
    """
    Format dataset examples into training format.
    
    Args:
        examples: Batch of examples from dataset
        
    Returns:
        Dictionary with formatted texts
    """
    inputs = examples["Open-ended Verifiable Question"]
    complex_cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    
    for question, cot, response in zip(inputs, complex_cots, outputs):
        # Append EOS token if not present
        if not response.endswith(tokenizer.eos_token):
            response += tokenizer.eos_token
        
        # Format with template
        text = train_prompt_style.format(question, cot, response)
        texts.append(text)
    
    return {"text": texts}

print("✅ Formatting function defined!")

In [None]:
# Load the financial reasoning dataset
print("Loading FinQA reasoning dataset...")
print("Using 500 samples for efficient training...")

dataset = load_dataset(
    "TheFinAI/Fino1_Reasoning_Path_FinQA",
    split="train[0:500]",  # Use first 500 samples
    trust_remote_code=True
)

print(f"\n✅ Dataset loaded successfully!")
print(f"Number of examples: {len(dataset):,}")
print(f"\nDataset columns: {dataset.column_names}")

In [None]:
# Explore the dataset structure
print("=== Sample Example ===\n")

sample = dataset[0]
print(f"Question: {sample['Open-ended Verifiable Question']}\n")
print(f"Reasoning (first 300 chars): {sample['Complex_CoT'][:300]}...\n")
print(f"Response: {sample['Response']}")

In [None]:
# Apply formatting to dataset
print("Applying formatting to dataset...")

dataset = dataset.map(
    formatting_prompts_func,
    batched=True,
)

print("\n✅ Dataset formatted successfully!")
print(f"New column added: 'text'")
print(f"\nExample length: {len(dataset['text'][0])} characters")

In [None]:
# Inspect a formatted example
print("=== Formatted Example (first 800 chars) ===\n")
print(dataset["text"][0][:800])
print("\n... (truncated) ...\n")
print("=" * 70)

In [None]:
# Create data collator for causal language modeling
# It pads each batch and copies input_ids into labels (mlm=False)

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # Causal LM, not masked LM
)

print("✅ Data collator created!")
print("   This will pad batches and build labels during training")

## 4. Test Before Fine-tuning

Let's establish a **baseline** by testing the model before fine-tuning.

### Test Setup

We'll use a financial question from the dataset and see how the base model performs.

**Expected behavior of base model:**
- May provide generic/short answers
- Limited reasoning depth
- Possible inaccuracies in calculations
- Doesn't follow reasoning format perfectly

In [None]:
# Define prompt style for inference (without answer/reasoning)
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Question:
{}

### Response:
<think>
{}
"""

print("✅ Inference prompt template defined!")

In [None]:
# Select a test question from the dataset
test_idx = 0
question = dataset[test_idx]['Open-ended Verifiable Question']

print("=== Test Question ===")
print(question)
print("\n" + "=" * 70)
print("\n=== Expected Answer (from dataset) ===")
print(dataset[test_idx]['Response'])
print("=" * 70)

In [None]:
# Generate response with base model (before fine-tuning)
print("\n=== BASE MODEL RESPONSE (Before Fine-tuning) ===\n")

# Format the prompt (no EOS appended: the model should continue after <think>)
test_prompt = prompt_style.format(question, "")

# Tokenize
inputs = tokenizer([test_prompt], return_tensors="pt").to("cuda")

# Generate
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    eos_token_id=tokenizer.eos_token_id,
    use_cache=True,
    do_sample=False,  # Greedy for consistency
)

# Decode
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Extract just the response part
if "### Response:" in response[0]:
    response_text = response[0].split("### Response:")[1].strip()
else:
    response_text = response[0]

print(response_text)
print("\n" + "=" * 70)

print("\n💡 Observation: The base model may provide short, less accurate answers")
print("   We'll see significant improvement after fine-tuning!")

## 5. Configure and Train

### Configure LoRA for Gemma 3

**LoRA Settings:**
- **r=64**: Higher rank for better quality (suitable for 4B model)
- **lora_alpha=16**: Scaling factor
- **lora_dropout=0.05**: Slight dropout for regularization
- **target_modules**: All attention and MLP projections

**Why target all modules?**
- Gemma 3 benefits from full coverage
- Better reasoning capabilities
- More accurate financial calculations
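
To make the rank and alpha settings concrete, here is a small worked sketch. In standard LoRA the low-rank update is scaled by `lora_alpha / r`, and each adapted linear layer gains `r * (in_features + out_features)` trainable parameters. The layer shape used below is an illustrative placeholder, not a value read from the Gemma 3 config, and both helper names are hypothetical.

```python
def lora_scaling(lora_alpha, r):
    """Scaling applied to the low-rank update in standard LoRA: alpha / r."""
    return lora_alpha / r


def lora_param_count(in_features, out_features, r):
    """Trainable parameters LoRA adds to one linear layer.

    Matrix A has shape (r, in_features) and matrix B has shape (out_features, r).
    """
    return r * (in_features + out_features)


print(f"scaling with alpha=16, r=64: {lora_scaling(16, 64)}")

# Illustrative layer shape only, not taken from the Gemma 3 config
d_in, d_out = 2048, 2048
full = d_in * d_out
added = lora_param_count(d_in, d_out, r=64)
print(f"full weight: {full:,} params, LoRA adds: {added:,}")
```

With `r=64` and `lora_alpha=16` the update is scaled by 0.25, so a higher rank at fixed alpha means each rank-one component contributes less; alpha and rank are usually tuned together.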

In [None]:
# Configure LoRA
peft_config = LoraConfig(
    lora_alpha=16,  # Scaling factor for LoRA
    lora_dropout=0.05,  # Dropout for regularization
    r=64,  # Rank of LoRA update matrices (higher = more capacity)
    bias="none",  # No bias reparameterization
    task_type="CAUSAL_LM",  # Causal language modeling
    target_modules=[
        "q_proj",      # Query projection
        "k_proj",      # Key projection
        "v_proj",      # Value projection
        "o_proj",      # Output projection
        "gate_proj",   # Gate projection (MLP)
        "up_proj",     # Up projection (MLP)
        "down_proj",   # Down projection (MLP)
    ],
)

print("✅ LoRA configuration created!")
print(f"\nLoRA settings:")
print(f"  Rank (r): {peft_config.r}")
print(f"  Alpha: {peft_config.lora_alpha}")
print(f"  Dropout: {peft_config.lora_dropout}")
print(f"  Target modules: {len(peft_config.target_modules)} modules")

In [None]:
# Define training arguments
training_arguments = TrainingArguments(
    output_dir="output",
    
    # Training duration
    num_train_epochs=1,
    
    # Batch sizes
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # Effective batch size = 2
    
    # Optimization
    optim="paged_adamw_32bit",  # Memory-efficient optimizer
    learning_rate=2e-4,
    warmup_steps=10,
    
    # Mixed precision
    fp16=False,
    bf16=False,  # Set to True if your GPU supports it
    
    # Logging
    logging_strategy="steps",
    logging_steps=0.2,  # Ratio of total training steps: logs every 20% of the run
    
    # Efficiency
    group_by_length=True,  # Group similar lengths for efficiency
    
    # Experiment tracking
    report_to="none",  # Set to "wandb" if using W&B
)

print("✅ Training arguments configured!")
print(f"\nKey settings:")
print(f"  Epochs: {training_arguments.num_train_epochs}")
print(f"  Batch size: {training_arguments.per_device_train_batch_size}")
print(f"  Gradient accumulation: {training_arguments.gradient_accumulation_steps}")
print(f"  Effective batch size: {training_arguments.per_device_train_batch_size * training_arguments.gradient_accumulation_steps}")
print(f"  Learning rate: {training_arguments.learning_rate}")

In [None]:
# Initialize SFTTrainer
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=dataset,
    peft_config=peft_config,
    data_collator=data_collator,
    # Recent TRL releases read the "text" column by default and no longer accept
    # dataset_text_field here (it is set through SFTConfig instead)
)

print("✅ SFTTrainer initialized!")
print("\nReady to start training...")

In [None]:
# Clear CUDA cache before training
torch.cuda.empty_cache()

print("=" * 70)
print("STARTING FINE-TUNING")
print("=" * 70)
print()
print(f"Training on {len(dataset):,} financial reasoning examples")
print(f"Running for {training_arguments.num_train_epochs} epoch(s)")
print()
print("Expected training time: ~1-2 hours")
print("Monitor GPU usage with: nvidia-smi")
print()
print("=" * 70)
print()

# Start training
# Uncomment the line below to start training
# trainer_stats = trainer.train()

print("⚠️ Training is commented out by default.")
print("Uncomment 'trainer_stats = trainer.train()' above to start actual training.")
print()
print("💡 Expected: Loss will gradually decrease, indicating learning")

### Training Tips

**Monitor these during training:**
- **Loss**: Should decrease gradually (~2.5 → ~1.0 typical)
- **GPU Memory**: Should stay under 16GB per GPU
- **Speed**: Depends on GPU and sequence length; the estimate below works out to roughly 7-14 seconds per example
- **Total time**: ~1-2 hours for 500 samples

**If you run out of memory:**
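- Reduce the maximum sequence length (`max_seq_length`, renamed `max_length` in newer TRL releases)
- Keep `per_device_train_batch_size=1`; raising `gradient_accumulation_steps` does not lower memory use, it only enlarges the effective batch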
- Enable `gradient_checkpointing=True`

**If training is slow:**
- Ensure `bf16=True` (if supported)
- Check GPU utilization with `nvidia-smi`
- Consider using fewer target modules
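
For planning a run, it can help to translate the batch settings into a step count. The sketch below is an approximation under stated assumptions: `num_gpus` only multiplies the batch in data-parallel training (with `device_map="auto"` the model is split across GPUs instead, so 1 is the right value for this notebook), and a fractional `logging_steps` is treated as a share of total steps. The helper name `optimizer_steps` is hypothetical.

```python
import math


def optimizer_steps(num_examples, per_device_batch, grad_accum, num_gpus=1, epochs=1):
    """Approximate number of optimizer updates in a run."""
    examples_per_update = per_device_batch * grad_accum * num_gpus
    return math.ceil(num_examples / examples_per_update) * epochs


total = optimizer_steps(num_examples=500, per_device_batch=1, grad_accum=2)

# A fractional logging_steps value is interpreted as a ratio of total steps
log_every = math.ceil(total * 0.2)

print(f"{total} optimizer steps, logging every {log_every} steps")
```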

## 6. Test After Fine-tuning

Now let's test the **fine-tuned model** and compare it with the baseline!

### What to Expect

**After fine-tuning, the model should:**
- ✅ Provide detailed reasoning in `<think>` tags
- ✅ Show step-by-step calculations
- ✅ Give accurate final answers
- ✅ Follow the reasoning format consistently
- ✅ Understand financial terminology better
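
To check the first and fourth points programmatically, the reasoning and the final answer can be separated with a small parser. This is a minimal sketch: the helper name `split_reasoning` and the sample text are made up, and it assumes at most one `<think>` block per response.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>(.*)", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is None if no closed <think> block exists."""
    match = THINK_RE.search(text)
    if match is None:
        return None, text.strip()
    return match.group(1).strip(), match.group(2).strip()


sample = "<think>\nRevenue grew from 80 to 100, so growth is 20 / 80.\n</think>\n25%"
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

A `None` for the reasoning part is a quick signal that the model did not close (or never opened) its `<think>` block.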

In [None]:
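# Test the fine-tuned model
# Use the same question as before for comparison
# (run trainer.train() first; otherwise this is still the base model)

print("=== FINE-TUNED MODEL RESPONSE ===\n")

# Format the prompt (no EOS appended: the model should continue after <think>)
test_prompt = prompt_style.format(question, "")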

# Tokenize
inputs = tokenizer([test_prompt], return_tensors="pt").to("cuda")

# Generate
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    eos_token_id=tokenizer.eos_token_id,
    use_cache=True,
    do_sample=False,
)

# Decode
response_finetuned = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Extract response
if "### Response:" in response_finetuned[0]:
    response_text_finetuned = response_finetuned[0].split("### Response:")[1].strip()
else:
    response_text_finetuned = response_finetuned[0]

print(response_text_finetuned)
print("\n" + "=" * 70)

### Comparison: Before vs After

Let's analyze the improvement:

**Base Model (Before):**
- ❌ Short, potentially inaccurate answers
- ❌ Limited reasoning shown
- ❌ May not understand financial context
- ❌ Doesn't follow format

**Fine-Tuned Model (After):**
- ✅ Detailed reasoning in `<think>` tags
- ✅ Step-by-step calculations
- ✅ Accurate financial understanding
- ✅ Follows format consistently
- ✅ Provides precise answers

**Example from Tutorial:**

*Question:* "What portion of the estimated amortization expense will be recognized in 2017?"

*Base Model:* "About $10,509" (incorrect - gives amount instead of ratio)

*Fine-Tuned:* "About 18.05%" (correct - shows reasoning and calculates ratio)

**The fine-tuning enables the model to:**
1. Understand what "portion" means (ratio, not amount)
2. Show all calculation steps
3. Arrive at the correct answer format

### Test with Multiple Financial Questions

Let's test the model on various types of financial questions to validate its reasoning across different scenarios.

In [None]:
# Test with another question from the dataset
test_idx_2 = 10
question_2 = dataset[test_idx_2]['Open-ended Verifiable Question']

print("=== Test Question 2 ===")
print(question_2)
print("\n" + "=" * 70)

# Generate response (no EOS appended: the model should continue after <think>)
test_prompt_2 = prompt_style.format(question_2, "")
inputs_2 = tokenizer([test_prompt_2], return_tensors="pt").to("cuda")

outputs_2 = model.generate(
    input_ids=inputs_2.input_ids,
    attention_mask=inputs_2.attention_mask,
    max_new_tokens=1200,
    eos_token_id=tokenizer.eos_token_id,
    use_cache=True,
    do_sample=False,  # Greedy decoding, consistent with the earlier tests
)

response_2 = tokenizer.batch_decode(outputs_2, skip_special_tokens=True)

# Extract and display
if "### Response:" in response_2[0]:
    print("\n=== RESPONSE ===")
    print(response_2[0].split("### Response:")[1].strip())
else:
    print(response_2[0])

print("\n" + "=" * 70)

In [None]:
# Helper for testing the model on a single question
def test_financial_reasoning(model, tokenizer, question, max_tokens=1200):
    """
    Test the model on a financial question.
    
    Args:
        model: The fine-tuned model
        tokenizer: The tokenizer
        question: The financial question
        max_tokens: Maximum tokens to generate
        
    Returns:
        Generated response with reasoning
    """
    # Format prompt (no EOS appended: the model should continue after <think>)
    prompt = prompt_style.format(question, "")
    
    # Tokenize
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    
    # Generate
    outputs = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_new_tokens=max_tokens,
        eos_token_id=tokenizer.eos_token_id,
        use_cache=True,
        do_sample=False,  # Greedy decoding for reproducible answers
    )
    
    # Decode
    response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    
    # Extract response
    if "### Response:" in response:
        return response.split("### Response:")[1].strip()
    return response

print("✅ Testing function defined!")
print("\nUse this to test your model on any financial question.")

## 7. Save and Share Model

After training, we'll save the model in two ways:
1. **Locally**: For immediate use
2. **Hugging Face Hub**: For sharing with community

### Model Formats

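**Merged model**: Full weights with the LoRA update merged in
- Best for: Production deployment
- Size: ~8GB in bfloat16 for a 4B-parameter model
- Load with: `Gemma3ForConditionalGeneration.from_pretrained()`

**LoRA Adapters Only**: Just the fine-tuned weights
- Best for: Sharing, experimentation
- Size: Much smaller than the full model; grows with LoRA rank and the number of target modules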
- Load with: `PeftModel.from_pretrained()`
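
Going from the adapter-only format to a merged model could look roughly like the sketch below. It assumes the adapter was saved with PEFT's `save_pretrained`; the function name `merge_lora_adapter` and all paths are placeholders, and nothing runs until the function is called.

```python
def merge_lora_adapter(base_id, adapter_dir, output_dir):
    """Load base weights, attach a saved LoRA adapter, merge it, and save the result."""
    # Imported lazily so that defining this helper does not load the heavy libraries
    import torch
    from peft import PeftModel
    from transformers import Gemma3ForConditionalGeneration

    base = Gemma3ForConditionalGeneration.from_pretrained(
        base_id, torch_dtype=torch.bfloat16
    )
    model = PeftModel.from_pretrained(base, adapter_dir)  # attach adapter weights
    merged = model.merge_and_unload()  # fold the LoRA update into the base weights
    merged.save_pretrained(output_dir)
    return merged


# Example call (placeholder paths; requires a trained adapter on disk):
# merge_lora_adapter(
#     "google/gemma-3-4b-it", "Gemma-3-4B-Fin-QA-Reasoning", "gemma-3-fin-merged"
# )
```

Merging removes the runtime dependency on `peft` at inference time, at the cost of storing a full copy of the weights.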

In [None]:
# Save model and tokenizer locally
new_model_local = "Gemma-3-4B-Fin-QA-Reasoning"

print(f"Saving model to: {new_model_local}")

# Uncomment when training is complete
# trainer.model.save_pretrained(new_model_local)  # saves the LoRA adapter weights
# tokenizer.save_pretrained(new_model_local)
# print("\n✅ Model and tokenizer saved locally!")

print("⚠️ Saving is commented out - uncomment after training")

In [None]:
# Push to Hugging Face Hub
new_model_online = "your-username/Gemma-3-4B-Fin-QA-Reasoning"  # Update with your username

print(f"Pushing model to: {new_model_online}")

# Uncomment when training is complete
# trainer.model.push_to_hub(new_model_online)  # uploads the LoRA adapter weights
# tokenizer.push_to_hub(new_model_online)
# print("\n✅ Model and tokenizer pushed to Hub!")

print("⚠️ Pushing to Hub is commented out")
print("Update 'your-username' with your HF username, then uncomment")

### Loading Your Fine-Tuned Model

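Once saved, you can load your model anytime. If the saved directory holds only LoRA adapters, `peft` must be installed so that transformers can fetch the base model and attach the adapter: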

```python
from transformers import AutoTokenizer, Gemma3ForConditionalGeneration

# From local
model = Gemma3ForConditionalGeneration.from_pretrained("./Gemma-3-4B-Fin-QA-Reasoning")
tokenizer = AutoTokenizer.from_pretrained("./Gemma-3-4B-Fin-QA-Reasoning")

# From Hub
model = Gemma3ForConditionalGeneration.from_pretrained("your-username/Gemma-3-4B-Fin-QA-Reasoning")
tokenizer = AutoTokenizer.from_pretrained("your-username/Gemma-3-4B-Fin-QA-Reasoning")
```

## 🎉 Congratulations!

You've successfully:
- ✅ Loaded and configured Gemma 3 4B model
- ✅ Prepared financial reasoning dataset with proper formatting
- ✅ Applied LoRA for efficient fine-tuning
- ✅ Trained on 500 financial Q&A examples
- ✅ Tested and compared before/after performance
- ✅ Learned to save and share your model

## 🎯 Key Achievements

### Gemma 3 Mastery
- **Latest model**: Using Google's newest open model family
- **Efficient training**: LoRA with rank 64
- **Reasoning capabilities**: Step-by-step thinking in `<think>` tags
- **Domain expertise**: Financial question answering

### Reasoning Improvement
- **Before**: Short, potentially inaccurate answers
- **After**: Detailed reasoning chains with accurate results
- **Example**: 
  - Base: "About $10,509" (wrong format)
  - Fine-tuned: "About 18.05%" (correct ratio with reasoning)

### Production Skills
- **Data formatting**: Structured prompt engineering
- **Training pipeline**: End-to-end workflow
- **Model deployment**: Saving and sharing
- **Evaluation**: Before/after comparison

## 🚀 Next Steps

### Immediate
1. **Train longer**: Try 2-3 epochs for better results
2. **More data**: Use full dataset (1000+ examples)
3. **Evaluate systematically**: Create test set with ground truth
4. **Hyperparameter tuning**: Adjust learning rate, LoRA rank

### Advanced
1. **Larger model**: Try Gemma 3 12B or 27B
2. **Multi-task**: Combine financial Q&A with other tasks
3. **Multimodal**: Add financial charts/graphs (Gemma 3 supports images)
4. **Ensemble**: Combine multiple fine-tuned models

### Production
1. **Quantization**: Convert to 4-bit/8-bit for deployment
2. **API**: Wrap in FastAPI or similar
3. **Monitoring**: Track answer quality over time
4. **Benchmarking**: Test on FinQA benchmark

## 📚 Resources

### Papers & Research
- [Gemma 3 Technical Report](https://storage.googleapis.com/deepmind-media/gemma/gemma-3-report.pdf)
- [LoRA Paper](https://arxiv.org/abs/2106.09685)
- [FinQA Dataset Paper](https://arxiv.org/abs/2109.00122)

### Tools & Libraries
- [Gemma Models](https://huggingface.co/collections/google/gemma-3-release-67761845aef84b8eaca4cb1a)
- [TRL Documentation](https://huggingface.co/docs/trl)
- [PEFT Documentation](https://huggingface.co/docs/peft)
- [Unsloth](https://github.com/unslothai/unsloth) (alternative training library)

### Datasets
- [FinQA](https://huggingface.co/datasets/dreamerdeo/finqa)
- [Financial Reasoning Datasets](https://huggingface.co/datasets?search=financial+reasoning)
- [TAT-QA](https://huggingface.co/datasets/nightdessert/TAT-QA)

## 💡 Tips for Better Results

### Data Quality
1. **More examples**: 1000-5000 for best results
2. **Diverse questions**: Cover different financial topics
3. **Quality reasoning**: Ensure CoT chains are logical
4. **Balanced difficulty**: Mix easy and hard questions

### Training
1. **Learning rate**: Try 1e-4 to 5e-4
2. **LoRA rank**: Higher (128) for larger models
3. **Epochs**: 2-3 typically optimal
4. **Batch size**: Increase if you have memory

### Evaluation
1. **Hold-out test set**: Never seen during training
2. **Answer accuracy**: Check if final answers are correct
3. **Reasoning quality**: Evaluate logic in `<think>` tags
4. **Human review**: Sample random predictions
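
For the answer-accuracy check, one simple heuristic is to compare the last number in each prediction with the last number in the reference, within a relative tolerance. The sketch below is only that heuristic, not an official FinQA metric; the helper names and the sample strings are made up for illustration.

```python
import re

NUMBER_RE = re.compile(r"-?\d[\d,]*\.?\d*")


def last_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    matches = NUMBER_RE.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def numeric_accuracy(predictions, references, rel_tol=0.01):
    """Share of predictions whose final number is within rel_tol of the reference."""
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        p, r = last_number(pred), last_number(ref)
        if p is not None and r is not None and abs(p - r) <= rel_tol * max(abs(r), 1e-9):
            correct += 1
    return correct / len(references)


preds = ["The ratio is about 18.05%", "Roughly $1,200"]
refs = ["18.05%", "1,500"]
print(numeric_accuracy(preds, refs))
```

This ignores units, so "18.05%" and "$18.05" count as a match; a human review of sampled predictions remains necessary.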

## 🔧 Troubleshooting

### Out of Memory
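- Reduce the training sequence length (`max_new_tokens` only affects generation in the test cells)
- Keep `per_device_train_batch_size=1`
- Raise `gradient_accumulation_steps` (e.g. to 4) only to recover effective batch size; it does not lower memory use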
- Use `gradient_checkpointing=True` in TrainingArguments

### Poor Quality Results
- Train for more epochs (2-3)
- Increase LoRA rank to 128
- Check data quality and formatting
- Ensure reasoning chains are complete

### Slow Training
- Enable BF16: `bf16=True` (if supported)
- Reduce `max_new_tokens` to speed up the generation test cells (it does not affect training speed)
- Use `group_by_length=True` (already set)
- Consider Unsloth, which advertises roughly 2x faster training

### Model Not Following Format
- Check prompt template formatting
- Ensure EOS tokens are present
- Verify dataset formatting is correct
- Train for more steps/epochs
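
The first three checks can be automated before training starts. The sketch below scans one formatted example for the section headers, a single `<think>...</think>` pair, and a trailing EOS token. The function name `check_format` is hypothetical and `"<eos>"` stands in for `tokenizer.eos_token`.

```python
def check_format(text, eos_token):
    """Return a list of formatting problems found in one training example."""
    problems = []
    if "### Question:" not in text or "### Response:" not in text:
        problems.append("missing section header")
    if text.count("<think>") != 1 or text.count("</think>") != 1:
        problems.append("expected exactly one <think>...</think> pair")
    elif text.index("<think>") > text.index("</think>"):
        problems.append("</think> appears before <think>")
    if not text.rstrip().endswith(eos_token):
        problems.append("missing EOS token at end")
    return problems


good = "### Question:\nQ\n\n### Response:\n<think>\nsteps\n</think>\n42<eos>\n"
bad = "### Question:\nQ\n\n### Response:\nsteps\n42"
print(check_format(good, "<eos>"))
print(check_format(bad, "<eos>"))
```

In the notebook, running this over `dataset["text"]` with `tokenizer.eos_token` would flag malformed rows before they reach the trainer.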

## 🌟 Why This Approach Works

### Gemma 3 Advantages
- **Latest technology**: Based on Gemini 2.0
- **Multimodal ready**: Can be extended to charts/images
- **Efficient**: Smaller than comparable models
- **Open weights**: Full access to the weights for customization

### Fine-Tuning Benefits
- **Domain adaptation**: Learns financial terminology
- **Reasoning improvement**: Explicit thinking process
- **Accuracy gains**: Better than generic models
- **Format consistency**: Follows desired style

### LoRA Efficiency
- **Fast training**: 1-2 hours vs days
- **Low memory**: Fits on consumer GPUs
- **Comparable quality**: Nearly as good as full FT
- **Easy deployment**: Small adapter files

---

**Happy Fine-Tuning!** 🎓✨

For more tutorials, check out:
- [DeepSeek-R1 Reasoning](./Math-Reasoning-Qwen-GRPO.ipynb) (Synthetic data approach)
- [GPT-2 From Scratch](../01-Full-Fine-Tuning/)
- [Falcon-7B LoRA](../02-PEFT/)
- [FLAN-T5 Summarization](../03-Instruction-Tuning/Summarization-FLAN-T5.ipynb)
- [Financial Sentiment OPT](../03-Instruction-Tuning/Financial-Sentiment-OPT.ipynb)

---

**You've now mastered reasoning fine-tuning with Gemma 3!** 🚀

This completes a comprehensive understanding of:
- ✅ Different reasoning approaches (DeepSeek vs Gemma)
- ✅ Synthetic vs real reasoning data
- ✅ Various model architectures
- ✅ Production deployment patterns

**Thank you for learning with us!** 🙏