# Mistral-7B-Instruct Fine-tuning for Engineering Document Q&A

This notebook demonstrates domain adaptation of Mistral-7B-Instruct-v0.3 for engineering document Q&A using LoRA fine-tuning.

## 1. Environment Setup

In [27]:
import os
import json
import pandas as pd
from pathlib import Path
import random
from sklearn.model_selection import train_test_split

# Set base paths
BASE_DIR = Path("/home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune")
FINETUNE_DIR = BASE_DIR / "mistral-finetune"
DATA_DIR = BASE_DIR / "data"
MODEL_DIR = BASE_DIR / "models"
OUTPUT_DIR = BASE_DIR / "output"

# Create directories
DATA_DIR.mkdir(exist_ok=True)
MODEL_DIR.mkdir(exist_ok=True)
OUTPUT_DIR.mkdir(exist_ok=True)

print(f"Base directory: {BASE_DIR}")
print(f"Data directory: {DATA_DIR}")
print(f"Model directory: {MODEL_DIR}")
print(f"Output directory: {OUTPUT_DIR}")

Base directory: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune
Data directory: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/data
Model directory: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/models
Output directory: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/output


## 2. Data Preparation

Convert CSV Q&A data to JSONL format required by mistral-finetune.

In [28]:
# Load the CSV data
csv_path = BASE_DIR / "rag_eval_QA.csv"
df = pd.read_csv(csv_path)

print(f"Total samples: {len(df)}")
print(f"\nColumns: {df.columns.tolist()}")
print(f"\nFirst sample:")
print(df.iloc[0])

Total samples: 267

Columns: ['input_query', 'output_expected_answer', 'pdf_name', 'question_number']

First sample:
input_query               What is the maximum defrost duration in minute...
output_expected_answer    The maximum defrost duration for an Ascend ® F...
pdf_name                                           03_ascend_jhd_series.pdf
question_number                                                           1
Name: 0, dtype: object


In [29]:
# Convert to instruction format
def create_instruction_sample(row):
    """
    Convert Q&A pair to Mistral instruct format.
    Format: user asks question, assistant provides answer.
    """
    system_prompt = (
        "You are a technical assistant specialized in commercial refrigeration equipment. "
        "Provide accurate, concise answers based on equipment manuals and documentation."
    )
    
    return {
        "messages": [
            {
                "role": "system",
                "content": system_prompt
            },
            {
                "role": "user",
                "content": row['input_query']
            },
            {
                "role": "assistant",
                "content": row['output_expected_answer']
            }
        ]
    }

# Convert all samples
samples = [create_instruction_sample(row) for _, row in df.iterrows()]

print(f"Created {len(samples)} instruction samples")
print(f"\nExample sample:")
print(json.dumps(samples[0], indent=2))

Created 267 instruction samples

Example sample:
{
  "messages": [
    {
      "role": "system",
      "content": "You are a technical assistant specialized in commercial refrigeration equipment. Provide accurate, concise answers based on equipment manuals and documentation."
    },
    {
      "role": "user",
      "content": "What is the maximum defrost duration in minutes for an Ascend \u00ae Freezer according to the default settings?"
    },
    {
      "role": "assistant",
      "content": "The maximum defrost duration for an Ascend \u00ae Freezer is 30 minutes as per the default settings."
    }
  ]
}


In [30]:
# Split into train/validation sets (90/10 split)
train_samples, val_samples = train_test_split(
    samples, 
    test_size=0.1, 
    random_state=42
)

print(f"Training samples: {len(train_samples)}")
print(f"Validation samples: {len(val_samples)}")

Training samples: 240
Validation samples: 27


In [31]:
# Save to JSONL format
train_path = DATA_DIR / "train_instruct.jsonl"
val_path = DATA_DIR / "val_instruct.jsonl"

# Write training data
with open(train_path, 'w') as f:
    for sample in train_samples:
        f.write(json.dumps(sample) + '\n')

# Write validation data
with open(val_path, 'w') as f:
    for sample in val_samples:
        f.write(json.dumps(sample) + '\n')

print(f"✓ Training data saved to: {train_path}")
print(f"✓ Validation data saved to: {val_path}")

✓ Training data saved to: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/data/train_instruct.jsonl
✓ Validation data saved to: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/data/val_instruct.jsonl
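
Before training, it may be worth confirming that every line of the JSONL files parses and follows the system/user/assistant layout produced above. The following is a minimal sketch: `validate_jsonl_lines` and `EXPECTED_ROLES` are hypothetical names introduced for illustration and are not part of mistral-finetune.

```python
import json

# Role order produced by create_instruction_sample in this notebook
EXPECTED_ROLES = ["system", "user", "assistant"]

def validate_jsonl_lines(lines):
    """Return a list of (line_number, problem) tuples; an empty list means all lines passed."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list):
            problems.append((lineno, "missing 'messages' list"))
            continue
        roles = [m.get("role") if isinstance(m, dict) else None for m in messages]
        if roles != EXPECTED_ROLES:
            problems.append((lineno, f"unexpected roles: {roles}"))
            continue
        for m in messages:
            content = m.get("content")
            # A missing CSV cell becomes NaN in pandas and is caught here as non-string content
            if not isinstance(content, str) or not content.strip():
                problems.append((lineno, f"empty content for role '{m['role']}'"))
    return problems

# Demo on in-memory lines; for the real files, pass open(train_path) instead
good_line = json.dumps({"messages": [
    {"role": "system", "content": "s"},
    {"role": "user", "content": "q"},
    {"role": "assistant", "content": "a"},
]})
print(validate_jsonl_lines([good_line, "{not json"]))
```

An empty result for both `train_path` and `val_path` suggests the files are structurally sound; it says nothing about answer quality.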


## 3. Download Mistral-7B-Instruct-v0.3 Model

Download the base model for fine-tuning.

In [23]:
# Download Mistral-7B-Instruct-v0.3
model_url = "https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-Instruct-v0.3.tar"
model_tar = MODEL_DIR / "mistral-7B-Instruct-v0.3.tar"
model_extract_dir = MODEL_DIR / "mistral-7B-Instruct-v0.3"

if not model_extract_dir.exists():
    print("Downloading Mistral-7B-Instruct-v0.3...")
    !wget -O {model_tar} {model_url}
    
    print("Extracting model...")
    !tar -xf {model_tar} -C {MODEL_DIR}
    
    print(f"✓ Model extracted to: {model_extract_dir}")
else:
    print(f"✓ Model already exists at: {model_extract_dir}")

# Verify checksum (optional)
expected_checksum = "80b71fcb6416085bcb4efad86dfb4d52"
print(f"\nVerify checksum with: md5sum {model_tar}")
print(f"Expected: {expected_checksum}")

Downloading Mistral-7B-Instruct-v0.3...
--2025-12-09 11:47:19--  https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-Instruct-v0.3.tar
Resolving models.mistralcdn.com (models.mistralcdn.com)... 172.67.70.68, 104.26.6.117, 104.26.7.117, ...
Connecting to models.mistralcdn.com (models.mistralcdn.com)|172.67.70.68|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14496675840 (14G) [application/x-tar]
Saving to: ‘/home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/models/mistral-7B-Instruct-v0.3.tar’


2025-12-09 11:56:21 (25.5 MB/s) - ‘/home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/models/mistral-7B-Instruct-v0.3.tar’ saved [14496675840/14496675840]

Extracting model...
✓ Model extracted to: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/models/mistral-7B-Instruct-v0.3

Verify checksum with: md5sum /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/models/mistral-7B-Instruct-v0.
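
The cell above only prints the `md5sum` command. The same check can be done from Python, as in this minimal sketch. `md5_of_file` and `checksum_matches` are hypothetical helper names, and the demo runs on a small temporary file rather than the real archive.

```python
import hashlib
import tempfile
from pathlib import Path

def md5_of_file(path, chunk_size=1 << 20):
    """Stream a file through MD5 in chunks so a multi-gigabyte archive never sits in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def checksum_matches(path, expected_hex):
    """Compare the file's MD5 against an expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()

# Demo on a throwaway file, read in several chunks
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.bin"
    demo_file.write_bytes(b"x" * 3000)
    demo_digest = md5_of_file(demo_file, chunk_size=1024)
    demo_ok = checksum_matches(demo_file, hashlib.md5(b"x" * 3000).hexdigest())
print(demo_ok)
```

In this notebook the call would be `checksum_matches(model_tar, expected_checksum)`, which requires the tar file to still be on disk.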

## 4. Create Training Configuration

In [32]:
# Training configuration for engineering document Q&A (optimized for 2x RTX 5000 Ada - 33GB each)
config = {
    # Data paths
    "data": {
        "instruct_data": str(train_path),
        "data": "",  # No pretraining data
        "eval_instruct_data": str(val_path)
    },
    
    # Model configuration
    "model_id_or_path": str(model_extract_dir),
    "lora": {
        "rank": 64  # LoRA rank (can increase to 128 with 33GB VRAM)
    },
    
    # Training hyperparameters (optimized for 2x RTX 5000 Ada - 33GB VRAM each)
    "seq_len": 16384,
    "batch_size": 8,
    "max_steps": 500,
    
    # Optimizer settings
    "optim": {
        "lr": 6e-5,  # Learning rate
        "weight_decay": 0.1,
        "pct_start": 0.05
    },
    
    # Logging and evaluation
    "seed": 42,
    "log_freq": 10,
    "eval_freq": 100,
    "no_eval": False,
    "ckpt_freq": 100,
    
    # Save configuration
    "save_adapters": True,  # Save only LoRA adapters (smaller size)
    "run_dir": str(OUTPUT_DIR / "run_001"),
    
    # Weights & Biases (optional)
    "wandb": {
        "project": "mistral-7b-engineering-qa",
        "run_name": "engineering-docs-lora-2xRTX5000-33GB",
        "key": "",  # Add your W&B API key
        "offline": True  # Set to False if using W&B
    }
}

# Save configuration
config_path = BASE_DIR / "train_config.yaml"

import yaml
with open(config_path, 'w') as f:
    yaml.dump(config, f, default_flow_style=False, sort_keys=False)

print(f"✓ Configuration saved to: {config_path}")
print(f"\nConfiguration:")
print(yaml.dump(config, default_flow_style=False, sort_keys=False))
print(f"\n⚡ Optimized for 2x RTX 5000 Ada (33GB VRAM each):")
print(f"  • 16K context window for longer documents")
print(f"  • Batch size 8 (mistral-finetune applies batch_size per GPU) for faster training")
print(f"  • Expected memory usage: ~20-24GB per GPU")

✓ Configuration saved to: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/train_config.yaml

Configuration:
data:
  instruct_data: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/data/train_instruct.jsonl
  data: ''
  eval_instruct_data: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/data/val_instruct.jsonl
model_id_or_path: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/models/mistral-7B-Instruct-v0.3
lora:
  rank: 64
seq_len: 16384
batch_size: 8
max_steps: 500
optim:
  lr: 6.0e-05
  weight_decay: 0.1
  pct_start: 0.05
seed: 42
log_freq: 10
eval_freq: 100
no_eval: false
ckpt_freq: 100
save_adapters: true
run_dir: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/output/run_001
wandb:
  project: mistral-7b-engineering-qa
  run_name: engineering-docs-lora-2xRTX5000-33GB
  key: ''
  offline: true


⚡ Optimized for 2x RTX 5000 Ada (33GB VRAM each):
  • 16K context window for longe
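
With only 240 short training samples, it helps to estimate how many tokens each optimizer step consumes relative to the dataset. The sketch below rests on two assumptions. First, that mistral-finetune treats `batch_size` as a per-GPU count, so one step covers `num_gpus × batch_size × seq_len` tokens (my reading of its README). Second, that an average sample is about 150 tokens, which is an illustrative guess and not a measurement of this dataset.

```python
def tokens_per_step(seq_len, batch_size_per_gpu, num_gpus):
    """Tokens consumed by one optimizer step across all GPUs (assumed formula)."""
    return seq_len * batch_size_per_gpu * num_gpus

def approx_epochs(max_steps, step_tokens, num_samples, avg_tokens_per_sample):
    """Rough number of passes over the dataset implied by the step budget."""
    dataset_tokens = num_samples * avg_tokens_per_sample
    return max_steps * step_tokens / dataset_tokens

step_tokens = tokens_per_step(seq_len=16384, batch_size_per_gpu=8, num_gpus=2)
epochs = approx_epochs(
    max_steps=500,
    step_tokens=step_tokens,
    num_samples=240,            # training samples from the split above
    avg_tokens_per_sample=150,  # ASSUMPTION: illustrative value, measure your own data
)
print(f"Tokens per step: {step_tokens}")
print(f"Approximate passes over the training set: {epochs:.0f}")
```

If the estimate comes out in the hundreds or thousands of passes, the run is likely to overfit, and a smaller `seq_len`, `batch_size`, or `max_steps` may be more appropriate for a dataset of this size.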

## 5. Launch Fine-tuning


In [None]:
# RECOMMENDED: Multi-GPU training for 2x RTX 5000 Ada
import subprocess

# Change to mistral-finetune directory
os.chdir(FINETUNE_DIR)

# Run training with 2 GPUs
cmd = [
    "torchrun",
    "--nproc_per_node=2",  # 2 GPUs
    "--master_port=29500",
    "train.py",
    str(config_path)
]

print("Starting fine-tuning with 2x RTX 5000 Ada GPUs...")
print(f"Command: {' '.join(cmd)}")
print("\n" + "="*80 + "\n")

# Run training (duration depends on your GPUs); check=True raises if training exits with an error
subprocess.run(cmd, check=True)

## 6. Monitor Training

Training logs and checkpoints will be saved to the `run_dir` specified in config.

In [None]:
# Check training output
run_dir = OUTPUT_DIR / "run_001"

if run_dir.exists():
    print(f"Training artifacts in: {run_dir}")
    print(f"\nDirectory contents:")
    !ls -lh {run_dir}
else:
    print(f"Training not started yet. Run the training cell above.")
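
To pick up the most recent checkpoint without hard-coding a step number, the `checkpoints` folder can be scanned. This is a minimal sketch assuming checkpoint directories are named `checkpoint_<step>` under `run_dir/checkpoints`. `latest_checkpoint` is a hypothetical helper, and the demo builds a temporary directory tree instead of reading a real run.

```python
import re
import tempfile
from pathlib import Path

def latest_checkpoint(run_dir):
    """Return the checkpoint directory with the highest step number, or None if there is none."""
    ckpt_root = Path(run_dir) / "checkpoints"
    if not ckpt_root.is_dir():
        return None
    candidates = []
    for path in ckpt_root.iterdir():
        match = re.fullmatch(r"checkpoint_(\d+)", path.name)
        if path.is_dir() and match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare on the numeric step only
    return max(candidates, key=lambda item: item[0])[1]

# Demo on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    for name in ("checkpoint_000100", "checkpoint_000500", "notes"):
        (Path(tmp) / "checkpoints" / name).mkdir(parents=True)
    newest_name = latest_checkpoint(tmp).name
print(newest_name)
```

Calling `latest_checkpoint(run_dir)` before training has written anything returns `None`, so the result should be checked before use.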

## 7. Load and Test Fine-tuned Model

In [None]:
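# After training completes, you can load and test the model.
# NOTE: this cell assumes the base model and the adapter are in Hugging Face / PEFT format.
# The tar downloaded above and the adapter written by mistral-finetune use Mistral's own
# layout, so they likely need converting first (or loading with mistral-inference instead).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Paths
base_model_path = model_extract_dir
# mistral-finetune zero-pads the step in checkpoint directory names (e.g. checkpoint_000500)
adapter_path = run_dir / "checkpoints" / "checkpoint_000500"  # Final checkpoint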

# Load base model and tokenizer
print("Loading base model...")
tokenizer = AutoTokenizer.from_pretrained(base_model_path)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load LoRA adapter
print("Loading LoRA adapter...")
model = PeftModel.from_pretrained(base_model, adapter_path)

print("✓ Model loaded successfully!")

In [None]:
# Test the fine-tuned model
def generate_answer(question, max_length=512):
    """
    Generate answer for a given question.
    """
    system_prompt = (
        "You are a technical assistant specialized in commercial refrigeration equipment. "
        "Provide accurate, concise answers based on equipment manuals and documentation."
    )
    
    # Format prompt (no literal "<s>": the tokenizer adds the BOS token itself)
    prompt = f"[INST] {system_prompt}\n\n{question} [/INST]"
    
    # Tokenize
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_length,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
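    # Decode only the newly generated tokens so the prompt is not echoed back.
    # Splitting on '[/INST]' is fragile: skip_special_tokens may strip that marker.
    prompt_len = inputs["input_ids"].shape[1]
    answer = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()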
    
    return answer

# Test with sample questions
test_questions = [
    "What is the maximum defrost duration for an Ascend Freezer?",
    "Can I use an extension cord for my Arctic Air Chef Base Refrigerator?",
    "What should I do if the compressor is running too long?"
]

print("Testing fine-tuned model:\n")
print("="*80)

for i, question in enumerate(test_questions, 1):
    print(f"\nQuestion {i}: {question}")
    print(f"\nAnswer: {generate_answer(question)}")
    print("\n" + "="*80)

## 8. Evaluation Metrics (Optional)

In [None]:
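# Evaluate on validation set
from tqdm import tqdm

# Rebuild the validation frame from val_samples. train_test_split shuffles the data,
# so the last rows of df are NOT the validation set and would mostly be training examples.
val_df = pd.DataFrame({
    "input_query": [s["messages"][1]["content"] for s in val_samples],
    "output_expected_answer": [s["messages"][2]["content"] for s in val_samples],
})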

print(f"Evaluating on {len(val_df)} validation samples...\n")

# Generate predictions
predictions = []
for _, row in tqdm(val_df.iterrows(), total=len(val_df)):
    question = row['input_query']
    prediction = generate_answer(question)
    predictions.append(prediction)

# Add to dataframe
val_df['predicted_answer'] = predictions

# Save results
results_path = OUTPUT_DIR / "validation_results.csv"
val_df.to_csv(results_path, index=False)
print(f"\n✓ Results saved to: {results_path}")
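
The cell above stores predictions but does not score them. One simple option is a token-overlap F1 plus exact match, similar in spirit to the metrics used for extractive QA benchmarks. This is a minimal sketch, not an established evaluation protocol for this dataset: `normalize`, `exact_match`, and `token_f1` are hypothetical helpers, and the normalization (lowercasing, alphanumeric tokens only) is an assumption.

```python
import re
from collections import Counter

def normalize(text):
    """Lowercase and split into alphanumeric tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def exact_match(prediction, reference):
    """True when both answers contain the same token sequence after normalization."""
    return normalize(prediction) == normalize(reference)

def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall between prediction and reference."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

score = token_f1("30 minutes", "The maximum defrost duration is 30 minutes")
print(f"{score:.3f}")
```

Applied to the results frame, this could look like `val_df["f1"] = [token_f1(p, r) for p, r in zip(val_df["predicted_answer"], val_df["output_expected_answer"])]`. Because `generate_answer` samples with `temperature=0.7`, scores will vary between runs unless sampling is disabled.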

## 9. Export Final Model (Optional)

Merge LoRA adapters with base model for deployment.

In [None]:
# Merge LoRA weights with base model
merged_model_path = OUTPUT_DIR / "merged_model"
merged_model_path.mkdir(exist_ok=True)

print("Merging LoRA adapter with base model...")
merged_model = model.merge_and_unload()

print("Saving merged model...")
merged_model.save_pretrained(merged_model_path)
tokenizer.save_pretrained(merged_model_path)

print(f"✓ Merged model saved to: {merged_model_path}")
print(f"\nModel size:")
!du -sh {merged_model_path}

## Summary

This notebook demonstrates:
1. ✓ Data preparation: CSV → JSONL format for Mistral instruct training
2. ✓ Model download: Mistral-7B-Instruct-v0.3
3. ✓ Configuration: Optimized LoRA training setup
4. ✓ Training: Multi-GPU fine-tuning with mistral-finetune (2 GPUs via torchrun)
5. ✓ Evaluation: Load and test fine-tuned model
6. ✓ Export: Merge and save final model

## Next Steps
- Adjust hyperparameters (learning rate, batch size, max_steps) based on performance
- Experiment with different LoRA ranks (32, 64, 128)
- Add more training data for better domain adaptation
- Implement quantization (4-bit/8-bit) for deployment efficiency
- Deploy using vLLM or TGI for production serving