# Mistral-7B-Instruct Fine-tuning for Engineering Document Q&A

This notebook demonstrates domain adaptation of Mistral-7B-Instruct-v0.3 for engineering document Q&A using LoRA fine-tuning.

## 1. Environment Setup

In [14]:
import os
import json
import pandas as pd
from pathlib import Path
import random
from sklearn.model_selection import train_test_split

# Set base paths
BASE_DIR = Path("/home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune")
FINETUNE_DIR = BASE_DIR / "mistral-finetune"
DATA_DIR = BASE_DIR / "data"
MODEL_DIR = BASE_DIR / "models"
OUTPUT_DIR = BASE_DIR / "output"

# Create directories
DATA_DIR.mkdir(parents=True, exist_ok=True)
MODEL_DIR.mkdir(parents=True, exist_ok=True)
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

print(f"Base directory: {BASE_DIR}")
print(f"Data directory: {DATA_DIR}")
print(f"Model directory: {MODEL_DIR}")
print(f"Output directory: {OUTPUT_DIR}")

Base directory: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune
Data directory: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/data
Model directory: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/models
Output directory: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/output


## 2. Data Preparation

Convert CSV Q&A data to JSONL format required by mistral-finetune.

In [28]:
# Load the CSV data
csv_path = BASE_DIR / "rag_eval_QA.csv"
df = pd.read_csv(csv_path)

print(f"Total samples: {len(df)}")
print(f"\nColumns: {df.columns.tolist()}")
print(f"\nFirst sample:")
print(df.iloc[0])

Total samples: 267

Columns: ['input_query', 'output_expected_answer', 'pdf_name', 'question_number']

First sample:
input_query               What is the maximum defrost duration in minute...
output_expected_answer    The maximum defrost duration for an Ascend ® F...
pdf_name                                           03_ascend_jhd_series.pdf
question_number                                                           1
Name: 0, dtype: object


In [29]:
# Convert to instruction format
def create_instruction_sample(row):
    """
    Convert Q&A pair to Mistral instruct format.
    Format: user asks question, assistant provides answer.
    """
    system_prompt = (
        "You are a technical assistant specialized in commercial refrigeration equipment. "
        "Provide accurate, concise answers based on equipment manuals and documentation."
    )
    
    return {
        "messages": [
            {
                "role": "system",
                "content": system_prompt
            },
            {
                "role": "user",
                "content": row['input_query']
            },
            {
                "role": "assistant",
                "content": row['output_expected_answer']
            }
        ]
    }

# Convert all samples
samples = [create_instruction_sample(row) for _, row in df.iterrows()]

print(f"Created {len(samples)} instruction samples")
print(f"\nExample sample:")
print(json.dumps(samples[0], indent=2))

Created 267 instruction samples

Example sample:
{
  "messages": [
    {
      "role": "system",
      "content": "You are a technical assistant specialized in commercial refrigeration equipment. Provide accurate, concise answers based on equipment manuals and documentation."
    },
    {
      "role": "user",
      "content": "What is the maximum defrost duration in minutes for an Ascend \u00ae Freezer according to the default settings?"
    },
    {
      "role": "assistant",
      "content": "The maximum defrost duration for an Ascend \u00ae Freezer is 30 minutes as per the default settings."
    }
  ]
}


In [30]:
# Split into train/validation sets (90/10 split)
train_samples, val_samples = train_test_split(
    samples, 
    test_size=0.1, 
    random_state=42
)

print(f"Training samples: {len(train_samples)}")
print(f"Validation samples: {len(val_samples)}")

Training samples: 240
Validation samples: 27
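
The 240/27 split follows from how a fractional `test_size` is rounded: the test count is rounded up and the remainder goes to training. The sketch below reproduces that arithmetic under this rounding assumption; `split_sizes` is an illustrative helper, not a scikit-learn function.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple:
    """Return (n_train, n_test) for a fractional test_size.

    Assumes the test count is rounded up and the rest is used for training.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_train, n_test = split_sizes(267, 0.1)
print(f"Training samples: {n_train}")
print(f"Validation samples: {n_test}")
```

With only 27 validation samples, each one moves any accuracy-style metric by almost four percentage points, so evaluation numbers from this split should be read as coarse.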


In [31]:
# Save to JSONL format
train_path = DATA_DIR / "train_instruct.jsonl"
val_path = DATA_DIR / "val_instruct.jsonl"

# Write training data
with open(train_path, 'w') as f:
    for sample in train_samples:
        f.write(json.dumps(sample) + '\n')

# Write validation data
with open(val_path, 'w') as f:
    for sample in val_samples:
        f.write(json.dumps(sample) + '\n')

print(f"✓ Training data saved to: {train_path}")
print(f"✓ Validation data saved to: {val_path}")

✓ Training data saved to: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/data/train_instruct.jsonl
✓ Validation data saved to: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/data/val_instruct.jsonl
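
Before training, it can help to read the JSONL files back and confirm that every line has the expected chat structure. The following is a minimal sketch of such a check, not the validation performed by mistral-finetune itself; the function name `validate_jsonl` and the specific rules (allowed roles, non-empty string content, final assistant turn) are assumptions chosen to match the samples created above.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_jsonl(path):
    """Return a list of (line_number, problem) tuples for a chat JSONL file."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                problems.append((line_no, "empty line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_no, f"invalid JSON: {exc.msg}"))
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append((line_no, "missing or empty 'messages' list"))
                continue
            for msg in messages:
                if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
                    problems.append((line_no, "unexpected role"))
                    break
                content = msg.get("content")
                if not isinstance(content, str) or not content.strip():
                    problems.append((line_no, "empty or non-string content"))
                    break
            else:
                # Only reached when every message passed the checks above
                if messages[-1]["role"] != "assistant":
                    problems.append((line_no, "last message is not from the assistant"))
    return problems
```

Calling `validate_jsonl(train_path)` and `validate_jsonl(val_path)` should return empty lists for clean files. One case this catches: a blank cell in the CSV is read by pandas as `NaN`, which `json.dumps` writes as a bare `NaN` rather than a string, and the check reports it as non-string content.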


## 3. Download Mistral-7B-Instruct-v0.3 Model

Download the base model for fine-tuning.

In [15]:
model_url = "https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-Instruct-v0.3.tar"
model_tar = MODEL_DIR / "mistral-7B-Instruct-v0.3.tar"
model_extract_dir = MODEL_DIR / "mistral-7B-Instruct-v0.3"

In [None]:
# Download Mistral-7B-Instruct-v0.3


if not model_extract_dir.exists():
    print("Downloading Mistral-7B-Instruct-v0.3...")
    !wget -O {model_tar} {model_url}
    
    print("Extracting model...")
    !tar -xf {model_tar} -C {MODEL_DIR}
    
    print(f"✓ Model extracted to: {model_extract_dir}")
else:
    print(f"✓ Model already exists at: {model_extract_dir}")

# Verify checksum (optional)
expected_checksum = "80b71fcb6416085bcb4efad86dfb4d52"
print(f"\nVerify checksum with: md5sum {model_tar}")
print(f"Expected: {expected_checksum}")

Downloading Mistral-7B-Instruct-v0.3...
--2025-12-09 11:47:19--  https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-Instruct-v0.3.tar
Resolving models.mistralcdn.com (models.mistralcdn.com)... 172.67.70.68, 104.26.6.117, 104.26.7.117, ...
Connecting to models.mistralcdn.com (models.mistralcdn.com)|172.67.70.68|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14496675840 (14G) [application/x-tar]
Saving to: ‘/home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/models/mistral-7B-Instruct-v0.3.tar’


2025-12-09 11:56:21 (25.5 MB/s) - ‘/home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/models/mistral-7B-Instruct-v0.3.tar’ saved [14496675840/14496675840]

Extracting model...
✓ Model extracted to: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/models/mistral-7B-Instruct-v0.3

Verify checksum with: md5sum /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/models/mistral-7B-In
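
The cell above only prints the `md5sum` command instead of running the check. The comparison can also be done in Python, as in the sketch below. The helper names `md5_of_file` and `checksum_matches` are illustrative; the file is read in chunks because the archive is about 14 GB.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex):
    """Return True if the file's MD5 digest equals expected_hex (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

In this notebook the call would be `checksum_matches(model_tar, expected_checksum)`, which requires that the downloaded `.tar` file has not been deleted after extraction.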

## 4. Create Training Configuration

In [None]:
# Training configuration for engineering document Q&A (optimized for single GPU)
config = {
    # Data paths
    "data": {
        "instruct_data": str(train_path),
        "data": "",  # No pretraining data
        "eval_instruct_data": str(val_path)
    },
    
    # Model configuration
    "model_id_or_path": str(model_extract_dir),
    "lora": {
        "rank": 64  # LoRA rank - can increase to 128 if you have more VRAM
    },
    
    # Training hyperparameters (optimized for single GPU - 31GB VRAM)
    "seq_len": 4096,  # 4K context (increase to 8192 if you have 40GB+ VRAM)
    "batch_size": 1,  # Batch size 1 for single GPU
    "max_steps": 500,
    
    # Optimizer settings
    "optim": {
        "lr": 6e-5,  # Learning rate
        "weight_decay": 0.1,
        "pct_start": 0.05
    },
    
    # Logging and evaluation
    "seed": 42,
    "log_freq": 10,
    "eval_freq": 100,
    "no_eval": False,
    "ckpt_freq": 100,
    
    # Save configuration
    "save_adapters": True,  # Save only LoRA adapters (smaller size)
    "run_dir": str(OUTPUT_DIR / "run_001"),
    
    # Weights & Biases (optional)
    "wandb": {
        "project": "mistral-7b-engineering-qa",
        "run_name": "engineering-docs-lora-single-gpu",
        "key": "",  # Add your W&B API key if using wandb
        "offline": True  # Set to False if using W&B
    }
}

# Save configuration
config_path = BASE_DIR / "train_config.yaml"

import yaml
with open(config_path, 'w') as f:
    yaml.dump(config, f, default_flow_style=False, sort_keys=False)

print(f"✓ Configuration saved to: {config_path}")
print(f"\nConfiguration:")
print(yaml.dump(config, default_flow_style=False, sort_keys=False))
print(f"\n⚡ Optimized for single GPU (31GB VRAM):")
print(f"  • 4K context window (memory optimized)")
print(f"  • Batch size 1 for single GPU")
print(f"  • Expected memory usage: ~20-25GB")
print(f"\n💡 Tip: Increase seq_len to 8192 or batch_size to 2 if you have more VRAM")
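
To put `max_steps` in context: as I understand the mistral-finetune documentation, the number of tokens seen during training is `max_steps × num_gpus × batch_size × seq_len`. The sketch below computes that budget for the configuration above. The average sample length (`avg_tokens_per_sample`) is a hypothetical placeholder, not a measured value, so the resulting number of passes over the data is only indicative.

```python
def training_token_budget(max_steps, batch_size, seq_len, num_gpus=1):
    """Total tokens processed over a run: steps x GPUs x batch size x sequence length."""
    return max_steps * num_gpus * batch_size * seq_len


# Values taken from the training configuration above
budget = training_token_budget(max_steps=500, batch_size=1, seq_len=4096)

# Hypothetical average length of one Q&A sample in tokens (measure this on real data)
avg_tokens_per_sample = 150
dataset_tokens = 240 * avg_tokens_per_sample

approx_passes = budget / dataset_tokens
print(f"Token budget: {budget:,}")
print(f"Approximate passes over the training set: {approx_passes:.1f}")
```

If the measured pass count turns out to be large for a 240-sample dataset, lowering `max_steps` and comparing the evaluation loss across the checkpoints saved every `ckpt_freq` steps is one way to look for overfitting.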

## 5. Launch Fine-tuning


In [None]:
# Launch single GPU training (recommended for RTX 5000 Ada or similar)
import subprocess

# Change to mistral-finetune directory
os.chdir(FINETUNE_DIR)

# Run training with single GPU
cmd = [
    "torchrun",
    "--nproc_per_node=1",  # Single GPU
    "--master_port=29500",
    "train.py",
    str(config_path)
]

print("Starting fine-tuning on single GPU...")
print(f"Command: {' '.join(cmd)}")
print("\n" + "="*80 + "\n")

# Run training (this will take 2-3 hours depending on your GPU)
# check=True raises CalledProcessError if training exits with a non-zero status
subprocess.run(cmd, check=True)

## 6. Monitor Training

Training logs and checkpoints are saved to the `run_dir` specified in the training configuration.

In [None]:
# Check training output
run_dir = OUTPUT_DIR / "run_001"

if run_dir.exists():
    print(f"Training artifacts in: {run_dir}")
    print(f"\nDirectory contents:")
    !ls -lh {run_dir}
else:
    print(f"Training not started yet. Run the training cell above.")
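
Since a checkpoint is written every `ckpt_freq` steps, it is useful to locate the most recent one programmatically. The sketch below assumes the layout used later in this notebook, `run_dir/checkpoints/checkpoint_<step>`; the helper name `latest_checkpoint` is illustrative.

```python
import re
from pathlib import Path

# Matches directory names such as "checkpoint_000500"
CHECKPOINT_RE = re.compile(r"^checkpoint_(\d+)$")


def latest_checkpoint(run_dir):
    """Return the checkpoint directory with the highest step number, or None."""
    ckpt_root = Path(run_dir) / "checkpoints"
    if not ckpt_root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in ckpt_root.iterdir():
        match = CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path
```

Calling `latest_checkpoint(OUTPUT_DIR / "run_001")` returns `None` until the first checkpoint has been written.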

## 7. Inference with Fine-tuned Model

**Important**: The Mistral official format (`consolidated.safetensors`) requires different tools than standard HuggingFace transformers.

You have two options for inference:

### Option 1: Use mistral-inference (Recommended - Faster & Less Memory)
Use the official Mistral inference library which natively supports the model format and LoRA adapters:

```bash
pip install mistral-inference
```

**Advantages:**
- Works directly with Mistral official format (no conversion needed)
- More memory efficient
- Faster inference
- Native LoRA support

See cells below for implementation.

### Option 2: Use the HuggingFace format (More Flexible)
Use the HuggingFace release of the model with the transformers library.

**Advantages:**
- Compatible with HuggingFace ecosystem
- More deployment options
- Better integration with other tools

You'll need to download the HuggingFace version (~14GB additional download).

### Option 1: Inference with mistral-inference (Recommended)

In [20]:
# Load model with mistral-inference
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from pathlib import Path
import torch

# Paths
mistral_model_path = model_extract_dir  # Original Mistral format model
# Final checkpoint of run_001 (step 500 = max_steps), built from OUTPUT_DIR instead of a hard-coded path
lora_adapter_path = OUTPUT_DIR / "run_001" / "checkpoints" / "checkpoint_000500"

print(f"Loading Mistral model from: {mistral_model_path}")
print(f"Loading LoRA adapter from: {lora_adapter_path}")

# Load base model first
model = Transformer.from_folder(
    str(mistral_model_path),
    device="cuda" if torch.cuda.is_available() else "cpu",
    dtype=torch.bfloat16
)

# Then load the LoRA adapter
lora_path = Path(lora_adapter_path) / "consolidated" / "lora.safetensors"
model.load_lora(str(lora_path))

print("✓ Model loaded successfully with mistral-inference!")
print("✓ LoRA adapter loaded!")
print(f"Model device: {next(model.parameters()).device}")

Loading Mistral model from: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/models/mistral-7B-Instruct-v0.3
Loading LoRA adapter from: /home/scumpia-mrl/Desktop/Sujit/Projects/mistral-7B-instruct-finetune/output/run_001/checkpoints/checkpoint_000500
✓ Model loaded successfully with mistral-inference!
✓ LoRA adapter loaded!
Model device: cuda:0


In [22]:
# Test inference with mistral-inference
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load tokenizer
tokenizer_path = mistral_model_path / "tokenizer.model.v3"
tokenizer = MistralTokenizer.from_file(str(tokenizer_path))

def generate_answer_mistral(question, max_tokens=512):
    """Generate answer using mistral-inference."""
    
    # Create chat completion request
    # Note: the training samples included a system prompt; this request sends only the user turn
    messages = [UserMessage(content=question)]
    request = ChatCompletionRequest(messages=messages)
    
    # Tokenize
    tokens = tokenizer.encode_chat_completion(request).tokens
    
    # Generate
    generated_tokens, _ = generate(
        [tokens],
        model,
        max_tokens=max_tokens,
        temperature=0.7,
        eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id
    )
    
    # Decode
    result = tokenizer.instruct_tokenizer.tokenizer.decode(generated_tokens[0])
    
    return result

# Test with sample questions
test_questions = [
    "What is the maximum defrost duration for an Ascend Freezer?",
    "Can I use an extension cord for my Arctic Air Chef Base Refrigerator?",
    "What is the compressor part number for an EST-48-N-V model under the Turbo Air E-LINE series?"
]

print("Testing fine-tuned model with mistral-inference:\n")
print("="*80)

for i, question in enumerate(test_questions, 1):
    print(f"\nQuestion {i}: {question}")
    answer = generate_answer_mistral(question)
    print(f"\nAnswer: {answer}")
    print("\n" + "="*80)

Testing fine-tuned model with mistral-inference:


Question 1: What is the maximum defrost duration for an Ascend Freezer?

Answer: The maximum defrost duration for an Ascend Freezer is 30 minutes.


Question 2: Can I use an extension cord for my Arctic Air Chef Base Refrigerator?

Answer: No, the Use of extension cords is strictly prohibited and will also void warranty.


Question 3: What is the compressor part number for an EST-48-N-V model under the Turbo Air E-LINE series?

Answer: The compressor part number for an EST-48-N-V model is P0189E1400.
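
Three spot checks give only a qualitative impression. To compare generated answers against the expected answers in `val_samples`, a simple token-overlap F1 score is one option. The sketch below is an illustrative metric, not something used elsewhere in this notebook; the helper names `normalize` and `token_f1` and the lowercase alphanumeric tokenization are assumptions.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase and split into alphanumeric tokens (punctuation and symbols are dropped)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(prediction, reference):
    """Token-level F1 between a generated answer and the expected answer."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Generated answer from the first test question vs. the expected answer in the dataset
score = token_f1(
    "The maximum defrost duration for an Ascend Freezer is 30 minutes.",
    "The maximum defrost duration for an Ascend ® Freezer is 30 minutes as per the default settings.",
)
print(f"Token F1: {score:.3f}")
```

Averaging `token_f1` over all validation pairs gives a single number to track between checkpoints. Token overlap rewards matching wording rather than factual correctness, so answers that hinge on a part number or a duration are still worth checking by hand.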

