# UK Cyber Fraud Assistant - Fine-Tuning with Mistral-7B

This notebook fine-tunes Mistral-7B-Instruct-v0.3 on UK cyber fraud guidance data using Unsloth for optimized training on Google Colab. The run recorded below used an NVIDIA L4 GPU with about 22 GB of VRAM.

## Setup and Installation

In [1]:
# Install Unsloth and dependencies
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
!pip install transformers
!pip install unsloth trl peft accelerate bitsandbytes

Looking in indexes: https://download.pytorch.org/whl/cu121
Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch)
  Using cached https://download.pytorch.org/whl/cu121/nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
Collecting nvidia-nccl-cu12==2.21.5 (from torch)
  Using cached https://download.pytorch.org/whl/nvidia_nccl_cu12-2.21.5-py3-none-manylinux2014_x86_64.whl (188.7 MB)
Collecting nvidia-nvtx-cu12==12.1.105 (from torch)
  Using cached https://download.pytorch.org/whl/cu121/nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
Collecting triton==3.1.0 (from torch)
  Using cached https://download.pytorch.org/whl/triton-3.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (209.5 MB)
Collecting sympy==1.13.1 (from torch)
  Using cached https://download.pytorch.org/whl/sympy-1.13.1-py3-none-any.whl (6.2 MB)
Installing collected packages: triton, sympy, nvidia-nvtx-cu12, nvidia-nccl-cu12, nvidia-cusparse-cu12
  Attempting un

In [2]:
# Verify GPU setup
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")
print(f"CUDA version: {torch.version.cuda}")
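if torch.cuda.is_available():
    # total_memory is the card's full capacity, not the amount currently free
    print(f"Available VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")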

CUDA available: True
GPU: NVIDIA L4
CUDA version: 12.6
Available VRAM: 22.2 GB


## Load and Prepare Dataset

In [3]:
from google.colab import drive
import json
from datasets import Dataset
import pandas as pd

# Mount Google Drive
drive.mount('/content/drive')

dataset_path = '/content/drive/MyDrive/Dissertation/cyber-fraud-chatbot/model_training/master_fraud_qa_dataset.json'

# Load the fraud Q&A dataset
with open(dataset_path, 'r', encoding='utf-8') as f:  # explicit encoding: the data contains non-ASCII characters such as £
    fraud_data = json.load(f)

print(f"Total samples: {len(fraud_data)}")
print(f"Sample keys: {list(fraud_data[0].keys())}")

# Preview a sample
sample = fraud_data[0]
print(f"\nSample instruction: {sample['instruction']}")
print(f"\nSample output (first 200 chars): {sample['output'][:200]}...")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Total samples: 111
Sample keys: ['instruction', 'input', 'output', 'system', 'source_document', 'source_url', 'data_source', 'generated_by']

Sample instruction: I think I've been scammed after paying for a loan arrangement fee upfront. What should I do?

Sample output (first 200 chars): It sounds like you may have been a victim of loan fee fraud, which is a type of advance fee fraud. It's understandable to feel concerned, but there are steps you can take. First, you should know that ...


In [4]:
# Format data for instruction tuning with Mistral chat template
def format_fraud_prompt(sample):
    system_message = "You are a helpful UK cyber fraud assistant providing empathetic support to fraud victims. Provide accurate, UK-specific guidance with proper contact numbers and procedures."

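    # Mistral chat format: the system message is folded into the user turn.
    # <s> and </s> are written literally, so check that the tokenizer does not add a second BOS token.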
    formatted_text = f"<s>[INST] {system_message}\n\n{sample['instruction']} [/INST] {sample['output']}</s>"

    return formatted_text

# Apply formatting
formatted_data = [format_fraud_prompt(item) for item in fraud_data]

# Create train/validation split (80/20), taken in file order without shuffling
split_idx = int(len(formatted_data) * 0.8)
train_data = formatted_data[:split_idx]
val_data = formatted_data[split_idx:]

print(f"Training samples: {len(train_data)}")
print(f"Validation samples: {len(val_data)}")

# Create datasets
train_dataset = Dataset.from_dict({"text": train_data})
val_dataset = Dataset.from_dict({"text": val_data})

# Preview formatted sample
print(f"\nFormatted sample (first 300 chars):\n{formatted_data[0][:300]}...")

Training samples: 88
Validation samples: 23

Formatted sample (first 300 chars):
<s>[INST] You are a helpful UK cyber fraud assistant providing empathetic support to fraud victims. Provide accurate, UK-specific guidance with proper contact numbers and procedures.

I think I've been scammed after paying for a loan arrangement fee upfront. What should I do? [/INST] It sounds like ...
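
The split above takes the first 80% of samples in file order. If the JSON happens to be grouped by source document, the validation set may cover different topics from the training set. A minimal sketch of a seeded shuffle before splitting follows; `shuffled_split` is a hypothetical helper that is not part of the notebook, and the seed 3407 is simply reused from the training cells.

```python
import random

def shuffled_split(items, train_fraction=0.8, seed=3407):
    """Return (train, val) lists after a seeded shuffle of a copy of items."""
    shuffled = list(items)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    split_idx = int(len(shuffled) * train_fraction)
    return shuffled[:split_idx], shuffled[split_idx:]

# Stand-in for formatted_data: 111 placeholder strings
examples = [f"example {i}" for i in range(111)]
train_examples, val_examples = shuffled_split(examples)
print(len(train_examples), len(val_examples))  # 88 23
```

The same seed always produces the same split, so results stay comparable across runs.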


## Load Model and Configure LoRA

In [21]:
from unsloth import FastLanguageModel
import torch

# Unquantized Mistral model, loaded in bfloat16
model_name = "mistralai/Mistral-7B-Instruct-v0.3"  # Original unquantized model
max_seq_length = 2048
dtype = torch.bfloat16  # 16-bit weights (half the size of float32), no quantization

# Load model without quantization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=False,  # No quantization
    device_map={"": 0},
)

print("Model loaded in full precision for LoRA training")
print(f"Model device: {next(model.parameters()).device}")

==((====))==  Unsloth 2025.8.1: Fast Mistral patching. Transformers: 4.54.1.
   \\   /|    NVIDIA L4. Num GPUs = 1. Max memory: 22.161 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.1+cu126. CUDA: 8.9. CUDA Toolkit: 12.6. Triton: 3.3.1
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.31.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Model loaded in full precision for LoRA training
Model device: cuda:0


In [22]:
def print_gpu_memory():
    # Prints currently allocated memory / peak allocated memory (not total VRAM)
    if torch.cuda.is_available():
        print(f"GPU Memory: {torch.cuda.memory_allocated()/1024**3:.2f}GB / {torch.cuda.max_memory_allocated()/1024**3:.2f}GB")

print_gpu_memory()

GPU Memory: 13.55GB / 21.71GB


In [23]:
# Configure LoRA for optimal fraud assistant training
model = FastLanguageModel.get_peft_model(
    model,
    r=64,  # Higher rank for better learning of domain-specific patterns
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    lora_alpha=128,  # 2x rank for stable training
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # Unsloth optimized checkpointing
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

print("LoRA configuration applied")
model.print_trainable_parameters()

LoRA configuration applied
trainable params: 167,772,160 || all params: 7,415,795,712 || trainable%: 2.2624
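
The trainable parameter count reported above can be reproduced by hand. Each LoRA adapter adds two matrices per target module, `A` (rank x in_features) and `B` (out_features x rank). The sketch below assumes the published Mistral-7B dimensions (hidden size 4096, MLP size 14336, 8 key/value heads of 128 dimensions, 32 layers); these figures are not printed anywhere in this notebook.

```python
# Mistral-7B dimensions (assumed from the published model config)
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size of the MLP
KV_DIM = 1024          # 8 key/value heads x 128 head_dim
NUM_LAYERS = 32
RANK = 64              # r from the LoRA cell above

# (in_features, out_features) of each adapted projection
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features, out_features, rank):
    """Parameters in A (rank x in_features) plus B (out_features x rank)."""
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
total = per_layer * NUM_LAYERS
print(f"{total:,}")  # 167,772,160
```

Because the count scales linearly with `r`, halving the rank to 32 would halve the number of trainable parameters.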


In [24]:
# Re-check memory after attaching the LoRA adapters (print_gpu_memory is defined above)
print_gpu_memory()

GPU Memory: 14.16GB / 21.71GB
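
The two memory readings are roughly what the parameter counts predict. The sketch below is back-of-the-envelope arithmetic only: it assumes 2 bytes per bfloat16 weight and compares 4-byte and 2-byte storage for the adapters. Which dtype Unsloth actually uses for the adapter weights is not shown in the notebook.

```python
GIB = 1024 ** 3

all_params = 7_415_795_712    # from print_trainable_parameters()
lora_params = 167_772_160     # trainable LoRA parameters
base_params = all_params - lora_params

base_gib = base_params * 2 / GIB        # bfloat16 base weights: 2 bytes each
lora_fp32_gib = lora_params * 4 / GIB   # adapters if stored in float32
lora_bf16_gib = lora_params * 2 / GIB   # adapters if stored in bfloat16

print(f"Base weights (bf16): {base_gib:.2f} GiB")
print(f"LoRA adapters (fp32): {lora_fp32_gib:.3f} GiB")
print(f"LoRA adapters (bf16): {lora_bf16_gib:.4f} GiB")
```

The measured increase from 13.55 GB to 14.16 GB (about 0.61 GB) is closer to the float32 estimate, though allocator overhead makes this an inference rather than a confirmation.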


## Configure Training Parameters

In [25]:
from trl import SFTTrainer
from transformers import TrainingArguments

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    warmup_steps=10,
    num_train_epochs=5,
    learning_rate=1e-4,
    bf16=torch.cuda.is_bf16_supported(),
    fp16=not torch.cuda.is_bf16_supported(),
    logging_steps=5,
    optim="adamw_torch",                # Standard PyTorch AdamW (no 8-bit optimizer states)
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    seed=3407,
    output_dir="/content/drive/MyDrive/Dissertation/cyber-fraud-chatbot/trained_models",
    save_strategy="epoch",
    save_total_limit=2,
    eval_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    dataloader_pin_memory=True,
    remove_unused_columns=False,
    report_to="none",
)

effective_batch_size = training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps
# Round up: the final, partially filled accumulation window still triggers an optimizer step
steps_per_epoch = -(-len(train_dataset) // effective_batch_size)

print("Training arguments configured")
print(f"Effective batch size: {effective_batch_size}")
print(f"Total training steps: {steps_per_epoch * training_args.num_train_epochs}")

Training arguments configured
Effective batch size: 16
Total training steps: 30
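
To see how `warmup_steps=10` and `lr_scheduler_type="cosine"` interact over such a short run, the schedule can be sketched in plain Python. This mirrors the usual linear warmup followed by half-cosine decay; it is an approximation of the scheduler's shape under that assumption, not a copy of the Transformers implementation. The total of 30 steps is taken from the Unsloth banner printed when training starts.

```python
import math

def lr_at_step(step, base_lr=1e-4, warmup_steps=10, total_steps=30):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 5, 10, 20, 30):
    print(f"step {step:2d}: lr = {lr_at_step(step):.2e}")
```

With these settings a third of all optimizer steps are spent in warmup, which is worth keeping in mind when comparing runs with a different number of epochs.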


In [26]:
# Move all model parameters to GPU before creating trainer
model = model.to("cuda")

# Verify all parameters are on GPU
print("Checking model device placement...")
for name, param in model.named_parameters():
    if param.device.type == 'meta':
        print(f"Warning: {name} still on meta device")
    elif param.device.type != 'cuda':
        print(f"Moving {name} from {param.device} to cuda")
        param.data = param.data.to("cuda")

print("All parameters moved to GPU")

# Initialize trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=training_args,
)

print("Trainer initialized successfully!")

Checking model device placement...
All parameters moved to GPU


Unsloth: Tokenizing ["text"]:   0%|          | 0/88 [00:00<?, ? examples/s]

Unsloth: Tokenizing ["text"]:   0%|          | 0/23 [00:00<?, ? examples/s]

Trainer initialized successfully!


## Start Training

In [27]:
# Start training
print("Starting training...")
trainer_stats = trainer.train()

print("Training completed!")
# Note: training_loss is the mean loss over the whole run, not the loss at the last step
print(f"Final training loss: {trainer_stats.training_loss:.4f}")
print(f"Training time: {trainer_stats.metrics['train_runtime']:.1f} seconds")

Starting training...


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 88 | Num Epochs = 5 | Total steps = 30
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 8 x 1) = 16
 "-____-"     Trainable parameters = 167,772,160 of 7,415,795,712 (2.26% trained)


Unsloth: Will smartly offload gradients to save VRAM!


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 2.291         | 1.461084        |
| 2     | 1.291         | 1.120283        |
| 3     | 0.8458        | 1.058014        |
| 4     | 0.5091        | 1.208806        |
| 5     | 0.1887        | 1.211068        |


Unsloth: Not an error, but MistralForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient


Training completed!
Final training loss: 0.9055
Training time: 181.6 seconds
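
The per-epoch losses above show validation loss bottoming out at epoch 3 while training loss keeps falling, a typical overfitting pattern on a small dataset. A short sketch that picks the best epoch from the logged values follows; the numbers are copied from the log above, and the variable names are illustrative only.

```python
# (epoch, training loss, validation loss) copied from the training log
history = [
    (1, 2.291, 1.461084),
    (2, 1.291, 1.120283),
    (3, 0.8458, 1.058014),
    (4, 0.5091, 1.208806),
    (5, 0.1887, 1.211068),
]

# Best epoch = lowest validation loss
best_epoch, _, best_val = min(history, key=lambda row: row[2])

# Gap between validation and training loss, a rough overfitting indicator
gaps = {epoch: round(val - train, 4) for epoch, train, val in history}

print(f"Best epoch: {best_epoch} (validation loss {best_val})")
print(f"Validation - training gap per epoch: {gaps}")
```

Since the training arguments set `load_best_model_at_end=True` with `metric_for_best_model="eval_loss"`, the checkpoint from the lowest-loss epoch is the one expected to be restored after training, provided it was still on disk.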


## Test the Fine-Tuned Model

In [32]:
# Enable fast inference
FastLanguageModel.for_inference(model)

# Test scenarios for fraud assistant
test_scenarios = [
    "I received a text saying my bank account is frozen and I need to pay £50 to unlock it. Is this legitimate?",
    "Someone called claiming to be from HMRC saying I owe tax money. What should I do?",
    "I paid for a loan arrangement fee but haven't received the loan. How can I get help?",
    "How do I report a romance scam to the authorities?",
    "Is there a way to check if an investment opportunity is legitimate?"
]

def test_fraud_assistant(question):
    system_message = "You are a helpful UK cyber fraud assistant providing empathetic support to fraud victims. Provide accurate, UK-specific guidance with proper contact numbers and procedures."

    # Format input using Mistral chat template
    messages = [
        {"role": "user", "content": f"{system_message}\n\n{question}"}
    ]

    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda")

    attention_mask = torch.ones_like(inputs)

    # Generate response
    outputs = model.generate(
        input_ids=inputs,
        attention_mask=attention_mask,
        max_new_tokens=512,
        use_cache=True,
        temperature=0.1,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id
    )

    # Decode response
    response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

    # Extract just the assistant's response
    if "[/INST]" in response:
        response = response.split("[/INST]")[-1].strip()

    if question in response:
        response = response.split(question, 1)[-1].strip()

    # Remove any remaining system instruction fragments
    response = response.replace("Provide accurate, UK-specific guidance with proper contact numbers and procedures.", "").strip()

    # Clean up any leading punctuation or artifacts
    while response.startswith((".", "?", "!", ":")):
        response = response[1:].strip()

    return response

print("Testing fine-tuned fraud assistant:\n")
print("=" * 80)

Testing fine-tuned fraud assistant:
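
The string cleanup in `test_fraud_assistant` removes the prompt after decoding, which can misfire if the model repeats part of the question. An alternative is to slice off the prompt tokens before decoding, since `generate()` returns the prompt followed by the continuation. The sketch below illustrates the idea with plain lists and made-up token ids; `new_token_ids` is a hypothetical helper, not part of the notebook.

```python
def new_token_ids(output_ids, prompt_length):
    """Return only the ids generated after the prompt."""
    return output_ids[prompt_length:]

# Made-up ids standing in for an encoded prompt and a generated continuation
prompt_ids = [1, 3, 100, 101, 4]
output_ids = prompt_ids + [200, 201, 2]

completion_ids = new_token_ids(output_ids, len(prompt_ids))
print(completion_ids)  # [200, 201, 2]
```

In the function above, the equivalent would presumably be `tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)`, which would make the `[/INST]` split and the later replacements unnecessary.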



In [33]:
# Test each scenario
for i, scenario in enumerate(test_scenarios, 1):
    print(f"\nTest {i}: {scenario}\n")
    response = test_fraud_assistant(scenario)
    print(f"Assistant: {response}\n")
    print("-" * 80)


Test 1: I received a text saying my bank account is frozen and I need to pay £50 to unlock it. Is this legitimate?

Assistant: This is a classic example of a scam known as 'smishing'. The text message you received is designed to panic and trick you into responding. Your bank account is not frozen, and you do not need to pay any money to unlock it. You should never respond to unsolicited messages like this, and you should never send money to someone you don't know. If you have any doubts about the authenticity of a message, you should contact your bank directly using the contact details on your bank card or statement. By being aware of this scam, you can protect yourself from becoming a victim.

--------------------------------------------------------------------------------

Test 2: Someone called claiming to be from HMRC saying I owe tax money. What should I do?

Assistant: You should be very suspicious of anyone who calls claiming to be from HMRC and says you owe tax money. This is 

## Save Model for Local Deployment

In [36]:
# Save the trained adapter
save_path = "/content/drive/MyDrive/Dissertation/cyber-fraud-chatbot/trained_models/uk-fraud-assistant-adapter"

model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

print(f"Model adapter saved successfully to: {save_path}")

Model adapter saved successfully to: /content/drive/MyDrive/Dissertation/cyber-fraud-chatbot/trained_models/uk-fraud-assistant-adapter


In [38]:
# Export to GGUF format for local deployment
gguf_save_path = "/content/drive/MyDrive/Dissertation/cyber-fraud-chatbot/trained_models/uk-fraud-assistant-gguf"

model.save_pretrained_gguf(
    gguf_save_path,
    tokenizer,
    quantization_method="q4_k_m"  # Quantize only for deployment
)

print(f"Model exported to GGUF format for local deployment at: {gguf_save_path}")

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 35.15 out of 52.96 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 32/32 [00:29<00:00,  1.07it/s]


Unsloth: Saving tokenizer... Done.
Done.
==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits might take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q4_k_m'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: [1] Converting model at /content/drive/MyDrive/Dissertation/cyber-fraud-chatbot/trained_models/uk-fraud-assistant-gguf into bf16 GGUF format.
The output location will be /content/drive/MyDrive/Dissertation/cyber-fraud-chatbot/trained_models/uk-fraud-assistant-gguf/unsloth.BF16.gguf
This might take 3 minutes...
INFO:hf-to-gguf:Loading model: uk-fraud-assistant-gguf
INFO:hf-to-gguf:Model architecture: MistralForCausalLM
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gg

## Create Ollama Modelfile

In [None]:
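# Create Ollama Modelfile for easy deployment.
# The FROM line must match the quantized file the export actually wrote; the log above
# shows an `unsloth.*.gguf` naming pattern, so check the filename in the folder first.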
gguf_save_path = "/content/drive/MyDrive/Dissertation/cyber-fraud-chatbot/trained_models/uk-fraud-assistant-gguf"

modelfile_content = '''FROM ./model-unsloth.Q4_K_M.gguf

TEMPLATE """<s>[INST] You are a helpful UK cyber fraud assistant providing empathetic support to fraud victims. Provide accurate, UK-specific guidance with proper contact numbers and procedures.

{{ .Prompt }} [/INST] """

PARAMETER temperature 0.1
PARAMETER top_p 0.9
PARAMETER stop "</s>"
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"

SYSTEM """You are a specialized UK cyber fraud assistant. Your role is to:
- Provide empathetic support to fraud victims
- Offer accurate UK-specific guidance and procedures
- Include proper UK contact numbers (Action Fraud: 0300 123 2040)
- Maintain a supportive, non-judgmental tone
- Help victims understand their next steps
"""
'''

with open(f'{gguf_save_path}/Modelfile', 'w') as f:
    f.write(modelfile_content)

print(f"Ollama Modelfile created at: {gguf_save_path}/Modelfile")
print("\nTo deploy locally with Ollama:")
print("1. Download the uk-fraud-assistant-gguf folder from Google Drive")
print("2. cd uk-fraud-assistant-gguf")
print("3. ollama create uk-fraud-assistant -f Modelfile")
print("4. ollama run uk-fraud-assistant")

## Download Files for Local Use

In [None]:
# Create a zip file for easy download
import zipfile
import os

def create_deployment_zip():
    base_dir = '/content/drive/MyDrive/Dissertation/cyber-fraud-chatbot/trained_models'
    with zipfile.ZipFile('uk-fraud-assistant-deployment.zip', 'w', zipfile.ZIP_DEFLATED) as zipf:
        # Add the GGUF export folder and the LoRA adapter folder
        for folder in ('uk-fraud-assistant-gguf', 'uk-fraud-assistant-adapter'):
            for root, dirs, files in os.walk(os.path.join(base_dir, folder)):
                for file in files:
                    file_path = os.path.join(root, file)
                    # Store paths relative to trained_models/ so the archive
                    # unpacks into two top-level folders
                    arcname = os.path.relpath(file_path, base_dir)
                    zipf.write(file_path, arcname)

    print("Deployment files packaged into: uk-fraud-assistant-deployment.zip")
    print(f"Zip file size: {os.path.getsize('uk-fraud-assistant-deployment.zip') / 1024 / 1024:.1f} MB")

create_deployment_zip()

# Download the zip file in Colab
try:
    from google.colab import files
    files.download('uk-fraud-assistant-deployment.zip')
    print("\nDeployment package downloaded successfully!")
except ImportError:
    print("\nRunning outside Colab - zip file created locally")