# Getting Started with Fine-Tuning Mistral 7B

This notebook walks through a simple example of fine-tuning Mistral 7B with LoRA. You can run it in Google Colab with a Pro+ account, using an A100 GPU (40 GB).

<a target="_blank" href="https://colab.research.google.com/github/smartrics/mistral-finetune/blob/main/tutorials/mistral_finetune_7b.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>


Check out the `mistral-finetune` GitHub repo to learn more: https://github.com/smartrics/mistral-finetune/
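Before installing anything, it can help to confirm that the runtime actually has the A100 attached. The following is a minimal sketch, assuming `nvidia-smi` is on the `PATH` whenever a GPU runtime is active; the helper name `gpu_summary` is hypothetical and not part of `mistral-finetune`.

```python
import shutil
import subprocess


def gpu_summary() -> str:
    """Return the GPU name and total memory reported by nvidia-smi,
    or a short note when the tool is not available."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found: no NVIDIA GPU runtime appears to be attached"
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,  # report the error text instead of raising
    )
    return result.stdout.strip() or result.stderr.strip()


print(gpu_summary())
```

If the output does not mention an A100, switching the runtime type in Colab before continuing is probably the safer choice.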

## Installation

Clone the `mistral-finetune` repo:


In [1]:
import os
import subprocess

repo_dir = "/content/mistral-finetune"
repo_url = "https://github.com/smartrics/mistral-finetune.git"

if os.path.isdir(repo_dir):
    print("Directory 'mistral-finetune' exists. Pulling latest changes...")
    subprocess.run(["git", "-C", repo_dir, "pull"], check=True)
else:
    print("Directory 'mistral-finetune' does not exist. Cloning repository...")
    subprocess.run(["git", "clone", repo_url, repo_dir], check=True)


Directory 'mistral-finetune' does not exist. Cloning repository...


Install all required dependencies:

In [5]:
!pip install -r /content/mistral-finetune/requirements.txt

Collecting fire (from -r /content/mistral-finetune/requirements.txt (line 1))
  Downloading fire-0.7.0.tar.gz (87 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87.2/87.2 kB 3.4 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting mistral-common>=1.3.1 (from -r /content/mistral-finetune/requirements.txt (line 4))
  Downloading mistral_common-1.5.3-py3-none-any.whl.metadata (4.5 kB)
Collecting xformers (from -r /content/mistral-finetune/requirements.txt (line 11))
  Downloading xformers-0.0.29.post3-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch==2.5.1->-r /content/mistral-finetune/requirements.txt (line 9))
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-

## Model download

In [6]:
!pip install huggingface_hub



In [8]:
# Log in to the Hugging Face Hub
from huggingface_hub import notebook_login

notebook_login()


In [9]:
from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', '7B-Instruct-v0.3')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(repo_id="mistralai/Mistral-7B-Instruct-v0.3", allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"], local_dir=mistral_models_path)

! cp -r /root/mistral_models/7B-Instruct-v0.3 /content/mistral_models
! rm -r /root/mistral_models/7B-Instruct-v0.3

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

consolidated.safetensors:   0%|          | 0.00/14.5G [00:00<?, ?B/s]

params.json:   0%|          | 0.00/202 [00:00<?, ?B/s]

tokenizer.model.v3:   0%|          | 0.00/587k [00:00<?, ?B/s]
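Since the download is followed by a copy and a delete, a quick check that the three requested files ended up in `/content/mistral_models` can catch a partial download early. This is a minimal sketch; the helper name `missing_model_files` is hypothetical, and the file list simply mirrors the `allow_patterns` used above.

```python
from pathlib import Path

# The three files requested via allow_patterns in the download cell.
EXPECTED_FILES = ("params.json", "consolidated.safetensors", "tokenizer.model.v3")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in expected if not (model_dir / name).is_file()]


missing = missing_model_files("/content/mistral_models")
print("all files present" if not missing else f"missing: {missing}")
```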

## Dataset

The training, validation, and test data live in `/content/mistral-finetune/data`:

In [10]:
!ls /content/mistral-finetune/data

prepare.py  test_data.jsonl  training_data.jsonl  validation_data.jsonl
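To see what a record looks like before validating, you can read the first few lines of one of the JSONL files. This is a minimal sketch; the helper name `peek_jsonl` is hypothetical. Instruct data for `mistral-finetune` is typically a list of chat messages under a `messages` key, but whether these particular files follow that layout is an assumption worth checking against the printed output.

```python
import json
from pathlib import Path


def peek_jsonl(path, n=2):
    """Return the first n non-empty records of a JSONL file as parsed objects."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                records.append(json.loads(line))
            if len(records) >= n:
                break
    return records


data_file = Path("/content/mistral-finetune/data/training_data.jsonl")
if data_file.exists():
    for record in peek_jsonl(data_file):
        # Truncate long records so the output stays readable.
        print(json.dumps(record)[:200])
```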


In [11]:
# navigate to the mistral-finetune directory
%cd /content/mistral-finetune/

/content/mistral-finetune


In [12]:
# Validate the training YAML: this checks that the data is correctly formatted and estimates the training time.

!python -m utils.validate_data --train_yaml example/7B.yaml


Validating data/test_data.jsonl ...
100% 50/50 [00:01<00:00, 47.78it/s]
1it [00:01,  1.06s/it]
No errors! Data is correctly formatted!
Stats for data/test_data.jsonl 
 -------------------- 
 {
    "expected": {
        "eta": "00:33:38",
        "data_tokens": 477199,
        "train_tokens": 78643200,
        "epochs": "164.80",
        "max_steps": 300,
        "data_tokens_per_dataset": {
            "data/test_data.jsonl": "477199.0"
        },
        "train_tokens_per_dataset": {
            "data/test_data.jsonl": "78643200.0"
        },
        "epochs_per_dataset": {
            "data/

## Start training

In [13]:
# Environment variables needed for training: order GPUs by PCI bus ID and expose only GPU 0
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"

In [16]:
# define training configuration
# for your own use cases, you might want to change the data paths, model path, run_dir, and other hyperparameters

config = """
data:
  instruct_data: "data/test_data.jsonl"  # Fill
  data: ""  # Optionally fill with pretraining data
  eval_instruct_data: "data/validation_data.jsonl"  # Optionally fill

# model
model_id_or_path: "/content/mistral_models"  # Change to downloaded path
lora:
  rank: 64

# optim
seq_len: 32768
batch_size: 1
max_steps: 300
optim:
  lr: 6.e-5
  weight_decay: 0.1
  pct_start: 0.05

# other
seed: 0
log_freq: 1
eval_freq: 100
no_eval: False
ckpt_freq: 100

save_adapters: True  # save only trained LoRA adapters. Set to `False` to merge LoRA adapter into the base model and save full fine-tuned model

run_dir: "mistral_models/mistral-7b-instruct-v0.3_trained"  # Fill

wandb:
  project: None # your wandb project name
  run_name: "" # your wandb run name
  key: "" # your wandb api key
  offline: True

"""

# Write the configuration to example.yaml
import yaml
with open('example.yaml', 'w') as file:
    yaml.dump(yaml.safe_load(config), file)
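The stats printed by `utils.validate_data` relate to these config values by simple arithmetic. The sketch below assumes that the tokens consumed per step equal `seq_len × batch_size × num_gpus`. The `num_gpus` factor is an assumption, inferred from the reported `train_tokens` of 78,643,200, which equals 8 × 300 × 32768. The function names are hypothetical.

```python
def train_tokens(seq_len: int, batch_size: int, max_steps: int, num_gpus: int = 1) -> int:
    """Total tokens consumed over a run (assumed formula, see lead-in)."""
    return seq_len * batch_size * max_steps * num_gpus


def epochs(total_train_tokens: int, data_tokens: int) -> float:
    """How many passes over the dataset the token budget corresponds to."""
    return total_train_tokens / data_tokens


# Values from the config above, on the single GPU exposed in this notebook.
single_gpu = train_tokens(seq_len=32768, batch_size=1, max_steps=300)
print(single_gpu)  # 9830400

# Reproducing the validator's reported figures requires a factor of 8.
reported = train_tokens(seq_len=32768, batch_size=1, max_steps=300, num_gpus=8)
print(reported)  # 78643200
print(f"{epochs(reported, 477199):.2f}")  # 164.80
```

With `data_tokens` of about 477k, either budget means many passes over the same data, so lowering `max_steps` or `seq_len` may be worth considering for small datasets.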


In [19]:
# make sure the run_dir has not been created before
# only run this if a previous torchrun already created the run_dir
# ! rm -r /content/mistral-finetune/mistral_models/mistral-7b-instruct-v0.3_trained

import os
os.environ["WANDB_MODE"] = "disabled"


In [23]:
# start training
!rm -rf /content/mistral-finetune/mistral_models/mistral-7b-instruct-v0.3_trained

!torchrun -m train example.yaml

2025-03-06 18:49:13.240085: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1741286953.260215    7490 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1741286953.266381    7490 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-06 18:49:13.287159: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
args: TrainArgs(data=DataArgs(data='', shuffle=False, instruct_data='data/test_data.jsonl', eval_instruct_data='data/
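With `ckpt_freq: 100` and `max_steps: 300`, the run is expected to leave several checkpoints under `run_dir`. This is a minimal sketch for picking the most recent adapter file, assuming the layout `checkpoints/checkpoint_<step>/consolidated/lora.safetensors` used in the inference cell; the helper name `latest_lora_checkpoint` is hypothetical.

```python
from pathlib import Path


def latest_lora_checkpoint(run_dir):
    """Return the lora.safetensors path of the highest-numbered checkpoint,
    or None if no checkpoint has been written yet."""
    pattern = "checkpoints/checkpoint_*/consolidated/lora.safetensors"
    # Step numbers are zero-padded, so lexicographic order matches step order.
    candidates = sorted(Path(run_dir).glob(pattern))
    return candidates[-1] if candidates else None


run_dir = "/content/mistral-finetune/mistral_models/mistral-7b-instruct-v0.3_trained"
print(latest_lora_checkpoint(run_dir))
```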

## Inference

In [None]:
!pip install mistral_inference

Collecting mistral_inference
  Downloading mistral_inference-1.1.0-py3-none-any.whl (21 kB)
Installing collected packages: mistral_inference
Successfully installed mistral_inference-1.1.0


In [None]:
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


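tokenizer = MistralTokenizer.from_file("/content/mistral_models/tokenizer.model.v3")  # change to your downloaded tokenizer file
model = Transformer.from_folder("/content/mistral_models")  # change to your downloaded model dir
# load the LoRA adapter written under run_dir during training (a checkpoint is saved every ckpt_freq=100 steps)
model.load_lora("/content/mistral-finetune/mistral_models/mistral-7b-instruct-v0.3_trained/checkpoints/checkpoint_000100/consolidated/lora.safetensors")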

completion_request = ChatCompletionRequest(messages=[UserMessage(content="Create a workflow JSON action for this instruction: Filter the 'temperatures' table to include only values greater than 36.")])

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)

Machine learning is a subset of artificial intelligence that involves the use of algorithms to learn from data and make predictions or decisions without being explicitly programmed. It is a type of computer science that enables machines to learn and improve from experience without being explicitly programmed. Machine learning algorithms can learn from data and make predictions or decisions based
