# mistral-unslothify

#### NOTE
This notebook must be run on a GPU. That can be a local machine with NVIDIA drivers and CUDA properly installed, Google Colab with a free GPU runtime (although the free quota runs out quickly), or any other cloud machine where the `!nvcc --version` cell below checks out ✅.

## 1. Setup and installations

Check that a GPU and the CUDA toolkit are available.

In [1]:
# check cuda version
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0


In [2]:
!pip install --upgrade pip -q

# install the PyTorch wheels built for CUDA 11.8, the closest available build for this machine
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 -q

Download `unsloth` from GitHub and install it.

In [3]:
!git clone https://github.com/unslothai/unsloth.git
!cd unsloth && pip install . -q
#!pip show unsloth

fatal: destination path 'unsloth' already exists and is not an empty directory.


Install the remaining required libraries.

In [4]:
!pip install numpy -q
!pip install bitsandbytes -q
!pip install unsloth-zoo -q
!pip install xformers -q

Imports

In [5]:
import torch
import json
from unsloth import FastLanguageModel
from unsloth import is_bfloat16_supported
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import TrainingArguments
from trl import SFTTrainer

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


## 2. Swedish Meatball Check with pre-trained Mistral 7B

In [8]:
# Download the raw model and see what it knows about Swedish meatballs
import os
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import login

# log this Jupyter session in to Hugging Face
# keep the token out of the notebook: read it from the HF_TOKEN environment variable
# (a 'read' token is enough)
huggingface_token = os.environ["HF_TOKEN"]
login(huggingface_token)

raw_model_name = "mistralai/Mistral-7B-v0.1"
raw_tokenizer = AutoTokenizer.from_pretrained(raw_model_name)
raw_model = AutoModelForCausalLM.from_pretrained(raw_model_name)

In [14]:
prompt = "Tell me about Swedish meatballs."

# Assign the eos_token as the pad_token
raw_tokenizer.pad_token = raw_tokenizer.eos_token

# Prepare the model inputs with padding and attention mask
inputs = raw_tokenizer(prompt, return_tensors="pt", padding=True, truncation=True)

# Extract input IDs and attention mask
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]

# Generate output using both input IDs and attention mask
output = raw_model.generate(input_ids=input_ids, attention_mask=attention_mask, max_length=100)

# Decode the response
response = raw_tokenizer.decode(output[0], skip_special_tokens=True)

# Remove the prompt from the response
response_without_prompt = response[len(prompt):].strip()
print(response_without_prompt)

I’m not sure if I’ve ever had them. I’ve had meatballs, but I don’t think I’ve ever had Swedish meatballs.

I’ve had Swedish meatballs. I’ve had them at Ikea.

I’ve had them at Ikea.

I’ve had them at Ikea.

I’ve had them
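The cell above strips the prompt with `response[len(prompt):]`, which assumes the decoded text begins with the prompt character for character. A slightly more defensive sketch (the helper name `strip_prompt` is hypothetical, not part of any library) only removes the prefix when it actually matches:

```python
def strip_prompt(response: str, prompt: str) -> str:
    # Only cut the prompt if the decoded text really starts with it;
    # blind slicing by len(prompt) would eat real output if decoding
    # changed the prompt's spacing.
    if response.startswith(prompt):
        response = response[len(prompt):]
    return response.strip()


print(strip_prompt("Tell me about Swedish meatballs. They are small.",
                   "Tell me about Swedish meatballs."))
```

An alternative that avoids string matching altogether is to decode only the newly generated tokens, e.g. `raw_tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)`, since a decoder-only model's output starts with the prompt tokens.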


## 3. Download the Unsloth Mistral 7B Instruct model and prepare it for fine-tuning

In [15]:
model_name = "unsloth/mistral-7b-instruct-v0.3"

In [16]:
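model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = 2048, # longest sequence the model will be set up to handle
    dtype = None,          # None = auto-detect (bfloat16 where supported, else float16)
    load_in_4bit = True    # quantize the weights to 4 bits to save GPU memory
)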

==((====))==  Unsloth 2024.11.9: Fast Mistral patching. Transformers = 4.46.3.
   \\   /|    GPU: NVIDIA A10. Max memory: 21.975 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu118. CUDA = 8.6. CUDA Toolkit = 11.8.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
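Why `load_in_4bit = True` matters on a 22 GB A10 can be checked with rough arithmetic. The sketch below assumes about 7.2 billion parameters for Mistral 7B and counts weights only; the raw checkpoint's two safetensors shards (9.94 GB + 4.54 GB) are consistent with the 16-bit figure.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    # Weights only: ignores activations, KV cache, optimizer state and the
    # small per-block scaling constants that 4-bit quantization also stores.
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 7.2e9  # assumed, approximate parameter count of Mistral 7B

for label, bits in (("fp32", 32), ("bf16/fp16", 16), ("4-bit", 4)):
    print(f"{label:>9}: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

Roughly 3.6 GB of 4-bit weights leaves most of the GPU free for LoRA training, whereas a 16-bit copy alone would take about 14.4 GB.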


#### Add LoRA adapters

In [17]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    random_state = 1337,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.11.9 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
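The "Number of trainable parameters = 41,943,040" reported later by the trainer follows directly from `r = 16` and the seven target modules. A minimal sketch, assuming the standard Mistral 7B dimensions (hidden size 4096, MLP size 14336, 8 key/value heads of size 128, 32 layers):

```python
# Assumed Mistral 7B dimensions
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # k_proj / v_proj output size with 8 key/value heads
NUM_LAYERS = 32
RANK = 16

# (in_features, out_features) of each adapted projection in one decoder layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    # LoRA adds A (r x in_features) and B (out_features x r); with
    # bias = "none" no other parameters are trained.
    return r * in_features + out_features * r


per_layer = sum(lora_params(i, o, RANK) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
print(per_layer, total)
```

Doubling `r` doubles the adapter size, which is one reason the suggested ranks (8 to 128) trade memory and speed against capacity.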


## 4. Prepare fine-tuning data

Read meatball facts from `domain.jsonl`, a JSON Lines file in which each line is a JSON object with a `text` field.


In [18]:
jsonl_file = "domain.jsonl"
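The real `domain.jsonl` is not shown here, so the rows below are hypothetical examples. This sketch writes and reads a file in the JSON Lines shape that `load_dataset("json", ...)` expects, using only the standard library:

```python
import json
import os
import tempfile

# Hypothetical example rows; the actual training file is not included here.
rows = [
    {"text": "Swedish meatballs (köttbullar) are often served with lingonberry jam."},
    {"text": "They are commonly eaten with mashed potatoes and cream sauce."},
]

path = os.path.join(tempfile.mkdtemp(), "domain.jsonl")

# JSON Lines: one JSON object per line, no enclosing list
with open(path, "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

# Read it back line by line, skipping blank lines
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(len(loaded), loaded[0]["text"])
```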

Load and reformat the domain data.

In [19]:
# use the dataset loader by Hugging Face and some formatting functions
dataset = load_dataset("json", data_files=jsonl_file, split="train")

# use the EOS token for padding as well
tokenizer.pad_token = tokenizer.eos_token

def format_text(examples):
    # mark the end of every note with the end-of-sequence token
    texts = [note + tokenizer.eos_token for note in examples["text"]]
    return {"text": texts}

dataset = dataset.map(format_text, batched=True)
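With `batched=True`, `dataset.map` hands the function a dict that maps each column name to a list of values for the whole batch, and expects a dict of lists back. A plain-Python sketch of that contract, assuming `"</s>"` as the EOS string (the Mistral tokenizer's usual value; the notebook reads it from `tokenizer.eos_token` instead):

```python
EOS = "</s>"  # assumed EOS string; the notebook uses tokenizer.eos_token


def format_text(examples: dict) -> dict:
    # examples["text"] is a list holding every note in the current batch
    return {"text": [note + EOS for note in examples["text"]]}


batch = {"text": ["Köttbullar are small.", "Serve with lingonberries."]}
print(format_text(batch))
```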

Tokenize the domain data with the pre-trained model's tokenizer.

In [20]:
# reuse the tokenizer returned by FastLanguageModel.from_pretrained

# Tokenize the text, truncating and padding every example to exactly 512 tokens
def tokenize_texts(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=512)

tokenized_dataset = dataset.map(tokenize_texts, batched=True)
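What `truncation=True, padding="max_length"` does to each example can be sketched without a tokenizer. The token ids below are made up, `pad_id=2` mirrors the notebook's choice of EOS as padding (id 2 is assumed), and right-side padding is assumed; the real side is set by `tokenizer.padding_side`.

```python
from typing import List, Tuple


def pad_to_max_length(ids: List[int], max_length: int,
                      pad_id: int) -> Tuple[List[int], List[int]]:
    """Sketch of truncation plus right-side padding to a fixed length."""
    ids = ids[:max_length]                          # truncate long sequences
    n_pad = max_length - len(ids)
    input_ids = ids + [pad_id] * n_pad              # fill the rest with pad ids
    attention_mask = [1] * len(ids) + [0] * n_pad   # 1 = real token, 0 = padding
    return input_ids, attention_mask


ids, mask = pad_to_max_length([1, 415, 2], max_length=6, pad_id=2)
print(ids)   # [1, 415, 2, 2, 2, 2]
print(mask)  # [1, 1, 1, 0, 0, 0]
```

Padding every row to 512 keeps the tensors rectangular, at the cost of computing over many pad positions for short notes.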

In [21]:
# Remove unneeded columns and set format for PyTorch
tokenized_dataset = tokenized_dataset.remove_columns(["text"])  # Keep only tokenized columns
tokenized_dataset.set_format(type="torch")
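Since no data collator is passed to `SFTTrainer`, it presumably falls back to something like transformers' `DataCollatorForLanguageModeling` with `mlm=False`, which copies `input_ids` into `labels` and replaces every position equal to `pad_token_id` with `-100`, the index that PyTorch's cross-entropy loss skips by default. That fallback is an assumption about the TRL and Unsloth versions used here. A plain-Python sketch of the masking, with an assumed EOS id of 2:

```python
from typing import List

IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss


def causal_lm_labels(input_ids: List[int], pad_id: int) -> List[int]:
    # Labels are a copy of input_ids with every pad-token position masked out
    return [IGNORE_INDEX if tok == pad_id else tok for tok in input_ids]


EOS_ID = 2  # assumed id; in this notebook the pad token is the EOS token
example = [1, 415, 603, EOS_ID, EOS_ID, EOS_ID]  # text, appended EOS, padding
print(causal_lm_labels(example, pad_id=EOS_ID))
```

Under that assumption, setting `pad_token = eos_token` also masks the EOS appended in `format_text`, so the loss never rewards predicting the end of a note.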

## 5. Fine-tune the model

We want the *training loss* to decrease. A loss of around 2 to 3 is reasonable. If it gets close to 1.0 or drops below it, the model's predictions become highly confident, but the risk of overfitting also grows: the model has learned the training data too well and may not perform as well on unseen data.

*See `README.md` for details about which parameters to tweak to avoid overfitting.*
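The training loss is the mean per-token cross-entropy (natural log), so it converts directly into perplexity and into the geometric-mean probability the model assigns to each correct next token. A small sketch of that conversion:

```python
import math


def perplexity(loss: float) -> float:
    # exp of the mean per-token cross-entropy (in nats)
    return math.exp(loss)


def mean_token_prob(loss: float) -> float:
    # geometric-mean probability given to each correct next token
    return math.exp(-loss)


for loss in (3.0, 2.5, 1.0):
    print(f"loss={loss:.1f}  perplexity={perplexity(loss):5.2f}  p={mean_token_prob(loss):.3f}")
```

A loss of 2.5 corresponds to a perplexity of about 12.2 (roughly 8% probability per correct token), while a loss of 1.0 corresponds to about 2.7 (roughly 37%), which is why values near 1.0 signal very confident predictions on the training text.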

In [22]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized_dataset,
    max_seq_length=1024,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=0.00001, # <<<<<<< THE HIGHER THE RATE THE FASTER TO OVERFIT
        warmup_steps=5,
        num_train_epochs=1,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.3,
        lr_scheduler_type="linear",
        seed=1337,
        output_dir="outputs",
        report_to="none",
    ),
)

trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 500 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 62
 "-____-"     Number of trainable parameters = 41,943,040


| Step | Training Loss |
|-----:|--------------:|
| 1 | 2.5831 |
| 2 | 2.8371 |
| 3 | 2.8914 |
| 4 | 2.8819 |
| 5 | 2.7523 |
| 6 | 3.0473 |
| 7 | 2.7902 |
| 8 | 2.4909 |
| 9 | 2.6866 |
| 10 | 2.494 |
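The banner's "Total batch size = 8 | Total steps = 62" can be reproduced by hand: 2 examples per device times 4 accumulation steps gives 8, and 500 examples give 250 batches, of which every 4 make one optimizer step. The sketch below assumes the Hugging Face `Trainer` convention of keeping the last partial batch and floor-dividing by the accumulation steps; the helper name is hypothetical.

```python
import math


def total_optimizer_steps(num_examples: int, per_device_batch: int,
                          grad_accum: int, num_epochs: int,
                          num_devices: int = 1) -> int:
    # Batches per epoch, keeping the last partial batch (drop_last=False)
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    # One optimizer step per grad_accum batches, rounded down
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return steps_per_epoch * num_epochs


print(total_optimizer_steps(500, per_device_batch=2, grad_accum=4, num_epochs=1))
```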


## 6. Save the fine-tuned model

In [41]:
name_for_fine_tuned_model = "Mistral_MeatBallz"

In [56]:
# Save the fine-tuned LoRA adapter weights (the base model weights are not included)
trainer.model.save_pretrained(name_for_fine_tuned_model)

# Save the tokenizer alongside the adapter
tokenizer.save_pretrained(name_for_fine_tuned_model)

('Mistral_MeatBallz/tokenizer_config.json',
 'Mistral_MeatBallz/special_tokens_map.json',
 'Mistral_MeatBallz/tokenizer.model',
 'Mistral_MeatBallz/added_tokens.json',
 'Mistral_MeatBallz/tokenizer.json')

Now, ask the fine-tuned model about Swedish meatballs...