# Mistral 7B fine-tuning
https://www.datacamp.com/tutorial/mistral-7b-tutorial

https://www.kaggle.com/code/lmaddalena/mistral-7b-fine-tuning/edit

In [1]:
# Update and install the necessary Python libraries

%pip install -U bitsandbytes
%pip install -U transformers
%pip install -U peft
%pip install -U accelerate
%pip install -U trl
%pip install -U datasets

Collecting bitsandbytes
  Obtaining dependency information for bitsandbytes from https://files.pythonhosted.org/packages/9b/63/489ef9cd7a33c1f08f1b2be51d1b511883c5e34591aaa9873b30021cd679/bitsandbytes-0.42.0-py3-none-any.whl.metadata
  Downloading bitsandbytes-0.42.0-py3-none-any.whl.metadata (9.9 kB)
Downloading bitsandbytes-0.42.0-py3-none-any.whl (105.0 MB)
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.42.0
Note: you may need to restart the kernel to use updated packages.
Collecting peft
  Obtaining dependency information for peft from https://files.pythonhosted.org/packages/8b/1b/aee2a330d050c493642d59ba6af51f3910cb138ea48ede228c84c204a5af/peft-0.7.1-py3-none-any.whl.metadata
  Downloading peft-0.7.1-py3-none-any.whl.metadata (25 kB)

In [2]:
# Load the modules needed for fine-tuning.

import os

import torch
import wandb
from datasets import load_dataset
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, HfArgumentParser,
    TrainingArguments, logging, pipeline,
)
from trl import SFTTrainer



In [3]:
# Read the Hugging Face and Weights & Biases API keys stored as Kaggle secrets.

from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_hf = user_secrets.get_secret("HUGGINGFACE_TOKEN")
secret_wandb = user_secrets.get_secret("wandb")

In [4]:
# Log in to the Hugging Face Hub so the fine-tuned model can be saved and pushed later.

!huggingface-cli login --token $secret_hf

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [5]:
# Initialize a Weights & Biases run to monitor training metrics.

wandb.login(key=secret_wandb)
run = wandb.init(
    project='Fine tuning mistral 7B',
    job_type="training",
    anonymous="allow"
)

wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: lmaddalena. Use `wandb login --relogin` to force relogin


In this section, we set the base model, the dataset, and the name of the new model. The new model name is used when saving the fine-tuned model.

Note: If you are using the free version of Colab, load the sharded version of the model (someone13574/Mistral-7B-v0.1-sharded) instead.

You can also load the model directly from the Hugging Face Hub by using its name as the base model: mistralai/Mistral-7B-v0.1

In [6]:
base_model = "/kaggle/input/mistral/pytorch/7b-v0.1-hf/1"
dataset_name = "mlabonne/guanaco-llama2-1k"
new_model = "mistral_7b_guanaco"

### Data Loading

In [7]:
# Load the dataset from the Hugging Face Hub and display the row at index 100.

dataset = load_dataset(dataset_name, split="train")
dataset["text"][100]


'<s>[INST] cuanto es 2x2 xD [/INST] La respuesta es 4. </s><s>[INST] puedes demostrarme matematicamente que 2x2 es 4? [/INST] En una multiplicación, el producto es el resultado de sumar un factor tantas veces como indique el otro, es decir, si tenemos una operación v · n = x, entonces x será igual a v sumado n veces o n sumado v veces, por ejemplo, para la multiplicación 3 · 4 podemos sumar "3 + 3 + 3 + 3" o "4 + 4 + 4" y en ambos casos nos daría como resultado 12, para el caso de 2 · 2 al ser iguales los dos factores el producto sería "2 + 2" que es igual a 4 </s>'
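The row above follows a chat-style layout in which each turn is wrapped as <s>[INST] question [/INST] answer </s>. As a minimal sketch of that layout, the helper below rebuilds such a string from (user, assistant) pairs. The function name format_conversation is hypothetical and not part of any library; the exact spacing is inferred from the sample row only.

```python
from typing import List, Tuple


def format_conversation(turns: List[Tuple[str, str]]) -> str:
    """Render (user, assistant) turns in the <s>[INST] ... [/INST] ... </s> layout.

    Hypothetical helper: the spacing mirrors the sample row shown above.
    """
    return "".join(
        f"<s>[INST] {user.strip()} [/INST] {assistant.strip()} </s>"
        for user, assistant in turns
    )


# First turn of the sample row, rebuilt from its two parts.
example = format_conversation([("cuanto es 2x2 xD", "La respuesta es 4.")])
print(example)
```

Your own instruction data can be rendered the same way before training, so that it matches the format the model sees in this dataset.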

### Loading the Mistral 7B model

In [8]:
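# 4-bit NF4 quantization; matrix multiplications are computed in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)
# Request 4-bit loading only through quantization_config: recent transformers
# releases may reject load_in_4bit=True passed alongside quantization_config.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.config.use_cache = False  # the KV cache is not needed for training; this also silences the warnings
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()  # trade extra compute for lower activation memory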

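To see why 4-bit loading matters on a single GPU, the sketch below gives a rough estimate of the memory taken by the weights alone. Every constant is an assumption for illustration: roughly 7.24 billion parameters, a quantization block size of 64, and one 32-bit scaling constant per block because double quantization is disabled. It ignores layers kept in higher precision, activations, optimizer state, and the LoRA adapters, so real usage will differ.

```python
# Rough weight-memory estimate; all constants are assumptions for illustration.
N_PARAMS = 7.24e9   # approximate parameter count of Mistral 7B
BLOCK_SIZE = 64     # assumed NF4 quantization block size
ABSMAX_BITS = 32    # one 32-bit scaling constant per block (no double quantization)


def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Return the size in GiB of n_params weights stored at bits_per_param."""
    return n_params * bits_per_param / 8 / 2**30


# 4 bits for the value plus the per-block scaling constant spread over the block.
bits_nf4 = 4 + ABSMAX_BITS / BLOCK_SIZE

print(f"bf16 weights: {weight_gib(N_PARAMS, 16):.1f} GiB")
print(f"nf4 weights : {weight_gib(N_PARAMS, bits_nf4):.1f} GiB")
```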

### Loading the Tokenizer

In [9]:
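tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.padding_side = 'right'  # pad on the right during training
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token
tokenizer.add_eos_token = True  # append EOS to every encoded sequence
tokenizer.add_bos_token, tokenizer.add_eos_token  # display both flags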

(True, True)
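The effect of right padding with the EOS token can be illustrated without loading the tokenizer. The sketch below is a toy stand-in, not the tokenizer's actual implementation: pad_right is a hypothetical helper, and the token ids (1 for BOS, 2 for EOS) are assumed values used only for the example.

```python
from typing import Dict, List


def pad_right(batch: List[List[int]], pad_id: int) -> Dict[str, List[List[int]]]:
    """Pad every sequence on the right to the longest length in the batch."""
    max_len = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    # 1 marks real tokens, 0 marks padding that attention should ignore.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


BOS_ID, EOS_ID = 1, 2  # assumed ids, for illustration only
padded = pad_right([[BOS_ID, 5, 6, 7, EOS_ID], [BOS_ID, 8, EOS_ID]], pad_id=EOS_ID)
print(padded["input_ids"])
print(padded["attention_mask"])
```

Because the padding id equals the EOS id here, the attention mask is the only thing that distinguishes the real end-of-sequence token from padding.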

### Adding the adapter to the model
In the next step, we add a LoRA adapter to the model. This lets us fine-tune only a small number of parameters, which makes the whole process faster and more memory-efficient. For details on each parameter, refer to the official PEFT documentation.

In [10]:
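# Prepare the quantized model for training, then attach the LoRA adapters.
model = prepare_model_for_kbit_training(model)
peft_config = LoraConfig(
    lora_alpha=16,  # scaling numerator; updates are scaled by lora_alpha / r
    lora_dropout=0.1,
    r=64,  # rank of the low-rank update matrices
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj"],
)
model = get_peft_model(model, peft_config)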
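As a back-of-the-envelope check on how small the adapter is, the sketch below counts the LoRA parameters implied by this configuration. Each adapted linear layer gains two matrices of shapes (r, in_features) and (out_features, r). The layer dimensions are assumptions taken from the published Mistral-7B-v0.1 configuration (hidden size 4096, 32 layers, key/value projection width 1024, MLP width 14336); verify them against model.config before relying on the result.

```python
# Assumed Mistral-7B-v0.1 dimensions; check model.config for the real values.
HIDDEN = 4096
KV_DIM = 1024        # 8 key/value heads x 128 head dimension
INTERMEDIATE = 14336
NUM_LAYERS = 32

R = 64
LORA_ALPHA = 16

# (in_features, out_features) of each targeted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in the A (r x in) and B (out x r) matrices of one adapter."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scaling factor (lora_alpha / r): {scaling}")
```

Under these assumptions the adapter holds about 92 million parameters, a little over 1% of the base model. Calling model.print_trainable_parameters() on the PEFT model reports the actual figure.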

### Hyperparameters

In [11]:
training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="wandb"
)
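The batch settings above determine how many optimizer steps one epoch takes. The sketch below is a simplified approximation of that arithmetic, not the Trainer's exact logic; steps_per_epoch is a hypothetical helper and a single GPU is assumed.

```python
import math


def steps_per_epoch(
    num_examples: int,
    per_device_batch_size: int,
    grad_accum_steps: int,
    num_devices: int = 1,
) -> int:
    """Approximate optimizer steps per epoch (single-GPU default is an assumption)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    return math.ceil(num_examples / effective_batch)


# 1,000 training rows, batch size 4, no gradient accumulation.
steps = steps_per_epoch(num_examples=1000, per_device_batch_size=4, grad_accum_steps=1)
print(steps)
```

With save_steps=25 and logging_steps=25, this corresponds to one checkpoint and one logged loss value every 25 steps.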

### SFT parameters
Hugging Face's TRL library offers a simple API for supervised fine-tuning (SFT) of models on your own dataset with minimal code. We pass the SFTTrainer the necessary components: the model, dataset, LoRA configuration, tokenizer, and training arguments.

In [12]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    max_seq_length=None,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,
)




### Model training

In [13]:
trainer.train()

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|------|---------------|
| 25   | 1.0346 |
| 50   | 1.6637 |
| 75   | 1.0741 |
| 100  | 1.3532 |
| 125  | 1.0504 |
| 150  | 1.2527 |
| 175  | 1.0513 |
| 200  | 1.2937 |
| 225  | 1.0038 |
| 250  | 1.3383 |


TrainOutput(global_step=250, training_loss=1.2115845413208008, metrics={'train_runtime': 6046.5916, 'train_samples_per_second': 0.165, 'train_steps_per_second': 0.041, 'total_flos': 1.874641569231667e+16, 'train_loss': 1.2115845413208008, 'epoch': 1.0})
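The numbers in this output can be cross-checked against each other. The sketch below copies the logged losses and runtime from the output above and recomputes the average loss and throughput; the 1,000-example dataset size comes from the data loading step, and the variable names are illustrative.

```python
# Values copied from the training log and TrainOutput above.
losses = [1.0346, 1.6637, 1.0741, 1.3532, 1.0504,
          1.2527, 1.0513, 1.2937, 1.0038, 1.3383]
train_runtime_s = 6046.5916
num_examples = 1000
global_step = 250

mean_loss = sum(losses) / len(losses)
samples_per_second = num_examples / train_runtime_s
steps_per_second = global_step / train_runtime_s

print(f"mean logged loss  : {mean_loss:.5f}")
print(f"samples per second: {samples_per_second:.3f}")
print(f"steps per second  : {steps_per_second:.3f}")
print(f"runtime in minutes: {train_runtime_s / 60:.1f}")
```

The mean of the ten logged values agrees with the reported training_loss, and the loss shows no clear downward trend over this single epoch.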

### Saving the fine-tuned model

In [14]:
trainer.model.save_pretrained(new_model)
wandb.finish()
model.config.use_cache = True

Run summary:

| Metric | Value |
|--------|-------|
| train/epoch | 1.0 |
| train/global_step | 250.0 |
| train/learning_rate | 0.0002 |
| train/loss | 1.3383 |
| train/total_flos | 1.874641569231667e+16 |
| train/train_loss | 1.21158 |
| train/train_runtime | 6046.5916 |
| train/train_samples_per_second | 0.165 |
| train/train_steps_per_second | 0.041 |


We can easily upload our model to the Hugging Face Hub with a single line of code, allowing us to access it from any machine.

In [None]:
# Uncomment to upload the model to the Hugging Face Hub.
# trainer.model.push_to_hub(new_model, use_temp_dir=False)

## Model evaluation
You can view system metrics and model performance by going to wandb.ai and checking the recent run.

## Inference
To run inference, we pass the model and tokenizer objects to a text-generation pipeline. We then give the pipeline a prompt formatted in the same style as the dataset, that is, wrapped as <s>[INST] prompt [/INST].

In [16]:
logging.set_verbosity(logging.CRITICAL)

prompt = "How do I find true love?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])



<s>[INST] How do I find true love? [/INST] There is no one way to find true love. It is a complex and personal journey that can take many different paths. Some people find love through friends, family, or chance encounters. Others may find love through online dating or other forms of matchmaking. Ultimately, the best way to find true love is to be open to new experiences and to be willing to put yourself out there. 

If you are looking for more specific advice, I can try to help. What are you looking for in a partner? What are your goals for a relationship? What are your values and beliefs? What are your interests and hobbies? What are your dealbreakers? What are your dealmakers? What are your dealbreakers? What are your dealmakers? What are your dealbreakers? What are your dealmakers? What are your dealbreakers? What are your dealmakers? What are your
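The pipeline returns the prompt together with the completion, and the completion is cut off once max_length tokens are reached. As a minimal sketch, the helper below keeps only the text after the last [/INST] marker. The name extract_response is hypothetical and not part of transformers.

```python
def extract_response(generated_text: str, marker: str = "[/INST]") -> str:
    """Return the text after the last marker, or the whole text if it is absent."""
    _, found, tail = generated_text.rpartition(marker)
    return tail.strip() if found else generated_text.strip()


sample = "<s>[INST] How do I find true love? [/INST] There is no one way to find true love."
print(extract_response(sample))
```

In the notebook, the same helper could be applied to result[0]['generated_text'].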


Let’s generate the response for another prompt.

In [17]:
prompt = "What is Datacamp Career track?"
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])

<s>[INST] What is Datacamp Career track? [/INST] Datacamp Career track is a program that helps you learn data science and data engineering skills. It provides you with a personalized learning path, projects, and mentorship to help you advance your career. 

You can learn more about it here: https://www.datacamp.com/career-track 

Does this answer your question? 😊 

If you have any other questions, feel free to ask! 😊 

I'm here to help! 😊 

If you have any other questions, feel free to ask! 😊 

I'm here to help! 😊 

If you have any other questions, feel free to ask! 😊 

I'm here to help! 😊 

If you have any other questions, feel free to ask! 😊 


