# OPEN-SOURCE MODELS FROM MISTRAL

- `Mistral-7B` - A 7B transformer model, fast-deployed and easily customisable. Small, yet very powerful for a variety of use cases.
    - Performant in English and code 
    - 32k context window

- `Mixtral-8x7B` - A 7B sparse Mixture-of-Experts (SMoE) model. Uses 12.9B active parameters out of 45B total.
    - Fluent in English, French, Italian, German, Spanish, and strong in code.
    - 32k context window

- `Mixtral-8x22B` - At the time of writing, Mistral's most performant open model. A 22B sparse Mixture-of-Experts (SMoE) that uses only 39B active parameters out of 141B total.
    - Fluent in English, French, Italian, German, Spanish, and strong in code.
    - 64k context window.
    - Native function-calling capabilities.
    - Function calling and JSON mode available on Mistral's API endpoint.

- Mistral also offers `Optimized models` such as `Mistral-small`, `Mistral-large` and `Mistral-Embed`. You can find them on [Mistral's website](https://mistral.ai/technology/#models) and on [Hugging Face](https://huggingface.co/mistralai).
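The gap between active and total parameters in the SMoE models comes from sparse routing: every layer holds 8 expert feed-forward networks, but each token is sent to only 2 of them, while attention and embeddings are shared. The back-of-the-envelope count below is a sketch that assumes the dimensions published in the Hugging Face `config.json` for Mixtral-8x7B (hidden size 4096, expert FFN size 14336, 32 layers, 32 query heads and 8 KV heads, vocabulary 32000, top-2 routing, no biases). It lands at roughly 12.9B active and 46.7B total, a bit above the 45B figure quoted above.

```python
# Back-of-the-envelope parameter count for a Mixtral-8x7B-style sparse MoE.
# All dimensions are assumptions taken from the public Hugging Face config.json.
HIDDEN = 4096      # model width
FFN = 14336        # intermediate size of each expert
LAYERS = 32
N_HEADS = 32
N_KV_HEADS = 8     # grouped-query attention: fewer key/value heads
N_EXPERTS = 8
TOP_K = 2          # experts used per token
VOCAB = 32000


def count_params(experts_used: int) -> int:
    """Count the weights touched when `experts_used` experts run in every layer."""
    head_dim = HIDDEN // N_HEADS
    # q and o are HIDDEN x HIDDEN; k and v project to the smaller KV width
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * (N_KV_HEADS * head_dim)
    expert = 3 * HIDDEN * FFN        # gate, up and down projections (SwiGLU)
    router = HIDDEN * N_EXPERTS      # linear gate that scores all experts
    norms = 2 * HIDDEN               # two RMSNorm weight vectors per layer
    per_layer = attention + experts_used * expert + router + norms
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings + untied LM head
    return LAYERS * per_layer + embeddings + HIDDEN  # + final norm


total = count_params(N_EXPERTS)
active = count_params(TOP_K)
print(f"total ~ {total / 1e9:.1f}B, active per token ~ {active / 1e9:.1f}B")
# prints: total ~ 46.7B, active per token ~ 12.9B
```

The same reasoning explains why an SMoE is cheap to run per token yet still needs memory for all experts: every expert must be resident even though only two fire for a given token.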

# FINETUNING THE `Mixtral-8x7B` MODEL
- I chose the `Mixtral-8x7B` model rather than `Mistral-7B` because the preprocessed FlanV2_19k_samples dataset contains English along with other languages. `Mixtral-8x7B` is fluent in English, French, Italian, German and Spanish, so it should adapt to this data more easily, whereas `Mistral-7B` is performant mainly in English and code and would likely perform worse here (it might close the gap with a much larger training set).

In [None]:
## !pip install -r requirements.txt

In [None]:
# Log in to your Hugging Face account
from huggingface_hub import notebook_login
notebook_login()
# To log in, create a Hugging Face access token with WRITE permission.
# You can also log in from a terminal with `huggingface-cli login` and verify the account with `huggingface-cli whoami`.

In [14]:
import os
import torch
import pandas as pd
from datasets import load_dataset, Dataset
from peft import LoraConfig, AutoPeftModelForCausalLM, prepare_model_for_kbit_training, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, GPTQConfig, TrainingArguments
from trl import SFTTrainer

In [9]:
# Load the preprocessed dataset that we pushed to the Hugging Face Hub.
dataset = load_dataset("karthiksagarn/FlanV2-2024")
dataset

DatasetDict({
    train: Dataset({
        features: ['inputs', 'targets', 'task_source', 'task_name', 'template_type'],
        num_rows: 19391
    })
})

In [15]:
# Alternatively, load the local CSV file with pandas
dataset = pd.read_csv("Datasets/FlanV2_19k_samples.csv")
dataset

| Unnamed: 0 | inputs | targets | task_source | task_name | template_type |
|---|---|---|---|---|---|
| 0 | Write an article based on this summary:\n\nPur... | They should include long screws and wall ancho... | Flan2021 | gem/wiki_lingua_english_en:1.1.0 | zs_opt |
| 1 | Problem: What would be an example of an negati... | I go here about once every two weeks. They con... | Flan2021 | yelp_polarity_reviews:0.2.0 | fs_noopt |
| 2 | Input: Qingdao is located in northeast China, ... | Queens Park Rangers manager Harry Redknapp is ... | Flan2021 | cnn_dailymail:3.4.0 | fs_opt |
| 3 | Input: Steven Lippard, 7, was playing in the d... | A 21-year-old man in Chicago is charged with b... | Flan2021 | cnn_dailymail:3.4.0 | fs_opt |
| 4 | Here is a news article: In his last press conf... | – President Obama held the final press confere... | Flan2021 | multi_news:1.0.0 | zs_noopt |
| ... | ... | ... | ... | ... | ... |
| 19386 | Consider this response: Wa also occurs as a co... | DIALOG:\nWhat is a The Burning City?\n- The to... | Dialog | wiki_dialog_ii | fs_opt |
| 19387 | What came before. The bridge has been toll-fre... | -When was the New Hope Lambertville Bridge bui... | Dialog | wiki_dialog_ii | zs_opt |
| 19388 | Consider this response: To provide a high rate... | DIALOG:\nWhat was George Lawrence Stone's tech... | Dialog | wiki_dialog_ii | fs_opt |
| 19389 | Read this response and predict the preceding d... | 2-way dialog:\n+ What is the difference betwee... | Dialog | wiki_dialog_ii | zs_opt |


In [16]:
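# Reformat each row into the ###HUMAN / ###ASSISTANT prompt template the model is trained on
dataset["text"] = dataset.apply(lambda row: "###HUMAN: " + row["inputs"] + " ###ASSISTANT: " + row["targets"], axis=1)
# SFTTrainer expects a Hugging Face `Dataset`, not a pandas DataFrame
dataset = Dataset.from_pandas(dataset)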
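To check the template on a single example before formatting all ~19k rows, the same concatenation can be written as a plain function. `format_example` is a hypothetical helper name and the sample strings are made up; the function only reproduces the `###HUMAN: ... ###ASSISTANT: ...` string built in the cell above.

```python
def format_example(inputs: str, targets: str) -> str:
    """Build one training string in the ###HUMAN / ###ASSISTANT template used above."""
    return "###HUMAN: " + inputs + " ###ASSISTANT: " + targets


print(format_example("Translate to French: Good morning.", "Bonjour."))
# prints: ###HUMAN: Translate to French: Good morning. ###ASSISTANT: Bonjour.
```

At inference time the prompt would presumably stop right after the `###ASSISTANT:` marker, so the model continues with the answer it learned to produce there.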


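- The tokenizer must match the model being finetuned. A different tokenizer (for example `bert-base-uncased` or `longformer-base`) splits text with a different vocabulary, so its token IDs would be meaningless to Mixtral. We therefore use Mixtral's own tokenizer.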
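A toy illustration of why the tokenizer has to match: two made-up word-level vocabularies assign different IDs to the same sentence, so IDs produced by one are nonsense to a model trained on the other. Real tokenizers (BPE, SentencePiece, WordPiece) split text into subwords rather than whole words, but the mismatch problem is the same.

```python
# Two hypothetical word-level vocabularies standing in for two different tokenizers.
VOCAB_A = {"<unk>": 0, "the": 1, "model": 2, "learns": 3}
VOCAB_B = {"<unk>": 0, "learns": 1, "the": 2, "model": 3}


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Map each lowercased, whitespace-separated word to its ID, falling back to <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


ids_a = encode("The model learns", VOCAB_A)
ids_b = encode("The model learns", VOCAB_B)
print(ids_a, ids_b)  # prints: [1, 2, 3] [2, 3, 1]
```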

In [20]:
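# Load the Hugging Face tokenizer that ships with the base model, so token IDs match its vocabulary
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS for padding so batches can be padded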

In [None]:
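# QLoRA-style 4-bit NF4 quantization via bitsandbytes
# (GPTQConfig would additionally need a calibration `dataset` to quantize the model on the fly)
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                                         bnb_4bit_compute_dtype=torch.float16)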
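Rough arithmetic shows why the model is loaded in 4-bit at all. This sketch assumes about 46.7B total parameters (an estimate from the model's published dimensions, not a figure measured in this notebook) and counts only the weights, ignoring activations, optimizer state, and the small per-block scaling constants that 4-bit formats also store.

```python
# Weight-only memory estimate for ~46.7B parameters at different precisions.
TOTAL_PARAMS = 46_702_792_704  # assumed total parameter count for Mixtral-8x7B

BITS = {"fp32": 32, "fp16/bf16": 16, "int8": 8, "4-bit": 4}


def weight_memory_gb(n_params: int, bits: int) -> float:
    """Bytes needed for the raw weights, expressed in decimal gigabytes."""
    return n_params * bits / 8 / 1e9


for name, bits in BITS.items():
    print(f"{name:>10}: ~{weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
```

In 16-bit the weights alone need roughly 93 GB, while 4-bit brings them down to roughly 23 GB, which is what makes finetuning on a single large GPU plausible.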

In [None]:
# Model Initialization
model = AutoModelForCausalLM.from_pretrained(
                                            "mistralai/Mixtral-8x7B-Instruct-v0.1",
                                            quantization_config=quantization_config,
                                            device_map="auto"
                                            )

model.config.use_cache = False
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

In [None]:
# Initialize Parameter-Efficient Finetuning (PEFT) with a LoRA config

peft_config = LoraConfig(
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0.05,
    bias = "none",
    task_type = "CAUSAL_LM",
    target_modules = ["q_proj", "v_proj"]
)

model = get_peft_model(model, peft_config)
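With `r = 16` on `q_proj` and `v_proj`, each adapted weight W (d_out x d_in) gets two small trainable matrices, B (d_out x r) and A (r x d_in), and the update B @ A is scaled by `lora_alpha / r` (here 16 / 16 = 1.0). The sketch below counts the resulting trainable parameters, assuming Mixtral's attention shapes from its Hugging Face config (hidden size 4096, 8 KV heads of dimension 128, 32 layers).

```python
# Count LoRA trainable parameters for the LoraConfig above (r=16 on q_proj and v_proj).
# Layer shapes are assumptions from Mixtral-8x7B's config: hidden 4096, 8 KV heads x 128 dims, 32 layers.
R = 16
LORA_ALPHA = 16
LAYERS = 32
TARGETS = {  # module name -> (d_in, d_out)
    "q_proj": (4096, 4096),
    "v_proj": (4096, 8 * 128),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A is r x d_in and B is d_out x r, so one adapter adds r * (d_in + d_out) weights."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in TARGETS.values())
trainable = LAYERS * per_layer
scaling = LORA_ALPHA / R  # factor applied to B @ A before it is added to W

print(f"trainable LoRA params: {trainable:,}")  # prints: trainable LoRA params: 6,815,744
print(f"scaling: {scaling}")  # prints: scaling: 1.0
```

That is roughly 0.015% of the ~46.7B total, so gradients and optimizer state are only kept for a tiny slice of the model.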

In [None]:
training_args = TrainingArguments(
    output_dir = "Mistral_8x7B_FlanV2_Finetuned",
    hub_model_id="karthiksagarn/Mistral_8x7B_FlanV2_Finetuned",
    per_device_train_batch_size = 8,
    gradient_accumulation_steps = 1,
    optim = "paged_adamw_32bit",
    learning_rate = 2e-4,
    lr_scheduler_type = "cosine",
    save_strategy = "epoch",
    logging_steps = 100,
    num_train_epochs = 1,
    # max_steps = 250,
    fp16 = True
    )
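The schedule implied by these arguments can be worked out by hand. Assuming a single GPU and the 19,391 training rows reported when the dataset was loaded, each optimizer step consumes `per_device_train_batch_size * gradient_accumulation_steps = 8` examples, so one epoch is ceil(19391 / 8) = 2424 steps, and `logging_steps = 100` logs about 24 times. With no warmup, a cosine schedule decays the learning rate as lr(t) = 2e-4 * 0.5 * (1 + cos(pi * t / T)). This sketch reproduces that formula; it is not the `transformers` scheduler object itself.

```python
import math

# Assumptions: one GPU and 19,391 training rows (as reported by load_dataset above).
NUM_ROWS = 19_391
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 1
NUM_GPUS = 1
EPOCHS = 1
BASE_LR = 2e-4
LOGGING_STEPS = 100

examples_per_step = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
batches_per_epoch = math.ceil(NUM_ROWS / (PER_DEVICE_BATCH * NUM_GPUS))  # last batch may be partial
total_steps = math.ceil(batches_per_epoch / GRAD_ACCUM) * EPOCHS


def cosine_lr(step: int) -> float:
    """Cosine decay from BASE_LR down to 0 over total_steps, with no warmup."""
    progress = step / max(1, total_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"{examples_per_step} examples/step, {total_steps} steps, ~{total_steps // LOGGING_STEPS} log lines")
for step in (0, total_steps // 2, total_steps):
    print(step, f"{cosine_lr(step):.2e}")
```

Uncommenting `max_steps = 250` would replace T = 2424 with 250, so the same cosine curve would be compressed into a much shorter run.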

In [None]:
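trainer = SFTTrainer(
    model = model,                # already wrapped with LoRA adapters by get_peft_model, so no peft_config here
    train_dataset = dataset,
    dataset_text_field = "text",  # column holding the ###HUMAN / ###ASSISTANT strings
    max_seq_length = 512,
    tokenizer = tokenizer,
    args = training_args,
    packing=False
    )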

In [None]:
trainer.train()

# PUSHING THE TRAINED MODEL TO THE HUB

In [None]:
trainer.push_to_hub()
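After pushing, the adapter can be reloaded for a quick generation test. This is a sketch under several assumptions: that the Hub repo `karthiksagarn/Mistral_8x7B_FlanV2_Finetuned` holds the LoRA adapter saved by `trainer.push_to_hub()`, that `AutoPeftModelForCausalLM` (already imported above) resolves the base model from the adapter config, that loading the base in 4-bit via bitsandbytes is acceptable for inference, and that the prompt should end right after the `###ASSISTANT:` marker to match the training template. `build_prompt`, `extract_answer` and `generate` are hypothetical helper names; only the first two run without a GPU.

```python
HUB_ID = "karthiksagarn/Mistral_8x7B_FlanV2_Finetuned"
BASE_ID = "mistralai/Mixtral-8x7B-Instruct-v0.1"


def build_prompt(question: str) -> str:
    """Reuse the training template and leave the assistant turn open."""
    return "###HUMAN: " + question + " ###ASSISTANT:"


def extract_answer(generated: str) -> str:
    """Keep the text after the last ###ASSISTANT: marker, stopping at any new ###HUMAN: turn."""
    answer = generated.split("###ASSISTANT:")[-1]
    return answer.split("###HUMAN:")[0].strip()


def generate(question: str, max_new_tokens: int = 128) -> str:
    """Load the 4-bit base model plus the LoRA adapter and answer one question (needs a large GPU)."""
    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer, BitsAndBytesConfig

    tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
    model = AutoPeftModelForCausalLM.from_pretrained(
        HUB_ID,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16),
        device_map="auto",
    )
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return extract_answer(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

For brevity `generate` reloads the model on every call; in a real session you would load the tokenizer and model once and reuse them.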