# OPEN-SOURCE MODELS FROM MISTRAL

- `Mistral-7B` - A 7B transformer model, fast-deployed and easily customisable. Small, yet very powerful for a variety of use cases.
    - Performant in English and code
    - 32k context window

- `Mixtral-8x7B` - A 7B sparse Mixture-of-Experts (SMoE). Uses 12.9B active parameters out of 45B total.
    - Fluent in English, French, Italian, German, Spanish, and strong in code.
    - 32k context window

- `Mixtral-8x22B` - Currently the most performant open model. A 22B sparse Mixture-of-Experts (SMoE). Uses only 39B active parameters out of 141B.
    - Fluent in English, French, Italian, German, Spanish, and strong in code.
    - 64k context window.
    - Native function calling capabilities.
    - Function calling and JSON mode available on the Mistral API endpoint.

- Mistral also offers `Optimized models` such as `Mistral-small`, `Mistral-large` and `Mistral-Embed`. See [Mistral's website](https://mistral.ai/technology/#models) and [Hugging Face](https://huggingface.co/mistralai) for details.
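
To make the "active versus total parameters" figures above concrete, the short sketch below computes roughly what share of each Mixture-of-Experts model's weights is used for a single token. The parameter counts are copied from the list; the dictionary and variable names are illustrative only.

```python
# Active and total parameter counts (in billions), taken from the list above.
models = {
    "Mixtral-8x7B": (12.9, 45.0),
    "Mixtral-8x22B": (39.0, 141.0),
}

# Fraction of the weights that take part in the forward pass for one token.
active_fraction = {
    name: active / total for name, (active, total) in models.items()
}

for name, fraction in active_fraction.items():
    print(f"{name}: {fraction:.1%} of parameters active per token")
```

In both cases a bit more than a quarter of the weights are active per token, which is why these models are cheaper to run than their total size suggests.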

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# FINETUNING THE `Mistral-7B-v0.3` MODEL


In [None]:
!pip install -r /content/drive/MyDrive/requirements.txt

Collecting unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git (from -r /content/drive/MyDrive/requirements.txt (line 19))
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-aybajn90/unsloth_dfeb59268d514dddb7e87732499f998b
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-aybajn90/unsloth_dfeb59268d514dddb7e87732499f998b
  Resolved https://github.com/unslothai/unsloth.git to commit 36488b58677a2283de55aed0461df0237bb88fb8
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting peft (from -r /content/drive/MyDrive/requirements.txt (line 7))
  Downloading peft-0.11.1-py3-none-any.whl (251 kB)
     251.6/251.6 kB 7.9 MB/s eta 0:00:00
Collecting trl (from -r /content/drive/MyDrive/requ

In [None]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes

In [None]:
# Login to your huggingface account
from huggingface_hub import notebook_login
notebook_login()
# To log in, create a Hugging Face access token with WRITE permission.
# You can also log in from a terminal with `huggingface-cli login` and verify the account with `huggingface-cli whoami`.

In [None]:
import torch
import pandas as pd
from datasets import load_dataset, Dataset
from peft import LoraConfig, AutoPeftModelForCausalLM, prepare_model_for_kbit_training, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, GPTQConfig, TrainingArguments
from trl import SFTTrainer
import os
from transformers import LongformerTokenizer
from unsloth import FastLanguageModel

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


In [None]:
# Load the preprocessed dataset that we pushed to the Hugging Face Hub earlier.
dataset = load_dataset("karthiksagarn/Flan-V2-Submix-2024")
dataset

Downloading data:   0%|          | 0.00/3.45M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1197 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['inputs', 'targets', 'task_source', 'task_name', 'template_type'],
        num_rows: 1197
    })
})

In [None]:
data = dataset['train'].to_pandas()

In [None]:
data

Unnamed: 0,inputs,targets,task_source,task_name,template_type
0,Write an article based on this summary:\n\nPur...,They should include long screws and wall ancho...,Flan2021,gem/wiki_lingua_english_en:1.1.0,zs_opt
1,Problem: What would be an example of an negati...,I go here about once every two weeks. They con...,Flan2021,yelp_polarity_reviews:0.2.0,fs_noopt
2,"Input: Qingdao is located in northeast China, ...",Queens Park Rangers manager Harry Redknapp is ...,Flan2021,cnn_dailymail:3.4.0,fs_opt
3,"Input: Steven Lippard, 7, was playing in the d...",A 21-year-old man in Chicago is charged with b...,Flan2021,cnn_dailymail:3.4.0,fs_opt
4,Here is a news article: In his last press conf...,– President Obama held the final press confere...,Flan2021,multi_news:1.0.0,zs_noopt
...,...,...,...,...,...
1192,"If this is the response, what came before? Res...",2-way dialog:\n A. What was the USS Brister?;...,Dialog,wiki_dialog_ii,zs_opt
1193,See this dialog response. They were rebadged w...,A dialog between 2 people:\nPerson A: What hap...,Dialog,wiki_dialog_ii,zs_opt
1194,Read this response and predict the preceding d...,A 2 person conversation: +What is something in...,Dialog,wiki_dialog_ii,zs_opt
1195,"If this is the response, what came before? Res...",Anonymous 1) What is the proposed route of the...,Dialog,wiki_dialog_ii,zs_opt


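- Many tokenizers exist (for example bert-base-uncased or longformer-base), but a tokenizer must match the vocabulary the model was pretrained with. Since we are finetuning a Mistral model, we use Mistral's own tokenizer, which is loaded together with the model below.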

In [None]:
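max_seq_length = 2048  # maximum sequence length the model is set up for
dtype = None           # None lets Unsloth auto-detect the dtype for the GPU
load_in_4bit = True    # load 4-bit quantized weights to reduce memory use

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-v0.3",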
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

config.json:   0%|          | 0.00/1.15k [00:00<?, ?B/s]

==((====))==  Unsloth: Fast Mistral patching release 2024.7
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.26.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


model.safetensors:   0%|          | 0.00/4.14G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/137k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/587k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/560 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.96M [00:00<?, ?B/s]

Unsloth: Will load unsloth/mistral-7b-v0.3-bnb-4bit as a legacy tokenizer.


In [None]:
## Add LoRA adapter weights to the model.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # LoRA rank; any integer > 0. Suggested values: 8, 16, 32, 64, 128
    # target_modules lists the layers that receive LoRA weights: the attention
    # query, key, value and output projections, plus the MLP gate, up and down projections.
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = True, # True or "unsloth" for very long context
    random_state = 3407, # random seed, for reproducibility
    use_rslora = False,  # Rank-stabilized LoRA (rsLoRA) is supported; it scales the update by lora_alpha / sqrt(r) instead of lora_alpha / r.
    loftq_config = None, # And LoftQ
)

Unsloth 2024.7 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
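
The number of trainable parameters that this configuration produces can be checked by hand. A LoRA adapter of rank `r` on a linear layer with input width `d_in` and output width `d_out` adds `r * (d_in + d_out)` weights. The sketch below assumes the published Mistral-7B dimensions (hidden size 4096, key/value projection width 1024, MLP width 14336, 32 layers); these values are not stated in this notebook, and the variable names are illustrative.

```python
r = 16            # LoRA rank passed to get_peft_model
num_layers = 32   # assumption: number of Mistral-7B decoder layers
hidden = 4096     # assumption: model hidden size
kv_dim = 1024     # assumption: 8 key/value heads x 128 dims each
mlp_dim = 14336   # assumption: MLP intermediate size

# (input width, output width) of every linear layer that receives an adapter.
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, mlp_dim),
    "up_proj": (hidden, mlp_dim),
    "down_proj": (mlp_dim, hidden),
}

# Each adapter holds an (r x d_in) matrix and a (d_out x r) matrix.
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total_trainable = per_layer * num_layers
print(f"{total_trainable:,}")  # 41,943,040
```

This agrees with the "Number of trainable parameters = 41,943,040" line that Unsloth prints when training starts.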


In [None]:
## Reformat the dataset into the prompt template used for finetuning.
## Append the EOS (end-of-sequence) token to every example; otherwise generation will not know when to stop.

EOS_TOKEN = tokenizer.eos_token
data["text"] = data.apply(lambda row: "###HUMAN: " + row["inputs"] + " " + "###ASSISTANT: " + row["targets"] + " " + EOS_TOKEN, axis=1)

In [None]:
data["text"]

0       ###HUMAN: Write an article based on this summa...
1       ###HUMAN: Problem: What would be an example of...
2       ###HUMAN: Input: Qingdao is located in northea...
3       ###HUMAN: Input: Steven Lippard, 7, was playin...
4       ###HUMAN: Here is a news article: In his last ...
                              ...                        
1192    ###HUMAN: If this is the response, what came b...
1193    ###HUMAN: See this dialog response. They were ...
1194    ###HUMAN: Read this response and predict the p...
1195    ###HUMAN: If this is the response, what came b...
1196    ###HUMAN: Read this response and predict the p...
Name: text, Length: 1197, dtype: object
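
At inference time the prompt should generally follow the same template, minus the answer and the EOS token, so that the model sees the shape it was trained on. A minimal sketch of that idea: `build_prompt` is a hypothetical helper, not part of the notebook, and `"</s>"` is a stand-in for `tokenizer.eos_token`.

```python
def build_prompt(inputs, targets=None, eos_token="</s>"):
    """Format one example with the ###HUMAN / ###ASSISTANT template.

    With `targets`, the result is a full training string ending in EOS.
    Without it, the result is an open prompt for generation.
    """
    prompt = "###HUMAN: " + inputs + " " + "###ASSISTANT: "
    if targets is None:
        return prompt
    return prompt + targets + " " + eos_token


train_text = build_prompt("What is 2 + 2?", "4")
infer_text = build_prompt("What is 2 + 2?")
print(train_text)
print(infer_text)
```

Using one helper for both cases keeps the training and inference prompts from drifting apart.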

In [None]:
dataset = Dataset.from_pandas(data)

In [None]:
dataset

Dataset({
    features: ['inputs', 'targets', 'task_source', 'task_name', 'template_type', 'text'],
    num_rows: 1197
})

# TRAINING

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,

    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 512,
    dataset_num_proc = 1,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 8,
        warmup_steps = 5,
        num_train_epochs = 1,
        # max_steps = 60, # Set num_train_epochs = 1 for full training runs
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 25,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "/content/drive/MyDrive/results",
        hub_model_id="karthiksagarn/Mistral-FlanV2-Submix-2024",
        save_total_limit=2,
    ),
)



Map:   0%|          | 0/1197 [00:00<?, ? examples/s]
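
With `lr_scheduler_type = "linear"` and `warmup_steps = 5`, the learning rate ramps up linearly to `learning_rate` over the first 5 optimizer steps and then decays linearly to zero at the final step. The sketch below reproduces that shape in plain Python. `linear_lr` is an illustrative helper rather than a transformers function, and the total of 149 steps is the value reported in the training log for this run.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=149):
    """Learning rate after `step` optimizer updates (linear warmup, linear decay)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 5, 77, 149):
    print(step, linear_lr(step))
```

The peak value of 2e-4 is reached at step 5, and the rate has halved by step 77, the midpoint of the decay phase.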

In [None]:
trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,197 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 8
\        /    Total batch size = 8 | Total steps = 149
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
25,0.7099
50,1.0432
75,1.3793
100,1.4955
125,1.5315


TrainOutput(global_step=149, training_loss=1.2781856460059249, metrics={'train_runtime': 1277.7618, 'train_samples_per_second': 0.937, 'train_steps_per_second': 0.117, 'total_flos': 2.028105241431245e+16, 'train_loss': 1.2781856460059249, 'epoch': 0.9958228905597326})
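
The step count and the fractional epoch in the output above follow from the batch settings. A small arithmetic sketch (variable names are illustrative), assuming the trainer drops the final incomplete accumulation group, which is consistent with the reported 149 steps:

```python
num_examples = 1197
per_device_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 1

# Examples consumed per optimizer update.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Only complete accumulation groups produce an optimizer step.
total_steps = num_examples // effective_batch_size

# Share of the dataset actually seen after those steps.
epoch_fraction = total_steps * effective_batch_size / num_examples

print(effective_batch_size, total_steps, round(epoch_fraction, 4))
```

This gives an effective batch size of 8, 149 steps, and an epoch value of about 0.9958, matching the log.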

# Push Trained model to Hub

In [15]:
from huggingface_hub import notebook_login
notebook_login()

In [1]:
# Define the new model names
adapter_model = "karthiksagarn/Mistral-Flan-v2-adapters"
new_model = "karthiksagarn/Mistral-Flan-v2-finetuned"

In [2]:
import os
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
import torch
import json

# Define the paths to your checkpoint directory
checkpoint_dir = "/content/drive/MyDrive/results/checkpoint-149"  # Adjust this path as necessary

# Load the training arguments
training_args = torch.load(os.path.join(checkpoint_dir, "training_args.bin"))

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)

# Load the adapter configuration
adapter_config_path = os.path.join(checkpoint_dir, "adapter_config.json")
with open(adapter_config_path, "r") as f:
    adapter_config_dict = json.load(f)
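
# Load the model once and reuse its config; loading it a second time only to read
# the config doubles the load time and memory use. The checkpoint directory holds
# the LoRA adapter, so the base model is resolved from adapter_config.json.
ckpt_model = AutoModelForCausalLM.from_pretrained(checkpoint_dir)
config = ckpt_model.config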




config.json:   0%|          | 0.00/1.15k [00:00<?, ?B/s]

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now set to True since model is quantized.


generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

`low_cpu_mem_usage` was None, now set to True since model is quantized.


In [7]:
# Save the model
ckpt_model.save_pretrained(adapter_model)



In [8]:
# Reload the base model. This loads the full, unquantized 7B model in float16, so it needs a high-RAM environment (for example a Colab Pro runtime).
from transformers import AutoModelForCausalLM
import torch
model = AutoModelForCausalLM.from_pretrained("unsloth/mistral-7b-v0.3", device_map='cpu', trust_remote_code=True, torch_dtype=torch.float16, cache_dir="/content/drive/MyDrive/results")

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [9]:
from peft import PeftModel

# load peft model with new adapters
model = PeftModel.from_pretrained(
    model,
    adapter_model,
)

In [10]:
model = model.merge_and_unload() # merge adapters with the base model.
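
Conceptually, merging folds the low-rank update into the frozen weight: for a layer with weight `W` and adapter matrices `A` and `B`, the merged weight is `W + (lora_alpha / r) * B A`, after which the adapter is no longer needed at inference time. The toy sketch below checks this identity with plain Python lists. The matrices and helper names are made up for illustration, and the code does not call PEFT.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


r, lora_alpha = 2, 4
scale = lora_alpha / r            # standard LoRA scaling factor

W = [[1.0, 0.0, 2.0],             # frozen base weight, shape (2, 3)
     [0.0, 1.0, -1.0]]
A = [[1.0, 2.0, 0.0],             # LoRA "down" matrix, shape (r, 3)
     [0.0, 1.0, 1.0]]
B = [[0.5, 0.0],                  # LoRA "up" matrix, shape (2, r)
     [1.0, -0.5]]
x = [1.0, 2.0, 3.0]

# Unmerged forward pass: base output plus scaled adapter output.
adapter_out = matvec(B, matvec(A, x))
unmerged = [b + scale * a for b, a in zip(matvec(W, x), adapter_out)]

# Merged forward pass: fold the update into W once, then use W alone.
BA = matmul(B, A)
W_merged = [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, BA)]
merged = matvec(W_merged, x)

print(unmerged, merged)
```

Both paths give the same output, which is why the merged model can be saved and served as an ordinary checkpoint.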

In [17]:
from huggingface_hub import HfApi, HfFolder, Repository

# Set your Hugging Face token
hf_token = "your-hf-token"

# Save the token to the Hugging Face folder
HfFolder.save_token(hf_token)

# Verify the token is set correctly
api = HfApi()
user_info = api.whoami(token=hf_token)
print(user_info)

{'type': 'user', 'id': '65a7a8e9a4bc3661e2191547', 'name': 'karthiksagarn', 'fullname': 'NALLAGULA KARTHIK SAGAR', 'email': 'karthik.sagarn@gmail.com', 'emailVerified': True, 'canPay': False, 'periodEnd': None, 'isPro': False, 'avatarUrl': 'https://cdn-avatars.huggingface.co/v1/production/uploads/65a7a8e9a4bc3661e2191547/838KXYwI1Iy54Bd4RLLxP.jpeg', 'orgs': [], 'auth': {'type': 'access_token', 'accessToken': {'displayName': 'Mistral', 'role': 'write', 'createdAt': '2024-07-08T04:09:34.344Z'}}}


In [18]:
# Define your repository information
model_name = "Mistral-Flan-v2-finetuned"  # Replace with your actual model name
username = user_info['name']
repo_id = f"{username}/{model_name}"

In [21]:
# Create a repository on the Hugging Face Hub if it doesn't exist
api.create_repo(repo_id=repo_id, token=hf_token, exist_ok=True)

# Define the local directory to save the model
model_dir = "./temp_model"
os.makedirs(model_dir, exist_ok=True)

# Save the merged model and tokenizer
model.save_pretrained(model_dir)
tokenizer.save_pretrained(model_dir)

('./temp_model/tokenizer_config.json',
 './temp_model/special_tokens_map.json',
 './temp_model/tokenizer.model',
 './temp_model/added_tokens.json',
 './temp_model/tokenizer.json')

In [24]:
import shutil

# Clone the repository to a local directory
repo_url = f"https://huggingface.co/{repo_id}"
clone_dir = "./temp_repo"
repo = Repository(local_dir=clone_dir, clone_from=repo_url, use_auth_token=hf_token)

# Copy the model files to the repository directory
for filename in os.listdir(model_dir):
    src_file = os.path.join(model_dir, filename)
    dst_file = os.path.join(clone_dir, filename)
    if os.path.isdir(src_file):
        # dirs_exist_ok lets the copy succeed when the cell is re-run
        shutil.copytree(src_file, dst_file, dirs_exist_ok=True)
    else:
        shutil.copy2(src_file, dst_file)

# Push the contents to the Hugging Face Hub
repo.push_to_hub()

print(f"Model pushed to: https://huggingface.co/{repo_id}")

For more details, please read https://huggingface.co/docs/huggingface_hub/concepts/git_vs_http.
  return inner_f
/content/./temp_repo is already a clone of https://huggingface.co/karthiksagarn/Mistral-Flan-v2-finetuned. Make sure you pull the latest changes with `repo.git_pull()`.
Adding files tracked by Git LFS: ['tokenizer.json']. This may take a bit of time if the files are large.


Upload file model-00002-of-00003.safetensors:   0%|          | 1.00/4.66G [00:00<?, ?B/s]

Upload file tokenizer.json:   0%|          | 1.00/1.87M [00:00<?, ?B/s]

Upload file tokenizer.model:   0%|          | 1.00/574k [00:00<?, ?B/s]

Upload file model-00001-of-00003.safetensors:   0%|          | 1.00/4.61G [00:00<?, ?B/s]

Upload file model-00003-of-00003.safetensors:   0%|          | 1.00/4.23G [00:00<?, ?B/s]

To https://huggingface.co/karthiksagarn/Mistral-Flan-v2-finetuned
   97cc083..4f1a418  main -> main



Model pushed to: https://huggingface.co/karthiksagarn/Mistral-Flan-v2-finetuned


In [None]:
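# Alternative: push the merged model directly. `token=True` uses the saved login token
# and replaces the deprecated `use_auth_token` argument.
model.push_to_hub(new_model, token=True, max_shard_size="5GB")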

In [None]:
# Push the tokenizer
from transformers import AutoTokenizer

# tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.push_to_hub(new_model, token=True)

README.md:   0%|          | 0.00/5.17k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/587k [00:00<?, ?B/s]