# **Fine-tuning TheBloke/Mistral-7B-Instruct-v0.1-GGUF on a psychology dataset**

### **Install all required libraries**

In [None]:
!pip install -q -U bitsandbytes
# !pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install transformers==4.31
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets
!pip install evaluate
!pip install -qqq trl==0.7.1

### **Import all required modules**

In [None]:
import torch
import time
import evaluate
import pandas as pd
import numpy as np
from datasets import Dataset, load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          HfArgumentParser, TrainingArguments, pipeline, logging)
import random
from peft import LoraConfig, PeftModel, AutoPeftModelForCausalLM, prepare_model_for_kbit_training
from trl import SFTTrainer

### **Download the psychology dataset from the Hugging Face Hub**

In [None]:
psychology_dataset = "jkhedri/psychology-dataset"

# load only the "train" split of the dataset
dataset = load_dataset(psychology_dataset, split = "train")
dataset

Dataset({
    features: ['question', 'response_j', 'response_k'],
    num_rows: 9846
})

### **Specify the base model to fine-tune and the name of the fine-tuned model**

In [None]:
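# the base model that we fine-tune
# NOTE: GGUF repositories store llama.cpp-format weights. With transformers 4.31,
# AutoModelForCausalLM.from_pretrained expects a standard Hugging Face checkpoint,
# which is the likely cause of the OSError shown after the model-loading cell.

# model_name =  "TheBloke/Mistral-7B-Instruct-v0.1-GGUF"
model_name =  "TheBloke/Llama-2-7B-GGUF"

# name under which the fine-tuned adapter is saved
psychology_finetune_model = "psychology_chatbot"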



### **BitsAndBytesConfig**
- **load_in_4bit:** Loads the model with 4-bit quantized weights. When training on top of a 4-bit base model (e.g. using LoRA adapters), use bnb_4bit_quant_type='nf4'.

- **Note:** Once a model has been loaded in 4-bit, it is currently not possible to push the quantized weights to the Hub. The 4-bit weights themselves cannot be trained either, as this is not supported yet. However, a 4-bit model can serve as a frozen base for training extra parameters, such as the LoRA adapters configured later in this notebook.

- **Training:** According to the QLoRA paper, 4-bit base models (e.g. using LoRA adapters) should be trained with bnb_4bit_quant_type='nf4'.

- **NF4 (Normal Float 4) data type:** A 4-bit data type adapted for weights that have been initialized from a normal distribution. It is selected with bnb_4bit_quant_type='nf4', as in the configuration cell that follows.

- **Use nested quantization for more memory-efficient inference**<br>
We also advise users to use the nested quantization technique, enabled with bnb_4bit_use_double_quant=True. It saves additional memory at no extra performance cost. From our empirical observations, this enables fine-tuning a llama-13b model on an NVIDIA T4 16GB with a sequence length of 1024, a batch size of 1, and 4 gradient accumulation steps.
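
As a rough illustration of why 4-bit loading and nested quantization save memory, the sketch below estimates weight storage only. It assumes the block sizes described in the QLoRA paper (64 weights per quantization block, 256 first-level constants per second-level block) and a hypothetical model with 7 billion parameters. Activations, LoRA adapters, and optimizer state are not counted, so real usage will be higher.

```python
# Back-of-the-envelope estimate of weight storage for a 4-bit quantized model.
# Assumptions (not taken from this notebook): block sizes as described in the
# QLoRA paper and a hypothetical 7-billion-parameter model.

def bits_per_param(double_quant: bool, block_size: int = 64,
                   nested_block_size: int = 256) -> float:
    """Average number of bits stored per weight, including quantization constants."""
    weight_bits = 4.0
    if double_quant:
        # one 8-bit constant per block, plus one fp32 constant per nested block
        overhead = 8.0 / block_size + 32.0 / (block_size * nested_block_size)
    else:
        # one fp32 constant per block
        overhead = 32.0 / block_size
    return weight_bits + overhead


def weight_gib(n_params: int, bits: float) -> float:
    """Convert a parameter count and bits-per-parameter into GiB."""
    return n_params * bits / 8 / 2**30


n_params = 7_000_000_000  # hypothetical model size

print(f"fp16 weights:            {weight_gib(n_params, 16.0):.2f} GiB")
for dq in (False, True):
    bits = bits_per_param(dq)
    print(f"4-bit, double_quant={dq!s:5}: {bits:.3f} bits/param, "
          f"{weight_gib(n_params, bits):.2f} GiB")
```

The saving from nested quantization is small per parameter, but it comes without changing the 4-bit weights themselves, which is why it is usually left on.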



In [None]:
compute_dtype = getattr(torch, "float16")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=True,
)

### **Check GPU compatibility and load base model**

device_map="auto" is usually good enough: 🤗 Accelerate first fills the available space on your GPU(s), then places the remaining weights in CPU RAM, and finally, if there is not enough RAM, offloads them to disk (by far the slowest option).

In [None]:
# check GPU compatibility with bfloat16 (only meaningful when a CUDA device is present)
if compute_dtype == torch.float16 and torch.cuda.is_available():
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)


# load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config = bnb_config,
    device_map = "auto"
)

model.config.use_cache = False
model.config.pretraining_tp = 1

(…)Llama-2-7B-GGUF/resolve/main/config.json:   0%|          | 0.00/29.0 [00:00<?, ?B/s]

OSError: ignored

### **Every model has its own tokenizer, so we load the tokenizer that matches the model used for fine-tuning**

In [None]:
# load Llama tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'



## **Common LoRA parameters in PEFT**
- Instantiate a base model.
- Create a configuration (LoraConfig) where you define LoRA-specific parameters.
- Wrap the base model with get_peft_model() to get a trainable PeftModel.
- Train the PeftModel as you normally would train the base model.

## **LoraConfig allows you to control how LoRA is applied to the base model through the following parameters:**

- **r:** The rank of the update matrices, expressed as an int. A lower rank results in smaller update matrices with fewer trainable parameters.
- **target_modules:** The modules (for example, attention blocks) to which the LoRA update matrices are applied.
- **lora_alpha:** The LoRA scaling factor.
- **bias:** Specifies whether the bias parameters should be trained. Can be 'none', 'all' or 'lora_only'.
- **modules_to_save:** List of modules apart from the LoRA layers to be set as trainable and saved in the final checkpoint. These typically include the model's custom head that is randomly initialized for the fine-tuning task.
- **layers_to_transform:** List of layers to be transformed by LoRA. If not specified, all layers in target_modules are transformed.
- **layers_pattern:** Pattern to match layer names in target_modules, if layers_to_transform is specified. By default, PeftModel looks at common layer patterns (layers, h, blocks, etc.); use this for exotic and custom models.
- **rank_pattern:** A mapping from layer names or regular expressions to ranks that differ from the default rank specified by r.
- **alpha_pattern:** A mapping from layer names or regular expressions to alphas that differ from the default alpha specified by lora_alpha.
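
To give a feel for how r and lora_alpha interact, the sketch below counts the trainable parameters LoRA adds and computes the scaling applied to the update. The model shape is an assumption for illustration only: a hidden size of 4096, 32 transformer layers, and two square attention projections adapted per layer. None of these values are read from the notebook's model, and the notebook's LoraConfig does not set target_modules explicitly.

```python
# Counting LoRA parameters for an assumed model shape (illustrative values only).

def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds two matrices per adapted layer: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


hidden_size = 4096       # assumption: typical for a 7B Llama-style model
num_layers = 32          # assumption
targets_per_layer = 2    # assumption: two square attention projections per layer

r = 64                   # rank used in this notebook's LoraConfig
lora_alpha = 16          # alpha used in this notebook's LoraConfig

per_module = lora_trainable_params(hidden_size, hidden_size, r)
full_module = hidden_size * hidden_size
total = per_module * targets_per_layer * num_layers

# With the default (non rank-stabilized) scaling, the low-rank update is multiplied by alpha / r
scaling = lora_alpha / r

print(f"LoRA params per adapted module: {per_module:,}")
print(f"Share of the full weight matrix: {per_module / full_module:.2%}")
print(f"Total trainable LoRA params:     {total:,}")
print(f"Update scaling (alpha / r):      {scaling}")
```

Under these assumptions, raising r increases the number of trainable parameters linearly, while raising lora_alpha only changes how strongly the learned update is weighted.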

In [None]:
# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM"
)

### **Training Arguments**

In [None]:


# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results1"

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False



# set training parameter
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001 ,                   # Weight decay to apply to all layers except bias/LayerNorm weights
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=0.3,
    warmup_ratio=0.03,                      # Ratio of steps for a linear warmup (from 0 to learning rate)
    group_by_length=True,                   # Group sequences of similar length into batches; saves memory and speeds up training considerably
    lr_scheduler_type="cosine",             # Learning rate schedule
    report_to="tensorboard"
)
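
The arithmetic behind these settings can be sketched as follows. This is an approximation of how the number of optimizer steps comes out, assuming a single GPU and the 9846 training rows reported for the dataset; the exact rounding inside the Trainer may differ between library versions.

```python
import math

# Values taken from the cells of this notebook
num_examples = 9846
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_train_epochs = 1
warmup_ratio = 0.03

num_devices = 1  # assumption: a single GPU, as on a typical Colab runtime

# Number of examples that contribute to one optimizer update
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Mini-batches per epoch, then optimizer updates per epoch
batches_per_epoch = math.ceil(num_examples / (per_device_train_batch_size * num_devices))
updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
total_updates = math.ceil(num_train_epochs * updates_per_epoch)

# Steps spent linearly warming up the learning rate
warmup_steps = math.ceil(total_updates * warmup_ratio)

print(f"effective batch size: {effective_batch}")
print(f"optimizer updates:    {total_updates}")
print(f"warmup steps:         {warmup_steps}")
```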


### **Dataset structure format**

In [None]:
# The dataset has several fields (question, response_j, response_k), so we combine them
# into a single training text per example. The trainer calls this function with a batch,
# i.e. a dict that maps each column name to a list of values.


def formatting_prompts_func_psychology(example):
    output_texts = []
    for i in range(len(example['question'])):
        text = f"### Question: {example['question'][i]}\n ### Response J: {example['response_j'][i]}\n ### Response K: {example['response_k'][i]}"
        output_texts.append(text)
    # return one formatted string per example (the debug print of the whole batch was removed)
    return output_texts
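
To see what the trainer receives, the following self-contained sketch applies the same template to a tiny hand-written batch. The function name format_batch and the sample rows are hypothetical and only mirror the column layout of the dataset.

```python
# Stand-alone illustration of the batched formatting step.
# The rows below are made-up placeholders, not taken from the dataset.

def format_batch(example):
    """Turn a batch (dict of column name -> list of values) into a list of prompt strings."""
    texts = []
    for q, j, k in zip(example["question"], example["response_j"], example["response_k"]):
        texts.append(f"### Question: {q}\n ### Response J: {j}\n ### Response K: {k}")
    return texts


batch = {
    "question": ["placeholder question 1", "placeholder question 2"],
    "response_j": ["placeholder answer J1", "placeholder answer J2"],
    "response_k": ["placeholder answer K1", "placeholder answer K2"],
}

texts = format_batch(batch)
for t in texts:
    print(t)
    print("-" * 40)
```

Note that with this template both responses end up in every training text, so the model learns to produce the two of them in sequence.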

### **Start training with supervised fine-tuning (SFT)**

In [None]:
# Set supervised fine-tuning parameters
# dataset_text_field is left unset so that the text built by formatting_func is what gets tokenized
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    formatting_func=formatting_prompts_func_psychology,
    max_seq_length=1024,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,
)


# start training the model
trainer.train()

# after training, save the trained adapter weights
trainer.model.save_pretrained(psychology_finetune_model)

### **Checking the model's results**

In [None]:
# Ignore warnings
logging.set_verbosity(logging.CRITICAL)

prompt = "I'm feeling really anxious lately and I don't know why."
pipe = pipeline(task = "text-generation", model=model , tokenizer = tokenizer, max_length = 200)
result = pipe(f"[INST] {prompt} [/INST]")
print(result[0]['generated_text'])

***The following commands are important: on Google Colab, system and GPU RAM stay allocated to the variables below, so we delete them from the runtime to free that memory.***

In [None]:
# Free the system/GPU RAM held by the model, pipeline and trainer
# so that it is available for reloading the base model.
del model
del pipe
del trainer
import gc
gc.collect()
gc.collect()
# release cached GPU memory held by PyTorch's allocator
torch.cuda.empty_cache()

### **After executing the cell above, model, pipe and trainer are removed from the runtime, so we reload the base model and merge its weights with our trained LoRA weights**

In [None]:
# Reload the model in FP16 and merge it with the LoRA weights

base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage = True,
    return_dict = True,
    torch_dtype = torch.float16,
    device_map = "auto"

)

model = PeftModel.from_pretrained(base_model, psychology_finetune_model)
model = model.merge_and_unload()


# reload the tokenizer so it can be pushed together with the merged model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

### **Push the fine-tuned model to the Hugging Face Hub. You need the following:**
**1: Create an account on Hugging Face<br>
2: Create a repo<br>
3: Create a "write access" token<br>
4: Copy the name of the repo and paste it into the code below**

In [None]:

!huggingface-cli login

model.push_to_hub("LangChain12/Final_psychologybot", check_pr=True)

tokenizer.push_to_hub("LangChain12/Final_psychologybot",check_pr=True)