## Fine-tuning an Open-Source Model Using PEFT, LoRA, 4-Bit Quantization, and TRL

The goal of this notebook is to experiment with a new way to make it easy to build a task-specific model for your use case.

To create your model, go to the first code cell and describe the model you want to build in the prompt. Be descriptive and clear.

Select a temperature (high = creative, low = precise) and the number of training examples to generate, then run all the cells.

To fine-tune a different model, change `model_name` in the `Define Hyperparameters` cell.

The data is already split into a training set (`finetuning_data_train.jsonl`) and a validation set (`finetuning_data_val.jsonl`).
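
If your prompt/response pairs start out in a single list, one way to produce the two files is a seeded random split. The following is a minimal standard-library sketch. The helper names `split_records` and `write_jsonl` are hypothetical, and the 5% default validation fraction is an assumption. It is roughly the ratio of this notebook's files. Note that multi-turn data can share prompts across rows, so a purely random split may place related rows in both sets.

```python
import json
import random

def split_records(records, val_fraction=0.05, seed=42):
    """Shuffle a copy of `records` with a seeded RNG and split it into (train, val)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # copy, so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # same seed -> same split on every run
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def write_jsonl(path, records):
    """Write one JSON object per line, the layout the loading cell expects."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            # ensure_ascii=False keeps non-English text readable in the file
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Hypothetical toy data standing in for your own prompt/response pairs
records = [{"prompt": f"question {i}", "response": f"answer {i}"} for i in range(20)]
train, val = split_records(records, val_fraction=0.1)
print(len(train), len(val))  # 18 2
```

Calling `write_jsonl('finetuning_data_train.jsonl', train)` and `write_jsonl('finetuning_data_val.jsonl', val)` would then produce files with the names the loading cell reads.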

In [9]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7 datasets einops


In [4]:
# Training and validation datasets from HF Datasets, stored as JSONL:
# one {"prompt": ..., "response": ...} object per line
# https://huggingface.co/datasets/OpenAssistant/oasst1?row=1
import pandas as pd

# lines=True parses a whole JSONL file in one call, instead of building
# one DataFrame per line and concatenating them
train_dataset = pd.read_json('finetuning_data_train.jsonl', lines=True)
valid_dataset = pd.read_json('finetuning_data_val.jsonl', lines=True)
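
The cell above hands each file straight to pandas. A quick standard-library pass can report every malformed or incomplete line before pandas sees the file. The following is a sketch with a hypothetical helper, `read_prompt_response_jsonl`, which assumes the `prompt`/`response` field names shown in the tables below.

```python
import json

def read_prompt_response_jsonl(lines):
    """Parse JSONL lines into {"prompt", "response"} dicts.

    Returns (records, errors); errors holds (line_number, reason) for skipped lines.
    """
    records, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines such as a trailing newline
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(obj, dict):
            errors.append((lineno, "not a JSON object"))
            continue
        missing = [k for k in ("prompt", "response") if not isinstance(obj.get(k), str)]
        if missing:
            errors.append((lineno, "missing or non-string: " + ", ".join(missing)))
            continue
        records.append({"prompt": obj["prompt"], "response": obj["response"]})
    return records, errors

# In-memory sample; with a real file, pass the open file object instead:
#   with open('finetuning_data_train.jsonl', encoding='utf-8') as f:
#       records, errors = read_prompt_response_jsonl(f)
sample = [
    '{"prompt": "What is 2 + 2?", "response": "4"}',
    '',
    '{"prompt": "Hi"}',
    'not json',
]
records, errors = read_prompt_response_jsonl(sample)
print(len(records), [n for n, _ in errors])  # 1 [3, 4]
```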

In [5]:
train_dataset

|       | prompt | response |
|-------|--------|----------|
| 0 | Kannst du mir alle Lab Member aus Steins;Gate ... | Alles klar. Hier ist eine Liste der Future Gad... |
| 1 | ¿Para qué sirve la etiqueta \<br> de html? | La etiqueta HTML \<br> se utiliza para insertar... |
| 2 | ¿Para qué sirve la etiqueta \<br> de html?¿La e... | La etiqueta `<br>` se puede utilizar en la may... |
| 3 | ¿Para qué sirve la etiqueta \<br> de html?¿Es c... | No. Es como la tecla "Intro" pulsada una sola ... |
| 4 | Cuál es el código para crear un gráfico en Pyt... | Para crear un gráfico en Python con la librerí... |
| ... | ... | ... |
| 20582 | Мне нужна конфигурация компьютера, на котором ... | Предоставленная конфигурация сможет запустить ... |
| 20583 | Мне нужна конфигурация компьютера, на котором ... | ДОГОВОР НА РАЗРАБОТКУ ВЕБ-САЙТА\n\nМежду _____... |
| 20584 | ¿Cómo recuperar una batería de litio que se ha... | Lo primero sería asegurarse de que el fallo es... |
| 20585 | ¿Cómo recuperar una batería de litio que se ha... | Un placer, cualquier otra consulta estaré aquí... |


In [6]:
valid_dataset

|      | prompt | response |
|------|--------|----------|
| 0 | What are some options of foods that I can make... | 1. Pizza (flat dough with tomato and cheese an... |
| 1 | What are some options of foods that I can make... | If you are looking for lactose-free options fo... |
| 2 | What are some options of foods that I can make... | Before we get the ingredients together, you ha... |
| 3 | How to code the snake game in C that will be s... | Here is a basic outline of how to code the sna... |
| 4 | How to code the snake game in C that will be s... | Based on our previous conversation, it is uncl... |
| ... | ... | ... |
| 1090 | Quien invento la nitroglicerina? | La nitroglicerina fue inventada por el químico... |
| 1091 | ¿Por qué aprender Excel importante como ingeni... | Aprender Microsoft Excel es importante para lo... |
| 1092 | ¿Por qué aprender Excel importante como ingeni... | Sí, existen varias herramientas como Microsoft... |
| 1093 | ¿Por qué aprender Excel importante como ingeni... | Aprender Excel es importante para ingenieros i... |

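
The tables above show 20,586 training rows (indices 0 to 20585) and 1,094 validation rows. The following is a rough sketch of what these sizes imply for the schedule set in the `Define Hyperparameters` cell: batch size 1, no gradient accumulation, 2 epochs, one GPU, and `eval_steps=5`. The helper `training_schedule` is hypothetical and approximates a Trainer-style loop. It ignores `max_steps` and `drop_last`.

```python
import math

def training_schedule(n_train, n_eval, batch_size, grad_accum, epochs, eval_steps, n_devices=1):
    """Approximate optimizer-step and evaluation counts for a Trainer-style loop."""
    batches_per_epoch = math.ceil(n_train / (batch_size * n_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)  # partial accumulation is dropped
    total_steps = steps_per_epoch * epochs
    n_evals = total_steps // eval_steps  # one evaluation every eval_steps optimizer steps
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "evaluations": n_evals,
        "eval_examples_processed": n_evals * n_eval,
    }

# Row counts read off the tables above; settings from the Define Hyperparameters cell
print(training_schedule(20586, 1094, batch_size=1, grad_accum=1, epochs=2, eval_steps=5))
```

Under these assumptions, training takes about 41,172 optimizer steps, but the evaluation loop runs roughly 8,234 times and processes about nine million validation examples. Evaluation would then dominate the runtime, so a much larger `eval_steps` is probably worth considering.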

# Import necessary libraries

In [7]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer



# Define Hyperparameters
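
The 4-bit settings in this cell (`use_4bit`, `bnb_4bit_quant_type = "nf4"`, `bnb_4bit_use_double_quant`) trade precision for memory. The following back-of-the-envelope sketch assumes the block sizes described in the QLoRA paper: one 32-bit scaling constant per 64 weights, and, with double quantization, those constants stored in 8 bits with one 32-bit constant per 256 of them. It also assumes roughly 2.7B parameters for phi-2. Both helper names are hypothetical. Real usage is higher, because some layers typically stay unquantized and activations, LoRA weights and optimizer state are not counted.

```python
def blockwise_overhead_bits(block_size=64, const_bits=32, double_quant=False,
                            dq_block_size=256, dq_const_bits=8):
    """Extra bits per weight spent storing per-block scaling constants."""
    if not double_quant:
        # one const_bits-wide constant per block_size weights
        return const_bits / block_size
    # Double quantization: first-level constants are stored in dq_const_bits,
    # plus one fp32 constant for every dq_block_size of them.
    return dq_const_bits / block_size + 32 / (block_size * dq_block_size)

def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone in GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 2.7e9  # assumption: phi-2 is a ~2.7B-parameter model
for label, bits in [
    ("fp16", 16),
    ("nf4", 4 + blockwise_overhead_bits()),
    ("nf4 + double quant", 4 + blockwise_overhead_bits(double_quant=True)),
]:
    print(f"{label:>20}: {bits:7.3f} bits/param, ~{weight_memory_gb(n_params, bits):.2f} GB")
```

Under these assumptions, double quantization cuts the constant overhead from 0.5 to about 0.127 bits per weight. This is small per weight but adds up over billions of parameters.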

In [10]:
model_name = "microsoft/phi-2" 
dataset_name = "finetuning_data_train.jsonl"
new_model = "phi_2_finetuned"

lora_r = 64
lora_alpha = 16
lora_dropout = 0.05
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
bnb_4bit_use_double_quant = True  # also quantize the quantization constants ("nested" quantization)

compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=bnb_4bit_use_double_quant,
)


output_dir = "./outputs"
num_train_epochs = 2
fp16 = False
bf16 = True
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
gradient_accumulation_steps = 1
gradient_checkpointing = False
max_grad_norm = 0.3
learning_rate = 2e-4
weight_decay = 0.001
optim = "paged_adamw_32bit"
lr_scheduler_type = "constant"
max_steps = -1
group_by_length = True
logging_steps = 100
logging_strategy = "steps"
max_seq_length = 512
packing = False
device_map = {"": 0}  # place the whole model on GPU 0
save_strategy = "epoch"  # save a checkpoint at the end of each epoch

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
    trust_remote_code=True
)

model.config.use_cache = False
model.config.pretraining_tp = 1
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"


peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    target_modules=['Wqkv','out_proj'],
    task_type="CAUSAL_LM",
)

# Set training parameters (every hyperparameter defined above is passed through)
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    per_device_eval_batch_size=per_device_eval_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_checkpointing=gradient_checkpointing,
    optim=optim,
    logging_strategy=logging_strategy,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    save_strategy=save_strategy,
    report_to="all",
    evaluation_strategy="steps",
    eval_steps=5,  # evaluate every 5 steps; this is very frequent, so raise it for large eval sets
)

# Load Datasets and Train

In [None]:
from datasets import Dataset

# `system_message` is used here and in the inference cells but is not defined elsewhere
# in this notebook; this value is a placeholder to replace with your task's system prompt.
system_message = "You are a helpful assistant."

def to_text(examples):
    # Wrap each prompt/response pair in the [INST] <<SYS>> template used at inference time
    return {'text': [
        f'[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n' + prompt + ' [/INST] ' + response
        for prompt, response in zip(examples['prompt'], examples['response'])
    ]}

# pandas DataFrames have no batched .map(), so convert them to Hugging Face Datasets first
train_dataset_mapped = Dataset.from_pandas(train_dataset).map(to_text, batched=True)
valid_dataset_mapped = Dataset.from_pandas(valid_dataset).map(to_text, batched=True)

# Set supervised fine-tuning parameters (only after the mapped datasets exist)
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset_mapped,
    eval_dataset=valid_dataset_mapped,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

trainer.train()
trainer.model.save_pretrained(new_model)

# Quick test: generate a response with the fine-tuned model
logging.set_verbosity(logging.CRITICAL)
prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\nWrite a function that reverses a string. [/INST]" # replace the command here with something relevant to your task
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(prompt)
print(result[0]['generated_text'])
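
The `LoraConfig` in the `Define Hyperparameters` cell (`r = 64`, `lora_alpha = 16`, `target_modules=['Wqkv','out_proj']`, `bias="none"`) determines how many parameters are actually trained and how strongly the update is scaled. The following is a rough count. It assumes that phi-2 has a hidden size of 2560 and 32 layers, each with a fused `Wqkv` projection (2560 to 3 × 2560) and an `out_proj` (2560 to 2560). The helper names are hypothetical. You can compare the result with `trainer.model.print_trainable_parameters()`.

```python
def lora_param_count(layer_shapes, r):
    """Trainable LoRA parameters with bias="none".

    Each adapted linear layer mapping d_in -> d_out gets A (r x d_in) and
    B (d_out x r), i.e. r * (d_in + d_out) new parameters; base weights stay frozen.
    """
    return sum(r * (d_in + d_out) for d_in, d_out in layer_shapes)

def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / r

# ASSUMED phi-2 shapes: hidden size 2560, 32 layers, Wqkv and out_proj per layer
hidden, n_layers = 2560, 32
per_layer = [(hidden, 3 * hidden), (hidden, hidden)]  # (d_in, d_out) for Wqkv, out_proj
print(lora_param_count(per_layer * n_layers, r=64))   # 31457280
print(lora_scaling(lora_alpha=16, r=64))              # 0.25
```

Under these assumptions, about 31.5M parameters are trainable, roughly 1% of the base model. With `lora_alpha` smaller than `r`, the low-rank update is scaled down by a factor of 0.25.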

# Run Inference

In [None]:
from transformers import pipeline

prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\nWrite a function that reverses a string. [/INST]" # replace the command here with something relevant to your task
num_new_tokens = 100  # change to the number of new tokens you want to generate

# Count the number of tokens in the prompt
num_prompt_tokens = len(tokenizer(prompt)['input_ids'])

# Calculate the maximum length for the generation
max_length = num_prompt_tokens + num_new_tokens

gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=max_length)
result = gen(prompt)
print(result[0]['generated_text'].replace(prompt, ''))
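
Both inference cells rebuild the training template by hand. This cell strips the echoed prompt with `str.replace`, which would also delete any later repetition of the prompt text inside the answer. The following is a small sketch with hypothetical helpers, `build_prompt` and `strip_prompt`. The first mirrors the training format up to `[/INST]`, applying `.strip()` to the system message as the preprocessing step does. The second removes the prompt only when it is an actual prefix. The text-generation pipeline's `return_full_text=False` option is another way to avoid the echo.

```python
def build_prompt(system_message, user_message):
    """Format a request like the training data, stopping right after [/INST]."""
    return f"[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n{user_message} [/INST]"

def strip_prompt(generated_text, prompt):
    """Remove the prompt only when the output starts with it (i.e. it was echoed)."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text

prompt = build_prompt("Answer briefly.", "Write a function that reverses a string.")
# Stand-in for result[0]['generated_text'] returned by the pipeline
generated = prompt + " def reverse(s): return s[::-1]"
print(strip_prompt(generated, prompt))  # def reverse(s): return s[::-1]
```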

# Merge the model and save

In [None]:
# Merge and save the fine-tuned model
model_path = "./saved_model"  # relative path; change to your preferred location

# Reload the base model in FP16 and merge it with the LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
    trust_remote_code=True,  # phi-2 needs its custom modeling code, as in the training cell
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Save the merged model
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)
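
`merge_and_unload()` folds the adapter into the base weights, so the saved model can be loaded without PEFT. It relies on the identity W' = W + (alpha / r) · B A, which gives the same output as running the frozen weight and the adapter separately. The following dependency-free sketch checks this identity on tiny hand-picked matrices, not phi-2 weights. The function names are hypothetical. Merging into 4-bit quantized weights is not straightforward, which is presumably why the cell reloads the base model in FP16 first.

```python
def matmul(X, Y):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def merge_lora(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A); W is d_out x d_in, A is r x d_in, B is d_out x r."""
    scale = lora_alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]

def lora_forward(W, A, B, lora_alpha, r, x):
    """Unmerged path: frozen W x plus the scaled low-rank update B (A x)."""
    update = matvec(B, matvec(A, x))
    return [y + (lora_alpha / r) * u for y, u in zip(matvec(W, x), update)]

# Tiny hypothetical example: d_in = d_out = 2, rank r = 1, lora_alpha = 2
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -1.0]]
B = [[2.0], [1.0]]
x = [1.0, 2.0]
merged = merge_lora(W, A, B, lora_alpha=2, r=1)
print(merged)                                            # [[3.0, -4.0], [1.0, -1.0]]
print(matvec(merged, x), lora_forward(W, A, B, 2, 1, x))  # [-5.0, -1.0] [-5.0, -1.0]
```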

# Load the fine-tuned model from disk and run inference

In [None]:
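import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code matches how the base model was loaded; FP16 on GPU 0 mirrors the merge step
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map={"": 0}, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)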

In [None]:
from transformers import pipeline

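# Change to your desired prompt. The model was fine-tuned on the [INST] <<SYS>> template,
# so prompts in that format will likely give better results.
prompt = "What is 2 + 2?"
gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_new_tokens=100)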
result = gen(prompt)
print(result[0]['generated_text'])