#### **Problem Statement**

- Fine-tuning a language model means continuing to train a pretrained model, adjusting the weights of some or all of its layers, for a specific task or domain.

- In **supervised fine-tuning**, a model is trained on a task-specific labeled dataset in which each data point has a label (the correct answer). The model adjusts its parameters so that it predicts those labels accurately.

- Approaches to supervised fine-tuning include basic hyperparameter tuning, task-specific fine-tuning, transfer learning, few-shot learning, and multi-task learning.

- **Reinforcement Learning from Human Feedback (RLHF)** trains the language model using human feedback on its outputs. Incorporating that feedback into the learning process steers the model toward responses that people judge to be more accurate and useful.

- Components and techniques used in RLHF: reward model training, proximal policy optimization (PPO), comparative ranking, preference learning.

- **Parameter-efficient fine-tuning (PEFT)** trains only a small number of parameters while keeping most of the pretrained weights frozen.

- PEFT methods include LoRA and QLoRA.

- Hu, Zhiqiang, et al. "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models." arXiv preprint arXiv:2304.01933 (2023).

- https://doi.org/10.48550/arXiv.2312.12148

- https://huggingface.co/datasets/Falah/sentiments-dataset-381-classes

- https://towardsdatascience.com/fine-tuning-large-language-models-llms-23473d763b91

- https://www.turing.com/resources/finetuning-large-language-models
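
To make the PEFT idea concrete, here is a minimal pure-Python sketch of a LoRA-style layer. It assumes the usual LoRA formulation, in which a frozen weight matrix `W` is combined with a trainable low-rank product `B A` scaled by `alpha / r`. The dimensions and values are toy numbers chosen for illustration, not settings from this notebook.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Compute y = W x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

random.seed(0)
d_in, d_out, r, alpha = 16, 16, 2, 8  # toy sizes (assumption)

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, r x d_in
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, d_out x r, zero-initialized
x = [random.gauss(0, 1) for _ in range(d_in)]

y_base = matvec(W, x)
y_lora = lora_forward(W, A, B, x, alpha, r)

frozen = d_in * d_out          # parameters in W
trainable = r * (d_in + d_out) # parameters in A and B

print("same output before training:", y_base == y_lora)
print("frozen:", frozen, "trainable:", trainable)
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the frozen one, and training only has to update the much smaller `A` and `B` matrices.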


#### **Installing Dependencies**

In [None]:
# Installing dependencies : datasets, transformers, peft, accelerate, evaluate, bitsandbytes
!pip install -U git+https://github.com/huggingface/transformers.git
!pip install -U git+https://github.com/huggingface/peft.git
!pip install -U git+https://github.com/huggingface/accelerate.git
!pip install -U bitsandbytes --quiet
!pip install datasets --quiet
!pip install evaluate --quiet

Collecting git+https://github.com/huggingface/transformers.git
  Cloning https://github.com/huggingface/transformers.git to /tmp/pip-req-build-5w6xiuy7
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers.git /tmp/pip-req-build-5w6xiuy7
  Resolved https://github.com/huggingface/transformers.git to commit 838b87abe231fd70be5132088d0dee72a7bb8d62
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: transformers
  Building wheel for transformers (pyproject.toml) ... done
  Created wheel for transformers: filename=transformers-4.39.0.dev0-py3-none-any.whl size=8756970 sha256=0b0efc8eef9f551665661801a6849b865853826bc80e257ac130ee52df626e5c
  Stored in directory: /tmp/pip-ephem-wheel-cache-m05ih8th/wheels/e7/9c/5b/e1a9c8007c343041e61cc484433d512ea9274272e3fcbe7c16
Successfully bu

#### **Importing required libraries**

In [None]:
# Importing dependencies
from datasets import load_dataset, Dataset  # dataset loading and in-memory datasets
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig  # tokenizer, causal LM, and quantization config
import torch  # PyTorch
import pandas as pd
from transformers import TrainingArguments, Trainer  # training configuration and training loop
import transformers, evaluate  # transformers namespace (data collator) and evaluation metrics

In [None]:
# Read the Hugging Face access token from the Colab user data store
from google.colab import userdata
HF_TOKEN = userdata.get("HF_TOKEN")

In [None]:
# Use the GPU if PyTorch can see one, otherwise fall back to the CPU
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
# selecting the model to fine-tune
model_id = "mistralai/Mistral-7B-v0.1"

In [None]:
# Specify the parameters of the BitsAndBytes (4-bit quantization) config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
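
As a rough illustration of what 4-bit quantization does to a block of weights, the sketch below stores each value as one of 16 codes plus a single per-block scale. It is deliberately simplified: the 16 levels here are evenly spaced, whereas NF4 uses levels derived from a normal distribution, and double quantization additionally quantizes the per-block scales. The function names and sample weights are hypothetical.

```python
def quantize_block(values, levels=16):
    """Absmax-quantize one block of floats to `levels` evenly spaced codes."""
    absmax = max(abs(v) for v in values) or 1.0  # per-block scale
    step = 2.0 / (levels - 1)                    # codes cover [-1, 1]
    codes = [round((v / absmax + 1.0) / step) for v in values]
    return codes, absmax

def dequantize_block(codes, absmax, levels=16):
    """Map integer codes back to approximate float values."""
    step = 2.0 / (levels - 1)
    return [(c * step - 1.0) * absmax for c in codes]

weights = [0.12, -0.40, 0.03, 0.31, -0.07, 0.25, -0.33, 0.00]  # toy block
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", codes)          # every code fits in 4 bits (0..15)
print("scale:", scale)
print("max reconstruction error:", max_error)
```

Each weight now needs 4 bits instead of 16 or 32, at the cost of a bounded rounding error that depends on the block's largest absolute value.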

#### **Tokenizer**

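- A tokenizer splits an input sequence into tokens drawn from its vocabulary.

- Tokens are either words or subwords. With a WordPiece tokenizer such as BERT's, for example, `TESLA` may be split into "Te" and "##sla"; the double-hash prefix marks a piece that continues the previous one rather than starting a new word. Mistral's tokenizer is SentencePiece-based and instead marks the start of a word with a "▁" prefix.

- These tokens are then converted into integer IDs that the model can consume.

- The tokenizer returns a dictionary containing all the inputs its corresponding model needs to work properly (here, `input_ids` and `attention_mask`).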

- https://huggingface.co/docs/transformers/en/glossary
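
The subword splitting and ID lookup described above can be sketched with a toy greedy longest-match tokenizer that uses the double-hash convention. The five-entry vocabulary is hypothetical and exists only for this example; real tokenizers learn vocabularies of tens of thousands of entries.

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match split of one word into known subword pieces."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation of the same word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no known piece covers this position
        tokens.append(piece)
        start = end
    return tokens

# Hypothetical toy vocabulary mapping tokens to IDs
vocab = {"[UNK]": 0, "te": 1, "##sla": 2, "car": 3, "##s": 4}

tokens = wordpiece_tokenize("tesla", vocab)
ids = [vocab[t] for t in tokens]
print(tokens, ids)
print(wordpiece_tokenize("cars", vocab), wordpiece_tokenize("dog", vocab))
```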

In [None]:
# Load the tokenizer for the base model and reuse the EOS token for padding
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

tokenizer_config.json:   0%|          | 0.00/967 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/72.0 [00:00<?, ?B/s]

In [None]:
# Load the base model with 4-bit quantization; device_map={"": 0} places the whole model on GPU 0
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.94G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

In [None]:
print(model)

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
   

#### **Model preparation**

In [None]:
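from peft import prepare_model_for_kbit_training

# Trade compute for memory by recomputing activations during the backward pass
model.gradient_checkpointing_enable()
# Prepare the quantized model for training (among other things, this freezes the base weights)
model = prepare_model_for_kbit_training(model)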

In [None]:
def print_trainable_parameters(model):
  """
  Print the number of trainable parameters in the model.
  """
  trainable_params = 0
  all_param = 0
  for _, param in model.named_parameters():
    all_param += param.numel()
    if param.requires_grad:
      trainable_params += param.numel()

  # Report the totals once, after the loop.
  # (Printing inside the loop would emit one running-total line per parameter.)
  print(
      f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
  )

- NF4 stands for 4-bit NormalFloat, the quantization data type introduced in the QLoRA paper.

- https://arxiv.org/abs/2305.14314

- https://huggingface.co/docs/transformers/peft

In [None]:
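# Attach LoRA adapters to every attention and MLP projection layer
from peft import LoraConfig, get_peft_model

target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
config = LoraConfig(
    r = 8,                            # rank of the low-rank update matrices
    lora_alpha = 32,                  # scaling factor; the update is scaled by lora_alpha / r
    target_modules = target_modules,  # layers that receive adapters
    modules_to_save = ["lm_head"],    # trained in full and saved alongside the adapter
    lora_dropout = 0.05,              # dropout applied on the LoRA branch
    bias = "none",                    # leave bias terms untouched
    task_type="CAUSAL_LM"
)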

model = get_peft_model(model, config)
print_trainable_parameters(model)

trainable params: 0 || all params: 131072000 || trainable%: 0.0
trainable params: 0 || all params: 139460608 || trainable%: 0.0
trainable params: 32768 || all params: 139493376 || trainable%: 0.02349072116513977
trainable params: 65536 || all params: 139526144 || trainable%: 0.04697040864255519
trainable params: 65536 || all params: 141623296 || trainable%: 0.04627487274409996
trainable params: 98304 || all params: 141656064 || trainable%: 0.06939625260235947
trainable params: 106496 || all params: 141664256 || trainable%: 0.07517492627074539
trainable params: 106496 || all params: 143761408 || trainable%: 0.07407829505954755
trainable params: 139264 || all params: 143794176 || trainable%: 0.09684954138893637
trainable params: 147456 || all params: 143802368 || trainable%: 0.10254073145721773
trainable params: 147456 || all params: 152190976 || trainable%: 0.09688879319625364
trainable params: 180224 || all params: 152223744 || trainable%: 0.1183941448713809
trainable params: 212992 ||
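
As a sanity check on these numbers, the LoRA parameter count can be worked out by hand from the layer shapes in the printed model: each adapted `Linear` layer with `in_features` inputs and `out_features` outputs gains `r * (in_features + out_features)` trainable parameters. The sketch below assumes 32 decoder layers and `r = 8`, as configured above. It leaves out the `lm_head` copy created by `modules_to_save`, because that layer's shape is not shown in the printed excerpt.

```python
# (in_features, out_features) copied from the printed model
shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
r, num_layers = 8, 32

# lora_A is r x in_features, lora_B is out_features x r
per_module = {name: r * (i + o) for name, (i, o) in shapes.items()}
per_layer = sum(per_module.values())
lora_total = num_layers * per_layer

for name, count in per_module.items():
    print(f"{name:10s} {count:>8d}")
print("per decoder layer:", per_layer)
print("all LoRA adapters:", lora_total)
```

The per-module figures agree with the running totals in the log, which rise by 32768 twice for `q_proj` and by 32768 and then 8192 for `k_proj`.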

#### **Data Preprocessing**

In [None]:
dataset = load_dataset("qwedsacf/story-generation")

Downloading readme:   0%|          | 0.00/759 [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/213M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/427223 [00:00<?, ? examples/s]

In [None]:
df = pd.DataFrame(dataset['train'])
df.head()

| | summary | story | source |
|---|---|---|---|
| 0 | The only political ideology that makes any sen... | How would you take a society like ours and tur... | cmv |
| 1 | I think factoring in a potential mate's race i... | I understand the view I suppose, but it seems ... | cmv |
| 2 | I think factoring in a potential mate's race i... | First and foremost : you are in college, right... | cmv |
| 3 | I think public funding of elections could solv... | Most private funding comes from about 47 peopl... | cmv |
| 4 | I think public funding of elections could solv... | Unfortunately, public funding for elections wo... | cmv |


In [None]:
df.isnull().sum()

summary    0
story      0
source     0
dtype: int64

In [None]:
dataset = Dataset.from_pandas(df)

In [None]:
dataset_ = dataset.train_test_split(test_size=0.1)
dataset_

DatasetDict({
    train: Dataset({
        features: ['summary', 'story', 'source'],
        num_rows: 384500
    })
    test: Dataset({
        features: ['summary', 'story', 'source'],
        num_rows: 42723
    })
})

In [None]:
dataset_ = dataset_.map(lambda samples: tokenizer(samples["story"]), batched= True)
dataset_

Map:   0%|          | 0/384500 [00:00<?, ? examples/s]

Map:   0%|          | 0/42723 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['summary', 'story', 'source', 'input_ids', 'attention_mask'],
        num_rows: 384500
    })
    test: Dataset({
        features: ['summary', 'story', 'source', 'input_ids', 'attention_mask'],
        num_rows: 42723
    })
})

#### **Training and test dataset**

In [None]:
training_dataset = dataset_['train'].shuffle(seed=42).select(range(2500))

In [None]:
type(training_dataset)

datasets.arrow_dataset.Dataset

#### **Training the model**

In [None]:
num_epochs = 3
lr = 2e-4
training_args = TrainingArguments(
    output_dir="outputs",
    learning_rate = lr,
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 4,
    num_train_epochs = num_epochs,
    warmup_steps = 100,
    fp16=True,
    logging_steps=1,
    optim= "paged_adamw_8bit"
)
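
The training arguments determine how many optimizer steps the run takes and how the learning rate evolves. The sketch below works this out for the 2500-example subset, assuming a single device and the linear warmup-then-decay schedule that `Trainer` uses by default. The helper `linear_schedule` is a hypothetical re-implementation for illustration, not a library call.

```python
num_samples, per_device_batch, grad_accum, epochs = 2500, 1, 4, 3
warmup_steps, peak_lr = 100, 2e-4

# One optimizer step consumes per_device_batch * grad_accum examples (single device assumed)
effective_batch = per_device_batch * grad_accum
steps_per_epoch = num_samples // effective_batch
total_steps = steps_per_epoch * epochs

def linear_schedule(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print("effective batch size:", effective_batch)
print("optimizer steps:", total_steps)
for step in (0, 50, 100, 1000, total_steps):
    print(step, linear_schedule(step))
```

The resulting step count matches the `global_step=1875` reported by `trainer.train()`.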

In [None]:
trainer = Trainer(
    args = training_args,
    model = model,
    train_dataset = training_dataset,
    data_collator = transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
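
With `mlm=False`, the data collator builds causal language modeling batches: it pads the examples to a common length and copies `input_ids` into `labels`, replacing padding positions with `-100` so the loss ignores them. The following pure-Python sketch mimics that behavior under two assumptions: right padding, and a pad ID of 2 (the EOS ID, since the pad token was set to EOS earlier). `collate_causal_lm` is a hypothetical helper, not the library implementation.

```python
PAD_ID = 2           # assumption: pad token shares the EOS id
IGNORE_INDEX = -100  # label value ignored by the loss

def collate_causal_lm(examples, pad_id=PAD_ID):
    """Pad token-id lists to equal length and derive attention masks and labels."""
    max_len = max(len(ids) for ids in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ids in examples:
        padding = max_len - len(ids)
        input_ids = ids + [pad_id] * padding
        batch["input_ids"].append(input_ids)
        batch["attention_mask"].append([1] * len(ids) + [0] * padding)
        # Labels are the inputs themselves; pad positions are masked out
        batch["labels"].append([IGNORE_INDEX if t == pad_id else t for t in input_ids])
    return batch

batch = collate_causal_lm([[1, 5, 6, 7], [1, 8]])
for key, rows in batch.items():
    print(key, rows)
```

The shift between inputs and targets (predicting token `t+1` from tokens up to `t`) happens inside the model's loss computation, so the collator does not shift anything. One side effect of reusing EOS as the pad token is that genuine EOS tokens in the labels are masked as well.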

In [None]:
trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


| Step | Training Loss |
|---|---|
| 1 | 2.6676 |
| 2 | 3.2309 |
| 3 | 3.3178 |
| 4 | 3.2751 |
| 5 | 2.3173 |
| 6 | 3.0397 |
| 7 | 2.7216 |
| 8 | 3.054 |
| 9 | 2.8258 |
| 10 | 2.819 |




TrainOutput(global_step=1875, training_loss=1.7535488040924072, metrics={'train_runtime': 5178.5784, 'train_samples_per_second': 1.448, 'train_steps_per_second': 0.362, 'total_flos': 6.804021758968627e+16, 'train_loss': 1.7535488040924072, 'epoch': 3.0})
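
The reported metrics can be cross-checked with a little arithmetic. The snippet below recomputes the throughput figures from the runtime and step count, and converts the mean training loss into a rough training perplexity via `exp(loss)`. It assumes that every one of the 2500 selected examples was seen once per epoch.

```python
import math

train_runtime = 5178.5784        # seconds, from TrainOutput
global_step = 1875
train_loss = 1.7535488040924072
samples_seen = 2500 * 3          # examples per epoch * epochs (assumption)

samples_per_second = samples_seen / train_runtime
steps_per_second = global_step / train_runtime
perplexity = math.exp(train_loss)  # exp of the mean cross-entropy loss

print(round(samples_per_second, 3), round(steps_per_second, 3))
print("approximate training perplexity:", perplexity)
```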

#### **Share adapter**

In [None]:
from huggingface_hub import notebook_login

In [None]:
notebook_login()


In [None]:
# Upload the adapter to the Hub using the token stored by notebook_login()
model.push_to_hub("outputs", token = True)



README.md:   0%|          | 0.00/5.18k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/jsathish1990/outputs/commit/63b41f45bfd4ff5f86de215111a55eb6f50812b8', commit_message='Upload model', commit_description='', oid='63b41f45bfd4ff5f86de215111a55eb6f50812b8', pr_url=None, pr_revision=None, pr_num=None)

#### **Inference**

In [None]:
from peft import PeftModel, PeftConfig

In [None]:
peft_model_id = 'jsathish1990/outputs'

In [None]:
config = PeftConfig.from_pretrained(peft_model_id)

adapter_config.json:   0%|          | 0.00/692 [00:00<?, ?B/s]

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict = True,
    quantization_config = BitsAndBytesConfig(load_in_8bit=True),  # replaces the deprecated load_in_8bit argument
    device_map = "auto"
)


In [None]:
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)


In [None]:
model = PeftModel.from_pretrained(model, peft_model_id)

adapter_model.safetensors:   0%|          | 0.00/608M [00:00<?, ?B/s]

#### **Model Inference**

In [None]:
batch = tokenizer("the human soul is like a", return_tensors = "pt")

In [None]:
# Run generation under automatic mixed precision on the GPU
with torch.autocast(device_type="cuda"):
  output_tokens = model.generate(**batch, max_new_tokens = 15)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


In [None]:
print(tokenizer.decode(output_tokens[0]))

In [None]:
tokenizer.decode(output_tokens[0])

'<s> the human soul is like a star . it has a bright and luminous center that is the source'
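
Conceptually, `generate` extends the prompt one token at a time. The sketch below shows a greedy version of that loop, assuming the default decoding strategy with no sampling. It uses a hypothetical toy scoring function in place of the real model, so the token IDs it produces mean nothing.

```python
def greedy_generate(next_token_scores, prompt_ids, max_new_tokens, eos_id):
    """Repeatedly append the highest-scoring next token until EOS or the token budget."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = next_token_scores(ids)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

def toy_scores(ids):
    """Hypothetical stand-in for a model: prefers (last token + 1) mod 5."""
    preferred = (ids[-1] + 1) % 5
    return [1.0 if token == preferred else 0.0 for token in range(5)]

stopped_at_eos = greedy_generate(toy_scores, [0], max_new_tokens=15, eos_id=4)
hit_budget = greedy_generate(toy_scores, [0], max_new_tokens=3, eos_id=-1)
print(stopped_at_eos, hit_budget)
```

In the notebook's example, `max_new_tokens = 15` plays the role of the token budget, which is why the decoded text stops mid-sentence.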