# Text Classification - QLoRA Fine-tuning




## $\color{blue}{Sections:}$
* Preamble
* Admin - importing libraries
* Data - Load dataset
* Model - Get Quantized model
* QLoRA - Modify with LoRA
* Train

## $\color{blue}{Preamble:}$

This notebook fine-tunes the 7B version of Mistral Instruct. We focus on the subtask of predicting the correct book, on the assumption that if book-level prediction accuracy is very high, further classifiers can handle the chapters within each book.

The fine-tuning uses QLoRA, and we rely on Hugging Face's SFTTrainer to train the model.

## $\color{blue}{Admin:}$

In [None]:
from google.colab import drive

In [None]:
drive.mount("/content/drive")
%cd '/content/drive/MyDrive/'

Mounted at /content/drive
/content/drive/MyDrive


In [None]:
%%capture
!pip install dill

In [None]:
import dill
def save_data(docs, filename):
    """Save a list of Langchain Documents to a .dill file."""
    with open(filename, 'wb') as f:
        dill.dump(docs, f)
    print(f"Documents saved to {filename}")

def load_data(filename):
    """Load a list of Langchain Documents from a .dill file."""
    with open(filename, 'rb') as f:
        docs = dill.load(f)
    print(f"Documents loaded from {filename}")
    return docs

## $\color{blue}{Data:}$

In [None]:
%%capture
!pip install datasets

In [None]:
path = "class/datasets/"
trainDataset = load_data(path + "Dataset_train")
devDataset = load_data(path + "Dataset_dev")
testDataset = load_data(path + "Dataset_test")

Documents loaded from class/datasets/Dataset_train
Documents loaded from class/datasets/Dataset_dev
Documents loaded from class/datasets/Dataset_test
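
Before training, it can help to check how the book labels are distributed in each split. The following is a minimal sketch, assuming each record is a dict with `input` and `output` keys (as in the sample printed later in this notebook). The `label_counts` helper and `toy_records` list are hypothetical stand-ins, not the real data.

```python
from collections import Counter

def label_counts(records):
    """Count how often each book label appears in a list of records."""
    return Counter(record["output"] for record in records)

# Hypothetical stand-in for trainDataset; the real records come from load_data().
toy_records = [
    {"input": "passage one", "output": "Dubliners"},
    {"input": "passage two", "output": "Republic"},
    {"input": "passage three", "output": "Dubliners"},
]

counts = label_counts(toy_records)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

Calling `label_counts(trainDataset)` in the same way would show whether any of the six classes is under-represented.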


## $\color{blue}{Model:}$

In [None]:
!pip install -qU bitsandbytes accelerate loralib transformers peft


In [None]:
import torch
torch.cuda.is_available()

True

In [None]:
import os
from getpass import getpass
from huggingface_hub import login

# Prompt for your Hugging Face token securely
token = getpass("Please enter your Hugging Face token: ")

Please enter your Hugging Face token: ··········


In [None]:
# Use the token for Hugging Face login
if token:
    print("HuggingFace token has been successfully entered.")
    login(token=token)
else:
    print("Continuing without Hugging Face login")

HuggingFace token has been successfully entered.


In [None]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only the first GPU
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    use_cache=False,
    device_map="auto"
)


In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
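
With `pad_token` set to the EOS token and `padding_side` set to `"right"`, shorter sequences in a batch are filled with EOS ids after their real tokens, and the attention mask marks those positions with 0. The sketch below imitates that behaviour in plain Python. The token ids, the `EOS_ID` value, and the `pad_right` helper are illustrative assumptions, not output from the real tokenizer.

```python
def pad_right(sequences, pad_id):
    """Right-pad token id lists to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)              # padding goes after the real tokens
        attention_mask.append([1] * len(seq) + [0] * n_pad)   # 0 marks padded positions
    return input_ids, attention_mask

EOS_ID = 2  # assumed id of the </s> token, reused here as the pad id
batch = [[1, 733, 16289], [1, 733]]
ids, mask = pad_right(batch, EOS_ID)
print(ids)
print(mask)
```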

In [None]:
print(base_model)

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralSdpaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): MistralRMSNo

## $\color{blue}{QLoRA:}$

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [None]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    #target_modules=["q_proj", "v_proj", "k_proj"],
    lora_dropout=0.12,
    bias="none",
    task_type="CAUSAL_LM"
)

base_model = prepare_model_for_kbit_training(base_model)
model = get_peft_model(base_model, lora_config)
print_trainable_parameters(model)

trainable params: 13631488 || all params: 3765702656 || trainable%: 0.36199055648434075
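
The reported counts can be reproduced by hand from the layer shapes printed in the Model section. The sketch below rests on three assumptions that the truncated printouts do not show explicitly:

* with `target_modules` left unset, PEFT adapts only `q_proj` and `v_proj` for Mistral;
* `bitsandbytes` packs two 4-bit weights into each stored element, so `numel()` reports half the true count for quantized layers;
* the model has an untied `lm_head` of shape 4096 x 32000 and a final norm layer.

```python
HIDDEN, KV_DIM, MLP_DIM, VOCAB, LAYERS, RANK = 4096, 1024, 14336, 32000, 32, 32

# (in_features, out_features) of each quantized linear layer in one decoder block
linear_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}

def lora_params(shape, rank):
    """LoRA adds A (in x rank) and B (rank x out) next to a frozen weight."""
    in_features, out_features = shape
    return rank * (in_features + out_features)

adapted = ["q_proj", "v_proj"]  # assumption: PEFT default targets for Mistral
trainable = LAYERS * sum(lora_params(linear_shapes[name], RANK) for name in adapted)

linear_weights = LAYERS * sum(i * o for i, o in linear_shapes.values())
packed_linear = linear_weights // 2    # assumption: two 4-bit values per stored element
embeddings = 2 * VOCAB * HIDDEN        # embed_tokens plus an assumed untied lm_head
norms = LAYERS * 2 * HIDDEN + HIDDEN   # two RMSNorms per block plus an assumed final norm
reported_total = packed_linear + embeddings + norms + trainable
unpacked_base = linear_weights + embeddings + norms

print(f"trainable: {trainable}")
print(f"reported total: {reported_total}")
print(f"unpacked base model: {unpacked_base}")
```

Under these assumptions the arithmetic matches the printed output: 13,631,488 trainable parameters out of a reported 3,765,702,656. The reported total is lower than the roughly 7.24 billion parameters of the unpacked base model because the quantized layers are counted in packed form.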


In [None]:
print(model)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32000, 4096)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralSdpaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.12, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k

## $\color{blue}{Train:}$

In [None]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="class/models/mistral-7b-instruct-ft",
    overwrite_output_dir=True,
    num_train_epochs=15,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit", # from the QLoRA paper
    logging_steps=1,
    save_strategy="epoch",
    learning_rate=1e-4,
    bf16=True, # ensure proper upcasting for compute dtypes
    tf32=True,
    max_grad_norm=0.4,
    warmup_ratio=0.03, # ignored by the "constant" schedule; "constant_with_warmup" would apply it
    lr_scheduler_type="constant",
    disable_tqdm=True,
    eval_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    save_total_limit=2
)

In [None]:

def generate_prompt(example, return_response=True):
  full_prompt =  """[INST]The task is to make a book classification from small passage of text.
The books are Telemachia, Odyssey, and Nostros from James Joyce's Ulysses, Dubliners by James Joyce, Dracula by Bram Stoker, and Republic by Plato.

Read the Text, choose the correct classification from the list below. You will provide a single word response from the list with no explanation.

Telemachia
Odyssey
Nostros
Dubliners
Dracula
Republic

Text: """
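  # In the trl release used here, SFTTrainer appears to call formatting_func on a
  # batch (a dict of lists), so accept a batch or a single record alike.
  inputs, outputs = example['input'], example['output']
  if isinstance(inputs, str):
    inputs, outputs = [inputs], [outputs]
  prompts = []
  for text, label in zip(inputs, outputs):
    prompt = f"{full_prompt}{text}[/INST]\nAnswer: "
    if return_response:
      prompt += f"{label}</s>"
    prompts.append(prompt)
  return prompts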
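
Hugging Face `datasets` passes a dict of lists when a function is mapped in batched mode, and the `trl` release used here appears to apply `formatting_func` that way (an assumption worth checking against the installed version). A formatting function that interpolates `example['input']` directly would then place the whole list of passages into a single string. The sketch below shows a batch-aware variant. `TEMPLATE`, `format_examples`, and the toy records are hypothetical, and the prompt is shortened for illustration.

```python
TEMPLATE = "[INST]Classify the passage.\n\nText: {text}[/INST]\nAnswer: "

def format_examples(example, return_response=True):
    """Return one prompt string per record, for a single record or a batch."""
    inputs, outputs = example["input"], example["output"]
    if isinstance(inputs, str):  # single record: wrap into one-element lists
        inputs, outputs = [inputs], [outputs]
    prompts = []
    for text, label in zip(inputs, outputs):
        prompt = TEMPLATE.format(text=text)
        if return_response:
            prompt += f"{label}</s>"
        prompts.append(prompt)
    return prompts

# Toy data: a batch as `datasets` would pass it, and a single record.
batch = {"input": ["first passage", "second passage"], "output": ["Dracula", "Republic"]}
single = {"input": "first passage", "output": "Dracula"}

batch_prompts = format_examples(batch)
single_prompts = format_examples(single)
print(len(batch_prompts), len(single_prompts))
```

Returning one string per record keeps the number of formatted training examples equal to the number of rows in the dataset.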

In [None]:
trainDataset[0]['output']

'Dubliners'

In [None]:
!pip install trl -U -q


In [None]:
from trl import SFTTrainer

max_seq_length = 2048

trainer = SFTTrainer(
    model=model,
    train_dataset=trainDataset,
    eval_dataset=devDataset,
    peft_config=lora_config,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    formatting_func=generate_prompt,
    args=training_args,
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


Map:   0%|          | 0/12000 [00:00<?, ? examples/s]

Map:   0%|          | 0/964 [00:00<?, ? examples/s]

In [None]:
trainer.train()

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ··········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


{'loss': 3.3869, 'grad_norm': 3.5421648025512695, 'learning_rate': 0.0001, 'epoch': 1.0}
{'eval_loss': 3.257153034210205, 'eval_runtime': 0.3626, 'eval_samples_per_second': 2.758, 'eval_steps_per_second': 2.758, 'epoch': 1.0}
{'loss': 3.2242, 'grad_norm': 1.6185256242752075, 'learning_rate': 0.0001, 'epoch': 2.0}
{'eval_loss': 3.1920177936553955, 'eval_runtime': 0.3274, 'eval_samples_per_second': 3.055, 'eval_steps_per_second': 3.055, 'epoch': 2.0}
{'loss': 3.1468, 'grad_norm': 1.0416638851165771, 'learning_rate': 0.0001, 'epoch': 3.0}
{'eval_loss': 3.1597187519073486, 'eval_runtime': 0.3282, 'eval_samples_per_second': 3.047, 'eval_steps_per_second': 3.047, 'epoch': 3.0}
{'loss': 3.0995, 'grad_norm': 1.0396945476531982, 'learning_rate': 0.0001, 'epoch': 4.0}
{'eval_loss': 3.1308038234710693, 'eval_runtime': 0.3265, 'eval_samples_per_second': 3.062, 'eval_steps_per_second': 3.062, 'epoch': 4.0}
{'loss': 3.054, 'grad_norm': 0.9713788032531738, 'learning_rate': 0.0001, 'epoch': 5.0}
{'eva

TrainOutput(global_step=15, training_loss=2.93871701558431, metrics={'train_runtime': 180.9928, 'train_samples_per_second': 0.995, 'train_steps_per_second': 0.083, 'train_loss': 2.93871701558431, 'epoch': 15.0})
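
The logged metrics allow a quick sanity check on how many examples each epoch actually processed. The sketch below uses only numbers from the outputs above: 12,000 mapped training examples, a per-device batch size of 16, 15 epochs, and the `TrainOutput` metrics. The interpretation is an inference from the arithmetic, not something the logs state directly.

```python
import math

# Values copied from the notebook outputs above.
raw_train_examples = 12000
batch_size = 16
epochs = 15
logged_global_step = 15
train_runtime = 180.9928
train_samples_per_second = 0.995

# Optimizer steps expected if every raw example became one training row.
expected_steps = math.ceil(raw_train_examples / batch_size) * epochs

# Rows per epoch implied by the logged throughput.
samples_seen = train_runtime * train_samples_per_second
implied_rows_per_epoch = round(samples_seen / epochs)

print(f"expected steps: {expected_steps}, logged steps: {logged_global_step}")
print(f"implied training rows per epoch: {implied_rows_per_epoch}")
```

The gap between the expected and logged step counts suggests the formatted dataset held far fewer rows than the raw one. That is worth checking (for example by inspecting `len(trainer.train_dataset)`) before relying on the loss values.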

In [None]:
trainer.push_to_hub()


CommitInfo(commit_url='https://huggingface.co/rjnClarke/mistral-7b-instruct-ft/commit/a78f90cb3607a9e0c64e8e463e6c1564782bc78f', commit_message='End of training', commit_description='', oid='a78f90cb3607a9e0c64e8e463e6c1564782bc78f', pr_url=None, repo_url=RepoUrl('https://huggingface.co/rjnClarke/mistral-7b-instruct-ft', endpoint='https://huggingface.co', repo_type='model', repo_id='rjnClarke/mistral-7b-instruct-ft'), pr_revision=None, pr_num=None)

In [None]:
from peft import AutoPeftModelForCausalLM

# Pass the quantization settings through BitsAndBytesConfig (imported in the
# Model section); the bare `load_in_4bit` argument is deprecated.
model = AutoPeftModelForCausalLM.from_pretrained(
    "rjnClarke/mistral-7b-instruct-ft",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
tokenizer = AutoTokenizer.from_pretrained("rjnClarke/mistral-7b-instruct-ft")

In [None]:
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

In [None]:
sample = devDataset[295]
output = sample['output']
prompt = generate_prompt(sample, return_response=False)[0]

In [None]:
sample

{'input': 'as steps and points of departure into a world which is above hypotheses, in order that she may soar beyond them to the first principle of the whole; and clinging to this and then to that which depends on this, by successive steps she descends again without the aid of any sensible object, from ideas, through ideas, and in ideas she ends.   I understand you, he replied;',
 'output': 'Republic'}

In [None]:
output

'Republic'

In [None]:
prompt

"[INST]The task is to make a book classification from small passage of text.\nThe books are Telemachia, Odyssey, and Nostros from James Joyce's Ulysses, Dubliners by James Joyce, Dracula by Bram Stoker, and Republic by Plato.\n\nRead the Text, choose the correct classification from the list below. You will provide a single word response from the list with no explanation.\n\nTelemachia \nOdyssey\nNostros\nDubliners\nDracula\nRepublic\n\nText: as steps and points of departure into a world which is above hypotheses, in order that she may soar beyond them to the first principle of the whole; and clinging to this and then to that which depends on this, by successive steps she descends again without the aid of any sensible object, from ideas, through ideas, and in ideas she ends.   I understand you, he replied;[/INST]\nAnswer: "

In [None]:
tokens = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
ids = tokens["input_ids"].cuda()
ams = tokens["attention_mask"].cuda()  # keep the mask on the same device as the input ids

In [None]:
outputs = model.generate(input_ids=ids, attention_mask=ams, max_new_tokens=5, do_sample=False, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id)

In [None]:
outputs

tensor([[    1,   733, 16289, 28793,  1014,  3638,   349,   298,  1038,   264,
          1820, 16776,   477,  1741, 12280,   302,  2245, 28723,    13,  1014,
          4796,   460,  9207, 28719,   595,   515, 28725, 11424,   846,  7353,
         28725,   304,   418,   504,  2737,   477,  4797, 17780,   358, 28742,
         28713,   500,   346,   819,   274, 28725, 22263,   404,   486,  4797,
         17780,   358, 28725,  2985,   323,  3712,   486,  1896,   314,   662,
         11665, 28725,   304,  6090,   486,  1641,  1827, 28723,    13,    13,
          3390,   272,  7379, 28725,  4987,   272,  4714, 16776,   477,   272,
          1274,  3624, 28723,   995,   622,  3084,   264,  2692,  1707,  2899,
           477,   272,  1274,   395,   708, 13268, 28723,    13,    13, 28738,
         12034,   595,   515, 28705,    13, 28762, 28715,   846,  7353,    13,
         28759,   504,  2737,    13, 28757,   437,  2294,   404,    13,  9587,
           323,  3712,    13,  4781,   651,    13,  

In [None]:
decoded = tokenizer.batch_decode(outputs.detach().cpu().numpy())

In [None]:
decoded

["<s> [INST]The task is to make a book classification from small passage of text.\nThe books are Telemachia, Odyssey, and Nostros from James Joyce's Ulysses, Dubliners by James Joyce, Dracula by Bram Stoker, and Republic by Plato.\n\nRead the Text, choose the correct classification from the list below. You will provide a single word response from the list with no explanation.\n\nTelemachia \nOdyssey\nNostros\nDubliners\nDracula\nRepublic\n\nText: as steps and points of departure into a world which is above hypotheses, in order that she may soar beyond them to the first principle of the whole; and clinging to this and then to that which depends on this, by successive steps she descends again without the aid of any sensible object, from ideas, through ideas, and in ideas she ends.   I understand you, he replied;[/INST]\nAnswer:  Republic\n\nExplan"]

In [None]:
import re

def extract_answer(text):
    # The regex pattern looks for "Answer: " followed by any characters until the end of the line or string
    pattern = r'Answer: (.*?)(?=\n|$)'
    match = re.search(pattern, text)

    # If a match is found, return it; otherwise, return None or a suitable message
    if match:
        return match.group(1).strip()
    else:
        return None
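
To measure book-level accuracy over a whole split, the extracted answers can be compared with the ground-truth labels. The following is a minimal sketch, assuming the decoded generations have already been collected in a list. `extract`, `accuracy`, `toy_decoded`, and `toy_labels` are hypothetical names and data, and the regex is the same pattern used in `extract_answer` above.

```python
import re

ANSWER_PATTERN = re.compile(r"Answer: (.*?)(?=\n|$)")

def extract(text):
    """Return the text after 'Answer: ' up to the end of the line, or None."""
    match = ANSWER_PATTERN.search(text)
    return match.group(1).strip() if match else None

def accuracy(decoded_texts, labels):
    """Fraction of generations whose extracted answer equals the label."""
    if not labels:
        return 0.0
    correct = sum(extract(text) == label for text, label in zip(decoded_texts, labels))
    return correct / len(labels)

# Toy generations: two correct, one wrong, one without an answer marker.
toy_decoded = [
    "[INST]...[/INST]\nAnswer:  Republic\n\nExplan",
    "[INST]...[/INST]\nAnswer: Dubliners\n",
    "[INST]...[/INST]\nAnswer: Dracula",
    "[INST]...[/INST]\nno answer marker here",
]
toy_labels = ["Republic", "Dubliners", "Dubliners", "Dracula"]

score = accuracy(toy_decoded, toy_labels)
print(f"accuracy: {score:.2f}")
```

Generations with no `Answer: ` marker extract to `None` and therefore count as errors, which keeps the metric conservative.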

In [None]:
print(f"Prompt:\n{prompt}\n")
print(f"-------------")
print(f"\nResponse: {extract_answer(decoded[0])}" )
print(f"-------------")
print(f"\nGround truth: {sample['output']}")

Prompt:
[INST]The task is to make a book classification from small passage of text.
The books are Telemachia, Odyssey, and Nostros from James Joyce's Ulysses, Dubliners by James Joyce, Dracula by Bram Stoker, and Republic by Plato.

Read the Text, choose the correct classification from the list below. You will provide a single word response from the list with no explanation.

Telemachia 
Odyssey
Nostros
Dubliners
Dracula
Republic

Text: as steps and points of departure into a world which is above hypotheses, in order that she may soar beyond them to the first principle of the whole; and clinging to this and then to that which depends on this, by successive steps she descends again without the aid of any sensible object, from ideas, through ideas, and in ideas she ends.   I understand you, he replied;[/INST]
Answer: 

-------------

Response: Republic
-------------

Ground truth: Republic
