# QLoRA with TinyLlama

## Templating Instruction Data
To have the LLM follow instructions, we need to prepare instruction data that follows a chat template.

In [1]:
from transformers import AutoTokenizer
from datasets import load_dataset

In [2]:
template_tokz = AutoTokenizer.from_pretrained('TinyLlama/TinyLlama-1.1B-Chat-v1.0')

A typical message format looks like this:

In [3]:
"""
[ { "content": "These instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?
\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in 
settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\n
Does this feature apply to all sections of the theme or just specific ones as listed in the text material?", "role": "user" }, 
{ "content": "This feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.", 
"role": "assistant" }, { "content": "Can you guide me through the process of enabling the secondary image hover feature on my 
Collection pages and Featured Collections sections?", "role": "user" }, 
{ "content": "Sure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.", "role": "assistant" }, { "content": "Can you provide me with a link to the documentation for my theme?", "role": "user" }, { "content": "I don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.", "role": "assistant" }, { "content": "Can you confirm if this feature also works for the Quick Shop section of my theme?", "role": "user" }, { "content": "The secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. 
Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.", "role": "assistant" } ]
"""


So let's write a prompt-formatting function that turns each row of our df into this message format for the model.

In [4]:
def fmt_prompt(row):
    """Format a row as a prompt using the <|user|> chat template that TinyLlama uses."""
    report_text = row['report_text']
    label = row['label']
    message = [{'role': 'user',
    'content': f"In the following radiology report, classify the patient's current microcalcification status as Positive, Negative or Not Stated. {report_text}"}, 
    {'role':'assistant', 'content': label}]
    prompt = template_tokz.apply_chat_template(message, tokenize=False)
    return {'text': prompt}

In [5]:
row = {'report_text': """BILATERAL SCREENING MAMMOGRAPHY.
History: Screening.
Comparison available dating from 01/2023.
Findings:
There are scattered fibroglandular densities bilaterally. No skin thickening or nipple retraction is seen. No grouped calcifications are identified. No spiculated or circumscribed masses are seen.
IMPRESSION:
No mammographic evidence of malignancy. BI-RADS Category 1.""", 'label': 'Negative'}

In [6]:
fmt_prompt(row)

{'text': "<|user|>\nIn the following radiology report, classify the patient's current microcalcification status as Positive, Negative or Not Stated. BILATERAL SCREENING MAMMOGRAPHY.\nHistory: Screening.\nComparison available dating from 01/2023.\nFindings:\nThere are scattered fibroglandular densities bilaterally. No skin thickening or nipple retraction is seen. No grouped calcifications are identified. No spiculated or circumscribed masses are seen.\nIMPRESSION:\nNo mammographic evidence of malignancy. BI-RADS Category 1.</s>\n<|assistant|>\nNegative</s>\n"}
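To make the output above less opaque, here is a minimal pure-Python sketch of what the chat template appears to do, inferred only from the string printed above. `render_chat` is a hypothetical helper, not part of `transformers`; the real template is stored with the tokenizer and may treat other cases (system messages, for example) differently.

```python
def render_chat(messages, eos_token="</s>"):
    """Hypothetical re-implementation of the template, inferred from the printed output."""
    parts = []
    for msg in messages:
        # Each turn becomes: <|role|> newline, the content, the EOS token, newline
        parts.append(f"<|{msg['role']}|>\n{msg['content']}{eos_token}\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "Classify this report. No grouped calcifications are identified."},
    {"role": "assistant", "content": "Negative"},
]
rendered = render_chat(messages)
print(rendered)
```

The point is that every turn, including the assistant's label, is closed with `</s>`, which is presumably how the model learns to stop right after the label.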

Okay, let's load our synthetic dataset.

In [7]:
from pathlib import Path
import pandas as pd
from datasets import Dataset

In [8]:
DS_PATH = Path.cwd()/'data'

In [9]:
df = pd.read_csv(DS_PATH/'synth-train-n1132.csv')
df = df.drop(columns=['Unnamed: 0']) # Let's remove this column from my df
df.head(2)

|   | report_text | label |
|---|---|---|
| 0 | REPORT:\n"BILATERAL SCREENING MAMMOGRAM\nCLINI... | Positive |
| 1 | SCREENING MAMMOGRAM.\nCompared to Previous: Ye... | Not Stated |


In [10]:
ds = Dataset.from_pandas(df) ; ds

Dataset({
    features: ['report_text', 'label'],
    num_rows: 1132
})

In [11]:
ds = ds.map(fmt_prompt)


Verify one formatted example:

In [35]:
ds['text'][55]

"<|user|>\nIn the following radiology report, classify the patient's current microcalcification status as Positive, Negative or Not Stated. BILATERAL SCREENING MAMMOGRAPHY\nHistory: Screening mammogram.\nComparison available dating from 02/2022.\nFindings:\nThere are scattered fibroglandular densities bilaterally. No skin thickening or nipple retraction is seen. No grouped calcifications are identified. No spiculated or circumscribed masses are seen.\nIMPRESSION:\nNo mammographic evidence of malignancy. BI-RADS Category 1.</s>\n<|assistant|>\nNegative</s>\n"

## Model Quantization
> We use the bitsandbytes package to compress the pretrained model to 4-bit precision. Following the QLoRA paper, we load the model in `4-bit` with the NormalFloat (NF4) representation and double quantization.

In [12]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

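Why the intermediate-step checkpoint? I'm not sure right now. As far as I can tell, this is the base (non-chat) checkpoint that the Chat-v1.0 model was fine-tuned from, so it has not been instruction-tuned yet.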

In [13]:
mn = 'TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T'

In [14]:
# 4-bit quantization config - Q in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, # using 4-bit precision model loading
    bnb_4bit_quant_type='nf4', # quantization type
    bnb_4bit_compute_dtype='float16', #compute dtype
    bnb_4bit_use_double_quant=True # Apply nested quantization
)
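To get a feel for what 4-bit quantization does to a block of weights, here is a simplified sketch of blockwise absmax quantization. It is not what bitsandbytes runs: it uses 16 evenly spaced levels, whereas NF4 places its 16 levels according to a normal distribution. The function names and sample weights are made up for illustration.

```python
def quantize_block(values, n_levels=16):
    """Quantize one block to integer codes in [0, n_levels - 1] plus one absmax scale."""
    absmax = max(abs(v) for v in values)
    if absmax == 0.0:
        return [0] * len(values), 0.0  # all-zero block, nothing to scale
    step = 2.0 / (n_levels - 1)  # spacing of evenly spaced levels on [-1, 1]
    codes = [round((v / absmax + 1.0) / step) for v in values]
    return codes, absmax


def dequantize_block(codes, absmax, n_levels=16):
    """Map codes back to approximate float values."""
    step = 2.0 / (n_levels - 1)
    return [(c * step - 1.0) * absmax for c in codes]


weights = [0.12, -0.5, 0.03, 0.9, -0.27, 0.0, 0.44, -0.81]  # made-up values
block_size = 4
blocks = [weights[i:i + block_size] for i in range(0, len(weights), block_size)]

for block in blocks:
    codes, absmax = quantize_block(block)
    restored = dequantize_block(codes, absmax)
    worst = max(abs(a - b) for a, b in zip(block, restored))
    print(codes, round(absmax, 2), round(worst, 4))
```

Each block stores 4-bit codes plus one scale (`absmax`). Double quantization, as described in the QLoRA paper, goes one step further and quantizes those per-block scales as well.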

In [15]:
model = AutoModelForCausalLM.from_pretrained(
    mn,
    device_map='auto',
    quantization_config=bnb_config # leave this out for regular SFT
)

In [16]:
model.config.use_cache = False
model.config.pretraining_tp = 1

In [17]:
# Load LLaMa Tokz
tokz = AutoTokenizer.from_pretrained(mn, trust_remote_code=True)
tokz.pad_token="<PAD>"
tokz.padding_side='left'

Note: This quantization procedure lets us shrink the original model while retaining most of the original weights' precision. Loading the model now requires ~1GB of VRAM, compared to the ~4GB it would need without quantization. Fine-tuning needs more VRAM than that, so the 1GB figure only covers loading the model, not training it.
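A rough back-of-the-envelope check of those numbers, counting weight storage only. The parameter count is an assumption taken from the "1.1B" in the model name, and `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 1.1e9  # assumption: nominal size from the model name

fp32_gb = weight_memory_gb(N_PARAMS, 32)
fp16_gb = weight_memory_gb(N_PARAMS, 16)
int4_gb = weight_memory_gb(N_PARAMS, 4)

print(f"fp32: {fp32_gb:.2f} GB, fp16: {fp16_gb:.2f} GB, 4-bit: {int4_gb:.2f} GB")
```

The 4-bit estimate comes out lower than the ~1GB quoted above. The gap presumably comes from parts of the model that stay in higher precision, the quantization constants, and runtime overhead, none of which this sketch counts.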

In [18]:
"""We need to define our LoRA configuration using the `peft` library which represents hyperparameters of the ft"""
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model

A few questions for myself: how do we know which target modules to use, and how should these settings be chosen?

In [19]:
# Prepare LoRA config
peft_config = LoraConfig(
    lora_alpha=32, # LoRA scaling
    lora_dropout=0.1, # dropout for LoRA layers
    r=64, # rank
    bias="none",
    task_type='CAUSAL_LM',
    target_modules= ["k_proj", "gate_proj", "v_proj", "up_proj", "q_proj", "o_proj", "down_proj"] #layers to target
)

In [20]:
# Prepare model for training
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)

#### Definitions:

`r`
This is the rank of the compressed matrices (recall this from Figure 12-13). Increasing this value also increases the size of the compressed matrices, leading to less compression and thereby more representational power. Values typically range between 4 and 64.


`lora_alpha`
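Controls the amount of change that is added to the original weights. In essence, it balances the knowledge of the original model with that of the new task. A rule of thumb is to choose a value twice the size of `r`. Note that the config above departs from this rule: with `lora_alpha=32` and `r=64`, alpha is half of `r`.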


`target_modules`
Controls which layers to target. The LoRA procedure can choose to ignore specific layers, like specific projection layers. Targeting fewer layers can speed up training but may reduce performance, and vice versa.
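To make `r` and `lora_alpha` concrete, here is a small sketch of the arithmetic for one adapted weight matrix. The 2048 x 2048 shape is only illustrative (an assumption, not read from the model config), and `lora_param_count` is a hypothetical helper. The scaling factor `lora_alpha / r` is the standard LoRA formulation.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by LoRA: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


d_in = d_out = 2048       # illustrative shape for one projection layer
r, lora_alpha = 64, 32    # values from the LoraConfig above

full_params = d_in * d_out
lora_params = lora_param_count(d_in, d_out, r)
scaling = lora_alpha / r  # the low-rank update is multiplied by this factor

print(f"full matrix: {full_params:,} params")
print(f"LoRA adapter: {lora_params:,} params ({lora_params / full_params:.2%} of full)")
print(f"update scaling: {scaling}")
```

So even at the top of the typical range (`r=64`), the adapter for a matrix of this shape is a small fraction of the full weight, and with `lora_alpha=32` the update is scaled by one half.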


## Training Configuration

In [21]:
from transformers import TrainingArguments

In [22]:
output_dir = Path.cwd()/'models' ; output_dir

PosixPath('/teamspace/studios/this_studio/models')

In [23]:
bs = 2
lr = 2e-4

`num_train_epochs`
The total number of passes over the training data. Higher values tend to degrade performance (typically through overfitting), so we generally keep this low.


`learning_rate`
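Determines the step size at each iteration of weight updates. For reference, the QLoRA authors report using 2e-4 for their smaller models and a lower rate (1e-4) for their largest ones (33B and 65B parameters), so larger models generally call for smaller learning rates.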


`lr_scheduler_type`
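A cosine-based scheduler that adjusts the learning rate dynamically. If warmup is configured, it first increases the learning rate linearly from zero until it reaches the set value; after that, the learning rate decays following a cosine curve. The `TrainingArguments` below set no warmup (`warmup_steps` and `warmup_ratio` default to 0), so here the rate starts at the set value and only decays.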


`optim`
The paged optimizers used in the original QLoRA paper.
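The cosine schedule described under `lr_scheduler_type` can be sketched in a few lines. This is a standalone approximation of the usual warmup-plus-cosine formula, not the library's own code; `cosine_lr` and the step counts are hypothetical.

```python
import math


def cosine_lr(step, total_steps, base_lr, warmup_steps=0):
    """Linear warmup from zero (if any), then cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


base_lr = 2e-4     # same value as `lr` above
total_steps = 100  # illustrative number of optimizer steps

for step in (0, 25, 50, 75, 100):
    print(step, f"{cosine_lr(step, total_steps, base_lr):.2e}")
```

Without warmup the rate starts at `base_lr`, passes through half of it at the midpoint, and approaches zero at the end of training.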

In [24]:
training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=bs,
    gradient_accumulation_steps=4,
    optim='paged_adamw_32bit',
    learning_rate=lr,
    lr_scheduler_type='cosine',
    num_train_epochs=1,
    logging_steps=10,
    fp16=True,
    gradient_checkpointing=True
)

Why not HF Trainer?

In [25]:
from trl import SFTTrainer

In [26]:
trainer = SFTTrainer(
    model=model,
    train_dataset=ds,
    dataset_text_field="text",
    tokenizer=tokz,
    args=training_arguments,
    max_seq_length=512,

    # Leave this out for regular SFT
    peft_config=peft_config,
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.





In [27]:
trainer.train()

| Step | Training Loss |
|---|---|
| 10 | 1.8699 |
| 20 | 1.11 |
| 30 | 0.9227 |
| 40 | 0.8506 |
| 50 | 0.767 |
| 60 | 0.7721 |
| 70 | 0.768 |
| 80 | 0.6632 |
| 90 | 0.7337 |
| 100 | 0.6822 |


TrainOutput(global_step=141, training_loss=0.8366218462903449, metrics={'train_runtime': 215.0656, 'train_samples_per_second': 5.264, 'train_steps_per_second': 0.656, 'total_flos': 2645407998074880.0, 'train_loss': 0.8366218462903449, 'epoch': 0.9964664310954063})
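The `global_step=141` and `epoch` of about 0.9965 in the output can be reproduced from the batch settings. This sketch assumes a single GPU and that leftover micro-batches which do not fill a complete accumulation window are not counted as a step; both assumptions are consistent with the logged values but are not confirmed anywhere in the notebook.

```python
import math

n_examples = 1132     # rows in the dataset
per_device_batch = 2  # per_device_train_batch_size
grad_accum = 4        # gradient_accumulation_steps
n_devices = 1         # assumption: one GPU

# Micro-batches the dataloader yields in one epoch
batches_per_epoch = math.ceil(n_examples / (per_device_batch * n_devices))

# One optimizer step per full accumulation window
optimizer_steps = batches_per_epoch // grad_accum

# Examples actually consumed by those steps
examples_seen = optimizer_steps * per_device_batch * grad_accum * n_devices
epoch_fraction = examples_seen / n_examples

print(batches_per_epoch, optimizer_steps, examples_seen, epoch_fraction)
```

The effective batch size is 2 x 4 = 8 examples per optimizer step, which is why one epoch over 1132 rows ends just short of 1.0.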

In [28]:
trainer.model.save_pretrained("TinyLlama-1.1B-qlora")


## Merge Weights
> After we have trained our QLoRA weights, we still need to combine them with the original weights to use them. We reload the model in 16 bits, instead of the quantized 4 bits, to merge the weights. Although the tokenizer was not updated during training, we save it to the same folder as the model for easier access:

In [29]:
from peft import AutoPeftModelForCausalLM

In [30]:
model = AutoPeftModelForCausalLM.from_pretrained(
    'TinyLlama-1.1B-qlora',
    low_cpu_mem_usage=True,
    device_map='auto'
)

In [31]:
# Merge LoRA and base model
merged_model = model.merge_and_unload()
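Conceptually, merging folds the low-rank update into the base weight so that inference no longer needs the adapter. Below is a toy pure-Python sketch of that idea using the standard LoRA update `W + (alpha / r) * B @ A`. The matrices and helper functions are made up, and this is not how `peft` implements the merge internally.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(a, b, scale=1.0):
    """Elementwise a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


W = [[1.0, 2.0], [3.0, 4.0]]  # base weight, shape (d_out=2, d_in=2)
A = [[0.5, -1.0]]             # LoRA A, shape (r=1, d_in=2)
B = [[2.0], [0.0]]            # LoRA B, shape (d_out=2, r=1)
lora_alpha, r = 2, 1
scaling = lora_alpha / r

x = [[1.0], [1.0]]            # input as a column vector

# Adapter kept separate: W x + scaling * B (A x)
unmerged_out = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale=scaling)

# Adapter folded into the weight once, then a single matmul
W_merged = add_scaled(W, matmul(B, A), scale=scaling)
merged_out = matmul(W_merged, x)

print(unmerged_out, merged_out)
```

Both paths give the same output, which is why the merged model can be used like an ordinary model afterwards.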

After merging the adapter with the base model, we can use it with the prompt template.

In [32]:
from transformers import pipeline

In [33]:
prompt = """<|user|>
In the following radiology report, classify the patient's current microcalcification status as Positive, Negative or Not Stated. Microcalcifications right breast with underlying dilated ducts, most of which are related to milk of calcium within microcysts and some indeterminate. Ultrasound guided core biopsy in the area of the palpable lump containing the microcalcifications will be done done at 6:00 today.</s>
<|assistant|>
"""

In [34]:
# Run our instruction-tuned model
pipe = pipeline(task="text-generation", model=merged_model, tokenizer=tokz)
print(pipe(prompt)[0]["generated_text"])

Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


<|user|>
In the following radiology report, classify the patient's current microcalcification status as Positive, Negative or Not Stated. Microcalcifications right breast with underlying dilated ducts, most of which are related to milk of calcium within microcysts and some indeterminate. Ultrasound guided core biopsy in the area of the palpable lump containing the microcalcifications will be done done at 6:00 today.</s>
<|assistant|>
Negative


Okay I expected this to output 
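Since the pipeline returns the prompt followed by the completion, comparing predictions against expected labels is easier once the prompt is stripped off. A minimal sketch, assuming the completion starts with one of the three labels from the instruction; `extract_label` is a hypothetical helper.

```python
VALID_LABELS = ("Positive", "Negative", "Not Stated")


def extract_label(generated_text, assistant_tag="<|assistant|>"):
    """Return the label that follows the last assistant tag, or None if none matches."""
    completion = generated_text.rsplit(assistant_tag, 1)[-1]
    completion = completion.replace("</s>", "").strip()
    for label in VALID_LABELS:
        if completion.startswith(label):
            return label
    return None


sample = "<|user|>\nSome report text.</s>\n<|assistant|>\nNegative"
print(extract_label(sample))
```

Returning `None` for anything unexpected makes it easy to count how often the model drifts away from the three allowed answers.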