# QLoRA with TinyLlama

## Templating Instruction Data
To get the LLM to follow instructions, we need to prepare instruction data that follows a chat template.

In [1]:
from transformers import AutoTokenizer
from datasets import load_dataset

ModuleNotFoundError: No module named 'markupsafe'

In [None]:
template_tokz = AutoTokenizer.from_pretrained('TinyLlama/TinyLlama-1.1B-Chat-v1.0')

A typical message format looks like this:

In [None]:
"""
[ { "content": "These instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?
\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in 
settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\n
Does this feature apply to all sections of the theme or just specific ones as listed in the text material?", "role": "user" }, 
{ "content": "This feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.", 
"role": "assistant" }, { "content": "Can you guide me through the process of enabling the secondary image hover feature on my 
Collection pages and Featured Collections sections?", "role": "user" }, 
{ "content": "Sure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.", "role": "assistant" }, { "content": "Can you provide me with a link to the documentation for my theme?", "role": "user" }, { "content": "I don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.", "role": "assistant" }, { "content": "Can you confirm if this feature also works for the Quick Shop section of my theme?", "role": "user" }, { "content": "The secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. 
Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.", "role": "assistant" } ]
"""

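A small check can make the expected shape concrete: a list of dicts, each with a `role` and a `content`, with user and assistant turns alternating. The sketch below is a hypothetical helper (`validate_messages` is not part of `transformers`), and the strict alternation rule is an assumption based on the example above, not a requirement stated by the library.

```python
from typing import Dict, List

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumed role names


def validate_messages(messages: List[Dict[str, str]]) -> bool:
    """Return True if messages look like a well-formed chat transcript."""
    if not messages:
        return False
    expected = "user"
    for i, msg in enumerate(messages):
        # Every turn needs exactly the two keys seen in the example above.
        if set(msg) != {"role", "content"}:
            return False
        role = msg["role"]
        if role not in ALLOWED_ROLES or not isinstance(msg["content"], str):
            return False
        # Allow a single optional system message at the very start.
        if role == "system":
            if i != 0:
                return False
            continue
        # User and assistant turns must alternate, starting with the user.
        if role != expected:
            return False
        expected = "assistant" if role == "user" else "user"
    return True


example = [
    {"role": "user", "content": "What theme version am I using?"},
    {"role": "assistant", "content": "Check your theme settings."},
]
print(validate_messages(example))
```

Running a check like this before templating catches rows with a missing key or a misspelled role early, rather than after they have been baked into the training text.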

So let's write a prompt-formatting function that turns each row of our df into this message format for the model.

In [None]:
def fmt_prompt(row):
    """Format a row using the <|user|> chat template that TinyLlama uses."""
    report_text = row['report_text']
    label = row['label']
    message = [
        {'role': 'user',
         'content': f"In the following radiology report, classify the patient's current microcalcification status as Positive, Negative or Not Stated. {report_text}"},
        {'role': 'assistant', 'content': label},
    ]
    prompt = template_tokz.apply_chat_template(message, tokenize=False)
    return {'text': prompt}

In [None]:
row = {'report_text': """BILATERAL SCREENING MAMMOGRAPHY.
History: Screening.
Comparison available dating from 01/2023.
Findings:
There are scattered fibroglandular densities bilaterally. No skin thickening or nipple retraction is seen. No grouped calcifications are identified. No spiculated or circumscribed masses are seen.
IMPRESSION:
No mammographic evidence of malignancy. BI-RADS Category 1.""", 'label': 'Negative'}

In [None]:
fmt_prompt(row)

{'text': "<|user|>\nIn the following radiology report, classify the patient's current microcalcification status as Positive, Negative or Not Stated. BILATERAL SCREENING MAMMOGRAPHY.\nHistory: Screening.\nComparison available dating from 01/2023.\nFindings:\nThere are scattered fibroglandular densities bilaterally. No skin thickening or nipple retraction is seen. No grouped calcifications are identified. No spiculated or circumscribed masses are seen.\nIMPRESSION:\nNo mammographic evidence of malignancy. BI-RADS Category 1.</s>\n<|assistant|>\nNegative</s>\n"}
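The output above suggests that the template wraps each turn as `<|role|>`, a newline, the content, `</s>`, and another newline. A hand-rolled sketch of that layout is shown below. It is inferred from this one output only and is not the tokenizer's actual Jinja template; `render_chat` and `EOS` are hypothetical names used for illustration.

```python
from typing import Dict, List

EOS = "</s>"  # end-of-sequence marker as it appears in the output above


def render_chat(messages: List[Dict[str, str]]) -> str:
    """Render turns as '<|role|>\\n{content}</s>\\n', mimicking the output above."""
    return "".join(f"<|{m['role']}|>\n{m['content']}{EOS}\n" for m in messages)


messages = [
    {"role": "user", "content": "Classify this report. No grouped calcifications are identified."},
    {"role": "assistant", "content": "Negative"},
]
print(render_chat(messages))
```

Because the label is the whole assistant turn, each training example ends with the label followed by `</s>`, which is the pattern the fine-tuned model is meant to reproduce.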

Okay, let's load our synthetic dataset.

In [None]:
from pathlib import Path
import pandas as pd
from datasets import Dataset

In [None]:
DS_PATH = Path.cwd()/'data'

In [None]:
df = pd.read_csv(DS_PATH/'synth-itr1-n166.csv')
df = df.drop(columns=['Unnamed: 0']) # Let's remove this column from my df
df.head(2)

                                         report_text     label
0  BILATERAL SCREENING MAMMOGRAPHY.\nHistory: Scr...  Negative
1  Screening Mammogram\nClinical History: 48 year...  Negative


In [None]:
ds = Dataset.from_pandas(df)
ds

Dataset({
    features: ['report_text', 'label'],
    num_rows: 166
})

In [None]:
ds = ds.map(fmt_prompt)


Verify that a formatted example looks right:

In [None]:
ds['text'][55]

"<|user|>\nIn the following radiology report, classify the patient's current microcalcification status as Positive, Negative or Not Stated. MAMMOGRAM LEFT BREAST.\nHISTORY: Screening.\nCOMPARISONS: 09/20/2022\nTECHNIQUE: Standard CC and MLO views of the left breast were obtained.\nFINDINGS:\nThe breasts are heterogeneously dense. This may lower the sensitivity of mammography.\nThere are scattered fibroglandular densities. No definite mass, architectural distortion, suspicious calcifications, or skin thickening are seen.\nIMPRESSION:\nNegative left mammogram. BI-RADS 2.</s>\n<|assistant|>\nNegative</s>\n"

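Spot-checking one row is useful, but a quick pass over every row is cheap at this dataset size. The sketch below is a hypothetical helper (`check_rows` is not from `datasets`). It assumes each row carries the `text` and `label` fields shown above and that the only valid labels are the three named in the prompt.

```python
from collections import Counter
from typing import Dict, Iterable, List

VALID_LABELS = {"Positive", "Negative", "Not Stated"}  # labels named in the prompt


def check_rows(rows: Iterable[Dict[str, str]]) -> List[int]:
    """Return indices of rows whose 'text' does not match the expected layout."""
    bad = []
    for i, row in enumerate(rows):
        text, label = row["text"], row["label"]
        ok = (
            text.startswith("<|user|>\n")
            and text.endswith(f"<|assistant|>\n{label}</s>\n")
            and label in VALID_LABELS
        )
        if not ok:
            bad.append(i)
    return bad


# Tiny stand-in rows; with the real dataset you could pass `ds` directly,
# since iterating a datasets.Dataset yields one dict per row.
rows = [
    {"text": "<|user|>\nReport A</s>\n<|assistant|>\nNegative</s>\n", "label": "Negative"},
    {"text": "<|user|>\nReport B</s>\n<|assistant|>\nMaybe</s>\n", "label": "Maybe"},
]
print(check_rows(rows))                   # indices of malformed rows
print(Counter(r["label"] for r in rows))  # label distribution
```

The label count is worth a glance as well: a heavily skewed distribution in a small synthetic set can affect what the fine-tuned model learns to predict.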
## Model Quantization
> We use the bitsandbytes package to quantize the pretrained model to 4 bits. Following the QLoRA paper, we load the model in `4-bit` NormalFloat (NF4) representation with double quantization.

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

Why the intermediate-step checkpoint? I'm not sure right now.

In [None]:
mn = 'TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T'

In [None]:
# 4-bit quantization config: the "Q" in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # load the model weights in 4-bit precision
    bnb_4bit_quant_type='nf4',             # NormalFloat4 quantization type
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for computation
    bnb_4bit_use_double_quant=True,        # nested (double) quantization of the constants
)
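To see what the settings above buy, a back-of-the-envelope estimate helps. The sketch assumes roughly 1.1 billion parameters (taken from the model name) and the block sizes described in the QLoRA paper (64 weights per quantization block, 256 first-level constants per second-level block). It ignores layers that stay unquantized, activations, and optimizer state, so treat the numbers as rough.

```python
N_PARAMS = 1.1e9  # assumption: parameter count implied by "1.1B" in the model name
BLOCK = 64        # weights per quantization block (QLoRA paper)
BLOCK2 = 256      # first-level constants per second-level block (QLoRA paper)


def gib(bits: float) -> float:
    """Convert a bit count to gibibytes."""
    return bits / 8 / 2**30


# Per-parameter overhead of storing one fp32 scale per block of 64 weights.
overhead_plain = 32 / BLOCK
# With double quantization the scales become 8-bit, plus one fp32 scale
# for every 256 of those 8-bit scales.
overhead_double = 8 / BLOCK + 32 / (BLOCK * BLOCK2)

fp16 = gib(N_PARAMS * 16)
nf4 = gib(N_PARAMS * (4 + overhead_plain))
nf4_dq = gib(N_PARAMS * (4 + overhead_double))

print(f"fp16 weights:        {fp16:.2f} GiB")
print(f"4-bit, single quant: {nf4:.2f} GiB")
print(f"4-bit, double quant: {nf4_dq:.2f} GiB")
```

Double quantization trims the per-parameter overhead from 0.5 bits to about 0.127 bits. That is a small saving for a model this size, but it grows in proportion to the parameter count.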

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    mn,
    device_map='auto',
    quantization_config=bnb_config # leave this out for regular SFT
)

The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.


RuntimeError: CUDA is required but not available for bitsandbytes. Please consider installing the multi-platform enabled version of bitsandbytes, which is currently a work in progress. Please check currently supported platforms and installation instructions at https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend
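The error above comes from requesting 4-bit loading on a machine without CUDA. One way to fail earlier and more clearly is to check for a GPU before building the quantization config. The sketch below is only an assumption about how such a check could be structured: `cuda_available` and `loading_mode` are hypothetical helpers, and falling back to an unquantized load is a design choice rather than something this notebook does.

```python
import importlib.util


def cuda_available() -> bool:
    """True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check works without torch installed

    return torch.cuda.is_available()


def loading_mode(has_cuda: bool) -> str:
    """Pick a loading strategy: 4-bit needs CUDA, otherwise load unquantized."""
    return "4bit" if has_cuda else "unquantized"


mode = loading_mode(cuda_available())
print(f"Loading mode: {mode}")
```

With `mode == "4bit"` you would pass `quantization_config=bnb_config` to `from_pretrained`; otherwise you would leave that argument out, as the comment in the loading cell already notes for regular SFT.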

In [None]:
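model.config.use_cache = False   # disable the KV cache, which is not needed during training
model.config.pretraining_tp = 1  # tensor-parallel degree of 1, i.e. no sliced linear-layer computation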