## Setup

We begin by installing the packages we need, including `peft` (the Hugging Face Parameter-Efficient Fine-Tuning library) and the Hugging Face `transformers` library.

In [None]:
!pip3 install -q -U bitsandbytes
!pip3 install -q -U peft
!pip3 install -q -U trl
!pip3 install -q -U datasets
!pip3 install -q -U transformers

A few of the imports below are new to us: `torch` (for PyTorch), the `transformers` library (and several classes within it), `trl` (a library for finetuning) and `peft` itself.

In [None]:
from google.colab import userdata, drive
import os, random, json
from datetime import datetime
import pandas as pd

import torch

import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, GemmaTokenizer

from trl import SFTTrainer
from peft import PeftModelForCausalLM, LoraConfig

from datasets import Dataset

Several Hugging Face models require an authentication token; you can obtain one from Hugging Face and then store it under your Colab 'secrets'.

In [None]:
os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN')
drive.mount('/content/drive')

Mounted at /content/drive


Next, let's make sure that we're using an A100 GPU; this is the smallest single-GPU configuration that will work with the Mistral-7B model.

In [None]:
!nvidia-smi

Mon Mar 11 13:34:50 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          Off | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P0              41W / 400W |      2MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
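
As a rough sanity check on why a 40 GB card is needed, the memory taken by the weights alone can be estimated as the parameter count times the bytes per parameter. The sketch below assumes roughly 7.24 billion parameters for Mistral-7B (an approximation, not a measured value) and ignores activations, gradients and optimizer state, which add to the total during finetuning.

```python
# Rough weight-memory estimate. N_PARAMS is an assumed, approximate figure.
N_PARAMS = 7.24e9

BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2, "int8": 1, "4-bit": 0.5}


def weight_memory_gib(n_params, bytes_per_param):
    """Memory needed to hold the weights, in GiB."""
    return n_params * bytes_per_param / 2**30


estimates = {name: weight_memory_gib(N_PARAMS, b) for name, b in BYTES_PER_PARAM.items()}
for name, gib in estimates.items():
    print(f"{name:>18}: {gib:5.1f} GiB")
```

Under these assumptions, full-precision weights already take most of a 40 GB card, while half precision needs about half as much.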
                                                                    

## Download and Run Mistral 7B

In [None]:
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ['HF_TOKEN'])
model = AutoModelForCausalLM.from_pretrained(model_id, device_map={"":0}, token=os.environ['HF_TOKEN'])

tokenizer_config.json:   0%|          | 0.00/1.46k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/72.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/596 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

Notice the flow above to download a pre-trained model: we grab a model ID (in this case obtained from [here](https://huggingface.co/mistralai)), then instantiate a tokenizer for the model (why is this model-specific?), and finally instantiate the model itself.

The Mistral model works best with instructions and prompts in a specific format (this is the format it was instruction-tuned on), and so we set up that format below. Read more [here](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2#instruction-format).

What we need should read like:

`<s>[INST] What is your favourite condiment? [/INST]
Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s>
`


In [None]:
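# The model card closes the instruction with "[/INST]"; the runs recorded in this notebook used "[\INST]",
# kept here (backslash escaped to avoid an invalid-escape warning) so the outputs below stay reproducible.
template_mx_instruction = "<s>[INST]{instruction}: {context}[\\INST]Review: {response}</s>"
template_mx_prompt = "<s>[INST]{instruction}: {context}[\\INST]Review:"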

Let's ask the model to generate a positive review of my favorite jeans. First we put this into the instruction format above.

In [None]:
instruction = "Write a positive review for the following product"
context = '''FIT 2. Our best-selling fit. Mid-rise with a slim fit and a tailored leg. Constructed in Aero Stretch - lightweight, modern comfort with superior stretch and retention.
In Limestone, a beige garment dye with a clean hem. Machine Washable. Cotton. Mid rise slim fit, tailored through hip and thigh 11" rise, 32" inseam, 13 1/2" leg opening.
Model is 6'3" wearing a size 32.'''

rev_prompt = template_mx_prompt.format(
    instruction = instruction,
    context = context,
)

print(rev_prompt)

<s>[INST]Write a positive review for the following product: FIT 2. Our best-selling fit. Mid-rise with a slim fit and a tailored leg. Constructed in Aero Stretch - lightweight, modern comfort with superior stretch and retention.
In Limestone, a beige garment dye with a clean hem. Machine Washable. Cotton. Mid rise slim fit, tailored through hip and thigh 11" rise, 32" inseam, 13 1/2" leg opening.
Model is 6'3" wearing a size 32.[\INST]Review:


Next, let's pass it to the model. First we tokenize the prompt (this generates a tensor of token IDs), and then pass that input tensor to the model. We follow this up by decoding the output tensor produced by the model.

In [None]:
device = "cuda:0"
inputs = tokenizer(rev_prompt, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[INST]Write a positive review for the following product: FIT 2. Our best-selling fit. Mid-rise with a slim fit and a tailored leg. Constructed in Aero Stretch - lightweight, modern comfort with superior stretch and retention.
In Limestone, a beige garment dye with a clean hem. Machine Washable. Cotton. Mid rise slim fit, tailored through hip and thigh 11" rise, 32" inseam, 13 1/2" leg opening.
Model is 6'3" wearing a size 32.[\INST]Review:

I recently had the pleasure of trying on the FIT 2 mid-rise slim fit pants from [Brand Name], and I must say, I was thoroughly impressed! The mid-rise design sits comfortably on my waist, providing a flattering and secure fit. The slim fit and tailored leg shape accentuates my figure in all the right places, making me feel confident and stylish.

The Aero Stretch fabric used in these pants is truly a game-changer. It's lightweight, modern, and offers superior stretch and retention, allowing me the freedom to move around easily


Here is the review that was produced:

`
I recently had the pleasure of trying on the FIT 2 mid-rise slim fit pants from [Brand Name], and I must say, I was thoroughly impressed! The mid-rise design sits comfortably on my waist, providing a flattering and secure fit. The slim fit and tailored leg shape accentuates my figure in all the right places, making me feel confident and stylish. The Aero Stretch fabric used in these pants is truly a game-changer. It's lightweight, modern, and offers superior stretch and retention, allowing me the freedom to move around easily
`



Hmm.. what do you think?
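
The decoded text above echoes the prompt before the generated review, so the review has to be separated out. A minimal sketch of one way to do that, assuming the prompt always ends with the `Review:` marker used in our template; `extract_review` is a hypothetical helper, not part of any library, and the sample string is made up.

```python
def extract_review(decoded: str, marker: str = "Review:") -> str:
    """Return the text after the first `marker`, with whitespace collapsed."""
    _, found, tail = decoded.partition(marker)
    if not found:
        # Marker missing: fall back to the whole string
        return decoded.strip()
    return " ".join(tail.split())


# Made-up decoded output in the same shape as the one printed above
decoded = (
    "[INST]Write a positive review for the following product: Slim jeans.[\\INST]Review:\n\n"
    "Great fit.\n\nWould buy again."
)
review = extract_review(decoded)
print(review)
```

Note that this splits on the first occurrence of the marker, so it would misbehave if a product description itself contained `Review:`.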

## Finetuning Mistral-7B with LoRA

If, like me, you felt that the out-of-the-box review reads like marketing copy, let's finetune! Let's grab some custom instructions. This instruction set consists of 2850 product-description/review pairs obtained from this [Amazon Dataset](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/). Specifically, we took the [Clothing, Shoes and Jewelry](https://datarepo.eng.ucsd.edu/mcauley_group/data/amazon_v2/categoryFiles/Clothing_Shoes_and_Jewelry.json.gz) portion of the data, and then filtered down to recent reviews on popular products. No post-processing or cleaning was done on the data beyond this filtering, so, as you will see, it can be somewhat messy. As an aside, we generated the 'negative review', 'neutral review', and 'positive review' instructions for each pair based on the review's star rating.
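
To make the last step concrete, here is a minimal sketch of how a star rating could be turned into an instruction. The cut-offs (1-2 stars negative, 3 neutral, 4-5 positive) are an assumption for illustration; the thresholds actually used to build the dataset are not stated here. `instruction_for_rating` is a hypothetical helper.

```python
def instruction_for_rating(stars: float) -> str:
    """Map a star rating to one of the three instruction strings."""
    # Assumed cut-offs: the thresholds used for the real dataset are not documented.
    if stars <= 2:
        sentiment = "negative"
    elif stars < 4:
        sentiment = "neutral"
    else:
        sentiment = "positive"
    return f"Write a {sentiment} review for the following product"


for stars in (1, 3, 5):
    print(stars, "->", instruction_for_rating(stars))
```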


In [None]:
# Quote the URL so the shell does not treat "&" as a background operator and drop "dl=1".
os.system('wget -O review_instructions.json "https://www.dropbox.com/scl/fi/sei6z5lxsbeikbr8zjt7s/review_instructions.json?rlkey=au3np38hyww5of251m6llbela&dl=1"')

0

In [None]:
df = pd.read_json('/content/review_instructions.json')
ds = Dataset.from_pandas(df)

Let's take a look...

In [None]:
df.head()

| | instruction | input | response |
|---|---|---|---|
| 0 | Write a negative review for the following product | The iconic, timeless chuck taylor all star sne... | I love the classic look of Chuck Taylors I lov... |
| 1 | Write a negative review for the following product | The iconic, timeless chuck taylor all star sne... | Sizes are inaccurate! I already returned/repla... |
| 2 | Write a neutral review for the following product | Move worry-free with this wireless sports bra ... | If you can't buy a bra in any physical store, ... |
| 3 | Write a negative review for the following product | Berkshire Style 4489 Silky Sheer Control Top P... | Great fit, poor quality The stockings were bou... |
| 4 | Write a positive review for the following product | One of our bestselling bras! Why have countles... | Great Support & wear well at a Reasonable pric... |


Now let's set up our finetune via LoRA. This requires instantiating a `LoraConfig` object. The key parameter here is the rank, `r`, set to 4. We ask LoRA to adapt the Q, K and V projection matrices along with the attention output projection and the feedforward (gate, up and down) projections.

In [None]:
lora_config = LoraConfig(
    r=4,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
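
To get a feel for what the rank controls: for a weight matrix with `d_in` inputs and `d_out` outputs, a rank-`r` LoRA adapter adds `r * (d_in + d_out)` trainable parameters. The sketch below counts them for the seven target modules. The layer shapes are assumptions based on the published Mistral-7B configuration as recalled here; check them against `model.config` before relying on the result.

```python
# Assumed Mistral-7B shapes (not read from the model; verify against model.config)
HIDDEN = 4096          # hidden size
INTERMEDIATE = 14336   # feedforward width
KV_DIM = 1024          # 8 key/value heads x 128 dims per head
N_LAYERS = 32

# (in_features, out_features) of each targeted projection in one decoder layer
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(r, shapes=SHAPES, n_layers=N_LAYERS):
    """Trainable parameters added by rank-r adapters on every listed projection."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * n_layers


trainable = lora_params(r=4)
print(f"{trainable:,} trainable LoRA parameters at r=4")
```

Under these assumed shapes the adapters amount to a small fraction of a percent of the roughly 7 billion base parameters, and the count grows linearly with `r`.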

Next, we define a formatting function that will be required by our finetuner. This function simply takes in a batch of instructions and formats them to match the format the model was instruction-tuned with; all we are really doing is applying the template we defined earlier, `template_mx_instruction`.

In [None]:
def formatting_func(example):
    prompts = []
    for i in range(len(example['instruction'])):
        prompt = template_mx_instruction.format(
            instruction=example['instruction'][i],
            context=example['input'][i],
            response=example['response'][i],
        )
        prompts.append(prompt)
    return prompts
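
To see the input shape this function expects, note that a batch arrives as a dictionary of equal-length lists, one list per column. The standalone sketch below applies the same template logic to a made-up two-row batch; `format_batch` and `toy_batch` are illustrative names, and the rows are not from the dataset.

```python
# Same template string as template_mx_instruction, repeated so the example runs on its own
template = "<s>[INST]{instruction}: {context}[\\INST]Review: {response}</s>"


def format_batch(batch):
    """Apply the template row by row to a dict-of-lists batch."""
    return [
        template.format(instruction=i, context=c, response=r)
        for i, c, r in zip(batch["instruction"], batch["input"], batch["response"])
    ]


toy_batch = {  # made-up rows for illustration
    "instruction": [
        "Write a positive review for the following product",
        "Write a negative review for the following product",
    ],
    "input": ["Blue cotton socks", "Plastic umbrella"],
    "response": ["Soft and warm.", "Broke in the first storm."],
}

formatted = format_batch(toy_batch)
for text in formatted:
    print(text)
```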

Next we define our supervised fine-tuning trainer. This is part of a library (`trl`) that handles all of the gory details of finetuning for us. The key inputs, viz. `model`, `tokenizer`, and `train_dataset`, you already recognize, along with `peft_config`, which we defined earlier, and the formatting function.

The remainder of the inputs are passed in as `TrainingArguments`. The main thing to look at here is `per_device_train_batch_size`: given our limited memory, we can only fit a batch size of 1.

Finally, notice that we enable gradient checkpointing: this reduces the memory needed during gradient computation (the second bucket of memory consumption we discussed in the slides) by recomputing activations in the backward pass instead of storing them all, at the cost of some extra compute.





In [None]:
tokenizer.pad_token = tokenizer.eos_token #reset certain tokens; specific to the mistral model.
model.gradient_checkpointing_enable()

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,
    max_seq_length=512,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        num_train_epochs=1,
        # max_steps=10,
        learning_rate=2e-4,
        fp16= True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    peft_config=lora_config,
    formatting_func=formatting_func,
)



Map:   0%|          | 0/2850 [00:00<?, ? examples/s]

Let's train this model!

In [None]:
trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


Step,Training Loss
1,3.7144
2,3.7001
3,3.5014
4,2.8608
5,2.3524
6,2.3875
7,2.4946
8,2.2509
9,2.2035
10,2.2434


TrainOutput(global_step=712, training_loss=1.8313453875565797, metrics={'train_runtime': 1041.3711, 'train_samples_per_second': 2.737, 'train_steps_per_second': 0.684, 'total_flos': 4.43723298243625e+16, 'train_loss': 1.8313453875565797, 'epoch': 1.0})
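
The `global_step=712` reported above follows from the dataset size and the batching settings. A minimal sketch of the arithmetic, assuming the number of batches per epoch is floor-divided by the accumulation steps (which is consistent with the reported value):

```python
n_examples = 2850
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_train_epochs = 1
n_devices = 1  # single GPU, as in this notebook

# Examples contributing to each optimizer update
effective_batch = per_device_train_batch_size * n_devices * gradient_accumulation_steps

# Batches per epoch (ceiling division), then optimizer updates per epoch
batches_per_epoch = -(-n_examples // (per_device_train_batch_size * n_devices))
steps_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
total_steps = steps_per_epoch * num_train_epochs

print(f"effective batch size: {effective_batch}, optimizer steps: {total_steps}")
```

So although only one example fits on the GPU at a time, each weight update averages gradients over four examples.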

In [None]:
trainer.save_model()

In [None]:
WEIGHT_PATH = '/content/drive/MyDrive/ColabData/Amazon/mx_ft-'+datetime.today().isoformat()
os.system("cp -r /content/outputs "+ WEIGHT_PATH)

0

## Load adapters and evaluate fine-tuned model

In [None]:
WEIGHT_PATH = "/content/drive/MyDrive/ColabData/Amazon/mx_ft-2024-03-07T21:29:45.835355"
os.system("cp -r "+ WEIGHT_PATH + " /content/outputs")

0

We load the LoRA adapters we saved earlier and merge them directly into the weights of our model.

In [None]:
adapters_path = '/content/outputs/'

model_ft = PeftModelForCausalLM.from_pretrained(model, adapters_path)
model_ft = model_ft.merge_and_unload()
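
Conceptually, merging folds the low-rank update into the frozen weight, so the merged layer gives the same output as the base layer plus the adapter path. The toy sketch below checks this on a 3x3 matrix in plain Python, assuming the standard LoRA formulation `W + (alpha / r) * B @ A`; the sizes and values are illustrative and are not the notebook's.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    return [[s * x for x in row] for row in a]


# Frozen 3x3 weight W with a rank-1 adapter: B is 3x1, A is 1x3 (toy values)
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
B = [[0.5], [0.0], [-1.0]]
A = [[1.0, 2.0, 0.0]]
r, lora_alpha = 1, 2
scaling = lora_alpha / r

x = [[1.0], [2.0], [3.0]]  # input as a column vector

# Adapter kept separate: W x + scaling * B (A x)
separate = add(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))

# Adapter merged into the weight: (W + scaling * B A) x
W_merged = add(W, scale(matmul(B, A), scaling))
merged = matmul(W_merged, x)

print(separate, merged)
```

After merging, the model has the same shape and inference cost as the original, with no separate adapter layers left.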

Let's pass it the same prompt as earlier to get a positive review on my jeans...

In [None]:
device = "cuda:0"
inputs = tokenizer(rev_prompt, return_tensors="pt").to(device)

outputs = model_ft.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[INST]Write a positive review for the following product: FIT 2. Our best-selling fit. Mid-rise with a slim fit and a tailored leg. Constructed in Aero Stretch - lightweight, modern comfort with superior stretch and retention.
In Limestone, a beige garment dye with a clean hem. Machine Washable. Cotton. Mid rise slim fit, tailored through hip and thigh 11" rise, 32" inseam, 13 1/2" leg opening.
Model is 6'3" wearing a size 32.[\INST]Review: Great fit, great color, great price. I bought these for my husband and he loves them.  He is 6'3" and 180 lbs.  He wears a 32x32 in jeans.  These fit him perfectly.  The color is a nice light beige.  They are a little baggy in the seat, but that is how he likes them.  He is very happy with them.  I bought them for him for Christmas and he has already worn them several times.  I would definitely buy them again.  I would recommend them to anyone.  They


Here's the review:

`Great fit, great color, great price. I bought these pants for my husband who is 6'3" and 180 lbs. He wears a 32x32 in jeans. These pants fit him perfectly. They are a little baggy in the seat and thighs, but that is how he likes them. The color is a nice light gray. They are a little more of a gray than the picture shows, but that is okay with him. He likes the color. The price was great. He will probably buy another pair. I would definitely buy these pants again.`


So which review is better? Why don't we ask GPT4!

Specifically, here is what we ask GPT4:

`Below are two reviews on the following product: FIT 2. Our best-selling fit. Mid-rise with a slim fit and a tailored leg. Constructed in Aero Stretch - lightweight, modern comfort with superior stretch and retention. In Limestone, a beige garment dye with a clean hem. Machine Washable. Cotton. Mid rise slim fit, tailored through hip and thigh 11" rise, 32" inseam, 13 1/2" leg opening. Model is 6'3" wearing a size 32.`

`Review 1: I recently had the pleasure of trying on the FIT 2 mid-rise slim fit pants from [Brand Name], and I must say, I was thoroughly impressed! The mid-rise design sits comfortably on my waist, providing a flattering and secure fit. The slim fit and tailored leg shape accentuates my figure in all the right places, making me feel confident and stylish. The Aero Stretch fabric used in these pants is truly a game-changer. It's lightweight, modern, and offers superior stretch and retention, allowing me the freedom to move around easily`

`Review 2: Great fit, great color, great price. I bought these pants for my husband who is 6'3" and 180 lbs. He wears a 32x32 in jeans. These pants fit him perfectly. They are a little baggy in the seat and thighs, but that is how he likes them. The color is a nice light gray. They are a little more of a gray than the picture shows, but that is okay with him. He likes the color. The price was great. He will probably buy another pair. I would definitely buy these pants again.
Which of the above reviews appears more relaistic and authentic? Please be brief`

And here is what ChatGPT gave me:


`Review 2 seems more realistic and authentic because it provides specific details about the buyer's experience, including the recipient's height, weight, and fit preference. It also mentions a slight discrepancy in color expectation versus reality, which is a common detail in genuine reviews. The mention of considering a repeat purchase adds to its authenticity.`
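
To repeat this comparison over many products, the judging prompt can be assembled programmatically. A minimal sketch that mirrors the wording used above; `build_judge_prompt` is a hypothetical helper, the sample inputs are made up, and the call that sends the prompt to a model is deliberately left out.

```python
def build_judge_prompt(product: str, review_1: str, review_2: str) -> str:
    """Assemble a pairwise comparison prompt for an LLM judge."""
    return (
        f"Below are two reviews on the following product: {product}\n\n"
        f"Review 1: {review_1}\n\n"
        f"Review 2: {review_2}\n"
        "Which of the above reviews appears more realistic and authentic? Please be brief"
    )


judge_prompt = build_judge_prompt(
    product="Slim fit stretch jeans in beige.",    # made-up inputs
    review_1="A game-changer for my wardrobe!",
    review_2="Fits my husband fine, a bit baggy.",
)
print(judge_prompt)
```

When running this at scale, swapping which model's review appears first on different calls is a simple way to check that the judge is not just favoring a position.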