Step 1. Installing Necessary Libraries

Parameter-Efficient Fine-Tuning (PEFT) techniques adapt pre-trained language models (PLMs) to downstream applications without fine-tuning all of the model's parameters.
1. Fine-tuning an entire large-scale PLM is often prohibitively expensive.
2. PEFT methods instead fine-tune only a small number of additional parameters, which significantly reduces computational and storage costs.
3. Recent PEFT methods reach performance comparable to full fine-tuning while using far fewer resources.

We use Low-Rank Adaptation (LoRA), which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. According to the LoRA paper, compared to GPT-3 175B fine-tuned with Adam:

1. LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times.

2. LoRA performs on par with or better than full fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters. It also offers higher training throughput and, unlike adapters, adds no inference latency.
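The low-rank update described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy dimensions, the helper names `matvec`, `lora_forward` and `param_counts`, and the 4096 x 4096 layer used for counting are hypothetical and are not taken from this notebook or from the PEFT library.

```python
# Minimal pure-Python sketch of a LoRA-adapted linear layer (illustrative sizes only).

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(w0, a, b, x, alpha, r):
    """Return W0 x + (alpha / r) * B (A x). W0 stays frozen; A and B are trained."""
    base = matvec(w0, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]

def param_counts(d_out, d_in, r):
    """Return (full fine-tuning parameters, LoRA parameters) for one linear layer."""
    return d_out * d_in, r * (d_in + d_out)

# Hypothetical toy layer: 4 outputs, 3 inputs, rank 2.
d_out, d_in, r, alpha = 4, 3, 2, 16
w0 = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]  # frozen weights
a = [[0.5] * d_in for _ in range(r)]   # r x d_in, trainable
b = [[0.0] * r for _ in range(d_out)]  # d_out x r, trainable, zero-initialised
x = [1.0, 2.0, 3.0]

# With B = 0 the adapter contributes nothing, so training starts from the frozen model.
y_base = matvec(w0, x)
y_lora = lora_forward(w0, a, b, x, alpha, r)

# Parameter counts for a hypothetical 4096 x 4096 layer with rank 64.
full, lora = param_counts(4096, 4096, 64)
print(full, lora, full // lora)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only the two small matrices receive gradients during training.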

In [1]:
!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops wandb


Step 2 : Download the data from Kaggle [here](https://www.kaggle.com/competitions/instacart-market-basket-analysis/data).


In [2]:
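import pandas as pd

# departments.csv maps department_id to department; products.csv lists the products
df_dept = pd.read_csv("/content/departments.csv")
df_product = pd.read_csv("/content/products.csv")

Step 3 : Merging the data

In [3]:
# Attach the department name to every product
df_joined = pd.merge(df_dept, df_product, on="department_id")
# Build the training text in the form "<product_name> ->: <department>"
df_joined['text'] = df_joined['product_name'] + " ->: " + df_joined['department']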

Step 4 : Train test split

In [4]:
from sklearn.model_selection import train_test_split
train_df, test_df = train_test_split(df_joined, test_size=0.2, random_state=42)

In [5]:
train_df.head(10)

index,department_id,department,product_id,product_name,aisle_id,text
7361,4,produce,38057,Spicy Organic Microgreens,83,Spicy Organic Microgreens ->: produce
32534,15,canned goods,32461,Spaghetti Cool Shapes Star Wars,59,Spaghetti Cool Shapes Star Wars ->: canned goods
1457,1,frozen,17540,Burnt Sugar Vanilla Ice Cream,37,Burnt Sugar Vanilla Ice Cream ->: frozen
5201,3,bakery,22335,Sliced Italian Bread,112,Sliced Italian Bread ->: bakery
38539,17,household,29431,Air Effects Value Pack Hawaiian Aloha,101,Air Effects Value Pack Hawaiian Aloha ->: hous...
18377,11,personal care,9014,Mint Mouthwash,20,Mint Mouthwash ->: personal care
28627,13,pantry,37330,Pure Cane Washed Raw Sugar,17,Pure Cane Washed Raw Sugar ->: pantry
30283,14,breakfast,11892,Mixed Berry BelVita Bites,48,Mixed Berry BelVita Bites ->: breakfast
32030,15,canned goods,20544,Peeled Whole Tomatoes,81,Peeled Whole Tomatoes ->: canned goods
28655,13,pantry,37534,Unprocessed Wheat Bran,17,Unprocessed Wheat Bran ->: pantry


In [6]:
test_df.head(10)

index,department_id,department,product_id,product_name,aisle_id,text
33626,16,dairy eggs,5570,Coffee Rich Original Non-Dairy Creamer,53,Coffee Rich Original Non-Dairy Creamer ->: dai...
18192,11,personal care,7582,Cold Snap,11,Cold Snap ->: personal care
47099,19,snacks,49614,Sandies Pecan Shortbread Cookies,61,Sandies Pecan Shortbread Cookies ->: snacks
48183,20,deli,39968,Miso Soup,1,Miso Soup ->: deli
22197,11,personal care,37799,Body Envy Volumizing Shampoo,22,Body Envy Volumizing Shampoo ->: personal care
31573,15,canned goods,9984,Bean Salad,81,Bean Salad ->: canned goods
45362,19,snacks,35795,Mexican Restaurant Style Corn Tortilla Chips,107,Mexican Restaurant Style Corn Tortilla Chips -...
14131,7,beverages,47666,Grapefruit No Sugar Added 100% Juice,98,Grapefruit No Sugar Added 100% Juice ->: bever...
26903,13,pantry,21534,"Salsa, Diablo, Hot",51,"Salsa, Diablo, Hot ->: pantry"
39417,17,household,43688,Toilet Bowl Cleaner with Lime & Rust Remover,114,Toilet Bowl Cleaner with Lime & Rust Remover -...


In [7]:
from datasets import Dataset,DatasetDict
train_dataset_dict = DatasetDict({
    "train": Dataset.from_pandas(train_df),
})

In [8]:
train_dataset_dict

DatasetDict({
    train: Dataset({
        features: ['department_id', 'department', 'product_id', 'product_name', 'aisle_id', 'text', '__index_level_0__'],
        num_rows: 39750
    })
})

Step 5 : Loading the model

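Load the [Falcon 7B model](https://huggingface.co/tiiuae/falcon-7b) and quantize it to 4-bit; the LoRA adapters are attached to it in the following steps. The code below uses a sharded bf16 copy of the checkpoint (`ybelkada/falcon-7b-sharded-bf16`).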

In [9]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "ybelkada/falcon-7b-sharded-bf16"
# model_name = "tiiuae/falcon-7b"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True
)
model.config.use_cache = False
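To see why 4-bit loading matters, the memory taken by the weights alone can be estimated as parameters x bits / 8. The sketch below is a rough estimate under stated assumptions: it takes the parameter count to be roughly 7 billion and ignores quantization constants, activations, and optimizer state. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

num_params = 7e9  # ASSUMPTION: roughly 7 billion parameters

bf16_gb = weight_memory_gb(num_params, 16)  # 16-bit weights
nf4_gb = weight_memory_gb(num_params, 4)    # 4-bit weights

print(f"16-bit: ~{bf16_gb:.1f} GB, 4-bit: ~{nf4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights need about a quarter of the memory of the 16-bit weights, which is what makes a single consumer or Colab GPU a plausible target.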


Let's also load the tokenizer below

In [10]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token


Step 6 : Check the base model's output before fine-tuning

In [11]:
import transformers
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)


sequences = pipeline(
    ["“Free & Clear Stage 4 Overnight Diapers” ->:", "Bread Rolls ->:", "French Milled Oval Almond Gourmande Soap ->:"],
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq[0]['generated_text']}")

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
The current implementation of Falcon calls `torch.scaled_dot_product_attention` directly, this will be deprecated in the future in favor of the `BetterTransformer` API. Please install the latest optimum library with `pip install -U optimum` and call `model.to_bettertransformer()` to benefit from `torch.scaled_dot_product_attention` and future performance optimizations.


Result: “Free & Clear Stage 4 Overnight Diapers” ->: “Free & Clear Stage 4 Overnight Diapers” ->
“The Best Overnight Diapers for Babies: 2021 Review and Buyer’s Guide” ->
“Best Overnight Diapers for Adults: 2021 Review” ->
“Best Overnight Adult Diapers (Overnight Incontinence) – Top Picks” ->
“Best Overnight Diapers For Adults & Toddlers 2021: Reviews And Buying Guide” ->
“Best Adult Overnight Diapers of 2021: Reviews and Guide” ->
Best Overnight Diapers for Seniors (2021):
“The Best Overnight Diapers for Bed Wetting for Men and Women” ->
“Best Adult Overnight Diapers for Bed Wetting 2021: Top Reviews and Buying Guide” ->
“Best Adult Incontinence Products For Bedwetting Adults” ->
Result: Bread Rolls ->:->: Ingredients: (1 Cup) Wheat Flour, Maida (All Purpose Flour), Oil, Milk, Water, Sugar, Egg, Salt, Yeast, Milk Powder, Baking Powder, Bread Crumbs. Method of Preparation: (1) Take a bowl, put wheat flour, salt, sugar, & egg. (2) Add enough water to knead it (3) Add milk, oil, & mix it
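The generation calls in this notebook pass `do_sample=True` and `top_k=10`, which restricts sampling to the ten most likely next tokens. The idea can be sketched in plain Python as follows. This is an illustration of top-k sampling in general, not the implementation used by `transformers`; the function names and the example logits are hypothetical.

```python
import math
import random

def top_k_probs(logits, k):
    """Keep the k highest logits and renormalise them with a softmax."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    max_logit = max(logits[i] for i in top)  # subtract the max for numerical stability
    weights = {i: math.exp(logits[i] - max_logit) for i in top}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}

def sample_top_k(logits, k, rng):
    """Draw one token id from the renormalised top-k distribution."""
    probs = top_k_probs(logits, k)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]

logits = [2.0, 0.5, 1.0, -1.0, 3.0]  # hypothetical scores for a 5-token vocabulary
rng = random.Random(0)               # seeded for reproducibility

probs = top_k_probs(logits, k=2)
token_id = sample_top_k(logits, k=2, rng=rng)
print(sorted(probs), token_id)
```

A smaller `k` makes the output more conservative, while sampling at all means repeated runs can produce different continuations for the same prompt.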

Step 7 : Create the LoRA configuration. According to the QLoRA paper, it is important to adapt all linear layers in the transformer block for maximum performance. Therefore, in addition to the fused `query_key_value` layer, we add the `dense`, `dense_h_to_4h`, and `dense_4h_to_h` layers to the target modules.

In [12]:
from peft import LoraConfig

lora_alpha = 16
lora_dropout = 0.1
lora_r = 64

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ]
)
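A back-of-the-envelope count shows how many parameters this configuration would make trainable. Each adapted linear layer of shape `d_in x d_out` gains `r * (d_in + d_out)` LoRA parameters. The layer shapes and layer count below are assumptions about Falcon-7B (hidden size 4544, a fused query/key/value projection with 4672 outputs, 32 transformer blocks) and should be checked against the model's `config.json`; the helper name `lora_params` is hypothetical.

```python
R = 64            # LoRA rank from the config above
LORA_ALPHA = 16   # LoRA alpha from the config above

# ASSUMPTION: Falcon-7B layer shapes; verify against the model's config.json.
HIDDEN = 4544
NUM_LAYERS = 32
SHAPES = {
    "query_key_value": (HIDDEN, 4672),
    "dense": (HIDDEN, HIDDEN),
    "dense_h_to_4h": (HIDDEN, 4 * HIDDEN),
    "dense_4h_to_h": (4 * HIDDEN, HIDDEN),
}

def lora_params(d_in, d_out, r):
    """Parameters in the two low-rank matrices A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

per_layer = {name: lora_params(d_in, d_out, R) for name, (d_in, d_out) in SHAPES.items()}
total = NUM_LAYERS * sum(per_layer.values())
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(per_layer)
print(f"total trainable LoRA parameters: {total:,}")
print(f"scaling factor alpha / r: {scaling}")
```

Under these assumed shapes the adapters add on the order of a hundred million trainable parameters, a small fraction of the roughly 7 billion frozen ones.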

Step 8 : Load the trainer

Use the [`SFTTrainer` from the TRL library](https://huggingface.co/docs/trl/main/en/sft_trainer), a wrapper around the transformers `Trainer` that makes it easy to fine-tune models on instruction-based datasets using PEFT adapters. First, define the training arguments below.

In [13]:
from transformers import TrainingArguments

output_dir = "./results"
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
optim = "paged_adamw_32bit"
save_steps = 10
logging_steps = 1
learning_rate = 2e-4
max_grad_norm = 0.3
max_steps = 120 #500
warmup_ratio = 0.03
lr_scheduler_type = "constant"

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
)
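These settings determine how much of the training set the run actually covers. The arithmetic below is a small sketch using the values from this notebook; the single-GPU setting (`num_devices = 1`) is an assumption.

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 120
num_train_rows = 39750  # rows in the training split of this notebook
num_devices = 1         # ASSUMPTION: a single GPU

# Examples that contribute to each optimizer step
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Examples processed over the whole run
examples_seen = effective_batch * max_steps
# Fraction of one pass over the training data
epoch_fraction = examples_seen / num_train_rows

print(effective_batch, examples_seen, round(epoch_fraction, 2))
```

With 120 steps the run sees only about 5% of the training rows, which matches the `epoch` value that `trainer.train()` reports.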

Step 9 : Pass everything to the trainer

In [14]:
from trl import SFTTrainer

max_seq_length = 512

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset_dict['train'],
    # train_dataset=data['train'],
    peft_config=peft_config,
    dataset_text_field="text",
    # dataset_text_field="prediction",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)




Step 10 : Pre-process the model by upcasting the layer norms to float32 for more stable training


In [15]:
# Upcast normalization layers to float32; nn.Module.to() modifies the module in place
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module.to(torch.float32)

Step 11 : Train the model using trainer.train()

In [16]:
trainer.train()

You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
1,3.6912
2,4.0539
3,4.0929
4,4.2015
5,3.5446
6,3.273
7,2.7623
8,2.5928
9,2.6596
10,2.5903


TrainOutput(global_step=120, training_loss=2.265092126528422, metrics={'train_runtime': 591.0539, 'train_samples_per_second': 3.248, 'train_steps_per_second': 0.203, 'total_flos': 937141044034560.0, 'train_loss': 2.265092126528422, 'epoch': 0.05})

In [17]:
lst_test_data = list(test_df['text'])

In [18]:
len(lst_test_data)

9938
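The split sizes reported in this notebook (39750 training rows and 9938 test rows) can be cross-checked with a little arithmetic. The sketch assumes that a fractional `test_size` is rounded up and that the remaining rows go to the training set; treat that rounding rule as an assumption about `train_test_split` rather than a documented guarantee.

```python
import math

num_train_rows = 39750  # reported for the training split
num_test_rows = 9938    # reported for the test split
test_size = 0.2

total_rows = num_train_rows + num_test_rows

# ASSUMPTION: the test size is rounded up, the rest goes to training
expected_test = math.ceil(test_size * total_rows)
expected_train = total_rows - expected_test

print(total_rows, expected_train, expected_test)
```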

In [19]:
sample_size = 25
lst_test_data_short = lst_test_data[:sample_size]
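Each entry of `lst_test_data` comes from the `text` column, so it already contains the department after the `->:` separator. If the goal is to have the model predict the department, one option is to strip the label and keep only the `product ->:` part as the prompt. The helper `to_prompt` below is a hypothetical sketch of that idea, not part of the original notebook.

```python
SEPARATOR = "->:"

def to_prompt(text):
    """Drop the label from 'product ->: department', keeping 'product ->:'."""
    product_name, _, _ = text.partition(SEPARATOR)
    return f"{product_name.strip()} {SEPARATOR}"

# Two rows in the same format as the `text` column
examples = [
    "Cold Snap ->: personal care",
    "Miso Soup ->: deli",
]

prompts = [to_prompt(t) for t in examples]
print(prompts)
```

The resulting prompts have the same shape as the ones used to probe the base model in Step 6, so outputs before and after fine-tuning become directly comparable.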

Step 12 : Generate predictions with the fine-tuned model

In [20]:
import transformers

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    # torch_dtype=torch.bfloat16,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)

sequences = pipeline(
    lst_test_data_short,
    max_length=100,  #200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)

for ix,seq in enumerate(sequences):
    print(ix,seq[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.

0 Coffee Rich Original Non-Dairy Creamer ->: dairy eggs meat seafood produce dry goods pantry canned goods canned goods coffee ->: beverages coffee dairy eggs meat seafood produce dry goods pantry canned goods canned goods coffee -> beverages -> dairy -> eggs -> meat seafood -> produce dry goods -> pantry canned goods canned goods coffee beverages dairy eggs meat seafood produce dry goods pantry canned goods canned goods beverages dairy eggs meat seafood produce dry goods -> pantry canned goods canned goods coffee beverages dairy
1 Cold Snap ->: personal care personal care other personal care household products air fresheners air fresheners ->: household pets pets dogs cats pets household pets cats supplies cats litter ->: pets pets supplies cats supplies litter household pets household cleaners household cleaning household supplies household supplies paper goods household supplies household supplies personal care ->: personal care personal care other personal care household personal c

Step 13: Get the answers

In [21]:
def correct_answer(ans):
    """Return the text between the first and second "->:" separators."""
    return ans.split("->:")[1].strip()

# Extract the predicted department text from each generated sequence
answers = [correct_answer(seq[0]['generated_text']) for seq in sequences]

answers

['dairy eggs meat seafood produce dry goods pantry canned goods canned goods coffee',
 'personal care personal care other personal care household products air fresheners air fresheners',
 'snacks cookies and crackers',
 'deli international international deli international soup',
 'personal care > hair care > shampoo',
 'canned goods pasta',
 'snacks -> international snacks -> mexican snacks -> tortilla chips snacks -> international snacks -> mexican snacks -> chips -> snacks international snacks mexican snacks chips snacks snack foods chips snacks tortilla chips -> snacks snacks other snacks snacks snacks -> chips snacks international snacks mexican snacks snacks snacks international snacks mexican snacks snacks -> chips snacks snacks snacks snacks snacks snacks snacks snacks snacks snacks snacks snacks snacks snacks snacks snacks snacks snacks snacks snacks snacks snacks snacks',
 'beverages international canned',
 'pantry missing in action salsa',
 'household supplies personal care h
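Many of the answers above begin with a plausible department and then keep generating. One way to reduce such an answer to a single label is to keep the longest known department name that the answer starts with. The sketch below is hypothetical: `leading_department` is not part of the original notebook, and `KNOWN_DEPARTMENTS` lists only the department names that appear in this notebook's outputs, not the full set in the data.

```python
# ASSUMPTION: only the departments visible in this notebook; the dataset has more.
KNOWN_DEPARTMENTS = [
    "produce", "canned goods", "frozen", "bakery", "household", "personal care",
    "pantry", "breakfast", "dairy eggs", "snacks", "deli", "beverages",
]

def leading_department(answer, departments=KNOWN_DEPARTMENTS):
    """Return the longest known department the answer starts with, or None."""
    answer = answer.strip().lower()
    matches = [d for d in departments if answer == d or answer.startswith(d + " ")]
    return max(matches, key=len) if matches else None

print(leading_department("personal care > hair care > shampoo"))
print(leading_department("snacks cookies and crackers"))
```

Requiring a space after the department name avoids matching a label that is merely a prefix of a longer word.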

Step 14 : Compare the predictions with the true departments

In [22]:
df_evaluate = test_df.iloc[:sample_size][['product_name','department']]

df_evaluate = df_evaluate.reset_index(drop=True)

df_evaluate['department_predicted'] = answers

df_evaluate

index,product_name,department,department_predicted
0,Coffee Rich Original Non-Dairy Creamer,dairy eggs,dairy eggs meat seafood produce dry goods pant...
1,Cold Snap,personal care,personal care personal care other personal car...
2,Sandies Pecan Shortbread Cookies,snacks,snacks cookies and crackers
3,Miso Soup,deli,deli international international deli internat...
4,Body Envy Volumizing Shampoo,personal care,personal care > hair care > shampoo
5,Bean Salad,canned goods,canned goods pasta
6,Mexican Restaurant Style Corn Tortilla Chips,snacks,snacks -> international snacks -> mexican snac...
7,Grapefruit No Sugar Added 100% Juice,beverages,beverages international canned
8,"Salsa, Diablo, Hot",pantry,pantry missing in action salsa
9,Toilet Bowl Cleaner with Lime & Rust Remover,household,household supplies personal care household sup...
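The table can be turned into a single score by counting matches. The sketch below uses a hypothetical `accuracy` helper and only the rows whose predictions are shown in full above. It compares a strict exact match with a lenient prefix match. Note that the prompts in this run were taken from the `text` column, which already contains the department, so the prefix score here is optimistic.

```python
def accuracy(rows, is_match):
    """Fraction of (expected, predicted) pairs accepted by is_match."""
    if not rows:
        return 0.0
    return sum(1 for expected, predicted in rows if is_match(expected, predicted)) / len(rows)

# (department, department_predicted) pairs shown in full in the table above
rows = [
    ("snacks", "snacks cookies and crackers"),
    ("personal care", "personal care > hair care > shampoo"),
    ("canned goods", "canned goods pasta"),
    ("beverages", "beverages international canned"),
    ("pantry", "pantry missing in action salsa"),
]

exact = accuracy(rows, lambda expected, predicted: predicted == expected)
prefix = accuracy(rows, lambda expected, predicted: predicted.startswith(expected))

print(f"exact match: {exact:.2f}, prefix match: {prefix:.2f}")
```

The gap between the two scores shows that the model does not stop after the department name, so a stricter stopping rule or post-processing step would be needed before exact-match accuracy becomes meaningful.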
