##### What is this notebook about?
- This notebook shows how to fine-tune an LLM that has already been fine-tuned on an instruction dataset, using the Hugging Face trainer (`SFTTrainer` from TRL).
- The Llama 3.2 1B Instruct model is used as the example.

In [1]:
# Set cuda device
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

# Conda env: 
# Setup: conda env create -f environment_mlenv2
# Activate: conda activate mlenv2

In [2]:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    GenerationConfig,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
    Trainer
)
from peft import (
    LoraConfig,
    PeftModel,
    prepare_model_for_kbit_training,
    get_peft_model,
)
import torch  # os is already imported above; wandb is only needed for the optional logging cell
from datasets import load_dataset
from trl import SFTTrainer, setup_chat_format

import bitsandbytes as bnb



In [3]:
# from huggingface_hub import login
# from kaggle_secrets import UserSecretsClient
# user_secrets = UserSecretsClient()
# hf_token = user_secrets.get_secret("HUGGINGFACE_TOKEN")
# login(token = hf_token)

# wb_token = user_secrets.get_secret("wandb")
# wandb.login(key=wb_token)
# run = wandb.init(
#     project='Fine-tune Llama 3.2 on Customer Support Dataset', 
#     job_type="training", 
#     anonymous="allow"
# )

In [4]:
#base_model = "/kaggle/input/llama-3.2/transformers/3b-instruct/1"
base_model = "meta-llama/Llama-3.2-1B-Instruct"
new_model = "output_dir/Ecommerce-ChatBot-Instruct"
dataset_name = "bitext/Bitext-customer-support-llm-chatbot-training-dataset"

In [5]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

In [6]:
# Import the dataset
dataset = load_dataset(dataset_name, split="train")
dataset

Dataset({
    features: ['flags', 'instruction', 'category', 'intent', 'response'],
    num_rows: 26872
})

In [7]:
# Check a sample & check format

instruction = """You are a top-rated customer service agent named John. 
    Be polite to customers and answer all their questions.
    """
sample_instruction = dataset['instruction'][3]
sample_response = dataset['response'][3]
print("Sample instruction:", sample_instruction[:100])
print("Sample response:", sample_response[:100])

message = [ #{"role": "system", "content": instruction },
            {"role": "user", "content": sample_instruction},
            {"role": "assistant", "content": sample_response}]
    
# With tokenize=False the template returns the formatted prompt as a string, not token ids
formatted_chat = tokenizer.apply_chat_template(message, tokenize=False)
formatted_chat


Sample instruction: I need to cancel purchase {{Order Number}}
Sample response: I understood that you need assistance with canceling your purchase with the order number {{Order Num


'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 10 Apr 2025\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI need to cancel purchase {{Order Number}}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI understood that you need assistance with canceling your purchase with the order number {{Order Number}}. We apologize for any inconvenience this may have caused. To initiate the cancellation process, please follow these steps:\n\n1. Sign in to your account: Visit our {{Online Company Portal Info}} and login using your credentials.\n2. Access your order history: Once logged in, navigate to the "{{Online Order Interaction}}" or "{{Online Order Interaction}}" section.\n3. Locate the purchase: Look for the order with the specific order number {{Order Number}}.\n4. Cancel the purchase: Click on the "{{Online Order Interaction}}" option associated with the purchase.\n5. Confirm the cancellation: Follow any fur
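
The printed string shows the Llama 3 prompt layout that `apply_chat_template` produces. As a rough illustration only, the pure-Python sketch below rebuilds that layout by hand. `render_llama3_chat` is a hypothetical helper, not part of transformers, and it deliberately omits the default system header with the knowledge-cutoff and date lines that the real template inserts. The tokenizer's own template remains the source of truth.

```python
# Simplified, hand-written imitation of the Llama 3 chat layout seen in the output above.
# render_llama3_chat is a hypothetical helper, not part of transformers.

def render_llama3_chat(messages, add_generation_prompt=False):
    """Join chat messages using the special tokens visible in the printed sample."""
    text = "<|begin_of_text|>"
    for msg in messages:
        text += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        # Assumption: surrounding whitespace is trimmed, as the real template appears to do.
        text += f"{msg['content'].strip()}<|eot_id|>"
    if add_generation_prompt:
        # An open assistant header signals that the model should answer next.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text


demo = [
    {"role": "user", "content": "I need to cancel purchase {{Order Number}}"},
    {"role": "assistant", "content": "Sure, let me help with that."},
]
print(render_llama3_chat(demo))
print(render_llama3_chat(demo[:1], add_generation_prompt=True))
```

The second call mirrors the inference case: the prompt ends with an open assistant header so that generation continues from there.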

In [8]:
# Select subset of data for train/test & check template 

#dataset = dataset.shuffle(seed=65).select(range(5000))
dataset = dataset.select(range(5000))
instruction = """You are a top-rated customer service agent named John. 
    Be polite to customers and answer all their questions.
    """
def format_chat_template(row):
    
    row_json = [{"role": "system", "content": instruction },
               {"role": "user", "content": row["instruction"]},
               {"role": "assistant", "content": row["response"]}]
    
    row["text"] = tokenizer.apply_chat_template(row_json, tokenize=False)
    return row

dataset = dataset.map(
    format_chat_template,
    num_proc=4,
)
print(dataset)

dataset['text'][3]

Map (num_proc=4): 100%|██████████| 5000/5000 [00:00<00:00, 10900.27 examples/s]

Dataset({
    features: ['flags', 'instruction', 'category', 'intent', 'response', 'text'],
    num_rows: 5000
})





'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 10 Apr 2025\n\nYou are a top-rated customer service agent named John. \n    Be polite to customers and answer all their questions.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI need to cancel purchase {{Order Number}}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI understood that you need assistance with canceling your purchase with the order number {{Order Number}}. We apologize for any inconvenience this may have caused. To initiate the cancellation process, please follow these steps:\n\n1. Sign in to your account: Visit our {{Online Company Portal Info}} and login using your credentials.\n2. Access your order history: Once logged in, navigate to the "{{Online Order Interaction}}" or "{{Online Order Interaction}}" section.\n3. Locate the purchase: Look for the order with the specific order number {{Order Number}}.\n4. Cancel the purchase: Click on t

In [9]:
# Split dataset into train & test
dataset = dataset.train_test_split(test_size=0.1)
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['flags', 'instruction', 'category', 'intent', 'response', 'text'],
        num_rows: 4500
    })
    test: Dataset({
        features: ['flags', 'instruction', 'category', 'intent', 'response', 'text'],
        num_rows: 500
    })
})


In [10]:
# Load models

# Set torch dtype and attention implementation
if torch.cuda.get_device_capability()[0] >= 8:
    torch_dtype = torch.bfloat16
    attn_implementation = "flash_attention_2"
else:
    torch_dtype = torch.float16
    attn_implementation = "eager"
print(torch_dtype, attn_implementation)

# QLoRA config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
    bnb_4bit_use_double_quant=True,
)

# Load model (non-quantized weights use the same dtype as the 4-bit compute dtype)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch_dtype,
    attn_implementation=attn_implementation
)


torch.bfloat16 flash_attention_2
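
To give a feel for what `load_in_4bit` and `bnb_4bit_use_double_quant` save, here is some back-of-the-envelope arithmetic. Everything in it is an assumption for illustration: a block size of 64 weights per quantization constant, a second-level group of 256 constants for double quantization, and a hypothetical round figure of one billion quantized weights. The real footprint will differ because some layers (embeddings, norms) are not quantized.

```python
# Back-of-the-envelope memory estimate for 4-bit quantized weights.
# All constants below are assumptions for illustration, not values read from the model.

def bits_per_param(block_size=64, double_quant=False, second_block=256):
    """Average storage cost per weight: 4 data bits plus quantization constants."""
    if double_quant:
        # 8-bit constants per block, plus one 32-bit constant per group of blocks.
        overhead = 8 / block_size + 32 / (block_size * second_block)
    else:
        # One 32-bit scaling constant per block of weights.
        overhead = 32 / block_size
    return 4 + overhead


def gigabytes(n_params, bits):
    """Convert a parameter count and a per-parameter bit cost to gigabytes."""
    return n_params * bits / 8 / 1e9


n_params = 1_000_000_000  # hypothetical round number of quantized weights

print(f"16-bit weights:      {gigabytes(n_params, 16):.2f} GB")
print(f"4-bit, single quant: {gigabytes(n_params, bits_per_param()):.2f} GB")
print(f"4-bit, double quant: {gigabytes(n_params, bits_per_param(double_quant=True)):.2f} GB")
```

Under these assumptions, most of the saving comes from the 4-bit storage itself, and double quantization trims the remaining overhead of the scaling constants.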


In [11]:
# Get modules for LoRA
def find_all_linear_names(model):
    cls = bnb.nn.Linear4bit
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, cls):
            #print(name)
            names = name.split('.')
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])
    if 'lm_head' in lora_module_names:  # needed for 16 bit
        lora_module_names.remove('lm_head')
    return list(lora_module_names)
modules = find_all_linear_names(model)
print(modules)

['o_proj', 'k_proj', 'down_proj', 'v_proj', 'up_proj', 'q_proj', 'gate_proj']
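
The helper keeps only the last component of each dotted module name, which is why the printed list contains short names such as `q_proj`. The sketch below shows the same string handling on plain strings. The dotted paths are hypothetical stand-ins for what `model.named_modules()` yields, and `short_module_names` is an illustrative name, not a library function.

```python
# Illustrates the name handling in find_all_linear_names on plain strings.
# The dotted paths below are hypothetical stand-ins for model.named_modules() keys.

def short_module_names(qualified_names, exclude=("lm_head",)):
    """Collapse dotted module paths to their final component, dropping excluded names."""
    # split(".")[-1] also covers names without a dot, where it returns the name itself.
    short = {name.split(".")[-1] for name in qualified_names}
    return sorted(short - set(exclude))


example_paths = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.1.self_attn.q_proj",
    "model.layers.1.mlp.down_proj",
    "lm_head",
]
print(short_module_names(example_paths))
```

Because the result is a set of short names, every layer that shares a name (for example all `q_proj` modules) is targeted by a single entry in `target_modules`.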


In [12]:
# LoRA config
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=modules
)
# Llama 3.2 Instruct already ships with its own chat template, so setup_chat_format
# (which would switch the tokenizer to ChatML) is intentionally left disabled.
#model, tokenizer = setup_chat_format(model, tokenizer)
model = get_peft_model(model, peft_config)
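
As a rough check on how small the adapter is, the sketch below counts the weights that `r=16` adds to the seven projection types. Each adapted linear layer with shape (in, out) gains `r * (in + out)` weights. The layer dimensions are assumptions, believed to match the published Llama 3.2 1B configuration; they are not read from the model here, so it is worth comparing the result with `model.config` and `model.print_trainable_parameters()`.

```python
# Rough count of LoRA parameters for r=16 on the seven projection types.
# Layer dimensions are assumptions (believed to match Llama 3.2 1B); check model.config.

R = 16
HIDDEN = 2048    # assumed hidden size
KV_DIM = 512     # assumed key/value projection width (grouped-query attention)
MLP = 8192       # assumed intermediate (feed-forward) size
N_LAYERS = 16    # assumed number of decoder layers

# (in_features, out_features) for each adapted projection in one decoder layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(in_features, out_features, r=R):
    # LoRA adds A (r x in) and B (out x r), so r * (in + out) weights per layer.
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in shapes.values())
total = per_layer * N_LAYERS
print(f"LoRA parameters per decoder layer: {per_layer:,}")
print(f"LoRA parameters in total:          {total:,}")
```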

In [13]:
# Hyperparameters
training_arguments = TrainingArguments(
    output_dir=new_model,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=1, #2,
    optim="paged_adamw_32bit",
    num_train_epochs=1,
    eval_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=10,
    logging_strategy="steps",
    learning_rate=2e-4,
    fp16=False,
    bf16=False,
    group_by_length=True,
    #report_to="wandb"
    report_to="tensorboard"

)


In [14]:
# Setting sft parameters
trainer = SFTTrainer( #Trainer( 
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    #peft_config=peft_config,
    #max_seq_length= 512,
    #dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_arguments,
    #packing= False,
)

Converting train dataset to ChatML: 100%|██████████| 4500/4500 [00:00<00:00, 15229.46 examples/s]
Applying chat template to train dataset: 100%|██████████| 4500/4500 [00:00<00:00, 24294.90 examples/s]
Tokenizing train dataset: 100%|██████████| 4500/4500 [00:02<00:00, 2107.00 examples/s]
Truncating train dataset: 100%|██████████| 4500/4500 [00:00<00:00, 5497.38 examples/s]
Converting eval dataset to ChatML: 100%|██████████| 500/500 [00:00<00:00, 14404.31 examples/s]
Applying chat template to eval dataset: 100%|██████████| 500/500 [00:00<00:00, 21014.81 examples/s]
Tokenizing eval dataset: 100%|██████████| 500/500 [00:00<00:00, 2240.46 examples/s]
Truncating eval dataset: 100%|██████████| 500/500 [00:00<00:00, 5448.59 examples/s]
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_nam

In [15]:
# Disable the key/value cache. It is on by default in the model config, but it is only useful for generation, not for training.
model.config.use_cache = False
# Pad token was not set by default
tokenizer.pad_token = tokenizer.eos_token

# Train
trainer.train()

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 57   | 0.6208        | 0.667337        |
| 114  | 0.5851        | 0.564961        |
| 171  | 0.5213        | 0.506749        |
| 228  | 0.4866        | 0.482343        |


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.


TrainOutput(global_step=282, training_loss=0.6259131219158781, metrics={'train_runtime': 69.4425, 'train_samples_per_second': 64.802, 'train_steps_per_second': 4.061, 'total_flos': 4831501409992704.0, 'train_loss': 0.6259131219158781})
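
The step numbers in the log follow from the training arguments. The sketch below redoes that arithmetic with the values from this notebook. It assumes a single device and that a fractional `eval_steps` is rounded up to a whole number of steps, which is consistent with the 57-step interval in the log.

```python
import math

# Values copied from the cells above
train_rows = 4500
batch_size = 16
grad_accum = 1
epochs = 1
eval_fraction = 0.2
train_runtime_s = 69.4425

# Assumes one device, so the effective batch is batch_size * grad_accum.
steps_per_epoch = math.ceil(train_rows / (batch_size * grad_accum))
total_steps = steps_per_epoch * epochs

# Assumption: a fractional eval_steps is rounded up to a whole number of steps.
eval_every = math.ceil(total_steps * eval_fraction)
eval_points = list(range(eval_every, total_steps + 1, eval_every))

print("total steps:", total_steps)
print("evaluate every:", eval_every, "steps ->", eval_points)
print(f"steps per second: {total_steps / train_runtime_s:.3f}")
print(f"samples per second: {train_rows * epochs / train_runtime_s:.3f}")
```

Raising `gradient_accumulation_steps` to 2 in this sketch halves the number of optimizer steps, which also shifts the evaluation points.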

In [16]:
#wandb.finish()

# Enable caching
model.config.use_cache = True

# Save the fine-tuned model
trainer.model.save_pretrained(new_model)

#trainer.model.push_to_hub(new_model, use_temp_dir=False)

In [17]:
## Run inference

# Generation config
generation_config = GenerationConfig(
    #max_length=256,
    max_new_tokens=150,
    temperature=0.05,
    do_sample=True,
    #do_sample=False,
    use_cache=True,
    skip_special_tokens=True,  # stored in the config but not a generation setting; it only matters in tokenizer.decode()
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)
print(generation_config)

# Test input
messages = [{"role": "system", "content": instruction},
            {"role": "user", "content": "I bought the same item twice, cancel order {{Order Number}}"}]

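# Tokenize input. The chat template already prepends <|begin_of_text|>,
# so special tokens are not added a second time here.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors='pt', padding=True, truncation=True, add_special_tokens=False).to("cuda")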

# Generate model output
#outputs = model.generate(**inputs, max_new_tokens=150, num_return_sequences=1)
outputs = model.generate(**inputs, generation_config=generation_config)

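# Decode only the newly generated tokens (everything after the prompt).
# Splitting the full text on "assistant" would break if that word appears in the reply.
prompt_len = inputs["input_ids"].shape[1]
text = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(text)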

GenerationConfig {
  "do_sample": true,
  "eos_token_id": 128009,
  "max_new_tokens": 150,
  "pad_token_id": 128009,
  "skip_special_tokens": true,
  "temperature": 0.05
}



I'm sorry to hear that you're experiencing difficulties with canceling your order. To assist you further, could you please provide me with the details of the item you would like to cancel? This will help me ensure that we address your concerns accurately and promptly. Your satisfaction is our top priority, and we're here to make things right for you. Thank you for reaching out to us. We appreciate your patience and cooperation in resolving this matter. Your order number is {{Order Number}}. We understand that you would like to cancel your purchase of the item. To cancel your order, please follow these steps:

1. Log in to your account on our website.
2. Navigate to the "My Orders" section.
3. Locate the specific order
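
With `temperature=0.05` and `do_sample=True`, sampling is very close to greedy decoding, because dividing the logits by a small temperature concentrates almost all probability on the top-scoring token. The sketch below illustrates this with plain Python. The three logits are hypothetical values chosen for the example, not scores from the model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.5, 0.5]  # hypothetical scores for three candidate tokens

for t in (1.0, 0.05):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.4f}" for p in probs))
```

If fully reproducible output is the goal, setting `do_sample=False` (the commented-out option in the generation config) removes the remaining randomness.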


### References:

> Quantization training
>> https://huggingface.co/docs/transformers/en/quantization/bitsandbytes#4-bit-qlora-algorithm  
>> https://huggingface.co/blog/4bit-transformers-bitsandbytes  
>> https://huggingface.co/blog/hf-bitsandbytes-integration  
>> https://en.wikibooks.org/wiki/A-level_Computing/AQA/Paper_2/Fundamentals_of_data_representation/Floating_point_numbers#:~:text=In%20decimal%2C%20very%20large%20numbers,be%20used%20for%20binary%20numbers 

> Data
>> https://huggingface.co/docs/transformers/main/en/chat_templating 

> Training/Lora/PEFT
>> https://huggingface.co/docs/transformers/v4.49.0/en/main_classes/trainer#transformers.TrainingArguments  
>> https://huggingface.co/docs/peft/v0.14.0/en/task_guides/lora_based_methods  
>> https://huggingface.co/docs/peft/main/en/developer_guides/checkpoint  

> Generation
>> https://huggingface.co/docs/transformers/main/en/llm_tutorial  
>> https://huggingface.co/docs/transformers/v4.47.0/en/llm_tutorial#default-generate  

> Llama example 
>> https://www.datacamp.com/tutorial/fine-tuning-llama-3-2  
>> https://www.kaggle.com/code/kingabzpro/fine-tune-llama-3-2-on-customer-support/notebook?scriptVersionId=198573392  

> Caching & optimization
>> https://huggingface.co/docs/transformers/v4.47.0/en/llm_optims  
>> https://huggingface.co/docs/transformers/en/kv_cache#re-use-cache-to-continue-generation  

> HF notebooks
>> https://github.com/huggingface/notebooks/tree/main/transformers_doc/en/pytorch 



