<a href="https://colab.research.google.com/github/harikris001/FineTuning-LLMs/blob/main/Finetuning_Tinyllama_1B_LLM(openai_gsm8k).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# INSTALLING DEPENDENCIES

In [1]:
!pip install -U bitsandbytes -q

In [26]:
!pip install wandb -q

# IMPORTING PACKAGES

In [1]:
import torch

from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, TaskType

# MODEL IMPORTING AND CONFIGURATION

The model is available on this [Hugging Face model card](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0).
We load it with 4-bit quantization to reduce GPU memory use during fine-tuning.

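Techniques used for optimization:
- 4-bit quantization of the base model (NF4 with double quantization).
- LoRA (Low-Rank Adaptation): trains small low-rank adapter matrices while the base weights stay frozen.
- QLoRA (Quantized LoRA): LoRA adapters trained on top of the 4-bit quantized base model.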
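
As a rough illustration of why 4-bit loading helps, the sketch below estimates the storage needed for the weights alone at 16 and 4 bits per parameter. The parameter count is an assumption rounded from the "1.1B" in the model name, and the estimate ignores quantization constants, layers that stay unquantized, activations, and optimizer state.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 1.1e9  # assumption: rounded from the "1.1B" in the model name

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gib:.2f} GiB")
print(f"4-bit weights:  ~{nf4_gib:.2f} GiB")
```

Real memory use will be higher than these figures, so treat them only as a lower bound on the saving from quantization.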


In [29]:
model_name = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code = True,
)

In [30]:
Lora_config = LoraConfig(
    r = 4,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout = 0.05,
    bias = "none",
    task_type = TaskType.CAUSAL_LM
)

lora_model = get_peft_model(model, Lora_config)
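
To get a feel for how small the adapter is, the sketch below counts the LoRA parameters implied by `r = 4` on the four attention projections. The layer shapes (hidden size 2048, key/value projection width 256, 22 layers) are assumptions about TinyLlama-1.1B and should be checked against the model's `config.json`; `lora_param_count` is a hypothetical helper, not part of PEFT.

```python
def lora_param_count(r: int, in_features: int, out_features: int) -> int:
    """One LoRA adapter holds A (r x in_features) and B (out_features x r)."""
    return r * (in_features + out_features)


# Assumed TinyLlama-1.1B shapes; verify against the model's config.json.
HIDDEN = 2048
KV_DIM = 256
NUM_LAYERS = 22
R, ALPHA = 4, 16  # values from the LoraConfig in this notebook

shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

per_layer = sum(lora_param_count(R, i, o) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / R  # factor applied to the low-rank update

print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
print(f"LoRA scaling (alpha / r):  {scaling}")
```

The estimate can be compared with the count reported by `lora_model.print_trainable_parameters()`.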

In [31]:
data = load_dataset('openai/gsm8k', 'main', split='train[:200]')

Each model expects its own prompt and tokenization format. Refer to the model's Hugging Face model card to understand the format it was trained with.
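
Whatever format is chosen, it helps to build the prompt in one place so that training and generation use exactly the same string. The sketch below is one possible way to do that; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and the template copies the instruction/response layout used by the `tokenize` function in this notebook.

```python
PROMPT_TEMPLATE = "### Instruction:\n{question}\n ### Response:\n"


def build_prompt(question: str, answer: str = "") -> str:
    """Return the training text, or the generation prompt when answer is empty."""
    return PROMPT_TEMPLATE.format(question=question) + answer


train_text = build_prompt("What is 2 + 2?", "2 + 2 = 4\n#### 4")
gen_prompt = build_prompt("What is 2 + 2?")

# The generation prompt is always a prefix of the matching training text.
print(train_text.startswith(gen_prompt))
```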

In [32]:
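def tokenize(batch):
  # Build one instruction/response string per example.
  texts = [
      f"### Instruction:\n{question}\n ### Response:\n{answer}"
      for question, answer in zip(batch['question'], batch['answer'])
  ]
  # Pad or truncate every example to exactly 256 tokens.
  inputs = tokenizer(texts, max_length=256, return_tensors="pt", padding='max_length', truncation=True)
  # Causal LM labels are a copy of the inputs. Padding positions are included,
  # so they count toward the loss unless they are set to -100.
  inputs['labels'] = inputs['input_ids'].clone()
  return inputs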
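
Because `tokenize` copies `input_ids` straight into `labels`, padded positions also contribute to the loss. A common remedy is to set the labels at padded positions to -100, the value that PyTorch's cross-entropy loss ignores. The sketch below shows the idea on plain Python lists; `mask_padding` is a hypothetical helper and is not what the notebook runs.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def mask_padding(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]


# Toy batch: the last two positions of the only sequence are padding.
labels = mask_padding([[5, 6, 2, 2]], [[1, 1, 0, 0]])
print(labels)
```

With tensors, the equivalent would be `labels[attention_mask == 0] = -100` applied to the cloned `input_ids`. Doing so changes the reported loss and perplexity, since padding tokens are no longer scored.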

In [33]:
tokenised_data = data.map(tokenize, batched=True, batch_size=100, remove_columns=data.column_names)


In [34]:
training_args = TrainingArguments(
    output_dir="./tinyLlama-results",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=1e-3,
    fp16=True,
    num_train_epochs=50,
    logging_steps=25,
    save_strategy='epoch',
    report_to="none",  # the string "none" turns off all logging integrations
    remove_unused_columns=False,
    label_names=['labels'],
)

In [35]:
trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=tokenised_data,
    processing_class=tokenizer,
)

To disable automatic reporting to Weights & Biases (wandb), run the following cell. Skip it if you want wandb reports.

In [36]:
import wandb
wandb.init(mode='disabled') # Optional

In [37]:
trainer.train()

| Step | Training Loss |
|------|---------------|
| 25 | 1.482 |
| 50 | 0.6881 |
| 75 | 0.5226 |
| 100 | 0.3648 |
| 125 | 0.2454 |
| 150 | 0.1613 |
| 175 | 0.1064 |
| 200 | 0.0743 |
| 225 | 0.0563 |
| 250 | 0.0422 |


TrainOutput(global_step=650, training_loss=0.16001124693797183, metrics={'train_runtime': 1188.3381, 'train_samples_per_second': 8.415, 'train_steps_per_second': 0.547, 'total_flos': 1.590741172224e+16, 'train_loss': 0.16001124693797183, 'epoch': 50.0})
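
The `global_step=650` in the output can be reproduced from the training arguments. The arithmetic below is a sketch that assumes a single GPU and that a final partial accumulation group still counts as one optimizer step; under those assumptions it matches the reported step count.

```python
import math

num_examples = 200               # train[:200]
per_device_batch_size = 4
gradient_accumulation_steps = 4
num_epochs = 50

effective_batch = per_device_batch_size * gradient_accumulation_steps
batches_per_epoch = math.ceil(num_examples / per_device_batch_size)
steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
total_steps = steps_per_epoch * num_epochs
samples_seen = num_examples * num_epochs

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"samples processed: {samples_seen}")
```

The sample count is also consistent with the reported throughput: about 8.415 samples per second over roughly 1188 seconds is close to 10,000 samples.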

In [38]:
lora_model.save_pretrained('./tinyLlama-tuned-math')
tokenizer.save_pretrained('./tinyLlama-tuned-math')

('./tinyLlama-tuned-math/tokenizer_config.json',
 './tinyLlama-tuned-math/special_tokens_map.json',
 './tinyLlama-tuned-math/chat_template.jinja',
 './tinyLlama-tuned-math/tokenizer.model',
 './tinyLlama-tuned-math/added_tokens.json',
 './tinyLlama-tuned-math/tokenizer.json')

In [39]:
from torch.utils.data import DataLoader
from transformers import default_data_collator


from peft import PeftModelForCausalLM
import numpy as np

In [40]:
model_name = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'
adapter_path = './tinyLlama-tuned-math'

In [41]:
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Loading the finetuned model for evaluation.
tuned_model = PeftModelForCausalLM.from_pretrained(
    base_model,
    adapter_path,
    device_map="auto",
    trust_remote_code=True,
).eval()

In [62]:
eval_data = load_dataset('openai/gsm8k', 'main', split='train[200:400]')
eval_ds = eval_data.map(tokenize, batched=True, remove_columns=eval_data.column_names)
eval_ds = eval_ds.with_format('torch')


In [63]:
eval_dataloader = DataLoader(eval_ds, batch_size=8, collate_fn=default_data_collator)

In [64]:
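@torch.no_grad()
def compute_perplexity(model):
    """Return exp(mean batch loss) over eval_dataloader."""
    losses = []

    for batch in eval_dataloader:
        # Move every tensor in the batch to the GPU before the forward pass.
        batch = {k: v.to('cuda') for k, v in batch.items()}
        loss = model(**batch).loss
        losses.append(loss.item())

    return np.exp(np.mean(losses))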

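**Perplexity measures how uncertain the model is when predicting the next token: it is the exponential of the average per-token cross-entropy loss, so lower is better. Comparing it before and after fine-tuning indicates whether the model has learned from the training data.**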


_The untrained (not fine-tuned) model has an average perplexity of about 200 to 400._
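
The relationship between loss and perplexity can be illustrated with a small standalone sketch. The `perplexity` function here is a hypothetical helper that works on a plain list of per-token losses (negative log-likelihoods in nats); it is not the notebook's `compute_perplexity`.

```python
import math


def perplexity(token_losses):
    """exp of the mean negative log-likelihood per token."""
    return math.exp(sum(token_losses) / len(token_losses))


# A model that spreads probability evenly over 4 candidate tokens
# has a loss of ln(4) on every token, so its perplexity is about 4.
uniform_losses = [math.log(4)] * 3
print(perplexity(uniform_losses))
```

Read this way, a perplexity of N means the model is roughly as uncertain as if it were choosing uniformly among N tokens at each step. `compute_perplexity` averages per-batch mean losses instead, which gives the same value as long as every batch contains the same number of scored tokens.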

In [65]:
print(f"Perplexity of Model: {compute_perplexity(tuned_model)}")

Perplexity of Model: 8.934295206610912


In [66]:
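def generate_response(model, prompt):
  # Note: this template has a space after each colon and none before "### Response",
  # so its whitespace differs slightly from the template used in tokenize().
  token_ids = tokenizer(f"### Instruction: \n{prompt}\n### Response: \n", return_tensors='pt').input_ids.to('cuda')

  with torch.no_grad():
    # Generate up to 256 new tokens after the prompt.
    output_ids = model.generate(
        token_ids,
        max_new_tokens=256
      )
  # Decode the full sequence (prompt plus completion) back to text.
  return tokenizer.decode(output_ids[0], skip_special_tokens=True)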

In [68]:
eval_data[0]

{'question': 'Sansa is a famous artist, she can draw a portrait and sell it according to its size. She sells an 8-inch portrait for $5, and a 16-inch portrait for twice the price of the 8-inch portrait. If she sells three 8-inch portraits and five 16-inch portraits per day, how many does she earns every 3 days?',
 'answer': 'Sansa earns $5 x 3 = $<<5*3=15>>15 every day by selling three 8-inch portraits.\nThe price of the 16-inch portrait is $5 x 2 = $<<5*2=10>>10 each.\nSo, she earns $10 x 5 = $<<10*5=50>>50 every day by selling five 16-inch portraits.\nHer total earnings is $50 + $15 = $<<50+15=65>>65 every day.\nTherefore, the total amount she earns after 3 days is $65 x 3 = $<<65*3=195>>195.\n#### 195'}

In [70]:
print(generate_response(tuned_model, eval_data[0]['question']))

### Instruction: 
Sansa is a famous artist, she can draw a portrait and sell it according to its size. She sells an 8-inch portrait for $5, and a 16-inch portrait for twice the price of the 8-inch portrait. If she sells three 8-inch portraits and five 16-inch portraits per day, how many does she earns every 3 days?
### Response: 
Sansa draws an 8-inch portrait at a price of $5 per portrait, so she earns an 8-inch portrait 1$ exercise.
Sansa draws an 16-inch portrait at a price of $2*$16= $<<2*16=32>>32 each day, so she earns an 8-inch portrait 16$ a day.
Thus, she earns a 8+16= $<<8+16=24>>24 16-inch portraits a day.
She sells three 8-inch portraits per day, so she sells a total of three 8-inch portraits every 3 days.
#### 3
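
Both the reference answer and the generated response end with a `#### <number>` line, so a rough exact-match check can compare the two final numbers. The sketch below is one way to do that; `extract_final_answer` is a hypothetical helper, and the two strings are shortened stand-ins for the reference and the generated text shown above.

```python
import re


def extract_final_answer(text):
    """Return the number after the last '####' marker, or None if absent."""
    matches = re.findall(r"####\s*(-?[\d,]*\.?\d+)", text)
    if not matches:
        return None
    # Drop thousands separators so "1,234" and "1234" compare equal.
    return matches[-1].replace(",", "")


reference = "Therefore, the total amount she earns after 3 days is $195.\n#### 195"
generated = "She sells a total of three 8-inch portraits every 3 days.\n#### 3"

ref_answer = extract_final_answer(reference)
gen_answer = extract_final_answer(generated)
is_correct = ref_answer is not None and ref_answer == gen_answer

print(f"reference={ref_answer} generated={gen_answer} correct={is_correct}")
```

For this example the check fails (3 versus 195), which suggests that a low perplexity on the formatted text does not by itself mean the model solves the problems correctly. Running the same check over all of `eval_data` would give an accuracy figure to report alongside perplexity.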
