<a href="https://colab.research.google.com/github/fatemafaria142/Instructions-Tuning-Across-Various-LLMs-with-Alpaca-Dataset/blob/main/Instructions_Tuning_using_TinyLlama.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install accelerate peft bitsandbytes transformers trl datasets

Collecting accelerate
  Downloading accelerate-0.26.1-py3-none-any.whl (270 kB)
Collecting peft
  Downloading peft-0.7.1-py3-none-any.whl (168 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.42.0-py3-none-any.whl (105.0 MB)
Collecting trl
  Downloading trl-0.7.9-py3-none-any.whl (141 kB)
Collecting datasets
  Downloading datasets-2.16.1-py3-none-any.whl (507 kB)

# **Load the required packages**

In [2]:
import torch
from datasets import load_dataset, Dataset
from peft import LoraConfig, AutoPeftModelForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer
import os

### **Dataset Link:** https://huggingface.co/datasets/tatsu-lab/alpaca?row=0

In [3]:
dataset="tatsu-lab/alpaca"
model_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0"
output_model="tinyllama-instructions-v1"

# **Dataset preparation**

In [21]:
def prepare_train_data(data_id, num_samples=1000):
    data = load_dataset(data_id, split="train")

    # Keep only the first `num_samples` rows (or the whole split if it is smaller)
    data = data.select(range(min(num_samples, len(data))))

    data_df = data.to_pandas()
    # Build one training string per row: instruction, optional input, then the response
    data_df["text"] = data_df[["instruction", "input", "output"]].apply(
        lambda x: "user\n" + x["instruction"] + "  \n input\n" + x["input"] + "\nassistant\n" + x["output"] + "\n",
        axis=1,
    )
    data = Dataset.from_pandas(data_df)

    return data



In [30]:
def prepare_train_data(data_id, num_samples=1000):
    data = load_dataset(data_id, split="train")

    # Keep only the first `num_samples` rows (or the whole split if it is smaller)
    data = data.select(range(min(num_samples, len(data))))

    data_df = data.to_pandas()
    template = "Below is an instruction that describes a task. Write a response that appropriately completes the request."

    # Note: the fixed template fills the "user" slot and the instruction fills the
    # "input" slot; the dataset's own `input` column is not used here.
    data_df["text"] = data_df[["instruction", "input", "output"]].apply(
        lambda x: f"user\n{template}\n input\n{x['instruction']}\nassistant\n{x['output']}\n",
        axis=1,
    )
    data = Dataset.from_pandas(data_df)

    return data


In [31]:
data = prepare_train_data(dataset)

In [32]:
data

Dataset({
    features: ['instruction', 'input', 'output', 'text'],
    num_rows: 1000
})

In [33]:
data[0]

{'instruction': 'Give three tips for staying healthy.',
 'input': '',
 'output': '1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.',
 'text': 'user\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.\n input\nGive three tips for staying healthy.\nassistant\n1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.\n'}

In [34]:
data[1]

{'instruction': 'What are the three primary colors?',
 'input': '',
 'output': 'The three primary colors are red, blue, and yellow.',
 'text': 'user\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.\n input\nWhat are the three primary colors?\nassistant\nThe three primary colors are red, blue, and yellow.\n'}

In [35]:
data[2]

{'instruction': 'Describe the structure of an atom.',
 'input': '',
 'output': 'An atom is made up of a nucleus, which contains protons and neutrons, surrounded by electrons that travel in orbits around the nucleus. The protons and neutrons have a positive charge, while the electrons have a negative charge, resulting in an overall neutral atom. The number of each particle determines the atomic number and the type of atom.',
 'text': 'user\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.\n input\nDescribe the structure of an atom.\nassistant\nAn atom is made up of a nucleus, which contains protons and neutrons, surrounded by electrons that travel in orbits around the nucleus. The protons and neutrons have a positive charge, while the electrons have a negative charge, resulting in an overall neutral atom. The number of each particle determines the atomic number and the type of atom.\n'}
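
The template above leaves the dataset's `input` column unused. That is harmless for the three rows shown, whose `input` is empty, but it would drop context for rows that carry one. A minimal sketch of one way to include it, assuming a hypothetical helper `build_text` and an extra ` context` label that are not part of the original notebook:

```python
def build_text(row, template):
    """Format one Alpaca-style row; `build_text` and the ` context` label are illustrative."""
    parts = ["user", template, " input", row["instruction"]]
    # Only add the optional context block when the row actually has one.
    if row["input"].strip():
        parts += [" context", row["input"]]
    parts += ["assistant", row["output"]]
    return "\n".join(parts) + "\n"


template = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
row = {
    "instruction": "What are the three primary colors?",
    "input": "",
    "output": "The three primary colors are red, blue, and yellow.",
}
text = build_text(row, template)
print(text)
```

For rows with an empty `input`, this sketch produces the same string as the `text` column shown above, so existing rows would be unaffected.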

## **Load the chat model (not the base version)**

In [36]:
def get_model_and_tokenizer(model_id):

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Reuse the EOS token for padding
    tokenizer.pad_token = tokenizer.eos_token
    # 4-bit NF4 quantization with double quantization; compute runs in float16
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype="float16", bnb_4bit_use_double_quant=True
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    # Disable the KV cache during training
    model.config.use_cache = False
    model.config.pretraining_tp = 1
    return model, tokenizer
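
Loading in 4-bit mainly reduces the memory taken by the weights. A rough back-of-the-envelope estimate, assuming about 1.1 billion parameters (taken from the model name) and ignoring quantization constants, layers kept in higher precision, activations, and optimizer state:

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 1.1e9  # assumption: rounded from the "1.1B" in the model name

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"float16 weights: ~{fp16_gb:.2f} GB")
print(f"4-bit weights:   ~{nf4_gb:.2f} GB")
```

The real footprint will be somewhat higher than the 4-bit figure, since this estimate only counts the quantized weights themselves.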

In [37]:
model, tokenizer = get_model_and_tokenizer(model_id)


# **Setting up the LoRA**

In [38]:
peft_config = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
    )
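
With `r=8`, each adapted linear layer with `in_features` inputs and `out_features` outputs gains `r * (in_features + out_features)` trainable parameters, and the low-rank update is scaled by `lora_alpha / r`. A small sketch of that arithmetic, assuming PEFT falls back to `q_proj` and `v_proj` as targets for Llama models (since `target_modules` is not set above) and using the layer shapes from the model printout later in this notebook:

```python
R = 8
LORA_ALPHA = 16
NUM_LAYERS = 22

# (in_features, out_features) per targeted projection; the target list is an assumption.
TARGET_SHAPES = {
    "q_proj": (2048, 2048),
    "v_proj": (2048, 256),
}


def lora_params(in_features, out_features, r):
    # Matrix A is r x in_features and matrix B is out_features x r.
    return r * in_features + out_features * r


per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R

print(f"trainable LoRA parameters per layer: {per_layer}")
print(f"trainable LoRA parameters in total:  {total}")
print(f"scaling factor lora_alpha / r:       {scaling}")
```

Under these assumptions the adapter adds roughly 1.1 million trainable parameters, about a thousandth of the base model's size.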

In [45]:
training_arguments = TrainingArguments(
        output_dir=output_model,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,  # effective batch size: 1 x 4 = 4 per device
        optim="paged_adamw_32bit",
        learning_rate=2e-4,
        lr_scheduler_type="cosine",
        save_strategy="epoch",
        logging_steps=10,
        num_train_epochs=3,  # overridden by max_steps below
        max_steps=250,  # a positive max_steps takes precedence over num_train_epochs
        fp16=True,
        # push_to_hub=True
    )
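
The batch settings determine how many optimizer steps make up one pass over the data. A quick check of that arithmetic, assuming a single device and the 1000-row subset prepared earlier:

```python
import math

num_rows = 1000
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_devices = 1  # assumption: a single GPU
max_steps = 250

# Samples consumed per optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Optimizer steps needed to see every row once.
steps_per_epoch = math.ceil(num_rows / effective_batch)
# Fraction of the dataset covered before max_steps stops training.
epochs_covered = max_steps / steps_per_epoch

print(effective_batch, steps_per_epoch, epochs_covered)
```

So `max_steps=250` corresponds to a single epoch here, which is consistent with the `'epoch': 1.0` reported by `trainer.train()`.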

In [46]:
trainer = SFTTrainer(
        model=model,
        train_dataset=data,
        peft_config=peft_config,
        dataset_text_field="text",
        args=training_arguments,
        tokenizer=tokenizer,
        packing=False,
        max_seq_length=1024
    )


In [47]:
trainer.train()

Step,Training Loss
10,2.5667
20,1.9626
30,1.5285
40,1.3096
50,1.1975
60,1.2216
70,1.1377
80,1.1023
90,1.1081
100,1.0748


TrainOutput(global_step=250, training_loss=1.191483455657959, metrics={'train_runtime': 225.2734, 'train_samples_per_second': 4.439, 'train_steps_per_second': 1.11, 'total_flos': 646897267482624.0, 'train_loss': 1.191483455657959, 'epoch': 1.0})
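
The reported metrics can be cross-checked against each other, and the mean training loss can be converted to a perplexity via `exp(loss)`. A short sketch using only the numbers printed above, assuming 4 samples are consumed per optimizer step (batch size 1 with 4 accumulation steps):

```python
import math

global_step = 250
train_runtime = 225.2734  # seconds
train_loss = 1.191483455657959
samples_per_step = 4  # assumption: batch size 1 x 4 accumulation steps

samples_per_second = global_step * samples_per_step / train_runtime
steps_per_second = global_step / train_runtime
perplexity = math.exp(train_loss)

print(f"samples/s:  {samples_per_second:.3f}")
print(f"steps/s:    {steps_per_second:.2f}")
print(f"perplexity: {perplexity:.2f}")
```

Keep in mind that this perplexity is computed on the training data itself, so it says little about how the model performs on unseen instructions.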

# **Merging the LoRA with the base model**

In [48]:
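from peft import PeftModel
from transformers import AutoModelForCausalLM
import torch

# Reload the base model in float16 (not quantized) so the adapter can be merged into its weights
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, load_in_8bit=False,
                                             device_map="auto",
                                             trust_remote_code=True)

# LoRA checkpoint written by the trainer at step 250
model_path = "/content/tinyllama-instructions-v1/checkpoint-250"

# Attach the trained adapter to the base model
peft_model = PeftModel.from_pretrained(model, model_path, from_transformers=True, device_map="auto")

# Fold the LoRA weights into the base weights and drop the adapter wrappers
model = peft_model.merge_and_unload()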
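
Conceptually, merging a LoRA adapter replaces each adapted weight `W` with `W + (lora_alpha / r) * B @ A`, so inference no longer needs the separate adapter matrices. A toy pure-Python sketch of that identity on small hand-picked matrices (the values and helper names are illustrative, and this is not how PEFT implements the merge internally):

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b, scale=1.0):
    """Return a + scale * b element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


r, lora_alpha = 1, 2
scaling = lora_alpha / r

W = [[1.0, 0.0], [0.0, 1.0]]   # base weight (out_features x in_features)
B = [[0.5], [1.0]]             # out_features x r
A = [[2.0, -1.0]]              # r x in_features
x = [[3.0], [4.0]]             # one input column vector

# Adapter kept separate: base output plus the scaled low-rank correction.
separate = add(matmul(W, x), matmul(B, matmul(A, x)), scale=scaling)

# Adapter merged: a single weight matrix gives the same output.
W_merged = add(W, matmul(B, A), scale=scaling)
merged = matmul(W_merged, x)

print(separate, merged)
```

Both paths give the same output vector, which is why the merged model can be used like an ordinary `LlamaForCausalLM`.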

In [49]:
model

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 2048)
    (layers): ModuleList(
      (0-21): 22 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=2048, out_features=5632, bias=False)
          (up_proj): Linear(in_features=2048, out_features=5632, bias=False)
          (down_proj): Linear(in_features=5632, out_features=2048, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head)

# **Inference from the LLM for one example**

In [50]:
def formatted_prompt(template, instruction)-> str:
    return f"<|im_start|>user\n{template}<|im_end|>\n<|im_start|>input\n{instruction}<|im_end|>\n<|im_start|>assistant:"
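
Note that this prompt wraps each turn in `<|im_start|>` and `<|im_end|>` markers, whereas the `text` column used for training has plain `user`, ` input`, and `assistant` lines without them. One option worth trying is to build the inference prompt in the same layout the model saw during fine-tuning. A minimal sketch, assuming a hypothetical helper `training_style_prompt` that is not part of the original notebook:

```python
def training_style_prompt(template, instruction):
    """Mirror the layout of the training `text` column, stopping where the answer starts."""
    return f"user\n{template}\n input\n{instruction}\nassistant\n"


template = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
prompt = training_style_prompt(template, "What does DNA stand for?")
print(prompt)
```

Whether this improves the answers is untested here; it only removes one difference between the training and inference inputs.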

In [56]:
from transformers import GenerationConfig
from time import perf_counter

def generate_response(instruction, user_input):
    # `instruction` is the fixed template; `user_input` is the actual task
    prompt = formatted_prompt(instruction, user_input)

    generation_config = GenerationConfig(
        penalty_alpha=0.6,
        do_sample=True,
        top_k=5,
        temperature=0.5,
        repetition_penalty=1.2,
        max_new_tokens=120,
        pad_token_id=tokenizer.eos_token_id
    )
    start_time = perf_counter()

    # Tokenize once and move the tensors to the GPU
    inputs = tokenizer(prompt, return_tensors="pt").to('cuda')

    outputs = model.generate(**inputs, generation_config=generation_config)
    decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()  # Remove leading and trailing whitespace
    print(decoded_output)
    output_time = perf_counter() - start_time
    print(f"Time taken for inference: {round(output_time, 2)} seconds")

In [57]:
instruction = 'Below is an instruction that describes a task. Write a response that appropriately completes the request.'
user_input = 'What does DNA stand for?'
generate_response(instruction=instruction, user_input=user_input)

<|im_start|>user
Below is an instruction that describes a task. Write a response that appropriately completes the request.<|im_end|>
<|im_start|>input
What does DNA stand for?<|im_end|>
<|im_start|>assistant: DNA stands for Deoxyribonucleic Acid, which is a molecule made up of four nucleotides (A, T, G and C). It contains genetic information and carries instructions to cells in our bodies.

DNA is also known as the blueprint of life because it provides a template for making proteins and other cellular components. It plays a critical role in regulating gene expression and is essential for the development of all living things including humans.
Time taken for inference: 3.26 seconds


In [58]:
instruction = 'Below is an instruction that describes a task. Write a response that appropriately completes the request.'
user_input = 'Variable x is defined as “4x + 2y = 10”. Find the value of x.'
generate_response(instruction=instruction, user_input=user_input)

<|im_start|>user
Below is an instruction that describes a task. Write a response that appropriately completes the request.<|im_end|>
<|im_start|>input
Variable x is defined as “4x + 2y = 10”. Find the value of x.<|im_end|>
<|im_start|>assistant: The value of x is equal to 8.

The expression (4x+2y) = 10 represents multiplication and division, so we can write it as (4 * y)/(x + 2). Since 4 * y is equivalent to 4*(-y), we get (-y)(-y) = -2y^2 = 5/6, which simplifies to y=3/2. So the answer is 3/2.

Hope this helps! If you have any more questions, feel free to ask.
Time taken for inference: 6.16 seconds
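
Because `model.generate` returns the prompt tokens followed by the new tokens, the decoded text above repeats the whole prompt. A small sketch of one way to keep only the answer, assuming the `<|im_start|>assistant:` marker from `formatted_prompt` survives decoding as it does in the outputs shown (the helper name `extract_answer` is hypothetical, and the sample string is an abbreviated copy of the first output):

```python
ASSISTANT_MARKER = "<|im_start|>assistant:"


def extract_answer(decoded_output, marker=ASSISTANT_MARKER):
    """Return the text after the last assistant marker, or the full text if it is absent."""
    _, found, answer = decoded_output.rpartition(marker)
    return answer.strip() if found else decoded_output.strip()


decoded = (
    "<|im_start|>user\n"
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.<|im_end|>\n"
    "<|im_start|>input\n"
    "What does DNA stand for?<|im_end|>\n"
    "<|im_start|>assistant: DNA stands for Deoxyribonucleic Acid."
)
print(extract_answer(decoded))
```

An alternative is to slice the generated token ids at the prompt length before decoding, which avoids relying on a text marker.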


# **Inference from the LLM for list of examples**

In [None]:
from transformers import GenerationConfig
from time import perf_counter

def generate_response(instruction, user_input):
    # `instruction` is the fixed template; `user_input` is the actual task
    prompt = formatted_prompt(instruction, user_input)

    generation_config = GenerationConfig(
        penalty_alpha=0.6,
        do_sample=True,
        top_k=5,
        temperature=0.5,
        repetition_penalty=1.2,
        max_new_tokens=120,
        pad_token_id=tokenizer.eos_token_id
    )
    start_time = perf_counter()

    # Tokenize once and move the tensors to the GPU
    inputs = tokenizer(prompt, return_tensors="pt").to('cuda')

    outputs = model.generate(**inputs, generation_config=generation_config)
    decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()  # Remove leading and trailing whitespace
    print(decoded_output)
    output_time = perf_counter() - start_time
    print(f"Time taken for inference: {round(output_time, 2)} seconds")
    # Return the text so callers can collect the responses
    return decoded_output

In [61]:
instruction = 'Below is an instruction that describes a task. Write a response that appropriately completes the request.'
# Example usage
user_inputs = [
    'What is the force on a 1 kg mass due to the gravitational force?',
    'Generate a list of methods to reduce food waste.',
    'Generate a list of country names in the western hemisphere.',
]

responses = []

for user_input in user_inputs:
    response = generate_response(instruction=instruction, user_input=user_input)
    responses.append(response)

# Print or use the responses as needed
for response in responses:
    print(response)



<|im_start|>user
Below is an instruction that describes a task. Write a response that appropriately completes the request.<|im_end|>
<|im_start|>input
What is the force on a 1 kg mass due to the gravitational force?<|im_end|>
<|im_start|>assistant: The force of gravity acting upon a 1 kg body has magnitude of -980 N and its direction is downwards (i.e., towards Earth's center). This means there will be a force of approximately -980 N per unit weight, which is equal to -98 N/kg = -98 N or -98 N.
Time taken for inference: 4.98 seconds
<|im_start|>user
Below is an instruction that describes a task. Write a response that appropriately completes the request.<|im_end|>
<|im_start|>input
Generate a list of methods to reduce food waste.<|im_end|>
<|im_start|>assistant: 1) Buy in bulk and freeze or can items for later use. 2) Plan meals ahead, so you know what ingredients are needed. 3) Cook small batches when possible, as it reduces wastage. 4) Use leftovers to make new dishes. 5) Don't overbu