<a href="https://colab.research.google.com/github/gupta24789/fine-tuning-llms/blob/main/tinyLlama/01_fine_tune_sft_qlora.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
# !pip install -q accelerate peft bitsandbytes transformers trl sentencepiece

In [4]:
## Library
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

#### Prepare Data

In [7]:
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split = "test_sft")
dataset = dataset.shuffle(seed=42).select(range(2000))
dataset

Dataset({
    features: ['prompt', 'prompt_id', 'messages'],
    num_rows: 2000
})

In [8]:
template_tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
print(template_tokenizer.chat_template)

{% for message in messages %}
{% if message['role'] == 'user' %}
{{ '<|user|>
' + message['content'] + eos_token }}
{% elif message['role'] == 'system' %}
{{ '<|system|>
' + message['content'] + eos_token }}
{% elif message['role'] == 'assistant' %}
{{ '<|assistant|>
'  + message['content'] + eos_token }}
{% endif %}
{% if loop.last and add_generation_prompt %}
{{ '<|assistant|>' }}
{% endif %}
{% endfor %}
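To make the template's behaviour concrete, here is a minimal plain-Python sketch of the string it is expected to produce. `render_chat` is a hypothetical helper, not part of `transformers`; it assumes `</s>` as the EOS token and assumes the newlines after Jinja block tags are trimmed, so treat it as an approximation of `apply_chat_template`, not a replacement.

```python
def render_chat(messages, eos_token="</s>", add_generation_prompt=False):
    """Approximate the TinyLlama chat template with plain string formatting.

    Hypothetical helper for illustration only; whitespace handling is an
    assumption about how the Jinja template is rendered.
    """
    parts = []
    for message in messages:
        role = message["role"]
        # The template only emits text for these three roles.
        if role in ("user", "system", "assistant"):
            parts.append(f"<|{role}|>\n{message['content']}{eos_token}\n")
    # The generation prompt is appended once, after the last message.
    if messages and add_generation_prompt:
        parts.append("<|assistant|>\n")
    return "".join(parts)


example = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help?"},
]
rendered = render_chat(example)
print(rendered)
```

Each turn becomes a role tag on its own line, followed by the content and the EOS token, which is the layout visible in the formatted training sample further down.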


In [9]:
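def process_chats(example):
  # Render one conversation (a list of role/content dicts) into a single training string
  chats = example['messages']
  prompt = template_tokenizer.apply_chat_template(chats, tokenize=False, add_generation_prompt=False)
  return {"text": prompt}


# map() runs per example here; batch_size only takes effect together with batched=True
dataset = dataset.map(process_chats)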


In [12]:
print(dataset['text'][3])

<|user|>
Can you summarize the activities and attractions available in both Conway and Myrtle Beach, South Carolina for summer travelers?
Generate according to: Are you traveling towards the beach this summer? Are you looking for some fun things to do and places to go? Well look no further because you live in the right place: Conway and Myrtle Beach, South Carolina.
Conway has a beautiful riverfront and a museum on Main Street. There are multiple things that you can do in a group or just by yourself. Here on the coast we have some of the most beautiful places. Inside of Conway, we have a recreational center that is available to all people that are want to come and be apart of the summer camps that they offer for kids. Also, within this recreational center we have a community pool, and down the road there an outside pool owned by the recreation center.
In Myrtle beach you can visit Broadway at the Beach, Family Kingdom Amusement Park, take a walk on the boardwalk, take a trip to Myrtle 

#### Load Quantized Base Model & Tokenizer

In [16]:
## Tokenizer
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T")
tokenizer.pad_token = "<PAD>"
tokenizer.padding_side = "left"
tokenizer.chat_template = template_tokenizer.chat_template

## Load Quantized model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16"
)

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
    quantization_config=bnb_config,
    device_map = "auto"
)

model.config.use_cache = False
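As a rough back-of-the-envelope check on what 4-bit NF4 with double quantization buys, the sketch below estimates the storage needed for the quantized weights alone. The parameter count (1.1 billion) and the block sizes (64 weights per block, 256 constants per second-level block) are assumptions taken from the model name and the QLoRA paper's defaults. The estimate ignores layers that stay unquantized, activations, and optimizer state.

```python
def quantized_weight_bytes(num_params, bits_per_weight=4, block_size=64,
                           double_quant=True, second_block_size=256):
    """Estimate storage for block-wise quantized weights (illustrative only)."""
    if double_quant:
        # 8-bit constant per block plus a 32-bit constant per group of blocks.
        overhead_bits = 8 / block_size + 32 / (block_size * second_block_size)
    else:
        # One 32-bit scaling constant per block of weights.
        overhead_bits = 32 / block_size
    return num_params * (bits_per_weight + overhead_bits) / 8


NUM_PARAMS = 1.1e9  # assumed from the "1.1B" in the model name

fp16_bytes = NUM_PARAMS * 2
nf4_bytes = quantized_weight_bytes(NUM_PARAMS, double_quant=False)
nf4_dq_bytes = quantized_weight_bytes(NUM_PARAMS, double_quant=True)

print(f"fp16 weights:          {fp16_bytes / 1e9:.2f} GB")
print(f"nf4 weights:           {nf4_bytes / 1e9:.2f} GB")
print(f"nf4 + double quant:    {nf4_dq_bytes / 1e9:.2f} GB")
```

Under these assumptions, double quantization lowers the per-weight overhead of the scaling constants from 0.5 bits to roughly 0.127 bits.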

#### Prepare model for PEFT training

In [17]:
peft_config = LoraConfig(
  lora_alpha = 32,
  lora_dropout = 0.1,
  r = 64,
  bias = "none",
  task_type = "CAUSAL_LM",
  target_modules = ["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"]
)

model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
model = get_peft_model(model, peft_config)
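To get a feel for how many weights this `LoraConfig` actually trains, the sketch below counts the adapter parameters by hand. Each adapted linear layer of shape `(d_in, d_out)` gains two low-rank matrices with `r * (d_in + d_out)` parameters in total. The layer dimensions used here (hidden size 2048, MLP size 5632, 22 layers, key/value projection width 256) are assumptions about the TinyLlama architecture and are not stated in this notebook; `model.print_trainable_parameters()` gives the authoritative figure.

```python
# Assumed TinyLlama-1.1B dimensions (not taken from this notebook).
HIDDEN, INTERMEDIATE, LAYERS = 2048, 5632, 22
KV_DIM = 256  # assumed grouped-query attention: narrower k/v projections

# Values from the LoraConfig above.
R, ALPHA = 64, 32

# (d_in, d_out) for every module listed in target_modules.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, r):
    """Parameters in one adapter: A is (r, d_in) and B is (d_out, r)."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in shapes.values())
total = per_layer * LAYERS
scaling = ALPHA / R  # factor applied to the low-rank update

print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
print(f"Update scaling alpha / r:  {scaling}")
```

With `lora_alpha = 32` and `r = 64`, the low-rank update is scaled by 0.5 before being added to the frozen weights.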

#### Training

In [19]:
training_args = SFTConfig(
    output_dir = "checkpoints",
    dataset_text_field = "text",
    max_seq_length = 512,
    per_device_train_batch_size = 2,
    max_steps = 100,
    logging_steps = 10,
    gradient_accumulation_steps = 4,
    optim = "paged_adamw_32bit",
    learning_rate = 2e-4,
    lr_scheduler_type = "cosine",
    num_train_epochs = 1,  # has no effect here: max_steps, when positive, takes precedence
    fp16 = True,
    report_to = "none",
    gradient_checkpointing = True
)

trainer = SFTTrainer(
    model = model,
    args = training_args,
    train_dataset = dataset,
    tokenizer = tokenizer,  # recent TRL releases deprecate this argument in favor of processing_class
    peft_config = peft_config
)

## Training
trainer.train()

## Save model
trainer.model.save_pretrained("fine-tuned-tinyLlama")

| Step | Training Loss |
|------|---------------|
| 10 | 1.6449 |
| 20 | 1.4162 |
| 30 | 1.4357 |
| 40 | 1.4407 |
| 50 | 1.4569 |
| 60 | 1.4423 |
| 70 | 1.3486 |
| 80 | 1.4189 |
| 90 | 1.4312 |
| 100 | 1.4999 |
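The training arguments above imply a fairly small training budget, which the following sketch works out. It assumes a single GPU, no sequence packing (one dataset row per training example), and no warmup steps; `cosine_lr` is a hypothetical helper that mirrors the usual half-cosine decay, not the scheduler object the trainer builds internally.

```python
import math

# Values copied from the SFTConfig above.
per_device_batch = 2
grad_accum_steps = 4
max_steps = 100
base_lr = 2e-4

# Assumptions for this estimate.
num_devices = 1
dataset_size = 2000

# One optimizer step consumes this many examples.
effective_batch = per_device_batch * grad_accum_steps * num_devices
examples_seen = effective_batch * max_steps
epochs_covered = examples_seen / dataset_size


def cosine_lr(step, base_lr=base_lr, total_steps=max_steps):
    """Half-cosine decay from base_lr down to 0, assuming zero warmup."""
    progress = step / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"Effective batch size: {effective_batch}")
print(f"Examples seen:        {examples_seen} ({epochs_covered:.0%} of the data)")
for step in (0, 25, 50, 75, 100):
    print(f"step {step:3d}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the run sees fewer examples than the 2,000-row subset contains, so it stops well short of one full epoch.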


#### Merge Adapter

In [20]:
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained(
    "fine-tuned-tinyLlama",
    low_cpu_mem_usage=True,
    device_map="auto",
)

# Merge LoRA and base model
merged_model = model.merge_and_unload()
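Conceptually, merging folds each adapter into its base weight matrix as `W + (alpha / r) * B @ A`, after which the adapter is no longer needed at inference time. The toy example below illustrates that arithmetic with tiny hand-written matrices in plain Python. `matmul` and `merge_lora` are hypothetical helpers for illustration and say nothing about how PEFT implements the merge internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A for W (out, in), A (r, in), B (out, r)."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy shapes: out = 2, in = 2, rank r = 1.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -1.0]]
B = [[2.0], [1.0]]
alpha, r = 2, 1

merged = merge_lora(W, A, B, alpha, r)

# Check: the merged matrix gives the same output as base + scaled adapter.
x = [[1.0], [1.0]]
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
separate = [[b[0] + (alpha / r) * a[0]] for b, a in zip(base_out, adapter_out)]
merged_out = matmul(merged, x)

print("merged weights:", merged)
print("outputs match: ", merged_out == separate)
```

Because the merged matrix has the same shape as the original weight, the merged model runs with no extra adapter computation.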

#### Inference

In [22]:
from transformers import pipeline

# Use our predefined prompt template
prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

# Run our instruction-tuned model
pipe = pipeline(task="text-generation", model=merged_model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])

Device set to use cuda:0


<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
Large Language Models (LLMs) are a type of artificial intelligence (AI) that can generate human-like language. They are trained on large amounts of text data, and they can be used to generate text in a variety of contexts, such as chatbots, machine translation, and natural language processing (NLP).

LLMs are built on the concept of recurrent neural networks (RNNs), which are a type of neural network that can process sequential data. RNNs are often used in NLP applications because they can process large amounts of text data quickly and efficiently.

One of the most important features of LLMs is their ability to generate human-like language. LLMs can generate text that is grammatically correct, has a natural flow, and sounds like a human speaking. This makes them ideal for use in chatbots, where they can respond to users' questions and provide answers in a conversational manner.

LLMs can also be used for machine 
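As the output shows, the text-generation pipeline returns the prompt together with the continuation by default. One way to keep only the model's answer is to cut the string at the last assistant tag, as in the sketch below. `extract_reply` is a hypothetical helper, and the tag and EOS strings are assumptions based on the prompt template used above; passing `return_full_text=False` to the pipeline call is an alternative.

```python
def extract_reply(generated_text, assistant_tag="<|assistant|>", eos_token="</s>"):
    """Return the text after the last assistant tag, without EOS markers."""
    # rpartition returns the whole string as the tail if the tag is missing.
    _, _, reply = generated_text.rpartition(assistant_tag)
    return reply.replace(eos_token, "").strip()


sample_output = (
    "<|user|>\n"
    "Tell me something about Large Language Models.</s>\n"
    "<|assistant|>\n"
    "Large Language Models (LLMs) are a type of artificial intelligence.</s>"
)

reply = extract_reply(sample_output)
print(reply)
```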