# Fine-Tuning a Tiny LLM Using the mini-platypus Dataset

This notebook demonstrates how to fine-tune a small large language model (TinyLlama 1.1B) on `mini-platypus`, a compact instruction dataset derived from the Open-Platypus collection. It targets environments with limited GPU resources by combining LoRA (Low-Rank Adaptation) with 4-bit quantization, an approach known as QLoRA.


The same setup can be adapted to fine-tune larger models in production and to integrate feedback loops into LLM pipelines.
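To make the efficiency argument concrete, the sketch below counts the trainable parameters that LoRA adds to a single weight matrix. The 2048 x 2048 shape is an illustrative assumption rather than a value read from the model config, and `lora_param_count` is a hypothetical helper; the rank of 64 matches the `lora_r` setting used later in this notebook.

```python
def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)

d_out, d_in, r = 2048, 2048, 64  # illustrative shape; r matches lora_r below
full = d_out * d_in              # parameters in the frozen full matrix
lora = lora_param_count(d_out, d_in, r)

print(f"full matrix:  {full:,} params")
print(f"LoRA factors: {lora:,} params ({lora / full:.1%} of full)")
```

Only the two small factors receive gradients; the full matrix stays frozen, which is where the memory savings during training come from.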


In [None]:
# Install required packages
!pip install -q accelerate peft bitsandbytes transformers trl datasets


In [None]:
# Imports
import os
import torch
import warnings
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline
)
from peft import LoraConfig
from trl import SFTTrainer
warnings.filterwarnings('ignore')
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
torch.cuda.empty_cache()

In [None]:
# Use a small model that fits on Kaggle's free GPU
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
dataset_name = "mlabonne/mini-platypus"
new_model = "tinyllama-mini-platypus"

In [None]:
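# LoRA and quantization settings
lora_r = 64                         # rank of the low-rank update matrices
lora_alpha = 16                     # updates are scaled by lora_alpha / lora_r
lora_dropout = 0.1                  # dropout applied on the LoRA branch
use_4bit = True                     # load the base model in 4-bit precision
bnb_4bit_compute_dtype = "float16"  # dtype used for computation after dequantization
bnb_4bit_quant_type = "nf4"         # 4-bit NormalFloat quantization
use_nested_quant = False            # whether to also quantize the quantization constants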
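As a rough sanity check on why 4-bit loading helps, the following back-of-the-envelope estimate compares weight storage at different precisions. It assumes a round 1.1 billion parameters and ignores quantization constants, activations, and optimizer state, so real memory usage will be higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate storage for the model weights alone, in gigabytes (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 1.1e9  # assumed round figure for TinyLlama 1.1B
for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gb(n_params, bits):.2f} GB")
```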

In [None]:
# Training configuration
output_dir = "./results"
num_train_epochs = 1
fp16 = False
bf16 = False
per_device_train_batch_size = 1
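per_device_eval_batch_size = 1  # not passed to TrainingArguments below (no evaluation is run)
gradient_accumulation_steps = 1
gradient_checkpointing = True   # not passed to TrainingArguments below, so it has no effect as written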
max_grad_norm = 0.3
learning_rate = 2e-4
weight_decay = 0.001
optim = "paged_adamw_32bit"
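lr_scheduler_type = "constant"  # fixed learning rate for the whole run
max_steps = -1                  # -1 lets num_train_epochs decide the run length
warmup_ratio = 0.03             # only used by schedulers with warmup, e.g. "constant_with_warmup"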
group_by_length = True
save_steps = 25
logging_steps = 25

In [None]:
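# Load dataset (dataset_name is defined in the configuration cell above)
dataset = load_dataset(dataset_name, split="train")
# Use the first 256 characters of each instruction as the training text
dataset = dataset.map(lambda x: {"text": x["instruction"][:256]})
# Keep only the first 10 examples so the demo trains in a few seconds
dataset = dataset.select(range(10))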
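The cell above trains on the instruction text alone. A fuller supervised setup would usually join each prompt with its reference answer. The sketch below shows one way to do that using the prompt template that appears later in this notebook. The `output` column name, the `format_example` helper, and the sample record are assumptions for illustration; it is also worth checking whether the dataset's `instruction` column already contains the template before applying it.

```python
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"

def format_example(example: dict) -> dict:
    """Join a prompt and its reference answer into one training string."""
    prompt = PROMPT_TEMPLATE.format(instruction=example["instruction"])
    return {"text": prompt + example["output"]}

# Hypothetical record, used only to show the resulting layout.
sample = {"instruction": "Name a primary color.", "output": "Red."}
print(format_example(sample)["text"])
```

With a real dataset, a function of this shape could be passed to `dataset.map(format_example)`.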


In [None]:
# Setup quantization config
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)
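To give some intuition for what 4-bit quantization does to a block of weights, here is a deliberately simplified sketch. It uses uniform absmax quantization to signed 4-bit integers, which is not the NF4 scheme selected by `bnb_4bit_quant_type="nf4"`: NF4 places its 16 levels non-uniformly to suit normally distributed weights. The function names and the sample block are made up for illustration.

```python
def quantize_block(values, n_bits=4):
    """Absmax-quantize one block of floats to signed integers of n_bits."""
    q_max = 2 ** (n_bits - 1) - 1                        # 7 for 4-bit
    scale = max(abs(v) for v in values) / q_max or 1.0   # avoid a zero scale
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize_block(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]

block = [0.12, -0.40, 0.05, 0.33, -0.07, 0.21, -0.28, 0.02]  # made-up weights
codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
max_err = max(abs(a - b) for a, b in zip(block, restored))

print("codes:", codes)
print(f"scale: {scale:.4f}, max abs error: {max_err:.4f}")
```

Each block stores only the small integer codes plus one scale, and the rounding error per value is bounded by half the scale.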

In [None]:
# GPU check
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)
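The check above only prints a hint. The same rule can be written as a small function that returns a dtype name. This is a sketch: `pick_compute_dtype` is a hypothetical helper, and it takes the major compute capability as a plain integer so that it runs without a GPU. In the notebook that value would come from `torch.cuda.get_device_capability()`.

```python
def pick_compute_dtype(major: int) -> str:
    """Return 'bfloat16' for compute capability 8.x and newer, else 'float16'."""
    return "bfloat16" if major >= 8 else "float16"

# A T4 reports capability 7.5 and an A100 reports 8.0.
for name, major in [("T4", 7), ("A100", 8)]:
    print(f"{name} (capability {major}.x) -> {pick_compute_dtype(major)}")
```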

In [None]:
# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map={"": 0},
)
model.config.use_cache = False
model.config.pretraining_tp = 1



In [None]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"



In [None]:
# PEFT config
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)
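The configuration above controls a low-rank update that is added to the frozen base weights. The toy example below illustrates the arithmetic on tiny list-based matrices. All values are made up, the rank is 1 for readability, and `alpha` is chosen so that `alpha / r` equals the 16 / 64 = 0.25 ratio configured in this notebook. It is a sketch of the idea, not how `peft` implements it internally.

```python
def matvec(m, x):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]

def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x) for small list-based matrices."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2 x 2)
A = [[0.5, -0.5]]              # trainable factor, shape r x d_in with r = 1
B_init = [[0.0], [0.0]]        # trainable factor, zero at the start (d_out x r)
B_trained = [[2.0], [0.0]]     # made-up values standing in for a trained B
x = [1.0, 3.0]

print(lora_forward(W, A, B_init, x, alpha=0.25, r=1))     # identical to W x
print(lora_forward(W, A, B_trained, x, alpha=0.25, r=1))  # shifted by the update
```

Because one factor starts at zero, the adapted model initially behaves exactly like the base model, and training moves it away from that starting point.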

In [None]:
# Training arguments
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="none",
)

In [None]:
# Create trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    args=training_arguments,
)


No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [None]:
# Before training
prompt = "What is a large language model?"
instruction = f"### Instruction:\n{prompt}\n\n### Response:\n"

pretrain_pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=128
)

pretrain_result = pretrain_pipe(instruction)
print("Before fine-tuning:", pretrain_result[0]['generated_text'][len(instruction):])

Device set to use cuda:0


Before fine-tuning: A large language model is a neural network that has been trained on a large corpus of text data. It is capable of generating human-like text based on the training data.


In [None]:
# Train model
trainer.train()

Step,Training Loss


TrainOutput(global_step=10, training_loss=2.2709699630737306, metrics={'train_runtime': 3.1971, 'train_samples_per_second': 3.128, 'train_steps_per_second': 3.128, 'total_flos': 5578677080064.0, 'train_loss': 2.2709699630737306})
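The `global_step=10` in the output follows from the configuration: 10 examples, a batch size of 1, no gradient accumulation, and one epoch. The sketch below reproduces that arithmetic for a single-device run. `optimizer_steps` is a hypothetical helper and only approximates how the trainer counts steps in general.

```python
import math

def optimizer_steps(n_examples, batch_size, grad_accum, epochs):
    """Approximate optimizer updates for a single-device run."""
    batches_per_epoch = math.ceil(n_examples / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs

steps = optimizer_steps(n_examples=10, batch_size=1, grad_accum=1, epochs=1)
logging_steps = 25  # value from the training configuration cell

print(f"optimizer steps: {steps}")
print(f"logged at least once: {steps >= logging_steps}")
```

Since the run finishes before step 25, no intermediate loss is logged, which would explain why the `Step,Training Loss` table above has a header but no rows.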

In [None]:
# Save model
trainer.model.save_pretrained(new_model)

In [None]:
# After training
prompt = "What is a large language model?"
instruction = f"### Instruction:\n{prompt}\n\n### Response:\n"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=128)
result = pipe(instruction)
print(result[0]['generated_text'][len(instruction):])

Device set to use cuda:0


A large language model is a machine learning model designed to generate large amounts of text, such as news articles, reviews, or chatbot responses. They are often used in natural language processing (NLP) and information extraction tasks. Large language models typically use deep neural networks, which can be trained on large volumes of text data to generate accurate and relevant results.


In [None]:
from IPython.display import display, Markdown
import pandas as pd

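def clean(text, max_len=300):
    """Flatten text to one line and truncate it, marking truncation with '...'."""
    flat = text.replace("|", "¦").replace("\n", " ").strip()
    return flat[:max_len] + "..." if len(flat) > max_len else flat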

prompts = [
    "What is a large language model?",
    "Explain what machine learning is.",
    "What's the capital of Germany? What is there to see in the capital of Germany?",
    "How do airplanes fly?",
    "Explain photosynthesis in a simple way",
    "Describe a sustainable city of the future, including transport, energy, and social systems.",
    "Generate a short dialog between a doctor and a patient concerned about climate change."
]

results = []
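for prompt in prompts:
    instruction = f"### Instruction:\n{prompt}\n\n### Response:\n"
    # Both pipelines wrap the same `model` object, which now carries the trained
    # LoRA adapters, so disable them temporarily to get a true baseline output.
    with trainer.model.disable_adapter():
        before = pretrain_pipe(instruction, max_length=2048)[0]['generated_text'][len(instruction):].strip()
    # max_length is capped at the model's 2048-token limit.
    after = pipe(instruction, max_length=2048)[0]['generated_text'][len(instruction):].strip()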
    results.append({
        "Prompt": prompt,
        "Before Fine-Tuning": clean(before),
        "After Fine-Tuning": clean(after)
    })

df = pd.DataFrame(results)


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (2048). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.
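The second warning appears because the requested length exceeds the model's 2048-token limit. One way to avoid it is to clamp the generation budget so that prompt and output together fit inside that limit. In the sketch below, `clamp_new_tokens` is a hypothetical helper and the 48-token prompt length is an assumed value; in practice it would come from tokenizing the prompt.

```python
def clamp_new_tokens(prompt_tokens: int, requested: int, context_window: int = 2048) -> int:
    """Largest generation budget that keeps prompt + output inside the window."""
    return max(0, min(requested, context_window - prompt_tokens))

# Hypothetical prompt length of 48 tokens; 2048 is the limit quoted in the warning.
print(clamp_new_tokens(prompt_tokens=48, requested=10_000))
print(clamp_new_tokens(prompt_tokens=48, requested=256))
```

The result could then be passed as `max_new_tokens` in the pipeline call instead of a fixed `max_length`.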


In [None]:
# Display the comparison as collapsible sections, one per prompt
def make_comparison_view(df_subset):
    md = "## Prompt vs Output Comparison\n\n\n"
    # Use dict records so columns with spaces can be accessed by name
    for row in df_subset.to_dict("records"):
        md += f"<details>\n<summary><strong>{row['Prompt']}</strong></summary>\n\n"
        md += f"**Before Fine-Tuning:**\n\n```\n{row['Before Fine-Tuning'].strip()}\n```\n\n"
        md += f"**After Fine-Tuning:**\n\n```\n{row['After Fine-Tuning'].strip()}\n```\n"
        md += "</details>\n\n"
    return md

display(Markdown(make_comparison_view(df)))

## Prompt vs Output Comparison


<details>
<summary><strong>What is a large language model?</strong></summary>

**Before Fine-Tuning:**

```
A large language model is a model that can generate large amounts of text or speech. It is a model that has been pre-trained on a large dataset of text or speech, usually a corpus of text or audio data. The larger the dataset, the more accurate and powerful the language model. Large language models ...
```

**After Fine-Tuning:**

```
A large language model is an artificial neural network that can process and generate large amounts of text. It is commonly used in natural language processing (NLP) and machine learning applications, such as sentiment analysis, text classification, and question answering. Large language models are t...
```
</details>

<details>
<summary><strong>Explain what machine learning is.</strong></summary>

**Before Fine-Tuning:**

```
Machine learning is a field of study that involves the development of algorithms that can learn from data without being explicitly programmed. It involves collecting large volumes of data, training a model on the data, and then making predictions based on the training data. Machine learning algorith...
```

**After Fine-Tuning:**

```
Machine learning is the branch of computer science that deals with the study of algorithms that can learn from data, allowing machines to perform tasks that require previous knowledge or experience. It is a subset of artificial intelligence, which is the study of how machines can reason, learn, and ...
```
</details>

<details>
<summary><strong>What's the capital of Germany? What is there to see in the capital of Germany?</strong></summary>

**Before Fine-Tuning:**

```
The capital of Germany is Berlin, and there are many things to see and do there. Some popular attractions include the Berlin Wall Memorial, the Brandenburg Gate, the Holocaust Memorial, and the Berlin Wall Museum.   If I had never visited the capital of Germany, I would be missing out on a lot of in...
```

**After Fine-Tuning:**

```
Germany has a capital city, Berlin, which is the capital of the country. The capital city of Germany is Berlin. Berlin is a vibrant, energetic city with a lot to see and do. The city is famous for its art, music, and culture. It also has a lot of green spaces, including the Brandenburg Gate park, an...
```
</details>

<details>
<summary><strong>How do airplanes fly?</strong></summary>

**Before Fine-Tuning:**

```
Airplanes fly by using the force of an airplane's propellers and jet engines. The propellers turn the airplane's wings, which moves the airplane forward, while the jet engines provide the power and thrust needed to lift the airplane off the ground. The combination of these two forces allows the airp...
```

**After Fine-Tuning:**

```
Airplanes fly by using an airfoil, which is a curved surface with different shapes and thicknesses depending on the size and shape of the plane. The airfoil is designed to control the airflow around the plane, which makes it possible for planes to fly at high altitudes. The airfoil helps to direct t...
```
</details>

<details>
<summary><strong>Explain photosynthesis in a simple way</strong></summary>

**Before Fine-Tuning:**

```
Photosynthesis is a process whereby plants, algae, and some bacteria convert light energy into chemical energy. It is an essential process in the life cycle of plants, as it allows them to create food from their food sources.  Photosynthesis is a two-step process, where light energy is absorbed by c...
```

**After Fine-Tuning:**

```
Photosynthesis is a process where plants, algae, and some bacteria use light energy to produce sugar and oxygen. The process involves four steps:  1. Carbon dioxide - Carbon dioxide is taken in from the air and combined with water to form carbon dioxide gas. 2. Water - Water is used to split into ox...
```
</details>

<details>
<summary><strong>Describe a sustainable city of the future, including transport, energy, and social systems.</strong></summary>

**Before Fine-Tuning:**

```
The sustainable city of the future would be a thriving metropolis that utilizes renewable energy sources, has a strong commitment to cycling and walking, and a comprehensive system of public transit that is both efficient and sustainable. The city would also have an extensive network of bike lanes, ...
```

**After Fine-Tuning:**

```
The sustainable city of the future will be an amalgamation of various elements, including transport, energy, and social systems, to ensure that people can live, work, and travel efficiently while minimizing environmental impact. The transport system will be powered by renewable energy sources, such ...
```
</details>

<details>
<summary><strong>Generate a short dialog between a doctor and a patient concerned about climate change.</strong></summary>

**Before Fine-Tuning:**

```
Doctor: Hi, how are you doing today? Patient: I'm feeling pretty good, thanks. Just wondering if I should be concerned about climate change. Doctor: Absolutely. Climate change is one of the biggest threats to our planet's health. It's already causing some serious problems, like rising sea levels, ex...
```

**After Fine-Tuning:**

```
Doctor: Hi there, how can I help you with this matter?  Patient: I'm not sure what you're talking about. Climate change is a hot topic right now, and I'm just concerned about the effects it could have on my health.  Doctor: I understand your concern, but I'd like you to know that climate change is n...
```
</details>

