## Step-by-Step Guide to Fine-Tuning an LLM with QLoRA

In [3]:
!pip install -q transformers accelerate peft bitsandbytes datasets

In [13]:
import torch
from transformers import (
    AutoTokenizer, AutoModelForCausalLM,
    TrainingArguments, Trainer,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

## Load Model with Quantization (QLoRA)

In [14]:
# 1. Model & Tokenizer
model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16  # float16 to match fp16=True in TrainingArguments
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)
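
As a rough sanity check on what 4-bit loading can save, the weight memory at different precisions can be estimated from the parameter count. The sketch below assumes the base parameter count implied by the totals printed later in this notebook (82,318,080 minus 405,504 adapter parameters) and idealized storage costs. It ignores quantization constants, activations, and any layers that bitsandbytes leaves unquantized, so measured numbers will differ.

```python
# Back-of-the-envelope weight memory for different storage precisions.
# Assumption: base parameters = printed "all params" minus the LoRA adapters.
BASE_PARAMS = 82_318_080 - 405_504

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "nf4 (idealized)": 0.5,  # ignores per-block quantization constants
}

def weight_mebibytes(num_params: int, bytes_per_param: float) -> float:
    """Return the approximate weight size in MiB."""
    return num_params * bytes_per_param / (1024 ** 2)

estimates = {
    name: weight_mebibytes(BASE_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}
for name, mib in estimates.items():
    print(f"{name:>16}: {mib:7.1f} MiB")
```

For a model this small the savings are modest in absolute terms; the same arithmetic matters far more for multi-billion-parameter models.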


## Prepare Dataset

In [15]:
# 2. Dataset Preparation
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

def tokenize_function(examples):
    tokenized = tokenizer(
        examples["text"],
        truncation=True,
        max_length=128,  # Reduced for Colab memory
        padding="max_length"
    )
    tokenized["labels"] = tokenized["input_ids"].copy()
    return tokenized

tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])


Map:   0%|          | 0/4358 [00:00<?, ? examples/s]

Map:   0%|          | 0/36718 [00:00<?, ? examples/s]

Map:   0%|          | 0/3760 [00:00<?, ? examples/s]
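
Because the pad token is set to the EOS token and every example is padded to 128 tokens, it helps to see how padded positions are usually kept out of the loss. The sketch below mimics, in plain Python, the masking that `DataCollatorForLanguageModeling(mlm=False)` is understood to apply: labels are a copy of `input_ids` with pad positions replaced by `-100`, the index that PyTorch's cross-entropy ignores by default. The token IDs are made up for illustration, and 50256 is the GPT-2 EOS ID that appears in the generation warnings.

```python
IGNORE_INDEX = -100      # default ignore_index of torch.nn.CrossEntropyLoss
PAD_TOKEN_ID = 50256     # GPT-2 EOS, reused as the pad token in this notebook

def mask_padding(input_ids, pad_token_id=PAD_TOKEN_ID):
    """Copy input_ids into labels, hiding pad positions from the loss."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]

# Hypothetical 8-token example: 4 real tokens followed by padding.
example_ids = [464, 2003, 286, 9552, 50256, 50256, 50256, 50256]
labels = mask_padding(example_ids)
print(labels)

# An empty line (common in wikitext) becomes all padding and adds no loss.
empty_line = [PAD_TOKEN_ID] * 8
print(mask_padding(empty_line))
```

One side effect of reusing EOS as the pad token is that genuine EOS tokens are masked as well, so the model gets no training signal for when to stop.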

## Configure PEFT (Parameter-Efficient Fine-Tuning)

In [16]:
# 3. PEFT Configuration
peft_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["c_attn", "c_proj"],  # DistilGPT-2 specific
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

trainable params: 405,504 || all params: 82,318,080 || trainable%: 0.4926
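
The trainable-parameter count can be reproduced by hand, which is a useful check that the `target_modules` list matched the layers you expected. The sketch below assumes the standard GPT-2 small dimensions (hidden size 768, MLP width 3072) and 6 transformer blocks for DistilGPT-2. Note that the name `c_proj` appears to match both the attention output projection and the MLP down-projection.

```python
# Reproduce the trainable-parameter count from the layer shapes.
# Assumptions: GPT-2 small dimensions, 6 transformer blocks.
HIDDEN = 768
N_LAYERS = 6
RANK = 8

# (in_features, out_features) of each module whose name matches a target.
targeted_modules = {
    "attn.c_attn": (HIDDEN, 3 * HIDDEN),   # fused Q, K, V projection
    "attn.c_proj": (HIDDEN, HIDDEN),       # attention output projection
    "mlp.c_proj": (4 * HIDDEN, HIDDEN),    # MLP down-projection
}

def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to each wrapped layer."""
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in targeted_modules.values())
trainable = per_layer * N_LAYERS
total = 82_318_080  # "all params" from the printed output

print(f"per layer: {per_layer:,}")
print(f"trainable: {trainable:,}")
print(f"trainable%: {100 * trainable / total:.4f}")
```

Under these assumptions the result agrees with the 405,504 trainable parameters reported by `print_trainable_parameters()`.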


## Training Setup

In [17]:
# 4. Training Setup
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,  # Start with 1 epoch for testing
    per_device_train_batch_size=2,  # Reduced for stability
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=5,
    optim="paged_adamw_8bit",
    report_to="none",
    save_strategy="no",
    label_names=["labels"]  # only "labels" holds targets; input_ids and attention_mask are inputs
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    data_collator=data_collator
)
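
The batch settings above interact: gradients from 8 micro-batches of 2 examples are accumulated before each optimizer step. The sketch below works out the effective batch size, the number of optimizer steps in one epoch, and the learning rate over time. It assumes a single GPU, the 36,718-example train split, and a linear decay to zero with no warmup (believed to be the `TrainingArguments` default when no scheduler is specified).

```python
import math

TRAIN_EXAMPLES = 36_718      # train split size from the Map progress output
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8
NUM_DEVICES = 1              # assumption: a single GPU
BASE_LR = 2e-4

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / effective_batch)

def linear_lr(step: int, total_steps: int, base_lr: float = BASE_LR) -> float:
    """Linear decay from base_lr to 0 with no warmup."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
for step in (0, steps_per_epoch // 2, steps_per_epoch):
    print(f"step {step:>4}: lr = {linear_lr(step, steps_per_epoch):.2e}")
```

Rounding up gives 2,295 steps, which agrees with the `global_step=2295` reported by `trainer.train()`.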

## Start Training

In [18]:
trainer.train()

`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


| Step | Training Loss |
|-----:|--------------:|
| 5 | 4.5813 |
| 10 | 4.4691 |
| 15 | 4.2299 |
| 20 | 4.4174 |
| 25 | 4.3051 |
| 30 | 4.3801 |
| 35 | 4.2839 |
| 40 | 4.3317 |
| 45 | 4.2217 |
| 50 | 4.3236 |


TrainOutput(global_step=2295, training_loss=3.877235587126289, metrics={'train_runtime': 836.1224, 'train_samples_per_second': 43.915, 'train_steps_per_second': 2.745, 'total_flos': 1210721740259328.0, 'train_loss': 3.877235587126289, 'epoch': 1.0})
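
Cross-entropy loss is easier to interpret as perplexity, the exponential of the mean per-token loss. The sketch below converts the two loss values reported above. These are training-set numbers averaged over noisy mini-batches, so they are only a rough indicator and not a held-out evaluation.

```python
import math

train_loss = 3.877235587126289   # 'train_loss' from the TrainOutput
first_logged_loss = 4.5813       # loss logged at step 5

def perplexity(cross_entropy_nats: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(cross_entropy_nats)

print(f"perplexity at step 5:       {perplexity(first_logged_loss):.1f}")
print(f"perplexity (epoch average): {perplexity(train_loss):.1f}")
```

A more reliable figure would come from running the model on the validation split, which this notebook tokenizes but does not evaluate.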

## Save & Test

In [19]:
# 6. Save and Test
# Fold the LoRA updates into the base weights and drop the adapter wrappers
model = model.merge_and_unload()
model.save_pretrained("fine_tuned_model")
tokenizer.save_pretrained("fine_tuned_model")

# Test inference
input_text = "The future of AI is"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


The future of AI is a question of the future of AI and the future of AI is a question of the future of AI is a question of the future of AI is a question of the future of AI is a question of the future of AI is
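
The `merge_and_unload()` call in the cell above folds each LoRA update into the weight it wraps, so the saved model no longer needs the adapter. Conceptually the merged weight is W' = W + (lora_alpha / r) · B·A. The toy sketch below shows that arithmetic with tiny made-up matrices in pure Python. It uses the `lora_alpha=32`, `r=8` scaling from the config but a rank-1 update for readability; it is not PEFT's implementation, and it sidesteps the extra rounding that merging into quantized weights can introduce.

```python
LORA_ALPHA, R = 32, 8
SCALE = LORA_ALPHA / R   # 4.0, the factor applied to B @ A

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def merge(weight, lora_b, lora_a, scale=SCALE):
    """Return W + scale * (B @ A) without modifying the inputs."""
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy 2x2 weight with a rank-1 update (the adapters in this notebook use rank 8).
W = [[1.0, 0.0],
     [0.0, 1.0]]
B = [[0.5],
     [0.0]]          # shape (out=2, rank=1)
A = [[0.0, 0.25]]    # shape (rank=1, in=2)

merged = merge(W, B, A)
print(merged)
```

After merging, inference costs the same as the original model because the low-rank matrices are no longer applied as a separate step.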


In [20]:
output = model.generate(
    **inputs,
    max_length=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
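
The `temperature` and `top_p` arguments control how the next token is drawn once `do_sample=True`. The sketch below is a simplified, pure-Python illustration of the idea (temperature scaling followed by nucleus filtering) on a hypothetical 5-token vocabulary. It is not the library's implementation, which operates on batched tensors and may differ in edge cases.

```python
import math
import random

def top_p_sample(logits, temperature=0.7, top_p=0.9, rng=random):
    """Sample one index using temperature scaling and nucleus filtering."""
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]   # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for idx in ranked:
        nucleus.append(idx)
        mass += probs[idx]
        if mass >= top_p:
            break

    # Sample within the nucleus; random.choices normalises the weights.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]

# Hypothetical logits for a 5-token vocabulary.
toy_logits = [2.0, 1.5, 0.2, -1.0, -3.0]
random.seed(0)
samples = [top_p_sample(toy_logits) for _ in range(1000)]
print({i: samples.count(i) for i in sorted(set(samples))})
```

With these toy logits only the two most likely tokens fall inside the nucleus, so the low-probability tail is never sampled.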


In [21]:
# Printing the tensor shows raw token IDs; use tokenizer.decode(output[0]) for text
print(output)

tensor([[  464,  2003,   286,  9552,   318,   852,  3177,   290,   262,  2478,
           286,  9552,   318,   852,   925,   416,   867,   661,   290,   262,
          2003,   286,  9552,   318,   852,  3114,   656,   287,   262,  2003,
            13,   314,   716,   257, 11444,   351,   257,  7506,   329,  9552,
           290,   314,   716,   407,   257,  9379,   475,   314,   716,   281,
          9552,   290,   314,   716,   407,   257,  9379,   475,   314,   716,
           257,  9379,   475,   314,   716,   257,  9379,   475,   314,   716,
           257,  9379,   475,   314,   716,   257,  9379,   475,   314,   716,
           257,  9379,   314,   716,   257,  9379,   475,   314,   716,   257,
          9379,   475,   314,   716,   257,  9379,   475,   314,   716,   257]],
       device='cuda:0')


In [22]:
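# Note: `outputs` is the greedy result from In [19], so this reprints the same text.
# To decode the sampled sequence from In [20], use `output[0]` instead.
print(tokenizer.decode(outputs[0]))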

The future of AI is a question of the future of AI and the future of AI is a question of the future of AI is a question of the future of AI is a question of the future of AI is a question of the future of AI is
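
The greedy output loops on the same phrase, a common failure mode for small models under deterministic decoding. The fragment below collects `generate()` arguments that are often used to counter this. The specific values are untuned assumptions, and 50256 is the EOS ID taken from the warning message; passing `pad_token_id` explicitly should also silence that warning.

```python
# Candidate decoding settings to reduce repetition (values are untuned guesses).
generation_kwargs = {
    "max_new_tokens": 50,          # counts only generated tokens, unlike max_length
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "repetition_penalty": 1.2,     # down-weights tokens already in the sequence
    "no_repeat_ngram_size": 3,     # forbids repeating any 3-gram verbatim
    "pad_token_id": 50256,         # EOS ID reported in the warning
}

# Usage in the notebook (requires the model and inputs defined earlier):
# outputs = model.generate(**inputs, **generation_kwargs)
for name, value in generation_kwargs.items():
    print(f"{name} = {value}")
```

Keeping the settings in one dictionary also avoids repeating the same argument list in every test cell.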


In [27]:
input_text = "what AI is ?"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
output = model.generate(
    **inputs,
    max_length=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
print(tokenizer.decode(output[0]))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


what AI is ? This is the first time in a long history that it has been used in a field of biology that has been shown to be more than a few years ago. I am not sure what that is, but I am sure there is still more than one in the world of artificial intelligence (AI). The idea of "intelligence" is the result of an experiment which had been proposed by the Nobel Prize winner and a Nobel Prize winner who had been invited to study the implications of artificial intelligence


In [31]:
input_text = "what quantum computing do?"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
output = model.generate(
    **inputs,
    max_length=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
print(tokenizer.decode(output[0]))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


what quantum computing do? In the 1990s researchers at the University of Michigan at Michigan were able to build a computer that could be able to analyze the quantum information stored in the quantum computer in a way that could be used to calculate the information in the quantum computer with the quantum information stored in the quantum computer and how it was made to use it for the research in the field of quantum computing. This is one of the few ways that quantum computers could be used to calculate the information stored in the quantum


In [33]:
input_text = "what is Large Language Model?"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
output = model.generate(
    **inputs,
    max_length=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
print(tokenizer.decode(output[0]))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


what is Large Language Model?

The biggest problem for a large language model is the fact that the only possible candidate is a very small language that has a very small number of unique features that have only a few unique features that are possible in the language that are not the most unique features in the language that are not the most unique features that are the most unique features that are the most unique features that are the most unique features that are the most unique features that are the most unique features that are
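
Several of the samples above drift into loops. One simple way to put a number on that, sketched here as an assumption and not a standard benchmark, is the fraction of word n-grams that repeat an earlier n-gram. The `greedy` string is an excerpt of the first output in this notebook, and `varied` is a made-up sentence for contrast.

```python
def repeated_ngram_fraction(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram in the text."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)

greedy = ("The future of AI is a question of the future of AI and the future "
          "of AI is a question of the future of AI is a question of the future of AI")
varied = "Quantum computers use qubits to represent information in superposition"

print(f"greedy sample: {repeated_ngram_fraction(greedy):.2f}")
print(f"varied sample: {repeated_ngram_fraction(varied):.2f}")
```

Scoring outputs before and after a decoding change gives a quick, if crude, signal of whether the repetition is actually reduced.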
