<a href="https://colab.research.google.com/github/humayun-mhk/Huggingface-Transformers-Guide/blob/main/PEFT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Install Needed Libraries**

In [25]:
!pip install -q transformers datasets peft accelerate bitsandbytes


In [26]:
!!pip install -U bitsandbytes
!pip install -U transformers accelerate peft




# **Load a Model in 4-bit (QLoRA)**

In [3]:
from huggingface_hub import login

login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [4]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype="float16"
)

model_name = "meta-llama/Llama-3.1-8B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token


model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/185 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/50.5k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/73.0 [00:00<?, ?B/s]

# **Add LoRA to the Model**

In [5]:
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention matrices
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()


trainable params: 3,407,872 || all params: 8,033,669,120 || trainable%: 0.0424


# **Load a Dataset**

In [14]:
from datasets import load_dataset

# Load IMDB dataset
ds = load_dataset("imdb")

# Take only 100 samples for quick testing
train_data = ds["train"].select(range(100))
test_data = ds["test"].select(range(20))



README.md: 0.00B [00:00, ?B/s]

plain_text/train-00000-of-00001.parquet:   0%|          | 0.00/21.0M [00:00<?, ?B/s]

plain_text/test-00000-of-00001.parquet:   0%|          | 0.00/20.5M [00:00<?, ?B/s]

plain_text/unsupervised-00000-of-00001.p(…):   0%|          | 0.00/42.0M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating unsupervised split:   0%|          | 0/50000 [00:00<?, ? examples/s]

# **Training the Model (Full QLoRA Training)**

In [27]:
# Format for causal LM

def format(example):
    prompt = example["text"]
    # We'll just treat the review as the instruction
    text = f"<s>Instruction: {prompt}\nAnswer:</s>"
    return {"text": text}

train_data = train_data.map(format)
test_data = test_data.map(format)


# Tokenize


def tokenize(batch):
    tokenized = tokenizer(batch["text"], padding="max_length", truncation=True, max_length=128)
    tokenized["labels"] = tokenized["input_ids"].copy()
    return tokenized

tokenized_train = train_data.map(tokenize, batched=True, remove_columns=train_data.column_names)
tokenized_test = test_data.map(tokenize, batched=True, remove_columns=test_data.column_names)



# Data collator


from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # causal LM
)

# Training

Map:   0%|          | 0/100 [00:00<?, ? examples/s]

Map:   0%|          | 0/20 [00:00<?, ? examples/s]

Map:   0%|          | 0/100 [00:00<?, ? examples/s]

Map:   0%|          | 0/20 [00:00<?, ? examples/s]

In [20]:


from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="lora-llama-test",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,  # small batch
    num_train_epochs=1,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_strategy="no"  # skip saving checkpoints to save time
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    data_collator=data_collator
)

trainer.train()


Step,Training Loss
10,2.686
20,2.5296
30,2.3822
40,2.3069
50,2.4424


TrainOutput(global_step=50, training_loss=2.469395179748535, metrics={'train_runtime': 41.8039, 'train_samples_per_second': 2.392, 'train_steps_per_second': 1.196, 'total_flos': 576639939379200.0, 'train_loss': 2.469395179748535, 'epoch': 1.0})

# **Save Only LoRA Weights**

In [21]:
model.save_pretrained("lora-llama-test")


# **Inference with the Fine-Tuned LoRA Model**

In [28]:
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load base + LoRA adapters
base_model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
ft_model = PeftModel.from_pretrained(base_model, "lora-llama-test")

# Example prompts related to movie reviews
prompts = [
    "Write a short positive review for a movie about a superhero.",
    "Write a short negative review for a boring action movie.",
    "Summarize the movie review in one sentence."
]

for input_text in prompts:
    inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
    output = ft_model.generate(**inputs, max_new_tokens=100)
    print("Prompt:", input_text)
    print("Output:", tokenizer.decode(output[0], skip_special_tokens=True))
    print("="*50)


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Prompt: Write a short positive review for a movie about a superhero.
Output: Write a short positive review for a movie about a superhero. Write about the character, the plot, the action, the music, the costumes, and the special effects. You can also write about the audience's reaction to the movie.
Write a short positive review for a movie about a superhero.
Write about the character, the plot, the action, the music, the costumes, and the special effects. You can also write about the audience's reaction to the movie.
Write a short positive review for a movie about a superhero. Write about the character, the


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Prompt: Write a short negative review for a boring action movie.
Output: Write a short negative review for a boring action movie. This is a typical movie with a lot of action, but very little substance. The movie is too long, and the plot is too thin to sustain the running time. The characters are one-dimensional, and the dialogue is predictable. The action scenes are well-choreographed, but they are not particularly exciting or memorable. The movie is not bad, but it is not worth watching.
Prompt: Summarize the movie review in one sentence.
Output: Summarize the movie review in one sentence. This is a review of the movie "The Blind Side." It is a good movie, and Sandra Bullock was great in it.
What is the main theme of the movie? The main theme of the movie is that people should be judged by their actions and not by their race, religion, or other external characteristics.
What is the movie about? The movie is about a poor black teenager who is taken in by a wealthy white family and gi