# Lightweight Fine-Tuning Project

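This project applies parameter-efficient fine-tuning to an SMS spam classifier. The choices made in this notebook are:

* PEFT technique: LoRA, applied to the `out_lin` attention output projections via the `peft` library
* Model: `distilbert-base-uncased` with a two-label sequence classification head
* Evaluation approach: accuracy on a held-out 20% test split, computed with the Hugging Face `Trainer`
* Fine-tuning dataset: `sms_spam` (SMS messages labeled "spam" or "not spam")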

## Loading and Evaluating a Foundation Model

The cells below load the pre-trained Hugging Face model, together with an appropriate tokenizer and dataset, and evaluate its performance prior to fine-tuning.

In [1]:
from datasets import load_dataset

# The sms_spam dataset only has a train split, so we use the train_test_split method to split it into train and test
dataset = load_dataset("sms_spam", split="train").train_test_split(
    test_size=0.2, shuffle=True, seed=23
)

splits = ["train", "test"]
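
To make the split sizes concrete, here is a minimal pure-Python sketch of a seeded 80/20 shuffle split. It is not how `datasets` implements `train_test_split`: the helper name `seeded_split` is hypothetical, and the rounding rule (test size rounded up) is an assumption chosen because it reproduces the 4459/1115 row counts reported later in this notebook.

```python
import math
import random


def seeded_split(n_rows, test_size=0.2, seed=23):
    """Return (train_indices, test_indices) for a shuffled split.

    Hypothetical helper for illustration only; the shuffle order will not
    match the one produced by the datasets library.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_test = math.ceil(n_rows * test_size)  # assumption: test size rounds up
    return indices[n_test:], indices[:n_test]


# 5574 = 4459 + 1115, the train and test row counts shown in this notebook
train_idx, test_idx = seeded_split(5574)
print(len(train_idx), len(test_idx))
```

Fixing the seed matters here because the evaluation before and after fine-tuning should use the same test rows.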


In [2]:
dataset["test"][0]

{'sms': 'Yup... Hey then one day on fri we can ask miwa and jiayin take leave go karaoke \n',
 'label': 0}

In [3]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Let's use a lambda function to tokenize all the examples
tokenized_dataset = {}
for split in splits:
    tokenized_dataset[split] = dataset[split].map(
        lambda x: tokenizer(
            x["sms"], padding="max_length", truncation=True, max_length=512
        ),
        batched=True,
    )

tokenized_dataset["train"][0]

{'sms': 'Had your mobile 10 mths? Update to the latest Camera/Video phones for FREE. KEEP UR SAME NUMBER, Get extra free mins/texts. Text YES for a call\n',
 'label': 1,
 'input_ids': [101, 2018, 2115, 4684, 2184, 11047, 7898, 1029, 10651, 2000, 1996, 6745, 4950, 1013, 2678, 11640, 2005, 2489, 1012, 2562, 24471, 2168, 2193, 1010, 2131, 4469, 2489, 8117, 2015, 1013, 6981, 1012, 3793, 2748, 2005, 1037, 2655, 102, 0, 0, 0, ...
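
The long run of zeros above is padding: `padding="max_length"` extends every message to 512 token IDs. The sketch below mimics that step in plain Python to show how the padded IDs relate to the attention mask. `pad_to_length` is a hypothetical helper, not a tokenizer API, and the pad ID of 0 is simply the value visible in the output above.

```python
def pad_to_length(token_ids, max_length=512, pad_id=0):
    """Pad (or truncate) a list of token IDs and build its attention mask.

    Simplified: a real tokenizer keeps the final [SEP] token when truncating.
    """
    kept = token_ids[:max_length]
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


# The non-padding token IDs from the example above
sms_ids = [
    101, 2018, 2115, 4684, 2184, 11047, 7898, 1029, 10651, 2000,
    1996, 6745, 4950, 1013, 2678, 11640, 2005, 2489, 1012, 2562,
    24471, 2168, 2193, 1010, 2131, 4469, 2489, 8117, 2015, 1013,
    6981, 1012, 3793, 2748, 2005, 1037, 2655, 102,
]
input_ids, attention_mask = pad_to_length(sms_ids)
print(len(input_ids), sum(attention_mask))
```

Only a small fraction of the 512 positions carry real tokens for this message. Since the `Trainer` cells in this notebook already pass `DataCollatorWithPadding`, tokenizing without `padding="max_length"` and letting the collator pad each batch would likely be faster, though that was not tested here.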
 

In [4]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "not spam", 1: "spam"},
    label2id={"not spam": 0, "spam": 1},
)

model

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

In [5]:
tokenized_dataset

{'train': Dataset({
     features: ['sms', 'label', 'input_ids', 'attention_mask'],
     num_rows: 4459
 }),
 'test': Dataset({
     features: ['sms', 'label', 'input_ids', 'attention_mask'],
     num_rows: 1115
 })}

In [6]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


# The Hugging Face Trainer class handles the training and evaluation loops for us.
# Here it is only used to evaluate the model before any fine-tuning.
# Read more about it here: https://huggingface.co/docs/transformers/main_classes/trainer
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./data/spam_not_spam",
        # Set the learning rate
        learning_rate=2e-5,
        # Set the per device train batch size and eval batch size
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        # Evaluate and save the model after each epoch
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=10,
        weight_decay=0.01,
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.evaluate()






  0%|          | 0/140 [00:00<?, ?it/s]

{'eval_loss': 0.7289905548095703,
 'eval_accuracy': 0.1336322869955157,
 'eval_runtime': 40.5571,
 'eval_samples_per_second': 27.492,
 'eval_steps_per_second': 3.452}
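
An `eval_loss` near 0.73 is about what an untrained two-class head should give. As a rough check, the sketch below computes the cross-entropy loss for a single example in plain Python: when both logits are equal, the loss is ln 2 (about 0.693) regardless of the true label. The function name `cross_entropy` is illustrative and is not the code path the `Trainer` uses.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one example: -log(softmax(logits)[label])."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


# Equal logits mean a 50/50 prediction, whatever the true label is
print(cross_entropy([0.0, 0.0], 0))
print(cross_entropy([0.0, 0.0], 1))
print(math.log(2))
```

The accuracy of 0.134 is a separate matter: it depends on which class the randomly initialized head happens to favor and on the class balance of the test split, so it need not sit near 0.5.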

## Performing Parameter-Efficient Fine-Tuning

The cells below create a PEFT model from the loaded model, run a training loop, and save the PEFT model weights.

In [7]:
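from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=32,  # rank of the low-rank update matrices
    lora_alpha=32,  # scaling factor; the update is scaled by lora_alpha / r
    target_modules=["out_lin"],  # attention output projection in each transformer block
    lora_dropout=0.1,
    bias="lora_only",  # also train the bias terms of the adapted layers
)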


In [8]:
lora_model = get_peft_model(model, config)
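
Conceptually, LoRA leaves each targeted weight matrix W frozen and learns a low-rank update, so the adapted layer computes W x + (lora_alpha / r) * B (A x). The following is a minimal plain-Python sketch of that forward pass with toy dimensions, not PEFT's implementation. The helper names and the 4-dimensional layer are made up for illustration, and it assumes the common initialization where B starts at zero.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, lora_alpha, r):
    """Frozen output W x plus the scaled low-rank update B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # A: r x d_in, B: d_out x r
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]


rng = random.Random(0)
d, r = 4, 2  # toy sizes; the notebook uses 768 features and r = 32
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]  # frozen weights
A = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(r)]  # trainable
B = [[0.0] * r for _ in range(d)]  # trainable, zero at the start
x = [1.0, 2.0, 3.0, 4.0]

# lora_alpha equals r here, as in the notebook's config, so the scale is 1.
# With B = 0 the update vanishes and the output equals the frozen layer's.
print(lora_forward(W, A, B, x, lora_alpha=2, r=r) == matvec(W, x))
```

Training then moves A and B away from this starting point while W stays fixed, which is why so few parameters need gradients.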

In [9]:
lora_model.print_trainable_parameters()

trainable params: 299,520 || all params: 67,249,922 || trainable%: 0.4454
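
The trainable-parameter count can be reproduced by hand. The sketch below assumes the LoRA matrices have shapes r x 768 and 768 x r for each of the six `out_lin` layers (768 input and output features, as in the model printout above) and that `bias="lora_only"` adds each adapted layer's 768 bias terms to the trainable set. Under those assumptions the arithmetic matches the reported figure.

```python
n_layers = 6  # transformer blocks, each with one out_lin layer
d_in = d_out = 768  # out_lin in/out features from the model printout
r = 32  # LoRA rank from the config

lora_per_layer = r * d_in + d_out * r  # A is r x d_in, B is d_out x r
bias_per_layer = d_out  # out_lin bias, trainable under bias="lora_only"

trainable = n_layers * (lora_per_layer + bias_per_layer)
all_params = 67_249_922  # total reported by print_trainable_parameters()

print(f"trainable params: {trainable:,}")
print(f"trainable%: {100 * trainable / all_params:.4f}")
```

This count leaves no room for the weights of the classification head (`pre_classifier` and `classifier`), which suggests the head stays frozen at its random initialization under this configuration.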


In [10]:
print(tokenized_dataset["train"][0])

{'sms': 'Had your mobile 10 mths? Update to the latest Camera/Video phones for FREE. KEEP UR SAME NUMBER, Get extra free mins/texts. Text YES for a call\n', 'label': 1, 'input_ids': [101, 2018, 2115, 4684, 2184, 11047, 7898, 1029, 10651, 2000, 1996, 6745, 4950, 1013, 2678, 11640, 2005, 2489, 1012, 2562, 24471, 2168, 2193, 1010, 2131, 4469, 2489, 8117, 2015, 1013, 6981, 1012, 3793, 2748, 2005, 1037, 2655, 102, 0, 0, 0, ...

In [19]:
# The PEFT-wrapped model expects the target column to be named "labels",
# not "label", so rename it in the training and test sets
lora_train = tokenized_dataset["train"].rename_column("label", "labels")
lora_test = tokenized_dataset["test"].rename_column("label", "labels")



In [12]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments



# Create the Trainer instance
lora_trainer = Trainer(
    model=lora_model,
    args=TrainingArguments(
        output_dir="./data/spam_not_spam",
        learning_rate=2e-5,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=2,
        weight_decay=0.01,
    ),
    train_dataset=lora_train,
    eval_dataset=lora_test,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)



# Proceed with training
lora_trainer.train()

  0%|          | 0/2230 [00:00<?, ?it/s]

{'loss': 0.4831, 'grad_norm': 0.3182447850704193, 'learning_rate': 1.5515695067264575e-05, 'epoch': 0.45}
{'loss': 0.2134, 'grad_norm': 0.33531975746154785, 'learning_rate': 1.1031390134529149e-05, 'epoch': 0.9}


  0%|          | 0/279 [00:00<?, ?it/s]

{'eval_runtime': 47.2032, 'eval_samples_per_second': 23.621, 'eval_steps_per_second': 5.911, 'epoch': 1.0}




{'loss': 0.1661, 'grad_norm': 0.08042547851800919, 'learning_rate': 6.547085201793723e-06, 'epoch': 1.35}
{'loss': 0.1485, 'grad_norm': 0.10257997363805771, 'learning_rate': 2.062780269058296e-06, 'epoch': 1.79}


  0%|          | 0/279 [00:00<?, ?it/s]

{'eval_runtime': 43.396, 'eval_samples_per_second': 25.694, 'eval_steps_per_second': 6.429, 'epoch': 2.0}




{'train_runtime': 940.0457, 'train_samples_per_second': 9.487, 'train_steps_per_second': 2.372, 'train_loss': 0.24238441193584903, 'epoch': 2.0}


TrainOutput(global_step=2230, training_loss=0.24238441193584903, metrics={'train_runtime': 940.0457, 'train_samples_per_second': 9.487, 'train_steps_per_second': 2.372, 'total_flos': 1189423698677760.0, 'train_loss': 0.24238441193584903, 'epoch': 2.0})
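
The step counts and logged learning rates follow from the batch size and a linearly decaying schedule. The sketch below assumes no warmup, a linear decay to zero over all training steps, and logging every 500 steps. Those are assumptions about the defaults in use, but they reproduce the values logged above.

```python
import math

train_rows, test_rows = 4459, 1115
batch_size, epochs, base_lr = 4, 2, 2e-5

steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs
eval_steps = math.ceil(test_rows / batch_size)
print(steps_per_epoch, total_steps, eval_steps)


def linear_lr(step):
    """Learning rate after `step` optimizer steps, decaying linearly to zero."""
    return base_lr * (1 - step / total_steps)


# Step, fractional epoch, and learning rate at each assumed logging point
for step in (500, 1000, 1500, 2000):
    print(step, round(step / steps_per_epoch, 2), linear_lr(step))
```

By the last logged step the learning rate has dropped to roughly a tenth of its initial value, so the second epoch contributes smaller updates than the first.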

In [13]:
lora_model.save_pretrained("lora_model")

  0%|          | 0/279 [00:00<?, ?it/s]



## Performing Inference with a PEFT Model

The cells below load the saved PEFT model weights and evaluate the trained PEFT model. Test accuracy rises from 0.134 before fine-tuning to 0.938 after.

In [14]:
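from peft import AutoPeftModelForSequenceClassification

# Load the saved LoRA adapter on top of the base sequence classification model
lora_model_trained = AutoPeftModelForSequenceClassification.from_pretrained("lora_model")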

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [15]:

# Create a Trainer around the reloaded model; it is only used for evaluation here
lora_trained = Trainer(
    model=lora_model_trained,
    args=TrainingArguments(
        output_dir="./data/spam_not_spam",
        learning_rate=2e-5,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=2,
        weight_decay=0.01,
    ),
    train_dataset=lora_train,
    eval_dataset=lora_test,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

lora_trained.evaluate()



  0%|          | 0/279 [00:00<?, ?it/s]

{'eval_loss': 0.6219468116760254,
 'eval_accuracy': 0.9381165919282511,
 'eval_runtime': 45.8677,
 'eval_samples_per_second': 24.309,
 'eval_steps_per_second': 6.083}

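For comparison, the evaluation result from a run with `lin2` as the LoRA target module instead of `out_lin`: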

{'eval_loss': 0.6714442372322083,
 'eval_accuracy': 0.7524663677130045,
 'eval_runtime': 46.4629,
 'eval_samples_per_second': 23.998,
 'eval_steps_per_second': 6.005}
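
To compare the runs in concrete terms, each accuracy can be converted back into a number of correctly classified test messages. This is simple arithmetic on the figures reported in this notebook, using the 1115-row test split. The dictionary keys are just labels chosen here.

```python
test_rows = 1115
accuracies = {
    "before fine-tuning": 0.1336322869955157,
    "LoRA on out_lin": 0.9381165919282511,
    "Lin2 run": 0.7524663677130045,
}

for name, accuracy in accuracies.items():
    correct = round(accuracy * test_rows)
    print(f"{name}: {correct}/{test_rows} correct ({accuracy:.1%})")
```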

In [16]:
import pandas as pd

items_for_manual_review = tokenized_dataset["test"].select(
    [0, 1, 22, 31, 43, 292, 448, 487]
)

results = lora_trained.predict(items_for_manual_review)
df = pd.DataFrame(
    {
        "sms": [item["sms"] for item in items_for_manual_review],
        "predictions": results.predictions.argmax(axis=1),
        "labels": results.label_ids,
    }
)
# Show the full text of each cell instead of truncating long messages
pd.set_option("display.max_colwidth", None)

  0%|          | 0/2 [00:00<?, ?it/s]

In [17]:
df

| | sms | predictions | labels |
|---|---|---|---|
| 0 | Yup... Hey then one day on fri we can ask miwa and jiayin take leave go karaoke \n | 0 | 0 |
| 1 | Happy new years melody!\n | 1 | 0 |
| 2 | PRIVATE! Your 2003 Account Statement for shows 800 un-redeemed S. I. M. points. Call 08715203652 Identifier Code: 42810 Expires 29/10/0\n | 0 | 1 |
| 3 | URGENT! We are trying to contact U. Todays draw shows that you have won a £800 prize GUARANTEED. Call 09050003091 from land line. Claim C52. Valid 12hrs only\n | 1 | 1 |
| 4 | I had askd u a question some hours before. Its answer\n | 0 | 0 |
| 5 | SMS. ac JSco: Energy is high, but u may not know where 2channel it. 2day ur leadership skills r strong. Psychic? Reply ANS w/question. End? Reply END JSCO\n | 0 | 1 |
| 6 | Yun ah.the ubi one say if ü wan call by tomorrow.call 67441233 look for irene.ere only got bus8,22,65,61,66,382. Ubi cres,ubi tech park.6ph for 1st 5wkg days.èn\n | 0 | 0 |
| 7 | Burger King - Wanna play footy at a top stadium? Get 2 Burger King before 1st Sept and go Large or Super with Coca-Cola and walk out a winner\n | 1 | 1 |
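
The manual-review sample can be summarized as confusion counts. The sketch below copies the `predictions` and `labels` columns from the output above and tallies them in plain Python, with 1 meaning spam and 0 meaning not spam as in the model's `id2label` mapping.

```python
from collections import Counter

predictions = [0, 1, 0, 1, 0, 0, 0, 1]
labels = [0, 0, 1, 1, 0, 1, 0, 1]

# Count (prediction, label) pairs; 1 = spam, 0 = not spam
pairs = Counter(zip(predictions, labels))
true_pos, false_pos = pairs[(1, 1)], pairs[(1, 0)]
false_neg, true_neg = pairs[(0, 1)], pairs[(0, 0)]

accuracy = (true_pos + true_neg) / len(labels)
print(true_pos, false_pos, false_neg, true_neg, accuracy)
```

On this hand-picked sample the model gets 5 of 8 right, with two spam messages slipping through as "not spam". The sample is far too small to say anything about overall performance.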
