In [None]:
# !pip install datasets==3.6.0
# !pip install peft
# !pip install torch
# !pip install "accelerate>=0.26.0"  # Trainer with PyTorch requires this
# !pip uninstall tensorflow tensorflow-gpu -y


In [22]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix

from datasets import Dataset
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
from transformers import TrainingArguments, Trainer

from peft import get_peft_model, LoraConfig, TaskType

In [23]:
reviews = pd.read_csv("../data/imdb.csv").sample(500)
reviews.head()

Unnamed: 0,review,sentiment
9211,I first saw a poster advertising this film on ...,negative
4851,I watch a lot of movies. A LOT of movies. Gett...,positive
9702,Director Warren Beatty's intention to turn Che...,negative
7301,People may say I am harsh but I can't help it....,negative
6335,"This ""Debuted"" today on the SciFi channel and ...",negative


In [24]:
reviews = reviews.rename(columns={"review": "text", "sentiment": "label"})
reviews['label'] = LabelEncoder().fit_transform(reviews['label'])
reviews.head()

Unnamed: 0,text,label
9211,I first saw a poster advertising this film on ...,0
4851,I watch a lot of movies. A LOT of movies. Gett...,1
9702,Director Warren Beatty's intention to turn Che...,0
7301,People may say I am harsh but I can't help it....,0
6335,"This ""Debuted"" today on the SciFi channel and ...",0
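To see which integer each sentiment received, the fitted encoder can be inspected. The snippet below is a minimal sketch on a toy column (a stand-in, not the real IMDB data); it relies on `LabelEncoder` storing its classes in sorted order, which is why "negative" maps to 0 and "positive" to 1 in the table above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the sentiment column (not the real IMDB data).
sentiment = pd.Series(["negative", "positive", "negative", "positive"])

encoder = LabelEncoder()
encoded = encoder.fit_transform(sentiment)

# classes_ is sorted, so each class's position is its integer code.
label_map = {name: int(code) for code, name in enumerate(encoder.classes_)}
print(label_map)
print(encoded.tolist())
```

Keeping a reference to the encoder (rather than calling `LabelEncoder()` inline) makes it possible to map predictions back to the original strings later with `inverse_transform`.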


To help with this task, we're going to use the [Datasets library](https://huggingface.co/docs/datasets/en/index), which allows for working with various types of datasets, from text, to audio, to images.

First, convert the reviews DataFrame into a dataset object by using the [from_pandas function](https://huggingface.co/docs/datasets/en/index).

In [25]:
ds = Dataset.from_pandas(reviews)

Once converted, we can perform a train/test split using a method of the Dataset object. 

Perform an 80/20 train/test split using the [train_test_split method](https://huggingface.co/docs/datasets/en/index). Save the result back to the same variable, `ds`. Note that the result is a DatasetDict with "train" and "test" keys rather than a single Dataset.

In [26]:
ds = ds.train_test_split(test_size=0.2, seed=42)

Now, extract the training and test portion into separate Datasets. Name these new datasets train_dataset and test_dataset, respectively.

In [27]:
train_dataset = ds['train']
test_dataset = ds['test']


We're going to be working with a DistilBERT model, which means that we'll need to tokenize our input in the way that DistilBERT expects. For this, we can use the [DistilBertTokenizerFast](https://huggingface.co/docs/transformers/en/model_doc/distilbert?usage=AutoModel#transformers.DistilBertTokenizerFast) tokenizer.

In [28]:
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")

Then we'll apply our tokenizer to the training and test datasets using the map method.

In [29]:
train_dataset = train_dataset.map(lambda batch: tokenizer(batch['text'], padding="max_length", truncation=True), batched=True)
test_dataset = test_dataset.map(lambda batch: tokenizer(batch['text'], padding="max_length", truncation=True), batched=True)


Finally, we'll set the format to torch so that we're working with [PyTorch](https://pytorch.org/) tensors and only extract the columns that we need.

In [30]:
train_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])
test_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])

Now, let's load in the pretrained DistilBERT model.

### Part 1: Fine-tuning All Parameters

In [31]:
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Now, we need to set up a [Trainer](https://huggingface.co/docs/transformers/en/main_classes/trainer) for our model.

First, create a [TrainingArguments](https://huggingface.co/docs/transformers/v4.52.3/en/main_classes/trainer#transformers.TrainingArguments) object. Set num_train_epochs to 5, weight_decay to 0.01, and report_to to 'none'.

In [32]:
# Your Code Here

Finally, create a Trainer object using the model, the Training Arguments that you created, and with the train_dataset equal to train_dataset.

In [33]:
# Your Code Here

Now, use the [train method](https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.Trainer.train) on your Trainer object.

In [34]:
#Your Code Here

Once the model has been fit, use the [predict method](https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.Trainer.predict) of the Trainer object to generate a set of predictions on the test_dataset.

In [35]:
#Your Code Here

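Extract the actual test labels and the predicted labels from the predictions. Note that both the true labels (`label_ids`) and the model's raw outputs (`predictions`, which are logits rather than probabilities) are attributes of the object returned by predict. Take the argmax over the last axis of the logits to get the predicted labels.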

In [36]:
#Your Code Here
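The conversion from logits to labels can be sketched with plain NumPy. The arrays below are made-up stand-ins for `predictions.predictions` and `predictions.label_ids`, assuming the usual shape of one row per example and one column per class.

```python
import numpy as np

# Made-up stand-ins for predictions.predictions (logits) and predictions.label_ids.
logits = np.array([[2.1, -1.3], [-0.4, 0.9], [0.2, 0.1], [-1.5, 1.7]])
y_true = np.array([0, 1, 1, 1])

# The predicted class is the column with the largest logit.
y_pred = logits.argmax(axis=1)

# Optional: a numerically stable softmax if probabilities are wanted.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(y_pred.tolist())
```

Because softmax preserves the ordering of the logits, taking the argmax of the logits and of the probabilities gives the same labels.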

Finally, look at the confusion matrix and classification report for these predictions.

In [37]:
#Your Code Here
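A minimal sketch of this step, using made-up label arrays in place of the real test labels and predictions. The `target_names` order assumes the encoding used earlier in the notebook, where 0 is negative and 1 is positive.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Made-up labels standing in for the real test labels and predictions.
y_true = np.array([0, 1, 1, 1, 0, 0])
y_pred = np.array([0, 1, 0, 1, 0, 1])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)

print(classification_report(y_true, y_pred, target_names=["negative", "positive"]))
```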

### Part 2: Training Only a Subset of the Parameters

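Let's first reload the pretrained DistilBERT model.

Then, we'll freeze every parameter in the DistilBERT base (`model.distilbert`) by setting the `requires_grad` attribute to False. The classification head is not part of `model.distilbert`, so it stays trainable.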

In [38]:
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")

for param in model.distilbert.parameters():
    param.requires_grad = False

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Then, we'll go back and make the last two of DistilBERT's six transformer layers (indices 4 and 5) trainable.

In [39]:
for i in [4, 5]:
    for param in model.distilbert.transformer.layer[i].parameters():
        param.requires_grad = True
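To check that the freezing did what was intended, it can help to count the trainable parameters. The sketch below applies the same two loops to a small, randomly initialised DistilBERT (the tiny sizes in `DistilBertConfig` are hypothetical and chosen only so that nothing has to be downloaded); the same counting code should work on the real `model`.

```python
from transformers import DistilBertConfig, DistilBertForSequenceClassification

# Tiny random model with hypothetical sizes; six layers like the real DistilBERT.
config = DistilBertConfig(vocab_size=100, dim=32, hidden_dim=64,
                          n_heads=2, n_layers=6, num_labels=2)
toy_model = DistilBertForSequenceClassification(config)

# Freeze the base, then unfreeze the last two transformer layers.
for param in toy_model.distilbert.parameters():
    param.requires_grad = False
for i in [4, 5]:
    for param in toy_model.distilbert.transformer.layer[i].parameters():
        param.requires_grad = True

trainable = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)
total = sum(p.numel() for p in toy_model.parameters())
print(f"trainable: {trainable} / {total}")

# Names of the parameters that will still receive gradient updates.
trainable_names = [n for n, p in toy_model.named_parameters() if p.requires_grad]
```

Listing `trainable_names` shows that only layers 4 and 5 and the classification head remain trainable.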

We'll now set up TrainingArguments and a Trainer as before to train the model.

In [40]:
training_args = TrainingArguments(
    num_train_epochs=5,
    weight_decay=0.01,
    report_to = 'none'
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset
)

trainer.train()

ImportError: Using the `Trainer` with `PyTorch` requires `accelerate>=0.26.0`: Please run `pip install transformers[torch]` or `pip install 'accelerate>=0.26.0'`

How do the results of fine-tuning only the last two layers compare to fine-tuning all layers? How does the training time compare?

In [None]:
#Your Code Here

### Part 3: Parameter-Efficient Fine-Tuning

Now, let's see how we can use the [peft library](https://huggingface.co/docs/peft/en/index) to more efficiently fine-tune our model.

First, we'll re-initialize the model.

In [None]:
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")

Create a [LoraConfig object](https://huggingface.co/docs/peft/en/index) where you set the LoRA attention dimension (`r`) to 8, `lora_alpha` to 16, `lora_dropout` to 0.1, `target_modules` to "q_lin" and "v_lin" (these are the "query" and "value" projections), and `task_type` to TaskType.SEQ_CLS.

In [None]:
#Your Code Here

Now, use the [get_peft_model function](https://huggingface.co/docs/peft/v0.15.0/en/package_reference/peft_model#peft.get_peft_model) to create the LoRA model. Pass in the DistilBERT model and the LoraConfig object that you created.

In [None]:
#Your Code Here

How many trainable parameters does the resulting model have? Hint: you can use the [print_trainable_parameters method](https://huggingface.co/docs/peft/v0.15.0/en/package_reference/peft_model#peft.PeftModel.print_trainable_parameters).

In [None]:
#Your Code Here
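As a sanity check on whatever number is reported, the LoRA parameter count can be worked out by hand. The arithmetic below assumes the standard distilbert-base-uncased shape (hidden size 768, six layers) and, for the head, assumes that PEFT keeps the classification layers trainable for TaskType.SEQ_CLS; that second assumption is worth confirming against the printed output.

```python
hidden_dim = 768        # DistilBERT base hidden size
num_layers = 6          # transformer layers in DistilBERT base
rank = 8                # LoRA attention dimension r
targets_per_layer = 2   # q_lin and v_lin

# Each adapted 768x768 projection gains A (r x 768) and B (768 x r).
params_per_adapter = rank * hidden_dim + hidden_dim * rank
lora_params = params_per_adapter * targets_per_layer * num_layers

# Assumption: the classification head also stays trainable under SEQ_CLS.
pre_classifier_params = hidden_dim * hidden_dim + hidden_dim  # weight + bias
classifier_params = hidden_dim * 2 + 2                        # two output labels
head_params = pre_classifier_params + classifier_params

print(lora_params, head_params, lora_params + head_params)
```

The LoRA matrices themselves account for 147,456 parameters; under the stated assumption, the head adds another 592,130.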

We'll again set up the same TrainingArguments and create our Trainer object to train the model. This assumes that `model` now refers to the LoRA model returned by get_peft_model.

In [None]:
training_args = TrainingArguments(
    num_train_epochs=5,
    weight_decay=0.01,
    report_to = 'none'
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset
)

trainer.train()

How does the performance of the LoRA model compare to the previous two? How does the training time compare?

In [None]:
#Your Code Here