# Lightweight Fine-Tuning Project

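This project applies LoRA to a BERT classifier for binary sentiment analysis. The choices are:

* PEFT technique: LoRA (Low-Rank Adaptation), applied to the `query` and `value` self-attention projections with rank `r=8`
* Model: `bert-base-uncased` with a two-class sequence-classification head
* Evaluation approach: accuracy on a held-out test split, computed with `Trainer.evaluate()` before and after fine-tuning, plus a spot check of pipeline predictions on the same eight reviews
* Fine-tuning dataset: the first 5,000 rows of `AiresPucrs/sentiment-analysis` (texts labeled 0 = negative, 1 = positive), split 90/10 into train and test sets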

## Loading and Evaluating a Foundation Model

Load a pre-trained Hugging Face model and evaluate its performance before fine-tuning. This step also loads an appropriate tokenizer and dataset.

In [7]:
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer

id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}
# Load the pre-trained model and tokenizer
model_name = "bert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,
    id2label=id2label,
    label2id=label2id
)



Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [8]:
from transformers import pipeline

# Load the tokenizer that matches the model checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
# BERT already defines a [PAD] token, so no pad_token override is needed

print(tokenizer.model_max_length)  # Check the maximum sequence length of the tokenizer

512


### Load the Dataset

Load a labeled sentiment dataset and split it into training and test sets.

In [9]:
# Load the sentiment-analysis dataset
from datasets import load_dataset, DatasetDict

# Other datasets that were considered (the first requires e.g. `huggingface-cli login`):
# ds = load_dataset("Josephgflowers/Finance-Instruct-500k", split="train[:5000]")
# ds = load_dataset("talkmap/banking-conversation-corpus", split="train[:5000]")
# ds = load_dataset("KidzRizal/twitter-sentiment-analysis", split="train[:5000]")

dataset_name = "AiresPucrs/sentiment-analysis"
# Read only the first 5000 entries due to resource limits
ds = load_dataset(dataset_name, split="train[:5000]")

# split into train and test sets
ds = ds.train_test_split(test_size=0.1)
# explore the dataset
print(ds)

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 4500
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 500
    })
})


### Quickly check the model

Use the first eight test-set texts shorter than 512 characters to spot-check the model's predictions.
> The same set of prompts will be used before and after training

In [10]:
check_df = ds['test'].to_pandas()
check_df = check_df.loc[check_df['text'].str.len() < 512][:8]
check_df


|    | text | label |
|----|------|-------|
| 6  | i saw this movie when it aired on the wb and f... | 1 |
| 7  | cates is insipid and unconvincing kline over a... | 0 |
| 15 | feels like an impressionistic film if there is... | 1 |
| 45 | the man who gave us splash cocoon and parentho... | 0 |
| 46 | you know you're in trouble when the opening na... | 0 |
| 49 | valley girl is the definitive 1980's movie wit... | 1 |
| 66 | to put it simply the fan was a disappointment ... | 0 |
| 72 | this film is really a big piece of trash tryin... | 0 |


Check the model before training

In [11]:

orig_classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# Quickly check the model
for prompt in check_df['text'].tolist():
    print(orig_classifier(prompt, truncation=True, max_length=512))

Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.700641393661499}]
[{'label': 'POSITIVE', 'score': 0.6883817911148071}]
[{'label': 'POSITIVE', 'score': 0.727775514125824}]
[{'label': 'POSITIVE', 'score': 0.6609954237937927}]
[{'label': 'POSITIVE', 'score': 0.6965090036392212}]
[{'label': 'POSITIVE', 'score': 0.6845595836639404}]
[{'label': 'POSITIVE', 'score': 0.6999365091323853}]
[{'label': 'POSITIVE', 'score': 0.6931561827659607}]


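**Before training**, the model predicts *POSITIVE* for every review, with scores between about 0.66 and 0.73. This is expected given the warning above: the classification head (`classifier.weight`, `classifier.bias`) is randomly initialized, so the model must be trained before its predictions mean anything.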

### Preprocess the Data

#### Tokenize the dataset

In [12]:
# Quick check that tokenization and a forward pass work

inputs = tokenizer(ds['train'][0]['text'], max_length=512, padding="max_length", truncation=True, return_tensors="pt")
# print(inputs['input_ids'].shape)  # torch.Size([1, 512])
# print(tokenizer.decode(inputs['input_ids'][0]))
outputs = model(**inputs)  # Forward pass with the tokenized inputs
print(outputs)

SequenceClassifierOutput(loss=None, logits=tensor([[-0.2287,  0.5141]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
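The sentiment-analysis pipeline turns raw logits like these into the label/score pairs printed earlier by applying a softmax over the two classes and reporting the most probable one. The pure-Python sketch below illustrates that post-processing on the logits printed above; it mirrors the idea, not the pipeline's actual code, and `logits_to_prediction` is a hypothetical helper.

```python
import math


def logits_to_prediction(logits, id2label):
    """Apply a softmax over class logits and return (label, probability) of the top class."""
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


id2label = {0: "NEGATIVE", 1: "POSITIVE"}
# Logits from the forward pass above: [NEGATIVE, POSITIVE]
label, score = logits_to_prediction([-0.2287, 0.5141], id2label)
print(label, round(score, 4))
```

With an untrained head the two logits are close, so the softmax score stays well below 1, which matches the 0.66 to 0.73 scores seen in the pre-training spot check.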


Define a tokenization function that truncates each text to at most 512 tokens. It does not pad; padding is applied per batch later by the data collator.

In [13]:
# Tokenizer function: truncate to BERT's 512-token limit and leave padding to the collator
def tokenize_func(examples):
    return tokenizer(
        examples["text"],
        max_length=512,
        truncation=True,
    )
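Because `tokenize_func` truncates but does not pad, each review keeps its natural length and padding can be done per batch at training time (dynamic padding). The sketch below illustrates that idea in plain Python; it is a simplified stand-in for what `DataCollatorWithPadding` does, not its implementation. It assumes right-padding with pad id 0 (`[PAD]` in bert-base-uncased's vocabulary); `pad_batch` is a hypothetical helper.

```python
def pad_batch(batch, pad_token_id=0):
    """Pad a list of token-id lists to the longest sequence in this batch.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens
    and 0 for padding positions.
    """
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)       # right-pad the ids
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # ignore the padding
    return input_ids, attention_mask


# Made-up token ids, except 101 and 102, which are [CLS] and [SEP] in bert-base-uncased.
batch = [[101, 2023, 3185, 102], [101, 6659, 102]]
ids, mask = pad_batch(batch)
print(ids)
print(mask)
```

Padding only to the longest sequence in each batch avoids running the model over hundreds of `[PAD]` tokens for short reviews, which matters when training on a CPU.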


In [14]:
# Tokenize each split and drop the unused raw-text column.

tokenized_datasets = {}
for split in ds.keys():
    tokenized_datasets[split] = ds[split].map(
        tokenize_func,
        batched=True,
        remove_columns=["text"],
    )


Map: 100%|██████████| 4500/4500 [00:00<00:00, 9364.08 examples/s]
Map: 100%|██████████| 500/500 [00:00<00:00, 9618.95 examples/s]


In [15]:
tokenized_datasets

{'train': Dataset({
     features: ['label', 'input_ids', 'token_type_ids', 'attention_mask'],
     num_rows: 4500
 }),
 'test': Dataset({
     features: ['label', 'input_ids', 'token_type_ids', 'attention_mask'],
     num_rows: 500
 })}

### Set Up PEFT for LoRA Training
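LoRA keeps each targeted weight matrix W0 frozen and learns a low-rank update B·A, so the adapted layer computes h = W0·x + (alpha / r)·B·A·x. Because B starts at zero, the wrapped model initially behaves exactly like the base model. The pure-Python sketch below illustrates the formula on toy 3x3 matrices; it is not PEFT's implementation, and `lora_forward` is a hypothetical helper. In peft, `lora_alpha` defaults to 8, so with `r=8` as configured below the scale factor alpha / r is 1.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W0, A, B, x, alpha):
    """LoRA-adapted linear layer: h = W0 x + (alpha / r) * B (A x).

    W0 (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are trained.
    """
    r = len(A)                       # rank = number of rows in A
    base = matvec(W0, x)             # frozen path
    update = matvec(B, matvec(A, x)) # low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy sizes: d_in = d_out = 3, rank r = 1, alpha = 1 (scale factor 1).
W0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.5, 0.5, 0.5]]             # r x d_in
B_init = [[0.0], [0.0], [0.0]]    # d_out x r, zero-initialized as in LoRA
B_trained = [[1.0], [0.0], [-1.0]]  # illustrative values after some training
x = [1.0, 2.0, 3.0]

print(lora_forward(W0, A, B_init, x, alpha=1.0))     # identical to W0 x
print(lora_forward(W0, A, B_trained, x, alpha=1.0))  # shifted by the update
```

Only A and B receive gradients, so a d x d projection gains 2·r·d trainable values instead of d·d.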

In [16]:
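from peft import LoraConfig, get_peft_model
config = LoraConfig(
    task_type="SEQ_CLS",                # sequence classification; also makes the classifier head trainable
    r=8,                                # rank of the LoRA update matrices
    #lora_alpha=32,                     # not set, so the peft default (8) is used
    #lora_dropout=0.1,                  # not set, so the peft default (0.0) is used
    target_modules=["query", "value"],  # BERT self-attention projections to adapt
)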

lora_model = get_peft_model(model, config)
lora_model.print_trainable_parameters()


trainable params: 296,450 || all params: 109,780,228 || trainable%: 0.2700
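The reported trainable-parameter count can be reproduced by hand. This sketch assumes bert-base-uncased's 12 encoder layers and hidden size 768, and that the two-class classifier head is trained alongside the adapters (PEFT does this for `task_type="SEQ_CLS"`); the "all params" total is copied from the output above.

```python
# bert-base-uncased dimensions (from its published config).
num_layers = 12
hidden = 768
num_labels = 2
r = 8
target_modules = ["query", "value"]

# Each adapted hidden x hidden projection gains A (r x hidden) and B (hidden x r).
lora_params_per_module = r * hidden + hidden * r
lora_params = num_layers * len(target_modules) * lora_params_per_module

# The classification head (weight + bias) is also trainable.
classifier_params = hidden * num_labels + num_labels

trainable = lora_params + classifier_params
total = 109_780_228  # "all params" reported by print_trainable_parameters()
print(f"trainable params: {trainable:,} || trainable%: {100 * trainable / total:.4f}")
```

Almost all of the trainable budget (294,912 values) sits in the adapters; the head adds 1,538.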


#### Set the Training Arguments and the Trainer

In [17]:
# define a compute metric function
from sklearn.metrics import accuracy_score
import numpy as np

def compute_metrics(eval_preds):
    # Convert logits to predictions
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return { "accuracy": accuracy_score(y_true=labels, y_pred=predictions) }

In [18]:
from transformers import TrainingArguments

save_path = "./data/lora-finetuned-sentiment-analysis"

training_args = TrainingArguments(
    output_dir=save_path,
    num_train_epochs=2,
    logging_steps=10,
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=1e-5,
    weight_decay=0.01,
    load_best_model_at_end=False,
    push_to_hub=False,
)

In [19]:
# Train
from transformers import Trainer
from transformers import DataCollatorWithPadding

# let the data_collator handle the batching jobs
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
    data_collator=data_collator,
    processing_class=tokenizer,
    compute_metrics=compute_metrics,
)

No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


#### Evaluate the trainer **Before** training starts

In [20]:
# Evaluate the model before fine-tuning (the LoRA adapters are still at their initial values)
results = trainer.evaluate()
print(results)



{'eval_loss': 0.7559791207313538, 'eval_model_preparation_time': 0.0029, 'eval_accuracy': 0.508, 'eval_runtime': 97.475, 'eval_samples_per_second': 5.13, 'eval_steps_per_second': 0.646}


#### Now train the model. Without a GPU, this takes a long time (about 1.8 hours on CPU in this run)

In [21]:
trainer.train()



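| Epoch | Training Loss | Validation Loss | Model Preparation Time | Accuracy |
|-------|---------------|-----------------|------------------------|----------|
| 1     | 0.6891        | 0.684114        | 0.0029                 | 0.574    |
| 2     | 0.6856        | 0.679783        | 0.0029                 | 0.606    |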




TrainOutput(global_step=1126, training_loss=0.6923937458763326, metrics={'train_runtime': 6558.9588, 'train_samples_per_second': 1.372, 'train_steps_per_second': 0.172, 'total_flos': 2109364641648960.0, 'train_loss': 0.6923937458763326, 'epoch': 2.0})
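The step count and throughput in `TrainOutput` follow from the dataset size and batch size. A quick check, assuming the `TrainingArguments` default `per_device_train_batch_size=8` (the notebook does not set it) on a single CPU device with no gradient accumulation:

```python
import math

train_examples = 4500
batch_size = 8  # assumed: TrainingArguments default, single device
epochs = 2

steps_per_epoch = math.ceil(train_examples / batch_size)  # the last batch is partial
global_steps = steps_per_epoch * epochs

train_runtime_s = 6558.9588  # from TrainOutput above
samples_per_s = round(train_examples * epochs / train_runtime_s, 3)
hours = round(train_runtime_s / 3600, 2)
print(global_steps, samples_per_s, hours)
```

At roughly 1.4 samples per second, each epoch over 4,500 reviews takes close to an hour on this CPU.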

#### Evaluate the trainer **After** training completed

In [22]:
# Evaluate the fine-tuned model
results = trainer.evaluate()
print(results)



{'eval_loss': 0.679783284664154, 'eval_model_preparation_time': 0.0029, 'eval_accuracy': 0.606, 'eval_runtime': 94.2286, 'eval_samples_per_second': 5.306, 'eval_steps_per_second': 0.669, 'epoch': 2.0}


#### Check the response for the same set of prompts after training

In [23]:
from transformers import pipeline

new_clfr = pipeline("sentiment-analysis", model=lora_model, tokenizer=tokenizer)

for prompt in check_df['text'].tolist():
    print(new_clfr(prompt, truncation=True, max_length=512))

Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.5392293334007263}]
[{'label': 'POSITIVE', 'score': 0.5339582562446594}]
[{'label': 'POSITIVE', 'score': 0.5598429441452026}]
[{'label': 'POSITIVE', 'score': 0.5076249837875366}]
[{'label': 'POSITIVE', 'score': 0.5189483165740967}]
[{'label': 'POSITIVE', 'score': 0.5159575939178467}]
[{'label': 'POSITIVE', 'score': 0.5249590873718262}]
[{'label': 'NEGATIVE', 'score': 0.5063350200653076}]
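Scoring the printed predictions against the labels in `check_df` makes the before/after comparison concrete. This small sketch hard-codes both from the outputs above; `spot_check_accuracy` is a hypothetical helper.

```python
# Ground-truth labels of the eight spot-check reviews (from check_df above).
labels = [1, 0, 1, 0, 0, 1, 0, 0]
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Predicted label names printed before and after fine-tuning.
before = ["POSITIVE"] * 8
after = ["POSITIVE"] * 7 + ["NEGATIVE"]


def spot_check_accuracy(preds, labels):
    """Fraction of predictions whose label name matches the ground truth."""
    correct = sum(p == id2label[y] for p, y in zip(preds, labels))
    return correct / len(labels)


print(spot_check_accuracy(before, labels), spot_check_accuracy(after, labels))
```

Eight reviews are far too few to measure accuracy; the held-out accuracy from `trainer.evaluate()` is the more reliable number.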


### Save the PEFT-Tuned Model to Disk


In [24]:
# Save the LoRA adapter (config + weights) and the tokenizer

# save_path = "./data/lora-finetuned-sentiment-analysis" # (already defined above)
lora_model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)


('./data/lora-finetuned-sentiment-analysis/tokenizer_config.json',
 './data/lora-finetuned-sentiment-analysis/special_tokens_map.json',
 './data/lora-finetuned-sentiment-analysis/vocab.txt',
 './data/lora-finetuned-sentiment-analysis/added_tokens.json',
 './data/lora-finetuned-sentiment-analysis/tokenizer.json')
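`lora_model.save_pretrained` writes only the adapter (`adapter_config.json` plus the adapter weights file), not a full copy of BERT; `adapter_config.json` records which base model the adapter belongs to. The helper below, a hypothetical `describe_adapter` that is not part of PEFT, reads that file with the standard library. The demo writes an illustrative config mirroring the `LoraConfig` above to a temporary directory; in the notebook you would call `describe_adapter(save_path)` instead.

```python
import json
import os
import tempfile


def describe_adapter(path):
    """Summarize a saved PEFT adapter directory by reading its adapter_config.json."""
    with open(os.path.join(path, "adapter_config.json")) as f:
        cfg = json.load(f)
    keys = ("base_model_name_or_path", "peft_type", "task_type", "r", "lora_alpha", "target_modules")
    return {k: cfg.get(k) for k in keys}


# Demo only: illustrative values mirroring the LoraConfig above, not the actual saved file.
with tempfile.TemporaryDirectory() as demo_dir:
    sample = {
        "base_model_name_or_path": "bert-base-uncased",
        "peft_type": "LORA",
        "task_type": "SEQ_CLS",
        "r": 8,
        "lora_alpha": 8,
        "target_modules": ["query", "value"],
    }
    with open(os.path.join(demo_dir, "adapter_config.json"), "w") as f:
        json.dump(sample, f)
    info = describe_adapter(demo_dir)

print(info)
```

Keeping only the adapter makes the saved artifact small, but loading it always requires access to the base checkpoint named in the config.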

## Performing Inference with a Saved PEFT Model

In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [25]:
# Load the fine-tuned model for inference.
# save_path holds only the LoRA adapter. With peft installed, transformers reads
# adapter_config.json, loads the base model it names (bert-base-uncased), and then
# applies the saved adapter, including the trained classifier head, on top.
# save_path = "./data/lora-finetuned-sentiment-analysis" # (already defined above)
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

loaded_lora_model = AutoModelForSequenceClassification.from_pretrained(
    save_path,
    num_labels=2,
    id2label=id2label,
    label2id=label2id
)

tokenizer = AutoTokenizer.from_pretrained(save_path)

new_clfr = pipeline("sentiment-analysis", model=loaded_lora_model, tokenizer=tokenizer)

for prompt in check_df['text'].tolist():
    print(new_clfr(prompt, truncation=True, max_length=512))

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.5392293334007263}]
[{'label': 'POSITIVE', 'score': 0.5339582562446594}]
[{'label': 'POSITIVE', 'score': 0.5598429441452026}]
[{'label': 'POSITIVE', 'score': 0.5076249837875366}]
[{'label': 'POSITIVE', 'score': 0.5189483165740967}]
[{'label': 'POSITIVE', 'score': 0.5159575939178467}]
[{'label': 'POSITIVE', 'score': 0.5249590873718262}]
[{'label': 'NEGATIVE', 'score': 0.5063350200653076}]


## Conclusion

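Fine-tuning with LoRA trained only 296,450 of the model's roughly 109.8 million parameters (0.27%). After two epochs, accuracy on the 500 held-out reviews rose from 0.508 to 0.606, and evaluation loss fell from 0.756 to 0.680. The gain is real but modest: on the eight spot-check reviews, the fine-tuned model still predicts *POSITIVE* for seven, with scores close to 0.5, and only the last review flips (correctly) to *NEGATIVE*. The reloaded model reproduces these predictions exactly, confirming that the adapter was saved and restored correctly. Training for more epochs, or with a learning rate higher than 1e-5, would likely improve the results further.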