# Lightweight Fine-Tuning Project

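The choices made for this project are:

* PEFT technique: LoRA (Low-Rank Adaptation), configured with `r=4`, `lora_alpha=32`, and `lora_dropout=0.1` on the `c_attn` and `c_proj` layers
* Model: `gpt2`, loaded with a six-label sequence classification head
* Evaluation approach: accuracy on a held-out test split of 1,000 examples, computed by a custom `compute_metrics` function passed to the Hugging Face `Trainer`
* Fine-tuning dataset: the first 5,000 examples of the `emotion` dataset's training split, divided 80/20 into train and test sets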

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
# Importing necessary libraries for PEFT and evaluation
import os
import random

import evaluate
import numpy as np
import pandas as pd
import torch
from datasets import load_dataset
from peft import LoraConfig, PeftConfig, PeftModel, get_peft_model
from transformers import (
    AutoModelForSequenceClassification, AutoTokenizer, BitsAndBytesConfig,
    DataCollatorWithPadding, Trainer, TrainingArguments,
)

# Random suffix so that repeated runs write to distinct output directories
postfix = random.randint(1, 100)



In [2]:
# Set the device to GPU if CUDA is available, otherwise fallback to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


In [3]:
# Loading a subset of the "emotion" dataset from the Hugging Face datasets library.
emo_ds = load_dataset("emotion", split="train[0:5000]")

# Splitting the loaded dataset into training and testing sets.
emo_split_ds = emo_ds.train_test_split(test_size=0.2)
splits = ["train", "test"]
emo_split_ds

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 4000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1000
    })
})
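
`train_test_split(test_size=0.2)` shuffles the rows and holds out 20% of them, which is why the 5,000 loaded examples become 4,000 training rows and 1,000 test rows. The sketch below imitates that behaviour on plain Python indices. `split_indices` is a hypothetical helper written for illustration, not the `datasets` implementation. Since the notebook does not fix a seed, the actual split differs from run to run (the real method accepts a `seed` argument for reproducibility).

```python
import random

def split_indices(n_rows, test_size=0.2, seed=None):
    """Shuffle row indices and hold out a fraction of them for testing."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_size))
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(5000, test_size=0.2, seed=0)
print(len(train_idx), len(test_idx))  # 4000 1000
```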

In [4]:
emo_split_ds['train'][0]

{'text': 'i am feeling suspicious lj cut text suspicions', 'label': 4}

In [5]:
# Labels' names in the training split of the dataset.
emo_split_ds['train'].features["label"].names

['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']

In [6]:
# Map the dataset's integer labels (0-5) to their names, in the order printed above.
id2label = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}
label2id = {"sadness": 0, "joy": 1, "love": 2, "anger": 3, "fear": 4, "surprise": 5}
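
The dataset stores labels as integers 0 to 5, in the order of the name list printed above, so the two mappings can also be derived from that list instead of being typed by hand. A minimal sketch, with the name list copied from the output above:

```python
# Label names in the order printed by the dataset's label feature.
label_names = ["sadness", "joy", "love", "anger", "fear", "surprise"]

id2label = dict(enumerate(label_names))
label2id = {name: idx for idx, name in id2label.items()}

# The first training example shown earlier has label 4.
print(id2label[4])  # fear
```

Deriving the mappings this way keeps them aligned with the dataset if the label order ever changes.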

In [7]:
model_name = "gpt2"

# Configuring BitsAndBytesConfig for 8-bit quantization.
# - 'load_in_8bit=True' loads the linear layers in 8-bit precision (LLM.int8()) to reduce memory usage.
# - The quantization-type and compute-dtype options exist only for 4-bit loading
#   ('bnb_4bit_quant_type', 'bnb_4bit_compute_dtype'), so they are not set here.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

# Loading a pre-trained GPT-2 model for sequence classification
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=6,
    id2label=id2label,
    label2id=label2id,
    quantization_config=bnb_config,
)
model

`low_cpu_mem_usage` was None, now default to True since model is quantized.
Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Linear8bitLt(in_features=768, out_features=2304, bias=True)
          (c_proj): Linear8bitLt(in_features=768, out_features=768, bias=True)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Linear8bitLt(in_features=768, out_features=3072, bias=True)
          (c_proj): Linear8bitLt(in_features=3072, out_features=768, bias=True)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), ep

In [8]:
# Loading the tokenizer associated with the pre-trained model "gpt2".
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no padding token, so reuse the end-of-sequence (EOS) token

# Defining a preprocessing function to tokenize the input text.
def preprocess_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,  # Truncate the input text to the maximum length specified.
        padding="max_length",  # Pad all sequences to the maximum length.
        max_length=128,  # Set the maximum length of the tokenized sequences to 128 tokens.
        return_attention_mask=True, # Include attention masks in the output, which indicates padded tokens.
        return_tensors="pt",  # Ensure tensors are returned
    )

# Tokenizing each split ("train" and "test") of the dataset using the preprocessing function.
tokenized_ds = {}
for split in splits:
    tokenized_ds[split] = emo_split_ds[split].map(preprocess_function, batched=True)

print(tokenized_ds["train"][0])
print(tokenized_ds["train"][0]["input_ids"])

Map: 100%|██████████| 4000/4000 [00:00<00:00, 16955.81 examples/s]
Map: 100%|██████████| 1000/1000 [00:00<00:00, 15776.84 examples/s]

{'text': 'i am feeling suspicious lj cut text suspicions', 'label': 4, 'input_ids': [72, 716, 4203, 13678, 300, 73, 2005, 2420, 30508, 50256, 50256, 50256, ...], 'attention_mask': [1, 1, 1, 1, ...
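
The output shows the effect of `padding="max_length"`: the nine real token ids are followed by the EOS/pad id 50256 until the sequence reaches 128 tokens, and the attention mask marks which positions are real. The sketch below reproduces that logic in plain Python. `pad_to_max_length` is a hypothetical helper for illustration, not the tokenizer's own code.

```python
def pad_to_max_length(input_ids, pad_token_id, max_length):
    """Right-pad (or truncate) a token-id list and build the matching attention mask."""
    ids = input_ids[:max_length]
    n_pad = max_length - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad
    return ids + [pad_token_id] * n_pad, attention_mask

# Token ids of the first training example, taken from the output above.
example_ids = [72, 716, 4203, 13678, 300, 73, 2005, 2420, 30508]
padded_ids, mask = pad_to_max_length(example_ids, pad_token_id=50256, max_length=128)
print(len(padded_ids), sum(mask), padded_ids.count(50256))  # 128 9 119
```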




In [9]:
def compute_metrics(eval_pred):
    # Calculate accuracy by comparing model predictions with true labels.
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}
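
To make the metric concrete, here is the same function applied to a small batch of made-up logits. The values are illustrative only and do not come from the model. Three of the four argmax predictions match the labels.

```python
import numpy as np

def compute_metrics(eval_pred):
    # Same logic as the cell above: argmax over the class axis, then mean agreement.
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

# Made-up logits for four examples and six classes (illustrative values only).
logits = np.array([
    [2.0, 0.1, 0.0, 0.0, 0.0, 0.0],  # argmax 0
    [0.0, 3.0, 0.2, 0.0, 0.0, 0.0],  # argmax 1
    [0.0, 0.0, 0.0, 1.5, 0.0, 0.0],  # argmax 3
    [0.5, 0.0, 0.0, 0.0, 0.0, 0.4],  # argmax 0, but the true label is 5
])
labels = np.array([0, 1, 3, 5])

result = compute_metrics((logits, labels))
print(result)
```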

In [10]:
model.config.pad_token_id = tokenizer.pad_token_id # Align the model's padding token ID with the token ID used by the tokenizer.

# Freeze the base model parameters to prevent them from being updated during fine-tuning.
for param in model.base_model.parameters():
    param.requires_grad = False

# Evaluation without Trainer
# Load the accuracy metric for evaluating the model's performance.
metric = evaluate.load("accuracy")
eval_dataset = tokenized_ds["test"]

all_labels = []
all_preds = []

print("Starting evaluation...")
model.eval()  # Make sure dropout is disabled during evaluation
for idx, example in enumerate(eval_dataset):
    with torch.no_grad():
        # Re-tokenize the raw text to get PyTorch tensors, then move them to the model's device.
        inputs = preprocess_function(example).to(model.device)
        outputs = model(**inputs)
        logits = outputs.logits
        preds = torch.argmax(logits, dim=-1).cpu().numpy()

        all_labels.append(example["label"])  # Append the single label
        all_preds.extend(preds)  # Extend the predictions list

    # Print progress every 500 examples
    if idx % 500 == 0:
        print(f"Processed {idx + 1}/{len(eval_dataset)} examples...")

# Calculate the accuracy of the model by comparing all predictions with the true labels.
accuracy = metric.compute(predictions=all_preds, references=all_labels)
print("Pre-fine-tuning Evaluation Results:", accuracy)

Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<?, ?B/s]


Starting evaluation...
Processed 1/1000 examples...
Processed 501/1000 examples...
Pre-fine-tuning Evaluation Results: {'accuracy': 0.033}


In [11]:
# Make CUDA kernel launches synchronous so errors surface at the call that caused them (a debugging aid that slows execution)
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Load the model without quantization and move it to the appropriate device (GPU if available)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, 
    num_labels=6,
    id2label=id2label,
    label2id=label2id,
).to(device)  # Move the model to the appropriate device (GPU or CPU)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [12]:
model.config.pad_token_id = tokenizer.pad_token_id
# Freeze the GPT-2 base so that only the newly initialized classification head is trained.
for param in model.base_model.parameters():
    param.requires_grad = False

# Define training arguments
training_args = TrainingArguments(
    output_dir=model_name + "-emo-pre-tuning-" + str(postfix),
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=5,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

# Initialize the Trainer 
pre_trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

pre_trainer.train()
pre_trainer.evaluate()

{'eval_loss': 7.799188613891602, 'eval_accuracy': 0.304, 'eval_runtime': 11.0202, 'eval_samples_per_second': 90.743, 'eval_steps_per_second': 2.904, 'epoch': 1.0}

{'eval_loss': 6.490077972412109, 'eval_accuracy': 0.304, 'eval_runtime': 8.8218, 'eval_samples_per_second': 113.355, 'eval_steps_per_second': 3.627, 'epoch': 2.0}

{'eval_loss': 5.592576026916504, 'eval_accuracy': 0.304, 'eval_runtime': 8.8196, 'eval_samples_per_second': 113.384, 'eval_steps_per_second': 3.628, 'epoch': 3.0}

{'loss': 6.9522, 'grad_norm': 155.72743225097656, 'learning_rate': 4.000000000000001e-06, 'epoch': 4.0}

{'eval_loss': 5.078474521636963, 'eval_accuracy': 0.304, 'eval_runtime': 8.7644, 'eval_samples_per_second': 114.098, 'eval_steps_per_second': 3.651, 'epoch': 4.0}

{'eval_loss': 4.912605285644531, 'eval_accuracy': 0.304, 'eval_runtime': 8.7313, 'eval_samples_per_second': 114.531, 'eval_steps_per_second': 3.665, 'epoch': 5.0}

{'train_runtime': 238.1659, 'train_samples_per_second': 83.975, 'train_steps_per_second': 2.624, 'train_loss': 6.59064140625, 'epoch': 5.0}

{'eval_loss': 4.912605285644531,
 'eval_accuracy': 0.304,
 'eval_runtime': 8.7146,
 'eval_samples_per_second': 114.75,
 'eval_steps_per_second': 3.672,
 'epoch': 5.0}
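
The logged `learning_rate` of about `4e-06` at epoch 4 is consistent with the `Trainer`'s default schedule, which decays the learning rate linearly to zero without warmup. With 4,000 training examples and a batch size of 32 on a single device, there are 125 steps per epoch and 625 steps in total, so step 500 is 80% of the way through training. A small sketch of that arithmetic follows; `linear_decay_lr` is a hypothetical helper, not a library function.

```python
import math

def linear_decay_lr(base_lr, step, total_steps):
    """Learning rate after `step` updates under linear decay to zero, without warmup."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)

steps_per_epoch = math.ceil(4000 / 32)  # 125 optimizer steps per epoch
total_steps = steps_per_epoch * 5       # 625 steps over 5 epochs
lr_at_500 = linear_decay_lr(2e-5, step=500, total_steps=total_steps)
print(steps_per_epoch, total_steps, lr_at_500)
```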

In [13]:
# Compare true and predicted labels on the test split.
df = pd.DataFrame(tokenized_ds["test"])
df = df[["text", "label"]]
predictions = pre_trainer.predict(tokenized_ds["test"])
df["predicted_label"] = np.argmax(predictions.predictions, axis=1)
df.head(10)

| | text | label | predicted_label |
|---|---|---|---|
| 0 | i got the feeling watching it that only from s... | 1 | 0 |
| 1 | i feel so regretful not going but | 0 | 0 |
| 2 | i know you cant just ged rid of your feelings ... | 1 | 0 |
| 3 | i feel awful about not working this summer im ... | 0 | 0 |
| 4 | i guess i m a sucker for the grand and endless... | 1 | 0 |
| 5 | im not feeling joyful or spiritually fit | 1 | 0 |
| 6 | i feel this strategy is worthwhile | 1 | 0 |
| 7 | i have struggled to fit all the work in for th... | 3 | 0 |
| 8 | i am not normally the kind of person who gets ... | 2 | 0 |
| 9 | i started to feel dissatisfied by the ease and... | 3 | 0 |
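
Every prediction in the rows above is class 0, and the evaluation accuracy stays at 0.304 across all five epochs. That pattern is what one would expect if the head-only model assigns nearly every test example to a single class. One way to check for this kind of collapse is to count how often each class is predicted. The sketch below uses only the ten rows shown above, so its numbers describe this sample and not the full test set.

```python
from collections import Counter

# Labels and predictions copied from the ten rows displayed above.
labels = [1, 0, 1, 0, 1, 1, 1, 3, 2, 3]
predicted = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

prediction_counts = Counter(predicted)
sample_accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
print(prediction_counts, sample_accuracy)  # Counter({0: 10}) 0.2
```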


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [14]:
# Apply LoRA (Low-Rank Adaptation) for efficient fine-tuning
lora_config = LoraConfig(
    task_type="SEQ_CLS",
    r=4,
    lora_alpha=32,
    lora_dropout=0.1,
    # Matches the Conv1D layers attn.c_attn, attn.c_proj, and mlp.c_proj in every GPT-2 block
    target_modules=["c_attn", "c_proj"],
    fan_in_fan_out=True,  # GPT-2's Conv1D stores weights as (fan_in, fan_out)
    bias="none"
)
peft_model = get_peft_model(model, lora_config).to(device)  # Move PEFT model to GPU

# Define training arguments
training_args = TrainingArguments(
    output_dir=model_name + "-lora-peft-fine-tuning-" + str(postfix),
    evaluation_strategy="epoch",
    per_device_train_batch_size=16,  # Batch size for training
    per_device_eval_batch_size=16,   # Batch size for evaluation
    num_train_epochs=10,
    learning_rate=1e-3,
    weight_decay=0.01,
    fp16=True,  # Enable mixed precision for faster training on GPU
    save_strategy="epoch",
    load_best_model_at_end=True,
)

# Initialize the Trainer for fine-tuning
ft_trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)


# train model
ft_trainer.train()
ft_trainer.evaluate()

{'eval_loss': 0.5746538639068604, 'eval_accuracy': 0.817, 'eval_runtime': 7.355, 'eval_samples_per_second': 135.962, 'eval_steps_per_second': 8.566, 'epoch': 1.0}

{'loss': 0.7337, 'grad_norm': 5.961215972900391, 'learning_rate': 0.0008032, 'epoch': 2.0}

{'eval_loss': 0.3088444769382477, 'eval_accuracy': 0.896, 'eval_runtime': 5.5722, 'eval_samples_per_second': 179.461, 'eval_steps_per_second': 11.306, 'epoch': 2.0}

{'eval_loss': 0.2879340648651123, 'eval_accuracy': 0.906, 'eval_runtime': 5.6369, 'eval_samples_per_second': 177.404, 'eval_steps_per_second': 11.176, 'epoch': 3.0}

{'loss': 0.247, 'grad_norm': 3.7211077213287354, 'learning_rate': 0.0006031999999999999, 'epoch': 4.0}

{'eval_loss': 0.29920694231987, 'eval_accuracy': 0.922, 'eval_runtime': 5.6407, 'eval_samples_per_second': 177.284, 'eval_steps_per_second': 11.169, 'epoch': 4.0}

{'eval_loss': 0.25252997875213623, 'eval_accuracy': 0.935, 'eval_runtime': 5.549, 'eval_samples_per_second': 180.213, 'eval_steps_per_second': 11.353, 'epoch': 5.0}

{'loss': 0.1545, 'grad_norm': 9.811891555786133, 'learning_rate': 0.0004032, 'epoch': 6.0}

{'eval_loss': 0.24325823783874512, 'eval_accuracy': 0.931, 'eval_runtime': 5.4531, 'eval_samples_per_second': 183.382, 'eval_steps_per_second': 11.553, 'epoch': 6.0}

{'eval_loss': 0.27847930788993835, 'eval_accuracy': 0.926, 'eval_runtime': 5.4354, 'eval_samples_per_second': 183.978, 'eval_steps_per_second': 11.591, 'epoch': 7.0}

{'loss': 0.0971, 'grad_norm': 1.6098593473434448, 'learning_rate': 0.0002032, 'epoch': 8.0}

{'eval_loss': 0.2658595144748688, 'eval_accuracy': 0.928, 'eval_runtime': 5.4764, 'eval_samples_per_second': 182.602, 'eval_steps_per_second': 11.504, 'epoch': 8.0}

{'eval_loss': 0.3127102255821228, 'eval_accuracy': 0.93, 'eval_runtime': 5.5567, 'eval_samples_per_second': 179.962, 'eval_steps_per_second': 11.338, 'epoch': 9.0}

{'loss': 0.0599, 'grad_norm': 1.1508469581604004, 'learning_rate': 3.2000000000000003e-06, 'epoch': 10.0}

{'eval_loss': 0.2961682677268982, 'eval_accuracy': 0.933, 'eval_runtime': 5.7425, 'eval_samples_per_second': 174.139, 'eval_steps_per_second': 10.971, 'epoch': 10.0}

{'train_runtime': 657.36, 'train_samples_per_second': 60.849, 'train_steps_per_second': 3.803, 'train_loss': 0.25843915786743166, 'epoch': 10.0}

{'eval_loss': 0.24325823783874512,
 'eval_accuracy': 0.931,
 'eval_runtime': 5.7356,
 'eval_samples_per_second': 174.349,
 'eval_steps_per_second': 10.984,
 'epoch': 10.0}
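
To see why this counts as parameter-efficient, it helps to estimate how many weights the LoRA configuration above adds. For a layer with `in_features` inputs and `out_features` outputs, rank-`r` LoRA adds `r * (in_features + out_features)` parameters. The sketch below applies that formula to the three layer types whose names match `c_attn` or `c_proj`, using the shapes from the architecture printed earlier. It counts only the LoRA matrices and leaves out the classification head (768 × 6 weights), which PEFT would also be expected to keep trainable for a `SEQ_CLS` task.

```python
RANK = 4
N_BLOCKS = 12

# (in_features, out_features) of the layers whose names match "c_attn" or "c_proj",
# read from the printed GPT-2 architecture.
targeted_layers = {
    "attn.c_attn": (768, 2304),
    "attn.c_proj": (768, 768),
    "mlp.c_proj": (3072, 768),
}

def lora_param_count(in_features, out_features, rank):
    """LoRA adds two matrices: A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)

per_block = sum(lora_param_count(i, o, RANK) for i, o in targeted_layers.values())
lora_total = per_block * N_BLOCKS
print(per_block, lora_total)  # 33792 405504
```

Against GPT-2's roughly 124 million parameters, this is about 0.3% of the model.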

In [15]:
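# Save the LoRA adapter weights, the trained classification head, and the adapter configuration (not the full base model).
peft_model.save_pretrained("emo-gpt2-peft-fine-tuned")
# Fold the LoRA updates into the base weights and display the resulting plain GPT-2 classifier.
merged_model = peft_model.merge_and_unload()
merged_model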

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=6, bias=False)
)
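
The printed architecture contains only plain `Conv1D` layers again because `merge_and_unload()` folds each low-rank update into the corresponding base weight. With `r=4` and `lora_alpha=32`, the update is scaled by `lora_alpha / r = 8`. The NumPy sketch below checks, on toy-sized random matrices, that keeping the adapter separate and merging it give the same output. The shapes and values are illustrative assumptions, and dropout is ignored, as it is at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)
in_features, out_features, rank, alpha = 8, 6, 4, 32  # toy sizes; rank and alpha match the notebook
scaling = alpha / rank

W = rng.normal(size=(out_features, in_features))  # frozen base weight
A = rng.normal(size=(rank, in_features))          # LoRA down-projection
B = rng.normal(size=(out_features, rank))         # LoRA up-projection
x = rng.normal(size=(in_features,))

# Adapter kept separate: base output plus the scaled low-rank correction.
y_adapter = W @ x + scaling * (B @ (A @ x))

# Adapter merged: a single dense weight with the update folded in.
W_merged = W + scaling * (B @ A)
y_merged = W_merged @ x

print(np.allclose(y_adapter, y_merged))  # True
```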

In [16]:
# Compare true and predicted labels on the test split after fine-tuning.
df = pd.DataFrame(tokenized_ds["test"])
df = df[["text", "label"]]
predictions = ft_trainer.predict(tokenized_ds["test"])
df["predicted_label"] = np.argmax(predictions.predictions, axis=1)
df.head(10)

| | text | label | predicted_label |
|---|---|---|---|
| 0 | i got the feeling watching it that only from s... | 1 | 1 |
| 1 | i feel so regretful not going but | 0 | 0 |
| 2 | i know you cant just ged rid of your feelings ... | 1 | 1 |
| 3 | i feel awful about not working this summer im ... | 0 | 0 |
| 4 | i guess i m a sucker for the grand and endless... | 1 | 1 |
| 5 | im not feeling joyful or spiritually fit | 1 | 1 |
| 6 | i feel this strategy is worthwhile | 1 | 1 |
| 7 | i have struggled to fit all the work in for th... | 3 | 3 |
| 8 | i am not normally the kind of person who gets ... | 2 | 2 |
| 9 | i started to feel dissatisfied by the ease and... | 3 | 3 |


## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [17]:
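# Load the saved PEFT model: the base "gpt2" weights plus the saved LoRA adapter and classification head.
from peft import AutoPeftModelForSequenceClassification
lora_model = AutoPeftModelForSequenceClassification.from_pretrained("emo-gpt2-peft-fine-tuned", num_labels=6)
# The freshly loaded base config has no padding token; set it so the classifier reads the last non-padding token.
lora_model.config.pad_token_id = tokenizer.pad_token_id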

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [18]:
infer_input = tokenizer(
        'one of the best restaurants i ever visited', 
        truncation=True,  
        padding="max_length", 
        max_length=128,  
        return_attention_mask=True,
        return_tensors="pt"
        )
with torch.no_grad():
    infer_output = lora_model(**infer_input)
    logits = infer_output.logits

probabilities = torch.nn.functional.softmax(logits, dim=-1)
predicted_class = torch.argmax(probabilities, dim=-1).numpy()[0]
print("Predicted class:", predicted_class)

Predicted class: 0


In [19]:
infer_input = tokenizer(
        'I hate this city', 
        truncation=True,  
        padding="max_length", 
        max_length=128,  
        return_attention_mask=True,
        return_tensors="pt"
        )
with torch.no_grad():
    infer_output = lora_model(**infer_input)
    logits = infer_output.logits

probabilities = torch.nn.functional.softmax(logits, dim=-1)
predicted_class = torch.argmax(probabilities, dim=-1).numpy()[0]
print("Predicted class:", predicted_class)

Predicted class: 3
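
The printed class index is easier to interpret once it is mapped back to a label name. The sketch below shows the softmax and argmax steps from the cells above in plain Python and then looks up the name, assuming the dataset's 0-based label order printed earlier (`sadness` = 0 through `surprise` = 5). The logits are hypothetical values chosen for illustration; real ones would come from `infer_output.logits`.

```python
import math

id2label = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one input; real values would come from `infer_output.logits`.
example_logits = [0.3, -1.2, -0.8, 2.9, 0.5, -2.0]
probs = softmax(example_logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(predicted_class, id2label[predicted_class])  # 3 anger
```

Under that mapping, the two predictions above correspond to `sadness` (0) and `anger` (3).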
