<a href="https://colab.research.google.com/github/mdeevan/LightweightFineTuning/blob/main/LightweightFineTuning_roberta.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lightweight Fine-Tuning Project

The choices made for each part of the project are:

* PEFT technique: <br>
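**LoRA** (Low-Rank Adaptation). LoRA freezes the pretrained weights and, for selected weight matrices, learns a low-rank update ΔW = BA whose inner rank r is much smaller than the matrix dimensions. Only A and B (plus any modules trained in full) are updated. This sharply reduces the number of trainable parameters and the memory needed for fine-tuning.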
<br>https://huggingface.co/docs/peft/developer_guides/lora
<br>

* Model: <br>
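**distilbert/distilroberta-base**: a distilled version of RoBERTa-base with 6 transformer layers and a hidden size of 768 (about 82M parameters), small enough to fine-tune quickly.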
<br>https://huggingface.co/distilbert/distilroberta-base
<br>

* Evaluation approach: <br>
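Accuracy, plus macro-averaged precision, recall and F1, computed with the Hugging Face **evaluate** library (`accuracy`, `precision`, `recall`, `f1`). The **seqeval** package installed below targets token-level sequence labeling (e.g. NER) and is not needed for this sentence-level classification task.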
<br>https://huggingface.co/spaces/evaluate-metric/seqeval
<br>

* Fine-tuning dataset: <br>
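**financial_phrasebank**: sentences from financial news, each labeled with one of three sentiments (negative, neutral, positive), so this is a multi-class classification task. The `sentences_66agree` configuration (sentences on which at least 66% of the annotators agreed) is used.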
<br>https://huggingface.co/datasets/financial_phrasebank
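The LoRA idea described above can be illustrated with a small NumPy sketch. It is not PEFT's implementation: the shapes assume one 768x768 attention projection, r=32 and lora_alpha=64 mirror the `LoraConfig` used later, and A and B follow the initialisation described in the LoRA paper (Gaussian A, zero B).

```python
import numpy as np

# Illustrative LoRA layer in NumPy (not the PEFT implementation).
d_in, d_out, r, lora_alpha = 768, 768, 32, 64
scaling = lora_alpha / r
rng = np.random.default_rng(0)

W0 = rng.normal(size=(d_out, d_in))         # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable, Gaussian init
B = np.zeros((d_out, r))                    # trainable, zero init: the update starts at 0


def lora_forward(x, A, B):
    """h = W0 x + (lora_alpha / r) * B (A x); W0 itself is never modified."""
    return W0 @ x + scaling * (B @ (A @ x))


x = rng.normal(size=d_in)
# With B = 0 the adapted layer reproduces the frozen layer exactly.
same_at_init = np.allclose(lora_forward(x, A, B), W0 @ x)

# Pretend B has been trained, then fold the low-rank update into a single matrix.
B_trained = rng.normal(scale=0.01, size=(d_out, r))
W_merged = W0 + scaling * (B_trained @ A)
merge_matches = np.allclose(lora_forward(x, A, B_trained), W_merged @ x)

full_params = d_out * d_in        # weights in the full matrix
lora_params = r * (d_in + d_out)  # weights in A and B together
print(same_at_init, merge_matches, full_params, lora_params)
```

For one projection this trains 49,152 weights instead of 589,824 (about 8%), and because the update can be merged into W0 after training, inference need not pay for the extra matrices.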


## Loading and Evaluating a Foundation Model

The cells below load the chosen pre-trained Hugging Face model, its tokenizer and the dataset, and evaluate the model before fine-tuning.

In [41]:
!pip install accelerate




In [None]:
# Kill every process in the runtime so Colab restarts it (used after upgrading packages)
!kill -9 -1

In [1]:
!pip install transformers --upgrade
!pip install evaluate seqeval
!pip install peft


#### IMPORTS

In [2]:
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, DataCollatorWithPadding,
                          TrainingArguments, Trainer)
from datasets     import load_dataset

from peft import LoraConfig, get_peft_model, TaskType

import torch
import evaluate
import numpy as np
import pandas as pd


#### Define Variables & Load dataset

In [3]:
# https://www.evidentlyai.com/classification-metrics/multi-class-metrics

accuracy  = evaluate.load('accuracy')
f1        = evaluate.load('f1')
precision = evaluate.load('precision')
recall    = evaluate.load('recall')


In [4]:
checkpoint = "distilbert/distilroberta-base"
data_file = "financial_phrasebank"
data_file_subset = "sentences_66agree"

In [5]:
# Trainer passes (logits, labels); accuracy is computed on the highest-scoring class
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)

    return accuracy.compute(predictions=predictions, references=labels)

In [6]:
raw_dataset = load_dataset(path=data_file,
                           name=data_file_subset,
                           split="train").train_test_split(test_size=0.2,
                                                           shuffle=True,
                                                           seed=42)

raw_train = raw_dataset.pop('train')
raw_train_valid = raw_train.train_test_split(test_size=.1, shuffle=True, seed=42)
raw_dataset['train'] = raw_train_valid.pop('train')
raw_dataset['eval'] = raw_train_valid.pop('test')
raw_dataset



DatasetDict({
    test: Dataset({
        features: ['sentence', 'label'],
        num_rows: 844
    })
    train: Dataset({
        features: ['sentence', 'label'],
        num_rows: 3035
    })
    eval: Dataset({
        features: ['sentence', 'label'],
        num_rows: 338
    })
})
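The printed sizes follow from the two `train_test_split` calls applied to 844 + 3035 + 338 = 4217 rows. A sketch of the arithmetic, assuming the test fraction is rounded up (scikit-learn's convention, which reproduces the counts above); `split_sizes` is a hypothetical helper, not part of `datasets`:

```python
import math


def split_sizes(n_rows, test_size):
    """Hypothetical helper: (train, test) row counts for a float test_size,
    with the test count rounded up and the remainder going to train."""
    n_test = math.ceil(test_size * n_rows)
    return n_rows - n_test, n_test


n_total = 4217                                     # rows in sentences_66agree
n_train_full, n_test = split_sizes(n_total, 0.2)   # first split: train / test
n_train, n_eval = split_sizes(n_train_full, 0.1)   # second split: train / eval
print(n_train, n_eval, n_test)
```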

In [7]:
# raw_dataset

In [8]:
labels = raw_dataset["train"].features['label'].names
labels

['negative', 'neutral', 'positive']

In [9]:
label2id = {l:i for i, l in enumerate(labels)}
id2label = {i:l for i, l in enumerate(labels)}

In [10]:
# Longest training sentence measured in characters, not tokens
input_max_length = max([len(s) for s in raw_dataset['train']['sentence']])
input_max_length

315

In [11]:
print(label2id)
print(id2label)
print(len(label2id))

{'negative': 0, 'neutral': 1, 'positive': 2}
{0: 'negative', 1: 'neutral', 2: 'positive'}
3


In [12]:
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint,
                                                           num_labels = len(label2id),
                                                           id2label=id2label,
                                                           label2id=label2id)

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)

print(device)


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilbert/distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


cpu


In [13]:
print("model     = ", model)
print("tokenizer = ", tokenizer)

model     =  RobertaForSequenceClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-5): 6 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
  

In [14]:
# data_collator

In [15]:
def tokenize_function(data):
    # Truncate to the model maximum (512 tokens); padding is left to the data collator,
    # which pads each batch to its longest sequence.
    return tokenizer(data['sentence'], truncation=True)


tokenized_dataset = raw_dataset.map(tokenize_function, batched=True)


In [16]:
tokenized_dataset

DatasetDict({
    test: Dataset({
        features: ['sentence', 'label', 'input_ids', 'attention_mask'],
        num_rows: 844
    })
    train: Dataset({
        features: ['sentence', 'label', 'input_ids', 'attention_mask'],
        num_rows: 3035
    })
    eval: Dataset({
        features: ['sentence', 'label', 'input_ids', 'attention_mask'],
        num_rows: 338
    })
})

In [17]:
# Pad each batch dynamically to its longest sequence rather than to a fixed max_length
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, padding=True)

In [18]:
data_collator

DataCollatorWithPadding(tokenizer=RobertaTokenizerFast(name_or_path='distilbert/distilroberta-base', vocab_size=50265, model_max_length=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'sep_token': '</s>', 'pad_token': '<pad>', 'cls_token': '<s>', 'mask_token': '<mask>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	0: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
	1: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
	2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
	3: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
	50264: AddedToken("<mask>", rstrip=False, lstrip=True, single_word=False, normalized=False, special=True),
}, padding=True, max_length=None, pad_to_multiple_of=N

In [19]:
[len(x) for x in tokenized_dataset['train'][:10]['sentence']]

[120, 75, 152, 46, 152, 90, 50, 167, 160, 49]

In [20]:
[len(x) for x in tokenized_dataset['train'][:10]['input_ids']]

[23, 20, 41, 12, 34, 19, 13, 57, 45, 16]

In [21]:

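def evaluate_samples(model=model, ds=tokenized_dataset['train'], sample_start=0, sample_count=10):
  """Predict label ids for ds[sample_start : sample_start + sample_count]."""
  samples = ds[sample_start : sample_start+sample_count]

  # Keep only the model inputs; the collator pads them to the longest sequence
  samples = {k: v for k, v in samples.items() if k not in ['sentence', 'label']}

  batch = data_collator(samples).to(device)

  with torch.no_grad():  # inference only, no gradients needed
    output = model(**batch).logits

  predictions = torch.argmax(output, dim=1).cpu().numpy()

  return predictions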


In [22]:
print(evaluate_samples(model, tokenized_dataset['train'], 20, 10))

[0 0 0 0 0 0 0 0 0 0]


In [23]:
sample_start = 20
sample_count = 10

samples = tokenized_dataset['train'][sample_start : sample_start+sample_count]

samples = {k: v for k, v in samples.items() if k not in ['sentence', 'label']}
print([len(x) for x in samples['input_ids']])



[27, 24, 32, 28, 25, 27, 10, 31, 41, 17]


In [24]:
# data_collator([samples[i] for i in range(2)])

In [25]:
samples.keys()

dict_keys(['input_ids', 'attention_mask'])

In [26]:
batch = data_collator(samples ).to(device)
# batch
# {k: v.shape for k, v in batch.items()}


In [27]:
{k: v.shape for k, v in batch.items()}

{'input_ids': torch.Size([10, 41]), 'attention_mask': torch.Size([10, 41])}
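With `padding=True` the collator pads each batch only to its longest member, which is why this batch is 41 tokens wide: 41 is the largest of the ten lengths printed above. A pure-Python sketch of that behaviour (not the transformers implementation), assuming right-side padding with `<pad>` id 1 as shown in the tokenizer printout; the token ids are placeholders and `pad_to_longest` is a hypothetical helper:

```python
PAD_ID = 1  # id of <pad> in the RoBERTa tokenizer printout


def pad_to_longest(batch_ids, pad_id=PAD_ID):
    """Hypothetical helper: right-pad every sequence to the longest one in the batch
    and build the matching attention mask (1 = real token, 0 = padding)."""
    longest = max(len(ids) for ids in batch_ids)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in batch_ids]
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in batch_ids]
    return input_ids, attention_mask


lengths = [27, 24, 32, 28, 25, 27, 10, 31, 41, 17]  # token counts printed above
toy_batch = [[100] * n for n in lengths]            # placeholder token ids
input_ids, attention_mask = pad_to_longest(toy_batch)
print(len(input_ids), len(input_ids[0]))
```

Padding per batch keeps most batches far shorter than padding everything to the longest sentence in the dataset.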

In [28]:
output = model(**batch).logits
# output

In [29]:
predictions = torch.argmax(output, dim=1).cpu().numpy()
predictions

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [30]:
references=tokenized_dataset['train']['label'][sample_start:sample_start+sample_count]
references

[1, 2, 2, 1, 2, 0, 1, 1, 1, 2]

In [31]:
batch.keys()

dict_keys(['input_ids', 'attention_mask'])

In [32]:


# Macro averaging: compute each metric per class, then take the unweighted mean over the 3 classes
accuracy_metric  = accuracy.compute(predictions=predictions, references=references)
f1_metric        = f1.compute(predictions=predictions, references=references, average="macro")
precision_metric = precision.compute(predictions=predictions, references=references, average="macro", zero_division=0)
recall_metric    = recall.compute(predictions=predictions, references=references, average="macro")


print(accuracy_metric)
print(f1_metric)
print(precision_metric)
print(recall_metric)

{'accuracy': 0.1}
{'f1': 0.06060606060606061}
{'precision': 0.03333333333333333}
{'recall': 0.3333333333333333}
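These macro scores can be checked by hand: the untrained head predicts class 0 for all ten samples, and only one reference is 0. The sketch below computes per-class precision, recall and F1 and takes their unweighted mean. It assumes precision is 0 for classes that are never predicted (matching `zero_division=0`), and `macro_scores` is a hypothetical helper meant as a check, not a replacement for `evaluate`.

```python
import numpy as np


def macro_scores(predictions, references, num_classes):
    """Hypothetical helper: per-class precision/recall/F1, averaged with equal weight per class."""
    preds = np.asarray(predictions)
    refs = np.asarray(references)
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = np.sum((preds == c) & (refs == c))
        n_pred = np.sum(preds == c)   # how often class c was predicted
        n_true = np.sum(refs == c)    # how often class c is the true label
        p = tp / n_pred if n_pred else 0.0
        r = tp / n_true if n_true else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": float(np.mean(preds == refs)),
        "precision": float(np.mean(precisions)),
        "recall": float(np.mean(recalls)),
        "f1": float(np.mean(f1s)),
    }


scores = macro_scores([0] * 10, [1, 2, 2, 1, 2, 0, 1, 1, 1, 2], num_classes=3)
print(scores)
```

Class 0 gets precision 1/10 and recall 1/1, the other two classes score 0, so the macro means are 1/30, 1/3 and (2/11)/3 = 2/33, matching the printed values.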


## Performing Parameter-Efficient Fine-Tuning

The cells below wrap the loaded model with a LoRA adapter, train it, and save the PEFT model weights.

In [33]:
# !pip install peft

In [34]:
# from peft import LoraConfig, get_peft_model, TaskType


In [35]:
peft_config = LoraConfig(task_type       = TaskType.SEQ_CLS,
                         inference_mode  = False,
                         r               = 32,    # rank of the LoRA update matrices
                         lora_alpha      = 64,    # update is scaled by lora_alpha / r = 2
                         lora_dropout    = 0.1,
                         target_modules  = ['query', 'value', 'key'],  # self-attention projections
                         modules_to_save = ['classifier']  # train the new classification head in full
                         )


In [36]:
peft_model = get_peft_model(model, peft_config)

In [37]:
peft_model.print_trainable_parameters()

trainable params: 1,477,635 || all params: 83,598,342 || trainable%: 1.7675410356822627
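The reported count can be reproduced from the configuration. A back-of-the-envelope sketch, assuming 6 encoder layers with hidden size 768 (from the model printout), bias-free LoRA matrices on `query`, `key` and `value`, and a fully trainable copy of the classifier head (`dense` 768 to 768 and `out_proj` 768 to 3, both with bias):

```python
hidden, r, n_layers, n_labels = 768, 32, 6, 3
lora_targets = ["query", "key", "value"]

lora_per_module = r * hidden + hidden * r          # A: r x 768, B: 768 x r (no bias)
lora_total = n_layers * len(lora_targets) * lora_per_module

dense = hidden * hidden + hidden                   # classifier.dense weight + bias
out_proj = hidden * n_labels + n_labels            # classifier.out_proj weight + bias
classifier_total = dense + out_proj

trainable = lora_total + classifier_total
print(lora_total, classifier_total, trainable)     # 884736 592899 1477635
print(f"{100 * trainable / 83_598_342:.4f}%")      # share of the "all params" figure above
```

Roughly 40% of the trainable parameters sit in the fully trained classifier head (592,899 of 1,477,635), so `modules_to_save` costs more here than the LoRA adapters themselves.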


In [41]:
Training_Arguments = TrainingArguments(
    per_device_train_batch_size = 8,
    per_device_eval_batch_size  = 8,
    output_dir                  = "roberta_pfet_classifier2",
    learning_rate               = 2e-5,
    num_train_epochs            = 20,
    weight_decay                = 0.005,
    save_strategy               = 'epoch',
    evaluation_strategy         = 'epoch',
    deepspeed                   = False,
    load_best_model_at_end      = True)

In [None]:
tokenized_dataset['train'].features

{'sentence': Value(dtype='string', id=None),
 'label': ClassLabel(names=['negative', 'neutral', 'positive'], id=None),
 'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None),
 'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None)}

In [None]:
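# rename_column returns a new Dataset instead of modifying tokenized_dataset in place, so this
# cell only displays the renamed copy. No rename is needed for training: DataCollatorWithPadding
# already moves 'label' to 'labels' when it builds each batch.
tokenized_dataset['train'].rename_column('label','labels')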

Dataset({
    features: ['sentence', 'labels', 'input_ids', 'attention_mask'],
    num_rows: 3035
})

In [None]:
trainer = Trainer(
                  model=peft_model,
                  args=Training_Arguments,
                  train_dataset=tokenized_dataset['train'],
                  eval_dataset =tokenized_dataset['eval'],
                  compute_metrics=compute_metrics,
                  tokenizer=tokenizer,
                  data_collator=data_collator
)



In [None]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.477404 | 0.798817 |
| 2 | 0.719000 | 0.385329 | 0.83432 |
| 3 | 0.366300 | 0.352769 | 0.849112 |
| 4 | 0.323800 | 0.356402 | 0.849112 |
| 5 | 0.323800 | 0.307785 | 0.884615 |
| 6 | 0.290300 | 0.323073 | 0.881657 |
| 7 | 0.294000 | 0.318361 | 0.893491 |
| 8 | 0.249700 | 0.368127 | 0.87574 |
| 9 | 0.249700 | 0.333022 | 0.890533 |
| 10 | 0.246500 | 0.299579 | 0.908284 |


TrainOutput(global_step=7600, training_loss=0.28527731569189774, metrics={'train_runtime': 287.0573, 'train_samples_per_second': 211.456, 'train_steps_per_second': 26.476, 'total_flos': 855810065988360.0, 'train_loss': 0.28527731569189774, 'epoch': 20.0})
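The step count and throughput in `TrainOutput` follow from the training arguments. A sketch of the arithmetic, assuming a single device and no gradient accumulation (the arguments above set neither), with the last partial batch of each epoch still counted as a step:

```python
import math

n_train_rows = 3035     # size of the train split
per_device_batch = 8    # per_device_train_batch_size
epochs = 20             # num_train_epochs

steps_per_epoch = math.ceil(n_train_rows / per_device_batch)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 380 7600

# train_samples_per_second is (rows x epochs) / train_runtime
samples_per_second = n_train_rows * epochs / 287.0573
print(round(samples_per_second, 3))
```

The step count confirms that all 20 epochs ran, even though the log table above ends at epoch 10.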

In [47]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [48]:
#  model.save_pretrained("fineTunedMultiLabelClassifier")

In [49]:
saved_checkpoint = '/content/drive/MyDrive/ftMLC-roberta-May-15-04'

In [None]:
trainer.save_model(saved_checkpoint)


## Performing Inference with a PEFT Model

The cells below load the saved PEFT model weights, evaluate the fine-tuned model, and compare the results with those obtained before fine-tuning.

In [45]:
from peft import PeftConfig, PeftModel, AutoPeftModelForSequenceClassification

In [50]:
config = PeftConfig.from_pretrained(saved_checkpoint)
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,
    label2id=label2id,
    id2label=id2label,
    ignore_mismatched_sizes=True,  # provide this in case you're planning to fine-tune an already fine-tuned checkpoint

)
base_model.to(device)

# Tokenizer of the base checkpoint
base_tokenizer=AutoTokenizer.from_pretrained(config.base_model_name_or_path)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilbert/distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [51]:
inference_model = PeftModel.from_pretrained(base_model, saved_checkpoint )
inference_model.eval()

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): RobertaForSequenceClassification(
      (roberta): RobertaModel(
        (embeddings): RobertaEmbeddings(
          (word_embeddings): Embedding(50265, 768, padding_idx=1)
          (position_embeddings): Embedding(514, 768, padding_idx=1)
          (token_type_embeddings): Embedding(1, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (encoder): RobertaEncoder(
          (layer): ModuleList(
            (0-5): 6 x RobertaLayer(
              (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                  (query): lora.Linear(
                    (base_layer): Linear(in_features=768, out_features=768, bias=True)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.1, inplace=False)
                    )
                    (lora_A): ModuleDic

In [53]:
# inference_model

In [60]:
sample_start=10
sample_count=50

In [61]:
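# Note: the 4th argument is sample_count, so sample_start+sample_count = 60 rows are predicted here,
# while only 50 references are printed below
print(evaluate_samples(inference_model, tokenized_dataset['test'], sample_start, sample_start+sample_count))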

[0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 2 0 0 1]


In [63]:

references=tokenized_dataset['test']['label'][sample_start:sample_start+sample_count]
print(references)

[2, 2, 1, 1, 0, 1, 1, 1, 1, 0, 2, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 0, 0, 2, 1, 1, 1, 1, 1]


In [54]:
inference_model = AutoPeftModelForSequenceClassification.from_pretrained(saved_checkpoint,
                                                           num_labels = len(label2id),
                                                           id2label=id2label,
                                                           label2id=label2id)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilbert/distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
trainer = Trainer(
    model=inference_model,
    args=Training_Arguments,                   # arguments defined for the training run above
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)


In [43]:
# saved_checkpoint = '/content/drive/MyDrive/ftMLC'
peft_tokenizer = AutoTokenizer.from_pretrained(saved_checkpoint)
# inputs = tokenizer2(text, return_tensors='pt')

In [71]:
# from transformers import AutoModelForSequenceClassification

peft_trained_model = AutoModelForSequenceClassification.from_pretrained(saved_checkpoint)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilbert/distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading adapter weights from /content/drive/MyDrive/ftMLC-roberta-May-15-04 led to unexpected keys not found in the model:  ['classifier.modules_to_save.default.modules_to_save.out_proj.lora_A.default.weight', 'classifier.modules_to_save.default.modules_to_save.out_proj.lora_B.default.weight', 'classifier.modules_to_save.default.original_module.out_proj.lora_A.default.weight', 'classifier.modules_to_save.default.original_module.out_proj.lora_B.default.weight', 'classifier.modules_to_save.default.out_proj.base_layer.bias', 'classifier.modules_to_save.default.out_proj.base_layer.weight', 'classifier.modules_to_save.d

In [None]:
# config =   PeftConfig.from_pretrained("stevhliu/vit-base-patch16-224-in21k-lora")

In [55]:
Training_Arguments2 = TrainingArguments(
    per_device_train_batch_size = 8,
    per_device_eval_batch_size  = 8,
    output_dir                  = "evaluate",
    learning_rate               = 2e-5,
    num_train_epochs            = 20,
    weight_decay                = 0.005,
    save_strategy               = 'epoch',
    evaluation_strategy         = 'epoch',
    deepspeed                   = False,
    load_best_model_at_end      = True)


In [57]:
#!export CUDA_LAUNCH_BLOCKING=1
trainer2 = Trainer(
                  model=inference_model,
                  args=Training_Arguments2,
                  train_dataset=tokenized_dataset['train'],
                  eval_dataset =tokenized_dataset['eval'],
                  compute_metrics=compute_metrics,
                  tokenizer=tokenizer,
                  data_collator=data_collator

                  )



In [58]:
print(trainer2.evaluate())


{'eval_loss': 1.3771162033081055, 'eval_accuracy': 0.011834319526627219, 'eval_runtime': 30.2783, 'eval_samples_per_second': 11.163, 'eval_steps_per_second': 1.42}


In [59]:
print(evaluate_samples(peft_trained_model, tokenized_dataset['test'], 10, 50))

NameError: name 'peft_trained_model' is not defined

In [65]:

sample_start=10
sample_count=50

print(evaluate_samples(inference_model, tokenized_dataset['test'], sample_start, sample_count))

references=tokenized_dataset['test']['label'][sample_start:sample_start+sample_count]
print(references)

[0 0 2 0 2 2 0 2 0 2 0 2 0 0 0 2 2 0 2 2 2 0 2 0 2 2 0 2 0 2 0 2 0 2 0 0 0
 0 1 2 1 0 2 2 0 2 2 0 2 0]
[2, 2, 1, 1, 0, 1, 1, 1, 1, 0, 2, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 0, 0, 2, 1, 1, 1, 1, 1]
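Accuracy alone hides how these predictions go wrong. A small NumPy confusion matrix over the 50 predictions and references printed above (values copied from the output, test rows 10 to 59) shows that only one prediction is correct (accuracy 0.02), every negative sentence in the slice is predicted as positive, and no neutral sentence is predicted as neutral. One possible explanation, which the notebook does not confirm, is that the trained classification head was not restored when the adapter was reloaded; the "newly initialized" warnings above point in that direction. `confusion_matrix` here is a hypothetical helper.

```python
import numpy as np

# Copied from the cell output above
preds = [0, 0, 2, 0, 2, 2, 0, 2, 0, 2, 0, 2, 0, 0, 0, 2, 2, 0, 2, 2,
         2, 0, 2, 0, 2, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 0, 0, 0, 1, 2,
         1, 0, 2, 2, 0, 2, 2, 0, 2, 0]
refs = [2, 2, 1, 1, 0, 1, 1, 1, 1, 0, 2, 1, 1, 1, 1, 0, 1, 1, 0, 1,
        0, 1, 1, 1, 0, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1,
        2, 1, 0, 0, 2, 1, 1, 1, 1, 1]
LABELS = ["negative", "neutral", "positive"]  # id2label order printed earlier


def confusion_matrix(references, predictions, num_classes):
    """Hypothetical helper: rows are true labels, columns are predicted labels."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(references, predictions):
        cm[t, p] += 1
    return cm


conf_mat = confusion_matrix(refs, preds, num_classes=len(LABELS))
slice_accuracy = np.trace(conf_mat) / conf_mat.sum()
for name, row in zip(LABELS, conf_mat):
    print(f"{name:>8}: {row.tolist()}")
print("accuracy:", slice_accuracy)
```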


In [69]:
raw_dataset['test'][10:13]

{'sentence': ['Ragutis , which is controlled by the Finnish brewery , reported a 5.4-per-cent rise in beer sales to 10.44 million litres and held an 11.09-per-cent market share .',
  'Adanac Molybdenum of Canada has ordered grinding technology for its molybdenum project in British Columbia , Canada , while Shalkiya Zinc of Kazakhstan has awarded a contract for grinding technology for the Shalkiya zinc-lead project in Kazakhstan .',
  'The connectivity unit has more than 100 e-invoice customers , and the number of annual transactions stands at nearly one million .'],
 'label': [2, 2, 1]}

In [None]:
# Classify one test sentence with the reloaded PEFT model
# (inference_model is the AutoPeftModelForSequenceClassification loaded above)
inputs = tokenizer(raw_dataset['test'][10]['sentence'], return_tensors='pt').to(device)
with torch.no_grad():
  logits = inference_model(**inputs).logits

prediction = torch.argmax(logits, dim=-1).cpu().numpy()
print(inference_model.config.id2label[prediction[0]])


In [None]:
tokenized_dataset['test'].features

{'sentence': Value(dtype='string', id=None),
 'label': ClassLabel(names=['negative', 'neutral', 'positive'], id=None),
 'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None),
 'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None)}

In [None]:

test_set = tokenized_dataset['test']
# test_set = {k: v for k, v in tokenized_dataset['test'].items() if k not in ['sentence', 'label']}
# print([len(x) for x in samples['input_ids']])



In [None]:
test_set[:]



In [None]:
# #!export CUDA_LAUNCH_BLOCKING=1
# trainer2 = Trainer(
#                   model=peft_trained_model,
#                   args=Training_Arguments2,
#                   train_dataset=tokenized_dataset['train'],
#                   eval_dataset =tokenized_dataset['test'],
#                   compute_metrics=compute_metrics,
#                   tokenizer=peft_tokenizer,
#                   data_collator=data_collator
# )



In [None]:
# print(trainer2.evaluate())




In [61]:
print(evaluate_samples(inference_model, tokenized_dataset['test'], 200, 10))

[1 2 0 0 0 0 0 0 2 0]


In [62]:
tokenized_dataset['test'] #.features

Dataset({
    features: ['sentence', 'label', 'input_ids', 'attention_mask'],
    num_rows: 844
})

In [63]:
print(evaluate_samples(model, tokenized_dataset['train'], 1, 100))

KeyboardInterrupt: 

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame()
df['label'] = tokenized_dataset['eval'][:]['label']
df['val'] = 1

df.groupby(['label']).sum(['val'])

| label | val |
|---|---|
| 0 | 39 |
| 1 | 200 |
| 2 | 99 |


In [None]:
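# Same count for the test split
df = pd.DataFrame({'label': tokenized_dataset['test']['label'], 'val': 1})
df.groupby(['label']).sum(['val'])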

| label | val |
|---|---|
| 0 | 107 |
| 1 | 506 |
| 2 | 231 |
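For context, a model that always predicts the most frequent class (neutral) would already score about 0.59 on the 338-row eval split and about 0.60 on the 844-row test split. A sketch of that majority-class baseline, with counts copied from the two tables above and label ids mapped via `id2label` (0 = negative, 1 = neutral, 2 = positive):

```python
# Class counts copied from the two tables above
class_counts = {
    "eval": {"negative": 39, "neutral": 200, "positive": 99},
    "test": {"negative": 107, "neutral": 506, "positive": 231},
}

baselines = {}
for split, counts in class_counts.items():
    total = sum(counts.values())
    majority = max(counts, key=counts.get)  # most frequent class
    baselines[split] = (majority, counts[majority] / total)
    print(f"{split}: {total} rows, always predicting '{majority}' scores {counts[majority] / total:.3f}")
```

Against this baseline, the validation accuracy of about 0.91 reached by epoch 10 is a clear gain, while the reloaded model's eval accuracy of 0.012 is far below what a constant prediction would achieve.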
