Notebook prepared by Henrique Lopes Cardoso (hlc@fe.up.pt).

# TRANSFORMERS

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In this notebook we will explore [Hugging Face Transformers](https://huggingface.co/docs/transformers/index).
You may also want to check the [Hugging Face course](https://huggingface.co/course/), which explains how to use this technology in much greater depth.

Training transformer models is computationally expensive. Hugging Face makes available several pretrained [models](https://huggingface.co/models) that can be used as is or fine-tuned for a specific NLP task, such as sentence classification. That's what we'll do in this notebook.

Hugging Face also makes available several [datasets](https://huggingface.co/datasets) that can be used to train or fine-tune a model.

See:
- https://huggingface.co/docs/transformers/tasks/sequence_classification#preprocess
- https://huggingface.co/docs/transformers/training#prepare-a-dataset
- https://huggingface.co/docs/transformers/accelerate
- https://huggingface.co/docs/transformers/model_summary#autoencoding-models

## Loading a dataset

In this notebook, we'll start by using a local dataset (instead of using a dataset stored at Hugging Face).
Let's load data for our classification task.

In [None]:
!pip install pandas
!pip install datasets
!pip install transformers
!pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
!pip install optuna



In [None]:
import pandas as pd

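# Importing the dataset
dataset = pd.read_excel("/content/drive/MyDrive/Colab Notebooks/OpArticles_ADUs.xlsx")
# Keep only the text ('tokens') and its label
dataset = dataset.drop(columns=['article_id', 'annotator', 'node', 'ranges'])
# Map label names to integer ids; assigning the result avoids relying on in-place modification of a column
dataset['label'] = dataset['label'].replace(['Value', 'Value(+)', 'Value(-)', 'Fact', 'Policy'], [0, 1, 2, 3, 4])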

dataset.head()

Unnamed: 0,tokens,label
0,O facto não é apenas fruto da ignorância,0
1,havia no seu humor mais jornalismo (mais inves...,0
2,É tudo cómico na FIFA,0
3,o que todos nós permitimos que esta organizaçã...,0
4,não nos fazem rir à custa dos poderosos,0


For ease of usage with Transformer models, we convert the dataset into a Hugging Face dataset and split it into train, validation and test sets.

In [None]:
from datasets import Dataset

dataset_hf = Dataset.from_pandas(dataset)

In [None]:
from datasets import DatasetDict

# 90% train, 10% test+validation
train_test = dataset_hf.train_test_split(test_size=0.1)

# Split the 10% test+validation set in half test, half validation
valid_test = train_test['test'].train_test_split(test_size=0.5)

# Gather the three splits into a single DatasetDict
train_valid_test_dataset = DatasetDict({
    'train': train_test['train'],
    'validation': valid_test['train'],
    'test': valid_test['test']
})

In [None]:
train_valid_test_dataset

DatasetDict({
    train: Dataset({
        features: ['tokens', 'label'],
        num_rows: 15068
    })
    validation: Dataset({
        features: ['tokens', 'label'],
        num_rows: 837
    })
    test: Dataset({
        features: ['tokens', 'label'],
        num_rows: 838
    })
})
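The partition sizes above follow directly from the two fractional splits. The sketch below reproduces them with plain arithmetic; `split_sizes` is a hypothetical helper, and it assumes that the test partition is rounded up and the train partition receives the remainder, which is consistent with the sizes printed above.

```python
import math

def split_sizes(n_rows, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test partition is rounded up and the train
    partition gets whatever is left.
    """
    n_test = math.ceil(n_rows * test_size)
    return n_rows - n_test, n_test

total_rows = 15068 + 837 + 838            # rows in the three partitions above
n_train, n_holdout = split_sizes(total_rows, 0.1)   # 90% train, 10% held out
n_valid, n_test = split_sizes(n_holdout, 0.5)       # held-out half/half
print(n_train, n_valid, n_test)
```

Note that the splits are random, so which rows land in each partition changes from run to run unless a seed is fixed.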

## Fine-tuning a pretrained model

As a starting example, we'll use a lighter BERT-based model. We will need to load:
- the [tokenizer](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer) (which is used to [preprocess](https://huggingface.co/docs/transformers/preprocessing) the data before it can be used by the model)
- the [model](https://huggingface.co/docs/transformers/autoclass_tutorial#automodel) itself

In [None]:
model_name = "neuralmind/bert-base-portuguese-cased" # or neuralmind/bert-large-portuguese-cased

### Tokenizer

We first load the tokenizer for our model:

In [None]:
from transformers import AutoTokenizer

def get_tokenizer(name):
    # `model_max_length` is the tokenizer argument that caps the sequence length
    return AutoTokenizer.from_pretrained(name, model_max_length=512, use_fast=True)

tokenizer = get_tokenizer(model_name)


Now we need to [preprocess](https://huggingface.co/docs/transformers/preprocessing) our data. We will do it for the three partitions (train, validation and test) in a single step. For that, we'll make use of [map](https://huggingface.co/docs/datasets/process#map) with the help of an auxiliary function.

In [None]:
def preprocess_function(sample):
    return tokenizer(sample["tokens"], truncation=True)

In [None]:
def get_tokenized_data(dataset, function):
    return dataset.map(function, batched=True)

tokenized_dataset = get_tokenized_data(train_valid_test_dataset,preprocess_function)


In [None]:
tokenized_dataset

DatasetDict({
    train: Dataset({
        features: ['tokens', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 15068
    })
    validation: Dataset({
        features: ['tokens', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 837
    })
    test: Dataset({
        features: ['tokens', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 838
    })
})

When preprocessing the text, we have actually translated the text into numbers, which is known as [encoding](https://huggingface.co/course/chapter2/4?fw=pt#encoding).

In [None]:
tokenized_dataset['train'][321]

{'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 'input_ids': [101,
  146,
  2700,
  118,
  3176,
  2835,
  173,
  6928,
  22294,
  117,
  11127,
  20185,
  2225,
  3909,
  102],
 'label': 3,
 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'tokens': 'o vice-campeão mundial em Surf, Nicolau Von Rupp'}
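Each encoded example has its own length (15 token ids here), whereas the examples in a batch must share a single length. Padding each batch to its longest example is the job of a padding collator. The sketch below is a simplified, pure-Python illustration of that idea, not the library's implementation: `pad_batch` is a hypothetical helper, the second example's ids are made up, and the padding id is assumed to be 0.

```python
def pad_batch(batch_input_ids, pad_id=0):
    """Pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(ids) for ids in batch_input_ids)
    input_ids, attention_mask = [], []
    for ids in batch_input_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([
    [101, 146, 2700, 118, 3176, 2835, 173, 6928, 22294, 117, 11127, 20185, 2225, 3909, 102],
    [101, 1158, 2776, 102],   # made-up shorter example
])
print(batch["attention_mask"][1])
```

Padding per batch rather than to a global maximum keeps short batches short, which saves computation.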

Encoding is done in a two-step process: tokenization, followed by conversion to input IDs.

In [None]:
tokens = tokenizer.tokenize(tokenized_dataset['train'][321]['tokens'])
print(tokens)
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)

['o', 'vice', '-', 'campeão', 'mundial', 'em', 'Sur', '##f', ',', 'Nicolau', 'Von', 'Ru', '##pp']
[146, 2700, 118, 3176, 2835, 173, 6928, 22294, 117, 11127, 20185, 2225, 3909]
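The `##` prefix in the output above marks a piece that continues the previous one. A common way to obtain such pieces is greedy longest-match-first lookup in the vocabulary. The sketch below illustrates that idea only: `wordpiece` is a hypothetical helper, the toy vocabulary contains just the pieces seen above, and the real tokenizer also handles normalization and punctuation splitting.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate   # continuation of the same word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1                           # try a shorter prefix
        if piece is None:
            return [unk]                       # no piece matches: unknown word
        pieces.append(piece)
        start = end
    return pieces

toy_vocab = {"Sur", "##f", "Ru", "##pp", "mundial"}   # illustrative subset only
print(wordpiece("Surf", toy_vocab))
print(wordpiece("Rupp", toy_vocab))
```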


The tokenizer actually adds two special tokens when preprocessing: one at the beginning, and one at the end.

In [None]:
inputs = tokenizer(tokenized_dataset['train'][321]['tokens'])
inputs['input_ids']   # or inputs.input_ids

[101,
 146,
 2700,
 118,
 3176,
 2835,
 173,
 6928,
 22294,
 117,
 11127,
 20185,
 2225,
 3909,
 102]

We can [decode](https://huggingface.co/course/chapter2/4?fw=pt#decoding) the sequence to check what these tokens are:

In [None]:
tokenizer.decode(inputs['input_ids'])

'[CLS] o vice - campeão mundial em Surf, Nicolau Von Rupp [SEP]'

As with encoding, we can decode in two separate steps:

In [None]:
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'])
print(tokens)
print(tokenizer.convert_tokens_to_string(tokens))

['[CLS]', 'o', 'vice', '-', 'campeão', 'mundial', 'em', 'Sur', '##f', ',', 'Nicolau', 'Von', 'Ru', '##pp', '[SEP]']
[CLS] o vice - campeão mundial em Surf, Nicolau Von Rupp [SEP]
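To make the second decoding step less of a black box, here is a simplified sketch of how the pieces can be merged back into text: continuation pieces are glued to the previous piece and the space before a comma is removed. `tokens_to_string` is a hypothetical stand-in, not the tokenizer's own code, and it only covers the clean-up needed for this sentence.

```python
SPECIAL_TOKENS = {"[CLS]", "[SEP]"}

def tokens_to_string(tokens, skip_special_tokens=False):
    """Merge WordPiece tokens back into a string (simplified)."""
    if skip_special_tokens:
        tokens = [t for t in tokens if t not in SPECIAL_TOKENS]
    text = " ".join(tokens).replace(" ##", "")   # glue continuation pieces
    return text.replace(" ,", ",")               # minimal punctuation clean-up

tokens = ['[CLS]', 'o', 'vice', '-', 'campeão', 'mundial', 'em', 'Sur', '##f',
          ',', 'Nicolau', 'Von', 'Ru', '##pp', '[SEP]']
print(tokens_to_string(tokens))
print(tokens_to_string(tokens, skip_special_tokens=True))
```

The decoded text is not identical to the original sentence ("vice - campeão" versus "vice-campeão"), because the spacing around the hyphen is lost during tokenization.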


### Loading the model

We now load the pretrained model:

In [None]:
from transformers import AutoModel

model = AutoModel.from_pretrained(model_name)
model.cuda()


Some weights of the model checkpoint at neuralmind/bert-base-portuguese-cased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(29794, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          

Loading the model in this way only gets us the base Transformer module: given some inputs, we obtain the hidden state of the model -- a high-dimensional vector representing the "contextual understanding" of that input by the Transformer model.

In other words, we are leaving out the *head* of the model, which is needed for whatever NLP task we want to address.
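A classification head can be as simple as a linear layer that maps the hidden representation of the input to one score (logit) per label. The sketch below shows that mapping with toy numbers; `linear_head` is a hypothetical helper, the hidden size is shrunk from 768 to 3 for readability, and all weights are made up.

```python
def linear_head(hidden, weights, bias):
    """Map a hidden vector to one logit per label: logits = W @ hidden + b."""
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weights, bias)]

hidden = [1.0, 2.0, -1.0]                  # toy hidden state (real size: 768)
weights = [[1.0, 0.0, 0.0],                # one row of weights per label
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0],
           [1.0, 1.0, 1.0],
           [0.5, 0.5, 0.0]]
bias = [0.0, 0.0, 0.0, 0.0, 0.5]
logits = linear_head(hidden, weights, bias)
print(logits)                              # five scores, one per label
```

In a freshly loaded model these head weights are randomly initialized, which is why the model has to be fine-tuned before its predictions are meaningful.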

Since we want to use the model for classification, we should load it with an appropriate classification head:

In [None]:
from transformers import AutoModelForSequenceClassification
import torch

def get_model(name):
    return AutoModelForSequenceClassification.from_pretrained(name, num_labels=5)

model = get_model(model_name)


Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.bias', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_projector.weight', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'pre_classifier.weight', 'pre_classi

### Fine-tuning

The next step is to [fine-tune](https://huggingface.co/docs/transformers/training) the model with our train data. To do so, we can make use of a [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer).
There are several aspects of training that you can specify via [TrainingArguments](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments).

In [None]:
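from transformers import TrainingArguments, Trainer
from transformers import DataCollatorWithPadding
import evaluate  # `load_metric` was removed from recent `datasets` releases; needs `pip install evaluate`
import numpy as np

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred holds the raw logits and the gold labels of the evaluation set
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # most likely class per example
    return metric.compute(predictions=predictions, references=labels)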

def get_trainingArgs():
    return TrainingArguments(
        output_dir="./results",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=5,
        weight_decay=0.01,
        evaluation_strategy="epoch", # run validation at the end of each epoch
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="accuracy"
    )

training_args = get_trainingArgs()

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

def get_trainer(model_, args_, dataset_, tokenizer_, data_collator_, compute_metrics_):
    return Trainer(
        model=model_,
        args=args_,
        train_dataset=dataset_["train"],
        eval_dataset=dataset_["validation"],
        tokenizer=tokenizer_,
        data_collator=data_collator_,
        compute_metrics=compute_metrics_
    )

def get_trainer_hyper(model_, args_, dataset_, tokenizer_, data_collator_, compute_metrics_):
    return Trainer(
        model_init=model_,
        args=args_,
        train_dataset=dataset_["train"].shard(index=1, num_shards=10),  # one tenth of the training data
        eval_dataset=dataset_["validation"],
        tokenizer=tokenizer_,
        data_collator=data_collator_,
        compute_metrics=compute_metrics_
    )

def model_init():
    return AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=5)

trainer = get_trainer(model,training_args,tokenized_dataset,tokenizer,data_collator,compute_metrics)


In [None]:
trainer.train()

The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 15068
  Num Epochs = 5
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 4710


Epoch,Training Loss,Validation Loss,Accuracy
1,0.438,1.31314,0.578256
2,0.3003,1.567134,0.589008
3,0.2065,1.98707,0.578256
4,0.3623,1.68155,0.589008
5,0.2967,1.757388,0.584229
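The step counts in the log can be checked by hand: with a batch size of 16, one epoch over 15068 examples takes 942 steps (hence `checkpoint-942` after the first epoch), and five epochs take 4710. The sketch below assumes a single device and no gradient accumulation, as reported in the log; `steps_per_epoch` is a hypothetical helper. It also computes the size of the one-tenth shard selected by `get_trainer_hyper`, assuming the shard takes every tenth row starting at index 1 (a contiguous shard would have the same size here).

```python
import math

def steps_per_epoch(num_examples, batch_size):
    """Number of optimizer steps per epoch; the last batch may be smaller."""
    return math.ceil(num_examples / batch_size)

full_steps = steps_per_epoch(15068, 16)
total_steps = full_steps * 5                     # num_train_epochs=5
print(full_steps, total_steps)

shard_size = len(range(1, 15068, 10))            # rows 1, 11, 21, ...
shard_steps = steps_per_epoch(shard_size, 16)
print(shard_size, shard_steps)
```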


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/checkpoint-942
Configuration saved in ./results/checkpoint-942/config.json
Model weights saved in ./results/checkpoint-942/pytorch_model.bin
tokenizer config file saved in ./results/checkpoint-942/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-942/special_tokens_map.json

TrainOutput(global_step=4710, training_loss=0.3158597561457608, metrics={'train_runtime': 877.2285, 'train_samples_per_second': 85.884, 'train_steps_per_second': 5.369, 'total_flos': 1923183949142064.0, 'train_loss': 0.3158597561457608, 'epoch': 5.0})

We can check the model's performance on the validation set.

In [None]:
trainer.evaluate()

The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16


{'epoch': 5.0,
 'eval_accuracy': 0.5782556750298686,
 'eval_loss': 1.3131401538848877,
 'eval_runtime': 2.5774,
 'eval_samples_per_second': 324.747,
 'eval_steps_per_second': 20.563}

More importantly, we can check how the model fares on our test set.

In [None]:
trainer.predict(test_dataset=tokenized_dataset["test"])

The following columns in the test set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 838
  Batch size = 16


PredictionOutput(predictions=array([[ 4.0634084 , -1.7176632 , -2.3444445 ,  1.4286603 , -1.6890676 ],
       [ 1.958936  , -3.2778645 ,  4.053345  ,  0.26320112, -3.117488  ],
       [ 0.88952374, -0.14764343, -1.3051353 ,  4.047368  , -3.7304068 ],
       ...,
       [ 3.963823  , -2.5194633 , -0.7919692 ,  1.8499377 , -2.9910946 ],
       [ 1.0131023 , -2.4238143 ,  4.737028  , -1.3351924 , -2.300545  ],
       [ 2.6582563 , -2.4311845 , -1.1460187 ,  4.421519  , -3.6323564 ]],
      dtype=float32), label_ids=array([0, 2, 0, 0, 3, 4, 3, 0, 0, 2, 0, 4, 0, 0, 3, 0, 3, 3, 0, 0, 0, 0,
       0, 3, 3, 0, 0, 3, 0, 3, 3, 0, 0, 4, 3, 2, 0, 4, 0, 3, 0, 0, 0, 1,
       0, 0, 2, 0, 0, 1, 0, 4, 0, 3, 3, 0, 0, 3, 0, 0, 0, 4, 2, 3, 0, 1,
       0, 0, 0, 0, 0, 2, 0, 0, 1, 2, 3, 0, 3, 0, 1, 1, 0, 0, 0, 1, 0, 0,
       3, 2, 0, 1, 3, 3, 4, 0, 3, 2, 0, 2, 4, 0, 0, 3, 1, 0, 3, 1, 0, 3,
       0, 1, 4, 3, 1, 3, 3, 4, 0, 0, 0, 2, 0, 3, 2, 0, 0, 0, 0, 2, 0, 0,
       4, 2, 0, 3, 1, 0, 3, 0, 3, 0, 0, 0, 1
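The `predictions` field holds one row of logits per test example, and `label_ids` holds the gold labels. Turning this output into predicted classes only requires taking the position of the largest logit in each row. The sketch below does so for the first three rows shown above and compares them with the first three gold labels; `argmax` is a small hypothetical helper written in plain Python.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

# First three rows of logits and gold labels from the output above
logits = [
    [4.0634084, -1.7176632, -2.3444445, 1.4286603, -1.6890676],
    [1.958936, -3.2778645, 4.053345, 0.26320112, -3.117488],
    [0.88952374, -0.14764343, -1.3051353, 4.047368, -3.7304068],
]
gold = [0, 2, 0]

predicted = [argmax(row) for row in logits]
accuracy = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
print(predicted, accuracy)
```

On these three examples the model gets the first two right and labels the third as class 3 instead of class 0.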

#### Saving the model

The model can be saved for future loading.

In [None]:
trainer.save_model()

Saving model checkpoint to ./results
Configuration saved in ./results/config.json
Model weights saved in ./results/pytorch_model.bin
tokenizer config file saved in ./results/tokenizer_config.json
Special tokens file saved in ./results/special_tokens_map.json


#### Loading and using a saved model

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer2 = AutoTokenizer.from_pretrained("./results")
model2 = AutoModelForSequenceClassification.from_pretrained("./results", num_labels=5)

Didn't find file ./results/added_tokens.json. We won't load it.
loading file ./results/vocab.txt
loading file ./results/tokenizer.json
loading file None
loading file ./results/special_tokens_map.json
loading file ./results/tokenizer_config.json
loading configuration file ./results/config.json
Model config BertConfig {
  "_name_or_path": "./results",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidd

To use the model for inference, we can rely on a pipeline.

In [None]:
from transformers import TextClassificationPipeline

pipe = TextClassificationPipeline(model=model2, tokenizer=tokenizer2) #, return_all_scores=True)

In [None]:
pipe("Considero que a Praxe é muito boa")

[{'label': 'LABEL_1', 'score': 0.6839435696601868}]
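The pipeline reports generic names such as `LABEL_1` because the saved configuration only knows the default `id2label` mapping. A minimal way to get readable names, assuming the label order used when the dataset was loaded, is to post-process the pipeline output. `readable` is a hypothetical helper, not part of the library.

```python
# Same order as the integer ids assigned when loading the dataset
LABEL_NAMES = ['Value', 'Value(+)', 'Value(-)', 'Fact', 'Policy']

def readable(prediction):
    """Replace a generic 'LABEL_<i>' name with the corresponding class name."""
    index = int(prediction["label"].rsplit("_", 1)[1])
    return {"label": LABEL_NAMES[index], "score": prediction["score"]}

pipeline_output = [{'label': 'LABEL_1', 'score': 0.6839435696601868}]
print([readable(p) for p in pipeline_output])
```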

We can also use the model in a step-by-step fashion, as follows.

In [None]:
import torch

inputs = "Considero que a Praxe é muito boa"

# tokenize inputs
tokenized_inputs = tokenizer2(inputs, return_tensors="pt")
print(tokenized_inputs)

# obtain model outputs
outputs = model2(**tokenized_inputs)
print(outputs)

# get the most likely label
labels = ['Value', 'Value(+)', 'Value(-)', 'Fact', 'Policy']
prediction = torch.argmax(outputs.logits)
print(labels[prediction])

{'input_ids': tensor([[  101,  1158,  2776, 22280,   179,   123,  2485,  2650,   253,   785,
          3264,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
SequenceClassifierOutput(loss=None, logits=tensor([[ 0.8035,  1.9816, -1.2588, -0.4059, -1.7807]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
Value(+)
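The logits printed above are unnormalized scores. Applying a softmax turns them into probabilities, and the probability of the winning class should match the score reported by the pipeline (about 0.684). The sketch below checks this in plain Python; `softmax` is a hypothetical helper, and the logits are copied from the rounded output above.

```python
import math

def softmax(logits):
    """Convert logits into probabilities that sum to 1."""
    m = max(logits)                               # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ['Value', 'Value(+)', 'Value(-)', 'Fact', 'Policy']
logits = [0.8035, 1.9816, -1.2588, -0.4059, -1.7807]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(labels[best], round(probs[best], 3))
```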


Let's check the model's performance on the test set again, this time with additional metrics.

In [None]:
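y_pred = []
model2.eval()  # inference mode (disables dropout)
with torch.no_grad():  # gradients are not needed for prediction
    for p in tokenized_dataset['test']['tokens']:
        ti = tokenizer2(p, return_tensors="pt")
        out = model2(**ti)
        pred = torch.argmax(out.logits).item()  # plain int in 0..4, matching our label ids
        y_pred.append(pred)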

In [None]:
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_test = tokenized_dataset['test']['label']

print(confusion_matrix(y_test, y_pred))
print('Accuracy: ', accuracy_score(y_test, y_pred))
print('Precision: ', precision_score(y_test, y_pred, average='macro'))
print('Recall: ', recall_score(y_test, y_pred, average='macro'))
print('F1: ', f1_score(y_test, y_pred, average='macro'))

[[309  22  54  29  11]
 [ 27  25   2   4   1]
 [ 30   0 103   5   2]
 [ 78   9  24  69   0]
 [ 11   1   2   1  19]]
Accuracy:  0.6264916467780429
Precision:  0.5778241183504341
Recall:  0.5657317571096236
F1:  0.5626968419297291
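All four figures can be derived from the confusion matrix alone, which helps in reading it: rows are gold labels, columns are predictions. Macro averaging computes each metric per class and then takes the unweighted mean, so rare classes count as much as frequent ones. The sketch below recomputes the values in plain Python; `macro_scores` is a hypothetical helper.

```python
confusion = [
    [309, 22, 54, 29, 11],
    [27, 25, 2, 4, 1],
    [30, 0, 103, 5, 2],
    [78, 9, 24, 69, 0],
    [11, 1, 2, 1, 19],
]

def macro_scores(cm):
    """Accuracy and macro-averaged precision/recall/F1 from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        predicted_i = sum(cm[r][i] for r in range(n))   # column sum
        actual_i = sum(cm[i])                           # row sum
        p = cm[i][i] / predicted_i if predicted_i else 0.0
        r = cm[i][i] / actual_i if actual_i else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

scores = macro_scores(confusion)
print(scores)
```

The per-class values also show where the model struggles: class 3 (`Fact`) has 78 of its 180 examples predicted as class 0 (`Value`).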


We can do the same using a Trainer, as before.

In [None]:
trainer2 = Trainer(
    model=model2,
    tokenizer=tokenizer2,
    compute_metrics=compute_metrics
)

No `TrainingArguments` passed, using `output_dir=tmp_trainer`.
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


In [None]:
trainer2.predict(test_dataset=tokenized_dataset["test"])

The following columns in the test set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 838
  Batch size = 8


PredictionOutput(predictions=array([[ 0.37601352, -1.5194659 ,  2.8067954 ,  0.48898408, -1.9309189 ],
       [ 1.3340023 ,  2.0924182 , -1.9254825 , -0.24892056, -1.7757592 ],
       [ 0.6551271 , -0.8557426 , -0.7193791 ,  3.2252967 , -2.5290272 ],
       ...,
       [ 2.8552709 , -1.5310497 ,  0.7302812 ,  1.508096  , -2.8344948 ],
       [ 1.3395143 , -0.9876522 , -1.1002288 , -1.0114254 ,  2.6395814 ],
       [ 0.7530727 , -1.8906747 ,  3.2568834 ,  0.32288778, -2.2073815 ]],
      dtype=float32), label_ids=array([2, 1, 2, 3, 0, 1, 3, 0, 0, 0, 0, 0, 3, 0, 0, 3, 0, 0, 0, 0, 1, 2,
       0, 3, 0, 0, 0, 2, 2, 0, 3, 3, 0, 2, 2, 0, 0, 2, 0, 0, 2, 2, 2, 3,
       0, 3, 0, 0, 2, 0, 0, 1, 3, 0, 2, 0, 3, 3, 3, 0, 3, 0, 0, 1, 2, 2,
       3, 2, 0, 0, 3, 0, 2, 3, 0, 0, 0, 3, 3, 0, 0, 0, 0, 2, 0, 4, 0, 0,
       0, 3, 0, 3, 0, 0, 0, 0, 3, 0, 3, 0, 3, 4, 0, 0, 0, 0, 0, 2, 3, 0,
       0, 0, 0, 0, 0, 2, 3, 0, 0, 3, 2, 0, 3, 4, 2, 3, 0, 0, 0, 0, 2, 0,
       0, 0, 3, 3, 0, 2, 3, 0, 0, 2, 0, 1, 0

## Trying the large model

In [None]:
model_name = "neuralmind/bert-large-portuguese-cased"
tokenizer = get_tokenizer(model_name)

training_args = get_trainingArgs()

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# The model is created by `model_init` at the start of every trial,
# so there is no need to load it (or move it to the GPU) here.
trainer = get_trainer_hyper(model_init,training_args,tokenized_dataset,tokenizer,data_collator,compute_metrics)


loading configuration file https://huggingface.co/neuralmind/bert-large-portuguese-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/c534071830642050813fa94003dbf1234413b3f1d5dc66d259fbc82ff7d5fd59.c8340a82acfbbcd2dd960b86d2886ee120b21896ef0294150f0391918ae6ced5
Model config BertConfig {
  "_name_or_path": "neuralmind/bert-large-portuguese-cased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "output_past": true,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_t

In [None]:
best_run = trainer.hyperparameter_search(n_trials=10, direction="maximize")

[I 2022-06-03 14:37:13,765] A new study created in memory with name: no-name-b72bf92d-f5f8-42d7-9e75-116c93459412
Trial:
loading configuration file https://huggingface.co/neuralmind/bert-large-portuguese-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/c534071830642050813fa94003dbf1234413b3f1d5dc66d259fbc82ff7d5fd59.c8340a82acfbbcd2dd960b86d2886ee120b21896ef0294150f0391918ae6ced5
Model config BertConfig {
  "_name_or_path": "neuralmind/bert-large-portuguese-cased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3"

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.359169,0.481481
2,No log,1.340444,0.483871
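No search space was passed to `hyperparameter_search`, so the Trainer falls back to a default one for the backend in use, which would explain why the trials run for different numbers of epochs. A custom space can be supplied through the `hp_space` argument as a function that receives an Optuna trial and returns a dictionary of training arguments. The sketch below shows the shape of such a function; the ranges are illustrative assumptions, not tuned values, and `StubTrial` is a hypothetical stand-in that lets the example run without Optuna installed.

```python
def hp_space(trial):
    """Hypothetical search space; the ranges are illustrative only."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]),
    }

class StubTrial:
    """Stand-in for an Optuna trial: always returns the first admissible value."""
    def suggest_float(self, name, low, high, log=False):
        return low
    def suggest_int(self, name, low, high):
        return low
    def suggest_categorical(self, name, choices):
        return choices[0]

sampled = hp_space(StubTrial())
print(sampled)

# With the trainer defined above, the space would be passed as:
# best_run = trainer.hyperparameter_search(hp_space=hp_space, n_trials=10, direction="maximize")
```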


Saving model checkpoint to ./results/run-0/checkpoint-95

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.384671,0.474313
2,No log,1.336015,0.480287
3,No log,1.320006,0.480287
4,No log,1.312931,0.480287
5,No log,1.310998,0.480287


Saving model checkpoint to ./results/run-1/checkpoint-95

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.345829,0.482676
2,No log,1.28098,0.504182
3,No log,1.255065,0.494624


Saving model checkpoint to ./results/run-2/checkpoint-95

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.402706,0.480287
2,No log,1.367511,0.48626


Saving model checkpoint to ./results/run-3/checkpoint-95

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.220242,0.516129
2,No log,1.129954,0.578256


Saving model checkpoint to ./results/run-4/checkpoint-95

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.042326,0.589008
2,No log,1.044779,0.60454
3,No log,1.07542,0.621266


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-5/checkpoint-95
Configuration saved in ./results/run-5/checkpoint-95/config.json
Model weights saved in ./results/run-5/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-5/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-5/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples =

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.335973,0.483871
2,No log,1.306502,0.48626


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-6/checkpoint-95
Configuration saved in ./results/run-6/checkpoint-95/config.json
Model weights saved in ./results/run-6/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-6/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-6/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples =

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.331141,0.485066
2,No log,1.323429,0.485066


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-7/checkpoint-95
Configuration saved in ./results/run-7/checkpoint-95/config.json
Model weights saved in ./results/run-7/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-7/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-7/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples =

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.345891,0.483871
2,No log,1.339527,0.483871


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-8/checkpoint-95
Configuration saved in ./results/run-8/checkpoint-95/config.json
Model weights saved in ./results/run-8/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-8/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-8/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples =

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.034963,0.565114
2,No log,1.00519,0.60693
3,No log,1.320055,0.58184
4,No log,1.579533,0.58184


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-9/checkpoint-95
Configuration saved in ./results/run-9/checkpoint-95/config.json
Model weights saved in ./results/run-9/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-9/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-9/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples =

In [None]:
for n, v in best_run.hyperparameters.items():
    setattr(trainer.args, n, v)

trainer.train()

loading configuration file https://huggingface.co/neuralmind/bert-large-portuguese-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/c534071830642050813fa94003dbf1234413b3f1d5dc66d259fbc82ff7d5fd59.c8340a82acfbbcd2dd960b86d2886ee120b21896ef0294150f0391918ae6ced5
Model config BertConfig {
  "_name_or_path": "neuralmind/bert-large-portuguese-cased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.042326,0.589008
2,No log,1.044779,0.60454
3,No log,1.07542,0.621266


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/checkpoint-95
Configuration saved in ./results/checkpoint-95/config.json
Model weights saved in ./results/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving 

TrainOutput(global_step=285, training_loss=0.8099569220291941, metrics={'train_runtime': 250.1772, 'train_samples_per_second': 18.071, 'train_steps_per_second': 1.139, 'total_flos': 398719590221382.0, 'train_loss': 0.8099569220291941, 'epoch': 3.0})
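The loop in the cell above copies each tuned value onto `trainer.args` before retraining. A minimal, library-free sketch of that mechanism follows. It uses a hypothetical `DemoArgs` dataclass in place of `TrainingArguments`, and the hyperparameter values are made up for illustration (they are not the ones found by this search):

```python
from dataclasses import dataclass


@dataclass
class DemoArgs:
    # hypothetical stand-in for TrainingArguments, with only a few fields
    learning_rate: float = 5e-5
    num_train_epochs: float = 3.0
    per_device_train_batch_size: int = 8
    seed: int = 42


# illustrative values only; in the notebook they come from best_run.hyperparameters
best_hyperparameters = {
    "learning_rate": 3e-5,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 16,
}

args = DemoArgs()
for name, value in best_hyperparameters.items():
    # fail loudly on a misspelled name instead of silently creating a new attribute
    if not hasattr(args, name):
        raise AttributeError(f"unknown argument: {name}")
    setattr(args, name, value)

print(args)
```

Fields that are not listed in the dictionary (here `seed`) keep their previous values.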


In [None]:
trainer.evaluate()

The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16


{'epoch': 3.0,
 'eval_accuracy': 0.6212664277180406,
 'eval_loss': 1.0754202604293823,
 'eval_runtime': 8.6313,
 'eval_samples_per_second': 96.973,
 'eval_steps_per_second': 6.14}
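The numbers in this dictionary can be related to each other with a little arithmetic. The sketch below assumes that accuracy is the fraction of correctly classified validation examples and that the last, partial batch counts as a full step; the number of correct examples is inferred from the reported accuracy and is not printed by the notebook.

```python
import math

num_examples = 837                   # "Num examples" in the evaluation log
batch_size = 16                      # "Batch size" in the evaluation log
eval_runtime = 8.6313                # seconds, from 'eval_runtime'
eval_accuracy = 0.6212664277180406   # from 'eval_accuracy'

# accuracy = correct / total, so the number of correct predictions can be recovered
correct = round(eval_accuracy * num_examples)
print(correct, correct / num_examples)

# throughput figures: examples and batches processed per second
steps = math.ceil(num_examples / batch_size)      # 53 batches, the last one partial
print(round(num_examples / eval_runtime, 3))      # compare with 'eval_samples_per_second'
print(round(steps / eval_runtime, 2))             # compare with 'eval_steps_per_second'
```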

In [None]:
trainer.predict(test_dataset=tokenized_dataset["test"])

The following columns in the test set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 838
  Batch size = 16


PredictionOutput(predictions=array([[ 2.6564143 , -0.48987678, -0.79702866,  0.00727477, -1.2074796 ],
       [ 2.5086095 ,  1.4203523 , -2.0547144 , -0.636079  , -1.0968518 ],
       [ 2.024578  ,  0.9369103 , -1.5880567 ,  1.9643435 , -2.9455173 ],
       ...,
       [ 1.1626482 ,  0.95665365, -1.7678899 ,  1.7051103 , -2.248682  ],
       [ 2.474597  ,  0.5440818 , -1.8086386 , -0.06718405, -0.32802072],
       [ 0.8980047 ,  2.997771  , -2.2581744 , -0.3603385 , -1.4703652 ]],
      dtype=float32), label_ids=array([0, 0, 3, 0, 1, 0, 0, 0, 3, 1, 4, 0, 4, 0, 0, 4, 0, 0, 0, 3, 0, 3,
       2, 0, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0, 3, 2, 3, 4, 1, 3, 3, 2, 0,
       4, 4, 0, 2, 3, 0, 0, 0, 3, 2, 0, 0, 0, 3, 0, 1, 0, 2, 0, 3, 0, 0,
       2, 3, 2, 2, 0, 0, 3, 3, 3, 4, 3, 3, 1, 4, 0, 3, 0, 1, 0, 3, 0, 3,
       0, 0, 2, 3, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 3, 0, 3, 2, 0, 3, 2, 0,
       1, 2, 2, 4, 0, 0, 0, 0, 1, 3, 0, 1, 0, 0, 0, 0, 3, 0, 0, 0, 4, 0,
       2, 0, 0, 2, 3, 0, 3, 0, 0, 0, 3, 1, 2
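`predictions` holds one row of raw scores (logits) per test example, and `label_ids` holds the gold labels. Below is a small sketch of turning logits into predicted classes, using the first three rows and labels printed above. The helper `argmax` is written by hand only to keep the example dependency-free; with NumPy one would typically call `np.argmax(predictions, axis=-1)`.

```python
# first three logit rows and gold labels copied from the PredictionOutput above
logits = [
    [2.6564143, -0.48987678, -0.79702866, 0.00727477, -1.2074796],
    [2.5086095, 1.4203523, -2.0547144, -0.636079, -1.0968518],
    [2.024578, 0.9369103, -1.5880567, 1.9643435, -2.9455173],
]
label_ids = [0, 0, 3]


def argmax(row):
    # index of the largest score in the row
    return max(range(len(row)), key=row.__getitem__)


predicted = [argmax(row) for row in logits]
matches = [p == y for p, y in zip(predicted, label_ids)]
print(predicted)   # [0, 0, 0]
print(matches)     # [True, True, False]
```

The third example is a near miss: class 3 scores 1.96 against 2.02 for class 0.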

In [None]:
trainer.save_model()

Saving model checkpoint to ./results
Configuration saved in ./results/config.json
Model weights saved in ./results/pytorch_model.bin
tokenizer config file saved in ./results/tokenizer_config.json
Special tokens file saved in ./results/special_tokens_map.json


In [None]:
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TextClassificationPipeline,
)

# reload the fine-tuned model and its tokenizer from the output directory
tokenizer2 = AutoTokenizer.from_pretrained("./results")
model2 = AutoModelForSequenceClassification.from_pretrained("./results", num_labels=5)

# pipeline wrapper around the reloaded model (not used below, where we call the model directly)
pipe = TextClassificationPipeline(model=model2, tokenizer=tokenizer2) #, return_all_scores=True)

inputs = "I consider that this class is great"

# tokenize inputs
tokenized_inputs = tokenizer2(inputs, return_tensors="pt")
print(tokenized_inputs)

# obtain model outputs
outputs = model2(**tokenized_inputs)
print(outputs)

# get the most likely label (convert the 0-d tensor to a plain int before indexing)
labels = ['Value', 'Value(+)', 'Value(-)', 'Fact', 'Policy']
prediction = torch.argmax(outputs.logits, dim=-1).item()
print(labels[prediction])

Didn't find file ./results/added_tokens.json. We won't load it.
loading file ./results/vocab.txt
loading file ./results/tokenizer.json
loading file None
loading file ./results/special_tokens_map.json
loading file ./results/tokenizer_config.json
loading configuration file ./results/config.json
Model config BertConfig {
  "_name_or_path": "./results",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 16,
  "num_hid

{'input_ids': tensor([[  101,   290,  4747, 12230,   352, 12230,   145,  1548,   847,  2498,
           352,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
SequenceClassifierOutput(loss=None, logits=tensor([[ 5.3657, -2.2033, -2.5468,  0.7432, -1.9797]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
Value
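The logits printed above are unnormalised scores. To read them as probabilities one can apply a softmax. The sketch below does this in plain Python on the values copied from the output; the function name `softmax` is ours, and on the tensor itself `torch.softmax(outputs.logits, dim=-1)` would be the usual route.

```python
import math

labels = ['Value', 'Value(+)', 'Value(-)', 'Fact', 'Policy']
logits = [5.3657, -2.2033, -2.5468, 0.7432, -1.9797]  # copied from the output above


def softmax(scores):
    # subtract the maximum before exponentiating for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)
# print the classes from most to least likely
for label, p in sorted(zip(labels, probs), key=lambda pair: -pair[1]):
    print(f"{label:9s} {p:.4f}")
```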


## Translating to English and using distilbert-base-uncased

### Translate Text

The tokens were translated to English beforehand, so we start by loading the translated dataset:

In [None]:
import pandas as pd

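# Import the translated dataset
dataset = pd.read_excel("/content/drive/MyDrive/Colab Notebooks/OpArticles_ADUs_translated.xlsx")

# keep only the columns needed for classification
dataset = dataset.drop(columns=['article_id', 'annotator', 'node', 'ranges'])
# encode the class names as integer ids (assigning back instead of replacing in place on a column selection)
dataset['label'] = dataset['label'].replace(['Value', 'Value(+)', 'Value(-)', 'fact', 'policy'], [0, 1, 2, 3, 4])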

dataset.head()

Unnamed: 0,tokens,label
0,The fact is not just the result of ignorance,0
1,there was more journalism in his humor (more i...,0
2,It's all comical in FIFA,0
3,what we all allow this organization to do is u...,0
4,do not make us laugh at the expense of the pow...,0


In [None]:
from datasets import Dataset

dataset_hf = Dataset.from_pandas(dataset)

In [None]:
from datasets import DatasetDict

# 90% train, 10% test+validation
train_test = dataset_hf.train_test_split(test_size=0.1)

# Split the 10% test+validation set in half test, half validation
valid_test = train_test['test'].train_test_split(test_size=0.5)

# gather the three splits into a single DatasetDict
train_valid_test_dataset = DatasetDict({
    'train': train_test['train'],
    'validation': valid_test['train'],
    'test': valid_test['test']
})

model_name = "distilbert-base-uncased"
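The two calls above yield roughly a 90/5/5 split. Below is a sketch of the size arithmetic, under the assumption that the test portion is rounded up and the remainder goes to training (this is consistent with the 837 validation and 838 test examples that appear in the logs). `split_sizes` is a hypothetical helper, not part of `datasets`.

```python
import math


def split_sizes(n_samples, test_size):
    # assumed rule: round the test share up, give the rest to train
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# 837 + 838 examples were held out by the first 10% split
held_out = 1675
n_validation, n_test = split_sizes(held_out, test_size=0.5)
print(n_validation, n_test)  # 837 838
```

Both calls shuffle randomly; passing a fixed `seed` to `train_test_split` makes the split reproducible across runs.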

### Tokenizer

We first load the tokenizer for our model:

In [None]:
from transformers import AutoTokenizer

tokenizer = get_tokenizer(model_name)

loading configuration file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333
Model config DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.19.2",
  "vocab_size": 30522
}

loading file https://huggingface.co/distilbert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10

In [None]:
tokenized_dataset = get_tokenized_data(train_valid_test_dataset,preprocess_function)

tokens = tokenizer.tokenize(tokenized_dataset['train'][321]['tokens'])
print(tokens)
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)

['read', '##ability', 'is', 'complete']
[3191, 8010, 2003, 3143]
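The `##` prefix marks a piece that continues the previous one: `readability` is not in the vocabulary as a whole word, so it is split into `read` and `##ability`. Below is a toy sketch of the greedy longest-match-first strategy used by WordPiece tokenizers, with a tiny hand-made vocabulary. The real vocabulary has 30522 entries, and the real tokenizer also lowercases and splits punctuation first.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # shrink the candidate from the right until it is in the vocabulary
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


# toy vocabulary, made up for this example
toy_vocab = {"read", "##ability", "is", "complete", "##able"}
tokens = [p for w in "readability is complete".split() for p in wordpiece(w, toy_vocab)]
print(tokens)  # ['read', '##ability', 'is', 'complete']
```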


In [None]:
inputs = tokenizer(tokenized_dataset['train'][321]['tokens'])
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'])
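Calling the tokenizer directly, as in the cell above, also wraps the sequence in the special tokens `[CLS]` and `[SEP]` (ids 101 and 102 in the BERT uncased vocabulary), which `tokenize` alone does not add. The sketch below imitates that step and the padding that is later applied per batch. `encode` and `pad_batch` are hypothetical helpers, and the second sequence is made up for illustration.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # pad_token_id 0 matches the model config


def encode(ids):
    # wrap the word-piece ids in [CLS] ... [SEP]
    return [CLS_ID] + ids + [SEP_ID]


def pad_batch(batch):
    # pad every sequence to the longest one and mark real tokens with 1
    width = max(len(seq) for seq in batch)
    input_ids = [seq + [PAD_ID] * (width - len(seq)) for seq in batch]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in batch]
    return input_ids, attention_mask


first = encode([3191, 8010, 2003, 3143])   # 'read', '##ability', 'is', 'complete'
second = encode([2003, 3143])              # made-up shorter example
input_ids, attention_mask = pad_batch([first, second])
print(input_ids)
print(attention_mask)
```

Padding each batch to its own longest sequence is the job that `DataCollatorWithPadding` takes over during training.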

As before, we fine-tune the model using a Trainer.

In [None]:
from transformers import TrainingArguments, Trainer
from transformers import DataCollatorWithPadding
from datasets import load_metric
from transformers import AutoModelForSequenceClassification

model = get_model(model_name)
model.cuda()

loading configuration file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333
Model config DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": t

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0): TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
       

In [None]:
training_args = get_trainingArgs()

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

trainer = get_trainer_hyper(model_init,training_args,tokenized_dataset,tokenizer,data_collator,compute_metrics)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
loading configuration file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333
Model config DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
  

In [None]:
best_run = trainer.hyperparameter_search(n_trials=10, direction="maximize")

[I 2022-06-03 17:18:48,566] A new study created in memory with name: no-name-a9d4634b-3608-4dce-aa9a-ea86bb409ae8
Trial:
loading configuration file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333
Model config DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_la

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.33583,0.468339
2,No log,1.28872,0.470729
3,No log,1.240675,0.499403
4,No log,1.211824,0.502987
5,No log,1.202328,0.518519


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-0/checkpoint-95
Configuration saved in ./results/run-0/checkpoint-95/config.json
Model weights saved in ./results/run-0/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-0/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-0/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluatio

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.534918,0.468339


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-1/checkpoint-95
Configuration saved in ./results/run-1/checkpoint-95/config.json
Model weights saved in ./results/run-1/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-1/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-1/checkpoint-95/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./results/run-1/checkpoint-95 (score: 0.46833930704898447).
[I 2022-06-03 17:20:01,126] Trial 1 finished with value: 0.46833930704898447 and parameters: {'learning_rate': 2.53269

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.252175,0.518519
2,No log,1.144922,0.547192
3,No log,1.123885,0.540024


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-2/checkpoint-95
Configuration saved in ./results/run-2/checkpoint-95/config.json
Model weights saved in ./results/run-2/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-2/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-2/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluatio

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.254752,0.510155
2,No log,1.152419,0.544803
3,No log,1.126435,0.553166


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-3/checkpoint-95
Configuration saved in ./results/run-3/checkpoint-95/config.json
Model weights saved in ./results/run-3/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-3/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-3/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluatio

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.298124,0.491039
2,No log,1.184856,0.51374
3,No log,1.150294,0.514934
4,No log,1.138864,0.535245


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-4/checkpoint-95
Configuration saved in ./results/run-4/checkpoint-95/config.json
Model weights saved in ./results/run-4/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-4/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-4/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluatio

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.184644,0.516129
2,No log,1.114256,0.541219
3,No log,1.136018,0.547192


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-5/checkpoint-95
Configuration saved in ./results/run-5/checkpoint-95/config.json
Model weights saved in ./results/run-5/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-5/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-5/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluatio

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.213039,0.514934
2,No log,1.134226,0.550777
3,No log,1.145952,0.538829
4,No log,1.187429,0.554361


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-6/checkpoint-95
Configuration saved in ./results/run-6/checkpoint-95/config.json
Model weights saved in ./results/run-6/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-6/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-6/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluatio

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.196807,0.518519
2,No log,1.128127,0.555556


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-7/checkpoint-95
Configuration saved in ./results/run-7/checkpoint-95/config.json
Model weights saved in ./results/run-7/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-7/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-7/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluatio

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.535019,0.471924


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
[I 2022-06-03 17:24:05,453] Trial 8 pruned.
Trial:
loading configuration file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333
Model config DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.54618,0.414576


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
[I 2022-06-03 17:24:15,730] Trial 9 pruned.


In [None]:
for n, v in best_run.hyperparameters.items():
    setattr(trainer.args, n, v)

trainer.train()

loading configuration file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333
Model config DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": t

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.196807,0.518519
2,No log,1.128127,0.555556


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/checkpoint-95
Configuration saved in ./results/checkpoint-95/config.json
Model weights saved in ./results/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
 

TrainOutput(global_step=190, training_loss=1.205171042994449, metrics={'train_runtime': 23.8995, 'train_samples_per_second': 126.112, 'train_steps_per_second': 7.95, 'total_flos': 70068912137070.0, 'train_loss': 1.205171042994449, 'epoch': 2.0})
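
The numbers in the `TrainOutput` above can be cross-checked with a little arithmetic. The sketch below copies the logged values and derives the number of training examples per epoch, the optimizer steps per epoch, and the implied training batch size. These derived quantities are inferred from the log, not stated anywhere in the notebook, so treat them as a sanity check.

```python
import math

# Values copied from the TrainOutput printed above
train_runtime = 23.8995            # seconds
train_samples_per_second = 126.112
global_step = 190
epochs = 2.0

# Total samples processed divided by epochs gives the training set size
samples_per_epoch = round(train_runtime * train_samples_per_second / epochs)

# Optimizer steps per epoch (matches the "checkpoint-95" directory name)
steps_per_epoch = int(global_step / epochs)

# Smallest batch size that covers the epoch in that many steps (inferred)
implied_batch_size = math.ceil(samples_per_epoch / steps_per_epoch)

print(samples_per_epoch, steps_per_epoch, implied_batch_size)
```

The steps-per-epoch value agrees with the `checkpoint-95` directory that the trainer saves at the end of each epoch.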

In [None]:
trainer.evaluate()

The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16


{'epoch': 2.0,
 'eval_accuracy': 0.5555555555555556,
 'eval_loss': 1.1281273365020752,
 'eval_runtime': 1.2887,
 'eval_samples_per_second': 649.491,
 'eval_steps_per_second': 41.127}

Note that the model has now been fine-tuned on our training data with the best hyperparameters found by the search. After 2 epochs it reaches a validation accuracy of about 0.556 on this 5-class task, so there is still room for improvement.

In [None]:
trainer.predict(test_dataset=tokenized_dataset["test"])

The following columns in the test set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 838
  Batch size = 16


PredictionOutput(predictions=array([[ 1.8305806 , -0.8160609 ,  0.31971425, -0.84789896, -0.7547183 ],
       [ 1.116262  , -1.1291218 ,  1.4328969 , -0.48992723, -1.4267808 ],
       [ 0.3933662 , -0.36917025, -0.5203425 ,  1.5133762 , -1.233177  ],
       ...,
       [ 1.0023553 ,  0.26465368, -0.90682924, -0.7929119 ,  0.22302988],
       [ 1.7526362 , -0.47164264, -0.13339244, -0.7202064 , -0.7628599 ],
       [ 0.98748255,  0.00676977, -0.1810726 ,  0.28504792, -1.3275362 ]],
      dtype=float32), label_ids=array([0, 3, 3, 4, 0, 0, 0, 1, 3, 1, 0, 2, 3, 3, 0, 3, 1, 3, 3, 0, 0, 3,
       1, 2, 0, 0, 3, 4, 0, 0, 3, 0, 0, 0, 2, 0, 3, 0, 2, 3, 0, 0, 3, 2,
       4, 0, 3, 4, 0, 0, 3, 3, 1, 0, 2, 0, 3, 0, 4, 3, 2, 0, 2, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 1, 0, 3, 3, 3, 3, 0, 0, 3,
       0, 0, 3, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 3, 0, 3, 0, 0, 2,
       0, 2, 3, 1, 0, 3, 0, 0, 0, 4, 3, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
       0, 0, 0, 2, 2, 1, 2, 0, 3, 0, 0, 2, 0
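
The `PredictionOutput` above holds raw logits (one row per example, one column per class) together with the gold `label_ids`. As a minimal sketch of how these are turned into predicted classes and an accuracy score, the example below uses only the first three rows printed above, so the resulting accuracy is illustrative and says nothing about the full test set. The helper `argmax` is defined here for the example.

```python
# First three rows of logits and their gold labels, copied from the output above
logits = [
    [1.8305806, -0.8160609, 0.31971425, -0.84789896, -0.7547183],
    [1.116262, -1.1291218, 1.4328969, -0.48992723, -1.4267808],
    [0.3933662, -0.36917025, -0.5203425, 1.5133762, -1.233177],
]
label_ids = [0, 3, 3]


def argmax(row):
    """Return the index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


# The predicted class is the column with the highest logit
predicted = [argmax(row) for row in logits]

# Accuracy is the fraction of predictions that match the gold labels
correct = sum(p == y for p, y in zip(predicted, label_ids))
accuracy = correct / len(label_ids)

print(predicted, accuracy)
```

On the full output, the same step is usually written with NumPy as `np.argmax(output.predictions, axis=-1)`, where `output` is assumed to be the value returned by `trainer.predict(...)`.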

In [None]:
trainer.save_model()

Saving model checkpoint to ./results
Configuration saved in ./results/config.json
Model weights saved in ./results/pytorch_model.bin
tokenizer config file saved in ./results/tokenizer_config.json
Special tokens file saved in ./results/special_tokens_map.json


In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer2 = AutoTokenizer.from_pretrained("./results")
model2 = AutoModelForSequenceClassification.from_pretrained("./results", num_labels=5)

Didn't find file ./results/added_tokens.json. We won't load it.
loading file ./results/vocab.txt
loading file ./results/tokenizer.json
loading file None
loading file ./results/special_tokens_map.json
loading file ./results/tokenizer_config.json
loading configuration file ./results/config.json
Model config DistilBertConfig {
  "_name_or_path": "./results",
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "problem_type": "single_label_classification",
  "qa_dropout": 0.1,
  "seq_classif_dropo

In [None]:
from transformers import TextClassificationPipeline

pipe = TextClassificationPipeline(model=model2, tokenizer=tokenizer2) #, return_all_scores=True)

In [None]:
import torch

inputs = "I consider that this class is great"

# tokenize inputs
tokenized_inputs = tokenizer2(inputs, return_tensors="pt")
print(tokenized_inputs)

# obtain model outputs
outputs = model2(**tokenized_inputs)
print(outputs)

# get the most likely label
labels = ['Value', 'Value(+)', 'Value(-)', 'Fact', 'Policy']
prediction = torch.argmax(outputs.logits, dim=-1).item()  # index of the highest logit for the single input
print(labels[prediction])

{'input_ids': tensor([[ 101, 1045, 5136, 2008, 2023, 2465, 2003, 2307,  102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}
SequenceClassifierOutput(loss=None, logits=tensor([[ 1.6240,  1.5464, -1.6092, -0.5582, -0.8779]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
Value
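
The logits printed above are unnormalized scores. To see how confident the model is, they can be converted into probabilities with a softmax. The sketch below does this in plain Python on the logits copied from the output above; the function `softmax` is written here for illustration and is not part of the notebook.

```python
import math

labels = ['Value', 'Value(+)', 'Value(-)', 'Fact', 'Policy']

# Logits copied from the SequenceClassifierOutput printed above
logits = [1.6240, 1.5464, -1.6092, -0.5582, -0.8779]


def softmax(scores):
    """Convert a list of scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)
best_label = labels[probs.index(max(probs))]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
print("Predicted:", best_label)
```

For this sentence the two highest probabilities, for `Value` and `Value(+)`, are close to each other, so the prediction is far from certain.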


#### Testing for other models

In [None]:
model_name = "YituTech/conv-bert-base"
tokenizer = get_tokenizer(model_name)
model = get_model(model_name)

training_args = get_trainingArgs()

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

trainer = get_trainer_hyper(model_init, training_args, tokenized_dataset, tokenizer, data_collator, compute_metrics)
best_run = trainer.hyperparameter_search(n_trials=10, direction="maximize")

Could not locate the tokenizer configuration file, will try to use the model config instead.
https://huggingface.co/YituTech/conv-bert-base/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpw69rg7bj


Downloading:   0%|          | 0.00/674 [00:00<?, ?B/s]

storing https://huggingface.co/YituTech/conv-bert-base/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/7651fc6ae3906f28c62923bc7c76b0436327540c1ebb62a60b454ec79e102dd1.2a398d65585c12446cf5e632a1839e1754dc16cbbf6b87ccf28ba24c8536394e
creating metadata file for /root/.cache/huggingface/transformers/7651fc6ae3906f28c62923bc7c76b0436327540c1ebb62a60b454ec79e102dd1.2a398d65585c12446cf5e632a1839e1754dc16cbbf6b87ccf28ba24c8536394e
loading configuration file https://huggingface.co/YituTech/conv-bert-base/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/7651fc6ae3906f28c62923bc7c76b0436327540c1ebb62a60b454ec79e102dd1.2a398d65585c12446cf5e632a1839e1754dc16cbbf6b87ccf28ba24c8536394e
Model config ConvBertConfig {
  "_name_or_path": "YituTech/conv-bert-base",
  "architectures": [
    "ConvBertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "conv_kernel_size": 9,
  "embedding_size": 768,
  

Downloading:   0%|          | 0.00/260k [00:00<?, ?B/s]

storing https://huggingface.co/YituTech/conv-bert-base/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/75608c7373c277fa55de32e1bd71af40f547910ef3a49ed431d3a9fb9b4f5c8c.16ff552dabca3af1d1d07bc63a184047eb39f686be4a6738ba0167c6b1bb0b84
creating metadata file for /root/.cache/huggingface/transformers/75608c7373c277fa55de32e1bd71af40f547910ef3a49ed431d3a9fb9b4f5c8c.16ff552dabca3af1d1d07bc63a184047eb39f686be4a6738ba0167c6b1bb0b84
loading file https://huggingface.co/YituTech/conv-bert-base/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/75608c7373c277fa55de32e1bd71af40f547910ef3a49ed431d3a9fb9b4f5c8c.16ff552dabca3af1d1d07bc63a184047eb39f686be4a6738ba0167c6b1bb0b84
loading file https://huggingface.co/YituTech/conv-bert-base/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/YituTech/conv-bert-base/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/YituTech/conv-bert-b

Downloading:   0%|          | 0.00/403M [00:00<?, ?B/s]

storing https://huggingface.co/YituTech/conv-bert-base/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/f71042767b7bb431c7632b9f245661cd34a5edaac1eaf25f3a9e78a73bb711b2.3ee89f2fd82df871ab2d6f643874ee269c534627432695a69f22271e9d077426
creating metadata file for /root/.cache/huggingface/transformers/f71042767b7bb431c7632b9f245661cd34a5edaac1eaf25f3a9e78a73bb711b2.3ee89f2fd82df871ab2d6f643874ee269c534627432695a69f22271e9d077426
loading weights file https://huggingface.co/YituTech/conv-bert-base/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/f71042767b7bb431c7632b9f245661cd34a5edaac1eaf25f3a9e78a73bb711b2.3ee89f2fd82df871ab2d6f643874ee269c534627432695a69f22271e9d077426
All model checkpoint weights were used when initializing ConvBertForSequenceClassification.

Some weights of ConvBertForSequenceClassification were not initialized from the model checkpoint at YituTech/conv-bert-base and are newly initialized: ['classifier.o

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.302309,0.468339


The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-0/checkpoint-95
Configuration saved in ./results/run-0/checkpoint-95/config.json
Model weights saved in ./results/run-0/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-0/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-0/checkpoint-95/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./results/run-0/checkpoint-95 (score: 0.46833930704898447).
[I 2022-06-03 17:29:38,543] Trial 0 finished with value: 0.46833930704898447 and parameters: {'learning_rate': 4.966194861

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.414554,0.468339
2,No log,1.361353,0.468339
3,No log,1.354329,0.468339


The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-1/checkpoint-95
Configuration saved in ./results/run-1/checkpoint-95/config.json
Model weights saved in ./results/run-1/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-1/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-1/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.321613,0.468339
2,No log,1.214207,0.520908
3,No log,1.170834,0.538829


The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-2/checkpoint-95
Configuration saved in ./results/run-2/checkpoint-95/config.json
Model weights saved in ./results/run-2/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-2/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-2/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.334838,0.468339
2,No log,1.262417,0.498208
3,No log,1.232595,0.519713


The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-3/checkpoint-95
Configuration saved in ./results/run-3/checkpoint-95/config.json
Model weights saved in ./results/run-3/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-3/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-3/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.336268,0.468339


The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-4/checkpoint-95
Configuration saved in ./results/run-4/checkpoint-95/config.json
Model weights saved in ./results/run-4/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-4/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-4/checkpoint-95/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./results/run-4/checkpoint-95 (score: 0.46833930704898447).
[I 2022-06-03 17:34:01,841] Trial 4 finished with value: 0.46833930704898447 and parameters: {'learning_rate': 3.322901745

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.399305,0.468339


The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-5/checkpoint-95
Configuration saved in ./results/run-5/checkpoint-95/config.json
Model weights saved in ./results/run-5/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-5/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-5/checkpoint-95/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./results/run-5/checkpoint-95 (score: 0.46833930704898447).
[I 2022-06-03 17:34:29,026] Trial 5 finished with value: 0.46833930704898447 and parameters: {'learning_rate': 6.417841183

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.339046,0.468339
2,No log,1.238942,0.517324
3,No log,1.19377,0.535245
4,No log,1.189814,0.540024


The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-6/checkpoint-95
Configuration saved in ./results/run-6/checkpoint-95/config.json
Model weights saved in ./results/run-6/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-6/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-6/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.492073,0.468339
2,No log,1.447672,0.468339


The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-7/checkpoint-95
Configuration saved in ./results/run-7/checkpoint-95/config.json
Model weights saved in ./results/run-7/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-7/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-7/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.339568,0.468339
2,No log,1.264847,0.505376


The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-8/checkpoint-95
Configuration saved in ./results/run-8/checkpoint-95/config.json
Model weights saved in ./results/run-8/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-8/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-8/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.428778,0.468339
2,No log,1.375619,0.468339


The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/run-9/checkpoint-95
Configuration saved in ./results/run-9/checkpoint-95/config.json
Model weights saved in ./results/run-9/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/run-9/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/run-9/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****


In [None]:
for n, v in best_run.hyperparameters.items():
    setattr(trainer.args, n, v)

trainer.train()

loading configuration file https://huggingface.co/YituTech/conv-bert-base/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/7651fc6ae3906f28c62923bc7c76b0436327540c1ebb62a60b454ec79e102dd1.2a398d65585c12446cf5e632a1839e1754dc16cbbf6b87ccf28ba24c8536394e
Model config ConvBertConfig {
  "_name_or_path": "YituTech/conv-bert-base",
  "architectures": [
    "ConvBertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "conv_kernel_size": 9,
  "embedding_size": 768,
  "eos_token_id": 2,
  "head_ratio": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.339046,0.468339
2,No log,1.238942,0.517324
3,No log,1.19377,0.535245
4,No log,1.189814,0.540024


The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16
Saving model checkpoint to ./results/checkpoint-95
Configuration saved in ./results/checkpoint-95/config.json
Model weights saved in ./results/checkpoint-95/pytorch_model.bin
tokenizer config file saved in ./results/checkpoint-95/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-95/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch s

TrainOutput(global_step=380, training_loss=1.1323948910361843, metrics={'train_runtime': 99.8506, 'train_samples_per_second': 60.37, 'train_steps_per_second': 3.806, 'total_flos': 333554720329776.0, 'train_loss': 1.1323948910361843, 'epoch': 4.0})
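
The loop that runs before `trainer.train()` copies every entry of `best_run.hyperparameters` onto `trainer.args` with `setattr`. The sketch below illustrates the same pattern on stand-in objects: `FakeTrainingArgs` and the hyperparameter values are hypothetical and only serve the example, and the `hasattr` guard is an optional safety check that the notebook's loop does not perform.

```python
from dataclasses import dataclass


@dataclass
class FakeTrainingArgs:
    """Hypothetical stand-in for trainer.args, for illustration only."""
    learning_rate: float = 5e-5
    num_train_epochs: int = 3
    seed: int = 42


# Illustrative values, not the ones found by the search in this notebook
best_hyperparameters = {"learning_rate": 3e-5, "num_train_epochs": 4}

args = FakeTrainingArgs()
for name, value in best_hyperparameters.items():
    # Guard against typos: setattr would otherwise silently add a new attribute
    if not hasattr(args, name):
        raise AttributeError(f"unknown training argument: {name}")
    setattr(args, name, value)

print(args)
```

Arguments that are not part of the search (here `seed`) keep their previous values.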

In [None]:
trainer.evaluate()

The following columns in the evaluation set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 837
  Batch size = 16


{'epoch': 4.0,
 'eval_accuracy': 0.5400238948626045,
 'eval_loss': 1.1898143291473389,
 'eval_runtime': 2.6519,
 'eval_samples_per_second': 315.62,
 'eval_steps_per_second': 19.985}
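
Accuracy values are easier to compare when converted back into counts of correctly classified examples. The sketch below does so for three values reported in the logs above, using the 837 validation examples. The dictionary keys are descriptive names chosen for this example.

```python
num_validation_examples = 837  # "Num examples = 837" in the evaluation logs

# Accuracy values copied from the outputs above
reported_accuracy = {
    "DistilBERT, final evaluation": 0.5555555555555556,
    "ConvBERT, final evaluation": 0.5400238948626045,
    "ConvBERT, score shared by several trials": 0.46833930704898447,
}

# Multiply by the number of examples to recover the count of correct predictions
correct_counts = {
    name: round(accuracy * num_validation_examples)
    for name, accuracy in reported_accuracy.items()
}

for name, count in correct_counts.items():
    print(f"{name}: {count}/{num_validation_examples} correct")
```

Several ConvBERT trials report exactly the same score. One plausible reading, which is an assumption not verified here, is that those runs predict a single class for every validation example.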

In [None]:
trainer.predict(test_dataset=tokenized_dataset["test"])

The following columns in the test set don't have a corresponding argument in `ConvBertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `ConvBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 838
  Batch size = 16


PredictionOutput(predictions=array([[ 2.6817122 , -1.2507981 ,  0.5966191 ,  0.5713838 , -2.0820625 ],
       [ 1.0768876 , -1.9822626 ,  2.4556732 , -0.0561501 , -1.8347462 ],
       [ 1.0974739 , -0.6719573 , -0.4694883 ,  3.0028603 , -2.6525857 ],
       ...,
       [ 1.5312034 , -0.7468105 , -1.3785452 , -1.0388649 ,  1.5676636 ],
       [ 2.6586072 , -0.15121448, -0.71098953,  0.49706525, -1.424864  ],
       [ 1.9450687 ,  0.81677234, -1.168717  ,  1.3485212 , -2.1756992 ]],
      dtype=float32), label_ids=array([0, 3, 3, 4, 0, 0, 0, 1, 3, 1, 0, 2, 3, 3, 0, 3, 1, 3, 3, 0, 0, 3,
       1, 2, 0, 0, 3, 4, 0, 0, 3, 0, 0, 0, 2, 0, 3, 0, 2, 3, 0, 0, 3, 2,
       4, 0, 3, 4, 0, 0, 3, 3, 1, 0, 2, 0, 3, 0, 4, 3, 2, 0, 2, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 1, 0, 3, 3, 3, 3, 0, 0, 3,
       0, 0, 3, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 3, 0, 3, 0, 0, 2,
       0, 2, 3, 1, 0, 3, 0, 0, 0, 4, 3, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
       0, 0, 0, 2, 2, 1, 2, 0, 3, 0, 0, 2, 0

In [None]:
trainer.save_model()

Saving model checkpoint to ./results
Configuration saved in ./results/config.json
Model weights saved in ./results/pytorch_model.bin
tokenizer config file saved in ./results/tokenizer_config.json
Special tokens file saved in ./results/special_tokens_map.json


In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

# reload the fine-tuned model and tokenizer saved by trainer.save_model()
tokenizer2 = AutoTokenizer.from_pretrained("./results")
model2 = AutoModelForSequenceClassification.from_pretrained("./results", num_labels=5)

pipe = TextClassificationPipeline(model=model2, tokenizer=tokenizer2) #, return_all_scores=True)

inputs = "I consider that this class is great"

# tokenize inputs
tokenized_inputs = tokenizer2(inputs, return_tensors="pt")
print(tokenized_inputs)

# obtain model outputs
outputs = model2(**tokenized_inputs)
print(outputs)

# get the most likely label
labels = ['Value', 'Value(+)', 'Value(-)', 'Fact', 'Policy']
prediction = torch.argmax(outputs.logits, dim=-1).item()  # index of the highest logit for the single input
print(labels[prediction])

Didn't find file ./results/added_tokens.json. We won't load it.
loading file ./results/vocab.txt
loading file None
loading file ./results/special_tokens_map.json
loading file ./results/tokenizer_config.json
loading configuration file ./results/config.json
Model config ConvBertConfig {
  "_name_or_path": "./results",
  "architectures": [
    "ConvBertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "conv_kernel_size": 9,
  "embedding_size": 768,
  "eos_token_id": 2,
  "head_ratio": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_ty

{'input_ids': tensor([[ 101, 1045, 4632, 1504, 1519, 1961, 1499, 1803,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}
SequenceClassifierOutput(loss=None, logits=tensor([[ 2.0991, -1.7663,  1.1772,  1.3581, -2.6614]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
Value


## Testing for augmented data

### Load train and test

In [None]:
import pandas as pd
from datasets import Dataset, DatasetDict

model_name = "neuralmind/bert-base-portuguese-cased"  # or neuralmind/bert-large-portuguese-cased

# Map the class names to integer ids
label2id = {'Value': 0, 'Value(+)': 1, 'Value(-)': 2, 'Fact': 3, 'Policy': 4}
unused_columns = ['article_id', 'annotator', 'node', 'ranges']

# Load the augmented training set
train = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/OpArticles_ADUs_train_aug.csv")
train = train.drop(columns=unused_columns)
train['label'] = train['label'].map(label2id)  # assign back instead of a chained inplace replace

# Load the held-out set that is later split into validation and test
test_valid = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/OpArticles_ADUs_test.csv")
test_valid = test_valid.drop(columns=unused_columns)
test_valid['label'] = test_valid['label'].map(label2id)

train = Dataset.from_pandas(train)
dataset_hf = Dataset.from_pandas(test_valid)

# Split the 10% test+validation set in half test, half validation
valid_test = dataset_hf.train_test_split(test_size=0.5)

# gather everyone if you want to have a single DatasetDict
train_valid_test_dataset = DatasetDict({
    'train': train,
    'validation': valid_test['train'],
    'test': valid_test['test']
})

In [None]:
tokenizer = get_tokenizer(model_name)
model = get_model(model_name)
tokenized_dataset = get_tokenized_data(train_valid_test_dataset, preprocess_function)

training_args = get_trainingArgs()

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpz97625tz


Downloading:   0%|          | 0.00/43.0 [00:00<?, ?B/s]

storing https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/f1a9ba41d40e8c6f5ba4988aa2f7702c3b43768183e4b82483e04f2848841ecf.a6c00251b9344c189e2419373d6033016d0cd3d87ea59f6c86069046ac81956d
creating metadata file for /root/.cache/huggingface/transformers/f1a9ba41d40e8c6f5ba4988aa2f7702c3b43768183e4b82483e04f2848841ecf.a6c00251b9344c189e2419373d6033016d0cd3d87ea59f6c86069046ac81956d
https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp42mm1obi


Downloading:   0%|          | 0.00/647 [00:00<?, ?B/s]

storing https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/e716e2151985ba669e7197b64cdde2552acee146494d40ffaf0688a3f152e6ed.18a0b8b86f3ebd4c8a1d8d6199178feae9971ff5420f1d12f0ed8326ffdff716
creating metadata file for /root/.cache/huggingface/transformers/e716e2151985ba669e7197b64cdde2552acee146494d40ffaf0688a3f152e6ed.18a0b8b86f3ebd4c8a1d8d6199178feae9971ff5420f1d12f0ed8326ffdff716
loading configuration file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/e716e2151985ba669e7197b64cdde2552acee146494d40ffaf0688a3f152e6ed.18a0b8b86f3ebd4c8a1d8d6199178feae9971ff5420f1d12f0ed8326ffdff716
Model config BertConfig {
  "_name_or_path": "neuralmind/bert-base-portuguese-cased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hi

Downloading:   0%|          | 0.00/205k [00:00<?, ?B/s]

storing https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/aa6d50227b77416b26162efcf0cc9e9a702d13920840322060a2b41a44a8aff4.af25fb1e29ad0175300146695fd80069be69b211c52fa5486fa8aae2754cc814
creating metadata file for /root/.cache/huggingface/transformers/aa6d50227b77416b26162efcf0cc9e9a702d13920840322060a2b41a44a8aff4.af25fb1e29ad0175300146695fd80069be69b211c52fa5486fa8aae2754cc814
https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/added_tokens.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp38oncjhe


Downloading:   0%|          | 0.00/2.00 [00:00<?, ?B/s]

storing https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/added_tokens.json in cache at /root/.cache/huggingface/transformers/9188d297517828a862f4e0b0700968574ca7ad38fbc0832c409bf7a9e5576b74.5cc6e825eb228a7a5cfd27cb4d7151e97a79fb962b31aaf1813aa102e746584b
creating metadata file for /root/.cache/huggingface/transformers/9188d297517828a862f4e0b0700968574ca7ad38fbc0832c409bf7a9e5576b74.5cc6e825eb228a7a5cfd27cb4d7151e97a79fb962b31aaf1813aa102e746584b
https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/special_tokens_map.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpf_c3e7so



storing https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/special_tokens_map.json in cache at /root/.cache/huggingface/transformers/eecc45187d085a1169eed91017d358cc0e9cbdd5dc236bcd710059dbf0a2f816.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d
creating metadata file for /root/.cache/huggingface/transformers/eecc45187d085a1169eed91017d358cc0e9cbdd5dc236bcd710059dbf0a2f816.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d
loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/aa6d50227b77416b26162efcf0cc9e9a702d13920840322060a2b41a44a8aff4.af25fb1e29ad0175300146695fd80069be69b211c52fa5486fa8aae2754cc814
loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/tokenizer.json from cache at None
loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/added_tokens.json from cache at 


storing https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/1e42c907c340c902923496246dae63e33f64955c529720991b7ec5543a98e442.fa492fca6dcee85bef053cc60912a211feb1f7173129e4eb1a5164e817f2f5f2
creating metadata file for /root/.cache/huggingface/transformers/1e42c907c340c902923496246dae63e33f64955c529720991b7ec5543a98e442.fa492fca6dcee85bef053cc60912a211feb1f7173129e4eb1a5164e817f2f5f2
loading weights file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/1e42c907c340c902923496246dae63e33f64955c529720991b7ec5543a98e442.fa492fca6dcee85bef053cc60912a211feb1f7173129e4eb1a5164e817f2f5f2
Some weights of the model checkpoint at neuralmind/bert-base-portuguese-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.pre


Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.



PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


In [None]:
trainer = get_trainer_hyper(model_init, training_args, tokenized_dataset, tokenizer, data_collator, compute_metrics)

loading configuration file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/e716e2151985ba669e7197b64cdde2552acee146494d40ffaf0688a3f152e6ed.18a0b8b86f3ebd4c8a1d8d6199178feae9971ff5420f1d12f0ed8326ffdff716
Model config BertConfig {
  "_name_or_path": "neuralmind/bert-base-portuguese-cased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 

In [None]:
best_run = trainer.hyperparameter_search(n_trials=10, direction="maximize")

[I 2022-06-03 18:22:56,646] A new study created in memory with name: no-name-72e05d6e-6a6a-435a-9e7c-d152b2d095d4
Trial:

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.159097,0.490442
2,No log,1.276811,0.537634
3,0.803400,1.462714,0.534648


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1674
  Batch size = 16
Saving model checkpoint to ./results/run-0/checkpoint-203
Configuration saved in ./results/run-0/checkpoint-203/config.json
Model weights saved in ./results/run-0/checkpoint-203/pytorch_model.bin
tokenizer config file saved in ./results/run-0/checkpoint-203/tokenizer_config.json
Special tokens file saved in ./results/run-0/checkpoint-203/special_tokens_map.json

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.428919,0.333333
2,No log,1.2332,0.433692
3,1.230100,1.22271,0.45221
4,1.230100,1.223443,0.455197


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1674
  Batch size = 16
Saving model checkpoint to ./results/run-1/checkpoint-203
Configuration saved in ./results/run-1/checkpoint-203/config.json
Model weights saved in ./results/run-1/checkpoint-203/pytorch_model.bin
tokenizer config file saved in ./results/run-1/checkpoint-203/tokenizer_config.json
Special tokens file saved in ./results/run-1/checkpoint-203/special_tokens_map.json

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.329477,0.384707


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1674
  Batch size = 16
Saving model checkpoint to ./results/run-2/checkpoint-203
Configuration saved in ./results/run-2/checkpoint-203/config.json
Model weights saved in ./results/run-2/checkpoint-203/pytorch_model.bin
tokenizer config file saved in ./results/run-2/checkpoint-203/tokenizer_config.json
Special tokens file saved in ./results/run-2/checkpoint-203/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./results/run-2/checkpoint-203 (score: 0.3847072879330944).
[I 2022-06-03 18:29:06,084] Trial 2 finished with value: 0.3847072879330944 and parameters: {'learning_rate': 1.732282154432

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.605097,0.271804
2,No log,1.571981,0.330346
3,1.575000,1.541595,0.360215
4,1.575000,1.515898,0.373357
5,1.443600,1.509057,0.376941


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1674
  Batch size = 16
Saving model checkpoint to ./results/run-3/checkpoint-203
Configuration saved in ./results/run-3/checkpoint-203/config.json
Model weights saved in ./results/run-3/checkpoint-203/pytorch_model.bin
tokenizer config file saved in ./results/run-3/checkpoint-203/tokenizer_config.json
Special tokens file saved in ./results/run-3/checkpoint-203/special_tokens_map.json

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.190896,0.459379


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1674
  Batch size = 16
Saving model checkpoint to ./results/run-4/checkpoint-203
Configuration saved in ./results/run-4/checkpoint-203/config.json
Model weights saved in ./results/run-4/checkpoint-203/pytorch_model.bin
tokenizer config file saved in ./results/run-4/checkpoint-203/tokenizer_config.json
Special tokens file saved in ./results/run-4/checkpoint-203/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./results/run-4/checkpoint-203 (score: 0.459378733572282).
[I 2022-06-03 18:33:42,510] Trial 4 finished with value: 0.459378733572282 and parameters: {'learning_rate': 3.07030286863938

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.344893,0.433094
2,No log,1.197015,0.558542
3,0.801500,1.768558,0.495818
4,0.801500,2.26941,0.498805
5,0.156900,2.509208,0.511947


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1674
  Batch size = 16
Saving model checkpoint to ./results/run-5/checkpoint-203
Configuration saved in ./results/run-5/checkpoint-203/config.json
Model weights saved in ./results/run-5/checkpoint-203/pytorch_model.bin
tokenizer config file saved in ./results/run-5/checkpoint-203/tokenizer_config.json
Special tokens file saved in ./results/run-5/checkpoint-203/special_tokens_map.json

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.480748,0.311231


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1674
  Batch size = 16
[I 2022-06-03 18:38:19,055] Trial 6 pruned.
Trial:


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.347806,0.4092
2,No log,1.188177,0.459379


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1674
  Batch size = 16
Saving model checkpoint to ./results/run-7/checkpoint-203
Configuration saved in ./results/run-7/checkpoint-203/config.json
Model weights saved in ./results/run-7/checkpoint-203/pytorch_model.bin
tokenizer config file saved in ./results/run-7/checkpoint-203/tokenizer_config.json
Special tokens file saved in ./results/run-7/checkpoint-203/special_tokens_map.json

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.485977,0.431302
2,No log,1.419295,0.38172


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1674
  Batch size = 16
Saving model checkpoint to ./results/run-8/checkpoint-203
Configuration saved in ./results/run-8/checkpoint-203/config.json
Model weights saved in ./results/run-8/checkpoint-203/pytorch_model.bin
tokenizer config file saved in ./results/run-8/checkpoint-203/tokenizer_config.json
Special tokens file saved in ./results/run-8/checkpoint-203/special_tokens_map.json

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.379395,0.421744
2,No log,1.217828,0.548387


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1674
  Batch size = 16
Saving model checkpoint to ./results/run-9/checkpoint-203
Configuration saved in ./results/run-9/checkpoint-203/config.json
Model weights saved in ./results/run-9/checkpoint-203/pytorch_model.bin
tokenizer config file saved in ./results/run-9/checkpoint-203/tokenizer_config.json
Special tokens file saved in ./results/run-9/checkpoint-203/special_tokens_map.json

In [None]:
best_run

BestRun(run_id='9', objective=0.5483870967741935, hyperparameters={'learning_rate': 7.589678062009658e-05, 'num_train_epochs': 2, 'seed': 11, 'per_device_train_batch_size': 8})
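
Since `hyperparameter_search` was called above without an `hp_space` argument, the library's default search space was used, which is where the four keys in `best_run.hyperparameters` come from. As a rough sketch only, a custom space can be supplied as a function of an Optuna trial. The function name `custom_hp_space` and every range below are illustrative assumptions, not the values explored in this run.

```python
def custom_hp_space(trial):
    """Hypothetical search space; all ranges are illustrative assumptions."""
    return {
        # Sample the learning rate on a log scale.
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        # Integer number of epochs, bounds inclusive.
        "num_train_epochs": trial.suggest_int("num_train_epochs", 1, 5),
        # Pick one of a fixed set of batch sizes.
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]
        ),
    }

# Usage with the notebook's trainer (commented out: it needs the objects built earlier).
# best_run = trainer.hyperparameter_search(
#     hp_space=custom_hp_space, n_trials=10, direction="maximize", backend="optuna"
# )
```

Narrowing the space this way trades coverage for fewer wasted trials, which matters when only 10 trials are run.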

In [None]:
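# Copy the best trial's hyperparameters onto the trainer's arguments.
for n, v in best_run.hyperparameters.items():
    setattr(trainer.args, n, v)

# Retrain with the selected hyperparameters.
trainer.train()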


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.379395,0.421744
2,No log,1.217828,0.548387


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1674
  Batch size = 16
Saving model checkpoint to ./results/checkpoint-203
Configuration saved in ./results/checkpoint-203/config.json
Model weights saved in ./results/checkpoint-203/pytorch_model.bin
tokenizer config file saved in ./results/checkpoint-203/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-203/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1674
  Batch size = 16


TrainOutput(global_step=406, training_loss=0.8777332775698506, metrics={'train_runtime': 90.945, 'train_samples_per_second': 71.274, 'train_steps_per_second': 4.464, 'total_flos': 165928878591360.0, 'train_loss': 0.8777332775698506, 'epoch': 2.0})
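
The numbers in this `TrainOutput` can be cross-checked with a little arithmetic. The sketch below assumes that each throughput metric is simply a total count divided by `train_runtime`, so multiplying back recovers the totals. The variable names are ours.

```python
# Values copied from the TrainOutput above.
global_step = 406
epochs = 2.0
train_runtime = 90.945        # seconds
samples_per_second = 71.274
steps_per_second = 4.464

# Optimizer steps per epoch; this should match the "checkpoint-203" directory name.
steps_per_epoch = int(global_step / epochs)

# Assumption: throughput = total / runtime, so total = throughput * runtime.
total_steps = steps_per_second * train_runtime
total_samples = samples_per_second * train_runtime
samples_per_epoch = total_samples / epochs

print(steps_per_epoch)                          # 203
print(round(total_steps))                       # 406
print(round(samples_per_epoch))                 # 3241
print(round(samples_per_epoch / steps_per_epoch, 2))
```

Under that assumption, each optimizer step consumed about 16 samples, which is worth comparing against the `per_device_train_batch_size` reported in `best_run`.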

In [None]:
trainer.evaluate()

In [None]:
trainer.predict(test_dataset=tokenized_dataset["test"])

The following columns in the test set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: tokens. If tokens are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 1675
  Batch size = 16


PredictionOutput(predictions=array([[ 1.6157498 , -1.4119229 , -0.96386194,  2.384488  , -2.6535382 ],
       [ 2.469078  , -1.8339196 , -0.44145027,  0.28556722, -0.7932506 ],
       [ 2.533299  , -1.4997307 ,  0.21426061, -0.662246  , -0.8990004 ],
       ...,
       [-0.7315779 ,  2.756217  , -1.6405042 ,  0.16013697, -0.93971694],
       [ 2.3459826 , -2.0584548 ,  0.9242012 ,  0.25697085, -1.7578737 ],
       [ 1.2710487 , -2.2667406 ,  1.1286571 ,  2.1010783 , -2.988243  ]],
      dtype=float32), label_ids=array([0, 0, 2, ..., 3, 2, 0]), metrics={'test_loss': 1.1175456047058105, 'test_accuracy': 0.564179104477612, 'test_runtime': 4.9916, 'test_samples_per_second': 335.564, 'test_steps_per_second': 21.035})
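
The `predictions` array holds one row of raw logits per test example, with one column per class. A minimal sketch of turning those logits into class predictions follows, using only the six rows visible in the printout. It assumes the elision (`...`) hides the same examples in `predictions` and in `label_ids`, so that the visible rows and labels line up. The helper names are ours.

```python
import math

# The six logit rows visible in the printed PredictionOutput.
logits = [
    [1.6157498, -1.4119229, -0.96386194, 2.384488, -2.6535382],
    [2.469078, -1.8339196, -0.44145027, 0.28556722, -0.7932506],
    [2.533299, -1.4997307, 0.21426061, -0.662246, -0.8990004],
    [-0.7315779, 2.756217, -1.6405042, 0.16013697, -0.93971694],
    [2.3459826, -2.0584548, 0.9242012, 0.25697085, -1.7578737],
    [1.2710487, -2.2667406, 1.1286571, 2.1010783, -2.988243],
]
# The visible entries of label_ids (assumed to be aligned with the rows above).
labels = [0, 0, 2, 3, 2, 0]


def softmax(row):
    """Turn one row of logits into probabilities that sum to 1."""
    shift = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(row):
    """Index of the largest value in the row."""
    return max(range(len(row)), key=row.__getitem__)


predicted = [argmax(row) for row in logits]
probabilities = [softmax(row) for row in logits]

print(predicted)  # [3, 0, 0, 1, 0, 3]
for label, pred, probs in zip(labels, predicted, probabilities):
    print(label, pred, round(probs[pred], 3))
```

On the full output, NumPy's `argmax` over the last axis of `predictions` does the same for every row at once. Six rows are far too few to say anything about the reported `test_accuracy`.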

In [None]:
import shutil

# Delete the checkpoints saved under ./results to free disk space.
shutil.rmtree('./results')