# Fine-Tuning and Performance Evaluation of Jerteh BERT Models for Sentiment Classification with Early Stopping

In this Jupyter notebook, we cover two crucial steps of our Natural Language Processing (NLP) project: fine-tuning our preprocessed BERT models on the Serbian Wordnet training data and assessing their performance.

Our primary objective is to adapt BERT models to effectively classify sentiments, leveraging a semi-automated, iterative approach that uses seed words and expands them based on their relationships in WordNet. 

The performance evaluation metrics are instrumental in assessing the success of our fine-tuning process. We will analyze these metrics in two ways:

1. **In-notebook Review:** For an immediate performance evaluation, we will print the confusion matrix and classification reports within this notebook.

2. **Persistent Reports:** We'll create a lasting record of our results by storing these metrics in a separate 'reports' folder. This approach facilitates progress tracking over time and enables comparisons among different models and fine-tuning iterations.
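
As a rough illustration of the persistent-report idea, the sketch below writes a confusion matrix and a dictionary of metrics to a JSON file inside a `reports` folder. The function name `save_report`, the file naming scheme, and the JSON layout are assumptions made for this example; the project's own scripts may store their reports differently.

```python
import json
from pathlib import Path


def save_report(name, confusion, metrics, reports_dir="reports"):
    """Write a confusion matrix and metrics to <reports_dir>/<name>.json.

    `name`, the JSON layout and the default folder are illustrative choices,
    not the format used by the project's training module.
    """
    out_dir = Path(reports_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    report = {
        "confusion_matrix": [list(row) for row in confusion],
        "metrics": dict(metrics),
    }
    out_path = out_dir / f"{name}.json"
    out_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return out_path
```

Calling `save_report("iter0_POS", matrix, {"f1": 0.70})` once per model and iteration would leave one file per run, which makes later side-by-side comparison straightforward.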

Keep in mind that the fine-tuning and evaluation processes are iterative. Based on our results and insights, we may need to adjust our strategies and fine-tune our models differently.

Throughout this notebook, we will go through:

1. **Model Training:** Execution of Python scripts for fine-tuning our BERT models on the training set.
2. **Model Testing:** Performance evaluation of the newly fine-tuned models on our test data.
3. **Results Analysis:** Examination, interpretation, and storage of the confusion matrices and classification reports.

In our previous work, we fine-tuned our BERT models for sentiment classification on the Serbian Wordnet training data. However, the models appeared to be overfitting. Overfitting is a common problem in machine learning where a model learns the training data too well, essentially memorizing it, rather than generalizing from it. This means that it performs poorly on unseen data, which is a serious problem if we want our models to be applicable to real-world data.

To overcome this issue, we're going to introduce early stopping in this notebook. Early stopping is a method used to prevent overfitting by ending the training process before the learner passes a certain point of over-specialization, i.e., before the model starts to overfit.
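
To make the mechanism concrete, here is a minimal sketch of patience-based early stopping in plain Python. The monitored quantity (validation loss), the `patience` value of 3, and the function name `early_stopping_epoch` are assumptions chosen for illustration; the rule actually applied during fine-tuning is defined inside the training module and is not shown in this notebook.

```python
def early_stopping_epoch(eval_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once `patience` consecutive evaluations fail to improve
    on the best (lowest) validation loss seen so far.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(eval_losses, start=1):
        if loss < best:
            best = loss       # new best value: reset the counter
            bad_epochs = 0
        else:
            bad_epochs += 1   # no improvement this epoch
            if bad_epochs >= patience:
                return epoch
    return None  # the patience budget was never exhausted


# Hypothetical validation losses: the best value is at epoch 2.
losses = [0.50, 0.40, 0.45, 0.47, 0.46, 0.44]
print(early_stopping_epoch(losses, patience=3))  # prints 5
```

A larger `patience` tolerates noisier validation curves at the cost of extra training epochs; a smaller one stops sooner but may quit before a late improvement.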

We'll fine-tune our BERT models again, but this time, we'll include an early stopping line in our trainer call. Then, we'll evaluate the performance of these newly fine-tuned models and compare the results to the ones from the previous notebook. Our aim is to obtain models that generalize better and thus, perform better on unseen data.
Let's get started!


### Importing Required Modules

In this initial code cell, we import the necessary modules that contain functions for training and testing our BERT models. The modules imported are:

1. **`trainJerteh355`:** This module contains the `train_model` and `test_model` functions, which handle the training and testing processes respectively. It uses the "Jerteh 355" model, which is pre-trained exclusively on the Serbian language using a RoBERTa architecture and is tailored to the specifics of Serbian. The module manages everything from data preprocessing to model training, testing, and GPU memory management. Jerteh 355 is the most recent model and has 355 million parameters.


By encapsulating the training and testing processes within these modules, we maintain a clean and streamlined notebook. This allows us to focus on the implementation, results interpretation, and performance evaluation of the models.


In [5]:
import trainJerteh355


## Iteration 0 - Training and Testing
In this section, we use the data from the 0th iteration of the semi-automatic iterative algorithm for both Positive and Negative sentiment classification to train and test our BERT models.


In [6]:
trainJerteh355.train_model(0, "POS", eval="f1", epochs=32)




Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at jerteh/Jerteh-355 and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Map:   0%|          | 0/1334 [00:00<?, ? examples/s]

Map:   0%|          | 0/11998 [00:00<?, ? examples/s]


{'eval_loss': 0.02254694327712059, 'eval_f1': 0.42857142857142855, 'eval_runtime': 6.0033, 'eval_samples_per_second': 222.212, 'eval_steps_per_second': 13.992, 'epoch': 1.0}





{'eval_loss': 0.02193531207740307, 'eval_f1': 0.6666666666666666, 'eval_runtime': 6.0055, 'eval_samples_per_second': 222.129, 'eval_steps_per_second': 13.987, 'epoch': 2.0}





{'eval_loss': 0.031137406826019287, 'eval_f1': 0.5, 'eval_runtime': 5.8927, 'eval_samples_per_second': 226.38, 'eval_steps_per_second': 14.255, 'epoch': 3.0}





{'eval_loss': 0.03725547343492508, 'eval_f1': 0.625, 'eval_runtime': 5.9249, 'eval_samples_per_second': 225.152, 'eval_steps_per_second': 14.178, 'epoch': 4.0}





{'eval_loss': 0.034559041261672974, 'eval_f1': 0.631578947368421, 'eval_runtime': 5.965, 'eval_samples_per_second': 223.639, 'eval_steps_per_second': 14.082, 'epoch': 5.0}
{'train_runtime': 1329.6685, 'train_samples_per_second': 288.746, 'train_steps_per_second': 1.131, 'train_loss': 0.019427638358258188, 'epoch': 5.0}



Max memory allocated by tensors- before:
    5.00 GB
Max memory allocated by tensors- after:
    5.00 GB


In [7]:
trainJerteh355.test_model(0, "POS")




[[4399    9]
 [  12   25]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      4408
           1       0.74      0.68      0.70        37

    accuracy                           1.00      4445
   macro avg       0.87      0.84      0.85      4445
weighted avg       1.00      1.00      1.00      4445
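
The report above can be reproduced by hand from the confusion matrix, which helps explain why accuracy rounds to 1.00 while class 1 scores much lower. The sketch below assumes rows are true labels and columns are predictions (the layout whose row sums match the `support` column); the helper name `binary_metrics` is introduced only for this example.

```python
def binary_metrics(confusion):
    """Accuracy and class-1 precision, recall and F1 from [[tn, fp], [fn, tp]].

    No zero-division guard: assumes at least one predicted and one true
    class-1 item, which holds for the matrix used below.
    """
    (tn, fp), (fn, tp) = confusion
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "precision": tp / (tp + fp),   # share of predicted class 1 that is correct
        "recall": tp / (tp + fn),      # share of true class 1 that is found
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Confusion matrix printed for the iteration 0 POS model.
metrics = binary_metrics([[4399, 9], [12, 25]])
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Only 37 of the 4,445 test items belong to class 1, so a model that ignored that class entirely would still reach about 99% accuracy. The class-1 row is therefore the more informative part of each report.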



In [8]:
trainJerteh355.test_model_local(0, "POS")

[[4399    9]
 [  12   25]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      4408
           1       0.74      0.68      0.70        37

    accuracy                           1.00      4445
   macro avg       0.87      0.84      0.85      4445
weighted avg       1.00      1.00      1.00      4445



In [9]:
trainJerteh355.upload_local_model_to_hub(0, "POS")


In [10]:
trainJerteh355.train_model(0, "NEG", eval="f1", epochs=32)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at jerteh/Jerteh-355 and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Map:   0%|          | 0/1334 [00:00<?, ? examples/s]

Map:   0%|          | 0/11998 [00:00<?, ? examples/s]


{'eval_loss': 0.022386685013771057, 'eval_f1': 0.5, 'eval_runtime': 5.8677, 'eval_samples_per_second': 227.346, 'eval_steps_per_second': 14.316, 'epoch': 1.0}





{'eval_loss': 0.03419999033212662, 'eval_f1': 0.625, 'eval_runtime': 5.8727, 'eval_samples_per_second': 227.152, 'eval_steps_per_second': 14.303, 'epoch': 2.0}





{'eval_loss': 0.0419187918305397, 'eval_f1': 0.64, 'eval_runtime': 5.9492, 'eval_samples_per_second': 224.231, 'eval_steps_per_second': 14.119, 'epoch': 3.0}





{'eval_loss': 0.039360471069812775, 'eval_f1': 0.6666666666666666, 'eval_runtime': 5.8672, 'eval_samples_per_second': 227.367, 'eval_steps_per_second': 14.317, 'epoch': 4.0}
{'train_runtime': 1081.6341, 'train_samples_per_second': 354.959, 'train_steps_per_second': 1.39, 'train_loss': 0.030993616327326348, 'epoch': 4.0}
Max memory allocated by tensors- before:
    5.01 GB
Max memory allocated by tensors- after:
    5.01 GB


In [11]:
trainJerteh355.upload_local_model_to_hub(0, "NEG")


In [12]:
trainJerteh355.test_model(0, "NEG")




[[4384    7]
 [  34   20]]
              precision    recall  f1-score   support

           0       0.99      1.00      1.00      4391
           1       0.74      0.37      0.49        54

    accuracy                           0.99      4445
   macro avg       0.87      0.68      0.74      4445
weighted avg       0.99      0.99      0.99      4445



## Iteration 2 - Training and Testing
In this section, we use the data from the 2nd iteration of the semi-automatic iterative algorithm for both Positive and Negative sentiment classification to train and test our BERT models.



In [13]:
trainJerteh355.train_model(2, "POS", eval="f1", epochs=32)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at jerteh/Jerteh-355 and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Map:   0%|          | 0/1349 [00:00<?, ? examples/s]

Map:   0%|          | 0/12137 [00:00<?, ? examples/s]


{'eval_loss': 0.046928539872169495, 'eval_f1': 0.24, 'eval_runtime': 5.9214, 'eval_samples_per_second': 227.817, 'eval_steps_per_second': 14.355, 'epoch': 0.99}





{'eval_loss': 0.06389033794403076, 'eval_f1': 0.41379310344827586, 'eval_runtime': 5.8272, 'eval_samples_per_second': 231.499, 'eval_steps_per_second': 14.587, 'epoch': 2.0}





{'eval_loss': 0.05569850653409958, 'eval_f1': 0.6111111111111112, 'eval_runtime': 5.8459, 'eval_samples_per_second': 230.76, 'eval_steps_per_second': 14.54, 'epoch': 2.99}





{'eval_loss': 0.06108378246426582, 'eval_f1': 0.6842105263157895, 'eval_runtime': 5.9207, 'eval_samples_per_second': 227.843, 'eval_steps_per_second': 14.356, 'epoch': 4.0}
{'train_runtime': 1085.829, 'train_samples_per_second': 357.684, 'train_steps_per_second': 1.385, 'train_loss': 0.039176060024060703, 'epoch': 4.0}
Max memory allocated by tensors- before:
    5.01 GB
Max memory allocated by tensors- after:
    5.01 GB


In [14]:
trainJerteh355.test_model_local(2, "POS")

[[4430    0]
 [  54   12]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      4430
           1       1.00      0.18      0.31        66

    accuracy                           0.99      4496
   macro avg       0.99      0.59      0.65      4496
weighted avg       0.99      0.99      0.98      4496



In [15]:
trainJerteh355.upload_local_model_to_hub(2, "POS")


In [16]:
trainJerteh355.test_model(2, "POS")




[[4430    0]
 [  54   12]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      4430
           1       1.00      0.18      0.31        66

    accuracy                           0.99      4496
   macro avg       0.99      0.59      0.65      4496
weighted avg       0.99      0.99      0.98      4496



In [17]:
trainJerteh355.train_model(2, "NEG", eval="f1", epochs=32)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at jerteh/Jerteh-355 and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Map:   0%|          | 0/1349 [00:00<?, ? examples/s]

Map:   0%|          | 0/12137 [00:00<?, ? examples/s]


{'eval_loss': 0.0331762321293354, 'eval_f1': 0.6363636363636364, 'eval_runtime': 5.8505, 'eval_samples_per_second': 230.579, 'eval_steps_per_second': 14.529, 'epoch': 0.99}





{'eval_loss': 0.038763415068387985, 'eval_f1': 0.6666666666666666, 'eval_runtime': 5.837, 'eval_samples_per_second': 231.11, 'eval_steps_per_second': 14.562, 'epoch': 2.0}





{'eval_loss': 0.036288246512413025, 'eval_f1': 0.6666666666666666, 'eval_runtime': 5.9349, 'eval_samples_per_second': 227.299, 'eval_steps_per_second': 14.322, 'epoch': 2.99}





{'eval_loss': 0.04672776907682419, 'eval_f1': 0.6222222222222222, 'eval_runtime': 5.8555, 'eval_samples_per_second': 230.381, 'eval_steps_per_second': 14.516, 'epoch': 4.0}
{'train_runtime': 1062.5418, 'train_samples_per_second': 365.523, 'train_steps_per_second': 1.415, 'train_loss': 0.03277868220680638, 'epoch': 4.0}
Max memory allocated by tensors- before:
    5.01 GB
Max memory allocated by tensors- after:
    5.01 GB


In [18]:
trainJerteh355.test_model_local(2, "NEG")

[[4396   24]
 [  32   44]]
              precision    recall  f1-score   support

           0       0.99      0.99      0.99      4420
           1       0.65      0.58      0.61        76

    accuracy                           0.99      4496
   macro avg       0.82      0.79      0.80      4496
weighted avg       0.99      0.99      0.99      4496



In [19]:
trainJerteh355.upload_local_model_to_hub(2, "NEG")


In [20]:
trainJerteh355.test_model(2, "NEG")




[[4396   24]
 [  32   44]]
              precision    recall  f1-score   support

           0       0.99      0.99      0.99      4420
           1       0.65      0.58      0.61        76

    accuracy                           0.99      4496
   macro avg       0.82      0.79      0.80      4496
weighted avg       0.99      0.99      0.99      4496



## Iteration 4 - Training and Testing
In this section, we use the data from the 4th iteration of the semi-automatic iterative algorithm for both Positive and Negative sentiment classification to train and test our BERT models.


In [21]:
trainJerteh355.train_model(4, "POS", eval="f1", epochs=32)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at jerteh/Jerteh-355 and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Map:   0%|          | 0/1352 [00:00<?, ? examples/s]

Map:   0%|          | 0/12163 [00:00<?, ? examples/s]


{'eval_loss': 0.049195993691682816, 'eval_f1': 0.24, 'eval_runtime': 5.8586, 'eval_samples_per_second': 230.772, 'eval_steps_per_second': 14.509, 'epoch': 0.98}





{'eval_loss': 0.04398298263549805, 'eval_f1': 0.5142857142857142, 'eval_runtime': 5.8151, 'eval_samples_per_second': 232.498, 'eval_steps_per_second': 14.617, 'epoch': 1.99}





{'eval_loss': 0.05767863988876343, 'eval_f1': 0.45714285714285713, 'eval_runtime': 5.8532, 'eval_samples_per_second': 230.983, 'eval_steps_per_second': 14.522, 'epoch': 2.99}





{'eval_loss': 0.06107961758971214, 'eval_f1': 0.47368421052631576, 'eval_runtime': 5.8665, 'eval_samples_per_second': 230.46, 'eval_steps_per_second': 14.489, 'epoch': 4.0}





{'eval_loss': 0.07606474310159683, 'eval_f1': 0.3888888888888889, 'eval_runtime': 5.9099, 'eval_samples_per_second': 228.767, 'eval_steps_per_second': 14.383, 'epoch': 4.98}
{'train_runtime': 1366.7288, 'train_samples_per_second': 284.779, 'train_steps_per_second': 1.1, 'train_loss': 0.03343838603556657, 'epoch': 4.98}



Max memory allocated by tensors- before:
    5.01 GB
Max memory allocated by tensors- after:
    5.01 GB


In [22]:
trainJerteh355.test_model(4, "POS")




[[4428    6]
 [  40   32]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      4434
           1       0.84      0.44      0.58        72

    accuracy                           0.99      4506
   macro avg       0.92      0.72      0.79      4506
weighted avg       0.99      0.99      0.99      4506



In [23]:
trainJerteh355.train_model(4, "NEG", eval="f1", epochs=32)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at jerteh/Jerteh-355 and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Map:   0%|          | 0/1352 [00:00<?, ? examples/s]

Map:   0%|          | 0/12163 [00:00<?, ? examples/s]


{'eval_loss': 0.05004332214593887, 'eval_f1': 0.5454545454545454, 'eval_runtime': 6.0828, 'eval_samples_per_second': 222.267, 'eval_steps_per_second': 13.974, 'epoch': 0.98}





{'eval_loss': 0.05249148607254028, 'eval_f1': 0.5945945945945946, 'eval_runtime': 6.089, 'eval_samples_per_second': 222.039, 'eval_steps_per_second': 13.96, 'epoch': 1.99}





{'eval_loss': 0.05133790895342827, 'eval_f1': 0.5641025641025641, 'eval_runtime': 6.0025, 'eval_samples_per_second': 225.239, 'eval_steps_per_second': 14.161, 'epoch': 2.99}





{'eval_loss': 0.06241669878363609, 'eval_f1': 0.5454545454545454, 'eval_runtime': 6.1621, 'eval_samples_per_second': 219.406, 'eval_steps_per_second': 13.794, 'epoch': 4.0}
{'train_runtime': 1030.1468, 'train_samples_per_second': 377.826, 'train_steps_per_second': 1.46, 'train_loss': 0.04330951630757117, 'epoch': 4.0}
Max memory allocated by tensors- before:
    5.01 GB
Max memory allocated by tensors- after:
    5.01 GB


In [24]:
trainJerteh355.test_model(4, "NEG")




[[4418    8]
 [  55   25]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      4426
           1       0.76      0.31      0.44        80

    accuracy                           0.99      4506
   macro avg       0.87      0.66      0.72      4506
weighted avg       0.98      0.99      0.98      4506



## Iteration 6 - Training and Testing
In this section, we use the data from the 6th iteration of the semi-automatic iterative algorithm for both Positive and Negative sentiment classification to train and test our BERT models.


In [25]:
trainJerteh355.train_model(6, "POS", eval="f1", epochs=32)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at jerteh/Jerteh-355 and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Map:   0%|          | 0/1354 [00:00<?, ? examples/s]

Map:   0%|          | 0/12177 [00:00<?, ? examples/s]


{'eval_loss': 0.060339123010635376, 'eval_f1': 0.24, 'eval_runtime': 5.644, 'eval_samples_per_second': 239.903, 'eval_steps_per_second': 15.06, 'epoch': 0.98}





{'eval_loss': 0.05143671855330467, 'eval_f1': 0.4, 'eval_runtime': 5.6602, 'eval_samples_per_second': 239.215, 'eval_steps_per_second': 15.017, 'epoch': 1.99}





{'eval_loss': 0.05124771222472191, 'eval_f1': 0.5581395348837209, 'eval_runtime': 5.6551, 'eval_samples_per_second': 239.43, 'eval_steps_per_second': 15.031, 'epoch': 2.99}





{'eval_loss': 0.06368540972471237, 'eval_f1': 0.27586206896551724, 'eval_runtime': 5.6474, 'eval_samples_per_second': 239.755, 'eval_steps_per_second': 15.051, 'epoch': 4.0}





{'eval_loss': 0.05702666938304901, 'eval_f1': 0.5294117647058824, 'eval_runtime': 5.6574, 'eval_samples_per_second': 239.331, 'eval_steps_per_second': 15.024, 'epoch': 4.98}





{'eval_loss': 0.10350248217582703, 'eval_f1': 0.4375, 'eval_runtime': 5.6537, 'eval_samples_per_second': 239.49, 'eval_steps_per_second': 15.034, 'epoch': 5.99}
{'train_runtime': 1588.5917, 'train_samples_per_second': 245.289, 'train_steps_per_second': 0.947, 'train_loss': 0.031313852830366654, 'epoch': 5.99}



Max memory allocated by tensors- before:
    5.01 GB
Max memory allocated by tensors- after:
    5.01 GB


In [26]:
trainJerteh355.test_model(6, "POS")




[[4418   20]
 [  40   33]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      4438
           1       0.62      0.45      0.52        73

    accuracy                           0.99      4511
   macro avg       0.81      0.72      0.76      4511
weighted avg       0.99      0.99      0.99      4511



In [28]:
trainJerteh355.train_model(6, "NEG", eval="f1", epochs=32)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at jerteh/Jerteh-355 and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Map:   0%|          | 0/1354 [00:00<?, ? examples/s]

Map:   0%|          | 0/12177 [00:00<?, ? examples/s]


{'eval_loss': 0.04871431365609169, 'eval_f1': 0.4878048780487805, 'eval_runtime': 6.271, 'eval_samples_per_second': 215.916, 'eval_steps_per_second': 13.555, 'epoch': 0.98}





{'eval_loss': 0.044503096491098404, 'eval_f1': 0.47368421052631576, 'eval_runtime': 6.4862, 'eval_samples_per_second': 208.75, 'eval_steps_per_second': 13.105, 'epoch': 1.99}





{'eval_loss': 0.05655065178871155, 'eval_f1': 0.5306122448979592, 'eval_runtime': 6.518, 'eval_samples_per_second': 207.733, 'eval_steps_per_second': 13.041, 'epoch': 2.99}





{'eval_loss': 0.05942423269152641, 'eval_f1': 0.5306122448979592, 'eval_runtime': 6.4963, 'eval_samples_per_second': 208.426, 'eval_steps_per_second': 13.084, 'epoch': 4.0}





{'eval_loss': 0.0636511817574501, 'eval_f1': 0.5490196078431373, 'eval_runtime': 6.6391, 'eval_samples_per_second': 203.943, 'eval_steps_per_second': 12.803, 'epoch': 4.98}
{'train_runtime': 1328.3446, 'train_samples_per_second': 293.346, 'train_steps_per_second': 1.132, 'train_loss': 0.035773181113876215, 'epoch': 4.98}



Max memory allocated by tensors- before:
    7.66 GB
Max memory allocated by tensors- after:
    7.66 GB


In [29]:
trainJerteh355.test_model(6, "NEG")




[[4423    4]
 [  58   26]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      4427
           1       0.87      0.31      0.46        84

    accuracy                           0.99      4511
   macro avg       0.93      0.65      0.72      4511
weighted avg       0.98      0.99      0.98      4511
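
Since one stated goal of the persistent reports is to compare models across iterations, the class-1 F1 scores can be tabulated side by side. The sketch below copies the confusion matrices printed by the test calls in this notebook and recomputes F1 from them; the `RESULTS` dictionary layout and the helper `f1_from_confusion` are illustrative choices, not part of `trainJerteh355`.

```python
# Confusion matrices [[tn, fp], [fn, tp]] copied from the test outputs above,
# keyed by (iteration, polarity).
RESULTS = {
    (0, "POS"): [[4399, 9], [12, 25]],
    (0, "NEG"): [[4384, 7], [34, 20]],
    (2, "POS"): [[4430, 0], [54, 12]],
    (2, "NEG"): [[4396, 24], [32, 44]],
    (4, "POS"): [[4428, 6], [40, 32]],
    (4, "NEG"): [[4418, 8], [55, 25]],
    (6, "POS"): [[4418, 20], [40, 33]],
    (6, "NEG"): [[4423, 4], [58, 26]],
}


def f1_from_confusion(confusion):
    """Class-1 F1 score from a [[tn, fp], [fn, tp]] matrix."""
    (_, fp), (fn, tp) = confusion
    return 2 * tp / (2 * tp + fp + fn)


# Print one line per run, ordered by iteration and then polarity.
for (iteration, polarity), matrix in sorted(RESULTS.items()):
    print(f"iteration {iteration} {polarity}: F1 = {f1_from_confusion(matrix):.2f}")
```

In these runs the iteration 0 POS model has the highest class-1 F1 (0.70) and the iteration 2 POS model the lowest (0.31), the latter driven by a recall of only 0.18 despite perfect precision.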

