# Fine-Tuning and Performance Evaluation of GPT2-Orao based Models for Sentiment Classification with Early Stopping

In this Jupyter notebook, we cover two crucial steps of our Natural Language Processing (NLP) project: fine-tuning our pre-trained GPT2 models on the Serbian Wordnet training data and assessing their performance.

Our primary objective is to adapt GPT2 models to effectively classify sentiments, leveraging a semi-automated, iterative approach that uses seed words and expands them based on their relationships in WordNet. 

The performance evaluation metrics are instrumental in assessing the success of our fine-tuning process. We will analyze these metrics in two ways:

1. **In-notebook Review:** For an immediate performance evaluation, we will print the confusion matrix and classification reports within this notebook.

2. **Persistent Reports:** We'll create a lasting record of our results by storing these metrics in a separate 'reports' folder. This approach facilitates progress tracking over time and enables comparisons among different models and fine-tuning iterations.
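As an illustration of the persistent-report idea, here is a minimal sketch that writes a classification report and confusion matrix to a JSON file in a `reports` folder. The function name `save_report`, its parameters, and the JSON layout are assumptions made for this example; they are not taken from the `trainSRGPT` module.

```python
import json
from pathlib import Path


def save_report(report: dict, confusion: list, name: str,
                reports_dir: str = "reports") -> Path:
    """Write one evaluation result to <reports_dir>/<name>.json (hypothetical helper)."""
    out_dir = Path(reports_dir)
    # Create the folder on first use; do nothing if it already exists.
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{name}.json"
    payload = {"classification_report": report, "confusion_matrix": confusion}
    out_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return out_path
```

Using one file per model and iteration (for example a name such as `POS_iter0`, chosen here only for illustration) would keep later comparisons simple, since each run can be loaded back with `json.loads`.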

Keep in mind that the fine-tuning and evaluation processes are iterative. Based on our results and insights, we may need to adjust our strategies and fine-tune our models differently.

Throughout this notebook, we will go through:

1. **Model Training:** Execution of Python scripts for fine-tuning our GPT2 models on the training set.
2. **Model Testing:** Performance evaluation of the newly fine-tuned models on our test data.
3. **Results Analysis:** Examination, interpretation, and storage of the confusion matrices and classification reports.

In our previous work, we fine-tuned our BERT models for sentiment classification on the Serbian Wordnet training data. However, the models appeared to be overfitting. Overfitting is a common problem in machine learning where a model learns the training data too well, essentially memorizing it rather than generalizing from it. As a result, it performs poorly on unseen data, which is a serious problem if we want our models to be applicable to real-world data.

To overcome this issue, we're going to introduce early stopping in this notebook. Early stopping is a method used to prevent overfitting by ending the training process before the learner passes a certain point of over-specialization, i.e., before the model starts to overfit.
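The stopping rule can be sketched in plain Python. This is a minimal illustration of patience-based early stopping, not the code used inside `trainSRGPT`: the function name, the patience value, and the validation scores are all hypothetical.

```python
from typing import Optional, Sequence


def early_stopping_epoch(scores: Sequence[float], patience: int) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None if it never stops.

    Training stops once the validation score has failed to improve on the
    best score seen so far for `patience` consecutive evaluations.
    """
    best = float("-inf")
    bad_epochs = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best:
            best = score
            bad_epochs = 0  # improvement: reset the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


# Hypothetical validation F1 scores, one per epoch.
val_f1 = [0.40, 0.45, 0.44, 0.43, 0.42, 0.41]
print(early_stopping_epoch(val_f1, patience=3))  # stops at epoch 5; the best epoch was 2
```

In the Hugging Face `transformers` library, the same rule is available as `EarlyStoppingCallback(early_stopping_patience=...)`, which is used together with `load_best_model_at_end=True` and `metric_for_best_model` in the training arguments. Whether `trainSRGPT` relies on that callback, and with which patience, is not shown in this notebook.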

We'll again fine-tune GPT2-Orao based models, but this time we'll include an early stopping line in our trainer call.
Let's get started!


### Importing Required Modules

In this initial code cell, we import the module that contains the functions for training and testing our GPT2 models:

1. **`trainSRGPT`:** This module contains the `train_model` and `test_model` functions, which handle the training and testing processes respectively. The GPT2 model used in this module is the "Jerteh" GPT2 - Orao model, which is pre-trained exclusively on the Serbian language using a GPT2 architecture and is tailored to the specificities of Serbian. The module manages everything from data preprocessing to model training, testing, and memory management for GPU use. The functions `test_model_local` and `upload_local_model_to_hub` were added because some uploads had been incorrect; with them, we can check that the local model and the model uploaded to the Hub are the same.



In [1]:
import trainSRGPT

## Iteration 0 - Training and Testing
In this section, we use the data from the 0th iteration of the semi-automatic iterative algorithm for both Positive and Negative sentiment classification to train and test our GPT2 models.


In [2]:
trainSRGPT.train_model(0, "POS", eval="f1", epochs=32)

0


Map:   0%|          | 0/2667 [00:00<?, ? examples/s]

Map:   0%|          | 0/10665 [00:00<?, ? examples/s]

Cloning https://huggingface.co/Tanor/SRGPTSENTPOS0 into local empty directory.


Download file pytorch_model.bin:   0%|          | 8.15k/2.88G [00:00<?, ?B/s]

Download file training_args.bin: 100%|##########| 4.30k/4.30k [00:00<?, ?B/s]

Clean file training_args.bin:  23%|##3       | 1.00k/4.30k [00:00<?, ?B/s]

  0%|          | 0/85312 [00:00<?, ?it/s]

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


{'loss': 0.0789, 'learning_rate': 1.9882783195798953e-05, 'epoch': 0.19}
{'loss': 0.0457, 'learning_rate': 1.97655663915979e-05, 'epoch': 0.38}
{'loss': 0.0524, 'learning_rate': 1.964834958739685e-05, 'epoch': 0.56}
{'loss': 0.0416, 'learning_rate': 1.9531132783195802e-05, 'epoch': 0.75}
{'loss': 0.0315, 'learning_rate': 1.941391597899475e-05, 'epoch': 0.94}


  0%|          | 0/2667 [00:00<?, ?it/s]

{'eval_loss': 0.12009615451097488, 'eval_f1': 0.3666666666666667, 'eval_runtime': 57.8955, 'eval_samples_per_second': 46.066, 'eval_steps_per_second': 46.066, 'epoch': 1.0}




{'loss': 0.0247, 'learning_rate': 1.92966991747937e-05, 'epoch': 1.13}
{'loss': 0.0377, 'learning_rate': 1.917948237059265e-05, 'epoch': 1.31}
{'loss': 0.0225, 'learning_rate': 1.90622655663916e-05, 'epoch': 1.5}
{'loss': 0.043, 'learning_rate': 1.894504876219055e-05, 'epoch': 1.69}
{'loss': 0.0279, 'learning_rate': 1.88278319579895e-05, 'epoch': 1.88}


  0%|          | 0/2667 [00:00<?, ?it/s]

{'eval_loss': 0.10047691315412521, 'eval_f1': 0.25641025641025644, 'eval_runtime': 58.9911, 'eval_samples_per_second': 45.21, 'eval_steps_per_second': 45.21, 'epoch': 2.0}




{'loss': 0.0057, 'learning_rate': 1.8710615153788448e-05, 'epoch': 2.06}
{'loss': 0.0182, 'learning_rate': 1.85933983495874e-05, 'epoch': 2.25}
{'loss': 0.0567, 'learning_rate': 1.847618154538635e-05, 'epoch': 2.44}
{'loss': 0.0038, 'learning_rate': 1.8358964741185298e-05, 'epoch': 2.63}
{'loss': 0.0169, 'learning_rate': 1.8241747936984245e-05, 'epoch': 2.81}


  0%|          | 0/2667 [00:00<?, ?it/s]

{'eval_loss': 0.11446953564882278, 'eval_f1': 0.24242424242424246, 'eval_runtime': 60.327, 'eval_samples_per_second': 44.209, 'eval_steps_per_second': 44.209, 'epoch': 3.0}




{'loss': 0.0344, 'learning_rate': 1.8124531132783196e-05, 'epoch': 3.0}
{'loss': 0.003, 'learning_rate': 1.8007314328582147e-05, 'epoch': 3.19}
{'loss': 0.0099, 'learning_rate': 1.7890097524381094e-05, 'epoch': 3.38}
{'loss': 0.0033, 'learning_rate': 1.7772880720180045e-05, 'epoch': 3.56}
{'loss': 0.0059, 'learning_rate': 1.7655663915978996e-05, 'epoch': 3.75}
{'loss': 0.0131, 'learning_rate': 1.7538447111777944e-05, 'epoch': 3.94}


  0%|          | 0/2667 [00:00<?, ?it/s]

{'eval_loss': 0.12751102447509766, 'eval_f1': 0.1935483870967742, 'eval_runtime': 58.7251, 'eval_samples_per_second': 45.415, 'eval_steps_per_second': 45.415, 'epoch': 4.0}




{'loss': 0.0, 'learning_rate': 1.7421230307576895e-05, 'epoch': 4.13}
{'loss': 0.0051, 'learning_rate': 1.7304013503375846e-05, 'epoch': 4.31}
{'loss': 0.0, 'learning_rate': 1.7186796699174793e-05, 'epoch': 4.5}
{'loss': 0.009, 'learning_rate': 1.7069579894973744e-05, 'epoch': 4.69}
{'loss': 0.011, 'learning_rate': 1.6952363090772695e-05, 'epoch': 4.88}


  0%|          | 0/2667 [00:00<?, ?it/s]

{'eval_loss': 0.13059721887111664, 'eval_f1': 0.07692307692307693, 'eval_runtime': 58.7783, 'eval_samples_per_second': 45.374, 'eval_steps_per_second': 45.374, 'epoch': 5.0}
{'train_runtime': 10546.1245, 'train_samples_per_second': 32.361, 'train_steps_per_second': 8.089, 'train_loss': 0.022838059998659233, 'epoch': 5.0}


Several commits (2) will be pushed upstream.
The progress bars may be unreliable.


Upload file pytorch_model.bin:   0%|          | 1.00/2.88G [00:00<?, ?B/s]

To https://huggingface.co/Tanor/SRGPTSENTPOS0
   6be9a5a..3fdbc0f  main -> main

To https://huggingface.co/Tanor/SRGPTSENTPOS0
   3fdbc0f..29fd26c  main -> main



Max memory allocated by tensors:
    6.42 GB
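The `learning_rate` values in the log above can be reproduced with a simple linear decay. The sketch below rests on assumptions inferred from the log rather than from the `trainSRGPT` source: a base learning rate of 2e-5, no warmup, 85312 total optimizer steps (the progress-bar total), and one log entry every 500 steps.

```python
def linear_decay_lr(step: int, total_steps: int, base_lr: float = 2e-5) -> float:
    """Learning rate after `step` optimizer steps under linear decay to zero.

    base_lr and the absence of warmup are assumptions inferred from the log.
    """
    return base_lr * max(0.0, 1.0 - step / total_steps)


total_steps = 85312  # progress-bar total shown in the training output
for step in (500, 1000, 1500):
    print(step, linear_decay_lr(step, total_steps))
```

Under these assumptions, step 500 gives about 1.98828e-05 and step 1000 about 1.97656e-05, in line with the first two logged values. The total is also consistent with 32 epochs of 2666 optimizer steps each, that is, 10665 training examples at four examples per step.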


In [None]:
trainSRGPT.test_model_local(0, "POS")

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


[[4392   16]
 [  17   20]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      4408
           1       0.56      0.54      0.55        37

    accuracy                           0.99      4445
   macro avg       0.78      0.77      0.77      4445
weighted avg       0.99      0.99      0.99      4445
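
The report above can be checked by hand from the confusion matrix. The following sketch assumes that rows are true labels and columns are predicted labels, which is consistent with the `support` column (4392 + 16 = 4408 and 17 + 20 = 37). The helper name `binary_metrics` is introduced here only for illustration.

```python
def binary_metrics(cm):
    """Compute metrics for the positive class (label 1) from a 2x2 confusion matrix.

    Assumes cm = [[TN, FP], [FN, TP]], i.e. rows are true labels and
    columns are predicted labels.
    """
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


metrics = binary_metrics([[4392, 16], [17, 20]])
print({name: round(value, 2) for name, value in metrics.items()})
# {'precision': 0.56, 'recall': 0.54, 'f1': 0.55, 'accuracy': 0.99}
```

The gap between the 0.99 accuracy and the 0.55 F1 score for class 1 reflects the class imbalance: only 37 of the 4445 test examples belong to class 1, so accuracy is dominated by class 0.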



In [None]:
trainSRGPT.upload_local_model_to_hub(0, "POS")

pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

In [2]:
trainSRGPT.test_model(0, "POS")

    PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.2.0.dev20230928)
    Python  3.11.5 (you have 3.11.4)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details


[[4392   16]
 [  17   20]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      4408
           1       0.56      0.54      0.55        37

    accuracy                           0.99      4445
   macro avg       0.78      0.77      0.77      4445
weighted avg       0.99      0.99      0.99      4445



In [22]:
trainSRGPT.train_model(0, "NEG", eval="f1", epochs=32)

6897376768


Map:   0%|          | 0/2667 [00:00<?, ? examples/s]

Map:   0%|          | 0/10665 [00:00<?, ? examples/s]

Cloning https://huggingface.co/Tanor/SRGPTSENTNEG0 into local empty directory.


Download file pytorch_model.bin:   0%|          | 16.4k/2.88G [00:00<?, ?B/s]

Download file training_args.bin: 100%|##########| 4.30k/4.30k [00:00<?, ?B/s]

Clean file training_args.bin:  23%|##3       | 1.00k/4.30k [00:00<?, ?B/s]

  0%|          | 0/85312 [00:00<?, ?it/s]

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


{'loss': 0.0857, 'learning_rate': 1.9882783195798953e-05, 'epoch': 0.19}
{'loss': 0.0721, 'learning_rate': 1.97655663915979e-05, 'epoch': 0.38}
{'loss': 0.0439, 'learning_rate': 1.964834958739685e-05, 'epoch': 0.56}
{'loss': 0.0735, 'learning_rate': 1.9531132783195802e-05, 'epoch': 0.75}
{'loss': 0.0521, 'learning_rate': 1.941391597899475e-05, 'epoch': 0.94}


  0%|          | 0/2667 [00:00<?, ?it/s]

{'eval_loss': 0.09305841475725174, 'eval_f1': 0.4358974358974359, 'eval_runtime': 145.7993, 'eval_samples_per_second': 18.292, 'eval_steps_per_second': 18.292, 'epoch': 1.0}




{'loss': 0.053, 'learning_rate': 1.92966991747937e-05, 'epoch': 1.13}
{'loss': 0.0446, 'learning_rate': 1.917948237059265e-05, 'epoch': 1.31}
{'loss': 0.028, 'learning_rate': 1.90622655663916e-05, 'epoch': 1.5}
{'loss': 0.0602, 'learning_rate': 1.894504876219055e-05, 'epoch': 1.69}
{'loss': 0.0511, 'learning_rate': 1.88278319579895e-05, 'epoch': 1.88}


  0%|          | 0/2667 [00:00<?, ?it/s]

{'eval_loss': 0.12556378543376923, 'eval_f1': 0.45614035087719296, 'eval_runtime': 148.2646, 'eval_samples_per_second': 17.988, 'eval_steps_per_second': 17.988, 'epoch': 2.0}




{'loss': 0.0502, 'learning_rate': 1.8710615153788448e-05, 'epoch': 2.06}
{'loss': 0.0205, 'learning_rate': 1.85933983495874e-05, 'epoch': 2.25}
{'loss': 0.0453, 'learning_rate': 1.847618154538635e-05, 'epoch': 2.44}
{'loss': 0.0194, 'learning_rate': 1.8358964741185298e-05, 'epoch': 2.63}
{'loss': 0.0173, 'learning_rate': 1.8241747936984245e-05, 'epoch': 2.81}


  0%|          | 0/2667 [00:00<?, ?it/s]

{'eval_loss': 0.18492776155471802, 'eval_f1': 0.41935483870967744, 'eval_runtime': 152.5787, 'eval_samples_per_second': 17.48, 'eval_steps_per_second': 17.48, 'epoch': 3.0}




{'loss': 0.0123, 'learning_rate': 1.8124531132783196e-05, 'epoch': 3.0}
{'loss': 0.0038, 'learning_rate': 1.8007314328582147e-05, 'epoch': 3.19}
{'loss': 0.0, 'learning_rate': 1.7890097524381094e-05, 'epoch': 3.38}
{'loss': 0.0079, 'learning_rate': 1.7772880720180045e-05, 'epoch': 3.56}
{'loss': 0.011, 'learning_rate': 1.7655663915978996e-05, 'epoch': 3.75}
{'loss': 0.0091, 'learning_rate': 1.7538447111777944e-05, 'epoch': 3.94}


  0%|          | 0/2667 [00:00<?, ?it/s]

{'eval_loss': 0.1939464956521988, 'eval_f1': 0.34782608695652173, 'eval_runtime': 143.395, 'eval_samples_per_second': 18.599, 'eval_steps_per_second': 18.599, 'epoch': 4.0}
{'train_runtime': 14574.2518, 'train_samples_per_second': 23.417, 'train_steps_per_second': 5.854, 'train_loss': 0.03567274771457086, 'epoch': 4.0}


Several commits (2) will be pushed upstream.
The progress bars may be unreliable.


Upload file pytorch_model.bin:   0%|          | 1.00/2.88G [00:00<?, ?B/s]

remote: error: cannot lock ref 'refs/heads/main': is at a389eea1e06718e42191d0023420c3a1cb0a55cb but expected d6d012c115db433bfc7a5feb851c584f9402c310        
To https://huggingface.co/Tanor/SRGPTSENTNEG0
 ! [remote rejected] main -> main (failed to update ref)
error: failed to push some refs to 'https://huggingface.co/Tanor/SRGPTSENTNEG0'



Push attempt 1 failed with error: remote: error: cannot lock ref 'refs/heads/main': is at a389eea1e06718e42191d0023420c3a1cb0a55cb but expected d6d012c115db433bfc7a5feb851c584f9402c310        
To https://huggingface.co/Tanor/SRGPTSENTNEG0
 ! [remote rejected] main -> main (failed to update ref)
error: failed to push some refs to 'https://huggingface.co/Tanor/SRGPTSENTNEG0'



Several commits (2) will be pushed upstream.
The progress bars may be unreliable.
To https://huggingface.co/Tanor/SRGPTSENTNEG0
   a389eea..acf49c4  main -> main



Max memory allocated by tensors:
    6.42 GB


In [23]:
trainSRGPT.test_model_local(0, "NEG")

[[4370   21]
 [  30   24]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      4391
           1       0.53      0.44      0.48        54

    accuracy                           0.99      4445
   macro avg       0.76      0.72      0.74      4445
weighted avg       0.99      0.99      0.99      4445



In [None]:
trainSRGPT.upload_local_model_to_hub(0, "NEG")

pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

In [24]:
trainSRGPT.test_model(0, "NEG")

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.11k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.17M [00:00<?, ?B/s]

[[4370   21]
 [  30   24]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      4391
           1       0.53      0.44      0.48        54

    accuracy                           0.99      4445
   macro avg       0.76      0.72      0.74      4445
weighted avg       0.99      0.99      0.99      4445



## Iteration 2 - Training and Testing
In this section, we use the data from the 2nd iteration of the semi-automatic iterative algorithm for both Positive and Negative sentiment classification to train and test our GPT2 models.


In [2]:
trainSRGPT.train_model(2, "POS", eval="f1", epochs=32)

0


Map:   0%|          | 0/2698 [00:00<?, ? examples/s]

Map:   0%|          | 0/10788 [00:00<?, ? examples/s]

Cloning https://huggingface.co/Tanor/SRGPTSENTPOS2 into local empty directory.


Download file pytorch_model.bin:   0%|          | 6.25k/2.88G [00:00<?, ?B/s]

Download file training_args.bin: 100%|##########| 4.30k/4.30k [00:00<?, ?B/s]

Clean file training_args.bin:  23%|##3       | 1.00k/4.30k [00:00<?, ?B/s]

  0%|          | 0/86304 [00:00<?, ?it/s]

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


{'loss': 0.0521, 'learning_rate': 1.988413051538747e-05, 'epoch': 0.19}
{'loss': 0.0312, 'learning_rate': 1.9768261030774935e-05, 'epoch': 0.37}
{'loss': 0.0359, 'learning_rate': 1.9652391546162403e-05, 'epoch': 0.56}
{'loss': 0.0178, 'learning_rate': 1.953652206154987e-05, 'epoch': 0.74}
{'loss': 0.0314, 'learning_rate': 1.942065257693734e-05, 'epoch': 0.93}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.16743789613246918, 'eval_f1': 0.5111111111111112, 'eval_runtime': 155.7633, 'eval_samples_per_second': 17.321, 'eval_steps_per_second': 17.321, 'epoch': 1.0}




{'loss': 0.0206, 'learning_rate': 1.9304783092324804e-05, 'epoch': 1.11}
{'loss': 0.028, 'learning_rate': 1.9188913607712273e-05, 'epoch': 1.3}
{'loss': 0.023, 'learning_rate': 1.907304412309974e-05, 'epoch': 1.48}
{'loss': 0.0167, 'learning_rate': 1.895717463848721e-05, 'epoch': 1.67}
{'loss': 0.0236, 'learning_rate': 1.8841305153874677e-05, 'epoch': 1.85}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.16873560845851898, 'eval_f1': 0.4307692307692308, 'eval_runtime': 159.0412, 'eval_samples_per_second': 16.964, 'eval_steps_per_second': 16.964, 'epoch': 2.0}




{'loss': 0.0214, 'learning_rate': 1.8725435669262146e-05, 'epoch': 2.04}
{'loss': 0.0058, 'learning_rate': 1.8609566184649614e-05, 'epoch': 2.22}
{'loss': 0.0127, 'learning_rate': 1.849369670003708e-05, 'epoch': 2.41}
{'loss': 0.0173, 'learning_rate': 1.8377827215424547e-05, 'epoch': 2.6}
{'loss': 0.0094, 'learning_rate': 1.8261957730812015e-05, 'epoch': 2.78}
{'loss': 0.0407, 'learning_rate': 1.8146088246199484e-05, 'epoch': 2.97}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.15711432695388794, 'eval_f1': 0.4, 'eval_runtime': 59.5318, 'eval_samples_per_second': 45.32, 'eval_steps_per_second': 45.32, 'epoch': 3.0}




{'loss': 0.0205, 'learning_rate': 1.8030218761586952e-05, 'epoch': 3.15}
{'loss': 0.0128, 'learning_rate': 1.7914349276974417e-05, 'epoch': 3.34}
{'loss': 0.0119, 'learning_rate': 1.7798479792361885e-05, 'epoch': 3.52}
{'loss': 0.0112, 'learning_rate': 1.7682610307749353e-05, 'epoch': 3.71}
{'loss': 0.0086, 'learning_rate': 1.756674082313682e-05, 'epoch': 3.89}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.34660327434539795, 'eval_f1': 0.34374999999999994, 'eval_runtime': 152.1019, 'eval_samples_per_second': 17.738, 'eval_steps_per_second': 17.738, 'epoch': 4.0}


Several commits (2) will be pushed upstream.


{'loss': 0.0066, 'learning_rate': 1.7450871338524286e-05, 'epoch': 4.08}
{'loss': 0.0156, 'learning_rate': 1.7335001853911755e-05, 'epoch': 4.26}
{'loss': 0.015, 'learning_rate': 1.7219132369299223e-05, 'epoch': 4.45}
{'loss': 0.0212, 'learning_rate': 1.710326288468669e-05, 'epoch': 4.63}
{'loss': 0.007, 'learning_rate': 1.6987393400074156e-05, 'epoch': 4.82}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.19816473126411438, 'eval_f1': 0.37499999999999994, 'eval_runtime': 142.8261, 'eval_samples_per_second': 18.89, 'eval_steps_per_second': 18.89, 'epoch': 5.0}




{'loss': 0.0135, 'learning_rate': 1.6871523915461624e-05, 'epoch': 5.01}
{'loss': 0.0181, 'learning_rate': 1.6755654430849093e-05, 'epoch': 5.19}
{'loss': 0.0085, 'learning_rate': 1.663978494623656e-05, 'epoch': 5.38}
{'loss': 0.0323, 'learning_rate': 1.6523915461624026e-05, 'epoch': 5.56}
{'loss': 0.0044, 'learning_rate': 1.6408045977011494e-05, 'epoch': 5.75}
{'loss': 0.0091, 'learning_rate': 1.6292176492398962e-05, 'epoch': 5.93}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.20229972898960114, 'eval_f1': 0.34920634920634924, 'eval_runtime': 145.5307, 'eval_samples_per_second': 18.539, 'eval_steps_per_second': 18.539, 'epoch': 6.0}


Several commits (3) will be pushed upstream.


{'loss': 0.005, 'learning_rate': 1.617630700778643e-05, 'epoch': 6.12}
{'loss': 0.0, 'learning_rate': 1.60604375231739e-05, 'epoch': 6.3}
{'loss': 0.0047, 'learning_rate': 1.5944568038561367e-05, 'epoch': 6.49}
{'loss': 0.0112, 'learning_rate': 1.5828698553948835e-05, 'epoch': 6.67}
{'loss': 0.0071, 'learning_rate': 1.57128290693363e-05, 'epoch': 6.86}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.24766495823860168, 'eval_f1': 0.36781609195402304, 'eval_runtime': 134.184, 'eval_samples_per_second': 20.107, 'eval_steps_per_second': 20.107, 'epoch': 7.0}




{'loss': 0.0083, 'learning_rate': 1.559695958472377e-05, 'epoch': 7.04}
{'loss': 0.0, 'learning_rate': 1.5481090100111237e-05, 'epoch': 7.23}
{'loss': 0.0177, 'learning_rate': 1.5365220615498705e-05, 'epoch': 7.42}
{'loss': 0.0167, 'learning_rate': 1.5249351130886171e-05, 'epoch': 7.6}
{'loss': 0.0127, 'learning_rate': 1.5133481646273638e-05, 'epoch': 7.79}
{'loss': 0.0075, 'learning_rate': 1.5017612161661106e-05, 'epoch': 7.97}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.15560871362686157, 'eval_f1': 0.37499999999999994, 'eval_runtime': 139.1034, 'eval_samples_per_second': 19.396, 'eval_steps_per_second': 19.396, 'epoch': 8.0}




{'loss': 0.0093, 'learning_rate': 1.4901742677048574e-05, 'epoch': 8.16}
{'loss': 0.0021, 'learning_rate': 1.4785873192436043e-05, 'epoch': 8.34}
{'loss': 0.0128, 'learning_rate': 1.4670003707823508e-05, 'epoch': 8.53}
{'loss': 0.0157, 'learning_rate': 1.4554134223210976e-05, 'epoch': 8.71}
{'loss': 0.0089, 'learning_rate': 1.4438264738598444e-05, 'epoch': 8.9}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.2155970335006714, 'eval_f1': 0.37288135593220345, 'eval_runtime': 141.4184, 'eval_samples_per_second': 19.078, 'eval_steps_per_second': 19.078, 'epoch': 9.0}
{'train_runtime': 31255.3326, 'train_samples_per_second': 11.045, 'train_steps_per_second': 2.761, 'train_loss': 0.015349580170902575, 'epoch': 9.0}


Several commits (2) will be pushed upstream.
The progress bars may be unreliable.


Upload file pytorch_model.bin:   0%|          | 1.00/2.88G [00:00<?, ?B/s]

remote: error: cannot lock ref 'refs/heads/main': is at 48a972d7f8072ea4f4363b072243bc65e0fa98da but expected cbf7e352c17137cdba43e4b2e8e8cd0928934858        
To https://huggingface.co/Tanor/SRGPTSENTPOS2
 ! [remote rejected] main -> main (failed to update ref)
error: failed to push some refs to 'https://huggingface.co/Tanor/SRGPTSENTPOS2'



Push attempt 1 failed with error: remote: error: cannot lock ref 'refs/heads/main': is at 48a972d7f8072ea4f4363b072243bc65e0fa98da but expected cbf7e352c17137cdba43e4b2e8e8cd0928934858        
To https://huggingface.co/Tanor/SRGPTSENTPOS2
 ! [remote rejected] main -> main (failed to update ref)
error: failed to push some refs to 'https://huggingface.co/Tanor/SRGPTSENTPOS2'



Several commits (2) will be pushed upstream.
The progress bars may be unreliable.
To https://huggingface.co/Tanor/SRGPTSENTPOS2
   48a972d..4a8fd35  main -> main



Max memory allocated by tensors:
    6.42 GB


In [3]:
trainSRGPT.test_model_local(2, "POS")

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


[[4400   30]
 [  51   15]]
              precision    recall  f1-score   support

           0       0.99      0.99      0.99      4430
           1       0.33      0.23      0.27        66

    accuracy                           0.98      4496
   macro avg       0.66      0.61      0.63      4496
weighted avg       0.98      0.98      0.98      4496



In [4]:
trainSRGPT.upload_local_model_to_hub(2, "POS")

pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

In [5]:
trainSRGPT.test_model(2, "POS")

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.10k [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


Downloading pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.17M [00:00<?, ?B/s]

[[4400   30]
 [  51   15]]
              precision    recall  f1-score   support

           0       0.99      0.99      0.99      4430
           1       0.33      0.23      0.27        66

    accuracy                           0.98      4496
   macro avg       0.66      0.61      0.63      4496
weighted avg       0.98      0.98      0.98      4496



In [6]:
trainSRGPT.train_model(2, "NEG", eval="f1", epochs=32)

6896721408


Map:   0%|          | 0/2698 [00:00<?, ? examples/s]

Map:   0%|          | 0/10788 [00:00<?, ? examples/s]

Cloning https://huggingface.co/Tanor/SRGPTSENTNEG2 into local empty directory.


Download file pytorch_model.bin:   0%|          | 15.4k/2.88G [00:00<?, ?B/s]

Download file training_args.bin: 100%|##########| 4.30k/4.30k [00:00<?, ?B/s]

Clean file training_args.bin:  23%|##3       | 1.00k/4.30k [00:00<?, ?B/s]

  0%|          | 0/86304 [00:00<?, ?it/s]

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


{'loss': 0.0638, 'learning_rate': 1.988413051538747e-05, 'epoch': 0.19}
{'loss': 0.0715, 'learning_rate': 1.9768261030774935e-05, 'epoch': 0.37}
{'loss': 0.0648, 'learning_rate': 1.9652391546162403e-05, 'epoch': 0.56}
{'loss': 0.0802, 'learning_rate': 1.953652206154987e-05, 'epoch': 0.74}
{'loss': 0.0592, 'learning_rate': 1.942065257693734e-05, 'epoch': 0.93}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.15903839468955994, 'eval_f1': 0.22641509433962262, 'eval_runtime': 152.6273, 'eval_samples_per_second': 17.677, 'eval_steps_per_second': 17.677, 'epoch': 1.0}




{'loss': 0.051, 'learning_rate': 1.9304783092324804e-05, 'epoch': 1.11}
{'loss': 0.0404, 'learning_rate': 1.9188913607712273e-05, 'epoch': 1.3}
{'loss': 0.0413, 'learning_rate': 1.907304412309974e-05, 'epoch': 1.48}
{'loss': 0.0624, 'learning_rate': 1.895717463848721e-05, 'epoch': 1.67}
{'loss': 0.0058, 'learning_rate': 1.8841305153874677e-05, 'epoch': 1.85}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.1645973026752472, 'eval_f1': 0.28571428571428575, 'eval_runtime': 163.3309, 'eval_samples_per_second': 16.519, 'eval_steps_per_second': 16.519, 'epoch': 2.0}




{'loss': 0.0486, 'learning_rate': 1.8725435669262146e-05, 'epoch': 2.04}
{'loss': 0.032, 'learning_rate': 1.8609566184649614e-05, 'epoch': 2.22}
{'loss': 0.0497, 'learning_rate': 1.849369670003708e-05, 'epoch': 2.41}
{'loss': 0.0295, 'learning_rate': 1.8377827215424547e-05, 'epoch': 2.6}
{'loss': 0.0404, 'learning_rate': 1.8261957730812015e-05, 'epoch': 2.78}
{'loss': 0.0169, 'learning_rate': 1.8146088246199484e-05, 'epoch': 2.97}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.1921713799238205, 'eval_f1': 0.4324324324324324, 'eval_runtime': 153.8194, 'eval_samples_per_second': 17.54, 'eval_steps_per_second': 17.54, 'epoch': 3.0}




{'loss': 0.0106, 'learning_rate': 1.8030218761586952e-05, 'epoch': 3.15}
{'loss': 0.0258, 'learning_rate': 1.7914349276974417e-05, 'epoch': 3.34}
{'loss': 0.0186, 'learning_rate': 1.7798479792361885e-05, 'epoch': 3.52}
{'loss': 0.0257, 'learning_rate': 1.7682610307749353e-05, 'epoch': 3.71}
{'loss': 0.0223, 'learning_rate': 1.756674082313682e-05, 'epoch': 3.89}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.1810169517993927, 'eval_f1': 0.4788732394366197, 'eval_runtime': 152.3424, 'eval_samples_per_second': 17.71, 'eval_steps_per_second': 17.71, 'epoch': 4.0}




{'loss': 0.0126, 'learning_rate': 1.7450871338524286e-05, 'epoch': 4.08}
{'loss': 0.0095, 'learning_rate': 1.7335001853911755e-05, 'epoch': 4.26}
{'loss': 0.007, 'learning_rate': 1.7219132369299223e-05, 'epoch': 4.45}
{'loss': 0.0268, 'learning_rate': 1.710326288468669e-05, 'epoch': 4.63}
{'loss': 0.0123, 'learning_rate': 1.6987393400074156e-05, 'epoch': 4.82}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.1618320196866989, 'eval_f1': 0.380952380952381, 'eval_runtime': 144.7833, 'eval_samples_per_second': 18.635, 'eval_steps_per_second': 18.635, 'epoch': 5.0}




{'loss': 0.0185, 'learning_rate': 1.6871523915461624e-05, 'epoch': 5.01}
{'loss': 0.013, 'learning_rate': 1.6755654430849093e-05, 'epoch': 5.19}
{'loss': 0.0202, 'learning_rate': 1.663978494623656e-05, 'epoch': 5.38}
{'loss': 0.0197, 'learning_rate': 1.6523915461624026e-05, 'epoch': 5.56}
{'loss': 0.0014, 'learning_rate': 1.6408045977011494e-05, 'epoch': 5.75}
{'loss': 0.0185, 'learning_rate': 1.6292176492398962e-05, 'epoch': 5.93}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.2078782469034195, 'eval_f1': 0.4931506849315068, 'eval_runtime': 57.3235, 'eval_samples_per_second': 47.066, 'eval_steps_per_second': 47.066, 'epoch': 6.0}




{'loss': 0.0036, 'learning_rate': 1.617630700778643e-05, 'epoch': 6.12}
{'loss': 0.0098, 'learning_rate': 1.60604375231739e-05, 'epoch': 6.3}
{'loss': 0.0148, 'learning_rate': 1.5944568038561367e-05, 'epoch': 6.49}
{'loss': 0.0148, 'learning_rate': 1.5828698553948835e-05, 'epoch': 6.67}
{'loss': 0.0101, 'learning_rate': 1.57128290693363e-05, 'epoch': 6.86}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.20985624194145203, 'eval_f1': 0.3582089552238805, 'eval_runtime': 57.9598, 'eval_samples_per_second': 46.549, 'eval_steps_per_second': 46.549, 'epoch': 7.0}




{'loss': 0.0122, 'learning_rate': 1.559695958472377e-05, 'epoch': 7.04}
{'loss': 0.0091, 'learning_rate': 1.5481090100111237e-05, 'epoch': 7.23}
{'loss': 0.0047, 'learning_rate': 1.5365220615498705e-05, 'epoch': 7.42}
{'loss': 0.0054, 'learning_rate': 1.5249351130886171e-05, 'epoch': 7.6}
{'loss': 0.0007, 'learning_rate': 1.5133481646273638e-05, 'epoch': 7.79}
{'loss': 0.0157, 'learning_rate': 1.5017612161661106e-05, 'epoch': 7.97}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.20124344527721405, 'eval_f1': 0.3478260869565218, 'eval_runtime': 58.0391, 'eval_samples_per_second': 46.486, 'eval_steps_per_second': 46.486, 'epoch': 8.0}




{'loss': 0.0076, 'learning_rate': 1.4901742677048574e-05, 'epoch': 8.16}
{'loss': 0.0004, 'learning_rate': 1.4785873192436043e-05, 'epoch': 8.34}
{'loss': 0.0047, 'learning_rate': 1.4670003707823508e-05, 'epoch': 8.53}
{'loss': 0.0205, 'learning_rate': 1.4554134223210976e-05, 'epoch': 8.71}
{'loss': 0.0065, 'learning_rate': 1.4438264738598444e-05, 'epoch': 8.9}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.21818579733371735, 'eval_f1': 0.4, 'eval_runtime': 57.4969, 'eval_samples_per_second': 46.924, 'eval_steps_per_second': 46.924, 'epoch': 9.0}




{'loss': 0.0, 'learning_rate': 1.4322395253985912e-05, 'epoch': 9.08}
{'loss': 0.0038, 'learning_rate': 1.4206525769373379e-05, 'epoch': 9.27}
{'loss': 0.0165, 'learning_rate': 1.4090656284760846e-05, 'epoch': 9.45}
{'loss': 0.0075, 'learning_rate': 1.3974786800148314e-05, 'epoch': 9.64}
{'loss': 0.0059, 'learning_rate': 1.3858917315535782e-05, 'epoch': 9.83}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.20864936709403992, 'eval_f1': 0.380952380952381, 'eval_runtime': 57.6415, 'eval_samples_per_second': 46.807, 'eval_steps_per_second': 46.807, 'epoch': 10.0}




{'loss': 0.001, 'learning_rate': 1.3743047830923249e-05, 'epoch': 10.01}
{'loss': 0.001, 'learning_rate': 1.3627178346310717e-05, 'epoch': 10.2}
{'loss': 0.0044, 'learning_rate': 1.3511308861698185e-05, 'epoch': 10.38}
{'loss': 0.0179, 'learning_rate': 1.3395439377085653e-05, 'epoch': 10.57}
{'loss': 0.004, 'learning_rate': 1.3279569892473118e-05, 'epoch': 10.75}
{'loss': 0.0085, 'learning_rate': 1.3163700407860586e-05, 'epoch': 10.94}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.24980825185775757, 'eval_f1': 0.3880597014925373, 'eval_runtime': 57.5545, 'eval_samples_per_second': 46.877, 'eval_steps_per_second': 46.877, 'epoch': 11.0}




{'loss': 0.0, 'learning_rate': 1.3047830923248055e-05, 'epoch': 11.12}
{'loss': 0.0076, 'learning_rate': 1.2931961438635523e-05, 'epoch': 11.31}
{'loss': 0.0, 'learning_rate': 1.281609195402299e-05, 'epoch': 11.49}
{'loss': 0.0014, 'learning_rate': 1.2700222469410458e-05, 'epoch': 11.68}
{'loss': 0.0086, 'learning_rate': 1.2584352984797924e-05, 'epoch': 11.87}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.2321060597896576, 'eval_f1': 0.34375, 'eval_runtime': 59.2032, 'eval_samples_per_second': 45.572, 'eval_steps_per_second': 45.572, 'epoch': 12.0}




{'loss': 0.01, 'learning_rate': 1.2468483500185393e-05, 'epoch': 12.05}
{'loss': 0.007, 'learning_rate': 1.235261401557286e-05, 'epoch': 12.24}
{'loss': 0.0028, 'learning_rate': 1.2236744530960327e-05, 'epoch': 12.42}
{'loss': 0.0094, 'learning_rate': 1.2120875046347796e-05, 'epoch': 12.61}
{'loss': 0.0031, 'learning_rate': 1.2005005561735264e-05, 'epoch': 12.79}
{'loss': 0.0003, 'learning_rate': 1.1889136077122729e-05, 'epoch': 12.98}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.2663653790950775, 'eval_f1': 0.38235294117647056, 'eval_runtime': 58.6707, 'eval_samples_per_second': 45.985, 'eval_steps_per_second': 45.985, 'epoch': 13.0}




{'loss': 0.0098, 'learning_rate': 1.1773266592510197e-05, 'epoch': 13.16}
{'loss': 0.0, 'learning_rate': 1.1657397107897665e-05, 'epoch': 13.35}
{'loss': 0.0078, 'learning_rate': 1.1541527623285134e-05, 'epoch': 13.53}
{'loss': 0.0087, 'learning_rate': 1.14256581386726e-05, 'epoch': 13.72}
{'loss': 0.0, 'learning_rate': 1.1309788654060068e-05, 'epoch': 13.9}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.2587439715862274, 'eval_f1': 0.4057971014492754, 'eval_runtime': 59.2098, 'eval_samples_per_second': 45.567, 'eval_steps_per_second': 45.567, 'epoch': 14.0}




{'loss': 0.0, 'learning_rate': 1.1193919169447535e-05, 'epoch': 14.09}
{'loss': 0.0154, 'learning_rate': 1.1078049684835003e-05, 'epoch': 14.28}
{'loss': 0.0001, 'learning_rate': 1.096218020022247e-05, 'epoch': 14.46}
{'loss': 0.0011, 'learning_rate': 1.0846310715609938e-05, 'epoch': 14.65}
{'loss': 0.0, 'learning_rate': 1.0730441230997406e-05, 'epoch': 14.83}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.271511435508728, 'eval_f1': 0.34375, 'eval_runtime': 57.6542, 'eval_samples_per_second': 46.796, 'eval_steps_per_second': 46.796, 'epoch': 15.0}




{'loss': 0.0086, 'learning_rate': 1.0614571746384871e-05, 'epoch': 15.02}
{'loss': 0.0017, 'learning_rate': 1.049870226177234e-05, 'epoch': 15.2}
{'loss': 0.0, 'learning_rate': 1.0382832777159808e-05, 'epoch': 15.39}
{'loss': 0.0017, 'learning_rate': 1.0266963292547276e-05, 'epoch': 15.57}
{'loss': 0.0054, 'learning_rate': 1.0151093807934743e-05, 'epoch': 15.76}
{'loss': 0.0043, 'learning_rate': 1.003522432332221e-05, 'epoch': 15.94}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.25165942311286926, 'eval_f1': 0.3692307692307692, 'eval_runtime': 158.1595, 'eval_samples_per_second': 17.059, 'eval_steps_per_second': 17.059, 'epoch': 16.0}




{'loss': 0.0062, 'learning_rate': 9.919354838709679e-06, 'epoch': 16.13}
{'loss': 0.0047, 'learning_rate': 9.803485354097146e-06, 'epoch': 16.31}
{'loss': 0.0072, 'learning_rate': 9.687615869484614e-06, 'epoch': 16.5}
{'loss': 0.0054, 'learning_rate': 9.57174638487208e-06, 'epoch': 16.69}
{'loss': 0.0085, 'learning_rate': 9.455876900259549e-06, 'epoch': 16.87}


  0%|          | 0/2698 [00:00<?, ?it/s]

{'eval_loss': 0.26240435242652893, 'eval_f1': 0.3888888888888889, 'eval_runtime': 144.5833, 'eval_samples_per_second': 18.661, 'eval_steps_per_second': 18.661, 'epoch': 17.0}


PermissionError: [Errno 13] Permission denied: 'SRGPTSENTNEG2\\pytorch_model.bin'

In [7]:
trainSRGPT.test_model_local(2, "NEG")

[[4404   16]
 [  59   17]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      4420
           1       0.52      0.22      0.31        76

    accuracy                           0.98      4496
   macro avg       0.75      0.61      0.65      4496
weighted avg       0.98      0.98      0.98      4496
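
The class-1 row of the report above can be reproduced by hand from the confusion matrix. The sketch below assumes the usual layout (rows are true labels, columns are predictions), which agrees with the support column; `binary_metrics` is a hypothetical helper written for illustration and is not part of `trainSRGPT`.

```python
# Confusion matrix for iteration 2, NEG (rows = true labels, columns = predictions).
confusion = [[4404, 16],
             [59, 17]]

def binary_metrics(cm):
    """Return precision, recall and F1 for class 1, plus overall accuracy."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

precision, recall, f1, accuracy = binary_metrics(confusion)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```

Only 17 of the 76 negative-sentiment examples are recovered, so the 0.98 accuracy is driven almost entirely by the majority class.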



In [8]:
trainSRGPT.upload_local_model_to_hub(2, "NEG")

pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

In [9]:
trainSRGPT.test_model(2, "NEG")

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.10k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.17M [00:00<?, ?B/s]

[[4404   16]
 [  59   17]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      4420
           1       0.52      0.22      0.31        76

    accuracy                           0.98      4496
   macro avg       0.75      0.61      0.65      4496
weighted avg       0.98      0.98      0.98      4496



## Iteration 4 - Training and Testing
In this section, we use the data from the 4th iteration of the semi-automatic iterative algorithm to train and test our GPT2 models for both Positive and Negative sentiment classification.


In [2]:
trainSRGPT.delete_model(4, "POS")

In [3]:
trainSRGPT.train_model(4, "POS", eval="f1", epochs=32)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at jerteh/gpt2-orao and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


0


Map:   0%|          | 0/2703 [00:00<?, ? examples/s]

Map:   0%|          | 0/10812 [00:00<?, ? examples/s]

Cloning https://huggingface.co/Tanor/SRGPTSENTPOS4 into local empty directory.


  0%|          | 0/86496 [00:00<?, ?it/s]

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


{'loss': 0.1803, 'learning_rate': 1.9884387717351092e-05, 'epoch': 0.18}
{'loss': 0.1677, 'learning_rate': 1.9768775434702185e-05, 'epoch': 0.37}
{'loss': 0.1261, 'learning_rate': 1.9653163152053275e-05, 'epoch': 0.55}
{'loss': 0.1307, 'learning_rate': 1.9537550869404365e-05, 'epoch': 0.74}
{'loss': 0.114, 'learning_rate': 1.942193858675546e-05, 'epoch': 0.92}


  0%|          | 0/2703 [00:00<?, ?it/s]

{'eval_loss': 0.09674040973186493, 'eval_f1': 0.0, 'eval_runtime': 60.1677, 'eval_samples_per_second': 44.924, 'eval_steps_per_second': 44.924, 'epoch': 1.0}




{'loss': 0.1203, 'learning_rate': 1.9306326304106552e-05, 'epoch': 1.11}
{'loss': 0.1112, 'learning_rate': 1.9190714021457642e-05, 'epoch': 1.29}
{'loss': 0.097, 'learning_rate': 1.9075101738808733e-05, 'epoch': 1.48}
{'loss': 0.1392, 'learning_rate': 1.8959489456159823e-05, 'epoch': 1.66}
{'loss': 0.0938, 'learning_rate': 1.8843877173510916e-05, 'epoch': 1.85}


  0%|          | 0/2703 [00:00<?, ?it/s]

{'eval_loss': 0.13638992607593536, 'eval_f1': 0.0, 'eval_runtime': 62.6641, 'eval_samples_per_second': 43.135, 'eval_steps_per_second': 43.135, 'epoch': 2.0}




{'loss': 0.1478, 'learning_rate': 1.8728264890862006e-05, 'epoch': 2.03}
{'loss': 0.0947, 'learning_rate': 1.8612652608213096e-05, 'epoch': 2.22}
{'loss': 0.0751, 'learning_rate': 1.849704032556419e-05, 'epoch': 2.4}
{'loss': 0.0702, 'learning_rate': 1.838142804291528e-05, 'epoch': 2.59}
{'loss': 0.073, 'learning_rate': 1.8265815760266373e-05, 'epoch': 2.77}
{'loss': 0.1035, 'learning_rate': 1.8150203477617463e-05, 'epoch': 2.96}


  0%|          | 0/2703 [00:00<?, ?it/s]

{'eval_loss': 0.14871349930763245, 'eval_f1': 0.11111111111111109, 'eval_runtime': 60.9181, 'eval_samples_per_second': 44.371, 'eval_steps_per_second': 44.371, 'epoch': 3.0}




{'loss': 0.0453, 'learning_rate': 1.8034591194968557e-05, 'epoch': 3.14}
{'loss': 0.0626, 'learning_rate': 1.7918978912319647e-05, 'epoch': 3.33}
{'loss': 0.047, 'learning_rate': 1.7803366629670737e-05, 'epoch': 3.51}
{'loss': 0.0171, 'learning_rate': 1.7687754347021827e-05, 'epoch': 3.7}
{'loss': 0.0906, 'learning_rate': 1.757214206437292e-05, 'epoch': 3.88}


  0%|          | 0/2703 [00:00<?, ?it/s]

{'eval_loss': 0.14294204115867615, 'eval_f1': 0.2558139534883721, 'eval_runtime': 61.1255, 'eval_samples_per_second': 44.221, 'eval_steps_per_second': 44.221, 'epoch': 4.0}




{'loss': 0.0451, 'learning_rate': 1.745652978172401e-05, 'epoch': 4.07}
{'loss': 0.0349, 'learning_rate': 1.7340917499075104e-05, 'epoch': 4.25}
{'loss': 0.0319, 'learning_rate': 1.7225305216426194e-05, 'epoch': 4.44}
{'loss': 0.0329, 'learning_rate': 1.7109692933777288e-05, 'epoch': 4.62}
{'loss': 0.0597, 'learning_rate': 1.6994080651128378e-05, 'epoch': 4.81}
{'loss': 0.0575, 'learning_rate': 1.6878468368479468e-05, 'epoch': 4.99}


  0%|          | 0/2703 [00:00<?, ?it/s]

{'eval_loss': 0.21979837119579315, 'eval_f1': 0.2388059701492537, 'eval_runtime': 61.602, 'eval_samples_per_second': 43.878, 'eval_steps_per_second': 43.878, 'epoch': 5.0}




{'loss': 0.0285, 'learning_rate': 1.6762856085830558e-05, 'epoch': 5.18}
{'loss': 0.0248, 'learning_rate': 1.664724380318165e-05, 'epoch': 5.36}
{'loss': 0.0315, 'learning_rate': 1.653163152053274e-05, 'epoch': 5.55}
{'loss': 0.0357, 'learning_rate': 1.6416019237883835e-05, 'epoch': 5.73}
{'loss': 0.0369, 'learning_rate': 1.6300406955234925e-05, 'epoch': 5.92}


  0%|          | 0/2703 [00:00<?, ?it/s]

{'eval_loss': 0.16170959174633026, 'eval_f1': 0.28169014084507044, 'eval_runtime': 59.8973, 'eval_samples_per_second': 45.127, 'eval_steps_per_second': 45.127, 'epoch': 6.0}




{'loss': 0.0382, 'learning_rate': 1.618479467258602e-05, 'epoch': 6.1}
{'loss': 0.0393, 'learning_rate': 1.606918238993711e-05, 'epoch': 6.29}
{'loss': 0.0412, 'learning_rate': 1.59535701072882e-05, 'epoch': 6.47}
{'loss': 0.0152, 'learning_rate': 1.5837957824639292e-05, 'epoch': 6.66}
{'loss': 0.034, 'learning_rate': 1.5722345541990382e-05, 'epoch': 6.84}


  0%|          | 0/2703 [00:00<?, ?it/s]

{'eval_loss': 0.18340787291526794, 'eval_f1': 0.19354838709677416, 'eval_runtime': 61.6805, 'eval_samples_per_second': 43.823, 'eval_steps_per_second': 43.823, 'epoch': 7.0}




{'loss': 0.0392, 'learning_rate': 1.5606733259341472e-05, 'epoch': 7.03}
{'loss': 0.0187, 'learning_rate': 1.5491120976692563e-05, 'epoch': 7.21}
{'loss': 0.0066, 'learning_rate': 1.5375508694043656e-05, 'epoch': 7.4}
{'loss': 0.0208, 'learning_rate': 1.525989641139475e-05, 'epoch': 7.58}
{'loss': 0.0323, 'learning_rate': 1.5144284128745838e-05, 'epoch': 7.77}
{'loss': 0.0126, 'learning_rate': 1.5028671846096931e-05, 'epoch': 7.95}


  0%|          | 0/2703 [00:00<?, ?it/s]

{'eval_loss': 0.17378449440002441, 'eval_f1': 0.27692307692307694, 'eval_runtime': 58.714, 'eval_samples_per_second': 46.037, 'eval_steps_per_second': 46.037, 'epoch': 8.0}
{'train_runtime': 17100.7998, 'train_samples_per_second': 20.232, 'train_steps_per_second': 5.058, 'train_loss': 0.0656135044978188, 'epoch': 8.0}


Several commits (2) will be pushed upstream.
The progress bars may be unreliable.


Upload file pytorch_model.bin:   0%|          | 1.00/2.88G [00:00<?, ?B/s]

To https://huggingface.co/Tanor/SRGPTSENTPOS4
   d54d456..2dcde86  main -> main

To https://huggingface.co/Tanor/SRGPTSENTPOS4
   2dcde86..d99b2e5  main -> main



Max memory allocated by tensors:
    6.42 GB
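
The `learning_rate` values in the training log above appear consistent with a linear decay to zero over the 86496 scheduled steps (32 epochs of 2703 steps). The sketch below reproduces the first logged values. The base rate of 2e-5, the absence of warmup, and logging every 500 steps are assumptions inferred from the numbers, not settings stated in this notebook; `linear_decay_lr` is a hypothetical helper.

```python
def linear_decay_lr(step, total_steps, base_lr=2e-5):
    """Learning rate after `step` optimizer steps under linear decay to zero (no warmup)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

total_steps = 86496                    # from the training progress bar
steps_per_epoch = total_steps // 32    # 32 requested epochs

# Assumed logging interval of 500 steps.
for step in (500, 1000, 1500):
    lr = linear_decay_lr(step, total_steps)
    print(f"step={step} epoch={step / steps_per_epoch:.2f} lr={lr:.6e}")
```

Because the schedule is sized for all 32 epochs, a run that stops early (here after epoch 8) never reaches the low end of the schedule; the learning rate is still about 1.5e-5 when training ends.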


In [6]:
trainSRGPT.test_model_local(4, "POS")

[[4409   25]
 [  64    8]]
              precision    recall  f1-score   support

           0       0.99      0.99      0.99      4434
           1       0.24      0.11      0.15        72

    accuracy                           0.98      4506
   macro avg       0.61      0.55      0.57      4506
weighted avg       0.97      0.98      0.98      4506



In [None]:
trainSRGPT.upload_local_model_to_hub(4, "POS")

pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

In [5]:
trainSRGPT.test_model(4, "POS")

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.10k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

[[4409   25]
 [  64    8]]
              precision    recall  f1-score   support

           0       0.99      0.99      0.99      4434
           1       0.24      0.11      0.15        72

    accuracy                           0.98      4506
   macro avg       0.61      0.55      0.57      4506
weighted avg       0.97      0.98      0.98      4506



In [None]:
trainSRGPT.train_model(4, "NEG", eval="f1", epochs=32)

6896721408


Map:   0%|          | 0/2703 [00:00<?, ? examples/s]

Map:   0%|          | 0/10812 [00:00<?, ? examples/s]

Cloning https://huggingface.co/Tanor/SRGPTSENTNEG4 into local empty directory.


Download file pytorch_model.bin:   0%|          | 15.4k/2.88G [00:00<?, ?B/s]

Download file training_args.bin: 100%|##########| 4.30k/4.30k [00:00<?, ?B/s]

Clean file training_args.bin:  23%|##3       | 1.00k/4.30k [00:00<?, ?B/s]

  0%|          | 0/86496 [00:00<?, ?it/s]

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


{'loss': 0.1333, 'learning_rate': 1.9884387717351092e-05, 'epoch': 0.18}
{'loss': 0.1358, 'learning_rate': 1.9768775434702185e-05, 'epoch': 0.37}
{'loss': 0.1011, 'learning_rate': 1.9653163152053275e-05, 'epoch': 0.55}
{'loss': 0.0728, 'learning_rate': 1.9537550869404365e-05, 'epoch': 0.74}
{'loss': 0.0804, 'learning_rate': 1.942193858675546e-05, 'epoch': 0.92}


  0%|          | 0/2703 [00:00<?, ?it/s]

{'eval_loss': 0.21285207569599152, 'eval_f1': 0.3818181818181818, 'eval_runtime': 162.9057, 'eval_samples_per_second': 16.592, 'eval_steps_per_second': 16.592, 'epoch': 1.0}




{'loss': 0.0371, 'learning_rate': 1.9306326304106552e-05, 'epoch': 1.11}
{'loss': 0.0563, 'learning_rate': 1.9190714021457642e-05, 'epoch': 1.29}
{'loss': 0.0688, 'learning_rate': 1.9075101738808733e-05, 'epoch': 1.48}
{'loss': 0.0622, 'learning_rate': 1.8959489456159823e-05, 'epoch': 1.66}
{'loss': 0.0864, 'learning_rate': 1.8843877173510916e-05, 'epoch': 1.85}


  0%|          | 0/2703 [00:00<?, ?it/s]

{'eval_loss': 0.18004903197288513, 'eval_f1': 0.38888888888888884, 'eval_runtime': 155.9538, 'eval_samples_per_second': 17.332, 'eval_steps_per_second': 17.332, 'epoch': 2.0}




{'loss': 0.0878, 'learning_rate': 1.8728264890862006e-05, 'epoch': 2.03}
{'loss': 0.032, 'learning_rate': 1.8612652608213096e-05, 'epoch': 2.22}
{'loss': 0.0412, 'learning_rate': 1.849704032556419e-05, 'epoch': 2.4}
{'loss': 0.0371, 'learning_rate': 1.838142804291528e-05, 'epoch': 2.59}


In [None]:
trainSRGPT.test_model_local(4, "NEG")

[[4426    0]
 [  80    0]]
              precision    recall  f1-score   support

           0       0.98      1.00      0.99      4426
           1       0.00      0.00      0.00        80

    accuracy                           0.98      4506
   macro avg       0.49      0.50      0.50      4506
weighted avg       0.96      0.98      0.97      4506



  _warn_prf(average, modifier, msg_start, len(result))
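
The report above comes from a model that predicts class 0 for every example, which is why accuracy stays at 0.98 while the class-1 scores are all zero. The following sketch recomputes the macro and weighted F1 averages from the confusion matrix to show how differently they react to this failure. `per_class_f1` is a hypothetical helper, and undefined ratios are set to 0.0 here as an assumption that matches the zeros shown in the report.

```python
# Confusion matrix for iteration 4, NEG, local model (rows = true labels).
confusion = [[4426, 0],
             [80, 0]]

def per_class_f1(cm):
    """F1 score for each class of a square confusion matrix."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[r][k] for r in range(n))  # column sum
        actual = sum(cm[k])                          # row sum (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores

f1_scores = per_class_f1(confusion)
support = [sum(row) for row in confusion]
total = sum(support)

macro_f1 = sum(f1_scores) / len(f1_scores)
weighted_f1 = sum(f * s for f, s in zip(f1_scores, support)) / total
accuracy = sum(confusion[k][k] for k in range(len(confusion))) / total
print(f"macro_f1={macro_f1:.2f} weighted_f1={weighted_f1:.2f} accuracy={accuracy:.2f}")
```

The macro average treats both classes equally and drops to about 0.50, whereas the weighted average and accuracy barely move. With only 80 positive-class examples out of 4506, macro F1 or the class-1 F1 is the more informative number to track.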


In [None]:
trainSRGPT.upload_local_model_to_hub(4, "NEG")

pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

In [3]:
trainSRGPT.test_model(4, "NEG")

    PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.2.0.dev20230928)
    Python  3.11.5 (you have 3.11.4)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details


[[4357   69]
 [  42   38]]
              precision    recall  f1-score   support

           0       0.99      0.98      0.99      4426
           1       0.36      0.47      0.41        80

    accuracy                           0.98      4506
   macro avg       0.67      0.73      0.70      4506
weighted avg       0.98      0.98      0.98      4506



## Iteration 6 - Training and Testing
In this section, we use the data from the 6th iteration of the semi-automatic iterative algorithm to train and test our GPT2 models for both Positive and Negative sentiment classification.


In [2]:
trainSRGPT.train_model(6, "POS", eval="f1", epochs=32)

Downloading (…)okenizer_config.json:   0%|          | 0.00/261 [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


Downloading (…)olve/main/vocab.json:   0%|          | 0.00/832k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/498k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.17M [00:00<?, ?B/s]

Downloading (…)in/added_tokens.json:   0%|          | 0.00/29.0 [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/99.0 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.11k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

0


Map:   0%|          | 0/2707 [00:00<?, ? examples/s]

Map:   0%|          | 0/10824 [00:00<?, ? examples/s]

Cloning https://huggingface.co/Tanor/SRGPTSENTPOS6 into local empty directory.


Download file pytorch_model.bin:   0%|          | 8.00k/2.88G [00:00<?, ?B/s]

Download file training_args.bin: 100%|##########| 4.30k/4.30k [00:00<?, ?B/s]

Clean file training_args.bin:  23%|##3       | 1.00k/4.30k [00:00<?, ?B/s]

  0%|          | 0/86592 [00:00<?, ?it/s]

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


{'loss': 0.002, 'learning_rate': 1.9884515890613453e-05, 'epoch': 0.18}
{'loss': 0.0061, 'learning_rate': 1.9769031781226905e-05, 'epoch': 0.37}
{'loss': 0.0032, 'learning_rate': 1.9653547671840356e-05, 'epoch': 0.55}
{'loss': 0.015, 'learning_rate': 1.9538063562453808e-05, 'epoch': 0.74}
{'loss': 0.0145, 'learning_rate': 1.942257945306726e-05, 'epoch': 0.92}


  0%|          | 0/2707 [00:00<?, ?it/s]

{'eval_loss': 0.48092132806777954, 'eval_f1': 0.2526315789473684, 'eval_runtime': 59.2796, 'eval_samples_per_second': 45.665, 'eval_steps_per_second': 45.665, 'epoch': 1.0}




{'loss': 0.0112, 'learning_rate': 1.930709534368071e-05, 'epoch': 1.11}
{'loss': 0.0119, 'learning_rate': 1.9191611234294163e-05, 'epoch': 1.29}
{'loss': 0.0098, 'learning_rate': 1.9076127124907614e-05, 'epoch': 1.48}
{'loss': 0.0141, 'learning_rate': 1.8960643015521066e-05, 'epoch': 1.66}
{'loss': 0.0052, 'learning_rate': 1.8845158906134518e-05, 'epoch': 1.85}


  0%|          | 0/2707 [00:00<?, ?it/s]

{'eval_loss': 0.35108986496925354, 'eval_f1': 0.15625, 'eval_runtime': 61.1611, 'eval_samples_per_second': 44.26, 'eval_steps_per_second': 44.26, 'epoch': 2.0}




{'loss': 0.0046, 'learning_rate': 1.872967479674797e-05, 'epoch': 2.03}
{'loss': 0.0026, 'learning_rate': 1.861419068736142e-05, 'epoch': 2.22}
{'loss': 0.0057, 'learning_rate': 1.8498706577974872e-05, 'epoch': 2.4}
{'loss': 0.0041, 'learning_rate': 1.8383222468588324e-05, 'epoch': 2.59}
{'loss': 0.0085, 'learning_rate': 1.8267738359201775e-05, 'epoch': 2.77}
{'loss': 0.0136, 'learning_rate': 1.8152254249815227e-05, 'epoch': 2.96}


  0%|          | 0/2707 [00:00<?, ?it/s]

{'eval_loss': 0.36203038692474365, 'eval_f1': 0.2222222222222222, 'eval_runtime': 146.7049, 'eval_samples_per_second': 18.452, 'eval_steps_per_second': 18.452, 'epoch': 3.0}


Several commits (2) will be pushed upstream.


{'loss': 0.0, 'learning_rate': 1.803677014042868e-05, 'epoch': 3.14}
{'loss': 0.0075, 'learning_rate': 1.792128603104213e-05, 'epoch': 3.33}
{'loss': 0.0034, 'learning_rate': 1.7805801921655582e-05, 'epoch': 3.51}
{'loss': 0.0196, 'learning_rate': 1.7690317812269033e-05, 'epoch': 3.7}
{'loss': 0.0028, 'learning_rate': 1.7574833702882485e-05, 'epoch': 3.88}


  0%|          | 0/2707 [00:00<?, ?it/s]

{'eval_loss': 0.3375719487667084, 'eval_f1': 0.15384615384615383, 'eval_runtime': 151.6393, 'eval_samples_per_second': 17.852, 'eval_steps_per_second': 17.852, 'epoch': 4.0}
{'train_runtime': 11492.4959, 'train_samples_per_second': 30.139, 'train_steps_per_second': 7.535, 'train_loss': 0.007788614468361304, 'epoch': 4.0}


Several commits (3) will be pushed upstream.
The progress bars may be unreliable.


Upload file pytorch_model.bin:   0%|          | 1.00/2.88G [00:00<?, ?B/s]

EOF
EOF
error: failed to push some refs to 'https://huggingface.co/Tanor/SRGPTSENTPOS6'



Push attempt 1 failed with error: EOF
EOF
error: failed to push some refs to 'https://huggingface.co/Tanor/SRGPTSENTPOS6'



Several commits (4) will be pushed upstream.
The progress bars may be unreliable.


Upload file pytorch_model.bin:   0%|          | 1.00/2.88G [00:00<?, ?B/s]

To https://huggingface.co/Tanor/SRGPTSENTPOS6
   fa1e076..a7f3b79  main -> main



Max memory allocated by tensors:
    6.42 GB
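
The run above stopped after 4 of the 32 requested epochs. A patience-based early-stopping rule on `eval_f1` can be sketched as follows. This is only an illustration: the `patience` value is hypothetical, since the setting used inside `trainSRGPT.train_model` is not shown in this notebook, and `early_stopping_epoch` is not part of that module.

```python
def early_stopping_epoch(scores, patience):
    """Return (stop_epoch, best_epoch), 1-indexed, for a higher-is-better metric.

    Training stops once `patience` consecutive evaluations fail to beat the
    best score seen so far; otherwise it runs through all of `scores`.
    """
    best_score = float("-inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch, bad_epochs = score, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return len(scores), best_epoch

# eval_f1 per epoch, rounded from the log above.
eval_f1 = [0.2526, 0.1563, 0.2222, 0.1538]
stop_epoch, best_epoch = early_stopping_epoch(eval_f1, patience=3)  # patience is assumed
print(f"stop after epoch {stop_epoch}, best epoch {best_epoch}")
```

Under this rule the checkpoint worth keeping is the one from `best_epoch`, not the last one trained, so the best weights need to be saved or restored when the run ends.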


In [3]:
trainSRGPT.test_model_local(6, "POS")

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


[[4381   57]
 [  55   18]]
              precision    recall  f1-score   support

           0       0.99      0.99      0.99      4438
           1       0.24      0.25      0.24        73

    accuracy                           0.98      4511
   macro avg       0.61      0.62      0.62      4511
weighted avg       0.98      0.98      0.98      4511



In [4]:
trainSRGPT.upload_local_model_to_hub(6, "POS")

pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

In [5]:
trainSRGPT.test_model(6, "POS")

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.10k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/261 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/832k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/498k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.17M [00:00<?, ?B/s]

Downloading (…)in/added_tokens.json:   0%|          | 0.00/29.0 [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/99.0 [00:00<?, ?B/s]

[[4381   57]
 [  55   18]]
              precision    recall  f1-score   support

           0       0.99      0.99      0.99      4438
           1       0.24      0.25      0.24        73

    accuracy                           0.98      4511
   macro avg       0.61      0.62      0.62      4511
weighted avg       0.98      0.98      0.98      4511



In [6]:
trainSRGPT.train_model(6, "NEG", eval="f1", epochs=32)

Downloading (…)okenizer_config.json:   0%|          | 0.00/261 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/832k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/498k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.17M [00:00<?, ?B/s]

Downloading (…)in/added_tokens.json:   0%|          | 0.00/29.0 [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/99.0 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.05k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

6896721408


Map:   0%|          | 0/2707 [00:00<?, ? examples/s]

Map:   0%|          | 0/10824 [00:00<?, ? examples/s]

Cloning https://huggingface.co/Tanor/SRGPTSENTNEG6 into local empty directory.


Download file pytorch_model.bin:   0%|          | 15.4k/2.88G [00:00<?, ?B/s]

Clean file pytorch_model.bin:   0%|          | 1.00k/2.88G [00:00<?, ?B/s]

  0%|          | 0/86592 [00:00<?, ?it/s]

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


{'loss': 0.2043, 'learning_rate': 1.9884515890613453e-05, 'epoch': 0.18}
{'loss': 0.1571, 'learning_rate': 1.9769031781226905e-05, 'epoch': 0.37}
{'loss': 0.1371, 'learning_rate': 1.9653547671840356e-05, 'epoch': 0.55}
{'loss': 0.1562, 'learning_rate': 1.9538063562453808e-05, 'epoch': 0.74}
{'loss': 0.1385, 'learning_rate': 1.942257945306726e-05, 'epoch': 0.92}


  0%|          | 0/2707 [00:00<?, ?it/s]

{'eval_loss': 0.1402807980775833, 'eval_f1': 0.0, 'eval_runtime': 59.4355, 'eval_samples_per_second': 45.545, 'eval_steps_per_second': 45.545, 'epoch': 1.0}




{'loss': 0.1317, 'learning_rate': 1.930709534368071e-05, 'epoch': 1.11}
{'loss': 0.1625, 'learning_rate': 1.9191611234294163e-05, 'epoch': 1.29}
{'loss': 0.1235, 'learning_rate': 1.9076127124907614e-05, 'epoch': 1.48}
{'loss': 0.1538, 'learning_rate': 1.8960643015521066e-05, 'epoch': 1.66}
{'loss': 0.1099, 'learning_rate': 1.8845158906134518e-05, 'epoch': 1.85}


  0%|          | 0/2707 [00:00<?, ?it/s]

{'eval_loss': 0.10935695469379425, 'eval_f1': 0.0, 'eval_runtime': 58.3621, 'eval_samples_per_second': 46.383, 'eval_steps_per_second': 46.383, 'epoch': 2.0}




{'loss': 0.1226, 'learning_rate': 1.872967479674797e-05, 'epoch': 2.03}
{'loss': 0.0823, 'learning_rate': 1.861419068736142e-05, 'epoch': 2.22}
{'loss': 0.1056, 'learning_rate': 1.8498706577974872e-05, 'epoch': 2.4}
{'loss': 0.0934, 'learning_rate': 1.8383222468588324e-05, 'epoch': 2.59}
{'loss': 0.0758, 'learning_rate': 1.8267738359201775e-05, 'epoch': 2.77}
{'loss': 0.0738, 'learning_rate': 1.8152254249815227e-05, 'epoch': 2.96}


  0%|          | 0/2707 [00:00<?, ?it/s]

{'eval_loss': 0.2505303621292114, 'eval_f1': 0.24242424242424243, 'eval_runtime': 57.8953, 'eval_samples_per_second': 46.757, 'eval_steps_per_second': 46.757, 'epoch': 3.0}


Several commits (2) will be pushed upstream.


{'loss': 0.0543, 'learning_rate': 1.803677014042868e-05, 'epoch': 3.14}
{'loss': 0.0446, 'learning_rate': 1.792128603104213e-05, 'epoch': 3.33}
{'loss': 0.0523, 'learning_rate': 1.7805801921655582e-05, 'epoch': 3.51}
{'loss': 0.0382, 'learning_rate': 1.7690317812269033e-05, 'epoch': 3.7}
{'loss': 0.0431, 'learning_rate': 1.7574833702882485e-05, 'epoch': 3.88}


  0%|          | 0/2707 [00:00<?, ?it/s]

{'eval_loss': 0.1801680475473404, 'eval_f1': 0.2647058823529412, 'eval_runtime': 57.8293, 'eval_samples_per_second': 46.81, 'eval_steps_per_second': 46.81, 'epoch': 4.0}


Several commits (3) will be pushed upstream.


{'loss': 0.0338, 'learning_rate': 1.7459349593495937e-05, 'epoch': 4.07}
{'loss': 0.0437, 'learning_rate': 1.7343865484109388e-05, 'epoch': 4.25}
{'loss': 0.0176, 'learning_rate': 1.722838137472284e-05, 'epoch': 4.43}
{'loss': 0.0302, 'learning_rate': 1.711289726533629e-05, 'epoch': 4.62}
{'loss': 0.0233, 'learning_rate': 1.6997413155949743e-05, 'epoch': 4.8}
{'loss': 0.0453, 'learning_rate': 1.6881929046563195e-05, 'epoch': 4.99}


  0%|          | 0/2707 [00:00<?, ?it/s]

{'eval_loss': 0.2628609240055084, 'eval_f1': 0.22857142857142856, 'eval_runtime': 58.0727, 'eval_samples_per_second': 46.614, 'eval_steps_per_second': 46.614, 'epoch': 5.0}


Several commits (4) will be pushed upstream.


{'loss': 0.0117, 'learning_rate': 1.6766444937176646e-05, 'epoch': 5.17}
{'loss': 0.0083, 'learning_rate': 1.6650960827790098e-05, 'epoch': 5.36}
{'loss': 0.046, 'learning_rate': 1.653547671840355e-05, 'epoch': 5.54}
{'loss': 0.0087, 'learning_rate': 1.6419992609017e-05, 'epoch': 5.73}
{'loss': 0.0278, 'learning_rate': 1.6304508499630452e-05, 'epoch': 5.91}


  0%|          | 0/2707 [00:00<?, ?it/s]

{'eval_loss': 0.2564699649810791, 'eval_f1': 0.3488372093023256, 'eval_runtime': 60.3666, 'eval_samples_per_second': 44.843, 'eval_steps_per_second': 44.843, 'epoch': 6.0}


Several commits (5) will be pushed upstream.


{'loss': 0.0168, 'learning_rate': 1.6189024390243904e-05, 'epoch': 6.1}
{'loss': 0.0236, 'learning_rate': 1.6073540280857356e-05, 'epoch': 6.28}
{'loss': 0.0124, 'learning_rate': 1.5958056171470807e-05, 'epoch': 6.47}
{'loss': 0.0334, 'learning_rate': 1.584257206208426e-05, 'epoch': 6.65}
{'loss': 0.032, 'learning_rate': 1.572708795269771e-05, 'epoch': 6.84}


  0%|          | 0/2707 [00:00<?, ?it/s]

{'eval_loss': 0.2739080488681793, 'eval_f1': 0.2962962962962963, 'eval_runtime': 59.821, 'eval_samples_per_second': 45.252, 'eval_steps_per_second': 45.252, 'epoch': 7.0}


Several commits (6) will be pushed upstream.


{'loss': 0.0043, 'learning_rate': 1.5611603843311162e-05, 'epoch': 7.02}
{'loss': 0.0172, 'learning_rate': 1.5496119733924614e-05, 'epoch': 7.21}
{'loss': 0.0201, 'learning_rate': 1.5380635624538065e-05, 'epoch': 7.39}
{'loss': 0.0122, 'learning_rate': 1.5265151515151517e-05, 'epoch': 7.58}
{'loss': 0.0122, 'learning_rate': 1.5149667405764967e-05, 'epoch': 7.76}
{'loss': 0.0402, 'learning_rate': 1.503418329637842e-05, 'epoch': 7.95}


  0%|          | 0/2707 [00:00<?, ?it/s]

{'eval_loss': 0.21845291554927826, 'eval_f1': 0.2857142857142857, 'eval_runtime': 57.5626, 'eval_samples_per_second': 47.027, 'eval_steps_per_second': 47.027, 'epoch': 8.0}


Several commits (7) will be pushed upstream.


{'loss': 0.0119, 'learning_rate': 1.4918699186991872e-05, 'epoch': 8.13}
{'loss': 0.0074, 'learning_rate': 1.4803215077605321e-05, 'epoch': 8.31}
{'loss': 0.0086, 'learning_rate': 1.4687730968218775e-05, 'epoch': 8.5}
{'loss': 0.0221, 'learning_rate': 1.4572246858832226e-05, 'epoch': 8.68}
{'loss': 0.0251, 'learning_rate': 1.4456762749445676e-05, 'epoch': 8.87}


  0%|          | 0/2707 [00:00<?, ?it/s]

{'eval_loss': 0.23453719913959503, 'eval_f1': 0.273972602739726, 'eval_runtime': 58.7203, 'eval_samples_per_second': 46.1, 'eval_steps_per_second': 46.1, 'epoch': 9.0}


Several commits (8) will be pushed upstream.


{'loss': 0.0108, 'learning_rate': 1.4341278640059128e-05, 'epoch': 9.05}
{'loss': 0.0098, 'learning_rate': 1.4225794530672581e-05, 'epoch': 9.24}
{'loss': 0.0102, 'learning_rate': 1.4110310421286033e-05, 'epoch': 9.42}
{'loss': 0.0092, 'learning_rate': 1.3994826311899483e-05, 'epoch': 9.61}
{'loss': 0.0064, 'learning_rate': 1.3879342202512936e-05, 'epoch': 9.79}
{'loss': 0.0284, 'learning_rate': 1.3763858093126387e-05, 'epoch': 9.98}


  0%|          | 0/2707 [00:00<?, ?it/s]

{'eval_loss': 0.25835371017456055, 'eval_f1': 0.2777777777777778, 'eval_runtime': 57.9859, 'eval_samples_per_second': 46.684, 'eval_steps_per_second': 46.684, 'epoch': 10.0}


Several commits (9) will be pushed upstream.


{'train_runtime': 21045.6982, 'train_samples_per_second': 16.458, 'train_steps_per_second': 4.114, 'train_loss': 0.05416921397366901, 'epoch': 10.0}


Several commits (10) will be pushed upstream.
The progress bars may be unreliable.
fatal: unable to access 'https://huggingface.co/Tanor/SRGPTSENTNEG6/': Could not resolve host: huggingface.co



Push attempt 1 failed with error: fatal: unable to access 'https://huggingface.co/Tanor/SRGPTSENTNEG6/': Could not resolve host: huggingface.co

Push attempt 2 failed with error: (MaxRetryError('HTTPSConnectionPool(host=\'huggingface.co\', port=443): Max retries exceeded with url: /api/models/Tanor/SRGPTSENTNEG6 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x000001A7CEA0F6D0>: Failed to resolve \'huggingface.co\' ([Errno 11001] getaddrinfo failed)"))'), '(Request ID: f6048bd5-6e67-45c3-a752-34ee920f7a15)')
Push attempt 3 failed with error: (MaxRetryError('HTTPSConnectionPool(host=\'huggingface.co\', port=443): Max retries exceeded with url: /api/models/Tanor/SRGPTSENTNEG6 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x000001A7D2DA7190>: Failed to resolve \'huggingface.co\' ([Errno 11001] getaddrinfo failed)"))'), '(Request ID: c066f77e-70a9-457e-b1f5-48e33257dc55)')
Push attempt 4 failed with error: (MaxRetryError('HTTPS
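
The repeated "Push attempt N failed" messages above suggest that the upload is retried after a failure. How `trainSRGPT` does this is not shown, so the sketch below is a generic retry loop with exponential backoff under assumed names (`push_with_retries`, `max_attempts`, `base_delay`), not the notebook's actual implementation.

```python
import time

def push_with_retries(push, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call push() until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return push()
        except Exception as error:
            print(f"Push attempt {attempt} failed with error: {error}")
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(base_delay * 2 ** (attempt - 1))

# Stand-in for a real upload: fails twice, then succeeds.
attempts = {"count": 0}

def flaky_push():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("could not resolve host")
    return "pushed"

# The sleep is replaced with a no-op so the demonstration finishes instantly.
result = push_with_retries(flaky_push, sleep=lambda seconds: None)
print(result, "after", attempts["count"], "attempts")
```

Backing off between attempts gives a transient DNS or network failure time to clear. When every attempt fails, as in the log above, the model is still available locally and can be uploaded in a later cell.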

In [7]:
trainSRGPT.test_model_local(6, "NEG")

[[4406   21]
 [  58   26]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      4427
           1       0.55      0.31      0.40        84

    accuracy                           0.98      4511
   macro avg       0.77      0.65      0.69      4511
weighted avg       0.98      0.98      0.98      4511



In [10]:
trainSRGPT.upload_local_model_to_hub(6, "NEG")

pytorch_model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

In [5]:
trainSRGPT.test_model(6, "NEG")

[[4406   21]
 [  58   26]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      4427
           1       0.55      0.31      0.40        84

    accuracy                           0.98      4511
   macro avg       0.77      0.65      0.69      4511
weighted avg       0.98      0.98      0.98      4511

