# BERTweetKFold.ipynb

### Designation: KFold Cross-Validation Script

    Purpose: Run 10-fold cross-validation of BERTweet with our current hyperparameter settings and record the results.

- Requirements:
    
    Packages: tensorflow, pandas, matplotlib, transformers, sklearn, os

    Datasets (csv's): Tweets.csv

    Saved Model Weight: bertweet9010.h5

- This program will require an internet connection, as it will download the model and tokenizer from the HuggingFace model repository.

- CSV output: 'foldOutput.csv'
    - Note: all paths are relative to the notebook's folder. The output file is written there, while Tweets.csv is read from '../Dataset/'.

### A note on KFolding

- For a detailed explanation, please refer to bertweet.ipynb

- A GPU is strongly, strongly recommended for this program.

## 1. TensorFlow Standalone Setup

In [1]:
useCPU = False #Choose whether to use CPU or GPU for running the program

import tensorflow as tf
import os
if useCPU:
    os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
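# Report how many GPUs TensorFlow can see (0 when useCPU is True)
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))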

Num GPUs Available:  1


2022-06-02 19:42:57.485201: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0a:00.0/numa_node
Your kernel may have been built without NUMA support.


## 2. Importing, downloading, and building the model

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers import InputExample, InputFeatures
with tf.device('/GPU:0'):
    # Three mutually exclusive classes (negative, neutral, positive), so no multi-label problem_type is set
    model = TFAutoModelForSequenceClassification.from_pretrained("vinai/bertweet-base", num_labels=3)
    # num_labels is a model setting, not a tokenizer setting
    tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
    model.summary()

2022-06-02 19:42:58.343997: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Model: "tf_roberta_for_sequence_classification"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 roberta (TFRobertaMainLayer  multiple                 134309376 
 )                                                               
                                                                 
 classifier (TFRobertaClassi  multiple                 592899    
 ficationHead)                                                   
                                                                 
Total params: 134,902,275
Trainable params: 134,902,275
Non-trainable params: 0
_________________________________________________________________


## 3. Read in the dataset (Tweets.csv, used for training) and clean it up

In [4]:
dataset = pd.read_csv('../Dataset/Tweets.csv', encoding='ISO-8859-1')
dataset_drop = dataset.drop(['textID', 'selected_text'], axis=1)
dataset_drop

Unnamed: 0,text,sentiment
0,"I`d have responded, if I were going",neutral
1,Sooo SAD I will miss you here in San Diego!!!,negative
2,my boss is bullying me...,negative
3,what interview! leave me alone,negative
4,"Sons of ****, why couldn`t they put them on t...",negative
...,...,...
27476,wish we could come see u on Denver husband l...,negative
27477,I`ve wondered about rake to. The client has ...,negative
27478,Yay good for both of you. Enjoy the break - y...,positive
27479,But it was worth it ****.,positive


### 3.1. Extract the dataset's label column and encode it as integer classes (negative = 0, neutral = 1, positive = 2).

In [5]:
datasetSentimentEncode = dataset_drop['sentiment'].apply(lambda c: 0 if c == 'negative' else (1 if c=='neutral' else 2))
datasetSentimentEncode

0        1
1        0
2        0
3        0
4        0
        ..
27476    0
27477    0
27478    2
27479    2
27480    1
Name: sentiment, Length: 27481, dtype: int64
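
The nested conditional above sends every string other than 'negative' and 'neutral' to class 2, so a typo or a missing value would silently count as positive. A minimal sketch of a stricter alternative, assuming an explicit lookup table is acceptable here; the names `LABEL_TO_ID` and `encode_sentiment` are hypothetical and not part of the notebook:

```python
# Hypothetical helper: an explicit label table instead of a nested conditional.
LABEL_TO_ID = {"negative": 0, "neutral": 1, "positive": 2}

def encode_sentiment(label):
    """Return the integer class for a sentiment string, or raise on unknown input."""
    try:
        return LABEL_TO_ID[label]
    except KeyError:
        raise ValueError(f"unexpected sentiment label: {label!r}") from None

sample = ["neutral", "negative", "negative", "positive"]
encoded = [encode_sentiment(s) for s in sample]
print(encoded)  # [1, 0, 0, 2]
```

With pandas, the same table could be applied as `dataset_drop['sentiment'].map(LABEL_TO_ID)`, which produces NaN rather than an exception for unknown labels.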

## 4. Compiling Training/Test split dataframes

- 90:10 split with a fixed seed (random_state=21). This is the same seeded split as bertweet9010.h5.

In [6]:
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest = train_test_split(dataset_drop['text'].astype(str), datasetSentimentEncode, test_size=0.1, random_state=21)
trainDF = pd.DataFrame()
testDF = pd.DataFrame()
trainDF['DATA_COLUMN'] = xtrain
trainDF['LABEL_COLUMN'] = ytrain
testDF['DATA_COLUMN'] = xtest
testDF['LABEL_COLUMN'] = ytest
trainDF,testDF

(                                             DATA_COLUMN  LABEL_COLUMN
 8775                                   blastinggg music.             1
 8885    If it`s any consolation, you`re definitely on...             2
 22325                    fun day with boo. short but fun             2
 13024   Blow me away it IS raining harder here. Yay y...             1
 17426   Lame remarks like 'I wonder if they like blon...             1
 ...                                                  ...           ...
 16432                                   FC is back dear.             1
 8964    tea...  Mmmm crispy but no cake  Have headpho...             1
 5944                       thankyou very much, you rock!             2
 5327                                i looking at failure             1
 15305   happy mommas day . ging is so lucky to have a...             2
 
 [24732 rows x 2 columns],
                                              DATA_COLUMN  LABEL_COLUMN
 26493  I started X-Slimmer at eigh
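
The 24732 training rows reported above follow from the split arithmetic. A small sketch of that calculation, assuming scikit-learn's documented behavior of rounding a fractional `test_size` up and giving the remainder to the training set:

```python
import math

n_samples = 27481   # length reported for the encoded label Series
test_size = 0.1     # fraction passed to train_test_split

# The test share is rounded up; the training set receives the rest.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 24732 2749
```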

## 5. Converting dataframes into an input format supported by the model

In [8]:
def convert_data_to_examples(train, test, DATA_COLUMN, LABEL_COLUMN): 
  train_InputExamples = train.apply(lambda x: InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this case
                                                          text_a = x[DATA_COLUMN], 
                                                          text_b = None,
                                                          label = x[LABEL_COLUMN]), axis = 1)

  validation_InputExamples = test.apply(lambda x: InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this case
                                                          text_a = x[DATA_COLUMN], 
                                                          text_b = None,
                                                          label = x[LABEL_COLUMN]), axis = 1)
  
  return train_InputExamples, validation_InputExamples

  
def convert_examples_to_tf_dataset(examples, tokenizer, max_length=128):
    features = [] # -> will hold InputFeatures to be converted later

    for e in examples:
        # Documentation is really strong for this method, so please take a look at it
        input_dict = tokenizer.encode_plus(
            e.text_a,
            add_special_tokens=True,
            max_length=max_length, # truncates if len(s) > max_length
            return_token_type_ids=True,
            return_attention_mask=True,
            padding='max_length', # pads every sequence to max_length (on the right by default); replaces the deprecated pad_to_max_length=True
            truncation=True
        )

        input_ids, token_type_ids, attention_mask = (input_dict["input_ids"],
            input_dict["token_type_ids"], input_dict['attention_mask'])

        features.append(
            InputFeatures(
                input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, label=e.label
            )
        )

    def gen():
        for f in features:
            yield (
                {
                    "input_ids": f.input_ids,
                    "attention_mask": f.attention_mask,
                    "token_type_ids": f.token_type_ids,
                },
                f.label,
            )

    return tf.data.Dataset.from_generator(
        gen,
        ({"input_ids": tf.int32, "attention_mask": tf.int32, "token_type_ids": tf.int32}, tf.int64),
        (
            {
                "input_ids": tf.TensorShape([None]),
                "attention_mask": tf.TensorShape([None]),
                "token_type_ids": tf.TensorShape([None]),
            },
            tf.TensorShape([]),
        ),
    )
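
To make the effect of `max_length`, padding, and truncation concrete, here is a simplified pure-Python sketch of what happens to one tokenized tweet. The token ids and `pad_id=1` are made-up illustration values, and `pad_or_truncate` is a hypothetical helper; the real tokenizer additionally keeps the closing special token when it truncates.

```python
def pad_or_truncate(token_ids, max_length, pad_id):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids[:max_length])   # cut sequences that are too long
    mask = [1] * len(ids)                # 1 marks a real token
    n_pad = max_length - len(ids)
    # Right-pad the ids with pad_id and the mask with 0 (ignored positions).
    return ids + [pad_id] * n_pad, mask + [0] * n_pad

ids, mask = pad_or_truncate([0, 713, 16, 2], max_length=6, pad_id=1)
print(ids)   # [0, 713, 16, 2, 1, 1]
print(mask)  # [1, 1, 1, 1, 0, 0]
```

Because every example ends up with the same length, the generator's output can be batched directly without a per-batch padding step.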


### 5.1. Calling the conversion functions and batching

In [9]:
DATA_COLUMN = 'DATA_COLUMN'
LABEL_COLUMN = 'LABEL_COLUMN'
train_InputExamples, validation_InputExamples = convert_data_to_examples(trainDF, testDF, DATA_COLUMN, LABEL_COLUMN)
with tf.device('/GPU:0'):
    train_data = convert_examples_to_tf_dataset(list(train_InputExamples), tokenizer)
    #train_eval_data = train_data.batch(1)
    train_data = train_data.shuffle(100).batch(32)#.repeat(2)
    

    validation_data = convert_examples_to_tf_dataset(list(validation_InputExamples), tokenizer)
    validation_data = validation_data.batch(32)
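
A rough sketch of what the batching above yields per epoch. It assumes 24732 training rows (as reported in section 4) and 2749 test rows (the remainder of the 27481-row dataset), and relies on `batch` keeping the final partial batch by default:

```python
import math

batch_size = 32
n_train, n_val = 24732, 2749   # n_val is an assumption: 27481 - 24732

train_batches = math.ceil(n_train / batch_size)
val_batches = math.ceil(n_val / batch_size)
# The last training batch is smaller because 24732 is not a multiple of 32.
last_train_batch = n_train - (train_batches - 1) * batch_size

print(train_batches, val_batches, last_train_batch)  # 773 86 28
```

Note that `shuffle(100)` only mixes examples within a sliding buffer of 100 elements, so the order is perturbed locally rather than fully randomized across the dataset.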
    



## 6. Train the model on the training set

- Train the model once on the full training set first, as a sanity check that it generalizes to the 10% test split before running the k-fold evaluation.

In [11]:
model.layers[0].trainable = True
with tf.device('/GPU:0'):
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0), #default: 3e-5
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), 
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy('accuracy')])

    model.fit(train_data, epochs=1, validation_data=validation_data)#callbacks=callbacks
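
The loss used above, sparse categorical cross-entropy with `from_logits=True`, takes the raw classifier outputs and an integer label. A minimal pure-Python sketch of the per-example quantity, written here for illustration only (the function name is hypothetical and this is not the Keras implementation):

```python
import math

def sparse_categorical_crossentropy_from_logits(logits, label):
    """Negative log-probability of the true class after a softmax over logits."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[label]

# Equal logits over 3 classes give a loss of ln(3).
uniform = sparse_categorical_crossentropy_from_logits([0.0, 0.0, 0.0], label=1)
# A large logit on the correct class drives the loss toward 0.
confident = sparse_categorical_crossentropy_from_logits([0.0, 0.0, 5.0], label=2)
print(round(uniform, 4))  # 1.0986
```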



## 7. (Optional) Save Weights

In [13]:
#model.save_weights('bertweet9010.h5')

## 8. (Test Code) Generate predictions on the test set

In [15]:
'''scores = {
                'test_accuracy': [],
                'test_precision_neg': [],
                'test_precision_neut': [],
                'test_precision_pos': [],
                'test_recall_neg': [],
                'test_recall_neut': [],
                'test_recall_pos': [],
                'test_f1_score_neg': [],
                'test_f1_score_neut': [],
                'test_f1_score_pos': []
          
          }
scores['test_accuracy'].append(model.metrics[1].result().numpy())
predictionsRaw = model.predict(train_eval_data)
predictions = pd.DataFrame(predictionsRaw['logits']).idxmax(axis=1)
predictions,ytrain'''

In [16]:
'''from sklearn.metrics import precision_recall_fscore_support
precision, recall, f1, _ = precision_recall_fscore_support(ytrain, predictions)
scores['test_precision_neg'].append(precision[0])
scores['test_precision_neut'].append(precision[1])
scores['test_precision_pos'].append(precision[2])
scores['test_recall_neg'].append(recall[0])
scores['test_recall_neut'].append(recall[1])
scores['test_recall_pos'].append(recall[2])
scores['test_f1_score_neg'].append(f1[0])
scores['test_f1_score_neut'].append(f1[1])
scores['test_f1_score_pos'].append(f1[2])
print(scores)'''

## 9. 10-Fold Cross Validation

- We fold on the 'training set', i.e. the 90% portion of the 90:10 train-test split.

- In each iteration of the loop, a fresh model is trained on 9 folds ('train-set') and then generates predictions for the held-out fold ('test-set').

- The predictions are evaluated against the 'ground truth' (the held-out fold's actual labels), and the statistics are recorded.

The statistics are saved after the loop finishes.
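
As an illustration of how the folds partition the training set, the sketch below reproduces the index layout of an unshuffled k-fold split in plain Python. `kfold_indices` is a hypothetical stand-in written for this note; it follows the sizing rule scikit-learn documents for `KFold` (the first `n_samples % n_splits` folds get one extra sample).

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test_idx = list(range(start, stop))                 # held-out fold
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop

fold_sizes = [len(test) for _, test in kfold_indices(24732, 10)]
print(fold_sizes)  # [2474, 2474, 2473, 2473, 2473, 2473, 2473, 2473, 2473, 2473]
```

Every sample lands in exactly one held-out fold, so each of the ten models is scored on data it never saw during its own training run.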

In [17]:

from sklearn.model_selection import KFold
from sklearn.metrics import precision_recall_fscore_support
split = 10

#generate our folding sets
X = xtrain.to_numpy()
Y = ytrain.to_numpy()

#score dict
scores = {
                'test_accuracy': [],
                'test_precision_neg': [],
                'test_precision_neut': [],
                'test_precision_pos': [],
                'test_recall_neg': [],
                'test_recall_neut': [],
                'test_recall_pos': [],
                'test_f1_score_neg': [],
                'test_f1_score_neut': [],
                'test_f1_score_pos': []
          
          } 

#10-Fold cross validation loop
for train_index,test_index in KFold(n_splits=split).split(X):
    #split the dataset into the folds
    x_train,x_test = X[train_index],X[test_index]
    y_train,y_test = Y[train_index],Y[test_index]

    #Compiling train/test split dataframes
    trainDF = pd.DataFrame()
    testDF = pd.DataFrame()
    trainDF['DATA_COLUMN'] = x_train
    trainDF['LABEL_COLUMN'] = y_train
    testDF['DATA_COLUMN'] = x_test
    testDF['LABEL_COLUMN'] = y_test

    #convert them into acceptable input formats (TensorFlow DataSet)
    train_InputExamples, validation_InputExamples = convert_data_to_examples(trainDF, testDF, DATA_COLUMN, LABEL_COLUMN)
    with tf.device('/GPU:0'):
        train_data = convert_examples_to_tf_dataset(list(train_InputExamples), tokenizer)
        #train_eval_data = train_data
        train_data = train_data.shuffle(100).batch(32)#.repeat(2)

        validation_data = convert_examples_to_tf_dataset(list(validation_InputExamples), tokenizer)
        val_eval_data = validation_data.batch(1)
        validation_data = validation_data.batch(32)

    #Train the model
    with tf.device('/GPU:0'):
        # Reload the pretrained weights so every fold starts from the same state
        model = TFAutoModelForSequenceClassification.from_pretrained("vinai/bertweet-base", num_labels=3)
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0), #default: 3e-5
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), 
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy('accuracy')])
        model.fit(train_data, epochs=1, validation_data=validation_data)
        #record accuracy
        scores['test_accuracy'].append(model.metrics[1].result().numpy())

        #Predict the test set
        predictionsRaw = model.predict(val_eval_data)
        predictions = pd.DataFrame(predictionsRaw['logits']).idxmax(axis=1)

    #calculate the statistics and record them
    precision, recall, f1, _ = precision_recall_fscore_support(y_test, predictions)
    scores['test_precision_neg'].append(precision[0])
    scores['test_precision_neut'].append(precision[1])
    scores['test_precision_pos'].append(precision[2])
    scores['test_recall_neg'].append(recall[0])
    scores['test_recall_neut'].append(recall[1])
    scores['test_recall_pos'].append(recall[2])
    scores['test_f1_score_neg'].append(f1[0])
    scores['test_f1_score_neut'].append(f1[1])
    scores['test_f1_score_pos'].append(f1[2])
    #debug: print score every round
    #print(scores) 
 

All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

Some layers of TFRobertaForSequenceClassification were not initialized from the model checkpoint at vinai/bertweet-base and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


{'test_accuracy': [0.7966855], 'test_precision_neg': [0.8096551724137931], 'test_precision_neut': [0.7698237885462555], 'test_precision_pos': [0.8145065398335315], 'test_recall_neg': [0.8063186813186813], 'test_recall_neut': [0.7221074380165289], 'test_recall_pos': [0.8804627249357326], 'test_f1_score_neg': [0.8079834824501033], 'test_f1_score_neut': [0.7452025586353945], 'test_f1_score_pos': [0.8462013588634959]}



- Generate a DataFrame of the statistics and save it to foldOutput.csv

In [22]:
foldOutput = pd.DataFrame(scores)
foldOutput.to_csv('foldOutput.csv')
foldOutput

| fold | test_accuracy | test_precision_neg | test_precision_neut | test_precision_pos | test_recall_neg | test_recall_neut | test_recall_pos | test_f1_score_neg | test_f1_score_neut | test_f1_score_pos |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.796686 | 0.809655 | 0.769824 | 0.814507 | 0.806319 | 0.722107 | 0.880463 | 0.807983 | 0.745203 | 0.846201 |
| 1 | 0.806791 | 0.810294 | 0.807814 | 0.802834 | 0.80438 | 0.746341 | 0.890052 | 0.807326 | 0.775862 | 0.844196 |
| 2 | 0.807117 | 0.779817 | 0.803922 | 0.835859 | 0.829847 | 0.738739 | 0.874505 | 0.804054 | 0.769953 | 0.854745 |
| 3 | 0.78245 | 0.751538 | 0.810323 | 0.786441 | 0.855742 | 0.648091 | 0.881013 | 0.800262 | 0.720183 | 0.831045 |
| 4 | 0.792155 | 0.756188 | 0.795535 | 0.824324 | 0.862994 | 0.683838 | 0.865806 | 0.806069 | 0.73547 | 0.844556 |
| 5 | 0.785685 | 0.793103 | 0.800921 | 0.764505 | 0.820257 | 0.684366 | 0.890066 | 0.806452 | 0.73807 | 0.822521 |
| 6 | 0.818035 | 0.792582 | 0.814776 | 0.845663 | 0.854815 | 0.763158 | 0.858808 | 0.822523 | 0.788123 | 0.852185 |
| 7 | 0.778811 | 0.747112 | 0.762876 | 0.830709 | 0.842258 | 0.708873 | 0.81258 | 0.791837 | 0.734884 | 0.821544 |
| 8 | 0.802669 | 0.762991 | 0.837099 | 0.805457 | 0.855114 | 0.699106 | 0.891076 | 0.80643 | 0.761905 | 0.846106 |
| 9 | 0.80186 | 0.792537 | 0.753991 | 0.879404 | 0.791356 | 0.796627 | 0.81738 | 0.791946 | 0.774723 | 0.847258 |
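
Each per-class column in the fold statistics is a one-vs-rest precision, recall, or F1 score computed from the arg-max of the logits. A pure-Python sketch of that computation on made-up toy data (the logits and labels below are illustrative, not taken from the notebook, and `per_class_prf` is a hypothetical helper rather than scikit-learn's implementation):

```python
def per_class_prf(y_true, y_pred, n_classes=3):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    rows = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append((precision, recall, f1))
    return rows

# Toy logits for four tweets; the predicted class is the index of the largest logit.
logits = [[2.0, 0.1, -1.0], [0.2, 1.5, 0.3], [0.0, 0.4, 3.0], [1.2, 0.9, 0.1]]
y_pred = [max(range(3), key=row.__getitem__) for row in logits]
y_true = [0, 1, 2, 1]

print(y_pred)  # [0, 1, 2, 0]
stats = per_class_prf(y_true, y_pred)
```

In this toy case the fourth tweet is a neutral example predicted as negative, which lowers precision for class 0 and recall for class 1 while leaving class 2 untouched.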


- Generate the averages of the statistics across the 10 folds

In [23]:
foldOutput.mean()  # column-wise mean across the 10 folds


test_accuracy          0.797226
test_precision_neg     0.779582
test_precision_neut    0.795708
test_precision_pos     0.818970
test_recall_neg        0.832308
test_recall_neut       0.719125
test_recall_pos        0.866175
test_f1_score_neg      0.804488
test_f1_score_neut     0.754437
test_f1_score_pos      0.841036
dtype: float64
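
When reporting cross-validation results, the spread across folds is often quoted next to the mean. A short sketch using only the standard library, with the per-fold accuracies copied from the fold statistics recorded in this notebook; choosing the sample standard deviation (n - 1 denominator) is an assumption of this example:

```python
import statistics

# Per-fold accuracy values as recorded in foldOutput.
fold_accuracy = [0.796686, 0.806791, 0.807117, 0.78245, 0.792155,
                 0.785685, 0.818035, 0.778811, 0.802669, 0.80186]

mean_acc = statistics.mean(fold_accuracy)
std_acc = statistics.stdev(fold_accuracy)  # sample standard deviation (n - 1)

print(f"accuracy: {mean_acc:.6f} +/- {std_acc:.6f}")
```

With a DataFrame, `foldOutput.std()` would give the same kind of per-column spread for every statistic at once.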