# Introduction

I am a medical doctor working on **artificial intelligence (AI) for medicine**. AI is now widely used in the medical field. In healthcare, it is applied to tasks such as **image classification, object detection, semantic segmentation, image generation with GANs, and text classification**. This notebook addresses **binary text classification with BERT**, although the topic here is not related to medicine. **If you are interested in AI for medicine, please see my other notebooks.** For an example of **multi-class text classification with BERT**, see [Coronavirus Tweets Multi-Classification with BERT](https://www.kaggle.com/code/gokifujiya/coronavirus-tweets-multi-classification-with-bert).

# Import Libraries

In [1]:
import math # Imports the math library for mathematical operations.
import random # Imports the random library for generating random numbers.
import time # Imports the time library for measuring time.
import warnings # Imports the warnings library for suppressing warning messages.

import numpy as np # Imports the numpy library and renames it as "np" for easier use.
import pandas as pd # Imports the pandas library and renames it as "pd" for easier use.
import re # Imports the re library for regular expression operations.

import os # Imports the os library for interacting with the operating system.
import sys # Imports the sys library for interacting with the Python interpreter.

import torch # Imports the PyTorch library for deep learning operations.
import torch.nn as nn # Imports the neural network module of PyTorch.
import transformers as T # Imports the transformers library for natural language processing tasks.
from sklearn.metrics import fbeta_score # Imports the fbeta_score metric from the scikit-learn library.
from sklearn.model_selection import StratifiedKFold # Imports the StratifiedKFold class from the scikit-learn library for cross-validation.
from torch.utils.data import DataLoader, Dataset # Imports the DataLoader and Dataset classes from PyTorch for handling datasets.
from tqdm.notebook import tqdm # Imports the tqdm library for progress bar visualization.

warnings.filterwarnings("ignore") # Disables the warnings that may be displayed during execution.

# Sets the device to GPU if it is available, otherwise it is set to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Read the Data

This code reads a CSV file located at /kaggle/input/imdb-dataset-of-50k-movie-reviews/IMDB Dataset.csv into a Pandas DataFrame object called df using the read_csv() function from the pandas library. **The encoding parameter is set to 'iso-8859-1' to ensure that the file is read properly.** The head() function is then used to display the first five rows of the DataFrame.

In [2]:
df = pd.read_csv('/kaggle/input/imdb-dataset-of-50k-movie-reviews/IMDB Dataset.csv', encoding = 'iso-8859-1')
df.head()

| | review | sentiment |
|---|---|---|
| 0 | One of the other reviewers has mentioned that ... | positive |
| 1 | A wonderful little production. <br /><br />The... | positive |
| 2 | I thought this was a wonderful way to spend ti... | positive |
| 3 | Basically there's a family where a little boy ... | negative |
| 4 | Petter Mattei's "Love in the Time of Money" is... | positive |


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   review     50000 non-null  object
 1   sentiment  50000 non-null  object
dtypes: object(2)
memory usage: 781.4+ KB


In [4]:
df.sentiment.value_counts()

positive    25000
negative    25000
Name: sentiment, dtype: int64

In [5]:
print(df.columns)

Index(['review', 'sentiment'], dtype='object')


In [6]:
print(df['review'][0:5])

0    One of the other reviewers has mentioned that ...
1    A wonderful little production. <br /><br />The...
2    I thought this was a wonderful way to spend ti...
3    Basically there's a family where a little boy ...
4    Petter Mattei's "Love in the Time of Money" is...
Name: review, dtype: object


# Make Label

This code is creating a new column in the DataFrame df called "judgement" which is derived from the "sentiment" column. The "label_map" variable is a dictionary that **maps the string labels "negative" and "positive" to integer values 0 and 1, respectively**.

The map() method is applied to the "sentiment" column and uses the "label_map" dictionary to **convert the string values to their corresponding integer values**. The resulting integer values are then assigned to the new "judgement" column. The head() method is then used to display the first few rows of the updated DataFrame.

In [7]:
label_map = {"negative": 0, "positive": 1}
df["judgement"] = df["sentiment"].map(label_map)
df.head()

| | review | sentiment | judgement |
|---|---|---|---|
| 0 | One of the other reviewers has mentioned that ... | positive | 1 |
| 1 | A wonderful little production. <br /><br />The... | positive | 1 |
| 2 | I thought this was a wonderful way to spend ti... | positive | 1 |
| 3 | Basically there's a family where a little boy ... | negative | 0 |
| 4 | Petter Mattei's "Love in the Time of Money" is... | positive | 1 |


# Define Logger

This is a function definition for init_logger() which **creates a logger object to log messages at the INFO level**. The function takes an optional argument log_file which is used to specify the name of the log file where the messages will be saved.

Inside the function, the required names are imported from the logging module and the logger object is obtained with getLogger(`__name__`), so it is named after the current module. **The logger's level is set to INFO so that all messages with severity INFO or higher are logged.**

Two handlers are then created and added to the logger: **a StreamHandler to output messages to the console** and **a FileHandler to save messages to the log file**. Both handlers use a Formatter object to format the messages in a specific way.

Finally, **the logger object is returned and assigned to the LOGGER variable**.

This logger can be used to log important information and events during the training process of a machine learning model, making it easier to debug and diagnose issues.

In [8]:
def init_logger(log_file = "train.log"):
    from logging import INFO, FileHandler, Formatter, StreamHandler, getLogger

    logger = getLogger(__name__)
    logger.setLevel(INFO)
    handler1 = StreamHandler()
    handler1.setFormatter(Formatter("%(message)s"))
    handler2 = FileHandler(filename = log_file)
    handler2.setFormatter(Formatter("%(message)s"))
    logger.addHandler(handler1)
    logger.addHandler(handler2)
    return logger

LOGGER = init_logger()
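
Re-running the cell above in a notebook calls init_logger() again on the same named logger, which attaches a second pair of handlers, so every message would then be emitted twice. The following is a minimal sketch of one way to guard against that. The helper name `init_logger_once`, the logger name `train_sketch`, and the temporary log path are assumptions made for illustration and are not part of the original notebook.

```python
import logging
import os
import tempfile


def init_logger_once(log_file, name="train_sketch"):
    """Hypothetical variant of init_logger() that attaches handlers only once."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # skip if an earlier call already attached handlers
        formatter = logging.Formatter("%(message)s")
        stream_handler = logging.StreamHandler()
        stream_handler.setFormatter(formatter)
        file_handler = logging.FileHandler(filename=log_file)
        file_handler.setFormatter(formatter)
        logger.addHandler(stream_handler)
        logger.addHandler(file_handler)
    return logger


# Illustrative log location in a temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "train.log")
logger_a = init_logger_once(log_path)
logger_b = init_logger_once(log_path)  # second call reuses the existing handlers
logger_a.info("logged once")
```

Because getLogger() returns the same object for the same name, the check on `logger.handlers` is enough to keep the handler count at two no matter how often the cell is re-run.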

# Define Random Seed

The code **defines a function seed_torch() which takes an integer value seed (default value 42) and sets the random seed for the functions** in Python, numpy and torch libraries to **ensure reproducibility of results**.

It sets the seed for random.seed() function, os.environ["PYTHONHASHSEED"] variable and numpy's np.random.seed() function to the provided seed value.

For torch, it **sets the random seed for both CPU and GPU using torch.manual_seed(), torch.cuda.manual_seed() and torch.cuda.manual_seed_all() functions**. Finally, it sets torch.backends.cudnn.deterministic = True for deterministic algorithms.

The code then sets the seed value to 471 and calls the seed_torch() function to set the seeds for random functions.

In [9]:
def seed_torch(seed = 42):
    # Fix seed for functions in python.
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Fix numpy seed.
    np.random.seed(seed)
    # Fix torch seed.
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic algorithms.
    torch.backends.cudnn.deterministic = True

seed = 471
seed_torch(seed)
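
The effect of fixing a seed can be checked on a small scale. The sketch below uses only Python's built-in random module (one of the generators that seed_torch() seeds); the helper name `draw` is hypothetical. It illustrates the principle only and says nothing about GPU-level determinism, which can depend on more than the seed.

```python
import random


def draw(seed, n=3):
    """Seed Python's generator, then draw n pseudo-random floats."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


first = draw(471)
second = draw(471)  # same seed: the same sequence is produced again
other = draw(42)    # different seed: a different sequence
print(first == second, first == other)
```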

## If you want to use all the data, split the whole 'df' into train and test data instead.

In [10]:
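# Only the first 4,000 rows are used, to limit the computational cost.
# .copy() makes train independent of df, so adding the "fold" column later
# does not modify a slice of df or raise SettingWithCopyWarning.
train = df.loc[:3999].copy()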

In [11]:
len(train)

4000

In [12]:
# Create test data.
test = df.loc[4000:4999]

In [13]:
len(test)

1000

# Preprocessing

The code defines a function called get_train_data that takes in a pandas DataFrame called train and performs preprocessing steps to **prepare the data for cross-validation**.

The function uses StratifiedKFold from scikit-learn to **split the data into Fold (set to 5 in this case) folds while maintaining class balance in each fold**. The indices of the training and validation data for each fold are **stored in the train_index and val_index variables**, respectively.

The function then creates a new column in the train DataFrame called "fold" and assigns the fold number to each data point based on its index in the cross-validation split. Finally, the "fold" column is cast to an unsigned 8-bit integer data type.

The output of this function is the preprocessed train DataFrame with the "fold" column added.

In [14]:
Fold = 5

def get_train_data(train):

    # cross-validation
    fold = StratifiedKFold(n_splits = Fold, shuffle = True, random_state = seed)
    for n, (train_index, val_index) in enumerate(fold.split(train, train["judgement"])):
        # split() yields positional indices; .loc works here because train keeps the default 0..N-1 index.
        train.loc[val_index, "fold"] = int(n)
    train["fold"] = train["fold"].astype(np.uint8)

    return train

In [15]:
def get_test_data(test):
    return test

In [16]:
train = get_train_data(train)
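
The idea of keeping the class balance in every fold can be illustrated without scikit-learn. The sketch below is a simplified round-robin scheme written for this explanation; it is not how StratifiedKFold is implemented (which also shuffles), and the function name `stratified_fold_ids` and the toy labels are assumptions.

```python
from collections import Counter


def stratified_fold_ids(labels, n_folds):
    """Assign a fold id to every sample so each class is spread evenly.

    Illustrative round-robin scheme only: the k-th sample of a class
    goes to fold k % n_folds.
    """
    fold_ids = [None] * len(labels)
    seen_per_class = Counter()
    for i, label in enumerate(labels):
        fold_ids[i] = seen_per_class[label] % n_folds
        seen_per_class[label] += 1
    return fold_ids


labels = [0, 1] * 10 + [1] * 5 + [0] * 5   # 15 negatives and 15 positives
fold_ids = stratified_fold_ids(labels, n_folds=5)
per_fold = Counter(zip(fold_ids, labels))  # (fold, label) -> number of samples
for fold in range(5):
    print(fold, per_fold[(fold, 0)], per_fold[(fold, 1)])
```

Each of the five folds ends up with the same number of negatives and positives, which is the property the "fold" column in train is meant to have.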

# Create Dataset

This code defines **a custom PyTorch Dataset class named BaseDataset that inherits from the PyTorch Dataset class**. The purpose of this class is to **preprocess the input text data and create input tensors** that can be used by the PyTorch model for sequence classification.

The class constructor takes in a Pandas DataFrame df, the name of **the pre-trained Transformer model** to be used (model_name), and a boolean flag include_labels indicating whether the dataset includes labels.

The tokenizer object is initialized using the pre-trained model name. **The batch_encode_plus() method is used to tokenize and encode the input text data into input tensors that can be consumed by the Transformer model.** The encoded inputs are stored in the encoded attribute of the object.

The `__len__()` method returns the length of the dataset.

The `__getitem__()` method takes an index idx and returns the corresponding input tensors for the text at that index. If include_labels is True, the method also returns the label for the text at that index as a float PyTorch tensor. **The input tensors include the tokenized and encoded input text, as well as an attention mask to indicate which tokens should be attended to by the model** (1 for real tokens, 0 for padding).

In [17]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class BaseDataset(Dataset):
    def __init__(self, df, model_name, include_labels = True):
        tokenizer = AutoTokenizer.from_pretrained(model_name)

        self.df = df
        self.include_labels = include_labels

        sentences = df["review"].tolist()
   
        max_length = 512
        self.encoded = tokenizer.batch_encode_plus(
            sentences,
            padding = 'max_length',            
            max_length = max_length,
            truncation = True,
            return_attention_mask = True
        )
        
        if self.include_labels:
            self.labels = df["judgement"].values

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        input_ids = torch.tensor(self.encoded['input_ids'][idx])
        attention_mask = torch.tensor(self.encoded['attention_mask'][idx])

        if self.include_labels:
            label = torch.tensor(self.labels[idx]).float()
            return input_ids, attention_mask, label

        return input_ids, attention_mask
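
What padding = 'max_length' and truncation = True mean for a single sequence can be shown with a toy function. This is a simplified sketch and not the tokenizer's implementation: the helper name `pad_and_mask`, the token IDs, and the padding ID 0 are assumptions, and unlike a real BERT tokenizer it does not keep the closing special token when it truncates.

```python
def pad_and_mask(token_ids, max_length, pad_id=0):
    """Truncate or right-pad token_ids to max_length and build the attention mask."""
    kept = token_ids[:max_length]
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


# A short sequence is padded; a long one is cut to max_length.
short_ids, short_mask = pad_and_mask([101, 2023, 3185, 102], max_length=6)
long_ids, long_mask = pad_and_mask(list(range(1, 10)), max_length=6)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

In BaseDataset the same thing happens with max_length = 512, so every item has the same length and can be stacked into a batch by the DataLoader.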

# Modelling

This code defines a PyTorch module for **a basic model using the pre-trained PubMedBERT model for sequence classification**. The BaseModel class extends nn.Module, and its constructor takes a single argument model_name, which specifies the name of the pre-trained model to use.

In the `__init__` method, an instance of the pre-trained model is created using **AutoModelForSequenceClassification.from_pretrained**, with **num_labels=1** so that the classification head outputs **a single logit per review, which is sufficient for binary classification**. **The nn.Sigmoid function is used to convert the output logits to probabilities.**

The forward method of the BaseModel class **takes input_ids and attention_mask tensors as inputs, which represent the tokenized text and attention mask for the input sequence**, respectively. **The input_ids tensor is fed into the pre-trained model along with the attention_mask tensor**, and **the resulting logits are passed through the sigmoid function** and squeezed to obtain the predicted probability of a positive sentiment.

In [18]:
model_name = 'microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract'

In [19]:
class BaseModel(nn.Module):
    def __init__(self, model_name):
        super().__init__()
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels = 1)
        self.sigmoid = nn.Sigmoid() # binary classification

    def forward(self, input_ids, attention_mask):
        out = self.model(input_ids = input_ids, attention_mask = attention_mask)
        # (batch, 1) -> (batch,); squeeze(-1) keeps the batch dimension even for a batch of one.
        out = self.sigmoid(out.logits).squeeze(-1)

        return out
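
The mapping from a logit to a probability, and from a probability to a class with the 0.5 threshold used later in train_loop, can be worked through by hand. The sketch below uses plain Python instead of nn.Sigmoid, and the logit values are made up for illustration.

```python
import math


def sigmoid(logit):
    """Logistic function: maps any real-valued logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


for logit in (-2.0, 0.0, 2.0):
    prob = sigmoid(logit)
    label = 1 if prob >= 0.5 else 0  # same rule as np.where(preds < 0.5, 0, 1)
    print(f"logit={logit:+.1f}  prob={prob:.4f}  label={label}")
```

A logit of 0 corresponds to a probability of exactly 0.5, so the sign of the logit decides the predicted class.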

# Execution Time Measurement Tool

This code defines one class and two helper functions for tracking the loss and the elapsed time:

1. **AverageMeter**: This class computes and stores **the average and current values of a variable**. It has methods for **resetting the values and updating them with new values**. This is useful for **computing the average loss or accuracy over multiple batches during training**.

2. **asMinutes and timeSince**: These are helper functions for **formatting and printing time elapsed during training**. They convert the time in seconds to minutes and seconds, and **calculate the remaining time** based on the percentage of completion. This is useful for monitoring the progress of training and estimating how much time is left.

In [20]:
class AverageMeter(object):
    """Computes and stores the average and current value"""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n = 1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return "%dm %ds" % (m, s)


def timeSince(since, percent):
    now = time.time()
    s = now - since
    es = s / (percent)
    rs = es - s
    return "%s (remain %s)" % (asMinutes(s), asMinutes(rs))
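
The arithmetic behind AverageMeter.update() and timeSince() can be traced with a few numbers. The losses, batch sizes, and timings below are hypothetical values chosen for the example.

```python
# Hypothetical per-batch mean losses; the last batch is smaller than the others.
batch_losses = [0.60, 0.50, 0.20]
batch_sizes = [16, 16, 8]

# AverageMeter accumulates val * n and n, so its average is weighted by batch size.
weighted_sum = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
running_avg = weighted_sum / sum(batch_sizes)
naive_avg = sum(batch_losses) / len(batch_losses)  # ignores batch sizes

# timeSince: extrapolate the total time from the fraction of work done so far.
elapsed = 30.0  # seconds since the start
percent = 0.25  # fraction of batches finished
estimated_total = elapsed / percent
remaining = estimated_total - elapsed

print(f"weighted={running_avg:.4f} naive={naive_avg:.4f} remaining={remaining:.0f}s")
```

The weighted average (0.48) differs from the unweighted one (about 0.43) because the small final batch counts for less, which is why update() is called with the batch size as n.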

# Training Helper Function

This function takes in the training data loader, model, criterion, optimizer, epoch, and device as input. **The function records the start time and creates an AverageMeter instance called losses to track the running loss.** It then sets the model to training mode.

For each batch in the training data loader, the function sets the optimizer gradients to zero and **moves the input_ids, attention_mask, and labels to the device (GPU or CPU) specified**. The model is called on the input_ids and attention_mask to get the predicted outputs (y_preds).

The function then **calculates the loss between the predicted outputs (y_preds) and the actual labels**, records the loss using the losses.update() method, and **backpropagates the loss through the model using loss.backward()**. The optimizer is then used to **update the model parameters using optimizer.step()**.

After **every 100 steps or at the last step** of the training data loader, the function **prints the epoch number, batch number, elapsed time, and average loss** for that epoch.

Finally, the function returns the average loss for the epoch.

In [21]:
def train_fn(train_loader, model, criterion, optimizer, epoch, device):
    start = end = time.time()
    losses = AverageMeter()

    # Switch to train mode.
    model.train()

    for step, (input_ids, attention_mask, labels) in enumerate(train_loader):
        optimizer.zero_grad()

        input_ids = input_ids.to(device)
        attention_mask = attention_mask.to(device)
        labels = labels.to(device)
        batch_size = labels.size(0)

        y_preds = model(input_ids, attention_mask)

        loss = criterion(y_preds, labels) # binary classification

        # Record loss.
        losses.update(loss.item(), batch_size)
        loss.backward()

        optimizer.step()

        if step % 100 == 0 or step == (len(train_loader) - 1):
            print(
                f"Epoch: [{epoch + 1}][{step}/{len(train_loader)}] "
                f"Elapsed {timeSince(start, float(step + 1) / len(train_loader)):s} "
                f"Loss: {losses.avg:.4f} "
            )

    return losses.avg
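
The criterion passed to train_fn is nn.BCELoss (created in train_loop), i.e. the mean binary cross-entropy between the predicted probabilities and the 0/1 labels. A minimal plain-Python sketch of that formula, with made-up probabilities and a hypothetical helper name `bce`:

```python
import math


def bce(probs, targets):
    """Mean binary cross-entropy for probabilities strictly between 0 and 1."""
    terms = [
        -(t * math.log(p) + (1 - t) * math.log(1 - p))
        for p, t in zip(probs, targets)
    ]
    return sum(terms) / len(terms)


targets = [1.0, 0.0]
confident_right = bce([0.9, 0.1], targets)  # predictions agree with the labels
confident_wrong = bce([0.1, 0.9], targets)  # predictions contradict the labels
print(f"{confident_right:.4f} {confident_wrong:.4f}")
```

Confident but wrong predictions are penalised far more heavily than confident correct ones, which is what drives the parameter updates in loss.backward() and optimizer.step().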

# Evaluation Helper Function

These are helper functions for training and validation in a PyTorch model.

The train_fn function takes a PyTorch dataloader (train_loader), a model (model), a loss function (criterion), an optimizer (optimizer), the current epoch (epoch), and the device to run the training on (device). It loops over the batches in the dataloader, **moves the data to the specified device, computes the predicted probabilities (y_preds), and backpropagates the loss**. The function also **calculates and returns the average loss over the batches**.

The valid_fn function takes a PyTorch dataloader (valid_loader), a model (model), a loss function (criterion), and the device to run the evaluation on (device). It switches the model to evaluation mode, loops over the batches in the dataloader, **moves the data to the specified device, computes the predicted probabilities (y_preds) inside torch.no_grad() so that no gradients are tracked, and stores them in a list (preds)**. The function also **calculates and returns the average loss over the batches and the concatenated predicted probabilities**.

Both functions **print the average loss at every 100th batch and the elapsed time** since the start of the loop as well.

In [22]:
def valid_fn(valid_loader, model, criterion, device):
    start = end = time.time()
    losses = AverageMeter()

    # Switch to evaluation mode.
    model.eval()
    preds = []

    for step, (input_ids, attention_mask, labels) in enumerate(valid_loader):
        input_ids = input_ids.to(device)
        attention_mask = attention_mask.to(device)
        labels = labels.to(device)
        batch_size = labels.size(0)

        # Compute loss.
        with torch.no_grad():
            y_preds = model(input_ids, attention_mask)

        loss = criterion(y_preds, labels)
        losses.update(loss.item(), batch_size)

        # Record score.
        preds.append(y_preds.to("cpu").numpy())

        if step % 100 == 0 or step == (len(valid_loader) - 1):
            print(
                f"EVAL: [{step}/{len(valid_loader)}] "
                f"Elapsed {timeSince(start, float(step + 1) / len(valid_loader)):s} "
                f"Loss: {losses.avg:.4f} "
            )

    predictions = np.concatenate(preds)
    return losses.avg, predictions

# Inference Function

This code defines a function called "inference" that will be used to **make predictions on the test dataset using a trained model**.

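The function first creates a test dataset object using the BaseDataset class defined earlier, with **the "include_labels" parameter set to False**. This is because **labels are not needed at inference time**: the dataset then returns only the input IDs and attention masks. (The test slice taken from df still carries its "judgement" column, but it is not used here.)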

It then loads the trained model for each fold, and applies the model to the test dataset using a DataLoader object. **The predicted values are stored in a list called "preds"**.

Finally, the function **calculates the average of the predicted values from all the folds using np.mean(), and returns the predictions**.

In [23]:
def inference():
    predictions = []

    test_dataset = BaseDataset(test, model_name, include_labels = False)
    test_loader = DataLoader(
        test_dataset, batch_size = 16, shuffle = False, pin_memory = True
    )

    for fold in range(Fold):
        LOGGER.info(f"========== model: bert-movie-classification fold: {fold} inference ==========")
        model = BaseModel(model_name)
        model.to(device)
        model.load_state_dict(torch.load(f"bert-movie-classification_fold{fold}_best.pth")["model"])
        model.eval()
        preds = []
        for i, (input_ids, attention_mask) in tqdm(enumerate(test_loader), total = len(test_loader)):
            input_ids = input_ids.to(device)
            attention_mask = attention_mask.to(device)
            with torch.no_grad():
                y_preds = model(input_ids, attention_mask)
            preds.append(y_preds.to("cpu").numpy())
        preds = np.concatenate(preds)
        predictions.append(preds)
    predictions = np.mean(predictions, axis = 0)

    return predictions
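
The final averaging step can be seen on a small example. The sketch below assumes two folds and three test reviews with made-up probabilities; it shows only the np.mean(..., axis = 0) step and the 0.5 threshold, not the model calls.

```python
import numpy as np

# Hypothetical per-fold probabilities for the same three test reviews.
fold_predictions = [
    np.array([0.9, 0.2, 0.6]),  # fold 0 model
    np.array([0.7, 0.4, 0.5]),  # fold 1 model
]

# axis = 0 averages across folds, leaving one probability per review.
ensemble = np.mean(fold_predictions, axis=0)
labels = np.where(ensemble < 0.5, 0, 1)
print(ensemble, labels)
```

Averaging probabilities before thresholding lets a confident fold outweigh an uncertain one, rather than giving each fold a single vote.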

# Training Loop

This code defines two functions for training and evaluating a BERT-based sequence classification model. **The train_loop function performs training for a single fold of the dataset using cross-validation**. **It sets up the data loader and the model, trains the model for several epochs, and saves the best model based on the validation score.** It returns the validation DataFrame with a "preds" column containing the predicted probabilities from the best epoch.

The get_result function takes the DataFrame returned by train_loop and **computes the fbeta score for the thresholded predictions compared to the true labels**. It is called after each fold, and once more on the concatenated out-of-fold predictions to obtain the final score for the model. Note that the two functions use different beta values: train_loop selects the best epoch with beta = 7.0, which weights recall much more heavily than precision, while get_result reports beta = 1.0, i.e. the F1 score.

Note that this code assumes that the model is a BERT-based sequence classification model and that the dataset has a column called "judgement" containing the target labels. The specific BERT model used is the microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract model.

In [24]:
def train_loop(train, fold):

    LOGGER.info(f"========== fold: {fold} training ==========")

    # ====================================================
    # Data Loader
    # ====================================================
    trn_idx = train[train["fold"] != fold].index
    val_idx = train[train["fold"] == fold].index

    train_folds = train.loc[trn_idx].reset_index(drop = True)
    valid_folds = train.loc[val_idx].reset_index(drop = True)

    train_dataset = BaseDataset(train_folds, model_name)
    valid_dataset = BaseDataset(valid_folds, model_name)

    train_loader = DataLoader(
        train_dataset,
        batch_size = 16,
        shuffle = True,
        num_workers = 4,
        pin_memory = True,
        drop_last = True,
    )
    valid_loader = DataLoader(
        valid_dataset,
        batch_size = 16,
        shuffle = False,
        num_workers = 4,
        pin_memory = True,
        drop_last = False,
    )

    # ====================================================
    # Model
    # ====================================================
    model = BaseModel(model_name)
    model.to(device)

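    # transformers.AdamW is deprecated in recent releases of the library;
    # torch.optim.AdamW with weight_decay = 0.0 and eps = 1e-6 mirrors its defaults.
    optimizer = torch.optim.AdamW(model.parameters(), lr = 2e-5, eps = 1e-6, weight_decay = 0.0)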

    criterion = nn.BCELoss() # binary classification

    # ====================================================
    # Loop
    # ====================================================
    best_score = -1
    best_loss = np.inf

    for epoch in range(5):
        start_time = time.time()
        
        # train
        avg_loss = train_fn(train_loader, model, criterion, optimizer, epoch, device)

        # eval
        avg_val_loss, preds = valid_fn(valid_loader, model, criterion, device)
        valid_labels = valid_folds["judgement"].values

        # scoring for binary classification
        score = fbeta_score(valid_labels, np.where(preds < 0.5, 0, 1), beta = 7.0)

        elapsed = time.time() - start_time
        LOGGER.info(
            f"Epoch {epoch + 1} - avg_train_loss: {avg_loss:.4f}  avg_val_loss: {avg_val_loss:.4f}  time: {elapsed:.0f}s"
        )
        LOGGER.info(f"Epoch {epoch + 1} - Score: {score}")

        # Save the best score.
        if score > best_score:
            best_score = score
            LOGGER.info(f"Epoch {epoch + 1} - Save Best Score: {best_score:.4f} Model")
            torch.save(
                {"model": model.state_dict(), "preds": preds}, f"bert-movie-classification_fold{fold}_best.pth"
            )

    check_point = torch.load(f"bert-movie-classification_fold{fold}_best.pth")

    valid_folds["preds"] = check_point["preds"]

    return valid_folds

In [25]:
def get_result(result_df):
    preds = result_df["preds"].values
    labels = result_df["judgement"].values
    score = fbeta_score(labels, np.where(preds < 0.5, 0, 1), beta = 1.0)
    LOGGER.info(f"Score: {score:<.5f}")
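
How the beta parameter changes the score can be checked from the F-beta definition, F = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. The sketch below uses hypothetical confusion counts and a hypothetical helper name `fbeta_from_counts`; it is a hand calculation, not a replacement for sklearn's fbeta_score.

```python
def fbeta_from_counts(tp, fp, fn, beta):
    """F-beta score from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: recall (0.90) is higher than precision (0.75).
tp, fp, fn = 90, 30, 10
f1 = fbeta_from_counts(tp, fp, fn, beta=1.0)  # beta used in get_result
f7 = fbeta_from_counts(tp, fp, fn, beta=7.0)  # beta used in train_loop
print(f"F1={f1:.4f}  F7={f7:.4f}")
```

With beta = 7.0 the score sits close to the recall, so the epoch selected in train_loop is effectively the one with the best recall, while the reported F1 balances precision and recall equally.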

# Training

This code is training a model using the Microsoft BiomedNLP PubMedBERT base-uncased-abstract pre-trained model for text classification on the dataset.

The data is preprocessed by encoding the text with the pre-trained tokenizer, and cross-validation is performed using stratified K-fold splitting.

The training loop iterates over epochs, training and validating the model, and saving the best performing model checkpoint based on **the fbeta score on the validation set**.

**The training is performed for each fold and the results are concatenated for the final evaluation**. The fbeta score is used as the evaluation metric. Finally, the out-of-fold predictions are saved in a CSV file.
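
The output of the training cell contains a repeated huggingface/tokenizers message about the process being forked after parallelism was used, and the message itself suggests setting the TOKENIZERS_PARALLELISM environment variable explicitly. A minimal sketch, assuming it is run in a cell before the tokenizer is first used (this placement is an assumption and not something the original notebook does):

```python
import os

# Disable the tokenizers library's internal parallelism up front, so that
# processes forked later do not trigger the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
print(os.environ["TOKENIZERS_PARALLELISM"])
```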

In [26]:
# training
oof_df = pd.DataFrame()
for fold in range(Fold):
    _oof_df = train_loop(train, fold)
    oof_df = pd.concat([oof_df, _oof_df])
    LOGGER.info(f"========== fold: {fold} result ==========")
    get_result(_oof_df)

# CV result
LOGGER.info(f"========== CV ==========")
get_result(oof_df)

# Save OOF result.
oof_df.to_csv("oof_df.csv", index = False)




huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

Some weights of the model checkpoint at microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSeque


Epoch 1 - avg_train_loss: 0.4505  avg_val_loss: 0.3216  time: 178s
Epoch 1 - Score: 0.9214844736177732
Epoch 1 - Save Best Score: 0.9215 Model


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.3216 

Epoch 2 - avg_train_loss: 0.2320  avg_val_loss: 0.3267  time: 176s
Epoch 2 - Score: 0.8901026652505941


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.3267 

Epoch 3 - avg_train_loss: 0.1054  avg_val_loss: 0.3210  time: 176s
Epoch 3 - Score: 0.8904629395395903


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.3210 

Epoch 4 - avg_train_loss: 0.0556  avg_val_loss: 0.4602  time: 176s
Epoch 4 - Score: 0.8685743226133199


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.4602 

Epoch 5 - avg_train_loss: 0.0487  avg_val_loss: 0.4930  time: 176s
Epoch 5 - Score: 0.8759050174674701


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.4930 


Score: 0.86391
Some weights of the model checkpoint at microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights 

Epoch 1 - avg_train_loss: 0.4443  avg_val_loss: 0.2900  time: 176s
Epoch 1 - Score: 0.9130918656414408
Epoch 1 - Save Best Score: 0.9131 Model


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.2900 

Epoch 2 - avg_train_loss: 0.2264  avg_val_loss: 0.2856  time: 176s
Epoch 2 - Score: 0.9076199625827981


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.2856 

Epoch 3 - avg_train_loss: 0.1279  avg_val_loss: 0.3148  time: 176s
Epoch 3 - Score: 0.9057787673312416


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.3148 

Epoch 4 - avg_train_loss: 0.0798  avg_val_loss: 0.4153  time: 176s
Epoch 4 - Score: 0.8959303502733346


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.4153 

Epoch 5 - avg_train_loss: 0.0364  avg_val_loss: 0.4377  time: 176s
Epoch 5 - Score: 0.9154822720145667
Epoch 5 - Save Best Score: 0.9155 Model


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.4377 


Score: 0.89273

Epoch 1 - avg_train_loss: 0.5202  avg_val_loss: 0.3046  time: 177s
Epoch 1 - Score: 0.8538995591141741
Epoch 1 - Save Best Score: 0.8539 Model


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.3046 

Epoch 2 - avg_train_loss: 0.2368  avg_val_loss: 0.2858  time: 177s
Epoch 2 - Score: 0.8564333856990827
Epoch 2 - Save Best Score: 0.8564 Model


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.2858 

Epoch 3 - avg_train_loss: 0.1148  avg_val_loss: 0.3794  time: 176s
Epoch 3 - Score: 0.912076806467913
Epoch 3 - Save Best Score: 0.9121 Model


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.3794 

Epoch 4 - avg_train_loss: 0.0821  avg_val_loss: 0.4619  time: 176s
Epoch 4 - Score: 0.8242454983515091


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.4619 

Epoch 5 - avg_train_loss: 0.0609  avg_val_loss: 0.4018  time: 177s
Epoch 5 - Score: 0.8705334548031177


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.4018 


Score: 0.86988

Epoch 1 - avg_train_loss: 0.4551  avg_val_loss: 0.3099  time: 177s
Epoch 1 - Score: 0.7891253436513593
Epoch 1 - Save Best Score: 0.7891 Model


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.3099 

Epoch 2 - avg_train_loss: 0.2564  avg_val_loss: 0.3026  time: 177s
Epoch 2 - Score: 0.8878291309421136
Epoch 2 - Save Best Score: 0.8878 Model


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.3026 

Epoch 3 - avg_train_loss: 0.1165  avg_val_loss: 0.3600  time: 177s
Epoch 3 - Score: 0.8781280138064058


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.3600 

Epoch 4 - avg_train_loss: 0.0730  avg_val_loss: 0.3823  time: 177s
Epoch 4 - Score: 0.9192200557103064
Epoch 4 - Save Best Score: 0.9192 Model


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.3823 

Epoch 5 - avg_train_loss: 0.0457  avg_val_loss: 0.4364  time: 177s
Epoch 5 - Score: 0.9215656488936148
Epoch 5 - Save Best Score: 0.9216 Model


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.4364 


Score: 0.86977

Epoch 1 - avg_train_loss: 0.4770  avg_val_loss: 0.2855  time: 177s
Epoch 1 - Score: 0.9273335360291883
Epoch 1 - Save Best Score: 0.9273 Model


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.2855 

Epoch 2 - avg_train_loss: 0.2328  avg_val_loss: 0.3089  time: 177s
Epoch 2 - Score: 0.9613438575187209
Epoch 2 - Save Best Score: 0.9613 Model


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.3089 

Epoch 3 - avg_train_loss: 0.1229  avg_val_loss: 0.3102  time: 177s
Epoch 3 - Score: 0.9322119768973554


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.3102 

Epoch 4 - avg_train_loss: 0.0917  avg_val_loss: 0.3190  time: 177s
Epoch 4 - Score: 0.9177568197951527


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.3190 

Epoch 5 - avg_train_loss: 0.0510  avg_val_loss: 0.2661  time: 177s
Epoch 5 - Score: 0.8740725683504421


EVAL: [49/50] Elapsed 0m 13s (remain 0m 0s) Loss: 0.2661 


Score: 0.89202
Score: 0.87760


# Inference and Submission

This cell runs inference on the test data with the trained model and writes a submission file **for binary classification**. The `inference` function returns one prediction score per test example, and these raw scores are saved to `predictions.csv`. The scores are then **thresholded at 0.5 to convert them into binary predictions**: scores below 0.5 become 0 and all others become 1. The binary predictions are saved to `submission.csv` without an index or header row.
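
As a minimal sketch of the thresholding step, the snippet below applies the same `np.where` rule to a few made-up scores. The array `example_scores` and its values are hypothetical stand-ins for the output of `inference()`, not results from this notebook.

```python
import numpy as np

# Hypothetical scores standing in for the output of inference();
# the real values come from the trained model.
example_scores = np.array([0.03, 0.49, 0.50, 0.97])

# Scores strictly below 0.5 map to class 0; everything else maps to class 1.
example_labels = np.where(example_scores < 0.5, 0, 1)

print(example_labels.tolist())  # prints [0, 0, 1, 1]
```

Because the comparison is a strict `<`, a score of exactly 0.5 is assigned to class 1.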

In [27]:
# Run inference and save the raw prediction scores.
predictions = inference()
pd.Series(predictions).to_csv("predictions.csv", index=False)

# Threshold the scores at 0.5 and save the binary labels as the submission.
predictions_final = np.where(predictions < 0.5, 0, 1)
pd.Series(predictions_final).to_csv("submission.csv", index=False, header=False)

Some weights of the model checkpoint at microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSeque

  0%|          | 0/63 [00:00<?, ?it/s]

Some weights of the model checkpoint at microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSeque

  0%|          | 0/63 [00:00<?, ?it/s]

In [28]:
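test["Judgement"] = predictions_final # Stores the final predictions (0 or 1) in a new "Judgement" column; the lowercase "judgement" column holds the true labels.
test.head() # Displays the first five rows.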

Unnamed: 0,review,sentiment,judgement,Judgement
4000,This feels as if it is a Czech version of Pear...,positive,1,1
4001,"When, oh, when will someone like Anchor Bay or...",positive,1,1
4002,"""Just before dawn "" is one of the best slasher...",positive,1,1
4003,A very courageous attempt to bring one of the ...,positive,1,1
4004,"If 1977's ""Exorcist II: The Heretic"" did him n...",negative,0,0


The next cell converts the integer predictions in the "Judgement" column of the test DataFrame back into string labels. It first defines a dictionary called label_map that maps the strings "negative" and "positive" to the integers 0 and 1, respectively.

Then, a new dictionary called reverse_label_map is created by swapping the keys and values of label_map, so that each integer maps back to its string label.

Finally, the map() method is applied to the "Judgement" column, using reverse_label_map to convert the integer predictions into string labels. The resulting labels are assigned to a new "prediction" column in the test DataFrame, and the first rows are displayed with the head() method.

In [29]:
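label_map = {"negative": 0, "positive": 1} # Maps each string label to its integer class.
reverse_label_map = {v: k for k, v in label_map.items()} # Inverts the mapping so that each integer class points to its string label.
test["prediction"] = test["Judgement"].map(reverse_label_map) # Converts the integer predictions to string labels.
test.head() # Displays the first five rows.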

Unnamed: 0,review,sentiment,judgement,Judgement,prediction
4000,This feels as if it is a Czech version of Pear...,positive,1,1,positive
4001,"When, oh, when will someone like Anchor Bay or...",positive,1,1,positive
4002,"""Just before dawn "" is one of the best slasher...",positive,1,1,positive
4003,A very courageous attempt to bring one of the ...,positive,1,1,positive
4004,"If 1977's ""Exorcist II: The Heretic"" did him n...",negative,0,0,negative
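
As a quick sanity check on this kind of mapping, the sketch below uses a small hypothetical Series (not the notebook's data) to show that `Series.map` with a dictionary returns NaN for any value missing from the dictionary. Counting missing values after the mapping is therefore a simple way to confirm that every prediction was decoded. The names `toy_predictions`, `decoded`, and `n_unmapped` are illustrative assumptions.

```python
import pandas as pd

label_map = {"negative": 0, "positive": 1}
reverse_label_map = {v: k for k, v in label_map.items()}

# Hypothetical predictions; the value 2 is deliberately outside the label set.
toy_predictions = pd.Series([1, 0, 1, 2])
decoded = toy_predictions.map(reverse_label_map)

# Values without a dictionary entry become NaN, so isna() flags them.
n_unmapped = int(decoded.isna().sum())
print(decoded.tolist())
print("unmapped:", n_unmapped)
```

If the count of unmapped values on the real predictions is zero, the "prediction" column contains only "negative" and "positive".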


In [30]:
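fbeta_score(test.judgement, test.Judgement, beta = 1.0) # Computes the F-beta score of the predictions against the true labels; beta = 1.0 gives the F1 score.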

0.9162656400384985
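
With `beta = 1.0`, the F-beta score is the F1 score, the harmonic mean of precision and recall. The following is a minimal pure-Python sketch of that formula on hypothetical toy labels, not the notebook's data. The helper `f_beta` is an illustrative assumption and is not scikit-learn's implementation.

```python
def f_beta(y_true, y_pred, beta=1.0):
    """F-beta for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical toy labels for illustration only.
toy_true = [1, 1, 1, 0, 0, 1, 0, 1]
toy_pred = [1, 0, 1, 0, 1, 1, 0, 1]

score = f_beta(toy_true, toy_pred, beta=1.0)
print(score)
```

Values of `beta` above 1 weight recall more heavily, and values below 1 weight precision more heavily.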

In [31]:
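test = test.drop(['judgement', 'Judgement'], axis = 1) # Drops the two integer label columns, keeping the string "sentiment" and "prediction" columns.
test.head() # Displays the first five rows.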

Unnamed: 0,review,sentiment,prediction
4000,This feels as if it is a Czech version of Pear...,positive,positive
4001,"When, oh, when will someone like Anchor Bay or...",positive,positive
4002,"""Just before dawn "" is one of the best slasher...",positive,positive
4003,A very courageous attempt to bring one of the ...,positive,positive
4004,"If 1977's ""Exorcist II: The Heretic"" did him n...",negative,negative


In [32]:
test.to_csv("test_data.csv", index = False) # Saves the test set with its predictions to a CSV file, omitting the DataFrame index.