# Notebook 3.3: Machine Learning - Bert and Other Features Classification Model

After establishing the baseline model in 3.2, we sought to create a model that would leverage both lyrics and additional data. 

In this notebook, we will utilzed the modified **feature data** as well as **lyrics**, which both show relationship with the popularity of a song, to perform classification by training  our BERT model. Our new model, hence, will **concatenates** the output from Bert with other features, enabling a more comprehensive prediction of song popularity.

## Bert and Other Features Classification Model:
* [Train-Test Split](#train_test_split)
* [Model Construction](#model_con)
    * [Data Loader](#dataload)
    * [Build Model](#build)
    * [Train Model](#train)
    * [Result Analysis](#ana)


In [1]:
from torch import nn, utils
from transformers import BertTokenizer, BertModel
from torch.optim import Adam
from tqdm import tqdm
import numpy as np
import torch
import pandas as pd
import seaborn as sns
from torch.utils.tensorboard import SummaryWriter
from sklearn.metrics import accuracy_score

  from .autonotebook import tqdm as notebook_tqdm


### Import pre-trained tokenizer


In [2]:
tokenizer = BertTokenizer.from_pretrained('./distilbert-base-uncased', local_files_only=True)


The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'DistilBertTokenizer'. 
The class this function is called from is 'BertTokenizer'.


### Load the dataset


In [3]:
df = pd.read_csv("./positive_and_negative_one_hot.csv")
df = df.dropna()
df


Unnamed: 0,artist,year,views,features,lyrics,id,url,acousticness,danceability,duration_ms,...,key_8,key_9,key_10,key_11,tag_country,tag_misc,tag_pop,tag_rap,tag_rb,tag_rock
0,AKING,2015,4.432273e-05,{},Glorious mistakes are anxiously waiting to be ...,985583,https://open.spotify.com/track/30sr35axWFPOvmi...,0.760040,0.806517,0.144170,...,0,0,0,0,0,0,1,0,0,0
1,Filip Winther,2020,1.251733e-06,{},[Intro]\nDe-de-deluxe\n\n[Refräng]\nJag fuckar...,5097257,https://open.spotify.com/track/4mznGf6tTvHp74y...,0.020681,0.894094,0.141797,...,0,0,0,1,0,0,0,1,0,0
2,Dan Reeder,2018,1.513459e-05,{},The guy who bathes in the pond at the park\nTh...,3407076,https://open.spotify.com/track/1UbSSyqIVEkooKe...,0.993976,0.554990,0.044422,...,0,0,0,0,0,0,1,0,0,0
3,Noa Azazel,2021,1.251733e-06,{},[Pre-Chorus]\nWhen the moon is taking over i'm...,7061926,https://open.spotify.com/track/51F8whLH1Qou7iV...,0.214858,0.419552,0.169140,...,0,0,0,0,0,0,1,0,0,0
4,070 Phi,2019,2.031221e-05,{},[Chorus]\nAin't no way that you ain't eatin' w...,4241387,https://open.spotify.com/track/0mvzUwvyLT1Dm1y...,0.367469,0.695519,0.146753,...,0,0,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9529,mounika yadav,2021,1.257423e-05,"{""Allu Arjun"",""Rashmika Mandanna""}",నువ్ అమ్మీ అమ్మీ అంటాంటే నీ పెళ్ళాన్నైపోయినట్ట...,7552375,https://open.spotify.com/track/4ZUxhQNRCzlh6al...,0.360441,0.821792,0.161581,...,0,0,0,0,0,0,1,0,0,0
9530,d-metal stars,2016,1.706909e-07,{},[Verse 1]\nThe seaweed is always greener\nIn s...,7558599,https://open.spotify.com/track/0F8nLktPi0SgOAm...,0.000092,0.542770,0.154411,...,1,0,0,0,0,0,0,0,0,1
9531,grupo firme,2021,2.048290e-06,{Maluma},"Dejen de meterse ya, en donde no les importa\n...",7728445,https://open.spotify.com/track/5BE9B2FiFWBbBdo...,0.137549,0.719959,0.142190,...,0,0,0,0,0,0,1,0,0,0
9532,hensonn,2021,7.567295e-06,{},[Instrumental],7814578,https://open.spotify.com/track/6nqdgUTiWt4JbAB...,0.146585,0.626273,0.122640,...,0,0,0,0,0,0,0,1,0,0


## Train-Test Split <a name = "train_test_split"> </a>

**Train Dataset**: Used to fit the machine learning model.

**Test Dataset**: Used to evaluate the fit machine learning model.

The objective is to estimate the performance of the machine learning model on new data: data not used to train the model.

In [4]:
np.random.seed(6666)
df_train, df_val, df_test = np.split(df.sample(frac=1, random_state=88), 
                                     [int(.6*len(df)), int(.8*len(df))])

---
# Model Construction <a name = "model_con"> </a>

## Data Loader <a name = "dataload"> </a>

**Get Batch Data**: Iterate through the dataset with batch size = 32.

Code for processing data samples can get messy and hard to maintain, thus the dataset code can be used to decouple data from our model training code for better readability and modularity. 

In [5]:
class Dataset(utils.data.Dataset):

    def __init__(self, df):

        self.ys = df['if_popular'].to_numpy()
        self.texts = [tokenizer(text, 
                               padding='max_length', max_length = 512, truncation=True,
                                return_tensors="pt") for text in df['lyrics']]
        self.features = df[['key_0','key_1','key_2','key_3','key_4','key_5','key_6','key_7','key_8','key_9','key_10','key_11','tag_country','tag_misc','tag_pop','tag_rap','tag_rb','tag_rock','year', 'views','acousticness','danceability','duration_ms','energy','instrumentalness','liveness','loudness','speechiness','tempo','valence','popularity']].to_numpy()
        self.df = df

    def linear(self):
        return self.ys

    def __len__(self):
        return len(self.ys)

    def get_batch_labels(self, idx):
        # Fetch a batch of labels
        #print("Total len:", len(self.ys), " getting:", idx)
        return self.ys[idx]

    def get_batch_texts(self, idx):
        # Fetch a batch of inputs
        return self.texts[idx]
    
    def get_batch_freatures(self, idx):
        # Fetch a batch of inputs
        return self.features[idx]

    def __getitem__(self, idx):

        batch_texts = self.get_batch_texts(idx)
        batch_y = self.get_batch_labels(idx)
        batch_feature = self.get_batch_freatures(idx)

        return batch_texts, batch_y, batch_feature

## Build Model <a name = "build"> </a>


**BertModel**: BERT makes use of Transformer, an attention mechanism that learns contextual relations between words in a text.

**Drop out layers**: Dropout is a regularization technique to help prevent overfitting. As it randomly drop out nodes during training, tje model could become more generalized. 

**ReLU**: The rectified linear unit (ReLU) is an activation function commonly used in neural networks. It return 0 if the input value is non-positive.

**BatchNorm**: Batch normalization take the outputs from the a hidden layer and normalize them before passing them as the input of the next hidden layer, which can stabilizing the learning process and greatly reducing the number of training epochs.


In this classifier, we would like to utilize both lyrics and additional features to predict a song's popularity. The method for combining these two types of data is crucial, which might have direct impact on the model's performance. One approach involves embedding the data within a sentence, such as appending "The song's popularity is 6" to the end of a sentence. Alternatively, we could concatenate the data within the Multilayer Perceptron (MLP) layers. In this model, we have opted for the latter method.

In [6]:
class BertClassifier(nn.Module):

    def __init__(self, dropout=0.5):

        super(BertClassifier, self).__init__()

        self.bert = BertModel.from_pretrained('./distilbert-base-uncased', local_files_only=True)
        self.dropout = nn.Dropout(dropout)
        self.linear1 = nn.Linear(768, 64)
        self.linear2 = nn.Linear(64, 32)
        self.linear3 = nn.Linear(63, 16)
        self.layer_out = nn.Linear(16, 1) 
        self.dropout = nn.Dropout(p=0.1)
        self.batchnorm1 = nn.BatchNorm1d(64)
        self.batchnorm2 = nn.BatchNorm1d(32)
        self.batchnorm3 = nn.BatchNorm1d(16)
        self.relu = nn.ReLU()

    def forward(self, input_id, mask, features):

        _, pooled_output = self.bert(input_ids= input_id, attention_mask=mask,return_dict=False)
        dropout_output = self.dropout(pooled_output)
        linear_output1 = self.relu(self.linear1(dropout_output))
        linear_output1 = self.batchnorm1(linear_output1)
        linear_output2 = self.relu(self.linear2(linear_output1))
        linear_output2 = self.batchnorm2(linear_output2)
        linear_output2 = self.dropout(linear_output2)
        linear_output3 = self.relu(self.linear3(torch.cat((linear_output2, features), dim=1)))
        linear_output3 = self.batchnorm3(linear_output3)
        linear_output3 = self.dropout(linear_output3)
        final_layer = self.layer_out(linear_output3)

        return final_layer

In [7]:
def get_accuracy(y_true, y_prob):
    accuracy = accuracy_score(y_true, y_prob > 0.5)
    return accuracy

## Train Model <a name = "train"> </a>

**Batch Size**: Batch Size is the number of training examples used in one iteration. A large batch size might hinder the quality of the model and its ability to generalize, as the model might converge to sharp minimizers of the training function. However, small batch size would be too noisy for the model to convergence fast.

**Adam Optimizer**: Adam is a stochastic gradient descent (SGD) method that inherit the features of two popular adaptive learning rate methods: AdaGrad and RMSProp. It requires minimal tuning of hyperparameters, is known to for a faster convergence and better performance than traditional SGD and other adaptive learning rate methods.

**BCEWithLogitsLoss**: BCEWithLogitsLoss is a loss function that combines the Binary Cross Entropy (BCE) Loss with a sigmoid activation function together. It is designed for binary classification problems, where the goal is to distinguish between two classes that are represented by labels 0 and 1 respectively.


**SummaryWriter**: SummaryWriter is a library that allows user to log various types of data (Time Series and summary statistic) for visualization in TensorBoard. TensorBoard is a web-based visualization tool developed by Google as part of the TensorFlow ecosystem, but it can also be used with other frameworks like PyTorch.


In [None]:
def train(model, train_data, val_data, learning_rate, epochs):

    train, val = Dataset(train_data), Dataset(val_data)
    batch_size = 32

    train_dataloader = torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=True)
    val_dataloader = torch.utils.data.DataLoader(val, batch_size=batch_size, shuffle=True)

    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda")
    use_cuda = True

    criterion = nn.BCEWithLogitsLoss()
    optimizer = Adam(model.parameters(), lr= learning_rate)

    if use_cuda:

            model = model.cuda()
            criterion = criterion.cuda()

    for epoch_num in range(epochs):

            total_cnt_train = 0
            total_loss_train = 0
            train_acc = 0
            train_acc_cnt = 0

            for train_input, train_label, train_features in tqdm(train_dataloader):

                train_label = train_label.to(device)
                mask = train_input['attention_mask'].to(device)
                input_id = train_input['input_ids'].squeeze(1).to(device)
                train_features = train_features.to(torch.float32).to(device)

                output = model(input_id, mask, train_features)
                
                # print("Output1: ", output, " Output2: ", train_label.float().unsqueeze(1), " loss: " , criterion(output, train_label.float()))
                
                batch_loss = criterion(output, train_label.float().unsqueeze(1))
                total_loss_train += batch_loss.item()
                # if total_cnt_train == 5:
                #     print("Train LOSS:", batch_loss.item())
                total_cnt_train += 1
                
                train_acc += get_accuracy(train_label.float().unsqueeze(1).cpu(), output.cpu())
                train_acc_cnt += 1
 
    
                model.zero_grad()
                batch_loss.backward()
                optimizer.step()
            
            total_cnt_val = 0
            total_loss_val = 0
            acc = 0
            acc_cnt = 0

            with torch.no_grad():

                for val_input, val_label, val_features in val_dataloader:

                    val_label = val_label.to(device)
                    mask = val_input['attention_mask'].to(device)
                    input_id = val_input['input_ids'].squeeze(1).to(device)
                    val_features = val_features.to(torch.float32).to(device)

                    output = model(input_id, mask, val_features)

                    batch_loss = criterion(output, val_label.float().unsqueeze(1))
                    total_loss_val += batch_loss.item()
                    # if total_cnt_val == 5:
                    #     print("Val LOSS:", batch_loss.item())
                    total_cnt_val += 1
                    # print("Loss: ", batch_loss, " Calc: ", sum((output - val_label.float().unsqueeze(1))**2) / batch_size)
                    acc += get_accuracy(val_label.float().unsqueeze(1).cpu(), output.cpu())
                    acc_cnt += 1

            
            print(
                f'Epochs: {epoch_num + 1} | \nTrain BCELoss: {total_loss_train / total_cnt_train: .3f} \
                | Val BCELoss: {total_loss_val / total_cnt_val: .3f}' +
                f'\nTrain Acc: {train_acc / train_acc_cnt: .3f} \
                | Val Acc: {acc / acc_cnt: .3f}'
            )
                  
            writer.add_scalar('Loss/train', total_loss_train / total_cnt_train, epoch_num)
            writer.add_scalar('Loss/val', total_loss_val / total_cnt_val, epoch_num)
            writer.add_scalar('Acc/train', train_acc / train_acc_cnt, epoch_num)
            writer.add_scalar('Acc/val', acc / acc_cnt, epoch_num)
            torch.save(model.state_dict(), "./Bert_classification/BERT-CLASSIFICATION_it" + str(epoch_num) + ".pt")


EPOCHS = 50
model = BertClassifier()
LR = 3e-5
     
writer = SummaryWriter()
train(model, df_train, df_val, LR, EPOCHS)
writer.flush()


You are using a model of type distilbert to instantiate a model of type bert. This is not supported for all configurations of models and can yield errors.
Some weights of the model checkpoint at ./distilbert-base-uncased were not used when initializing BertModel: ['distilbert.transformer.layer.4.output_layer_norm.bias', 'distilbert.transformer.layer.4.attention.v_lin.bias', 'distilbert.transformer.layer.4.ffn.lin1.bias', 'distilbert.transformer.layer.5.ffn.lin1.weight', 'distilbert.transformer.layer.5.output_layer_norm.bias', 'distilbert.transformer.layer.5.attention.out_lin.weight', 'distilbert.transformer.layer.1.attention.out_lin.weight', 'distilbert.transformer.layer.2.attention.out_lin.bias', 'distilbert.transformer.layer.4.ffn.lin2.bias', 'distilbert.transformer.layer.3.attention.v_lin.weight', 'distilbert.transformer.layer.5.ffn.lin2.weight', 'distilbert.embeddings.LayerNorm.bias', 'distilbert.transformer.layer.5.attention.q_lin.weight', 'distilbert.transformer.layer.5.sa_layer_

Epochs: 1 | 
Train BCELoss:  0.581                 | Val BCELoss:  0.526
Train Acc:  0.628                 | Val Acc:  0.722


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.54it/s]


Epochs: 2 | 
Train BCELoss:  0.512                 | Val BCELoss:  0.507
Train Acc:  0.716                 | Val Acc:  0.736


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 3 | 
Train BCELoss:  0.482                 | Val BCELoss:  0.512
Train Acc:  0.762                 | Val Acc:  0.733


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 4 | 
Train BCELoss:  0.443                 | Val BCELoss:  0.484
Train Acc:  0.817                 | Val Acc:  0.776


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 5 | 
Train BCELoss:  0.409                 | Val BCELoss:  0.472
Train Acc:  0.856                 | Val Acc:  0.806


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 6 | 
Train BCELoss:  0.383                 | Val BCELoss:  0.464
Train Acc:  0.890                 | Val Acc:  0.814


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 7 | 
Train BCELoss:  0.361                 | Val BCELoss:  0.467
Train Acc:  0.913                 | Val Acc:  0.818


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 8 | 
Train BCELoss:  0.331                 | Val BCELoss:  0.467
Train Acc:  0.946                 | Val Acc:  0.811


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.54it/s]


Epochs: 9 | 
Train BCELoss:  0.312                 | Val BCELoss:  0.468
Train Acc:  0.956                 | Val Acc:  0.815


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 10 | 
Train BCELoss:  0.301                 | Val BCELoss:  0.471
Train Acc:  0.966                 | Val Acc:  0.810


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 11 | 
Train BCELoss:  0.288                 | Val BCELoss:  0.462
Train Acc:  0.972                 | Val Acc:  0.824


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 12 | 
Train BCELoss:  0.288                 | Val BCELoss:  0.488
Train Acc:  0.962                 | Val Acc:  0.787


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 13 | 
Train BCELoss:  0.277                 | Val BCELoss:  0.491
Train Acc:  0.966                 | Val Acc:  0.788


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 14 | 
Train BCELoss:  0.265                 | Val BCELoss:  0.457
Train Acc:  0.976                 | Val Acc:  0.812


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 15 | 
Train BCELoss:  0.252                 | Val BCELoss:  0.460
Train Acc:  0.977                 | Val Acc:  0.808


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 16 | 
Train BCELoss:  0.247                 | Val BCELoss:  0.460
Train Acc:  0.977                 | Val Acc:  0.821


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 17 | 
Train BCELoss:  0.235                 | Val BCELoss:  0.489
Train Acc:  0.986                 | Val Acc:  0.801


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 18 | 
Train BCELoss:  0.227                 | Val BCELoss:  0.460
Train Acc:  0.985                 | Val Acc:  0.826


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 19 | 
Train BCELoss:  0.242                 | Val BCELoss:  0.467
Train Acc:  0.968                 | Val Acc:  0.816


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 20 | 
Train BCELoss:  0.219                 | Val BCELoss:  0.481
Train Acc:  0.983                 | Val Acc:  0.802


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 21 | 
Train BCELoss:  0.216                 | Val BCELoss:  0.462
Train Acc:  0.980                 | Val Acc:  0.818


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 22 | 
Train BCELoss:  0.203                 | Val BCELoss:  0.472
Train Acc:  0.986                 | Val Acc:  0.821


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 23 | 
Train BCELoss:  0.199                 | Val BCELoss:  0.496
Train Acc:  0.985                 | Val Acc:  0.800


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 24 | 
Train BCELoss:  0.192                 | Val BCELoss:  0.479
Train Acc:  0.987                 | Val Acc:  0.815


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 25 | 
Train BCELoss:  0.187                 | Val BCELoss:  0.476
Train Acc:  0.987                 | Val Acc:  0.814


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 26 | 
Train BCELoss:  0.197                 | Val BCELoss:  0.571
Train Acc:  0.977                 | Val Acc:  0.749


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 27 | 
Train BCELoss:  0.189                 | Val BCELoss:  0.490
Train Acc:  0.976                 | Val Acc:  0.790


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 28 | 
Train BCELoss:  0.179                 | Val BCELoss:  0.475
Train Acc:  0.982                 | Val Acc:  0.817


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 29 | 
Train BCELoss:  0.170                 | Val BCELoss:  0.545
Train Acc:  0.985                 | Val Acc:  0.775


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 30 | 
Train BCELoss:  0.164                 | Val BCELoss:  0.482
Train Acc:  0.987                 | Val Acc:  0.821


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 31 | 
Train BCELoss:  0.160                 | Val BCELoss:  0.479
Train Acc:  0.986                 | Val Acc:  0.822


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.56it/s]


Epochs: 32 | 
Train BCELoss:  0.155                 | Val BCELoss:  0.475
Train Acc:  0.987                 | Val Acc:  0.821


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.56it/s]


Epochs: 33 | 
Train BCELoss:  0.147                 | Val BCELoss:  0.481
Train Acc:  0.989                 | Val Acc:  0.822


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:54<00:00,  1.56it/s]


Epochs: 34 | 
Train BCELoss:  0.148                 | Val BCELoss:  0.493
Train Acc:  0.989                 | Val Acc:  0.821


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:54<00:00,  1.56it/s]


Epochs: 35 | 
Train BCELoss:  0.140                 | Val BCELoss:  0.499
Train Acc:  0.989                 | Val Acc:  0.822


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:54<00:00,  1.56it/s]


Epochs: 36 | 
Train BCELoss:  0.140                 | Val BCELoss:  0.497
Train Acc:  0.989                 | Val Acc:  0.825


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 37 | 
Train BCELoss:  0.133                 | Val BCELoss:  0.499
Train Acc:  0.989                 | Val Acc:  0.823


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:54<00:00,  1.56it/s]


Epochs: 38 | 
Train BCELoss:  0.133                 | Val BCELoss:  0.496
Train Acc:  0.989                 | Val Acc:  0.823


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.56it/s]


Epochs: 39 | 
Train BCELoss:  0.128                 | Val BCELoss:  0.584
Train Acc:  0.990                 | Val Acc:  0.774


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:55<00:00,  1.55it/s]


Epochs: 40 | 
Train BCELoss:  0.688                 | Val BCELoss:  0.761
Train Acc:  0.624                 | Val Acc:  0.561


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:54<00:00,  1.56it/s]


Epochs: 41 | 
Train BCELoss:  0.843                 | Val BCELoss:  0.946
Train Acc:  0.544                 | Val Acc:  0.521


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:54<00:00,  1.56it/s]


Epochs: 42 | 
Train BCELoss:  0.858                 | Val BCELoss:  0.637
Train Acc:  0.534                 | Val Acc:  0.644


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:54<00:00,  1.57it/s]


Epochs: 43 | 
Train BCELoss:  0.520                 | Val BCELoss:  0.595
Train Acc:  0.763                 | Val Acc:  0.724


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:54<00:00,  1.57it/s]


Epochs: 44 | 
Train BCELoss:  0.500                 | Val BCELoss:  0.592
Train Acc:  0.781                 | Val Acc:  0.717


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:54<00:00,  1.57it/s]


Epochs: 45 | 
Train BCELoss:  0.485                 | Val BCELoss:  0.566
Train Acc:  0.784                 | Val Acc:  0.728


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [01:54<00:00,  1.57it/s]


Epochs: 46 | 
Train BCELoss:  0.504                 | Val BCELoss:  0.508
Train Acc:  0.750                 | Val Acc:  0.763


 94%|████████████████████████████████████████████████████████████████████████████████████████████████▋      | 168/179 [01:47<00:07,  1.55it/s]

## Result Analysis <a name = "ana"> </a>

**Accuracy**: As anticipated, this model achieved **higher accuracy** compared to the baseline models, which relied solely on BERT, or traditional approaches utilizing Support Vector Machines (SVM) and other models. This highlights the fact that **both lyrics and additional features** of a song contribute to **determining its popularity**.

**Early Stopping**: Early stopping is crucial, as demonstrated by the data above. If the number of iterations becomes excessive, the **test set's loss** starts to **increase due to overfitting**. In our specific case, halting at around the `25th` iteration led to a more reliable and accurate model.

![Results33](./3.3Result.png "Results33")

Next, we would elaborate about the insight driven by our various models in the next notebook. 