This project is a sentiment analysis task whose goal is to classify a real dataset of customer complaints from online reviews into 3 labels: Constructive, Vindicative, and Avoidance. Five deep learning models for natural language processing are examined, with the following test accuracies: BERT 61.08%, CamemBERT 52.17%, SqueezeBert 59.63%, BigBird 60.66%, and MPNet 74.12% (MPNet initially reached 62.32%, which hyperparameter optimization raised to 74.12%). The estimated human accuracy for this task is 83.24%, and uniform random guessing over 3 labels would give 33.33%. MPNet at 74.12% therefore performs well: it is far above chance and within about 9 percentage points of the human estimate. The problem was initially studied as a group project in the Deep Learning PhD course at HEC Montreal. The code below is my individual reimplementation of the project using newer models and a new library (simpletransformers).
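
To put the reported numbers in context, here is a small sketch in plain Python that uses only the figures quoted above. It computes how much of the gap between random guessing and the human estimate each model closes. The `gap_closed` helper is a hypothetical name introduced for illustration, not part of the original project.

```python
# Test accuracies (in %) quoted in the summary above.
accuracies = {
    "BERT": 61.08,
    "CamemBERT": 52.17,
    "SqueezeBert": 59.63,
    "BigBird": 60.66,
    "MPNet": 74.12,
}
RANDOM_ACC = 33.33  # uniform guessing over 3 labels
HUMAN_ACC = 83.24   # estimated human accuracy


def gap_closed(acc, low=RANDOM_ACC, high=HUMAN_ACC):
    """Fraction of the chance-to-human gap covered by an accuracy (hypothetical helper)."""
    return (acc - low) / (high - low)


# Print the models from best to worst with the share of the gap they close.
for name, acc in sorted(accuracies.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {acc:6.2f}%  gap closed: {gap_closed(acc):.1%}")
```

By this measure the tuned MPNet covers roughly four fifths of the distance from chance to the human estimate.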

### Importing data files

In [None]:
# We first import the two dataset files of the task.
from google.colab import files
uploaded = files.upload()

Saving test.csv to test.csv
Saving train.csv to train.csv


### Data preprocessing

In [None]:
import pandas as pd
import numpy as np

In [None]:
# Read the CSV files into pandas DataFrames so they can be passed to the NLP models.

train = pd.read_csv('train.csv', sep=',')
train

Unnamed: 0,id,label,description
0,1,2,Reservation desk and General Manager Billy is ...
1,2,0,"I ordered an egg sandwich, which was not adver..."
2,3,0,I have called the business due to some complai...
3,4,0,Spent over an hour and a half yesterday 4 sepa...
4,5,0,Customers earn free beverage/food rewards (1 s...
...,...,...,...
4339,4823,0,I really liked this wallet at first - the slim...
4340,4824,0,While the wallet seems to be well made and of ...
4341,4825,1,"I bought this because it was described as a ""s..."
4342,4826,0,I liked the idea of having a slot on the front...


In [None]:
testData = pd.read_csv('test.csv', sep=',')
testData

Unnamed: 0,id,label,description
0,1257,1,I bought Choline Bitartrate - 400 Grams (14.11...
1,2273,0,Used bait and switch. Sent me item different t...
2,1823,2,We are remodeling are bathroom and ordered til...
3,3192,0,I LOVE Mario Kart... I played it in single pla...
4,1919,2,This place was beyond disappointing. A friend ...
...,...,...,...
478,263,2,"Waited 10 mins to be seated, no one around. On..."
479,195,1,I took my dog to this business to get groomed ...
480,1297,0,I ordered once with no problem but after the f...
481,1963,0,Made an appointment for an estimate visit. Mar...


In [None]:
# Rename the label and description columns to labels and text, the default column names for text classification in the simpletransformers library.
train = train.rename(columns={'label': 'labels', 'description': 'text'})
testData = testData.rename(columns={'label': 'labels', 'description': 'text'})
# The same was done for my other NLP project as well. 
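
As a hedged sketch of the same preprocessing on a tiny in-memory sample: the `Unnamed: 0` column in the previews above suggests the CSV files were saved together with their index, so passing `index_col=0` to `pd.read_csv` is one way to avoid carrying it along as data. The three sample rows and the `sample_csv` name are made up for illustration.

```python
import io

import pandas as pd

# Tiny stand-in for train.csv (hypothetical rows, same column layout as above).
sample_csv = io.StringIO(
    ",id,label,description\n"
    "0,1,2,The front desk ignored my request.\n"
    "1,2,0,I ordered a sandwich and it arrived cold.\n"
    "2,3,1,Please fix the billing page so others avoid this.\n"
)

# index_col=0 treats the first (unnamed) column as the index instead of data.
df = pd.read_csv(sample_csv, index_col=0)

# Rename to the column names used by simpletransformers and keep only those.
df = df.rename(columns={"label": "labels", "description": "text"})[["text", "labels"]]

print(df.columns.tolist())
print(df["labels"].value_counts().sort_index().to_dict())
```

Printing the label counts at this point is also a quick way to check class balance before training.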

### Installing simpletransformers Library

In [None]:
!pip install simpletransformers   # Used for NLP tasks.

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting simpletransformers
  Downloading simpletransformers-0.63.9-py3-none-any.whl (250 kB)
Collecting tokenizers
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting seqeval
  Downloading seqeval-1.2.2.tar.gz (43 kB)
  Preparing metadata (setup.py) ... done
Collecting sentencepiece
  Downloading sentencepiece-0.1.97-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

In [None]:
from simpletransformers.classification import ClassificationModel, ClassificationArgs

### BERT model

In [None]:
# Configuration of model args
bertArgs = ClassificationArgs(num_train_epochs=2, overwrite_output_dir=True) 
# With num_train_epochs of 1 and 3, the accuracy was lower (0.60 and 0.59, respectively).

# Constructing the classification model
bertModel = ClassificationModel(
    "bert", # model_type 
    "bert-base-uncased", # model_name 
    use_cuda = True, # Using GPU (instead of CPU)
    num_labels = 3, # number of labels
    args = bertArgs )
# BERT was the best model in my other NLP project, so it is used here as well.

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

In [None]:
# Training the model 
bertModel.train_model(train)

  0%|          | 0/4344 [00:00<?, ?it/s]

Epoch:   0%|          | 0/2 [00:00<?, ?it/s]

Running Epoch 0 of 2:   0%|          | 0/543 [00:00<?, ?it/s]

Running Epoch 1 of 2:   0%|          | 0/543 [00:00<?, ?it/s]

(1086, 0.911758696835344)
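
The tuple printed above is what `train_model` returns. To my understanding of simpletransformers, it holds the number of training steps taken and the average training loss. The step count can be checked by hand, assuming the library's default training batch size of 8 (an assumption, since the notebook does not set `train_batch_size`).

```python
import math

n_train = 4344   # training examples, from the progress bar above
batch_size = 8   # assumed simpletransformers default (not set in the notebook)
num_epochs = 2   # num_train_epochs used for BERT

# Each epoch processes the whole training set once, one batch per step.
steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch)  # batches per epoch
print(total_steps)      # total training steps
```

Both values match the log: 543 batches per epoch and 1086 steps in total. The same arithmetic gives 1629 steps for the 3-epoch runs and 2172 for the 4-epoch MPNet run.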

In [None]:
predictedLabels, modelOutputs = bertModel.predict(list(testData.text))  # Prediction of the labels for the test set.

  0%|          | 0/483 [00:00<?, ?it/s]

  0%|          | 0/61 [00:00<?, ?it/s]
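
`predict` returns the predicted labels together with the raw model outputs (one score per label for each text). As a sketch, assuming those outputs are unnormalized logits, a softmax turns one row into class probabilities. The example row below is made up and does not come from the model.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [1.2, -0.3, 0.4]  # hypothetical scores for labels 0, 1, 2
probs = softmax(example_logits)

# The predicted label is the index with the highest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 3) for p in probs])
print(predicted)
```

Looking at the probabilities rather than only the predicted label can help spot reviews on which the model is unsure.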

In [None]:
# Accuracy of the model
from sklearn.metrics import accuracy_score
accuracy_score(list(testData.labels), list(predictedLabels))
# The same was done for my other NLP project as well. 

0.6107660455486542

### CamemBERT model

In [None]:
# Configuration of model args
camembertArgs = ClassificationArgs(num_train_epochs=3, overwrite_output_dir=True)
# With num_train_epochs of 1 and 2, the accuracy was lower (0.43 and 0.51, respectively).

# Constructing the classification model
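camembertModel = ClassificationModel(
    "camembert", # model_type 
    # Note: this checkpoint is a French CamemBERT fine-tuned for NER; the reviews here are in English, which may explain the low accuracy.
    "Jean-Baptiste/camembert-ner", # model_name 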
    use_cuda = True, # Using GPU (instead of CPU)
    num_labels = 3, # number of labels
    args = camembertArgs )

Some weights of the model checkpoint at Jean-Baptiste/camembert-ner were not used when initializing CamembertForSequenceClassification: ['classifier.weight', 'classifier.bias']
- This IS expected if you are initializing CamembertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CamembertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at Jean-Baptiste/camembert-ner and are newly initialized: ['classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias', 'classifier.dense.weight']
You should probably TRAIN this model on a down-stream

In [None]:
# Training the model 
camembertModel.train_model(train)

Epoch:   0%|          | 0/3 [00:00<?, ?it/s]

Running Epoch 0 of 3:   0%|          | 0/543 [00:00<?, ?it/s]

Running Epoch 1 of 3:   0%|          | 0/543 [00:00<?, ?it/s]

Running Epoch 2 of 3:   0%|          | 0/543 [00:00<?, ?it/s]

(1629, 0.9607391076412868)

In [None]:
predictedLabels, modelOutputs = camembertModel.predict(list(testData.text))  # Prediction of the labels for the test set.

  0%|          | 0/483 [00:00<?, ?it/s]

  0%|          | 0/61 [00:00<?, ?it/s]

In [None]:
# Accuracy of the model
from sklearn.metrics import accuracy_score
accuracy_score(list(testData.labels), list(predictedLabels))
# The same was done for my other NLP project as well. 

0.5217391304347826

### SqueezeBert model

In [None]:
# Configuration of model args
squeezebertArgs = ClassificationArgs(num_train_epochs=3, overwrite_output_dir=True)
# With num_train_epochs of 1 and 2, the accuracy was lower (0.53 and 0.59, respectively).

# Constructing the classification model
squeezebertModel = ClassificationModel(
    "squeezebert", # model_type 
    "squeezebert/squeezebert-uncased", # model_name 
    use_cuda = True, # Using GPU (instead of CPU)
    num_labels = 3, # number of labels
    args = squeezebertArgs )

Some weights of the model checkpoint at squeezebert/squeezebert-uncased were not used when initializing SqueezeBertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing SqueezeBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing SqueezeBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of SqueezeBertForSequenceClassification were no

In [None]:
# Training the model 
squeezebertModel.train_model(train)

  0%|          | 0/4344 [00:00<?, ?it/s]

Epoch:   0%|          | 0/3 [00:00<?, ?it/s]

Running Epoch 0 of 3:   0%|          | 0/543 [00:00<?, ?it/s]

Running Epoch 1 of 3:   0%|          | 0/543 [00:00<?, ?it/s]

Running Epoch 2 of 3:   0%|          | 0/543 [00:00<?, ?it/s]

(1629, 0.805339002989786)

In [None]:
predictedLabels, modelOutputs = squeezebertModel.predict(list(testData.text))  # Prediction of the labels for the test set.

  0%|          | 0/483 [00:00<?, ?it/s]

  0%|          | 0/61 [00:00<?, ?it/s]

In [None]:
# Accuracy of the model
from sklearn.metrics import accuracy_score
accuracy_score(list(testData.labels), list(predictedLabels))
# The same was done for my other NLP project as well. 

0.5962732919254659

### BigBird model

In [None]:
# Configuration of model args
bigbirdArgs = ClassificationArgs(num_train_epochs=3, overwrite_output_dir=True)
# With num_train_epochs of 1 and 2, the accuracy was lower (0.59 and 0.58, respectively).

# Constructing the classification model
bigbirdModel = ClassificationModel(
    "bigbird", # model_type 
    "google/bigbird-roberta-base", # model_name 
    use_cuda = True, # Using GPU (instead of CPU)
    num_labels = 3, # number of labels
    args = bigbirdArgs )

Some weights of the model checkpoint at google/bigbird-roberta-base were not used when initializing BigBirdForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BigBirdForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BigBirdForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BigBirdForSequenceClassifica

In [None]:
# Training the model 
bigbirdModel.train_model(train)

  0%|          | 0/4344 [00:00<?, ?it/s]

Epoch:   0%|          | 0/3 [00:00<?, ?it/s]

Running Epoch 0 of 3:   0%|          | 0/543 [00:00<?, ?it/s]

Attention type 'block_sparse' is not possible if sequence_length: 128 <= num global tokens: 2 * config.block_size + min. num sliding tokens: 3 * config.block_size + config.num_random_blocks * config.block_size + additional buffer: config.num_random_blocks * config.block_size = 704 with config.block_size = 64, config.num_random_blocks = 3. Changing attention type to 'original_full'...


Running Epoch 1 of 3:   0%|          | 0/543 [00:00<?, ?it/s]

Running Epoch 2 of 3:   0%|          | 0/543 [00:00<?, ?it/s]

(1629, 0.785716874677027)
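
The attention warning in the training log above comes from a length check: block-sparse attention needs sequences longer than the sum of global, sliding, random, and buffer tokens. Here is a sketch of that arithmetic, using the values printed in the warning (`block_size = 64`, `num_random_blocks = 3`) and the sequence length of 128.

```python
block_size = 64
num_random_blocks = 3
sequence_length = 128  # sequence length in effect, per the warning

# Terms of the minimum length, exactly as listed in the warning message.
global_tokens = 2 * block_size
sliding_tokens = 3 * block_size
random_tokens = num_random_blocks * block_size
buffer_tokens = num_random_blocks * block_size

min_length = global_tokens + sliding_tokens + random_tokens + buffer_tokens
uses_block_sparse = sequence_length > min_length

print(min_length)
print(uses_block_sparse)
```

Since 128 is not greater than 704, the model switches to `original_full` attention, so BigBird's sparse attention is not exercised in this run.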

In [None]:
predictedLabels, modelOutputs = bigbirdModel.predict(list(testData.text))  # Prediction of the labels for the test set.

  0%|          | 0/483 [00:00<?, ?it/s]

  0%|          | 0/61 [00:00<?, ?it/s]

In [None]:
# Accuracy of the model
from sklearn.metrics import accuracy_score
accuracy_score(list(testData.labels), list(predictedLabels))
# The same was done for my other NLP project as well. 

0.6066252587991718

### MPNet model

In [None]:
# Configuration of model args
mpnetArgs = ClassificationArgs(num_train_epochs=4, overwrite_output_dir=True)
# The accuracy with num_train_epochs of 1, 2, 3 and 5 was lower (0.56, 0.60, 0.617, 0.58, respectively).

# Constructing the classification model
mpnetModel = ClassificationModel(
    "mpnet", # model_type 
    "sentence-transformers/all-mpnet-base-v2", # model_name 
    use_cuda = True, # Using GPU (instead of CPU)
    num_labels = 3, # number of labels
    args = mpnetArgs )

Some weights of the model checkpoint at sentence-transformers/all-mpnet-base-v2 were not used when initializing MPNetForSequenceClassification: ['pooler.dense.bias', 'pooler.dense.weight']
- This IS expected if you are initializing MPNetForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing MPNetForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of MPNetForSequenceClassification were not initialized from the model checkpoint at sentence-transformers/all-mpnet-base-v2 and are newly initialized: ['classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias', 'classifier.dense.weight']
You should probably TRAIN this model on a

In [None]:
# Training the model 
mpnetModel.train_model(train)

  0%|          | 0/4344 [00:00<?, ?it/s]

Epoch:   0%|          | 0/4 [00:00<?, ?it/s]

Running Epoch 0 of 4:   0%|          | 0/543 [00:00<?, ?it/s]

Running Epoch 1 of 4:   0%|          | 0/543 [00:00<?, ?it/s]

Running Epoch 2 of 4:   0%|          | 0/543 [00:00<?, ?it/s]

Running Epoch 3 of 4:   0%|          | 0/543 [00:00<?, ?it/s]

(2172, 0.7263315299158817)

In [None]:
predictedLabels, modelOutputs = mpnetModel.predict(list(testData.text))  # Prediction of the labels for the test set.

  0%|          | 0/483 [00:00<?, ?it/s]

  0%|          | 0/61 [00:00<?, ?it/s]

In [None]:
# Accuracy of the model
from sklearn.metrics import accuracy_score
accuracy_score(list(testData.labels), list(predictedLabels))
# The same was done for my other NLP project as well. 

0.6231884057971014
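
With 483 test examples, some of the differences between models are within sampling noise. As a rough sketch, the 95% margin for each accuracy reported so far can be estimated with a normal-approximation binomial interval. That interval is my addition and not part of the original analysis.

```python
import math

n_test = 483  # number of test examples
accuracies = {
    "BERT": 0.6107660455486542,
    "CamemBERT": 0.5217391304347826,
    "SqueezeBert": 0.5962732919254659,
    "BigBird": 0.6066252587991718,
    "MPNet": 0.6231884057971014,
}


def margin_95(acc, n):
    """Half-width of a 95% normal-approximation interval for an accuracy."""
    return 1.96 * math.sqrt(acc * (1.0 - acc) / n)


for name, acc in accuracies.items():
    m = margin_95(acc, n_test)
    print(f"{name:12s} {acc:.4f} +/- {m:.4f}")
```

Each margin is a little over 4 percentage points, so BERT, SqueezeBert, BigBird, and the default MPNet are hard to separate on this test set.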

### MPNet model (more optimized)

In [None]:
# Since MPNet reached the highest accuracy among the examined models, we tune it further.
# With the settings below, accuracy rises to 0.7412, considerably higher than the 0.6232 obtained with the default max_seq_length of 128.
# Configuration of model args
mpnetArgs = ClassificationArgs(num_train_epochs=2, max_seq_length=512, overwrite_output_dir=True)
# Other examined hyperparameters, to optimize the MPNet model:
# num_train_epochs=1, max_seq_length=32 : accuracy=0.5238095238095238
# num_train_epochs=1, max_seq_length=64 : accuracy=0.5279503105590062
# num_train_epochs=4, max_seq_length=128 (default) : accuracy=0.6231884057971014
# num_train_epochs=1, max_seq_length=256 : accuracy=0.6894409937888198
# num_train_epochs=1, max_seq_length=512 : accuracy=0.7370600414078675
# num_train_epochs=3, max_seq_length=512 : accuracy=0.7287784679089027
# num_train_epochs=1, max_seq_length=1024 : OutOfMemoryError
# A higher max_seq_length also increases training time.

# Constructing the classification model
mpnetModel = ClassificationModel(
    "mpnet", # model_type 
    "sentence-transformers/all-mpnet-base-v2", # model_name 
    use_cuda = True, # Using GPU (instead of CPU)
    num_labels = 3, # number of labels
    args = mpnetArgs )

Downloading:   0%|          | 0.00/571 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/438M [00:00<?, ?B/s]

Some weights of the model checkpoint at sentence-transformers/all-mpnet-base-v2 were not used when initializing MPNetForSequenceClassification: ['pooler.dense.bias', 'pooler.dense.weight']
- This IS expected if you are initializing MPNetForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing MPNetForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of MPNetForSequenceClassification were not initialized from the model checkpoint at sentence-transformers/all-mpnet-base-v2 and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a

Downloading:   0%|          | 0.00/363 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/466k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/239 [00:00<?, ?B/s]

In [None]:
# Training the model 
mpnetModel.train_model(train)

  0%|          | 0/4344 [00:00<?, ?it/s]

Epoch:   0%|          | 0/2 [00:00<?, ?it/s]

Running Epoch 0 of 2:   0%|          | 0/543 [00:00<?, ?it/s]

Running Epoch 1 of 2:   0%|          | 0/543 [00:00<?, ?it/s]

(1086, 0.6859385558913426)

In [None]:
predictedLabels, modelOutputs = mpnetModel.predict(list(testData.text))  # Prediction of the labels for the test set.

  0%|          | 0/483 [00:00<?, ?it/s]

  0%|          | 0/61 [00:00<?, ?it/s]

In [None]:
# Accuracy of the model
from sklearn.metrics import accuracy_score
accuracy_score(list(testData.labels), list(predictedLabels))
# The same was done for my other NLP project as well. 

0.7412008281573499
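
The configurations tried for MPNet, as listed in the comments above, can be collected in one place to make the trend visible. This is only a bookkeeping sketch over the reported numbers (no training is run), and the `results` dictionary name is mine.

```python
# (num_train_epochs, max_seq_length) -> test accuracy, as reported in this notebook.
results = {
    (1, 32): 0.5238095238095238,
    (1, 64): 0.5279503105590062,
    (4, 128): 0.6231884057971014,
    (1, 256): 0.6894409937888198,
    (1, 512): 0.7370600414078675,
    (2, 512): 0.7412008281573499,
    (3, 512): 0.7287784679089027,
}

best_config = max(results, key=results.get)

# List the runs ordered by sequence length, then by number of epochs.
for (epochs, seq_len), acc in sorted(results.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(f"max_seq_length={seq_len:4d} epochs={epochs} accuracy={acc:.4f}")
print("best:", best_config)
```

Accuracy rises with `max_seq_length` at every step, which suggests that many reviews are longer than the default 128 tokens and were being truncated.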

### Confusion matrix and classification report for the best model (MPNet (more optimized))

In [None]:
from sklearn.metrics import confusion_matrix
confusion_matrix(list(testData.labels), list(predictedLabels))
# The same was done for my other NLP project as well. 

array([[161,  29,  20],
       [ 21, 129,   8],
       [ 29,  18,  68]])
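
Per-class precision and recall can be recovered from this matrix directly (rows are true labels, columns are predicted labels, following scikit-learn's convention). A plain-Python sketch:

```python
# Confusion matrix from the output above: rows = true label, columns = predicted label.
cm = [
    [161, 29, 20],
    [21, 129, 8],
    [29, 18, 68],
]

n_classes = len(cm)
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(n_classes))
accuracy = correct / total

# Recall: correct predictions divided by the row sum (true count of that class).
recall = [cm[i][i] / sum(cm[i]) for i in range(n_classes)]
# Precision: correct predictions divided by the column sum (predicted count).
precision = [cm[i][i] / sum(row[i] for row in cm) for i in range(n_classes)]

print(round(accuracy, 4))
print([round(r, 2) for r in recall])
print([round(p, 2) for p in precision])
```

Label 2 has the lowest recall: 29 of its 115 examples are predicted as label 0.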

In [None]:
from sklearn.metrics import classification_report
target_names = ['0', '1', '2']
print(classification_report(list(testData.labels), list(predictedLabels), target_names=target_names))

              precision    recall  f1-score   support

           0       0.76      0.77      0.76       210
           1       0.73      0.82      0.77       158
           2       0.71      0.59      0.64       115

    accuracy                           0.74       483
   macro avg       0.73      0.72      0.73       483
weighted avg       0.74      0.74      0.74       483
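
The support column shows that the test classes are not balanced, so always predicting the most frequent label is a stronger baseline than the 33.33% uniform-random figure. A short sketch using the support counts from the report:

```python
# Test-set support per label, from the classification report above.
support = {0: 210, 1: 158, 2: 115}

n_test = sum(support.values())
majority_label = max(support, key=support.get)
majority_baseline = support[majority_label] / n_test

mpnet_accuracy = 0.7412008281573499  # tuned MPNet, from the accuracy cell above

print(majority_label, round(majority_baseline, 4))
print(round(mpnet_accuracy - majority_baseline, 4))
```

Against this baseline of about 43.5%, the tuned MPNet still gains roughly 31 percentage points.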

