### Assignment 4 - EMPIRICAL TEXT CLASSIFICATION STUDY: DL and NLP


The objective of this task is to use NLP techniques to classify textual data. spaCy, an open-source NLP library for Python, will be utilised to process the text before it is passed to two classification models: Logistic Regression (LR) and a Multi-layer Perceptron (MLP).


In [None]:
import spacy
import pandas as pd

The chosen dataset for textual classification is the **Airline Passenger Reviews** dataset, consisting of 10761 rows of review text, each labelled with its Net Promoter Score (NPS) category. The NPS categorises customers' inclination to recommend a company's services as Promoters, Passives, or Detractors.
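
For context, NPS categories conventionally come from a 0 to 10 "how likely are you to recommend us?" rating: 9 to 10 are Promoters, 7 to 8 Passives and 0 to 6 Detractors, and the overall score is the percentage of Promoters minus the percentage of Detractors. The dataset only provides the resulting category, so the sketch below assumes this standard convention rather than anything stated in the dataset; `nps_category` and `net_promoter_score` are illustrative helpers, not part of the notebook.

```python
# A minimal sketch of the standard NPS convention (an assumption here: the
# dataset only ships the resulting category, not the underlying 0-10 rating).

def nps_category(rating: int) -> str:
    """Map a 0-10 'likelihood to recommend' rating to its NPS category."""
    if not 0 <= rating <= 10:
        raise ValueError("rating must be between 0 and 10")
    if rating >= 9:
        return "Promoter"
    if rating >= 7:
        return "Passive"
    return "Detractor"


def net_promoter_score(categories: list[str]) -> float:
    """NPS = % Promoters - % Detractors, in the range [-100, 100]."""
    promoters = categories.count("Promoter")
    detractors = categories.count("Detractor")
    return 100.0 * (promoters - detractors) / len(categories)


# Labels of the five reviews shown by dataset.head() below
sample = ["Passive", "Detractor", "Detractor", "Promoter", "Detractor"]
print(net_promoter_score(sample))  # -> -40.0
```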

The dataset is available for download or viewing at https://raw.githubusercontent.com/baharin/CSI4106-Assignment4-Datasets/main/reduced_file_AirPassengerReviews.csv, courtesy of Baharin (TA for CSI4106).
A copy is also available at https://github.com/fredjkhar/nlp-text-classification/blob/main/data/reduced_file_AirPassengerReviews.csv; its raw version is the one loaded below.

In [None]:
url = "https://raw.githubusercontent.com/fredjkhar/nlp-text-classification/main/data/reduced_file_AirPassengerReviews.csv"

In [None]:
#  REFERENCE [1] : https://stackoverflow.com/questions/25351968/how-can-i-display-full-non-truncated-dataframe-information-in-html-when-conver
pd.set_option("display.max_colwidth", None)  # Show full cell contents, i.e. no truncation
dataset = pd.read_csv(url)

dataset = dataset.dropna()


dataset.head()

Unnamed: 0,customer_review,NPS Score
0,"London to Izmir via Istanbul. First time I'd flown TK. I found them very good in the air, cabin crew, planes, food, all very nice. Not so great on the ground, ground staff, call centre, computer systems. My flight from LHR was delayed so I missed the connection in Istanbul. Most ground staff don't speak English, and I was given contradictory instructions from those that could speak a little English. I eventually got on a flight to Izmir three hours later, but it wasn't an easy process, made worse by the vast distances one has to walk between gates in the cavernous new airport. Also, I'd phoned a TK call centre (based in Ukraine) to pay an extra Â£40 or so each way for extra leg room seats. However, as the departure times kept changing, my seats kept changing, and for the return leg to London from Istanbul I was not given an extra leg room seat. Luckily there was a spare exit row seat and the cabin crew sorted me out. Overall, I think their cabin crew and planes are very good, ground staff and call centre staff need better training and they all need better computer systems and software to work with.",Passive
1,"Istanbul to Bucharest. We make our check in in the airport, they Take our luggage , we go to the gate and at the gate surprise they dont let uÈ™ board with two children, because they say the flight is overbooked. We had to wait in the airport with two children until 5 oclock in the morning until they bring uÈ™ to a hotel 2 hours far away from the airport without luggage, without eat without nothing. Our first and last flight with this airline.",Detractor
2,"Rome to Prishtina via Istanbul. I flew with this company several times in the past years, and I can honestly say that it is getting worse and worse. I flew from Rome to Prishtina via Istanbul, all 4 flights had a delay (which apparently is pretty normal with Turkish). The ground staff is for the most part useless. In Istambul i have tried to ask a few information about a flight delay (i had just 30 minutes before the connecting flight) and the whole answer was: ""Relax Sir, No problem Sir, It's okay"". The new airport is a gigantic mess, very big and disorganized. When you land in Istanbul it takes about 20 to 25 minutes taxiing and other 10 minutes before they actually start disembarking, it's an exhausting experience especially if you are in a hurry. Forget about asking for some indication at the new airport, they all chat between each other, some with a coffee in their hands. I flew with four different aircraft on this trip, two were fairly new, the other two were old, seats worn. Food on board was of very poor quality.., let me repeat this, very...poor quality. In general i can say that crew is decently trained and deliver a good experience or at least they try, of course it is far from the experience you get flying Qatar or Emirates but still pretty good, while ground personnel, especially in Istanbul is the personification of the word Lazy. A complete disaster. I will try and avoid Turkish Airlines in the future, it is not a cheap company and definitely not worth for the money you pay. If you are in a hurry or you know you will be avoid this company and avoid the new istanbul airport at all costs.",Detractor
3,"Flew on Turkish Airlines IAD-IST-KHI and return KHI-IST-IAD. Turkish Airlines has consistently maintained its quality since I first flew with them in 2007. The flights leave on time, the catering is excellent, the inflight entertainment is extensive and the interface easy to use, and the cabin crew is excellent. Interesting though the A330 on the KHI-IST route and return seemed to have more leg room and was newer than the A330 on the IAD-IST route which was showing its age. The A330 on the IAD-IST route had a slow responding interface for the inflight entertainment and a broken table on the return flight. But Turkish Airlines will be replacing the A330 on its flight to IAD with the 787 sometime in the summer. Turkish food was served on the return leg which I personally like, and I saw the cabin staff helping elderly passengers walk to the lavatory which was nice. Overall another wonderful experience with Turkish Airlines.",Promoter
4,"Mumbai to Dublin via Istanbul. Never book Turkish airlines if you are traveling to Dublin from Mumbai. If the flight gets delay from Mumbai, they don't have any other options for you. They will straight forward ask you to stay in hotel in Istanbul. They do not care for any for your time loss. No decisions has been made from airlines crew within time. They kept me waiting for more than 3 hours.",Detractor


# 1. NLP Pipeline


### a. Tokenisation


Our first step is to tokenise the reviews with spaCy's small English pipeline, en_core_web_sm. Applying the pipeline to a review returns a spaCy Doc, which holds the review's tokens together with their annotations (POS tags, lemmas and named entities). We store these Docs in an additional column, tokenized, of the original dataset.


In [None]:
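nlp = spacy.load("en_core_web_sm")  # small English pipeline (tagger, parser, lemmatizer, NER, ...)
# Each review becomes a spaCy Doc: a sequence of Tokens annotated with POS tags and lemmas
dataset["tokenized"] = dataset["customer_review"].apply(nlp)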

dataset.head()

Unnamed: 0,customer_review,NPS Score,tokenized
0,"London to Izmir via Istanbul. First time I'd flown TK. I found them very good in the air, cabin crew, planes, food, all very nice. Not so great on the ground, ground staff, call centre, computer systems. My flight from LHR was delayed so I missed the connection in Istanbul. Most ground staff don't speak English, and I was given contradictory instructions from those that could speak a little English. I eventually got on a flight to Izmir three hours later, but it wasn't an easy process, made worse by the vast distances one has to walk between gates in the cavernous new airport. Also, I'd phoned a TK call centre (based in Ukraine) to pay an extra Â£40 or so each way for extra leg room seats. However, as the departure times kept changing, my seats kept changing, and for the return leg to London from Istanbul I was not given an extra leg room seat. Luckily there was a spare exit row seat and the cabin crew sorted me out. Overall, I think their cabin crew and planes are very good, ground staff and call centre staff need better training and they all need better computer systems and software to work with.",Passive,"( , London, to, Izmir, via, Istanbul, ., First, time, I, 'd, flown, TK, ., I, found, them, very, good, in, the, air, ,, cabin, crew, ,, planes, ,, food, ,, all, very, nice, ., Not, so, great, on, the, ground, ,, ground, staff, ,, call, centre, ,, computer, systems, ., My, flight, from, LHR, was, delayed, so, I, missed, the, connection, in, Istanbul, ., Most, ground, staff, do, n't, speak, English, ,, and, I, was, given, contradictory, instructions, from, those, that, could, speak, a, little, English, ., I, eventually, got, on, a, flight, to, Izmir, three, hours, later, ,, but, ...)"
1,"Istanbul to Bucharest. We make our check in in the airport, they Take our luggage , we go to the gate and at the gate surprise they dont let uÈ™ board with two children, because they say the flight is overbooked. We had to wait in the airport with two children until 5 oclock in the morning until they bring uÈ™ to a hotel 2 hours far away from the airport without luggage, without eat without nothing. Our first and last flight with this airline.",Detractor,"( , Istanbul, to, Bucharest, ., We, make, our, check, in, in, the, airport, ,, they, Take, our, luggage, ,, we, go, to, the, gate, and, at, the, gate, surprise, they, do, nt, let, uÈ, ™, board, with, two, children, ,, because, they, say, the, flight, is, overbooked, ., We, had, to, wait, in, the, airport, with, two, children, until, 5, oclock, in, the, morning, until, they, bring, uÈ, ™, to, a, hotel, 2, hours, far, away, from, the, airport, without, luggage, ,, without, eat, without, nothing, ., Our, first, and, last, flight, with, this, airline, .)"
2,"Rome to Prishtina via Istanbul. I flew with this company several times in the past years, and I can honestly say that it is getting worse and worse. I flew from Rome to Prishtina via Istanbul, all 4 flights had a delay (which apparently is pretty normal with Turkish). The ground staff is for the most part useless. In Istambul i have tried to ask a few information about a flight delay (i had just 30 minutes before the connecting flight) and the whole answer was: ""Relax Sir, No problem Sir, It's okay"". The new airport is a gigantic mess, very big and disorganized. When you land in Istanbul it takes about 20 to 25 minutes taxiing and other 10 minutes before they actually start disembarking, it's an exhausting experience especially if you are in a hurry. Forget about asking for some indication at the new airport, they all chat between each other, some with a coffee in their hands. I flew with four different aircraft on this trip, two were fairly new, the other two were old, seats worn. Food on board was of very poor quality.., let me repeat this, very...poor quality. In general i can say that crew is decently trained and deliver a good experience or at least they try, of course it is far from the experience you get flying Qatar or Emirates but still pretty good, while ground personnel, especially in Istanbul is the personification of the word Lazy. A complete disaster. I will try and avoid Turkish Airlines in the future, it is not a cheap company and definitely not worth for the money you pay. If you are in a hurry or you know you will be avoid this company and avoid the new istanbul airport at all costs.",Detractor,"( , Rome, to, Prishtina, via, Istanbul, ., I, flew, with, this, company, several, times, in, the, past, years, ,, and, I, can, honestly, say, that, it, is, getting, worse, and, worse, ., I, flew, from, Rome, to, Prishtina, via, Istanbul, ,, all, 4, flights, had, a, delay, (, which, apparently, is, pretty, normal, with, Turkish, ), ., The, ground, staff, is, for, the, most, part, useless, ., In, Istambul, i, have, tried, to, ask, a, few, information, about, a, flight, delay, (, i, had, just, 30, minutes, before, the, connecting, flight, ), and, the, whole, answer, was, :, "", Relax, ...)"
3,"Flew on Turkish Airlines IAD-IST-KHI and return KHI-IST-IAD. Turkish Airlines has consistently maintained its quality since I first flew with them in 2007. The flights leave on time, the catering is excellent, the inflight entertainment is extensive and the interface easy to use, and the cabin crew is excellent. Interesting though the A330 on the KHI-IST route and return seemed to have more leg room and was newer than the A330 on the IAD-IST route which was showing its age. The A330 on the IAD-IST route had a slow responding interface for the inflight entertainment and a broken table on the return flight. But Turkish Airlines will be replacing the A330 on its flight to IAD with the 787 sometime in the summer. Turkish food was served on the return leg which I personally like, and I saw the cabin staff helping elderly passengers walk to the lavatory which was nice. Overall another wonderful experience with Turkish Airlines.",Promoter,"( , Flew, on, Turkish, Airlines, IAD, -, IST, -, KHI, and, return, KHI, -, IST, -, IAD, ., Turkish, Airlines, has, consistently, maintained, its, quality, since, I, first, flew, with, them, in, 2007, ., The, flights, leave, on, time, ,, the, catering, is, excellent, ,, the, inflight, entertainment, is, extensive, and, the, interface, easy, to, use, ,, and, the, cabin, crew, is, excellent, ., Interesting, though, the, A330, on, the, KHI, -, IST, route, and, return, seemed, to, have, more, leg, room, and, was, newer, than, the, A330, on, the, IAD, -, IST, route, which, was, showing, its, age, ., ...)"
4,"Mumbai to Dublin via Istanbul. Never book Turkish airlines if you are traveling to Dublin from Mumbai. If the flight gets delay from Mumbai, they don't have any other options for you. They will straight forward ask you to stay in hotel in Istanbul. They do not care for any for your time loss. No decisions has been made from airlines crew within time. They kept me waiting for more than 3 hours.",Detractor,"( , Mumbai, to, Dublin, via, Istanbul, ., Never, book, Turkish, airlines, if, you, are, traveling, to, Dublin, from, Mumbai, ., If, the, flight, gets, delay, from, Mumbai, ,, they, do, n't, have, any, other, options, for, you, ., They, will, straight, forward, ask, you, to, stay, in, hotel, in, Istanbul, ., They, do, not, care, for, any, for, your, time, loss, ., No, decisions, has, been, made, from, airlines, crew, within, time, ., They, kept, me, waiting, for, more, than, 3, hours, .)"


### b. POS and NE extraction

Now that the reviews are tokenised, we define the _get_pos_ and _get_ents_ functions. The former keeps only the tokens whose part-of-speech (POS) tag is in the wanted list and returns their lemmas, joined into one string per row. The latter runs the spaCy pipeline on the raw text and extracts its named entities, keeping only those whose label is in np_list; if np_list is empty (or None), it returns all the named entities in the text.

We define both functions here because the derived datasets below rely on them.


In [None]:
# Get Lemmatized tokens based on Wanted POS
def get_pos(sentence, wanted_pos):
    return " ".join(token.lemma_ for token in sentence if token.pos_ in wanted_pos)


# REFERENCE [2] : https://towardsdatascience.com/named-entity-recognition-ner-using-spacy-nlp-part-4-28da2ece57c6
# Get named entities of a raw text. If np_list is non-empty, keep only entities whose label is in np_list.
# include_label: if True, return (entity, label) tuples, else return the entity text only
def get_ents(sentence, np_list=None, include_label=True):
    doc = nlp(sentence)  # runs the full spaCy pipeline again on the raw text
    return [
        (ent.text, ent.label_) if include_label else ent.text
        for ent in doc.ents
        if not np_list or ent.label_ in np_list
    ]
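
Because get_pos and get_ents only read a few attributes (lemma_ and pos_ on tokens, text and label_ on entities), their filtering logic can be checked without loading a spaCy model. The sketch below uses hypothetical namedtuple stand-ins for spaCy's Token and Span; the POS tags and lemmas are hand-assigned for illustration, while the three entities are copied from the first review's get_ents output shown further down.

```python
# Sketch of the filtering logic in get_pos and get_ents, using hypothetical
# stand-ins for spaCy's Token and Span objects (only the attributes used above).
from collections import namedtuple

Token = namedtuple("Token", ["text", "lemma_", "pos_"])
Span = namedtuple("Span", ["text", "label_"])


def filter_pos(tokens, wanted_pos):
    # Same expression as get_pos: keep lemmas of tokens whose POS is wanted
    return " ".join(token.lemma_ for token in tokens if token.pos_ in wanted_pos)


def filter_ents(ents, np_list=None, include_label=True):
    # Same filtering as get_ents, minus the call to nlp()
    return [
        (ent.text, ent.label_) if include_label else ent.text
        for ent in ents
        if not np_list or ent.label_ in np_list
    ]


# Hand-assigned tags for illustration only (not actual spaCy output)
tokens = [
    Token("I", "I", "PRON"),
    Token("found", "find", "VERB"),
    Token("them", "they", "PRON"),
    Token("very", "very", "ADV"),
    Token("good", "good", "ADJ"),
]
ents = [Span("LHR", "ORG"), Span("Istanbul", "GPE"), Span("three hours later", "TIME")]

print(filter_pos(tokens, ["ADJ", "VERB"]))  # -> find good
print(filter_ents(ents, ["TIME", "ORG", "MONEY"], include_label=False))
# -> ['LHR', 'three hours later']
```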

For the derived datasets in this study, we keep adjectives and verbs. Adjectives like "delayed," "comfortable," "good," "worse," and "terrible" serve as significant indicators of customer satisfaction. Meanwhile, verbs such as "planned," "delayed," "missed," and "overbooked" help us understand the actions of both customers and airlines and how they are connected.

When it comes to named entities, we first explore them without passing a list of desired labels to the _get_ents_ function.


In [None]:
pos_tags = ["ADJ", "VERB"]

print(dataset["customer_review"].apply(get_ents).head(10).to_list())

[[('London', 'GPE'), ('Izmir', 'PERSON'), ('Istanbul', 'GPE'), ('First', 'ORDINAL'), ('LHR', 'ORG'), ('Istanbul', 'GPE'), ('English', 'LANGUAGE'), ('English', 'LANGUAGE'), ('three hours later', 'TIME'), ('Ukraine', 'GPE'), ('London', 'GPE'), ('Istanbul', 'GPE')], [('Istanbul', 'GPE'), ('two', 'CARDINAL'), ('two', 'CARDINAL'), ('5', 'CARDINAL'), ('the morning', 'TIME'), ('2 hours', 'TIME'), ('first', 'ORDINAL')], [('Rome', 'GPE'), ('Prishtina', 'GPE'), ('Istanbul', 'GPE'), ('the past years', 'DATE'), ('Rome', 'GPE'), ('Prishtina', 'GPE'), ('Istanbul', 'GPE'), ('4', 'CARDINAL'), ('Turkish', 'NORP'), ('Istambul', 'GPE'), ('just 30 minutes', 'TIME'), ('Relax Sir', 'WORK_OF_ART'), ('Istanbul', 'GPE'), ('about 20 to 25 minutes', 'TIME'), ('10 minutes', 'TIME'), ('four', 'CARDINAL'), ('two', 'CARDINAL'), ('two', 'CARDINAL'), ('Qatar', 'GPE'), ('Emirates', 'GPE'), ('Istanbul', 'GPE'), ('Lazy', 'ORG'), ('Turkish Airlines', 'ORG')], [('Turkish Airlines', 'ORG'), ('Turkish Airlines', 'ORG'), ('fi

Several kinds of named entities appear in the first 10 rows of the dataset. In my opinion, however, the ones with the greatest potential to affect the NPS category are "TIME," "ORG," and "MONEY."
Naturally, time management is a critical factor in air transportation. Analyzing this entity can reveal frequent complaints or compliments about punctuality, scheduling, and overall airline time management.
For "ORG" (organization or institution) entities, it is important to identify the specific airline receiving complaints or praise. For example, if a particular airline receives mostly positive reviews, we would expect more Promoters than for an airline with mostly negative reviews.
Additionally, money is a significant consideration for airlines, whose ultimate goal is to make a profit, while individual travellers typically aim to reduce expenses. I consider this area particularly important to study because, most of the time, only one party (the airline or the traveller) comes away completely satisfied.
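
A simple way to see how often each label actually occurs is to count them. The sketch below does this with `collections.Counter` on the entities printed above for the first two reviews (copied by hand), then keeps only the TIME, ORG and MONEY entities, mirroring what get_ents does with np_list.

```python
from collections import Counter

# Entities printed above for the first two reviews, copied by hand
first_two_reviews = [
    [("London", "GPE"), ("Izmir", "PERSON"), ("Istanbul", "GPE"), ("First", "ORDINAL"),
     ("LHR", "ORG"), ("Istanbul", "GPE"), ("English", "LANGUAGE"), ("English", "LANGUAGE"),
     ("three hours later", "TIME"), ("Ukraine", "GPE"), ("London", "GPE"), ("Istanbul", "GPE")],
    [("Istanbul", "GPE"), ("two", "CARDINAL"), ("two", "CARDINAL"), ("5", "CARDINAL"),
     ("the morning", "TIME"), ("2 hours", "TIME"), ("first", "ORDINAL")],
]

# How often does each entity label occur?
label_counts = Counter(label for review in first_two_reviews for _, label in review)
print(label_counts.most_common())

# Keep only the labels selected for derived_dataset2 (same filter as get_ents)
wanted_labels = ["TIME", "ORG", "MONEY"]
kept = [[text for text, label in review if label in wanted_labels] for review in first_two_reviews]
print(kept)  # -> [['LHR', 'three hours later'], ['the morning', '2 hours']]
```

On the full data, the same count could be built from `dataset["customer_review"].apply(get_ents)`. In these two reviews, GPE is by far the most common label and MONEY does not occur at all.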


In [None]:
np_list = [
    "TIME",
    "ORG",
    "MONEY",
]

### c. Datasets setup

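We now define the derived datasets: derived_dataset1 keeps only the lemmatised adjectives and verbs of each review, and derived_dataset2 appends the selected named entities (TIME, ORG, MONEY) to those lemmas. Both reuse the original NPS Score labels.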


In [None]:
derived_dataset1 = pd.DataFrame(columns=["NPS Score", "pos"])
derived_dataset1["NPS Score"] = dataset["NPS Score"]

# pos_tags = ["ADJ", "VERB"]
derived_dataset1["pos"] = dataset["tokenized"].apply(
    lambda sent: get_pos(sent, pos_tags)
)

derived_dataset1["pos"].head()

0    first fly find good nice great delay miss Most speak give contradictory speak little get easy make bad vast have walk cavernous new phone base pay extra Â£40 extra keep change keep change give extra be spare sort think good need well need well work
1    make take go let ™ say overbooke have wait bring first last
2    fly several past say get bad bad fly have normal most useless try ask few connect whole relax okay new gigantic big disorganized land take taxi other start disembarking exhausting forget ask new chat other fly different new other old wear poor l

In [None]:
derived_dataset2 = pd.DataFrame(columns=["NPS Score", "pos-np"])
derived_dataset2["NPS Score"] = dataset["NPS Score"]

# pos_tags = ["ADJ", "VERB"]
lemmas = dataset["tokenized"].apply(lambda sent: get_pos(sent, pos_tags))

# np_list = ["TIME", "ORG", "MONEY"]
entities = dataset["customer_review"].apply(
    lambda sent: get_ents(sent, np_list=np_list, include_label=False)
)

# Combine lemmatized POS tokens and the selected entity texts
derived_dataset2["pos-np"] = lemmas + " " + entities.apply(" ".join)

derived_dataset2["pos-np"].head().to_list()

['first fly find good nice great delay miss Most speak give contradictory speak little get easy make bad vast have walk cavernous new phone base pay extra Â£40 extra keep change keep change give extra be spare sort think good need well need well work LHR three hours later',
 'make take go let ™ say overbooke have wait bring first last the morning 2 hours',
 'fly several past say get bad bad fly have normal most useless try ask few connect whole relax okay new gigantic big disorganized land take taxi other start disembarking exhausting forget ask new chat other fly different new other old wear poor let repeat poor general say train deliver good least try get fly good complete try avoid cheap worth pay know avoid avoid new just 30 minutes about 20 to 25 minutes 10 minutes Lazy Turkish Airlines',
 'return maintain fly leave excellent inflight extensive easy use excellent interesting seem have more new show have slow respond inflight broken replace turkish serve like see help elderly walk 
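
As a check on how the pos-np column is assembled, the sketch below rebuilds row 1 of derived_dataset2 from its two parts, using the lemma string from the derived_dataset1 output above and the two TIME entities that get_ents found in the same review ("the morning" and "2 hours").

```python
# Rebuild row 1 of derived_dataset2 from its two parts (both copied from the
# outputs above): the POS lemmas plus the selected entity texts.
lemma_string = "make take go let ™ say overbooke have wait bring first last"
entity_texts = ["the morning", "2 hours"]

pos_np = lemma_string + " " + " ".join(entity_texts)
print(pos_np)
```

When a review has no TIME, ORG or MONEY entity, the same expression leaves a trailing space, which CountVectorizer's tokenisation simply ignores.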

# 2. CLASSIFICATION


### 1. Setup


To keep the training code short, we define a few helper functions. Features are encoded with scikit-learn's CountVectorizer, with English stop words removed, so the get_features function fits the vectoriser on the given dataset column and returns the resulting document-term count matrix. Two further functions return the LR and MLP models, respectively; getMLP builds an MLPClassifier with either the default settings or custom ones.

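The MLPClassifier settings we vary are **[3]**:

- hidden_layer_sizes: the number of neurons in each hidden layer (the i-th element is the size of the i-th hidden layer). The default, 100, gives a single hidden layer of 100 neurons.
- activation: the activation function of the hidden layer. The default activation function is "relu".
- learning_rate: the learning-rate schedule for weight updates ("constant", "invscaling" or "adaptive"), with the default being "constant". According to the scikit-learn documentation, it is only used when solver="sgd"; the default solver is "adam".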
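
To illustrate what the learning_rate values mean, the sketch below implements the "constant" and "invscaling" schedules following the formulas in the scikit-learn MLPClassifier documentation (learning_rate_init=0.001 and power_t=0.5 are its documented defaults). The third option, "adaptive", keeps learning_rate_init while training keeps improving and divides the current rate by 5 when it stalls; since that depends on the training history, it is only described here. The function names are illustrative, not scikit-learn API.

```python
# The "constant" and "invscaling" learning-rate schedules of MLPClassifier, as
# described in the scikit-learn docs. Both only take effect when solver="sgd".

def constant_lr(t, learning_rate_init=0.001):
    """'constant': the same step size at every time step t."""
    return learning_rate_init


def invscaling_lr(t, learning_rate_init=0.001, power_t=0.5):
    """'invscaling': learning_rate_init / t ** power_t, shrinking as t grows."""
    return learning_rate_init / t ** power_t


for t in (1, 4, 100):
    print(t, constant_lr(t), invscaling_lr(t))
```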

Finally, we define a 4-fold cross-validation helper to simplify the training and validation steps in the following sections. It returns the mean micro- and macro-averaged precision and recall over the four folds.


In [None]:
# REFERENCE [4] : https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer(stop_words="english")


def get_features(dataset_column):
    return count_vect.fit_transform(dataset_column)
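
CountVectorizer turns each document into a vector of term counts over a vocabulary learned from the column. The pure-Python sketch below mimics that behaviour under stated assumptions: it lowercases, tokenises with CountVectorizer's default `token_pattern` `r"(?u)\b\w\w+\b"` (tokens of two or more word characters), drops stop words and counts the remaining terms, with columns in alphabetical order as in scikit-learn's output. The four-word `STOP_WORDS` set is a hypothetical stand-in for scikit-learn's much longer built-in English list.

```python
# A pure-Python sketch of what CountVectorizer(stop_words="english") computes.
import re

TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")  # CountVectorizer's default pattern
STOP_WORDS = {"the", "a", "and", "was"}       # hypothetical subset


def bag_of_words(docs):
    """Return (sorted vocabulary, document-term count matrix as lists)."""
    tokenized = [
        [tok for tok in TOKEN_PATTERN.findall(doc.lower()) if tok not in STOP_WORDS]
        for doc in docs
    ]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for tok in toks:
            row[index[tok]] += 1  # count each occurrence of the term
        matrix.append(row)
    return vocabulary, matrix


vocab, counts = bag_of_words(["delay miss bad bad", "the morning 2 hours"])
print(vocab)   # -> ['bad', 'delay', 'hours', 'miss', 'morning']
print(counts)  # -> [[2, 1, 0, 1, 0], [0, 0, 1, 0, 1]]
```

The second document hints at what happens to the entities appended in derived_dataset2: multi-word entities such as "2 hours" are split into unigrams, single-character tokens such as "2" are dropped by the token pattern, and stop words such as "the" are removed, so only fragments of each entity reach the classifier. Note also that get_features calls fit_transform on the whole column before cross-validation, so the vocabulary is built from all folds; wrapping the vectoriser and classifier in a scikit-learn Pipeline (for example `make_pipeline(CountVectorizer(stop_words="english"), LogisticRegression())`) would refit the vectoriser on each training split.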

In [None]:
# REFERENCE [3] : https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
# REFERENCE [5] : https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier


def getLR():
    return LogisticRegression()


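# Default MLP settings
# Note: per the scikit-learn docs, learning_rate is only used when solver="sgd";
# MLPClassifier's default solver is "adam", so it has no effect here.
def getMLP(hidden_layer_sizes=100, activation="relu", learning_rate="constant"):
    return MLPClassifier(
        hidden_layer_sizes=hidden_layer_sizes,
        activation=activation,
        learning_rate=learning_rate,
    )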

In [None]:
# REFERENCE [6] : https://scikit-learn.org/stable/modules/cross_validation.html
# REFERENCE [7] : https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html
from sklearn.model_selection import cross_val_score


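def cross_validate_model(model, X, y):
    # Metric names understood by scikit-learn's `scoring` parameter
    scoring = ["precision_micro", "precision_macro", "recall_micro", "recall_macro"]
    # One cross_val_score call (i.e. a fresh set of 4 fits) per metric
    return [cross_validate(model, X, y, scoring=metric) for metric in scoring]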


def cross_validate(
    model,
    X,
    y,
    scoring,
    cv=4,
):
    scores = cross_val_score(model, X, y, cv=cv, scoring=scoring)
    return scores.mean()
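
When cv is an integer and the estimator is a classifier, cross_val_score splits the data with StratifiedKFold (without shuffling), so each fold keeps roughly the class proportions of NPS Score. The sketch below illustrates that idea with a simple round-robin assignment over 16 hypothetical labels; `stratified_folds` is an illustrative helper, and scikit-learn's actual fold assignment differs in detail.

```python
# Simplified stratified k-fold: deal each class's indices round-robin over k
# folds so that every test split keeps roughly the overall class proportions.
from collections import defaultdict


def stratified_folds(labels, k=4):
    """Return k lists of test-set indices, stratified by label."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, i in enumerate(indices):
            folds[position % k].append(i)
    return [sorted(fold) for fold in folds]


labels = ["Detractor"] * 8 + ["Promoter"] * 4 + ["Passive"] * 4
for fold_number, test_idx in enumerate(stratified_folds(labels, k=4)):
    train_size = len(labels) - len(test_idx)  # the other 3 folds form the training set
    print(fold_number, train_size, sorted(labels[i] for i in test_idx))
```

Because cross_validate_model calls cross_val_score once per metric, each model is trained 16 times (four metrics, four folds each). The folds are identical across calls and LogisticRegression's default lbfgs solver is deterministic, so the four LR calls see the same fitted models; MLPClassifier, with no random_state set, starts each fit from a different random initialisation, so its four metrics come from slightly different networks.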

We now define a DataFrame to hold the computed metrics for later analysis.


In [None]:
## Model : LR - MLP
## Settings : Default - Custom
## Dataset : dataset - derived_dataset1 - derived_dataset2
scores = pd.DataFrame(
    columns=[
        "Model",
        "Settings",
        "dataset",
        "precision_micro",
        "precision_macro",
        "recall_micro",
        "recall_macro",
    ]
)

## 2. Default Settings Classification


We begin the classification procedure by training the default LR and MLP classifiers on the original dataset and on the two derived datasets.


### a. LR Default settings


In [None]:
logreg_default = getLR()

        i. dataset


In [None]:
# REFERENCE [8] : https://stackoverflow.com/questions/43162506/undefinedmetricwarning-f-score-is-ill-defined-and-being-set-to-0-0-in-labels-wi
import warnings

warnings.filterwarnings("ignore")  # Silence all warnings, including zero-division (UndefinedMetricWarning) ones

scores.loc[len(scores)] = ["LR", "Default", "dataset"] + cross_validate_model(
    logreg_default, get_features(dataset["customer_review"]), dataset["NPS Score"]
)
scores.head()  # Sample of scores' format

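|   | Model | Settings | dataset | precision_micro | precision_macro | recall_micro | recall_macro |
|---|-------|----------|---------|-----------------|-----------------|--------------|--------------|
| 0 | LR | Default | dataset | 0.683393 | 0.622308 | 0.683393 | 0.600668 |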


        ii. derived_dataset1


In [None]:
scores.loc[len(scores)] = ["LR", "Default", "derived_dataset1"] + cross_validate_model(
    logreg_default, get_features(derived_dataset1["pos"]), derived_dataset1["NPS Score"]
)

        iii. derived_dataset2


In [None]:
scores.loc[len(scores)] = ["LR", "Default", "derived_dataset2"] + cross_validate_model(
    logreg_default,
    get_features(derived_dataset2["pos-np"]),
    derived_dataset2["NPS Score"],
)

### b. MLP Default settings


In [None]:
mlp_default = getMLP()

        i. dataset


In [None]:
scores.loc[len(scores)] = ["MLP", "Default", "dataset"] + cross_validate_model(
    mlp_default, get_features(dataset["customer_review"]), dataset["NPS Score"]
)

        ii. derived_dataset1


In [None]:
scores.loc[len(scores)] = ["MLP", "Default", "derived_dataset1"] + cross_validate_model(
    mlp_default, get_features(derived_dataset1["pos"]), derived_dataset1["NPS Score"]
)

        iii. derived_dataset2


In [None]:
scores.loc[len(scores)] = ["MLP", "Default", "derived_dataset2"] + cross_validate_model(
    mlp_default,
    get_features(derived_dataset2["pos-np"]),
    derived_dataset2["NPS Score"],
)

### c. Results and discussion


After processing the three dataset variants with the LR and MLP classifiers, we obtain the following precision and recall scores:


In [None]:
scores.head(6)

|   | Model | Settings | dataset | precision_micro | precision_macro | recall_micro | recall_macro |
|---|-------|----------|---------|-----------------|-----------------|--------------|--------------|
| 0 | LR | Default | dataset | 0.683393 | 0.622308 | 0.683393 | 0.600668 |
| 1 | LR | Default | derived_dataset1 | 0.752627 | 0.666477 | 0.752627 | 0.655626 |
| 2 | LR | Default | derived_dataset2 | 0.701792 | 0.629293 | 0.701792 | 0.609128 |
| 3 | MLP | Default | dataset | 0.697147 | 0.623664 | 0.69724 | 0.604627 |
| 4 | MLP | Default | derived_dataset1 | 0.726514 | 0.642096 | 0.725491 | 0.638241 |
| 5 | MLP | Default | derived_dataset2 | 0.686181 | 0.607843 | 0.689248 | 0.597967 |
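
To put numbers on the comparisons discussed below, the following sketch recomputes the gap in micro precision between each derived dataset and the original dataset, using the values from the table above (copied by hand), and picks the best variant per model.

```python
# Gaps in micro precision between each derived dataset and the original
# dataset, using the values in the table above (copied by hand).
precision_micro = {
    ("LR", "dataset"): 0.683393,
    ("LR", "derived_dataset1"): 0.752627,
    ("LR", "derived_dataset2"): 0.701792,
    ("MLP", "dataset"): 0.697147,
    ("MLP", "derived_dataset1"): 0.726514,
    ("MLP", "derived_dataset2"): 0.686181,
}
variants = ("dataset", "derived_dataset1", "derived_dataset2")

for model in ("LR", "MLP"):
    base = precision_micro[(model, "dataset")]
    for variant in variants[1:]:
        gap = precision_micro[(model, variant)] - base
        print(f"{model} {variant}: {gap:+.3f} vs. original dataset")

best = {
    model: max(variants, key=lambda v: precision_micro[(model, v)])
    for model in ("LR", "MLP")
}
print(best)  # -> {'LR': 'derived_dataset1', 'MLP': 'derived_dataset1'}
```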


**LR:**
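The micro-averaged precision and recall are identical for every LR run (0.683393, 0.752627 and 0.701792 on the three dataset variants). This is expected rather than informative: in single-label multiclass classification, every misclassified review counts as a false positive for the predicted class and as a false negative for the true class, so the total numbers of false positives and false negatives are equal, and micro precision = micro recall = accuracy.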
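
To make the micro/macro distinction concrete, here is a from-scratch computation on ten hypothetical labels. It counts true positives, false positives and false negatives per class, then either pools them (micro) or averages the per-class scores (macro). A class that is never predicted gets precision 0.0 here, which matches how scikit-learn treats that case by default (with a warning). The helper names are illustrative.

```python
# Micro- vs. macro-averaged precision and recall from scratch.
# The labels are hypothetical: 10 reviews, with the Passive class never
# predicted correctly.

def per_class_counts(y_true, y_pred):
    """Return {class: (true positives, false positives, false negatives)}."""
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        counts[c] = (tp, fp, fn)
    return counts


def micro_macro(y_true, y_pred):
    counts = per_class_counts(y_true, y_pred).values()
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    # Micro: pool the counts of all classes, then compute one score
    micro_p, micro_r = tp / (tp + fp), tp / (tp + fn)
    # Macro: compute a score per class, then take the unweighted mean
    precisions = [c[0] / (c[0] + c[1]) if c[0] + c[1] else 0.0 for c in counts]
    recalls = [c[0] / (c[0] + c[2]) if c[0] + c[2] else 0.0 for c in counts]
    return micro_p, micro_r, sum(precisions) / len(precisions), sum(recalls) / len(recalls)


y_true = ["Detractor"] * 5 + ["Promoter"] * 3 + ["Passive"] * 2
y_pred = ["Detractor"] * 5 + ["Promoter", "Promoter", "Detractor"] + ["Detractor", "Promoter"]

micro_p, micro_r, macro_p, macro_r = micro_macro(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(micro_p, micro_r, accuracy)  # -> 0.7 0.7 0.7
print(round(macro_p, 3), round(macro_r, 3))
```

The three wrong predictions add three false positives and three false negatives in total, so both micro scores equal accuracy, while the never-correct Passive class pulls both macro averages below the micro ones.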

All four scores are noticeably higher on derived_dataset1 than on the original dataset (micro precision and recall rise by about 0.07, from 0.683 to 0.753), suggesting that focusing on certain parts of speech can lead to better outcomes than including every token.

derived_dataset2, which adds the selected named entities to the POS lemmas, also outperforms the original dataset (0.702 vs. 0.683 micro precision) but falls short of derived_dataset1 (0.753). I expected that combining POS tokens and named entities would improve the results, but this suggests that the named entities were not as effective as planned in enhancing precision and recall, or that the initial selection of entity types was suboptimal.

**MLP:**
On the original dataset, MLP slightly outperforms LR (0.697 vs. 0.683 micro precision), but on both derived datasets it scores lower than LR (0.727 vs. 0.753 on derived_dataset1 and 0.686 vs. 0.702 on derived_dataset2). As with LR, derived_dataset1 scores better than the original dataset, but derived_dataset2 now gives the poorest results of the three. This again suggests that incorporating the named entities as features hurts rather than helps model training.

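Additionally, for both classifiers the macro averages are roughly 0.06 to 0.10 lower than the micro averages. Since micro averages are dominated by the most frequent classes while macro averages weight every class equally, this gap suggests that the classifiers perform noticeably worse on at least one class, consistent with class imbalance in the dataset. The much smaller differences between MLP's micro precision and micro recall (at most about 0.003), which should be identical in theory, have a different cause: cross_validate_model refits the network separately for each metric, and each fit starts from a different random initialisation.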


## 3. Custom Settings MLP Classification


Next, we change the default parameter values of the MLP classifier, trying two custom configurations.


### a. MLP Custom settings (version 1)


In [None]:
# default MLP settings: hidden_layer_sizes=100, activation="relu", learning_rate="constant"
mlp_custom = getMLP(
    hidden_layer_sizes=120, activation="tanh", learning_rate="invscaling"
)
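
This first custom configuration replaces the default ReLU with tanh, and the second one below uses the identity. For reference, the three activation functions are sketched here as plain Python functions, following their definitions in the scikit-learn MLPClassifier documentation; each is applied element-wise to a hidden unit's pre-activation value z.

```python
# The three hidden-layer activation functions used in this study.
import math


def identity(z):
    return z             # f(z) = z: no nonlinearity


def relu(z):
    return max(0.0, z)   # f(z) = max(0, z): the default


def tanh(z):
    return math.tanh(z)  # f(z) = tanh(z): squashes z into (-1, 1)


for z in (-2.0, 0.0, 2.0):
    print(z, identity(z), relu(z), round(tanh(z), 3))
```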

        i. dataset


In [None]:
scores.loc[len(scores)] = ["MLP", "Custom", "dataset"] + cross_validate_model(
    mlp_custom, get_features(dataset["customer_review"]), dataset["NPS Score"]
)

        ii. derived_dataset1


In [None]:
scores.loc[len(scores)] = ["MLP", "Custom", "derived_dataset1"] + cross_validate_model(
    mlp_custom, get_features(derived_dataset1["pos"]), derived_dataset1["NPS Score"]
)

        iii. derived_dataset2


In [None]:
scores.loc[len(scores)] = ["MLP", "Custom", "derived_dataset2"] + cross_validate_model(
    mlp_custom, get_features(derived_dataset2["pos-np"]), derived_dataset2["NPS Score"]
)

### b. MLP Custom settings (version 2)


In [None]:
# default MLP settings: hidden_layer_sizes=100, activation="relu", learning_rate="constant"
mlp_custom = getMLP(
    hidden_layer_sizes=140, activation="identity", learning_rate="adaptive"
)
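
With activation="identity", the hidden layer applies no nonlinearity, so this configuration can only represent an affine function of the input before the output softmax: roughly the same model family as multinomial logistic regression (restricted to rank at most 140 by the hidden layer, and trained differently). The sketch below checks the underlying algebra, W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2), on tiny hand-picked (hypothetical) weights using plain Python lists.

```python
# With identity activations, two stacked affine layers collapse into one.

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    return [[sum(a * B[k][j] for k, a in enumerate(row)) for j in range(len(B[0]))] for row in A]


def vecadd(u, v):
    return [a + b for a, b in zip(u, v)]


W1, b1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]], [0.5, 0.0, -1.0]  # 2 inputs -> 3 hidden units
W2, b2 = [[1.0, -1.0, 2.0], [0.0, 1.0, 1.0]], [0.0, 2.0]          # 3 hidden units -> 2 outputs
x = [2.0, -1.0]

two_layer = vecadd(matvec(W2, vecadd(matvec(W1, x), b1)), b2)               # hidden layer, then output
one_layer = vecadd(matvec(matmul(W2, W1), x), vecadd(matvec(W2, b1), b2))  # single merged layer
print(two_layer, one_layer)  # -> [7.5, 7.0] [7.5, 7.0]
```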

        i. dataset


In [None]:
scores.loc[len(scores)] = ["MLP", "Custom", "dataset"] + cross_validate_model(
    mlp_custom, get_features(dataset["customer_review"]), dataset["NPS Score"]
)

        ii. derived_dataset1


In [None]:
scores.loc[len(scores)] = ["MLP", "Custom", "derived_dataset1"] + cross_validate_model(
    mlp_custom, get_features(derived_dataset1["pos"]), derived_dataset1["NPS Score"]
)

        iii. derived_dataset2


In [None]:
scores.loc[len(scores)] = ["MLP", "Custom", "derived_dataset2"] + cross_validate_model(
    mlp_custom, get_features(derived_dataset2["pos-np"]), derived_dataset2["NPS Score"]
)

# 3. Results and discussion


We begin by adding a params column to a copy of the scores DataFrame, listing the parameters used to configure each MLP classifier.

In [None]:
# Parameters used for each row of `scores` (LR rows have no MLP parameters)
param_strings = (
    ["None"] * 3
    + ["hidden_layer_sizes=100, activation='relu', learning_rate='constant'"] * 3
    + ["hidden_layer_sizes=120, activation='tanh', learning_rate='invscaling'"] * 3
    + ["hidden_layer_sizes=140, activation='identity', learning_rate='adaptive'"] * 3
)
params = pd.DataFrame({"params": param_strings})

In [None]:
fullscores = scores.copy()
fullscores["params"] = params["params"]
fullscores.head(12)

|   | Model | Settings | dataset | precision_micro | precision_macro | recall_micro | recall_macro | params |
|---|-------|----------|---------|-----------------|-----------------|--------------|--------------|--------|
| 0 | LR | Default | dataset | 0.683393 | 0.622308 | 0.683393 | 0.600668 |  |
| 1 | LR | Default | derived_dataset1 | 0.752627 | 0.666477 | 0.752627 | 0.655626 |  |
| 2 | LR | Default | derived_dataset2 | 0.701792 | 0.629293 | 0.701792 | 0.609128 |  |
| 3 | MLP | Default | dataset | 0.697147 | 0.623664 | 0.69724 | 0.604627 | hidden_layer_sizes=100, activation='relu', learning_rate='constant' |
| 4 | MLP | Default | derived_dataset1 | 0.726514 | 0.642096 | 0.725491 | 0.638241 | hidden_layer_sizes=100, activation='relu', learning_rate='constant' |
| 5 | MLP | Default | derived_dataset2 | 0.686181 | 0.607843 | 0.689248 | 0.597967 | hidden_layer_sizes=100, activation='relu', learning_rate='constant' |
| 6 | MLP | Custom | dataset | 0.695753 | 0.625926 | 0.69343 | 0.606495 | hidden_layer_sizes=120, activation='tanh', learning_rate='invscaling' |
| 7 | MLP | Custom | derived_dataset1 | 0.724098 | 0.643099 | 0.72512 | 0.638379 | hidden_layer_sizes=120, activation='tanh', learning_rate='invscaling' |
| 8 | MLP | Custom | derived_dataset2 | 0.686275 | 0.608729 | 0.684322 | 0.597375 | hidden_layer_sizes=120, activation='tanh', learning_rate='invscaling' |
| 9 | MLP | Custom | dataset | 0.697797 | 0.62904 | 0.69566 | 0.607107 | hidden_layer_sizes=140, activation='identity', learning_rate='adaptive' |


After repeating the MLP training with custom parameters, the resulting micro and macro scores are somewhat surprising. All three MLP configurations, as well as the default LR, score highest on derived_dataset1, for both micro- and macro-averaged precision and recall. On the other hand, derived_dataset2 underperforms: with MLP it scores even lower than the original dataset, and with LR it beats the original dataset but still trails derived_dataset1. This suggests that including the "TIME", "ORG", and "MONEY" entities was not effective; selecting other entity types might improve the scores. Additionally, the macro averages are consistently about 0.06 to 0.10 below the micro averages, which suggests a class imbalance: macro averaging gives every class the same weight, so weaker performance on the smaller classes pulls it down, whereas micro averaging is dominated by the larger classes.
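The effect of class imbalance on the two averages can be illustrated with a toy example. The sketch below uses hypothetical labels (not drawn from the Airline Passenger Reviews data) in which the single "Passive" review is misclassified; it relies only on scikit-learn's `precision_score` and `recall_score` with their `average` and `zero_division` arguments.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical toy labels: "Passive" is a minority class with one sample,
# and the model predicts it as "Promoter".
y_true = ["Promoter"] * 6 + ["Detractor"] * 3 + ["Passive"]
y_pred = ["Promoter"] * 6 + ["Detractor"] * 3 + ["Promoter"]

# Micro averaging pools all predictions, so the large classes dominate.
micro_p = precision_score(y_true, y_pred, average="micro")
micro_r = recall_score(y_true, y_pred, average="micro")

# Macro averaging gives each class the same weight, so the missed minority
# class pulls the average down. zero_division=0 scores the never-predicted
# "Passive" class as 0 precision instead of emitting UndefinedMetricWarning.
macro_p = precision_score(y_true, y_pred, average="macro", zero_division=0)
macro_r = recall_score(y_true, y_pred, average="macro", zero_division=0)

print(f"micro precision={micro_p:.3f}, micro recall={micro_r:.3f}")
print(f"macro precision={macro_p:.3f}, macro recall={macro_r:.3f}")
```

Here micro precision and micro recall are both 0.900 (they equal accuracy), while macro precision and macro recall drop to 0.619 and 0.667, because the one missed class counts as much as the two well-predicted ones.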

Among the nine MLP runs (three configurations on three datasets), those trained on derived_dataset1 scored best. As mentioned above, adding the named entities to the POS-filtered lemmatized tokens only lowered the scores; for MLP, the model trained on derived_dataset2 failed to outperform even the one trained on the original dataset.
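Such comparisons can also be checked programmatically instead of by eye. The sketch below re-enters the scores printed in the `fullscores` output (only rows 0 to 9 are shown there, so the derived_dataset rows of the last custom configuration are missing); in the notebook, `fullscores` itself could be used instead. It picks, for each (Model, Settings) group, the dataset variant with the highest macro recall, and computes the gap between micro and macro precision.

```python
import pandas as pd

# Scores as printed in the fullscores output (rows 0-9)
columns = ["Model", "Settings", "dataset",
           "precision_micro", "precision_macro", "recall_micro", "recall_macro"]
rows = [
    ("LR", "Default", "dataset", 0.683393, 0.622308, 0.683393, 0.600668),
    ("LR", "Default", "derived_dataset1", 0.752627, 0.666477, 0.752627, 0.655626),
    ("LR", "Default", "derived_dataset2", 0.701792, 0.629293, 0.701792, 0.609128),
    ("MLP", "Default", "dataset", 0.697147, 0.623664, 0.69724, 0.604627),
    ("MLP", "Default", "derived_dataset1", 0.726514, 0.642096, 0.725491, 0.638241),
    ("MLP", "Default", "derived_dataset2", 0.686181, 0.607843, 0.689248, 0.597967),
    ("MLP", "Custom", "dataset", 0.695753, 0.625926, 0.69343, 0.606495),
    ("MLP", "Custom", "derived_dataset1", 0.724098, 0.643099, 0.72512, 0.638379),
    ("MLP", "Custom", "derived_dataset2", 0.686275, 0.608729, 0.684322, 0.597375),
    ("MLP", "Custom", "dataset", 0.697797, 0.62904, 0.69566, 0.607107),
]
scores_table = pd.DataFrame(rows, columns=columns)

# Best dataset variant per (Model, Settings) group, judged by macro recall.
# The "Custom" group pools both custom MLP configurations.
best_idx = scores_table.groupby(["Model", "Settings"])["recall_macro"].idxmax()
best = scores_table.loc[best_idx, ["Model", "Settings", "dataset", "recall_macro"]]
print(best.to_string(index=False))

# Micro minus macro precision: a consistently positive gap means per-class
# precision is uneven, which often points to class imbalance.
scores_table["precision_gap"] = (
    scores_table["precision_micro"] - scores_table["precision_macro"]
)
print(scores_table["precision_gap"].describe())
```

On these rows, ranking by any of the four metrics selects derived_dataset1 for every group.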

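When it comes to the different configurations of the MLP classifier, the results are also somewhat surprising. One might expect that adding neurons to the hidden layer (hidden_layer_sizes) would produce a more expressive model and therefore better micro precision and recall, although extra capacity also increases the risk of overfitting. That is not what we observe: on every dataset shown, the configurations score within 0.01 of one another, and none wins consistently. On the original dataset, for instance, the identity/adaptive configuration (140 neurons) has the highest micro and macro precision and the highest macro recall, while the default configuration has the highest micro recall. Such small gaps may be within run-to-run noise. For single-label classification, micro-averaged precision and recall both equal accuracy, yet the MLP rows report slightly different values for the two (e.g., 0.697147 vs 0.69724 for the default MLP on the original dataset), which suggests that each metric came from a separately trained, unseeded network. More extensive testing and evaluation, with fixed random seeds, may therefore be required to determine the best configuration for each dataset variation.
Moreover, the custom configurations changed the activation function, the learning-rate schedule, and hidden_layer_sizes at the same time, so their individual effects cannot be separated; changing one parameter at a time would clarify the individual impacts. Note also that scikit-learn's MLPClassifier only uses the learning_rate schedule when solver='sgd', so if the default 'adam' solver was kept, as the listed parameters suggest, the learning_rate setting had no effect.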
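A one-factor-at-a-time comparison, repeated over a few random seeds, is one way to separate the individual effects and to see how large the run-to-run noise is. The sketch below is an illustration under several assumptions: synthetic data from `make_classification` stands in for the spaCy features, `solver="sgd"` and `learning_rate_init=0.01` are added (scikit-learn ignores `learning_rate` with other solvers), and 3-fold macro recall is used as the score.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic 3-class data (assumption): in the notebook, X and y would come
# from get_features(...) and the "NPS Score" column instead.
X, y = make_classification(n_samples=150, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

# Baseline: default MLP settings plus solver="sgd" (assumption) so that the
# learning_rate schedule is actually used.
base = {"hidden_layer_sizes": (100,), "activation": "relu", "solver": "sgd",
        "learning_rate": "constant", "learning_rate_init": 0.01}

# Values from the two custom configurations, varied one at a time.
variations = {
    "hidden_layer_sizes": [(120,), (140,)],
    "activation": ["tanh", "identity"],
    "learning_rate": ["invscaling", "adaptive"],
}
configs = [("baseline", base)] + [
    (f"{name}={value}", {**base, name: value})
    for name, values in variations.items()
    for value in values
]


def evaluate(params, seeds=(0, 1, 2)):
    """Mean and std, over weight-initialisation seeds, of 3-fold macro recall."""
    seed_means = []
    for seed in seeds:
        clf = MLPClassifier(random_state=seed, max_iter=200, **params)
        seed_means.append(
            cross_val_score(clf, X, y, cv=3, scoring="recall_macro").mean()
        )
    return float(np.mean(seed_means)), float(np.std(seed_means))


results = {}
with warnings.catch_warnings():
    # Short runs may stop before convergence; acceptable for a sketch.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    for label, params in configs:
        results[label] = evaluate(params)

for label, (mean, std) in results.items():
    print(f"{label:35s} recall_macro = {mean:.3f} +/- {std:.3f}")
```

A variation would only be worth keeping if its mean differs from the baseline by clearly more than the spread across seeds.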

Overall, these experiments emphasise the importance of feature engineering, demonstrated through the careful selection of POS elements and named entities. Nevertheless, certain token selections can lead to undesirable outcomes, as evidenced by derived_dataset2. Identifying the most effective options requires thorough evaluation and additional testing. I still remember how, at the beginning of the CSI4106 course, Professor Caroline Barriere stated that training models is an exhausting process that requires a lot of trial and error, sometimes including random configuration experiments, until a satisfactory outcome is achieved. Even though my task involved only a small dataset and three MLP parameters, it was already challenging; how much more challenging must it be to work on models like the ones behind ChatGPT or Bing AI?

Thank you.

# 4. References


[1] StackOverflow. "How can I display full (non-truncated) dataframe information in HTML when converting from Pandas dataframe to HTML?". Retrieved on December 3rd 2023 from:
https://stackoverflow.com/questions/25351968/how-can-i-display-full-non-truncated-dataframe-information-in-html-when-conver

[2] Medium. "Named Entity Recognition NER using spaCy". Retrieved on December 1st 2023 from:
https://towardsdatascience.com/named-entity-recognition-ner-using-spacy-nlp-part-4-28da2ece57c6

[3] scikit-learn. "sklearn.neural_network.MLPClassifier". Retrieved on December 1st 2023 from:
https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html

[4] scikit-learn. "Working With Text Data". Retrieved on December 1st 2023 from:
https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html

[5] scikit-learn. "sklearn.linear_model.LogisticRegression". Retrieved on December 1st 2023 from:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression

[6] scikit-learn. "Cross-validation: evaluating estimator performance". Retrieved on December 1st 2023 from:
https://scikit-learn.org/stable/modules/cross_validation.html

[7] scikit-learn. "sklearn.model_selection.cross_val_score". Retrieved on December 1st 2023 from:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html

[8] StackOverflow. "UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples". Retrieved on December 3rd 2023 from:
https://stackoverflow.com/questions/43162506/undefinedmetricwarning-f-score-is-ill-defined-and-being-set-to-0-0-in-labels-wi
