**OPTIMIZATION IN SENTIMENT ANALYSIS**
---

---

# Comandos

## Instalación de dependencias

```bash
conda install -c anaconda nltk
conda install -c anaconda pandas
conda install -c conda-forge optuna
conda install -c anaconda scikit-learn
conda install -c conda-forge xgboost
conda install -c conda-forge keras
```

## Solución de errores

**Anaconda deja de funcionar:**

```bash
conda config --set channel_priority true
conda config --set channel_priority false
```

---

# Data preprocesing

In [1]:
import os

os.getcwd()

'/home/porfirio/Workspace/mcpi/optimizacion-ejercicios'

Not every part of the tweet is important for the text processing we do. Some aspects of the tweet like numbers, symbols, stopwords are not so useful for sentiment analysis.

So we just remove them in the preprocessing step. I used nltk python library and regular expressions to remove stopwords, emails, URLs, numbers, white spaces, punctuations, special characters and Unicode data.

The code looks like this:

In [2]:
import pandas as pd
import numpy as np
import re
import string
import unicodedata
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer

#read data
df = pd.read_csv('proyecto_final/train.csv')

nltk.download('stopwords')

url = r'''(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)
(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([
  ^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))'''

tokenizer = RegexpTokenizer(r'\w+')


def clean_data(temp):
    temp = temp.map(lambda x: str(x).lower())  #lower case
    temp = temp.map(lambda x: re.sub(r"\b[^\s]+@[^\s]+[.][^\s]+\b", "", x))  #email
    temp = temp.map(lambda x: re.sub(url, "", x))  #url
    temp = temp.map(lambda x: re.sub(r'[^a-zA-z.,!?/:;\"\'\s]', "", x))  #numbers
    temp = temp.map(lambda x: re.sub(r'^\s*|\s\s*', ' ', x).strip())  #white space
    temp = temp.map(lambda x: ''.join([c for c in x if c not in string.punctuation]))  #punctuations
    temp = temp.map(lambda x: re.sub(r'[^a-zA-z0-9.,!?/:;\"\'\s]', '', x))  #special char
    temp = temp.map(
        lambda x: unicodedata.normalize('NFKD', x).encode('ascii', 'ignore').decode('utf-8', 'ignore'))  #unicode
    temp = temp.map(lambda x: tokenizer.tokenize(x))
    temp = temp.map(lambda x: [i for i in x if i not in stopwords.words('english')])
    temp = temp.map(lambda x: ' '.join(x))
    return temp


df.text = clean_data(df.text)

[nltk_data] Downloading package stopwords to
[nltk_data]     /home/porfirio/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


As I mentioned before, we are going to use two different methods for sentiment analysis namely, XGBoost Classifier and LSTM neural network architecture.

---

# XGBoost classifier

After “cleaning” the text data, the next step is Vectorization. Here, we just convert the text into a numerical format so that the machine learning model can ‘understand’ it.

You can observe that data structures such as Text, Images, Graphs etc need to be converted into numerical representations before building an ML model.

## Vectorization

To vectorize the text, we can simply use a Count Vectorizer method from Sci-Kit Learn. Basically, we transform the text into a sparse matrix of unique words where we use numbers to indicate the presence of a word in our text example.

We will divide the data into train, validation and test sets in the split ratio of — 80:10:10. The split is stratified so that we have an equal proportion of labels/ sentiments in all data splits.

You can use the following code to do this:

In [3]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

#vectorization
cv = CountVectorizer(lowercase=False)
text_vector = cv.fit_transform(df.text.values)

x = text_vector
y = df.iloc[:, -1].values

# train validation test split
x_train, xtest, y_train, ytest = train_test_split(x, y, stratify=y,
                                                  test_size=0.20, random_state=42)

x_val, x_test, y_val, y_test = train_test_split(xtest, ytest, stratify=ytest,
                                                test_size=0.5, random_state=42)

## Optuna integration

In the following code, you will notice an objective function that is being optimized by Optuna. Firstly, we define the hyperparameters that we are interested in tuning and add them to the trial object. Here, I chose to tune learning_rate, max_depth and n_estimators . Depending on the type of hyperparameter, we can use methods such as suggest_float, suggest_int, suggest_categorical .

Inside this objective function, we create an instance of the model and fit it on the training set. After training, we predict the sentiment on the validation set and calculate the accuracy metric. The Optuna’s objective function will try to maximize this accuracy score by performing trials with different values of hyperparameters. Different sampling techniques can be employed during this optimization.

We can rewrite the objective function to work with the loss value of the model. In this case, we will try minimize the objective function.

An early-stopping method is implemented in the form of pruning. The trial will be skipped/ pruned if it seems unpromising.

In [4]:
import optuna
from xgboost import XGBClassifier
from optuna.trial import TrialState
from sklearn.metrics import accuracy_score


# optuna's objective function
def objective(trial):
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    max_depth = trial.suggest_int("max_depth", 2, 10, step=2, log=False)
    n_estimators = trial.suggest_int("n_estimators", 100, 300, step=100, log=False)

    model = XGBClassifier(objective='multi:softprob',
                          learning_rate=learning_rate,
                          n_estimators=n_estimators,
                          max_depth=max_depth,
                          seed=42)

    model.fit(x_train, y_train)

    y_pred = model.predict(x_val)
    accuracy = accuracy_score(y_val, y_pred)

    # Handle pruning based on the intermediate value.
    if trial.should_prune():
        raise optuna.exceptions.TrialPruned()

    trial.set_user_attr(key="best_model", value=model)  # save model
    return accuracy


# callback function to save the best model as user attribute
def callback(study, trial):
    if study.best_trial.number == trial.number:
        study.set_user_attr(key="best_model", value=trial.user_attrs["best_model"])


# study to maximize the accuracy metric
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20, timeout=None, callbacks=[callback])

[32m[I 2021-12-01 21:26:23,929][0m A new study created in memory with name: no-name-4d54152c-9a0d-4a20-a4ec-590b68dcbd94[0m




[32m[I 2021-12-01 21:26:30,661][0m Trial 0 finished with value: 0.5687772925764192 and parameters: {'learning_rate': 4.271962559326317e-05, 'max_depth': 6, 'n_estimators': 300}. Best is trial 0 with value: 0.5687772925764192.[0m




[32m[I 2021-12-01 21:26:39,708][0m Trial 1 finished with value: 0.5847889374090247 and parameters: {'learning_rate': 0.0020237952677643095, 'max_depth': 8, 'n_estimators': 300}. Best is trial 1 with value: 0.5847889374090247.[0m




[32m[I 2021-12-01 21:26:46,672][0m Trial 2 finished with value: 0.5822416302765647 and parameters: {'learning_rate': 2.25569185938823e-05, 'max_depth': 8, 'n_estimators': 200}. Best is trial 1 with value: 0.5847889374090247.[0m




[32m[I 2021-12-01 21:26:48,482][0m Trial 3 finished with value: 0.616448326055313 and parameters: {'learning_rate': 0.05671432738414724, 'max_depth': 4, 'n_estimators': 100}. Best is trial 3 with value: 0.616448326055313.[0m




[32m[I 2021-12-01 21:26:53,760][0m Trial 4 finished with value: 0.5687772925764192 and parameters: {'learning_rate': 2.171464586770149e-05, 'max_depth': 6, 'n_estimators': 300}. Best is trial 3 with value: 0.616448326055313.[0m




[32m[I 2021-12-01 21:26:55,635][0m Trial 5 finished with value: 0.5251091703056768 and parameters: {'learning_rate': 0.002089127267623584, 'max_depth': 2, 'n_estimators': 200}. Best is trial 3 with value: 0.616448326055313.[0m




[32m[I 2021-12-01 21:26:57,576][0m Trial 6 finished with value: 0.6364628820960698 and parameters: {'learning_rate': 0.061385108846382534, 'max_depth': 6, 'n_estimators': 100}. Best is trial 6 with value: 0.6364628820960698.[0m




[32m[I 2021-12-01 21:27:05,709][0m Trial 7 finished with value: 0.643740902474527 and parameters: {'learning_rate': 0.02715941914843331, 'max_depth': 6, 'n_estimators': 300}. Best is trial 7 with value: 0.643740902474527.[0m




[32m[I 2021-12-01 21:27:13,380][0m Trial 8 finished with value: 0.6029839883551674 and parameters: {'learning_rate': 0.0049612250432105685, 'max_depth': 8, 'n_estimators': 300}. Best is trial 7 with value: 0.643740902474527.[0m




[32m[I 2021-12-01 21:27:16,243][0m Trial 9 finished with value: 0.5687772925764192 and parameters: {'learning_rate': 0.0007002876973624442, 'max_depth': 6, 'n_estimators': 100}. Best is trial 7 with value: 0.643740902474527.[0m




[32m[I 2021-12-01 21:27:24,602][0m Trial 10 finished with value: 0.6131732168850073 and parameters: {'learning_rate': 0.00831602300885738, 'max_depth': 10, 'n_estimators': 200}. Best is trial 7 with value: 0.643740902474527.[0m




[32m[I 2021-12-01 21:27:26,524][0m Trial 11 finished with value: 0.6324599708879185 and parameters: {'learning_rate': 0.0786831855818283, 'max_depth': 4, 'n_estimators': 100}. Best is trial 7 with value: 0.643740902474527.[0m




[32m[I 2021-12-01 21:27:28,384][0m Trial 12 finished with value: 0.5909752547307132 and parameters: {'learning_rate': 0.023078436031149036, 'max_depth': 4, 'n_estimators': 100}. Best is trial 7 with value: 0.643740902474527.[0m




[32m[I 2021-12-01 21:27:30,177][0m Trial 13 finished with value: 0.4959970887918486 and parameters: {'learning_rate': 0.00034747849006004026, 'max_depth': 2, 'n_estimators': 200}. Best is trial 7 with value: 0.643740902474527.[0m




[32m[I 2021-12-01 21:27:36,525][0m Trial 14 finished with value: 0.6353711790393013 and parameters: {'learning_rate': 0.02037026812340371, 'max_depth': 10, 'n_estimators': 200}. Best is trial 7 with value: 0.643740902474527.[0m




[32m[I 2021-12-01 21:27:38,429][0m Trial 15 finished with value: 0.5906113537117904 and parameters: {'learning_rate': 0.02312789003540615, 'max_depth': 4, 'n_estimators': 100}. Best is trial 7 with value: 0.643740902474527.[0m




[32m[I 2021-12-01 21:27:46,493][0m Trial 16 finished with value: 0.6844978165938864 and parameters: {'learning_rate': 0.08454277038040124, 'max_depth': 8, 'n_estimators': 300}. Best is trial 16 with value: 0.6844978165938864.[0m




[32m[I 2021-12-01 21:27:54,322][0m Trial 17 finished with value: 0.5826055312954876 and parameters: {'learning_rate': 0.0001950536630794215, 'max_depth': 8, 'n_estimators': 300}. Best is trial 16 with value: 0.6844978165938864.[0m




[32m[I 2021-12-01 21:28:07,158][0m Trial 18 finished with value: 0.6295487627365357 and parameters: {'learning_rate': 0.008668902168183992, 'max_depth': 10, 'n_estimators': 300}. Best is trial 16 with value: 0.6844978165938864.[0m




[32m[I 2021-12-01 21:28:14,406][0m Trial 19 finished with value: 0.6874090247452693 and parameters: {'learning_rate': 0.09154420383633016, 'max_depth': 8, 'n_estimators': 300}. Best is trial 19 with value: 0.6874090247452693.[0m


You might have noticed the set_user_attr method. This is used to save any variable which we might find important. Here we are interested in saving the best model that is associated with the highest validation accuracy. We save the best XGboost model in this user attribute.

During the Optuna optimization process, this is what you see:

IMAGE

The number of trials can be higher if you want Optuna to cover a wider range of hyperparameter values.

After the trials have finished, we can retrieve a hyperparameter importance plot which is shown below:

IMAGE

We observe that learning_rate is a more important hyperparameter than the rest of them. With this, we understand which hyperparameters we need to focus on.

## Predicting on the test set

So we have finished with our model training and hyperparameter tuning. We performed 20 trials to find the optimal hyperparameters. Now we can retrieve our best model and make a prediction on the test set.

In [5]:
# retrieve the best model from optuna study
best_model = study.user_attrs['best_model']
y_pred = best_model.predict(x_test)
print(accuracy_score(y_test, y_pred))

0.7042560931247727


Not a shabby score! Let’s see if we can do better.

---

# LSTM architecture

Long short-term memory neural network architecture is popular in the domain of Natural Language Processing as it has the capability to retain the sequence information in its “memory”.

Just like XGBoost, we should vectorize the text data in order to train the LSTM model. We perform tokenization and then pad the vectorized sequences into the same length.

The data is split in a similar fashion to that of the XGBoost model so that we can have a comparison between the two.

## Tokenization and padding

In [6]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

maxlen = 100
embedding_dim = 100

x = df.text.values
y = df.sentiment.astype("category").cat.codes.values

# train validation and test split
x_train, xtest, y_train, ytest = train_test_split(x, y, stratify=y,
                                                  test_size=0.20,
                                                  random_state=42)
x_val, x_test, y_val, y_test = train_test_split(xtest, ytest,
                                                stratify=ytest,
                                                test_size=0.5,
                                                random_state=42)
y_train = to_categorical(y_train)
y_val = to_categorical(y_val)
y_test = to_categorical(y_test)

#tokenizing and padding
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(df.text.values)

X_train = tokenizer.texts_to_sequences(x_train)
X_val = tokenizer.texts_to_sequences(x_val)
X_test = tokenizer.texts_to_sequences(x_test)

vocab_size = len(tokenizer.word_index) + 1

X_train = pad_sequences(X_train, padding='pre', maxlen=maxlen)
X_val = pad_sequences(X_val, padding='pre', maxlen=maxlen)
X_test = pad_sequences(X_test, padding='pre', maxlen=maxlen)

Now, we define the LSTM model as follows:
    
I selected optimizer, epochs and batch_size as the tunable hyperparameters.

In [7]:
from keras.models import Sequential
from keras.layers.core import Dense, Dropout
from keras.layers import Dense, Embedding, LSTM, SpatialDropout1D


def lstm(optimizer, epochs, batchsize):
    model = Sequential()
    model.add(Embedding(input_dim=vocab_size,
                        output_dim=embedding_dim,
                        input_length=maxlen))
    model.add(SpatialDropout1D(0.4))
    model.add(LSTM(64, activation="tanh"))
    model.add(Dense(3, activation='softmax'))
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    history = model.fit(X_train, y_train,
                        epochs=epochs,
                        verbose=0,
                        validation_data=(X_val, y_val),
                        batch_size=batchsize)

    return history, model

This neural network model is now ready to train!!

Let’s integrate the Optuna to perform the hyperparameter tuning while we train the LSTM model.

The code for this Optuna integration looks something like this:

In [8]:
import optuna
from optuna.trial import TrialState
from sklearn.metrics import accuracy_score


def objective(trial):
    optimizer_name = trial.suggest_categorical("optimizer", ["adam", "SGD", "RMSprop", "Adadelta"])
    epochs = trial.suggest_int("epochs", 5, 15, step=5, log=False)
    batchsize = trial.suggest_int("batchsize", 8, 40, step=16, log=False)

    history, model = lstm(optimizer_name, epochs, batchsize)

    val_acc = model.evaluate(X_val, y_val)[1]
    weights = model.get_weights()

    # Handle pruning based on the intermediate value.
    if trial.should_prune():
        raise optuna.exceptions.TrialPruned()

    trial.set_user_attr(key="best_model_weights", value=weights)
    return val_acc


def callback(study, trial):
    if study.best_trial.number == trial.number:
        study.set_user_attr(key="best_model_weights",
                            value=trial.user_attrs["best_model_weights"])


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20, timeout=None, callbacks=[callback])

[32m[I 2021-12-01 21:28:16,730][0m A new study created in memory with name: no-name-8da66392-4d92-4830-a1c3-3f45822cda68[0m
2021-12-01 21:28:16.747785: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-01 21:28:16.981259: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-12-01 21:28:17.000192: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3593260000 Hz




[32m[I 2021-12-01 21:37:05,371][0m Trial 0 finished with value: 0.40465793013572693 and parameters: {'optimizer': 'Adadelta', 'epochs': 10, 'batchsize': 8}. Best is trial 0 with value: 0.40465793013572693.[0m




[32m[I 2021-12-01 21:41:18,755][0m Trial 1 finished with value: 0.7099708914756775 and parameters: {'optimizer': 'RMSprop', 'epochs': 10, 'batchsize': 40}. Best is trial 1 with value: 0.7099708914756775.[0m




[32m[I 2021-12-01 21:45:50,045][0m Trial 2 finished with value: 0.4912663698196411 and parameters: {'optimizer': 'SGD', 'epochs': 10, 'batchsize': 24}. Best is trial 1 with value: 0.7099708914756775.[0m




[32m[I 2021-12-01 21:52:24,449][0m Trial 3 finished with value: 0.6819505095481873 and parameters: {'optimizer': 'adam', 'epochs': 10, 'batchsize': 24}. Best is trial 1 with value: 0.7099708914756775.[0m




[32m[I 2021-12-01 21:56:47,640][0m Trial 4 finished with value: 0.7092430591583252 and parameters: {'optimizer': 'RMSprop', 'epochs': 10, 'batchsize': 40}. Best is trial 1 with value: 0.7099708914756775.[0m




[32m[I 2021-12-01 22:01:05,339][0m Trial 5 finished with value: 0.7150654792785645 and parameters: {'optimizer': 'RMSprop', 'epochs': 10, 'batchsize': 40}. Best is trial 5 with value: 0.7150654792785645.[0m




[32m[I 2021-12-01 22:03:16,021][0m Trial 6 finished with value: 0.4032023251056671 and parameters: {'optimizer': 'Adadelta', 'epochs': 5, 'batchsize': 24}. Best is trial 5 with value: 0.7150654792785645.[0m




[32m[I 2021-12-01 22:14:00,808][0m Trial 7 finished with value: 0.7001455426216125 and parameters: {'optimizer': 'RMSprop', 'epochs': 10, 'batchsize': 8}. Best is trial 5 with value: 0.7150654792785645.[0m




[32m[I 2021-12-01 22:22:52,296][0m Trial 8 finished with value: 0.40465793013572693 and parameters: {'optimizer': 'Adadelta', 'epochs': 10, 'batchsize': 8}. Best is trial 5 with value: 0.7150654792785645.[0m




[32m[I 2021-12-01 22:28:03,836][0m Trial 9 finished with value: 0.7157933115959167 and parameters: {'optimizer': 'RMSprop', 'epochs': 5, 'batchsize': 8}. Best is trial 9 with value: 0.7157933115959167.[0m




[32m[I 2021-12-01 22:35:25,299][0m Trial 10 finished with value: 0.7063318490982056 and parameters: {'optimizer': 'adam', 'epochs': 5, 'batchsize': 8}. Best is trial 9 with value: 0.7157933115959167.[0m




[32m[I 2021-12-01 22:41:35,164][0m Trial 11 finished with value: 0.7074235677719116 and parameters: {'optimizer': 'RMSprop', 'epochs': 15, 'batchsize': 40}. Best is trial 9 with value: 0.7157933115959167.[0m




[32m[I 2021-12-01 22:47:47,370][0m Trial 12 finished with value: 0.6997816562652588 and parameters: {'optimizer': 'RMSprop', 'epochs': 15, 'batchsize': 40}. Best is trial 9 with value: 0.7157933115959167.[0m




[32m[I 2021-12-01 22:49:53,036][0m Trial 13 finished with value: 0.418122261762619 and parameters: {'optimizer': 'SGD', 'epochs': 5, 'batchsize': 24}. Best is trial 9 with value: 0.7157933115959167.[0m




[32m[I 2021-12-01 22:52:19,155][0m Trial 14 finished with value: 0.7208878993988037 and parameters: {'optimizer': 'RMSprop', 'epochs': 5, 'batchsize': 24}. Best is trial 14 with value: 0.7208878993988037.[0m




[32m[I 2021-12-01 22:57:48,013][0m Trial 15 finished with value: 0.7165210843086243 and parameters: {'optimizer': 'RMSprop', 'epochs': 5, 'batchsize': 8}. Best is trial 14 with value: 0.7208878993988037.[0m




[32m[I 2021-12-01 23:00:19,694][0m Trial 16 finished with value: 0.7212518453598022 and parameters: {'optimizer': 'RMSprop', 'epochs': 5, 'batchsize': 24}. Best is trial 16 with value: 0.7212518453598022.[0m




[32m[I 2021-12-01 23:03:34,899][0m Trial 17 finished with value: 0.6986899375915527 and parameters: {'optimizer': 'adam', 'epochs': 5, 'batchsize': 24}. Best is trial 16 with value: 0.7212518453598022.[0m




[32m[I 2021-12-01 23:05:47,799][0m Trial 18 finished with value: 0.42503640055656433 and parameters: {'optimizer': 'SGD', 'epochs': 5, 'batchsize': 24}. Best is trial 16 with value: 0.7212518453598022.[0m




[32m[I 2021-12-01 23:08:17,794][0m Trial 19 finished with value: 0.7219796180725098 and parameters: {'optimizer': 'RMSprop', 'epochs': 5, 'batchsize': 24}. Best is trial 19 with value: 0.7219796180725098.[0m


The structure for this Optuna integration is the same. We just change the model and hyperparameters inside the objective function.

Similarly, we obtain the hyperparameter importance plot for LSTM:

IMAGE

We see that optimizer is an important hyperparameter and batch size was not contributing much to the improvement in the accuracy score.

## Issues that I faced

For XGBoost we could save the model directly but Optuna gives some errors when you are trying to save the Keras model in a similar fashion. From my search, I found that this is because the Keras model is non-pickleable?!

A workaround for this is to just save the weights for the best model and then use these weights to reconstruct the model.

The following code will explain more about this:

In [23]:
print(study.best_params)

embedding_dim = 100
optimizer = study.best_params['optimizer']

model = Sequential()
model.add(Embedding(input_dim=vocab_size,
                           output_dim=embedding_dim,
                           input_length=maxlen))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(64, activation="tanh"))
model.add(Dense(3, activation='softmax'))
model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',
              metrics=['accuracy'])

best_model_weights = study.user_attrs['best_model_weights']
# setting the saved weights to new model
model.set_weights(best_model_weights)

# evaluating on the test set
test_acc = model.evaluate(X_test, y_test)[1]
print(test_acc)

{'optimizer': 'RMSprop', 'epochs': 5, 'batchsize': 24}
0.7115314602851868


You will just create a new instance of the model and set the weights retrieved from Optuna, instead of training it again.

The test accuracy score obtained with LSTM:

XXXX

This score is better than XGBoost. Often, neural network methods perform better than standard machine learning methods. We can improve this accuracy score even further by using Transformer architectures such as BERT, RoBERTa or XLNet.

Finally, I enjoyed using Optuna for hyperparameter tuning. I could easily retrieve the best model from all the different trials and also understand which hyperparameter is important to tune during the training process (using the hyperparameter importance plot).