<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 15px; height: 45px">

# Capstone Project: Multi-Label Text Classification for Trust and Safety (Content Moderation) - Part 2 (Feature Extraction, Modelling)

### Import libraries

In [57]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
import re
import nltk

# modules for tokenization and lemmatization
from nltk.tokenize import sent_tokenize, word_tokenize, RegexpTokenizer
from nltk.stem.wordnet import WordNetLemmatizer

# feature extraction / Bag of words
from sklearn.feature_extraction.text import TfidfVectorizer

# modelling
from sklearn.model_selection import cross_val_score, train_test_split, GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier 
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

# Evaluation
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.metrics import confusion_matrix, plot_confusion_matrix, plot_roc_curve, roc_auc_score, f1_score

import warnings
warnings.filterwarnings("ignore")


## Content

#### Part 2:
- Feature Extraction
- Train Test Split
- Baseline Modelling
- Model Optimization through GridSearch CV
- Model Analysis
- Conclusion and Recommendations

### Import data

In [58]:
# requires the pickle file saved at the end of Part 1
df = pd.read_pickle('./datasets/df.pkl')

In [59]:
# check
df.head()


Unnamed: 0,id,comment_text,senti_scores,compound,sentiment_type,tokens,snowball_stems,stop_snow_stems,lems,stop_lems,...,stop_lems_wcount,comment_length,comment_wordcount,toxic,severe_toxic,obscene,threat,insult,identity_hate,safe
0,0000997932d777bf,why the edits made under my username hardcore ...,"{'neg': 0.0, 'neu': 0.892, 'pos': 0.108, 'comp...",0.5574,POSITIVE,"[why, the, edits, made, under, my, username, h...","[whi, the, edit, made, under, my, usernam, har...","[whi, edit, made, usernam, hardcor, metallica,...","[why, the, edits, made, under, my, username, h...","[edits, made, username, hardcore, metallica, f...",...,25,230,41,0,0,0,0,0,0,1
1,000103f0d9cfb60f,daww he matches this background colour im seem...,"{'neg': 0.118, 'neu': 0.71, 'pos': 0.172, 'com...",0.2263,NEUTRAL,"[daww, he, matches, this, background, colour, ...","[daww, he, match, this, background, colour, im...","[daww, match, background, colour, im, seem, st...","[daww, he, match, this, background, colour, im...","[daww, match, background, colour, im, seemingl...",...,11,90,14,0,0,0,0,0,0,1
2,000113f07ec002fd,hey man im really not trying to edit war its j...,"{'neg': 0.083, 'neu': 0.849, 'pos': 0.068, 'co...",-0.1779,NEUTRAL,"[hey, man, im, really, not, trying, to, edit, ...","[hey, man, im, realli, not, tri, to, edit, war...","[hey, man, im, realli, tri, edit, war, guy, co...","[hey, man, im, really, not, trying, to, edit, ...","[hey, man, im, really, trying, edit, war, guy,...",...,22,227,42,0,0,0,0,0,0,1
3,0001b41b1c6bb37e,i cant make any real suggestions on improvemen...,"{'neg': 0.044, 'neu': 0.893, 'pos': 0.063, 'co...",0.25,POSITIVE,"[i, cant, make, any, real, suggestions, on, im...","[i, cant, make, ani, real, suggest, on, improv...","[cant, make, ani, real, suggest, improv, wonde...","[i, cant, make, any, real, suggestion, on, imp...","[cant, make, real, suggestion, improvement, wo...",...,50,593,107,0,0,0,0,0,0,1
4,0001d958c54c6e35,you sir are my hero any chance you remember wh...,"{'neg': 0.0, 'neu': 0.663, 'pos': 0.337, 'comp...",0.6808,POSITIVE,"[you, sir, are, my, hero, any, chance, you, re...","[you, sir, are, my, hero, ani, chanc, you, rem...","[sir, hero, ani, chanc, rememb, page]","[you, sir, are, my, hero, any, chance, you, re...","[sir, hero, chance, remember, page, thats]",...,6,62,13,0,0,0,0,0,0,1


In [60]:
categories = df.iloc[:, 14::].columns # y target

In [61]:
categories # y target

Index(['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate',
       'safe'],
      dtype='object')

## 1. Feature Extraction

In [62]:
# Create tokenizer and lemmatizer function for the TF-IDF vectorizer
lemmatizer = WordNetLemmatizer()

def tokenize_and_lemmatize(text):
    
    tokens = [word.lower() for sent in nltk.sent_tokenize(text) for word in nltk.word_tokenize(sent)]
    filtered_tokens = []
    
    for token in tokens:
        if re.search('[a-zA-Z]', token):
            filtered_tokens.append(token)
    lem = [lemmatizer.lemmatize(t) for t in filtered_tokens]
    return lem

In [71]:
#instantiate TF-IDF
tvec = TfidfVectorizer(stop_words='english', tokenizer=tokenize_and_lemmatize, ngram_range=(1, 2), max_features=8000,) 
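To make the weighting concrete, here is a minimal hand-rolled sketch of TF-IDF on a hypothetical, pre-tokenized toy corpus. It assumes the smoothed IDF form that scikit-learn uses by default, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization, and it leaves out the stop-word removal, bigrams and `max_features` cap used by the vectorizer above.

```python
import math
from collections import Counter

# Hypothetical toy corpus, already tokenized; not taken from the dataset.
docs = [
    ["you", "are", "great"],
    ["you", "are", "awful"],
    ["great", "edit"],
]

def tfidf(docs):
    """Return one {term: weight} dict per document plus the idf table."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = {t: count * idf[t] for t, count in tf.items()}
        # L2-normalize so every document vector has unit length.
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: w / norm for t, w in raw.items()})
    return vectors, idf

vectors, idf = tfidf(docs)
for vec in vectors:
    print({t: round(w, 3) for t, w in vec.items()})
```

Terms that appear in fewer documents (such as "awful" here) receive a larger weight than terms shared across documents (such as "you").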

## 2. Train Test Split

In [72]:
%%time

#Fit transform TF-IDF vectorizer on comment text col

X = tvec.fit_transform(df['comment_text']) #Features
y = df[categories] #target

Wall time: 2min 7s


In [73]:
#check shape 
print(f'X : {X.shape}')
print(f'y :  {y.shape}')

X : (159571, 8000)
y :  (159571, 7)


In [74]:
# Train/Test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [75]:
# check shape (note: this prints the full X and y, not the train/test split)
print(f'X : {X.shape}')
print(f'y :  {y.shape}')

X : (159571, 8000)
y :  (159571, 7)
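The check above prints the shapes of the full `X` and `y`, so it does not confirm the split. As a rough sketch, the expected split sizes can be worked out by hand, assuming `train_test_split` falls back to its default `test_size` of 0.25 and rounds the test set up:

```python
import math

n_samples = 159571  # rows in X, from the shape printed above
test_size = 0.25    # assumed default when neither size is passed

# The test set is rounded up; the training set takes the remainder.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(f"X_train rows: {n_train}, X_test rows: {n_test}")
```

Printing `X_train.shape` and `X_test.shape` in the notebook would verify these numbers directly.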


## 3. Baseline Modelling

- Generate Baseline Models scores

In [101]:
# set up baseline model performance table
model_performance = []

In [77]:
# Create function for model metric scores
   
def get_performance(y_test, y_pred):
    # Subset accuracy: for multi-label targets, a row counts as correct only if every label matches
    accuracy = accuracy_score(y_test, y_pred)
    # Get precision, recall, f1 scores
    precision, recall, f1score, support = score(y_test, y_pred, average='micro') 
    # Micro-averaging aggregates the contributions of all classes to compute the average metric
    
    return accuracy, precision, recall, f1score

# In a multi-label setup, micro-averaging is often preferred when class imbalance is suspected
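As an illustration of what these metrics measure, the sketch below computes micro-averaged precision, recall and F1, plus subset accuracy, by hand. The label matrices are hypothetical and the helper names are made up for this example; the notebook itself relies on scikit-learn's implementations.

```python
# Hypothetical label matrices (rows = comments, columns = labels); not real data.
y_true = [
    [1, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
]
y_hat = [
    [1, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 1],
]

def micro_scores(y_true, y_hat):
    """Pool TP/FP/FN over every (row, label) cell, then compute the metrics."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_hat):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def subset_accuracy(y_true, y_hat):
    """Fraction of rows whose full label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_hat)) / len(y_true)

precision, recall, f1 = micro_scores(y_true, y_hat)
print(round(precision, 3), round(recall, 3), round(f1, 3))
print(subset_accuracy(y_true, y_hat))
```

Because subset accuracy requires every label in a row to match, it is a stricter measure than the pooled micro scores.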

### Logistic Regression

In [78]:
%%time

#Instantiate lr model and fit 
lr = LogisticRegression()
oneVsRes_lr = OneVsRestClassifier(lr) # OVR a method for using binary classification algorithms for multi-label classification problems
oneVsRes_lr.fit(X_train, y_train)

Wall time: 7.85 s


OneVsRestClassifier(estimator=LogisticRegression())

In [79]:
y_pred = oneVsRes_lr.predict(X_test)

# Performance metrics
accuracy, precision, recall, f1score = get_performance(y_test, y_pred)
print(f'Test Accuracy Score of Logistic Reg.: {accuracy}')
print(f'Precision : {precision}')
print(f'Recall    : {recall}')
print(f'F1-score   : {f1score}')


Test Accuracy Score of Logistic Reg.: 0.9154989597172436
Precision : 0.9463262238090764
Recall    : 0.9018602672875019
F1-score   : 0.9235583370585606


- The test scores look suspiciously high. I attribute this to the data imbalance identified during the EDA process: logistic regression tends to bias its predictions toward the majority class.

- We will address the data imbalance by setting the model hyperparameter class_weight = 'balanced'.
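For reference, the 'balanced' setting weights each class by `n_samples / (n_classes * class_count)`. The sketch below applies that formula to a hypothetical binary label column; the function name is made up for this example.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weight each class inversely to its frequency."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

# Hypothetical label column: 9 non-toxic comments and 1 toxic comment.
y_toxic = [0] * 9 + [1]
weights = balanced_class_weights(y_toxic)
print(weights)
```

The rare class receives the larger weight, so misclassifying a toxic comment costs the model more than misclassifying a safe one.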

In [80]:
%%time

#Instantiate lr model and fit 
lr = LogisticRegression(class_weight = 'balanced')
oneVsRes_lr = OneVsRestClassifier(lr) # OVR a method for using binary classification algorithms for multi-label classification problems
oneVsRes_lr.fit(X_train, y_train)

Wall time: 9.36 s


OneVsRestClassifier(estimator=LogisticRegression(class_weight='balanced'))

In [102]:
y_pred = oneVsRes_lr.predict(X_test)

accuracy, precision, recall, f1score = get_performance(y_test, y_pred)
print(f'Test Accuracy Score of Logistic Regression: {accuracy}')
print(f'Precision : {precision}')
print(f'Recall    : {recall}')
print(f'F1-score   : {f1score}')
print(f'Train_score : {oneVsRes_lr.score(X_train, y_train)}')
print(f'Test_score : {oneVsRes_lr.score(X_test, y_test)}')

# Add performance parameters to list
model_performance.append(dict([
    ('Model', 'Logistic Regression'),
    ('Test Accuracy', accuracy),
    ('Precision', precision),
    ('Recall', recall),
    ('F1', f1score),
    ('Train Score', oneVsRes_lr.score(X_train, y_train)),
    ('Test Score', oneVsRes_lr.score(X_test, y_test))
    
     ]))

Test Accuracy Score of Logistic Regression: 0.8543102799989973
Precision : 0.819183343440469
Recall    : 0.9071881086163283
F1-score   : 0.860942628610276
Train_score : 0.8563144437574157
Test_score : 0.8543102799989973


- After setting the class weight to balanced, accuracy, precision and F1 declined. This suggests that the data imbalance was indeed inflating the earlier scores.

### Random Forest Classifier

In [84]:
%%time

rf = RandomForestClassifier(class_weight = 'balanced', n_jobs = -1) # setting class_weight to balance to treat data imbalance
rf.fit(X_train, y_train)

Wall time: 7min 47s


RandomForestClassifier(class_weight='balanced', n_jobs=-1)

In [103]:
y_pred = rf.predict(X_test)

# Performance metrics
accuracy, precision, recall, f1score = get_performance(y_test, y_pred)
print(f'Test Accuracy Score of Random Forrest Classifier: {accuracy}')
print(f'Precision : {precision}')
print(f'Recall    : {recall}')
print(f'F1-score   : {f1score}')
print(f'Train_score : {rf.score(X_train, y_train)}')
print(f'Test_score : {rf.score(X_test, y_test)}')

# Add performance parameters to list
model_performance.append(dict([
    ('Model', 'Random Forrest Classifier'),
    ('Test Accuracy', accuracy),
    ('Precision', precision),
    ('Recall', recall),
    ('F1', f1score),
    ('Train Score', rf.score(X_train, y_train)),
    ('Test Score', rf.score(X_test, y_test))
    
     ]))

Test Accuracy Score of Random Forrest Classifier: 0.8982528263103803
Precision : 0.8982528263103803
Recall    : 0.802175908307403
F1-score   : 0.8475001182536304
Train_score : 0.898343889436655
Test_score : 0.8982528263103803


### Decision Tree Classifier

In [86]:
%%time

clf = DecisionTreeClassifier(class_weight = 'balanced')
clf.fit(X_train, y_train)

Wall time: 3min 6s


DecisionTreeClassifier(class_weight='balanced')

In [104]:
y_pred = clf.predict(X_test)

# Performance metrics
accuracy, precision, recall, f1score = get_performance(y_test, y_pred)
print(f'Test Accuracy Score of Decision Tree Classifier: {accuracy}')
print(f'Precision : {precision}')
print(f'Recall    : {recall}')
print(f'F1-score   : {f1score}')
print(f'Train_score : {clf.score(X_train, y_train)}')
print(f'Test_score : {clf.score(X_test, y_test)}')

# Add performance parameters to list
model_performance.append(dict([
    ('Model', 'Decision Tree Classifier'),
    ('Test Accuracy', accuracy),
    ('Precision', precision),
    ('Recall', recall),
    ('F1', f1score),
    ('Train Score', clf.score(X_train, y_train)),
    ('Test Score', clf.score(X_test, y_test))
    
     ]))

Test Accuracy Score of Decision Tree Classifier: 0.7639686160479282
Precision : 0.6378176518808635
Recall    : 0.7837970943117458
F1-score   : 0.7033123757105838
Train_score : 0.9853523621718278
Test_score : 0.7639686160479282


In [105]:
model_performance

[{'Model': 'Logistic Regression',
  'Test Accuracy': 0.8543102799989973,
  'Precision': 0.819183343440469,
  'Recall': 0.9071881086163283,
  'F1': 0.860942628610276,
  'Train Score': 0.8563144437574157,
  'Test Score': 0.8543102799989973},
 {'Model': 'Random Forrest Classifier',
  'Test Accuracy': 0.8982528263103803,
  'Precision': 0.8982528263103803,
  'Recall': 0.802175908307403,
  'F1': 0.8475001182536304,
  'Train Score': 0.898343889436655,
  'Test Score': 0.8982528263103803},
 {'Model': 'Decision Tree Classifier',
  'Test Accuracy': 0.7639686160479282,
  'Precision': 0.6378176518808635,
  'Recall': 0.7837970943117458,
  'F1': 0.7033123757105838,
  'Train Score': 0.9853523621718278,
  'Test Score': 0.7639686160479282}]

In [106]:
# Generate baseline score summary table
results = pd.DataFrame(data=model_performance)
results = results[['Model', 'Test Accuracy', 'Precision', 'Recall', 'F1', 'Train Score', 'Test Score']]
results = results.sort_values(by='F1', ascending=False)
results = results.set_index('Model')
results

Model,Test Accuracy,Precision,Recall,F1,Train Score,Test Score
Logistic Regression,0.85431,0.819183,0.907188,0.860943,0.856314,0.85431
Random Forrest Classifier,0.898253,0.898253,0.802176,0.8475,0.898344,0.898253
Decision Tree Classifier,0.763969,0.637818,0.783797,0.703312,0.985352,0.763969


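- Based on F1 scores, Logistic Regression performed best (0.861), with Random Forest Classifier slightly behind (0.848) and Decision Tree Classifier last (0.703).
- Note that the Logistic Regression and Random Forest rows above were regenerated (cells In [102] and In [103]) after both models had been refit with the tuned parameters (cells In [93] and In [97]), so they show almost no train/test gap. In the original baseline run, reproduced under Model Analysis, Random Forest scored 0.986 on train against 0.890 on test, which indicates overfitting, whilst Logistic Regression (0.863 vs 0.846) did not overfit to the same degree.
- With the baseline scores generated, we will now use GridSearchCV to tune the hyperparameters of our 2 selected models, Logistic Regression and Random Forest Classifier.
- We will omit the Decision Tree Classifier as it performed the poorest of all the models.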

## 4. Model Optimization through GridSearch CV

In [90]:
# set up tuned model performance table
model_performance_tuned = []

### GridSearch Random Forest Classifier

In [91]:
rf.get_params().keys()

dict_keys(['bootstrap', 'ccp_alpha', 'class_weight', 'criterion', 'max_depth', 'max_features', 'max_leaf_nodes', 'max_samples', 'min_impurity_decrease', 'min_impurity_split', 'min_samples_leaf', 'min_samples_split', 'min_weight_fraction_leaf', 'n_estimators', 'n_jobs', 'oob_score', 'random_state', 'verbose', 'warm_start'])

In [92]:
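# instantiate parameters for GridSearch
param_gs = {
    'max_depth': [5, 8, 15, 30, 80],
    'max_leaf_nodes':[1, 3, 5, 10],  # note: scikit-learn requires max_leaf_nodes > 1, so candidates with 1 cannot be fitted
    'max_samples': [1,3,5,10]        # note: integers are absolute sample counts per tree; floats select a fraction
    
    #'max_features': [ 1, 3, 10],
    #'n_estimators': [ 1, 5, 10]

}

# Set kfold parameters
kf = KFold(n_splits = 5)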

#Set Gridsearch Parameters
rf_grid = GridSearchCV(rf, param_grid = param_gs, cv = kf, n_jobs = -1, verbose = 1)


In [46]:
%%time
# Fit model
rf_grid.fit(X_train, y_train)

Fitting 5 folds for each of 80 candidates, totalling 400 fits
Wall time: 20min 54s


GridSearchCV(cv=KFold(n_splits=5, random_state=None, shuffle=False),
             estimator=RandomForestClassifier(class_weight='balanced',
                                              n_jobs=-1),
             n_jobs=-1,
             param_grid={'max_depth': [5, 8, 15, 30, 80],
                         'max_leaf_nodes': [1, 3, 5, 10],
                         'max_samples': [1, 3, 5, 10]},
             verbose=1)
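The "80 candidates, totalling 400 fits" line follows directly from the grid size. A small sketch that enumerates the same grid with the standard library is shown below; the `fittable` filter rests on the documented rule that `max_leaf_nodes` must be None or greater than 1, and is an assumption about which candidates could actually be trained rather than something read from the search log.

```python
from itertools import product

param_gs = {
    "max_depth": [5, 8, 15, 30, 80],
    "max_leaf_nodes": [1, 3, 5, 10],
    "max_samples": [1, 3, 5, 10],
}
n_splits = 5  # folds used by KFold above

# Every combination of the listed values is one candidate.
candidates = [dict(zip(param_gs, values))
              for values in product(*param_gs.values())]
total_fits = len(candidates) * n_splits

# Candidates that satisfy the max_leaf_nodes > 1 requirement.
fittable = [c for c in candidates if c["max_leaf_nodes"] >= 2]

print(len(candidates), total_fits, len(fittable))
```

Inspecting `rf_grid.cv_results_` for NaN scores would show which candidates failed in the actual search.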

In [52]:
print(rf_grid.score(X_train, y_train))
print(rf_grid.score(X_test, y_test))

0.898343889436655
0.8982528263103803


In [48]:
# examine the best model
print(rf_grid.best_params_)
print(rf_grid.best_estimator_)

{'max_depth': 5, 'max_leaf_nodes': 3, 'max_samples': 1}
RandomForestClassifier(class_weight='balanced', max_depth=5, max_leaf_nodes=3,
                       max_samples=1, n_jobs=-1)


In [93]:
%%time
#fitting model with best params

rf = RandomForestClassifier(class_weight='balanced', max_depth=5, max_leaf_nodes=3,
                       max_samples=1, n_jobs=-1)
rf.fit(X_train, y_train)

Wall time: 1.76 s


RandomForestClassifier(class_weight='balanced', max_depth=5, max_leaf_nodes=3,
                       max_samples=1, n_jobs=-1)

In [94]:
y_pred = rf.predict(X_test)

# Performance metrics
accuracy, precision, recall, f1score = get_performance(y_test, y_pred)
print(f'Test Accuracy Score of Random Forrest Classifer.: {accuracy}')
print(f'Precision : {precision}')
print(f'Recall    : {recall}')
print(f'F1-score   : {f1score}')
print(f' train_score: {rf.score(X_train, y_train)}')
print(f' test_score: {rf.score(X_test, y_test)}')

#Add performance parameters to list
model_performance_tuned.append(dict([
    ('Model', 'Random Forrest Classifier'),
    ('Test Accuracy', accuracy),
    ('Precision', precision),
    ('Recall', recall),
    ('F1', f1score),
     ('Train Score', rf.score(X_train, y_train)),
    ('Test Score', rf.score(X_test, y_test))


]))

Test Accuracy Score of Random Forrest Classifer.: 0.8982528263103803
Precision : 0.8982528263103803
Recall    : 0.802175908307403
F1-score   : 0.8475001182536304
 train_score: 0.898343889436655
 test_score: 0.8982528263103803


In [None]:
# train and test scores are nearly identical (0.898 vs 0.898), so there is no sign of overfitting
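The tuned forest's test accuracy and micro-precision above are identical (0.8982528263103803). One situation that produces this pattern is a model that outputs only the majority `safe` label for every row. The sketch below reproduces the pattern on hypothetical data; it is an illustration of why the result deserves a closer look, not a diagnosis of the fitted model.

```python
# Hypothetical labels: columns are [toxic, insult, safe]; not real data.
y_true = [[0, 0, 1]] * 8 + [[1, 1, 0], [1, 0, 0]]
# A degenerate predictor that marks every comment as safe only.
y_hat = [[0, 0, 1]] * len(y_true)

# Subset accuracy: share of rows predicted exactly.
subset_acc = sum(t == p for t, p in zip(y_true, y_hat)) / len(y_true)

# Micro precision and recall pooled over all cells.
tp = sum(t and p for tr, pr in zip(y_true, y_hat) for t, p in zip(tr, pr))
predicted_pos = sum(p for pr in y_hat for p in pr)
actual_pos = sum(t for tr in y_true for t in tr)
micro_precision = tp / predicted_pos
micro_recall = tp / actual_pos

print(subset_acc, micro_precision, round(micro_recall, 3))
```

In this toy case accuracy and micro-precision both equal the share of safe-only rows, while recall is lower because every toxic label is missed. Checking the per-label predictions of the fitted forest would confirm or rule this out.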

### GridSearch Logistic Regression

In [95]:
# Re-instantiate Logistic Regression
lr = LogisticRegression() # class_weight = 'balanced' is set later through the GridSearch parameters

oneVsRes_lr1 = OneVsRestClassifier(lr) 
#oneVsRes_lr1.fit(X_train, y_train)

In [47]:
oneVsRes_lr1.get_params().keys()

dict_keys(['estimator__C', 'estimator__class_weight', 'estimator__dual', 'estimator__fit_intercept', 'estimator__intercept_scaling', 'estimator__l1_ratio', 'estimator__max_iter', 'estimator__multi_class', 'estimator__n_jobs', 'estimator__penalty', 'estimator__random_state', 'estimator__solver', 'estimator__tol', 'estimator__verbose', 'estimator__warm_start', 'estimator', 'n_jobs'])

In [86]:
# instantiate parameters for GridSearch
lr_param_gs = {
    'estimator__class_weight': ['balanced'],
    'estimator__penalty': ['l1', 'l2'],  # note: of the solvers below, only 'saga' supports 'l1'
    'estimator__solver': ['newton-cg', 'lbfgs', 'sag', 'saga'],
    'estimator__C': [100, 10, 1.0, 0.1, 0.01],
    'n_jobs': [-1]
}


# Set kfold parameters
kf = KFold(n_splits = 5)


In [177]:
# Gridsearch
lr_grid = GridSearchCV(oneVsRes_lr1, param_grid = lr_param_gs, cv = kf, n_jobs = -1)


In [88]:
%%time
# Fit model
lr_grid.fit(X_train, y_train)

Wall time: 1h 46min 2s


GridSearchCV(cv=KFold(n_splits=5, random_state=None, shuffle=False),
             estimator=OneVsRestClassifier(estimator=LogisticRegression()),
             n_jobs=-1,
             param_grid={'estimator__C': [100, 10, 1.0, 0.1, 0.01],
                         'estimator__class_weight': ['balanced'],
                         'estimator__penalty': ['l1', 'l2'],
                         'estimator__solver': ['newton-cg', 'lbfgs', 'sag',
                                               'saga'],
                         'n_jobs': [-1]})
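Not every penalty/solver pair in this grid is a valid `LogisticRegression` configuration. The sketch below counts the candidates and filters them with a compatibility table written from the scikit-learn documentation (restricted to the two penalties searched here); the table is an assumption of this example, not output from the search.

```python
from itertools import product

penalties = ["l1", "l2"]
solvers = ["newton-cg", "lbfgs", "sag", "saga"]
c_values = [100, 10, 1.0, 0.1, 0.01]

# Penalties each solver accepts, limited to those in the grid.
supported = {
    "newton-cg": {"l2"},
    "lbfgs": {"l2"},
    "sag": {"l2"},
    "saga": {"l1", "l2"},
}

all_candidates = list(product(penalties, solvers, c_values))
valid = [(p, s, c) for p, s, c in all_candidates if p in supported[s]]

print(len(all_candidates), len(valid))
```

Splitting the grid into a list of dictionaries, one per compatible penalty/solver group, would avoid spending search time on combinations that cannot be fitted.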

In [89]:
# examine the best model
print(lr_grid.best_score_)
print(lr_grid.best_params_)
print(lr_grid.best_estimator_)

0.856005265137086
{'estimator__C': 0.01, 'estimator__class_weight': 'balanced', 'estimator__penalty': 'l2', 'estimator__solver': 'lbfgs', 'n_jobs': -1}
OneVsRestClassifier(estimator=LogisticRegression(C=0.01,
                                                 class_weight='balanced'),
                    n_jobs=-1)


In [96]:
#instantiate model with best params

lr = LogisticRegression(C = 0.01, class_weight='balanced', penalty= 'l2', solver= 'lbfgs',
                        n_jobs=-1)


In [97]:
# fit model with best params
oneVsRes_lr = OneVsRestClassifier(lr) 
oneVsRes_lr.fit(X_train, y_train)

OneVsRestClassifier(estimator=LogisticRegression(C=0.01,
                                                 class_weight='balanced',
                                                 n_jobs=-1))

In [98]:
y_pred = oneVsRes_lr.predict(X_test)

# Performance metrics
accuracy, precision, recall, f1score = get_performance(y_test, y_pred)
print(f'Test Accuracy Score of Logistic Regression: {accuracy}')
print(f'Precision : {precision}')
print(f'Recall    : {recall}')
print(f'F1-score   : {f1score}')
print(f'Train_score : {oneVsRes_lr.score(X_train, y_train)}')
print(f'Test_score : {oneVsRes_lr.score(X_test, y_test)}')

# Add performance parameters to list
model_performance_tuned.append(dict([
    ('Model', 'Logistic Regression'),
    ('Test Accuracy', accuracy),
    ('Precision', precision),
    ('Recall', recall),
    ('F1', f1score),
    ('Train Score', oneVsRes_lr.score(X_train, y_train)),
    ('Test Score', oneVsRes_lr.score(X_test, y_test))
    
     ]))

Test Accuracy Score of Logistic Regression: 0.8543102799989973
Precision : 0.819183343440469
Recall    : 0.9071881086163283
F1-score   : 0.860942628610276
Train_score : 0.8563144437574157
Test_score : 0.8543102799989973


In [None]:
# train and test scores are close (0.856 vs 0.854), so there is no sign of overfitting

## 5. Model Analysis

In [99]:
# results of the models without tuning (baseline model scores)
results

Model,Test Accuracy,Precision,Recall,F1,Train Score,Test Score
Logistic Regression,0.845838,0.820676,0.917329,0.866315,0.86274,0.845838
Random Forrest Classifier,0.890231,0.86448,0.856081,0.86026,0.986322,0.890231
Decision Tree Classifier,0.763969,0.637818,0.783797,0.703312,0.985352,0.763969


In [100]:
# Generate tuned score summary table
results_tuned = pd.DataFrame(data= model_performance_tuned)
results_tuned = results_tuned[[ 'Model','Test Accuracy', 'Precision', 'Recall', 'F1', 'Train Score', 'Test Score']]
results_tuned = results_tuned.sort_values(by='F1', ascending=False)
results_tuned = results_tuned.set_index('Model')
results_tuned

Model,Test Accuracy,Precision,Recall,F1,Train Score,Test Score
Logistic Regression,0.85431,0.819183,0.907188,0.860943,0.856314,0.85431
Random Forrest Classifier,0.898253,0.898253,0.802176,0.8475,0.898344,0.898253


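- After tuning the hyperparameters, the F1 score of both models dropped slightly (Logistic Regression from 0.866 to 0.861, Random Forest from 0.860 to 0.848). However, the overfitting seen in the baseline Random Forest was resolved: its train and test scores moved from 0.986 vs 0.890 to 0.898 vs 0.898.
- Given these observations, Logistic Regression has the highest F1 score of all the models and is therefore the model of choice.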
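The comparison can be summarised numerically. The sketch below copies the F1, train and test scores from the baseline and tuned tables in this section and computes the change in F1 and the train/test gap for each model; the dictionary layout is made up for this example.

```python
# Scores copied from the baseline and tuned tables above.
baseline = {
    "Logistic Regression": {"F1": 0.866315, "Train": 0.86274, "Test": 0.845838},
    "Random Forest Classifier": {"F1": 0.86026, "Train": 0.986322, "Test": 0.890231},
}
tuned = {
    "Logistic Regression": {"F1": 0.860943, "Train": 0.856314, "Test": 0.85431},
    "Random Forest Classifier": {"F1": 0.8475, "Train": 0.898344, "Test": 0.898253},
}

summary = {}
for model in baseline:
    summary[model] = {
        # Negative values mean F1 dropped after tuning.
        "f1_change": tuned[model]["F1"] - baseline[model]["F1"],
        # Train minus test score; a large gap points to overfitting.
        "gap_before": baseline[model]["Train"] - baseline[model]["Test"],
        "gap_after": tuned[model]["Train"] - tuned[model]["Test"],
    }

for model, row in summary.items():
    print(model, {k: round(v, 4) for k, v in row.items()})
```

Both models give up a little F1 in exchange for a smaller train/test gap, with the largest change in gap for Random Forest.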

## 6. Conclusion

From the modelling process, Logistic Regression emerges as the model of choice for multi-label classification of toxic comments.
We also observed that the most frequent words overlap across the Toxic, Severe_toxic, Insult and Obscene classifications, which suggests that these labels are correlated.
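One way to check label correlation directly, rather than through shared vocabulary, is to measure how often two labels are assigned to the same comment. The sketch below computes the Jaccard overlap between label columns on hypothetical rows; with the real data, the columns would come from `df[categories]`.

```python
from itertools import combinations

labels = ["toxic", "severe_toxic", "obscene", "insult"]
# Hypothetical label rows, one per comment; not taken from the dataset.
rows = [
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]

def jaccard(col_a, col_b):
    """Comments carrying both labels divided by comments carrying either."""
    both = sum(a and b for a, b in zip(col_a, col_b))
    either = sum(a or b for a, b in zip(col_a, col_b))
    return both / either if either else 0.0

columns = {name: [row[i] for row in rows] for i, name in enumerate(labels)}
overlap = {
    (a, b): jaccard(columns[a], columns[b])
    for a, b in combinations(labels, 2)
}

for pair, value in sorted(overlap.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 2))
```

Pairs with an overlap close to 1 are labels that almost always occur together.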

## Recommendations

Deep learning NLP models have recently achieved state-of-the-art results in text classification across a suite of standard academic benchmark problems. Although we achieved reasonably good modelling results here, we can explore deep learning models such as LSTM to predict the labels. Having established a benchmark score here, we can run deep learning models and analyse whether they are indeed better.