# M - Automated Essay Scoring
_School of Information Technology_<br>
_Monash University Malaysia_<br>
(c) Copyright 2020, Ian Tan & Jun Qing Lim

Steps

- Read dataset (ASAP)
- Extract features (into file) using EASE
- Conduct machine learning (Sci-kit Learn libraries)
    - Naive Bayes
    - SVR
    - BLRR (later)
- Evaluate (QWK)

## Import Libraries

In [6]:
import time
start = time.time()

In [7]:
import numpy as np
import pandas as pd
from collections import defaultdict

from nltk import pos_tag
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer

from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import model_selection, naive_bayes, svm #SVR is in SVM
from sklearn.metrics import accuracy_score, confusion_matrix

### Import the EASE functions, which is located in the ease folder.

In [8]:
import sys
sys.path.insert(1, 'ease')
import create
import grade 
import model_creator 
import predictor_extractor 
import predictor_set 
import util_functions
import essay_set
import feature_extractor

from essay_set import EssaySet
from feature_extractor import FeatureExtractor

## Read Dataset

AES (Hewlett Foundation dataset from Kaggle) in the folder `asap-aes`.  For this, we use the `training_set_rel3` for training and testing.  Note that the `test_set` and the `valid_set` cannot be used as they don't contain the scores and are meant for the competition to score the entries.

In [9]:
data_set = pd.read_csv("asap-aes/training_set_rel3.tsv", sep='\t', encoding="latin-1")

In [10]:
data_set['essay'] = [entry.lower() for entry in data_set['essay']] # lower case for all words in essay

There are 8 different essay sets.  As an overview:
- Sets 1 & 2 are of persuasive/narrative in the form of letters
- Sets 3, 4, 5 & 6 are source dependent response to a given essay
- Sets 7 & 8 are of persuasive/narrative in the form of story writing essays

These format makes it good for transfer learning.

In [11]:
data_set_1 = data_set[data_set['essay_set'] == 1]
data_set_2 = data_set[data_set['essay_set'] == 2]
#data_set_3 = data_set[data_set['essay_set'] == 3]
#data_set_4 = data_set[data_set['essay_set'] == 4]
#data_set_5 = data_set[data_set['essay_set'] == 5]
#data_set_6 = data_set[data_set['essay_set'] == 6]
#data_set_7 = data_set[data_set['essay_set'] == 7]
#data_set_8 = data_set[data_set['essay_set'] == 8]

As each set will retain the original index, we want each of them to have their own indexing so that it is easier to match the essay and the scores.

In [12]:
data_set_1 = data_set_1.reset_index() # resets index
data_set_2 = data_set_2.reset_index()
#data_set_3 = data_set_3.reset_index()
#data_set_4 = data_set_4.reset_index()
#data_set_5 = data_set_5.reset_index()
#data_set_6 = data_set_6.reset_index()
#data_set_7 = data_set_7.reset_index()
#data_set_8 = data_set_8.reset_index()

We use just the `essay` content and the respective `scores`.

In [13]:
# If you want for the whole dataset.
# Commented out as we will work on individual datasets
#essays = data_set['essay']
#scores = data_set['domain1_score']

In [14]:
essays_1 = data_set_1['essay']
scores_1 = data_set_1['domain1_score']
essays_2 = data_set_2['essay']
scores_2 = data_set_2['domain1_score']
#essays_3 = data_set_3['essay']
#scores_3 = data_set_3['domain1_score']
#essays_4 = data_set_4['essay']
#scores_4 = data_set_4['domain1_score']
#essays_5 = data_set_5['essay']
#scores_5 = data_set_5['domain1_score']
#essays_6 = data_set_6['essay']
#scores_6 = data_set_6['domain1_score']
#essays_7 = data_set_7['essay']
#scores_7 = data_set_7['domain1_score']
#essays_8 = data_set_8['essay']
#scores_8 = data_set_8['domain1_score']

Rename the `domain1_score` column to `score`.

In [15]:
scores_1.columns = "score"
scores_2.columns = "score"
#scores_3.columns = "score"
#scores_4.columns = "score"
#scores_5.columns = "score"
#scores_6.columns = "score"
#scores_7.columns = "score"
#scores_8.columns = "score"

THE ABOVE NEEDS TO BE PUT INTO A LOOP BUT I LEFT IT AS IS BECAUSE YOU CAN PICK AND CHOOSE EASILY INSTEAD.

## Prepare Data

### Create the essay sets

Again, these can be looped but I kept them separated for ease of readability and commenting out those that we don't need.  Each set takes a long time to process, and hence please be patient with this part.

In [16]:
e_set_1 = EssaySet()
e_set_2 = EssaySet()
#e_set_3 = EssaySet()
#e_set_4 = EssaySet()
#e_set_5 = EssaySet()
#e_set_6 = EssaySet()
#e_set_7 = EssaySet()
#e_set_8 = EssaySet()

In [17]:
for i in range(len(essays_1)):
    e_set_1.add_essay(essays_1[i], scores_1[i])

In [18]:
for i in range(len(essays_2)):
    e_set_2.add_essay(essays_2[i], scores_2[i])

Left out for sets 3 - 6 for now.

In [19]:
"""
for i in range(len(essays_7)):
    e_set_7.add_essay(essays_7[i], scores_7[i])
"""

'\nfor i in range(len(essays_7)):\n    e_set_7.add_essay(essays_7[i], scores_7[i])\n'

In [20]:
"""
for i in range(len(essays_8)):
    e_set_8.add_essay(essays_8[i], scores_8[i])
"""

'\nfor i in range(len(essays_8)):\n    e_set_8.add_essay(essays_8[i], scores_8[i])\n'

## Extract Features

In [21]:
f_extractor = FeatureExtractor()

Change the next two variable assignment to change the evaluation of the essay sets.

Would be better to do this above.

**SETUP HERE**

In [22]:
e_set = e_set_2
score = scores_2

In [23]:
length = f_extractor.gen_length_feats(e_set)
length_df = pd.DataFrame(
    length, 
    columns = [
        'chars', 
        'words', 
        'commas', 
        'apostrophes', 
        'punctuations', 
        'avg_word_length',
        # new stuff, will need to compare original with new and separate punctuations
        'sentences',
        'questions',
        'avg_word_sentence',
        'POS', 
        'POS/total_words'
    ]
)

_*Exclude the prompts for the time being*_

To be included next.

In [24]:
# Merge this with the score based on the index
# We use the shallow features first
features = length_df
dataset = features.merge(score, left_index=True, right_index=True)
dataset.columns = ['chars', 'words', 'commas', 'apostrophes', 'punctuations',
                   'avg_word_length', 'sentences', 'questions', 'avg_word_sentence',
                   'POS', 'POS/total_words', 'score']
#X_1 = dataset.iloc[:,0:10].values.astype(float)
#y_1 = dataset.iloc[:,11].values.astype(float)

## Determine Essay Prompts

In [25]:
essay_prompts = []

for i in range(1,9):
    file = "prompts/set" + str(i) + ".txt"
    f = open(file, "r", encoding="latin-1") # there are some 0x9x characters, hence need to specify encoding
    essay_prompts.append(f.read())
    
def get_essay_prompt(essay_set):
    return essay_prompts[essay_set-1]

In [26]:
len(essay_prompts)

8

**SETUP HERE**

In [27]:
# Unsure how this works
e_set.update_prompt(get_essay_prompt(2))

# Need more explanation on how this works - look into EASE
prompts = f_extractor.gen_prompt_feats(e_set)
prompts_df = pd.DataFrame(prompts, columns = [
    'prompt_words', 'prompt_words/total_words', 'synonym_words', 'synonym_words/total_words'
])
e_set # To check

<essay_set.EssaySet at 0x1249a2810>

In [28]:
# Another process that takes sometime to process
unstemmed = util_functions.get_vocab_essays_count(e_set._text, e_set._score)
stemmed = util_functions.get_vocab_essays_count(e_set._clean_stem_text, e_set._score)

bow = list(map(lambda a,b:[a,b], unstemmed, stemmed))
bow_df = pd.DataFrame(bow, columns = ['unstemmed', 'stemmed'])

In [29]:
features = pd.concat([length_df, prompts_df, bow_df], axis=1, sort=False)
features.head()

Unnamed: 0,chars,words,commas,apostrophes,punctuations,avg_word_length,sentences,questions,avg_word_sentence,POS,POS/total_words,prompt_words,prompt_words/total_words,synonym_words,synonym_words/total_words,unstemmed,stemmed
0,2639.0,525.0,15.0,13.0,0.0,5.026667,21.0,0.0,25.0,524.330784,0.998725,227.0,0.432381,112.0,0.213333,584,560
1,841.0,180.0,5.0,2.0,0.0,4.672222,3.0,0.0,60.0,178.6629,0.992572,83.0,0.461111,66.0,0.366667,210,211
2,1181.0,259.0,12.0,15.0,0.0,4.559846,10.0,4.0,25.9,257.320363,0.993515,148.0,0.571429,81.0,0.312741,291,287
3,2705.0,525.0,22.0,6.0,0.0,5.152381,31.0,0.0,16.935484,521.65392,0.993627,245.0,0.466667,131.0,0.249524,547,530
4,2394.0,499.0,25.0,15.0,2.0,4.797595,33.0,1.0,15.121212,484.298031,0.970537,218.0,0.436874,117.0,0.234469,591,562


In [30]:
# Export features to a file for next stage (optional)
dataset = features.merge(score, left_index=True, right_index=True)
dataset.head()

Unnamed: 0,chars,words,commas,apostrophes,punctuations,avg_word_length,sentences,questions,avg_word_sentence,POS,POS/total_words,prompt_words,prompt_words/total_words,synonym_words,synonym_words/total_words,unstemmed,stemmed,domain1_score
0,2639.0,525.0,15.0,13.0,0.0,5.026667,21.0,0.0,25.0,524.330784,0.998725,227.0,0.432381,112.0,0.213333,584,560,4
1,841.0,180.0,5.0,2.0,0.0,4.672222,3.0,0.0,60.0,178.6629,0.992572,83.0,0.461111,66.0,0.366667,210,211,1
2,1181.0,259.0,12.0,15.0,0.0,4.559846,10.0,4.0,25.9,257.320363,0.993515,148.0,0.571429,81.0,0.312741,291,287,2
3,2705.0,525.0,22.0,6.0,0.0,5.152381,31.0,0.0,16.935484,521.65392,0.993627,245.0,0.466667,131.0,0.249524,547,530,4
4,2394.0,499.0,25.0,15.0,2.0,4.797595,33.0,1.0,15.121212,484.298031,0.970537,218.0,0.436874,117.0,0.234469,591,562,4


In [31]:
"""
dataset.columns = ['chars', 'words', 'commas', 'apostrophes', 'punctuations',
                   'avg_word_length', 'sentences', 'questions', 'avg_word_sentence',
                   'POS', 'POS/total_words',
                   'score']
"""

dataset.columns = ['chars', 'words', 'commas', 'apostrophes', 'punctuations',
                   'avg_word_length', 'sentences', 'questions', 'avg_word_sentence',
                   'POS', 'POS/total_words',
                   'prompt_words', 'prompt_words/total_words', 'synonym_words',
                   'synonym_words/total_words', 'unstemmed', 'stemmed',
                   'score']
dataset.head()
dataset.to_csv('maes_features.csv')

Can just use the features and score for the X and y but just to keep to certain convention if reading back from the CSV file above.

**YOU CAN RUN FROM HERE ON BY READING THE FEATURES FOR THE TRAINING**

In [32]:
dataset = pd.read_csv('maes_features.csv')

Reshape the data and label

In [33]:
X = dataset.iloc[:,1:16].values.astype(float)
y = dataset.iloc[:,18].values.astype(float)
y

array([4., 1., 2., ..., 2., 3., 3.])

In [34]:
X.shape

(1800, 15)

In [35]:
y = np.array(y).reshape(-1,1)
y.shape

(1800, 1)

In [36]:
### Split the train and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Have a look at the first few lines
print(y_test[:5, :])

[[3.]
 [5.]
 [3.]
 [4.]
 [4.]]


## Model Training

In [37]:
from sklearn.preprocessing import StandardScaler, MinMaxScaler

### Naive Bayes Training

No scaling used for Naive Bayes

In [38]:
X_trainNB = X_train
y_trainNB = y_train
X_testNB = X_test
y_testNB = y_test

In [39]:
model_nb = naive_bayes.MultinomialNB()
model_nb.fit(X_trainNB, y_trainNB.ravel())

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

At this stage, the Naive Bayes model is called `model_nb`

### SVM Training

Use standard scaler for the data

In [40]:
from sklearn.preprocessing import StandardScaler
sc_Xsvm = StandardScaler()
sc_ysvm = StandardScaler()
X_trainSVM = sc_Xsvm.fit_transform(X_train)
y_trainSVM = sc_ysvm.fit_transform(y_train)
X_testSVM = sc_Xsvm.transform(X_test)
y_testSVM = sc_ysvm.transform(y_test)

In [41]:
from sklearn.svm import SVR
# most important SVR parameter is Kernel type. It can be #linear,polynomial or gaussian SVR. We have a non-linear condition #so we can select polynomial or gaussian but here we select RBF(a #gaussian type) kernel.
# kernel{‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’}, default=’rbf’
# maybe use poly and increase the degree
model_svm = SVR(kernel='rbf', gamma='auto', verbose=True)
#regressor = SVR(kernel='poly', degree=5, gamma='auto', verbose=True)
model_svm.fit(X_trainSVM,y_trainSVM.ravel())

[LibSVM]

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='auto',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=True)

At this stage, the Support Vector Machine (SVM) model is called `model_svm`

### BLRR

In [42]:
from sklearn.preprocessing import StandardScaler
sc_Xblrr = StandardScaler()
sc_yblrr = StandardScaler()
X_trainBLRR = sc_Xblrr.fit_transform(X_train)
y_trainBLRR = sc_yblrr.fit_transform(y_train)
X_testBLRR = sc_Xblrr.transform(X_test)
y_testBLRR = sc_yblrr.transform(y_test)

In [43]:
from sklearn import linear_model
model_blrr = linear_model.BayesianRidge()
model_blrr.fit(X_trainBLRR, y_trainBLRR.ravel())

BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, alpha_init=None,
              compute_score=False, copy_X=True, fit_intercept=True,
              lambda_1=1e-06, lambda_2=1e-06, lambda_init=None, n_iter=300,
              normalize=False, tol=0.001, verbose=False)

At this stage, the Bayesian Linear Ridge Regression (BLRR) model is called `model_blrr_1`

## Prediction

We will be using the respective validation set and will have to also pre-process the data.

### Naive Bayes

In [44]:
y_predNB = model_nb.predict(X_testNB)
#y_1_predNB = sc_ynb.inverse_transform(y_1_predNB).round()

cm = confusion_matrix(y_test, y_predNB)
print(cm)

[[  2   4   1   0   0   0]
 [  1  13   8   1   0   0]
 [  1  10 107  41   0   1]
 [  0   1  45  92  16   2]
 [  0   0   0   3   9   1]
 [  0   0   0   0   1   0]]


In [45]:
y_predNB

array([4., 6., 3., 3., 5., 4., 3., 4., 4., 3., 5., 4., 4., 3., 3., 5., 3.,
       3., 3., 3., 5., 4., 4., 3., 3., 3., 3., 3., 4., 4., 4., 4., 3., 5.,
       2., 4., 6., 2., 2., 4., 4., 3., 4., 2., 3., 4., 3., 2., 4., 4., 3.,
       4., 4., 3., 4., 2., 3., 3., 3., 4., 3., 5., 3., 2., 4., 3., 4., 3.,
       4., 3., 3., 3., 3., 4., 3., 3., 4., 4., 4., 3., 4., 2., 4., 3., 4.,
       4., 3., 4., 3., 5., 4., 4., 3., 4., 3., 4., 3., 1., 3., 3., 3., 4.,
       4., 3., 3., 3., 4., 2., 3., 2., 4., 2., 5., 3., 3., 4., 4., 4., 5.,
       4., 3., 3., 4., 5., 2., 1., 3., 1., 5., 4., 3., 3., 4., 4., 3., 3.,
       4., 4., 3., 6., 3., 3., 3., 5., 3., 3., 4., 6., 4., 3., 4., 4., 4.,
       4., 3., 4., 3., 5., 4., 3., 3., 3., 3., 3., 3., 4., 3., 4., 4., 4.,
       5., 3., 3., 4., 4., 3., 4., 4., 2., 2., 3., 5., 2., 4., 3., 3., 4.,
       3., 4., 3., 4., 1., 4., 3., 4., 4., 3., 4., 3., 4., 4., 4., 4., 3.,
       4., 3., 3., 3., 2., 4., 3., 4., 4., 2., 3., 4., 3., 3., 4., 4., 5.,
       4., 5., 4., 4., 4.

### SVM

In [46]:
y_predSVM = model_svm.predict(X_testSVM)
y_predSVM = sc_ysvm.inverse_transform(y_predSVM).round()

cm = confusion_matrix(y_test, y_predSVM)
#np.set_printoptions(threshold=np.inf)
print(cm)

[[  0   5   2   0   0   0]
 [  0   9  12   2   0   0]
 [  0   3 116  41   0   0]
 [  0   0  54 102   0   0]
 [  0   0   0  13   0   0]
 [  0   0   0   1   0   0]]


In [47]:
y_predSVM

array([4., 4., 3., 3., 4., 3., 4., 3., 4., 3., 4., 4., 4., 3., 3., 4., 3.,
       3., 3., 3., 4., 4., 3., 3., 3., 3., 3., 3., 4., 3., 4., 4., 3., 4.,
       3., 4., 4., 2., 3., 4., 4., 3., 4., 2., 3., 4., 3., 2., 4., 4., 3.,
       4., 4., 3., 3., 3., 3., 4., 3., 4., 3., 4., 3., 2., 4., 3., 4., 3.,
       4., 3., 3., 4., 3., 3., 4., 3., 4., 4., 4., 3., 4., 3., 4., 3., 4.,
       4., 3., 4., 3., 4., 4., 3., 3., 3., 4., 4., 3., 2., 3., 3., 3., 3.,
       4., 3., 3., 4., 4., 2., 3., 3., 4., 3., 4., 3., 3., 4., 4., 4., 4.,
       4., 3., 3., 4., 4., 2., 2., 3., 2., 4., 3., 3., 3., 4., 4., 3., 4.,
       4., 4., 3., 4., 3., 3., 3., 4., 3., 3., 4., 4., 3., 3., 4., 4., 4.,
       4., 3., 4., 3., 4., 4., 3., 3., 3., 3., 3., 3., 4., 3., 4., 3., 4.,
       4., 3., 3., 4., 4., 3., 3., 4., 3., 2., 3., 4., 2., 3., 3., 3., 3.,
       3., 4., 3., 4., 2., 3., 3., 4., 4., 3., 4., 3., 4., 4., 4., 4., 3.,
       4., 4., 3., 4., 2., 4., 3., 4., 3., 3., 3., 4., 3., 3., 4., 4., 4.,
       4., 4., 3., 4., 4.

### BLRR

In [48]:
y_predBLRR = model_blrr.predict(X_testBLRR)
y_predBLRR = sc_yblrr.inverse_transform(y_predBLRR).round()

cm = confusion_matrix(y_test, y_predBLRR)
print(cm)

[[  0   5   2   0   0   0]
 [  0   7  15   1   0   0]
 [  0   2 126  31   1   0]
 [  0   0  62  93   1   0]
 [  0   0   0   7   6   0]
 [  0   0   0   0   1   0]]


In [49]:
y_predBLRR

array([5., 4., 3., 3., 4., 4., 3., 3., 4., 3., 4., 4., 3., 3., 3., 4., 3.,
       3., 3., 3., 4., 4., 3., 3., 3., 3., 3., 3., 4., 3., 4., 4., 3., 4.,
       3., 4., 4., 2., 3., 4., 3., 3., 4., 2., 3., 3., 3., 2., 4., 3., 3.,
       4., 4., 3., 3., 2., 3., 4., 3., 4., 3., 5., 3., 2., 4., 3., 4., 3.,
       4., 3., 3., 4., 4., 3., 3., 3., 4., 4., 4., 3., 4., 2., 4., 3., 4.,
       4., 3., 4., 3., 4., 4., 3., 3., 3., 3., 4., 3., 2., 3., 3., 3., 3.,
       3., 3., 3., 4., 4., 3., 3., 3., 4., 3., 4., 3., 3., 4., 3., 4., 4.,
       4., 3., 3., 3., 4., 3., 3., 3., 2., 4., 3., 3., 3., 4., 4., 3., 4.,
       4., 3., 3., 4., 3., 3., 3., 5., 3., 3., 4., 4., 3., 3., 4., 3., 4.,
       4., 3., 4., 3., 4., 4., 3., 3., 3., 3., 3., 3., 4., 3., 4., 4., 4.,
       4., 3., 3., 4., 5., 3., 3., 3., 3., 3., 3., 5., 2., 3., 3., 3., 3.,
       3., 4., 3., 4., 2., 3., 3., 4., 4., 3., 4., 3., 4., 4., 4., 4., 3.,
       4., 4., 3., 4., 2., 4., 3., 4., 3., 3., 3., 4., 3., 3., 4., 4., 4.,
       4., 4., 3., 4., 3.

In [50]:
y_test.ravel()

array([3., 5., 3., 4., 4., 4., 4., 3., 4., 3., 4., 3., 4., 2., 3., 5., 3.,
       3., 3., 4., 4., 4., 4., 4., 3., 3., 3., 4., 3., 3., 3., 4., 3., 4.,
       1., 3., 4., 1., 2., 4., 3., 3., 4., 2., 3., 3., 3., 1., 4., 4., 4.,
       4., 4., 3., 3., 3., 4., 3., 3., 4., 3., 5., 3., 2., 3., 4., 3., 4.,
       4., 3., 4., 3., 4., 4., 3., 3., 4., 4., 4., 3., 3., 2., 4., 3., 4.,
       3., 3., 4., 3., 4., 4., 4., 3., 4., 3., 4., 3., 1., 3., 3., 4., 4.,
       4., 4., 3., 2., 4., 2., 3., 2., 4., 2., 4., 3., 3., 4., 4., 3., 4.,
       4., 3., 3., 3., 5., 2., 3., 3., 2., 5., 3., 3., 3., 3., 4., 3., 3.,
       4., 4., 3., 3., 3., 3., 3., 5., 3., 4., 4., 4., 3., 3., 4., 2., 4.,
       4., 2., 4., 2., 4., 4., 3., 2., 3., 3., 3., 4., 4., 3., 4., 4., 3.,
       4., 3., 3., 4., 4., 4., 3., 3., 3., 1., 4., 5., 3., 4., 3., 4., 4.,
       3., 4., 4., 3., 1., 3., 4., 3., 4., 4., 4., 4., 4., 4., 3., 4., 4.,
       3., 4., 3., 4., 2., 4., 3., 5., 4., 3., 3., 4., 3., 4., 4., 3., 5.,
       3., 4., 4., 4., 4.

### Ensembling the 3 Algorithms

In [51]:
actual = pd.Series(y_test.ravel())
predNB = pd.Series(y_predNB)
predSVM = pd.Series(y_predSVM)
predBLRR = pd.Series(y_predBLRR)

data = {"Actual": actual,
        "NB": predNB, 
        "SVM": predSVM, 
        "BLRR": predBLRR} 
results = pd.concat(data, axis=1)
results

Unnamed: 0,Actual,NB,SVM,BLRR
0,3.0,4.0,4.0,5.0
1,5.0,6.0,4.0,4.0
2,3.0,3.0,3.0,3.0
3,4.0,3.0,3.0,3.0
4,4.0,5.0,4.0,4.0
...,...,...,...,...
355,3.0,3.0,3.0,3.0
356,3.0,3.0,3.0,3.0
357,4.0,3.0,4.0,4.0
358,4.0,4.0,4.0,4.0


In [52]:
results['Ensemble'] = np.where(
                            (results['NB'] == results['BLRR']) |
                            (results['NB'] == results['SVM']),
                            results['NB'],
                            results['BLRR']
                        )
results

Unnamed: 0,Actual,NB,SVM,BLRR,Ensemble
0,3.0,4.0,4.0,5.0,4.0
1,5.0,6.0,4.0,4.0,4.0
2,3.0,3.0,3.0,3.0,3.0
3,4.0,3.0,3.0,3.0,3.0
4,4.0,5.0,4.0,4.0,4.0
...,...,...,...,...,...
355,3.0,3.0,3.0,3.0,3.0
356,3.0,3.0,3.0,3.0,3.0
357,4.0,3.0,4.0,4.0,4.0
358,4.0,4.0,4.0,4.0,4.0


## Evaluation using QWK

QWK scores for NB, SVR and BLRR

In [53]:
from sklearn.metrics import classification_report
from sklearn.metrics import cohen_kappa_score

### Naive Bayes

In [54]:
rpt = classification_report(y_test, y_predNB)
print(rpt)

              precision    recall  f1-score   support

         1.0       0.50      0.29      0.36         7
         2.0       0.46      0.57      0.51        23
         3.0       0.66      0.67      0.67       160
         4.0       0.67      0.59      0.63       156
         5.0       0.35      0.69      0.46        13
         6.0       0.00      0.00      0.00         1

    accuracy                           0.62       360
   macro avg       0.44      0.47      0.44       360
weighted avg       0.64      0.62      0.62       360



In [55]:
print(cohen_kappa_score(y_test, y_predNB, weights="quadratic"))

0.6404411764705883


### SVM

In [56]:
rpt = classification_report(y_test, y_predSVM)
print(rpt)

              precision    recall  f1-score   support

         1.0       0.00      0.00      0.00         7
         2.0       0.53      0.39      0.45        23
         3.0       0.63      0.72      0.67       160
         4.0       0.64      0.65      0.65       156
         5.0       0.00      0.00      0.00        13
         6.0       0.00      0.00      0.00         1

    accuracy                           0.63       360
   macro avg       0.30      0.30      0.30       360
weighted avg       0.59      0.63      0.61       360



  _warn_prf(average, modifier, msg_start, len(result))


In [57]:
print(cohen_kappa_score(y_test, y_predSVM, weights="quadratic"))

0.5477386934673366


### BLRR

In [58]:
rpt = classification_report(y_test, y_predBLRR)
print(rpt)

              precision    recall  f1-score   support

         1.0       0.00      0.00      0.00         7
         2.0       0.50      0.30      0.38        23
         3.0       0.61      0.79      0.69       160
         4.0       0.70      0.60      0.65       156
         5.0       0.67      0.46      0.55        13
         6.0       0.00      0.00      0.00         1

    accuracy                           0.64       360
   macro avg       0.41      0.36      0.38       360
weighted avg       0.63      0.64      0.63       360



In [59]:
print(cohen_kappa_score(y_test, y_predBLRR, weights="quadratic"))

0.5860165593376265


### Ensemble

In [60]:
rpt = classification_report(y_test,results['Ensemble'])
print(rpt)

              precision    recall  f1-score   support

         1.0       0.00      0.00      0.00         7
         2.0       0.56      0.43      0.49        23
         3.0       0.65      0.75      0.70       160
         4.0       0.69      0.67      0.68       156
         5.0       0.83      0.38      0.53        13
         6.0       0.00      0.00      0.00         1

    accuracy                           0.66       360
   macro avg       0.45      0.37      0.40       360
weighted avg       0.65      0.66      0.65       360



In [61]:
print(cohen_kappa_score(y_test, results['Ensemble'], weights="quadratic"))

0.6108582574772432


In [62]:
end = time.time()
print("Total time to execute the notebook is " + str(end - start))

Total time to execute the notebook is 401.5997109413147


QWK scores output are from -1 to 1, where -1 means that it is totally wrong while 1 is a perfect match (classification).  The aim is to get as close as possible to 1, with a score of 0.6 being generally accepted as a good score.

On the output of the QWK agreements, the score is just "moderate agreement".  Work now is to achieve substantial agreement.

https://www.statisticshowto.com/cohens-kappa-statistic/

In short, BLRR works better than SVM but a small margin but better than NB.

## Appendix

### QWK Scores (Manual Code)

In [63]:
N = len(cm) # Just to get the same size as the confusion matrix from above
w = np.zeros((N,N)) # create a matrix of N by N
d = (N-1)**2 # the weighted portion
for i in range(len(w)):
    for j in range(len(w)):
        w[i][j] = float(((i-j)**2)/d) 
w # The weighted matrix

array([[0.  , 0.04, 0.16, 0.36, 0.64, 1.  ],
       [0.04, 0.  , 0.04, 0.16, 0.36, 0.64],
       [0.16, 0.04, 0.  , 0.04, 0.16, 0.36],
       [0.36, 0.16, 0.04, 0.  , 0.04, 0.16],
       [0.64, 0.36, 0.16, 0.04, 0.  , 0.04],
       [1.  , 0.64, 0.36, 0.16, 0.04, 0.  ]])

In [64]:
N

6

In [65]:
np.unique(y_test)

array([1., 2., 3., 4., 5., 6.])

In [66]:
np.unique(y_predNB)

array([1., 2., 3., 4., 5., 6.])

In [67]:
act_hist=np.zeros([N])
for item in y_test: 
    act_hist[int(item)-1] += 1

In [68]:
pred_hist=np.zeros([N])
for item in y_predNB: 
    pred_hist[int(item)-1]+=1

In [69]:
E = np.outer(act_hist, pred_hist)
E

array([[2.8000e+01, 1.9600e+02, 1.1270e+03, 9.5900e+02, 1.8200e+02,
        2.8000e+01],
       [9.2000e+01, 6.4400e+02, 3.7030e+03, 3.1510e+03, 5.9800e+02,
        9.2000e+01],
       [6.4000e+02, 4.4800e+03, 2.5760e+04, 2.1920e+04, 4.1600e+03,
        6.4000e+02],
       [6.2400e+02, 4.3680e+03, 2.5116e+04, 2.1372e+04, 4.0560e+03,
        6.2400e+02],
       [5.2000e+01, 3.6400e+02, 2.0930e+03, 1.7810e+03, 3.3800e+02,
        5.2000e+01],
       [4.0000e+00, 2.8000e+01, 1.6100e+02, 1.3700e+02, 2.6000e+01,
        4.0000e+00]])

In [70]:
E = E/E.sum()
E.sum()

1.0000000000000002

In [71]:
cm = cm/cm.sum()
cm.sum()

1.0

In [72]:
num=0
den=0
for i in range(len(w)):
    for j in range(len(w)):
        num+=w[i][j]*cm[i][j]
        den+=w[i][j]*E[i][j]
            
weighted_kappa = (1 - (num/den))
weighted_kappa

0.6911764705882353

# END