# M - Automated Essay Scoring
_School of Information Technology_<br>
_Monash University Malaysia_<br>
(c) Copyright 2020, Ian Tan & Jun Qing Lim

Steps

- Read dataset (ASAP)
- Extract features (into file) using EASE
- Conduct machine learning (scikit-learn libraries)
    - Naive Bayes
    - SVR
    - BLRR
- Evaluate (QWK)

## Import Libraries

In [27]:
import numpy as np
import pandas as pd
from collections import defaultdict

from nltk import pos_tag
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer

from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import model_selection, naive_bayes, svm #SVR is in SVM
from sklearn.metrics import accuracy_score, confusion_matrix

#### Import the EASE functions, which are located in the `ease` folder.

In [28]:
import sys
sys.path.insert(1, 'ease')
import create
import grade 
import model_creator 
import predictor_extractor 
import predictor_set 
import util_functions
import essay_set
import feature_extractor

from essay_set import EssaySet
from feature_extractor import FeatureExtractor

## Read Dataset

The AES data (the Hewlett Foundation ASAP dataset from Kaggle) is in the folder `asap-aes`.  We use `training_set_rel3` for training and also load `test_set`, which, as the preview further down shows, contains prediction IDs rather than scores.

In [29]:
train_set = pd.read_csv("asap-aes/training_set_rel3.tsv", sep='\t', encoding="latin-1")
test_set = pd.read_csv("asap-aes/test_set.tsv", sep='\t', encoding="latin-1")

In [30]:
train_set['essay'] = [entry.lower() for entry in train_set['essay']] # lower case for all words in essay
test_set['essay'] = [entry.lower() for entry in test_set['essay']] # lower case for all words in essay

There are 8 different essay sets.  As an overview:
- Sets 1 & 2 are persuasive/narrative essays in the form of letters
- Sets 3, 4, 5 & 6 are source-dependent responses to a given essay
- Sets 7 & 8 are persuasive/narrative essays in the form of story writing

This mix of formats makes the dataset a good candidate for transfer learning.

In [31]:
train_set_1 = train_set[train_set['essay_set'] == 1]
train_set_2 = train_set[train_set['essay_set'] == 2]
train_set_3 = train_set[train_set['essay_set'] == 3]
train_set_4 = train_set[train_set['essay_set'] == 4]
train_set_5 = train_set[train_set['essay_set'] == 5]
train_set_6 = train_set[train_set['essay_set'] == 6]
train_set_7 = train_set[train_set['essay_set'] == 7]
train_set_8 = train_set[train_set['essay_set'] == 8]

We do similarly for the test sets.

In [32]:
test_set.head()

| | essay_id | essay_set | essay | domain1_predictionid | domain2_predictionid |
|---|---|---|---|---|---|
| 0 | 2383 | 1 | i believe that computers have a positive effec... | 2383 | |
| 1 | 2384 | 1 | dear @caps1, i know some problems have came up... | 2384 | |
| 2 | 2385 | 1 | dear to whom it @month1 concern, computers are... | 2385 | |
| 3 | 2386 | 1 | dear @caps1 @caps2, @caps3 has come to my atte... | 2386 | |
| 4 | 2387 | 1 | dear local newspaper, i think that people have... | 2387 | |


In [33]:
test_set_1 = test_set[test_set['essay_set'] == 1]
test_set_2 = test_set[test_set['essay_set'] == 2]
test_set_3 = test_set[test_set['essay_set'] == 3]
test_set_4 = test_set[test_set['essay_set'] == 4]
test_set_5 = test_set[test_set['essay_set'] == 5]
test_set_6 = test_set[test_set['essay_set'] == 6]
test_set_7 = test_set[test_set['essay_set'] == 7]
test_set_8 = test_set[test_set['essay_set'] == 8]

Each subset retains the row index of the full dataset, so we reset the index of each one.  This makes it easier to match the essays and the scores by position.

In [34]:
train_set_1 = train_set_1.reset_index() # resets index
train_set_2 = train_set_2.reset_index()
train_set_3 = train_set_3.reset_index()
train_set_4 = train_set_4.reset_index()
train_set_5 = train_set_5.reset_index()
train_set_6 = train_set_6.reset_index()
train_set_7 = train_set_7.reset_index()
train_set_8 = train_set_8.reset_index()

In [35]:
test_set_1 = test_set_1.reset_index() # resets index
test_set_2 = test_set_2.reset_index()
test_set_3 = test_set_3.reset_index()
test_set_4 = test_set_4.reset_index()
test_set_5 = test_set_5.reset_index()
test_set_6 = test_set_6.reset_index()
test_set_7 = test_set_7.reset_index()
test_set_8 = test_set_8.reset_index()

We use just the `essay` content and the respective `scores`.

In [36]:
# If you want for the whole dataset.
# Commented out as we will work on individual datasets
#essays = train_set['essay']
#scores = train_set['domain1_score']

In [37]:
essays_1 = train_set_1['essay']
scores_1 = train_set_1['domain1_score']

In [38]:
essays_2 = train_set_2['essay']
scores_2 = train_set_2['domain1_score']

In [39]:
essays_3 = train_set_3['essay']
scores_3 = train_set_3['domain1_score']

In [40]:
essays_4 = train_set_4['essay']
scores_4 = train_set_4['domain1_score']

In [41]:
essays_5 = train_set_5['essay']
scores_5 = train_set_5['domain1_score']

In [42]:
essays_6 = train_set_6['essay']
scores_6 = train_set_6['domain1_score']

In [43]:
essays_7 = train_set_7['essay']
scores_7 = train_set_7['domain1_score']

In [44]:
essays_8 = train_set_8['essay']
scores_8 = train_set_8['domain1_score']

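Label the scores as `score` instead of `domain1_score`.  Note that each `scores_N` is a pandas Series, which has a `name` rather than `columns`, so the assignments below leave the label unchanged: the column is still called `domain1_score` after the merge further down and is only renamed when `dataset.columns` is set.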

In [45]:
scores_1.columns = "score"
scores_2.columns = "score"
scores_3.columns = "score"
scores_4.columns = "score"
scores_5.columns = "score"
scores_6.columns = "score"
scores_7.columns = "score"
scores_8.columns = "score"

The above could be put into a loop, but it is left as is so that individual essay sets can be picked and chosen easily.
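For reference, the per-set cells above could be folded into a single loop. The following is a minimal sketch on a toy DataFrame (the rows are made up, not ASAP data). The helper name `split_by_set` is hypothetical, and `Series.rename` is used here to set the label to `score`.

```python
import pandas as pd

# Toy stand-in for train_set (hypothetical rows, not ASAP data).
toy_train = pd.DataFrame({
    "essay_set": [1, 1, 2, 2, 2, 3],
    "essay": ["a b", "c d e", "f", "g h", "i j k", "l"],
    "domain1_score": [8, 9, 3, 4, 2, 1],
})

def split_by_set(frame):
    """Return {set_id: (essays, scores)} with a fresh 0..n-1 index per set."""
    per_set = {}
    for set_id, group in frame.groupby("essay_set"):
        group = group.reset_index(drop=True)  # drop the original row index
        essays = group["essay"]
        scores = group["domain1_score"].rename("score")  # a Series has a name, not columns
        per_set[int(set_id)] = (essays, scores)
    return per_set

per_set = split_by_set(toy_train)
essays, scores = per_set[2]  # pick and choose a single set
print(len(essays), scores.name, list(scores))
```

Keeping the result in a dictionary keyed by set number still allows picking one set at a time, which is the reason the cells were left unrolled.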

#### Create the essay sets

In [142]:
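# This can take some time, be patient :-)
# Pick the essay set to work on; set 2 is used here, matching the prompt selected below
essays, scores = essays_2, scores_2

e_set = EssaySet()

for i in range(len(essays)):
    e_set.add_essay(essays[i], scores[i])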

## Extract Features

In [143]:
f_extractor = FeatureExtractor()

In [144]:
length = f_extractor.gen_length_feats(e_set)
length_df = pd.DataFrame(
    length, 
    columns = [
        'chars', 
        'words', 
        'commas', 
        'apostrophes', 
        'punctuations', 
        'avg_word_length', 
        'POS', 
        'POS/total_words'
    ]
)
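To give a feel for what the first few length features represent, here is a rough pure-Python sketch. It is not EASE's implementation: the function name `simple_length_feats`, the whitespace tokenization and the set of characters counted as punctuation are all assumptions. The one detail taken from the notebook output is that `avg_word_length` in the feature table equals `chars / words` (for example 2639 / 527 ≈ 5.00759).

```python
def simple_length_feats(text):
    """Approximate a few length features for one essay (illustrative only)."""
    words = text.split()  # assumption: whitespace tokenization
    n_chars = len(text)
    n_words = len(words)
    return {
        "chars": n_chars,
        "words": n_words,
        "commas": text.count(","),
        "apostrophes": text.count("'"),
        # assumption: sentence-level marks only; EASE may count a different set
        "punctuations": sum(text.count(mark) for mark in ".?!:;"),
        # matches the ratio seen in the feature table: chars / words
        "avg_word_length": n_chars / n_words if n_words else 0.0,
    }

feats = simple_length_feats("dear editor, it's fine. really?")
print(feats)
```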

#### Collate the essay prompts
This consists of the prompt text for each essay set, read from the `prompts` folder.

In [145]:
essay_prompts = []

# Takes a bit of time also :)
for i in range(1,9):
    file = "prompts/set" + str(i) + ".txt"
    # there are some 0x9x characters, hence need to specify encoding
    with open(file, "r", encoding="latin-1") as f: # the file is closed automatically
        essay_prompts.append(f.read())
    
def get_essay_prompt(essay_set):
    return essay_prompts[essay_set-1]

In [146]:
# Unsure how this works
e_set.update_prompt(get_essay_prompt(2))

# Need more explanation on how this works - look into EASE

prompts = f_extractor.gen_prompt_feats(e_set)
prompts_df = pd.DataFrame(prompts, columns = ['prompt_words', 'prompt_words/total_words', 'synonym_words', 'synonym_words/total_words'])

In [147]:
e_set

<essay_set.EssaySet at 0x2021d5dbf28>

In [148]:
# Another process that takes some time
unstemmed = util_functions.get_vocab_essays_count(e_set._text, e_set._score)
stemmed = util_functions.get_vocab_essays_count(e_set._clean_stem_text, e_set._score)

bow = list(map(lambda a,b:[a,b], unstemmed, stemmed))
bow_df = pd.DataFrame(bow, columns = ['unstemmed', 'stemmed'])

In [149]:
features = pd.concat([length_df, prompts_df, bow_df], axis=1, sort=False)

In [150]:
features.head()

| | chars | words | commas | apostrophes | punctuations | avg_word_length | POS | POS/total_words | prompt_words | prompt_words/total_words | synonym_words | synonym_words/total_words | unstemmed | stemmed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2639.0 | 527.0 | 15.0 | 13.0 | 21.0 | 5.00759 | 524.330784 | 0.994935 | 220.0 | 0.417457 | 112.0 | 0.212524 | 584 | 559 |
| 1 | 841.0 | 180.0 | 5.0 | 2.0 | 3.0 | 4.672222 | 178.6629 | 0.992572 | 82.0 | 0.455556 | 66.0 | 0.366667 | 210 | 210 |
| 2 | 1181.0 | 261.0 | 12.0 | 15.0 | 14.0 | 4.524904 | 257.992218 | 0.988476 | 144.0 | 0.551724 | 83.0 | 0.318008 | 291 | 285 |
| 3 | 2705.0 | 527.0 | 22.0 | 6.0 | 31.0 | 5.132827 | 521.65392 | 0.989856 | 245.0 | 0.464896 | 131.0 | 0.248577 | 547 | 528 |
| 4 | 2394.0 | 501.0 | 25.0 | 15.0 | 34.0 | 4.778443 | 484.298031 | 0.966663 | 216.0 | 0.431138 | 117.0 | 0.233533 | 591 | 562 |


In [151]:
# Export features to a file for next stage (optional)
dataset = features.merge(scores, left_index=True, right_index=True)

In [152]:
dataset.head()

| | chars | words | commas | apostrophes | punctuations | avg_word_length | POS | POS/total_words | prompt_words | prompt_words/total_words | synonym_words | synonym_words/total_words | unstemmed | stemmed | domain1_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2639.0 | 527.0 | 15.0 | 13.0 | 21.0 | 5.00759 | 524.330784 | 0.994935 | 220.0 | 0.417457 | 112.0 | 0.212524 | 584 | 559 | 4 |
| 1 | 841.0 | 180.0 | 5.0 | 2.0 | 3.0 | 4.672222 | 178.6629 | 0.992572 | 82.0 | 0.455556 | 66.0 | 0.366667 | 210 | 210 | 1 |
| 2 | 1181.0 | 261.0 | 12.0 | 15.0 | 14.0 | 4.524904 | 257.992218 | 0.988476 | 144.0 | 0.551724 | 83.0 | 0.318008 | 291 | 285 | 2 |
| 3 | 2705.0 | 527.0 | 22.0 | 6.0 | 31.0 | 5.132827 | 521.65392 | 0.989856 | 245.0 | 0.464896 | 131.0 | 0.248577 | 547 | 528 | 4 |
| 4 | 2394.0 | 501.0 | 25.0 | 15.0 | 34.0 | 4.778443 | 484.298031 | 0.966663 | 216.0 | 0.431138 | 117.0 | 0.233533 | 591 | 562 | 4 |


In [153]:
dataset.columns = ['chars', 'words', 'commas', 'apostrophes', 'punctuations',
       'avg_word_length', 'POS', 'POS/total_words', 'prompt_words',
       'prompt_words/total_words', 'synonym_words',
       'synonym_words/total_words', 'unstemmed', 'stemmed', 'score']

In [154]:
dataset.head()

| | chars | words | commas | apostrophes | punctuations | avg_word_length | POS | POS/total_words | prompt_words | prompt_words/total_words | synonym_words | synonym_words/total_words | unstemmed | stemmed | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2639.0 | 527.0 | 15.0 | 13.0 | 21.0 | 5.00759 | 524.330784 | 0.994935 | 220.0 | 0.417457 | 112.0 | 0.212524 | 584 | 559 | 4 |
| 1 | 841.0 | 180.0 | 5.0 | 2.0 | 3.0 | 4.672222 | 178.6629 | 0.992572 | 82.0 | 0.455556 | 66.0 | 0.366667 | 210 | 210 | 1 |
| 2 | 1181.0 | 261.0 | 12.0 | 15.0 | 14.0 | 4.524904 | 257.992218 | 0.988476 | 144.0 | 0.551724 | 83.0 | 0.318008 | 291 | 285 | 2 |
| 3 | 2705.0 | 527.0 | 22.0 | 6.0 | 31.0 | 5.132827 | 521.65392 | 0.989856 | 245.0 | 0.464896 | 131.0 | 0.248577 | 547 | 528 | 4 |
| 4 | 2394.0 | 501.0 | 25.0 | 15.0 | 34.0 | 4.778443 | 484.298031 | 0.966663 | 216.0 | 0.431138 | 117.0 | 0.233533 | 591 | 562 | 4 |


In [155]:
dataset.to_csv('maes_features.csv')

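The `features` and `scores` objects could be used directly as `X` and `y`, but slicing `dataset` keeps to the same convention as when the data is read back from the CSV file above.  Note that `iloc[:, 0:13]` selects only the first 13 columns, so the last feature column (`stemmed`) is not part of `X`; column 14 is the score.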


In [156]:
X = dataset.iloc[:,0:13].values.astype(float)
y = dataset.iloc[:,14].values.astype(float)

In [157]:
y

array([4., 1., 2., ..., 2., 3., 3.])

In [158]:
X.shape

(1800, 13)

In [159]:
y = np.array(y).reshape(-1,1)
y.shape

(1800, 1)

#### Conduct Feature Scaling

In [160]:
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)
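As an illustration of what the scaling step does, the sketch below standardizes a small list of scores by hand and then undoes it. It assumes the usual definition used by `StandardScaler` (subtract the mean, divide by the population standard deviation); the helper names and the toy scores are hypothetical. It also shows why the recovered values are rounded before being cast to integers.

```python
from statistics import fmean, pstdev

def fit_standardizer(values):
    """Return (mean, std) using the population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    return mean, (std if std > 0 else 1.0)  # avoid dividing by zero

def standardize(values, mean, std):
    return [(v - mean) / std for v in values]

def unstandardize(z_values, mean, std):
    return [z * std + mean for z in z_values]

toy_scores = [4, 1, 2, 4, 4, 3]  # hypothetical scores
mean, std = fit_standardizer(toy_scores)
z = standardize(toy_scores, mean, std)

# Round before casting: the round trip can leave values such as 2.9999999999999996,
# which int() alone would truncate to 2.
recovered = [int(round(v)) for v in unstandardize(z, mean, std)]
print(recovered)
```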

In [161]:
len(X)

1800

In [162]:
len(y)

1800

#### Split the train and test sets

In [163]:
# To split the train / test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Have a look at the first few lines
print(y_test[:5, :])

[[-0.53668756]
 [ 2.04630071]
 [-0.53668756]
 [ 0.75480657]
 [ 0.75480657]]


### Training

#### Support Vector Regression

In [164]:
from sklearn.svm import SVR
# The most important SVR parameter is the kernel type. It can be linear,
# polynomial or gaussian. We have a non-linear condition, so we can select
# polynomial or gaussian; here we select the RBF (a gaussian type) kernel.
# kernel{'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'}, default='rbf'
# maybe use poly and increase the degree
regressor = SVR(kernel='rbf', gamma='auto', verbose=True)
#regressor = SVR(kernel='poly', degree=5, gamma='auto', verbose=True)
regressor.fit(X_train,y_train.ravel())

[LibSVM]

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='auto',
  kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=True)

#### Test / Predict the fit

In [165]:
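# Predict on the held-out split, then map the standardized predictions back to the original score scale
y_pred = regressor.predict(X_test)
y_pred = sc_y.inverse_transform(y_pred.reshape(-1, 1)).ravel().round() # recent scikit-learn versions expect a 2-D array here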

In [166]:
df = pd.DataFrame(
    {
        # y_test is already 2-D, as the scaler expects; flatten after the inverse transform
        'Real Values':sc_y.inverse_transform(y_test).ravel(),
        'Predicted Values':y_pred
    }
)
df.head()

| | Real Values | Predicted Values |
|---|---|---|
| 0 | 3.0 | 4.0 |
| 1 | 5.0 | 4.0 |
| 2 | 3.0 | 3.0 |
| 3 | 4.0 | 3.0 |
| 4 | 4.0 | 4.0 |


#### Accuracy Score

In [167]:
# y_pred

In [168]:
# y_test = sc_y.inverse_transform(y_test).round()
# y_test.ravel()

In [169]:
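# For a regressor, score() predicts on X_test internally and returns the
# coefficient of determination (R^2), not a classification accuracy.
# It is computed here on the standardized targets.

print("R^2 score:", regressor.score(X_test, y_test))

R^2 score: 0.48081605360897567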
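For a regressor such as `SVR`, scikit-learn's `score` method returns the coefficient of determination (R²) rather than a classification accuracy, which is why this value differs from the `accuracy_score` computed next. The sketch below contrasts the two on toy values; the function names and the numbers are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def exact_match_rate(y_true, y_pred):
    """Fraction of predictions equal to the true value (what accuracy measures)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

toy_true = [1, 2, 3, 4]
toy_pred = [1, 2, 3, 5]  # one prediction is off by one

toy_r2 = r_squared(toy_true, toy_pred)
toy_acc = exact_match_rate(toy_true, toy_pred)
print(toy_r2, toy_acc)
```

R² rewards predictions that are close even when they are not exact, whereas accuracy only counts exact matches, so the two numbers are not interchangeable.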


In [170]:
print("accuracy score:", accuracy_score(df['Real Values'], df['Predicted Values']))

accuracy score: 0.6388888888888888


In [171]:
from sklearn.metrics import cohen_kappa_score

In [172]:
print(cohen_kappa_score(sc_y.inverse_transform(y_test).round(), y_pred, weights="quadratic"))

0.584716505112203


### Naive Bayes

In [173]:
X_train

array([[-0.79591812, -0.79274203, -0.52192486, ..., -0.4291183 ,
         1.3780107 , -0.99534758],
       [-0.51618611, -0.55699135, -0.43044826, ..., -0.4291183 ,
         0.36655217, -0.75801391],
       [-0.63721302,  1.55901482, -1.25373772, ..., -2.02112129,
        -6.02218295, -2.41310399],
       ...,
       [-0.34377985, -0.33274069,  0.57579442, ..., -0.04793449,
         0.80414712, -0.37703091],
       [ 1.40768504,  1.11626353,  0.85022424, ...,  0.64716541,
        -0.97593924,  0.92830429],
       [ 1.12452775,  1.25426394,  1.6735137 , ...,  1.20772984,
        -0.18916275,  1.49665597]])

In [174]:
X_train_test = sc_X.inverse_transform(X_train)
X_train_test = X_train_test.astype(int)
X_train_test

array([[1414,  289,    9, ...,   91,    0,  310],
       [1659,  330,   10, ...,   91,    0,  348],
       [1553,  698,    1, ...,   19,    0,   83],
       ...,
       [1810,  369,   21, ...,  108,    0,  409],
       [3344,  621,   24, ...,  139,    0,  618],
       [3096,  645,   33, ...,  164,    0,  709]])

In [175]:
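y_train_test = sc_y.inverse_transform(y_train).ravel() # y_train is already 2-D, as the scaler expects
y_train_test = y_train_test.round().astype(int) # round first so floating-point error cannot truncate a score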
y_train_test

array([3, 3, 2, ..., 3, 4, 4])

In [176]:
nbclassifier = naive_bayes.MultinomialNB()
nbclassifier.fit(X_train_test, y_train_test)

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

In [177]:
X_test_test = sc_X.inverse_transform(X_test)
X_test_test = X_test_test.astype(int)
X_test_test

array([[5038, 1022,   13, ...,  302,    0,  750],
       [3220,  607,   29, ...,  132,    0,  704],
       [1309,  271,    7, ...,   94,    0,  322],
       ...,
       [2254,  449,   19, ...,  108,    0,  515],
       [2879,  572,   24, ...,  170,    0,  579],
       [2494,  466,    8, ...,  119,    0,  572]])

In [178]:
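y_test_test = sc_y.inverse_transform(y_test).ravel() # y_test is already 2-D, as the scaler expects
y_test_test = y_test_test.round().astype(int) # round first so floating-point error cannot truncate a score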
y_test_test

array([3, 5, 3, 4, 4, 4, 4, 3, 4, 3, 4, 3, 4, 2, 3, 5, 3, 3, 3, 4, 4, 4,
       4, 4, 3, 3, 3, 4, 3, 3, 3, 4, 3, 4, 1, 3, 4, 1, 2, 4, 3, 3, 4, 2,
       3, 3, 3, 1, 4, 4, 4, 4, 4, 3, 3, 3, 4, 3, 3, 4, 3, 5, 3, 2, 3, 4,
       3, 4, 4, 3, 4, 3, 4, 4, 3, 3, 4, 4, 4, 3, 3, 2, 4, 3, 4, 3, 3, 4,
       3, 4, 4, 4, 3, 4, 3, 4, 3, 1, 3, 3, 4, 4, 4, 4, 3, 2, 4, 2, 3, 2,
       4, 2, 4, 3, 3, 4, 4, 3, 4, 4, 3, 3, 3, 5, 2, 3, 3, 2, 5, 3, 3, 3,
       3, 4, 3, 3, 4, 4, 3, 3, 3, 3, 3, 5, 3, 4, 4, 4, 3, 3, 4, 2, 4, 4,
       2, 4, 2, 4, 4, 3, 2, 3, 3, 3, 4, 4, 3, 4, 4, 3, 4, 3, 3, 4, 4, 4,
       3, 3, 3, 1, 4, 5, 3, 4, 3, 4, 4, 3, 4, 4, 3, 1, 3, 4, 3, 4, 4, 4,
       4, 4, 4, 3, 4, 4, 3, 4, 3, 4, 2, 4, 3, 5, 4, 3, 3, 4, 3, 4, 4, 3,
       5, 3, 4, 4, 4, 4, 3, 4, 3, 2, 3, 3, 3, 4, 3, 3, 3, 4, 6, 4, 3, 3,
       3, 4, 3, 3, 3, 4, 4, 3, 3, 3, 4, 2, 4, 4, 3, 2, 3, 3, 4, 4, 3, 2,
       4, 4, 4, 3, 3, 4, 4, 5, 4, 4, 4, 4, 2, 4, 4, 4, 3, 4, 4, 5, 4, 3,
       4, 3, 3, 4, 5, 4, 4, 3, 4, 2, 4, 3, 4, 3, 5,

In [179]:
y_predNB = nbclassifier.predict(X_test_test)

cm = confusion_matrix(y_test_test, y_predNB)
print(cm)

[[  3   3   0   1   0   0]
 [  5   6  10   1   0   1]
 [  0  15 104  37   1   3]
 [  0   3  61  81  11   0]
 [  0   0   1   4   4   4]
 [  0   0   0   0   1   0]]
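As a reading aid, scikit-learn's `confusion_matrix` puts the actual classes in rows and the predicted classes in columns, so the diagonal holds the correct predictions. The short sketch below copies the matrix printed above and recomputes the overall accuracy and the recall for score 3 from it; the variable names are illustrative.

```python
# Confusion matrix copied from the output above
# (rows: actual scores 1..6, columns: predicted scores 1..6).
cm_counts = [
    [3, 3, 0, 1, 0, 0],
    [5, 6, 10, 1, 0, 1],
    [0, 15, 104, 37, 1, 3],
    [0, 3, 61, 81, 11, 0],
    [0, 0, 1, 4, 4, 4],
    [0, 0, 0, 0, 1, 0],
]

total = sum(sum(row) for row in cm_counts)                     # all test essays
correct = sum(cm_counts[i][i] for i in range(len(cm_counts)))  # diagonal entries
accuracy = correct / total

# Recall for score 3: correct predictions of 3 over all essays whose actual score is 3.
recall_score_3 = cm_counts[2][2] / sum(cm_counts[2])

print(total, correct, round(accuracy, 2), round(recall_score_3, 2))
```

These values (198 correct out of 360, i.e. 0.55, and a recall of 0.65 for score 3) agree with the classification report.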


In [180]:
from sklearn.metrics import classification_report

rpt = classification_report(y_test_test, y_predNB)
print(rpt)

              precision    recall  f1-score   support

           1       0.38      0.43      0.40         7
           2       0.22      0.26      0.24        23
           3       0.59      0.65      0.62       160
           4       0.65      0.52      0.58       156
           5       0.24      0.31      0.27        13
           6       0.00      0.00      0.00         1

   micro avg       0.55      0.55      0.55       360
   macro avg       0.35      0.36      0.35       360
weighted avg       0.58      0.55      0.56       360



### QWK Scores (Manual Code)

In [181]:
N = len(cm) # Just to get the same size as the confusion matrix from above
w = np.zeros((N,N)) # create a matrix of N by N
d = (N-1)**2 # the weighted portion
for i in range(len(w)):
    for j in range(len(w)):
        w[i][j] = float(((i-j)**2)/d) 
w # The weighted matrix

array([[0.  , 0.04, 0.16, 0.36, 0.64, 1.  ],
       [0.04, 0.  , 0.04, 0.16, 0.36, 0.64],
       [0.16, 0.04, 0.  , 0.04, 0.16, 0.36],
       [0.36, 0.16, 0.04, 0.  , 0.04, 0.16],
       [0.64, 0.36, 0.16, 0.04, 0.  , 0.04],
       [1.  , 0.64, 0.36, 0.16, 0.04, 0.  ]])

In [182]:
N

6

In [183]:
np.unique(y_test_test)

array([1, 2, 3, 4, 5, 6])

In [184]:
np.unique(y_predNB)

array([1, 2, 3, 4, 5, 6])

In [185]:
act_hist=np.zeros([N])
for item in y_test_test: 
    act_hist[item-1] += 1

In [186]:
pred_hist=np.zeros([N])
for item in y_predNB: 
    pred_hist[item-1]+=1

In [187]:
E = np.outer(act_hist, pred_hist)
E

array([[5.6000e+01, 1.8900e+02, 1.2320e+03, 8.6800e+02, 1.1900e+02,
        5.6000e+01],
       [1.8400e+02, 6.2100e+02, 4.0480e+03, 2.8520e+03, 3.9100e+02,
        1.8400e+02],
       [1.2800e+03, 4.3200e+03, 2.8160e+04, 1.9840e+04, 2.7200e+03,
        1.2800e+03],
       [1.2480e+03, 4.2120e+03, 2.7456e+04, 1.9344e+04, 2.6520e+03,
        1.2480e+03],
       [1.0400e+02, 3.5100e+02, 2.2880e+03, 1.6120e+03, 2.2100e+02,
        1.0400e+02],
       [8.0000e+00, 2.7000e+01, 1.7600e+02, 1.2400e+02, 1.7000e+01,
        8.0000e+00]])

In [188]:
E = E/E.sum()
E.sum()

1.0

In [189]:
cm = cm/cm.sum()
cm.sum()

1.0

In [190]:
num=0
den=0
for i in range(len(w)):
    for j in range(len(w)):
        num+=w[i][j]*cm[i][j]
        den+=w[i][j]*E[i][j]
            
weighted_kappa = (1 - (num/den))
weighted_kappa

0.523820622785754
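The cells above can be collected into a single function. The sketch below is one way to do that in plain Python, working directly from a confusion matrix of counts; the function name `qwk_from_confusion` is hypothetical, and the matrix is the one printed earlier for Naive Bayes.

```python
def qwk_from_confusion(counts):
    """Quadratic weighted kappa from a square confusion matrix of counts."""
    n = len(counts)
    total = sum(sum(row) for row in counts)
    actual_hist = [sum(row) for row in counts]  # row sums
    pred_hist = [sum(counts[i][j] for i in range(n)) for j in range(n)]  # column sums
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            num += weight * counts[i][j] / total                         # observed
            den += weight * actual_hist[i] * pred_hist[j] / total ** 2   # expected by chance
    return 1.0 - num / den

nb_counts = [
    [3, 3, 0, 1, 0, 0],
    [5, 6, 10, 1, 0, 1],
    [0, 15, 104, 37, 1, 3],
    [0, 3, 61, 81, 11, 0],
    [0, 0, 1, 4, 4, 4],
    [0, 0, 0, 0, 1, 0],
]
nb_qwk = qwk_from_confusion(nb_counts)
print(round(nb_qwk, 4))
```

Working from the confusion matrix avoids rebuilding the histograms from the label arrays, since they are just its row and column sums.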

QWK scores range from -1 to 1: 1 means perfect agreement between the predicted and the actual scores, 0 means agreement no better than chance, and negative values mean agreement worse than chance.  The aim is to get as close as possible to 1, with a score of about 0.6 being generally accepted as a good score.
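Written out, the manual computation above corresponds to the following formula. Here $N$ is the number of score classes, $w$ and $E$ are the matrices of the same name in the code, and $O$ is a label introduced here for the normalized confusion matrix (`cm` after dividing by its sum):

$$
w_{ij} = \frac{(i-j)^2}{(N-1)^2}, \qquad
\kappa = 1 - \frac{\sum_{i,j} w_{ij}\, O_{ij}}{\sum_{i,j} w_{ij}\, E_{ij}}
$$

$E$ is the outer product of the actual and predicted score histograms, normalized to sum to 1, so the denominator is the weighted disagreement expected by chance and the numerator is the weighted disagreement actually observed.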

### QWK for Naive Bayes

The code above computes the QWK manually.  We later found that it is already available in sklearn as an option of the [Cohen Kappa Score](https://journals.sagepub.com/doi/10.1177/001316446002000104), by setting `weights` to `'quadratic'`.  Since the manual version has already been coded above, we use `sklearn.metrics.cohen_kappa_score` to validate it.

In [192]:
y_test_test

array([3, 5, 3, 4, 4, 4, 4, 3, 4, 3, 4, 3, 4, 2, 3, 5, 3, 3, 3, 4, 4, 4,
       4, 4, 3, 3, 3, 4, 3, 3, 3, 4, 3, 4, 1, 3, 4, 1, 2, 4, 3, 3, 4, 2,
       3, 3, 3, 1, 4, 4, 4, 4, 4, 3, 3, 3, 4, 3, 3, 4, 3, 5, 3, 2, 3, 4,
       3, 4, 4, 3, 4, 3, 4, 4, 3, 3, 4, 4, 4, 3, 3, 2, 4, 3, 4, 3, 3, 4,
       3, 4, 4, 4, 3, 4, 3, 4, 3, 1, 3, 3, 4, 4, 4, 4, 3, 2, 4, 2, 3, 2,
       4, 2, 4, 3, 3, 4, 4, 3, 4, 4, 3, 3, 3, 5, 2, 3, 3, 2, 5, 3, 3, 3,
       3, 4, 3, 3, 4, 4, 3, 3, 3, 3, 3, 5, 3, 4, 4, 4, 3, 3, 4, 2, 4, 4,
       2, 4, 2, 4, 4, 3, 2, 3, 3, 3, 4, 4, 3, 4, 4, 3, 4, 3, 3, 4, 4, 4,
       3, 3, 3, 1, 4, 5, 3, 4, 3, 4, 4, 3, 4, 4, 3, 1, 3, 4, 3, 4, 4, 4,
       4, 4, 4, 3, 4, 4, 3, 4, 3, 4, 2, 4, 3, 5, 4, 3, 3, 4, 3, 4, 4, 3,
       5, 3, 4, 4, 4, 4, 3, 4, 3, 2, 3, 3, 3, 4, 3, 3, 3, 4, 6, 4, 3, 3,
       3, 4, 3, 3, 3, 4, 4, 3, 3, 3, 4, 2, 4, 4, 3, 2, 3, 3, 4, 4, 3, 2,
       4, 4, 4, 3, 3, 4, 4, 5, 4, 4, 4, 4, 2, 4, 4, 4, 3, 4, 4, 5, 4, 3,
       4, 3, 3, 4, 5, 4, 4, 3, 4, 2, 4, 3, 4, 3, 5,

In [193]:
y_predNB

array([6, 4, 3, 3, 5, 3, 3, 3, 4, 3, 4, 4, 4, 3, 4, 4, 4, 3, 3, 3, 4, 4,
       4, 3, 3, 3, 3, 3, 4, 4, 3, 4, 3, 4, 4, 3, 4, 2, 3, 4, 3, 3, 2, 2,
       3, 3, 3, 2, 4, 4, 3, 4, 4, 2, 4, 2, 3, 4, 3, 4, 3, 6, 3, 1, 4, 3,
       4, 3, 5, 3, 3, 3, 4, 3, 4, 3, 4, 4, 4, 3, 4, 3, 4, 4, 4, 3, 2, 3,
       3, 4, 5, 4, 3, 4, 3, 4, 3, 1, 4, 2, 3, 3, 4, 3, 4, 6, 4, 3, 3, 4,
       3, 1, 4, 3, 3, 3, 3, 5, 3, 4, 3, 3, 3, 4, 3, 3, 3, 1, 5, 3, 3, 3,
       3, 4, 4, 4, 4, 5, 3, 6, 3, 3, 3, 5, 2, 2, 4, 5, 3, 3, 3, 2, 4, 4,
       2, 4, 2, 5, 3, 3, 3, 4, 3, 3, 3, 4, 3, 3, 3, 3, 4, 3, 3, 4, 4, 3,
       4, 3, 3, 1, 4, 6, 2, 3, 3, 3, 3, 3, 4, 3, 3, 1, 3, 3, 4, 4, 3, 4,
       3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 1, 4, 3, 4, 3, 3, 4, 5, 3, 2, 4, 4,
       5, 4, 4, 3, 4, 3, 3, 4, 3, 3, 3, 4, 3, 4, 3, 3, 3, 4, 5, 4, 2, 4,
       2, 4, 3, 3, 3, 4, 4, 4, 2, 3, 4, 3, 3, 4, 4, 1, 3, 3, 4, 4, 3, 2,
       5, 4, 3, 4, 3, 3, 3, 3, 3, 3, 4, 4, 2, 3, 4, 5, 3, 4, 4, 6, 3, 3,
       3, 2, 4, 4, 6, 4, 3, 3, 3, 3, 4, 3, 4, 4, 5,

In [194]:
print(cohen_kappa_score(y_test_test, y_predNB))
print(cohen_kappa_score(y_test_test, y_predNB, weights="quadratic"))

0.28168493656854277
0.5238206227857543


With a QWK of about 0.52, the score is only in the "moderate agreement" band.  The work now is to achieve substantial agreement.

https://www.statisticshowto.com/cohens-kappa-statistic/

In short, on this split SVM (SVR, with a QWK of about 0.58) works a little better than Naive Bayes (QWK of about 0.52) for AES.

# End