# <b> Top Models Evaluation </b>
___

<b> The Top* 3 models are evaluated below, namely: </b>
* Random Forest
* Logistic Regression
* XGBoost

_* based on their respective f1-scores and AUC-scores_

Left align all markdown tables with code below:

In [15]:
%%html
<style>
    table {
        display: inline-block
    }
</style>

| **Model**           | **Avg Weighted  F1-score** | **AUC score** |
|:-------------------:|:--------------------------:|:-------------:|
| Logistic Regression | 0\.8934\*                  | 0\.9239\*     |
| Random Forest       | 0\.8914\*                  | 0\.9233\*     |
| XGBoost             | 0\.8802\*                  | 0\.9239\*     |

\* _Cross-validation results (10-fold KFold), with optimized hyperparameters._

Because the scores listed above are so close, we run a statistical test (t-test) to check whether they differ significantly from one another.
<br> Where they do not, we also compare the standard deviation of the weighted F1-scores and AUC scores across the 10 cross-validation folds, to find the model with the lowest spread.
<br> The per-fold scores come from the respective models discussed in the previous chapters.
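As a quick sanity check, the table entries can be recomputed as the mean of the per-fold scores listed in the cells later in this chapter. The sketch below assumes the table reports plain means over the 10 folds; the dictionary names `f1_scores`, `auc_scores`, and `summary` are introduced here for illustration only.

```python
import numpy as np

# Per-fold scores copied from the model cells later in this chapter.
f1_scores = {
    "Logistic Regression": np.array([0.8894459, 0.89001603, 0.88029261, 0.91148607, 0.89387695,
                                     0.90060235, 0.89136015, 0.88088811, 0.89507029, 0.90087893]),
    "Random Forest": np.array([0.88483129, 0.88614592, 0.89104629, 0.89328366, 0.88943046,
                               0.89933388, 0.89731823, 0.88379795, 0.89149793, 0.89761045]),
    "XGBoost": np.array([0.86467997, 0.88090757, 0.88221223, 0.88069853, 0.8858516,
                         0.8863854, 0.89028023, 0.86840567, 0.87598576, 0.8864023]),
}
auc_scores = {
    "Logistic Regression": np.array([0.91488578, 0.93329738, 0.91995549, 0.93562364, 0.91613354,
                                     0.9210405, 0.92940224, 0.92738434, 0.91595299, 0.92580348]),
    "Random Forest": np.array([0.91419658, 0.90797896, 0.91583495, 0.93737864, 0.91715366,
                               0.92299103, 0.94289314, 0.90924855, 0.92914382, 0.93649436]),
    "XGBoost": np.array([0.90930719, 0.91555045, 0.92187803, 0.93164666, 0.92176884,
                         0.93324737, 0.93881725, 0.91301319, 0.9191369, 0.93429295]),
}

# Mean F1 and mean AUC per model, in the same order as the table columns.
summary = {
    model: (f1_scores[model].mean(), auc_scores[model].mean())
    for model in f1_scores
}

for model, (f1_mean, auc_mean) in summary.items():
    print(f"{model:<20} F1 = {f1_mean:.4f}   AUC = {auc_mean:.4f}")
```

The means differ only from the third decimal onward, which is why a significance test is used rather than a direct comparison of the table values.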

<b> Function for t-test </b>

In [1]:
import numpy as np
from scipy import stats

def my_t_test(a, b, size):  # a, b: equal-length NumPy arrays of scores to compare
    n = size  # number of observations in each array

    ## Calculate the sample variances
    var_a = a.var(ddof=1)
    var_b = b.var(ddof=1)

    # pooled standard deviation (valid for equal sample sizes)
    s = np.sqrt((var_a + var_b)/2)

    ## Calculate the t-statistic
    t = (a.mean() - b.mean())/(s*np.sqrt(2/n))

    ## Compare with the critical t-value
    # Degrees of freedom
    df = 2*n - 2

    # One-sided (upper-tail) p-value, P(T > t); a two-sided test would use
    # 2*stats.t.sf(abs(t), df=df) instead
    p = 1 - stats.t.cdf(t, df=df)

    print('t-statistic:', t, 'p-value:', p)

    # 1.96 is the two-sided 5% critical value of the normal distribution;
    # the exact t critical value for df = 18 is about 2.10
    if (abs(t) > 1.96) and p < 0.05:
        return('mean of a and b are significantly different (reject H0)')
    else:
        return('mean of a and b are not significantly different (not reject H0)')
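
Written out, `my_t_test` computes the equal-sample-size form of the pooled two-sample t-statistic. The symbols below are introduced here for illustration: $s_a^2$ and $s_b^2$ are the sample variances (`ddof=1`), $\bar{a}$ and $\bar{b}$ the sample means, $n$ the number of folds per model, and $F_{\mathrm{df}}$ the CDF of the t-distribution with $\mathrm{df}$ degrees of freedom.

$$
s = \sqrt{\frac{s_a^2 + s_b^2}{2}}, \qquad
t = \frac{\bar{a} - \bar{b}}{s\,\sqrt{2/n}}, \qquad
\mathrm{df} = 2n - 2, \qquad
p = 1 - F_{\mathrm{df}}(t)
$$

With $n = 10$ this gives $\mathrm{df} = 18$. Note that $p$ as defined here is the upper-tail probability only, so it is a one-sided p-value.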

# Random Forest

In [2]:
import numpy as np
# list f1 scores
random_forest_f1_scores = np.array([0.88483129, 0.88614592, 0.89104629, 0.89328366, 0.88943046, 0.89933388,
 0.89731823, 0.88379795, 0.89149793, 0.89761045])

# list auc scores
random_forest_auc_scores = np.array([0.91419658, 0.90797896, 0.91583495, 0.93737864, 0.91715366, 0.92299103,
 0.94289314, 0.90924855, 0.92914382, 0.93649436])

# XGBoost

In [3]:
xgboost_f1_scores = np.array([0.86467997, 0.88090757, 0.88221223, 0.88069853, 0.8858516,  0.8863854,
 0.89028023, 0.86840567, 0.87598576, 0.8864023])
xgboost_auc_scores = np.array([0.90930719, 0.91555045, 0.92187803, 0.93164666, 0.92176884, 0.93324737,
 0.93881725, 0.91301319, 0.9191369,  0.93429295])

### test difference between random forest and xgboost - f1 scores

In [4]:
my_t_test(random_forest_f1_scores, xgboost_f1_scores, 10)

t-statistic: 3.5862507930603744 p-value: 0.0010553865827719333


'mean of a and b are significantly different (reject H0)'
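
`my_t_test` reports the upper-tail probability `1 - cdf(t)`, which is a one-sided p-value. As a cross-check, the same comparison can be run with SciPy's built-in `scipy.stats.ttest_ind`, which returns a two-sided p-value. This is a minimal sketch rather than part of the original analysis; with `equal_var=True` and equal sample sizes it should give the same t-statistic as the hand-written function, and the names `t_stat`, `p_two_sided`, and `t_crit` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

# Per-fold F1 scores, copied from the cells above.
random_forest_f1_scores = np.array([0.88483129, 0.88614592, 0.89104629, 0.89328366, 0.88943046,
                                    0.89933388, 0.89731823, 0.88379795, 0.89149793, 0.89761045])
xgboost_f1_scores = np.array([0.86467997, 0.88090757, 0.88221223, 0.88069853, 0.8858516,
                              0.8863854, 0.89028023, 0.86840567, 0.87598576, 0.8864023])

# equal_var=True selects the pooled-variance (Student) t-test; the p-value is two-sided.
t_stat, p_two_sided = stats.ttest_ind(
    random_forest_f1_scores, xgboost_f1_scores, equal_var=True
)

# Two-sided critical value at alpha = 0.05 for df = 2*10 - 2 = 18.
t_crit = stats.t.ppf(0.975, df=18)

print("t-statistic:", t_stat)
print("two-sided p-value:", p_two_sided)
print("critical |t| at alpha=0.05:", t_crit)
print("reject H0:", abs(t_stat) > t_crit)
```

The two-sided p-value is twice the one-sided value printed above, so the conclusion for this pair (reject H0) does not change.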

### test difference between random forest and xgboost - auc scores

In [5]:
my_t_test(random_forest_auc_scores, xgboost_auc_scores, 10)

t-statistic: -0.10551912060736061 p-value: 0.5414345452024795


'mean of a and b are not significantly different (not reject H0)'

# Logistic Regression

In [6]:
lr_f1_scores = np.array([0.8894459,  0.89001603, 0.88029261, 0.91148607, 0.89387695, 0.90060235,
 0.89136015, 0.88088811, 0.89507029, 0.90087893])

lr_auc_scores = np.array([0.91488578, 0.93329738, 0.91995549, 0.93562364, 0.91613354, 0.9210405,
 0.92940224, 0.92738434, 0.91595299, 0.92580348])

### test difference between logistic regression and random forest - f1 scores

In [7]:
my_t_test(lr_f1_scores, random_forest_f1_scores, 10)

t-statistic: 0.5690149206117066 p-value: 0.2881889904621324


'mean of a and b are not significantly different (not reject H0)'

### test difference between logistic regression and random forest - auc scores

In [8]:
my_t_test(lr_auc_scores, random_forest_auc_scores, 10)

t-statistic: 0.1342195652019725 p-value: 0.4473591529657279


'mean of a and b are not significantly different (not reject H0)'
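
The tests above treat the two score arrays as independent samples. If the models were evaluated on the same 10 folds in the same order (an assumption; this chapter does not say so), the scores are paired by fold, and a paired t-test is a common alternative. The sketch below uses `scipy.stats.ttest_rel` for logistic regression versus random forest on F1; it is an illustration of a different technique, not the method used in this chapter, and the names `t_paired` and `p_paired` are hypothetical.

```python
import numpy as np
from scipy import stats

# Per-fold F1 scores, copied from the cells above.
lr_f1_scores = np.array([0.8894459, 0.89001603, 0.88029261, 0.91148607, 0.89387695,
                         0.90060235, 0.89136015, 0.88088811, 0.89507029, 0.90087893])
random_forest_f1_scores = np.array([0.88483129, 0.88614592, 0.89104629, 0.89328366, 0.88943046,
                                    0.89933388, 0.89731823, 0.88379795, 0.89149793, 0.89761045])

# Paired (dependent-samples) t-test on the fold-by-fold differences.
# ASSUMPTION: element i of both arrays comes from the same train/test split.
t_paired, p_paired = stats.ttest_rel(lr_f1_scores, random_forest_f1_scores)

# Two-sided critical value at alpha = 0.05 for df = n - 1 = 9.
t_crit_paired = stats.t.ppf(0.975, df=len(lr_f1_scores) - 1)

print("paired t-statistic:", t_paired)
print("two-sided p-value:", p_paired)
print("reject H0:", abs(t_paired) > t_crit_paired)
```

If the folds were not shared between models, the pairing is meaningless and the independent-samples test above is the appropriate one.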

# Check standard deviation of F1-scores for the top 3 models

Random Forest

In [9]:
print("model standard deviation (f1-score):", random_forest_f1_scores.std())

model standard deviation (f1-score): 0.00522148693387849


XGBoost

In [10]:
print("model standard deviation (f1-score):", xgboost_f1_scores.std())

model standard deviation (f1-score): 0.007828226535137064


Logistic Regression

In [11]:
print("model standard deviation (f1-score):", lr_f1_scores.std())

model standard deviation (f1-score): 0.00893044817932051


# Check standard deviation of AUC-scores for the top 3 models

Random Forest

In [12]:
print("model standard deviation (ROC/AUC):", random_forest_auc_scores.std())

model standard deviation (ROC/AUC): 0.011827850464091471


XGBoost

In [13]:
print("model standard deviation (ROC/AUC):", xgboost_auc_scores.std())

model standard deviation (ROC/AUC): 0.009541568176939305


Logistic Regression

In [14]:
print("model standard deviation (ROC/AUC):", lr_auc_scores.std())

model standard deviation (ROC/AUC): 0.00707272890777923
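
The six checks above can be collected into a single comparison. The sketch below recomputes the standard deviation for every model and metric and picks the model with the lowest spread per metric. It uses `ndarray.std()` with its default `ddof=0`, matching the cells above; the names `scores`, `std_table`, and `most_stable` are introduced here for illustration.

```python
import numpy as np

# Per-fold scores, copied from the model cells earlier in this chapter.
scores = {
    "Random Forest": {
        "f1": np.array([0.88483129, 0.88614592, 0.89104629, 0.89328366, 0.88943046,
                        0.89933388, 0.89731823, 0.88379795, 0.89149793, 0.89761045]),
        "auc": np.array([0.91419658, 0.90797896, 0.91583495, 0.93737864, 0.91715366,
                         0.92299103, 0.94289314, 0.90924855, 0.92914382, 0.93649436]),
    },
    "XGBoost": {
        "f1": np.array([0.86467997, 0.88090757, 0.88221223, 0.88069853, 0.8858516,
                        0.8863854, 0.89028023, 0.86840567, 0.87598576, 0.8864023]),
        "auc": np.array([0.90930719, 0.91555045, 0.92187803, 0.93164666, 0.92176884,
                         0.93324737, 0.93881725, 0.91301319, 0.9191369, 0.93429295]),
    },
    "Logistic Regression": {
        "f1": np.array([0.8894459, 0.89001603, 0.88029261, 0.91148607, 0.89387695,
                        0.90060235, 0.89136015, 0.88088811, 0.89507029, 0.90087893]),
        "auc": np.array([0.91488578, 0.93329738, 0.91995549, 0.93562364, 0.91613354,
                         0.9210405, 0.92940224, 0.92738434, 0.91595299, 0.92580348]),
    },
}

# Population standard deviation (ddof=0, NumPy's default) per model and metric.
std_table = {
    model: {metric: values.std() for metric, values in metrics.items()}
    for model, metrics in scores.items()
}

# Model with the smallest fold-to-fold spread for each metric.
most_stable = {
    metric: min(std_table, key=lambda model: std_table[model][metric])
    for metric in ("f1", "auc")
}

for model, stds in std_table.items():
    print(f"{model:<20} std(F1) = {stds['f1']:.5f}   std(AUC) = {stds['auc']:.5f}")
print("lowest spread:", most_stable)
```

On these scores the two metrics disagree: random forest has the lowest F1 spread, while logistic regression has the lowest AUC spread.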
