# Model Comparison

**Objectives**
- Note the results of the previous performance tests that used the holdout method (i.e., during the Model Robustness Test).
- Using the `McNemar Test`, determine whether there are differences between GBDT models (e.g., LGBM Default vs CatBoost Default), between configurations (e.g., LGBM Default vs LGBM Tuned), and between behavior types (e.g., Time-based (TB) LGBM Tuned vs IB LGBM Tuned).
- Use whichever dataset is appropriate (probably the Test/Holdout split).
- Record the results of each comparison.

Assume a `significance level` of **0.05 (5%)**, as mentioned in the RRL on model comparison; it serves as the reference value for the significance level throughout.

<hr>

*Kindly double check the statement(s) that will follow:*

Assume the null hypothesis is *"there is no significant difference between the two models"* (i.e., both models have the same error rate on the test set). `<== Modify this accordingly depending on which will be compared (whether GBDT vs GBDT or Default vs Tuned)`

If the resulting `p-value` is larger than the `significance level`, the null hypothesis is not rejected. Otherwise (`p-value` < `significance level`), the null hypothesis is rejected, which indicates a significant difference between the two models.

Interpreting the resulting array:
[[a, b],
 [c, d]]

- a = Both models are correct
- b = Model 1 correct, Model 2 wrong
- c = Model 1 wrong, Model 2 correct
- d = Both models are wrong
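
As a worked illustration of how the table feeds the test: only the discordant cells `b` and `c` enter the uncorrected chi-squared statistic, `(b - c)**2 / (b + c)`, which is compared against a chi-squared distribution with one degree of freedom. The sketch below is a standard-library re-implementation meant for checking the numbers by hand, not the code used in this notebook; the function name `mcnemar_chi2` is made up for the example, and the counts are those reported for Comparison 1.

```python
import math

def mcnemar_chi2(b, c):
    """Uncorrected McNemar chi-squared statistic and p-value (1 degree of freedom)."""
    if b + c == 0:
        # No discordant pairs: the two models agree on every sample
        return 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-squared variable with 1 dof: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Discordant counts from Comparison 1 (Default LGBM TB vs Default CatBoost TB)
chi2, p_value = mcnemar_chi2(b=7, c=6)
print(f"chi2 = {chi2:.4f}, p-value = {p_value:.4f}")
```

The cells `a` and `d` do not affect the result, so two models can both be highly accurate and still differ significantly if their disagreements are lopsided.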

References:
- [https://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar/](https://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar/)

In [1]:
import statsmodels.stats.contingency_tables as statsmodels #mcnemar
import mlxtend.evaluate as mlxtend #mcnemar_table, mcnemar
import pandas as pd
import numpy as np
import lightgbm as lgbm
import catboost as catb
from joblib import load

import warnings
warnings.filterwarnings("ignore")

In [2]:
DF_LGBM_TB = pd.read_csv('../Dataset/TB/LGBM_TB_Test.csv', low_memory=False) #<== Point these to the proper Test/Holdout datasets.
DF_LGBM_IB = pd.read_csv('../Dataset/IB/LGBM_IB_Test.csv', low_memory=False)
DF_CATB_TB = pd.read_csv('../Dataset/TB/CATB_TB_Test.csv', low_memory=False) #<== Point these to the proper Test/Holdout datasets.
DF_CATB_IB = pd.read_csv('../Dataset/IB/CATB_IB_Test.csv', low_memory=False)
DF_CATB_IB.iloc[:,1:101] = DF_CATB_IB.iloc[:,1:101].astype('str')
DF_CATB_IB.replace("nan", "NaN", inplace=True)

y_target = DF_LGBM_TB['malware'] # <---- labels are equal across all datasets
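
McNemar's test is a paired test, so it only makes sense if every dataset lists the same samples in the same order with the same labels. A minimal sketch of such a check follows, assuming the `hash` and `malware` columns shown in the displays below identify and label each sample; the helper `check_paired` and the toy frames are hypothetical and only illustrate the idea.

```python
import pandas as pd

def check_paired(frames, key="hash", label="malware"):
    """Raise ValueError unless all frames share sample order and labels."""
    names = list(frames)
    ref_name = names[0]
    ref = frames[ref_name]
    for name in names[1:]:
        df = frames[name]
        if len(df) != len(ref):
            raise ValueError(f"{name} and {ref_name} differ in length")
        if (df[key].to_numpy() != ref[key].to_numpy()).any():
            raise ValueError(f"{name} and {ref_name} differ in sample order")
        if (df[label].to_numpy() != ref[label].to_numpy()).any():
            raise ValueError(f"{name} and {ref_name} differ in labels")
    return True

# Toy frames standing in for the real test sets
toy_a = pd.DataFrame({"hash": ["h1", "h2", "h3"], "malware": [1, 0, 1]})
toy_b = pd.DataFrame({"hash": ["h1", "h2", "h3"], "malware": [1, 0, 1]})
toy_c = pd.DataFrame({"hash": ["h2", "h1", "h3"], "malware": [0, 1, 1]})

print(check_paired({"a": toy_a, "b": toy_b}))  # aligned frames pass the check
# In this notebook the call would look like:
# check_paired({"LGBM_TB": DF_LGBM_TB, "LGBM_IB": DF_LGBM_IB,
#               "CATB_TB": DF_CATB_TB, "CATB_IB": DF_CATB_IB})
```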

In [3]:
display(DF_LGBM_TB)
display(DF_LGBM_IB)
display(DF_CATB_TB)
display(DF_CATB_IB)

Unnamed: 0,malware,t_0,t_1,t_2,t_3,t_4,t_5,t_6,t_7,t_8,...,t_92,t_93,t_94,t_95,t_96,t_97,t_98,t_99,hash,type
0,1,208,172,240,117,240,262,112,123,65,...,274,215,274,215,274,215,274,215,4e270486b92ccff8afa59935ba4f5adc,trojan
1,1,112,274,158,215,274,158,215,298,76,...,297,135,171,215,35,208,56,71,d92d0f24e15384541a0c3c72424fe3a8,trojan
2,1,215,274,158,215,274,158,215,172,117,...,15,240,117,240,117,240,117,172,1dcb9bd8dcdd50f6d07035ea895ecfd1,trojan
3,1,82,240,117,240,117,240,117,240,117,...,208,93,208,16,31,215,108,208,24dd4677c14eb5828bda78749fded6b8,pua
4,1,112,274,158,215,274,158,215,298,76,...,297,135,171,215,35,208,56,71,cc5d38cb80faaf60d8efabecdc04f832,trojan
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4119,1,82,240,117,240,117,93,117,16,147,...,230,240,117,225,35,208,89,225,5a99618b63178d7a221552fe962992e3,trojan
4120,1,112,274,158,215,274,158,215,298,76,...,117,76,172,117,286,172,117,275,45bf43151fd02d4ea1d1028386d12d06,trojan
4121,1,82,240,117,240,117,240,117,240,117,...,260,141,260,141,260,141,260,141,cf6242404774ee9d15c67c75c80e1a14,trojan
4122,1,82,240,117,240,117,240,117,240,117,...,215,208,297,93,303,264,187,208,6536fb7723a2a091fdd2610a36b32741,trojan


Unnamed: 0,malware,t_0,t_1,t_2,t_3,t_4,t_5,t_6,t_7,t_8,...,t_92,t_93,t_94,t_95,t_96,t_97,t_98,t_99,hash,type
0,1,208,172,240,117,262,112,123,65,274,...,307,307,307,307,307,307,307,307,4e270486b92ccff8afa59935ba4f5adc,trojan
1,1,112,274,158,215,298,76,208,172,117,...,307,307,307,307,307,307,307,307,d92d0f24e15384541a0c3c72424fe3a8,trojan
2,1,215,274,158,172,117,198,208,260,257,...,307,307,307,307,307,307,307,307,1dcb9bd8dcdd50f6d07035ea895ecfd1,trojan
3,1,82,240,117,93,16,228,208,198,86,...,307,307,307,307,307,307,307,307,24dd4677c14eb5828bda78749fded6b8,pua
4,1,112,274,158,215,298,76,208,172,117,...,307,307,307,307,307,307,307,307,cc5d38cb80faaf60d8efabecdc04f832,trojan
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4119,1,82,240,117,93,16,147,228,208,71,...,307,307,307,307,307,307,307,307,5a99618b63178d7a221552fe962992e3,trojan
4120,1,112,274,158,215,298,76,208,172,117,...,307,307,307,307,307,307,307,307,45bf43151fd02d4ea1d1028386d12d06,trojan
4121,1,82,240,117,172,16,11,274,158,215,...,307,307,307,307,307,307,307,307,cf6242404774ee9d15c67c75c80e1a14,trojan
4122,1,82,240,117,16,297,93,303,264,215,...,307,307,307,307,307,307,307,307,6536fb7723a2a091fdd2610a36b32741,trojan


Unnamed: 0,malware,t_0,t_1,t_2,t_3,t_4,t_5,t_6,t_7,t_8,...,t_92,t_93,t_94,t_95,t_96,t_97,t_98,t_99,hash,type
0,1,NtAllocateVirtualMemory,LdrGetDllHandle,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,NtQuerySystemInformation,RegOpenKeyExA,RegQueryValueExA,RegCloseKey,...,NtOpenKey,NtClose,NtOpenKey,NtClose,NtOpenKey,NtClose,NtOpenKey,NtClose,4e270486b92ccff8afa59935ba4f5adc,trojan
1,1,RegOpenKeyExA,NtOpenKey,NtQueryValueKey,NtClose,NtOpenKey,NtQueryValueKey,NtClose,NtQueryAttributesFile,LoadStringA,...,NtCreateFile,NtCreateSection,NtMapViewOfSection,NtClose,GetSystemMetrics,NtAllocateVirtualMemory,CreateActCtxW,GetSystemWindowsDirectoryW,d92d0f24e15384541a0c3c72424fe3a8,trojan
2,1,NtClose,NtOpenKey,NtQueryValueKey,NtClose,NtOpenKey,NtQueryValueKey,NtClose,LdrGetDllHandle,LdrGetProcedureAddress,...,LookupAccountSidW,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrGetDllHandle,1dcb9bd8dcdd50f6d07035ea895ecfd1,trojan
3,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,...,NtAllocateVirtualMemory,GetFileType,NtAllocateVirtualMemory,SetUnhandledExceptionFilter,CoInitializeEx,NtClose,WSAStartup,NtAllocateVirtualMemory,24dd4677c14eb5828bda78749fded6b8,pua
4,1,RegOpenKeyExA,NtOpenKey,NtQueryValueKey,NtClose,NtOpenKey,NtQueryValueKey,NtClose,NtQueryAttributesFile,LoadStringA,...,NtCreateFile,NtCreateSection,NtMapViewOfSection,NtClose,GetSystemMetrics,NtAllocateVirtualMemory,CreateActCtxW,GetSystemWindowsDirectoryW,cc5d38cb80faaf60d8efabecdc04f832,trojan
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4119,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,GetFileType,LdrGetProcedureAddress,SetUnhandledExceptionFilter,FindWindowA,...,GetUserNameW,LdrLoadDll,LdrGetProcedureAddress,DrawTextExW,GetSystemMetrics,NtAllocateVirtualMemory,NtDuplicateObject,DrawTextExW,5a99618b63178d7a221552fe962992e3,trojan
4120,1,RegOpenKeyExA,NtOpenKey,NtQueryValueKey,NtClose,NtOpenKey,NtQueryValueKey,NtClose,NtQueryAttributesFile,LoadStringA,...,LdrGetProcedureAddress,LoadStringA,LdrGetDllHandle,LdrGetProcedureAddress,SetErrorMode,LdrGetDllHandle,LdrGetProcedureAddress,GetSystemDirectoryW,45bf43151fd02d4ea1d1028386d12d06,trojan
4121,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,...,RegOpenKeyExW,RegQueryValueExW,RegOpenKeyExW,RegQueryValueExW,RegOpenKeyExW,RegQueryValueExW,RegOpenKeyExW,RegQueryValueExW,cf6242404774ee9d15c67c75c80e1a14,trojan
4122,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,...,NtClose,NtAllocateVirtualMemory,NtCreateFile,GetFileType,SetFilePointerEx,NtReadFile,NtFreeVirtualMemory,NtAllocateVirtualMemory,6536fb7723a2a091fdd2610a36b32741,trojan


Unnamed: 0,malware,t_0,t_1,t_2,t_3,t_4,t_5,t_6,t_7,t_8,...,t_92,t_93,t_94,t_95,t_96,t_97,t_98,t_99,hash,type
0,1,NtAllocateVirtualMemory,LdrGetDllHandle,LdrLoadDll,LdrGetProcedureAddress,NtQuerySystemInformation,RegOpenKeyExA,RegQueryValueExA,RegCloseKey,NtOpenKey,...,,,,,,,,,4e270486b92ccff8afa59935ba4f5adc,trojan
1,1,RegOpenKeyExA,NtOpenKey,NtQueryValueKey,NtClose,NtQueryAttributesFile,LoadStringA,NtAllocateVirtualMemory,LdrGetDllHandle,LdrGetProcedureAddress,...,,,,,,,,,d92d0f24e15384541a0c3c72424fe3a8,trojan
2,1,NtClose,NtOpenKey,NtQueryValueKey,LdrGetDllHandle,LdrGetProcedureAddress,GetSystemInfo,NtAllocateVirtualMemory,RegOpenKeyExW,FindFirstFileExW,...,,,,,,,,,1dcb9bd8dcdd50f6d07035ea895ecfd1,trojan
3,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,GetFileType,SetUnhandledExceptionFilter,NtProtectVirtualMemory,NtAllocateVirtualMemory,GetSystemInfo,NtCreateMutant,...,,,,,,,,,24dd4677c14eb5828bda78749fded6b8,pua
4,1,RegOpenKeyExA,NtOpenKey,NtQueryValueKey,NtClose,NtQueryAttributesFile,LoadStringA,NtAllocateVirtualMemory,LdrGetDllHandle,LdrGetProcedureAddress,...,,,,,,,,,cc5d38cb80faaf60d8efabecdc04f832,trojan
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4119,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,GetFileType,SetUnhandledExceptionFilter,FindWindowA,NtProtectVirtualMemory,NtAllocateVirtualMemory,GetSystemWindowsDirectoryW,...,,,,,,,,,5a99618b63178d7a221552fe962992e3,trojan
4120,1,RegOpenKeyExA,NtOpenKey,NtQueryValueKey,NtClose,NtQueryAttributesFile,LoadStringA,NtAllocateVirtualMemory,LdrGetDllHandle,LdrGetProcedureAddress,...,,,,,,,,,45bf43151fd02d4ea1d1028386d12d06,trojan
4121,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,LdrGetDllHandle,SetUnhandledExceptionFilter,CryptAcquireContextW,NtOpenKey,NtQueryValueKey,NtClose,...,,,,,,,,,cf6242404774ee9d15c67c75c80e1a14,trojan
4122,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,SetUnhandledExceptionFilter,NtCreateFile,GetFileType,SetFilePointerEx,NtReadFile,NtClose,...,,,,,,,,,6536fb7723a2a091fdd2610a36b32741,trojan


**Battle Chart:**

**GBDT vs GBDT**
- LGBM TB vs CatBoost TB
- LGBM IB vs CatBoost IB
- Tuned LGBM TB vs Tuned CatBoost TB
- Tuned LGBM IB vs Tuned CatBoost IB

**Default vs Tuned**
- LGBM TB vs Tuned LGBM TB
- LGBM IB vs Tuned LGBM IB
- CatBoost TB vs Tuned CatBoost TB
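- CatBoost IB vs Tuned CatBoost IB

**TB vs IB**
- LGBM TB vs LGBM IB
- Tuned LGBM TB vs Tuned LGBM IB
- CatBoost TB vs CatBoost IB
- Tuned CatBoost TB vs Tuned CatBoost IB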

In [4]:
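def mcnemar_test(model1, model2, dataset1, dataset2):
    """Run McNemar's test on the predictions of two models over paired test sets.

    Relies on the global `y_target`; positional columns 1 to 100 of each
    dataset (`iloc[:,1:101]`) are passed to the models as features.
    """
    y_pred1 = model1.predict(dataset1.iloc[:,1:101])
    y_pred2 = model2.predict(dataset2.iloc[:,1:101])
    # 2x2 table of correct/incorrect predictions for the two models
    table = mlxtend.mcnemar_table(y_target,y_pred1,y_pred2)
    display(table)
    print("statsmodels.mcnemar:")
    # Chi-squared version of the test, without continuity correction
    print(statsmodels.mcnemar(table, exact=False, correction=False))
    # Same test through mlxtend, used to cross-check the statsmodels result
    chi2, p = mlxtend.mcnemar(table, exact=False, corrected=False)
    print("\nmlxtend.mcnemar (sanity check):")
    print(f"pvalue:\t{p}\nchi2:\t{chi2}\n")
    print("")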
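
Every comparison in this notebook has fewer than 25 discordant pairs (`b + c`), a range where the mlxtend guide referenced above recommends an exact binomial test over the chi-squared approximation. The sketch below shows that exact variant using only the standard library; `mcnemar_exact_p` is a hypothetical helper and is not part of this notebook's pipeline. The counts are those reported for Comparison 6.

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c."""
    n = b + c
    if n == 0:
        # No discordant pairs: nothing to test
        return 1.0
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 0.5), the distribution of b under H0
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test and cap the result at 1
    return min(1.0, 2 * tail)

# Discordant counts from Comparison 6 (Default LGBM IB vs Tuned LGBM IB)
print(mcnemar_exact_p(b=1, c=13))
```

Both `statsmodels.mcnemar` and `mlxtend.mcnemar` accept `exact=True`, so the same check could be run inside `mcnemar_test` by changing that argument.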

In [5]:
print('COMPARISON 1: DEFAULT LGBM TB vs DEFAULT CATBOOST TB\n')
lgbm_tb = load('../GBDT_Training/Outputs/LGBM/Default/RYZEN3b_LGBM_TB.model')
catb_tb = catb.CatBoostClassifier().load_model("../GBDT_Training/Outputs/CATB/Default/RYZEN3b_CATB_TB.model", format='json')

mcnemar_test(lgbm_tb, catb_tb, DF_LGBM_TB, DF_CATB_TB)

COMPARISON 1: DEFAULT LGBM TB vs DEFAULT CATBOOST TB



array([[4068,    7],
       [   6,   43]])

statsmodels.mcnemar:
pvalue      0.7815112949987134
statistic   0.07692307692307693

mlxtend.mcnemar (sanity check):
pvalue:	0.7815112949987134
chi2:	0.07692307692307693




In [6]:
print('COMPARISON 2: DEFAULT LGBM IB vs DEFAULT CATBOOST IB\n')
lgbm_ib = load('../GBDT_Training/Outputs/LGBM/Default/RYZEN3b_LGBM_IB.model')
catb_ib = catb.CatBoostClassifier().load_model("../GBDT_Training/Outputs/CATB/Default/RYZEN3b_CATB_IB.model", format='json')

mcnemar_test(lgbm_ib, catb_ib, DF_LGBM_IB, DF_CATB_IB)

COMPARISON 2: DEFAULT LGBM IB vs DEFAULT CATBOOST IB



array([[4065,    6],
       [  12,   41]])

statsmodels.mcnemar:
pvalue      0.15729920705028105
statistic   2.0

mlxtend.mcnemar (sanity check):
pvalue:	0.15729920705028105
chi2:	2.0




In [7]:
print('COMPARISON 3: TUNED LGBM TB vs TUNED CATBOOST TB\n')
lgbm_tb = load('../GBDT_Training/Outputs/LGBM/Tuned/TUNED_RYZEN3b_LGBM_TB.model')
catb_tb = catb.CatBoostClassifier().load_model("../GBDT_Training/Outputs/CATB/Tuned/TUNED_RYZEN3b_CATB_TB.model", format='json')

mcnemar_test(lgbm_tb, catb_tb, DF_LGBM_TB, DF_CATB_TB)

COMPARISON 3: TUNED LGBM TB vs TUNED CATBOOST TB



array([[4069,    7],
       [   7,   41]])

statsmodels.mcnemar:
pvalue      1.0
statistic   0.0

mlxtend.mcnemar (sanity check):
pvalue:	1.0
chi2:	0.0




In [8]:
print('COMPARISON 4: TUNED LGBM IB vs TUNED CATBOOST IB\n')
lgbm_ib = load('../GBDT_Training/Outputs/LGBM/Tuned/TUNED_RYZEN3b_LGBM_IB.model')
catb_ib = catb.CatBoostClassifier().load_model("../GBDT_Training/Outputs/CATB/Tuned/TUNED_RYZEN3b_CATB_IB.model", format='json')

mcnemar_test(lgbm_ib, catb_ib, DF_LGBM_IB, DF_CATB_IB)

COMPARISON 4: TUNED LGBM IB vs TUNED CATBOOST IB



array([[4073,   10],
       [   7,   34]])

statsmodels.mcnemar:
pvalue      0.46685427082272524
statistic   0.5294117647058824

mlxtend.mcnemar (sanity check):
pvalue:	0.46685427082272524
chi2:	0.5294117647058824




In [9]:
print('COMPARISON 5: DEFAULT LGBM TB vs TUNED LGBM TB\n')
default_tb = load('../GBDT_Training/Outputs/LGBM/Default/RYZEN3b_LGBM_TB.model')
tuned_tb = load('../GBDT_Training/Outputs/LGBM/Tuned/TUNED_RYZEN3b_LGBM_TB.model')

mcnemar_test(default_tb, tuned_tb, DF_LGBM_TB, DF_LGBM_TB)

COMPARISON 5: DEFAULT LGBM TB vs TUNED LGBM TB



array([[4073,    2],
       [   3,   46]])

statsmodels.mcnemar:
pvalue      0.6547208460185768
statistic   0.2

mlxtend.mcnemar (sanity check):
pvalue:	0.6547208460185768
chi2:	0.2




In [10]:
print('COMPARISON 6: DEFAULT LGBM IB vs TUNED LGBM IB\n')
default_ib = load('../GBDT_Training/Outputs/LGBM/Default/RYZEN3b_LGBM_IB.model')
tuned_ib = load('../GBDT_Training/Outputs/LGBM/Tuned/TUNED_RYZEN3b_LGBM_IB.model')

mcnemar_test(default_ib, tuned_ib, DF_LGBM_IB, DF_LGBM_IB)

COMPARISON 6: DEFAULT LGBM IB vs TUNED LGBM IB



array([[4070,    1],
       [  13,   40]])

statsmodels.mcnemar:
pvalue      0.0013406411172294807
statistic   10.285714285714286

mlxtend.mcnemar (sanity check):
pvalue:	0.0013406411172294807
chi2:	10.285714285714286




In [11]:
print('COMPARISON 7: DEFAULT CATBOOST TB vs TUNED CATBOOST TB\n')
default_tb = catb.CatBoostClassifier().load_model("../GBDT_Training/Outputs/CATB/Default/RYZEN3b_CATB_TB.model", format='json')
tuned_tb = catb.CatBoostClassifier().load_model("../GBDT_Training/Outputs/CATB/Tuned/TUNED_RYZEN3b_CATB_TB.model", format='json')

mcnemar_test(default_tb, tuned_tb, DF_CATB_TB, DF_CATB_TB)

COMPARISON 7: DEFAULT CATBOOST TB vs TUNED CATBOOST TB



array([[4071,    3],
       [   5,   45]])

statsmodels.mcnemar:
pvalue      0.47950012218695337
statistic   0.5

mlxtend.mcnemar (sanity check):
pvalue:	0.47950012218695337
chi2:	0.5




In [12]:
print('COMPARISON 8: DEFAULT CATBOOST IB vs TUNED CATBOOST IB\n')
default_ib = catb.CatBoostClassifier().load_model("../GBDT_Training/Outputs/CATB/Default/RYZEN3b_CATB_IB.model", format='json')
tuned_ib = catb.CatBoostClassifier().load_model("../GBDT_Training/Outputs/CATB/Tuned/TUNED_RYZEN3b_CATB_IB.model", format='json')

mcnemar_test(default_ib, tuned_ib, DF_CATB_IB, DF_CATB_IB)

COMPARISON 8: DEFAULT CATBOOST IB vs TUNED CATBOOST IB



array([[4071,    6],
       [   9,   38]])

statsmodels.mcnemar:
pvalue      0.4385780260809997
statistic   0.6

mlxtend.mcnemar (sanity check):
pvalue:	0.4385780260809997
chi2:	0.6




In [13]:
print('COMPARISON 9: DEFAULT LIGHTGBM TB vs DEFAULT LIGHTGBM IB\n')
default_tb = load('../GBDT_Training/Outputs/LGBM/Default/RYZEN3b_LGBM_TB.model')
default_ib = load('../GBDT_Training/Outputs/LGBM/Default/RYZEN3b_LGBM_IB.model')

mcnemar_test(default_tb, default_ib, DF_LGBM_TB, DF_LGBM_IB)

COMPARISON 9: DEFAULT LIGHTGBM TB vs DEFAULT LIGHTGBM IB



array([[4063,   12],
       [   8,   41]])

statsmodels.mcnemar:
pvalue      0.37109336952269756
statistic   0.8

mlxtend.mcnemar (sanity check):
pvalue:	0.37109336952269756
chi2:	0.8




In [14]:
print('COMPARISON 10: TUNED LIGHTGBM TB vs TUNED LIGHTGBM IB\n')
tuned_tb = load('../GBDT_Training/Outputs/LGBM/Tuned/TUNED_RYZEN3b_LGBM_TB.model')
tuned_ib = load('../GBDT_Training/Outputs/LGBM/Tuned/TUNED_RYZEN3b_LGBM_IB.model')

mcnemar_test(tuned_tb, tuned_ib, DF_LGBM_TB, DF_LGBM_IB)

COMPARISON 10: TUNED LIGHTGBM TB vs TUNED LIGHTGBM IB



array([[4071,    5],
       [  12,   36]])

statsmodels.mcnemar:
pvalue      0.08955507441364248
statistic   2.8823529411764706

mlxtend.mcnemar (sanity check):
pvalue:	0.08955507441364248
chi2:	2.8823529411764706




In [15]:
print('COMPARISON 11: DEFAULT CATBOOST TB vs DEFAULT CATBOOST IB\n')
default_tb = catb.CatBoostClassifier().load_model("../GBDT_Training/Outputs/CATB/Default/RYZEN3b_CATB_TB.model", format='json')
default_ib = catb.CatBoostClassifier().load_model("../GBDT_Training/Outputs/CATB/Default/RYZEN3b_CATB_IB.model", format='json')

mcnemar_test(default_tb, default_ib, DF_CATB_TB, DF_CATB_IB)

COMPARISON 11: DEFAULT CATBOOST TB vs DEFAULT CATBOOST IB



array([[4065,    9],
       [  12,   38]])

statsmodels.mcnemar:
pvalue      0.5126907602619235
statistic   0.42857142857142855

mlxtend.mcnemar (sanity check):
pvalue:	0.5126907602619235
chi2:	0.42857142857142855




In [16]:
print('COMPARISON 12: TUNED CATBOOST TB vs TUNED CATBOOST IB\n')
tuned_tb = catb.CatBoostClassifier().load_model("../GBDT_Training/Outputs/CATB/Tuned/TUNED_RYZEN3b_CATB_TB.model", format='json')
tuned_ib = catb.CatBoostClassifier().load_model("../GBDT_Training/Outputs/CATB/Tuned/TUNED_RYZEN3b_CATB_IB.model", format='json')

mcnemar_test(tuned_tb, tuned_ib, DF_CATB_TB, DF_CATB_IB)

COMPARISON 12: TUNED CATBOOST TB vs TUNED CATBOOST IB



array([[4068,    8],
       [  12,   36]])

statsmodels.mcnemar:
pvalue      0.37109336952269756
statistic   0.8

mlxtend.mcnemar (sanity check):
pvalue:	0.37109336952269756
chi2:	0.8
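
To summarise against the 0.05 significance level assumed at the top of this notebook, the sketch below applies the decision rule to the p-values printed above (rounded to four decimals). The `results` dictionary and its labels are written out by hand for this illustration and are not produced by `mcnemar_test`.

```python
ALPHA = 0.05  # significance level assumed in this notebook

# p-values copied from the outputs above, rounded to four decimals
results = {
    "1: Default LGBM TB vs Default CatBoost TB": 0.7815,
    "2: Default LGBM IB vs Default CatBoost IB": 0.1573,
    "3: Tuned LGBM TB vs Tuned CatBoost TB": 1.0000,
    "4: Tuned LGBM IB vs Tuned CatBoost IB": 0.4669,
    "5: Default LGBM TB vs Tuned LGBM TB": 0.6547,
    "6: Default LGBM IB vs Tuned LGBM IB": 0.0013,
    "7: Default CatBoost TB vs Tuned CatBoost TB": 0.4795,
    "8: Default CatBoost IB vs Tuned CatBoost IB": 0.4386,
    "9: Default LGBM TB vs Default LGBM IB": 0.3711,
    "10: Tuned LGBM TB vs Tuned LGBM IB": 0.0896,
    "11: Default CatBoost TB vs Default CatBoost IB": 0.5127,
    "12: Tuned CatBoost TB vs Tuned CatBoost IB": 0.3711,
}

for name, p_value in results.items():
    decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
    print(f"{name}: p = {p_value:.4f} -> {decision}")

# Comparisons whose p-value falls below the significance level
significant = [name for name, p_value in results.items() if p_value < ALPHA]
```

With these uncorrected chi-squared p-values, only Comparison 6 (Default LGBM IB vs Tuned LGBM IB) falls below 0.05.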


