# Model Comparison

**Objectives**
- Review the results of the earlier performance tests that used the Holdout Method (i.e., during the Model Robustness Test).
- Using the `McNemar Test`, determine whether there are any differences between GBDT models (e.g., LGBM Default vs CatBoost Default), between configurations (e.g., LGBM Default vs LGBM Tuned), and between behavior types (e.g., Time-based LGBM Tuned vs Time-based CatBoost Tuned).
- Use whichever dataset is appropriate (most likely the Test/Holdout split).
- Record the results.

Assume a `significance level` of **0.05 (5%)**, as mentioned in the RRL on model comparison; it serves as the reference value for the significance level here.

<hr>

*Kindly double-check the statements that follow:*

Assume the null hypothesis is *"there is no significant difference between the two models"*. `<== Modify this accordingly depending on which models are being compared (GBDT vs GBDT or Default vs Tuned)`

If the resulting `p-value` is larger than the `significance level`, the null hypothesis is not rejected. Otherwise (`p-value` < `significance level`), the null hypothesis is rejected, and the two models are considered to perform significantly differently.
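
The decision rule above can be sketched as a small helper. This is a minimal illustration, assuming the null hypothesis is that the two models have the same error rate; the name `interpret_p_value` is hypothetical and not part of the notebook. A `NaN` p-value is reported as undetermined rather than compared against the threshold.

```python
import math

ALPHA = 0.05  # significance level assumed in this notebook


def interpret_p_value(p_value, alpha=ALPHA):
    """Map a McNemar p-value to a verdict on H0 (assumed: equal error rates)."""
    if math.isnan(p_value):
        # No usable p-value, so no comparison against alpha is made.
        return "undetermined"
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"
```

For example, `interpret_p_value(0.0833)` gives `"fail to reject H0"` at the 5% level.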

Interpreting the resulting array (cell layout as produced by `mlxtend`'s `mcnemar_table`):

`[[a, b], [c, d]]`

- a = Both models are correct
- b = Model 1 correct, Model 2 wrong
- c = Model 1 wrong, Model 2 correct
- d = Both models are wrong
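
Only the two off-diagonal cells enter the test. Without continuity correction, the statistic is (b - c)^2 / (b + c), compared against a chi-squared distribution with 1 degree of freedom. The sketch below works this out by hand using only the standard library; `mcnemar_chi2` is a hypothetical helper name, and the p-value relies on the identity that the chi-squared (1 d.o.f.) survival function equals `erfc(sqrt(x / 2))`.

```python
import math


def mcnemar_chi2(table):
    """Uncorrected McNemar statistic and p-value from a 2x2 table [[a, b], [c, d]]."""
    b = table[0][1]
    c = table[1][0]
    discordant = b + c
    if discordant == 0:
        # 0/0: the statistic is undefined when the models never disagree.
        return float("nan"), float("nan")
    chi2 = (b - c) ** 2 / discordant
    # Survival function of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For the table `[[469, 1], [0, 1]]`, this gives a statistic of 1.0 and a p-value of about 0.317.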

References:
- [https://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar/](https://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar/)

In [1]:
import statsmodels.stats.contingency_tables as statsmodels #mcnemar
import mlxtend.evaluate as mlxtend #mcnemar_table, mcnemar
import pandas as pd
import numpy as np
import lightgbm as lgbm
import catboost as catb
from joblib import load

import warnings
warnings.filterwarnings("ignore")

In [2]:
DF_LGBM_TB = pd.read_csv('../Dataset/TB/LGBM_TB_Test.csv', low_memory=False) #<== Point these to the proper Test/Holdout datasets.
DF_LGBM_IB = pd.read_csv('../Dataset/IB/LGBM_IB_Test.csv', low_memory=False)
DF_CATB_TB = pd.read_csv('../Dataset/TB/CATB_TB_Test.csv', low_memory=False) #<== Point these to the proper Test/Holdout datasets.
DF_CATB_IB = pd.read_csv('../Dataset/IB/CATB_IB_Test.csv', low_memory=False)
DF_CATB_IB.iloc[:,1:101] = DF_CATB_IB.iloc[:,1:101].astype('str')
DF_CATB_IB.replace("nan", "NaN", inplace=True)

y_target = DF_LGBM_TB['malware'] # <---- labels are equal across all datasets
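
The comment above assumes the `malware` labels are identical across the four files. Below is a minimal sketch of how that assumption could be checked before reusing `y_target` for every comparison. `assert_same_labels` is a hypothetical helper that accepts any mapping of names to label sequences; it compares labels as strings because some columns in this notebook are cast to `str`.

```python
def assert_same_labels(named_labels):
    """Raise ValueError if any label sequence differs from the first one.

    named_labels: dict mapping a dataset name to a sequence of labels.
    Returns the number of labels checked.
    """
    items = list(named_labels.items())
    ref_name, ref = items[0]
    # Compare as strings so that 1 and '1' are treated as the same label.
    ref = [str(v) for v in ref]
    for name, labels in items[1:]:
        labels = [str(v) for v in labels]
        if labels != ref:
            raise ValueError(f"labels in {name} differ from {ref_name}")
    return len(ref)
```

In this notebook it could be called as `assert_same_labels({'LGBM_TB': DF_LGBM_TB['malware'], 'LGBM_IB': DF_LGBM_IB['malware'], 'CATB_TB': DF_CATB_TB['malware'], 'CATB_IB': DF_CATB_IB['malware']})`.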

In [3]:
display(DF_LGBM_TB)
display(DF_LGBM_IB)
display(DF_CATB_TB)
display(DF_CATB_IB)

Unnamed: 0,malware,t_0,t_1,t_2,t_3,t_4,t_5,t_6,t_7,t_8,...,t_92,t_93,t_94,t_95,t_96,t_97,t_98,t_99,hash,type
0,1,172,117,110,60,81,60,81,60,81,...,35,117,208,240,117,208,117,35,eddcb22adbf61280d501087c37d10bd3,trojan
1,1,208,286,76,110,240,117,208,187,208,...,81,60,81,225,35,225,208,76,a98a261f44b348a5d4a71b37cd571394,trojan
2,1,82,240,117,240,117,240,117,240,117,...,260,141,260,141,260,141,260,141,3077863534c104b386a06484b8cf4672,trojan
3,1,82,208,187,208,172,117,172,117,172,...,159,82,215,109,201,45,160,159,59db25f426e0e040f3a6d07e1d31bb8c,adware
4,1,82,240,117,240,117,240,117,240,117,...,215,141,65,112,20,34,215,248,e75e6cee4ad187356938cc06667222b5,adware
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
466,1,82,240,117,240,117,240,117,240,117,...,260,141,260,141,260,141,260,141,ee56773db6ff7d6b0bd498417cd8b5f0,adware
467,1,286,110,172,240,117,240,117,240,117,...,65,117,260,297,215,114,215,71,9368836c52b7dce8d179d36436a00fc7,trojan
468,1,82,240,117,240,117,240,117,240,117,...,240,117,240,117,240,117,240,117,af587da66d2121ebd55b6ab4c05ce59b,trojan
469,1,82,172,117,16,208,171,239,172,117,...,208,264,252,119,187,215,297,34,7a79855483d136fbd4b65c8e4f74664a,trojan


Unnamed: 0,malware,t_0,t_1,t_2,t_3,t_4,t_5,t_6,t_7,t_8,...,t_92,t_93,t_94,t_95,t_96,t_97,t_98,t_99,hash,type
0,1,172,117,110,60,81,228,274,158,215,...,307,307,307,307,307,307,307,307,eddcb22adbf61280d501087c37d10bd3,trojan
1,1,208,286,76,110,240,117,187,198,228,...,307,307,307,307,307,307,307,307,a98a261f44b348a5d4a71b37cd571394,trojan
2,1,82,240,117,172,16,11,274,158,215,...,307,307,307,307,307,307,307,307,3077863534c104b386a06484b8cf4672,trojan
3,1,82,208,187,172,117,16,240,31,20,...,307,307,307,307,307,307,307,307,59db25f426e0e040f3a6d07e1d31bb8c,adware
4,1,82,240,117,172,16,31,86,112,122,...,307,307,307,307,307,307,307,307,e75e6cee4ad187356938cc06667222b5,adware
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
466,1,82,240,117,172,16,11,274,158,215,...,307,307,307,307,307,307,307,307,ee56773db6ff7d6b0bd498417cd8b5f0,adware
467,1,286,110,172,240,117,106,171,260,141,...,307,307,307,307,307,307,307,307,9368836c52b7dce8d179d36436a00fc7,trojan
468,1,82,240,117,172,16,262,208,228,187,...,307,307,307,307,307,307,307,307,af587da66d2121ebd55b6ab4c05ce59b,trojan
469,1,82,172,117,16,208,171,239,228,297,...,307,307,307,307,307,307,307,307,7a79855483d136fbd4b65c8e4f74664a,trojan


Unnamed: 0,malware,t_0,t_1,t_2,t_3,t_4,t_5,t_6,t_7,t_8,...,t_92,t_93,t_94,t_95,t_96,t_97,t_98,t_99,hash,type
0,1,LdrGetDllHandle,LdrGetProcedureAddress,OleInitialize,FindResourceExW,LoadResource,FindResourceExW,LoadResource,FindResourceExW,LoadResource,...,GetSystemMetrics,LdrGetProcedureAddress,NtAllocateVirtualMemory,LdrLoadDll,LdrGetProcedureAddress,NtAllocateVirtualMemory,LdrGetProcedureAddress,GetSystemMetrics,eddcb22adbf61280d501087c37d10bd3,trojan
1,1,NtAllocateVirtualMemory,SetErrorMode,LoadStringA,OleInitialize,LdrLoadDll,LdrGetProcedureAddress,NtAllocateVirtualMemory,NtFreeVirtualMemory,NtAllocateVirtualMemory,...,LoadResource,FindResourceExW,LoadResource,DrawTextExW,GetSystemMetrics,DrawTextExW,NtAllocateVirtualMemory,LoadStringA,a98a261f44b348a5d4a71b37cd571394,trojan
2,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,...,RegOpenKeyExW,RegQueryValueExW,RegOpenKeyExW,RegQueryValueExW,RegOpenKeyExW,RegQueryValueExW,RegOpenKeyExW,RegQueryValueExW,3077863534c104b386a06484b8cf4672,trojan
3,1,GetSystemTimeAsFileTime,NtAllocateVirtualMemory,NtFreeVirtualMemory,NtAllocateVirtualMemory,LdrGetDllHandle,LdrGetProcedureAddress,LdrGetDllHandle,LdrGetProcedureAddress,LdrGetDllHandle,...,NtDelayExecution,GetSystemTimeAsFileTime,NtClose,socket,ioctlsocket,connect,select,NtDelayExecution,59db25f426e0e040f3a6d07e1d31bb8c,adware
4,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,...,NtClose,RegQueryValueExW,RegCloseKey,RegOpenKeyExA,NtOpenFile,NtQueryInformationFile,NtClose,SHGetFolderPathW,e75e6cee4ad187356938cc06667222b5,adware
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
466,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,...,RegOpenKeyExW,RegQueryValueExW,RegOpenKeyExW,RegQueryValueExW,RegOpenKeyExW,RegQueryValueExW,RegOpenKeyExW,RegQueryValueExW,ee56773db6ff7d6b0bd498417cd8b5f0,adware
467,1,SetErrorMode,OleInitialize,LdrGetDllHandle,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,...,RegCloseKey,LdrGetProcedureAddress,RegOpenKeyExW,NtCreateFile,NtClose,NtQueryDirectoryFile,NtClose,GetSystemWindowsDirectoryW,9368836c52b7dce8d179d36436a00fc7,trojan
468,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,...,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,LdrLoadDll,LdrGetProcedureAddress,af587da66d2121ebd55b6ab4c05ce59b,trojan
469,1,GetSystemTimeAsFileTime,LdrGetDllHandle,LdrGetProcedureAddress,SetUnhandledExceptionFilter,NtAllocateVirtualMemory,NtMapViewOfSection,NtSetContextThread,LdrGetDllHandle,LdrGetProcedureAddress,...,NtAllocateVirtualMemory,NtReadFile,NtWriteFile,NtSetInformationFile,NtFreeVirtualMemory,NtClose,NtCreateFile,NtQueryInformationFile,7a79855483d136fbd4b65c8e4f74664a,trojan


Unnamed: 0,malware,t_0,t_1,t_2,t_3,t_4,t_5,t_6,t_7,t_8,...,t_92,t_93,t_94,t_95,t_96,t_97,t_98,t_99,hash,type
0,1,LdrGetDllHandle,LdrGetProcedureAddress,OleInitialize,FindResourceExW,LoadResource,NtProtectVirtualMemory,NtOpenKey,NtQueryValueKey,NtClose,...,,,,,,,,,eddcb22adbf61280d501087c37d10bd3,trojan
1,1,NtAllocateVirtualMemory,SetErrorMode,LoadStringA,OleInitialize,LdrLoadDll,LdrGetProcedureAddress,NtFreeVirtualMemory,GetSystemInfo,NtProtectVirtualMemory,...,,,,,,,,,a98a261f44b348a5d4a71b37cd571394,trojan
2,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,LdrGetDllHandle,SetUnhandledExceptionFilter,CryptAcquireContextW,NtOpenKey,NtQueryValueKey,NtClose,...,,,,,,,,,3077863534c104b386a06484b8cf4672,trojan
3,1,GetSystemTimeAsFileTime,NtAllocateVirtualMemory,NtFreeVirtualMemory,LdrGetDllHandle,LdrGetProcedureAddress,SetUnhandledExceptionFilter,LdrLoadDll,CoInitializeEx,NtOpenFile,...,,,,,,,,,59db25f426e0e040f3a6d07e1d31bb8c,adware
4,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,LdrGetDllHandle,SetUnhandledExceptionFilter,CoInitializeEx,NtCreateMutant,RegOpenKeyExA,CoInitializeSecurity,...,,,,,,,,,e75e6cee4ad187356938cc06667222b5,adware
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
466,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,LdrGetDllHandle,SetUnhandledExceptionFilter,CryptAcquireContextW,NtOpenKey,NtQueryValueKey,NtClose,...,,,,,,,,,ee56773db6ff7d6b0bd498417cd8b5f0,adware
467,1,SetErrorMode,OleInitialize,LdrGetDllHandle,LdrLoadDll,LdrGetProcedureAddress,NtOpenSection,NtMapViewOfSection,RegOpenKeyExW,RegQueryValueExW,...,,,,,,,,,9368836c52b7dce8d179d36436a00fc7,trojan
468,1,GetSystemTimeAsFileTime,LdrLoadDll,LdrGetProcedureAddress,LdrGetDllHandle,SetUnhandledExceptionFilter,NtQuerySystemInformation,NtAllocateVirtualMemory,NtProtectVirtualMemory,NtFreeVirtualMemory,...,,,,,,,,,af587da66d2121ebd55b6ab4c05ce59b,trojan
469,1,GetSystemTimeAsFileTime,LdrGetDllHandle,LdrGetProcedureAddress,SetUnhandledExceptionFilter,NtAllocateVirtualMemory,NtMapViewOfSection,NtSetContextThread,NtProtectVirtualMemory,NtCreateFile,...,,,,,,,,,7a79855483d136fbd4b65c8e4f74664a,trojan


**Battle Chart:**

**GBDT vs GBDT**
- LGBM TB vs CatBoost TB
- LGBM IB vs CatBoost IB
- Tuned LGBM TB vs Tuned CatBoost TB
- Tuned LGBM IB vs Tuned CatBoost IB

**Default vs Tuned**
- LGBM TB vs Tuned LGBM TB
- LGBM IB vs Tuned LGBM IB
- CatBoost TB vs Tuned CatBoost TB
- CatBoost IB vs Tuned CatBoost IB
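
The chart is regular enough to generate programmatically, which could help keep the eight comparison cells consistent. This is a sketch, assuming the model file naming seen in the loading cells of this notebook (`DEMO_<family>_<behavior>.model` for default models, `TUNED_DEMO_<family>_<behavior>.model` for tuned ones); `model_path` and `battle_chart` are hypothetical helper names.

```python
from itertools import product

FAMILIES = ("LGBM", "CATB")
BEHAVIORS = ("TB", "IB")
CONFIGS = ("Default", "Tuned")


def model_path(family, config, behavior):
    """Build a model path following the naming used in this notebook."""
    prefix = "TUNED_DEMO" if config == "Tuned" else "DEMO"
    return f"./Models/{family}/Train_{config}/{prefix}_{family}_{behavior}.model"


def battle_chart():
    """Return the eight (model 1, model 2) pairings in chart order."""
    pairs = []
    # GBDT vs GBDT: same configuration and behavior type, different family.
    for config, behavior in product(CONFIGS, BEHAVIORS):
        pairs.append((("LGBM", config, behavior), ("CATB", config, behavior)))
    # Default vs Tuned: same family and behavior type, different configuration.
    for family, behavior in product(FAMILIES, BEHAVIORS):
        pairs.append(((family, "Default", behavior), (family, "Tuned", behavior)))
    return pairs
```

Each tuple can be passed to `model_path(*spec)` to locate the corresponding saved model.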

In [4]:
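def mcnemar_test(model1, model2, dataset1, dataset2):
    """Run McNemar's test on two models' predictions against the global `y_target`."""
    # NOTE: 'Unnamed: 0' sits at position 0, so iloc[:, 1:101] appears to span
    # 'malware' through 't_98'; verify this matches the columns used in training.
    y_pred1 = model1.predict(dataset1.iloc[:, 1:101])
    y_pred2 = model2.predict(dataset2.iloc[:, 1:101])
    # 2x2 contingency table of correct/incorrect predictions for the two models
    table = mlxtend.mcnemar_table(y_target, y_pred1, y_pred2)
    display(table)
    # Chi-squared version of the test, without continuity correction
    print("statsmodels.mcnemar:")
    print(statsmodels.mcnemar(table, exact=False, correction=False))
    # Same test via mlxtend as a cross-check
    chi2, p = mlxtend.mcnemar(table, exact=False, corrected=False)
    print("\nmlxtend.mcnemar (sanity check):")
    print(f"pvalue:\t{p}\nchi2:\t{chi2}\n")
    print("")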
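
The calls above use the chi-squared approximation (`exact=False`). When the number of discordant pairs b + c is small, a commonly cited rule of thumb (roughly b + c < 25) is to prefer the exact binomial version, which both `statsmodels` and `mlxtend` expose through `exact=True`. The sketch below shows what that exact two-sided p-value amounts to, under the null hypothesis that each discordant pair is equally likely to favor either model. `mcnemar_exact_p` is a hypothetical helper for illustration, not something this notebook runs.

```python
from math import comb


def mcnemar_exact_p(b, c):
    """Two-sided exact binomial p-value for McNemar's test."""
    n = b + c
    if n == 0:
        # No discordant pairs: the test is undefined.
        return float("nan")
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

With b = 3 and c = 0, the exact p-value is 0.25.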

In [5]:
print('COMPARISON 1: DEFAULT LGBM TB vs DEFAULT CATBOOST TB\n')
lgbm_tb = load('./Models/LGBM/Train_Default/DEMO_LGBM_TB.model')
catb_tb = catb.CatBoostClassifier().load_model("./Models/CATB/Train_Default/DEMO_CATB_TB.model", format='json')

mcnemar_test(lgbm_tb, catb_tb, DF_LGBM_TB, DF_CATB_TB)

COMPARISON 1: DEFAULT LGBM TB vs DEFAULT CATBOOST TB



array([[469,   1],
       [  0,   1]])

statsmodels.mcnemar:
pvalue      0.31731050786291115
statistic   1.0

mlxtend.mcnemar (sanity check):
pvalue:	0.31731050786291115
chi2:	1.0




In [6]:
print('COMPARISON 2: DEFAULT LGBM IB vs DEFAULT CATBOOST IB\n')
lgbm_ib = load('./Models/LGBM/Train_Default/DEMO_LGBM_IB.model')
catb_ib = catb.CatBoostClassifier().load_model("./Models/CATB/Train_Default/DEMO_CATB_IB.model", format='json')

mcnemar_test(lgbm_ib, catb_ib, DF_LGBM_IB, DF_CATB_IB)

COMPARISON 2: DEFAULT LGBM IB vs DEFAULT CATBOOST IB



array([[470,   0],
       [  0,   1]])

statsmodels.mcnemar:
pvalue      nan
statistic   nan

mlxtend.mcnemar (sanity check):
pvalue:	nan
chi2:	nan
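
A `nan` result here is expected rather than a failure: with b = c = 0, the statistic (b - c)^2 / (b + c) is 0/0. Zero discordant pairs means the two models were right and wrong on exactly the same samples, which for binary labels implies identical predictions. One quick way to confirm this directly is sketched below with a hypothetical helper, `count_prediction_disagreements`, assuming both prediction sequences use the same label type.

```python
def count_prediction_disagreements(pred_a, pred_b):
    """Count positions where two prediction sequences differ."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction sequences must have the same length")
    return sum(1 for a, b in zip(pred_a, pred_b) if a != b)
```

A return value of 0 would be consistent with the `nan` statistic above; in that case the comparison offers no evidence of a difference between the two models.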




In [7]:
print('COMPARISON 3: TUNED LGBM TB vs TUNED CATBOOST TB\n')
lgbm_tb = load('./Models/LGBM/Train_Tuned/TUNED_DEMO_LGBM_TB.model')
catb_tb = catb.CatBoostClassifier().load_model("./Models/CATB/Train_Tuned/TUNED_DEMO_CATB_TB.model", format='json')

mcnemar_test(lgbm_tb, catb_tb, DF_LGBM_TB, DF_CATB_TB)

COMPARISON 3: TUNED LGBM TB vs TUNED CATBOOST TB



array([[469,   1],
       [  0,   1]])

statsmodels.mcnemar:
pvalue      0.31731050786291115
statistic   1.0

mlxtend.mcnemar (sanity check):
pvalue:	0.31731050786291115
chi2:	1.0




In [8]:
print('COMPARISON 4: TUNED LGBM IB vs TUNED CATBOOST IB\n')
lgbm_ib = load('./Models/LGBM/Train_Tuned/TUNED_DEMO_LGBM_IB.model')
catb_ib = catb.CatBoostClassifier().load_model("./Models/CATB/Train_Tuned/TUNED_DEMO_CATB_IB.model", format='json')

mcnemar_test(lgbm_ib, catb_ib, DF_LGBM_IB, DF_CATB_IB)

COMPARISON 4: TUNED LGBM IB vs TUNED CATBOOST IB



array([[467,   3],
       [  0,   1]])

statsmodels.mcnemar:
pvalue      0.08326451666355042
statistic   3.0

mlxtend.mcnemar (sanity check):
pvalue:	0.08326451666355042
chi2:	3.0




In [9]:
print('COMPARISON 5: DEFAULT LGBM TB vs TUNED LGBM TB\n')
default_tb = load('./Models/LGBM/Train_Default/DEMO_LGBM_TB.model')
tuned_tb = load('./Models/LGBM/Train_Tuned/TUNED_DEMO_LGBM_TB.model')

mcnemar_test(default_tb, tuned_tb, DF_LGBM_TB, DF_LGBM_TB)

COMPARISON 5: DEFAULT LGBM TB vs TUNED LGBM TB



array([[470,   0],
       [  0,   1]])

statsmodels.mcnemar:
pvalue      nan
statistic   nan

mlxtend.mcnemar (sanity check):
pvalue:	nan
chi2:	nan




In [10]:
print('COMPARISON 6: DEFAULT LGBM IB vs TUNED LGBM IB\n')
default_ib = load('./Models/LGBM/Train_Default/DEMO_LGBM_IB.model')
tuned_ib = load('./Models/LGBM/Train_Tuned/TUNED_DEMO_LGBM_IB.model')

mcnemar_test(default_ib, tuned_ib, DF_LGBM_IB, DF_LGBM_IB)

COMPARISON 6: DEFAULT LGBM IB vs TUNED LGBM IB



array([[470,   0],
       [  0,   1]])

statsmodels.mcnemar:
pvalue      nan
statistic   nan

mlxtend.mcnemar (sanity check):
pvalue:	nan
chi2:	nan




In [11]:
print('COMPARISON 7: DEFAULT CATBOOST TB vs TUNED CATBOOST TB\n')
default_tb = catb.CatBoostClassifier().load_model("./Models/CATB/Train_Default/DEMO_CATB_TB.model", format='json')
tuned_tb = catb.CatBoostClassifier().load_model("./Models/CATB/Train_Tuned/TUNED_DEMO_CATB_TB.model", format='json')

mcnemar_test(default_tb, tuned_tb, DF_CATB_TB, DF_CATB_TB)

COMPARISON 7: DEFAULT CATBOOST TB vs TUNED CATBOOST TB



array([[468,   1],
       [  1,   1]])

statsmodels.mcnemar:
pvalue      1.0
statistic   0.0

mlxtend.mcnemar (sanity check):
pvalue:	1.0
chi2:	0.0




In [12]:
print('COMPARISON 8: DEFAULT CATBOOST IB vs TUNED CATBOOST IB\n')
default_ib = catb.CatBoostClassifier().load_model("./Models/CATB/Train_Default/DEMO_CATB_IB.model", format='json')
tuned_ib = catb.CatBoostClassifier().load_model("./Models/CATB/Train_Tuned/TUNED_DEMO_CATB_IB.model", format='json')

mcnemar_test(default_ib, tuned_ib, DF_CATB_IB, DF_CATB_IB)

COMPARISON 8: DEFAULT CATBOOST IB vs TUNED CATBOOST IB



array([[467,   3],
       [  0,   1]])

statsmodels.mcnemar:
pvalue      0.08326451666355042
statistic   3.0

mlxtend.mcnemar (sanity check):
pvalue:	0.08326451666355042
chi2:	3.0


