<a href="https://colab.research.google.com/github/pramodcgupta/Machine-Learning-Predictions/blob/master/Model_compare_using_paired_t_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Statistical Testing for Comparing Model Performance**

**Reading Material Link:**

http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_resampled/

In [49]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from mlxtend.data import iris_data
from sklearn.model_selection import train_test_split

X, y = iris_data()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=123)

# Test 1: both models expected to perform about the same (full-depth tree)
# clf1 = DecisionTreeClassifier(random_state=1)
# Test 2: models expected to perform differently (decision stump, max_depth=1)
clf1 = DecisionTreeClassifier(random_state=1, max_depth=1)
clf2 = RandomForestClassifier(random_state=1)

score1 = clf1.fit(X_train, y_train).score(X_test, y_test)
score2 = clf2.fit(X_train, y_train).score(X_test, y_test)

print(f'Decision Tree Classifier Accuracy :  {score1 * 100 :.2f}%')
print(f'Random Forest Classifier Accuracy :  {score2 * 100 :.2f}%')

Decision Tree Classifier Accuracy :  63.16%
Random Forest Classifier Accuracy :  92.11%


**Resampled paired t test**

In [50]:
from mlxtend.evaluate import paired_ttest_resampled


t, p = paired_ttest_resampled(estimator1=clf1,
                              estimator2=clf2,
                              X=X, y=y,
                              random_seed=1)

print('t statistic: %.3f' % t)
print('p value: %.3f' % p)

# p > 0.05: fail to reject the null hypothesis of equal performance
if p > 0.05:
    print("The performance of the two algorithms is not significantly different")
else:
    print("The performance of the two algorithms is significantly different")

t statistic: -39.869
p value: 0.000
The performance of the two algorithms is significantly different
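
To make the statistic above less of a black box, here is a minimal sketch of how a paired t statistic can be computed from per-round score differences. The helper name `paired_t_statistic` and the five difference values are hypothetical, chosen only for illustration; the real test collects one difference per random train/test split.

```python
import math
import statistics

def paired_t_statistic(diffs):
    """Paired t statistic for score differences (len(diffs) - 1 degrees of freedom)."""
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # statistics.stdev is the sample standard deviation (n - 1 in the denominator)
    std_diff = statistics.stdev(diffs)
    return mean_diff * math.sqrt(n) / std_diff

# Hypothetical differences: accuracy of model 1 minus accuracy of model 2,
# one value per random train/test split
diffs = [-0.2, -0.4, -0.3, -0.1, -0.5]
t_stat = paired_t_statistic(diffs)
print(f"t statistic: {t_stat:.3f}")  # t statistic: -4.243
```

A large negative value, as in the notebook output, means the first model scored consistently lower than the second across the splits.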


**K Fold paired t test**

In [52]:
from mlxtend.evaluate import paired_ttest_kfold_cv


t, p = paired_ttest_kfold_cv(estimator1=clf1,
                             estimator2=clf2,
                             X=X, y=y,
                             random_seed=1)

print('t statistic: %.3f' % t)
print('p value: %.3f' % p)

# p > 0.05: fail to reject the null hypothesis of equal performance
if p > 0.05:
    print("The performance of the two algorithms is not significantly different")
else:
    print("The performance of the two algorithms is significantly different")

t statistic: -20.988
p value: 0.000
The performance of the two algorithms is significantly different
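
What distinguishes the k-fold variant from random resampling is that every sample lands in exactly one test fold, although the training sets still overlap heavily. The sketch below illustrates this with plain Python. The helper `kfold_indices` is hypothetical, and `k=10` on 150 samples (the size of the iris data) is an assumption made for illustration.

```python
import random

def kfold_indices(n_samples, k, seed=1):
    """Split range(n_samples) into k disjoint test folds (hypothetical helper)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin slicing keeps fold sizes within one sample of each other
    return [indices[i::k] for i in range(k)]

n_samples, k = 150, 10
folds = kfold_indices(n_samples, k)

# Training set for a fold = every index that is not in that fold's test set
train_sets = [set(range(n_samples)) - set(fold) for fold in folds]

# Test folds never overlap, but any two training sets share most of their samples
shared = len(train_sets[0] & train_sets[1])
print(f"test fold size: {len(folds[0])}, train size: {len(train_sets[0])}, "
      f"shared between two training sets: {shared}")
```

The overlap between training sets is one reason the per-fold score differences are not truly independent.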


**5x2cv paired t test**

The 5x2cv paired t test is generally preferable to the two methods above. Proposed by Dietterich [1], it is a procedure for comparing the performance of two models (classifiers or regressors) that addresses shortcomings of the resampled paired t test (see `paired_ttest_resampled`) and the k-fold cross-validated paired t test (see `paired_ttest_kfold_cv`): in both of those, the data used across rounds overlaps, so the score differences are not independent and the tests tend to report significance too often.

In [53]:
from mlxtend.evaluate import paired_ttest_5x2cv


t, p = paired_ttest_5x2cv(estimator1=clf1,
                          estimator2=clf2,
                          X=X, y=y,
                          random_seed=1)

print('t statistic: %.3f' % t)
print('p value: %.3f' % p)

# p > 0.05: fail to reject the null hypothesis of equal performance
if p > 0.05:
    print("The performance of the two algorithms is not significantly different")
else:
    print("The performance of the two algorithms is significantly different")

t statistic: -10.758
p value: 0.000
The performance of the two algorithms is significantly different
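
The 5x2cv statistic differs from an ordinary paired t statistic: it repeats 2-fold cross-validation five times, uses only the first difference in the numerator, and estimates the variance from all five replications. The following is a minimal sketch of that formula as described by Dietterich, not the library's code. The helper name and the difference values are hypothetical.

```python
import math

def ttest_5x2cv_statistic(diff_pairs):
    """t statistic from five (d1, d2) pairs, one pair per 2-fold replication.

    d1 and d2 are the score differences (model 1 minus model 2) on the two folds.
    The statistic is compared against a t distribution with 5 degrees of freedom.
    """
    if len(diff_pairs) != 5:
        raise ValueError("5x2cv needs exactly five replications")
    variance_sum = 0.0
    for d1, d2 in diff_pairs:
        mean = (d1 + d2) / 2
        # Variance estimate for this replication
        variance_sum += (d1 - mean) ** 2 + (d2 - mean) ** 2
    # Numerator: the difference from the first fold of the first replication
    return diff_pairs[0][0] / math.sqrt(variance_sum / 5)

# Hypothetical score differences, for illustration only
pairs = [(0.1, 0.3), (0.2, 0.2), (0.1, 0.3), (0.2, 0.2), (0.2, 0.2)]
t_stat = ttest_5x2cv_statistic(pairs)

# 2.571 is the two-sided 5% critical value of the t distribution with 5 degrees of freedom
print(f"t statistic: {t_stat:.3f}, significant at 5%: {abs(t_stat) > 2.571}")
```

With only five degrees of freedom the critical value is comparatively high, which is part of why this test is more conservative than the two above.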


**Regression Example**

In [55]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from mlxtend.data import boston_housing_data
from sklearn.model_selection import train_test_split

X,y = boston_housing_data()


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=123)

clf1 = DecisionTreeRegressor(random_state=1)
clf2 = RandomForestRegressor(random_state=1)

score1 = clf1.fit(X_train, y_train).score(X_test, y_test)
score2 = clf2.fit(X_train, y_train).score(X_test, y_test)

# For regressors, .score() returns the R^2 coefficient of determination, not accuracy
print(f'Decision Tree Regressor R^2 score :  {score1:.4f}')
print(f'Random Forest Regressor R^2 score :  {score2:.4f}')

(506, 14)
Decision Tree Regressor R^2 score :  0.4833
Random Forest Regressor R^2 score :  0.8045


In [56]:
from mlxtend.evaluate import paired_ttest_resampled


t, p = paired_ttest_resampled(estimator1=clf1,
                              estimator2=clf2,
                              X=X, y=y,
                              random_seed=1)

print('t statistic: %.3f' % t)
print('p value: %.3f' % p)

# p > 0.05: fail to reject the null hypothesis of equal performance
if p > 0.05:
    print("The performance of the two algorithms is not significantly different")
else:
    print("The performance of the two algorithms is significantly different")

t statistic: -9.197
p value: 0.000
The performance of the two algorithms is significantly different


In [57]:
from mlxtend.evaluate import paired_ttest_kfold_cv


t, p = paired_ttest_kfold_cv(estimator1=clf1,
                             estimator2=clf2,
                             X=X, y=y,
                             random_seed=1)

print('t statistic: %.3f' % t)
print('p value: %.3f' % p)

# p > 0.05: fail to reject the null hypothesis of equal performance
if p > 0.05:
    print("The performance of the two algorithms is not significantly different")
else:
    print("The performance of the two algorithms is significantly different")

t statistic: -2.622
p value: 0.028
The performance of the two algorithms is significantly different


In [61]:
from mlxtend.evaluate import paired_ttest_5x2cv


t, p = paired_ttest_5x2cv(estimator1=clf1,
                          estimator2=clf2,
                          X=X, y=y,
                          random_seed=1)

print('t statistic: %.3f' % t)
print('p value: %.3f' % p)

# p > 0.05: fail to reject the null hypothesis of equal performance
if p > 0.05:
    print("The performance of the two algorithms is not significantly different")
else:
    print("The performance of the two algorithms is significantly different")

t statistic: -1.494
p value: 0.195
The performance of the two algorithms is not significantly different
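
The regression runs above disagree: the k-fold test reports p = 0.028 while the 5x2cv test reports p = 0.195. One way to keep the interpretation consistent across cells is to factor the repeated `if`/`else` into a small helper. The function `interpret` below is a hypothetical sketch, and the 0.05 threshold mirrors the one used throughout this notebook.

```python
def interpret(p_value, alpha=0.05):
    """Turn a p value into a hedged verdict (hypothetical helper)."""
    if p_value > alpha:
        # A large p value means no difference was detected, not that the models are equal
        return "fail to reject the null hypothesis: no significant difference detected"
    return "reject the null hypothesis: performance differs significantly"

# p values taken from the regression outputs above
for name, p in [("k-fold", 0.028), ("5x2cv", 0.195)]:
    print(f"{name}: p = {p:.3f} -> {interpret(p)}")
```

Phrasing the non-significant case as "no difference detected" avoids reading a high p value as proof that the two models perform the same.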
