Outputs:  

Arrhythmia:  
Decision Tree Classifier: 65.62% accuracy with a standard deviation of 5.04%  
Random Forest Classifier: 71.13% accuracy with a standard deviation of 4.51%  
Ada Boost Classifier: 61.76% accuracy with a standard deviation of 4.91%  
Bagging Classifier: 75.16% accuracy with a standard deviation of 4.30%  
Extra Trees Classifier: 69.85% accuracy with a standard deviation of 4.47%  
  
Caesarian:  
Decision Tree Classifier: 52.68% accuracy with a standard deviation of 11.32%  
Random Forest Classifier: 54.85% accuracy with a standard deviation of 12.75%  
Ada Boost Classifier: 60.97% accuracy with a standard deviation of 11.22%  
Bagging Classifier: 54.43% accuracy with a standard deviation of 11.48%  
Extra Trees Classifier: 53.50% accuracy with a standard deviation of 10.96%  
  
Phishing:    
Decision Tree Classifier: 96.29% accuracy with a standard deviation of 0.38%  
Random Forest Classifier: 96.99% accuracy with a standard deviation of 0.34%  
Ada Boost Classifier: 92.90% accuracy with a standard deviation of 0.53%  
Bagging Classifier: 96.79% accuracy with a standard deviation of 0.35%  
Extra Trees Classifier: 97.15% accuracy with a standard deviation of 0.34%   
  
Wine:    
Decision Tree Classifier: 90.70% accuracy with a standard deviation of 4.54%  
Random Forest Classifier: 96.96% accuracy with a standard deviation of 2.86%  
Ada Boost Classifier: 88.63% accuracy with a standard deviation of 8.29%  
Bagging Classifier: 95.38% accuracy with a standard deviation of 3.56%  
Extra Trees Classifier: 96.74% accuracy with a standard deviation of 2.82%  


Experimental design:

For this study, we compare classification ensemble algorithms on four different datasets and infer why a particular algorithm achieves the highest accuracy on a specific dataset. When designing the experiment, we kept the default settings for all parameters except the number of estimators, which we fixed at 10 for every ensemble.  

To increase the validity of our results, we repeatedly fitted the ensemble models on a new randomized training split (an 80% subset of the original dataset) and cross-validated each model using repeated k-fold cross-validation (5 folds, 10 repeats). We repeated this five times in total, each iteration with a different seed, and then averaged the outcomes to reduce the noise in the accuracy estimate caused by issues such as a fluke training/test split.

We used repeated k-fold cross-validation to estimate the performance of the ensembles because it reports the mean accuracy across all folds from ten repetitions (the default number of repeats). This reduces the error in the mean estimate of model performance compared to a single round of cross-validation without repeats.  
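
The procedure described above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's exact code: it uses only a single decision tree on the wine dataset to stay fast, and the names `seed_means`, `n_scores_per_seed` and `overall_mean` are introduced here for illustration. The seed schedule (start at 1234, add 999 per iteration) mirrors the notebook cells.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

seed_means = []
seed = 1234
for _ in range(5):
    # 5 folds repeated 10 times gives 50 accuracy scores per seed
    cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=seed)
    model = DecisionTreeClassifier(random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv)
    seed_means.append(scores.mean())
    seed += 999

n_scores_per_seed = len(scores)
overall_mean = float(np.mean(seed_means))
print(f"{n_scores_per_seed} fold scores per seed, overall mean accuracy {overall_mean:.4f}")
```

Note that `cross_val_score` clones the estimator and refits it on each training fold, so the reported accuracy depends only on the folds, not on any earlier call to `fit`.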


Results:

The bagging ensemble performed best on the arrhythmia dataset, with an accuracy of 75.16% and the lowest standard deviation (4.30%). Bagging draws multiple bootstrap samples from the original dataset (with replacement), fits a separate model to each sample independently, and combines the individual predictions into an aggregated prediction. Averaging over models trained on overlapping but different subsets decreases the variance of the prediction. We therefore suspect that bagging is the best ensemble for the arrhythmia dataset because this variance reduction helps on high-dimensional data (279 columns against only 420 rows), where a single tree overfits easily.
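
To make the bootstrap-and-aggregate idea concrete, here is a small hand-rolled sketch on the wine dataset. It is an illustration under simplifying assumptions, not how `BaggingClassifier` is implemented: it aggregates by majority vote, and the variable names (`votes`, `unique_fractions`, `bagged_pred`) are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1234
)

rng = np.random.default_rng(1234)
n_estimators = 10
n_train = len(X_train)
votes = []
unique_fractions = []
for _ in range(n_estimators):
    # Bootstrap sample: draw n_train row indices with replacement
    idx = rng.integers(0, n_train, size=n_train)
    unique_fractions.append(len(np.unique(idx)) / n_train)
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
    votes.append(tree.predict(X_test))

votes = np.array(votes)  # shape: (n_estimators, n_test)
# Aggregate: majority vote over the trees for each test row
bagged_pred = np.array([np.bincount(col, minlength=3).argmax() for col in votes.T])
bagged_accuracy = float((bagged_pred == y_test).mean())
print(f"mean unique fraction per bootstrap sample: {np.mean(unique_fractions):.3f}")
print(f"bagged accuracy on the held-out split: {bagged_accuracy:.3f}")
```

Each bootstrap sample contains only a fraction of the distinct training rows, so the trees see overlapping but different data, which is where the variance reduction comes from.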


By contrast, AdaBoost performed best on the caesarian dataset, which contains only five features to explain the binary response. AdaBoost builds on decision stumps (one split per tree): the boosting algorithm trains the stumps sequentially and adjusts each observation's weight depending on whether the previous prediction was correct, until a predetermined number of trees has been trained or the observations are perfectly predicted. We suspect that AdaBoost performs less well on the other datasets, which have more features, because it is more sensitive to outliers and noise, and that it performs best on the caesarian dataset because boosting works well on datasets with a small number of features that have strong predictive power.
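
The reweighting step can be illustrated with one round of the classic binary AdaBoost update. This is a sketch with made-up labels and hypothetical stump predictions; scikit-learn's `AdaBoostClassifier` uses a multi-class variant, so its exact formula may differ from the one shown here.

```python
import numpy as np

# Toy labels and the predictions of one hypothetical stump (2 mistakes out of 8)
y_true = np.array([1, 1, -1, -1, 1, -1, 1, -1])
y_pred = np.array([1, 1, -1, -1, -1, -1, 1, 1])

w = np.full(len(y_true), 1 / len(y_true))  # start with uniform weights
miss = y_pred != y_true

err = w[miss].sum()                        # weighted error of the stump
alpha = 0.5 * np.log((1 - err) / err)      # the stump's say in the final vote

# Increase the weights of misclassified rows, decrease the others, renormalize
w_new = w * np.exp(np.where(miss, alpha, -alpha))
w_new /= w_new.sum()

print("weighted error:", err)
print("updated weights:", np.round(w_new, 4))
```

After the update the misclassified observations carry half of the total weight, so the next stump is pushed to focus on them. This is also why mislabeled or noisy observations can dominate later rounds.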


The extra trees ensemble performed best on the website phishing dataset, but a closer look at the random forest figures shows only a 0.16 percentage point difference between the two (97.15% versus 96.99%). 

There are two main differences between extra trees and random forests:  
(1) Extra trees fit each tree on the whole training set by default, whereas random forests fit each tree on a bootstrap sample  
(2) Split points are randomized (not optimized as in random forests)    

The extra randomness may help extra trees outperform random forests because the errors of the individual trees are less correlated with each other. Once the random split points are drawn, one for each feature in the candidate subset, the algorithm chooses the best of them.
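
Both differences can be checked directly on the scikit-learn estimators. The sketch below fits the two ensembles on the wine dataset purely so that the individual trees exist and can be inspected; the variable names are illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = load_wine(return_X_y=True)

rf = RandomForestClassifier(n_estimators=10, random_state=1234).fit(X, y)
et = ExtraTreesClassifier(n_estimators=10, random_state=1234).fit(X, y)

# (1) Row sampling: random forests bootstrap by default, extra trees do not
rf_bootstrap = rf.get_params()["bootstrap"]
et_bootstrap = et.get_params()["bootstrap"]

# (2) Split selection: optimized thresholds versus randomly drawn thresholds
rf_splitter = rf.estimators_[0].splitter
et_splitter = et.estimators_[0].splitter

print(f"RandomForest: bootstrap={rf_bootstrap}, splitter={rf_splitter}")
print(f"ExtraTrees:   bootstrap={et_bootstrap}, splitter={et_splitter}")
```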


For the wine dataset (loaded from scikit-learn), random forest performed best, and it is also the ensemble that performed consistently well across all datasets. Random forests are a robust ensemble: averaging over many trees makes them well equipped to deal with noise and outliers; they handle collinearity well because one of the correlated features uses up the predictive power of the other; and they can manage large, high-dimensional datasets by ignoring features that do not provide the optimal split. This experiment suggests that random forests are a good all-rounder for supervised learning and can be used with multiple types of datasets to produce a relatively powerful predictive model.

In [1]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.api.types import CategoricalDtype
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import BaggingClassifier 
from sklearn.ensemble import ExtraTreesClassifier
from sklearn import metrics
from sklearn.model_selection import cross_val_score
from autorank import autorank, create_report, plot_stats
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.datasets import load_wine
from sklearn.model_selection import RepeatedKFold

In [2]:
%load_ext autoreload
%autoreload 2

Arrhythmia: 

The "J" column contained the majority of the ? values, so we assumed it wouldn't be useful in explaining arrhythmia in our models. After dropping the "J" column, we replaced the remaining ? values with NaN and dropped the rows that contained NaN. This trimmed our dataset down to 420 rows and 279 columns. 

In [3]:
sample_size = 10
RANDOM_STATE = 1234 
np.random.seed(RANDOM_STATE)

arrhythmia = pd.read_csv("arrhythmia.csv")
arrhythmia.columns = arrhythmia.columns.str.strip()

# Cleaning ? values 
arrhythmia = arrhythmia.drop(["J"], axis=1)
arrhythmia = arrhythmia.replace("?", np.nan).dropna()

arrhythmia_sd = []
arrhythmia_means = []

caesarian = pd.read_csv("caesarian.csv")
caesarian.columns = caesarian.columns.str.strip()
caesarian_sd = []
caesarian_means = []

phishing = pd.read_csv("website-phishing.csv")
phishing.columns = phishing.columns.str.strip()
phishing_sd = []
phishing_means = []

wine = load_wine()
wine_sd = []
wine_means = []

In [4]:
# Splitting response and explanatory variables 

X=arrhythmia.drop(["class"], axis=1)
y=arrhythmia["class"]

random_seed = 1234
decision_tree_temp = []
random_forest_temp = []
ada_boost_temp = []
bagging_model_temp = []
extra_trees_temp = []

decision_tree_sd_temp = []
random_forest_sd_temp = []
ada_boost_sd_temp = []
bagging_model_sd_temp = []
extra_trees_sd_temp = []


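for i in range(5):
    # Repeated k-fold CV: 5 folds, repeated 10 times (the default n_repeats)
    multi = RepeatedKFold(n_splits = 5, random_state = random_seed)
    # 80/20 split; note that cross_val_score below clones and refits each model
    # on the CV folds of X and y, so these fits do not affect the reported scores
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=random_seed)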
    
    # Training the Decision Tree classifier
    
    decision_tree_model = DecisionTreeClassifier(random_state=random_seed)
    decision_tree_model.fit(X_train, y_train)
    
    # Training the Random Forest classifier

    random_forest_model = RandomForestClassifier(n_estimators=10, random_state=random_seed)
    random_forest_model.fit(X_train, y_train)

    # Training Ada Boost classifier 

    ada_boost_model = AdaBoostClassifier(n_estimators=10, random_state=random_seed)
    ada_boost_model.fit(X_train, y_train)

    # Training Bagging Model classifier 

    bagging_model = BaggingClassifier(n_estimators=10, random_state=random_seed)
    bagging_model.fit(X_train, y_train)

    # Training Extra Trees classifier

    extra_trees_model = ExtraTreesClassifier(n_estimators=10, random_state=random_seed).fit(X_train, y_train)
    
    decision_tree_score = cross_val_score(decision_tree_model, X, y, cv=multi)
    random_forest_model_score = cross_val_score(random_forest_model, X, y, cv=multi)
    ada_boost_model_score = cross_val_score(ada_boost_model, X, y, cv=multi)
    bagging_model_score = cross_val_score(bagging_model, X, y, cv=multi)
    extra_trees_score = cross_val_score(extra_trees_model, X, y, cv=multi)
    
    # Append the standard deviation scores to respective temporary lists
    
    decision_tree_sd_temp.append(decision_tree_score.std())
    random_forest_sd_temp.append(random_forest_model_score.std())
    ada_boost_sd_temp.append(ada_boost_model_score.std())
    bagging_model_sd_temp.append(bagging_model_score.std())
    extra_trees_sd_temp.append(extra_trees_score.std())

    
    # Append the mean of CV scores to respective temporary lists 
    
    decision_tree_temp.append(decision_tree_score.mean())
    random_forest_temp.append(random_forest_model_score.mean())
    ada_boost_temp.append(ada_boost_model_score.mean())
    bagging_model_temp.append(bagging_model_score.mean())
    extra_trees_temp.append(extra_trees_score.mean())

    random_seed += 999

# Append the average of all means across CV runs 

arrhythmia_means.append(np.mean(decision_tree_temp))
arrhythmia_means.append(np.mean(random_forest_temp))
arrhythmia_means.append(np.mean(ada_boost_temp))
arrhythmia_means.append(np.mean(bagging_model_temp))
arrhythmia_means.append(np.mean(extra_trees_temp))

arrhythmia_sd.append(np.mean(decision_tree_sd_temp))
arrhythmia_sd.append(np.mean(random_forest_sd_temp))
arrhythmia_sd.append(np.mean(ada_boost_sd_temp))
arrhythmia_sd.append(np.mean(bagging_model_sd_temp))
arrhythmia_sd.append(np.mean(extra_trees_sd_temp))

print(arrhythmia_means)

x = {'DecisionTree':decision_tree_temp,
     'RandomForest':random_forest_temp,
     'AdaBoost':ada_boost_temp,
     'Bagging': bagging_model_temp,
     'ExtraTrees': extra_trees_temp,
    }
df = pd.DataFrame (x, columns = ['DecisionTree','RandomForest','AdaBoost','Bagging','ExtraTrees'])
print(df)
result = autorank(df, verbose=False)
print(result)
create_report(result)



[0.6562380952380952, 0.7112857142857142, 0.6175714285714285, 0.7515714285714286, 0.6985238095238093]
   DecisionTree  RandomForest  AdaBoost   Bagging  ExtraTrees
0      0.659524      0.708333  0.622143  0.753333    0.690714
1      0.646905      0.720000  0.619286  0.748810    0.696429
2      0.661667      0.704286  0.616667  0.748095    0.704524
3      0.663571      0.705476  0.613571  0.753333    0.699524
4      0.649524      0.718333  0.616190  0.754286    0.701429
RankResult(rankdf=
              meanrank      mean       std  ci_lower  ci_upper effect_size  \
Bagging            1.0  0.751571  0.002885   0.74626  0.756883           0   
RandomForest       2.2  0.711286  0.007367  0.705974  0.716597      7.2013   
ExtraTrees         2.8  0.698524  0.005263  0.693213  0.703835     12.4989   
DecisionTree       4.0  0.656238  0.007521  0.650927  0.661549     16.7379   
AdaBoost           5.0  0.617571  0.003262   0.61226  0.622883     43.5182   

               magnitude  
Bagging     



In [5]:
random_seed = 1234
decision_tree_temp = []
random_forest_temp = []
ada_boost_temp = []
bagging_model_temp = []
extra_trees_temp = []

decision_tree_sd_temp = []
random_forest_sd_temp = []
ada_boost_sd_temp = []
bagging_model_sd_temp = []
extra_trees_sd_temp = []

X=caesarian.drop(["class"], axis=1)
y=caesarian["class"]

for i in range(5):
    multi = RepeatedKFold(n_splits = 5, random_state = random_seed)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=random_seed)
    
    # Training the Decision Tree classifier
    
    decision_tree_model = DecisionTreeClassifier(random_state=random_seed)
    decision_tree_model.fit(X_train, y_train)
    
    # Training the Random Forest classifier

    random_forest_model = RandomForestClassifier(n_estimators=10, random_state=random_seed)
    random_forest_model.fit(X_train, y_train)

    # Training Ada Boost classifier 

    ada_boost_model = AdaBoostClassifier(n_estimators=10, random_state=random_seed)
    ada_boost_model.fit(X_train, y_train)

    # Training Bagging Model classifier 

    bagging_model = BaggingClassifier(n_estimators=10, random_state=random_seed)
    bagging_model.fit(X_train, y_train)

    # Training Extra Trees classifier

    extra_trees_model = ExtraTreesClassifier(n_estimators=10, random_state=random_seed).fit(X_train, y_train)
    
    decision_tree_score = cross_val_score(decision_tree_model, X, y, cv=multi)
    random_forest_model_score = cross_val_score(random_forest_model, X, y, cv=multi)
    ada_boost_model_score = cross_val_score(ada_boost_model, X, y, cv=multi)
    bagging_model_score = cross_val_score(bagging_model, X, y, cv=multi)
    extra_trees_score = cross_val_score(extra_trees_model, X, y, cv=multi)
    
    # Append the standard deviation scores to respective temporary lists
    
    decision_tree_sd_temp.append(decision_tree_score.std())
    random_forest_sd_temp.append(random_forest_model_score.std())
    ada_boost_sd_temp.append(ada_boost_model_score.std())
    bagging_model_sd_temp.append(bagging_model_score.std())
    extra_trees_sd_temp.append(extra_trees_score.std())

    
    decision_tree_temp.append(decision_tree_score.mean())
    random_forest_temp.append(random_forest_model_score.mean())
    ada_boost_temp.append(ada_boost_model_score.mean())
    bagging_model_temp.append(bagging_model_score.mean())
    extra_trees_temp.append(extra_trees_score.mean())

    random_seed += 999


caesarian_means.append(np.mean(decision_tree_temp))
caesarian_means.append(np.mean(random_forest_temp))
caesarian_means.append(np.mean(ada_boost_temp))
caesarian_means.append(np.mean(bagging_model_temp))
caesarian_means.append(np.mean(extra_trees_temp))

caesarian_sd.append(np.mean(decision_tree_sd_temp))
caesarian_sd.append(np.mean(random_forest_sd_temp))
caesarian_sd.append(np.mean(ada_boost_sd_temp))
caesarian_sd.append(np.mean(bagging_model_sd_temp))
caesarian_sd.append(np.mean(extra_trees_sd_temp))

x = {'DecisionTree':decision_tree_temp,
     'RandomForest':random_forest_temp,
     'AdaBoost':ada_boost_temp,
     'Bagging': bagging_model_temp,
     'ExtraTrees': extra_trees_temp,
    }
df = pd.DataFrame (x, columns = ['DecisionTree','RandomForest','AdaBoost','Bagging','ExtraTrees'])
result = autorank(df, verbose=False)
print(result)
create_report(result)
    

RankResult(rankdf=
              meanrank     mean       std  ci_lower  ci_upper effect_size  \
AdaBoost           1.0  0.60975  0.007469  0.597623  0.621877           0   
RandomForest       2.5  0.54850  0.013619  0.536373  0.560627     5.57683   
Bagging            3.1  0.54425  0.012766  0.532123  0.556377       6.263   
ExtraTrees         3.8  0.53500  0.011924  0.522873  0.547127     7.51325   
DecisionTree       4.6  0.52675  0.016574  0.514623  0.538877     6.45696   

               magnitude  
AdaBoost      negligible  
RandomForest       large  
Bagging            large  
ExtraTrees         large  
DecisionTree       large  
pvalue=4.475291790504448e-08
cd=None
omnibus=anova
posthoc=tukeyhsd
all_normal=True
pvals_shapiro=[0.5007793307304382, 0.2373344749212265, 0.9443285465240479, 0.07801619917154312, 0.19198185205459595]
homoscedastic=True
pval_homogeneity=0.7054695098014168
homogeneity_test=bartlett
alpha=0.05
alpha_normality=0.01
num_samples=5
posterior_matrix=
None
decis



In [6]:
random_seed = 1234
decision_tree_temp = []
random_forest_temp = []
ada_boost_temp = []
bagging_model_temp = []
extra_trees_temp = []

decision_tree_sd_temp = []
random_forest_sd_temp = []
ada_boost_sd_temp = []
bagging_model_sd_temp = []
extra_trees_sd_temp = []

X=phishing.drop(["Class"], axis=1)
y=phishing["Class"]

for i in range(5):
    multi = RepeatedKFold(n_splits=5, random_state=random_seed)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=random_seed)
    
    # Training the Decision Tree classifier
    
    decision_tree_model = DecisionTreeClassifier(random_state=random_seed)
    decision_tree_model.fit(X_train, y_train)
    
    # Training the Random Forest classifier

    random_forest_model = RandomForestClassifier(n_estimators=10, random_state=random_seed)
    random_forest_model.fit(X_train, y_train)

    # Training Ada Boost classifier 

    ada_boost_model = AdaBoostClassifier(n_estimators=10, random_state=random_seed)
    ada_boost_model.fit(X_train, y_train)

    # Training Bagging Model classifier 

    bagging_model = BaggingClassifier(n_estimators=10, random_state=random_seed)
    bagging_model.fit(X_train, y_train)

    # Training Extra Trees classifier

    extra_trees_model = ExtraTreesClassifier(n_estimators=10, random_state=random_seed).fit(X_train, y_train)
    
    
    decision_tree_score = cross_val_score(decision_tree_model, X, y, cv=multi)
    random_forest_model_score = cross_val_score(random_forest_model, X, y, cv=multi)
    ada_boost_model_score = cross_val_score(ada_boost_model, X, y, cv=multi)
    bagging_model_score = cross_val_score(bagging_model, X, y, cv=multi)
    extra_trees_score = cross_val_score(extra_trees_model, X, y, cv=multi)
    
    
    # Append the standard deviation scores to respective temporary lists
    
    decision_tree_sd_temp.append(decision_tree_score.std())
    random_forest_sd_temp.append(random_forest_model_score.std())
    ada_boost_sd_temp.append(ada_boost_model_score.std())
    bagging_model_sd_temp.append(bagging_model_score.std())
    extra_trees_sd_temp.append(extra_trees_score.std())

    decision_tree_temp.append(decision_tree_score.mean())
    random_forest_temp.append(random_forest_model_score.mean())
    ada_boost_temp.append(ada_boost_model_score.mean())
    bagging_model_temp.append(bagging_model_score.mean())
    extra_trees_temp.append(extra_trees_score.mean())

    random_seed += 999

phishing_means.append(np.mean(decision_tree_temp))
phishing_means.append(np.mean(random_forest_temp))
phishing_means.append(np.mean(ada_boost_temp))
phishing_means.append(np.mean(bagging_model_temp))
phishing_means.append(np.mean(extra_trees_temp))

phishing_sd.append(np.mean(decision_tree_sd_temp))
phishing_sd.append(np.mean(random_forest_sd_temp))
phishing_sd.append(np.mean(ada_boost_sd_temp))
phishing_sd.append(np.mean(bagging_model_sd_temp))
phishing_sd.append(np.mean(extra_trees_sd_temp))


x = {'DecisionTree':decision_tree_temp,
     'RandomForest':random_forest_temp,
     'AdaBoost':ada_boost_temp,
     'Bagging': bagging_model_temp,
     'ExtraTrees': extra_trees_temp,
    }
df = pd.DataFrame (x, columns = ['DecisionTree','RandomForest','AdaBoost','Bagging','ExtraTrees'])
print(df)
result = autorank(df, verbose=False)
print(result)
create_report(result)


   DecisionTree  RandomForest  AdaBoost   Bagging  ExtraTrees
0      0.963220      0.970095  0.929037  0.968204    0.971588
1      0.962994      0.970185  0.929073  0.968295    0.971253
2      0.963238      0.969869  0.928820  0.967716    0.972013
3      0.962795      0.969462  0.929236  0.967644    0.971325
4      0.962388      0.969787  0.928937  0.967734    0.971398
RankResult(rankdf=
              meanrank      mean       std  ci_lower  ci_upper effect_size  \
ExtraTrees         1.0  0.971515  0.000305  0.971242  0.971788           0   
RandomForest       2.0  0.969880  0.000284  0.969607  0.970152     5.55035   
Bagging            3.0  0.967919  0.000306  0.967646  0.968191     11.7816   
DecisionTree       4.0  0.962927  0.000352  0.962654    0.9632     26.0904   
AdaBoost           5.0  0.929020  0.000155  0.928748  0.929293     175.675   

               magnitude  
ExtraTrees    negligible  
RandomForest       large  
Bagging            large  
DecisionTree       large  
AdaBo



In [7]:
X = wine.data
y = wine.target

random_seed = 1234
decision_tree_temp = []
random_forest_temp = []
ada_boost_temp = []
bagging_model_temp = []
extra_trees_temp = []

decision_tree_sd_temp = []
random_forest_sd_temp = []
ada_boost_sd_temp = []
bagging_model_sd_temp = []
extra_trees_sd_temp = []

for i in range(5):
    multi = RepeatedKFold(n_splits = 5, random_state = random_seed)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=random_seed)
    
    # Training the Decision Tree classifier
    
    decision_tree_model = DecisionTreeClassifier(random_state=random_seed)
    decision_tree_model.fit(X_train, y_train)
    
    # Training the Random Forest classifier

    random_forest_model = RandomForestClassifier(n_estimators=10, random_state=random_seed)
    random_forest_model.fit(X_train, y_train)

    # Training Ada Boost classifier 

    ada_boost_model = AdaBoostClassifier(n_estimators=10, random_state=random_seed)
    ada_boost_model.fit(X_train, y_train)

    # Training Bagging Model classifier 

    bagging_model = BaggingClassifier(n_estimators=10, random_state=random_seed)
    bagging_model.fit(X_train, y_train)

    # Training Extra Trees classifier

    extra_trees_model = ExtraTreesClassifier(n_estimators=10, random_state=random_seed).fit(X_train, y_train)
    
    decision_tree_score = cross_val_score(decision_tree_model, X, y, cv=multi)
    random_forest_model_score = cross_val_score(random_forest_model, X, y, cv=multi)
    ada_boost_model_score = cross_val_score(ada_boost_model, X, y, cv=multi)
    bagging_model_score = cross_val_score(bagging_model, X, y, cv=multi)
    extra_trees_score = cross_val_score(extra_trees_model, X, y, cv=multi)
    
    # Append the standard deviation scores to respective temporary lists
    
    decision_tree_sd_temp.append(decision_tree_score.std())
    random_forest_sd_temp.append(random_forest_model_score.std())
    ada_boost_sd_temp.append(ada_boost_model_score.std())
    bagging_model_sd_temp.append(bagging_model_score.std())
    extra_trees_sd_temp.append(extra_trees_score.std())

    
    decision_tree_temp.append(decision_tree_score.mean())
    random_forest_temp.append(random_forest_model_score.mean())
    ada_boost_temp.append(ada_boost_model_score.mean())
    bagging_model_temp.append(bagging_model_score.mean())
    extra_trees_temp.append(extra_trees_score.mean())

    random_seed += 999

wine_means.append(np.mean(decision_tree_temp))
wine_means.append(np.mean(random_forest_temp))
wine_means.append(np.mean(ada_boost_temp))
wine_means.append(np.mean(bagging_model_temp))
wine_means.append(np.mean(extra_trees_temp))


wine_sd.append(np.mean(decision_tree_sd_temp))
wine_sd.append(np.mean(random_forest_sd_temp))
wine_sd.append(np.mean(ada_boost_sd_temp))
wine_sd.append(np.mean(bagging_model_sd_temp))
wine_sd.append(np.mean(extra_trees_sd_temp))

x = {'DecisionTree':decision_tree_temp,
     'RandomForest':random_forest_temp,
     'AdaBoost':ada_boost_temp,
     'Bagging': bagging_model_temp,
     'ExtraTrees': extra_trees_temp,
    }
df = pd.DataFrame (x, columns = ['DecisionTree','RandomForest','AdaBoost','Bagging','ExtraTrees'])
print(df)
result = autorank(df, verbose=False)
print(result)
create_report(result)


   DecisionTree  RandomForest  AdaBoost   Bagging  ExtraTrees
0      0.906857      0.970651  0.887222  0.952286    0.967476
1      0.907429      0.966302  0.891016  0.952794    0.967429
2      0.901111      0.968587  0.889127  0.954476    0.961746
3      0.905667      0.970270  0.881873  0.954413    0.971365
4      0.913968      0.972397  0.882476  0.954905    0.969000
RankResult(rankdf=
              meanrank      mean       std  ci_lower  ci_upper effect_size  \
RandomForest       1.4  0.969641  0.002306   0.96645  0.972832           0   
ExtraTrees         1.6  0.967403  0.003545  0.964212  0.970594    0.748492   
Bagging            3.0  0.953775  0.001157  0.950584  0.956966     8.69694   
DecisionTree       4.0  0.907006  0.004616  0.903815  0.910197     17.1676   
AdaBoost           5.0  0.886343  0.004040  0.883152  0.889534     25.3229   

               magnitude  
RandomForest  negligible  
ExtraTrees        medium  
Bagging            large  
DecisionTree       large  
AdaBo

