# Model comparison
# Drop PI questions with selection metric on train set

Reduced models were obtained by the following procedure:
* drop all questions from the sum score PI
* drop the next question according to the selection metric on the **train set**
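
The second step can be read as a greedy backward elimination. The sketch below is only an illustration under that assumption: `backward_elimination` and the `score` callable are hypothetical names, not part of `mod_evaluation`, and `score` stands in for "fit the model on the train set with these questions and return the selection metric".

```python
from typing import Callable, List, Sequence, Tuple


def backward_elimination(
    questions: Sequence[str],
    score: Callable[[List[str]], float],
    higher_is_better: bool = True,
) -> List[Tuple[str, float]]:
    """Greedily drop one question at a time.

    At each step, every remaining question is tentatively removed and the
    removal that gives the best metric value is kept. Returns the dropped
    questions in order, each with the metric value after the drop.
    """
    remaining = list(questions)
    pick = max if higher_is_better else min
    history = []
    while len(remaining) > 1:
        candidates = []
        for q in remaining:
            subset = [r for r in remaining if r != q]
            candidates.append((score(subset), q))
        best_score, dropped = pick(candidates, key=lambda c: c[0])
        remaining.remove(dropped)
        history.append((dropped, best_score))
    return history


# Toy metric (hypothetical): each question has a fixed usefulness and the
# score of a subset is the sum of the usefulness of its questions.
usefulness = {"q1": 5, "q2": 1, "q3": 3}
order = backward_elimination(
    list(usefulness), lambda qs: sum(usefulness[q] for q in qs)
)
print(order)
```

With a metric that is minimized (for example `mse` or `xent`), the same loop would be called with `higher_is_better=False`.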

Selection metric criteria:
* `ca`: max of mean conditional accuracy
* `ca_class`: max of min conditional accuracy
* `ca_prod`: max of product conditional accuracy
* `mse`: min of mean square error
* `mse_class`: min of max conditional mean square error
* `xent`: min of cross-entropy
* `xent_class`: min of max cross-entropy
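
As a rough illustration of the quantities behind these criteria, the sketch below computes them for a binary classifier. It assumes that "conditional" means "conditioned on the true class" (per-class accuracy, MSE and cross-entropy) and that labels are 0/1 with `proba` the predicted probability of class 1. The function `selection_metrics` is hypothetical and may differ from what `mod_evaluation` actually computes.

```python
import numpy as np


def selection_metrics(y_true, proba):
    """Aggregate per-class metrics for 0/1 labels and class-1 probabilities."""
    y_true = np.asarray(y_true)
    # Clip to avoid log(0) in the cross-entropy terms.
    proba = np.clip(np.asarray(proba, dtype=float), 1e-12, 1 - 1e-12)
    y_pred = (proba >= 0.5).astype(int)
    # Probability assigned to the true class of each sample.
    p_true = np.where(y_true == 1, proba, 1 - proba)

    ca, mse, xent = [], [], []
    for c in np.unique(y_true):
        mask = y_true == c
        ca.append(np.mean(y_pred[mask] == c))        # accuracy given class c
        mse.append(np.mean((proba[mask] - c) ** 2))  # MSE given class c
        xent.append(-np.mean(np.log(p_true[mask])))  # cross-entropy given class c

    return {
        "ca": float(np.mean(ca)),          # mean conditional accuracy
        "ca_class": float(np.min(ca)),     # worst-class accuracy
        "ca_prod": float(np.prod(ca)),     # product of conditional accuracies
        "mse": float(np.mean((proba - y_true) ** 2)),
        "mse_class": float(np.max(mse)),   # worst-class MSE
        "xent": float(-np.mean(np.log(p_true))),
        "xent_class": float(np.max(xent)), # worst-class cross-entropy
    }


example = selection_metrics([0, 0, 1, 1], [0.1, 0.6, 0.8, 0.9])
print(example)
```

The `*_class` variants focus on the worst-performing class, which can matter when classes are imbalanced.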

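Reference: Guyon and Elisseeff (2003), "An Introduction to Variable and Feature Selection", JMLR 3: http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf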

## Environment initialization

In [1]:
%autosave 0
%matplotlib notebook
%load_ext autoreload
%autoreload 2

import ipywidgets as widgets
import plotly.graph_objects as go

import sys
sys.path.append("../")

import mod_evaluation
import mod_viewer

Autosave disabled


## Execution params

In [2]:
results_path = 'data/results'

model_ref_id = 'linear'

n_splits = 25

metrics = mod_evaluation.sort_params_train

train_val_random = None

In [3]:
cache_pre = 'model_'+model_ref_id
cache_post = str(n_splits)

if train_val_random is not None:
    cache_post += '_r'+str(train_val_random)

## Load results

In [4]:
from copy import deepcopy

# Group cached result files by run type, encoded as '_0_' or '_1_' in the name.
available = {0: [], 1: []}

for item in mod_evaluation.list_cache(results_path):
    if '_0_' in item:
        available[0].append(item)
    if '_1_' in item:
        available[1].append(item)

my_data = {}

for run_type in available:
    
    info, stats, stats_val = {}, {}, {}

    for item in available[run_type]:

        for metric_id in metrics:

            cache_sig = mod_evaluation.cache_sig_gen(
                metric_id, 
                cache_pre=cache_pre, 
                cache_post=cache_post
            )
    
            if cache_sig in item:
                
                run_id = item.split('_')[-1]

                cache = mod_evaluation.from_cache(
                    item, 
                    results_path
                )
                
                # Create the per-metric containers on first use,
                # then store every run under its run id.
                if metric_id not in info:
                    info[metric_id] = {}
                    stats[metric_id] = {}
                    stats_val[metric_id] = {}

                # Keep only reduced models ('<n> + ...'), keyed by <n>.
                info[metric_id][run_id] = {
                    int(model.split(' + ')[0]): cache['info'][model]
                    for model in cache['info']
                    if len(model.split(' + '))>1
                }
                stats[metric_id][run_id] = {
                    int(model.split(' + ')[0]): cache['stats'][model]
                    for model in cache['stats']
                    if len(model.split(' + '))>1
                }
                stats_val[metric_id][run_id] = {
                    int(model.split(' + ')[0]): cache['stats_val'][model]
                    for model in cache['stats_val']
                    if len(model.split(' + '))>1
                }

    my_data[run_type] = [deepcopy(info), deepcopy(stats), deepcopy(stats_val)]
    
    print('Loaded', run_type)

Loaded 0
Loaded 1


In [5]:
my_data_flat = {}

for run_type in my_data:
    
    info, stats, stats_val = my_data[run_type]
    info_flat, stats_flat, stats_val_flat = {}, {}, {}

    for metric_id in info:
        
        info_flat[metric_id] = deepcopy(info[metric_id]['r0'])
        stats_flat[metric_id] = deepcopy(stats[metric_id]['r0'])
        stats_val_flat[metric_id] = deepcopy(stats_val[metric_id]['r0'])
        
        for run_id in info[metric_id]:
            
            if run_id=='r0':
                continue
                
            for model_id in info[metric_id][run_id]:

                stats_flat[metric_id][model_id] += stats[metric_id][run_id][model_id]
                stats_val_flat[metric_id][model_id] += stats_val[metric_id][run_id][model_id]
                
    my_data_flat[run_type] = [deepcopy(info_flat), deepcopy(stats_flat), deepcopy(stats_val_flat)]

In [6]:
df_questions_0, df_questions_val_0 = mod_evaluation.get_df_questions(
    my_data_flat[0][0], my_data_flat[0][1], my_data_flat[0][2],
    ci=True
)

df_questions_1, df_questions_val_1 = mod_evaluation.get_df_questions(
    my_data_flat[1][0], my_data_flat[1][1], my_data_flat[1][2],
    ci=True
)

df_questions_ca_0, df_questions_val_ca_0 = mod_evaluation.get_df_questions_ca(
    my_data_flat[0][0], my_data_flat[0][1], my_data_flat[0][2],
    ci=False
)

df_questions_ca_1, df_questions_val_ca_1 = mod_evaluation.get_df_questions_ca(
    my_data_flat[1][0], my_data_flat[1][1], my_data_flat[1][2],
    ci=False
)

# Model comparison

## Mean accuracy on validation set (cross-validation)

* Mean accuracy on validation set (cross-validation) according to selection metric
* Confidence interval estimated by bootstrap method over cross-validation repetitions
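
For orientation, a percentile bootstrap over per-repetition accuracies could look like the sketch below. This is an assumption about the method, not the code used by `mod_evaluation.get_df_questions`: `bootstrap_ci`, its parameters and the sample accuracies are all hypothetical.

```python
import numpy as np


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample the repetitions with replacement, n_boot times.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return values.mean(), lower, upper


# Hypothetical accuracies, one per cross-validation repetition.
fold_accuracies = [0.81, 0.79, 0.84, 0.80, 0.77, 0.83, 0.82, 0.78]
mean, lower, upper = bootstrap_ci(fold_accuracies)
print(f"mean={mean:.3f}, CI=({lower:.3f}, {upper:.3f})")
```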

(clicking on labels adds/removes traces, double-clicking selects single trace)

* `ca_class`: max of min conditional accuracy
* `mse_class`: min of max conditional mean square error

Figures: **train** set (top), **holdout set** (bottom)

In [7]:
display(mod_viewer.plot_accuracy_mse(df_questions_1))

display(mod_viewer.plot_accuracy_mse(df_questions_0))


## Mean conditional accuracy on validation set (cross-validation)

* Mean conditional accuracy on validation set (cross-validation) according to selection metric
* Confidence interval estimated by bootstrap method over cross-validation repetitions

(clicking on labels adds/removes traces, double-clicking selects single trace)

Figures: **train** set (top), **holdout set** (bottom)

In [8]:
display(mod_viewer.tab_plot_conditional_accuracy(
    df_questions_1,
    df_questions_ca_1,
    info
))

display(mod_viewer.tab_plot_conditional_accuracy(
    df_questions_0,
    df_questions_ca_0,
    info
))


# Model validation

## Mean accuracy on holdout set

* Accuracy on holdout set according to selection metric
* Holdout accuracy outside confidence interval bounds may indicate (1) model overfitting or (2) data domain shift
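
The check described above can be sketched as an element-wise comparison. The function `flag_outside_ci` and the numbers are hypothetical, with one entry per reduced model.

```python
import numpy as np


def flag_outside_ci(holdout_acc, ci_lower, ci_upper):
    """Return True where the holdout accuracy lies outside [ci_lower, ci_upper]."""
    holdout_acc = np.asarray(holdout_acc, dtype=float)
    ci_lower = np.asarray(ci_lower, dtype=float)
    ci_upper = np.asarray(ci_upper, dtype=float)
    return (holdout_acc < ci_lower) | (holdout_acc > ci_upper)


# Hypothetical values for three reduced models.
flags = flag_outside_ci(
    holdout_acc=[0.80, 0.70, 0.90],
    ci_lower=[0.75, 0.74, 0.76],
    ci_upper=[0.85, 0.84, 0.86],
)
print(flags)
```

A flagged model is a candidate for closer inspection; the flag alone does not distinguish overfitting from domain shift.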

(clicking on labels adds/removes traces, double-clicking selects single trace)

Figures: **train** set (top), **holdout set** (bottom)

In [9]:
display(mod_viewer.tab_plot_accuracy(
    df_questions_1,
    info,
    df_questions_holdout=df_questions_val_1
))

display(mod_viewer.tab_plot_accuracy(
    df_questions_0,
    info,
    df_questions_holdout=df_questions_val_0
))


## Mean conditional accuracy on holdout set

* Conditional accuracy on holdout set according to selection metric
* Holdout accuracy outside confidence interval bounds may indicate (1) model overfitting or (2) data domain shift

(clicking on labels adds/removes traces, double-clicking selects single trace)

Figures: **train** set (top), **holdout set** (bottom)

In [10]:
display(mod_viewer.tab_plot_conditional_accuracy(
    df_questions_val_1,
    df_questions_val_ca_1,
    info,
    holdout=True
))

display(mod_viewer.tab_plot_conditional_accuracy(
    df_questions_val_0,
    df_questions_val_ca_0,
    info,
    holdout=True
))
