# Model comparison - linear model
# Drop PI questions, selection metric on validation set
# Train and holdout sets swapped

Reduced models were obtained by the following procedure:
* drop all questions that make up the sum score PI
* iteratively drop the next question according to the selection metric on the **validation set**
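The reduction loop above can be read as a greedy backward elimination. The following is a minimal sketch under stated assumptions, not the code behind `results_cache` or `mod_evaluation`. `reduce_questions` and the `score` callback are hypothetical. `score(kept)` is assumed to return the selection metric of a model refit on the kept questions and evaluated on the validation set, with higher being better (metrics that are minimized, such as `log_loss` or `mse`, would be passed negated). The stopping rule of one remaining question is also an assumption.

```python
from typing import Callable, List, Sequence, Tuple


def reduce_questions(
    questions: Sequence[str],
    pi_questions: Sequence[str],
    score: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Greedy backward elimination (hypothetical sketch).

    `score(kept)` is an assumed callback returning the selection metric of a
    model refit on `kept` and evaluated on the validation set; higher is better.
    Returns the dropped questions in order, each with the score after dropping it.
    """
    pi = set(pi_questions)
    # Step 1: drop every question that belongs to the PI sum score.
    kept = [q for q in questions if q not in pi]
    path: List[Tuple[str, float]] = []
    # Step 2: drop one question at a time, choosing the drop that keeps the
    # selection metric highest; stop when a single question is left (assumption).
    while len(kept) > 1:
        best_q, best_s = None, float("-inf")
        for q in kept:
            s = score([k for k in kept if k != q])
            if s > best_s:
                best_q, best_s = q, s
        kept.remove(best_q)
        path.append((best_q, best_s))
    return path


# Toy usage: the "metric" is just the sum of hypothetical per-question weights.
weights = {"a": 3.0, "b": 1.0, "c": 2.0}
path = reduce_questions(["a", "b", "c", "d"], ["d"], lambda kept: sum(weights[q] for q in kept))
print(path)  # [('b', 5.0), ('c', 3.0)]
```

Each step refits one model per remaining question, so the full path costs on the order of n² model fits for n non-PI questions.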

Selection metric criteria:
* `ca_min`: maximize the minimum conditional accuracy
* `ca_prod`: maximize the product of conditional accuracies
* `log_loss_max`: minimize the maximum conditional cross-entropy
* `log_loss`: minimize the cross-entropy
* `mse`: minimize the mean squared error
* `mse_max`: minimize the maximum conditional mean squared error
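The six criteria above can be illustrated for a binary classifier. This is a hedged sketch, not the project's implementation. It assumes labels in {0, 1}, predicted probabilities of class 1, a 0.5 decision threshold, and that "conditional" means "computed separately within each true class". The function names `selection_metrics` and `_per_class_means` are hypothetical.

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence


def _per_class_means(y_true: Sequence[int], values: Sequence[float]) -> List[float]:
    """Mean of `values` computed separately within each true class."""
    groups: Dict[int, List[float]] = {}
    for y, v in zip(y_true, values):
        groups.setdefault(y, []).append(v)
    return [fmean(vs) for vs in groups.values()]


def selection_metrics(
    y_true: Sequence[int], p: Sequence[float], eps: float = 1e-15
) -> Dict[str, float]:
    """Hypothetical binary-case versions of the six selection metrics.

    `y_true` holds labels in {0, 1}; `p` holds predicted probabilities of class 1.
    "Conditional" is read as "computed within each true class" (assumption).
    """
    # Per-sample quantities; the 0.5 decision threshold is an assumption.
    correct = [float((q >= 0.5) == (y == 1)) for y, q in zip(y_true, p)]
    cross_entropy = [-math.log(max(eps, q if y == 1 else 1.0 - q)) for y, q in zip(y_true, p)]
    sq_error = [(q - y) ** 2 for y, q in zip(y_true, p)]

    ca = _per_class_means(y_true, correct)  # conditional accuracies
    return {
        "ca_min": min(ca),                                             # maximize
        "ca_prod": math.prod(ca),                                      # maximize
        "log_loss_max": max(_per_class_means(y_true, cross_entropy)),  # minimize
        "log_loss": fmean(cross_entropy),                              # minimize
        "mse": fmean(sq_error),                                        # minimize
        "mse_max": max(_per_class_means(y_true, sq_error)),            # minimize
    }
```

The `*_min`/`*_max` and `ca_prod` variants penalize a model that does well on one class at the expense of the other, which plain `log_loss` and `mse` can hide when classes are imbalanced.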

## Environment initialization

In [None]:
%autosave 0
%matplotlib notebook
%load_ext autoreload
%autoreload 2

import mod_evaluation
import mod_viewer
import results_cache

## Load results

In [None]:
results_folder = 'data/results'

questions_stats, questions_info = results_cache.get_questions(
    results_folder,
    sources=results_cache.sources_rev
)

df_questions, df_questions_ca = results_cache.get_df_questions(
    questions_stats, 
    questions_info
)

stats_holdout = mod_evaluation.from_cache(
    'questions_drop_pi_holdout_train', 
    results_folder
)

df_questions_holdout, df_questions_holdout_ca = results_cache.get_df_questions(
    stats_holdout, 
    questions_info,
    ci=False
)

## Model comparison: mean accuracy on validation set

* Mean accuracy on the validation set for each selection metric
* Confidence intervals estimated by bootstrapping over cross-validation repetitions
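A bootstrap interval over cross-validation repetitions can be sketched with the percentile method. This is a minimal illustration assuming one accuracy value per repetition; `bootstrap_mean_ci` is a hypothetical helper, and the percentile method is an assumption, since the plotting code may use a different bootstrap variant.

```python
import random
from statistics import fmean
from typing import Sequence, Tuple


def bootstrap_mean_ci(
    scores: Sequence[float], n_boot: int = 2000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float, float]:
    """Percentile-bootstrap CI for the mean of per-repetition validation scores.

    `scores` holds one accuracy per cross-validation repetition (assumed layout).
    Returns (mean, lower, upper).
    """
    rng = random.Random(seed)
    n = len(scores)
    # Resample repetitions with replacement and record the mean of each resample.
    boot_means = sorted(fmean(rng.choices(scores, k=n)) for _ in range(n_boot))
    # Percentile method: take the alpha/2 and 1 - alpha/2 quantiles.
    lower = boot_means[int(alpha / 2 * n_boot)]
    upper = boot_means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return fmean(scores), lower, upper


mean, lower, upper = bootstrap_mean_ci([0.91, 0.93, 0.90, 0.92, 0.94])
print(f"mean={mean:.3f}, 95% CI=[{lower:.3f}, {upper:.3f}]")
```

Fixing `seed` keeps the interval reproducible across notebook reruns.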

(clicking on a label adds/removes its trace; double-clicking selects a single trace)

In [None]:
mod_viewer.plot_simple(
    [
        [df_questions[item], item]
        for item in df_questions
    ],
    title='Mean validation accuracy',
    ci=True,
    black_trace=2,
    yaxes_range=[0.8, 1]
)

## Model comparison: mean conditional accuracy on validation set

* Mean conditional accuracy on the validation set for each selection metric
* Confidence intervals estimated by bootstrapping over cross-validation repetitions

(clicking on a label adds/removes its trace; double-clicking selects a single trace)

In [None]:
mod_viewer.tab_plot_conditional_accuracy(
    df_questions,
    df_questions_ca,
    questions_info,
    model_type_name=results_cache.model_type_name
)

## Model validation: mean accuracy on holdout set

* Accuracy on the holdout set for each selection metric
* Holdout accuracy outside the validation confidence interval may indicate (1) model overfitting or (2) a data domain shift
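The check described above reduces to an interval test per reduced model. A minimal sketch: `flag_holdout_outside_ci` and its dictionary inputs (keyed by a hypothetical number of kept questions) are illustrative assumptions and not part of `mod_viewer` or `mod_evaluation`.

```python
from typing import Dict, List, Tuple


def flag_holdout_outside_ci(
    holdout: Dict[int, float], ci: Dict[int, Tuple[float, float]]
) -> List[Tuple[int, str]]:
    """List model sizes whose holdout accuracy falls outside the validation CI.

    Keys are the (hypothetical) number of questions kept in each reduced model;
    `ci[k]` is the (lower, upper) validation bound for that model.
    """
    flags = []
    for k in sorted(holdout, reverse=True):
        if k not in ci:
            continue  # no validation interval for this model size
        lower, upper = ci[k]
        if holdout[k] < lower:
            flags.append((k, "below"))  # candidate for overfitting or domain shift
        elif holdout[k] > upper:
            flags.append((k, "above"))  # outside the bounds on the high side
    return flags


holdout = {10: 0.95, 9: 0.80, 8: 0.91}
ci = {10: (0.90, 0.96), 9: (0.88, 0.93), 8: (0.89, 0.92)}
print(flag_holdout_outside_ci(holdout, ci))  # [(9, 'below')]
```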

(clicking on a label adds/removes its trace; double-clicking selects a single trace)

In [None]:
mod_viewer.tab_plot_descent_methods_validation(
    df_questions, 
    questions_info,
    model_type_name=results_cache.model_type_name,
    stats_holdout=stats_holdout
)

## Model validation: mean conditional accuracy on holdout set

* Conditional accuracy on the holdout set for each selection metric
* Holdout accuracy outside the validation confidence interval may indicate (1) model overfitting or (2) a data domain shift

(clicking on a label adds/removes its trace; double-clicking selects a single trace)

In [None]:
mod_viewer.tab_plot_conditional_accuracy(
    df_questions_holdout,
    df_questions_holdout_ca,
    questions_info,
    model_type_name=results_cache.model_type_name
)