## Background


We often have to compare parameter estimates across several versions of a model.

- The same model estimated with several estimators (OLS, IV, GMM, ...)
- A model estimated by numerical optimization with different optimizers and/or different start values for the optimization
- Monte Carlo exercises

Especially in large models (100 or more parameters), this is time-consuming. Therefore, we need a plot that makes it easier to see:

- how large the differences in parameter estimates are between different models
- whether the confidence intervals of one model contain the parameter estimates of other models
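Both questions can also be answered numerically before any plotting. The following is a minimal pandas sketch under assumptions: the column names match the ones used further down in this notebook, while the toy data and the helper names `estimate_range` and `coverage` are made up for illustration.

```python
import pandas as pd

# Toy data, invented for illustration; columns follow the notebook's naming.
toy = pd.DataFrame({
    "model": ["m0", "m1", "m2", "m0", "m1", "m2"],
    "param_name": ["age"] * 3 + ["grade"] * 3,
    "param_value": [1.0, 1.2, 2.5, 4.0, 4.1, 3.9],
    "conf_int_lower": [0.5, 0.6, 2.0, 3.5, 3.6, 3.4],
    "conf_int_upper": [1.5, 1.8, 3.0, 4.5, 4.6, 4.4],
})


def estimate_range(df):
    """Largest minus smallest estimate of each parameter across models."""
    grouped = df.groupby("param_name")["param_value"]
    return grouped.max() - grouped.min()


def coverage(df):
    """Count, per (parameter, model), how many other models' estimates
    fall inside this model's confidence interval."""
    # Pair every row with the estimates of all models for the same parameter.
    others = df[["param_name", "model", "param_value"]]
    pairs = df.merge(others, on="param_name", suffixes=("", "_other"))
    # Drop the pairing of a model with itself.
    pairs = pairs[pairs["model"] != pairs["model_other"]]
    inside = pairs["param_value_other"].between(
        pairs["conf_int_lower"], pairs["conf_int_upper"]
    )
    return inside.groupby([pairs["param_name"], pairs["model"]]).sum()


print(estimate_range(toy))
print(coverage(toy))
```

A count of zero in `coverage` flags a model whose interval excludes every other model's estimate, which is exactly the kind of case the plot should make visible at a glance.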

If we plot every parameter estimate and every confidence interval at once, the plot becomes too crowded to read. Therefore, we need an interactive plot that shows only the part we currently want to look at. Probably the best library for this is [bokeh](https://bokeh.pydata.org/en/latest/index.html), but if you find another one, you can use that instead. My first strategy would be to use something like [this](https://bokeh.pydata.org/en/latest/docs/gallery/elements.html) to plot the estimates for one parameter across models. The official [tutorials](https://hub.mybinder.org/user/bokeh-bokeh-notebooks-1rrayuuy/notebooks/tutorial/00%20-%20Introduction%20and%20Setup.ipynb) also explain how to make such plots interactive, how to stack several plots into one figure, and how to link the subplots within one figure.

You should start by writing a very basic function that takes only a data dictionary as its argument and produces the plot I described in our last meeting. Later we will add more arguments for colors, background styles, etc.
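As a possible starting point, here is a minimal sketch of such a function for a single parameter. The plot from the meeting is not specified in this notebook, so the layout (one categorical tick per model, a vertical line for the confidence interval, a marker for the estimate) is an assumption, as are the function name `one_parameter_plot`, the dictionary keys, and the example numbers.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure


def one_parameter_plot(data, title="Estimates across models"):
    """Plot one parameter's estimate and confidence interval per model.

    `data` is a dict of equal-length lists with the keys 'model',
    'param_value', 'conf_int_lower' and 'conf_int_upper' (assumed layout).
    """
    source = ColumnDataSource(data=data)
    fig = figure(
        x_range=list(data["model"]),  # categorical axis, one tick per model
        title=title,
        # Passing tooltips adds a hover tool that shows these fields.
        tooltips=[("model", "@model"), ("estimate", "@param_value")],
    )
    # Vertical line from the lower to the upper confidence bound.
    fig.segment(
        x0="model", y0="conf_int_lower",
        x1="model", y1="conf_int_upper",
        source=source,
    )
    # Marker at the point estimate.
    fig.scatter(x="model", y="param_value", size=10, source=source)
    return fig


# Made-up numbers, only to exercise the function.
example = {
    "model": ["model_0", "model_1", "model_2"],
    "param_value": [1.3, 1.1, 1.8],
    "conf_int_lower": [0.4, 0.5, 1.0],
    "conf_int_upper": [2.3, 1.9, 2.6],
}
fig = one_parameter_plot(example)
```

In a notebook, calling `show(fig)` (from `bokeh.plotting`) after `output_notebook()` should render the figure inline. Returning the figure instead of showing it inside the function keeps the door open for stacking several such figures later.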

Please just work in this notebook.

In [8]:
import pandas as pd
import numpy as np
from comparison_plot import comparison_plot
from bokeh.plotting import output_notebook
from bokeh.palettes import Category20
output_notebook()

In [11]:
nr_models = 10
nr_intercepts = 5

intercept_tuples = [str(i) for i in range(nr_intercepts)]
short_controls = ['age', 'grade', 'iq_score', 'gender']
long_controls = short_controls + [
    'parent_educ', 'parent_inc', 'parent_occup', 'dist_to_school', 
    'single_parent', 'single_parent_x_parent_inc']

cols = [
    'model', 'param_value', 'param_name',
    'conf_int_lower', 'conf_int_upper', 'param_group', 
    'widget_group', 'color']

frames = []  # one DataFrame per model, concatenated once after the loop

for i in range(nr_models):
    # The first 70% of the models are "small", the rest are "large".
    if i < 0.7 * nr_models:
        ctrl_names = short_controls
        factor = 0.5
        color = Category20[20][0]
        widget_label = "small models"
    else:
        ctrl_names = long_controls
        factor = 0.7
        color = Category20[20][2]
        widget_label = "large models"
    nr_params = nr_intercepts + len(ctrl_names)

    df = pd.DataFrame(columns=cols)
    df['param_value'] = factor * np.arange(nr_params) + np.random.normal(0, 1, nr_params)
    df['param_name'] = intercept_tuples + ctrl_names
    # np.abs keeps the half-widths non-negative, so lower <= estimate <= upper.
    df['conf_int_lower'] = df['param_value'] - np.abs(np.random.normal(1, factor, nr_params))
    df['conf_int_upper'] = df['param_value'] + np.abs(np.random.normal(1, factor, nr_params))
    df['param_group'] = ['intercept'] * nr_intercepts + [r'$\beta$'] * len(ctrl_names)
    df['model'] = 'model_' + str(i)
    df['color'] = color
    df['widget_group'] = widget_label

    frames.append(df)

# ignore_index=True gives a fresh 0..n-1 index, replacing the reset_index call.
model_comp_df = pd.concat(frames, axis=0, sort=False, ignore_index=True)
model_comp_df = model_comp_df[
    ['model', 'param_group', 'param_name',  'color', 'widget_group',
     'conf_int_lower', 'param_value', 'conf_int_upper']]
model_comp_df[:9]

| | model | param_group | param_name | color | widget_group | conf_int_lower | param_value | conf_int_upper |
|---|---|---|---|---|---|---|---|---|
| 0 | model_0 | intercept | 0 | #1f77b4 | small models | -1.829235 | -0.656261 | 0.321222 |
| 1 | model_0 | intercept | 1 | #1f77b4 | small models | -0.776941 | 0.696757 | 1.954641 |
| 2 | model_0 | intercept | 2 | #1f77b4 | small models | -0.247585 | 1.386436 | 2.115413 |
| 3 | model_0 | intercept | 3 | #1f77b4 | small models | 1.195155 | 1.760466 | 3.754531 |
| 4 | model_0 | intercept | 4 | #1f77b4 | small models | 1.838701 | 2.131281 | 3.131246 |
| 5 | model_0 | $\beta$ | age | #1f77b4 | small models | 0.442168 | 1.288789 | 2.25321 |
| 6 | model_0 | $\beta$ | grade | #1f77b4 | small models | 2.266581 | 4.102435 | 5.52847 |
| 7 | model_0 | $\beta$ | iq_score | #1f77b4 | small models | 1.404215 | 2.734055 | 3.736799 |
| 8 | model_0 | $\beta$ | gender | #1f77b4 | small models | 4.428613 | 4.689733 | 5.039652 |


In [12]:
comparison_plot(df=model_comp_df)