# Stacking upper limit

This notebook estimates the upper limit on performance achievable with degenerate EM stacking in the collaborative experimental setting, with weights fitted on a per-week basis.

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import numpy as np
np.random.seed(1234)

import sys
sys.path.append("../src")

In [21]:
import matplotlib.pyplot as plt
import pandas as pd
import utils.data as udata
import utils.dists as udists
import utils.misc as u
import models
import os
import losses
import yaml

from functools import partial
from jrun import jin
from tqdm import tqdm
from copy import deepcopy

## Setup notebook parameters

In [3]:
EXP_NAME = "collaborative"
data_dir = "../data"
exp_dir = os.path.join(data_dir, "processed", EXP_NAME)

with open("../config.yaml") as fp:
    CONFIG = yaml.safe_load(fp)
    
TEST_SPLIT_THRESH = CONFIG["TEST_SPLIT_THRESH"][EXP_NAME]

COMPONENTS = [udata.Component(exp_dir, name) for name in u.available_models(exp_dir)]
ACTUAL_DL = udata.ActualDataLoader(data_dir)

REGIONS = ["nat", *[f"hhs{i}" for i in range(1, 11)], None]
TARGETS = [udata.Target(t) for t in [1, 2, 3, 4, "peak", "peak_wk", "onset_wk"]]

We will work with target-based weights for now. In the usual setting, degenerate EM estimation goes like this:
1. From the training data, generate `y` with shape (n_obs,) and `Xs` for each target.
2. Compute the log scores of every component on that set to get a score matrix of shape (n_obs, n_models).
3. Pass the exponentiated scores to `models.dem` to get `n_models` weights.
4. Repeat steps 1 to 3 for every target type to end up with `n_models x n_target_type` weights.

After getting the weights, we apply them to the test data for estimation.
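The estimation in step 3 can be sketched as a plain EM loop over fixed components. The internals of `models.dem` are not shown in this notebook, so the function below (`degenerate_em`, with its `n_iter` and `tol` parameters) is a hypothetical stand-in written under the assumption that the input holds the probability each model assigned to the observed truth.

```python
import numpy as np


def degenerate_em(probs, n_iter=1000, tol=1e-8):
    """Estimate mixture weights for fixed components (illustrative sketch).

    probs: array of shape (n_obs, n_models) with the probability each model
    assigned to the observed truth, i.e. the exponentiated log scores.
    """
    probs = np.asarray(probs, dtype=float)
    n_models = probs.shape[1]
    weights = np.full(n_models, 1.0 / n_models)  # start from equal weights

    for _ in range(n_iter):
        # E-step: responsibility of each model for each observation
        weighted = probs * weights
        totals = weighted.sum(axis=1, keepdims=True)
        resp = weighted / np.maximum(totals, np.finfo(float).tiny)
        # M-step: the new weights are the mean responsibilities
        new_weights = resp.mean(axis=0)
        new_weights /= new_weights.sum()
        converged = np.abs(new_weights - weights).max() < tol
        weights = new_weights
        if converged:
            break
    return weights


# Toy data: model 0 assigns more probability to the truth on two of three rows
toy_probs = np.array([[0.6, 0.1], [0.5, 0.2], [0.1, 0.4]])
toy_weights = degenerate_em(toy_probs)
```

The components themselves are never refitted, which is what makes the EM "degenerate": only the mixing weights move.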

Here, instead, we want to check how well the _overfitted_ stacking works, that is, stacking fitted and scored on the same data.

In some ways, this checks whether the components are capable of producing the final result at all. For example, if the truth lies outside the range of predictions from all the components, no set of weights on the components will give us the correct answer.

Even if the components do cover the truth, the ensemble is not guaranteed to recover it exactly: the truth might have been given a consistently low probability by all the components, and adding weight to any component adds weight to _its_ peak too (along with the truth).
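Both points follow from the ensemble being a convex combination of the components. The snippet below is a small numerical illustration on made-up distributions (the `components` array and the bin indices are invented for the example, not taken from the data).

```python
import numpy as np

# Three hypothetical components predicting over 5 bins (each row sums to 1)
components = np.array([
    [0.70, 0.20, 0.10, 0.00, 0.00],
    [0.10, 0.60, 0.20, 0.10, 0.00],
    [0.05, 0.15, 0.70, 0.10, 0.00],
])

rng = np.random.default_rng(0)
# 1000 random weight vectors, each non-negative and summing to 1
weights = rng.dirichlet(np.ones(3), size=1000)
ensembles = weights @ components  # shape (1000, 5)

# Truth in bin 4: no component covers it, so no ensemble can either
uncovered = ensembles[:, 4].max()

# Truth in bin 3: covered, but the ensemble probability can never exceed
# the best single component's probability for that bin
best_covered = ensembles[:, 3].max()
cap = components[:, 3].max()
```

In the second case every weight vector that raises the probability of bin 3 also pulls in mass from the peak of the component it favours, so the score is bounded by how much probability the best component placed on the truth.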

In this notebook, we try to see how well the components cover the truth by overfitting degenerate EM per `target_type` and week.

After this step we will have a weight matrix of shape (n_models, n_target_type, n_week). Using the same training data, we will then compute the log score and examine it.
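As a rough picture of the scoring step, the sketch below combines component predictions with a weight vector and computes a mean categorical cross-entropy. The real `udists.weighted_ensemble` and `losses.mean_cat_cross` are not shown here, so the `_sketch` functions, the `eps` clipping, and the assumption that each entry of `Xs` has shape (n_obs, n_bins) are all guesses about their behaviour.

```python
import numpy as np


def weighted_ensemble_sketch(Xs, weights):
    """Combine per-model predictions of shape (n_obs, n_bins) using model weights."""
    stacked = np.stack(Xs, axis=0)  # (n_models, n_obs, n_bins)
    return np.tensordot(weights, stacked, axes=1)  # (n_obs, n_bins)


def mean_cat_cross_sketch(y_one_hot, output, eps=1e-12):
    """Mean categorical cross-entropy: the average of -log p(true bin)."""
    p_true = (y_one_hot * output).sum(axis=1)
    return float(-np.log(np.clip(p_true, eps, 1.0)).mean())


# Two hypothetical models, two observations, two bins
Xs = [
    np.array([[0.8, 0.2], [0.3, 0.7]]),
    np.array([[0.4, 0.6], [0.5, 0.5]]),
]
y_one_hot = np.array([[1, 0], [0, 1]])

output = weighted_ensemble_sketch(Xs, np.array([0.5, 0.5]))
score = mean_cat_cross_sketch(y_one_hot, output)
```

If `losses.mean_cat_cross` is the usual categorical cross-entropy, it is the negative of the mean log score, so lower values indicate better coverage of the truth.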

In [4]:
def get_weeks(yi):
    """Return the sorted unique week identifiers from the first column of `yi`."""
    return np.unique(np.array(yi[:, 0], dtype=np.uint))

In [16]:
def do_target(target, final_scores):
    """Overfit degenerate EM per week for `target` and record the mean score.

    Weights are estimated and scored on the same rows, so the result is an
    optimistic bound rather than an out-of-sample estimate.
    """
    for region in [None]:
        y, Xs, yi = target.get_training_data(
            ACTUAL_DL, COMPONENTS, region, TEST_SPLIT_THRESH
        )
        y_one_hot = udists.actual_to_one_hot(y, bins=target.bins)
        scores = udists.score_predictions(Xs, y)
        weeks = get_weeks(yi)

        e_scores = []

        for week in tqdm(weeks):
            # Rows belonging to this week. TODO: Add filter for region when used
            f_indices = (yi[:, 0] == week)
            Xs_f = [X[f_indices] for X in Xs]
            # Estimation step: fit weights on this week's exponentiated scores
            weights_f = models.dem(np.exp(scores[f_indices]))
            # Score the weighted ensemble on the same rows it was fitted on
            output_f = udists.weighted_ensemble(Xs_f, weights_f)
            e_scores.append(losses.mean_cat_cross(y_one_hot[f_indices], output_f))

        final_scores["region"].append(region if region else "all")
        final_scores["target"].append(target.name)
        # Average the per-week scores into a single number per target
        final_scores["score"].append(np.mean(e_scores))
    return final_scores

In [14]:
final_scores = {
    "region": [],
    "target": [],
    "score": []
} # This will contain all the scores

for target in TARGETS:
    print(f"Working on target {target.name}")
    final_scores = do_target(target, final_scores)

Working on target 1

100%|██████████| 134/134 [00:07<00:00, 18.62it/s]

Working on target 2

100%|██████████| 134/134 [00:06<00:00, 19.93it/s]

Working on target 3

100%|██████████| 134/134 [00:07<00:00, 18.12it/s]

Working on target 4

100%|██████████| 134/134 [00:08<00:00, 17.11it/s]

Working on target peak

100%|██████████| 134/134 [00:08<00:00, 11.85it/s]

Working on target peak_wk

100%|██████████| 134/134 [00:11<00:00, 11.53it/s]

Working on target onset_wk

100%|██████████| 134/134 [00:09<00:00, 13.94it/s]


In [20]:
pd.DataFrame(final_scores)

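|   | region | score | target |
|---|--------|----------|----------|
| 0 | all | 2.531054 | 1 |
| 1 | all | 2.582795 | 2 |
| 2 | all | 2.690632 | 3 |
| 3 | all | 2.793264 | 4 |
| 4 | all | 2.669791 | peak |
| 5 | all | 1.385797 | peak_wk |
| 6 | all | 1.179273 | onset_wk |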
