# Questions this notebook seeks to answer
This notebook is motivated by two questions:
* The *utility score* is defined over a range of days, e.g. the date range of the training dataset, or of the mock test / public / private dataset. If we take a more granular look by sliding, say, a 100-day or 200-day window across the training set, how does the *utility score* evolve over time?
* How does that evolution compare between ```resp```, ```resp_1```, ```resp_2```, ```resp_3``` and ```resp_4```?

This notebook calculates the *utility score* as defined under the competition [evaluation tab](https://www.kaggle.com/c/jane-street-market-prediction/overview/evaluation): 
For each ```date``` $i$, $p_i$ sums over that day's trades $j$; $t$ and $u$ are then computed across all $\mid i \mid$ dates:

$ p_i = \sum_j (weight_{ij} * resp_{ij} * action_{ij}) $

$ t = \frac{\sum p_i}{\sqrt{\sum p_i^2}} * \sqrt{\frac{250}{\mid i \mid}} $

$ u = min(max(t, 0), 6) \sum p_i $

This is a follow-on notebook from [Day 85 before vs after: a look at utility score](https://www.kaggle.com/marychin/day-85-before-vs-after-a-look-at-utility-score).
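
As a minimal sketch of the three formulas above, the function below computes $u$ from a plain array of daily values. The name `utility_score` and the numbers in `toy_p` are hypothetical and only for illustration; they are not taken from the competition data.

```python
import numpy as np

def utility_score(p):
    """Utility score u computed from an array of daily values p_i."""
    p = np.asarray(p, dtype=float)
    # t: sum of p_i over its root-sum-of-squares, annualised with sqrt(250 / number of days)
    t = p.sum() / np.sqrt((p ** 2).sum()) * np.sqrt(250 / len(p))
    # u: t clipped to the range [0, 6], multiplied by the sum of p_i
    return min(max(t, 0), 6) * p.sum()

# hypothetical daily values, for illustration only
toy_p = [1.0, -0.5, 2.0, 0.5]
print(utility_score(toy_p))
```

The clip means that a negative $t$ yields a score of zero, while any $t$ above 6 contributes no more than a factor of 6.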

In [None]:
import sys
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from ipywidgets import interact, widgets
from datetime import datetime
from pytz import timezone
print('tic', datetime.now(timezone('Canada/Pacific')).isoformat(timespec='minutes'))

In [None]:
train = pd.read_csv('../input/jane-street-market-prediction/train.csv')

# just slimming down

# remove rows we don't need
train = train.loc[ train['weight']>0 ]

# remove columns we don't need
train = train[ ['resp', 'resp_1', 'resp_2', 'resp_3', 'resp_4', 'date', 'weight'] ]

targets = ['resp', 'resp_1', 'resp_2', 'resp_3', 'resp_4']

# Step 1: $ p_i = \sum_j (weight_{ij} * resp_{ij} * action_{ij}) $
* I'm going to use ```dailyp``` to represent $p_i$. Not using ```pi```, which would feel like $\pi$.
* The training data has no ```action``` column, so the next cell assumes ```action```=1 whenever the target is positive and ```action```=0 otherwise.
* The next cell was already sanity-checked in the preceding notebook, [Day 85 before vs after: a look at utility score](https://www.kaggle.com/marychin/day-85-before-vs-after-a-look-at-utility-score).
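
To make the Step 1 formula concrete, here is a small sketch on made-up data. The frame `toy` and its values are hypothetical, and the ```action``` column is an assumption (1 when ```resp``` is positive, 0 otherwise) rather than anything supplied by the competition.

```python
import pandas as pd

# hypothetical toy trades, for illustration only
toy = pd.DataFrame({
    'date':   [0, 0, 0, 1, 1],
    'weight': [1.0, 2.0, 0.5, 1.0, 3.0],
    'resp':   [0.01, -0.02, 0.04, -0.03, 0.02],
})

# assumed action: trade (1) only when resp is positive, otherwise pass (0)
toy['action'] = (toy['resp'] > 0).astype(int)

# p_i = sum over trades j on date i of weight * resp * action
toy_dailyp = (toy['weight'] * toy['resp'] * toy['action']).groupby(toy['date']).sum()
print(toy_dailyp)
```

Because the assumed action zeroes out every losing trade, each daily value is non-negative by construction.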

In [None]:
dailyp = pd.DataFrame(index=train['date'].unique(), columns=targets)
dailyp.index.name = 'date'
for target in targets:
    # assuming action=1 when target>0 (and action=0 otherwise)
    df = train.loc[ train[target]>0 ]
    # sum weight * target per date; assignment aligns on the date index
    dailyp[target] = (df['weight'] * df[target]).groupby(df['date'], sort=False).sum()
dailyp

# Step 2: $ t = \frac{\sum p_i}{\sqrt{\sum p_i^2}} * \sqrt{\frac{250}{\mid i \mid}} $

In [None]:
t = dailyp.apply(lambda x: x.sum() / np.sqrt((x**2).sum()) * np.sqrt(250/len(x)))
t

# Step 3: $ u = min(max(t, 0), 6) \sum p_i $
The values of ```t``` from the previous cell all reach the cap of 6, so the clip $min(max(t, 0), 6)$ always evaluates to 6 and ```t``` has no further effect on the score.
So, effectively $ u = 6\sum p_i $.
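
A rough way to see why the clip saturates, assuming every $p_i \ge 0$ (which follows from taking ```action```=1 only when the target is positive): in the idealised case where all $p_i$ equal the same constant $c > 0$ over $n$ days,

$ t = \frac{nc}{\sqrt{nc^2}} \sqrt{\frac{250}{n}} = \sqrt{n} \sqrt{\frac{250}{n}} = \sqrt{250} \approx 15.8 $

which is well above 6 regardless of $n$. Real daily values are not constant, so the actual $t$ is lower than this idealised figure, but the $p_i$ would have to be very uneven to pull it below 6.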

# Utility score for the entire training set

In [None]:
util_all = 6 * dailyp.sum()
ax = util_all.plot.bar(ylabel='utility score', grid=True)
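# format ticks through a formatter so the labels stay in sync with the tick positions
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, pos: '{:,.0f}'.format(x)))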
util_all

# Sliding-window utility scores

In [None]:
window = 150     # CHANGE TO DESIRED WINDOW WIDTH
window = max(100, min(window, len(dailyp)-2) )
util = pd.DataFrame(index=np.arange(len(dailyp)-window), columns=targets, dtype=float)
for target in targets:
    for head in range(len(dailyp)-window):
        # .loc slices by date label and includes both endpoints
        dailyps = dailyp.loc[head:head+window, target]
        # the annualisation factor sits outside the square root of the sum of squares
        t = dailyps.sum() / np.sqrt((dailyps**2).sum()) * np.sqrt(250/len(dailyps))
        if min(max(t, 0), 6)!=6:
            print('t kicking into effect: the factor of 6 below no longer holds')
        util.loc[head, target] = 6 * dailyps.sum()
util.index.name = 'start date'
ax = util.plot(ylabel='sliding-window utility scores', grid=True, figsize=(12, 7))
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, pos: '{:,.0f}'.format(x)))
util
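
The double loop above could also be written with rolling sums. The sketch below is one possible version on made-up data, not the notebook's own method: `rolling_utility`, `span` and `toy_dailyp` are hypothetical names, and unlike the cell above it applies the clip on $t$ directly instead of hard-coding the factor of 6. Since ```.loc[head:head+window]``` includes both endpoints, `span = window + 1` would be the matching setting, assuming the dates are consecutive integers.

```python
import numpy as np
import pandas as pd

def rolling_utility(dailyp, span):
    """Utility score over every run of `span` consecutive rows of `dailyp`."""
    sum_p = dailyp.rolling(span).sum()
    sum_p2 = (dailyp ** 2).rolling(span).sum()
    t = sum_p / np.sqrt(sum_p2) * np.sqrt(250 / span)
    util = t.clip(lower=0, upper=6) * sum_p
    # rolling() labels each window by its last row; drop the incomplete
    # leading windows and relabel each window by its first row instead
    util = util.iloc[span - 1:]
    util.index = dailyp.index[:len(util)]
    return util

# hypothetical daily values, for illustration only
rng = np.random.default_rng(0)
toy_dailyp = pd.DataFrame(rng.random((10, 2)), columns=['resp', 'resp_1'])
toy_util = rolling_utility(toy_dailyp, span=4)
print(toy_util)
```

This avoids the Python-level loop over start dates, which may matter little for 500 days but keeps the window length and the clip in one place.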

In [None]:
# sanity check: recompute one randomly chosen window by hand
pick_date = np.random.choice(util.index)   # only start dates that exist in util
dailyps = dailyp.loc[ pick_date  :pick_date+window, 'resp_3' ]
auto = util.loc[pick_date, 'resp_3']
manual = 6 * dailyps.sum()
np.testing.assert_allclose(auto, manual)
manual, auto

In [None]:
print('toc', datetime.now(timezone('Canada/Pacific')).isoformat(timespec='minutes') )