# What is the optimal weighting policy to maximize the score?
By @marketneutral

This kernel is a companion to the discussion [here](https://www.kaggle.com/c/two-sigma-financial-news/discussion/71996). In short, **given a model forecast of the target in this competition, what is the optimal policy to choose $\hat{y}_{ti}$ to maximize the score?** 

The scoring function is defined as

$$x_t = \sum_i \hat{y}_{ti}  r_{ti}  u_{ti}$$
$$\text{score} = \frac{\bar{x}_t}{\sigma(x_t)}$$

where $\hat{y}_{ti}$ is the submission per asset per day, $r_{ti}$ is the forward realization (unknown at time $t$) of the 10-day market-residualized return, and $u_{ti}$ is an indicator variable, $\{0,1\}$, that indicates whether the return for that asset on that day counts toward the score.
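
To make the definition concrete, here is a minimal sketch of the score for a whole (days x assets) panel. The function name `competition_score` and the toy numbers are mine, purely for illustration, and I use the population standard deviation (`np.std` default) as an assumption.

```python
import numpy as np

def competition_score(y_hat, r, u):
    """mean(x_t) / std(x_t), with x_t = sum_i y_hat[t, i] * r[t, i] * u[t, i]."""
    x = np.sum(y_hat * r * u, axis=1)  # one x_t per day (row)
    return np.mean(x) / np.std(x)

# Hypothetical toy panel: 3 days, 2 assets
y_hat = np.array([[1.0, -1.0], [0.5, 0.5], [1.0, 1.0]])     # submitted values
r = np.array([[0.02, -0.01], [0.04, 0.00], [0.01, 0.03]])   # realized returns
u = np.array([[1, 1], [1, 0], [1, 1]])                      # asset 2 is excluded on day 2

toy_score = competition_score(y_hat, r, u)
print(toy_score)
```

Here the daily values are $x = (0.03, 0.02, 0.04)$, so the score is the mean of those divided by their standard deviation.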

In the leak-that-wasn't-a-leak, some kernels (like [this one](https://www.kaggle.com/pennacchio/env-var07)) showed that to maximize the score, **if you know $r_{ti}$ with certainty for all $i,t$, you set $\hat{y}_{ti}$ proportional to $1/r_{ti}$.** The apparent paradox here is that stocks with **larger** returns get **lower** "portfolio" weights. The confusing thing about this result is that the score is like the Information Ratio of a theoretical portfolio with weights $\propto \hat{y}_{ti}$, and it seems odd to construct a portfolio where, all other things being equal, an asset with a higher return gets a lower weight.

**In making model submissions for this competition therefore, should one choose  $\hat{y}_{ti}$ proportional to the inverse of the model predicted confidence?** 

In this kernel, I argue that setting the weights proportional to $1/r$ maximizes the *ex-post* Information Ratio, but setting the weights proportional to $1/\hat{r}$, where $\hat{r}$ is your model prediction at time $t$, is significantly sub-optimal for maximizing the *ex-ante* Information Ratio. In other words, **NO**, don't do this!

Why is this?

# Bias, Variance, and Noise

Following the nice write-up [here](https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff), we assume a data generating process, $y = f(x) + \epsilon$ where $y$ are the labels we have in training and what we wish to predict. When we model, we propose a model $\hat{f}(x)$ to minimize some evaluation metric on the distance between $f(x)$ and $\hat{f}(x)$. To do this we could minimize the MSE between these two. In out-of-sample prediction, we can decompose the **expected error** as:

$$\mathbb{E}[(y -  \hat{f}(x))^2] = (\text{Bias}[\hat{f}(x)])^2 + \text{Var}[\hat{f}(x)] + \sigma^{2}_{\epsilon}$$


Imagine you have a "perfect" model. But what does "perfect" mean? **Perfect in the ML sense means you have a model with zero bias and zero variance, so the only error left is due to irreducible noise.** It does not mean zero prediction error. **There is always irreducible noise.**
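
A quick way to see this is to simulate it. The sketch below assumes a constant true signal and Gaussian noise (the values `f_x` and `sigma_eps` are hypothetical); a predictor that returns exactly $f(x)$ has zero bias and zero variance, yet its MSE is still about $\sigma^2_\epsilon$.

```python
import numpy as np

rng = np.random.default_rng(0)

f_x = 0.05         # hypothetical true signal f(x)
sigma_eps = 0.05   # hypothetical standard deviation of the irreducible noise

# Labels are signal plus noise
y = f_x + rng.normal(0.0, sigma_eps, size=200_000)

# "Perfect" model: predicts f(x) exactly (zero bias, zero variance)
perfect_pred = np.full_like(y, f_x)

mse = np.mean((y - perfect_pred) ** 2)
print(mse, sigma_eps ** 2)  # the two numbers should be close
```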

# The Two-Asset Case

In that context, imagine the two asset case. 

The forecasts are 0.025 and 0.075, and these are the ground-truth expected returns. However, each asset's realized return carries irreducible (say, Gaussian) noise with a standard deviation of 0.05 around its mean (and, *wlog*, the errors are uncorrelated).

# The 1/r strategy
Let's try the 1/r strategy for 20 days.


In [None]:
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt
np.random.seed(seed=100)
import warnings
warnings.filterwarnings("ignore")

In [None]:
plt.rcParams['figure.figsize'] = (10, 7)

In [None]:
n_days = 20
r_mean = np.array([0.025, 0.075])
epsilon = np.array([0.05, 0.05])

We have a **perfect model** which knows `r_mean` precisely (and `epsilon`). 

To test the initial policy, we set our weights == `1/r` for each day.

In [None]:
w_1 = np.repeat(1/r_mean[0], n_days)
w_2 = np.repeat(1/r_mean[1], n_days)

print(w_1)
print(w_2)

Now the universe unfolds and the realizations are made.

In [None]:
r_1 = np.random.normal(loc=r_mean[0], scale=epsilon[0], size=n_days)
r_2 = np.random.normal(loc=r_mean[1], scale=epsilon[1], size=n_days)

plt.plot(r_1);
plt.plot(r_2);
plt.title('Single Realization of Generating Process');

The score as defined in the competition:

In [None]:
def score(w1, w2, r1, r2):
    x = w1*r1 + w2*r2
    return np.mean(x)/np.std(x)

**So what is the score of the `1/r` policy?**

In [None]:
score(w_1, w_2, r_1, r_2)

# Alternative: Single Period Optimization

Now let's try an alternative policy:

Given our perfect model knowledge of `r_mean` and `epsilon`, we will seek to explicitly maximize $\mathbb{E} [ \text{score}]$:
$$
\begin{align}
\max_w \quad& \frac{w^\prime \hat{r}}{ \sqrt{ w^\prime \Sigma w}}\\
\text{subject to} \quad&-1 \leq  w  \leq 1 \\
\end{align}
$$

Where $\Sigma$ is the covariance matrix (in this case, just `np.diag(epsilon*epsilon)` since we assume zero correlation).

In [None]:
# the one-period expected two asset information ratio given weights, predictions, and noise
# I make it negative because we are going to optimize to find the maximum, but scipy will only find the minimum,
#  so we find the minimum of the negative to get the maximum
def information_ratio_2(w, y_hat, epsilon):
    r = w[0]*y_hat[0] + w[1]*y_hat[1]
    s = np.sqrt(w[0]*w[0]*epsilon[0]*epsilon[0] + w[1]*w[1]*epsilon[1]*epsilon[1])
    return -r/s

In [None]:
bounds = ((-1,1), (-1,1))
res = minimize(
    information_ratio_2,
    np.array([1.0, 1.0]),
    args=(r_mean, epsilon),
    bounds=bounds, 
    method='SLSQP'
)
print(res.x)

In [None]:
w_optimal_1 = np.repeat(res.x[0], n_days)
w_optimal_2 = np.repeat(res.x[1], n_days)

print(w_optimal_1)
print(w_optimal_2)

In [None]:
score(w_optimal_1, w_optimal_2, r_1, r_2)

The score of the `1/r` policy is 1.28 and the score of the alternative policy is 1.69. 
This alternative policy beats `1/r`.  This is a contrived example, but the key is that in the optimal strategy, the **weights are directly proportional to the forecasts, not inversely proportional.**
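
The proportionality claim can also be checked without an optimizer. A standard result (via Cauchy-Schwarz) is that the ratio $w^\prime \hat{r} / \sqrt{w^\prime \Sigma w}$ is maximized by any $w \propto \Sigma^{-1}\hat{r}$, with maximum value $\sqrt{\hat{r}^\prime \Sigma^{-1} \hat{r}}$; since the ratio does not change when $w$ is rescaled, the box bounds only pin down the scale. The sketch below uses that closed form rather than the `minimize` call above, and the names `expected_ir`, `w_closed`, and `w_inverse` are mine. Note these are one-period *expected* ratios, so they will not match a 20-day sample score exactly.

```python
import numpy as np

r_hat = np.array([0.025, 0.075])   # same forecasts as above
eps = np.array([0.05, 0.05])       # same noise levels as above
Sigma = np.diag(eps ** 2)          # zero correlation assumed

def expected_ir(w):
    """One-period expected information ratio for weights w."""
    return (w @ r_hat) / np.sqrt(w @ Sigma @ w)

# Closed-form direction: proportional to Sigma^{-1} r_hat
w_closed = np.linalg.solve(Sigma, r_hat)
w_closed = w_closed / np.abs(w_closed).max()   # rescale to fit inside [-1, 1]

w_inverse = 1.0 / r_hat                        # the 1/r policy

print(w_closed)                 # direction (1, 3): larger forecast, larger weight
print(expected_ir(w_closed))    # sqrt(2.5), about 1.58
print(expected_ir(w_inverse))   # sqrt(0.9), about 0.95
```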

# The Expectation of the Optimal Policy
Since we are dealing with random variables, perhaps I just set the seed to work out this way :-)... So let's see across many draws how things look.

In [None]:
one_over_r = []
alt = []
ex_post = []
n_sims = 10000
    
# set weights for the 1/r policy
w_1 = np.repeat(1/r_mean[0], n_days)
w_2 = np.repeat(1/r_mean[1], n_days)

# set weights for the alternative policy
bounds = ((-1,1), (-1,1))
res = minimize(
    information_ratio_2,
    np.array([1.0, 1.0]),
    args=(r_mean, epsilon),
    bounds=bounds, 
    method='SLSQP'
)
w_optimal_1 = np.repeat(res.x[0], n_days)
w_optimal_2 = np.repeat(res.x[1], n_days)

for i in range(n_sims):
    r_1 = np.random.normal(loc=r_mean[0], scale=epsilon[0], size=n_days)
    r_2 = np.random.normal(loc=r_mean[1], scale=epsilon[1], size=n_days)

    # run the 1/r weights
    trial_score = score(w_1, w_2, r_1, r_2)
    one_over_r.append(trial_score)
    
    # run the one-period optimal weights
    trial_score = score(w_optimal_1, w_optimal_2, r_1, r_2)
    alt.append(trial_score)
    
    # run the "leak scenario"; we know the actual realizations
    w_1_expost = 1/r_1
    w_2_expost = 1/r_2
    trial_score = score(w_1_expost, w_2_expost, r_1, r_2)
    ex_post.append(trial_score)
    

In [None]:
plt.hist(one_over_r, alpha=0.5);
plt.hist(alt, alpha=0.5);
plt.legend(['1/r', 'alternative']);
plt.title('Comparison of the 1/r Policy and the Alternative Policy for %d Simulations' % n_sims);
plt.xlabel('Score');

So, **don't set your `confidenceValue` proportional to 1/predictions**. It is not optimal for ex-ante predictions (i.e., predictions in reality).

Now, if you knew the realizations *in hindsight*, then `1/r` is optimal...

In [None]:
np.mean(ex_post)  # :-)
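
Why the ex-post score blows up: with weights equal to $1/r_{ti}$ on the realized returns, every term $\hat{y}_{ti} r_{ti}$ equals 1, so $x_t$ is the same constant (the number of assets) every day and $\sigma(x_t)$ is zero up to floating-point rounding. A minimal sketch, assuming the same two-asset setup (the seed and variable names here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# 20 days x 2 assets of realized returns, same means and noise as above
r = rng.normal(loc=[0.025, 0.075], scale=0.05, size=(20, 2))

w = 1.0 / r                  # hindsight weights
x = np.sum(w * r, axis=1)    # each term is 1, so x_t is 2 every day

print(np.allclose(x, 2.0))   # constant daily x
print(np.std(x))             # zero, up to rounding error
```

With a numerator of 2 and a denominator that is essentially zero, the ratio is enormous (or infinite), which is only achievable when the realizations are already known.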