<h1>Exploratory Metric Analysis (EMA)</h1>

What you will find in this kernel:
    - an exploration of the metric of the competition
    - an attempt to answer the question: given perfect predictions, how do I maximize my score?
        
We go through the following models:
- I - Perfect binary classifier
- II - (Confident) Perfect binary classifier
- III - (Artificial) Perfect binary classifier
- IV - Variance minimizer

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

In [None]:
from kaggle.competitions import twosigmanews
env = twosigmanews.make_env()
(market_train_df, news_train_df) = env.get_training_data()
apple = market_train_df[market_train_df.assetCode == 'AAPL.O']

**[Metric Description]**
In this competition, you must predict a signed confidence value, $\hat{y}_{ti} \in [-1, 1]$, which is multiplied by the market-adjusted return of a given assetCode over a ten-day window. If you expect a stock to have a large positive return (compared to the broad market) over the next ten days, you might assign it a large, positive confidenceValue (near 1.0). If you expect a stock to have a negative return, you might assign it a large, negative confidenceValue (near -1.0). If unsure, you might assign it a value near zero.

For each day in the evaluation time period, we calculate:
$$x_t = \sum_i \hat{y}_{ti} \, r_{ti} \, u_{ti}$$

where $r_{ti}$ is the 10-day market-adjusted leading return for day $t$ for instrument $i$, and $u_{ti}$ is a 0/1 universe variable (see the data description for details) that controls whether a particular asset is included in scoring on a particular day.

Your submission score is then calculated as the mean divided by the standard deviation of your daily $x_t$ values:
$$\text{score} = \frac{\bar{x}_t}{\sigma(x_t)}.$$
If the standard deviation of predictions is 0, the score is defined as 0.
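The `compute_score` helper used later in this notebook treats each AAPL row as one day, so there is no sum over instruments. For reference, a minimal sketch of the metric as described above, which first sums the contributions of all assets within each day, might look like this. The column names `time`, `confidence`, `returnsOpenNextMktres10` and `universe` are assumptions for illustration (`confidence` is a hypothetical prediction column), and `ddof=1` mirrors this notebook's `compute_score` rather than a confirmed official implementation.

```python
import numpy as np
import pandas as pd


def daily_metric(df):
    """Sketch of the competition metric.

    Assumes df has the (hypothetical) columns: time, confidence,
    returnsOpenNextMktres10 and universe (0/1).
    """
    # per-row contribution y_hat_ti * r_ti * u_ti
    contrib = df['confidence'] * df['returnsOpenNextMktres10'] * df['universe']
    # x_t: sum over instruments i for each day t
    x_t = contrib.groupby(df['time']).sum()
    std = x_t.std(ddof=1)
    # the score is defined as 0 when the standard deviation is 0
    return 0.0 if std == 0 else x_t.mean() / std


example = pd.DataFrame({
    'time': ['d1', 'd1', 'd2', 'd2', 'd3', 'd3'],
    'confidence': [1.0, -1.0, 0.5, 0.5, 1.0, 0.0],
    'returnsOpenNextMktres10': [0.02, -0.01, 0.04, 0.02, -0.01, 0.05],
    'universe': [1, 1, 1, 0, 1, 1],
})
# daily x_t values here are 0.03, 0.02 and -0.01
print(daily_metric(example))
```

Note how the row with `universe = 0` on day `d2` contributes nothing, and how a zero confidence on day `d3` removes that asset's return from `x_t`.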

In [None]:
apple.shape

<h2>I - Perfect binary classifier</h2>

We start by asking: what if we had a binary classifier that predicts 1 for every positive target value and -1 for every negative target value?
We build this classifier and see what score it gets.

In [None]:
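# start from the target column only
score_df = pd.DataFrame(apple.returnsOpenNextMktres10)
# perfect binary labels: +1 where the return is >= 0, -1 where it is negative
# (np.where avoids in-place mask on a column attribute, which newer pandas may not propagate)
score_df['label'] = np.where(score_df.returnsOpenNextMktres10 < 0, -1.0, 1.0)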

Here is our perfect binary classifier

In [None]:
score_df.head()

In [None]:
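def compute_score(_score_df):
    """Args: _score_df : pd.DataFrame([returnsOpenNextMktres10, label])
    Each row is treated as one day's x_t (single asset, so no sum over instruments)."""
    _x_t = _score_df.iloc[:, 0] * _score_df['label']
    _std = np.std(_x_t, ddof=1)
    # the metric defines the score as 0 when the standard deviation is 0
    return 0.0 if _std == 0 else np.mean(_x_t) / _std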

In [None]:
compute_score(score_df)

A perfect binary classifier gets a score of only 1.0185581036583518. Why is that?
**The mean of x_t is positive, but the score is low because the standard deviation of x_t is about as large as its mean.** Let's take a closer look.
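For a perfect ±1 classifier, $x_t = \text{sign}(r_t)\, r_t = |r_t|$, so the score is $\overline{|r|} / \sigma(|r|)$ and does not depend on the overall scale of the returns. As an illustration only, assume Gaussian returns $r \sim \mathcal{N}(0, s^2)$ (real returns have heavier tails, so this is not a model of AAPL). Then $|r|$ is half-normal with $\mathbb{E}|r| = s\sqrt{2/\pi}$ and $\operatorname{Var}|r| = s^2(1 - 2/\pi)$, which gives a score of $\sqrt{2/(\pi - 2)} \approx 1.32$ for any $s$. A quick simulation of that assumption:

```python
import numpy as np


def perfect_binary_score(returns):
    """Score of a perfect +/-1 classifier: x_t = sign(r) * r = |r|."""
    x_t = np.sign(returns) * returns
    return np.mean(x_t) / np.std(x_t, ddof=1)


rng = np.random.default_rng(0)
# hypothetical Gaussian returns; the scale (0.05) does not affect the ratio
simulated = rng.normal(0.0, 0.05, size=1_000_000)

print(perfect_binary_score(simulated))   # close to the closed-form value
print(np.sqrt(2 / (np.pi - 2)))          # closed form, about 1.3236
```

Even with perfect direction, the score is capped by the spread of $|r|$: large and small returns get the same confidence, so $x_t$ inherits the full variability of the absolute returns.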

In [None]:
def visualize(score_df):
    """Plot the distribution of the real returns and of x_t = return * label.
    Args: score_df : pd.DataFrame([returnsOpenNextMktres10, label])
    """
    returns = score_df.iloc[:, 0]
    x_t = returns * score_df.iloc[:, 1]
    # values are clipped to [-0.3, 0.3] so that outliers do not squash the histogram
    plt.hist(returns.clip(-0.3, 0.3), bins='auto', label="returnsOpenNextMktres10")
    plt.xlim([-0.3, 0.3])
    # vertical line at the mean, spanning the full height of the plot
    plt.axvline(np.mean(returns), color='tab:orange', label="mean")
    plt.title("real returns distribution")
    plt.legend()
    plt.show()
    plt.hist(x_t.clip(-0.3, 0.3), bins='auto', label='x_t')
    plt.xlim([-0.3, 0.3])
    plt.axvline(np.mean(x_t), color='tab:orange', label="mean")
    plt.title("returns distribution of predictions")
    plt.legend()
    plt.show()

In [None]:
visualize(score_df)

In the plot above you can see that the binary classifier only has non-negative returns, since it is perfect. The mean is of course positive, but the variance looks large.
The orange line marking the mean can be interpreted simply as "how profitable is this trading strategy" (the mean of the returns).
If our only goal were to maximize this mean, we would naively **increase our confidence values**. But what would happen to the competition score in that case? Let's see.

<h2>II - (Confident) Perfect binary classifier</h2>
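<p>Since our classifier is perfect, we should be confident, right? We increase the confidence from 1 to 3 to see how a larger mean of x_t affects our score. (A confidence of 3 is outside the [-1, 1] range that the metric allows; we ignore that constraint here to study the metric itself.)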

In [None]:
confidence = 3
score_df = pd.DataFrame(apple.returnsOpenNextMktres10)
# same perfect labels as before, scaled by the confidence
score_df['label'] = np.where(score_df.returnsOpenNextMktres10 < 0, -confidence, confidence)

In [None]:
compute_score(score_df)

In [None]:
score_df.head()

In [None]:
visualize(score_df)

This time the mean of x_t (the returns of the binary classifier) is larger, which means our 'trading strategy' is much more profitable.
Sadly, our score stays the same. Multiplying every label by 3 multiplies both the mean and the standard deviation of x_t by 3, so their ratio does not change. To improve the score we need to raise the mean *relative to* the standard deviation. Let's work on **reducing the variance** now.
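The scale invariance follows directly from the definition: for any constant $c$, $\overline{c\,x} = c\,\bar{x}$ and $\sigma(c\,x) = |c|\,\sigma(x)$, so multiplying all confidences by $c > 0$ leaves the score unchanged and $c < 0$ only flips its sign. A small check on hypothetical Gaussian returns (not AAPL data), using the same mean / std ratio as `compute_score`:

```python
import numpy as np


def score(x_t):
    """mean / std of the daily values, as in compute_score."""
    return np.mean(x_t) / np.std(x_t, ddof=1)


rng = np.random.default_rng(1)
returns = rng.normal(0.0, 0.05, size=1000)   # hypothetical returns
labels = np.where(returns < 0, -1.0, 1.0)     # perfect binary classifier
base = score(labels * returns)

for confidence in (0.5, 3.0, 10.0):
    scaled = score(confidence * labels * returns)
    # mean and std both scale by the same factor, so the ratio is unchanged
    print(confidence, scaled, np.isclose(scaled, base))
```

Only the *relative* weighting of the rows matters for this metric, not the overall magnitude of the confidences.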

<h2>III - (Artificial) Perfect binary classifier</h2>

If we already know the mean of x_t and all our predictions are perfect, we can make the score arbitrarily large by removing the variance:
- predict perfect binary labels [-1, 1]
- compute x_t_mean, the mean of x_t under those labels
- set label = x_t_mean / returnsOpenNextMktres10, so that every x_t equals x_t_mean
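The three steps above can be sketched on hypothetical Gaussian returns (not AAPL data). Note that the resulting labels `x_t_mean / r` exceed 1 in magnitude whenever `|r| < x_t_mean`, so they would not be valid confidences under the [-1, 1] constraint; the sketch only shows what happens to the ratio.

```python
import numpy as np

rng = np.random.default_rng(2)
returns = rng.normal(0.0, 0.05, size=1000)   # hypothetical returns

# step 1: perfect binary labels
labels = np.where(returns < 0, -1.0, 1.0)
# step 2: mean of x_t under those labels
x_t_mean = np.mean(labels * returns)
# step 3: rescale each label so that every x_t equals x_t_mean
artificial = x_t_mean / returns
x_t = artificial * returns

print(np.std(x_t, ddof=1))          # tiny: only floating-point rounding remains
print(np.allclose(x_t, x_t_mean))   # True
```

Because the standard deviation collapses to rounding noise, mean / std becomes huge; under the stated rule a standard deviation of exactly 0 would instead give a score of 0.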

In [None]:
score_df = pd.DataFrame(apple.returnsOpenNextMktres10)
# step 1: perfect binary labels
score_df['label'] = np.where(score_df.returnsOpenNextMktres10 < 0, -1.0, 1.0)
# step 2: mean of x_t under the binary labels
x_t = score_df.iloc[:, 0] * score_df['label']
x_t_mean = np.mean(x_t)
# step 3: rescale each label so that every x_t equals x_t_mean
score_df['label'] = x_t_mean / score_df.returnsOpenNextMktres10

In [None]:
score_df.head()

In [None]:
compute_score(score_df)

In [None]:
visualize(score_df)

As expected, the score becomes extremely large. It is not visible in the plot, but every x_t of our classifier now equals the mean value (up to floating-point rounding), so the variance is essentially zero. This numerically confirms the idea. What can we learn from it?
One question might be: do we really care about the mean? What if we minimize the variance with this method and **don't care about increasing the mean?**

<h2>IV - Variance minimizer</h2>

Now we don't care about the mean of x_t; we only want to minimize the variance. We still have a perfect predictor. How should we calibrate our labels?

In [None]:
score_df = pd.DataFrame(apple.returnsOpenNextMktres10)
mean = 1  # target value for every x_t; any positive constant gives the same ratio
score_df['label'] = mean / score_df.returnsOpenNextMktres10

In [None]:
score_df.head()

In [None]:
compute_score(score_df)

The score is still extremely large. As you can see, I am using 1 as the mean, so our labels are simply 1 / target_value.
Wow! Did we just break the competition? **If we output 1 / target_value instead of target_value, do we get a huge improvement in the score?!** Let's take a moment to check this idea.
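One reason the trick cannot help in practice is the range constraint from the metric description, $\hat{y}_{ti} \in [-1, 1]$. A minimal sketch on hypothetical Gaussian returns (all with $|r| < 1$): clipping the labels `1 / r` into [-1, 1] turns them back into `sign(r)`, i.e. into the perfect binary classifier of section I.

```python
import numpy as np


def score(x_t):
    return np.mean(x_t) / np.std(x_t, ddof=1)


rng = np.random.default_rng(3)
returns = rng.normal(0.0, 0.05, size=1000)   # hypothetical returns, all |r| < 1 here

inverse = 1.0 / returns                       # model B: label = 1 / r
clipped = np.clip(inverse, -1.0, 1.0)         # enforce y_hat in [-1, 1]
binary = np.where(returns < 0, -1.0, 1.0)     # perfect binary classifier

print(np.array_equal(clipped, binary))        # True: for |r| < 1, clipping 1/r gives sign(r)
print(score(clipped * returns), score(binary * returns))
```

Whenever $|r| \le 1$, $|1/r| \ge 1$, so clipping keeps only the sign and all of the variance reduction disappears.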

In [None]:
score_df = pd.DataFrame(apple.returnsOpenNextMktres10)

In [None]:
def compare(score_df):
    """Score model A (label = prediction) and model B (label = 1 / prediction)."""
    A, B = score_df.copy(), score_df.copy()
    A['label'] = A['predictions']
    B['label'] = 1 / B['predictions']
    print("[model A] score with label = target_value -> {}".format(compute_score(A)))
    print("[model B] score with label = 1 / target_value -> {}".format(compute_score(B)))

Here we compare **[model A]**, which outputs label = target_value, with **[model B]**, which outputs label = 1 / target_value. We first use perfect predictions, then add increasing amounts of random noise and check how the two models compare.

In [None]:
score_df['predictions'] = score_df.iloc[:, 0]

In [None]:
compare(score_df)

Now let's add some noise to our predictions, since in real life they are never perfect.

In [None]:
score_df['predictions'] = score_df.iloc[:,0] + np.random.normal(0, 0.0001, len(score_df.iloc[:,0]))
compare(score_df)

In [None]:
score_df['predictions'] = score_df.iloc[:,0] + np.random.normal(0, 0.0005, len(score_df.iloc[:,0]))
compare(score_df)

In [None]:
score_df['predictions'] = score_df.iloc[:,0] + np.random.normal(0, 0.001, len(score_df.iloc[:,0]))
compare(score_df)

In [None]:
score_df['predictions'] = score_df.iloc[:,0] + np.random.normal(0, 0.01, len(score_df.iloc[:,0]))
compare(score_df)

In [None]:
score_df['predictions'] = score_df.iloc[:,0] + np.random.normal(0, 0.1, len(score_df.iloc[:,0]))
compare(score_df)

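Your exact numbers depend on the random seed, but you can see that model B's score drops immediately once even a small amount of random noise is added to the predictions. With label = 1 / (r + noise), each x_t = r / (r + noise) explodes whenever the noisy prediction is close to zero, and those few rows dominate the variance. **So we did NOT break the competition (as expected).**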
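To make this experiment reproducible, here is a minimal sketch of the same A/B comparison with a fixed seed on hypothetical Gaussian returns (not AAPL data). It mirrors `compare`, but computes x_t directly as `prediction * r` for model A and `r / prediction` for model B.

```python
import numpy as np


def score(x_t):
    return np.mean(x_t) / np.std(x_t, ddof=1)


rng = np.random.default_rng(4)
returns = rng.normal(0.0, 0.05, size=2000)   # hypothetical returns

results = {}
for noise in (1e-4, 1e-3, 1e-2, 1e-1):
    predictions = returns + rng.normal(0.0, noise, size=returns.size)
    score_a = score(predictions * returns)   # model A: label = prediction
    score_b = score(returns / predictions)   # model B: label = 1 / prediction
    results[noise] = (score_a, score_b)
    print(f"noise={noise:g}  A={score_a:.3f}  B={score_b:.3f}")
```

Compared with the noise-free case, where model B's score is astronomically large, it falls by many orders of magnitude as soon as noise is added, and at the largest noise level it ends up well below model A.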