<img src="http://bam/_layouts/15/bam.sp.branding/images/logo-bam.png"/>
# <center>BAM 2017 Intern Competition Spec Sheet</center>

## Overview
The objective of the competition is to come up with a profitable quantitative trading strategy. At its most basic level, this is a classic prediction game. Given a matrix X of prediction variables per stock, predict the next period's return. 

## Instructions
1. You will be granted access to the Interns folder \\\\fountainhead\BAM\Interns.
2. Inside this folder you will find two gzipped CSV files, "train" and "test".
3. Use the "train" file to experiment with different prediction and portfolio formation ideas.
4. When you have a trading strategy you'd like to submit, use the "test" file to form portfolio scores and submit them for review.


## Data
The "train" file will contain the following columns: <br>
**y**: The forward stock returns. This is what you're trying to predict. <br>
**F00** through **F43**: These are the features you'll use to predict y. <br>
**TIMESTAMP**: A timestamp code giving the time period. <br>
**ID**: A code which maps to a particular stock ticker. <br>

The "test" file will contain the same columns with the exception of **y**.
Your model (i.e. trading strategy) can be fit using the "train" file. Then you'll apply your fitted model to the "test" file and output a predicted return per **ID** (stock) per **TIMESTAMP**. 
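As an illustration of "fit on train, apply to test", here is a minimal sketch of one simple model: a pooled linear regression of **y** on a few features. It assumes the hypothetical helper names `fit_linear` and `predict_linear`, uses small synthetic frames as stand-ins for the real files, and fills missing features with 0; none of this is a prescribed method.

```python
import numpy as np
import pandas as pd

def fit_linear(train, features):
    """Pooled OLS of y on the given features plus an intercept (hypothetical helper).

    Missing feature values are filled with 0, which is only a reasonable
    assumption if the features have already been z-scored per date.
    """
    X = train[features].fillna(0.0).to_numpy()
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    y = train["y"].to_numpy()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_linear(data, features, coef):
    """Apply fitted coefficients to any frame with the same feature columns."""
    X = data[features].fillna(0.0).to_numpy()
    X = np.column_stack([np.ones(len(X)), X])
    out = data[["TIMESTAMP", "ID"]].copy()
    out["y_pred"] = X @ coef
    return out

# Synthetic stand-in data (not the competition files): 10 dates x 20 stocks
rng = np.random.default_rng(0)
n = 200
toy_train = pd.DataFrame({
    "TIMESTAMP": np.repeat([f"T{t:04d}" for t in range(10)], 20),
    "ID": np.tile([f"S{i:04d}" for i in range(20)], 10),
    "F03": rng.normal(size=n),
    "F04": rng.normal(size=n),
})
toy_train["y"] = 0.01 * toy_train["F03"] - 0.005 * toy_train["F04"] + 0.001 * rng.normal(size=n)
toy_test = toy_train.drop(columns="y")

coef = fit_linear(toy_train, ["F03", "F04"])
toy_submit = predict_linear(toy_test, ["F03", "F04"], coef)
```

The intercept shifts every stock's score by the same amount on a given date, so you may want to demean `y_pred` per TIMESTAMP before submitting.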

The columns **F00** through **F43** contain a variety of fundamental, technical, and sector information factors. Some of the factors are continuous, and some are binary. Some of them will have missing values. Some will be more informative than others.
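A quick way to see which features have missing values, and one possible way to fill them, is sketched below. Filling with each date's cross-sectional median is an assumption, not a requirement; for binary features a median can land on 0.5, so you may prefer another rule there. The helper names and toy values are hypothetical.

```python
import numpy as np
import pandas as pd

def missing_report(data, features):
    """Fraction of missing values per feature, largest first."""
    return data[features].isna().mean().sort_values(ascending=False)

def fill_with_date_median(data, features):
    """Fill each feature's NaNs with that date's cross-sectional median.

    If a feature is missing for every stock on a date, the median is NaN as
    well, so those entries are finally filled with 0 (an assumption).
    """
    out = data.copy()
    for f in features:
        med = out.groupby("TIMESTAMP")[f].transform("median")
        out[f] = out[f].fillna(med).fillna(0.0)
    return out

toy = pd.DataFrame({
    "TIMESTAMP": ["T0000"] * 3 + ["T0001"] * 3,
    "ID": ["S0000", "S0001", "S0002"] * 2,
    "F40": [1.0, np.nan, 3.0, np.nan, np.nan, np.nan],
    "F41": [0.5, 0.1, 0.2, 0.3, 0.4, np.nan],
})
report = missing_report(toy, ["F40", "F41"])
filled = fill_with_date_median(toy, ["F40", "F41"])
```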

## Trading Strategy Definition
When we say "quantitative trading strategy", what we really mean is just a score per stock per time period. This score should have some predictive power for the returns in the next time period. Positive scores indicate a long position, and negative scores indicate a short position. A score of 0 takes no position.

The techniques used to come up with this score can vary widely. However, all methods should take the variables F00 through F43 as input (though certainly not all variables need or should be used) and output a score.
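To make the sign convention concrete, here is a small sketch of how raw scores map to portfolio weights once they are scaled to sum abs to 1 per date. The helper name `scores_to_weights` and the toy scores are hypothetical; setting all-zero dates to 0 (instead of dividing by zero) is an assumption.

```python
import pandas as pd

def scores_to_weights(df, score_col="score"):
    """Scale scores per TIMESTAMP so their absolute values sum to 1.

    The sign is preserved: positive -> long, negative -> short, 0 -> flat.
    A date whose scores are all 0 gives 0/0 = NaN, which is set to 0 here.
    """
    gross = df[score_col].abs().groupby(df["TIMESTAMP"]).transform("sum")
    return (df[score_col] / gross).fillna(0.0)

toy = pd.DataFrame({
    "TIMESTAMP": ["T0000"] * 4,
    "ID": ["S0000", "S0001", "S0002", "S0003"],
    "score": [2.0, -1.0, 0.0, 1.0],
})
toy["w"] = scores_to_weights(toy)
# w: 0.5 (long), -0.25 (short), 0.0 (no position), 0.25 (long)
```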


## Judging Performance
There will be two main drivers of performance measurement for your trading strategy. The first is the Sharpe ratio, which indicates the risk-adjusted performance of the strategy. The second is turnover. In general, trading strategies with higher turnover tend to have higher Sharpe ratios. Because no trading costs are taken into account here, a high-turnover strategy of questionable merit (it probably isn't practically implementable) could blow away the competition. The final performance criterion therefore rewards high Sharpe ratios but penalizes high turnover.

Strategy performance will be evaluated on the following metric: **(Sharpe Ratio) - 0.1\*Turnover**.
Before this metric is computed, your portfolio scores will always be rescaled so that their absolute values sum to 1 on each date.
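In symbols, writing $s_{i,t}$ for the score submitted for stock $i$ at timestamp $t$ and $y_{i,t}$ for its forward return (notation introduced here for illustration), the rescaled weight, the per-period portfolio return used in the example notebook, and the final metric are:

$$
w_{i,t} = \frac{s_{i,t}}{\sum_j \lvert s_{j,t} \rvert}, \qquad
r_t = \sum_i w_{i,t}\, y_{i,t}, \qquad
\text{Score} = \text{Sharpe} - 0.1 \times \text{Turnover}
$$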

The Sharpe ratio is the mean of the per-period strategy return divided by its standard deviation, scaled to an annualized number. Turnover is the sum of the absolute differences in portfolio weights from one period to the next.
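The sketch below shows one way to compute both parts of the metric on your own backtest. It rests on assumptions the spec does not settle: periods are daily (the `np.sqrt(252)` factor from the example), a stock missing on a date has weight 0, and per-period turnover is averaged over periods. The function names are hypothetical and the official scorer may differ.

```python
import numpy as np
import pandas as pd

PERIODS_PER_YEAR = 252  # assumption: daily periods, as in the example's np.sqrt(252)

def sharpe_ratio(r, periods_per_year=PERIODS_PER_YEAR):
    """Annualized Sharpe ratio of a per-period return series."""
    return np.sqrt(periods_per_year) * r.mean() / r.std()

def turnover(w):
    """Average per-period turnover of a wide weight frame (rows=TIMESTAMP, cols=ID).

    A stock missing on a date is treated as weight 0 (assumption). Each
    period's turnover is sum_i |w[t, i] - w[t-1, i]|; averaging over periods
    is an assumption, since the spec does not say how periods are combined.
    """
    w = w.fillna(0.0)
    per_period = w.diff().abs().sum(axis=1).iloc[1:]  # first row has no prior period
    return per_period.mean()

def competition_score(w, y):
    """(Sharpe Ratio) - 0.1 * Turnover after rescaling weights to sum abs 1 per date."""
    w = w.fillna(0.0)
    w = w.div(w.abs().sum(axis=1), axis=0)
    r = (w * y).sum(axis=1)  # portfolio return per period
    return sharpe_ratio(r) - 0.1 * turnover(w)

# Toy two-stock example with made-up numbers
dates = ["T0000", "T0001", "T0002", "T0003"]
w = pd.DataFrame({"S0000": [1.0, 1.0, -1.0, 1.0],
                  "S0001": [-1.0, -1.0, 1.0, -1.0]}, index=dates)
y = pd.DataFrame({"S0000": [0.01, 0.02, -0.01, 0.03],
                  "S0001": [0.00, 0.01, 0.01, 0.01]}, index=dates)
print(turnover(w.div(w.abs().sum(axis=1), axis=0)))  # weights flip twice: (0 + 2 + 2) / 3
print(competition_score(w, y))
```

For the long TIMESTAMP/ID/y_pred layout used in this competition, a wide frame can be built with `df.pivot(index="TIMESTAMP", columns="ID", values="y_pred")` (and likewise for `y`), provided each TIMESTAMP/ID pair appears once.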

## General Advice
1. A common way to create a quant signal is to give each stock a score (or weight) which is *relative* to the other stocks in the universe. Creating some sort of z-score per date is one way to do this. You could also create z-scores in a time series manner. That is, each stock is z-scored relative to *its own history*. 
2. You can gauge the performance of your strategy by using the forward return (i.e. that timestamp's associated **y**) in the **train** dataset to compute performance metrics such as the Sharpe ratio.
3. Combining several signals which have positive but uncorrelated return performance is a good way to boost your Sharpe ratio.
4. The features can contain outliers. You may want to winsorize features or use robust methods for deriving your weights. 
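Two of the ideas above, per-date winsorizing and time-series z-scores, can be sketched as follows. The quantile cutoffs, window length, `min_periods`, and helper names are all illustrative assumptions, and `ts_zscore` assumes the TIMESTAMP codes sort chronologically as strings (true for zero-padded codes like T0000, T0001).

```python
import numpy as np
import pandas as pd

def winsorize_by_date(data, col, lower=0.01, upper=0.99):
    """Clip a feature to its cross-sectional [lower, upper] quantiles on each date."""
    g = data.groupby("TIMESTAMP")[col]
    lo = g.transform(lambda s: s.quantile(lower))
    hi = g.transform(lambda s: s.quantile(upper))
    return data[col].clip(lower=lo, upper=hi)

def ts_zscore(data, col, window=20, min_periods=5):
    """Z-score each stock's feature against its own trailing window of history."""
    d = data.sort_values("TIMESTAMP")
    g = d.groupby("ID")[col]
    mean = g.transform(lambda s: s.rolling(window, min_periods=min_periods).mean())
    std = g.transform(lambda s: s.rolling(window, min_periods=min_periods).std())
    z = ((d[col] - mean) / std).reindex(data.index)  # back to the original row order
    # Short or constant histories give NaN/inf: treat them as "no signal".
    return z.replace([np.inf, -np.inf], np.nan).fillna(0.0)

# Cross-sectional toy: one date, values 1..101 with the extremes as "outliers"
cs = pd.DataFrame({"TIMESTAMP": "T0000",
                   "ID": [f"S{i:04d}" for i in range(101)],
                   "F03": np.arange(1.0, 102.0)})
cs["F03_w"] = winsorize_by_date(cs, "F03")  # 1.0 -> 2.0 and 101.0 -> 100.0

# Time-series toy: two stocks, six dates
ts = pd.DataFrame({"TIMESTAMP": [f"T{t:04d}" for t in range(6)] * 2,
                   "ID": ["S0000"] * 6 + ["S0001"] * 6,
                   "F03": [1.0, 2.0, 3.0, 4.0, 5.0, 100.0] + [0.0] * 6})
ts["F03_ts"] = ts_zscore(ts, "F03", window=3, min_periods=3)
```

Because the rolling window only looks backward, each time-series z-score uses information available at that timestamp.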

## Submitting your Strategy for Review
The server for uploading your submission will be at http://10.242.143.217/ (server not available yet as of 6/29/2017; an email will be sent out with usernames and passwords when it is available).
You will be required to log in with your team name and password.

Your submission will be split into two portions: a public test set and a hidden holdout set. The results on the public test set will be shown on a public leaderboard (along with the number of submissions, which you are not penalized for). The hidden holdout set performance will not be available to you or anyone else. **Your final competition performance will be evaluated on the holdout set performance of your most recent submission.**

Your submission must be in the following format:

**TIMESTAMP, ID, y_pred**<br>
T1122, S0001, .003<br>
T1122, S0002, .081<br>
....., ....., ....<br>
T1570, S1418, -.0301<br>

All TIMESTAMP and ID values must appear in the same order as in the given test data. If you use Python/pandas, pass `index=False` to `DataFrame.to_csv`.
The submission csv must have shape: (463691, 3) (463691 rows, 3 columns for TIMESTAMP, ID, y_pred)
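Below is a minimal sketch of writing a submission file that follows these requirements. The helper name `write_submission` and the output path are hypothetical; it checks row order against the test frame, fills missing predictions with 0, and writes without the pandas index (otherwise an extra unnamed index column would appear in the file).

```python
import os
import tempfile

import numpy as np
import pandas as pd

def write_submission(submit, test, path):
    """Validate row order, fill missing predictions with 0, and write the CSV."""
    expected = test[["TIMESTAMP", "ID"]].reset_index(drop=True)
    out = submit[["TIMESTAMP", "ID", "y_pred"]].reset_index(drop=True)
    if not out[["TIMESTAMP", "ID"]].equals(expected):
        raise ValueError("TIMESTAMP/ID rows do not match the test file order")
    out["y_pred"] = out["y_pred"].fillna(0.0)
    out.to_csv(path, index=False)  # index=False: no extra index column
    return out.shape

# Toy stand-in for the real test file and predictions
toy_test = pd.DataFrame({"TIMESTAMP": ["T1122", "T1122"], "ID": ["S0001", "S0002"]})
toy_submit = toy_test.assign(y_pred=[0.003, np.nan])
path = os.path.join(tempfile.mkdtemp(), "submission.csv")
shape = write_submission(toy_submit, toy_test, path)
```

With the real data, `write_submission(submit, test, "submission.csv")` should return `(463691, 3)` if the submission lines up with the test file.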

## Rules
1. At least 50% of the names in the test universe must have non-zero values on any given period.
2. You are encouraged to collaborate, however, submissions must be your team's own. 
3. The submitted values may be any floating-point numbers. Missing or non-weighted entries should be filled in with 0. The values will be rescaled per timestamp so that their absolute values sum to 1.
4. You will not be penalized for multiple submissions.
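Rule 1 is easy to check before uploading. This sketch assumes the "test universe" on a date is the set of rows present for that TIMESTAMP in the submission, and treats missing values as 0 per rule 3; the helper name `check_rules` is hypothetical.

```python
import numpy as np
import pandas as pd

def check_rules(submit, min_nonzero_frac=0.5):
    """Return TIMESTAMPs where fewer than 50% of names have non-zero values."""
    y = submit["y_pred"].fillna(0.0)  # rule 3: missing entries count as 0
    frac = (y != 0).groupby(submit["TIMESTAMP"]).mean()
    return frac[frac < min_nonzero_frac].index.tolist()

toy = pd.DataFrame({
    "TIMESTAMP": ["T1122"] * 4 + ["T1123"] * 4,
    "ID": ["S0001", "S0002", "S0003", "S0004"] * 2,
    "y_pred": [0.1, 0.0, -0.2, 0.0, 0.3, 0.0, 0.0, np.nan],
})
bad = check_rules(toy)  # T1122 has 2/4 non-zero (allowed), T1123 has 1/4
```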

```python
# Here is a simple example

import pandas as pd
import numpy as np

# The 'train' dataset is what you'll use to develop and test your strategies
# I mapped \\fountainhead\Bam to my B: drive. You might need to change this.
# (gzip compression is inferred from the .gz extension)
train = pd.read_csv('B:\\Interns\\train.csv.gz')

# You'll use the 'test' set to produce portfolio weights and submit them for review
test = pd.read_csv('B:\\Interns\\test.csv.gz')
```

```python
train.head()
```

| | TIMESTAMP | ID | y | F00 | F01 | F02 | F03 | F04 | F05 | F06 | ... | F34 | F35 | F36 | F37 | F38 | F39 | F40 | F41 | F42 | F43 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | T0000 | S0000 | -0.004353 | 1.0 | 0.0 | 50.0 | 2.210819 | -0.07526 | 0.618897 | 0.0 | ... | 44.82035 | 0.035828 | 0.0 | 0.022465 | 78.98305 | -0.097214 | 0.955017 | 0.648999 | 0.2955 | 0.868845 |
| 1 | T0000 | S0001 | 0.002971 | 0.0 | 1.0 | 89.0 | 3.047588 | -0.006635 | 0.484964 | 0.0 | ... | 52.39853 | 0.021581 | 0.0 | 0.006165 | 33.33333 | -0.279424 | 0.355129 | 0.444151 | 0.3144 | 0.98849 |
| 2 | T0000 | S0002 | 0.002614 | 0.0 | 0.0 | 85.0 | 2.145732 | -0.007309 | 0.516876 | 0.0 | ... | 82.43592 | 0.10909 | 0.0 | 0.005787 | 44.95496 | 0.079846 | NaN | 0.093355 | 0.3266 | 1.233096 |
| 3 | T0000 | S0003 | -0.000188 | 0.0 | 0.0 | 90.0 | 1.573162 | -0.166067 | 0.446393 | 0.0 | ... | -33.64397 | 0.209749 | 0.0 | 0.012024 | 73.79487 | 0.43085 | 2.596279 | -0.570243 | -0.1386 | 0.65734 |
| 4 | T0000 | S0004 | -0.014813 | 0.0 | 0.0 | 78.0 | 2.738358 | 0.009562 | 0.526339 | 0.0 | ... | -144.092 | 0.186767 | 1.0 | 0.005475 | 64.69136 | 0.107902 | 33.438054 | 0.530521 | 0.2053 | 0.709597 |


```python
test.head()
```

| | TIMESTAMP | ID | F00 | F01 | F02 | F03 | F04 | F05 | F06 | F07 | ... | F34 | F35 | F36 | F37 | F38 | F39 | F40 | F41 | F42 | F43 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | T1122 | S0001 | 0.0 | 1.0 | 88.0 | 3.413485 | -0.037251 | 0.911037 | 0.0 | 72.0 | ... | 32.41207 | 0.099218 | 0.0 | 0.005583 | 53.65854 | 0.452007 | 1.220057 | -0.629024 | 0.3523 | 0.891381 |
| 1 | T1122 | S0002 | 0.0 | 0.0 | 53.0 | 1.941026 | -0.016867 | 1.543352 | 0.0 | 98.0 | ... | 72.84467 | 0.145055 | 0.0 | 0.004368 | 55.13889 | -0.605037 | 0.379833 | -0.340378 | 0.3492 | 0.949277 |
| 2 | T1122 | S0003 | 0.0 | 0.0 | 87.0 | 2.024666 | 0.103946 | 0.783493 | 0.0 | 99.0 | ... | -39.6794 | 0.230158 | 0.0 | 0.006952 | 47.50219 | -0.108841 | 2.342777 | 0.779514 | 0.344 | 0.907152 |
| 3 | T1122 | S0004 | 0.0 | 0.0 | 95.0 | 3.527144 | 0.023134 | 0.725113 | 0.0 | 85.0 | ... | -12.38947 | 0.184191 | 0.0 | 0.005457 | 54.01387 | -0.739674 | 8.791357 | 0.305561 | 0.344 | 1.083814 |
| 4 | T1122 | S0005 | 0.0 | 0.0 | 35.0 | 2.38164 | 0.035584 | 2.277727 | 0.0 | 65.0 | ... | 100.2469 | 0.132756 | 0.0 | 0.003509 | 48.49246 | 0.170616 | 0.951246 | 0.392146 | 0.3544 | 0.903084 |


```python
# These functions will be used to compute our strategy (portfolio) weights

def zs(values):
    """
    Compute standard z-score and fill missing values with 0
    """
    return ((values - values.mean()) / values.std()).fillna(0)

def get_wgts(data):
    """
    Compute portfolio weights

    You'll probably want to do some sort of feature selection to arrive at a subset of predictive features
    In this (totally fabricated) example the strategy only utilizes features F03 and F04
    We z-score each feature by date and then compute a weighted average of the two scores to get our final weights
    WARNING: You may want to winsorize or use a form of robust z-scoring in case of outliers
    """

    # We'll only use 2 features
    data = data.loc[:, ['TIMESTAMP', 'ID', 'F03', 'F04']].copy()

    # Z-score both features by date cross-sectionally.
    # transform() keeps the original row index; apply() may prepend the group key in pandas >= 2.0
    data['F03_zs'] = data.groupby('TIMESTAMP')['F03'].transform(zs)
    data['F04_zs'] = data.groupby('TIMESTAMP')['F04'].transform(zs)

    # Form a weighted average of the two z-scores
    data['y_pred'] = .7 * data.F03_zs + .3 * data.F04_zs

    # Scale the predicted returns to sum abs to 1 per date
    # We will do this for you regardless, even if not explicitly done here
    data['y_pred'] = data.groupby('TIMESTAMP')['y_pred'].transform(lambda w: w / w.abs().sum())

    return data.loc[:, ['TIMESTAMP', 'ID', 'y_pred']]
```
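Following the WARNING in the docstring above, here is one possible robust alternative to `zs`: a median/MAD z-score clipped at ±3. The name `robust_zs` and the clip level are assumptions; 1.4826 is the usual factor that makes the MAD comparable to a standard deviation for normally distributed data.

```python
import numpy as np
import pandas as pd

def robust_zs(values, clip=3.0):
    """Median/MAD z-score, clipped to [-clip, clip], with missing values set to 0."""
    med = values.median()
    mad = (values - med).abs().median()
    if not np.isfinite(mad) or mad == 0:
        # Degenerate cross-section (all missing or mostly identical): no signal
        return pd.Series(0.0, index=values.index)
    z = (values - med) / (1.4826 * mad)
    return z.clip(-clip, clip).fillna(0.0)

# One outlier (1000.0) and one missing value
z = robust_zs(pd.Series([1.0, 2.0, 3.0, 4.0, 1000.0, np.nan]))

# Drop-in use inside get_wgts (sketch):
# data['F03_zs'] = data.groupby('TIMESTAMP')['F03'].transform(robust_zs)
```

Unlike the mean and standard deviation, the median and MAD barely move when a single extreme value appears, so the other stocks' scores are not compressed by the outlier.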

```python
# Get portfolio weights for the training data
train_strat = get_wgts(train)
train_strat.head()
```

| | TIMESTAMP | ID | y_pred |
|---|---|---|---|
| 0 | T0000 | S0000 | -0.001151 |
| 1 | T0000 | S0001 | -0.000304 |
| 2 | T0000 | S0002 | -0.000399 |
| 3 | T0000 | S0003 | -0.002227 |
| 4 | T0000 | S0004 | -0.000153 |


```python
# Join the weights with the 'y' (forward stock returns) to judge performance
perf_frame = pd.merge(train_strat, train.loc[:, ['TIMESTAMP', 'ID', 'y']], on=['TIMESTAMP', 'ID'])
perf_frame.head()
```

| | TIMESTAMP | ID | y_pred | y |
|---|---|---|---|---|
| 0 | T0000 | S0000 | -0.001151 | -0.004353 |
| 1 | T0000 | S0001 | -0.000304 | 0.002971 |
| 2 | T0000 | S0002 | -0.000399 | 0.002614 |
| 3 | T0000 | S0003 | -0.002227 | -0.000188 |
| 4 | T0000 | S0004 | -0.000153 | -0.014813 |


```python
# Compute the portfolio return per date: sum of weight * forward return
r = (perf_frame.y_pred * perf_frame.y).groupby(perf_frame.TIMESTAMP).sum()
r.head()
```

```
TIMESTAMP
T0000    0.002688
T0001    0.000594
T0002   -0.000627
T0003   -0.000370
T0004    0.004321
dtype: float64
```

```python
# With the time series of returns we can compute an annualized Sharpe ratio
# (the sqrt(252) factor assumes one period per trading day)
np.sqrt(252)*r.mean()/r.std()
```

```
0.62946052020478471
```
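The same Sharpe computation can be repeated one feature at a time as a rough screen for the feature selection mentioned in `get_wgts`. This is a sketch of one simple heuristic, not a recommended procedure: `feature_sharpes` is a hypothetical helper, the toy data are synthetic, and ranking features on **train** alone is in-sample and can overfit.

```python
import numpy as np
import pandas as pd

def feature_sharpes(train, features, periods_per_year=252):
    """Annualized Sharpe of a single-feature, per-date z-score portfolio, by |Sharpe|."""
    out = {}
    for f in features:
        g = train.groupby("TIMESTAMP")[f]
        z = ((train[f] - g.transform("mean")) / g.transform("std")).fillna(0.0)
        w = (z / z.abs().groupby(train["TIMESTAMP"]).transform("sum")).fillna(0.0)
        r = (w * train["y"]).groupby(train["TIMESTAMP"]).sum()  # return per date
        out[f] = np.sqrt(periods_per_year) * r.mean() / r.std()
    return pd.Series(out).sort_values(key=lambda s: s.abs(), ascending=False)

# Synthetic data where F03 carries signal and F04 is pure noise
rng = np.random.default_rng(1)
n_dates, n_stocks = 50, 30
toy = pd.DataFrame({
    "TIMESTAMP": np.repeat([f"T{t:04d}" for t in range(n_dates)], n_stocks),
    "ID": np.tile([f"S{i:04d}" for i in range(n_stocks)], n_dates),
    "F03": rng.normal(size=n_dates * n_stocks),
    "F04": rng.normal(size=n_dates * n_stocks),
})
toy["y"] = 0.01 * toy["F03"] + 0.01 * rng.normal(size=len(toy))
ranking = feature_sharpes(toy, ["F03", "F04"])
```

A strongly negative Sharpe can be just as useful as a positive one: flipping the sign of that feature's score turns it into a long/short signal in the other direction.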

```python
# Get a dataframe of weights using the same strategy for submission
submit = get_wgts(test)
submit.head()
```

| | TIMESTAMP | ID | y_pred |
|---|---|---|---|
| 0 | T1122 | S0001 | -0.000816 |
| 1 | T1122 | S0002 | -0.000692 |
| 2 | T1122 | S0003 | 0.001064 |
| 3 | T1122 | S0004 | 7e-05 |
| 4 | T1122 | S0005 | 0.000117 |


```python
# The submission must have exactly 463691 rows and 3 columns
submit.shape == (463691, 3)
```

```
True
```

```python
# TIMESTAMP and ID must match the test file row for row
(submit[['TIMESTAMP','ID']] == test[['TIMESTAMP','ID']]).all()
```

```
TIMESTAMP    True
ID           True
dtype: bool
```