# Engineering Predictive Alpha Factors

This notebook illustrates the following steps:

1. Select the adjusted open, high, low, and close prices as well as the volume for all tickers from the Quandl Wiki data that you downloaded and simplified for the last milestone for the 2007-2016 time period. Looking ahead, we will use 2014-2016 as our "out-of-sample" period to test the performance of a strategy based on a machine learning model selected using data from preceding periods.
2. Compute the dollar volume as the product of closing price and trading volume; then select the stocks with at least eight years of data and the lowest average daily rank for this metric. 
3. Compute daily returns and keep only "inliers" with values between -100% and + 100% as a basic check against data error.
4. Now we"re ready to compute financial features. The Alpha Factory Library listed among the resources below illustrates how to compute a broad range of those using pandas and TA-Lib. We will list a few examples; feel free to explore and evaluate the various TA-Lib indicators.
    - Compute **historical returns** for various time ranges such as 1, 3, 5, 10, 21 trading days, as well as longer periods like 2, 3, 6 and 12 months.
    - Use TA-Lib"s **Bollinger Band** indicator to create features that anticipate **mean-reversion**.
    - Select some indicators from TA-Lib"s **momentum** indicators family such as
        - the Average Directional Movement Index (ADX), 
        - the Moving Average Convergence Divergence (MACD), 
        - the Relative Strength Index (RSI), 
        - the Balance of Power (BOP) indictor, or 
        - the Money Flow Index (MFI).
    - Compute TA-Lib **volume** indicators like On Balance Volume (OBV) or the Chaikin A/D Oscillator (ADOSC)
    - Create volatility metrics such as the Normalized Average True Range (NATR).
    - Compute rolling factor betas using the five Fama-French risk factors for different rolling windows of three and 12 months (see resources below).
    - Compute the outcome variable that we will aim to predict, namely the 1-day forward returns.

## Usage tips

- If you experience resource constraints (suddenly restarting Kernel), increase the memory available for Docker Desktop (> Settings > Advanced). If this not possible or you experienced prolonged execution times, reduce the scope of the exercise. The easiest way to do so is to select fewer stocks or a shorter time period, or both.
- You may want to persist intermediate results so you can recover quickly in case something goes wrong. There"s an example under the first "Persist Results" subsection.

## From the instructions on the Manning site

> We are simplifying things a little bit. In practice, you would want to identify the universe as a rolling average of the dollar volume (for example, on a monthly or quarterly basis, depending on our train/test parameters in the next steps) to avoid including information “from the future” that could introduce a lookahead bias. Feel free to select the universe in this methodologically more robust yet computationally more intensive way.

## Imports & Settings

In [79]:
import warnings

warnings.filterwarnings("ignore")

In [80]:
DATA_STORE = "stock_prices.h5"

In [81]:
if "google.colab" in str(get_ipython()):
    !pip install --upgrade statsmodels talib-binary
    !test -f $DATA_STORE || gdown --id 1-BYZEu6gNlo33v-q3-ezjLv5IX79tzhh
else:
    !test -f $DATA_STORE || cp simplified_quandl_ds.h5 stock_prices.h5

In [82]:
%matplotlib inline

import numpy as np
import pandas as pd
import pandas_datareader.data as web

import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS
import talib

import seaborn as sns

In [83]:
sns.set_style("whitegrid")
idx = pd.IndexSlice
deciles = np.arange(0.1, 1, 0.1).round(1)

## Load Data

In [84]:
with pd.HDFStore(DATA_STORE) as store:
    data = store["us_stocks"]

In [85]:
data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A,2000-01-03,53.726454,53.856080,45.969377,49.121329,3343600.0
A,2000-01-04,46.481058,46.992738,44.175084,45.369006,3408500.0
A,2000-01-05,45.198445,45.239380,41.828176,41.998737,4119200.0
A,2000-01-06,42.046493,42.298923,39.658651,40.934441,1812900.0
A,2000-01-07,40.293135,44.986951,40.252200,44.345645,2016900.0
...,...,...,...,...,...,...
ZUMZ,2018-03-21,23.800000,24.600000,23.605800,23.950000,354092.0
ZUMZ,2018-03-22,23.900000,24.350000,23.300000,23.350000,269607.0
ZUMZ,2018-03-23,23.550000,24.200000,23.450000,23.550000,301584.0
ZUMZ,2018-03-26,23.750000,24.800000,23.700000,24.650000,375320.0


In [86]:
data = data.loc[idx[:, "2007-01-01":"2016-12-31"], :]
data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0
...,...,...,...,...,...,...
ZUMZ,2016-12-23,20.950000,21.500000,20.950000,21.350000,532292.0
ZUMZ,2016-12-27,21.200000,21.700000,21.200000,21.450000,308004.0
ZUMZ,2016-12-28,21.550000,21.749900,21.325000,21.450000,165827.0
ZUMZ,2016-12-29,21.550000,22.050000,21.400000,21.900000,322108.0


## Select 500 most-traded stocks prior to 2017

Compute the dollar volume as the product of the adjusted close price and the adjusted volume:

In [87]:
# Compute the dollar volume as the product of closing price and trading volume;
data["dollar_volume"] = data["close"] * data["volume"]
data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07
...,...,...,...,...,...,...,...
ZUMZ,2016-12-23,20.950000,21.500000,20.950000,21.350000,532292.0,1.136443e+07
ZUMZ,2016-12-27,21.200000,21.700000,21.200000,21.450000,308004.0,6.606686e+06
ZUMZ,2016-12-28,21.550000,21.749900,21.325000,21.450000,165827.0,3.556989e+06
ZUMZ,2016-12-29,21.550000,22.050000,21.400000,21.900000,322108.0,7.054165e+06


In [88]:
# then select the stocks with at least eight years of data
# average of 253 trading days per year
num_dates_threshold = 253 * 8
data = data[
    data.groupby(level="ticker")["dollar_volume"].transform("count")
    >= num_dates_threshold
]
data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07
...,...,...,...,...,...,...,...
ZUMZ,2016-12-23,20.950000,21.500000,20.950000,21.350000,532292.0,1.136443e+07
ZUMZ,2016-12-27,21.200000,21.700000,21.200000,21.450000,308004.0,6.606686e+06
ZUMZ,2016-12-28,21.550000,21.749900,21.325000,21.450000,165827.0,3.556989e+06
ZUMZ,2016-12-29,21.550000,22.050000,21.400000,21.900000,322108.0,7.054165e+06


In [89]:
# and the lowest average daily rank for this metric.
data = data.iloc[
    (data.groupby(level="ticker")["dollar_volume"].transform("mean")).argsort()
]
data = data[data.groupby(level="ticker").ngroup() < 500]
data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
BBGI,2008-07-02,3.536715,3.604403,3.536715,3.536715,1100.0,3.890386e+03
BBGI,2010-09-24,4.452031,4.980385,4.357620,4.980385,5700.0,2.838819e+04
BBGI,2010-09-23,4.581954,4.618332,4.339431,4.356754,5100.0,2.221944e+04
BBGI,2010-09-22,4.382738,4.815815,4.382738,4.607938,6600.0,3.041239e+04
BBGI,2010-09-21,4.495338,4.629592,4.296123,4.408723,17200.0,7.583004e+04
...,...,...,...,...,...,...,...
BRK_A,2010-05-06,114543.000000,115726.000000,108565.000000,113500.000000,139100.0,1.578785e+10
BRK_A,2010-05-07,111550.000000,113661.000000,109187.000000,111500.000000,115500.0,1.287825e+10
BRK_A,2010-05-10,116290.000000,117445.000000,115050.000000,117290.000000,92100.0,1.080241e+10
BRK_A,2010-04-29,116900.000000,117128.000000,116395.000000,116801.000000,84500.0,9.869684e+09


In [90]:
data = data.sort_index()
data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07
...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07


## Compute returns

In [91]:
data["1D_return"] = data.groupby(level="ticker")["close"].pct_change()

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178
...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955


## Remove outliers based on daily returns

In [92]:
# Compute daily returns and keep only "inliers" with values between -100% and + 100% as a basic check against data error.

# Replace value with NaN if outlier
data["1D_return"] = data["1D_return"].mask(data["1D_return"].abs() >= 1)
# data = data.dropna() # also means we lose first trading day of year

# Alternative approach where we clip outliers at abs 100%
# data["1D_return"] = data["1D_return"].clip(-1, 1)
data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178
...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955


## Historical Returns

In [93]:
# 1, 3, 5, 10, and 21 trading days, as well as longer periods like 2, 3, 6, and 12 months.
periods = [3, 5, 10, 21, 42, 63, 126, 252]
for period in periods:
    col = f"{period}D_return"
    data[col] = data.groupby(level="ticker")["close"].pct_change(periods=period)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,21D_return,42D_return,63D_return,126D_return,252D_return
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,-0.029738,0.132353,0.059439,0.412326,1.228465
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,-0.016802,0.178877,0.056346,0.273747,1.456964
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,-0.030902,0.146195,-0.005717,0.202388,1.324765
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,-0.075593,0.111045,-0.002665,0.162834,1.408161


## Bollinger Bands

In [94]:
# This and following features taken from https://github.com/stefan-jansen/machine-learning-for-trading/blob/main/24_alpha_factor_library/02_common_alpha_factors.ipynb


def compute_bb_indicators(close, timeperiod=20, matype=0):
    high, mid, low = talib.BBANDS(close, timeperiod=timeperiod, matype=matype)
    bb_up = high / close - 1
    bb_low = low / close - 1
    squeeze = (high - low) / close

    return bb_up, bb_low, squeeze

In [95]:
data["bb_up"], data["bb_down"], data["bb_squeeze"] = compute_bb_indicators(
    data["close"]
)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,21D_return,42D_return,63D_return,126D_return,252D_return,bb_up,bb_down,bb_squeeze
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,,,,,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,,,,,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,,,,,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,,,,,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,-0.029738,0.132353,0.059439,0.412326,1.228465,0.153723,-0.063385,0.217108
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,-0.016802,0.178877,0.056346,0.273747,1.456964,0.150261,-0.068646,0.218907
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,-0.030902,0.146195,-0.005717,0.202388,1.324765,0.160279,-0.064879,0.225158
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,-0.075593,0.111045,-0.002665,0.162834,1.408161,0.184538,-0.052095,0.236633


## Momentum Indicators

TA-Lib offers the following choices - feel free to experiment with as many as you like (but you don"t have to..):

| Function | Name                                              |
|:---------|:--------------------------------------------------|
| PLUS_DM  | Plus Directional Movement                         |
| MINUS_DM | Minus Directional Movement                        |
| PLUS_DI  | Plus Directional Indicator                        |
| MINUS_DI | Minus Directional Indicator                       |
| DX       | Directional Movement Index                        |
| ADX      | Average Directional Movement Index                |
| ADXR     | Average Directional Movement Index Rating         |
| APO      | Absolute Price Oscillator                         |
| PPO      | Percentage Price Oscillator                       |
| AROON    | Aroon                                             |
| AROONOSC | Aroon Oscillator                                  |
| BOP      | Balance Of Power                                  |
| CCI      | Commodity Channel Index                           |
| CMO      | Chande Momentum Oscillator                        |
| MACD     | Moving Average Convergence/Divergence             |
| MACDEXT  | MACD with controllable MA type                    |
| MACDFIX  | Moving Average Convergence/Divergence Fix 12/26   |
| MFI      | Money Flow Index                                  |
| MOM      | Momentum                                          |
| RSI      | Relative Strength Index                           |
| STOCH    | Stochastic                                        |
| STOCHF   | Stochastic Fast                                   |
| STOCHRSI | Stochastic Relative Strength Index                |
| TRIX     | 1-day Rate-Of-Change (ROC) of a Triple Smooth EMA |
| ULTOSC   | Ultimate Oscillator                               |
| WILLR    | Williams" %R                                      |

### Average Directional Movement Index (ADX)

The ADX combines of two other indicators, namely the positive and directional indicators (PLUS_DI and MINUS_DI), which in turn build on the positive and directional movement (PLUS_DM and MINUS_DM). For additional details see [Wikipedia](https://en.wikipedia.org/wiki/Average_directional_movement_index) and [Investopedia](https://www.investopedia.com/articles/trading/07/adx-trend-indicator.asp).

In [96]:
by_ticker = data.groupby("ticker", group_keys=False)

In [97]:
def compute_adx(x, timeperiod=14):
    return talib.ADX(x.high, x.low, x.close, timeperiod=timeperiod)


data["adx"] = by_ticker.apply(compute_adx)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,21D_return,42D_return,63D_return,126D_return,252D_return,bb_up,bb_down,bb_squeeze,adx
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,,,,,,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,,,,,,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,,,,,,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,,,,,,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,-0.029738,0.132353,0.059439,0.412326,1.228465,0.153723,-0.063385,0.217108,25.443503
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,-0.016802,0.178877,0.056346,0.273747,1.456964,0.150261,-0.068646,0.218907,24.287900
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,-0.030902,0.146195,-0.005717,0.202388,1.324765,0.160279,-0.064879,0.225158,22.874841
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,-0.075593,0.111045,-0.002665,0.162834,1.408161,0.184538,-0.052095,0.236633,21.611590


### Absolute Price Oscillator (APO)

The absolute Price Oscillator (APO) is computed as the difference between two exponential moving averages (EMA) of price series, expressed as an absolute value. The EMA windows usually contain 26 and 12 data points, respectively.

In [98]:
data["apo"] = by_ticker.apply(
    lambda x: talib.APO(x.close, fastperiod=12, slowperiod=26, matype=0)
)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,21D_return,42D_return,63D_return,126D_return,252D_return,bb_up,bb_down,bb_squeeze,adx,apo
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,,,,,,,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,,,,,,,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,,,,,,,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,,,,,,,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,-0.029738,0.132353,0.059439,0.412326,1.228465,0.153723,-0.063385,0.217108,25.443503,-0.264936
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,-0.016802,0.178877,0.056346,0.273747,1.456964,0.150261,-0.068646,0.218907,24.287900,-0.517692
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,-0.030902,0.146195,-0.005717,0.202388,1.324765,0.160279,-0.064879,0.225158,22.874841,-0.725064
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,-0.075593,0.111045,-0.002665,0.162834,1.408161,0.184538,-0.052095,0.236633,21.611590,-0.810064


### Percentage Price Oscillator (PPO)

The Percentage Price Oscillator (APO) is computed as the difference between two exponential moving averages (EMA) of price series, expressed as a percentage value and thus comparable across assets. The EMA windows usually contain 26 and 12 data points, respectively. 

In [99]:
data["ppo"] = by_ticker.apply(
    lambda x: talib.PPO(x.close, fastperiod=12, slowperiod=26, matype=0)
)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,42D_return,63D_return,126D_return,252D_return,bb_up,bb_down,bb_squeeze,adx,apo,ppo
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,,,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,,,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,,,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,,,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,0.132353,0.059439,0.412326,1.228465,0.153723,-0.063385,0.217108,25.443503,-0.264936,-1.331954
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,0.178877,0.056346,0.273747,1.456964,0.150261,-0.068646,0.218907,24.287900,-0.517692,-2.596551
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,0.146195,-0.005717,0.202388,1.324765,0.160279,-0.064879,0.225158,22.874841,-0.725064,-3.631816
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,0.111045,-0.002665,0.162834,1.408161,0.184538,-0.052095,0.236633,21.611590,-0.810064,-4.060628


### Aroon Oscillator

#### Aroon Up/Down Indicator

The indicator measures the time between highs and the time between lows over a time period. It computes an AROON_UP and an AROON_DWN indicator as follows:

$$
\begin{align*}
\text{AROON_UP}&=\frac{T-\text{Periods since T period High}}{T}\times 100\\
\text{AROON_DWN}&=\frac{T-\text{Periods since T period Low}}{T}\times 100
\end{align*}
$$

In [100]:
def aroon(x):
    up, down = talib.AROON(high=x.high, low=x.low, timeperiod=14)
    return pd.DataFrame({"aroon_up": up, "aroon_down": down}, index=x.index)


data = data.join(by_ticker.apply(aroon))

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,126D_return,252D_return,bb_up,bb_down,bb_squeeze,adx,apo,ppo,aroon_up,aroon_down
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,,,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,,,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,,,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,,,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,0.412326,1.228465,0.153723,-0.063385,0.217108,25.443503,-0.264936,-1.331954,71.428571,21.428571
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,0.273747,1.456964,0.150261,-0.068646,0.218907,24.287900,-0.517692,-2.596551,64.285714,14.285714
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,0.202388,1.324765,0.160279,-0.064879,0.225158,22.874841,-0.725064,-3.631816,57.142857,7.142857
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,0.162834,1.408161,0.184538,-0.052095,0.236633,21.611590,-0.810064,-4.060628,100.000000,0.000000


#### Aroon Oscillator

The Aroon Oscillator is simply the difference between the Aroon Up and Aroon Down indicators.

In [102]:
data["aroon_osc"] = by_ticker.apply(
    lambda x: talib.AROONOSC(high=x.high, low=x.low, timeperiod=14)
)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,252D_return,bb_up,bb_down,bb_squeeze,adx,apo,ppo,aroon_up,aroon_down,aroon_osc
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,,,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,,,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,,,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,,,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,1.228465,0.153723,-0.063385,0.217108,25.443503,-0.264936,-1.331954,71.428571,21.428571,-50.0
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,1.456964,0.150261,-0.068646,0.218907,24.287900,-0.517692,-2.596551,64.285714,14.285714,-50.0
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,1.324765,0.160279,-0.064879,0.225158,22.874841,-0.725064,-3.631816,57.142857,7.142857,-50.0
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,1.408161,0.184538,-0.052095,0.236633,21.611590,-0.810064,-4.060628,100.000000,0.000000,-100.0


### Balance Of Power (BOP)

The Balance of Power (BOP) intends to measure the strength of buyers relative to sellers in the market by assessing the ability of each side to drive prices. It is computer as the difference between the close and the open price, divided by the difference between the high and the low price: 

$$
\text{BOP}_t= \frac{P_t^\text{Close}-P_t^\text{Open}}{P_t^\text{High}-P_t^\text{Low}}
$$

In [103]:
data["bop"] = by_ticker.apply(lambda x: talib.BOP(x.open, x.high, x.low, x.close))

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,bb_up,bb_down,bb_squeeze,adx,apo,ppo,aroon_up,aroon_down,aroon_osc,bop
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,,,,,,-0.482517
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,,,,,,0.096491
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,,,,,,-0.525000
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,,,,,,-0.025000
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,,,,,,-0.101449
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,0.153723,-0.063385,0.217108,25.443503,-0.264936,-1.331954,71.428571,21.428571,-50.0,0.372539
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,0.150261,-0.068646,0.218907,24.287900,-0.517692,-2.596551,64.285714,14.285714,-50.0,0.026316
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,0.160279,-0.064879,0.225158,22.874841,-0.725064,-3.631816,57.142857,7.142857,-50.0,-0.336449
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,0.184538,-0.052095,0.236633,21.611590,-0.810064,-4.060628,100.000000,0.000000,-100.0,-0.575758


### Commodity Channel Index (CCI)

The Commodity Channel Index (CCI) measures the difference between the current *typical* price, computed as the average of current low, high and close price and the historical average price. A positive (negative) CCI indicates that price is above (below) the historic average. When CCI is below zero, the price is below the historic average. It is computed as:

$$
\begin{align*}
\bar{P_t}&=\frac{P_t^H+P_t^L+P_t^C}{3}\\
\text{CCI}_t & =\frac{\bar{P_t} - \text{SMA}(T)_t}{0.15\sum_{t=i}^T |\bar{P_t}-\text{SMA}(N)_t|/T}
\end{align*}
$$

In [104]:
data["cci"] = by_ticker.apply(
    lambda x: talib.CCI(x.high, x.low, x.close, timeperiod=14)
)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,bb_down,bb_squeeze,adx,apo,ppo,aroon_up,aroon_down,aroon_osc,bop,cci
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,,,,,-0.482517,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,,,,,0.096491,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,,,,,-0.525000,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,,,,,-0.025000,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,,,,,-0.101449,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,-0.063385,0.217108,25.443503,-0.264936,-1.331954,71.428571,21.428571,-50.0,0.372539,-53.998009
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,-0.068646,0.218907,24.287900,-0.517692,-2.596551,64.285714,14.285714,-50.0,0.026316,-35.918045
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,-0.064879,0.225158,22.874841,-0.725064,-3.631816,57.142857,7.142857,-50.0,-0.336449,-41.804981
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,-0.052095,0.236633,21.611590,-0.810064,-4.060628,100.000000,0.000000,-100.0,-0.575758,-86.999094


### Moving Average Convergence/Divergence (MACD)

Moving Average Convergence Divergence (MACD) is a trend-following (lagging) momentum indicator that shows the relationship between two moving averages of a security’s price. It is calculated by subtracting the 26-period Exponential Moving Average (EMA) from the 12-period EMA.

The TA-Lib implementation returns the MACD value and its signal line, which is the 9-day EMA of the MACD. In addition, the MACD-Histogram measures the distance between the indicator and its signal line.

In [105]:
def compute_macd(close, fastperiod=12, slowperiod=26, signalperiod=9):
    macd, macdsignal, macdhist = talib.MACD(
        close, fastperiod=fastperiod, slowperiod=slowperiod, signalperiod=signalperiod
    )
    return pd.DataFrame(
        {"macd": macd, "macd_signal": macdsignal, "macd_hist": macdhist},
        index=close.index,
    )


data = data.join(by_ticker.close.apply(compute_macd))

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,apo,ppo,aroon_up,aroon_down,aroon_osc,bop,cci,macd,macd_signal,macd_hist
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,,-0.482517,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,,0.096491,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,,-0.525000,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,,-0.025000,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,,-0.101449,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,-0.264936,-1.331954,71.428571,21.428571,-50.0,0.372539,-53.998009,-0.020747,0.212926,-0.233673
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,-0.517692,-2.596551,64.285714,14.285714,-50.0,0.026316,-35.918045,-0.032992,0.163742,-0.196735
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,-0.725064,-3.631816,57.142857,7.142857,-50.0,-0.336449,-41.804981,-0.056570,0.119680,-0.176250
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,-0.810064,-4.060628,100.000000,0.000000,-100.0,-0.575758,-86.999094,-0.107901,0.074164,-0.182065


### Chande Momentum Oscillator (CMO)

The Chande Momentum Oscillator (CMO) intends to measure momentum on both up and down days. It is calculated as the difference between the sum of gains and losses over at time period T, divided by the sum of all price movement over the same period. It oscillates between +100 and -100.

In [106]:
# Not included due to high correlation with PPO.
# data['cmo'] = by_ticker.apply(lambda x: talib.CMO(x.close, timeperiod=14))

# data

### Money Flow Index

The Money Flow Index (MFI) incorporates price and volume information to identify overbought or oversold conditions.  The indicator is typically calculated using 14 periods of data. An MFI reading above 80 is considered overbought and an MFI reading below 20 is considered oversold.

In [107]:
data["mfi"] = by_ticker.apply(
    lambda x: talib.MFI(x.high, x.low, x.close, x.volume, timeperiod=14)
)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,ppo,aroon_up,aroon_down,aroon_osc,bop,cci,macd,macd_signal,macd_hist,mfi
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,-0.482517,,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,0.096491,,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,-0.525000,,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,-0.025000,,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,-0.101449,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,-1.331954,71.428571,21.428571,-50.0,0.372539,-53.998009,-0.020747,0.212926,-0.233673,30.716027
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,-2.596551,64.285714,14.285714,-50.0,0.026316,-35.918045,-0.032992,0.163742,-0.196735,28.909778
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,-3.631816,57.142857,7.142857,-50.0,-0.336449,-41.804981,-0.056570,0.119680,-0.176250,21.132505
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,-4.060628,100.000000,0.000000,-100.0,-0.575758,-86.999094,-0.107901,0.074164,-0.182065,22.434002


### Relative Strength Index

RSI compares the magnitude of recent price changes across stocks to identify stocks as overbought or oversold. A high RSI (usually above 70) indicates overbought and a low RSI (typically below 30) indicates oversold. It first computes the average price change for a given number (often 14) of prior trading days with rising and falling prices, respectively as $\text{up}_t$ and $\text{down}_t$. Then, the RSI is computed as:
$$
\text{RSI}_t=100-\frac{100}{1+\frac{\text{up}_t}{\text{down}_t}}
$$



In [108]:
data["RSI"] = by_ticker.apply(lambda x: talib.RSI(x.close, timeperiod=14))

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,aroon_up,aroon_down,aroon_osc,bop,cci,macd,macd_signal,macd_hist,mfi,RSI
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,-0.482517,,,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,0.096491,,,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,-0.525000,,,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,-0.025000,,,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,-0.101449,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,71.428571,21.428571,-50.0,0.372539,-53.998009,-0.020747,0.212926,-0.233673,30.716027,47.561693
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,64.285714,14.285714,-50.0,0.026316,-35.918045,-0.032992,0.163742,-0.196735,28.909778,48.191056
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,57.142857,7.142857,-50.0,-0.336449,-41.804981,-0.056570,0.119680,-0.176250,21.132505,46.392172
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,100.000000,0.000000,-100.0,-0.575758,-86.999094,-0.107901,0.074164,-0.182065,22.434002,42.413801


#### Stochastic RSI (STOCHRSI)

The Stochastic Relative Strength Index (STOCHRSI) is based on the RSI just described and intends to identify crossovers as well as overbought and oversold conditions. It compares the distance of the current RSI to the lowest RSI over a given time period T to the maximum range of values the RSI has assumed for this period. It is computed as follows:

$$
\text{STOCHRSI}_t= \frac{\text{RSI}_t-\text{RSI}_t^L(T)}{\text{RSI}_t^H(T)-\text{RSI}_t^L(T)}
$$

The TA-Lib implementation offers more flexibility than the original "Unsmoothed stochastic RSI" version by Chande and Kroll (1993). To calculate the original indicator, keep the `timeperiod` and `fastk_period` equal. 

The return value `fastk` is the unsmoothed RSI. The `fastd_period` is used to compute a smoothed STOCHRSI, which  is returned as `fastd`. If you do not care about STOCHRSI smoothing, just set `fastd_period` to 1 and ignore the `fastd` output.

Reference: "Stochastic RSI and Dynamic Momentum Index" by Tushar Chande and Stanley Kroll Stock & Commodities V.11:5 (189-199)


In [109]:
data["stochrsi"] = by_ticker.apply(
    lambda x: talib.STOCHRSI(
        x.close, timeperiod=14, fastk_period=14, fastd_period=3, fastd_matype=0
    )[0]
)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,aroon_down,aroon_osc,bop,cci,macd,macd_signal,macd_hist,mfi,RSI,stochrsi
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,-0.482517,,,,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,0.096491,,,,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,-0.525000,,,,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,-0.025000,,,,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,-0.101449,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,21.428571,-50.0,0.372539,-53.998009,-0.020747,0.212926,-0.233673,30.716027,47.561693,16.407277
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,14.285714,-50.0,0.026316,-35.918045,-0.032992,0.163742,-0.196735,28.909778,48.191056,18.471979
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,7.142857,-50.0,-0.336449,-41.804981,-0.056570,0.119680,-0.176250,21.132505,46.392172,14.164494
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,0.000000,-100.0,-0.575758,-86.999094,-0.107901,0.074164,-0.182065,22.434002,42.413801,0.000000


### Stochastic (STOCH)

A stochastic oscillator is a momentum indicator comparing a particular closing price of a security to a range of its prices over a certain period of time. Stochastic oscillators are based on the idea that closing prices should confirm the trend.

For stochastic (STOCH), there are four different lines: `FASTK`, `FASTD`, `SLOWK` and `SLOWD`. The `D` is the signal line usually drawn over its corresponding `K` function.

$$
\begin{align*}
& K^\text{Fast}(T_K) & = &\frac{P_t-P_{T_K}^L}{P_{T_K}^H-P_{T_K}^L}* 100 \\
& D^\text{Fast}(T_{\text{FastD}}) & = & \text{MA}(T_{\text{FastD}})[K^\text{Fast}]\\
& K^\text{Slow}(T_{\text{SlowK}}) & = &\text{MA}(T_{\text{SlowK}})[K^\text{Fast}]\\
& D^\text{Slow}(T_{\text{SlowD}}) & = &\text{MA}(T_{\text{SlowD}})[K^\text{Slow}]
\end{align*}
$$
  

The $P_{T_K}^L$, $P_{T_K}^H$, and $P_{T_K}^L$ are the extreme values among the last $T_K$ period.
 $K^\text{Slow}$ and $D^\text{Fast}$ are equivalent when using the same period. 

In [110]:
def compute_stoch(
    x, fastk_period=14, slowk_period=3, slowk_matype=0, slowd_period=3, slowd_matype=0
):
    slowk, slowd = talib.STOCH(
        x.high,
        x.low,
        x.close,
        fastk_period=fastk_period,
        slowk_period=slowk_period,
        slowk_matype=slowk_matype,
        slowd_period=slowd_period,
        slowd_matype=slowd_matype,
    )
    return slowd / slowk - 1


data["stoch"] = by_ticker.apply(compute_stoch)
# Could use a mask here instead?
data.loc[data.stoch.abs() > 1e5, "stock"] = np.nan

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,bop,cci,macd,macd_signal,macd_hist,mfi,RSI,stochrsi,stoch,stock
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,-0.482517,,,,,,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,0.096491,,,,,,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,-0.525000,,,,,,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,-0.025000,,,,,,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,-0.101449,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,0.372539,-53.998009,-0.020747,0.212926,-0.233673,30.716027,47.561693,16.407277,-0.214559,
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,0.026316,-35.918045,-0.032992,0.163742,-0.196735,28.909778,48.191056,18.471979,-0.094444,
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,-0.336449,-41.804981,-0.056570,0.119680,-0.176250,21.132505,46.392172,14.164494,-0.021858,
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,-0.575758,-86.999094,-0.107901,0.074164,-0.182065,22.434002,42.413801,0.000000,0.224469,


### Ultimate Oscillator (ULTOSC)

The Ultimate Oscillator (ULTOSC), developed by Larry Williams, measures the average difference of the current close to the previous lowest price over three time frames (default: 7, 14, and 28) to avoid overreacting to short-term price changes and incorporat short, medium, and long-term market trends. It first computes the buying pressure, $\text{BP}_t$, then sums it over the three periods $T_1, T_2, T_3$, normalized by the True Range ($\text{TR}_t$.
$$
\begin{align*}
\text{BP}_t & = P_t^\text{Close}-\min(P_{t-1}^\text{Close}, P_t^\text{Low})\\ 
\text{TR}_t & = \max(P_{t-1}^\text{Close}, P_t^\text{High})-\min(P_{t-1}^\text{Close}, P_t^\text{Low})
\end{align*}
$$

ULTOSC is then computed as a weighted average over the three periods as follows:
$$
\begin{align*}
\text{Avg}_t(T) & = \frac{\sum_{i=0}^{T-1} \text{BP}_{t-i}}{\sum_{i=0}^{T-1} \text{TR}_{t-i}}\\
\text{ULTOSC}_t & = 100*\frac{4\text{Avg}_t(7) + 2\text{Avg}_t(14) + \text{Avg}_t(28)}{4+2+1}
\end{align*}
$$

In [111]:
def compute_ultosc(x, timeperiod1=7, timeperiod2=14, timeperiod3=28):
    return talib.ULTOSC(
        x.high,
        x.low,
        x.close,
        timeperiod1=timeperiod1,
        timeperiod2=timeperiod2,
        timeperiod3=timeperiod3,
    )


data["ultosc"] = by_ticker.apply(compute_ultosc)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,cci,macd,macd_signal,macd_hist,mfi,RSI,stochrsi,stoch,stock,ultosc
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,,,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,,,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,,,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,,,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,-53.998009,-0.020747,0.212926,-0.233673,30.716027,47.561693,16.407277,-0.214559,,48.667470
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,-35.918045,-0.032992,0.163742,-0.196735,28.909778,48.191056,18.471979,-0.094444,,42.067553
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,-41.804981,-0.056570,0.119680,-0.176250,21.132505,46.392172,14.164494,-0.021858,,42.204373
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,-86.999094,-0.107901,0.074164,-0.182065,22.434002,42.413801,0.000000,0.224469,,43.445064


### Williams' %R (WILLR)

Williams %R, also known as the Williams Percent Range, is a momentum indicator that moves between 0 and -100 and measures overbought and oversold levels to identify entry and exit points. It is similar to the Stochastic oscillator and compares the current closing price $P_t^\text{Close}$ to the range of highest ($P_T^\text{High}$) and lowest ($P_T^\text{Low}$) prices over the last T periods (typically 14). The indicators is computed as:

$$
\text{WILLR}_t = \frac{P_T^\text{High}-P_t^\text{Close}}{P_T^\text{High}-P_T^\text{Low}}
$$


In [112]:
data["willr"] = by_ticker.apply(
    lambda x: talib.WILLR(x.high, x.low, x.close, timeperiod=14)
)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,macd,macd_signal,macd_hist,mfi,RSI,stochrsi,stoch,stock,ultosc,willr
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,,,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,,,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,,,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,,,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,-0.020747,0.212926,-0.233673,30.716027,47.561693,16.407277,-0.214559,,48.667470,-83.064516
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,-0.032992,0.163742,-0.196735,28.909778,48.191056,18.471979,-0.094444,,42.067553,-81.451613
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,-0.056570,0.119680,-0.176250,21.132505,46.392172,14.164494,-0.021858,,42.204373,-86.290323
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,-0.107901,0.074164,-0.182065,22.434002,42.413801,0.000000,0.224469,,43.445064,-95.757576


## Volume Indicators

| Function | Name                   |
|:---------|:-----------------------|
| AD       | Chaikin A/D Line       |
| ADOSC    | Chaikin A/D Oscillator |
| OBV      | On Balance Volume      |

### Chaikin A/D Line

The Chaikin Advance/Decline or Accumulation/Distribution Line (AD) is a volume-based indicator designed to measure the cumulative flow of money into and out of an asset. The indicator assumes that the degree of buying or selling pressure can be determined by the location of the close, relative to the high and low for the period. There is buying (sellng) pressure when a stock closes in the upper (lower) half of a period"s range. The intention is to signal a change in direction when the indicator diverges from the security price.

The Accumulation/Distribution Line is a running total of each period"s Money Flow Volume. It is calculated as follows:

1. The Money Flow Multiplier (MFI) is the relationship of the close to the high-low range:
2. The MFI is multiplied by the period"s volume $V_t$ to come up with a Money Flow Volume (MFV). 
3. A running total of the Money Flow Volume forms the Accumulation Distribution Line:
$$
\begin{align*}
&\text{MFI}_t&=\frac{P_t^\text{Close}-P_t^\text{Low}}{P_t^\text{High}-P_t^\text{Low}}\\
&\text{MFV}_t&=\text{MFI}_t \times V_t\\
&\text{AD}_t&=\text{AD}_{t-1}+\text{MFV}_t
\end{align*}
$$

In [113]:
data["ad"] = by_ticker.apply(
    lambda x: talib.AD(x.high, x.low, x.close, x.volume) / x.volume.mean()
)
# Divide by average volume to normalize across assets
data.ad.replace((np.inf, -np.inf), np.nan).dropna().describe()

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,macd_signal,macd_hist,mfi,RSI,stochrsi,stoch,stock,ultosc,willr,ad
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,,,,,,-0.550906
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,,,,,,-0.096048
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,,,,,,-0.580407
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,,,,,,-0.349850
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,,,,,,-0.303581
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,0.212926,-0.233673,30.716027,47.561693,16.407277,-0.214559,,48.667470,-83.064516,7.079870
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,0.163742,-0.196735,28.909778,48.191056,18.471979,-0.094444,,42.067553,-81.451613,6.972649
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,0.119680,-0.176250,21.132505,46.392172,14.164494,-0.021858,,42.204373,-86.290323,6.597107
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,0.074164,-0.182065,22.434002,42.413801,0.000000,0.224469,,43.445064,-95.757576,6.299691


### Chaikin A/D Oscillator (ADOSC)

The Chaikin A/D Oscillator (ADOSC) is the Moving Average Convergence Divergence indicator (MACD) applied to the Chaikin A/D Line. The Chaikin Oscillator intends to predict changes in the Accumulation/Distribution Line.

It is computed as the difference between the 3-day exponential moving average and the 10-day exponential moving average of the Accumulation/Distribution Line.

In [114]:
data["adosc"] = by_ticker.apply(
    lambda x: talib.ADOSC(x.high, x.low, x.close, x.volume, fastperiod=3, slowperiod=10)
    / x.rolling(14).volume.mean()
)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,macd_hist,mfi,RSI,stochrsi,stoch,stock,ultosc,willr,ad,adosc
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,,,,,-0.550906,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,,,,,-0.096048,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,,,,,-0.580407,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,,,,,-0.349850,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,,,,,-0.303581,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,-0.233673,30.716027,47.561693,16.407277,-0.214559,,48.667470,-83.064516,7.079870,-0.197373
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,-0.196735,28.909778,48.191056,18.471979,-0.094444,,42.067553,-81.451613,6.972649,-0.136067
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,-0.176250,21.132505,46.392172,14.164494,-0.021858,,42.204373,-86.290323,6.597107,-0.229725
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,-0.182065,22.434002,42.413801,0.000000,0.224469,,43.445064,-95.757576,6.299691,-0.367212


### On Balance Volume (OBV)

The On Balance Volume indicator (OBV) is a cumulative momentum indicator that relates volume to price change. It assumes that OBV changes precede price changes because smart money can be seen flowing into the security by a rising OBV. When the public then moves into the security, both the security and OBV will rise.

The current OBV is computed by adding (subtracting) the current volume to the last OBV if the security closes higher (lower) than the previous close.

$$
\text{OBV}_t = 
\begin{cases}
\text{OBV}_{t-1}+V_t & \text{if }P_t>P_{t-1}\\
\text{OBV}_{t-1}-V_t & \text{if }P_t<P_{t-1}\\
\text{OBV}_{t-1} & \text{otherwise}
\end{cases}
$$

In [115]:
data["obv"] = by_ticker.apply(
    lambda x: talib.OBV(x.close, x.volume) / x.expanding().volume.mean()
)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,mfi,RSI,stochrsi,stoch,stock,ultosc,willr,ad,adosc,obv
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,,,,-0.550906,,1.000000
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,,,,-0.096048,,2.000000
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,,,,-0.580407,,0.807533
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,,,,-0.349850,,0.186668
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,,,,-0.303581,,0.876825
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,30.716027,47.561693,16.407277,-0.214559,,48.667470,-83.064516,7.079870,-0.197373,26.446849
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,28.909778,48.191056,18.471979,-0.094444,,42.067553,-81.451613,6.972649,-0.136067,26.745218
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,21.132505,46.392172,14.164494,-0.021858,,42.204373,-86.290323,6.597107,-0.229725,25.989859
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,22.434002,42.413801,0.000000,0.224469,,43.445064,-95.757576,6.299691,-0.367212,25.478350


## Volatility Indicators

| Function | Name                          |
|:---------|:------------------------------|
| TRANGE   | True Range                    |
| ATR      | Average True Range            |
| NATR     | Normalized Average True Range |

### ATR

The Average True Range indicator (ATR) shows volatility of the market. It was introduced by Welles Wilder (1978)  and has been used as a component of numerous other indicators since. It aims to anticipate changes in trend such that the higher its value, the higher the probability of a trend change; the lower the indicator’s value, the weaker the current trend.

It is computed as the simple moving average for a period T of the True Range (TRANGE), which measures volatility as the absolute value of the largest recent trading range:
$$
\text{TRANGE}_t = \max\left[P_t^\text{High} - P_t^\text{low}, \left| P_t^\text{High} - P_{t-1}^\text{Close}\right|, \left| P_t^\text{low} - P_{t-1}^\text{Close}\right|\right]
$$

In [116]:
# Compute normalized version of ATR using rolling mean of price
data["atr"] = by_ticker.apply(
    lambda x: talib.ATR(x.high, x.low, x.close, timeperiod=14)
    / x.rolling(14).close.mean()
)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,RSI,stochrsi,stoch,stock,ultosc,willr,ad,adosc,obv,atr
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,,,-0.550906,,1.000000,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,,,-0.096048,,2.000000,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,,,-0.580407,,0.807533,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,,,-0.349850,,0.186668,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,,,-0.303581,,0.876825,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,47.561693,16.407277,-0.214559,,48.667470,-83.064516,7.079870,-0.197373,26.446849,0.040094
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,48.191056,18.471979,-0.094444,,42.067553,-81.451613,6.972649,-0.136067,26.745218,0.038912
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,46.392172,14.164494,-0.021858,,42.204373,-86.290323,6.597107,-0.229725,25.989859,0.038470
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,42.413801,0.000000,0.224469,,43.445064,-95.757576,6.299691,-0.367212,25.478350,0.038564


### NATR

The Normalized Average True Range (NATR) is a normalized version of the ATR computed as follows:

$$
\text{NATR}_t = \frac{\text{ATR}_t(T)}{P_t^\text{Close}} * 100
$$

Normalization make the ATR function more relevant in the folllowing scenarios:
- Long term analysis where the price changes drastically.
- Cross-market or cross-security ATR comparison.

In [117]:
# Not included in data due to high correlation with ATR

# data['NATR'] = by_ticker.apply(lambda x: talib.NATR(x.high,
#                                                     x.low,
#                                                     x.close,
#                                                     timeperiod=14))

# data

## Rolling Factor Betas

In [118]:
factor_data = web.DataReader(
    "F-F_Research_Data_5_Factors_2x3_daily", "famafrench", start=2007, end=2017
)[0].rename(columns={"Mkt-RF": "MARKET"})
factor_data.index.names = ["date"]

In [119]:
factors = factor_data.columns[:-1]
factors

Index(['MARKET', 'SMB', 'HML', 'RMW', 'CMA'], dtype='object')

In [120]:
t = 1
# this should be historical returns and need to exist in the dataset (eg, 21D_return)
ret = f"{t}D_return"

# Windows of 3 and 12 month
windows = [63, 252]
for window in windows:
    betas = []
    for ticker, df in by_ticker:
        model_data = df[[ret]].merge(factor_data, on="date").dropna()
        model_data[ret] -= model_data.RF

        rolling_ols = RollingOLS(
            endog=model_data[ret],
            exog=sm.add_constant(model_data[factors]),
            window=window,
        )
        factor_model = rolling_ols.fit(params_only=True).params.rename(
            columns={"const": "ALPHA"}
        )
        result = (
            factor_model.assign(ticker=ticker)
            .set_index("ticker", append=True)
            .swaplevel()
        )
        betas.append(result)
    betas = pd.concat(betas).rename(columns=lambda x: f"{x.lower()}_{window:02}")
    data = data.join(betas)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,smb_63,hml_63,rmw_63,cma_63,alpha_252,market_252,smb_252,hml_252,rmw_252,cma_252
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,,,,,,
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,,,,,,
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,,,,,,
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,,,,,,
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,0.009840,-0.008886,-0.020718,0.007701,0.000534,0.023702,0.008019,0.009677,-0.004207,0.022067
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,0.009823,-0.008927,-0.020732,0.007678,0.000847,0.023703,0.007430,0.009216,-0.004408,0.022263
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,0.010023,-0.008182,-0.019624,0.004398,0.000784,0.023403,0.007502,0.009416,-0.004531,0.022185
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,0.009769,-0.007226,-0.019898,0.004008,0.000880,0.023227,0.007419,0.009349,-0.004802,0.022341


## Compute the outcome variable that we will aim to predict, namely the 1-day forward returns.

In [121]:
data["-1D_return"] = by_ticker["1D_return"].shift(-1)

data

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,1D_return,3D_return,5D_return,10D_return,...,hml_63,rmw_63,cma_63,alpha_252,market_252,smb_252,hml_252,rmw_252,cma_252,-1D_return
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
A,2007-01-03,23.871602,24.205900,23.230295,23.400856,2574600.0,6.024784e+07,,,,,...,,,,,,,,,,0.003207
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,4.868198e+07,0.003207,,,,...,,,,,,,,,,-0.009300
A,2007-01-05,23.400856,23.469080,23.196183,23.257585,2676600.0,6.225125e+07,-0.009300,,,,...,,,,,,,,,,-0.003520
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,3.608923e+07,-0.003520,-0.009621,,,...,,,,,,,,,,0.001178
A,2007-01-09,23.250763,23.414500,22.943754,23.203006,1386200.0,3.216401e+07,0.001178,-0.011625,,,...,,,,,,,,,,-0.009115
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CNX,2016-12-23,19.040000,19.400000,18.836300,19.250000,2326533.0,4.478576e+07,0.007853,0.020679,0.002082,-0.096244,...,-0.008886,-0.020718,0.007701,0.000534,0.023702,0.008019,0.009677,-0.004207,0.022067,0.003117
CNX,2016-12-27,19.300000,19.570000,19.190000,19.310000,1202237.0,2.321520e+07,0.003117,0.003117,0.029318,-0.030136,...,-0.008927,-0.020732,0.007678,0.000847,0.023703,0.007430,0.009216,-0.004408,0.022263,-0.009322
CNX,2016-12-28,19.310000,19.530000,18.995000,19.130000,3131994.0,5.991505e+07,-0.009322,0.001571,0.014316,-0.020481,...,-0.008182,-0.019624,0.004398,0.000784,0.023403,0.007502,0.009416,-0.004531,0.022185,-0.021955
CNX,2016-12-29,19.090000,19.230000,18.570000,18.710000,2133928.0,3.992579e+07,-0.021955,-0.028052,-0.028052,-0.021955,...,-0.007226,-0.019898,0.004008,0.000880,0.023227,0.007419,0.009349,-0.004802,0.022341,-0.025655


## Persist results

In [122]:
# Not sure if we want to store these separately, but example notebook did so duplicating here.
# Also, it's unclear whether we want to retain dollar_volume moving forward.
# Finally, we could do dropna on the dataframe, but we'll loose all of 2007 data since we have features that depend on a year's worth of data.
data = data.drop(["open", "high", "low", "close", "volume"], axis=1).replace(
    (np.inf, -np.inf), np.nan
)

In [123]:
with pd.HDFStore(DATA_STORE) as store:
    store.put("factors/common", data)