# Industry Momentum + Shiller-CAPE

Converting raw data into ingredients fit for our strategy recipe is tedious but necessary work. Here's how I cleaned and processed the data for this project.

In [1]:
import pandas as pd
from datetime import datetime, timedelta

In [2]:
DATA_DIR = 'D:/Repositories/cicc/Industry Momentum + CAPE/data'  # shared folder for all input files
df_div = pd.read_csv(f'{DATA_DIR}/dividends.csv', parse_dates=['announced'], dtype={'ticker':str})
df_price = pd.read_csv(f'{DATA_DIR}/price.csv', index_col=['date'], parse_dates=['date'])
df_mktcap = pd.read_csv(f'{DATA_DIR}/market_cap.csv', index_col=['date'], parse_dates=['date'])
df_members = pd.read_csv(f'{DATA_DIR}/WIND_index_members.csv', parse_dates=['included', 'excluded'])
df_map = pd.read_csv(f'{DATA_DIR}/ticker_map.csv', index_col=['key'])
df_idx = pd.read_csv(f'{DATA_DIR}/WIND_industry_index.csv', index_col=['Date'], parse_dates=['Date'])

In [3]:
# time range for the backtest
START = datetime(2009,1,1)
END = datetime(2020,12,31)

## Clean Dividend Data

In [4]:
from xquant.util import add_suffix

In [5]:
df_div['ticker'] = df_div['ticker'].apply(add_suffix) # convert ticker symbol into standard format (e.g. 000001.SZ)
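`add_suffix` comes from the `xquant` package, whose source isn't shown here. As a rough illustration of what such a conversion might look like, the sketch below appends an exchange suffix based on the first digit of a six-digit A-share code. The name `add_suffix_sketch` and the prefix rule (codes starting with 6 or 9 are treated as Shanghai, everything else as Shenzhen) are my assumptions, not necessarily what `xquant` does.

```python
def add_suffix_sketch(ticker: str) -> str:
    """Hypothetical stand-in for xquant.util.add_suffix (assumed behavior)."""
    ticker = str(ticker)
    if "." in ticker:
        return ticker  # already in suffixed form, leave untouched
    code = ticker.zfill(6)  # restore leading zeros if they were dropped
    # Assumed rule: first digit 6 or 9 -> Shanghai, otherwise Shenzhen
    suffix = "SH" if code[0] in ("6", "9") else "SZ"
    return f"{code}.{suffix}"


print(add_suffix_sketch("1"))          # 000001.SZ
print(add_suffix_sketch("600519"))     # 600519.SH
print(add_suffix_sketch("000002.SZ"))  # 000002.SZ
```

Reading the `ticker` column with `dtype=str`, as done above, is what keeps leading zeros intact in the first place; the `zfill` call is only a safety net.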

In [6]:
df_div.dropna(subset=['announced'], inplace=True)

## Clean Index Members Data

In [7]:
from xquant.util.period import Period

In [8]:
# map industry symbols to the actual industry names
df_members['industry'] = df_members['industry'].apply(lambda x: df_map.at[x,'value'])

In [9]:
# if a stock is still a member of the index, set its exclusion date to one far in the future
df_members['excluded'] = df_members['excluded'].fillna(pd.Timestamp('20991231'))

In [10]:
df_members.dropna(subset=['included'], inplace=True)
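Once open-ended memberships carry a far-future exclusion date, checking who belongs to an index on a given day reduces to one half-open interval test, `included <= date < excluded`. Here is a toy illustration; the tickers, dates, and the helper name `members_on` are made up for the example.

```python
import pandas as pd

# Made-up membership table: None means "still a member"
members = pd.DataFrame({
    "ticker": ["000001.SZ", "600000.SH", "000002.SZ"],
    "included": pd.to_datetime(["2010-01-04", "2015-06-01", "2012-03-01"]),
    "excluded": pd.to_datetime(["2016-12-30", None, None]),
})
# Replace missing exclusion dates with a far-future sentinel
members["excluded"] = members["excluded"].fillna(pd.Timestamp("2099-12-31"))


def members_on(df: pd.DataFrame, date: pd.Timestamp) -> list:
    """Tickers whose membership interval [included, excluded) contains date."""
    mask = (df["included"] <= date) & (date < df["excluded"])
    return df.loc[mask, "ticker"].tolist()


print(members_on(members, pd.Timestamp("2018-01-01")))  # ['600000.SH', '000002.SZ']
print(members_on(members, pd.Timestamp("2014-01-01")))  # ['000001.SZ', '000002.SZ']
```

Without the sentinel, comparisons against `NaT` evaluate to `False`, so current members would silently drop out of the filter.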

## Calculate Financial Metrics for an Index

We often need to look at index-level metrics such as earnings and dividends. For a market-capitalization-weighted index with *n* members, the index-level metric *m* is calculated as:

$m = \sum^{n}_{i=1} w_{i} \cdot m_{i}$

where $m_i$ is the metric of member *i* and $w_i$ is its weight in the index (i.e. the market cap of member *i* divided by the total market cap of all members).
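To make the formula concrete, here is a small worked example with made-up numbers. The function name `cap_weighted_metric` and the sample tickers are hypothetical; it is a sketch of the weighting step only, not the notebook's final implementation.

```python
import pandas as pd


def cap_weighted_metric(mktcap: pd.Series, metric: pd.Series) -> float:
    """Market-cap-weighted average of a per-member metric.

    Both Series are indexed by ticker; members missing from either
    Series are ignored (an assumption made for this sketch).
    """
    common = mktcap.index.intersection(metric.index)
    caps = mktcap.loc[common]
    weights = caps / caps.sum()  # weights sum to 1 across members
    return float((weights * metric.loc[common]).sum())


# Three members with weights 0.6, 0.3 and 0.1
mktcap = pd.Series({"A": 600.0, "B": 300.0, "C": 100.0})
metric = pd.Series({"A": 2.0, "B": 4.0, "C": 10.0})

result = cap_weighted_metric(mktcap, metric)
print(round(result, 4))  # 0.6*2 + 0.3*4 + 0.1*10 = 3.4
```

Restricting both Series to their common tickers before normalizing keeps the weights summing to one even when the metric is missing for some members.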

In [16]:
from xquant.util import closest_trading_day

In [42]:
def get_index_metrics(df_mktcap, df_metric, df_members, date) -> None:
    # only need market cap data at date
    mktcap = df_mktcap.loc[closest_trading_day(date, df_mktcap.index, 'bfill')]

    # get all members of the index at date
    df_members = df_members[(df_members['included'] <= date) & (date < df_members['excluded'])]
    members = df_members['ticker']
    members = [stock for stock in members if stock in mktcap.index] # keep only members with market cap data

    mktcap = mktcap.loc[members]
    mktcap_sum = mktcap.sum()

    # TODO: df_metric is not used yet; for now only the members' total market cap is printed
    print(mktcap_sum)

In [48]:
get_index_metrics(df_mktcap, df_div, df_members.query("industry=='Industrials'"), datetime(2018,1,1))

1105966550.0
