# An IRL Approach

Having outlined the types of Hedge Funds, as well as their strategies, it is now time to analyze the sources of risk present in these strategies and to apply our knowledge of Statistical Learning to explore these dynamics over time. In the notes, we have investigated a number of famous Hedge Fund disasters. The major theme in these disasters is over-reliance on, and over-leverage of, particular trades and positions when markets shift. We have also discussed the concept of market efficiency and analyzed the use and application of factor models in the ongoing active vs passive debate.
  
One challenge in analyzing the risk exposure of particular Hedge Funds is our inability to dissect and understand their trading philosophies and current market strategies. One way around this challenge is to use Style Analysis to identify the correlation of particular funds with various fundamentals over time. Style analysis can be thought of as a form of Inverse Reinforcement Learning (IRL). In IRL, rather than being given input data and labels and using a cost function and model to produce a prediction, we use the observed behavior of a person to infer what goal that behavior seems to be trying to achieve, given known inputs or a known state-space. While we may still use a cost function like Mean-Squared Error (MSE), the aim in this approach is to find an unbiased linear approximation of the investor's investment philosophy, rather than a function which predicts an output. Style analysis is the simplest version of this: just by observing the weight-space of a linear model over time and across different funds, we gain key insights into the risk exposure of a particular fund.
  
If a fund appears highly correlated with interest rates at a point in time, we can assume it has strategies which expose it to interest rate risk. If a particular fund is highly correlated with equity, we can infer that, despite the intricacies of its strategy, the fund is exposed to market risk over time. The value of this technique is its ability to reveal the changing exposure of funds over time and how they respond to different market conditions.
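To make the idea concrete, here is a minimal sketch of style analysis on synthetic data. Everything in it is an assumption for illustration: the factor returns are random draws, the "fund" is built from hidden weights (`true_weights`, a hypothetical name), and plain least squares from NumPy stands in for whichever linear model you prefer. The point is only that the fitted weights recover the exposures we pretended not to know.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily factor returns: columns stand for rates, equity and FX
n_days = 500
factors = rng.normal(0.0, 0.01, size=(n_days, 3))

# A synthetic "fund" whose hidden exposures we pretend not to know
true_weights = np.array([0.1, 0.8, -0.3])
fund = factors @ true_weights + rng.normal(0.0, 0.001, size=n_days)

# Style analysis: least-squares fit of fund returns on factor returns,
# with a column of ones so that an intercept is estimated as well
X = np.column_stack([np.ones(n_days), factors])
coef, *_ = np.linalg.lstsq(X, fund, rcond=None)
intercept, est_weights = coef[0], coef[1:]

print(est_weights.round(2))  # estimated exposures, one per factor
```

We never look at the fund's trades here, only at its returns, which is what lets us read the weights as a description of behavior rather than as a forecasting tool.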


In [1]:
# We will be importing many of the #
#  common libraries we have used before
import os
import pickle
from functools import reduce
from operator import mul

import pandas as pd
import numpy as np

from statsmodels.regression.linear_model import OLS
from sklearn import linear_model
from sklearn.decomposition import PCA

import holoviews as hv
import hvplot
import hvplot.pandas

In [2]:
# We set the seed and 
#  import the JavaScript extensions for our plots
np.random.seed(42)
hv.extension('bokeh')

In [3]:
# There is a compatibility issue between this library
#  and newer versions of Pandas; this is a short fix to the problem.
#  If you have issues at this chunk, comment it out and you should be fine.
pd.core.common.is_list_like = pd.api.types.is_list_like
import pandas_datareader as pdr



A warning to some of you running this code: the 3D plots do take a while to render. Please try your best to close other applications that may be using a large number of system resources. This set of notes is going to look at a common dataset of different Hedge Fund Strategies. Funds with these strategies have been grouped to form strategy indexes, namely: a Liquid Alternative Beta Index, an Event Driven Liquid Index, a Global Strategies Liquid Index, a Long/Short Liquid Index, a Managed Futures Liquid Index and a Merger Arbitrage Liquid Index.  

For those who are interested, I have included some code to download data on NASDAQ ETFs, which should provide an interesting comparison between the types of funds. Most ETFs should be highly correlated with the market, so you will see very different correlations and scaling in the plots, which you may need to adjust. Unlike our previous note packs and peer review exercises, where we have had to interpolate, detrend, scale and shape our data, this set of notes is far simpler, as we will simply be looking at the daily returns on our different strategy indexes.

In [4]:
##### using this code, you can perform the code below on NASDAQ ETFs ####

# tickers = pdr.nasdaq_trader.get_nasdaq_symbols(retry_count=3, timeout=30, pause=None)
# etfs = tickers.loc[tickers.ETF == True, :]
# symbols = etfs.sample(75).index.tolist()

# packet = pdr.robinhood.RobinhoodHistoricalReader(symbols, retry_count=3, pause=0.1, timeout=30, session=None, freq=None, interval='day', span='year')

# data = packet.read().reset_index()
# pivot = data.loc[:,['symbol', 'begins_at', 'close_price']].drop_duplicates().pivot(index='begins_at', columns='symbol', values='close_price')

In [13]:
# We import our data from CSV
indexes = pd.read_csv('StyleIndexes.csv')

# We ensure the dates are recorded correctly and compute returns
indexes.Date = pd.to_datetime(indexes.Date)
indexes.index = indexes.Date
indexes = indexes.drop(columns=['Date'])
indexes = indexes.pct_change().dropna()

In [14]:
# As this is a large dataset, we will only look 
#  at the last 1000 trading-days
pivot = indexes.iloc[::-1,:].iloc[-1000:,:]
pivot.head()

| Date | Liquid Alternative Beta Index | Event Driven Liquid Index | Global Strategies Liquid Index | Long/Short Liquid Index | Managed Futures Liquid Index | Merger Arbitrage Liquid Index |
|---|---|---|---|---|---|---|
| 2016-01-26 | 0.003407 | 0.003849 | 0.004988 | 0.004456 | 0.000782 | -0.002519 |
| 2016-01-27 | -0.007023 | -0.002224 | -0.002998 | 0.000437 | 0.001592 | -0.000699 |
| 2016-01-28 | 0.002243 | -0.009758 | -0.003897 | 0.003343 | 0.006831 | -0.001829 |
| 2016-01-29 | -0.004865 | 0.002451 | -0.000707 | -0.003317 | 0.002221 | 0.005409 |
| 2016-01-30 | 0.006049 | -0.000521 | -0.000447 | 0.001144 | -0.001196 | -0.008144 |


Using the data above, the plot below shows the returns the different strategies achieved over time. While these strategies may seem very different in their investment approach, it is easy to see how correlated they appear over long periods of time. In 2008, hedge fund strategies took a serious hit, driven in part by investor confidence and liquidity within the funds. It appears that in recent years Long/Short Liquid and Managed Futures strategies have performed strongly. The question remains: can we identify common risks to these strategies, and can we identify the exposure of these strategies to common market fundamentals?
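As a small aside, the two quantities this paragraph leans on, cumulative growth and correlation between strategies, can be sketched with NumPy alone. The returns below are synthetic assumptions: two hypothetical strategies share a common driver and a third is independent, so the numbers say nothing about the real indexes.

```python
import numpy as np

rng = np.random.default_rng(3)
n_days = 250

# Hypothetical daily returns: two strategies share a common driver,
# while the third is drawn independently
common = rng.normal(0.0003, 0.005, size=n_days)
returns = np.column_stack([
    common + rng.normal(0.0, 0.002, size=n_days),
    common + rng.normal(0.0, 0.002, size=n_days),
    rng.normal(0.0003, 0.005, size=n_days),
])

# Growth of one unit invested: the cumulative product of (1 + daily return)
growth = np.cumprod(1.0 + returns, axis=0)

# Pairwise correlation of daily returns (columns are strategies)
corr = np.corrcoef(returns, rowvar=False)
print(corr.round(2))
```

The growth calculation is the same `add(1).cumprod()` idea used for the plot, just written against a plain array.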

In [15]:
%%opts Curve [width=800 height=300] NdOverlay [legend_position='right'] 
pd.melt(indexes.add(1).cumprod().reset_index(), id_vars=['Date']).hvplot.line(y='value', x='Date', by='variable')

The first technique we are going to look at is Principal Component Analysis (PCA). As you have covered this method already in your Machine Learning course, we will not go into the details of the technique or its derivation. Using PCA we can identify sources of variance across the funds. Using the sliders, you can choose a starting point in time and a window, and observe the drift in the strategies over time. It appears that, over the 1000 days used in this interactive plot, two components capture the strategy movements fairly well, explaining roughly 70% of the variance between them.

Obviously, it is difficult to identify clearly what these components might represent, but we can imagine equity markets being a large source of variance. What is interesting to note from this plot is that, while many strategies move dramatically, Global Strategies remains fairly central, perhaps indicating its diversified exposure.
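For readers who want to see where the explained-variance figures come from, the following is a rough sketch that computes PCA through a singular value decomposition in NumPy rather than through `sklearn`. The data are synthetic assumptions (one shared hypothetical "market" driver plus noise), and, as in the plotting class, each fund is treated as a sample and each day as a feature.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical window: 60 days of returns for 6 strategies, driven by
# one shared "market" factor plus idiosyncratic noise
n_days, n_funds = 60, 6
market = rng.normal(0.0, 0.01, size=(n_days, 1))
betas = np.linspace(0.2, 1.2, n_funds)
returns = market * betas + rng.normal(0.0, 0.002, size=(n_days, n_funds))

# Each fund is a sample (row) and each day is a feature (column)
X = returns.T
Xc = X - X.mean(axis=0)  # centre every feature before decomposing

# PCA via SVD: squared singular values are proportional to explained variance
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_ratio = s**2 / np.sum(s**2)

# Coordinates of each fund on the first two components
scores = U[:, :2] * s[:2]

print(explained_ratio[:2].round(3))
```

The ratios are fractions that sum to one, so they need multiplying by 100 before being read as percentages.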

In [16]:
# You can replace PCA with CCA, kernel PCA, FA or 
#  any other relevant method for dimensionality reduction.
# labels just tells the function to include the 
#  names of the different funds on the plot
class Component_Plots:
    def __init__(self, data=pivot, transformer=PCA(2), labels=True):
        self.data = data
        self.transformer = transformer
        self.labels = labels
    
    def components(self, start, window):
        component_data = self.transformer.fit_transform(self.data.iloc[start:(start+window),:].T)
        
        if self.labels:
            data_labels = reduce(mul, pd.DataFrame(component_data, index=self.data.columns.tolist(), 
                                             columns=['Component_1', 'Component_2'])\
                                               .reset_index()\
                                               .apply(lambda x: hv.Text(x[1], x[2], 
                                                                        ' '.join(x[0].split()[:-1]), fontsize=8), axis=1)\
                                               .tolist())
        else:
            data_labels = hv.Text(0,0,'')
    
        return pd.DataFrame(component_data, columns=['Component_1', 'Component_2'])\
                .hvplot.scatter(x='Component_1', y='Component_2')\
                .redim(Component_2={'range': (-0.1, 0.3)}, Component_1={'range': (-0.03, 0.05)})\
                .redim.label(Component_1=f'Component 1 {self.transformer.explained_variance_ratio_[0]:.2%}', 
                             Component_2=f'Component 2 {self.transformer.explained_variance_ratio_[1]:.2%}').options(alpha=1)*\
                data_labels

In [17]:
CompPlots = Component_Plots()

In [18]:
%%opts Scatter [width=800, height=400]
hv.DynamicMap(CompPlots.components, kdims=['start', 'window']).redim.range(start=(0,len(pivot.index))).redim.range(window=(30,90))

While PCA may provide insights into latent risk, given market information, we may also want to understand how these strategies respond over time to changing market conditions. For this we are going to analyze three primary factors: 3-month US T-bills, NASDAQ Composite Returns and the US Weighted Exchange Rate. We will be sourcing this data through the pandas_datareader API, from the [FRED website](https://fred.stlouisfed.org/). Using this data, we are going to fit models to understand the covariance of these strategies with the factors over time. To analyze this, we will observe not the input or output of these models but the weight-space, to determine how strategies change in response to changing factors. We imagine that for Global Strategies the Exchange Rate covariance may be high, while for Long/Short Strategies Equity Exposure may be most important.
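The idea of watching the weight-space over time can be sketched as a rolling regression. This is only an illustration under stated assumptions: `rolling_weights` is a hypothetical helper, the factor returns are synthetic, and the "fund" is constructed to switch its equity exposure halfway through so that the drift in the weights is visible.

```python
import numpy as np

def rolling_weights(X, y, window, step):
    """Fit least squares on successive windows; return one weight vector per window."""
    weights = []
    for start in range(0, len(X) - window + 1, step):
        # Add a column of ones so each window gets its own intercept
        Xw = np.column_stack([np.ones(window), X[start:start + window]])
        coef, *_ = np.linalg.lstsq(Xw, y[start:start + window], rcond=None)
        weights.append(coef[1:])  # keep the factor weights, drop the intercept
    return np.array(weights)

rng = np.random.default_rng(2)
n_days = 400

# Hypothetical factor returns: columns stand for rates, equity and FX
X = rng.normal(0.0, 0.01, size=(n_days, 3))

# Synthetic fund that raises its equity exposure from 0.2 to 1.0 at day 200
beta = np.where(np.arange(n_days) < 200, 0.2, 1.0)
y = beta * X[:, 1] + rng.normal(0.0, 0.001, size=n_days)

w = rolling_weights(X, y, window=100, step=100)
print(w.round(2))  # one row per window, one column per factor
```

Each row of `w` is a point in the weight-space, so plotting the rows in order traces how the fund's exposures move.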

In [None]:
# We download FRED data on 3-month T-bills, 
#  the NASDAQ Composite and exchange rate movements. 
factors = pdr.data.DataReader(['DTB3','NASDAQCOM','DTWEXB'], 'fred', start=str(pivot.index.min()), end=str(pivot.index.max()))
factors.loc[:,['NASDAQCOM','DTWEXB']] = factors.loc[:,['NASDAQCOM','DTWEXB']].pct_change()
factors.loc[:,['DTB3']]  = ((factors.loc[:,['DTB3']]+1)**(1/365)).pct_change()
factors = factors.dropna()
factors = factors.loc[pivot.index,:].interpolate().fillna(0)

While most of our factors appear fairly stable, it is clear how astronomical the growth in the NASDAQ has been over this period. As we imagine funds respond to changes in interest rates, we will look at changes in the interest rate rather than its level. As such, we will be using the change in the daily compounding factor, where $i$ is the 3-month T-bill rate and $n = 365$:

$$\Delta (1+i)^{1/n}$$
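A short worked example of this transformation may help. The quotes below are hypothetical and assumed to be expressed as decimals (so 0.02 means 2%). The sketch takes a plain first difference; the download cell applies `pct_change()` to the same compounded factor, which for values this close to 1 gives nearly the same number.

```python
import numpy as np

n = 365  # days per year used for compounding

# Hypothetical annualised rate quotes on three consecutive days, as decimals
annual_rates = np.array([0.0200, 0.0210, 0.0205])

# Daily compounding factor implied by each quote: (1 + i)^(1/n)
daily_factor = (1.0 + annual_rates) ** (1.0 / n)

# Day-on-day change in that factor
delta = np.diff(daily_factor)
print(delta)
```

A rise in the quoted rate produces a positive change and a fall produces a negative one, which is the signal the regression picks up.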


In [None]:
pd.melt(factors.add(1).cumprod().reset_index(), id_vars=['Date']).hvplot.line(y='value', x='Date', by='variable')

In [None]:
# We must switch the HoloViews backend to matplotlib,
#  as the 3D scatter plot below is not supported by bokeh
hv.extension('matplotlib')

Given our three factors, we can use a 3-dimensional plot to visualize our weight-space and the movement of our strategies over time. Sadly, HoloViews lacks functionality for 3D text, though you should be able to track the movements of the dots to get an idea of how the strategies move in this weight-space over time. Using the sliders, you should be able to adjust the starting point and window over which the weights are calculated.


In [None]:
class Weight_Plots:
    def __init__(self, x=factors, y=pivot, transformer=linear_model.LinearRegression()):
        self.data = {'x':x,'y':y}
        self.transformer = transformer
    
    def weights(self, start=0, window=90):
        self.transformer.fit(X=self.data['x'].iloc[start:(start+window),:], y=self.data['y'].iloc[start:(start+window),:])
    
        return hv.Scatter3D(pd.DataFrame(self.transformer.coef_, index=self.data['y'].columns, columns=['x','y','z']).reset_index(), vdims='index')\
    .redim(y={'range': (-0.5, 0.5)}, x={'range': (-150, 150)},  z={'range': (-1, 2)})\
    .redim.label(x=self.data['x'].columns[0], y=self.data['x'].columns[1], z=self.data['x'].columns[2])

In [None]:
WeightPlots = Weight_Plots()
curve_dict_2D = {(s,w):WeightPlots.weights(s,w) for s in range(0,len(pivot.index),25) for w in range(30,90,15)}
hmap = hv.HoloMap(curve_dict_2D, kdims=['start', 'window']).collate()

In [None]:
%%opts Scatter3D [fig_inches=8] (s=50)
hmap

Analyzing this plot, the volatility of beta-derived strategies, such as Managed Futures, Long/Short Liquid and Liquid Alternative, is again clear. In comparison, other strategies remain fairly central in our vector-space, indicating their low correlations to T-bills, Beta and the Exchange Rate.

From the analysis in these notes, we have begun to integrate our knowledge of machine learning and quantitative finance, using methods from IRL to analyze the risk exposures of various hedge fund strategies over time. Using this analysis, we can begin to understand how strategies respond to market events and the correlation between different strategies, from a fund of funds perspective.