# A Quantamental Approach to an Explainable Model: Predicting Financial Market Dislocations with Causal Inference and Anomaly Detection
<br/>

Funds that fail to beat their benchmarks quickly go extinct. Generating alpha is an extraordinarily difficult task, and one that is becoming more difficult still, as deteriorating hedge fund returns illustrate. **Find alpha** or die is the stark new reality for every portfolio manager. At the same time, we are living through a technology-driven data explosion. In the world’s 2.5 billion gigabytes of data, Wall Street sees its savior. The prevailing belief is that this data, and the predictive power it promises, is the most powerful alpha source to emerge in the last quarter century. The $3 trillion hedge fund industry is currently betting its future on it.

<br/>
This sentiment rings especially true as markets enter a period of unprecedented volatility and uncertainty, with the world grappling with the fallout of the pandemic, growing social and financial inequality, and now widespread civil unrest. A market once thought "wacky" has tipped into full-blown madness as the world seemingly unravels before our eyes. To help us navigate current and future crises, I propose a quantamental approach.

<br/>
<br/>
To be clear: models are not magic. They have limitations and are only as good as their assumptions and the quality of the data they ingest. Every model, regardless of complexity, should undergo an appropriate level of scrutiny. Models may appear predictive when they are not, and they can lose predictive power as market conditions change. It is therefore in the firm's best interest to withhold judgment on the model until it passes the appropriate checks and balances. In the remaining sections of this notebook, I do my best to take this "black box" and make it as transparent as possible without losing sight of the forest for the trees (pun intended).
<br/>
<br/>

## Motivation
<br/>
With oceans of information and only so much time to investigate fundamentals before pulling the trigger on an investment decision, where we look for opportunities and risk is of critical importance. It is becoming increasingly easy to lose sight of the forest for the trees, especially on demanding days; this is one of the disadvantages of running a lean shop. While machines are unlikely to master fundamentals for the foreseeable future, they are particularly good at sifting through large amounts of data and finding patterns otherwise invisible to the analyst. Models can provide clues the analyst cannot see, and the analyst can provide intuition and expertise the models cannot comprehend. The idea is this: build a model that detects the anomalous movements in pairwise correlations of market indices that most often precede market dislocations. The anomalies offer an executable signal that individual desks can, at their discretion, investigate further and act upon. In the following sections of this notebook, I walk the reader through the model, the assumptions I make, the promising preliminary results I obtain, and the conclusions I draw.
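A minimal sketch of the detection step, assuming the input is a daily summary of rolling pairwise correlations (for example, their cross-sectional mean) and using scikit-learn's `IsolationForest`, which the notebook imports below. The features (correlation level and its 5-day change), the `contamination` rate, the helper name `flag_correlation_anomalies`, and the synthetic series are illustrative assumptions, not the model's actual configuration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def flag_correlation_anomalies(mean_corr: pd.Series, contamination: float = 0.02,
                               seed: int = 0) -> pd.Series:
    """Hypothetical helper: mark days whose correlation behaviour looks anomalous.

    Features are an illustrative assumption: the correlation level and its 5-day change.
    """
    features = pd.DataFrame({
        "level": mean_corr,
        "change_5d": mean_corr.diff(5),
    }).dropna()
    model = IsolationForest(n_estimators=200, contamination=contamination,
                            random_state=seed)
    labels = model.fit_predict(features.to_numpy())  # -1 = anomaly, 1 = normal
    return pd.Series(labels == -1, index=features.index, name="anomaly")


# Synthetic stand-in for a mean rolling correlation series: a calm regime plus a sudden jump.
rng = np.random.default_rng(42)
dates = pd.bdate_range("2019-01-01", periods=300)
values = 0.3 + 0.02 * rng.standard_normal(300)
values[250:260] += 0.4  # abrupt co-movement spike, the kind of event we want flagged
mean_corr = pd.Series(values, index=dates)

flags = flag_correlation_anomalies(mean_corr)
print(flags[flags].index.min(), int(flags.sum()))
```

An isolation forest scores points by how few random splits it takes to isolate them, so abrupt, unusual jumps in co-movement receive low scores. `contamination` only sets what fraction of days gets flagged and would need calibrating against history before the flags are treated as a signal.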
<br/>
<br/>

## Market Sociology
<br/>
Suppose we survey every investor on the planet. Assuming everyone has access to the same N pieces of information, we could ask: given the nth piece, are you a bear or a bull? Market behavior is often assumed to reflect external economic news, though empirical evidence has challenged this connection [1]. Ultimately, it is the investor's internal outlook and biases that determine how they answer the question. They can imagine a threat when there is none and ignore one when there is. What is more, investors can, and often will, ignore accumulating evidence of an economic crisis until suddenly they don't, and then they panic.
<br/>
<br/>
In sociology [2–5], panic has been defined as a collective flight from a real or imagined threat. In economics, bank runs occur at least in part because of the risk to the individual from the bank run itself, and they may be triggered by predisposing conditions, external (perhaps catastrophic) events, or even randomly [6, 7]. Although empirical studies of panic are difficult, efforts to distinguish endogenous (self-generated) from exogenous (externally generated) market panics based on oscillations of market indices have met with some success [8–10], though the conclusions have been debated [11–14]. The literature generally uses volatility and the correlation between asset prices to characterize risk [15–19]. Both measures are sensitive to the magnitude of price movements and therefore increase dramatically during a market crash.
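To make the two standard risk measures concrete, here is a minimal pandas sketch that computes, from a wide DataFrame of daily log returns (one column per ticker, an assumed layout), the annualized rolling volatility of each asset and the cross-sectional mean of rolling pairwise Pearson correlations. The 90-day window mirrors the one used below; the 252-day annualization factor, the helper name `rolling_risk_measures`, and the synthetic one-factor returns are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def rolling_risk_measures(returns: pd.DataFrame, window: int = 90):
    """Hypothetical helper: (rolling annualized volatility per asset, mean rolling pairwise Pearson correlation)."""
    vol = returns.rolling(window).std() * np.sqrt(252)

    n = returns.shape[1]
    upper = np.triu_indices(n, k=1)  # indices of distinct pairs (i < j)
    mean_corr = {}
    for end in range(window, len(returns) + 1):
        block = returns.iloc[end - window:end]
        corr = np.corrcoef(block.to_numpy(), rowvar=False)  # columns are the variables
        mean_corr[returns.index[end - 1]] = corr[upper].mean()
    return vol, pd.Series(mean_corr, name="mean_pairwise_corr")


# Synthetic returns: a common market factor that becomes much stronger halfway through.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=400)
market = rng.standard_normal(400)
beta = np.where(np.arange(400) < 200, 0.2, 1.5)
idio = rng.standard_normal((400, 5))
returns = pd.DataFrame(0.01 * (beta[:, None] * market[:, None] + idio),
                       index=dates, columns=list("ABCDE"))

vol, mean_corr = rolling_risk_measures(returns)
print(mean_corr.iloc[[0, -1]].round(2))
print(vol.iloc[-1].round(3))
```

When the common factor strengthens, both measures jump together, which is the magnitude sensitivity described above.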
<br/>
<br/>
This proposal is not radically different from what has been achieved in the literature. By making precise measurements of the correlations between asset prices and the volatility of those correlations, we can paint a more complete picture of the market and look for early warning signals of extreme volatility and market dislocations. The catch is that correlations are non-stationary (they are notoriously unstable) and are known to harbor non-linear effects [20]. Therefore, instead of Pearson's correlation, I compute Székely's distance correlation (which captures both linear and non-linear associations in the data) over a rolling 90-day window between each pair of stocks in the DJA from 2000 to 2020.
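In the population, distance correlation (Székely, Rizzo and Bakirov) is zero only when the two variables are independent, which is why it picks up non-linear association that Pearson's coefficient misses. Below is a minimal NumPy sketch of the biased (V-statistic) sample estimator plus a rolling 90-day version for one pair of series; the function names are hypothetical and not part of `hedgepy`, and a maintained implementation is available in the third-party `dcor` package if an extra dependency is acceptable.

```python
import numpy as np
import pandas as pd


def _double_centered_distances(v: np.ndarray) -> np.ndarray:
    """Pairwise |v_i - v_j| matrix with row, column and grand means removed."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y) -> float:
    """Sample distance correlation (biased V-statistic form); lies in [0, 1]."""
    a = _double_centered_distances(np.asarray(x, dtype=float))
    b = _double_centered_distances(np.asarray(y, dtype=float))
    dcov2_xy = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:  # a constant series carries no dependence information
        return 0.0
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))


def rolling_distance_correlation(x: pd.Series, y: pd.Series, window: int = 90) -> pd.Series:
    """Distance correlation of two aligned series over a trailing window."""
    joined = pd.concat([x, y], axis=1).dropna()
    values = [
        distance_correlation(joined.iloc[end - window:end, 0], joined.iloc[end - window:end, 1])
        for end in range(window, len(joined) + 1)
    ]
    return pd.Series(values, index=joined.index[window - 1:], name="dcor")


# Purely non-linear dependence: Pearson sees almost nothing, distance correlation does.
rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
y = x ** 2 + 0.1 * rng.standard_normal(1000)
pearson = np.corrcoef(x, y)[0, 1]
dcor_xy = distance_correlation(x, y)
print(f"Pearson: {pearson:.2f}, distance correlation: {dcor_xy:.2f}")

rolling = rolling_distance_correlation(pd.Series(x[:200]), pd.Series(y[:200]), window=90)
```

Each window costs O(window²) memory and time, which is cheap at 90 days but adds up across every pair of DJA constituents over 20 years.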

```python
# autoreload: re-import edited modules (e.g. hedgepy) before each cell runs
%load_ext autoreload
%autoreload 2

import sys
import os
import datetime
import math

import pandas as pd
import numpy as np
import networkx as nx
from statsmodels.tsa.stattools import grangercausalitytests, coint, adfuller
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
import matplotlib.pyplot as plt
from matplotlib import gridspec
import seaborn as sns
from jupyterthemes import jtplot
from IPython.display import display, HTML

# enable importing external modules: must run before the hedgepy imports below
sys.path.append(os.path.join(sys.path[0], ".."))

# import custom library tools
from hedgepy.core import build_series, build_network_time_series
from hedgepy.utils import write_series, read_series, read_data
from hedgepy.centrality import global_degree_centrality, global_eigencentrality

# widen the notebook container and apply the jupyterthemes plot style
display(HTML("<style>.container { width:80% !important; }</style>"))
jtplot.style()
```



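```python
# load the cleaned daily OHLC prices for the DJA (long format: one row per ticker per day)
df_dja = pd.read_csv('../experiments/data/interim/DJA-2000-2020-clean.csv', parse_dates=['date'])
# drop the stray index column (most likely written by an earlier to_csv call), then index by date
df_dja = df_dja.drop(columns='Unnamed: 0').set_index('date')
df_dja.head()
```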

| date | open | high | low | close | name |
|---|---|---|---|---|---|
| 2000-05-11 | 101.37 | 104.25 | 99.0 | 102.81 | AAPL |
| 2000-05-12 | 106.0 | 110.5 | 104.77 | 107.62 | AAPL |
| 2000-05-15 | 108.06 | 108.06 | 100.12 | 101.0 | AAPL |
| 2000-05-16 | 104.52 | 109.06 | 102.75 | 105.69 | AAPL |
| 2000-05-17 | 103.62 | 103.69 | 100.37 | 101.37 | AAPL |
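The frame above is in long format: one row per ticker per day, with the ticker in `name`. Pairwise correlations need a wide matrix with one column per ticker. A minimal sketch, assuming the `close` and `name` columns shown above and using daily log returns (a common choice, but an assumption here); the helper `to_wide_log_returns` is hypothetical, and the demo rebuilds the five AAPL rows displayed plus a made-up ticker `XYZ` so it runs on its own.

```python
import numpy as np
import pandas as pd


def to_wide_log_returns(df_long: pd.DataFrame, price_col: str = "close",
                        ticker_col: str = "name") -> pd.DataFrame:
    """Pivot a date-indexed long frame (one row per ticker per day) to wide daily log returns."""
    # pivot (not pivot_table) so duplicate (date, ticker) rows raise instead of being averaged
    prices = df_long.pivot(columns=ticker_col, values=price_col).sort_index()
    return np.log(prices).diff().iloc[1:]  # first row has no previous price


# The five AAPL closes displayed above, plus a hypothetical ticker "XYZ" with made-up closes.
dates = pd.to_datetime(["2000-05-11", "2000-05-12", "2000-05-15", "2000-05-16", "2000-05-17"])
demo = pd.DataFrame({
    "close": [102.81, 107.62, 101.0, 105.69, 101.37] + [50.0, 51.0, 50.5, 52.0, 51.5],
    "name": ["AAPL"] * 5 + ["XYZ"] * 5,
}, index=pd.Index(list(dates) * 2, name="date"))

returns = to_wide_log_returns(demo)
print(returns.round(4))
```

On the real data this would be called as `to_wide_log_returns(df_dja)`; tickers that enter or leave the index mid-sample show up as NaN gaps that the rolling-correlation step must drop pair by pair.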


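```python
# NOTE: dja_series is not defined in the cells shown here; it is assumed to be built
# earlier (e.g. with hedgepy's build_series or read_series, both imported above).
# soft_threshold=False: build the hard-thresholded network time series (hence nx_ts_hard)
nx_ts_hard = build_network_time_series(dja_series, soft_threshold=False)
```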
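`build_network_time_series` is part of the custom `hedgepy` library and its internals are not shown here. As a hedged illustration of what a hard versus soft threshold commonly means for a correlation network (an assumption about this library, not its documented behaviour): a hard threshold keeps an unweighted edge wherever pairwise dependence exceeds a cut-off, while a soft threshold keeps every edge, weighted by its strength. The sketch builds both graphs for one dependence matrix with networkx and computes degree and eigenvector centrality; the helper `correlation_graph`, the 0.5 cut-off, and the matrix values are hypothetical.

```python
import networkx as nx
import pandas as pd


def correlation_graph(corr: pd.DataFrame, soft_threshold: bool, cutoff: float = 0.5) -> nx.Graph:
    """Hypothetical helper: undirected graph from a symmetric dependence matrix."""
    g = nx.Graph()
    g.add_nodes_from(corr.columns)  # keep isolated tickers as nodes
    tickers = list(corr.columns)
    for i, u in enumerate(tickers):
        for v in tickers[i + 1:]:
            w = float(corr.loc[u, v])
            if soft_threshold:
                g.add_edge(u, v, weight=w)  # keep every pair, weighted by strength
            elif w >= cutoff:
                g.add_edge(u, v)  # keep only strong pairs, unweighted
    return g


# Illustrative dependence matrix for four hypothetical tickers.
corr = pd.DataFrame(
    [[1.0, 0.8, 0.6, 0.1],
     [0.8, 1.0, 0.7, 0.2],
     [0.6, 0.7, 1.0, 0.3],
     [0.1, 0.2, 0.3, 1.0]],
    index=list("ABCD"), columns=list("ABCD"),
)
hard = correlation_graph(corr, soft_threshold=False)
soft = correlation_graph(corr, soft_threshold=True)

print(sorted(hard.edges()))
print(nx.degree_centrality(hard))
print(nx.eigenvector_centrality(soft, weight="weight"))
```

Repeating this for each rolling window yields a time series of graphs; tracking how centrality concentrates over time is one plausible way to summarize the market-wide co-movement the anomaly detector consumes.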