## Quantitative Analysis of the India Shadow Banking crisis of 2018-2019

#### Participating Group Members:

Long Nguyen (nguyenlongphi1507@gmail.com)

Dang Duy Nghia Nguyen (nghia002@e.ntu.edu.sg)

Bodhisattya Roy (bodhisattya_roy@yahoo.in)

### 1. Introduction

The effect of the India Shadow Banking crisis can be observed directly in the NIFTY50 index. In this report, we therefore explore different approaches to see whether the crisis could have been predicted. In particular, we discuss two approaches: one models the NIFTY50 price using exogenous factors, and the other builds a time series model from the NIFTY50 price itself.

In the former, we selected a number of macroeconomic and market data factors, analyzed how much of the variance in the NIFTY50 price they explain, and built a linear model from them. In the latter, we used the autocorrelation of the NIFTY50 price series and tried to predict the crisis from past prices alone.

In [None]:
from functools import reduce
import pandas_datareader.wb as wb
import pandas_datareader as pdr
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
import numpy as np
import matplotlib.pyplot as plt 
import seaborn as sns
from statsmodels.regression.linear_model import OLS
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.arima.model import ARIMA

import holoviews as hv
import hvplot.pandas
hv.extension('bokeh')

In [None]:
COUNTRY = 'IND'
START = pd.to_datetime('2000-01-01')
END = pd.to_datetime('2020-12-31')

### 2. Modeling using exogenous factors

We select a number of macroeconomic factors and market data series. The full list is as follows: Current account balance (BoP, current \\$US); Reserves and related items (BoP, current \\$US); Foreign direct investment, net inflows (BoP, current \\$US); Stocks traded, total value (current \\$US); Official exchange rate, LCU per USD, period average; Portfolio investment, bonds (PPG + PNG) (NFL, current \\$US); Broad money (\% of GDP); Lending interest rate (\%); Real interest rate (\%); Interest payments (\% of revenue); GDP per capita (constant 2010 \\$US); Official exchange rate (LCU per \\$US, period average).

In [None]:
indicators = wb.get_indicators()

In [None]:
factor_ids = ['FM.LBL.BMNY.GD.ZS', 'BX.KLT.DINV.CD.WD', 'BN.CAB.XOKA.CD', 'BN.RES.INCL.CD', 'FR.INR.LEND', 'FR.INR.RINR', 'NY.GDP.PCAP.KD', 'GC.XPN.INTP.RV.ZS',
              'CM.MKT.TRAD.CD', 'DT.NFL.BOND.CD', 'DPANUSLCU', 'PA.NUS.FCRF']
factors = indicators.loc[indicators.id.isin(factor_ids)]
factors

In [None]:
nifty50_raw = pdr.data.get_data_yahoo('^NSEI', start=START, end=END)
nifty50_raw.tail()

In [None]:
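# Copy the column so that adding a 'year' column later does not write to a slice of nifty50_raw
n50 = nifty50_raw[['Adj Close']].copy()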
n50.hvplot.line(title='NIFTY50')

In [None]:
def plot_factors(indicator, countries, start, end, title, time_axis='year'):
    data = wb.download(indicator=indicator, country=countries, start=start, end=end)
    data = data.reset_index()
    plot = data.iloc[::-1,:].hvplot.line(x=time_axis, y=indicator, by='country', title=title)
    return data, plot

In [None]:
factor_data = dict()
factor_plots = dict()
for idx, factor in factors.iterrows():
    data, plot = plot_factors(factor.id, [COUNTRY], START, END, factor['name'], time_axis='year')
    factor_data[factor.id] = data
    factor_plots[factor.id] = plot

In [None]:
%%opts Curve [width=550]

total_plot = reduce(lambda acc, cur: acc + cur, list(factor_plots.values()))
total_plot

In [None]:
factors_df = pd.concat(list(factor_data.values()), axis=1)[['year'] + factor_ids]
factors_df = factors_df.set_index(pd.DatetimeIndex(factors_df.year.iloc[:, 0], yearfirst=True)).drop('year', axis=1).shift(-1)
factors_df.head()

In [None]:
n50['year'] = n50.index.year

In [None]:
y = n50.groupby('year').last()

In [None]:
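# Keep 2007-2020 and fill missing values.
# Note: axis=1 interpolates across factor columns within a year; axis=0 would interpolate along time.
X = factors_df[(factors_df.index.year >= 2007) & (factors_df.index.year <= 2020)].interpolate(axis=1)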

In [None]:
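# Standardize each factor to zero mean and unit (population) variance
stdScaler = StandardScaler()
X_std = stdScaler.fit_transform(X)
# With ddof=0 the covariance of the standardized factors equals their correlation matrix
cov_mat = np.cov(X_std.T, ddof=0)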

In [None]:
plt.figure(figsize=(10,10))
sns.set(font_scale=1.5)
hm = sns.heatmap(cov_mat,
                 cbar=True,
                 annot=True,
                 square=True,
                 fmt='.2f',
                 annot_kws={'size': 12},
                 yticklabels=factor_ids,
                 xticklabels=factor_ids)
plt.title('Covariance matrix showing correlation coefficients')
plt.tight_layout()
plt.show()
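
The heatmap relies on the fact that the covariance of standardized variables is their correlation. The sketch below checks this on synthetic data (the array `demo` and its shape are made up for illustration, not taken from the World Bank factors). One detail it highlights: `StandardScaler` scales by the population standard deviation, while `np.cov` divides by `n - 1` by default, so the two agree exactly only when `ddof=0` is passed.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs = 14                               # assumed small sample, for illustration only
demo = rng.normal(size=(n_obs, 3))       # synthetic stand-in for three factors
demo[:, 1] += 0.8 * demo[:, 0]           # make two columns correlated

# z-scores using the population standard deviation (NumPy's default, ddof=0)
z = (demo - demo.mean(axis=0)) / demo.std(axis=0)

cov_pop = np.cov(z.T, ddof=0)            # divides by n
cov_default = np.cov(z.T)                # divides by n - 1
corr = np.corrcoef(demo.T)               # correlation of the raw columns

print("ddof=0 covariance equals correlation:", np.allclose(cov_pop, corr))
print("diagonal with default ddof:", np.diag(cov_default))
```

With the default `ddof`, every entry is inflated by the factor `n / (n - 1)`, which is noticeable for a sample as short as a dozen or so annual observations.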

In [None]:
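# Regress the year-end NIFTY50 price on the factors.
# Note: no constant is added to X, so the model is fit without an intercept.
ols = OLS(y.values, X)
result = ols.fit()
print(result.summary())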

An R-squared of 0.977 indicates a strong association between the exogenous data and the NIFTY50 price. Among the factors, the most significant is 'Lending interest rate', at the 1% level using Student's t-test. However, this model is highly unstable because of the limited amount of macroeconomic data, which consists only of annual observations from 2007 to 2020.
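
When reading the R-squared, it may help to keep in mind that a regression fit without an intercept is scored against zero rather than against the mean of the target (statsmodels reports this uncentered R-squared when the model has no constant). The sketch below illustrates the difference with plain NumPy on synthetic data. The arrays `X_demo` and `y_demo` and the helper `r_squared` are hypothetical, and the target is deliberately unrelated to the regressors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_factors = 14, 3                         # assumed sizes, for illustration only
X_demo = 5.0 + rng.normal(size=(n_obs, n_factors))   # regressors with non-zero levels
y_demo = 8000.0 + rng.normal(scale=500.0, size=n_obs)  # target unrelated to X_demo

def r_squared(y, X, centered):
    """Least-squares R-squared, measured against the mean (centered) or zero."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2) if centered else np.sum(y ** 2)
    return 1.0 - (resid @ resid) / tss

# No intercept, uncentered R-squared: the regressors' levels absorb the level of y
r2_uncentered = r_squared(y_demo, X_demo, centered=False)

# Intercept included, centered R-squared: only variation around the mean counts
X_const = np.column_stack([np.ones(n_obs), X_demo])
r2_centered = r_squared(y_demo, X_const, centered=True)

print(f"uncentered R^2 (no intercept):   {r2_uncentered:.3f}")
print(f"centered R^2 (with intercept):   {r2_centered:.3f}")
```

In statsmodels, an intercept can be included by passing `statsmodels.api.add_constant(X)` as the regressor matrix, which makes the reported R-squared the centered one.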

### 3. Time series analysis on NIFTY50

Next, we use the autocorrelation of the time series to try to forecast the crisis before it happened. The model used is ARIMA, where the AR (autoregressive) terms capture the correlation of the series with its own past values and the MA (moving average) terms capture the idiosyncratic shocks observed in financial markets. Events such as terrorist attacks, earnings surprises, or sudden political changes can be thought of as the random shocks affecting asset price movements. ARMA and ARIMA models contain both AR and MA terms; ARIMA additionally allows the series to be differenced before fitting.
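
To make the roles of the AR and MA terms concrete, the following is a minimal simulation sketch of an ARMA process in plain NumPy. The function `simulate_arma`, its parameters, and the coefficient values are illustrative assumptions and are not estimates from the NIFTY50 data.

```python
import numpy as np

def simulate_arma(phi, theta, n, sigma=1.0, seed=0):
    """Simulate y_t = sum_i phi[i]*y_{t-1-i} + eps_t + sum_j theta[j]*eps_{t-1-j}."""
    rng = np.random.default_rng(seed)
    burn = 200                                   # discard start-up values
    eps = rng.normal(scale=sigma, size=n + burn)  # the random shocks
    y = np.zeros(n + burn)
    for t in range(n + burn):
        # AR part: weighted sum of the series' own past values
        ar = sum(phi[i] * y[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        # MA part: weighted sum of past shocks
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        y[t] = ar + ma + eps[t]
    return y[burn:]

# Two AR lags and one MA lag, i.e. the ARMA(2, 1) structure
y_sim = simulate_arma(phi=[0.6, 0.2], theta=[0.4], n=500)
lag1 = np.corrcoef(y_sim[1:], y_sim[:-1])[0, 1]
print(f"lag-1 sample autocorrelation: {lag1:.2f}")
```

A shock `eps[t]` enters the series directly at time `t`, persists for `len(theta)` further steps through the MA part, and is then carried forward only indirectly through the AR part.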

In [None]:
n50 = n50[['Adj Close']]

In [None]:
x = n50.values

First, we need to check the ACF and PACF plots to determine the lag terms.

In [None]:
lag_acf = acf(x, nlags=100)
plt.figure(figsize=(16, 7))
#Plot ACF: 
plt.plot(lag_acf, marker="o")
plt.axhline(y=0,linestyle='--',color='gray')
plt.axhline(y=-1.96/np.sqrt(len(x)),linestyle='--',color='gray')
plt.axhline(y=1.96/np.sqrt(len(x)),linestyle='--',color='gray')
plt.title('Autocorrelation Function')
plt.xlabel('number of lags')
plt.ylabel('correlation')
plt.tight_layout()

In [None]:
lag_pacf = pacf(x, nlags=100, method='ols')
plt.figure(figsize=(16, 7))
plt.plot(lag_pacf, marker="o")
plt.axhline(y=0,linestyle='--',color='gray')
plt.axhline(y=-1.96/np.sqrt(len(x)),linestyle='--',color='gray')
plt.axhline(y=1.96/np.sqrt(len(x)),linestyle='--',color='gray')
plt.title('Partial Autocorrelation Function')
plt.xlabel('number of lags')
plt.ylabel('correlation')
plt.tight_layout()

As seen from the graphs, the ACF suggests a strong correlation with past values. However, the PACF suggests that the direct correlation extends only up to 2 lags, so we use 2 as the AR order of our model.
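
A slowly decaying ACF on price levels is also what a pure random walk produces, so it may be worth looking at the ACF of first differences before settling on the orders. The sketch below computes the sample ACF by hand on a synthetic random walk (not NIFTY50 data) and compares levels with differences. The helper `sample_acf` is hypothetical, and the band uses the same `1.96 / sqrt(n)` rule as the plots above.

```python
import numpy as np

def sample_acf(series, nlags):
    """Sample autocorrelation at lags 0..nlags, normalized by the lag-0 sum of squares."""
    s = np.asarray(series, dtype=float).ravel()
    s = s - s.mean()
    denom = s @ s
    acf_vals = [1.0]
    for k in range(1, nlags + 1):
        acf_vals.append((s[k:] @ s[:-k]) / denom)
    return np.array(acf_vals)

rng = np.random.default_rng(7)
walk = np.cumsum(rng.normal(size=2000))      # synthetic random walk ("price levels")

acf_levels = sample_acf(walk, nlags=5)
acf_diffs = sample_acf(np.diff(walk), nlags=5)
band = 1.96 / np.sqrt(len(walk))             # approximate 95% band for white noise

print("levels:     ", np.round(acf_levels, 3))
print("differences:", np.round(acf_diffs, 3))
print("band: +/-", round(band, 3))
```

If the differenced NIFTY50 series behaves similarly, an order with `d=1`, such as `(2, 1, 1)`, would be a natural alternative to compare against `(2, 0, 1)`.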

In [None]:
# fit model
model = ARIMA(x, order=(2,0,1))
model_fit = model.fit()
# summary of fit model
print(model_fit.summary())
# line plot of residuals
residuals = pd.DataFrame(model_fit.resid)
residuals.plot()
plt.show()
# density plot of residuals
residuals.plot(kind='kde')
plt.show()
# summary stats of residuals
print(residuals.describe())

In [None]:
# split into train and test sets
X = x
size = int(len(X) * 0.66)
train, test = X[0:size], X[size:len(X)]
history = list(train)  # observations seen so far; grows by one at each step
predictions = list()
# walk-forward validation
for t in range(len(test)):
    model = ARIMA(history, order=(2,0,1))
    model_fit = model.fit()
    output = model_fit.forecast()
    yhat = output[0]
    predictions.append(yhat)
    obs = test[t]
    history.append(obs)

In [None]:
# evaluate forecasts
rmse = np.sqrt(mean_squared_error(test, predictions))
print('Test RMSE: %.3f' % rmse)
# plot forecasts against actual outcomes
plt.plot(test)
plt.plot(predictions, color='red')
plt.show()
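
To put the test RMSE in context, it can be compared with a naive baseline that predicts each value as the previous observation, using the same 66/34 split. The sketch below is one way to do that. The function `naive_walk_forward_rmse` is a hypothetical helper, and the demo series is synthetic.

```python
import numpy as np

def naive_walk_forward_rmse(series, train_frac=0.66):
    """One-step-ahead RMSE of the 'last observed value' forecast on the test part."""
    s = np.asarray(series, dtype=float).ravel()
    size = int(len(s) * train_frac)          # same split rule as the ARIMA cell
    test = s[size:]
    preds = s[size - 1:-1]                   # previous observation for each test point
    return float(np.sqrt(np.mean((test - preds) ** 2)))

# Synthetic check: a series rising by 1 each step is always missed by exactly 1
demo_prices = np.arange(100.0)
baseline_rmse = naive_walk_forward_rmse(demo_prices)
print(f"Naive baseline RMSE: {baseline_rmse:.3f}")
```

On the notebook's data this could be called as `naive_walk_forward_rmse(x)`. An ARIMA test RMSE close to the baseline would suggest that the one-step forecasts mostly track the previous price.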

### 4. Conclusion

By analyzing past data, we argue that such crises could have been anticipated using quantitative methods and mitigated through timely intervention with the right measures. Learning from these past mishaps, regulations need to be strengthened and made thorough enough to prevent even the precursors of such events.