## Timeseries Analysis of Silver Microfutures

* Understand the timeseries data

* Post process the data for analysis

* Basic plotting and Exploratory Data Analysis

* ACF/PACF analysis

* Fitting ARIMA models

In [None]:
%matplotlib inline
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
sns.set_style('darkgrid')

In [None]:
import warnings
from matplotlib import rcParams
rcParams["xtick.labelsize"] = 15
rcParams["ytick.labelsize"] = 15
rcParams["legend.fontsize"] = "small"
pd.set_option("display.precision", 2)
warnings.filterwarnings("ignore")

In [None]:
Jun = pd.read_csv('../input/silverminidataset/30Jun2021.csv',parse_dates=["Date"],index_col="Date")
Jun = Jun[['Open', 'High', 'Low', 'Close', 'Previous Close']]

In [None]:
Aug = pd.read_csv('../input/silverminidataset/31AUG2021.csv', parse_dates=["Date"], index_col="Date")
Aug = Aug[['Open', 'High', 'Low', 'Close', 'Previous Close']]

In [None]:
print(Jun.head())
print(Jun.tail())

In [None]:
print(Aug.head())
print(Aug.tail())

In [None]:
Jun = Jun[~Jun.Open.isnull()]
Jun.tail()

In [None]:
Aug[~Aug.Open.isnull()]

In [None]:
# Only the data after the June file ends (July and August 2021) is needed
# The slice bounds run backwards because the dates are in descending order
Aug = Aug.loc["2021-08":"2021-07"]
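Because the source files list dates newest-first, the slice bounds above have to be written in reverse. A minimal sketch of the alternative, sorting first so the bounds read chronologically; the frame and its `close` column are synthetic, made up only for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic daily data listed newest-first, mimicking the layout of the source CSVs
dates = pd.date_range("2021-06-25", "2021-09-05", freq="D")[::-1]
df = pd.DataFrame({"close": np.arange(len(dates), dtype=float)}, index=dates)

# Sort ascending once, then slice July through August with forward-ordered bounds
jul_aug = df.sort_index().loc["2021-07":"2021-08"]

print(jul_aug.index.min().date(), jul_aug.index.max().date(), len(jul_aug))
```

Sorting once up front also suits the later differencing and model-fitting steps, which assume time runs forward.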

In [None]:
# Creating data from Nov'20 to Jul'21, that is 9 months
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way
qtr = pd.concat([Aug, Jun])

In [None]:
print(qtr.shape)
qtr.head()

In [None]:
qtr['mon'] = qtr.index.month_name()

In [None]:
qtr['day'] = qtr.index.day_name()

In [None]:
# qtr['Open'].asfreq('M').tail(5) is not working
qtr.tail()

In [None]:
#Learning to impute
imputer = pd.read_csv('../input/silverminidataset/30Jun2021.csv',parse_dates=["Date"],index_col="Date")

In [None]:
imputer = imputer[['Open','High','Low','Close','Previous Close']]

In [None]:
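# Chained assignment like the line below does NOT change `imputer`: imputer[mask] returns a copy,
# so the values are written to that copy and then discarded. The loops in the next cells use .loc instead.
# imputer[imputer.Open.isna()].Open = imputer[imputer.Open.isna()].Close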

In [None]:
for id in imputer[imputer.Open.isna()].index:
    imputer.loc[id,'Open'] = imputer.loc[id,'Close']

In [None]:
for id in imputer[imputer.High.isna()].index:
    imputer.loc[id,'High'] = imputer.loc[id,'Close']

In [None]:
for id in imputer[imputer.Low.isna()].index:
    imputer.loc[id,'Low'] = imputer.loc[id,'Close']

In [None]:
imputer.tail()
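The three loops can also be written without iterating over rows. A sketch under the same rule (copy the day's Close into any missing Open, High or Low); `fill_from_close` is a hypothetical helper name and the two-row frame is made-up data:

```python
import numpy as np
import pandas as pd

def fill_from_close(df, cols=("Open", "High", "Low")):
    """Return a copy of df where missing values in `cols` are replaced by that row's Close."""
    out = df.copy()
    for col in cols:
        # fillna with a Series aligns on the index, so each row receives its own Close
        out[col] = out[col].fillna(out["Close"])
    return out

# Tiny illustrative frame (values are made up)
demo = pd.DataFrame(
    {"Open": [np.nan, 101.0], "High": [np.nan, 103.0],
     "Low": [np.nan, 99.0], "Close": [100.0, 102.0]},
    index=pd.to_datetime(["2020-11-02", "2020-11-03"]),
)
filled = fill_from_close(demo)
print(filled)
```

In the notebook this would be `imputer = fill_from_close(imputer)`, replacing the three loops.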

In [None]:
# Creating data from Nov'20 to Jul'21, that is 9 months
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way
new_qtr = pd.concat([Aug, imputer])

In [None]:
print(qtr.shape)
print(new_qtr.shape)
# Without imputation, roughly 60 days of closing prices would be lost

In [None]:
new_qtr['day']=new_qtr.index.day_name()
new_qtr['month']=new_qtr.index.month_name()
new_qtr.head()

In [None]:
# Derived features: intraday move (Open - Close) and opening gap (Open - previous session's Close)
new_qtr['intra'] = new_qtr.Open - new_qtr.Close
new_qtr['gap'] = new_qtr.Open - new_qtr['Previous Close']
new_qtr.tail()

In [None]:
# On which days of the week were the intraday moves largest?

sns.catplot(y ='intra', x = 'day',data=new_qtr, kind='boxen',col='month',col_wrap=3)

The plots above reveal several patterns:
1) In the early months there is little intraday action, because Open and Close are the same on those days, even though the Close itself varies from day to day.

2) Most of the intraday action happens on Fridays, as the week closes.

3) Major moves or corrections happen on Thursdays, and on Tuesdays in most months. Mondays and Fridays are active in only two months or fewer.

4) Intraday moves have reached up to 6000 INR on the positive side and down to -2500 INR on the negative side in a single day.
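The boxen plots can be cross-checked with a small table of per-weekday statistics. A sketch assuming a frame with `intra` and `day` columns like `new_qtr`; `intraday_summary` is a hypothetical helper, and the synthetic data deliberately gives Fridays a larger spread just to exercise it:

```python
import numpy as np
import pandas as pd

def intraday_summary(df, by="day"):
    """Median, min and max of 'intra' per group, ranked by the median absolute move."""
    g = df.groupby(by)["intra"]
    table = g.agg(["median", "min", "max"])
    table["median_abs"] = g.apply(lambda s: s.abs().median())
    return table.sort_values("median_abs", ascending=False)

rng = np.random.default_rng(1)
idx = pd.bdate_range("2021-01-04", periods=100)          # 20 business weeks, Mon-Fri
# Synthetic intraday moves; Fridays (dayofweek == 4) get a larger spread on purpose
scale = np.where(idx.dayofweek == 4, 800.0, 200.0)
df = pd.DataFrame({"intra": rng.normal(0.0, scale)}, index=idx)
df["day"] = df.index.day_name()

summary = intraday_summary(df)
print(summary)
```

Passing `by=["month", "day"]` on a frame that also has a `month` column gives the per-month breakdown the plots show.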


In [None]:
# On which days of the week were the opening gaps largest?
sns.catplot(y ='gap', x = 'day',data=new_qtr, kind='box',col='month',col_wrap=3)

In [None]:
sns.jointplot(x='intra',y='gap',data=new_qtr[:'2021-03'],hue='month')

In [None]:
sns.jointplot(x='intra',y='gap',data=new_qtr['2021-02':],hue='month')

In [None]:
vol = pd.read_csv('../input/silverminidataset/30Jun2021.csv',parse_dates=["Date"],index_col="Date")
vol = vol[['Volume(Lots)','Close']]

In [None]:
vol['day'] = vol.index.day_name()
vol['month'] = vol.index.month_name()

In [None]:
vol.head()

In [None]:
sns.catplot(x='Volume(Lots)',y='month',data=vol[:'2021-04'],kind='box')

In [None]:
sns.catplot(x='Volume(Lots)',y='month',data=vol['2021-01':],kind='box')

1) The opening gap shows how liquid the market is for traders.

2) The overnight impact on pricing slowly fades as more people start trading the future.

* Splitting the volumes traded to date by month makes this visible.

* With more people entering the market, there is enough liquidity to absorb price shocks.

3) How significant is the impact of the opening gap on the intraday movement?

* The joint plots of opening gap against intraday move show no apparent correlation.

* Larger opening gaps occurred in the initial months, when liquidity was lower.

* Splitting the data by the month column makes the effect of liquidity visible.
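The visual reading of the joint plots can be backed by a number. A sketch assuming the `Open`, `Close` and `Previous Close` columns used above; `gap_intra_correlation` is a hypothetical helper, the prices are synthetic (gap and intraday move drawn independently), and the Spearman rank correlation is included as a complement that is less sensitive to a few very large early gaps:

```python
import numpy as np
import pandas as pd

def gap_intra_correlation(df):
    """Pearson and Spearman correlation between the opening gap and the intraday move."""
    intra = df["Open"] - df["Close"]           # same definition as the 'intra' column
    gap = df["Open"] - df["Previous Close"]    # same definition as the 'gap' column
    return intra.corr(gap), intra.corr(gap, method="spearman")

rng = np.random.default_rng(2)
n = 250
prev_close = 60000 + rng.normal(0, 500, n).cumsum()
open_ = prev_close + rng.normal(0, 300, n)     # synthetic opening gap
close = open_ + rng.normal(0, 600, n)          # synthetic intraday move, independent of the gap
df = pd.DataFrame({"Open": open_, "Close": close, "Previous Close": prev_close})

pearson, spearman = gap_intra_correlation(df)
print(round(pearson, 3), round(spearman, 3))
```

On the notebook's data the call would be `gap_intra_correlation(new_qtr)`, optionally on the early and late month slices separately.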

Questions:
1) Where does the price of the instrument come from when it first starts trading?

    - Usually, it is calculated and registered on the market by the creator of the scrip

    - The market makers open and close the scrip at the same price for accounting purposes

    - In this data too, the early rows have the "Close" price copied into "Open", "High" and "Low" on the days where that information was missing. No statistical imputation method was used.

2) How can these insights be used to understand price movements?

    - Price moves because supply and demand for the underlying change; here the underlying is silver and the instrument is the Silver Micro future.

    - If the analysis shows that the price movement of this instrument alone is not useful, the next step is to look for another instrument that is correlated with it. Gold is one such underlying.

    - Knowing the probabilities of price movements can be very useful for controlling anxiety during wild swings in the market. The plots below, based on "Close", show what happens in the market from a statistical point of view.
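One simple way to put a number on the probability of a price move is the empirical share of days whose absolute change exceeds a chosen size. A minimal sketch; `move_probability`, the threshold and the toy prices are illustrative assumptions, and in the notebook the input would be `new_qtr['Close']`:

```python
import pandas as pd

def move_probability(close, threshold):
    """Empirical probability that the absolute day-over-day change in `close` exceeds `threshold`."""
    changes = close.sort_index().diff().dropna()   # chronological order, then daily changes
    return float((changes.abs() > threshold).mean())

# Made-up closing prices, only to exercise the function
close = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 104.0],
    index=pd.date_range("2021-03-01", periods=5, freq="D"),
)
print(move_probability(close, threshold=1.5))  # changes 2, -1, 4, -1: 2 of 4 exceed 1.5 -> 0.5
```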

In [None]:
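# Sort chronologically first: the source files list dates in reverse order, and the
# differencing, ACF and ARIMA steps below assume time runs forward
new_qtr = new_qtr.sort_index()
new_qtr['lognat'] = np.log(new_qtr['Close'])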

fig, ax = plt.subplots(3,1,figsize=(14,14))

new_qtr.Close.plot(ax=ax[0])

new_qtr.lognat.plot(ax=ax[1])

new_qtr['lag'] = new_qtr.Close - new_qtr.Close.shift()
new_qtr.lag.plot(ax=ax[2])

#To begin with 

Let us check directly whether the day-to-day price changes are correlated with their own lagged values. This is called autocorrelation.
The autocorrelation function (ACF) and the partial autocorrelation function (PACF) measure such correlations in a series.

If the changes show no autocorrelation, the price series is effectively a random walk. Let's begin.

In [None]:
from statsmodels.tsa.stattools import acf
from statsmodels.tsa.stattools import pacf

# 'lag' is the day-over-day change in Close; iloc[1:] drops the leading NaN created by shift()
lag_correlations = acf(new_qtr['lag'].iloc[1:], nlags=40)
lag_partial_correlations = pacf(new_qtr['lag'].iloc[1:], nlags=40)

In [None]:
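fig, ax = plt.subplots(figsize=(16,12))
ax.plot(lag_correlations, marker='o', linestyle='--', color='b', label='ACF')
ax.plot(lag_partial_correlations, marker='x', linestyle='-', color='r', label='PACF')
# Approximate 95% white-noise band: +/- 1.96 / sqrt(number of observations)
band = 1.96 / np.sqrt(new_qtr['lag'].iloc[1:].shape[0])
ax.axhline(band, color='grey', linestyle=':')
ax.axhline(-band, color='grey', linestyle=':')
ax.legend()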

The graph above shows no significant correlation between the differenced ("lag") series and its next 40 lags, so the closing price behaves essentially like a random walk.

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

# `period` replaces the `freq` argument removed in newer statsmodels
decomposition = seasonal_decompose(new_qtr['lag'].iloc[1:], model='additive', period=30)
fig = decomposition.plot()
fig.set_size_inches(20, 10)

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

# `period` replaces the `freq` argument removed in newer statsmodels
decomposition = seasonal_decompose(new_qtr['lognat'], model='additive', period=30)
fig = decomposition.plot()

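There appear to be surprising, roughly monthly cycles in the closing prices. What could explain this seasonality? Note that `period=30` counts rows, not calendar days: with only about 21 to 22 trading days in a month, a 30-row cycle spans roughly six weeks, and `seasonal_decompose` returns a periodic component for whatever period it is given, so the cycle needs to be checked before it is trusted.

How can this seasonality be leveraged for forecasting?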

In [None]:
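from statsmodels.tsa.arima.model import ARIMA

# AR(1) on the log close; the current ARIMA API's fit() has no `disp` argument
model = ARIMA(new_qtr.lognat, order=(1, 0, 0))
results = model.fit()
new_qtr['Forecast'] = results.fittedvalues
new_qtr[['lognat', 'Forecast']].plot(figsize=(16, 12))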

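On the log prices the fitted values track the series closely. That is largely expected for a random walk: the AR(1) coefficient should come out close to 1, so each fitted value is essentially the previous day's log price and the forecast is the series shifted by one day. How does the model do on the differenced log prices?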

In [None]:
new_qtr['diff_lognat'] = new_qtr.lognat - new_qtr.lognat.shift()

In [None]:
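from statsmodels.tsa.arima.model import ARIMA

# AR(1) on the differenced log close (first row is NaN after differencing)
model = ARIMA(new_qtr.diff_lognat.iloc[1:], order=(1, 0, 0))
results = model.fit()
new_qtr['diff_Forecast'] = results.fittedvalues
new_qtr[['diff_lognat', 'diff_Forecast']].plot(figsize=(16, 12))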

Now it's pretty obvious that the forecast is way off. We're predicting tiny variations relative to what actually happens day to day. Again, this is more or less expected from an autoregressive model of a random-walk time series: there is not enough information in the previous days to forecast accurately what will happen the next day.

An autoregressive model doesn't appear to do so well. What about exponential smoothing? Exponential smoothing spreads out the impact of previous values using exponentially decaying weights, so recent events matter more than events from long ago. Simple exponential smoothing of the log price is equivalent to an ARIMA(0,1,1) model, that is, an MA(1) model on the differenced log prices, which is what the next cell fits (with a constant term). Maybe this "smarter" form of averaging will be more accurate?
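The exponential weighting can be made concrete. A minimal sketch of simple exponential smoothing in plain NumPy (the helper names, `alpha=0.5` and the five toy values are illustrative assumptions): the recursive level update and the explicit sum with weights alpha * (1 - alpha)^j give the same one-step forecasts.

```python
import numpy as np

def ses_forecasts(y, alpha, level0):
    """One-step-ahead simple exponential smoothing forecasts: out[t] is the forecast of y[t]."""
    level = level0
    out = np.empty(len(y))
    for t, value in enumerate(y):
        out[t] = level                                # forecast made before seeing y[t]
        level = alpha * value + (1 - alpha) * level   # update the smoothed level
    return out

def ses_weighted_sum(y, alpha, level0, t):
    """The same forecast written out: exponentially decaying weights on past values."""
    j = np.arange(t)                                  # j = 0 is the most recent observation
    weights = alpha * (1 - alpha) ** j
    return float(np.sum(weights * y[t - 1 - j]) + (1 - alpha) ** t * level0)

y = np.array([1.0, 3.0, 2.0, 4.0, 5.0])
f = ses_forecasts(y, alpha=0.5, level0=y[0])
print(f)                                              # [1. 1. 2. 2. 3.]
print(ses_weighted_sum(y, alpha=0.5, level0=y[0], t=4))  # 3.0
```

The weights halve with each step back in time here; a smaller `alpha` spreads the weight further into the past.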

In [None]:
from statsmodels.tsa.arima.model import ARIMA

# MA(1) on the differenced log close
model = ARIMA(new_qtr.diff_lognat.iloc[1:], order=(0, 0, 1))
results = model.fit()
new_qtr['diff_Forecast'] = results.fittedvalues
new_qtr[['diff_lognat', 'diff_Forecast']].plot(figsize=(16, 12))

The objective of this notebook is not just to check how prediction is done on a time series, but to understand the series and then use that knowledge to predict other, more predictable series.

## Thanks for joining me on this journey...