## Notes:

*   General:

The following code runs cointegration, partial autocorrelation and Granger causality tests on two different stocks to check whether their price series are statistically related.

The 2 stocks chosen belong to the same country and the same sector.

*   About the Granger Causality tests:

The p-values from the four Granger causality tests that statsmodels reports (parameter F test, SSR-based chi2 test, SSR-based F test and likelihood-ratio test) are largely similar on our stock data.

chi2: the chi-squared statistic measures how far the observed data deviate from what the null hypothesis predicts. Here the null hypothesis is that the past values of the other stock add no predictive power. Larger values are stronger evidence against the null.

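SSR: sum of squared residuals, i.e. the variation the regression leaves unexplained. The SSR-based tests compare the SSR of the model without the other stock's lags against the SSR of the model that includes them.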

We say that a variable X ***Granger-causes*** another evolving variable Y if the predictions of the value of Y based on its own past values ***and*** on the past values of X are better than predictions of Y based only on Y's own past values. 
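
To make the definition concrete, here is a minimal sketch of the SSR-based F test written out with NumPy and SciPy. It fits the restricted model (Y on its own lags) and the unrestricted model (Y on its own lags and the lags of X) by least squares, then asks whether the drop in SSR is larger than chance would explain. The helper name `granger_f_test` and the simulated data are hypothetical. The statistic has the same form as statsmodels' `ssr_ftest`, but this is an illustration, not the code used in this notebook.

```python
import numpy as np
from scipy import stats


def granger_f_test(y, x, lag):
    """Hypothetical helper: SSR-based F test of whether `lag` past values of x
    improve the prediction of y beyond y's own `lag` past values."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y) - lag
    target = y[lag:]
    # Column k-1 holds the series shifted back by k steps (k = 1..lag)
    y_lags = np.column_stack([y[lag - k:len(y) - k] for k in range(1, lag + 1)])
    x_lags = np.column_stack([x[lag - k:len(x) - k] for k in range(1, lag + 1)])
    const = np.ones((n, 1))
    restricted = np.hstack([const, y_lags])            # y's own past only
    unrestricted = np.hstack([const, y_lags, x_lags])  # y's past and x's past

    def ssr(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    ssr_r, ssr_u = ssr(restricted), ssr(unrestricted)
    df_num = lag                        # number of restrictions (x lags dropped)
    df_den = n - unrestricted.shape[1]  # residual degrees of freedom
    f_stat = ((ssr_r - ssr_u) / df_num) / (ssr_u / df_den)
    p_value = float(stats.f.sf(f_stat, df_num, df_den))
    return f_stat, p_value


# Simulated example: y depends on the previous value of x, but not vice versa
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.5)

f_xy, p_xy = granger_f_test(y, x, lag=2)  # does x Granger-cause y?
f_yx, p_yx = granger_f_test(x, y, lag=2)  # does y Granger-cause x?
print(f"x -> y: F = {f_xy:.1f}, p = {p_xy:.3g}")
print(f"y -> x: F = {f_yx:.2f}, p = {p_yx:.3g}")
```

Running the test in both directions matters: Granger causality is not symmetric, so "X Granger-causes Y" says nothing about whether Y Granger-causes X.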

* About the Partial Autocorrelation comparison:

In time series analysis, the partial autocorrelation function (PACF) gives the partial correlation of a stationary time series with its own lagged values, after controlling for the values at all shorter lags.

### Code:

In [None]:
!pip install yfinance #YAHOOFINANCE FOR SECTOR AND COUNTRY DATA

In [None]:
from datetime import date

import matplotlib.pyplot as plt
import pandas as pd
import pandas_datareader as pdr
import requests
import statsmodels.tsa.stattools as ts
import yfinance as yf
from urllib.error import HTTPError

today = date.today()  # end date for the price downloads below

In [None]:
stox_1=[]
stox_2=[]
lag_1=[]
lag_2=[]
lag_3=[]
lag_4=[]
lag_5=[]
lag_6=[]
lag_7=[]
lag_8=[]
lag_9=[]
min_pvalue=[]
bool_shortlist=[]
new_stock_list=['MSFT','AAPL','AMZN','GOOGL','ORCL','JPM','BRK','GS','AXP','HSBC','V','BAC','T','JNJ','NVS','MRK','UNH','XOM','CVX','TOT','TM','F',
            'NSANY','TSLA','GM','HOG','BA','DAL','LUV','UAL','AAL','SAVE','MCD','CMG','KMX','MUSA','CTB','SAH','GPC','H','LVS','MAR','PFE','PG','CHL',
            'ABT','MDT','WMT','KO','PEP','BABA','NKE','DIS','NFLX','EROS','WWE','CNK','DELL','IBM','HPQ','FIT','HMI','TSM','USEG','PTR','SNP','UNP','SPCE',
            'LMT','NOC','UPS','FDX','RIO','VZ','SAP','UL']
len(new_stock_list)

76

In [None]:
Master_df_pvalues

In [None]:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from google.colab import files
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

In [None]:
from google.colab import drive
drive.mount('drive')
Master_df_pvalues.to_csv('yeet.csv')

In [None]:
!cp yeet.csv "drive/My Drive"

In [None]:
Master_df_pvalues_re=Master_df_pvalues[(Master_df_pvalues['Stock1 code']!='RDS.B')&(Master_df_pvalues['Stock2 code']!='RDS.B')]

In [None]:
# Join the stock metadata and the p-values row by row on their shared index
Master_df=pd.merge(Master_df_info, Master_df_pvalues, left_index=True, right_index=True)

In [None]:
Master_df

In [None]:
Master_df.to_csv('pvalue.csv')

In [None]:
copy_master=Master_df[['Stock1 code', 'Stock2 code', 'Stock1 Sector', 'Stock2 Sector', 'Min. p', 'Shortlist?']].copy()

In [None]:
Master_df.columns

Index(['Stock1', 'Stock1 Country', 'Stock1 Sector', 'Stock2', 'Stock2 Country',
       'Stock2 Sector', 'Stock1 code', 'Stock2 code', 'Lag1', 'Lag2', 'Lag3',
       'Lag4', 'Lag5', 'Lag6', 'Lag7', 'Lag8', 'Lag9', 'Min. p', 'Shortlist?'],
      dtype='object')

In [None]:
copy_master=copy_master[(copy_master['Shortlist?'] == 'Yes')]

In [None]:
copy_master

In [None]:
Master_df['Min. p'].plot()

In [None]:
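#VISUALISING HISTORICAL STOCK DATA
# pandas_datareader's Yahoo source no longer works reliably, so prices are fetched with yfinance (installed above)
data_1=yf.Ticker('HGV').history(start='2019-04-23', end='2020-04-23', auto_adjust=False)
data_2=yf.Ticker('LVS').history(start='2019-04-23', end='2020-04-23', auto_adjust=False)
ax=data_1['Close'].plot(label='HGV', title='Closing prices of stock 1 (HGV) and stock 2 (LVS)')
data_2['Close'].plot(ax=ax, label='LVS')
ax.legend()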

In [None]:
# Stock 1 = C (Citigroup), stock 2 = JPM, daily data from 23 April 2019 to today
data1=yf.Ticker('C').history(start='2019-04-23', end=str(today), auto_adjust=False)
data2=yf.Ticker('JPM').history(start='2019-04-23', end=str(today), auto_adjust=False)
result=pd.merge(data1, data2, left_index=True, right_index=True)  # dates present in both; columns get _x/_y suffixes
x1=result['Close_x']
y1=result['Close_y']
coint_result=ts.coint(x1,y1)  # Engle-Granger cointegration test: (t-statistic, p-value, critical values)
granger_data=list(zip(x1,y1))
# Tests whether y1 (2nd column) Granger-causes x1 (1st column) at lags 1-9.
# The test assumes stationary inputs; price levels usually are not, so differenced series are a common alternative.
granger_result=ts.grangercausalitytests(granger_data, maxlag=9)
partial_autocorrect_1=ts.pacf_ols(x1, nlags=50)  # partial autocorrelations of stock 1, lags 0-50
partial_autocorrect_2=ts.pacf_ols(y1, nlags=50)  # partial autocorrelations of stock 2, lags 0-50

In [None]:
#VISUALIZING PARTIAL AUTOCORRELATIONS FOR BOTH TIME SERIES
plt.xlabel('Lag', fontdict={'color':'White'})
plt.ylabel('Partial autocorrelation', fontdict={'color':'White'})
plt.title('Partial autocorrelations of stock 1 and stock 2', fontdict={'color':'White'})
plt.plot(partial_autocorrect_1, label='Stock 1')
plt.plot(partial_autocorrect_2, label='Stock 2')
plt.legend()
plt.show()

In [None]:
print(coint_result)

## Interpretation:
Our results help quantify the relationship between two different stocks in the same sector and country. Running this analysis for more stocks in the same country and sector could help investors build a pairs trading strategy, which trades two stocks whose prices have historically moved together: when their prices diverge, the trader buys the relative underperformer and sells the relative outperformer, betting that the gap will close.

We use historical stock data to establish the relationship between the two stocks. This makes it, in a sense, a macro-level analysis: it describes the general trend in stock prices across countries and sectors. It is of limited use for precisely forecasting a stock's price in the near future, because that also depends on company-specific factors, such as sentiment about the company, and not only on factors affecting the industry as a whole. To analyze and forecast an individual stock's performance at that micro level, we would need to bring in sentiment analysis.

The conventional cutoff for the p-value is 0.05 (a 95% confidence level); we use the stricter 0.01 (a 99% confidence level).

