# VAR

<hr> 

## Table of contents

[0: Used packages](#abs_0)<br>

[1: Data preparation](#abs_1)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[1.a: Preparing the daily stock returns](#abs_1a)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[1.b: Preparing the insider trades](#abs_1b)<br>

[2: Time series analysis](#abs_2)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[2.a: Building the model](#abs_2a)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[2.b: Stationarity of the time series](#abs_2b)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[2.c: Granger-Causality](#abs_2c)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[2.d: Impulse Response Functions](#abs_2d)<br>

[3: Causality analysis based on monthly mean S&P returns](#abs_3)<br>


In this notebook we investigate the causal relation between insider transactions and stock returns using time series methods. We build several vector autoregression (VAR) models, each with two variables: the first is the stock return, the second represents the insider trading transactions. First, we verify that the time series under consideration are stationary. Then we test for Granger causality between the variables of each model. Finally, we examine the impulse response functions (IRFs) of the VAR models fitted with a predetermined number of lags.
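
For reference, a bivariate VAR($p$) model of the kind described above can be written as follows. The symbols are notation introduced here for illustration only: $x_t$ is the insider trading variable, $r_t$ the stock return, $c$ a $2 \times 1$ intercept vector, $A_k$ the $2 \times 2$ coefficient matrix of lag $k$, and $\varepsilon_t$ a white-noise error vector.

$$
\begin{pmatrix} x_t \\ r_t \end{pmatrix}
= c + \sum_{k=1}^{p} A_k \begin{pmatrix} x_{t-k} \\ r_{t-k} \end{pmatrix} + \varepsilon_t
$$

In this notation, $x$ does not Granger-cause $r$ if the lower-left entries of all $A_1, \dots, A_p$ are zero, and $r$ does not Granger-cause $x$ if all upper-right entries are zero.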

<a id="abs_0"></a>
<hr>

## 0. Used packages

In [None]:
import pandas as pd
import numpy as np
import math
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import statistics
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.api import VAR
import scipy
import statsmodels.tsa.vector_ar.plotting as plotting
import warnings
import random

random.seed(20)
warnings.filterwarnings("ignore")

<a id="abs_1"></a>
<hr>

## 1. Data preparation

We consider the daily stock returns obtained from the daily adjusted closing prices from Algoseek and the insider trading transactions obtained from SEC API. 
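
The returns themselves are computed in Data.ipynb, which is not shown here. As a minimal sketch of how daily returns can be derived from adjusted closing prices, assuming simple (not logarithmic) returns and a hypothetical column name `adj_close` with made-up prices:

```python
import pandas as pd

# Hypothetical adjusted closing prices for two made-up tickers
prices = pd.DataFrame({
    "issuerTicker": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "Date": pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-04"] * 2),
    "adj_close": [100.0, 110.0, 99.0, 50.0, 50.0, 55.0],
}).set_index(["issuerTicker", "Date"])

# Simple daily return per ticker: P_t / P_{t-1} - 1.
# Grouping by ticker prevents the first day of one ticker from being
# compared with the last day of the previous ticker.
prices["returns"] = prices.groupby(level="issuerTicker")["adj_close"].pct_change()

print(prices)
```

The first observation of each ticker has no predecessor and therefore gets a NaN return, which is why such rows have to be dropped later.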

<a id="abs_1a"></a>

### 1a. Preparing the daily stock returns

We use the daily stock prices and returns that were prepared earlier in Data.ipynb and stored in "close_prices_returns.csv". Since we consider insider trading only for the period 01.2018-12.2023, we restrict the time range of the returns accordingly.

In [None]:
# get the returns from the csv file

returns = pd.read_csv('close_prices_returns.csv')
returns['Date'] = pd.to_datetime(returns['Date'])
returns = returns.set_index(['issuerTicker', 'Date'])
returns = returns.reset_index().dropna().set_index(['issuerTicker', 'Date']) 
returns = returns.loc[pd.IndexSlice[:, '2018':], :] 

display(returns)

<a id="abs_1b"></a>

### 1b. Preparing the insider trades

First, we read the insider trades data and the stock return data of the relevant tickers from the csv files produced in Data.ipynb and keep only the relevant columns.

In [None]:
# read insider trades from csv file

insider_trades = pd.read_csv('insider_trades.csv')
insider_trades = insider_trades.drop(columns=['Unnamed: 0'])
insider_trades['periodOfReport'] = pd.to_datetime(insider_trades['periodOfReport'])

We would like to examine the causal relation between the share returns and the corresponding insider trades. Because of the dataset size, we restrict ourselves to the 10 stocks with the largest weight in the S&P 500 index:

In [None]:
# take the top tickers in the S&P 500 index as representative tickers

top_10_tickers = ['MSFT', 'AAPL', 'NVDA', 'AMZN', 'META', 'GOOG','BRK.B', 'LLY', 'AVGO', 'TSLA']

In [None]:
# get insider trades from the tickers in top_10_tickers and only consider relevant columns

trades = []

for element in top_10_tickers :
    insider_trades_by_ticker = insider_trades[['periodOfReport', 'issuerTicker' , 'total']].copy()
    insider_trades_by_ticker = insider_trades_by_ticker[insider_trades_by_ticker['issuerTicker']==element] 
    insider_trades_by_ticker['periodOfReport'] = pd.to_datetime(insider_trades_by_ticker['periodOfReport'])
    insider_trades_by_ticker = insider_trades_by_ticker.set_index(['periodOfReport'])
    trades.append(insider_trades_by_ticker)

In [None]:
# get the returns of tickers in top_10_tickers

returns = returns.loc[top_10_tickers][['returns']]

returns

Next, for each ticker we merge the insider trading data with the return data and drop all rows with missing (NaN) values.

In [None]:
data = []

for i in range(10) :
    index = trades[i].index
    df = pd.merge(trades[i], returns.loc[top_10_tickers[i]], how='left', left_on='periodOfReport', right_on='Date')
    df.rename(columns = {"total": "Insider trades"}, inplace = True)
    df = df.set_index(index).dropna(axis=0)
    df = df.drop(columns=['issuerTicker'])
    data.append(df)
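
The effect of the left merge followed by `dropna` can be seen on a small toy example. This is only a sketch: the values are made up, and it merges on the index of both frames rather than on the index level names used above.

```python
import pandas as pd

# Hypothetical insider trades, indexed by report date
toy_trades = pd.DataFrame(
    {"total": [1000.0, -500.0, 250.0]},
    index=pd.to_datetime(["2018-01-02", "2018-01-06", "2018-01-08"]).rename("periodOfReport"),
)

# Hypothetical daily returns; 2018-01-06 is a Saturday, so no return exists
toy_returns = pd.DataFrame(
    {"returns": [0.01, -0.02]},
    index=pd.to_datetime(["2018-01-02", "2018-01-08"]).rename("Date"),
)

# The left merge keeps every trade; trades without a matching return get NaN
merged = pd.merge(toy_trades, toy_returns, how="left",
                  left_index=True, right_index=True)

# Dropping NaN rows removes trades reported on non-trading days
merged = merged.dropna(axis=0)
print(merged)
```

Only the trades that fall on a day with an available return survive, so each remaining row pairs one insider trade value with one daily return.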

We delete the element at index 6 of the list (BRK.B) because it contains data for only one day, and we also delete the corresponding ticker from top_10_tickers.

In [None]:
data[6]

In [None]:
del data[6]

In [None]:
del top_10_tickers[6]

Before applying VAR, we need to check whether the data is stationary, as this is one of the conditions that must be fulfilled for VAR.

<a id="abs_2"></a>
<hr>

## 2. Time series analysis

<a id="abs_2a"></a>

### 2a. Building the model

In [None]:
models = {}
insider_trades_metrics = ["Total value"]

lags = range(7)
signif = 0.05
num_tickers = len(data)  # 9 tickers remain after removing BRK.B
counters = []

for ticker in range(num_tickers):
    for metric in insider_trades_metrics:
        counter = 0
        for lag in lags:
            # fit a VAR whose order (at most `lag`) is selected by AIC
            model = VAR(data[ticker]).fit(maxlags=lag, ic='aic')
            # keep the model only if AIC selected exactly `lag` lags
            if model.k_ar == lag or model.k_ar not in lags:
                models[str(top_10_tickers[ticker]) + "_" + str(model.k_ar) + str("lags_") + metric] = model
                counter = counter + 1
        # number of models stored for this ticker
        counters.append(counter)

<a id="abs_2b"></a>

### 2b. Stationarity of the time series

In [None]:
# Data is stationary - One of the conditions for AR/VAR is fulfilled 

signif = 0.05
lags =  range(7)

for j in range(9):
    print('\n Ticker: ' + top_10_tickers[j])
    for name, column in data[j][["returns"]].items():
        for i in lags:
            r = adfuller(column, maxlag=i,regression='ct',autolag = None)
            output = {'test_statistic': round(r[0], 4),
                      'pvalue': round(r[1], 4),
                      'n_lags': round(r[2], 4),
                      'n_obs': r[3]}
            p_value = output['pvalue']

            # Print Summary
            print(f'    Augmented Dickey-Fuller Test on "{name}"', "\n   ", '-'*47)
            print(f' Null Hypothesis: Data has unit root. Non-Stationary.')
            print(f' Significance Level    = {signif}')
            print(f' Test Statistic        = {output["test_statistic"]}')
            print(f' No. Lags Chosen       = {output["n_lags"]}')

            print(f' Critical value {(str(round(signif*100))+"%").ljust(6)} = {round(r[4][str(round(signif*100))+"%"], 3)}')

            if p_value <= signif:
                print(f" => P-Value = {p_value}. Rejecting Null Hypothesis.")
                print(f" => Series is Stationary.")
            else:
                print(f" => P-Value = {p_value}. Weak evidence to reject the Null Hypothesis.")
                print(f" => Series is Non-Stationary.")

According to the ADF test results above, the return series are stationary, so we can continue with the Granger-causality test.

<a id="abs_2c"></a>

### 2c. Granger-Causality

In [None]:
def granger_causality(results, column_names = ['Insider trades', 'returns']):
        test = results.test_causality(column_names[0], [column_names[1]], kind='Wald')
        print("\t " + str(column_names[1]) + " -> " + str(column_names[0]) + ":", test)
        # Print summary of results
        #print(results.summary())
    
        test = results.test_causality(column_names[1], [column_names[0]], kind='Wald')
        print("\t " + str(column_names[0]) + " -> " + str(column_names[1]) + ":", test)
        # Print summary of results
        #print(results.summary())

# Run the Granger-causality tests for every fitted VAR model with at least one lag
for model in models.keys():
    if model.split("_")[1] != "0lags":
        print('\n Model: ' + model)
        #Granger Causality test
        granger_causality(models[model])


<a id="abs_2d"></a>

### 2d. Impulse Response Functions

In [None]:
orth = False
repl = 1000
signif = 0.05
seed = None
stderr_type = 'asym'
plot_stderr = True
plot_params = {'font': 20}
subplot_params = None
figsize = (20, 7)
plt.rc('font', size=20)
title = ""

for model in models.keys():
    if model.split("_")[1] != "0lags":
        print("Model: ", model)
        fitted = models[model]
        irf = fitted.irf(20)

        # accumulated responses, their long-run limits and asymptotic covariance
        cum_effects = irf.cum_effects
        lr_effects = irf.lr_effects
        stderr = irf.cum_effect_cov(orth=False)

        names = irf.model.names
        for k in range(2):
            # irf_grid_plot takes the impulse variable first, then the response
            impulse = 1 - k
            response = k
            fig = plotting.irf_grid_plot(cum_effects, stderr, impulse, response,
                                         names, title, signif=signif,
                                         hlines=lr_effects,
                                         subplot_params=subplot_params,
                                         plot_params=plot_params,
                                         figsize=figsize,
                                         stderr_type=stderr_type)

            # label the plot with the lag order and variables of this model
            plt.title("Accumulated IRFs from VAR(" + str(fitted.k_ar) + ") Model \n Response of "
                      + str(names[response]) + " to Innovations in " + str(names[impulse]))
            plt.xticks(np.arange(0, 21, 1))
            plt.grid(visible=True, axis='y')
            plt.show()
        del irf
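
The quantities `cum_effects` and `lr_effects` used above can be reproduced by hand for a simple case. The sketch below assumes a VAR(1) with a made-up, stable coefficient matrix `A` and computes non-orthogonalized responses; it is not the statsmodels implementation.

```python
import numpy as np

# Hypothetical VAR(1) coefficient matrix (eigenvalues inside the unit circle)
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])

horizon = 20

# Impulse responses of a VAR(1): Phi_h = A^h with Phi_0 = I.
# Entry [h, i, j] is the response of variable i after h periods
# to a unit shock in variable j.
irfs = np.array([np.linalg.matrix_power(A, h) for h in range(horizon + 1)])

# Accumulated responses: running sum of the impulse responses over the horizon
cum_effects = irfs.cumsum(axis=0)

# Long-run effects: limit of the accumulated responses, (I - A)^{-1}
lr_effects = np.linalg.inv(np.eye(2) - A)

print("accumulated response after 20 periods:\n", cum_effects[-1])
print("long-run effects:\n", lr_effects)
```

For a stable model the accumulated responses converge to the long-run effects, which is why the long-run values are drawn as horizontal reference lines in the plots.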

<a id="abs_3"></a>
<hr>

## 3. Causality analysis based on monthly mean S&P returns

Analogously to Section 2, we conduct the causality analysis based on monthly mean S&P returns and the insider trading transactions, where we build metrics for the latter following the paper https://www.sciencedirect.com/science/article/pii/S1062976901001144?ref=cra_js_challenge&fr=RR-1.

In [None]:
# Read and print the stock tickers that make up S&P500
tickers = pd.read_html(
    'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]
# Get the list of tickers
sp500_tickers = tickers['Symbol'].tolist()

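# Filter the returns dataframe to S&P 500 constituents
# (note: `returns` was restricted to the top tickers in Section 1b, so only those tickers can remain here)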
filtered_returns = returns.loc[returns.index.get_level_values('issuerTicker').isin(sp500_tickers)]

# Group by year and month and take the mean of the returns within each month
monthly_mean_returns = filtered_returns.groupby([filtered_returns.index.get_level_values(1).year, filtered_returns.index.get_level_values(1).month])['returns'].mean()
monthly_mean_returns.index.names =["Year", "Month"]

We build metrics for the number of insider transactions and the number of acquired/disposed shares as in the following paper: https://www.sciencedirect.com/science/article/pii/S1062976901001144?ref=cra_js_challenge&fr=RR-1. The metrics take into account the difference in "sign" between share acquisition and disposal.
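
As a small worked example of these metrics, the sketch below computes the four indices for one month of made-up trades. The column names follow the notebook, while the frame `toy` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical trades in January 2018: three purchases (A) and one sale (D)
toy = pd.DataFrame({
    "periodOfReport": pd.to_datetime(
        ["2018-01-03", "2018-01-10", "2018-01-17", "2018-01-24"]),
    "acquiredDisposed": ["A", "A", "A", "D"],
    "shares": [100, 200, 300, 400],
})
toy["transactions"] = 1
toy["Year"] = toy["periodOfReport"].dt.year
toy["Month"] = toy["periodOfReport"].dt.month

cols = ["transactions", "shares"]
# P: monthly purchase totals, S: monthly sale totals
P = toy[toy["acquiredDisposed"] == "A"].groupby(["Year", "Month"])[cols].sum()
S = toy[toy["acquiredDisposed"] == "D"].groupby(["Year", "Month"])[cols].sum()

NNI = ((P - S) / (P + S))["transactions"]  # (3 - 1) / (3 + 1)
NSI = ((P - S) / (P + S))["shares"]        # (600 - 400) / (600 + 400)
PNI = (P / (P + S))["transactions"]        # 3 / (3 + 1)
SNI = (S / (P + S))["transactions"]        # 1 / (3 + 1)

print(pd.concat({"NNI": NNI, "NSI": NSI, "PNI": PNI, "SNI": SNI}, axis=1))
```

NNI and NSI lie between -1 (only sales) and 1 (only purchases), while PNI and SNI are shares of the total number of transactions and sum to one.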

In [None]:
# aggregate insider trades from csv file
# P: aggregate number of insider purchase transactions in a given month; aggregate number of shares purchased by insiders in a given month
# S: aggregate number of insider sale transactions in a given month; aggregate number of shares sold by insiders in a given month
# NNI: net number index
# NSI: net share index
# PNI: insider purchase index
# SNI: insider sale index
# Source: https://www.sciencedirect.com/science/article/pii/S1062976901001144?ref=cra_js_challenge&fr=RR-1

insider_trades["transactions"] = 1
insider_trades = insider_trades.loc[insider_trades['issuerTicker'].isin(sp500_tickers)]

# monthly totals of purchase (acquired) transactions and shares
acquired = insider_trades.loc[insider_trades["acquiredDisposed"] == "A"].set_index('periodOfReport')
P = acquired.groupby([acquired.index.year, acquired.index.month])[["transactions", "shares"]].sum()

# monthly totals of sale (disposed) transactions and shares
disposed = insider_trades.loc[insider_trades["acquiredDisposed"] == "D"].set_index('periodOfReport')
S = disposed.groupby([disposed.index.year, disposed.index.month])[["transactions", "shares"]].sum()

# net number, net share, purchase and sale indices
NNI = ((P - S) / (P + S))["transactions"].rename("NNI")
NSI = ((P - S) / (P + S))["shares"].rename("NSI")
PNI = (P / (P + S))["transactions"].rename("PNI")
SNI = (S / (P + S))["transactions"].rename("SNI")

# name the index levels so that the metrics can be merged with the monthly returns
for index_series in (NNI, NSI, PNI, SNI):
    index_series.index.names = ["Year", "Month"]



In [None]:
dfs = []
insider_trades_metrics = ["NNI", "NSI", "PNI", "SNI"]
insider_trades_metrics_ = [NNI, NSI, PNI, SNI]
for metric in insider_trades_metrics_:
    df =  pd.merge(metric, monthly_mean_returns, how='left', left_on=["Year", "Month"], right_on=["Year", "Month"])
    dfs.append(df)

In [None]:
# Data is stationary - One of the conditions for AR/VAR is fulfilled 

signif = 0.05
lags =  range(7)

for df in dfs:
    print("We consider "+ str(df.columns[0]) + " as a metric")
    for i in lags:
        for name, column in df.items():
                r = adfuller(column, maxlag=i,regression='ct',autolag = None)
                output = {'test_statistic': round(r[0], 4),
                          'pvalue': round(r[1], 4),
                          'n_lags': round(r[2], 4),
                          'n_obs': r[3]}
                p_value = output['pvalue']
    
                # Print Summary
                print(f'    Augmented Dickey-Fuller Test on "{name}"', "\n   ", '-'*47)
                print(f' Null Hypothesis: Data has unit root. Non-Stationary.')
                print(f' Significance Level    = {signif}')
                print(f' Test Statistic        = {output["test_statistic"]}')
                print(f' No. Lags Chosen       = {output["n_lags"]}')
    
                print(f' Critical value {(str(round(signif*100))+"%").ljust(6)} = {round(r[4][str(round(signif*100))+"%"], 3)}')
    
                if p_value <= signif:
                    print(f" => P-Value = {p_value}. Rejecting Null Hypothesis.")
                    print(f" => Series is Stationary.")
                else:
                    print(f" => P-Value = {p_value}. Weak evidence to reject the Null Hypothesis.")
                    print(f" => Series is Non-Stationary.")

We build the models with a number of lags chosen so that the time series preserve their stationarity.

In [None]:
models_fitted = [VAR(dfs[0]).fit(maxlags = 3),
                 VAR(dfs[1]).fit(maxlags = 3),
                 VAR(dfs[2]).fit(maxlags = 1),
                 VAR(dfs[3]).fit(maxlags = 1) ]
for i, model_fitted in enumerate(models_fitted):
    granger_causality(model_fitted, [insider_trades_metrics[i], "returns"])

for model_fitted in models_fitted:
    irf = model_fitted.irf(20)

    # accumulated responses, their long-run limits and asymptotic covariance
    cum_effects = irf.cum_effects
    lr_effects = irf.lr_effects
    stderr = irf.cum_effect_cov(orth=False)

    names = irf.model.names
    for k in range(2):
        # irf_grid_plot takes the impulse variable first, then the response
        impulse = 1 - k
        response = k
        fig = plotting.irf_grid_plot(cum_effects, stderr, impulse, response,
                                     names, title, signif=signif,
                                     hlines=lr_effects,
                                     subplot_params=subplot_params,
                                     plot_params=plot_params,
                                     figsize=figsize,
                                     stderr_type=stderr_type)

        # label the plot with the lag order and variables of this model
        plt.title("Accumulated IRFs from VAR(" + str(model_fitted.k_ar) + ") Model \n Response of "
                  + str(names[response]) + " to Innovations in " + str(names[impulse]))
        plt.xticks(np.arange(0, 21, 1))
        plt.grid(visible=True, axis='y')
        plt.show()
    del irf