**Assignment III - Advanced Econometrics**
2022-2023


Sven van Zoelen, Marte Ottesen, Jan Koolbergen

In [56]:
### First we import the packages that we are going to use.
# Run this cell before any other, as the later cells depend on these imports.
# If you get an import error, the package is probably not installed.
# You can then run 'conda install <package_name>' from the Anaconda integrated terminal.
# All these packages should be installed by default, though.

import numpy as np                  # NumPy is used for numerical computing, especially arrays and matrices
import scipy.stats as sts           # scipy.stats provides statistical functions such as skewness and kurtosis
import pandas as pd                 # Pandas is for reading in files and using dataframes
import matplotlib.pyplot as plt     # This package is used for the plotting framework, and can be used standalone
import seaborn as sns               # I use this package for nicer plots than the standard matplotlib plots

In [40]:
### First of all we read the data into a dataframe
raw_data = pd.read_csv("Assignment3_dataset.csv")

# Print the dataframe for a visual inspection
raw_data

Unnamed: 0,PERMNO,date,TICKER,RET
0,11308,2001/01/02,KO,-0.002051
1,11308,2001/01/03,KO,-0.025694
2,11308,2001/01/04,KO,-0.028481
3,11308,2001/01/05,KO,0.002172
4,11308,2001/01/08,KO,0.016251
...,...,...,...,...
21131,22752,2021/12/27,MRK,0.011092
21132,22752,2021/12/28,MRK,0.003134
21133,22752,2021/12/29,MRK,0.001823
21134,22752,2021/12/30,MRK,0.002469


In [52]:
### Now we want to divide the dataframe into separate dataframes, one for each stock
# To avoid hardcoding the ticker values, I use a dictionary comprehension:
data_dict = {stock: raw_data[raw_data.TICKER == stock] for stock in raw_data.TICKER.unique()}

# Now you can access a dataset by its ticker, for example:
print(data_dict['JNJ'])

# More importantly, you can loop over the keys to apply a function to every dataset, which makes the assignment much easier
for stock in data_dict:
    mean = np.mean(data_dict[stock].RET)
    print(f"This is the mean return for stock {stock}: {mean:.6f}")
    
print("") # Just skipping a line
# Or even easier, loop through name and value at the same time:
for stock_name, stock_data in data_dict.items():
    mean = np.mean(stock_data.RET)
    print(f"This is the mean return for stock {stock_name}: {mean:.6f}")

       PERMNO        date TICKER       RET
10568   22111  2001/01/02    JNJ -0.029149
10569   22111  2001/01/03    JNJ -0.031863
10570   22111  2001/01/04    JNJ -0.021519
10571   22111  2001/01/05    JNJ  0.012937
10572   22111  2001/01/08    JNJ -0.001277
...       ...         ...    ...       ...
15847   22111  2021/12/27    JNJ  0.008440
15848   22111  2021/12/28    JNJ  0.004008
15849   22111  2021/12/29    JNJ  0.007044
15850   22111  2021/12/30    JNJ  0.004430
15851   22111  2021/12/31    JNJ -0.007196

[5284 rows x 4 columns]
This is the mean return for stock KO: 0.000312
This is the mean return for stock PFE: 0.000313
This is the mean return for stock JNJ: 0.000395
This is the mean return for stock MRK: 0.000261

This is the mean return for stock KO: 0.000312
This is the mean return for stock PFE: 0.000313
This is the mean return for stock JNJ: 0.000395
This is the mean return for stock MRK: 0.000261
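
The same split can also be written with `groupby`, which avoids filtering the full dataframe once per ticker. Below is a minimal sketch, assuming a small made-up dataframe `demo` with the same `TICKER` and `RET` column names as the dataset. The values and the names `demo`, `demo_dict` and `demo_means` are illustrative only and are not taken from the assignment data.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the assignment dataset
demo = pd.DataFrame({
    "TICKER": ["KO", "KO", "PFE", "PFE"],
    "RET": [0.01, -0.02, 0.03, 0.01],
})

# groupby yields (ticker, sub-dataframe) pairs; .copy() makes each sub-dataframe independent
demo_dict = {ticker: frame.copy() for ticker, frame in demo.groupby("TICKER")}

# Per-ticker means in a single call, without an explicit loop
demo_means = demo.groupby("TICKER")["RET"].mean()
print(demo_means)
```

Note that `groupby` sorts the keys alphabetically by default, whereas `unique()` keeps the order in which the tickers first appear.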


In [19]:
### The above cell boils down to doing this:
KO_data     = raw_data[raw_data.TICKER == 'KO']
PFE_data    = raw_data[raw_data.TICKER == 'PFE']
JNJ_data    = raw_data[raw_data.TICKER == 'JNJ']
MRK_data    = raw_data[raw_data.TICKER == 'MRK']

KO_mean     = np.mean(KO_data.RET)
PFE_mean    = np.mean(PFE_data.RET)
JNJ_mean    = np.mean(JNJ_data.RET)
MRK_mean    = np.mean(MRK_data.RET)

print(f"This is the mean return for stock KO: {KO_mean:.6f}")
print(f"This is the mean return for stock PFE: {PFE_mean:.6f}")
print(f"This is the mean return for stock JNJ: {JNJ_mean:.6f}")
print(f"This is the mean return for stock MRK: {MRK_mean:.6f}")



This is the mean return for stock KO: 0.000312
This is the mean return for stock PFE: 0.000313
This is the mean return for stock JNJ: 0.000395
This is the mean return for stock MRK: 0.000261


In [61]:
### Apply the pre-processing: demean the returns and multiply them by 100.
# Be careful not to run this cell more than once, as every run scales the data by 100 again.
# To reset the underlying data, just re-run the cell with the dictionary comprehension.
for stock_name, stock_data in data_dict.items():
    mean_return = np.mean(stock_data.RET)
    # assign() returns a new dataframe, which avoids the pandas SettingWithCopyWarning
    # that is raised when writing to a column of a slice of raw_data
    data_dict[stock_name] = stock_data.assign(RET=(stock_data.RET - mean_return) * 100)
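
One way to remove the risk of scaling the data twice is to do the pre-processing in a function that leaves its input untouched and returns a new dataframe. Below is a minimal sketch of that idea. The helper `demean_and_scale` and the toy dataframe `demo` are hypothetical names introduced for illustration and are not part of the assignment code.

```python
import numpy as np
import pandas as pd

def demean_and_scale(frame, column="RET", factor=100):
    """Return a copy of `frame` with `column` demeaned and multiplied by `factor`."""
    values = frame[column]
    # assign() builds a new dataframe, so the input frame is never modified
    return frame.assign(**{column: (values - values.mean()) * factor})

# Hypothetical toy returns, not taken from the dataset
demo = pd.DataFrame({"RET": [0.01, 0.03, -0.01]})
processed = demean_and_scale(demo)

print(processed.RET.tolist())   # demeaned returns, scaled by 100
print(demo.RET.tolist())        # the original values are unchanged
```

Because the raw dataframe is never overwritten, calling the function again on it gives the same result every time.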


Question 2:

In [57]:
### We first define a function to report the summary statistics for one stock
# We can then use a for loop to do this for every stock
def summary_statistics(stock_data):
    """This function takes in some stock data and reports some summary statistics."""
    returns = stock_data.RET # Just put this in a variable for convenience
    
    no_of_observations = len(returns)
    mean = np.mean(returns)
    
    # Quartiles (Q2 is the median)
    Q1 = np.percentile(returns, 25)
    Q2 = np.percentile(returns, 50)
    Q3 = np.percentile(returns, 75)
    
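    standard_deviation = np.std(returns)    # Population standard deviation (ddof=0)
    skewness = sts.skew(returns)
    kurtosis = sts.kurtosis(returns)        # Excess kurtosis (Fisher definition): a normal distribution gives 0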
    
    minimum = np.min(returns)
    maximum = np.max(returns)
    
    # Now we report the results:
    print(f"Number of observations: {no_of_observations}")
    print(f"Mean: {mean}")
    print(f"Q1: {Q1}")
    print(f"Q2: {Q2}")
    print(f"Q3: {Q3}")
    print(f"Standard deviation: {standard_deviation}")
    print(f"Skewness: {skewness}")
    print(f"Kurtosis {kurtosis}")
    print(f"Min: {minimum}")
    print(f"Max: {maximum}")
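
If the statistics need to go into a report table, it can be more convenient to return them than to print them. Below is a minimal sketch of such a variant, assuming simulated returns instead of the assignment data. The function `summary_row`, the tickers `AAA` and `BBB`, and the random samples are all hypothetical.

```python
import numpy as np
import pandas as pd
import scipy.stats as sts

def summary_row(returns):
    """Return the summary statistics of one return series as a pandas Series."""
    returns = np.asarray(returns, dtype=float)
    return pd.Series({
        "N": len(returns),
        "Mean": returns.mean(),
        "Q1": np.percentile(returns, 25),
        "Median": np.percentile(returns, 50),
        "Q3": np.percentile(returns, 75),
        "Std": returns.std(),                      # population standard deviation (ddof=0)
        "Skewness": sts.skew(returns),
        "Excess kurtosis": sts.kurtosis(returns),  # Fisher definition, normal = 0
        "Min": returns.min(),
        "Max": returns.max(),
    })

# Simulated returns for two made-up tickers (illustration only)
rng = np.random.default_rng(0)
demo_returns = {"AAA": rng.normal(size=500), "BBB": rng.standard_t(df=5, size=500)}

# One column per stock, one row per statistic
table = pd.DataFrame({name: summary_row(r) for name, r in demo_returns.items()})
print(table.round(4))
```

With the real data, the same dictionary comprehension could loop over `data_dict.items()` and pass `stock_data.RET` to the function.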


In [58]:
### Call the function in a for loop
for stock_name, stock_data in data_dict.items():
    print(f"\nSummary statistics for stock {stock_name}:")
    summary_statistics(stock_data)


Summary statistics for stock KO:
Number of observations: 5284
Mean: 1.9057007633138096e-17
Q1: -0.5510767789553369
Q2: 0.013148221044663174
Q3: 0.5715232210446632
Standard deviation: 1.224661791122344
Skewness: -0.07097367679761012
Kurtosis 10.688925222436152
Min: -10.092201778955337
Max: 13.848298221044663

Summary statistics for stock PFE:
Number of observations: 5284
Mean: 1.8027466978205606e-17
Q1: -0.7306891180923543
Q2: -0.03128911809235435
Q3: 0.7302358819076457
Standard deviation: 1.5276243265720826
Skewness: -0.015431064055568969
Kurtosis 6.06370370707225
Min: -11.176889118092355
Max: 10.823910881907645

Summary statistics for stock JNJ:
Number of observations: 5284
Mean: -7.992597247679968e-17
Q1: -0.5310065480696443
Q2: -0.00998154806964429
Q3: 0.5526934519303557
Standard deviation: 1.1690675191405404
Skewness: -0.2559401141391979
Kurtosis 14.881540049222341
Min: -15.885131548069644
Max: 12.189668451930356

Summary statistics for stock MRK:
Number of observations: 5284
Mean

In [59]:
### If necessary, we can also just call it on one dataset
summary_statistics(data_dict['JNJ'])

Number of observations: 5284
Mean: -7.992597247679968e-17
Q1: -0.5310065480696443
Q2: -0.00998154806964429
Q3: 0.5526934519303557
Standard deviation: 1.1690675191405404
Skewness: -0.2559401141391979
Kurtosis 14.881540049222341
Min: -15.885131548069644
Max: 12.189668451930356
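
When reading the kurtosis values above, keep in mind that `sts.kurtosis` reports excess kurtosis by default (`fisher=True`), so a normal distribution scores about 0 rather than 3. Values between 6 and 15 therefore point to much fatter tails than the normal distribution. The sketch below illustrates the two definitions on a simulated normal sample. The sample and the variable names are assumptions made for illustration.

```python
import numpy as np
import scipy.stats as sts

# Simulated standard normal sample (illustration only, not the assignment data)
rng = np.random.default_rng(42)
normal_sample = rng.normal(size=100_000)

excess = sts.kurtosis(normal_sample)                  # Fisher definition (default): normal is about 0
pearson = sts.kurtosis(normal_sample, fisher=False)   # Pearson definition: normal is about 3

print(f"Excess kurtosis: {excess:.3f}, Pearson kurtosis: {pearson:.3f}")
```

The two numbers always differ by exactly 3, so either convention works as long as the report states which one is used.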
