# Empirical Analysis

## Extracting the data

Import the necessary libraries.

In [1]:
import os
import sys
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import warnings


In [2]:
warnings.filterwarnings('ignore')
path = os.getenv("ROOT_PATH")
sys.path.append(path)
print(path)


/Users/monic/Desktop/Master_Thesis/empirical


The list of **all current components** of the OMX Stockholm PI index, as of the end of the day on 16 February 2024, can be found [here](https://indexes.nasdaqomx.com/Index/Weighting/OMXSPI).

The list of **large-caps** of the OMX Stockholm PI index, as of the same date, can be found [here](https://indexes.nasdaqomx.com/Index/Weighting/OMXSLCPI).

The list of **mid-caps** of the OMX Stockholm PI index, as of the same date, can be found [here](https://indexes.nasdaqomx.com/Index/Weighting/OMXSMCPI).

The list of **small-caps** of the OMX Stockholm PI index, as of the same date, can be found [here](https://indexes.nasdaqomx.com/Index/Weighting/OMXSSCPI).


In the following steps we load the names of all the components and turn them into a list.

This list is then used to fetch the data (adjusted close price and volume) from Yahoo Finance, which is saved in a file called `raw_data.csv`.

In [3]:
tickers = pd.read_excel(f"{path}/raw_data/Weightings_20240216_OMXSPI.xlsx", header=0)
# If an ImportError shows up, run: !pip3 install openpyxl
# (pandas reads .xlsx files with openpyxl; xlrd only handles legacy .xls files)


In [4]:
tickers.head()


,Company Name,Security-Symbol
0,TRATON SE,8TRA.ST
1,AAK AB,AAK.ST
2,ABB Ltd,ABB.ST
3,Abliva AB,ABLI.ST
4,AcadeMedia AB,ACAD.ST


In [5]:
tickers_list=tickers['Security-Symbol'].to_list()


In [6]:
data = yf.download(tickers_list, start="2013-01-01")


[*********************100%%**********************]  393 of 393 completed


In [7]:
data.head()


Price,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,...,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume
Ticker,8TRA.ST,AAK.ST,ABB.ST,ABLI.ST,ACAD.ST,ACE.ST,ACRI-A.ST,ACRI-B.ST,ACTI.ST,ADDT-B.ST,...,VPLAY-B.ST,WALL-B.ST,WBGR-B.ST,WIHL.ST,WISE.ST,WTW-A.ST,XANO-B.ST,XBRANE.ST,XSPRAY.ST,XVIVO.ST
Date,,,,,,,,,,,,,,,,,,,,,
2013-01-02,,39.280941,126.969963,13.91973,,,,,31.909622,11.440967,...,,329316,,216712,3273,,5632,,,37698
2013-01-03,,39.280941,125.590866,13.844893,,,,,32.190765,11.410128,...,,185792,,107776,2280,,17749,,,24970
2013-01-04,,39.565578,126.326378,13.545544,,,,,32.190765,11.502641,...,,160304,,215020,3305,,13726,,,37145
2013-01-07,,39.779072,125.223076,13.470707,,,,,32.050194,11.502641,...,,226744,,170032,9196,,18712,,,30260
2013-01-08,,39.280941,125.774734,13.770056,,,,,31.34734,11.718507,...,,495576,,210944,2322,,87030,,,23163


In [None]:
data.to_csv(f"{path}/raw_data/raw_data.csv")


In the following cells we create separate lists with the tickers of the companies classified as large-caps, mid-caps, and small-caps.

In [8]:
l_caps=pd.read_excel(f"{path}/raw_data/large_caps.xlsx")
l_caps_list=l_caps['Security-Symbol'].to_list()


In [9]:
m_caps=pd.read_excel(f"{path}/raw_data/mid_caps.xlsx")
m_caps_list=m_caps['Security-Symbol'].to_list()


In [10]:
s_caps=pd.read_excel(f"{path}/raw_data/small_caps.xlsx")
s_caps_list=s_caps['Security-Symbol'].to_list()


In [11]:
len(l_caps_list)+len(m_caps_list)+len(s_caps_list)


392

In [12]:
len(tickers_list)


393

The three lists add up to 392 tickers, whereas the index has 393 components, so there is one company that we cannot classify as large-, mid- or small-cap.

It is identified in the following steps.
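
As a quick cross-check, the unclassified ticker can also be located with a set difference. The snippet below is only a minimal sketch on toy lists: the tickers and variable names are placeholders chosen for illustration, not the notebook's actual data.

```python
# Toy stand-ins for tickers_list and the three cap lists (hypothetical tickers)
all_tickers = ["AAA.ST", "BBB.ST", "CCC.ST", "DDD.ST"]
large_caps = ["AAA.ST"]
mid_caps = ["BBB.ST"]
small_caps = ["CCC.ST"]

# Tickers in the full list that appear in none of the cap lists
classified = set(large_caps) | set(mid_caps) | set(small_caps)
unclassified = sorted(set(all_tickers) - classified)
print(unclassified)
```

With the real lists, the same expression would use `tickers_list`, `l_caps_list`, `m_caps_list` and `s_caps_list`.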

## Cleaning data

After saving the data to the file `raw_data.csv`, open it in Microsoft Excel.
In the **first row** we can find the name of the metric fetched.
In the **second row** we can find the tickers of the different companies.
In the **first column** we can find the dates we have exported.

To clean up the dataset, delete those columns where the first row is neither `Adj Close` nor `Volume`.
Once this is done, cut the columns where the first row is `Volume` and paste them into a new spreadsheet (a separate file, not a new tab).
Remove the first row, as it no longer adds useful information. Name this new spreadsheet `volumes` and save it as a .csv file.

Return to the initial spreadsheet, `raw_data.csv`.
Since it now contains only `Adj Close` prices, remove the first row.
Rename the spreadsheet `price` and save it as an Excel workbook (`price.xlsx`), which is the file read in the next cell.
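
The manual Excel steps above can also be reproduced in pandas. The following is a minimal sketch on a toy frame that mimics the two-level column layout shown earlier (metric in the first header row, ticker in the second). The tickers `AAA.ST` and `BBB.ST` and all values are made up, and an in-memory buffer stands in for `raw_data.csv`.

```python
import io

import pandas as pd

# Toy frame with the same layout as the downloaded data (hypothetical values)
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Volume"], ["AAA.ST", "BBB.ST"]], names=["Price", "Ticker"]
)
dates = pd.DatetimeIndex(["2013-01-02", "2013-01-03"], name="Date")
toy = pd.DataFrame(
    [[1.0, 2.0, 10, 20], [1.1, 2.1, 11, 21]], index=dates, columns=columns
)

# Write to an in-memory CSV, standing in for raw_data.csv
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

# Two header rows (metric, ticker); the first column is the date index
raw = pd.read_csv(buffer, header=[0, 1], index_col=0, parse_dates=True)

# Selecting a top-level label drops that level, leaving one column per ticker
prices = raw["Adj Close"]
volumes = raw["Volume"]
print(prices)
print(volumes)
```

Each resulting frame could then be saved with `to_csv` or `to_excel`. Note that the date ends up in the index rather than in the first column, so later cells that skip the first column would need adjusting (for example by calling `reset_index()` first).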



In [13]:
df_price = pd.read_excel(f'{path}/raw_data/price.xlsx')


In [14]:
print(f"Number of companies in the sample: {df_price.shape[1]-1}")


Number of companies in the sample: 393


In [15]:
# Share of missing prices per company, tagged with its cap classification
null_percentage_dict={'Company':[],'Null_percentage':[],'Type':[]}

# The first column holds the dates, so it is skipped
for column in df_price.columns[1:]:
    company_name=column
    # Percentage of dates on which the company has no price
    null_percentage = df_price[company_name].isnull().mean()*100
    null_percentage_dict['Company'].append(company_name)
    null_percentage_dict['Null_percentage'].append(null_percentage)
    if company_name in l_caps_list:
        null_percentage_dict['Type'].append("l-cap")
    elif company_name in m_caps_list:
        null_percentage_dict['Type'].append("m-cap")
    elif company_name in s_caps_list:
        null_percentage_dict['Type'].append("s-cap")
    else:
        # Ticker is in the index but in none of the three cap lists
        null_percentage_dict['Type'].append("non-registered")

df_null_percentage=pd.DataFrame.from_dict(null_percentage_dict)
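
The same table can be built without an explicit loop. The sketch below is an illustrative alternative on toy data, not the code used in this notebook: the tickers, the values and the `cap_type` mapping are all hypothetical.

```python
import numpy as np
import pandas as pd

# Toy price table: dates in the first column, one column per (hypothetical) ticker
toy_price = pd.DataFrame({
    "Date": pd.to_datetime(["2013-01-02", "2013-01-03", "2013-01-04", "2013-01-07"]),
    "AAA.ST": [1.0, 1.1, 1.2, 1.3],
    "BBB.ST": [np.nan, np.nan, np.nan, 2.0],
    "CCC.ST": [np.nan, 3.0, 3.1, 3.2],
})

# Hypothetical classification; BBB.ST is deliberately left out
cap_type = {"AAA.ST": "l-cap", "CCC.ST": "s-cap"}

# Percentage of missing values per ticker, computed column-wise
null_pct = toy_price.drop(columns="Date").isnull().mean().mul(100)

summary = (
    null_pct.rename("Null_percentage")
    .rename_axis("Company")
    .reset_index()
)
# Tickers missing from the mapping are flagged as non-registered
summary["Type"] = summary["Company"].map(cap_type).fillna("non-registered")
summary = summary.sort_values("Null_percentage", ascending=False)
print(summary)
```

Building one ticker-to-type dictionary from the three cap lists keeps the classification in a single lookup instead of a chain of membership tests.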


In [16]:
df_null_percentage[df_null_percentage['Type']=="non-registered"]


,Company,Null_percentage,Type
249,NOKIA-SEK.ST,0.0,non-registered


In [18]:
df_null_percentage=df_null_percentage.sort_values(by="Null_percentage",ascending=False)

df_null_percentage.head()


,Company,Null_percentage,Type
152,HAKI-B.ST,98.463188,s-cap
294,RUSTA.ST,97.069335,m-cap
301,SAMPO-SDB.ST,88.813438,l-cap
15,ALLEI.ST,86.704789,l-cap
252,NORION.ST,86.275911,m-cap


In [23]:
df_null_percentage.columns


Index(['Company', 'Null_percentage', 'Type'], dtype='object')

In [24]:
fig = px.bar(df_null_percentage, x='Company', y='Null_percentage', color='Type',
             labels={'Null_percentage': 'Null_percentage'},
             title='Null Percentage of Companies by Cap Classification',
             hover_data=['Company', 'Null_percentage', 'Type'])
fig.update_layout(barmode='group', xaxis_title='Company', yaxis_title='Null_percentage')
fig.show()
