# S&P500 Stocks' Jensen's Alpha

## Introduction
##### Stock market is where the stocks are bought and sold, and stock prices regularly fluctuate. Diving deep into the stock price data and modeling the stock returns shows that some of stocks usually show better performance than the others. This notebook answers a very usual question of stock market enthusiasts and practitioners, what type of stocks beat the market?

Notebook is organized in a research paper style. It presents data set and its source, and the usual model for estimation of the Jensen's Alpha. Next, the main results are presented and the notebook is concluded.

##### But first what is the Jensen's Alpha?

Stock prices are volatilem, however, not a single stock can indiciate how the market as a whole is doing. For that reason we look into all of the stocks together for example the index (S&P500, Nikkei225).
So basically stock index shows the whole market itself. If index price is going up that means all the market is doing good.

Historically, individual stocks are tested if they follow the market or not. Usually some of the stocks go up with the market or go down when the market goes up. For that reason several mathematical financial models are developed and the most practical one is called Capital Asset Pricing Model (CAPM).

$$
R_{S} - R_{F} = \beta_{rm} (R_{M} - R_{F})
$$

Here, RS is a stock return, RF is the risk free rate, and RM is the market return. So expectation is that the stock return will relate to the market return by the parameter beta.

However, some of stocks perform better than the market and the CAPM model captures that factor by alpha. It is called Jensen's Alpha (Jensen, 1967., https://papers.ssrn.com/sol3/papers.cfm?abstract_id=244153).

$$
R_{S} - R_{F} = \alpha + \beta_{rm} (R_{M} - R_{F})
$$

Now, stocks with the larger alpha is assumed to perform better than the market and portfolio engineers always look for that alpha.

Thus, in this notebook we will look into S&P500 stocks and estimate the alpha. We also look into which industry is showing better alpha and better market and alpha relation. Since greater the alpha greater the chance of stock performing better than the market.

## Data

Data is coming from two sources. 
FED's open source is used to get risk free rates and Stooq.com is used to get stock prices.

But before, all required packages should be installed.

In [0]:
!pip install datapackage --quiet 

You should consider upgrading via the '/local_disk0/.ephemeral_nfs/envs/pythonEnv-35d262a9-03e5-4030-b713-95177aeffc02/bin/python -m pip install --upgrade pip' command.[0m


In [0]:
!pip install pandas_datareader --quiet 

You should consider upgrading via the '/local_disk0/.ephemeral_nfs/envs/pythonEnv-35d262a9-03e5-4030-b713-95177aeffc02/bin/python -m pip install --upgrade pip' command.[0m


In [0]:
### import packages 

import datapackage
import pandas as pd
import pandas_datareader as pdr
import statsmodels.api as sm
import numpy as np
np.random.seed(9876789)

### here S&P500 stock names are imported

data_url = "https://datahub.io/core/s-and-p-500-companies/datapackage.json"

package = datapackage.Package(data_url)

resources = package.resources
for resource in resources:
    if resource.tabular:
        data = pd.read_csv(resource.descriptor['path'])




In [0]:
### prepare risk free rate data

rf_data = pdr.get_data_fred('DGS1MO', start = "2010-01-01",end = "2024-01-01")
rf_data["rf"] = rf_data["DGS1MO"] / 100

### lets take a look at the description of the data
### it looks ok since it is already processed and offered through fed source
### min is 0% as expected and max is 6%
### seems like rf data has std of 1.2%

rf_data.describe()

Unnamed: 0,DGS1MO,rf
count,3405.0,3405.0
mean,0.76679,0.007668
std,1.241183,0.012412
min,0.0,0.0
25%,0.04,0.0004
50%,0.11,0.0011
75%,1.14,0.0114
max,6.02,0.0602


In [0]:
### so rf data is from 2010 until 2023
rf_data

Unnamed: 0_level_0,DGS1MO,rf
DATE,Unnamed: 1_level_1,Unnamed: 2_level_1
2010-01-01,,
2010-01-04,0.05,0.0005
2010-01-05,0.03,0.0003
2010-01-06,0.03,0.0003
2010-01-07,0.02,0.0002
...,...,...
2023-08-04,5.54,0.0554
2023-08-07,5.54,0.0554
2023-08-08,5.54,0.0554
2023-08-09,5.51,0.0551


In [0]:
#### uncomment this section if it is the first time running the notebook
#### here all stock data is downloaded from stooq

# stock_data_all = pd.DataFrame(index = pd.date_range("2010-01-01","2024-01-01"))

# for ticker in ticker_list[:]:
    
#     stock_data = pdr.get_data_stooq(ticker)
    
#     if stock_data.shape[0] == 0:
#         continue

#     stock_data[ticker + "_close"] = stock_data["Close"].copy()
    
#     stock_data_all = pd.merge(stock_data_all, stock_data[ticker + "_close"], left_index=True, right_index=True, how = "left")

In [0]:
#### if it is the second time running the notebook simply use already downloaded data

# file location and type
file_location = "/FileStore/tables/stck_data-2.csv"
file_type = "csv"

# schema info
infer_schema = "true"
first_row_is_header = "true"
delimiter = ","

# read data
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

# some cleaning and make it pandas df
stock_data_all = df.to_pandas_on_spark()
stock_data_all.set_index("_c0", inplace = True)
stock_data_all.dropna(inplace = True)

stock_data_all = stock_data_all.to_pandas()



In [0]:
### lets take a look at the description of the data
### we have all the stock prices for most of stocks in S&P500

stock_data_all.describe()

Unnamed: 0,MMM_close,AOS_close,ABT_close,ABBV_close,ACN_close,ATVI_close,ADM_close,ADBE_close,AAP_close,AMD_close,...,LRCX_close,LW_close,LVS_close,LEG_close,LDOS_close,LEN_close,LNC_close,LIN_close,LYV_close,LKQ_close
count,842.0,842.0,842.0,842.0,842.0,842.0,842.0,842.0,842.0,842.0,...,842.0,842.0,842.0,842.0,842.0,842.0,842.0,842.0,842.0,842.0
mean,145.627096,61.186117,109.435773,121.166836,279.43099,79.948988,66.324533,457.088563,161.186961,90.684804,...,493.721727,75.679435,47.754466,37.181471,95.424131,88.063281,46.685898,289.769903,80.6969,45.959838
std,27.171471,10.120109,10.909956,27.51184,47.378502,8.467008,17.816031,97.277947,38.906535,22.851178,...,116.234307,16.797236,9.087701,5.772118,7.143269,17.942782,15.736039,49.610363,20.022655,10.596039
min,93.31,36.332,78.4505,66.4941,158.022,56.942,31.6112,275.2,64.5112,47.52,...,226.774,49.4837,30.14,21.9498,76.8298,37.7335,19.11,168.683,34.51,19.3648
25%,125.0165,53.69835,103.0665,98.5412,251.3185,75.76825,51.388775,377.025,139.49825,77.4575,...,419.52025,62.6505,38.975,32.974975,90.2895,75.87815,32.504625,250.173,69.1425,37.014025
50%,145.236,62.0829,108.626,116.8765,280.282,78.8755,65.2045,453.805,160.5275,87.15,...,500.1915,74.0198,47.09,36.64435,96.0521,88.0889,47.40895,293.36,81.78,49.4703
75%,168.1405,68.3603,116.659,146.5975,312.26625,83.51925,81.94675,508.205,192.44025,105.27,...,588.5295,82.376325,56.5275,40.46765,100.865,100.811,61.588725,325.89725,91.9325,54.33
max,194.591,84.2683,139.123,170.151,409.834,102.7,97.67,688.37,229.551,161.91,...,721.26,115.12,66.2,53.7268,110.117,133.24,73.7276,391.04,126.04,59.01


In [0]:
### our stock price data is in a daily frequency and it is from 2020 April until 2023 August
stock_data_all

Unnamed: 0_level_0,MMM_close,AOS_close,ABT_close,ABBV_close,ACN_close,ATVI_close,ADM_close,ADBE_close,AAP_close,AMD_close,...,LRCX_close,LW_close,LVS_close,LEG_close,LDOS_close,LEN_close,LNC_close,LIN_close,LYV_close,LKQ_close
_c0,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2020-04-06,127.274,36.3320,79.2164,66.7952,159.878,60.543,33.4236,319.13,89.1787,47.52,...,242.462,49.7015,40.70,22.5810,91.2191,37.7335,24.5451,168.683,34.51,19.5797
2020-04-07,130.790,37.7006,78.4505,66.4941,158.022,58.842,33.6090,308.93,92.5583,47.56,...,247.753,52.8480,42.70,23.6767,89.0183,37.7731,26.7544,172.540,37.85,20.4382
2020-04-08,134.781,38.2330,81.3439,69.2941,165.354,60.022,34.0766,317.18,95.4183,48.79,...,257.931,55.3901,46.55,25.6483,91.7483,41.7168,29.6094,179.244,37.76,21.4732
2020-04-09,133.686,38.2232,82.3884,70.3415,171.296,59.423,35.1027,318.70,99.4560,48.38,...,249.752,56.0124,47.86,26.6369,91.7681,43.1033,31.5723,181.676,38.27,21.9907
2020-04-13,132.436,37.6729,82.2040,70.8261,166.137,61.673,33.6928,320.65,102.3750,50.94,...,250.226,55.8783,46.49,25.5683,88.5177,39.7845,29.8047,178.452,39.41,20.8286
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-08-07,105.230,73.8400,108.4500,150.3300,315.870,91.570,87.2100,529.73,71.3000,116.81,...,705.360,103.1000,58.14,29.6900,97.6300,127.3100,27.6500,384.970,89.22,54.2300
2023-08-08,104.120,73.2400,105.5800,149.6200,315.150,91.590,86.4400,520.60,70.5600,113.23,...,696.000,100.4200,57.52,29.4900,97.0000,127.0300,27.5800,381.780,87.88,53.3150
2023-08-09,103.710,72.7900,105.3000,150.7700,311.530,91.440,85.8500,513.78,71.2000,110.47,...,686.050,99.3300,56.85,29.5900,97.5600,125.5800,26.6400,379.560,86.30,53.3600
2023-08-10,103.900,71.9900,104.7400,151.4400,310.410,91.420,85.9000,515.83,71.5300,110.23,...,686.390,97.9900,57.40,29.2300,97.3300,123.3700,26.3100,382.320,86.99,53.6600


In [0]:
### data preparation
### prepare S&P500 and risk free data

market_data = pdr.get_data_fred('SP500')
market_data["market_return"] = market_data["SP500"].diff() / market_data["SP500"]

rf_data = pdr.get_data_fred('DGS1MO', start = "2010-01-01",end = "2024-01-01")
rf_data["rf"] = rf_data["DGS1MO"] / 100
rf_data["yyyymm"] = rf_data.index.year.astype("str") + rf_data.index.month.astype("str").str.zfill(2)


## Modeling

First we start by fitting a linear regression. By using Ordinary Least Squares method we estimate parameters. They are estimated parameters so they will have a cap on it.

$$
R_{i,t} - R_{F} = \hat{\alpha}_{i} + \hat{\beta}_{i} (R_{M}  - R_{F} )+ \epsilon_{i,t}
$$

Here, i is for stock ticker, and t is for time.

So now we have market index price data, risk free data and stock prices. We can start modeling from now on.

But before that lets choose one stock and explain what is the model result and what is the estimated alpha.

In [0]:
## lets choose apple stocks
ticker = "AAPL"

## make temporary data
stock_data = stock_data_all[[ticker + "_close"]].copy()

## estimate returns
stock_data["stock_return"] = stock_data[ticker + "_close"].diff() / stock_data[ticker + "_close"]

## merge with market data
data_mr = pd.merge(market_data["market_return"], 
                   stock_data["stock_return"], 
                   left_index=True, 
                   right_index=True)

data_mr["yyyymm"] = data_mr.index.year.astype("str") + data_mr.index.month.astype("str").str.zfill(2)
data_mr["yyyymmdd"] = data_mr.index.copy()

## merge with rf data
data_mr = pd.merge(data_mr[["yyyymmdd","stock_return","market_return","yyyymm"]], 
                   rf_data[["rf","yyyymm"]], 
                   left_on="yyyymm", 
                   right_on="yyyymm")

data_mr["stock_return"] = data_mr["stock_return"] - data_mr["rf"] 
data_mr["market_return"] = data_mr["market_return"] - data_mr["rf"] 

## clean data
data_mr.dropna(inplace=True)

## add constant into the data so model will estimate alpha
data_mr = sm.add_constant(data_mr)

## model fitting
model = sm.OLS(data_mr.stock_return, data_mr[["const","market_return"]])
results = model.fit()


In [0]:
print(results.summary())

### here R-squared shows that mraket can explain almost 80% (0.789) of the fluctuations in apple stocks
### looking into coef for const (that is our alpha) we see P value is almost zero
### p value shows if the coef is statistically significant
### so in short, apple stocks showing 0.0015 better performance than the market

                            OLS Regression Results                            
Dep. Variable:           stock_return   R-squared:                       0.789
Model:                            OLS   Adj. R-squared:                  0.789
Method:                 Least Squares   F-statistic:                 6.301e+04
Date:                Sun, 13 Aug 2023   Prob (F-statistic):               0.00
Time:                        13:41:58   Log-Likelihood:                 49905.
No. Observations:               16835   AIC:                        -9.981e+04
Df Residuals:                   16833   BIC:                        -9.979e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
const             0.0015      0.000     13.636

In [0]:
### here we repeat the same process for all other stocks 
### estimate the alpha and save them for further use

In [0]:
ticker_list = data["Symbol"].tolist()

market_pv_list = list()
alpha_pv_list = list()

market_coef_list = list()
alpha_coef_list = list()

symbol_list = list()
    
for ticker in ticker_list[:]:
    
    if ticker + "_close" not in stock_data_all.columns.tolist():
        continue

    stock_data = stock_data_all[[ticker + "_close"]].copy()

    if stock_data.shape[0] == 0:
        continue

    stock_data["stock_return"] = stock_data[ticker + "_close"].diff() / stock_data[ticker + "_close"]

    ## merge with market data
    data_mr = pd.merge(market_data["market_return"], 
                       stock_data["stock_return"], 
                       left_index=True, 
                       right_index=True)

    data_mr["yyyymm"] = data_mr.index.year.astype("str") + data_mr.index.month.astype("str").str.zfill(2)
    data_mr["yyyymmdd"] = data_mr.index.copy()

    ## merge with rf data
    data_mr = pd.merge(data_mr[["yyyymmdd","stock_return","market_return","yyyymm"]], 
                       rf_data[["rf","yyyymm"]], 
                       left_on="yyyymm", 
                       right_on="yyyymm")


    data_mr["stock_return"] = data_mr["stock_return"] - data_mr["rf"] 
    data_mr["market_return"] = data_mr["market_return"] - data_mr["rf"] 

    ## clean data
    data_mr.dropna(inplace=True)

    ## add constant into the data so model will estimate alpha
    data_mr = sm.add_constant(data_mr)

    ## model fitting
    model = sm.OLS(data_mr.stock_return, data_mr[["const","market_return"]])
    results = model.fit()

    ## save parameters
    market_pv_list.append(results.pvalues[0]) ##market pvalue
    alpha_pv_list.append(results.pvalues[1]) ##alpha pvalue

    ## save pvalues
    market_coef_list.append(results.params[0]) ##market pvalue
    alpha_coef_list.append(results.params[1]) ##alpha pvalue

    ## save tickers
    symbol_list.append(ticker)


In [0]:
### here save all data into dataframe
final_data = pd.DataFrame()
final_data["symbol"] = symbol_list
final_data["alpha_coef"] = alpha_coef_list
final_data["alpha_pv"] = alpha_pv_list
final_data["market_coef"] = market_coef_list
final_data["marke_pv"] = market_pv_list

### lets focus on only significant data
final_data_significant = final_data[final_data["alpha_pv"] <= 0.1]

## Results

### Estimated Jensen's alpha

In [0]:
## lets check alphas
## it seems it goes up to 1.35 and goes down to 1

final_data.describe()

Unnamed: 0,alpha_coef,alpha_pv,market_coef,marke_pv
count,272.0,272.0,272.0,272.0
mean,1.003836,0.0,-7.7e-05,0.06896246
std,0.100255,0.0,0.001489,0.208866
min,0.781651,0.0,-0.003749,4.270288e-139
25%,0.940413,0.0,-0.001065,8.226774e-27
50%,0.99491,0.0,-8.1e-05,1.222609e-10
75%,1.067934,0.0,0.000902,0.0009180182
max,1.352819,0.0,0.004972,0.9948215


In [0]:
### by looking into all alpha in below graph we see interesting result

### just for reminder the meaning of alpha is, higher the alpha from 1 better the performance and alphas lower than 1 are the cases when stock is doing poorly than the market

### thus we can conclude that stocks forming two groups
### first, over 1 alpha. they do better than the market return
### second is the lower than 1 alpha. they usually doing worse than the market

### next we check market coef

In [0]:
display(final_data_significant)

symbol,alpha_coef,alpha_pv,market_coef,marke_pv
MMM,0.942717808321348,0.0,-0.0015539360426961,1.670622229561558e-38
AOS,0.983206681056186,0.0,-0.00018329952352246748,0.1799963281256102
ABT,0.9437416800069042,0.0,-0.0009724477885143036,3.3920310773315203e-18
ABBV,0.8681055476238482,0.0,-0.0014565050303953,7.3504431813647e-29
ACN,1.0582471024393767,0.0,0.000911248152219124,5.303410964462857e-22
ATVI,0.8256361527697422,0.0,-0.002733113392205,5.688208762751675e-72
ADM,0.9525470565469556,0.0,-0.00015889983654266158,0.2300888410677401
ADBE,1.073428172095016,0.0,0.0008681931616285828,5.998061637252829e-08
AAP,1.090304304910073,0.0,-6.1123700666051e-05,0.790372371306961
AMD,1.1849455431520013,0.0,0.0023017699328884,4.548620447868185e-22


Databricks visualization. Run in Databricks to view.

In [0]:
### stocks's market relation is captured by the beta factor
### interestingly some of stocks are showing strong relation to the overall market while other stocks are showing less then 0 relation

### but most imporantly as we expected stocks vary depending on how they relate to the market
### we can see clearly two groups are formed, one positive, higher than 0 market coef, and the other group of stocks have negative market coef.
### stocks with positive coef will go up when the market goes up and stocks with negative coef will go down when market goes up


In [0]:
display(final_data_significant)

symbol,alpha_coef,alpha_pv,market_coef,marke_pv
MMM,0.942717808321348,0.0,-0.0015539360426961,1.670622229561558e-38
AOS,0.983206681056186,0.0,-0.00018329952352246748,0.1799963281256102
ABT,0.9437416800069042,0.0,-0.0009724477885143036,3.3920310773315203e-18
ABBV,0.8681055476238482,0.0,-0.0014565050303953,7.3504431813647e-29
ACN,1.0582471024393767,0.0,0.000911248152219124,5.303410964462857e-22
ATVI,0.8256361527697422,0.0,-0.002733113392205,5.688208762751675e-72
ADM,0.9525470565469556,0.0,-0.00015889983654266158,0.2300888410677401
ADBE,1.073428172095016,0.0,0.0008681931616285828,5.998061637252829e-08
AAP,1.090304304910073,0.0,-6.1123700666051e-05,0.790372371306961
AMD,1.1849455431520013,0.0,0.0023017699328884,4.548620447868185e-22


Databricks visualization. Run in Databricks to view.

In [0]:
### thus so far our main results
### alphas form two groups
### market relation also forms two groups

### thus we need to check by industry wise averages
### different type of industry should have different alpha


In [0]:
### lets look into data by industry wise
### we look into average alphas for each industry

data_final_mr = pd.merge(final_data, data, left_on="symbol", right_on="Symbol")
data_final_mr_gr = data_final_mr.groupby("Sector").mean()
data_final_mr_gr.reset_index(inplace=True)

In [0]:
### so on average energy, information technology, financials and consumer discretionary stocks are showing better alpha
display(data_final_mr_gr)

Sector,alpha_coef,alpha_pv,market_coef,marke_pv
Communication Services,0.9887690260119372,0.0,-0.0006176347960536284,0.0027453227350436
Consumer Discretionary,1.0660359552394278,0.0,0.0007130931998092277,0.1255952307271121
Consumer Staples,0.8644323607738976,0.0,-0.002059448748294,0.0123290897854144
Energy,1.0392656469313688,0.0,0.001200699444143,0.0814732074303626
Financials,1.0398462677077265,0.0,0.000411376681554383,0.0694783780731465
Health Care,0.9600701443875964,0.0,-0.0007720224840779571,0.0303174818611129
Industrials,1.0004097043336977,0.0,2.239521831201821e-05,0.100591288235162
Information Technology,1.0658617440040925,0.0,0.0007647665388793393,0.0945505805499622
Materials,1.0269500752596488,0.0,0.0003901891439694096,0.0871045796832961
Real Estate,1.0034779038531958,0.0,-0.00036672223145153984,0.0699540099969126


Databricks visualization. Run in Databricks to view.

In [0]:
### lets check the alpha and market relation in one graph
### this combination should find us the best performing sector
### diving deep into that shows that the energy sector shows the best combinations
### meaining that it has stronger relation with the market itself and also higher alpha

display(data_final_mr_gr)

Sector,alpha_coef,alpha_pv,market_coef,marke_pv
Communication Services,0.9887690260119372,0.0,-0.0006176347960536284,0.0027453227350436
Consumer Discretionary,1.0660359552394278,0.0,0.0007130931998092277,0.1255952307271121
Consumer Staples,0.8644323607738976,0.0,-0.002059448748294,0.0123290897854144
Energy,1.0392656469313688,0.0,0.001200699444143,0.0814732074303626
Financials,1.0398462677077265,0.0,0.000411376681554383,0.0694783780731465
Health Care,0.9600701443875964,0.0,-0.0007720224840779571,0.0303174818611129
Industrials,1.0004097043336977,0.0,2.239521831201821e-05,0.100591288235162
Information Technology,1.0658617440040925,0.0,0.0007647665388793393,0.0945505805499622
Materials,1.0269500752596488,0.0,0.0003901891439694096,0.0871045796832961
Real Estate,1.0034779038531958,0.0,-0.00036672223145153984,0.0699540099969126


Databricks visualization. Run in Databricks to view.

### Industry return

Since we found based on alpha and market relation the high performing industry, now it is time to actually dive deeper into industry returns.
As we saw earlier, energy sector shows the best combination of alpha and beta relation. So we assume energy sector return should be the best.

In [0]:
### construct portfolio of those high performing energy sectors

In [0]:
stock_data_all_v1 = pd.DataFrame((stock_data_all.loc[stock_data_all.index[-1]] - stock_data_all.loc[stock_data_all.index[0]]) / stock_data_all.loc[stock_data_all.index[0]], columns=["return"])
stock_data_all_v1.index = [a.split("_")[0] for a in stock_data_all_v1.index.tolist()]
stock_data_all_v1.reset_index(inplace = True)
stock_data_all_v1.rename(columns={"index":"symbol"}, inplace=True)

In [0]:
data_mr_v1 = pd.merge(data, stock_data_all_v1, left_on="Symbol", right_on="symbol")
data_mr_v1 = data_mr_v1.groupby("Sector").mean()
data_mr_v1.reset_index(inplace=True)

In [0]:
data_mr_v1.sort_values("return")

Unnamed: 0,Sector,return
10,Utilities,0.243739
2,Consumer Staples,0.415121
9,Real Estate,0.424422
0,Communication Services,0.521737
5,Health Care,0.591387
4,Financials,0.909714
6,Industrials,0.969772
7,Information Technology,0.991691
1,Consumer Discretionary,1.044476
8,Materials,1.126363


In [0]:
### based on results we see that energy sector indeed is showing the best return
### also materials and consumer discretionary also have high returns

In [0]:
display(data_mr_v1.sort_values("return"))

Sector,return
Utilities,0.2437386595941199
Consumer Staples,0.4151205199386811
Real Estate,0.424421898970612
Communication Services,0.5217368446832756
Health Care,0.5913869372533193
Financials,0.9097141694184504
Industrials,0.9697719584314548
Information Technology,0.9916908284375116
Consumer Discretionary,1.0444758167761303
Materials,1.1263629562645865


Databricks visualization. Run in Databricks to view.

## Conclusion

In this notebook we have seen how to estimate Jensen's alpha, a simple indicator that shows which stocks are doing better than the market.

By looking into almost 500 stocks we estimated the alphas and checked the best stock sector based on the alpha and beta relation.

Looking into that relation we found that indeed best sector that performed well for the past years is the energy sector stocks.