# McKinney Chapter 8 - Practice - Sec 03

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pandas_datareader as pdr
import yfinance as yf

In [2]:
%precision 4
pd.options.display.float_format = '{:.4f}'.format
%config InlineBackend.figure_format = 'retina'

## Announcements

1. ***The deadline for forming project groups is Tuesday, 2/11.*** That evening, I will create random project groups from the unassigned students.
2. ***The deadline for proposing (and voting on) students' choice topics is Tuesday, 2/25.*** That evening, I will finalize our schedule for the second half of the semester.

## Five-Minute Review

Chapter 8 of McKinney covers 3 important topics.

1. ***Hierarchical Indexing:***
Hierarchical indexes (or multi-indexes) organize data at multiple levels instead of just a flat, two-dimensional structure.
They help us work with high-dimensional data in a low-dimensional form.
For example, we can index rows by multiple levels like `Ticker` and `Date`, or columns by `Variable` and `Ticker`.
2. ***Combining Data:***
We will use three functions and methods to combine datasets on one or more keys.
`pd.merge()` and `.join()` offer `inner`, `outer`, `left`, and `right` combinations, while `pd.concat()` offers only `inner` and `outer`.
    1. The `pd.merge()` function (or the `.merge()` method) provides the most flexible way to perform database-style joins on data frames.
    2. The `.join()` method combines data frames with similar indexes.
    3. The `pd.concat()` function combines similarly-shaped series and data frames.
3. ***Reshaping Data:***
We can reshape data to change its structure, such as pivoting from wide to long format or vice versa.
We will most often use the `.stack()` and `.unstack()` methods, which pivot columns to rows and rows to columns, respectively.
Later in the course we will learn about the `.pivot()` method for aggregating data and the `.melt()` method for more advanced reshaping.
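
As a quick illustration of the combining tools in item 2, here is a minimal sketch on two toy data frames. The tickers, column names, and values are made up for illustration and are not from the chapter.

```python
import pandas as pd

# Hypothetical toy data: tickers and values are made up
left = pd.DataFrame({'Ticker': ['AAA', 'BBB', 'CCC'], 'Price': [10.0, 20.0, 30.0]})
right = pd.DataFrame({'Ticker': ['BBB', 'CCC', 'DDD'], 'EPS': [1.0, 2.0, 3.0]})

# pd.merge() joins on a key column; how= picks which keys survive
inner = pd.merge(left, right, on='Ticker', how='inner')  # keys in both frames
outer = pd.merge(left, right, on='Ticker', how='outer')  # keys in either frame

# .join() combines on the index and keeps the left keys by default
left_join = left.set_index('Ticker').join(right.set_index('Ticker'))

# pd.concat() glues frames along an axis; join='inner' keeps shared labels
side_by_side = pd.concat(
    [left.set_index('Ticker'), right.set_index('Ticker')],
    axis=1,
    join='inner'
)
```

Here `left_join` keeps `AAA` with a missing `EPS`, while `inner` and `side_by_side` drop it.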

## Practice

### Download data from Yahoo! Finance for BAC, C, GS, JPM, MS, and PNC and assign to data frame `stocks_wide`.

In [3]:
stocks_wide = yf.download(tickers='BAC, C, GS, JPM, MS, PNC', auto_adjust=False, progress=False)

In [4]:
stocks_wide.tail()

Price,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Close,Close,Close,Close,...,Open,Open,Open,Open,Volume,Volume,Volume,Volume,Volume,Volume
Ticker,BAC,C,GS,JPM,MS,PNC,BAC,C,GS,JPM,...,GS,JPM,MS,PNC,BAC,C,GS,JPM,MS,PNC
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
2025-01-29,46.75,80.0755,637.38,266.58,137.7931,202.18,46.75,80.63,637.38,266.58,...,637.0,267.21,138.47,201.72,23681500,10573000.0,1918900.0,7684900.0,5719800.0,2096700.0
2025-01-30,46.72,81.297,645.7,268.23,139.015,202.07,46.72,81.86,645.7,268.23,...,644.49,268.63,139.81,204.22,31199900,10775500.0,1812800.0,8753500.0,4545700.0,1705400.0
2025-01-31,46.3,80.87,640.4,267.3,138.43,200.95,46.3,81.43,640.4,267.3,...,650.0,269.23,139.14,202.11,30392900,12821400.0,1996200.0,7196300.0,5503400.0,2739800.0
2025-02-03,46.21,79.61,632.37,266.81,137.16,197.64,46.21,79.61,632.37,266.81,...,626.0,261.83,135.81,197.19,36481700,20984000.0,2099200.0,8376600.0,5305400.0,1474500.0
2025-02-04,46.71,78.48,634.18,267.94,136.77,198.98,46.71,78.48,634.18,267.94,...,632.0,269.84,137.73,198.06,23698116,9791535.0,1429857.0,4100124.0,4996375.0,1433751.0


### Reshape `stocks_wide` from wide to long with dates and tickers as row indexes and assign to data frame `stocks_long`.

We use the `.stack()` method to go from wider to longer, and the `.unstack()` method to go from longer to wider.
Note that we set `future_stack=True` to accept the future default behavior of `.stack()` and suppress the `FutureWarning`.
A `FutureWarning` is not an error, just a warning about an expected change that could cause an error in the future.

In [5]:
stocks_long = stocks_wide.stack(future_stack=True)

In [6]:
stocks_long.tail()

Unnamed: 0_level_0,Price,Adj Close,Close,High,Low,Open,Volume
Date,Ticker,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2025-02-04,C,78.48,78.48,80.24,78.36,80.105,9791535.0
2025-02-04,GS,634.18,634.18,637.665,629.25,632.0,1429857.0
2025-02-04,JPM,267.94,267.94,269.84,266.83,269.84,4100124.0
2025-02-04,MS,136.77,136.77,137.88,136.29,137.73,4996375.0
2025-02-04,PNC,198.98,198.98,200.2788,197.797,198.06,1433751.0


::: {.callout-note}

The `.melt()` method can also reshape data frames from wide to long.
However, our data frame has a column multi-index, which makes `.melt()` difficult to use and `.stack()` a better option.

:::
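
For comparison, `.melt()` is straightforward when the columns have a single level. The sketch below uses a made-up data frame of prices with one column per ticker. The tickers and values are hypothetical.

```python
import pandas as pd

# Hypothetical toy prices with a single level of column names
toy_prices = pd.DataFrame(
    {'AAA': [100.0, 110.0], 'BBB': [50.0, 45.0]},
    index=pd.DatetimeIndex(['2025-01-02', '2025-01-03'], name='Date')
)

# .melt() works on columns, so we first move Date out of the index
toy_melted = (
    toy_prices
    .reset_index()
    .melt(id_vars='Date', var_name='Ticker', value_name='Adj Close')
)
```

The result has one row per date-ticker pair, but `Date` and `Ticker` are ordinary columns instead of a row multi-index.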

### Add daily returns to both `stocks_wide` and `stocks_long` under the name `Returns`.

*Hint:* Use `pd.MultiIndex.from_product()` to create a column multi-index for the wide data frame `stocks_wide`.
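
Here is a minimal sketch of the idea in the hint on a toy data frame. The tickers and prices are made up. The new multi-index pairs the label `Returns` with each ticker, and pandas assigns the columns of the right-hand side to those new labels in order.

```python
import pandas as pd

# Hypothetical toy data with (Price, Ticker) columns
columns = pd.MultiIndex.from_product(
    [['Adj Close'], ['AAA', 'BBB']],
    names=['Price', 'Ticker']
)
toy = pd.DataFrame(
    [[100.0, 50.0], [110.0, 45.0], [121.0, 45.0]],
    index=pd.date_range('2025-01-01', periods=3, name='Date'),
    columns=columns
)

# Build the new column labels: ('Returns', 'AAA') and ('Returns', 'BBB')
returns_columns = pd.MultiIndex.from_product([['Returns'], toy['Adj Close'].columns])

# Assign one returns column per ticker under the new top-level label
toy[returns_columns] = toy['Adj Close'].pct_change()
```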

In [7]:
stocks_wide['Adj Close'].columns

Index(['BAC', 'C', 'GS', 'JPM', 'MS', 'PNC'], dtype='object', name='Ticker')

In [8]:
_ = pd.MultiIndex.from_product([['Returns'], stocks_wide['Adj Close'].columns])

stocks_wide[_] = (
    stocks_wide
    ['Adj Close']
    .iloc[:-1] # do not use mid-day Adj Close for returns calculation
    .pct_change()
)

In [9]:
stocks_wide.tail()

Price,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Close,Close,Close,Close,...,Volume,Volume,Volume,Volume,Returns,Returns,Returns,Returns,Returns,Returns
Ticker,BAC,C,GS,JPM,MS,PNC,BAC,C,GS,JPM,...,GS,JPM,MS,PNC,BAC,C,GS,JPM,MS,PNC
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
2025-01-29,46.75,80.0755,637.38,266.58,137.7931,202.18,46.75,80.63,637.38,266.58,...,1918900.0,7684900.0,5719800.0,2096700.0,-0.0019,0.0086,-0.0007,-0.0021,0.0012,0.0023
2025-01-30,46.72,81.297,645.7,268.23,139.015,202.07,46.72,81.86,645.7,268.23,...,1812800.0,8753500.0,4545700.0,1705400.0,-0.0006,0.0153,0.0131,0.0062,0.0089,-0.0005
2025-01-31,46.3,80.87,640.4,267.3,138.43,200.95,46.3,81.43,640.4,267.3,...,1996200.0,7196300.0,5503400.0,2739800.0,-0.009,-0.0053,-0.0082,-0.0035,-0.0042,-0.0055
2025-02-03,46.21,79.61,632.37,266.81,137.16,197.64,46.21,79.61,632.37,266.81,...,2099200.0,8376600.0,5305400.0,1474500.0,-0.0019,-0.0156,-0.0125,-0.0018,-0.0092,-0.0165
2025-02-04,46.71,78.48,634.18,267.94,136.77,198.98,46.71,78.48,634.18,267.94,...,1429857.0,4100124.0,4996375.0,1433751.0,,,,,,


To add returns to `stocks_long` we have two options.
I prefer the first option, but I will present the second option to show an application of the `.join()` method.
I will assign the results of these two options to `stocks_long_1` and `stocks_long_2` so we can keep the original `stocks_long` as-is.

***Option 1:***
Make `stocks_wide` long!

In [10]:
stocks_long_1 = stocks_wide.stack(future_stack=True)

Recall that we omitted returns for the most recent trading day, which could include a partial-day return.

In [11]:
stocks_long_1.tail(12)

Unnamed: 0_level_0,Price,Adj Close,Close,High,Low,Open,Volume,Returns
Date,Ticker,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
2025-02-03,BAC,46.21,46.21,46.48,45.11,45.42,36481700.0,-0.0019
2025-02-03,C,79.61,79.61,79.91,76.89,78.5,20984000.0,-0.0156
2025-02-03,GS,632.37,632.37,638.48,622.48,626.0,2099200.0,-0.0125
2025-02-03,JPM,266.81,266.81,268.17,261.7,261.83,8376600.0,-0.0018
2025-02-03,MS,137.16,137.16,138.0,133.91,135.81,5305400.0,-0.0092
2025-02-03,PNC,197.64,197.64,198.76,194.3,197.19,1474500.0,-0.0165
2025-02-04,BAC,46.71,46.71,47.13,46.27,46.35,23698116.0,
2025-02-04,C,78.48,78.48,80.24,78.36,80.105,9791535.0,
2025-02-04,GS,634.18,634.18,637.665,629.25,632.0,1429857.0,
2025-02-04,JPM,267.94,267.94,269.84,266.83,269.84,4100124.0,


***Option 2:***
Calculate returns from `stocks_wide`, make them long, then `.join()` them to `stocks_long`!

In [12]:
_ = stocks_wide['Adj Close'].iloc[:-1].pct_change().stack().to_frame('Returns')

stocks_long_2 = stocks_long.join(_)

Recall that we omitted returns for the most recent trading day, which could include a partial-day return.

In [13]:
stocks_long_2.tail(12)

Unnamed: 0_level_0,Unnamed: 1_level_0,Adj Close,Close,High,Low,Open,Volume,Returns
Date,Ticker,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
2025-02-03,BAC,46.21,46.21,46.48,45.11,45.42,36481700.0,-0.0019
2025-02-03,C,79.61,79.61,79.91,76.89,78.5,20984000.0,-0.0156
2025-02-03,GS,632.37,632.37,638.48,622.48,626.0,2099200.0,-0.0125
2025-02-03,JPM,266.81,266.81,268.17,261.7,261.83,8376600.0,-0.0018
2025-02-03,MS,137.16,137.16,138.0,133.91,135.81,5305400.0,-0.0092
2025-02-03,PNC,197.64,197.64,198.76,194.3,197.19,1474500.0,-0.0165
2025-02-04,BAC,46.71,46.71,47.13,46.27,46.35,23698116.0,
2025-02-04,C,78.48,78.48,80.24,78.36,80.105,9791535.0,
2025-02-04,GS,634.18,634.18,637.665,629.25,632.0,1429857.0,
2025-02-04,JPM,267.94,267.94,269.84,266.83,269.84,4100124.0,


We can test the equality of `stocks_long_1` and `stocks_long_2` most easily with the `.equals()` method.
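
One reason `.equals()` is convenient here is that our data frames contain missing returns. An element-wise `==` comparison treats `NaN` as unequal to itself, while `.equals()` treats missing values in the same location as equal. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing return
a = pd.DataFrame({'Returns': [0.01, np.nan]})
b = a.copy()

# Element-wise comparison: NaN == NaN is False, so not all cells match
all_cells_equal = bool((a == b).all().all())

# .equals() treats NaNs in the same location as equal
frames_equal = a.equals(b)
```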

In [14]:
stocks_long_1.equals(stocks_long_2)

True

### Download the daily benchmark return factors from Ken French's data library.

*Hint:*
Use the `DataReader()` function in the pandas-datareader package.
We imported this package above as `pdr`.

I often cannot remember the exact name for the daily factors.
We can use the `pdr.famafrench.get_available_datasets()` function to list all the datasets in Kenneth French's data library.

In [15]:
pdr.famafrench.get_available_datasets()[:5]

['F-F_Research_Data_Factors',
 'F-F_Research_Data_Factors_weekly',
 'F-F_Research_Data_Factors_daily',
 'F-F_Research_Data_5_Factors_2x3',
 'F-F_Research_Data_5_Factors_2x3_daily']

In [16]:
ff = pdr.DataReader(
    name='F-F_Research_Data_Factors_daily',
    data_source='famafrench',
    start='1900'
)


In [17]:
type(ff)

dict

The daily factors only have one data frame (in the `0` key) and the data set description (in the `DESCR` key).

In [18]:
ff.keys()

dict_keys([0, 'DESCR'])

::: {.callout-note}

Data from the Kenneth French data library are *percent* returns instead of *decimal* returns!

:::

In [19]:
ff[0]

Unnamed: 0_level_0,Mkt-RF,SMB,HML,RF
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1926-07-01,0.1000,-0.2500,-0.2700,0.0090
1926-07-02,0.4500,-0.3300,-0.0600,0.0090
1926-07-06,0.1700,0.3000,-0.3900,0.0090
1926-07-07,0.0900,-0.5800,0.0200,0.0090
1926-07-08,0.2100,-0.3800,0.1900,0.0090
...,...,...,...,...
2024-12-24,1.1100,-0.0900,-0.0500,0.0170
2024-12-26,0.0200,1.0400,-0.1900,0.0170
2024-12-27,-1.1700,-0.6600,0.5600,0.0170
2024-12-30,-1.0900,0.1200,0.7400,0.0170


In [20]:
print(ff['DESCR'])

F-F Research Data Factors daily
-------------------------------

This file was created by CMPT_ME_BEME_RETS_DAILY using the 202412 CRSP database. The Tbill return is the simple daily rate that, over the number of trading days compounds to 1-month TBill rate. The 1-month TBill rate data until 202405 are from Ibbotson Associates. Starting from 202406, the 1-month TBill rate is from ICE BofA US 1-Month Treasury Bill Index. Copyright 2024 Eugene F. Fama and Kenneth R. French

  0 : (25901 rows x 4 cols)


### Add the daily benchmark return factors to `stocks_wide` and `stocks_long`.

Since both `ff[0]` and `stocks_long_2` have date indexes, we can easily combine them with the `.join()` method.

In [21]:
ff[0].tail()

Unnamed: 0_level_0,Mkt-RF,SMB,HML,RF
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2024-12-24,1.11,-0.09,-0.05,0.017
2024-12-26,0.02,1.04,-0.19,0.017
2024-12-27,-1.17,-0.66,0.56,0.017
2024-12-30,-1.09,0.12,0.74,0.017
2024-12-31,-0.46,0.0,0.71,0.017


In [22]:
stocks_long_2.tail(12)

Unnamed: 0_level_0,Unnamed: 1_level_0,Adj Close,Close,High,Low,Open,Volume,Returns
Date,Ticker,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
2025-02-03,BAC,46.21,46.21,46.48,45.11,45.42,36481700.0,-0.0019
2025-02-03,C,79.61,79.61,79.91,76.89,78.5,20984000.0,-0.0156
2025-02-03,GS,632.37,632.37,638.48,622.48,626.0,2099200.0,-0.0125
2025-02-03,JPM,266.81,266.81,268.17,261.7,261.83,8376600.0,-0.0018
2025-02-03,MS,137.16,137.16,138.0,133.91,135.81,5305400.0,-0.0092
2025-02-03,PNC,197.64,197.64,198.76,194.3,197.19,1474500.0,-0.0165
2025-02-04,BAC,46.71,46.71,47.13,46.27,46.35,23698116.0,
2025-02-04,C,78.48,78.48,80.24,78.36,80.105,9791535.0,
2025-02-04,GS,634.18,634.18,637.665,629.25,632.0,1429857.0,
2025-02-04,JPM,267.94,267.94,269.84,266.83,269.84,4100124.0,


We can quickly combine `stocks_long_2` and `ff[0]` because both have indexes with daily dates named `Date`. 
Two notes:

1. The `.join()` method left joins by default, so the combined output has only the dates in `stocks_long_2`.
2. Kenneth French provides *percent* returns, so we divide them by 100 to convert them to *decimal* returns to match our Yahoo! Finance data.
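
Both notes can be seen in a minimal sketch on toy data. The tickers, dates, and values are made up. The factor row for each date is repeated for every ticker on that date, dates that appear only in the factors are dropped, and dates without factors get missing values.

```python
import pandas as pd

# Hypothetical long data frame with a (Date, Ticker) row multi-index
dates = pd.to_datetime(['2025-01-02', '2025-01-03'])
index = pd.MultiIndex.from_product([dates, ['AAA', 'BBB']], names=['Date', 'Ticker'])
toy_long = pd.DataFrame({'Returns': [0.01, -0.02, 0.03, 0.00]}, index=index)

# Hypothetical factors in percent, indexed by Date only
toy_factors = pd.DataFrame(
    {'Mkt-RF': [2.0, 1.0]},
    index=pd.DatetimeIndex(pd.to_datetime(['2024-12-31', '2025-01-02']), name='Date')
)

# Left join on the shared Date level after converting percent to decimal
toy_joined = toy_long.join(toy_factors.div(100))
```

In `toy_joined`, both tickers on 2025-01-02 get the same `Mkt-RF` value, both tickers on 2025-01-03 get `NaN`, and 2024-12-31 does not appear.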

In [23]:
stocks_long_2.join(ff[0].div(100))

Unnamed: 0_level_0,Unnamed: 1_level_0,Adj Close,Close,High,Low,Open,Volume,Returns,Mkt-RF,SMB,HML,RF
Date,Ticker,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
1973-02-21,BAC,1.5426,4.6250,4.6250,4.6250,4.6250,99200.0000,,-0.0074,-0.0039,0.0054,0.0002
1973-02-21,C,,,,,,,,-0.0074,-0.0039,0.0054,0.0002
1973-02-21,GS,,,,,,,,-0.0074,-0.0039,0.0054,0.0002
1973-02-21,JPM,,,,,,,,-0.0074,-0.0039,0.0054,0.0002
1973-02-21,MS,,,,,,,,-0.0074,-0.0039,0.0054,0.0002
...,...,...,...,...,...,...,...,...,...,...,...,...
2025-02-04,C,78.4800,78.4800,80.2400,78.3600,80.1050,9791535.0000,,,,,
2025-02-04,GS,634.1800,634.1800,637.6650,629.2500,632.0000,1429857.0000,,,,,
2025-02-04,JPM,267.9400,267.9400,269.8400,266.8300,269.8400,4100124.0000,,,,,
2025-02-04,MS,136.7700,136.7700,137.8800,136.2900,137.7300,4996375.0000,,,,,


---

We could instead convert the Yahoo! Finance *decimal* returns to *percent* returns.
I do not have a strong preference between all decimal returns and all percent returns, but all returns should have the same form.

In [24]:
(
    stocks_long_2
    .assign(Returns=lambda x: 100 * x['Returns'])
    .join(ff[0])
)

Unnamed: 0_level_0,Unnamed: 1_level_0,Adj Close,Close,High,Low,Open,Volume,Returns,Mkt-RF,SMB,HML,RF
Date,Ticker,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
1973-02-21,BAC,1.5426,4.6250,4.6250,4.6250,4.6250,99200.0000,,-0.7400,-0.3900,0.5400,0.0220
1973-02-21,C,,,,,,,,-0.7400,-0.3900,0.5400,0.0220
1973-02-21,GS,,,,,,,,-0.7400,-0.3900,0.5400,0.0220
1973-02-21,JPM,,,,,,,,-0.7400,-0.3900,0.5400,0.0220
1973-02-21,MS,,,,,,,,-0.7400,-0.3900,0.5400,0.0220
...,...,...,...,...,...,...,...,...,...,...,...,...
2025-02-04,C,78.4800,78.4800,80.2400,78.3600,80.1050,9791535.0000,,,,,
2025-02-04,GS,634.1800,634.1800,637.6650,629.2500,632.0000,1429857.0000,,,,,
2025-02-04,JPM,267.9400,267.9400,269.8400,266.8300,269.8400,4100124.0000,,,,,
2025-02-04,MS,136.7700,136.7700,137.8800,136.2900,137.7300,4996375.0000,,,,,


---

With `stocks_wide`, we have to do a little more work because of its column multi-index!
We will use the `pd.MultiIndex.from_product()` trick from above.

In [25]:
_ = pd.MultiIndex.from_product([['Factors'], ff[0].columns])
stocks_wide[_] = ff[0].div(100)

In [26]:
stocks_wide.loc[:'2024'].tail()

Price,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Close,Close,Close,Close,...,Returns,Returns,Returns,Returns,Returns,Returns,Factors,Factors,Factors,Factors
Ticker,BAC,C,GS,JPM,MS,PNC,BAC,C,GS,JPM,...,BAC,C,GS,JPM,MS,PNC,Mkt-RF,SMB,HML,RF
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
2024-12-24,44.38,70.5117,582.79,241.065,126.2201,192.4933,44.38,71.0,582.79,242.31,...,0.0112,0.0176,0.021,0.0164,0.021,0.005,0.0111,-0.0009,-0.0005,0.0002
2024-12-26,44.55,70.8593,581.23,241.8907,127.1837,193.1777,44.55,71.35,581.23,243.14,...,0.0038,0.0049,-0.0027,0.0034,0.0076,0.0036,0.0002,0.0104,-0.0019,0.0002
2024-12-27,44.34,70.5117,576.18,239.9308,125.9221,191.6999,44.34,71.0,576.18,241.17,...,-0.0047,-0.0049,-0.0087,-0.0081,-0.0099,-0.0077,-0.0117,-0.0066,0.0056,0.0002
2024-12-30,43.91,69.9059,573.55,238.0904,124.9188,190.956,43.91,70.39,573.55,239.32,...,-0.0097,-0.0086,-0.0046,-0.0077,-0.008,-0.0039,-0.0109,0.0012,0.0074,0.0002
2024-12-31,43.95,69.9059,572.62,238.4783,124.889,191.2734,43.95,70.39,572.62,239.71,...,0.0009,0.0,-0.0016,0.0016,-0.0002,0.0017,-0.0046,0.0,0.0071,0.0002


### Write a function `download()` that accepts tickers and returns a wide data frame of returns with the daily benchmark return factors.

We can even add a `shape` argument to return a wide or long data frame!

In [27]:
import warnings

def download(tickers, shape='wide'):
    """
    Download stock price data and Fama-French factors, returning in either 'wide' or 'long' format.

    Parameters:
    - tickers (str or list of str): Stock ticker(s) to download.
    - shape (str): Output format, either 'wide' (default) or 'long'.

    Returns:
    - pd.DataFrame: A DataFrame containing stock prices, returns, and Fama-French factors.
    """
    
    # shape must be wide or long
    if shape not in ['wide', 'long']:
        raise ValueError('Invalid shape: must be "wide" or "long".')

    # Download stock data
    stocks = yf.download(tickers=tickers, auto_adjust=False, progress=False)

    # Download Fama-French factors
    # (suppressing FutureWarning for 'date_parser')
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', category=FutureWarning)
        factors = pdr.DataReader(
            name='F-F_Research_Data_Factors_daily',
            data_source='famafrench',
            start='1900'
        )[0].div(100) # Convert percentages to decimals

    # Multi-index case
    if isinstance(stocks.columns, pd.MultiIndex):
        # Compute daily returns
        _ = pd.MultiIndex.from_product([['Returns'], stocks['Adj Close'].columns])
        stocks[_] = stocks['Adj Close'].pct_change()

        if shape == 'wide':
            # Add factors with multi-index
            _ = pd.MultiIndex.from_product([['Factors'], factors.columns])
            stocks[_] = factors
            return stocks

        # Convert to long format then add factors
        else:
            return stocks.stack(future_stack=True).join(factors)

    # Single index case
    # (redundant with recent versions of yfinance that always return a multi-index)
    stocks['Returns'] = stocks['Adj Close'].pct_change()
    return stocks.join(factors)


In [28]:
download(tickers='AAPL TSLA')

Price,Adj Close,Adj Close,Close,Close,High,High,Low,Low,Open,Open,Volume,Volume,Returns,Returns,Factors,Factors,Factors,Factors
Ticker,AAPL,TSLA,AAPL,TSLA,AAPL,TSLA,AAPL,TSLA,AAPL,TSLA,AAPL,TSLA,AAPL,TSLA,Mkt-RF,SMB,HML,RF
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2
1980-12-12,0.0988,,0.1283,,0.1289,,0.1283,,0.1283,,469033600,,,,0.0138,-0.0001,-0.0105,0.0006
1980-12-15,0.0937,,0.1217,,0.1222,,0.1217,,0.1222,,175884800,,-0.0522,,0.0011,0.0025,-0.0046,0.0006
1980-12-16,0.0868,,0.1127,,0.1133,,0.1127,,0.1133,,105728000,,-0.0734,,0.0071,-0.0075,-0.0047,0.0006
1980-12-17,0.0890,,0.1155,,0.1161,,0.1155,,0.1155,,86441600,,0.0248,,0.0152,-0.0086,-0.0034,0.0006
1980-12-18,0.0915,,0.1189,,0.1194,,0.1189,,0.1189,,73449600,,0.0290,,0.0041,0.0022,0.0126,0.0006
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2025-01-29,239.3600,389.1000,239.3600,389.1000,239.8600,398.5900,234.0100,384.4800,234.1200,395.2100,45486100,68033600.0000,0.0046,-0.0226,,,,
2025-01-30,237.5900,400.2800,237.5900,400.2800,240.7900,412.5000,237.2100,384.4100,238.6700,410.7800,55658300,98092900.0000,-0.0074,0.0287,,,,
2025-01-31,236.0000,404.6000,236.0000,404.6000,247.1900,419.9900,233.4400,401.3400,247.1900,401.5300,101075100,83568200.0000,-0.0067,0.0108,,,,
2025-02-03,228.0100,383.6800,228.0100,383.6800,231.8300,389.1700,225.7000,374.3600,229.9900,386.6800,72896300,93385500.0000,-0.0339,-0.0517,,,,


::: {.callout-note}

The `yfinance` package is a powerful tool for downloading market data, financial statements, and analyst estimates from Yahoo! Finance.  
However, because `yfinance` relies on Yahoo! Finance’s API, changes to the API can disrupt its functionality.  

Recently, Yahoo! Finance changed API access to earnings forecasts and announcement dates, so we **cannot complete the earnings announcement exercise I had planned**.  

Instead, I will prepare an alternative set of exercises for us to work on in class on Friday.  
Thank you for your flexibility!  

:::

### Download earnings per share for the stocks in `stocks_long` and combine to a long data frame `earnings`.

Use the `.earnings_dates` attribute described [here](https://pypi.org/project/yfinance/).
Use `pd.concat()` to combine each ticker's `.earnings_dates` data frame and assign the result to a new data frame `earnings`.
Name the row indexes `Ticker` and `Date`, then swap them to match the order of the row index in `stocks_long`.
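
Since the Yahoo! Finance earnings data are unavailable, here is a minimal sketch of the combining steps on made-up data frames that stand in for each ticker's earnings data. The tickers, dates, column names, and values are all hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for each ticker's earnings data frame
toy_earnings = {
    'AAA': pd.DataFrame(
        {'EPS Estimate': [1.0, 1.1], 'Reported EPS': [1.2, 1.0]},
        index=pd.to_datetime(['2024-10-15', '2025-01-15'])
    ),
    'BBB': pd.DataFrame(
        {'EPS Estimate': [2.0, 2.1], 'Reported EPS': [1.9, 2.3]},
        index=pd.to_datetime(['2024-10-16', '2025-01-16'])
    ),
}

earnings = (
    pd.concat(toy_earnings, names=['Ticker', 'Date'])  # dictionary keys become the outer row level
    .swaplevel()  # reorder the row index to (Date, Ticker)
    .sort_index()
)
```

Passing a dictionary to `pd.concat()` uses its keys as the outer index level, and the `names` argument labels both levels in one step.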

### Combine `earnings` with the returns from `stocks_long`.

### Plot the relation between daily returns and earnings surprises

### Repeat the earnings exercise with the S&P 100 stocks

With more data, we can more clearly see the positive relation between earnings surprises and returns!