# Module 1 Homework

In this homework, we're going to download financial data from various sources and perform simple calculations and analysis.

```python
import numpy as np
import pandas as pd

import yfinance as yf
import pandas_datareader as pdr

from datetime import datetime

date_format = "%Y-%m-%d"
```

## Question 1. [Macro] Average growth of GDP in 2023

**What is the average growth (in %) of GDP in 2023?**

Download the Real Gross Domestic Product (GDPC1) time series from FRED (https://fred.stlouisfed.org/series/GDPC1).
Calculate the year-over-year (YoY) growth rate (that is, divide the current value by the value from 4 quarters earlier and subtract 1). Find the average YoY growth in 2023 (the average of the four quarterly YoY numbers).
Round to 1 digit after the decimal point: e.g. if you get 5.66% growth, you should answer 5.7
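As a quick check of the arithmetic, here is a minimal pure-Python sketch of the YoY calculation. It uses the eight quarterly GDPC1 values (2022Q1 to 2023Q4) that this notebook downloads from FRED. Each 2023 quarter is divided by the quarter four positions earlier. BEA revises GDP estimates, so a later FRED vintage may give slightly different numbers.

```python
# Quarterly Real GDP (GDPC1), 2022Q1..2023Q4, as printed in this notebook's FRED output
gdp = [21738.871, 21708.16, 21851.134, 21989.981,
       22112.329, 22225.35, 22490.692, 22679.255]

# YoY growth: value in quarter t divided by the value in quarter t-4, minus 1
yoy = [current / year_ago - 1 for year_ago, current in zip(gdp, gdp[4:])]

# Average of the four 2023 YoY rates, in percent
avg_yoy_pct = sum(yoy) / len(yoy) * 100
print([round(r * 100, 2) for r in yoy])
print(round(avg_yoy_pct, 1))  # → 2.5
```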

```python
# 2022 is included so that every 2023 quarter has a value 4 quarters earlier
start = datetime.strptime("2022-01-01", date_format)
end = datetime.strptime("2023-12-31", date_format)
df = pdr.DataReader("GDPC1", "fred", start=start, end=end)
df
```

| DATE | GDPC1 |
|---|---|
| 2022-01-01 | 21738.871 |
| 2022-04-01 | 21708.16 |
| 2022-07-01 | 21851.134 |
| 2022-10-01 | 21989.981 |
| 2023-01-01 | 22112.329 |
| 2023-04-01 | 22225.35 |
| 2023-07-01 | 22490.692 |
| 2023-10-01 | 22679.255 |


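```python
# YoY growth: each quarter divided by the same quarter one year (4 rows) earlier, minus 1
df["GDPC1_yoy"] = df["GDPC1"] / df["GDPC1"].shift(4) - 1
# The last 4 rows are the 2023 quarters
avg = df["GDPC1_yoy"].tail(4).mean()
print(f"The average YoY growth in 2023 is {round(avg * 100, 1)}%")
```

```text
The average YoY growth in 2023 is 2.5%
```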


## Question 2. [Macro] Inverted "Treasury Yield"

**Find the minimum value of (dgs10-dgs2) since 2000 (2000-01-01) and write it down as your answer, rounded to 1 digit after the decimal point.**


Download the DGS2 and DGS10 interest rate series (https://fred.stlouisfed.org/series/DGS2,
 https://fred.stlouisfed.org/series/DGS10). Join them into one dataframe on date (you might need to read about pandas.DataFrame.join()) and calculate the daily difference dgs10-dgs2.

(Additional: think about what an "inverted yield curve" means for the market and investors. Do you see the same thing in your country/market of interest? Do you think it could be a good predictive feature for models?)
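`DataReader` can fetch both series in one call. If you download them separately instead, `DataFrame.join` aligns them on their shared date index. Below is a minimal sketch that uses a handful of DGS2/DGS10 rows copied from the FRED output in this notebook; the real series has thousands of rows. `how="inner"` is one choice among several and keeps only the dates present in both frames.

```python
import pandas as pd

# A few rows copied from this notebook's FRED output (illustration only)
dates = pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05",
                        "2024-06-07", "2024-06-10", "2024-06-11"])
dgs2 = pd.DataFrame({"DGS2": [6.38, 6.30, 6.38, 4.87, 4.87, 4.81]}, index=dates)
dgs10 = pd.DataFrame({"DGS10": [6.58, 6.49, 6.62, 4.43, 4.47, 4.39]}, index=dates)

# join() aligns on the index; how="inner" keeps only dates present in both frames
rates = dgs2.join(dgs10, how="inner")
rates["DGS10-DGS2"] = rates["DGS10"] - rates["DGS2"]

# A negative spread means the curve is inverted (2Y yield above the 10Y yield)
print(rates)
print(round(rates["DGS10-DGS2"].min(), 2))
print((rates["DGS10-DGS2"] < 0).sum(), "inverted days")
```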

```python
start = datetime.strptime("2000-01-01", date_format)
# Passing a list returns both series as columns of one DataFrame indexed by DATE
df = pdr.DataReader(["DGS2", "DGS10"], "fred", start=start)
df
```

| DATE | DGS2 | DGS10 |
|---|---|---|
| 2000-01-03 | 6.38 | 6.58 |
| 2000-01-04 | 6.30 | 6.49 |
| 2000-01-05 | 6.38 | 6.62 |
| 2000-01-06 | 6.35 | 6.57 |
| 2000-01-07 | 6.31 | 6.52 |
| ... | ... | ... |
| 2024-06-07 | 4.87 | 4.43 |
| 2024-06-10 | 4.87 | 4.47 |
| 2024-06-11 | 4.81 | 4.39 |
| 2024-06-12 | 4.75 | 4.31 |


```python
# Daily spread; negative values mean the curve is inverted. min() skips NaN (days without a quote)
df["DGS10-DGS2"] = df["DGS10"] - df["DGS2"]
min_value = df["DGS10-DGS2"].min()
print(f"The min value of (dgs10-dgs2) since 2000 is {round(min_value, 1)}")
```

```text
The min value of (dgs10-dgs2) since 2000 is -1.1
```


## Question 3. [Index] Which index has performed better recently?

**Compare the S&P 500 and IPC Mexico indexes by 5-year growth and write down the larger value as your answer (%)**

Download daily index prices from Yahoo Finance for the S&P 500 (^GSPC, https://finance.yahoo.com/quote/%5EGSPC/) and IPC Mexico (^MXX, https://finance.yahoo.com/quote/%5EMXX/). Compare the 5Y growth of both (between 2019-04-09 and 2024-04-09). Select the faster-growing index and write down its growth in % (rounded to the closest integer). E.g. if the end/start ratio was 2.0925 (a growth of 109.25%), you would write down 109 as your answer.

(Additional: think of other indexes and try downloading their stats to compare growth. Can you create 10Y and 20Y growth stats? What is the average yearly growth rate (CAGR) of each index you select?)
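CAGR (compound annual growth rate) is the constant yearly rate that compounds the start value into the end value, `(end / start) ** (1 / years) - 1`. The minimal sketch below computes both total growth and CAGR. It uses the ratio from the question text and the first and last S&P 500 Adj Close values printed in this notebook (2019-04-09 and 2024-04-05). Passing `years=5` is an approximation; computing the day count divided by 365.25 from the actual dates is an alternative assumption, not something the question specifies.

```python
def total_growth(start_value: float, end_value: float) -> float:
    """Simple growth over the whole period, e.g. 1.0925 for +109.25%."""
    return end_value / start_value - 1


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: constant yearly rate turning start into end."""
    return (end_value / start_value) ** (1 / years) - 1


# Worked example from the question text: end/start = 2.0925 over 5 years
growth = total_growth(1.0, 2.0925)
print(round(growth * 100))                 # answer format used in Question 3
print(round(cagr(1.0, 2.0925, 5) * 100, 1))

# S&P 500 Adj Close on 2019-04-09 and 2024-04-05, copied from this notebook's output
sp_growth = total_growth(2878.199951, 5204.339844)
sp_cagr = cagr(2878.199951, 5204.339844, 5)
print(round(sp_growth * 100), round(sp_cagr * 100, 1))
```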

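```python
start = datetime.strptime("2019-04-09", date_format)
end = datetime.strptime("2024-04-09", date_format)  # yfinance treats `end` as exclusive
tickers = ["^GSPC", "^MXX"]
# auto_adjust=False keeps the "Adj Close" column in recent yfinance releases
df = yf.download(tickers, start=start, end=end, auto_adjust=False)
df
```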



| Date | Adj Close ^GSPC | Adj Close ^MXX | Close ^GSPC | Close ^MXX | High ^GSPC | High ^MXX | Low ^GSPC | Low ^MXX | Open ^GSPC | Open ^MXX | Volume ^GSPC | Volume ^MXX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019-04-09 | 2878.199951 | 45151.628906 | 2878.199951 | 45151.628906 | 2886.879883 | 45346.828125 | 2873.330078 | 44854.531250 | 2886.580078 | 45346.828125 | 3.032480e+09 | 191575000.0 |
| 2019-04-10 | 2888.209961 | 44909.140625 | 2888.209961 | 44909.140625 | 2889.709961 | 45219.410156 | 2879.129883 | 44850.109375 | 2881.370117 | 45204.750000 | 3.092230e+09 | 145314900.0 |
| 2019-04-11 | 2888.320068 | 44580.058594 | 2888.320068 | 44580.058594 | 2893.419922 | 44966.500000 | 2881.989990 | 44373.488281 | 2891.919922 | 44872.531250 | 2.970650e+09 | 109090000.0 |
| 2019-04-12 | 2907.409912 | 44686.058594 | 2907.409912 | 44686.058594 | 2910.540039 | 44888.699219 | 2898.370117 | 44534.378906 | 2900.860107 | 44767.671875 | 3.726050e+09 | 143662400.0 |
| 2019-04-15 | 2905.580078 | 44625.781250 | 2905.580078 | 44625.781250 | 2909.600098 | 44900.929688 | 2896.479980 | 44347.531250 | 2908.320068 | 44649.738281 | 3.114530e+09 | 108627100.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2024-04-02 | 5205.810059 | 57581.808594 | 5205.810059 | 57581.808594 | 5208.339844 | 57830.878906 | 5184.049805 | 57235.589844 | 5204.290039 | 57593.621094 | 3.886590e+09 | 180753600.0 |
| 2024-04-03 | 5211.490234 | 57503.390625 | 5211.490234 | 57503.390625 | 5228.750000 | 58086.421875 | 5194.370117 | 57300.109375 | 5194.370117 | 57547.191406 | 3.703250e+09 | 189285300.0 |
| 2024-04-04 | 5147.209961 | 57882.761719 | 5147.209961 | 57882.761719 | 5256.589844 | 58219.500000 | 5146.060059 | 57514.179688 | 5244.049805 | 57539.468750 | 4.075680e+09 | 184739700.0 |
| 2024-04-05 | 5204.339844 | 58092.441406 | 5204.339844 | 58092.441406 | 5222.180176 | 58227.839844 | 5157.209961 | 57678.609375 | 5158.950195 | 57805.191406 | 3.386780e+09 | 212252300.0 |


```python
# dropna() guards against holidays that exist in only one market (NaN in the other column)
SP500 = df["Adj Close"]["^GSPC"].dropna()
IPC = df["Adj Close"]["^MXX"].dropna()
SP500_growth = SP500.iloc[-1] / SP500.iloc[0] - 1
IPC_growth = IPC.iloc[-1] / IPC.iloc[0] - 1
higher_growth = max(SP500_growth, IPC_growth)
print(f"The higher growth is {round(higher_growth * 100)}%")
```

```text
The higher growth is 81%
```


## Question 4. [Stocks OHLCV] 52-week range ratio (2023) for the selected stocks


**Find the largest range ratio [=(max-min)/max] of Adj.Close prices in 2023**


Download the 2023 daily OHLCV data from Yahoo Finance for the top 6 companies by earnings (https://companiesmarketcap.com/most-profitable-companies/): 2222.SR, BRK-B, AAPL, MSFT, GOOG, JPM.

Here is an example of the data you should see in pandas for "2222.SR": https://finance.yahoo.com/quote/2222.SR/history

Calculate the maximum minus minimum "Adj.Close" price for each stock and divide it by the maximum "Adj.Close" value.
Round the result to two decimal places (e.g. 0.1575 becomes 0.16).

(Additional: why might this be important for your research?)
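The range ratio is computed column by column. Because 2222.SR (Tadawul) also trades on days when US exchanges are closed, such as Sundays and US holidays, the combined table has NaN in the US columns on those rows. The minimal sketch below uses a few rows copied from this notebook's 2023 output. Those rows are not the full year, so the ratios differ from the real answer. It shows that pandas `max()`/`min()` skip NaN by default.

```python
import numpy as np
import pandas as pd

# A few 2023 Adj Close rows copied from this notebook's output (not the full year)
adj_close = pd.DataFrame(
    {
        "2222.SR": [28.316216, 28.097727, 28.097727, 27.704445, 27.573349,
                    32.742996, 32.693539, 32.792461, 32.693539],
        "AAPL": [np.nan, np.nan, 124.216301, 125.497498, 124.166634,
                 np.nan, 192.803986, 192.903839, 193.333298],
    },
    index=pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04",
                          "2023-01-05", "2023-12-25", "2023-12-26", "2023-12-27",
                          "2023-12-28"]),
)

# Range ratio per ticker: (max - min) / max; NaN rows are ignored by max() and min()
range_ratio = (adj_close.max() - adj_close.min()) / adj_close.max()
print(range_ratio.round(2))
print(range_ratio.idxmax(), round(range_ratio.max(), 2))
```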

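```python
start = datetime.strptime("2023-01-01", date_format)
end = datetime.strptime("2023-12-31", date_format)
tickers = ["2222.SR", "BRK-B", "AAPL", "MSFT", "GOOG", "JPM"]
# auto_adjust=False keeps the "Adj Close" column in recent yfinance releases
df = yf.download(tickers, start=start, end=end, auto_adjust=False)
# 2222.SR trades on some days when US exchanges are closed, so US columns have NaN there
adj_close = df["Adj Close"]
adj_close
```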



| Date | 2222.SR | AAPL | BRK-B | GOOG | JPM | MSFT |
|---|---|---|---|---|---|---|
| 2023-01-01 | 28.316216 | NaN | NaN | NaN | NaN | NaN |
| 2023-01-02 | 28.097727 | NaN | NaN | NaN | NaN | NaN |
| 2023-01-03 | 28.097727 | 124.216301 | 309.910004 | 89.598038 | 129.648499 | 236.609207 |
| 2023-01-04 | 27.704445 | 125.497498 | 314.549988 | 88.609169 | 130.857498 | 226.259171 |
| 2023-01-05 | 27.573349 | 124.166634 | 312.899994 | 86.671371 | 130.828476 | 219.553360 |
| ... | ... | ... | ... | ... | ... | ... |
| 2023-12-25 | 32.742996 | NaN | NaN | NaN | NaN | NaN |
| 2023-12-26 | 32.693539 | 192.803986 | 356.829987 | 142.657669 | 166.387451 | 373.295135 |
| 2023-12-27 | 32.792461 | 192.903839 | 356.950012 | 141.279236 | 167.385437 | 372.707275 |
| 2023-12-28 | 32.693539 | 193.333298 | 357.570007 | 141.119415 | 168.274734 | 373.912842 |


```python
range_ratio = (adj_close.max() - adj_close.min()) / adj_close.max()
print(f"The largest range ratio [=(max-min)/max] of Adj.Close prices in 2023 is {round(range_ratio.max(), 2)}")
```

```text
The largest range ratio [=(max-min)/max] of Adj.Close prices in 2023 is 0.42
```


## Question 5. [Stocks] Dividend Yield
**Find the largest dividend yield for the same set of stocks**

Use the same list of companies (2222.SR, BRK-B, AAPL, MSFT, GOOG, JPM) and download all dividends paid in 2023.
You can use the `get_actions()` method or the `.dividends` property in the yfinance library (https://github.com/ranaroussi/yfinance?tab=readme-ov-file#quick-start)

Sum up all dividends paid in 2023 for each company and divide each sum by the closing price (Adj.Close) on the last trading day of the year.

Find the maximum value in % and round it to 1 digit after the decimal point. (E.g., if \$1.25 in dividends was paid and the year-end stock price is \$100, the dividend yield is 1.25%, and your answer should be 1.3.)
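Python's built-in `round()` rounds exact ties to the nearest even digit, so on a value like 1.25 it disagrees with the half-up convention in the example above. The minimal sketch below shows the dividend-yield arithmetic and a half-up helper built on the standard `decimal` module. The helper name `round_half_up` and the split of the 1.25 total into four equal payments are illustrative assumptions.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Iterable


def dividend_yield_pct(dividends: Iterable[float], year_end_price: float) -> float:
    """Sum of the year's dividends divided by the year-end price, in percent."""
    return sum(dividends) * 100 / year_end_price


def round_half_up(value: float, digits: int = 1) -> float:
    """Round ties away from zero (1.25 -> 1.3), using the value's decimal repr."""
    step = Decimal(10) ** -digits
    return float(Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP))


# Example from the question: 1.25 paid in total (here as four hypothetical
# quarterly payments of 0.3125), year-end price 100
pct = dividend_yield_pct([0.3125] * 4, 100.0)
print(pct)                  # 1.25
print(round(pct, 1))        # built-in round(): tie goes to the even digit, 1.2
print(round_half_up(pct))   # 1.3, matching the question's example
```

The difference only matters on exact ties, so most real yields round the same way with either method.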

```python
start = datetime.strptime("2023-01-01", date_format)
end = datetime.strptime("2023-12-31", date_format)  # `end` is exclusive in yfinance
tickers = ["2222.SR", "BRK-B", "AAPL", "MSFT", "GOOG", "JPM"]
dividend_yields = []
for ticker in tickers:
    # history() uses auto_adjust=True by default, so "Close" is already the adjusted close;
    # the "Dividends" column is non-zero only on dividend dates
    history = yf.Ticker(ticker).history(start=start, end=end)
    dividends = history["Dividends"]
    adj_close = history["Close"]
    dividend_yield = dividends.sum() / adj_close.iloc[-1]
    dividend_yields.append(dividend_yield)

print(f"The largest dividend yield is {round(max(dividend_yields) * 100, 1)}")
```

```text
The largest dividend yield is 2.8
```


## Question 6. [Exploratory] Investigate new metrics

**Free text answer**

Download and explore a few additional metrics or time series that might be valuable for your project and write down why (briefly).

### Dow Jones Industrial Average (DJIA)

```python
start = datetime.strptime("2021-01-01", date_format)
end = datetime.strptime("2023-12-31", date_format)
DJIA = yf.download("^DJI", start=start, end=end)
DJIA
```



| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2021-01-04 | 30627.470703 | 30674.279297 | 29881.820312 | 30223.890625 | 30223.890625 | 475080000 |
| 2021-01-05 | 30204.250000 | 30504.890625 | 30141.779297 | 30391.599609 | 30391.599609 | 350910000 |
| 2021-01-06 | 30362.779297 | 31022.650391 | 30313.070312 | 30829.400391 | 30829.400391 | 500430000 |
| 2021-01-07 | 30901.179688 | 31193.400391 | 30897.859375 | 31041.130859 | 31041.130859 | 427810000 |
| 2021-01-08 | 31069.580078 | 31140.669922 | 30793.269531 | 31097.970703 | 31097.970703 | 381150000 |
| ... | ... | ... | ... | ... | ... | ... |
| 2023-12-22 | 37349.269531 | 37534.519531 | 37268.878906 | 37385.968750 | 37385.968750 | 252970000 |
| 2023-12-26 | 37405.898438 | 37617.988281 | 37371.828125 | 37545.328125 | 37545.328125 | 212420000 |
| 2023-12-27 | 37518.621094 | 37683.699219 | 37488.601562 | 37656.519531 | 37656.519531 | 245530000 |
| 2023-12-28 | 37661.519531 | 37778.851562 | 37650.980469 | 37710.101562 | 37710.101562 | 199550000 |


### NASDAQ Index

```python
NASDAQ = yf.download("^IXIC", start=start, end=end)
NASDAQ
```



| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2021-01-04 | 12958.519531 | 12958.719727 | 12543.240234 | 12698.450195 | 12698.450195 | 6636170000 |
| 2021-01-05 | 12665.650391 | 12828.269531 | 12665.650391 | 12818.959961 | 12818.959961 | 6971860000 |
| 2021-01-06 | 12666.150391 | 12909.629883 | 12649.990234 | 12740.790039 | 12740.790039 | 7689880000 |
| 2021-01-07 | 12867.339844 | 13090.910156 | 12867.339844 | 13067.480469 | 13067.480469 | 6841480000 |
| 2021-01-08 | 13160.219727 | 13208.089844 | 13036.549805 | 13201.980469 | 13201.980469 | 7289390000 |
| ... | ... | ... | ... | ... | ... | ... |
| 2023-12-22 | 15006.179688 | 15047.190430 | 14927.120117 | 14992.969727 | 14992.969727 | 4796600000 |
| 2023-12-26 | 15028.690430 | 15101.179688 | 15024.059570 | 15074.570312 | 15074.570312 | 6120600000 |
| 2023-12-27 | 15089.660156 | 15114.080078 | 15051.669922 | 15099.179688 | 15099.179688 | 7480170000 |
| 2023-12-28 | 15142.089844 | 15150.070312 | 15087.219727 | 15095.139648 | 15095.139648 | 5090570000 |
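The DJIA and NASDAQ trade at very different levels. One common way to compare them is to rebase each series to 100 on its first date. The minimal sketch below uses only the first and last closes shown in the DJIA and NASDAQ outputs above. With the full downloads, the same expression would apply to the entire `Close` columns.

```python
import pandas as pd

# First and last Close values (2021-01-04 and 2023-12-28) copied from the outputs above
closes = pd.DataFrame(
    {"DJIA": [30223.890625, 37710.101562], "NASDAQ": [12698.450195, 15095.139648]},
    index=pd.to_datetime(["2021-01-04", "2023-12-28"]),
)

# Rebase each column to 100 on the first date so growth is comparable across scales
rebased = closes / closes.iloc[0] * 100
print(rebased.round(1))
```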


### Economic Indicators Related to Industry

```python
# INDPRO: Industrial Production Index; TCU: Capacity Utilization, Total Industry
EI = pdr.DataReader(["INDPRO", "TCU"], "fred", start=start, end=end)
EI
```

| DATE | INDPRO | TCU |
|---|---|---|
| 2021-01-01 | 98.7836 | 76.4136 |
| 2021-02-01 | 95.3744 | 73.9408 |
| 2021-03-01 | 98.1351 | 76.2498 |
| 2021-04-01 | 98.2886 | 76.5375 |
| 2021-05-01 | 99.1508 | 77.3697 |
| 2021-06-01 | 99.5096 | 77.7972 |
| 2021-07-01 | 100.1231 | 78.4064 |
| 2021-08-01 | 100.1255 | 78.5176 |
| 2021-09-01 | 99.0614 | 77.7675 |
| 2021-10-01 | 100.3045 | 78.8064 |


## Question 7. [Exploratory] Time-driven strategy description around earnings releases

**Free text answer**

Explore earnings dates for the whole month of April, e.g. using the Yahoo Finance earnings calendar (https://finance.yahoo.com/calendar/earnings?from=2024-04-21&to=2024-04-27&day=2024-04-23). Compare them with previously reported earnings (e.g., recent dates with full data: https://finance.yahoo.com/calendar/earnings?from=2024-04-07&to=2024-04-13&day=2024-04-08).

Describe an analytical strategy or idea (you're not required to implement it) for selecting a subset of companies of interest based on the future events data.
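One possible first step for such a strategy is to filter the calendar to companies that report within a short window and pass a size threshold. Their price moves around the release could then be studied against past reports. The minimal sketch below covers only the filtering step. The calendar rows, tickers, market caps, the 7-day horizon and the 50bn cutoff are all hypothetical placeholders, not real data. In practice the rows could come from the Yahoo Finance earnings calendar linked above.

```python
from datetime import date, timedelta

# Hypothetical earnings calendar: (ticker, earnings date, market cap in $bn).
# Tickers and numbers are placeholders, not real data.
calendar = [
    ("AAA", date(2024, 4, 23), 250.0),
    ("BBB", date(2024, 4, 25), 12.0),
    ("CCC", date(2024, 5, 2), 480.0),
    ("DDD", date(2024, 4, 24), 95.0),
]


def upcoming_earnings(rows, today, horizon_days=7, min_cap_bn=50.0):
    """Tickers reporting within `horizon_days` of `today` and at or above a
    market-cap cutoff, sorted by report date."""
    window_end = today + timedelta(days=horizon_days)
    selected = [
        (ticker, day) for ticker, day, cap in rows
        if today <= day <= window_end and cap >= min_cap_bn
    ]
    return [ticker for ticker, _ in sorted(selected, key=lambda item: item[1])]


print(upcoming_earnings(calendar, today=date(2024, 4, 21)))  # ['AAA', 'DDD']
```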

## Submitting the solutions

Form for submitting: https://courses.datatalks.club/sma-zoomcamp-2024/homework/hw01