## Module 1 Homework

In this homework, we're going to download finance data from various sources and make simple calculations/analysis.

---
### Question 1: [Macro] Average growth of GDP in 2023

**What is the average growth (in %) of GDP in 2023?**

Download the timeseries Real Gross Domestic Product (GDPC1) from FRED (https://fred.stlouisfed.org/series/GDPC1). 
Calculate the year-over-year (YoY) growth rate (that is, divide the current value by the value from 4 quarters earlier and subtract 1). Find the average YoY growth in 2023 (the average of the 4 quarterly YoY numbers).
Round to 1 digit after the decimal point: e.g., if you get 5.66% growth, you should answer 5.7.

---

In [1]:
import pandas as pd
import pandas_datareader.data as web
from datetime import datetime

# Set the start and end date for data retrieval
start = datetime(2020, 1, 1)
end = datetime(2023, 12, 31)

# Download the GDP data from FRED
gdp_data = web.DataReader('GDPC1', 'fred', start, end)

# Calculate YoY growth rates
gdp_data['YoY Growth'] = gdp_data['GDPC1'].pct_change(periods=4) * 100

# Filter the data for 2023 and calculate the average growth
average_growth_2023 = gdp_data.loc['2023', 'YoY Growth'].mean()

# Print the rounded result
print(f"The average YoY growth of GDP in 2023 is {average_growth_2023:.1f}%")

The average YoY growth of GDP in 2023 is 2.5%
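
As a sanity check on the YoY formula, here is a minimal sketch on made-up quarterly values (not real GDP figures; the names `levels`, `yoy`, and `yoy_manual` are illustrative). It shows that `pct_change(periods=4)` matches the manual "current value divided by the value 4 quarters earlier, minus 1" calculation, and that the first four quarters have no YoY value because there is nothing to compare them with.

```python
import pandas as pd

# Hypothetical quarterly levels (made up for illustration), 2022Q1 to 2023Q4
idx = pd.date_range("2022-01-01", periods=8, freq="QS")
levels = pd.Series(
    [100.0, 101.0, 102.0, 103.0, 102.0, 104.03, 105.06, 107.12], index=idx
)

# YoY growth in %: change relative to the value 4 quarters earlier
yoy = levels.pct_change(periods=4) * 100

# The same calculation written out by hand with shift(4)
yoy_manual = (levels / levels.shift(4) - 1) * 100

# Average of the four 2023 YoY numbers
avg_2023 = yoy.loc["2023"].mean()
print(f"Average YoY growth in 2023: {avg_2023:.1f}%")
```

This is also why the download window has to start at least one year before the year being averaged.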



### Question 2. [Macro] Inverse "Treasury Yield"

**Find the minimum value of (dgs10-dgs2) since the year 2000 (2000-01-01) and write it down as an answer, rounded to 1 digit after the decimal point.**


Download the DGS2 and DGS10 interest rate series (https://fred.stlouisfed.org/series/DGS2,
https://fred.stlouisfed.org/series/DGS10). Join them into one dataframe on date (you might need to read about pandas.DataFrame.join()) and calculate the daily difference dgs10-dgs2.

(Additional: think about what the "inverted yield curve" means for the market and investors. Do you see the same thing in your country/market of interest? Do you think it can be a good predictive feature for the models?)

---

In [2]:
import pandas as pd
import pandas_datareader.data as web
from datetime import datetime

# Set the start date used to filter the data
start = datetime(2000, 1, 1)

# Load the interest rate data (downloaded manually from FRED and saved in the data folder)
dgs2_data = pd.read_csv('./data/DGS2.csv', parse_dates=['DATE'])
dgs10_data = pd.read_csv('./data/DGS10.csv', parse_dates=['DATE'])

# Join the two dataframes on DATE so rows are matched by date, not by row position
combined_data = dgs10_data.merge(dgs2_data, on='DATE', how='inner')

# Keep only observations since 2000-01-01
combined_data = combined_data[combined_data['DATE'] >= start]

# converts DGS10 and DGS2 into numeric values, if '.' is present, convert to NaN
combined_data['DGS10'] = pd.to_numeric(combined_data['DGS10'], errors='coerce')
combined_data['DGS2'] = pd.to_numeric(combined_data['DGS2'], errors='coerce')
# drop rows where either DGS10 or DGS2 is NaN
combined_data.dropna(subset=['DGS10', 'DGS2'], inplace=True)

# Calculate the difference
combined_data['Difference'] = combined_data['DGS10'] - combined_data['DGS2']
# Find the minimum value of the difference
min_difference = combined_data['Difference'].min()

# Print the rounded result
print(f"The minimum value of (DGS10 - DGS2) since 2000 is {min_difference:.1f}")

The minimum value of (DGS10 - DGS2) since 2000 is -1.1
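
The following is a minimal sketch of the date-based join on tiny made-up frames (the values and the names `dgs10`, `dgs2`, `merged`, and `min_spread` are illustrative, not FRED data). It assumes, as in the cell above, that both files carry a `DATE` column and that missing observations appear as `'.'`. Merging on `DATE` keeps rows aligned even when the two series start on different dates.

```python
import pandas as pd

# Hypothetical miniature versions of the two CSV files (values are made up)
dgs10 = pd.DataFrame({
    "DATE": pd.to_datetime(["1999-12-31", "2000-01-03", "2000-01-04", "2000-01-05"]),
    "DGS10": ["6.45", "6.58", ".", "6.62"],
})
dgs2 = pd.DataFrame({
    "DATE": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"]),
    "DGS2": ["6.38", "6.30", "6.70"],
})

# Match rows by date; dates present in only one frame are dropped
merged = dgs10.merge(dgs2, on="DATE", how="inner")

# '.' placeholders become NaN and are then removed
for col in ["DGS10", "DGS2"]:
    merged[col] = pd.to_numeric(merged[col], errors="coerce")
merged = merged.dropna(subset=["DGS10", "DGS2"])

# Restrict to observations since 2000-01-01 and compute the spread
merged = merged[merged["DATE"] >= pd.Timestamp("2000-01-01")]
merged["spread"] = merged["DGS10"] - merged["DGS2"]
min_spread = merged["spread"].min()
print(f"Minimum spread: {min_spread:.2f}")
```

A negative minimum means the 2-year yield was above the 10-year yield on that day, which is the "inverted" case the question refers to.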


### Question 3. [Index] Which Index is better recently?

**Compare S&P 500 and IPC Mexico indexes by the 5 year growth and write down the largest value as an answer (%)**

Download from Yahoo Finance the daily index prices for the S&P 500 (^GSPC, https://finance.yahoo.com/quote/%5EGSPC/) and IPC Mexico (^MXX, https://finance.yahoo.com/quote/%5EMXX/). Compare the 5Y growth of both (between 2019-04-09 and 2024-04-09). Select the faster-growing index and write down its growth in % (closest integer %). E.g., if the ratio end/start was 2.0925 (a growth of 109.25%), you need to write down 109 as your answer.

(Additional: think of other indexes, download their stats, and compare the growth. Create 10Y and 20Y growth stats as well. What is the average yearly growth rate (CAGR) for each of the indexes you select?)

---

In [3]:
import yfinance as yf
import datetime

# Define start and end dates for the 5-year period
start_date = datetime.datetime(2019, 4, 9)
end_date = datetime.datetime(2024, 4, 9)

# Download S&P 500 (^GSPC) and IPC Mexico (^MXX) data from Yahoo Finance
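# yfinance treats `end` as exclusive, so add one day to include 2024-04-09
end_exclusive = end_date + datetime.timedelta(days=1)
sp500_data = yf.download("^GSPC", start=start_date, end=end_exclusive)
ipc_data = yf.download("^MXX", start=start_date, end=end_exclusive)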

# Calculate the 5-year growth for both indexes
sp500_start_price = sp500_data['Adj Close'].iloc[0]
sp500_end_price = sp500_data['Adj Close'].iloc[-1]
ipc_start_price = ipc_data['Adj Close'].iloc[0]
ipc_end_price = ipc_data['Adj Close'].iloc[-1]

sp500_growth = ((sp500_end_price - sp500_start_price) / sp500_start_price) * 100
ipc_growth = ((ipc_end_price - ipc_start_price) / ipc_start_price) * 100

# Determine which index had the highest growth
if sp500_growth > ipc_growth:
    better_index = "S&P 500"
    growth_rate = round(sp500_growth)
else:
    better_index = "IPC Mexico"
    growth_rate = round(ipc_growth)

print(f"The better index recently is {better_index} with a growth rate of {growth_rate}%.")


The better index recently is S&P 500 with a growth rate of 81%.
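
For the CAGR part of the additional prompt, a minimal sketch follows. The helper name `cagr` is hypothetical, and the ratio 2.0925 is the illustrative one from the question text, not an actual index result. CAGR is the constant yearly rate that compounds to the observed end/start ratio over the given number of years.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


# Illustrative end/start ratio taken from the question text
ratio = 2.0925
total_growth_pct = (ratio - 1) * 100
annual_pct = cagr(1.0, ratio, 5) * 100

print(f"Total 5Y growth: {total_growth_pct:.2f}%")
print(f"CAGR over 5 years: {annual_pct:.2f}%")
```

The same helper can be reused for the 10Y and 20Y comparisons by changing `years`.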






### Question 4. [Stocks OHLCV] 52-weeks range ratio (2023) for the selected stocks


**Find the largest range ratio [=(max-min)/max] of Adj.Close prices in 2023**


Download the 2023 daily OHLCV data from Yahoo Finance for the top 6 stocks by earnings (https://companiesmarketcap.com/most-profitable-companies/): 2222.SR, BRK-B, AAPL, MSFT, GOOG, JPM.

Here is the example data you should see in Pandas for "2222.SR": https://finance.yahoo.com/quote/2222.SR/history

Calculate the difference between the maximum and minimum "Adj.Close" price for each stock and divide it by the maximum "Adj.Close" value.
Round the result to two decimal places (e.g., 0.1575 will be 0.16).

(Additional: why might this be important for your research?)

---

In [4]:
import yfinance as yf

# Define the list of stock tickers
stock_tickers = ['2222.SR', 'BRK-B', 'AAPL', 'MSFT', 'GOOG', 'JPM']

# Download daily OHLCV data for 2023 for each stock
stock_data = {}
for ticker in stock_tickers:
    # `end` is exclusive; 2024-01-01 keeps 2023-12-31, which can be a trading day for 2222.SR
    stock_data[ticker] = yf.download(ticker, start='2023-01-01', end='2024-01-01')

# Calculate the range ratio for each stock
max_min_ratios = {}
for ticker, data in stock_data.items():
    max_price = data['Adj Close'].max()
    min_price = data['Adj Close'].min()
    range_ratio = (max_price - min_price) / max_price
    max_min_ratios[ticker] = round(range_ratio, 2)

# Find the stock with the largest range ratio
max_range_stock = max(max_min_ratios, key=max_min_ratios.get)
largest_range_ratio = round(max_min_ratios[max_range_stock]*100,2)

print(f"The stock with the largest range ratio in 2023 is {max_range_stock} with a range ratio of {largest_range_ratio}%.")


The stock with the largest range ratio in 2023 is MSFT with a range ratio of 42.0%.
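
The range ratio itself can be checked on a few made-up prices. This is a minimal sketch (the helper name `range_ratio` and the prices are illustrative): it computes (max - min) / max, the share of the yearly high that separates the high from the low.

```python
import pandas as pd


def range_ratio(prices: pd.Series) -> float:
    """(max - min) / max for a series of prices."""
    return (prices.max() - prices.min()) / prices.max()


# Hypothetical adjusted close prices (made up for illustration)
prices = pd.Series([90.0, 100.0, 84.0, 95.0])
ratio = range_ratio(prices)
print(f"Range ratio: {ratio:.2f}")
```

A value near 0 means the price stayed close to its high all year, while a value near 1 means the low was a small fraction of the high.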





### Question 5. [Stocks] Dividend Yield
**Find the largest dividend yield for the same set of stocks**

Use the same list of companies (2222.SR, BRK-B, AAPL, MSFT, GOOG, JPM) and download all dividends paid in 2023.
You can use `get_actions()` method or `.dividends` field in yfinance library (https://github.com/ranaroussi/yfinance?tab=readme-ov-file#quick-start)

Sum up all dividends paid in 2023 per company and divide each value by the closing price (Adj.Close) on the last trading day of the year.

Find the maximum value in % and round to 1 digit after the decimal point. (E.g., if you obtained $1.25 in dividends paid and the year-end stock price is $100, the dividend yield is 1.25%, and your answer should be equal to 1.3.)

---

In [5]:
import yfinance as yf
import pandas as pd

# actions=True adds the "Dividends" column; `end` is exclusive, so 2024-01-01 covers all of 2023
data = yf.download("2222.SR BRK-B AAPL MSFT GOOG JPM", start="2023-01-01", end="2024-01-01", actions=True)
tickers = ["2222.SR", "BRK-B", "AAPL", "MSFT", "GOOG", "JPM"]

dividend_yields = {}
for ticker in tickers:
    # Total dividends paid in 2023
    total_dividends = data["Dividends"][ticker].sum()
    # Adj Close on this ticker's last trading day of the year (trading calendars differ by market)
    last_price = data["Adj Close"][ticker].dropna().iloc[-1]
    dividend_yields[ticker] = total_dividends / last_price
dividend_yields = pd.Series(dividend_yields, name="dividend_yield")

# Find the largest dividend yield and its ticker, round to 1 decimal place, then print
largest_dividend_yield = round(dividend_yields.max() * 100, 1)
print(f"The largest dividend yield in 2023 is {largest_dividend_yield:.1f}% for {dividend_yields.idxmax()}")

The largest dividend yield in 2023 is 2.8% for 2222.SR
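
The dividend yield arithmetic can be sketched on made-up numbers (the payment dates, amounts, and price below are hypothetical, as are the names `dividends`, `paid_2023`, and `yield_pct`). Only payments dated in 2023 are summed, and the total is divided by the year-end price.

```python
import pandas as pd

# Hypothetical dividend payments per share (made up for illustration)
dividends = pd.Series(
    [0.23, 0.24, 0.24, 0.24, 0.25],
    index=pd.to_datetime(
        ["2022-11-10", "2023-02-10", "2023-05-12", "2023-08-11", "2023-11-10"]
    ),
)
# Hypothetical closing price on the last trading day of 2023
year_end_price = 50.0

# Sum only the payments made in 2023, then express the yield in %
paid_2023 = dividends.loc["2023"].sum()
yield_pct = paid_2023 / year_end_price * 100
print(f"Dividends paid in 2023: {paid_2023:.2f}")
print(f"Dividend yield: {yield_pct:.1f}%")
```

A company that pays no dividends simply gets a yield of 0 under this calculation.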






### Question 6. [Exploratory] Investigate new metrics

**Free text answer**

Download and explore a few additional metrics or time series that might be valuable for your project and write down why (briefly).

---


I am interested in exploring spread trading (pairs trading) between tech stocks or small- and mid-cap stocks and large-cap ETFs. This strategy inherently involves cyclicality,
which can be leveraged using statistical methods to identify profitable opportunities. The risk should be acceptable. ETFs have good liquidity and diversification, which makes them practical for retail investors.

### Question 7. [Exploratory] Time-driven strategy description around earnings releases

**Free text answer**

Explore earnings dates for the whole month of April, e.g. using the Yahoo Finance earnings calendar
(https://finance.yahoo.com/calendar/earnings?from=2024-04-21&to=2024-04-27&day=2024-04-23). 
Compare with the previous closed earnings (e.g., recent dates with full data https://finance.yahoo.com/calendar/earnings?from=2024-04-07&to=2024-04-13&day=2024-04-08). 

Describe an analytical strategy/idea (you're not required to implement it) to select a subset of companies of interest based on the future events data.

I have no good idea yet, so I am afraid I have no good answer.

## Submitting the solutions

Form for submitting: https://courses.datatalks.club/sma-zoomcamp-2024/homework/hw01