<a href="https://colab.research.google.com/github/kahramanmurat/stock-markets-analytics-zoomcamp-2024/blob/main/01-intro-and-data-sources/homework/homework.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Module 1 Homework

In this homework, we're going to download finance data from various sources and make simple calculations/analysis.

In [1]:
# install main library YFinance
!pip install yfinance



In [2]:
#IMPORTS
import numpy as np
import pandas as pd

#IMPORT FIN DATA SOURCES
import pandas_datareader as pdr
import yfinance as yf

#IMPORT DATA VIZ
import plotly.graph_objs as go
import plotly.express as px

import time
from datetime import date, datetime


---
### Question 1: [Macro] Average growth of GDP in 2023

**What is the average growth (in %) of GDP in 2023?**

Download the timeseries Real Gross Domestic Product (GDPC1) from FRED (https://fred.stlouisfed.org/series/GDPC1).
Calculate the year-over-year (YoY) growth rate (that is, divide the current value by the value from 4 quarters earlier and subtract 1). Find the average YoY growth in 2023 (the average of the 4 YoY numbers).
Round to 1 digit after the decimal point: e.g. if you get 5.66% growth => you should answer  5.7

---

In [3]:
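# YoY growth for 2023 needs the four quarters of 2022 as the base period,
# so use a fixed window instead of one relative to today's date
start = date(2022, 1, 1)
end = date(2023, 12, 31)

In [4]:
# Real Gross Domestic Product (GDPC1) from FRED, Billions of Chained 2017 Dollars, QUARTERLY
# https://fred.stlouisfed.org/series/GDPC1
gdpc1 = pdr.DataReader("GDPC1", "fred", start=start, end=end)

In [5]:
# YoY growth in %: current quarter divided by the same quarter 4 quarters earlier
gdpc1['gdpc1_us_yoy'] = round((gdpc1.GDPC1/gdpc1.GDPC1.shift(4)-1)*100,1)

In [6]:
gdpc1.tail(4)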

                GDPC1  gdpc1_us_yoy
DATE                               
2023-01-01  22112.329           1.7
2023-04-01  22225.350           2.4
2023-07-01  22490.692           2.9
2023-10-01  22679.255           3.1
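
The question asks for the average of the four 2023 YoY values, which the cells above display but do not average. A minimal sketch of that last step, assuming the four rounded values printed above (rebuilt by hand here so the example runs offline; in the notebook the same filter would be applied to `gdpc1['gdpc1_us_yoy']`):

```python
import pandas as pd

# YoY growth values copied from the output above (hand-built stand-in for gdpc1)
yoy = pd.Series(
    [1.7, 2.4, 2.9, 3.1],
    index=pd.to_datetime(["2023-01-01", "2023-04-01", "2023-07-01", "2023-10-01"]),
    name="gdpc1_us_yoy",
)

# Keep only the 2023 observations, then average the four quarterly YoY numbers
yoy_2023 = yoy[yoy.index.year == 2023]
avg_yoy_2023 = round(yoy_2023.mean(), 1)
print(avg_yoy_2023)
```

Note that this averages values that were already rounded to one decimal; averaging the unrounded growth rates and rounding once at the end could differ slightly.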


---
### Question 2. [Macro] Inverse "Treasury Yield"

**Find the min value of (dgs10-dgs2) since the year 2000 (2000-01-01) and write it down as an answer, rounded to 1 digit after the decimal point.**


Download the DGS2 and DGS10 interest rate series (https://fred.stlouisfed.org/series/DGS2, https://fred.stlouisfed.org/series/DGS10). Join them into one dataframe on date (you might need to read about pandas.DataFrame.join()) and calculate the daily difference dgs10-dgs2.

(Additional: think about what the "inverted yield curve" means for the market and investors. Do you see the same thing in your country/market of interest? Do you think it can be a good predictive feature for the models?)

---

In [7]:
start = datetime(2000, 1, 1)

In [8]:
# DGS2 and DGS10 interest rates series (https://fred.stlouisfed.org/series/DGS2, https://fred.stlouisfed.org/series/DGS10)
# Fetch DGS2 and DGS10 interest rate data from FRED
dgs2 = pdr.DataReader("DGS2", "fred", start=start)
dgs10 = pdr.DataReader("DGS10", "fred", start=start)

# Join the two DataFrames on the date index
yield_data = dgs10.join(dgs2, how='inner')

# Calculate the difference (DGS10 - DGS2)
yield_data['yield_diff'] = yield_data['DGS10'] - yield_data['DGS2']

# Find the row with the minimum yield difference
min_yield_row = yield_data[yield_data['yield_diff'] == yield_data['yield_diff'].min()]

# Display the result
print(round(min_yield_row, 1))

            DGS10  DGS2  yield_diff
DATE                               
2023-07-03    3.9   4.9        -1.1


# Additional:

The concept of an inverted yield curve refers to a situation in financial markets where long-term debt instruments have a lower yield than short-term debt instruments of the same credit quality. This is an unusual situation because typically, longer-term securities have higher yields to compensate investors for the greater risk of holding them over a longer period.

### What an Inverted Yield Curve Means for the Market and Investors

*   **Recession Indicator:** Historically, an inverted yield curve has been one of the most reliable predictors of a forthcoming recession. The inversion of the yield curve has preceded many of the U.S. recessions in recent decades.

*   **Investor Sentiment:** An inverted yield curve reflects negative investor sentiment about the future of the economy. Investors may expect that the central bank will have to lower interest rates in the future to stimulate a slowing economy.

*   **Lending Impact:** It can disincentivize banks and other financial institutions from lending, as the profit margin from borrowing short-term and lending long-term diminishes, potentially slowing economic growth further.


*   In the United States, the inverted yield curve is closely watched as a recession predictor.
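
To use the inversion as a model feature, the spread can be turned into a daily flag. The sketch below assumes a dataframe shaped like `yield_data` above; the yields and dates are hypothetical illustration values, not FRED data, and the column names `spread` and `inverted` are made up for this example.

```python
import pandas as pd

# Hypothetical yields in percent (illustrative values only, not FRED data)
yields = pd.DataFrame(
    {"DGS10": [4.0, 3.9, 3.8, 4.2], "DGS2": [3.5, 4.9, 4.1, 4.0]},
    index=pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06"]),
)

# The curve is inverted on days when the 10-year yield is below the 2-year yield
yields["spread"] = yields["DGS10"] - yields["DGS2"]
yields["inverted"] = yields["spread"] < 0

spread = yields["spread"]
inverted = yields["inverted"]
print(yields)
print("Share of inverted days:", inverted.mean())
print("Date of the deepest inversion:", spread.idxmin().date())
```

The boolean `inverted` column (or the raw spread itself) could then be joined to price data on the date index as a candidate predictor.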



---
### Question 3. [Index] Which Index is better recently?

**Compare S&P 500 and IPC Mexico indexes by the 5 year growth and write down the largest value as an answer (%)**

Download on Yahoo Finance two daily index prices for S&P 500 (^GSPC, https://finance.yahoo.com/quote/%5EGSPC/) and IPC Mexico (^MXX, https://finance.yahoo.com/quote/%5EMXX/). Compare 5Y growth for both (between 2019-04-09 and 2024-04-09). Select the higher growing index and write down the growth in % (closest integer %). E.g. if ratio end/start was 2.0925 (or growth of 109.25%), you need to write down 109 as your answer.

(Additional: think of other indexes, download their stats, and compare the growth. Create 10Y and 20Y growth stats as well. What is the average yearly growth rate (CAGR) for each of the indexes you select?)

---

In [9]:
# Define the tickers and the date range
tickers = ["^GSPC", "^MXX"]  # S&P 500 and IPC Mexico
start_date = "2019-04-09"
end_date = "2024-04-09"

In [10]:
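# Fetch the historical data
# Note: yfinance treats `end` as exclusive, so the last row is the last trading day before end_date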
data = yf.download(tickers, start=start_date, end=end_date)

[*********************100%%**********************]  2 of 2 completed


In [11]:
# Calculate the 5-year growth for both indices:

growth_percentages = {}

for ticker in tickers:
    start_price = data["Adj Close"][ticker].iloc[0]
    end_price = data["Adj Close"][ticker].iloc[-1]
    growth_percentage = ((end_price / start_price) - 1) * 100
    growth_percentages[ticker] = int(round(growth_percentage))

In [12]:
# Find and print the index with the highest growth:

best_index_growth = max(growth_percentages, key=growth_percentages.get)
best_growth_value = growth_percentages[best_index_growth]
print(f"The index with the highest 5-year growth is {best_index_growth} with a growth of {best_growth_value}%")

The index with the highest 5-year growth is ^GSPC with a growth of 81%


# Additional:

Let's compare VTI and ITOT (10Y)

In [13]:
# Define the tickers and the date range
tickers = ["VTI", "ITOT"]  # Vanguard Total Stock Market ETF & iShares Core S&P Total U.S. Stock Market ETF
start_date = "2014-04-09"
end_date = "2024-04-09"

In [14]:
# Fetch the historical data
data = yf.download(tickers, start=start_date, end=end_date)

[*********************100%%**********************]  2 of 2 completed


In [15]:
# Calculate the 10-year growth for both indices:

growth_percentages = {}

for ticker in tickers:
    start_price = data["Adj Close"][ticker].iloc[0]
    end_price = data["Adj Close"][ticker].iloc[-1]
    growth_percentage = ((end_price / start_price) - 1) * 100
    growth_percentages[ticker] = int(round(growth_percentage))

In [16]:
# Find and print the index with the highest growth:

best_index_growth = max(growth_percentages, key=growth_percentages.get)
best_growth_value = growth_percentages[best_index_growth]
print(f"The index with the highest 10-year growth is {best_index_growth} with a growth of {best_growth_value}%")

The index with the highest 10-year growth is ITOT with a growth of 219%


Let's compare VTI and ^GSPC (20Y)

In [17]:
# Define the tickers and the date range
tickers = ["VTI", "^GSPC"]
start_date = "2004-04-09"
end_date = "2024-04-09"

In [18]:
# Fetch the historical data
data = yf.download(tickers, start=start_date, end=end_date)

[*********************100%%**********************]  2 of 2 completed


In [19]:
# Calculate the 20-year growth for both indices:

growth_percentages = {}

for ticker in tickers:
    start_price = data["Adj Close"][ticker].iloc[0]
    end_price = data["Adj Close"][ticker].iloc[-1]
    growth_percentage = ((end_price / start_price) - 1) * 100
    growth_percentages[ticker] = int(round(growth_percentage))

In [20]:
# Find and print the index with the highest growth:

best_index_growth = max(growth_percentages, key=growth_percentages.get)
best_growth_value = growth_percentages[best_index_growth]
print(f"The index with the highest 20-year growth is {best_index_growth} with a growth of {best_growth_value}%")

The index with the highest 20-year growth is VTI with a growth of 572%
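
The question also asks for the average yearly growth rate (CAGR), which the cells above do not compute. A minimal sketch, assuming the rounded total growth figures printed above as inputs; the helper name `cagr_pct` is made up for this example.

```python
def cagr_pct(total_growth_pct: float, years: float) -> float:
    """Compound annual growth rate (in %) implied by a total growth (in %) over `years`."""
    growth_ratio = 1 + total_growth_pct / 100  # end price / start price
    return (growth_ratio ** (1 / years) - 1) * 100

# Total growth figures printed by the cells above
results = [("^GSPC, 5Y", 81, 5), ("ITOT, 10Y", 219, 10), ("VTI, 20Y", 572, 20)]
for label, growth, years in results:
    print(f"{label}: {cagr_pct(growth, years):.1f}% per year")
```

Because the inputs are rounded to whole percents, the resulting CAGR values are approximate. The VTI figure is based on adjusted prices and so includes dividends, whereas ^GSPC is a price index, so the two are not directly comparable.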


---
### Question 4. [Stocks OHLCV] 52-weeks range ratio (2023) for the selected stocks


**Find the largest range ratio [=(max-min)/max] of Adj.Close prices in 2023**


Download the 2023 daily OHLCV data on Yahoo Finance for the top 5 stocks by earnings (https://companiesmarketcap.com/most-profitable-companies/): 2222.SR, BRK-B, AAPL, MSFT, GOOG, JPM.

Here is the example data you should see in Pandas for "2222.SR": https://finance.yahoo.com/quote/2222.SR/history

Calculate the maximum minus the minimum "Adj.Close" price for each stock and divide it by the maximum "Adj.Close" value.
Round the result to two decimal places (e.g. 0.1575 will be 0.16)

(Additional: why might this be important for your research?)

---

In [21]:
# Define the tickers for the stocks
tickers = ["2222.SR", "BRK-B", "AAPL", "MSFT", "GOOG", "JPM"]

In [22]:
# Set the time period for 2023
start_date = "2023-01-01"
end_date = "2023-12-31"

In [23]:
# Initialize a dictionary to store range ratios
range_ratios = {}

In [24]:
# Fetch the data and calculate the range ratio
for ticker in tickers:
    # Download the stock data
    stock_data = yf.download(ticker, start=start_date, end=end_date)

    # Calculate the maximum and minimum of the Adjusted Close prices
    max_price = stock_data['Adj Close'].max()
    min_price = stock_data['Adj Close'].min()

    # Calculate the range ratio
    range_ratio = (max_price - min_price) / max_price

    # Store the result, rounded to two decimal places
    range_ratios[ticker] = round(range_ratio, 2)

# Find the stock with the largest range ratio
stock_with_largest_ratio = max(range_ratios, key=range_ratios.get)
largest_ratio = range_ratios[stock_with_largest_ratio]

print("Stock with the largest range ratio:", stock_with_largest_ratio)
print("Largest range ratio:", largest_ratio)

[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed

Stock with the largest range ratio: MSFT
Largest range ratio: 0.42
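
The range ratio can be read as the distance from the year's high down to the year's low, expressed as a fraction of the high. A small worked sketch of the formula, using hypothetical prices (not market data) and a made-up helper name `range_ratio`:

```python
import pandas as pd

def range_ratio(prices: pd.Series) -> float:
    """(max - min) / max of a price series, rounded to two decimals."""
    return round(float((prices.max() - prices.min()) / prices.max()), 2)

# Hypothetical prices for illustration only: high of 120, low of 80
prices = pd.Series([100.0, 80.0, 120.0, 90.0])
print(range_ratio(prices))  # (120 - 80) / 120
```

One possible reason this matters for research: a stock with a large ratio swung widely within the year, so the metric can serve as a rough volatility or risk feature when comparing stocks.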





---
### Question 5. [Stocks] Dividend Yield
**Find the largest dividend yield for the same set of stocks**

Use the same list of companies (2222.SR, BRK-B, AAPL, MSFT, GOOG, JPM) and download all dividends paid in 2023.
You can use the `get_actions()` method or the `.dividends` field in the yfinance library (https://github.com/ranaroussi/yfinance?tab=readme-ov-file#quick-start)

Sum up all dividends paid in 2023 per company and divide each value by the closing price (Adj.Close) on the last trading day of the year.

Find the maximum value in % and round to 1 digit after the decimal point. (E.g., if you obtained $1.25 dividends paid and the end year stock price is $100, the dividend yield is 1.25% -- and your answer should be equal to 1.3)

---

In [25]:
# Define the tickers for the stocks
tickers = ["2222.SR", "BRK-B", "AAPL", "MSFT", "GOOG", "JPM"]

# Set the time period for 2023
start_date = "2023-01-01"
end_date = "2023-12-31"

# Initialize a dictionary to store the dividend yields
dividend_yields = {}

# Fetch the data and calculate the dividend yield
for ticker in tickers:
    stock = yf.Ticker(ticker)

    # Get dividend actions and filter for 2023
    dividends = stock.dividends[start_date:end_date].sum()

    # Get the closing price on the last trading day of 2023.
    # `end` is exclusive, so stopping at 2024-01-01 keeps the last row inside 2023.
    stock_data = yf.download(ticker, start="2023-12-20", end="2024-01-01")

    if not stock_data.empty:
        closing_price = stock_data['Adj Close'].iloc[-1]  # Last available closing price in 2023
        # Calculate the dividend yield
        dividend_yield = (dividends / closing_price) * 100
        # Store the result, rounded to one decimal place
        dividend_yields[ticker] = round(dividend_yield, 1)
    else:
        print(f"No data available for {ticker} near the end of 2023.")
        dividend_yields[ticker] = 0.0

# Find the stock with the highest dividend yield
if dividend_yields:
    stock_with_highest_yield = max(dividend_yields, key=dividend_yields.get)
    highest_yield = dividend_yields[stock_with_highest_yield]
    print("Stock with the highest dividend yield:", stock_with_highest_yield)
    print("Highest dividend yield:", highest_yield)
else:
    print("No valid dividend yield data available for any stock.")


[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed

Stock with the highest dividend yield: 2222.SR
Highest dividend yield: 2.8
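
One detail worth checking against the worked example in the question ($1.25 on $100 should give 1.3): Python's built-in `round` rounds exact halves to the nearest even digit, so `round(1.25, 1)` returns 1.2. The sketch below shows the yield formula together with a half-up rounding helper built on the standard `decimal` module; both function names are made up for this example.

```python
from decimal import Decimal, ROUND_HALF_UP

def dividend_yield_pct(dividends_paid: float, year_end_price: float) -> float:
    """Dividends paid over the year as a percentage of the year-end price."""
    return dividends_paid * 100 / year_end_price

def round_half_up(value: float, digits: int = 1) -> float:
    """Round so that an exact half goes up, e.g. 1.25 -> 1.3."""
    quantum = Decimal(10) ** -digits  # 0.1 when digits=1
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))

# Worked example from the question: $1.25 of dividends on a $100 year-end price
yield_pct = dividend_yield_pct(1.25, 100)
print(yield_pct)                 # the raw yield in %
print(round(yield_pct, 1))       # built-in rounding (half to even)
print(round_half_up(yield_pct))  # half-up rounding
```

For yields that do not land exactly on a half, both rounding methods give the same answer, so this only matters in edge cases.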






---
### Question 6. [Exploratory] Investigate new metrics

**Free text answer**

Download and explore a few additional metrics or time series that might be valuable for your project and write down why (briefly).

---

### **1. Beta Coefficient**
**What it is:** Beta measures how strongly a stock's or portfolio's returns move with the overall market. A beta greater than 1 indicates larger swings than the market, while a beta less than 1 indicates smaller swings.

**Why it’s valuable:** Understanding beta helps in assessing risk and can be crucial for portfolio diversification and risk management. It is particularly useful for determining the risk-return profile of investment opportunities.

In [26]:
def get_beta(ticker):
    stock = yf.Ticker(ticker)
    beta = stock.info.get('beta', None)  # Retrieves the Beta value from the stock's info dictionary
    return beta

# Example usage
ticker = "AAPL"  # Replace 'AAPL' with your stock ticker
beta_value = get_beta(ticker)
print(f"The Beta coefficient for {ticker} is: {beta_value}")

The Beta coefficient for AAPL is: 1.276
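
The value above is read from Yahoo's precomputed `info` field. Beta can also be estimated directly from return series using the textbook definition, covariance of stock and market returns divided by the variance of market returns. A minimal sketch with hypothetical returns (not market data); the function name `beta_from_returns` is made up for this example:

```python
import numpy as np

def beta_from_returns(stock_returns, market_returns) -> float:
    """Beta = Cov(stock, market) / Var(market), using sample (ddof=1) estimates."""
    stock = np.asarray(stock_returns, dtype=float)
    market = np.asarray(market_returns, dtype=float)
    covariance = np.cov(stock, market, ddof=1)[0, 1]
    return float(covariance / np.var(market, ddof=1))

# Hypothetical daily returns: the stock moves exactly twice as much as the market
market = np.array([0.01, -0.02, 0.015, 0.005, -0.01])
stock = 2 * market
print(beta_from_returns(stock, market))
```

With real data, the inputs would be daily or monthly percentage changes of the stock and of a benchmark such as ^GSPC over the same dates. The result depends on the window and frequency chosen, so it will not necessarily match the figure Yahoo reports.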


---
### Question 7. [Exploratory] Time-driven strategy description around earnings releases

**Free text answer**

Explore earnings dates for the whole month of April, e.g. using the Yahoo Finance earnings calendar (https://finance.yahoo.com/calendar/earnings?from=2024-04-21&to=2024-04-27&day=2024-04-23). Compare with the previous closed earnings (e.g., recent dates with full data https://finance.yahoo.com/calendar/earnings?from=2024-04-07&to=2024-04-13&day=2024-04-08).

Describe an analytical strategy/idea (you're not required to implement it) to select a subset of companies of interest based on the future events data.

---

In [35]:
upcoming_earnings_url = "https://finance.yahoo.com/calendar/earnings?from=2024-04-21&to=2024-04-27&day=2024-04-23"
recent_earnings_url = "https://finance.yahoo.com/calendar/earnings?from=2024-04-07&to=2024-04-13&day=2024-04-08"

In [36]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrape_earnings(url):
    # A timeout keeps the cell from hanging if the server does not respond
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        return None  # If the page isn't loaded properly

    soup = BeautifulSoup(response.text, 'html.parser')
    table = soup.find('table')
    if table is None:
        return None  # No earnings table found in the returned HTML

    column_names = [header.text.strip() for header in table.find_all('th')]
    data_rows = []

    for row in table.find_all('tr')[1:]:  # Skipping the header row
        columns = row.find_all('td')
        row_data = [col.text.strip() for col in columns]
        if len(row_data) == len(column_names):  # Skip rows that don't match the header
            data_rows.append(row_data)

    # Create a DataFrame
    return pd.DataFrame(data_rows, columns=column_names)


In [39]:
def analyze_earnings(dataframe):
    # Placeholder for analysis logic, e.g., filtering based on surprise, comparing EPS, etc.
    # Example: Select companies with the biggest positive surprise
    dataframe['EPS Estimate'] = pd.to_numeric(dataframe['EPS Estimate'], errors='coerce')
    dataframe['Reported EPS'] = pd.to_numeric(dataframe['Reported EPS'], errors='coerce')
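    # NOTE: despite the column name, this overwrites 'Surprise(%)' with the absolute
    # EPS difference (reported minus estimate), not a percentage
    dataframe['Surprise(%)'] = dataframe['Reported EPS'] - dataframe['EPS Estimate']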
    filtered_df = dataframe[dataframe['Surprise(%)'] > 0]
    return filtered_df.sort_values(by='Surprise(%)', ascending=False)


In [40]:
upcoming_earnings_df = scrape_earnings(upcoming_earnings_url)
recent_earnings_df = scrape_earnings(recent_earnings_url)

if upcoming_earnings_df is not None and recent_earnings_df is not None:
    # Analysis
    analyzed_data = analyze_earnings(recent_earnings_df)
    print(analyzed_data)
else:
    print("Failed to scrape data.")


   Symbol     Company                           Event Name  \
70   BIVI  Biovie Inc  Q2 2024 BioVie Inc Earnings Release   

    Earnings Call Time  EPS Estimate  Reported EPS  Surprise(%)  
70  After Market Close         -0.33         -0.22         0.11  
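
The `Surprise(%)` value of 0.11 above is the absolute EPS difference (-0.22 minus -0.33), not a percentage. A sketch of how a relative surprise could be computed instead, assuming the same `EPS Estimate` and `Reported EPS` columns; the function name and the two new column names are made up for this example:

```python
import pandas as pd

def add_surprise_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with absolute and percentage EPS surprise columns."""
    out = df.copy()
    estimate = pd.to_numeric(out["EPS Estimate"], errors="coerce")
    reported = pd.to_numeric(out["Reported EPS"], errors="coerce")
    out["EPS Surprise (abs)"] = reported - estimate
    # Divide by |estimate| so a smaller-than-expected loss counts as a positive surprise
    out["EPS Surprise (%)"] = (reported - estimate) / estimate.abs() * 100
    return out

# Single row taken from the output above
row = pd.DataFrame({"Symbol": ["BIVI"], "EPS Estimate": [-0.33], "Reported EPS": [-0.22]})
result = add_surprise_columns(row)
print(result)
```

Rows with an estimate of exactly zero produce infinite or missing percentages and would need separate handling before sorting.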


**Momentum Strategy**: Buy stocks a few days before the earnings if the company has a history of positive earnings surprises.

**Post-Earnings Drift Strategy:** Invest in stocks after they announce better-than-expected earnings, betting on a continued price drift.

**Risk Management:** Implement stop-loss orders and adjust position sizes based on the historical volatility of the stock around earnings dates.
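
As one way to make the first idea concrete, the sketch below screens companies with upcoming earnings by their record of past beats. It is an illustration under stated assumptions rather than a tested strategy: the tickers, EPS numbers, dates, thresholds, and the function name `select_candidates` are all hypothetical.

```python
import pandas as pd

def select_candidates(upcoming: pd.DataFrame, history: pd.DataFrame,
                      min_reports: int = 4, min_beat_rate: float = 0.75) -> pd.DataFrame:
    """Keep upcoming reporters that beat EPS estimates in most past quarters."""
    history = history.assign(beat=history["Reported EPS"] > history["EPS Estimate"])
    stats = history.groupby("Symbol")["beat"].agg(reports="count", beat_rate="mean")
    qualified = stats[(stats["reports"] >= min_reports) & (stats["beat_rate"] >= min_beat_rate)]
    # Inner merge keeps only upcoming reporters that pass both thresholds
    return upcoming.merge(qualified, left_on="Symbol", right_index=True, how="inner")

# Hypothetical past earnings: AAA beats every quarter, BBB beats once in four
history = pd.DataFrame({
    "Symbol": ["AAA"] * 4 + ["BBB"] * 4,
    "EPS Estimate": [1.0, 1.1, 1.2, 1.3, 0.5, 0.5, 0.6, 0.6],
    "Reported EPS": [1.2, 1.2, 1.3, 1.5, 0.4, 0.6, 0.5, 0.5],
})
# Hypothetical upcoming calendar; CCC has no history and is dropped
upcoming = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC"],
    "Earnings Date": ["2024-04-23", "2024-04-24", "2024-04-25"],
})

result = select_candidates(upcoming, history)
print(result)
```

In practice `history` would come from several past earnings-calendar pages scraped with `scrape_earnings`, and the thresholds would need to be validated against historical price reactions before being relied on.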