<a href="https://colab.research.google.com/github/Avyukth/Stock-Markets-Analytics-Zoomcamp/blob/main/hw1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# install main library YFinance
!pip install yfinance



In [None]:
# IMPORTS
import numpy as np
import pandas as pd

#Fin Data Sources
import yfinance as yf
import pandas_datareader as pdr

#Data viz
import plotly.graph_objs as go
import plotly.express as px

import time
from datetime import date


## Question 1: [Macro] Average growth of GDP in 2023
What is the average growth (in %) of GDP in 2023?

Download the Real Gross Domestic Product time series (GDPC1) from FRED (https://fred.stlouisfed.org/series/GDPC1). Calculate the year-over-year (YoY) growth rate, that is, divide each quarter's value by the value four quarters earlier and subtract 1. Find the average YoY growth in 2023 (the average of the four quarterly YoY numbers). Round to one digit after the decimal point: e.g. if you get 5.66% growth, you should answer 5.7.
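A minimal sketch of the YoY arithmetic on a made-up quarterly series (the levels below are illustrative, not GDPC1 data). Each quarter is divided by the value four quarters earlier, and the four 2023 ratios are averaged. The quarter-start dates mirror the date index FRED returns for quarterly series.

```python
import pandas as pd

# Hypothetical quarterly levels: flat in 2022, then +2%, +2%, +3%, +3% versus a year earlier
index = pd.date_range("2022-01-01", periods=8, freq="QS")
gdp = pd.Series([100.0, 100.0, 100.0, 100.0, 102.0, 102.0, 103.0, 103.0], index=index)

# YoY growth: each value divided by the value four quarters earlier, minus 1
yoy = gdp / gdp.shift(4) - 1

# Average of the four 2023 YoY values, in percent, rounded to one decimal
avg_2023 = round(yoy[yoy.index.year == 2023].mean() * 100, 1)
print(avg_2023)
```

With these toy numbers the four YoY values are 2%, 2%, 3% and 3%, so the average is 2.5. The first four YoY values are NaN because there is no quarter four periods earlier.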



In [None]:
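end = date.today()
# Each 2023 quarter is compared with the same quarter of 2022, so data from 2021 onward is enough
start = date(2021, 1, 1)

# Download quarterly real GDP (GDPC1) from FRED
df = pdr.DataReader("GDPC1", "fred", start, end)

# YoY growth: current quarter divided by the quarter four periods earlier, minus 1
df['GDPC1_YoY'] = df['GDPC1'] / df['GDPC1'].shift(4) - 1

# Average the four 2023 YoY values and express the result in percent
df_2023 = df[df.index.year == 2023]
average_growth_2023 = df_2023['GDPC1_YoY'].mean() * 100

average_growth_2023_rounded = round(average_growth_2023, 1)

print(f"The average YoY growth of GDP in 2023 is {average_growth_2023_rounded}%.")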


The average YoY growth of GDP in 2023 is 2.5%.


## Question 2. [Macro] Inverse "Treasury Yield"
Find the minimum value of (dgs10 - dgs2) since 2000 (2000-01-01) and write it down as your answer, rounded to one digit after the decimal point.

Download the DGS2 and DGS10 interest-rate series (https://fred.stlouisfed.org/series/DGS2, https://fred.stlouisfed.org/series/DGS10). Join them into one dataframe on date (you might need to read about pandas.DataFrame.join()) and calculate the daily difference dgs10 - dgs2.

(Additional: think about what an "inverted yield curve" means for the market and investors. Do you see the same thing in your country/market of interest? Do you think it could be a good predictive feature for models?)
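The join-and-subtract step can be sketched on two tiny hand-made frames (the dates and yields below are hypothetical, not FRED values). `DataFrame.join` aligns on the date index; `how="inner"` keeps only the dates present in both series, and a negative spread marks an inverted curve.

```python
import pandas as pd

# Hypothetical yields in percent (illustrative only, not real FRED values)
dgs2 = pd.DataFrame(
    {"DGS2": [5.0, 4.9, 4.2]},
    index=pd.to_datetime(["2023-07-03", "2023-07-05", "2023-07-06"]),
)
dgs10 = pd.DataFrame(
    {"DGS10": [4.0, 4.1, 4.3, 4.2]},
    index=pd.to_datetime(["2023-07-03", "2023-07-05", "2023-07-06", "2023-07-07"]),
)

# Align on the date index; "inner" drops 2023-07-07, which only DGS10 has
rates = dgs10.join(dgs2, how="inner")

# Daily spread; a negative value means the 2Y yield is above the 10Y (inverted curve)
rates["spread"] = rates["DGS10"] - rates["DGS2"]
inverted_days = int((rates["spread"] < 0).sum())

print(rates)
print("min spread:", round(rates["spread"].min(), 1), "| inverted days:", inverted_days)
```

Counting inverted days (or flagging them as a 0/1 column) is one simple way to turn the spread into a feature for the predictive question above.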

In [None]:

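# Set the start and end dates for the data retrieval
start = date(year=2000, month=1, day=1)
end = date.today()

# Daily 2-year and 10-year Treasury yields from FRED (days without a quote arrive as NaN)
dgs2 = pdr.DataReader("DGS2", "fred", start, end)

dgs10 = pdr.DataReader("DGS10", "fred", start, end)

# Align both series on the date index; an inner join keeps dates present in both
df_merged = pd.merge(dgs10, dgs2, left_index=True, right_index=True, how='inner')

# Daily 10Y - 2Y spread; negative values mean the curve is inverted
df_merged["difference"] = df_merged["DGS10"] - df_merged["DGS2"]

# .min() skips NaN rows by default
min_difference = round(df_merged["difference"].min(), 1)

print(f"The minimum value of (DGS10 - DGS2) since January 1, 2000, is {min_difference}.")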


The minimum value of (DGS10 - DGS2) since January 1, 2000, is -1.1.


## Question 3. [Index] Which Index is better recently?
Compare the S&P 500 and IPC Mexico indexes by 5-year growth and write down the larger value as your answer (%)

Download daily prices for two indexes from Yahoo Finance: S&P 500 (^GSPC, https://finance.yahoo.com/quote/%5EGSPC/) and IPC Mexico (^MXX, https://finance.yahoo.com/quote/%5EMXX/). Compare the 5Y growth of both (between 2019-04-09 and 2024-04-09). Select the faster-growing index and write down its growth in % (nearest integer). E.g. if the ratio end/start was 2.0925 (growth of 109.25%), you would write down 109 as your answer.

(Additional: think of other indexes, download their data, and compare their growth. Also compute 10Y and 20Y growth. What is the average yearly growth rate (CAGR) for each of the indexes you select?)
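For the additional CAGR question, the compound annual growth rate turns a start/end ratio into an average yearly rate: CAGR = (end / start)^(1 / years) - 1. A minimal sketch using the 2.0925 ratio from the example above; the helper name `cagr` is just for illustration.

```python
def cagr(start_price: float, end_price: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end_price / start_price) ** (1 / years) - 1


# Worked example: a ratio of 2.0925 over 5 years (total growth of 109.25%)
rate = cagr(100.0, 209.25, 5)
print(f"{rate * 100:.2f}% per year")
```

The same helper works for 10Y and 20Y windows by passing `years=10` or `years=20`, which makes indexes with different window lengths comparable on a per-year basis.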

In [None]:
# Define tickers and date range
tickers = ["^GSPC", "^MXX"]
start_date = "2019-04-09"
# yfinance treats `end` as exclusive, so ask for one extra day to include 2024-04-09
end_date = "2024-04-10"

start_end_prices = {}

for ticker in tickers:
    data = yf.download(ticker, start=start_date, end=end_date)

    close = data['Close']
    # Newer yfinance versions may return MultiIndex columns; keep the single ticker column
    if isinstance(close, pd.DataFrame):
        close = close.iloc[:, 0]

    # First and last closes in the window
    start_price = close.iloc[0]
    end_price = close.iloc[-1]

    growth = ((end_price - start_price) / start_price) * 100

    # Store data
    start_end_prices[ticker] = {
        'start_price': start_price,
        'end_price': end_price,
        'growth_percent': growth
    }

# Index with the higher growth, rounded to the nearest integer %
higher_growth_index = max(start_end_prices, key=lambda x: start_end_prices[x]['growth_percent'])
higher_growth_value = round(start_end_prices[higher_growth_index]['growth_percent'])
print(f"\nThe {higher_growth_index} had the higher 5-year growth at {higher_growth_value}%.")



The ^GSPC had the higher 5-year growth at 81%.





## Question 4. [Stocks OHLCV] 52-weeks range ratio (2023) for the selected stocks
Find the largest range ratio [=(max-min)/max] of Adj.Close prices in 2023

Download the 2023 daily OHLCV data from Yahoo Finance for the top 6 stocks by earnings (https://companiesmarketcap.com/most-profitable-companies/): 2222.SR, BRK-B, AAPL, MSFT, GOOG, JPM.

Here is the example data you should see in Pandas for "2222.SR": https://finance.yahoo.com/quote/2222.SR/history

For each stock, calculate the maximum minus the minimum "Adj Close" price and divide it by the maximum "Adj Close" value. Round the result to two decimal places (e.g. 0.1575 becomes 0.16).

(Additional: why might this be important for your research?)
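The range ratio itself is a one-liner in pandas. A sketch on a hypothetical price series (the values are illustrative, not market data):

```python
import pandas as pd

# Hypothetical Adj Close prices over a year (illustrative values only)
adj_close = pd.Series([80.0, 100.0, 90.0, 60.0, 75.0])

# Range ratio = (max - min) / max: how far the yearly low sits below the yearly high
range_ratio = (adj_close.max() - adj_close.min()) / adj_close.max()
print(round(range_ratio, 2))
```

One way to read the ratio: it is an upper bound on the year's maximum drawdown, and the two coincide when the yearly low comes after the yearly high (as in this toy series, where the high of 100 precedes the low of 60).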



In [None]:
# ask chatGPT: emulate clicking the link and downloading the content
import requests
from bs4 import BeautifulSoup

# URL of the webpage
url = "https://companiesmarketcap.com/most-profitable-companies/"

# Define headers with a user-agent to mimic a web browser
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}

# Send a GET request to the URL with headers
response = requests.get(url, headers=headers)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content of the webpage
    soup = BeautifulSoup(response.content, "html.parser")

    # Find the download link within the webpage
    download_link = soup.find("a", {"rel": "nofollow", "href": "?download=csv"})

    # If the download link is found
    if download_link:
        # Extract the href attribute which contains the actual download link
        download_url = 'https://companiesmarketcap.com/'+download_link["href"]

        # Download the CSV file using the obtained download URL
        download_response = requests.get(download_url, headers=headers)

        # Check if the download request was successful
        if download_response.status_code == 200:
            # Save the content of the response to a local file
            with open("global_stocks.csv", "wb") as f:
                f.write(download_response.content)
            print("CSV file downloaded successfully.")
        else:
            print("Failed to download the CSV file.")
    else:
        print("Download link not found on the webpage.")
else:
    print("Failed to retrieve data from the webpage.")

CSV file downloaded successfully.
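The download link found above is relative (`?download=csv`). A small standard-library sketch of how it resolves: plain string concatenation, as in the cell above, attaches it to the site root, whereas `urllib.parse.urljoin` resolves it against the earnings page it was found on. Which CSV the server returns for each URL is not checked here; the next cells rank companies by a `marketcap` column, which suggests the file saved above is the market-cap list rather than the earnings list.

```python
from urllib.parse import urljoin

page_url = "https://companiesmarketcap.com/most-profitable-companies/"
href = "?download=csv"  # the relative href the scraping code looks for

# String concatenation attaches the query to the site root
concatenated = "https://companiesmarketcap.com/" + href

# urljoin resolves the relative reference against the page it came from
resolved = urljoin(page_url, href)

print(concatenated)
print(resolved)
```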


In [None]:
# The previous cell saved the file to the working directory (/content on Colab)
global_stocks = pd.read_csv("global_stocks.csv")

In [None]:

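def fetch_and_calculate_price_range(ticker, start_date, end_date):
    """Return (max - min) / max of the daily Adj Close prices, rounded to 2 decimals."""
    try:
        # auto_adjust=False keeps the 'Adj Close' column (newer yfinance versions default to True and omit it)
        stock_data = yf.download(tickers=ticker, start=start_date, end=end_date,
                                 interval="1d", auto_adjust=False)

        adj_close = stock_data['Adj Close']
        # Newer yfinance versions may return MultiIndex columns; keep the single ticker column
        if isinstance(adj_close, pd.DataFrame):
            adj_close = adj_close.iloc[:, 0]

        max_adj_close = adj_close.max()
        min_adj_close = adj_close.min()

        # Range ratio: distance from the yearly high to the yearly low, relative to the high
        normalized_range = (max_adj_close - min_adj_close) / max_adj_close

        return round(normalized_range, 2)
    except Exception as e:
        print(f"Failed to fetch data for {ticker}: {str(e)}")
        return None


# yfinance treats `end` as exclusive, so 2024-01-01 keeps the last trading days of 2023
# (2222.SR trades Sunday to Thursday, and 31 Dec 2023 was a Sunday)
start_date = "2023-01-01"
end_date = "2024-01-01"

# Rank the downloaded companies by market cap and keep the top 20
top_20_stocks = global_stocks.sort_values(by='marketcap', ascending=False).head(20)

# Store the range ratio per ticker
price_ranges = {}

for index, row in top_20_stocks.iterrows():
    symbol = row['Symbol']
    if pd.notna(symbol):
        price_ranges[symbol] = fetch_and_calculate_price_range(symbol, start_date, end_date)

results_df = pd.DataFrame(list(price_ranges.items()), columns=['Symbol', 'Normalized Range'])

# Display the results
print(results_df)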



     Symbol  Normalized Range
0      MSFT              0.42
1      AAPL              0.37
2      NVDA              0.72
3      GOOG              0.39
4   2222.SR              0.20
5      AMZN              0.46
6      META              0.65
7     BRK-B              0.21
8       LLY              0.50
9       TSM              0.31
10     AVGO              0.53
11      NVO              0.38
12        V              0.22
13      JPM              0.28
14      XOM              0.18
15      WMT              0.20
16     TSLA              0.63
17      UNH              0.19
18       MA              0.20
19    MC.PA              0.26
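The question asks for the largest ratio among the six listed tickers, while the table above covers the top 20 companies by market cap. A minimal sketch of the filtering step, with a few rows re-typed from the printed table (NVDA is included only to show it being filtered out):

```python
import pandas as pd

# Rows re-typed from the printed table above
results_df = pd.DataFrame({
    "Symbol": ["MSFT", "AAPL", "NVDA", "GOOG", "2222.SR", "BRK-B", "JPM"],
    "Normalized Range": [0.42, 0.37, 0.72, 0.39, 0.20, 0.21, 0.28],
})

# Keep only the tickers named in the question
required = ["2222.SR", "BRK-B", "AAPL", "MSFT", "GOOG", "JPM"]
subset = results_df[results_df["Symbol"].isin(required)]

# Row with the largest range ratio among the six
best = subset.loc[subset["Normalized Range"].idxmax()]
print(best["Symbol"], best["Normalized Range"])
```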





## Question 5. [Stocks] Dividend Yield
Find the largest dividend yield for the same set of stocks

Use the same list of companies (2222.SR, BRK-B, AAPL, MSFT, GOOG, JPM) and download all dividends paid in 2023. You can use the get_actions() method or the .dividends field in the yfinance library (https://github.com/ranaroussi/yfinance?tab=readme-ov-file#quick-start)

Sum up all dividends paid in 2023 per company and divide each sum by the closing price (Adj Close) on the last trading day of the year.

Find the maximum value in % and round to one digit after the decimal point. (E.g., if a company paid $1.25 in dividends and its year-end stock price is $100, the dividend yield is 1.25%, and your answer should be 1.3.)
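A sketch of the yield arithmetic with hypothetical numbers chosen to reproduce the $1.25 / $100 example (dates, amounts, and price are made up; the amounts are exact binary fractions so the float arithmetic stays exact). It also shows a rounding subtlety: Python's built-in `round()` rounds exact halves to the even digit, so `round(1.25, 1)` returns 1.2 rather than the 1.3 the example expects, while `decimal.Decimal.quantize` with `ROUND_HALF_UP` gives 1.3.

```python
from decimal import Decimal, ROUND_HALF_UP

import pandas as pd

# Hypothetical dividend history (dates and amounts are illustrative, not real data)
dividends = pd.Series(
    [0.25, 0.25, 0.25, 0.25, 0.50],
    index=pd.to_datetime(["2022-12-15", "2023-03-15", "2023-06-15", "2023-09-15", "2023-12-15"]),
)
year_end_price = 100.0  # hypothetical Adj Close on the last trading day of 2023

# Sum only the dividends paid in 2023 (0.25 + 0.25 + 0.25 + 0.50)
paid_2023 = dividends[dividends.index.year == 2023].sum()

# Dividend yield in percent
yield_pct = float(paid_2023) * 100 / year_end_price

# Built-in round() rounds halves to even; Decimal with ROUND_HALF_UP rounds 1.25 up
half_even = round(yield_pct, 1)
half_up = float(Decimal(str(yield_pct)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))
print(paid_2023, yield_pct, half_even, half_up)
```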

In [None]:
import yfinance as yf
import pandas as pd

def fetch_and_calculate_dividend_yield(ticker, year=2023):
    # yfinance treats `end` as exclusive, so request data up to 1 January of the next year
    start_date = f"{year}-01-01"
    end_date = f"{year + 1}-01-01"

    # auto_adjust=False keeps the 'Adj Close' column
    stock_data = yf.download(tickers=ticker, start=start_date, end=end_date,
                             interval="1d", auto_adjust=False)

    # get_actions() returns the history of Dividends and Stock Splits
    actions = yf.Ticker(ticker).get_actions()

    if not actions.empty:
        # Drop the exchange time zone while keeping local calendar dates for the year filter
        actions.index = actions.index.tz_localize(None)
        actions_year = actions[(actions.index >= start_date) & (actions.index < end_date)]
        dividends_year = actions_year['Dividends'].sum()
    else:
        dividends_year = 0

    if stock_data.empty:
        return None

    adj_close = stock_data['Adj Close']
    # Newer yfinance versions may return MultiIndex columns; keep the single ticker column
    if isinstance(adj_close, pd.DataFrame):
        adj_close = adj_close.iloc[:, 0]
    adj_close = adj_close.dropna()

    # Adj Close on the last trading day of the year
    if adj_close.empty or adj_close.iloc[-1] <= 0:
        return None
    last_price = adj_close.iloc[-1]

    # Dividend yield as a percentage
    dividend_yield = (dividends_year / last_price) * 100
    return round(dividend_yield, 1)

# Tickers from the question
symbols = ['2222.SR', 'BRK-B', 'AAPL', 'MSFT', 'GOOG', 'JPM']

dividend_yields = {}


for symbol in symbols:
    yield_value = fetch_and_calculate_dividend_yield(symbol)
    if yield_value is not None:
        dividend_yields[symbol] = yield_value

if dividend_yields:
    max_yield_symbol, max_yield = max(dividend_yields.items(), key=lambda x: x[1])
    print(f"The highest dividend yield for 2023 is {max_yield}% by {max_yield_symbol}.")
else:
    print("No dividend data available.")



The highest dividend yield for 2023 is 2.8% by 2222.SR.
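A note on the date filter: yfinance typically returns action dates as midnight in the exchange's time zone. `tz_convert(None)` converts to UTC before dropping the zone, which for exchanges east of UTC moves a local-midnight timestamp to the previous calendar day and can push an early-January payment into December; `tz_localize(None)` keeps the local date. A sketch with a made-up timestamp, assuming Asia/Riyadh (UTC+3) as the exchange zone for 2222.SR:

```python
import pandas as pd

# Hypothetical ex-dividend timestamp at local midnight on the Saudi exchange
ts = pd.DatetimeIndex(["2024-01-01 00:00"]).tz_localize("Asia/Riyadh")

print(ts.tz_convert(None))   # converted to UTC first, lands on 2023-12-31 21:00
print(ts.tz_localize(None))  # local wall time kept, stays on 2024-01-01 00:00
```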
