In this homework, we combine data from several sources, process it in pandas, and generate additional fields.

Unless stated otherwise, reuse the code snippets from the [Colab notebook][link] covered in the livestream.

[link]: https://github.com/DataTalksClub/stock-markets-analytics-zoomcamp/blob/main/02-dataframe-analysis/%5B2025%5D_Module_02_Colab_Working_with_the_data.ipynb
---

# Question 1: [IPO] Withdrawn IPOs by Company Type

What is the total withdrawn IPO value (in $ millions) for the company class with the highest total withdrawal value?

From the withdrawn IPO list ([stockanalysis.com/ipos/withdrawn][wdipo]), collect and process the data to find out which company type saw the most withdrawn IPO value.
Steps:

1. Use `pandas.read_html()` with the URL above to load the IPO withdrawal table into a DataFrame.
   _It is a similar process to Code Snippet 1 discussed at the livestream._ You should get **99 entries**.
2. Create a new column called `Company Class`, categorizing company names based on patterns like:
    - "Acquisition Corp" or "Acquisition Corporation" → `Acq.Corp`
    - "Inc" or "Incorporated" → `Inc`
    - "Group" → `Group`
    - "Holdings" → `Holdings`
    - "Ltd" or "Limited" → `Ltd`
    - Others → `Other`
   
    Hint: make your function more robust by converting names to lowercase and splitting into words before matching patterns.
3. Define a new field `Avg. price` by parsing the `Price Range` field (create a function and apply it to the `Price Range` column).
   Examples: 
   - '$8.00-$10.00' → `9.0`
   - '$5.00' → `5.0`
   - '-' → `None`
4. Convert `Shares Offered` to numeric and clean missing or invalid values.
5. Create a new column:
   `Withdrawn Value = Shares Offered * Avg. price` **(71 non-null values)**
6. Group by `Company Class` and calculate the total withdrawn value.
7. **Answer**: Which class had the highest total value of withdrawals?

[wdipo]: https://stockanalysis.com/ipos/withdrawn/
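Steps 2 and 3 can be sketched with two small plain-Python helpers. This is only a sketch under assumptions: the names `classify_company` and `parse_avg_price` are hypothetical, the class priority is assumed to follow the order of the list in step 2, and the parser accepts both `$8.00-$10.00` and the spaced `$3.00 - $4.00` form. It also assumes prices contain no thousands separators.

```python
import re
from typing import Optional


def classify_company(name: str) -> str:
    """Hypothetical helper: map a company name to a class (priority as in step 2)."""
    # Lowercase and split into alphabetic words so that "inc" matches only
    # the whole word, not a substring of a longer word.
    words = re.findall(r"[a-z]+", name.lower())
    pairs = set(zip(words, words[1:]))  # adjacent word pairs
    if ("acquisition", "corp") in pairs or ("acquisition", "corporation") in pairs:
        return "Acq.Corp"
    word_set = set(words)
    if word_set & {"inc", "incorporated"}:
        return "Inc"
    if "group" in word_set:
        return "Group"
    if "holdings" in word_set:
        return "Holdings"
    if word_set & {"ltd", "limited"}:
        return "Ltd"
    return "Other"


def parse_avg_price(price_range: str) -> Optional[float]:
    """Hypothetical helper: average of the numbers in a price range, or None."""
    # Assumption: no thousands separators such as "1,000.00".
    numbers = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", price_range)]
    if not numbers:
        return None  # e.g. '-' carries no price information
    return sum(numbers) / len(numbers)
```

With pandas, such helpers would typically be applied column-wise, for example via `Series.apply`; the polars cells below take a different, expression-based route.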

In [47]:
from bs4 import BeautifulSoup
from polars import col as c
import polars as pl
import requests as r

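def read_table(url: str) -> pl.DataFrame:
    """Scrape the table with id 'main-table' into a DataFrame of strings."""
    resp = r.get(url)
    resp.raise_for_status()  # fail early on an HTTP error
    doc = BeautifulSoup(resp.content, 'html.parser')
    table = doc.find(id='main-table')
    # Column names come from the header cells
    keys = [e.text.strip() for e in table.find_all(name='th')]

    # One dict per body row, mapping column name -> raw cell text
    return pl.DataFrame([
        dict(zip(keys, [e.text for e in row.find_all(name='td')]))
        for row in table.find(name='tbody').find_all(name='tr')
    ])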

In [None]:
df = read_table('https://stockanalysis.com/ipos/withdrawn')
df.shape

In [38]:
df.head()

Symbol,Company Name,Price Range,Shares Offered
str,str,str,str
"""UNFL""","""Unifoil Holdings, Inc.""","""$3.00 - $4.00""","""2,000,000"""
"""AURN""","""Aurion Biotech, Inc.""","""-""","""-"""
"""ROTR""","""PHI Group, Inc.""","""-""","""-"""
"""ONE""","""One Power Company""","""-""","""-"""
"""HPOT""","""The Great Restaurant Developme…","""$4.00 - $6.00""","""1,400,000"""


In [39]:
df = (
    df.with_columns(
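        # '2,000,000' -> 2000000; '-' becomes null because strict=False
        c('Shares Offered').str.replace_all(',', '').str.to_integer(strict=False),
        # '$3.00 - $4.00' -> [3.0, 4.0] -> 3.5; '-' -> null
        (
            c('Price Range').str.replace_all(r'\$', '')
            .str.split(' - ')
            .list.eval(pl.element().cast(pl.Float32, strict=False))
            .list.mean()
        ).alias('Avg. price'),
        # The first matching branch wins. contains_any matches substrings, so
        # 'acquisition corp' also covers 'Acquisition Corporation', but 'inc'
        # also matches inside longer words.
        pl.when(c('Company Name').str.contains_any(['acquisition corp'], ascii_case_insensitive=True)).then(pl.lit('Acq.Corp'))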
        .when(c('Company Name').str.contains_any(['inc'],                ascii_case_insensitive=True)).then(pl.lit('Inc'))
        .when(c('Company Name').str.contains_any(['group'],              ascii_case_insensitive=True)).then(pl.lit('Group'))
        .when(c('Company Name').str.contains_any(['holdings'],           ascii_case_insensitive=True)).then(pl.lit('Holdings'))
        .when(c('Company Name').str.contains_any(['ltd', 'limited'],     ascii_case_insensitive=True)).then(pl.lit('Ltd'))
        .otherwise(pl.lit('Other'))
        .cast(pl.Categorical(ordering='lexical'))
        .alias('Company Class')
    )
    .with_columns(
        (c('Avg. price') * c('Shares Offered') / 1_000_000.0).alias('Withdrawn Value (million)')
    )
    .filter(c('Withdrawn Value (million)').is_not_null())
)
df

Symbol,Company Name,Price Range,Shares Offered,Avg. price,Company Class,Withdrawn Value (million)
str,str,str,i64,f32,cat,f64
"""UNFL""","""Unifoil Holdings, Inc.""","""$3.00 - $4.00""",2000000,3.5,"""Inc""",7.0
"""HPOT""","""The Great Restaurant Developme…","""$4.00 - $6.00""",1400000,5.0,"""Holdings""",7.0
"""CABR""","""Caring Brands, Inc.""","""$4.00""",750000,4.0,"""Inc""",3.0
"""SQVI""","""Sequoia Vaccines, Inc.""","""$8.00 - $10.00""",2775000,9.0,"""Inc""",24.975
"""SNI""","""Shenni Holdings Limited""","""$4.00 - $6.00""",3000000,5.0,"""Holdings""",15.0
…,…,…,…,…,…,…
"""DPAC""","""Deep Space Acquisition Corp. I""","""$10.00""",21000000,10.0,"""Acq.Corp""",210.0
"""GIF""","""GigCapital6, Inc.""","""$10.00""",20000000,10.0,"""Inc""",200.0
"""HYIV""","""Haymaker Acquisition Corp. IV""","""$10.00""",26100000,10.0,"""Acq.Corp""",261.0
"""IFIT""","""iFIT Health & Fitness Inc.""","""$18.00 - $21.00""",30769231,19.5,"""Inc""",600.000004


In [41]:
(
    df.group_by('Company Class')
    .agg(c('Withdrawn Value (million)').sum())
).sort('Withdrawn Value (million)', descending=True)

Company Class,Withdrawn Value (million)
cat,f64
"""Acq.Corp""",4021.0
"""Inc""",2257.164205
"""Other""",767.919999
"""Ltd""",321.734585
"""Holdings""",303.0
"""Group""",33.7875


# Question 2: [IPO] Median Sharpe Ratio for 2024 IPOs (First 5 Months)

What is the median Sharpe ratio (as of 6 June 2025) for companies that went public in the first 5 months of 2024?

The goal is to replicate the large-scale yfinance OHLCV data download and perform basic financial calculations on IPO stocks.

Steps:

1. Using the same approach as in Question 1, download the IPOs in 2024 from: https://stockanalysis.com/ipos/2024/
   Filter to keep only those IPOs before 1 June 2024 (first 5 months of 2024). ➤ You should have 75 tickers.
2. Use Code Snippet 7 to download daily stock data for those tickers (via yfinance).
   Make sure you understand how `growth_1d` ... `growth_365d`, and volatility columns are defined.
   Define a new column `growth_252d` representing growth after 252 trading days (~1 year),
   in addition to any other growth periods you already track.
3. Calculate the Sharpe ratio assuming a risk-free rate of 4.5%:

       stocks_df['Sharpe'] = (stocks_df['growth_252d'] - 0.045) / stocks_df['volatility']

4. Filter the DataFrame to keep data only for the trading day '2025-06-06'.
   Compute descriptive statistics (e.g., `.describe()`) for these columns:
   - `growth_252d`
   - `Sharpe`

   You should observe:
   - `growth_252d` is defined for 71 out of 75 stocks (some IPOs are too recent or data starts later).
   - Median `growth_252d` is approximately 0.75 (indicating a 25% decline), while the mean is about 1.15:
     a few high-growth companies pull the average up.

5. Answer:
   - What is the median Sharpe ratio for these 71 stocks?
   - Note: Positive Sharpe means growth exceeding the risk-free rate of 4.5%.
   - [Additional] Do you observe the same top 10 companies when sorting by `growth_252d` versus sorting by Sharpe?
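Steps 2 to 5 can be illustrated on made-up numbers. The sketch below assumes that `growth_252d` is the ratio of the close to the close 252 trading days earlier (check Code Snippet 7 for the actual definition of the `growth_*d` columns) and takes volatility as given. The helper names and all values are hypothetical.

```python
from statistics import median

RISK_FREE_RATE = 0.045  # 4.5%, as stated in step 3


def growth(closes, lag):
    # Ratio close[t] / close[t - lag]; None where there is not enough history.
    return [
        closes[i] / closes[i - lag] if i >= lag else None
        for i in range(len(closes))
    ]


def sharpe(growth_252d, volatility):
    # Same formula as in step 3, for a single stock on a single day.
    return (growth_252d - RISK_FREE_RATE) / volatility


# Toy series with a lag of 2 instead of 252 to keep it short;
# the first two entries of the result are None.
closes = [10.0, 11.0, 12.0, 9.0]
toy_growth = growth(closes, lag=2)

# Made-up (growth_252d, volatility) pairs for three stocks on one day.
snapshot = [(0.75, 0.5), (1.545, 0.5), (0.295, 0.25)]
sharpes = [sharpe(g, v) for g, v in snapshot]
median_sharpe = median(sharpes)
```

The `None` entries play the role of the stocks for which `growth_252d` is undefined because fewer than 252 trading days of history exist.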

In [51]:
df = read_table('https://stockanalysis.com/ipos/2024/')
df

IPO Date,Symbol,Company Name,IPO Price,Current,Return
str,str,str,str,str,str
"""Dec 31, 2024""","""ONEG""","""OneConstruction Group Limited""","""$4.00""","""$3.51""","""-12.25%"""
"""Dec 27, 2024""","""PHH""","""Park Ha Biological Technology …","""$4.00""","""$17.84""","""346.00%"""
"""Dec 23, 2024""","""HIT""","""Health In Tech, Inc.""","""$4.00""","""$0.63""","""-84.25%"""
"""Dec 23, 2024""","""TDAC""","""Translational Development Acqu…","""$10.00""","""$10.27""","""2.70%"""
"""Dec 20, 2024""","""RANG""","""Range Capital Acquisition Corp…","""$10.00""","""$10.25""","""2.50%"""
…,…,…,…,…,…
"""Jan 18, 2024""","""CCTG""","""CCSC Technology International …","""$6.00""","""$1.10""","""-81.67%"""
"""Jan 18, 2024""","""PSBD""","""Palmer Square Capital BDC Inc.""","""$16.45""","""$13.88""","""-15.62%"""
"""Jan 12, 2024""","""SYNX""","""Silynxcom Ltd.""","""$4.00""","""$1.68""","""-58.00%"""
"""Jan 11, 2024""","""SDHC""","""Smith Douglas Homes Corp.""","""$21.00""","""$19.27""","""-8.24%"""


In [55]:
from datetime import date

(
    df.with_columns(
        c('IPO Date').str.strptime(pl.Date, '%B %d, %Y'),
        c('IPO Price').str.replace(r'\$', '').cast(pl.Float32, strict=False),
        c('Current').str.replace(r'\$', '').cast(pl.Float32, strict=False),
        c('Return').str.replace('%', '').cast(pl.Float32, strict=False),
    )
    .filter((c('IPO Date') < date(2024, 6, 1)) & c('Return').is_not_null())
)

IPO Date,Symbol,Company Name,IPO Price,Current,Return
date,str,str,f32,f32,f32
2024-05-23,"""BOW""","""Bowhead Specialty Holdings Inc…",17.0,36.389999,114.059998
2024-05-17,"""HDL""","""Super Hi International Holding…",19.559999,20.440001,4.5
2024-05-17,"""RFAI""","""RF Acquisition Corp II""",10.0,10.51,5.1
2024-05-15,"""JDZG""","""JIADE Limited""",4.0,0.3,-92.629997
2024-05-15,"""RAY""","""Raytech Holding Limited""",4.0,1.26,-68.629997
…,…,…,…,…,…
2024-01-18,"""CCTG""","""CCSC Technology International …",6.0,1.1,-81.669998
2024-01-18,"""PSBD""","""Palmer Square Capital BDC Inc.""",16.450001,13.88,-15.62
2024-01-12,"""SYNX""","""Silynxcom Ltd.""",4.0,1.68,-58.0
2024-01-11,"""SDHC""","""Smith Douglas Homes Corp.""",21.0,19.27,-8.24
