In [1]:
import pandas as pd
import numpy as np

df = pd.read_csv("../data/startup_funding.csv")

# recreate numeric funding view (same logic as step 5)
funding_numeric = (
    df["Amount in USD"]
    .astype(str)
    .str.replace(",", "", regex=False)  # literal comma, so no regex is needed
)

funding_numeric = pd.to_numeric(funding_numeric, errors="coerce")

df_funding = df.loc[funding_numeric.notna()].copy()
df_funding["Amount in USD"] = funding_numeric[funding_numeric.notna()]

Recreate the cleaned numeric `Amount in USD` column with the same logic as step 5: strip thousands separators, coerce unparseable values to NaN, and keep only the rows with a valid amount.
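The cleaning step can also be wrapped in a small reusable helper. This is a minimal sketch on made-up sample values, not data from the CSV. The name `to_numeric_amount` is hypothetical and does not appear in the earlier steps.

```python
import pandas as pd


def to_numeric_amount(series: pd.Series) -> pd.Series:
    """Strip thousands separators and coerce to float.

    Hypothetical helper: values that cannot be parsed become NaN.
    """
    cleaned = (
        series.astype(str)
        .str.replace(",", "", regex=False)  # remove literal commas
        .str.strip()                        # drop stray whitespace
    )
    return pd.to_numeric(cleaned, errors="coerce")


# Made-up sample values, including text and a missing entry
sample = pd.Series(["1,300,000", "undisclosed", None, " 500000 ", "2,00,000"])
amounts = to_numeric_amount(sample)
print(amounts)
```

Keeping the logic in one function would make it easier to apply the same rules in every step that needs the numeric view.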

In [2]:
mean_funding = df_funding["Amount in USD"].mean()
median_funding = df_funding["Amount in USD"].median()

mean_funding, median_funding

(np.float64(18429897.27080872), 1700000.0)

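Compute the mean and median funding amount. The mean (about $18.4M) is far above the median ($1.7M), which already points to a strongly right-skewed distribution.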

In [3]:
mean_funding / median_funding

np.float64(10.841116041652187)

Divide the mean by the median as a rough skew indicator: a ratio above 1 suggests right skew, and here the mean is about 10.8 times the median.
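The mean/median ratio can be cross-checked against the sample skewness, and a log transform shows how much of the skew comes from scale alone. The sketch below uses synthetic lognormal data as a stand-in for funding amounts. The distribution parameters are assumptions chosen for illustration, not estimates from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed stand-in for funding amounts (assumed parameters)
rng = np.random.default_rng(0)
amounts = pd.Series(rng.lognormal(mean=14, sigma=2, size=2000), name="Amount in USD")

ratio = amounts.mean() / amounts.median()  # same indicator as in the cell above
raw_skew = amounts.skew()                  # sample skewness on the raw scale
log_skew = np.log10(amounts).skew()        # skewness after a log10 transform

print(f"mean/median ratio: {ratio:.2f}")
print(f"skewness (raw):    {raw_skew:.2f}")
print(f"skewness (log10):  {log_skew:.2f}")
```

On real data the same three lines would apply to `df_funding["Amount in USD"]`, provided all amounts are strictly positive so that the logarithm is defined.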

In [4]:
df_funding["Amount in USD"].quantile([0.5, 0.75, 0.9, 0.95, 0.99])

0.50    1.700000e+06
0.75    8.000000e+06
0.90    2.860000e+07
0.95    6.000000e+07
0.99    2.378400e+08
Name: Amount in USD, dtype: float64

Show the median and several upper quantiles to see how heavy the tail is: 75% of deals are at or below $8M, while the top 1% exceed roughly $238M.
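Quantiles show where the tail starts, but not how much of the total money sits in it. One possible follow-up is to compute the share of total funding that comes from deals above a given quantile. The function name `tail_share` and the toy values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import pandas as pd


def tail_share(amounts: pd.Series, q: float) -> float:
    """Share of the total sum contributed by values above the q-th quantile."""
    cutoff = amounts.quantile(q)
    return amounts[amounts > cutoff].sum() / amounts.sum()


# Toy data: 99 small deals and one very large deal (made-up values)
toy_amounts = pd.Series([1.0] * 99 + [901.0])

share_top_1pct = tail_share(toy_amounts, 0.99)
print(f"share of total from deals above the 99th percentile: {share_top_1pct:.1%}")
```

In this toy case a single deal accounts for most of the total, which is the kind of concentration a mean far above the median hints at.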

In [5]:
(
    df_funding.groupby("Industry Vertical")["Amount in USD"]
    .median()
    .sort_values(ascending=False)
    .head(10)
)

Industry Vertical
E-Commerce & M-Commerce platform      680000000.0
Ecommerce Marketplace                 500000000.0
Car Aggregator & Retail Mobile App    500000000.0
Cab Aggregator                        400000000.0
Online Marketplace                    350071500.0
Automation                            300000000.0
B2B                                   293500000.0
Cab rental Mobile app                 225000000.0
B2B Platform                          225000000.0
E-Tech                                200000000.0
Name: Amount in USD, dtype: float64

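List the top 10 industry verticals by median funding to compare typical deal sizes across sectors. Near-duplicate labels appear in the result (for example `B2B` and `B2B Platform`), which suggests the column is free text that has not been normalized, so verticals with very few deals may dominate this ranking.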
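A median computed over one or two deals says little about a sector. One way to guard against that is to report the deal count next to the median and keep only groups above a minimum size. The sketch below uses a made-up toy frame. The threshold `MIN_DEALS` is an arbitrary assumption and would need tuning for the real data.

```python
import pandas as pd

# Toy data: vertical "A" has a single very large deal (made-up values)
toy = pd.DataFrame({
    "Industry Vertical": ["A", "B", "B", "B", "C", "C", "C"],
    "Amount in USD": [500.0, 10.0, 20.0, 30.0, 5.0, 50.0, 100.0],
})

# Median and number of deals per vertical, highest median first
summary = (
    toy.groupby("Industry Vertical")["Amount in USD"]
    .agg(median="median", deals="count")
    .sort_values("median", ascending=False)
)

MIN_DEALS = 3  # assumed threshold, for illustration only
robust = summary[summary["deals"] >= MIN_DEALS]

print(summary)
print(robust)
```

Without the filter, vertical `A` tops the ranking on the strength of one deal. With it, only verticals with at least `MIN_DEALS` deals are compared.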