# Dispersion Analysis: Understanding Variability in Startup Funding

How scattered is funding across startups? We'll measure that with variance, standard deviation, and the coefficient of variation, then compare the spread across industries. High spread means funding amounts are hard to predict; low spread means they cluster near the average.

In [1]:
import pandas as pd
import numpy as np

df = pd.read_csv("../data/startup_funding.csv")

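# Strip thousands-separator commas so values such as "1,000,000" can be parsed.
funding_numeric = (
    df["Amount in USD"]
    .astype(str)
    .str.replace(",", "", regex=False)
)

# Convert to numbers; anything that still fails to parse becomes NaN.
funding_numeric = pd.to_numeric(funding_numeric, errors="coerce")

# Keep only the rows with a valid amount and store the numeric values.
valid = funding_numeric.notna()
df_funding = df.loc[valid].copy()
df_funding["Amount in USD"] = funding_numeric[valid]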

## Step 1: Data Preparation

Recreate the clean numeric funding column: strip the thousands-separator commas, convert the strings to numbers, and drop any entries that fail to parse.
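To see what each cleaning step does without the CSV at hand, here is a minimal sketch on a toy Series. The values in `raw` are made up for illustration and are not taken from the dataset; the names `raw`, `cleaned`, and `valid` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the "Amount in USD" column (made-up values).
raw = pd.Series(["1,000,000", "250000", "undisclosed", None, "3,50,000"])

# Remove commas, then coerce anything that is not a number to NaN.
cleaned = pd.to_numeric(
    raw.astype(str).str.replace(",", "", regex=False),
    errors="coerce",
)

# Drop the entries that failed to parse ("undisclosed" and the missing value).
valid = cleaned.dropna()

print(f"{len(raw) - len(valid)} of {len(raw)} entries were dropped")
print(valid.tolist())
```

Counting the dropped rows is worth doing on the real data too, since every statistic that follows describes only the rows that survived this step.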

In [4]:
variance_funding = df_funding["Amount in USD"].var()
std_funding = df_funding["Amount in USD"].std()

variance_funding, std_funding

(1.4731512939394228e+16, 121373444.12759419)

## Step 2: Variance and Standard Deviation

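**Variance** shows how spread out funding is around the mean, measured in squared dollars. **Standard deviation** is just the square root of that, which puts it back in dollars. pandas computes the sample versions by default (`ddof=1`). High std = wild differences in funding; low std = pretty consistent amounts. Here the std is about 121 million.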
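The relationship between the two measures can be checked on a small example. This is a sketch on made-up numbers (the `amounts` Series is hypothetical), showing that the standard deviation is the square root of the variance and how the pandas default (`ddof=1`, sample variance) differs from the population variance (`ddof=0`).

```python
import numpy as np
import pandas as pd

# Hypothetical funding amounts in millions of USD; one large outlier.
amounts = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])

sample_var = amounts.var()        # pandas default: ddof=1 (divide by n - 1)
sample_std = amounts.std()        # square root of the sample variance
pop_var = amounts.var(ddof=0)     # population variance (divide by n)

print(f"sample variance:     {sample_var:.1f}")
print(f"sample std:          {sample_std:.2f}")
print(f"population variance: {pop_var:.1f}")
print("std == sqrt(var):", np.isclose(sample_std, np.sqrt(sample_var)))
```

A single outlier dominates both measures here, which is also the likely story behind a very large std in heavily skewed funding data.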

In [5]:
mean_funding = df_funding["Amount in USD"].mean()
std_funding / mean_funding

np.float64(6.585682076472487)

## Step 3: Coefficient of Variation

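**CV = std ÷ mean**. It expresses the spread relative to the average, which makes it useful for comparing groups whose typical amounts differ in scale. Low CV = predictable; high CV = all over the place. The value of about 6.59 here means the standard deviation is roughly 6.6 times the mean, so funding amounts are extremely dispersed.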
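A short sketch of why the normalization matters, using two made-up groups whose amounts differ by a factor of 100. The helper `coefficient_of_variation` and both Series are hypothetical names introduced for this illustration.

```python
import pandas as pd


def coefficient_of_variation(values: pd.Series) -> float:
    """Sample standard deviation divided by the mean."""
    return values.std() / values.mean()


# Hypothetical amounts in millions of USD.
seed_rounds = pd.Series([0.8, 1.0, 1.2])
late_rounds = pd.Series([80.0, 100.0, 120.0])

cv_seed = coefficient_of_variation(seed_rounds)
cv_late = coefficient_of_variation(late_rounds)

print(f"std: seed={seed_rounds.std():.2f}, late={late_rounds.std():.2f}")
print(f"CV:  seed={cv_seed:.2f}, late={cv_late:.2f}")
```

The standard deviations differ by a factor of 100, yet the two groups have the same CV, because their spread relative to their own mean is identical.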

In [6]:
df_funding.groupby("Industry Vertical")["Amount in USD"].std().sort_values(ascending=False).head(10)

Industry Vertical
Transportation              1.947229e+09
Online Marketplace          4.948736e+08
B2B                         4.122433e+08
FinTech                     3.259681e+08
eCommerce                   2.544728e+08
ECommerce                   2.307254e+08
Last Mile Transportation    1.998991e+08
Health and Wellness         1.969168e+08
Food and Beverages          9.984444e+07
Ecommerce                   9.730066e+07
Name: Amount in USD, dtype: float64

## Step 4: Which Industries Are Most Consistent?

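The table ranks the ten industries with the largest standard deviation, so it shows the wildest funding swings; sorting in ascending order would surface the most stable ones instead. A high std might point to a risky sector or to one that mixes tiny seed rounds with huge late-stage rounds. Two caveats: `eCommerce`, `ECommerce`, and `Ecommerce` appear as separate groups, so inconsistent labels split what is probably a single industry, and raw std favors industries with large deal sizes regardless of how consistent they are.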
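One way to make the industry comparison fairer is sketched below on a toy DataFrame: normalize the label casing, require a minimum number of deals per group, and rank by CV rather than raw std. The data, the `industry_key` column, and the threshold of 3 deals are all assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Made-up rows that mimic the inconsistent casing seen in the real labels.
toy = pd.DataFrame({
    "Industry Vertical": [
        "eCommerce", "ECommerce", "Ecommerce", "ecommerce",
        "FinTech", "FinTech", "FinTech", "Logistics",
    ],
    "Amount in USD": [1e6, 5e6, 2e6, 50e6, 3e6, 4e6, 5e6, 9e6],
})

# Collapse casing and whitespace variants into one key per industry.
toy["industry_key"] = toy["Industry Vertical"].str.strip().str.lower()

# Per-industry deal count, mean, and sample standard deviation.
stats = toy.groupby("industry_key")["Amount in USD"].agg(["count", "mean", "std"])

# Ignore industries with too few deals for a meaningful spread (assumed cutoff).
stats = stats[stats["count"] >= 3].copy()

# Relative spread; the lowest CV is the most consistent industry.
stats["cv"] = stats["std"] / stats["mean"]
ranked = stats.sort_values("cv")

print(ranked)
```

Lowercasing only merges casing variants; labels that differ in spelling or wording (for example `E-Commerce`) would still need a manual mapping.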

## Key Takeaways

- **Variance & std** tell you how scattered funding is around the average.
- **Coefficient of variation** normalizes that spread so you can compare apples to apples.
- **Some industries are wild**, others are predictable, and the per-industry std shows which is which.
- Combined with mean and median, you get the full picture of how startup funding really works.