In [8]:
import pandas as pd
import numpy as np

df = pd.read_csv("../data/startup_funding.csv")

Import libraries and load the dataset into `df`.

In [9]:
df.shape
df.columns

Index(['Sr No', 'Date dd/mm/yyyy', 'Startup Name', 'Industry Vertical',
       'SubVertical', 'City  Location', 'Investors Name', 'InvestmentnType',
       'Amount in USD', 'Remarks'],
      dtype='object')

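Check the column names to confirm the expected structure. Only the last expression in a cell is displayed, so `df.shape` produces no output here. Note that some names are irregular in the raw file, such as `City  Location` (double space) and `InvestmentnType`.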
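A minimal sketch of a structure check that reports both the shape and the columns, and flags any missing column. The frame `df_toy` and the `expected` set are hypothetical stand-ins, not the real dataset.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the real dataset.
df_toy = pd.DataFrame(
    {"Startup Name": ["A", "B"], "Amount in USD": ["1,000", "2,500"]}
)

# print() shows every value, not only the last expression in the cell.
print(df_toy.shape)
print(df_toy.columns.tolist())

# Assumed list of columns the later steps rely on.
expected = {"Startup Name", "Amount in USD"}
missing = expected - set(df_toy.columns)
print("Missing columns:", sorted(missing))
```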

In [10]:
df["Amount in USD"].head(15)

0     20,00,00,000
1        80,48,394
2      1,83,58,860
3        30,00,000
4        18,00,000
5        90,00,000
6     15,00,00,000
7        60,00,000
8      7,00,00,000
9      5,00,00,000
10     2,00,00,000
11     1,20,00,000
12     3,00,00,000
13       59,00,000
14       20,00,000
Name: Amount in USD, dtype: object

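Inspect sample values of the `Amount in USD` column before cleaning. The values are stored as strings (dtype `object`) and use Indian-style digit grouping, e.g. `20,00,00,000`.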

In [11]:
funding_numeric = (
    df["Amount in USD"]
    .astype(str)
    .str.replace(",", "", regex=False)  # literal comma, so no regex is needed
)

Cast the values to string, then strip the comma digit-group separators ahead of numeric conversion.
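The effect of this step can be sketched on a few toy values. The `sample` Series below is illustrative and only mimics the style of the column preview; the `"undisclosed"` entry is an assumed example of a non-numeric value.

```python
import pandas as pd

# Toy values in the same style as the column preview (not read from the CSV).
sample = pd.Series(["20,00,00,000", "80,48,394", "undisclosed"])

cleaned = (
    sample
    .astype(str)                        # make sure every entry is a string
    .str.replace(",", "", regex=False)  # drop the literal comma separators
)
print(cleaned.tolist())
```

Non-numeric text passes through unchanged at this stage; it is only dealt with during the numeric conversion.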

In [12]:
funding_numeric = pd.to_numeric(
    funding_numeric,
    errors="coerce"
)

Convert cleaned strings to numeric, coercing invalid values to NaN.
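A small sketch of how `errors="coerce"` behaves, using made-up strings: anything that cannot be parsed as a number becomes NaN instead of raising an error.

```python
import pandas as pd

# Made-up cleaned strings: two valid numbers and two unparseable entries.
cleaned = pd.Series(["200000000", "8048394", "undisclosed", "14342000+"])

numeric = pd.to_numeric(cleaned, errors="coerce")

print(numeric)
print(numeric.isna().tolist())
```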

In [14]:
funding_numeric.head()

0    200000000.0
1      8048394.0
2     18358860.0
3      3000000.0
4      1800000.0
Name: Amount in USD, dtype: float64

Verify the numeric conversion on a small sample.

In [15]:
funding_numeric.isnull().sum()

np.int64(979)

Count the NaN values: entries that were missing in the raw data or could not be parsed as numbers. These 979 rows will be excluded.
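Before dropping rows, it can help to see which raw strings ended up as NaN. The sketch below does this on toy data; the names `raw`, `numeric`, and `failed` are illustrative, and the specific bad values are assumptions.

```python
import pandas as pd

# Toy raw column with a mix of valid amounts and unparseable text.
raw = pd.Series(["10,00,000", "undisclosed", "5,00,000", "N/A", "undisclosed"])

numeric = pd.to_numeric(raw.str.replace(",", "", regex=False), errors="coerce")

# Raw strings behind each NaN, with how often each one occurs.
failed = raw[numeric.isna()].value_counts()
print(failed)
```

A frequency table like this shows whether the excluded rows are mostly placeholder text or amounts with formatting that could still be repaired.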

In [16]:
funding_numeric.shape

(3044,)

Confirm that the Series has one entry per row of `df` (3,044), so the boolean mask used in the next step aligns with the DataFrame.
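The alignment can also be asserted explicitly instead of being read off the shape. A minimal sketch on a toy frame, where `df_toy` and `amounts` are hypothetical names:

```python
import pandas as pd

# Toy frame with one unparseable amount.
df_toy = pd.DataFrame({"Amount in USD": ["1,000", "n/a", "2,500"]})

amounts = pd.to_numeric(
    df_toy["Amount in USD"].str.replace(",", "", regex=False),
    errors="coerce",
)

# Same length and same index labels mean a mask built from `amounts`
# selects exactly the matching rows of `df_toy`.
assert len(amounts) == len(df_toy)
assert amounts.index.equals(df_toy.index)
```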

In [17]:
valid = funding_numeric.notna()  # True where an amount was parsed
df_funding = df.loc[valid].copy()
df_funding["Amount in USD"] = funding_numeric[valid]

Keep only rows with valid numeric funding amounts for analysis.
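The filtering step can be sketched on toy data, together with an alternative that assigns the numeric column first and then drops the NaN rows. This is an illustration under assumed data, not a replacement for the cell above; `df_toy`, `filtered`, and `alternative` are hypothetical names.

```python
import pandas as pd

# Toy frame: the second row has no usable amount.
df_toy = pd.DataFrame(
    {
        "Startup Name": ["A", "B", "C"],
        "Amount in USD": ["1,000", "undisclosed", "2,500"],
    }
)
amounts = pd.to_numeric(
    df_toy["Amount in USD"].str.replace(",", "", regex=False),
    errors="coerce",
)

# Mask-based filtering, as in the notebook cell.
mask = amounts.notna()
filtered = df_toy.loc[mask].copy()
filtered["Amount in USD"] = amounts[mask]

# Alternative: overwrite the column, then drop rows where it is NaN.
alternative = df_toy.assign(**{"Amount in USD": amounts}).dropna(
    subset=["Amount in USD"]
)

print(filtered)
print(filtered.equals(alternative))
```

Both variants keep the original index labels, so the surviving rows can still be traced back to the raw data.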

In [19]:
df_funding.shape

(2065, 10)

Confirm the reduced dataset shape after filtering: 3,044 - 979 = 2,065 rows remain, with all 10 columns intact.

In [None]:
df_funding["Amount in USD"].dtype

dtype('float64')

Verify the `Amount in USD` column is now numeric for analysis.