Task 1: Data Overview

Tasks:
•	Load data and display first 10 rows.
•	Show shape, columns, and data types.
•	Generate statistical summaries.
•	Count unique cities.


In [None]:
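import pandas as pd
import numpy as np

# thousands=',' lets pandas parse comma-grouped numbers such as 1,300,000
df = pd.read_csv('startup_funding.csv', thousands=',')

print(df.head(10))                   # first 10 rows
print(df.shape)                      # (rows, columns)
print(df.columns)                    # column names
print(df.dtypes)                     # data type of each column
print(df.describe(include='all'))    # summary of numeric and non-numeric columns
print("Unique Cities:", df['CityLocation'].nunique())  # NaN is not counted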


   index  SNo        Date       StartupName   IndustryVertical  \
0      0    0  01/08/2017          TouchKin         Technology   
1      1    1  02/08/2017           Ethinos         Technology   
2      2    2  02/08/2017      Leverage Edu  Consumer Internet   
3      3    3  02/08/2017              Zepo  Consumer Internet   
4      4    4  02/08/2017      Click2Clinic  Consumer Internet   
5      5    5  01/07/2017     Billion Loans  Consumer Internet   
6      6    6  03/07/2017  Ecolibriumenergy         Technology   
7      7    7  04/07/2017             Droom          eCommerce   
8      8    8  05/07/2017         Jumbotail          eCommerce   
9      9    9  05/07/2017            Moglix          eCommerce   

                                     SubVertical CityLocation  \
0                       Predictive Care Platform    Bangalore   
1                       Digital Marketing Agency       Mumbai   
2  Online platform for Higher Education Services    New Delhi   
3            

Task 2: Data Cleaning

•	Identify missing values.
•	Fill numeric nulls with median.
•	Fill categorical nulls with "Unknown".
•	Remove duplicates.


In [22]:
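# Missing values per column, counted before any filling
print(df.isnull().sum())
# Numeric nulls: fill with the column median
df['AmountInUSD'] = df['AmountInUSD'].fillna(df['AmountInUSD'].median())
# Categorical nulls: fill with 'Unknown' (only these three columns; other text columns keep their NaN values)
df[['CityLocation', 'InvestmentType', 'InvestorsName']] = df[['CityLocation', 'InvestmentType', 'InvestorsName']].fillna('Unknown')
# Drop rows that are identical in every column; rows that differ only in 'index' or 'SNo' are kept
df.drop_duplicates(inplace=True)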


index                  0
SNo                    0
Date                   0
StartupName            0
IndustryVertical     171
SubVertical          936
CityLocation         179
InvestorsName          9
InvestmentType         1
AmountInUSD          869
Remarks             2003
dtype: int64
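
The fills above name each column by hand. As a minimal sketch of the same idea driven by dtype, the block below uses a made-up toy frame rather than the notebook's dataset; the helper name `fill_missing` and its `id_cols` parameter are hypothetical. It also ignores identifier columns when dropping duplicates, on the assumption that a unique `SNo`-style column would otherwise keep any two rows from matching.

```python
import numpy as np
import pandas as pd

def fill_missing(frame, id_cols=()):
    """Return a cleaned copy: medians for numeric nulls, 'Unknown' for the rest."""
    out = frame.copy()
    for col in out.columns:
        if col in id_cols:
            continue  # leave identifier columns untouched
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            # Assumption: every non-numeric column in this toy frame is text
            out[col] = out[col].fillna('Unknown')
    # Compare rows on everything except the identifier columns
    compare_cols = [c for c in out.columns if c not in id_cols]
    return out.drop_duplicates(subset=compare_cols).reset_index(drop=True)

# Toy data (illustrative only)
toy = pd.DataFrame({
    'SNo': [0, 1, 2, 3],
    'StartupName': ['A', 'B', 'B', 'C'],
    'CityLocation': ['Pune', None, None, 'Goa'],
    'AmountInUSD': [100.0, np.nan, np.nan, 300.0],
})

clean = fill_missing(toy, id_cols=('SNo',))
print(clean)
```

In this toy frame the two `B` rows become identical after filling and differ only in `SNo`, so one of them is dropped; a plain `drop_duplicates()` on the same frame would keep both.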


Task 3: Data Standardization

•	Fix inconsistent text (e.g., 'SeedFunding' → 'Seed Funding').
•	Convert city names to lowercase.
•	Count startups with "Ventures" in investor names.


In [23]:
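# regex=False makes this a literal replacement on every pandas version
df['InvestmentType'] = df['InvestmentType'].str.replace('SeedFunding', 'Seed Funding', regex=False)
df['CityLocation'] = df['CityLocation'].str.lower()
# Counts funding rows, so a startup with several matching rounds is counted once per round
ventures_count = df['InvestorsName'].str.contains('Ventures', case=False, na=False).sum()
print("Startups with 'Ventures' in InvestorsName:", ventures_count)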


Startups with 'Ventures' in InvestorsName: 399
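
A small sketch of why standardization matters for counting, using a made-up toy frame (all names and values are illustrative, not taken from the dataset). It shows the unique-city count before and after normalizing case and whitespace, and the difference between counting matching rows and counting distinct startups. The extra `str.strip()` step is an assumption; the notebook itself only lowercases.

```python
import pandas as pd

# Toy data (illustrative only)
toy = pd.DataFrame({
    'StartupName': ['Alpha', 'Alpha', 'Beta', 'Gamma'],
    'CityLocation': ['Bangalore', 'bangalore ', 'Mumbai', 'MUMBAI'],
    'InvestmentType': ['SeedFunding', 'Seed Funding', 'Private Equity', 'SeedFunding'],
    'InvestorsName': ['Foo Ventures', 'Bar Capital, Foo ventures', 'Baz Partners', None],
})

cities_before = toy['CityLocation'].nunique()
toy['CityLocation'] = toy['CityLocation'].str.strip().str.lower()
cities_after = toy['CityLocation'].nunique()

# Literal (non-regex) replacement of the inconsistent label
toy['InvestmentType'] = toy['InvestmentType'].str.replace(
    'SeedFunding', 'Seed Funding', regex=False)

# na=False treats missing investor names as "no match"
mask = toy['InvestorsName'].str.contains('ventures', case=False, na=False)
rows_with_ventures = int(mask.sum())                              # funding rows
startups_with_ventures = toy.loc[mask, 'StartupName'].nunique()   # distinct startups

print(cities_before, cities_after)
print(rows_with_ventures, startups_with_ventures)
```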


Task 4: Date Analysis

•	Convert Date column to datetime.
•	Extract Year, Month, and Day.
•	Find the month with most investments.
•	Compute average investment per year.


In [24]:
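# Normalize separators; regex=False keeps '.' a literal dot instead of a "match any character" pattern
df['Date'] = df['Date'].astype(str).str.replace('//', '/', regex=False).str.replace('.', '/', regex=False)
# Dates are day-first (dd/mm/yyyy); values that cannot be parsed become NaT
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
print("Month with most investments:", df['Month'].value_counts().idxmax())
# Means include the median-filled amounts from Task 2
print("Average investment per year:\n", df.groupby('Year')['AmountInUSD'].mean())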


Month with most investments: 6
Average investment per year:
 Year
2015    9.597460e+06
2016    4.305930e+06
2017    1.358934e+07
Name: AmountInUSD, dtype: float64
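
The separator cleanup and parsing step can be checked in isolation. The sketch below runs on made-up date strings (illustrative only) and uses an explicit `format='%d/%m/%Y'` instead of `dayfirst=True`; that is a choice made for this example, not the notebook's method. It also counts how many values fall back to `NaT`, which is worth checking before trusting per-month or per-year figures.

```python
import pandas as pd

# Toy date strings with the kinds of separator problems handled above (illustrative only)
raw = pd.Series(['01/08/2017', '05.07.2015', '12/05.2015', '13//04/2015', 'not a date'])

# Literal replacements: '//' -> '/' and '.' -> '/'
cleaned = (raw.str.replace('//', '/', regex=False)
              .str.replace('.', '/', regex=False))

# Explicit day/month/year format; anything that does not fit becomes NaT
dates = pd.to_datetime(cleaned, format='%d/%m/%Y', errors='coerce')

n_unparsed = int(dates.isna().sum())
print(dates)
print("Unparsed values:", n_unparsed)
print("Month with most rows:", dates.dt.month.value_counts().idxmax())
```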


Task 5: Advanced Filtering

•	Find startups in Bangalore with funding > 5,000,000 USD.
•	Filter startups in top 3 cities with seed funding.
•	Top 10 startups by funding.
•	Count investments by industry.
•	Average funding by investment type.


In [25]:
# City names were lowercased in Task 3, hence 'bangalore'
bangalore_high = df[(df['CityLocation'] == 'bangalore') & (df['AmountInUSD'] > 5000000)]
print(bangalore_high[['StartupName', 'AmountInUSD']])
# Three cities with the most funding rows
top3 = df['CityLocation'].value_counts().head(3).index
seed_top3 = df[(df['CityLocation'].isin(top3)) & (df['InvestmentType'] == 'Seed Funding')]
print(seed_top3[['StartupName', 'CityLocation']])
# Ten largest individual funding rows (a startup can appear more than once)
print(df.nlargest(10, 'AmountInUSD')[['StartupName', 'AmountInUSD']])
# Number of investments per industry
print(df['IndustryVertical'].value_counts())
# Average funding per investment type
print(df.groupby('InvestmentType')['AmountInUSD'].mean())


       StartupName  AmountInUSD
8        Jumbotail    8500000.0
19        Innoviti   18500000.0
31       Rentomojo   10000000.0
40          Byju’s   35000000.0
74         Goodera    5500000.0
...            ...          ...
2148       Olacabs  400000000.0
2152  Urban Ladder   50000000.0
2157        ZopNow   10000000.0
2178   Simplilearn   15000000.0
2190      Babajobs   10000000.0

[118 rows x 2 columns]
        StartupName CityLocation
2      Leverage Edu    new delhi
3              Zepo       mumbai
5     Billion Loans    bangalore
11           Minjar    bangalore
13         Clip App    bangalore
...             ...          ...
2183      EazyDiner    new delhi
2184  Phone Warrior    new delhi
2187           Grab       mumbai
2197           Dazo    bangalore
2198       Tradelab    bangalore

[791 rows x 2 columns]
       StartupName   AmountInUSD
158          Paytm  1.400000e+09
294       Flipkart  1.400000e+09
1976  Flipkart.com  7.000000e+08
1787         Paytm  6.800000e+08
1572   
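
The top-10 listing above ranks individual funding rows, so the same startup can appear more than once. As a hedged sketch on a made-up toy frame (illustrative values only), the block below contrasts ranking single rounds with ranking per-startup totals, and repeats the top-cities seed filter on a small scale. Summing by `StartupName` assumes that name spellings are consistent, which may not hold in the real data.

```python
import pandas as pd

# Toy data (illustrative only)
toy = pd.DataFrame({
    'StartupName': ['Alpha', 'Beta', 'Alpha', 'Gamma', 'Delta'],
    'CityLocation': ['bangalore', 'mumbai', 'bangalore', 'pune', 'mumbai'],
    'InvestmentType': ['Seed Funding', 'Private Equity', 'Private Equity',
                       'Seed Funding', 'Seed Funding'],
    'AmountInUSD': [1_000_000, 6_000_000, 7_000_000, 500_000, 2_000_000],
})

# Largest individual rounds: one row per funding event
top_rounds = toy.nlargest(2, 'AmountInUSD')[['StartupName', 'AmountInUSD']]

# Largest totals per startup: rounds are summed before ranking
top_startups = toy.groupby('StartupName')['AmountInUSD'].sum().nlargest(2)

# Seed rounds in the two cities with the most funding rows
top_cities = toy['CityLocation'].value_counts().head(2).index
seed_top = toy[toy['CityLocation'].isin(top_cities)
               & (toy['InvestmentType'] == 'Seed Funding')]

print(top_rounds)
print(top_startups)
print(seed_top[['StartupName', 'CityLocation']])
```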

Task 6: Data Manipulation

•	Create new column FundingCategory:
o	Small: < 500,000 USD
o	Medium: 500,000–5,000,000 USD
o	Large: > 5,000,000 USD
•	Drop unnecessary columns (SNo, Remarks).
•	Sort by AmountInUSD.
•	Filter “Technology” startups.
•	Compute funding share by city.


In [26]:
def category(x):
    """Map a funding amount in USD to Small / Medium / Large."""
    if x < 500000:
        return 'Small'
    elif x <= 5000000:
        return 'Medium'
    else:
        return 'Large'

df['FundingCategory'] = df['AmountInUSD'].apply(category)
# errors='ignore' skips columns that are already gone
df.drop(columns=['SNo', 'Remarks'], errors='ignore', inplace=True)
# Largest funding amounts first
df = df.sort_values(by='AmountInUSD', ascending=False)
# Rows whose industry contains "Technology" (case-insensitive)
tech_df = df[df['IndustryVertical'].str.contains('Technology', case=False, na=False)]
# Each city's percentage of total funding
city_share = df.groupby('CityLocation')['AmountInUSD'].sum() / df['AmountInUSD'].sum() * 100
print("Funding share by city:\n", city_share)


Funding share by city:
 CityLocation
agra                      0.011411
ahmedabad                 0.566344
bangalore                44.758729
bangalore / palo alto     0.005187
bangalore / san mateo     0.041496
                           ...    
us/india                  0.015561
usa                       0.005706
usa/india                 0.086104
vadodara                  0.054152
varanasi                  0.000270
Name: AmountInUSD, Length: 71, dtype: float64
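
As an alternative to the row-wise `apply`, the same thresholds can be expressed with `numpy.select`, which evaluates the conditions on the whole column at once. This is a minimal sketch on made-up boundary values (illustrative only), chosen to show which side of each threshold a value lands on. It also sorts the city shares in descending order, which is an added convenience rather than something the notebook does.

```python
import numpy as np
import pandas as pd

# Toy data around the category boundaries (illustrative only)
toy = pd.DataFrame({
    'CityLocation': ['pune', 'pune', 'goa', 'goa'],
    'AmountInUSD': [499_999, 500_000, 5_000_000, 5_000_001],
})

amounts = toy['AmountInUSD']
# Conditions are checked in order; the first one that is True wins
conditions = [
    (amounts < 500_000).to_numpy(),
    (amounts <= 5_000_000).to_numpy(),
]
toy['FundingCategory'] = np.select(conditions, ['Small', 'Medium'], default='Large')

# Percentage of total funding per city, largest share first
city_share = (toy.groupby('CityLocation')['AmountInUSD'].sum()
              / toy['AmountInUSD'].sum() * 100).sort_values(ascending=False)

print(toy)
print(city_share)
```

Because the conditions are evaluated in order, 500,000 and 5,000,000 both fall into Medium, matching the `category` function above.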
