### Import the libraries

In [None]:
import pandas as pd
import numpy as np

### Read the dataset

In [None]:
data = pd.read_csv('startup_funding.csv')
data.head()

,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
0,0,01/08/2017,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000.0,
1,1,02/08/2017,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,,
2,2,02/08/2017,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,,
3,3,02/08/2017,Zepo,Consumer Internet,DIY Ecommerce platform,Mumbai,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding,500000.0,
4,4,02/08/2017,Click2Clinic,Consumer Internet,healthcare service aggregator,Hyderabad,"Narottam Thudi, Shireesh Palle",Seed Funding,850000.0,


### Clean the dataset

We want to find the maximum funding amount received by any single startup across the following regions:

- Bangalore
- NCR (New Delhi, Gurgaon and Noida)
- Mumbai
- Pune
- Hyderabad

First, build the population by extracting the startups located in these cities. Before that, make the city names consistent by applying these replacements:

- `Delhi` to `New Delhi`
- `bangalore` to `Bangalore`
- `New Delhi`, `Gurgaon` and `Noida` to `NCR`
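The replacements can also be written as a single mapping that sends every variant straight to its final label, so the result does not depend on the order of several `replace` calls. A minimal sketch on a small hypothetical Series (not rows from `startup_funding.csv`), assuming the variants listed above are the only ones to fix:

```python
import pandas as pd

# Hypothetical city values, not taken from the real dataset
cities = pd.Series(['bangalore', 'Delhi', 'New Delhi', 'Gurgaon', 'Noida', 'Mumbai', 'Pune'])

# Map each listed variant directly to its final label in one pass
city_map = {
    'bangalore': 'Bangalore',
    'Delhi': 'NCR',
    'New Delhi': 'NCR',
    'Gurgaon': 'NCR',
    'Noida': 'NCR',
}
cleaned = cities.replace(city_map)  # exact-value matches only; 'Delhi' does not match inside 'New Delhi'
print(cleaned.tolist())  # ['Bangalore', 'NCR', 'NCR', 'NCR', 'NCR', 'Mumbai', 'Pune']
```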



In [None]:
# Fix inconsistent spellings first
data['CityLocation'] = data['CityLocation'].replace({'bangalore': 'Bangalore', 'Delhi': 'New Delhi'})
# Then merge the National Capital Region cities into a single label
data['CityLocation'] = data['CityLocation'].replace(['New Delhi', 'Gurgaon', 'Noida'], 'NCR')

In [None]:
data.head()

,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
0,0,01/08/2017,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000.0,
1,1,02/08/2017,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,,
2,2,02/08/2017,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,NCR,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,,
3,3,02/08/2017,Zepo,Consumer Internet,DIY Ecommerce platform,Mumbai,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding,500000.0,
4,4,02/08/2017,Click2Clinic,Consumer Internet,healthcare service aggregator,Hyderabad,"Narottam Thudi, Shireesh Palle",Seed Funding,850000.0,


In [None]:
data.shape

(2372, 10)

Now keep only the rows whose `CityLocation` is one of `['Bangalore', 'NCR', 'Mumbai', 'Pune', 'Hyderabad']`.

In [None]:
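# .copy() makes `city` an independent DataFrame, so later edits do not raise SettingWithCopyWarning
city = data.loc[data['CityLocation'].isin(['Bangalore', 'NCR', 'Mumbai', 'Pune', 'Hyderabad'])].copy()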
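To see what the filter and the `.copy()` do, here is a small sketch on a hypothetical DataFrame (the startup names and the `Chennai` row are made up for illustration): `isin` builds a boolean mask, and the copied subset can be modified without touching the original.

```python
import pandas as pd

# Hypothetical rows standing in for `data` after the city names were cleaned
df = pd.DataFrame({
    'StartupName': ['A', 'B', 'C', 'D'],
    'CityLocation': ['Bangalore', 'Chennai', 'NCR', 'Pune'],
})
targets = ['Bangalore', 'NCR', 'Mumbai', 'Pune', 'Hyderabad']

# Boolean mask: True where the city is one of the target regions
mask = df['CityLocation'].isin(targets)
# .copy() returns an independent DataFrame, so editing it leaves `df` untouched
subset = df.loc[mask].copy()
subset['CityLocation'] = subset['CityLocation'].str.upper()

print(subset['StartupName'].tolist())  # ['A', 'C', 'D']
print(df['CityLocation'].tolist())     # ['Bangalore', 'Chennai', 'NCR', 'Pune']
```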

In [None]:
city.head()

,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
0,0,01/08/2017,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000.0,
1,1,02/08/2017,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,,
2,2,02/08/2017,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,NCR,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,,
3,3,02/08/2017,Zepo,Consumer Internet,DIY Ecommerce platform,Mumbai,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding,500000.0,
4,4,02/08/2017,Click2Clinic,Consumer Internet,healthcare service aggregator,Hyderabad,"Narottam Thudi, Shireesh Palle",Seed Funding,850000.0,


In [None]:
city.shape

(1937, 10)

In [None]:
city.CityLocation.value_counts()

NCR          703
Bangalore    628
Mumbai       446
Pune          84
Hyderabad     76
Name: CityLocation, dtype: int64

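Draw a proportionate stratified sample of `N = 100` startups: each city (stratum) contributes a number of rows proportional to its share of `city`, rounded to the nearest integer.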

In [None]:
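N = 100  # target total sample size
# Sample each city in proportion to its share of `city`, then shuffle the rows and reset the index
prop_strata = (city.groupby('CityLocation', group_keys=False)
               .apply(lambda x: x.sample(int(np.rint(N * len(x) / len(city)))))
               .sample(frac=1)
               .reset_index(drop=True))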


In [None]:
prop_strata

,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
0,1513,23/12/2015,LafaLafa,"Digital Coupons, Deal & Cashback aggregator app",,Mumbai,Vectr Ventures,Private Equity,,
1,1332,01/1/2016,Smartcooky,ECommerce,Health Food / Personal Care Marketplace,NCR,"Rajan Anandan, Pramod Bhasin, Siddharth Pai, T...",Seed Funding,,
2,1641,08/10/2015,Cube26,Gesture based Mobile Development,,NCR,"Tiger Global Management, Flipkart",Private Equity,7700000,Series A
3,1045,19/5/2016,Niki.ai,Technology,Artificial Intelligence Platform,Bangalore,Ratan Tata,Private Equity,,
4,1807,04/08/2015,Pickingo,On-Demand Delivery Logistics,,NCR,"Orios Venture Partners, Zishaan Hayath",Private Equity,1300000,Series A
...,...,...,...,...,...,...,...,...,...,...
94,405,18/01/2017,Kratikal,Technology,Cyber Security Solution provider,NCR,"Amajit Gupta, Praveen Dubey, J.P. Bhatt",Seed Funding,,
95,1357,08/01/2016,FlatFurnish,Consumer Internet,Online Furnishing Rental Platform,NCR,Arun Chandra Mohan,Seed Funding,,
96,635,18/10/2016,3Dexter,Technology,3D Printing Solutions for Edu space,NCR,Newbie Promoters,Seed Funding,150000,
97,789,22/08/2016,Zzungry,Consumer Internet,Food Delivery platform,Bangalore,"Satish Vasudeva, Madhusudhan Jujare & Others",Seed Funding,,


In [None]:
prop_strata.CityLocation.value_counts()

NCR          36
Bangalore    32
Mumbai       23
Hyderabad     4
Pune          4
Name: CityLocation, dtype: int64
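The per-city counts follow directly from the allocation rule in the sampling cell. The arithmetic below reuses the population counts from `city.CityLocation.value_counts()` and shows why the sample has 99 rows rather than 100: each city is rounded independently. The second half is an optional alternative, the largest-remainder method, which is not what this notebook uses; it is included only as a sketch of one way to hit `N` exactly.

```python
import numpy as np

# Stratum sizes copied from city.CityLocation.value_counts() above
counts = {'NCR': 703, 'Bangalore': 628, 'Mumbai': 446, 'Pune': 84, 'Hyderabad': 76}
N = 100
total = sum(counts.values())  # 1937, matching city.shape[0]

# Same rule as the sampling cell: round(N * stratum size / population size)
alloc = {c: int(np.rint(N * n / total)) for c, n in counts.items()}
print(alloc)                # {'NCR': 36, 'Bangalore': 32, 'Mumbai': 23, 'Pune': 4, 'Hyderabad': 4}
print(sum(alloc.values()))  # 99, not 100, because each city is rounded independently

# Alternative (largest-remainder method, not used in this notebook):
# floor every quota, then give the leftover slots to the largest fractional parts
quotas = {c: N * n / total for c, n in counts.items()}
lr_alloc = {c: int(np.floor(q)) for c, q in quotas.items()}
leftover = N - sum(lr_alloc.values())
for c in sorted(quotas, key=lambda k: quotas[k] - lr_alloc[k], reverse=True)[:leftover]:
    lr_alloc[c] += 1
print(lr_alloc)             # {'NCR': 36, 'Bangalore': 33, 'Mumbai': 23, 'Pune': 4, 'Hyderabad': 4}
```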

In [None]:
# dropna() returns a new Series, so nothing is modified in place
# (dropna(inplace=True) on a column of a slice raised SettingWithCopyWarning)
city_amount = city['AmountInUSD'].dropna()
# Amounts contain thousands separators: strip the commas, then convert to numbers
city_amount = pd.to_numeric(city_amount.str.replace(',', '', regex=False))

city_amount_max = city_amount.max()


In [None]:
# Same cleaning as for the population: drop missing amounts, strip commas, convert to numbers
prop_strata_amount = prop_strata['AmountInUSD'].dropna()
prop_strata_amount = pd.to_numeric(prop_strata_amount.str.replace(',', '', regex=False))

prop_strata_amount_max = prop_strata_amount.max()
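The cleaning steps for `AmountInUSD` can be checked on a few hypothetical values (not rows from the dataset). This sketch also passes `errors='coerce'`, an optional safeguard not used in the cells above, which turns any unparseable text into `NaN` instead of raising an error:

```python
import pandas as pd

# Hypothetical raw amounts: strings with thousands separators, plus a missing value
raw = pd.Series(['1,300,000', '500,000', None, '7700000'])

# Drop missing values; dropna() returns a new Series instead of editing in place
amounts = raw.dropna()
# Remove the commas, then parse; errors='coerce' would map bad text to NaN
amounts = pd.to_numeric(amounts.str.replace(',', '', regex=False), errors='coerce')

print(amounts.tolist())  # [1300000, 500000, 7700000]
print(amounts.max())     # 7700000
```

Because `max()` skips `NaN`, coerced values would silently drop out of the result, so it is worth checking `amounts.isna().sum()` if you enable that option.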

In [None]:
city_amount_max - prop_strata_amount_max

720000000
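This difference can never be negative: the stratified sample is a subset of `city`, so its maximum is at most the population maximum, and a 100-row sample will often miss the single largest deal. The sketch below illustrates this on synthetic, hypothetical data (not the funding dataset) by repeating a proportionate stratified sample 50 times and recording the gap. It assumes lognormal amounts and three made-up strata, and it builds the sample with a list of per-group `sample` calls plus `pd.concat` rather than `groupby(...).apply`, using the same rounding rule.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic population: 2000 rows in three hypothetical strata with skewed amounts
pop = pd.DataFrame({
    'stratum': rng.choice(['A', 'B', 'C'], size=2000, p=[0.5, 0.3, 0.2]),
    'amount': rng.lognormal(mean=13, sigma=1.5, size=2000),
})
N = 100

def prop_sample(df, n, seed):
    # Draw round(n * stratum share) rows from each stratum and stack them
    parts = [g.sample(int(np.rint(n * len(g) / len(df))), random_state=seed)
             for _, g in df.groupby('stratum')]
    return pd.concat(parts)

# Gap between the population maximum and each sample maximum
gaps = [pop['amount'].max() - prop_sample(pop, N, s)['amount'].max() for s in range(50)]
print(min(gaps) >= 0)  # True: a subset's maximum never exceeds the population maximum
print(sum(g == 0 for g in gaps), 'of 50 samples contained the population maximum')
```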