## Project 1 - Analysis of Indian Start-up Funding

### Introduction
This project analyzes funding received by start-ups in India from 2018 to 2021.

### Business Understanding
The Indian start-up ecosystem is a competitive and fast-growing industry that offers potential investment and innovation opportunities. Understanding the industry's trends and patterns is essential to making well-informed investment decisions.

### Hypothesis
Null Hypothesis:
Start-ups in the technology sector are more likely to secure higher funding

Alternate Hypothesis:
Start-ups in the technology sector are not necessarily likely to secure higher funding

### Questions
1. What is the trend in value of Indian start-up funding over the years? 
2. Has the number of funded Indian start-ups increased or decreased over the years?
3. Which industries have the highest number of start-ups?
4. Which industries have received the highest funding amounts?
5. What is the average funding amount for Indian start-ups?
6. Are there any correlations between headquarters location and share of funding?
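
The questions above map fairly directly onto pandas aggregations. The block below is a minimal sketch on a toy frame whose rows are invented for illustration; the `Year` column is an assumption (the real data may be split into one file per year instead), while `Sector`, `HeadQuarter` and `Amount($)` follow the column names used in this notebook.

```python
import pandas as pd

# Toy data, invented purely for illustration; not taken from the real dataset.
toy = pd.DataFrame({
    "Year": [2019, 2019, 2020, 2020, 2020],  # hypothetical column
    "Sector": ["Edtech", "Fintech", "Edtech", "Edtech", "Fintech"],
    "HeadQuarter": ["Mumbai", "Bangalore", "Mumbai", "Delhi", "Bangalore"],
    "Amount($)": [1_000_000.0, 2_000_000.0, 3_000_000.0, 500_000.0, 4_000_000.0],
})

# Q1: total funding per year
funding_per_year = toy.groupby("Year")["Amount($)"].sum()

# Q2: number of funded start-ups per year
deals_per_year = toy.groupby("Year").size()

# Q3 and Q4: start-up count and total funding per sector
sector_counts = toy["Sector"].value_counts()
sector_funding = toy.groupby("Sector")["Amount($)"].sum().sort_values(ascending=False)

# Q5: average funding amount
average_funding = toy["Amount($)"].mean()

# Q6: each headquarters location's share of total funding
hq_share = toy.groupby("HeadQuarter")["Amount($)"].sum() / toy["Amount($)"].sum()
```

All of these aggregations assume that `Amount($)` has already been converted to a numeric type.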

## Importing libraries and loading the data

In [1]:
import pandas as pd
import numpy as np
df = pd.read_csv('startup_funding2019.csv')

In [2]:
df.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",


## Data cleaning

### Identifying inconsistencies and errors

In [50]:
df.info() #Overview

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        89 non-null     int32  
 2   HeadQuarter    89 non-null     object 
 3   Sector         89 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     float64
 8   Stage          89 non-null     object 
dtypes: float64(1), int32(1), object(7)
memory usage: 6.0+ KB
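
Alongside `df.info()`, it can help to count missing values per column and to list entries that are present but cannot be parsed as numbers. The block below is a small sketch on a toy frame; its rows are invented for illustration and only the column names follow this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with invented rows; only the column names mirror the notebook.
toy = pd.DataFrame({
    "Company/Brand": ["A", "B", "C", "D"],
    "Founded": [2014.0, np.nan, 2015.0, np.nan],
    "HeadQuarter": ["Mumbai", None, "Delhi", "Pune"],
    "Amount($)": ["$1,000,000", "Undisclosed", "$250,000", "$40,000"],
})

# Number of missing cells in each column
missing_counts = toy.isna().sum()

# Amount entries that are present but are not a parseable dollar figure
stripped = toy["Amount($)"].str.replace(r"[$,]", "", regex=True)
non_numeric = toy.loc[pd.to_numeric(stripped, errors="coerce").isna(), "Amount($)"]
```

Here `non_numeric` surfaces placeholder strings such as "Undisclosed", which `isna()` alone does not report.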


### Check for duplicates

In [18]:
# Check for duplicates in company names
df[df["Company/Brand"].duplicated(keep=False)].sort_values(by="Company/Brand")

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
7,Kratikal,2013,Noida,Technology,It is a product-based cybersecurity solutions ...,"Pavan Kushwaha, Paratosh Bansal, Dip Jung Thapa","Gilda VC, Art Venture, Rajeev Chitrabhanu.","$1,000,000",Pre series A
82,Kratikal,2015,Uttar pradesh,Technology,Provides cyber security solutions,Pavan Kushwaha,"Gilda VC, Art Venture, Rajeev Chitrabhanu","$1,000,000",Pre-series A
30,Licious,2015,Bangalore,Foodtech,Online meat shop,"Vivek Gupta, Abhay Hanjura",Vertex Growth Fund,"$30,000,000",Series E
68,Licious,2015,Bangalore,Foodtech,Online meat shop,"Vivek Gupta, Abhay Hanjura",Vertex Ventures,"$25,000,000",Series D


In [19]:
# Drop row 82 because it doesn't have a year founded and doesn't list all founders
# Drop row 30 because it doesn't have a year founded, so it seems less reliable than row 68

df.drop(82, axis= 0, inplace=True)
df.drop(30, axis= 0, inplace=True)

In [21]:
# confirm if corrections were successful
df[df["Company/Brand"].duplicated(keep=False)].sort_values(by="Company/Brand")

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
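
The duplicate check can be sketched on a toy frame as follows. The index labels, names, amounts and stages are copied from rows shown in this notebook; comparing on a `subset` of columns is an optional extra and not something the cells above do.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "Company/Brand": ["Kratikal", "Licious", "Licious", "Kratikal", "Oyo"],
        "Amount($)": ["$1,000,000", "$30,000,000", "$25,000,000", "$1,000,000", "$693,000,000"],
        "Stage": ["Pre series A", "Series E", "Series D", "Pre-series A", None],
    },
    index=[7, 30, 68, 82, 85],
)

# keep=False marks every member of a duplicate group, not only the later ones
name_dupes = toy[toy["Company/Brand"].duplicated(keep=False)]

# Stricter check: same company and same amount
same_deal = toy[toy.duplicated(subset=["Company/Brand", "Amount($)"], keep=False)]

# Drop by index label, returning a new frame instead of mutating in place
deduped = toy.drop(index=[82, 30])
```

In this toy frame the two Licious rows differ in amount and stage, so only the name-based check flags them.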


### Fill empty cells
Replace empty cells in Founded with the mode; replace empty cells in HeadQuarter, Sector and Stage with 'unknown'.

In [3]:
Founded_mode = df['Founded'].mode()[0]
df['Founded'] = df['Founded'].fillna(Founded_mode)  # assign back; chained inplace fillna is deprecated in recent pandas

In [5]:
df['HeadQuarter'] = df['HeadQuarter'].fillna('unknown')

In [6]:
df['Sector'] = df['Sector'].fillna('unknown')

In [45]:
df['Stage'] = df['Stage'].fillna('unknown')

In [46]:
df.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,2015,unknown,E-commerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,6300000.0,unknown
1,Ruangguru,2014,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,150000000.0,Series C
2,Eduisfun,2015,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey",28000000.0,Fresh funding
3,HomeLane,2014,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...",30000000.0,Series D
4,Nu Genes,2004,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),6000000.0,unknown
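
The separate fill steps above can also be expressed as a single `DataFrame.fillna` call with a dictionary of per-column values. This is an alternative sketch on invented toy rows, not the approach the cells above take; `fill_values` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration; column names follow the notebook.
toy = pd.DataFrame({
    "Founded": [2014.0, np.nan, 2015.0, 2015.0],
    "HeadQuarter": ["Mumbai", None, "Chennai", None],
    "Sector": ["Edtech", "Ecommerce", None, "AgriTech"],
    "Stage": ["Series C", None, "Series D", None],
})

# One fill value per column: the mode for Founded, a placeholder elsewhere
fill_values = {
    "Founded": toy["Founded"].mode()[0],
    "HeadQuarter": "unknown",
    "Sector": "unknown",
    "Stage": "unknown",
}

# fillna returns a new frame; the original toy frame is left unchanged
filled = toy.fillna(fill_values)
```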


### Replace cells with appropriate values
Convert values in Founded to integers;
Provide common names for similar sectors;
Change name of Amount($) column;
Replace undisclosed amounts with the mean of Amount

In [8]:
df['Founded'] = df['Founded'].astype(int)

In [9]:
df['Sector'] = df['Sector'].replace(['AI & Tech'], 'AI', regex=True)

In [16]:
df['Sector'] = df['Sector'].str.replace(r'\b(?!Automobile)\w*Auto\w*\b', 'Automobile', regex=True)
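
To see what the Automobile pattern does, it can be run on a few sample labels. This is a sketch: "Automobile & Technology" appears in this notebook, but "Automotive" and "Automation" are invented here to exercise the pattern.

```python
import pandas as pd

# Same pattern as in the cell above: any word containing "Auto"
# that does not start with "Automobile"
pattern = r"\b(?!Automobile)\w*Auto\w*\b"

sectors = pd.Series([
    "Automotive",               # invented example
    "Automobile",
    "Automobile & Technology",  # label seen in this dataset
    "Automation",               # invented example
    "Edtech",
])

result = sectors.str.replace(pattern, "Automobile", regex=True)
```

The negative lookahead leaves labels that already start with "Automobile" untouched, but any other word containing "Auto", such as "Automation", is also rewritten to "Automobile", so it is worth reviewing the distinct sector values before applying the pattern.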

In [11]:
df['Sector'] = df['Sector'].replace({'Food':'Foodtech','Foodtech tech':'Foodtech', 'Foodtechtech':'Foodtech', 'Food & tech':'Foodtech', 'Food tech':'Foodtech'})

In [14]:
df['Sector'] = df['Sector'].replace({'Ecommerce':'E-commerce','E-commerce & AR':'E-commerce', 'E-commerce':'E-commerce'})
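
The difference between exact-match and substring replacement matters for these sector labels. The sketch below uses label strings that appear in the mappings above; `SECTOR_ALIASES` is a name introduced here for illustration.

```python
import pandas as pd

sectors = pd.Series(
    ["Ecommerce", "Food tech", "Food", "Edtech", "E-commerce & AR", "Foodtech"]
)

# Series.replace with a dict matches whole cell values only
SECTOR_ALIASES = {
    "Ecommerce": "E-commerce",
    "E-commerce & AR": "E-commerce",
    "Food": "Foodtech",
    "Food tech": "Foodtech",
    "Food & tech": "Foodtech",
}
normalized = sectors.str.strip().replace(SECTOR_ALIASES)

# Substring replacement, by contrast, also rewrites labels that are already clean
substring = sectors.str.replace("Food", "Foodtech", regex=False)
```

With the substring version, "Foodtech" becomes "Foodtechtech" and "Food tech" becomes "Foodtech tech", which may be how such labels end up needing their own entries in a mapping.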

In [26]:
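# Restore the 'Amount($)' name in case the column was renamed to 'Amount' earlier; otherwise this is a no-op
df = df.rename(columns={'Amount': 'Amount($)'})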

In [22]:
# Blank out "Undisclosed" so these cells can later be coerced to NaN and filled
df['Amount($)'] = df['Amount($)'].replace(['Undisclosed'], '', regex=True)

In [27]:
df

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,2015,unknown,E-commerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,$6300000,
1,Ruangguru,2014,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,$150000000,Series C
2,Eduisfun,2015,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey",$28000000,Fresh funding
3,HomeLane,2014,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...",$30000000,Series D
4,Nu Genes,2004,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),$6000000,
...,...,...,...,...,...,...,...,...,...
84,Infra.Market,2015,Mumbai,Infratech,It connects client requirements to their suppl...,"Aaditya Sharda, Souvik Sengupta","Tiger Global, Nexus Venture Partners, Accel Pa...",$20000000,Series A
85,Oyo,2013,Gurugram,Hospitality,Provides rooms for comfortable stay,Ritesh Agarwal,"MyPreferred Transformation, Avendus Finance, S...",$693000000,
86,GoMechanic,2016,Delhi,Automobile & Technology,Find automobile repair and maintenance service...,"Amit Bhasin, Kushal Karwa, Nitin Rana, Rishabh...",Sequoia Capital,$5000000,Series B
87,Spinny,2015,Delhi,Automobile,Online car retailer,"Niraj Singh, Ramanshu Mahaur, Ganesh Pawar, Mo...","Norwest Venture Partners, General Catalyst, Fu...",$50000000,


### Handling Amount Column

In [32]:
df['Amount($)'] = df['Amount($)'].str.replace(',', '', regex=False)  # remove commas; .str skips missing values instead of raising

In [28]:
df['Amount($)'] = df['Amount($)'].str.replace('$', '', regex=False)  # remove the $ sign; regex=False so '$' is not read as an end-of-string anchor

In [34]:
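# Convert to numbers first; entries such as "Undisclosed" or empty strings become NaN
amount_numeric = pd.to_numeric(df['Amount($)'], errors='coerce')
amount_mean = amount_numeric.mean()
df['Amount($)'] = amount_numeric.fillna(amount_mean)  # replace undisclosed amounts with the mean amount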

In [36]:
df

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,2015,unknown,E-commerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,6300000,
1,Ruangguru,2014,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,150000000,Series C
2,Eduisfun,2015,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey",28000000,Fresh funding
3,HomeLane,2014,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...",30000000,Series D
4,Nu Genes,2004,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),6000000,
...,...,...,...,...,...,...,...,...,...
84,Infra.Market,2015,Mumbai,Infratech,It connects client requirements to their suppl...,"Aaditya Sharda, Souvik Sengupta","Tiger Global, Nexus Venture Partners, Accel Pa...",20000000,Series A
85,Oyo,2013,Gurugram,Hospitality,Provides rooms for comfortable stay,Ritesh Agarwal,"MyPreferred Transformation, Avendus Finance, S...",693000000,
86,GoMechanic,2016,Delhi,Automobile & Technology,Find automobile repair and maintenance service...,"Amit Bhasin, Kushal Karwa, Nitin Rana, Rishabh...",Sequoia Capital,5000000,Series B
87,Spinny,2015,Delhi,Automobile,Online car retailer,"Niraj Singh, Ramanshu Mahaur, Ganesh Pawar, Mo...","Norwest Venture Partners, General Catalyst, Fu...",50000000,


In [42]:
# Convert to numeric (unparseable entries become NaN), then replace the gaps with the mean
amount_numeric = pd.to_numeric(df['Amount($)'], errors='coerce')
df['Amount($)'] = amount_numeric.fillna(amount_numeric.mean())

In [43]:
df['Amount($)'] = df['Amount($)'].astype(float) #convert to float
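
The amount-cleaning steps in this section can be gathered into one function. This is a minimal sketch, assuming amounts arrive as strings such as "$6,300,000" or "Undisclosed"; `clean_amount` is a hypothetical helper name, and the sample values other than "$6,300,000" and "$1,000,000" are illustrative.

```python
import pandas as pd


def clean_amount(raw: pd.Series) -> pd.Series:
    """Hypothetical helper: turn strings like "$6,300,000" into floats."""
    # Strip dollar signs and thousands separators in one pass
    stripped = raw.str.replace(r"[$,]", "", regex=True).str.strip()
    # Anything that is not a number (e.g. "Undisclosed", empty) becomes NaN
    numeric = pd.to_numeric(stripped, errors="coerce")
    # Fill the gaps with the mean of the known amounts
    return numeric.fillna(numeric.mean()).astype(float)


raw = pd.Series(["$6,300,000", "Undisclosed", "$1,000,000", None])
cleaned = clean_amount(raw)
```

Filling with the mean follows the approach used in this notebook; since funding amounts tend to be heavily skewed, the median could be a reasonable alternative.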

In [22]:
pd.set_option('display.max_rows', None)
df

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,2015,unknown,E-commerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",unknown
1,Ruangguru,2014,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,2015,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",unknown
5,FlytBase,2015,Pune,Technology,A drone automation platform,Nitin Gupta,Undisclosed,Undisclosed,unknown
6,Finly,2015,Bangalore,SaaS,It builds software products that makes work si...,"Vivek AG, Veekshith C Rai","Social Capital, AngelList India, Gemba Capital...",Undisclosed,unknown
7,Kratikal,2013,Noida,Technology,It is a product-based cybersecurity solutions ...,"Pavan Kushwaha, Paratosh Bansal, Dip Jung Thapa","Gilda VC, Art Venture, Rajeev Chitrabhanu.","$1,000,000",Pre series A
8,Quantiphi,2015,unknown,AI,It is an AI and big data services company prov...,Renuka Ramnath,Multiples Alternate Asset Management,"$20,000,000",Series A
9,Lenskart,2010,Delhi,E-commerce,It is a eyewear company,"Peyush Bansal, Amit Chaudhary, Sumeet Kapahi",SoftBank,"$275,000,000",Series G
