# Data Analysis Project -- Indian Start-up Funding Analysis
Ideas, creativity, and execution are essential for a start-up to flourish. But are they enough? Investors provide start-ups and other entrepreneurial ventures with the capital, popularly known as "funding", to think big, grow rich, and leave a lasting impact. In this project, I analyse the funding received by start-ups in India from 2018 to 2021. The data for each year of funding comes from a separate source: CSV files for 2018 and 2019, and database tables for 2020 and 2021. These sources contain the start-ups' details, the funding amounts received, and the investors' information.
## Column names and description:
#### Company/Brand:
Name of the company/start-up
#### Founded:
Year the start-up was founded
#### Sector:
Sector the start-up operates in
#### What it does:
Description of the company
#### Founders:
Founders of the company
#### Investor:
Investors in the funding round
#### Amount($):
Amount raised, in US dollars
#### Stage:
Round of funding reached

## Business Questions:
 
1. What are the trends in funding amounts received by Indian startups from 2018 to 2021?
2. Which sectors attracted the highest amount of funding during this period?
3. How do the funding trends vary across different stages of startup development (early-stage, growth-stage, etc.)?
4. Is there a correlation between the geographical location of startups and the funding they received?
5. What is the relationship between funding amounts and the subsequent success or failure of startups?
 
## Hypothesis to Test:
 
Given the goal of assessing the investment potential in the Indian startup ecosystem, we hypothesize that:
 
- H1: The funding amounts received by Indian startups have shown a positive trend from 2018 to 2021, indicating investor confidence and potential for returns.
- H2: Sectors such as technology, e-commerce, and fintech have attracted substantial funding, suggesting growth opportunities and market demand.
- H3: Early-stage startups have garnered significant funding, indicating a fertile ground for innovation and potential high returns on investment.
- H4: There is a correlation between the geographical location of startups and the funding they received, with certain hubs like Bangalore, Mumbai, and Delhi attracting more investment due to infrastructure, talent pool, and market access.
- H5: Startups that received higher funding amounts are more likely to achieve success and provide satisfactory returns on investment, thus indicating the potential for profitable investment opportunities in the Indian startup ecosystem.
 
## Objectives:
 
1. To assess the overall attractiveness of the Indian startup ecosystem based on funding trends and investor activity from 2018 to 2021.
2. To identify key sectors with high potential for investment based on their funding attractiveness and growth prospects.
3. To evaluate the investment opportunities across different stages of startup development and their risk-return profiles.
4. To analyze the geographical distribution of startups and funding to identify strategic investment locations and regional investment disparities.
5. To determine the correlation between funding amounts received by startups and their subsequent performance, providing insights into potential returns on investment and success rates.
 
These objectives aim to provide a comprehensive evaluation of the investment landscape in the Indian startup ecosystem, helping the team make informed decisions regarding the feasibility and potential of investing in Indian startups.

## Install necessary packages

In [394]:
%pip install pyodbc




In [395]:
%pip install python-dotenv




### Import all the necessary packages
 

In [396]:
import pyodbc     
from dotenv import dotenv_values    #import the dotenv_values function from the dotenv package
import pandas as pd
import warnings 
warnings.filterwarnings('ignore')


## Load the datasets to use in this project

In [397]:
SERVER="dap-projects-database.database.windows.net"
LOGIN="LP1_learner"
PASSWORD="Hyp0th3s!$T3$t!ng"
DATABASE="dapDB"

In [398]:
# load environment variables from .env file into a dictionary
environment_variables = dotenv_values('.env')
 
# Get the values for the credentials from .env file
database=environment_variables.get("DATABASE")
server=environment_variables.get("SERVER")
login=environment_variables.get("LOGIN")
password=environment_variables.get("PASSWORD")
 
# create a connection string
connection_string=f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={login};PWD={password}"
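
If a key is missing from `.env`, `environment_variables.get(...)` returns `None` and the connection string quietly contains the text `None`. A minimal sketch of a guard against that, assuming the same four key names as above; the helper name `build_connection_string` is hypothetical and not part of pyodbc or python-dotenv:

```python
def build_connection_string(env):
    """Build an ODBC connection string from a dict of settings.

    `env` is expected to look like the dict returned by dotenv_values('.env').
    Raises KeyError naming any missing or empty keys.
    """
    required = ("SERVER", "DATABASE", "LOGIN", "PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError(f"Missing settings in .env: {', '.join(missing)}")
    return (
        "DRIVER={SQL Server};"
        f"SERVER={env['SERVER']};"
        f"DATABASE={env['DATABASE']};"
        f"UID={env['LOGIN']};"
        f"PWD={env['PASSWORD']}"
    )


# Example with placeholder values (not real credentials)
example_env = {"SERVER": "example.host", "DATABASE": "exampleDB",
               "LOGIN": "user", "PASSWORD": "secret"}
connection_string_example = build_connection_string(example_env)
```

Failing early with the names of the missing keys is usually easier to debug than a driver error raised later by `pyodbc.connect`.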

In [399]:
connection = pyodbc.connect(connection_string)

In [400]:
# query that lists the base tables in the database
db_query = ''' SELECT *
            FROM INFORMATION_SCHEMA.TABLES
            WHERE TABLE_TYPE = 'BASE TABLE' '''

In [401]:
# run the query and load the list of tables into a DataFrame
data1=pd.read_sql(db_query, connection)
 
data1

Unnamed: 0,TABLE_CATALOG,TABLE_SCHEMA,TABLE_NAME,TABLE_TYPE
0,dapDB,dbo,LP1_startup_funding2021,BASE TABLE
1,dapDB,dbo,LP1_startup_funding2020,BASE TABLE


In [402]:
query = "select * from dbo.LP1_startup_funding2020"
data = pd.read_sql(query, connection)
data.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,


In [403]:
data.describe()

Unnamed: 0,Founded,Amount
count,842.0,801.0
mean,2015.36342,113043000.0
std,4.097909,2476635000.0
min,1973.0,12700.0
25%,2014.0,1000000.0
50%,2016.0,3000000.0
75%,2018.0,11000000.0
max,2020.0,70000000000.0


In [404]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1055 non-null   object 
 1   Founded        842 non-null    float64
 2   HeadQuarter    961 non-null    object 
 3   Sector         1042 non-null   object 
 4   What_it_does   1055 non-null   object 
 5   Founders       1043 non-null   object 
 6   Investor       1017 non-null   object 
 7   Amount         801 non-null    float64
 8   Stage          591 non-null    object 
 9   column10       2 non-null      object 
dtypes: float64(2), object(8)
memory usage: 82.5+ KB


In [405]:
data.dtypes

Company_Brand     object
Founded          float64
HeadQuarter       object
Sector            object
What_it_does      object
Founders          object
Investor          object
Amount           float64
Stage             object
column10          object
dtype: object

In [406]:
data.shape

(1055, 10)

In [407]:
data.isna().sum()

Company_Brand       0
Founded           213
HeadQuarter        94
Sector             13
What_it_does        0
Founders           12
Investor           38
Amount            254
Stage             464
column10         1053
dtype: int64
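
Raw counts are easier to judge as a share of the 1055 rows: `Stage` is missing in roughly 44% of rows and `column10` in almost all of them. A small sketch of such a summary, using a made-up three-column sample; the helper name `missing_summary` is an assumption for illustration:

```python
import pandas as pd


def missing_summary(frame):
    """Return missing-value counts and percentages per column, worst first."""
    counts = frame.isna().sum()
    percent = (counts / len(frame) * 100).round(1)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    return summary.sort_values("missing", ascending=False)


# Illustrative sample, not the real table
sample = pd.DataFrame({
    "Company_Brand": ["A", "B", "C", "D"],
    "Amount": [200000.0, None, 400000.0, None],
    "Stage": [None, "Pre-seed", None, None],
})
summary = missing_summary(sample)
```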

In [408]:
data['Amount'] = data['Amount'].astype(str)

In [409]:
# Strip '$' and ',' as literal characters and convert to float; non-numeric values become NaN
data['Amount'] = pd.to_numeric(data['Amount'].str.replace('$', '', regex=False).str.replace(',', '', regex=False), errors='coerce')
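
The same cleaning step is repeated for every yearly table, so it may be worth wrapping in a helper. A minimal sketch, assuming amounts arrive either as numbers or as strings such as `$1,200,000`; the function name `clean_amount` and the sample value `Undisclosed` are made up for illustration. Passing `regex=False` makes sure `$` is treated as a literal character rather than as the regex end-of-string anchor:

```python
import pandas as pd


def clean_amount(series):
    """Strip '$' and ',' as literal characters, then convert to float.

    Anything still non-numeric afterwards becomes NaN.
    """
    text = (
        series.astype(str)
        .str.replace("$", "", regex=False)
        .str.replace(",", "", regex=False)
        .str.strip()
    )
    return pd.to_numeric(text, errors="coerce")


# Illustrative values only
raw = pd.Series(["$1,200,000", "$3000000", "Undisclosed", None])
cleaned = clean_amount(raw)
```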


In [410]:


# Founded: replace missing values with the median year
data['Founded'] = data['Founded'].fillna(data['Founded'].median())

# Sector: replace missing values with the most frequent sector
data['Sector'] = data['Sector'].fillna(data['Sector'].mode()[0])

# HeadQuarter: label missing values explicitly
data['HeadQuarter'] = data['HeadQuarter'].fillna('HeadQuarter Unknown')

# Founders: label missing values explicitly
data['Founders'] = data['Founders'].fillna('Unknown Founders')

# Investor: fill missing values with a placeholder label
data['Investor'] = data['Investor'].fillna('Various Investors')

# Amount: fill missing values with the median of the existing amounts
data['Amount'] = data['Amount'].fillna(data['Amount'].median())

# Stage: fill missing values with the most frequent stage
data['Stage'] = data['Stage'].fillna(data['Stage'].mode()[0])

# drop() returns a new DataFrame; the result is not assigned here, so column10 remains in data
data.drop('column10', axis=1)

print(data)

     Company_Brand  Founded          HeadQuarter              Sector  \
0       Aqgromalin   2019.0              Chennai            AgriTech   
1         Krayonnz   2019.0            Bangalore              EdTech   
2     PadCare Labs   2018.0                 Pune  Hygiene management   
3            NCOME   2020.0            New Delhi              Escrow   
4       Gramophone   2016.0               Indore            AgriTech   
...            ...      ...                  ...                 ...   
1050  Leverage Edu   2016.0                Delhi              Edtech   
1051         EpiFi   2016.0  HeadQuarter Unknown             Fintech   
1052       Purplle   2012.0               Mumbai           Cosmetics   
1053        Shuttl   2015.0                Delhi           Transport   
1054         Pando   2017.0              Chennai            Logitech   

                                           What_it_does  \
0                          Cultivating Ideas for Profit   
1     An academy-

In [411]:
data.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,Series A,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,3000000.0,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,Series A,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,Series A,
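
The filled value of 3000000.0 for PadCare Labs is the median of the 2020 amounts. The choice matters here: the mean (about 113 million) is far above the median (3 million) because of a few very large rounds, the largest being 70 billion. A short sketch of the difference, using four amounts from the first rows plus that maximum:

```python
import pandas as pd

# Four amounts from the first rows, one missing value, and the reported maximum
amounts = pd.Series(
    [200_000, 100_000, None, 400_000, 340_000, 70_000_000_000],
    dtype="float64",
)

mean_fill = amounts.fillna(amounts.mean())      # pulled up by the single outlier
median_fill = amounts.fillna(amounts.median())  # stays near a typical round

filled_with_mean = mean_fill[2]
filled_with_median = median_fill[2]
```

With one extreme value in the column, the mean-based fill lands in the billions while the median-based fill stays at 340000.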


In [412]:
data.isna().sum()

Company_Brand       0
Founded             0
HeadQuarter         0
Sector              0
What_it_does        0
Founders            0
Investor            0
Amount              0
Stage               0
column10         1053
dtype: int64
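
`column10` still shows 1053 missing values because `DataFrame.drop` returns a new frame rather than modifying `data`, and the result was never assigned. A small sketch of the difference on a made-up two-row frame:

```python
import pandas as pd

frame = pd.DataFrame({"Amount": [200000.0, 100000.0], "column10": [None, None]})

frame.drop("column10", axis=1)           # result discarded: frame is unchanged
still_there = "column10" in frame.columns

frame = frame.drop(columns="column10")   # reassign to keep the result
removed = "column10" not in frame.columns
```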

In [413]:
query = "select * from dbo.LP1_startup_funding2021"
data1 = pd.read_sql(query, connection)
data1

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed
...,...,...,...,...,...,...,...,...,...
1204,Gigforce,2019.0,Gurugram,Staffing & Recruiting,A gig/on-demand staffing company.,"Chirag Mittal, Anirudh Syal",Endiya Partners,$3000000,Pre-series A
1205,Vahdam,2015.0,New Delhi,Food & Beverages,VAHDAM is among the world’s first vertically i...,Bala Sarda,IIFL AMC,$20000000,Series D
1206,Leap Finance,2019.0,Bangalore,Financial Services,International education loans for high potenti...,"Arnav Kumar, Vaibhav Singh",Owl Ventures,$55000000,Series C
1207,CollegeDekho,2015.0,Gurugram,EdTech,"Collegedekho.com is Student’s Partner, Friend ...",Ruchir Arora,"Winter Capital, ETS, Man Capital",$26000000,Series B


In [414]:
data1.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed


In [415]:
data1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1209 non-null   object 
 1   Founded        1208 non-null   float64
 2   HeadQuarter    1208 non-null   object 
 3   Sector         1209 non-null   object 
 4   What_it_does   1209 non-null   object 
 5   Founders       1205 non-null   object 
 6   Investor       1147 non-null   object 
 7   Amount         1206 non-null   object 
 8   Stage          781 non-null    object 
dtypes: float64(1), object(8)
memory usage: 85.1+ KB


In [416]:
data1.columns

Index(['Company_Brand', 'Founded', 'HeadQuarter', 'Sector', 'What_it_does',
       'Founders', 'Investor', 'Amount', 'Stage'],
      dtype='object')

In [417]:
data1.describe()

Unnamed: 0,Founded
count,1208.0
mean,2016.655629
std,4.517364
min,1963.0
25%,2015.0
50%,2018.0
75%,2020.0
max,2021.0


In [418]:
data1.isna().sum()

Company_Brand      0
Founded            1
HeadQuarter        1
Sector             0
What_it_does       0
Founders           4
Investor          62
Amount             3
Stage            428
dtype: int64

In [419]:
data1['Amount'] = data1['Amount'].astype(str)


In [420]:

# Strip '$' and ',' as literal characters and convert to float; non-numeric values become NaN
data1['Amount'] = pd.to_numeric(data1['Amount'].str.replace('$', '', regex=False).str.replace(',', '', regex=False), errors='coerce')


In [421]:

# Founded: replace missing values with the median year
data1['Founded'] = data1['Founded'].fillna(data1['Founded'].median())

# HeadQuarter: label missing values explicitly
data1['HeadQuarter'] = data1['HeadQuarter'].fillna('HeadQuarter Unknown')

# Founders: label missing values explicitly
data1['Founders'] = data1['Founders'].fillna('Unknown Founders')

# Investor: fill missing values with a placeholder label
data1['Investor'] = data1['Investor'].fillna('Various Investors')

# Amount: fill missing values with the median of the existing amounts
data1['Amount'] = data1['Amount'].fillna(data1['Amount'].median())

# Stage: fill missing values with the most frequent stage
data1['Stage'] = data1['Stage'].fillna(data1['Stage'].mode()[0])

print(data1)

       Company_Brand  Founded HeadQuarter                 Sector  \
0     Unbox Robotics   2019.0   Bangalore             AI startup   
1             upGrad   2015.0      Mumbai                 EdTech   
2        Lead School   2012.0      Mumbai                 EdTech   
3            Bizongo   2015.0      Mumbai         B2B E-commerce   
4           FypMoney   2021.0    Gurugram                FinTech   
...              ...      ...         ...                    ...   
1204        Gigforce   2019.0    Gurugram  Staffing & Recruiting   
1205          Vahdam   2015.0   New Delhi       Food & Beverages   
1206    Leap Finance   2019.0   Bangalore     Financial Services   
1207    CollegeDekho   2015.0    Gurugram                 EdTech   
1208          WeRize   2019.0   Bangalore     Financial Services   

                                           What_it_does  \
0     Unbox Robotics builds on-demand AI-driven ware...   
1        UpGrad is an online higher education platform.   
2     

In [422]:
data2 = pd.read_csv("startup_funding2018.csv")
data2

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...
...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000,"Bangalore, Karnataka, India","Udaan is a B2B trade platform, designed specif..."
522,Happyeasygo Group,"Tourism, Travel",Series A,—,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...
524,Droni Tech,Information Technology,Seed,"₹35,000,000","Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...


In [423]:
data2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
dtypes: object(6)
memory usage: 24.8+ KB


In [424]:
data2.shape

(526, 6)

In [425]:
data2.dtypes

Company Name     object
Industry         object
Round/Series     object
Amount           object
Location         object
About Company    object
dtype: object

In [426]:
data2.describe().T

Unnamed: 0,count,unique,top,freq
Company Name,526,525,TheCollegeFever,2
Industry,526,405,—,30
Round/Series,526,21,Seed,280
Amount,526,198,—,148
Location,526,50,"Bangalore, Karnataka, India",102
About Company,526,524,"TheCollegeFever is a hub for fun, fiesta and f...",2
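
The summary shows 525 unique company names in 526 rows, with `TheCollegeFever` appearing twice, which suggests a repeated record. Whether the two rows are exact duplicates cannot be read from this output, so the sketch below uses a made-up three-row sample to show how such rows could be inspected and removed:

```python
import pandas as pd

# Illustrative sample, not the real 2018 file
sample_2018 = pd.DataFrame({
    "Company Name": ["TheCollegeFever", "Happy Cow Dairy", "TheCollegeFever"],
    "Round/Series": ["Seed", "Seed", "Seed"],
    "Amount": ["250000", "₹40,000,000", "250000"],
})

# Show every row that has an exact copy elsewhere in the frame
duplicate_rows = sample_2018[sample_2018.duplicated(keep=False)]

# Keep the first occurrence of each fully identical row
deduplicated = sample_2018.drop_duplicates().reset_index(drop=True)
```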


In [427]:
data2.isna().sum()

Company Name     0
Industry         0
Round/Series     0
Amount           0
Location         0
About Company    0
dtype: int64
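
The 2018 `Location` values have the form "City, State, Country", while the other years store only a city in `HeadQuarter`. One possible way to make them comparable, assuming the city is always the first comma-separated part:

```python
import pandas as pd

# Two values taken from the 2018 preview above
locations = pd.Series(["Bangalore, Karnataka, India", "Mumbai, Maharashtra, India"])

# Keep only the text before the first comma
cities = locations.str.split(",").str[0].str.strip()
```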

In [428]:
data2['Amount'] = data2['Amount'].astype(str)

In [429]:
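# Strip '$' and ',' as literal characters and convert to float.
# Amounts prefixed with ₹ and the '—' placeholder are not numeric after this step and become NaN.
data2['Amount'] = pd.to_numeric(data2['Amount'].str.replace('$', '', regex=False).str.replace(',', '', regex=False), errors='coerce')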
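
Several 2018 amounts are quoted in rupees (for example `₹40,000,000`), and stripping only `$` and `,` turns them into NaN. A hedged sketch of one way to keep them, under two assumptions that are not confirmed by the dataset: unmarked amounts are in US dollars, and a flat rate of 70 rupees per dollar is acceptable. The name `amount_to_usd` and the constant `INR_PER_USD` are made up for illustration:

```python
import pandas as pd

INR_PER_USD = 70.0  # assumed illustrative rate, not taken from the dataset


def amount_to_usd(value, inr_per_usd=INR_PER_USD):
    """Convert one raw 2018 Amount string to a float in USD (NaN if unparseable)."""
    text = str(value).strip()
    is_rupee = text.startswith("₹")
    digits = text.replace("₹", "").replace("$", "").replace(",", "")
    try:
        number = float(digits)
    except ValueError:
        return float("nan")  # placeholders such as '—' end up here
    return number / inr_per_usd if is_rupee else number


# Values taken from the 2018 preview above
raw_2018 = pd.Series(["250000", "₹40,000,000", "—", "2000000"])
usd_2018 = raw_2018.apply(amount_to_usd)
```

A per-year or per-date exchange rate would be more accurate; the flat rate only keeps the sketch short.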


In [430]:
data3 = pd.read_csv("startup_funding2019.csv")
data3

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",
...,...,...,...,...,...,...,...,...,...
84,Infra.Market,,Mumbai,Infratech,It connects client requirements to their suppl...,"Aaditya Sharda, Souvik Sengupta","Tiger Global, Nexus Venture Partners, Accel Pa...","$20,000,000",Series A
85,Oyo,2013.0,Gurugram,Hospitality,Provides rooms for comfortable stay,Ritesh Agarwal,"MyPreferred Transformation, Avendus Finance, S...","$693,000,000",
86,GoMechanic,2016.0,Delhi,Automobile & Technology,Find automobile repair and maintenance service...,"Amit Bhasin, Kushal Karwa, Nitin Rana, Rishabh...",Sequoia Capital,"$5,000,000",Series B
87,Spinny,2015.0,Delhi,Automobile,Online car retailer,"Niraj Singh, Ramanshu Mahaur, Ganesh Pawar, Mo...","Norwest Venture Partners, General Catalyst, Fu...","$50,000,000",


In [431]:
data3.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",


In [432]:
data3.tail()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
84,Infra.Market,,Mumbai,Infratech,It connects client requirements to their suppl...,"Aaditya Sharda, Souvik Sengupta","Tiger Global, Nexus Venture Partners, Accel Pa...","$20,000,000",Series A
85,Oyo,2013.0,Gurugram,Hospitality,Provides rooms for comfortable stay,Ritesh Agarwal,"MyPreferred Transformation, Avendus Finance, S...","$693,000,000",
86,GoMechanic,2016.0,Delhi,Automobile & Technology,Find automobile repair and maintenance service...,"Amit Bhasin, Kushal Karwa, Nitin Rana, Rishabh...",Sequoia Capital,"$5,000,000",Series B
87,Spinny,2015.0,Delhi,Automobile,Online car retailer,"Niraj Singh, Ramanshu Mahaur, Ganesh Pawar, Mo...","Norwest Venture Partners, General Catalyst, Fu...","$50,000,000",
88,Ess Kay Fincorp,,Rajasthan,Banking,Organised Non-Banking Finance Company,Rajendra Setia,"TPG, Norwest Venture Partners, Evolvence India","$33,000,000",


In [433]:
data3.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Founded,60.0,2014.533333,2.937003,2004.0,2013.0,2015.0,2016.25,2019.0


In [434]:
data3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     object 
 8   Stage          43 non-null     object 
dtypes: float64(1), object(8)
memory usage: 6.4+ KB


In [435]:
data3.dtypes

Company/Brand     object
Founded          float64
HeadQuarter       object
Sector            object
What it does      object
Founders          object
Investor          object
Amount($)         object
Stage             object
dtype: object

In [436]:
data3.shape

(89, 9)

In [437]:
data3.isnull().sum()

Company/Brand     0
Founded          29
HeadQuarter      19
Sector            5
What it does      0
Founders          3
Investor          0
Amount($)         0
Stage            46
dtype: int64

In [438]:
data3['Amount($)'] = data3['Amount($)'].astype(str)

In [439]:
# Strip '$' and ',' as literal characters and convert to float; non-numeric values become NaN
data3['Amount($)'] = pd.to_numeric(data3['Amount($)'].str.replace('$', '', regex=False).str.replace(',', '', regex=False), errors='coerce')


In [440]:


# Founded: replace missing values with the median year
data3['Founded'] = data3['Founded'].fillna(data3['Founded'].median())

# Sector: replace missing values with the most frequent sector
data3['Sector'] = data3['Sector'].fillna(data3['Sector'].mode()[0])

# HeadQuarter: label missing values explicitly
data3['HeadQuarter'] = data3['HeadQuarter'].fillna('HeadQuarter Unknown')

# Founders: label missing values explicitly
data3['Founders'] = data3['Founders'].fillna('Unknown Founders')

# Stage: fill missing values with the most frequent stage
data3['Stage'] = data3['Stage'].fillna(data3['Stage'].mode()[0])

print(data3)

      Company/Brand  Founded          HeadQuarter                   Sector  \
0    Bombay Shaving   2015.0  HeadQuarter Unknown                Ecommerce   
1         Ruangguru   2014.0               Mumbai                   Edtech   
2          Eduisfun   2015.0               Mumbai                   Edtech   
3          HomeLane   2014.0              Chennai          Interior design   
4          Nu Genes   2004.0            Telangana                 AgriTech   
..              ...      ...                  ...                      ...   
84     Infra.Market   2015.0               Mumbai                Infratech   
85              Oyo   2013.0             Gurugram              Hospitality   
86       GoMechanic   2016.0                Delhi  Automobile & Technology   
87           Spinny   2015.0                Delhi               Automobile   
88  Ess Kay Fincorp   2015.0            Rajasthan                  Banking   

                                         What it does  \
0     
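
None of the four tables carries the funding year as a column, so once they are combined the year can no longer be recovered from the rows. Since the first business question is about trends from 2018 to 2021, it may help to tag each table with its year before concatenating. A minimal sketch on two one-row stand-ins for the real tables; the `Year` column name is an assumption:

```python
import pandas as pd

# One-row stand-ins for the 2020 and 2021 tables
funding_2020 = pd.DataFrame({"Company": ["Aqgromalin"], "Amount": [200000.0]})
funding_2021 = pd.DataFrame({"Company": ["Unbox Robotics"], "Amount": [1200000.0]})

# Map each funding year to its table, then add the year as a column
yearly_frames = {2020: funding_2020, 2021: funding_2021}
tagged = [frame.assign(Year=year) for year, frame in yearly_frames.items()]

combined = pd.concat(tagged, ignore_index=True)
totals_by_year = combined.groupby("Year")["Amount"].sum()
```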

In [441]:
# Define a common schema for renaming
common_schema = {
    'Company_Brand': 'Company',
    'Company Name': 'Company',
    'Company/Brand': 'Company',
    'Founded': 'Founded',
    'HeadQuarter': 'Headquarter',
    'Sector': 'Sector',
    'Industry': 'Sector',
    'What_it_does': 'Description',
    'What it does': 'Description',
    'About Company': 'Description',
    'Founders': 'Founders',
    'Investor': 'Investor',
    'Amount': 'Amount',
    'Amount($)': 'Amount',
    'Stage': 'Stage',
    'Round/Series': 'Stage',
    'Location': 'Headquarter',
    'column10': 'column10'  # Assuming this is an additional column not present in all datasets
}

# Rename columns in each dataset
data.rename(columns=common_schema, inplace=True)
data1.rename(columns=common_schema, inplace=True)
data2.rename(columns=common_schema, inplace=True)
data3.rename(columns=common_schema, inplace=True)

# Concatenate datasets
df = pd.concat([data, data1, data2, data3], ignore_index=True)

# Display the result
print(df)


              Company  Founded Headquarter                   Sector  \
0          Aqgromalin   2019.0     Chennai                 AgriTech   
1            Krayonnz   2019.0   Bangalore                   EdTech   
2        PadCare Labs   2018.0        Pune       Hygiene management   
3               NCOME   2020.0   New Delhi                   Escrow   
4          Gramophone   2016.0      Indore                 AgriTech   
...               ...      ...         ...                      ...   
2874     Infra.Market   2015.0      Mumbai                Infratech   
2875              Oyo   2013.0    Gurugram              Hospitality   
2876       GoMechanic   2016.0       Delhi  Automobile & Technology   
2877           Spinny   2015.0       Delhi               Automobile   
2878  Ess Kay Fincorp   2015.0   Rajasthan                  Banking   

                                            Description  \
0                          Cultivating Ideas for Profit   
1     An academy-guardian-sch