# Unicorn Startups Analysis
### Naman Goyal

## About the Dataset
The Unicorn Startups Dataset lists unicorns: privately held companies valued at $1 billion or more. <br>
For each company it records the industry, the date it joined the unicorn list, the country and city, the key investors, and the current valuation. It spans sectors such as technology, finance, e-commerce, healthcare, and transportation.

## Importing the basic libraries and Reading the Dataset

In [27]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [28]:
data=pd.read_csv('Unicorn.csv')

## Data description

In [29]:
data

| | Company | Valuation ($B) | Date Joined | Country | City | Industry | Investors |
|---|---|---|---|---|---|---|---|
| 0 | ByteDance | $140 | 04-07-2017 | China | Beijing | Artificial intelligence | Sequoia Capital China, SIG Asia Investments, S... |
| 1 | SpaceX | $127 | 12-01-2012 | United States | Hawthorne | Other | Founders Fund, Draper Fisher Jurvetson, Rothen... |
| 2 | SHEIN | $100 | 07-03-2018 | China | Shenzhen | E-commerce & direct-to-consumer | Tiger Global Management, Sequoia Capital China... |
| 3 | Stripe | $95 | 1/23/2014 | United States | San Francisco | Fintech | Khosla Ventures, LowercaseCapital, capitalG |
| 4 | Canva | $40 | 01-08-2018 | Australia | Surry Hills | Internet software & services | Sequoia Capital China, Blackbird Ventures, Mat... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1181 | LeadSquared | $1 | 6/21/2022 | India | Bengaluru | Internet software & services | Gaja Capital Partners, Stakeboat Capital, West... |
| 1182 | FourKites | $1 | 6/21/2022 | United States | Chicago | Supply chain, logistics, & delivery | Hyde Park Venture Partners, Bain Capital Ventu... |
| 1183 | VulcanForms | $1 | 07-05-2022 | United States | Burlington | Supply chain, logistics, & delivery | Eclipse Ventures, D1 Capital Partners, Industr... |
| 1184 | SingleStore | $1 | 07-12-2022 | United States | San Francisco | Data management & analytics | Google Ventures, Accel, Data Collective |


In [30]:
data.shape

(1186, 7)

In [31]:
data.columns

Index(['Company', 'Valuation ($B)', 'Date Joined', 'Country', 'City',
       'Industry', 'Investors'],
      dtype='object')

In [32]:
data.dtypes

Company           object
Valuation ($B)    object
Date Joined       object
Country           object
City              object
Industry          object
Investors         object
dtype: object

## Data Cleaning
### Checking the null and duplicate values

In [33]:
data['Industry'].unique()

array(['Artificial intelligence', 'Other',
       'E-commerce & direct-to-consumer', 'Fintech',
       'Internet software & services',
       'Supply chain, logistics, & delivery',
       'Data management & analytics',
       'Sequoia Capital, Thoma Bravo, Softbank', 'Edtech', 'Hardware',
       'Consumer & retail', 'Health', 'Auto & transportation',
       'Cybersecurity', 'Mobile & telecommunications', 'Travel',
       'Kuang-Chi',
       'Tiger Global Management, Tiger Brokers, DCM Ventures',
       'Jungle Ventures, Accel, Venture Highway',
       'Artificial Intelligence', 'GIC. Apis Partners, Insight Partners',
       'Vision Plus Capital, GSR Ventures, ZhenFund',
       'Hopu Investment Management, Boyu Capital, DC Thomson Ventures',
       'Internet', '500 Global, Rakuten Ventures, Golden Gate Ventures',
       'Sequoia Capital China, ING, Alibaba Entrepreneurs Fund',
       'Sequoia Capital China, Shunwei Capital Partners, Qualgro',
       'Dragonfly Captial, Qiming Venture Pa

In [34]:
missing_Investor=data['Investors'].isnull()

In [35]:
data.loc[missing_Investor, 'Investors'] = data.loc[missing_Investor, 'Industry']
data.loc[missing_Investor, 'Industry'] = data.loc[missing_Investor, 'City']
data.loc[missing_Investor, 'City'] = np.nan
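
The three assignments above repair rows whose values slid one column to the left because the city was missing. A minimal sketch of the same repair on a toy frame may make the order of operations clearer; the company, city, and fund names here are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame (illustrative values): row 1 has no city, so its industry landed
# in "City" and its investors landed in "Industry".
toy = pd.DataFrame({
    'Company':   ['A', 'B'],
    'City':      ['Beijing', 'Fintech'],
    'Industry':  ['Hardware', 'Fund X, Fund Y'],
    'Investors': ['Fund Z', np.nan],
})

shifted = toy['Investors'].isnull()

# Move each value one column to the right, starting with the rightmost
# column so that nothing is overwritten before it has been copied.
toy.loc[shifted, 'Investors'] = toy.loc[shifted, 'Industry']
toy.loc[shifted, 'Industry'] = toy.loc[shifted, 'City']
toy.loc[shifted, 'City'] = np.nan

print(toy)
```

The order matters: filling `Industry` before `Investors` would lose the investor string.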

In [36]:
data['Industry'].unique()

array(['Artificial intelligence', 'Other',
       'E-commerce & direct-to-consumer', 'Fintech',
       'Internet software & services',
       'Supply chain, logistics, & delivery',
       'Data management & analytics', 'Edtech', 'Hardware',
       'Consumer & retail', 'Health', 'Auto & transportation',
       'Cybersecurity', 'Mobile & telecommunications', 'Travel',
       'Artificial Intelligence', 'Internet', 'Shanghai'], dtype=object)

In [37]:
# Normalise the label's capitalisation in the Industry column only
data['Industry'] = data['Industry'].replace({'Artificial intelligence': 'Artificial Intelligence'})

In [38]:
data['Industry'].unique()

array(['Artificial Intelligence', 'Other',
       'E-commerce & direct-to-consumer', 'Fintech',
       'Internet software & services',
       'Supply chain, logistics, & delivery',
       'Data management & analytics', 'Edtech', 'Hardware',
       'Consumer & retail', 'Health', 'Auto & transportation',
       'Cybersecurity', 'Mobile & telecommunications', 'Travel',
       'Internet', 'Shanghai'], dtype=object)
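
The output above still lists `Shanghai`, which is a city rather than an industry, and `Internet` alongside `Internet software & services`. One possible way to surface such leftovers, sketched on toy data under the assumption that a misplaced label also occurs somewhere in the `City` column:

```python
import pandas as pd

# Illustrative data only; in the notebook the frame is `data`.
toy = pd.DataFrame({
    'City':     ['Beijing', 'Shanghai', None, 'Chicago'],
    'Industry': ['Artificial intelligence', 'Artificial Intelligence',
                 'Shanghai', 'Fintech'],
})

# Industry labels that also appear as a city are probably misplaced values.
cities = set(toy['City'].dropna())
suspect = toy.loc[toy['Industry'].isin(cities), 'Industry']

# Labels that differ only by letter case share one key once lowercased.
labels = pd.Series(toy['Industry'].unique())
variants = labels.groupby(labels.str.lower()).agg(list)
variants = variants[variants.map(len) > 1]

print(suspect.tolist())
print(variants)
```

Rows flagged this way still need a manual decision about the correct industry; the sketch only finds them.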

In [39]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1186 entries, 0 to 1185
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Company         1186 non-null   object
 1   Valuation ($B)  1186 non-null   object
 2   Date Joined     1186 non-null   object
 3   Country         1186 non-null   object
 4   City            1168 non-null   object
 5   Industry        1186 non-null   object
 6   Investors       1186 non-null   object
dtypes: object(7)
memory usage: 65.0+ KB
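
The `info()` summary reports non-null counts, but it does not show duplicates. A short sketch of explicit null and duplicate checks, using a toy frame with made-up values in place of `data`:

```python
import numpy as np
import pandas as pd

# Illustrative frame: one missing city and one fully repeated row.
toy = pd.DataFrame({
    'Company': ['A', 'B', 'B', 'C'],
    'City':    ['Beijing', 'Paris', 'Paris', np.nan],
})

null_counts = toy.isnull().sum()                      # missing values per column
duplicate_rows = toy.duplicated().sum()               # rows identical to an earlier row
duplicate_names = toy['Company'].duplicated().sum()   # repeated company names

print(null_counts)
print(duplicate_rows, duplicate_names)
```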


### Converting column "Date Joined" to "Year"

In [40]:
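# Keep only the year in which each company joined the unicorn list
data['Year'] = pd.to_datetime(data['Date Joined']).dt.year
data = data.drop('Date Joined', axis=1)
# Strip the literal dollar sign; regex=False stops '$' being read as the end-of-string anchor
data['Valuation ($B)'] = data['Valuation ($B)'].str.replace('$', '', regex=False)
data['Valuation ($B)'] = data['Valuation ($B)'].astype(float)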
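
The sample rows mix two date layouts (`04-07-2017` and `1/23/2014`), which some pandas versions may refuse to parse in a single `to_datetime` call. Since only the year is needed, an alternative sketch is to read the trailing four digits directly. This assumes every date string ends in a four-digit year, as the rows shown do.

```python
import pandas as pd

# Date strings copied from the rows displayed earlier in the notebook.
dates = pd.Series(['04-07-2017', '1/23/2014', '07-12-2022'])

# Capture the four digits at the end of each string and convert to int.
year = dates.str.extract(r'(\d{4})$', expand=False).astype(int)

print(year.tolist())
```

This sidesteps the day-first versus month-first ambiguity, which does not affect the year.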

In [41]:
data

| | Company | Valuation ($B) | Country | City | Industry | Investors | Year |
|---|---|---|---|---|---|---|---|
| 0 | ByteDance | 140.0 | China | Beijing | Artificial Intelligence | Sequoia Capital China, SIG Asia Investments, S... | 2017 |
| 1 | SpaceX | 127.0 | United States | Hawthorne | Other | Founders Fund, Draper Fisher Jurvetson, Rothen... | 2012 |
| 2 | SHEIN | 100.0 | China | Shenzhen | E-commerce & direct-to-consumer | Tiger Global Management, Sequoia Capital China... | 2018 |
| 3 | Stripe | 95.0 | United States | San Francisco | Fintech | Khosla Ventures, LowercaseCapital, capitalG | 2014 |
| 4 | Canva | 40.0 | Australia | Surry Hills | Internet software & services | Sequoia Capital China, Blackbird Ventures, Mat... | 2018 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1181 | LeadSquared | 1.0 | India | Bengaluru | Internet software & services | Gaja Capital Partners, Stakeboat Capital, West... | 2022 |
| 1182 | FourKites | 1.0 | United States | Chicago | Supply chain, logistics, & delivery | Hyde Park Venture Partners, Bain Capital Ventu... | 2022 |
| 1183 | VulcanForms | 1.0 | United States | Burlington | Supply chain, logistics, & delivery | Eclipse Ventures, D1 Capital Partners, Industr... | 2022 |
| 1184 | SingleStore | 1.0 | United States | San Francisco | Data management & analytics | Google Ventures, Accel, Data Collective | 2022 |


### Splitting column "Investors"

In [42]:
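# Split the comma-separated investor list into one column per investor, trimming stray spaces
investors = data['Investors'].str.split(',', expand=True).apply(lambda col: col.str.strip())
data = pd.concat([data.drop('Investors', axis=1), investors], axis=1)
data = data.rename(columns={0: 'Investor1', 1: 'Investor2', 2: 'Investor3', 3: 'Investor4'})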
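
Wide `Investor1` to `Investor4` columns are convenient to read, but counting how often a fund appears is easier in long form, with one row per company and investor pair. A sketch of that alternative on toy data (the fund names are invented, and `long_form` is a hypothetical name not used elsewhere in the notebook):

```python
import pandas as pd

# Illustrative frame with a comma-separated investor string per company.
toy = pd.DataFrame({
    'Company':   ['A', 'B'],
    'Investors': ['Fund X, Fund Y', 'Fund Y, Fund Z, Fund W'],
})

# Split each string into a list, then give every list element its own row.
long_form = (
    toy.assign(Investor=toy['Investors'].str.split(','))
       .explode('Investor')
       .drop(columns='Investors')
       .reset_index(drop=True)
)
long_form['Investor'] = long_form['Investor'].str.strip()  # drop spaces left after the commas

counts = long_form['Investor'].value_counts()
print(counts)
```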

In [43]:
data

| | Company | Valuation ($B) | Country | City | Industry | Year | Investor1 | Investor2 | Investor3 | Investor4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ByteDance | 140.0 | China | Beijing | Artificial Intelligence | 2017 | Sequoia Capital China | SIG Asia Investments | Sina Weibo | Softbank Group |
| 1 | SpaceX | 127.0 | United States | Hawthorne | Other | 2012 | Founders Fund | Draper Fisher Jurvetson | Rothenberg Ventures | |
| 2 | SHEIN | 100.0 | China | Shenzhen | E-commerce & direct-to-consumer | 2018 | Tiger Global Management | Sequoia Capital China | Shunwei Capital Partners | |
| 3 | Stripe | 95.0 | United States | San Francisco | Fintech | 2014 | Khosla Ventures | LowercaseCapital | capitalG | |
| 4 | Canva | 40.0 | Australia | Surry Hills | Internet software & services | 2018 | Sequoia Capital China | Blackbird Ventures | Matrix Partners | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1181 | LeadSquared | 1.0 | India | Bengaluru | Internet software & services | 2022 | Gaja Capital Partners | Stakeboat Capital | WestBridge Capital | |
| 1182 | FourKites | 1.0 | United States | Chicago | Supply chain, logistics, & delivery | 2022 | Hyde Park Venture Partners | Bain Capital Ventures | Hyde Park Angels | |
| 1183 | VulcanForms | 1.0 | United States | Burlington | Supply chain, logistics, & delivery | 2022 | Eclipse Ventures | D1 Capital Partners | Industry Ventures | |
| 1184 | SingleStore | 1.0 | United States | San Francisco | Data management & analytics | 2022 | Google Ventures | Accel | Data Collective | |


In [ ]:
data.Industry.value_counts()

### Converting data type of Valuation to float

In [44]:
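# The column is already numeric after the earlier conversion; this cell repeats it defensively
data['Valuation ($B)'] = data['Valuation ($B)'].astype(str)
# regex=False treats '$' as a literal character rather than the end-of-string anchor
data['Valuation ($B)'] = data['Valuation ($B)'].str.replace('$', '', regex=False)
data['Valuation ($B)'] = data['Valuation ($B)'].astype(float)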
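
In a regular expression `$` anchors the end of the string, so whether a bare `'$'` pattern removes the dollar sign can depend on how the pandas version in use interprets it. A small sketch of two unambiguous ways to strip the symbol, using values in the same form as the raw column:

```python
import pandas as pd

raw = pd.Series(['$140', '$1'])

# Option 1: treat the pattern as a plain string.
literal = raw.str.replace('$', '', regex=False).astype(float)

# Option 2: keep regex matching but escape the metacharacter.
escaped = raw.str.replace(r'\$', '', regex=True).astype(float)

print(literal.tolist(), escaped.tolist())
```

Both produce the same floats; the literal form is usually easier to read.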

In [45]:
data.describe()

| | Valuation ($B) | Year |
|---|---|---|
| count | 1186.0 | 1186.0 |
| mean | 3.251282 | 2020.12226 |
| std | 7.641574 | 1.984172 |
| min | 1.0 | 2007.0 |
| 25% | 1.1 | 2019.0 |
| 50% | 1.6 | 2021.0 |
| 75% | 3.0 | 2021.0 |
| max | 140.0 | 2022.0 |


In [46]:
data

| | Company | Valuation ($B) | Country | City | Industry | Year | Investor1 | Investor2 | Investor3 | Investor4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ByteDance | 140.0 | China | Beijing | Artificial Intelligence | 2017 | Sequoia Capital China | SIG Asia Investments | Sina Weibo | Softbank Group |
| 1 | SpaceX | 127.0 | United States | Hawthorne | Other | 2012 | Founders Fund | Draper Fisher Jurvetson | Rothenberg Ventures | |
| 2 | SHEIN | 100.0 | China | Shenzhen | E-commerce & direct-to-consumer | 2018 | Tiger Global Management | Sequoia Capital China | Shunwei Capital Partners | |
| 3 | Stripe | 95.0 | United States | San Francisco | Fintech | 2014 | Khosla Ventures | LowercaseCapital | capitalG | |
| 4 | Canva | 40.0 | Australia | Surry Hills | Internet software & services | 2018 | Sequoia Capital China | Blackbird Ventures | Matrix Partners | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1181 | LeadSquared | 1.0 | India | Bengaluru | Internet software & services | 2022 | Gaja Capital Partners | Stakeboat Capital | WestBridge Capital | |
| 1182 | FourKites | 1.0 | United States | Chicago | Supply chain, logistics, & delivery | 2022 | Hyde Park Venture Partners | Bain Capital Ventures | Hyde Park Angels | |
| 1183 | VulcanForms | 1.0 | United States | Burlington | Supply chain, logistics, & delivery | 2022 | Eclipse Ventures | D1 Capital Partners | Industry Ventures | |
| 1184 | SingleStore | 1.0 | United States | San Francisco | Data management & analytics | 2022 | Google Ventures | Accel | Data Collective | |


In [47]:
data.to_csv('Cleaned_Unicorn.csv',index=False)
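
After exporting, it can be worth reading the file back to confirm that the columns and numeric values survive the round trip. A minimal sketch with a toy frame and a temporary file; the file name `cleaned_toy.csv` is hypothetical and separate from the notebook's `Cleaned_Unicorn.csv`.

```python
import os
import tempfile

import pandas as pd

# Illustrative frame with the same kinds of columns as the cleaned data.
toy = pd.DataFrame({
    'Company': ['A', 'B'],
    'Valuation ($B)': [140.0, 1.0],
    'Year': [2017, 2022],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'cleaned_toy.csv')
    toy.to_csv(path, index=False)   # index=False avoids writing an extra unnamed column
    restored = pd.read_csv(path)

print(restored.dtypes)
```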