# **Market Intelligence: Analyzing Unicorns and Investment Opportunities**

## **Business Understanding**

### Project Background

**NovaCapital Ventures** is a Venture Capital firm with 15 years of experience, focused on identifying and investing in innovative startups in the areas of technology, fintech and health. The company manages a diversified portfolio, with a strong presence in Latin America, and is looking to expand its investments in Brazil.

**NovaCapital's** business model involves initial funding rounds, with the aim of obtaining significant returns at advanced stages of growth, through IPOs or acquisitions.

NovaCapital wants us to analyze trends and provide industry insights and recommendations for future investments.

### Assumptions

1. No company appears more than once in the dataset, although distinct companies may share the same name.
2. The dataset does not list investors for LinkSure Network. We obtained this information externally and inserted it into the dataset.

## **Data Understanding**

### Analysis Preparation

Here we'll import all the libraries we'll use in the analysis.

In [97]:
import pandas as pd

### Data Structure

For this analysis, we were given a CSV file. Its fields are described below:

| Field            | Description                                                                              |
|------------------|------------------------------------------------------------------------------------------|
| Company          | Company Name                                                                             |
| Valuation        | Company valuation in billions (B) of dollars                                             |
| Date Joined      | Date on which the company reached $1 billion in valuation                                |
| Industry         | Company industry                                                                         |
| City             | City the company was founded in                                                          |
| Country          | Country the company was founded in                                                       |
| Continent        | Continent the company was founded in                                                     |
| Year Founded     | Year the company was founded                                                             |
| Funding          | Total amount raised across all funding rounds in billions (B) or millions (M) of dollars |
| Select Investors | Top 4 investing firms or individual investors (some have fewer than 4)                   |


### Data Loading And Overview

Let's load the dataset and get a brief overview of the data.

In [98]:
# Loading the dataset
df: pd.DataFrame = pd.read_csv("data/Unicorn_Companies.csv")

In [99]:
# Dataset dimensions
print(f"Number of columns: {df.shape[1]}")
print(f"Number of rows: {df.shape[0]}")

Number of columns: 10
Number of rows: 1074


In [100]:
# Overview of the first rows
df.head()

Unnamed: 0,Company,Valuation,Date Joined,Industry,City,Country,Continent,Year Founded,Funding,Select Investors
0,Bytedance,$180B,2017-04-07,Artificial intelligence,Beijing,China,Asia,2012,$8B,"Sequoia Capital China, SIG Asia Investments, S..."
1,SpaceX,$100B,2012-12-01,Other,Hawthorne,United States,North America,2002,$7B,"Founders Fund, Draper Fisher Jurvetson, Rothen..."
2,SHEIN,$100B,2018-07-03,E-commerce & direct-to-consumer,Shenzhen,China,Asia,2008,$2B,"Tiger Global Management, Sequoia Capital China..."
3,Stripe,$95B,2014-01-23,Fintech,San Francisco,United States,North America,2010,$2B,"Khosla Ventures, LowercaseCapital, capitalG"
4,Klarna,$46B,2011-12-12,Fintech,Stockholm,Sweden,Europe,2005,$4B,"Institutional Venture Partners, Sequoia Capita..."


In [101]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1074 entries, 0 to 1073
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Company           1074 non-null   object
 1   Valuation         1074 non-null   object
 2   Date Joined       1074 non-null   object
 3   Industry          1074 non-null   object
 4   City              1058 non-null   object
 5   Country           1074 non-null   object
 6   Continent         1074 non-null   object
 7   Year Founded      1074 non-null   int64 
 8   Funding           1074 non-null   object
 9   Select Investors  1073 non-null   object
dtypes: int64(1), object(9)
memory usage: 84.0+ KB


**Insights about the data**

1. The dataset has 10 columns and 1074 rows.
2. Every column is of type **object** except `Year Founded`, which is **int64**.
3. The `City` column has 16 null values and the `Select Investors` column has one.
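
Because `Valuation`, `Funding`, and `Date Joined` are stored as strings, they need converting before any numeric or time-based analysis. The following is a minimal sketch of one possible conversion, using a few rows copied from the preview above. The `parse_money` helper and the new column names are hypothetical and not part of the original notebook, and values that do not match the `$<number><B|M>` pattern are assumed to become `NaN`.

```python
import re

import pandas as pd

# A few rows copied from the notebook outputs, for illustration only
sample = pd.DataFrame({
    "Company": ["Bytedance", "HyalRoute", "LinkSure Network"],
    "Valuation": ["$180B", "$4B", "$1B"],
    "Date Joined": ["2017-04-07", "2020-05-26", "2015-01-01"],
    "Funding": ["$8B", "$263M", "$52M"],
})

# Divisor that turns each unit into billions of dollars
DIVISOR_TO_BILLIONS = {"B": 1, "M": 1000}
MONEY_PATTERN = re.compile(r"^\$(\d+(?:\.\d+)?)([BM])$")


def parse_money(value) -> float:
    """Hypothetical helper: '$263M' -> 0.263 (billions); NaN if unparseable."""
    match = MONEY_PATTERN.match(str(value).strip())
    if match is None:
        return float("nan")
    number, unit = match.groups()
    return float(number) / DIVISOR_TO_BILLIONS[unit]


sample["Valuation (B)"] = sample["Valuation"].map(parse_money)
sample["Funding (B)"] = sample["Funding"].map(parse_money)
sample["Date Joined"] = pd.to_datetime(sample["Date Joined"])

print(sample.dtypes)
```

Expressing both monetary columns in billions keeps them directly comparable.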

### Data Cleaning

The goal of this step is to:
* Treat missing data.
* Check for data inconsistencies (duplicates, inconsistent values).

#### Treating Missing Data

In [102]:
# Counting the number of missing values
df.isna().sum()

Company              0
Valuation            0
Date Joined          0
Industry             0
City                16
Country              0
Continent            0
Year Founded         0
Funding              0
Select Investors     1
dtype: int64

In [103]:
# Filtering the rows where City is null
df[df['City'].isnull()]

Unnamed: 0,Company,Valuation,Date Joined,Industry,City,Country,Continent,Year Founded,Funding,Select Investors
12,FTX,$32B,2021-07-20,Fintech,,Bahamas,North America,2018,$2B,"Sequoia Capital, Thoma Bravo, Softbank"
170,HyalRoute,$4B,2020-05-26,Mobile & telecommunications,,Singapore,Asia,2015,$263M,Kuang-Chi
242,Moglix,$3B,2021-05-17,E-commerce & direct-to-consumer,,Singapore,Asia,2015,$471M,"Jungle Ventures, Accel, Venture Highway"
251,Trax,$3B,2019-07-22,Artificial intelligence,,Singapore,Asia,2010,$1B,"Hopu Investment Management, Boyu Capital, DC T..."
325,Amber Group,$3B,2021-06-21,Fintech,,Hong Kong,Asia,2015,$328M,"Tiger Global Management, Tiger Brokers, DCM Ve..."
382,Ninja Van,$2B,2021-09-27,"Supply chain, logistics, & delivery",,Singapore,Asia,2014,$975M,"B Capital Group, Monk's Hill Ventures, Dynamic..."
541,Advance Intelligence Group,$2B,2021-09-23,Artificial intelligence,,Singapore,Asia,2016,$536M,"Vision Plus Capital, GSR Ventures, ZhenFund"
811,Carousell,$1B,2021-09-15,E-commerce & direct-to-consumer,,Singapore,Asia,2012,$288M,"500 Global, Rakuten Ventures, Golden Gate Vent..."
848,Matrixport,$1B,2021-06-01,Fintech,,Singapore,Asia,2019,$100M,"Dragonfly Captial, Qiming Venture Partners, DS..."
880,bolttech,$1B,2021-07-01,Fintech,,Singapore,Asia,2018,$210M,"Mundi Ventures, Doqling Capital Partners, Acti..."
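
A per-country count can summarize the filtered rows more compactly than the full table. The sketch below uses a handful of rows copied from the output above plus one complete row for contrast. The variable names are illustrative and not part of the original notebook.

```python
import pandas as pd

# Rows copied from the notebook outputs, for illustration only
sample = pd.DataFrame({
    "Company": ["FTX", "HyalRoute", "Moglix", "Amber Group", "Bytedance"],
    "City": [None, None, None, None, "Beijing"],
    "Country": ["Bahamas", "Singapore", "Singapore", "Hong Kong", "China"],
})

# Count how many rows with a missing City fall in each country
missing_city_by_country = (
    sample.loc[sample["City"].isna(), "Country"].value_counts()
)

print(missing_city_by_country)
```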


In [104]:
# Filtering the rows where Select Investors is null
df[df['Select Investors'].isnull()]

Unnamed: 0,Company,Valuation,Date Joined,Industry,City,Country,Continent,Year Founded,Funding,Select Investors
629,LinkSure Network,$1B,2015-01-01,Mobile & telecommunications,Shanghai,China,Asia,2013,$52M,


In [105]:
# Imputing the investors of LinkSure Network
linksure_investors: str = "Northern Light Venture Capital, Eight Roads Ventures"

df.loc[df['Company'] == "LinkSure Network", "Select Investors"] = linksure_investors

#### Case Standardization

In [106]:
# Case Standardization
df['Industry'] = df['Industry'].str.title()
df['City'] = df['City'].str.title()
df['Country'] = df['Country'].str.title()
df['Continent'] = df['Continent'].str.title()
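
One way to see whether case standardization actually merges any categories is to compare the number of unique values before and after. This is a minimal sketch on a toy Series. The second value is a hypothetical spelling variant added for illustration, not a value confirmed to be in the dataset.

```python
import pandas as pd

# Toy values; "Artificial Intelligence" is a hypothetical variant spelling
industry = pd.Series([
    "Artificial intelligence",
    "Artificial Intelligence",
    "E-commerce & direct-to-consumer",
    "Fintech",
])

standardized = industry.str.strip().str.title()

unique_before = industry.nunique()
unique_after = standardized.nunique()

print(f"Unique values before: {unique_before}")
print(f"Unique values after: {unique_after}")
print(standardized.unique())
```

Note that `str.title()` also capitalizes the letter after each hyphen, which is why the notebook shows `E-Commerce & Direct-To-Consumer`.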

#### Checking for Duplicates

In [107]:
# Checking for duplicated companies
print(f"Number of Companies Duplicated: {df['Company'].duplicated().sum()}")

# Checking for duplicated rows
print(f"Number of Rows Duplicated: {df.duplicated().sum()}")

Number of Companies Duplicated: 1
Number of Rows Duplicated: 0


In [108]:
# Checking the duplicated companies

df[df['Company'].duplicated(keep=False)]

Unnamed: 0,Company,Valuation,Date Joined,Industry,City,Country,Continent,Year Founded,Funding,Select Investors
40,Bolt,$11B,2018-05-29,Auto & Transportation,Tallinn,Estonia,Europe,2013,$1B,"Didi Chuxing, Diamler, TMT Investments"
44,Bolt,$11B,2021-10-08,Fintech,San Francisco,United States,North America,2014,$1B,"Activant Capital, Tribe Capital, General Atlantic"


**Insights from this step**

1. The companies listed with a null `City` are mostly from Singapore, with others from Hong Kong and the Bahamas. As this information will not affect our analysis, **we will keep these values empty**.
2. The only company without investor information is LinkSure Network, from China. We found that **Northern Light Venture Capital and Eight Roads Ventures** are among its investors, so we inserted this information into the dataset.
3. The two **Bolt** entries are different companies from different countries, so there are no duplicated records in the dataset.
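
To confirm that rows sharing a name are distinct companies, one option is to check for duplicates on a combination of columns rather than on `Company` alone. This is a minimal sketch using the two Bolt rows from the output above plus one other row. The choice of `Country` and `Year Founded` as the composite key is an assumption.

```python
import pandas as pd

# Rows copied from the notebook outputs, for illustration only
sample = pd.DataFrame({
    "Company": ["Bolt", "Bolt", "Stripe"],
    "Country": ["Estonia", "United States", "United States"],
    "Year Founded": [2013, 2014, 2010],
})

# Rows that share a company name
same_name = sample["Company"].duplicated(keep=False)

# Rows that share the name AND the other key columns (assumed composite key)
key_columns = ["Company", "Country", "Year Founded"]
same_entity = sample.duplicated(subset=key_columns, keep=False)

print(f"Rows sharing a name: {int(same_name.sum())}")
print(f"Rows sharing the full key: {int(same_entity.sum())}")
```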

### Exploratory Analysis

At this step, we will do a simple exploratory analysis of the data to understand the dataset and its values, and to see if we need to make any transformations to the data.

#### Understanding the dataset

In [109]:
print(f'Number of Companies: {df.Company.nunique()}')
print(f'Number of Industries: {df.Industry.str.upper().nunique()}')
print(f'Number of Countries: {df.Country.str.upper().nunique()}')

Number of Companies: 1073
Number of Industries: 15
Number of Countries: 46


In [110]:
df.Industry.unique()

array(['Artificial Intelligence', 'Other',
       'E-Commerce & Direct-To-Consumer', 'Fintech',
       'Internet Software & Services',
       'Supply Chain, Logistics, & Delivery', 'Consumer & Retail',
       'Data Management & Analytics', 'Edtech', 'Health', 'Hardware',
       'Auto & Transportation', 'Travel', 'Cybersecurity',
       'Mobile & Telecommunications'], dtype=object)

In [111]:
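# Splitting each investor list into one investor per row
df_investors = df['Select Investors'].str.split(', ')

# Series.explode() takes no column name; its only parameter is ignore_index
df_investors = df_investors.explode().str.strip()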

print(f'Number of Investors: {df_investors.nunique()}')

Number of Investors: 1258
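
The split-and-explode step above can be illustrated on a small example. The sketch below uses three rows copied from earlier outputs and keeps the company name next to each investor. The `Investor` column name is hypothetical.

```python
import pandas as pd

# Rows copied from the notebook outputs, for illustration only
sample = pd.DataFrame({
    "Company": ["Stripe", "HyalRoute", "FTX"],
    "Select Investors": [
        "Khosla Ventures, LowercaseCapital, capitalG",
        "Kuang-Chi",
        "Sequoia Capital, Thoma Bravo, Softbank",
    ],
})

# Split the comma-separated string into a list, then emit one row per investor
investors = (
    sample.assign(Investor=sample["Select Investors"].str.split(","))
    .explode("Investor")
)
investors["Investor"] = investors["Investor"].str.strip()

print(investors[["Company", "Investor"]])
print(f"Number of Investors: {investors['Investor'].nunique()}")
```

Keeping `Company` alongside each investor makes it possible to count companies per investor later with a simple `groupby`.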


**Business Insights**
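1. There are 1074 unicorn companies (1073 distinct names, since two different companies are called Bolt) from 15 industries in 46 different countries.
2. 1258 different investors appear in `Select Investors`, each having invested in at least one company.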

### Data Preparation