# Tech Company Fundings (Since 2020) - Exploration

I recently posted [this](https://www.kaggle.com/shivamb/tech-company-fundings-2020-onwards) dataset about technology company fundings. It covers fundings from January 2020 onwards for 3,200+ companies. The attributes include company name, website, vertical (category), funding stage, funding date, funding amount in US dollars, and region. This notebook explores the dataset to find some intuitive insights.

### Dataset: [Tech Company Fundings](https://www.kaggle.com/shivamb/tech-company-fundings-2020-onwards)

Last updated: September 2021

### 1. Overview of the dataset

In [1]:
import pandas as pd 
import seaborn as sns 
import plotly.express as px
import matplotlib.pyplot as plt 
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import numpy as np 
pd.set_option('display.float_format', lambda x: '%.0f' % x)

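df = pd.read_csv("/kaggle/input/tech-company-fundings-2020-onwards/tech_fundings.csv")

# 'Funding Date' holds only month and year; prepend day 01 so it parses as a full date
df['Funding Date Converted'] = "01-" + df['Funding Date']
df['Funding Date Converted'] = pd.to_datetime(df['Funding Date Converted'])
# Amounts recorded as "Unknown" become NaN so numeric aggregations skip them
df['Funding Amount (USD)'] = df['Funding Amount (USD)'].apply(lambda x : float(x) if x != "Unknown" else np.nan)
# Exclude the WestConnex record from the analysis
df = df[df['Company'] != 'WestConnex']
df.head()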
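
The cell above turns month-level dates into timestamps by prepending a day. Below is a minimal sketch of that step on a toy frame, assuming `Funding Date` values look like `Jan-20` (an assumption about the CSV, not confirmed here) and that missing amounts are the literal string `Unknown`. It passes an explicit `format` so parsing does not rely on pandas' format inference, and swaps in `pd.to_numeric(..., errors="coerce")` for the `apply` lambda.

```python
import pandas as pd

# Hypothetical toy rows; the "Mon-YY" date format is an assumption about the real CSV
toy = pd.DataFrame({
    "Company": ["A", "B", "C"],
    "Funding Date": ["Jan-20", "May-21", "Sep-21"],
    "Funding Amount (USD)": ["5000000", "Unknown", "1900000000"],
})

# Prepend day 01 so each month-year string becomes a concrete calendar date
toy["Funding Date Converted"] = pd.to_datetime(
    "01-" + toy["Funding Date"], format="%d-%b-%y"
)

# Any non-numeric entry such as "Unknown" is coerced to NaN
toy["Funding Amount (USD)"] = pd.to_numeric(toy["Funding Amount (USD)"], errors="coerce")

print(toy)
```

`errors="coerce"` turns every unparseable string into NaN, not just `Unknown`, which is more forgiving if the CSV contains other placeholders.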

> **Companies:** The dataset contains 3,575 fundings from January 2020 to September 2021, across 3,224 unique companies.  
> **Regions:** Fundings span 69 countries, led by the USA, UK, and India.  
> **Verticals:** There are 140+ verticals (categories), the most common being B2B platforms.  
> **Stage:** There are 20+ funding stages; the average funding amount is USD 5.2M and the maximum is USD 1.9B.
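
Headline figures like these can be computed directly from `df`. The sketch below uses a hypothetical toy frame with the same column names as the CSV; the helper name `summarize` is illustrative. Because the notebook plots amounts on a log scale, the median is included alongside the mean as an addition of this sketch, not a figure quoted above.

```python
import pandas as pd

def summarize(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: headline counts similar to those in the overview."""
    amounts = frame["Funding Amount (USD)"]  # NaN where the amount is unknown
    return {
        "fundings": len(frame),
        "unique_companies": frame["Company"].nunique(),
        "regions": frame["Region"].nunique(),
        "verticals": frame["Vertical"].nunique(),
        "stages": frame["Funding Stage"].nunique(),
        "mean_amount": amounts.mean(),      # NaN values are skipped by default
        "median_amount": amounts.median(),  # less sensitive to a few huge rounds
        "max_amount": amounts.max(),
    }

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "Company": ["A", "A", "B", "C"],
    "Region": ["United States", "United States", "India", "United Kingdom"],
    "Vertical": ["AI", "AI", "Cloud", "AI"],
    "Funding Stage": ["Seed", "Series A", "Series A", "Series B"],
    "Funding Amount (USD)": [2e6, 10e6, float("nan"), 30e6],
})
stats = summarize(toy)
print(stats)
```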

### 2. Univariate Insights 

Let's start with some univariate analysis. First, we plot the top regions, verticals, and funding stages by number of fundings, along with the number of fundings per month.

In [2]:
def vbar(col, n=15):
    # Top-n categories by number of fundings, reversed so the largest bar is drawn at the top
    vc = df[col].value_counts().head(n)[::-1]
    # Trailing spaces pad the labels away from the axis
    labels = [str(_) + "    " for _ in vc.index]
    return go.Bar(y=labels, x=list(vc.values), orientation="h", marker=dict(color="#34ebc3"))

trace1 = vbar('Region') 
trace2 = vbar('Vertical') 
trace3 = vbar('Funding Stage') 

# Number of fundings per month in chronological order
# (sort_index avoids relying on the 'index' column name, which pandas 2.x no longer produces)
vc = df['Funding Date Converted'].value_counts().sort_index()
labels = list(vc.index)
values = list(vc.values)
trace4 = go.Bar(x=labels, y=values, orientation="v", marker=dict(color="#34ebc3"))

titles = ['Top Regions with most fundings', 'Top Verticals with most fundings', 'Funding Stages Distribution', 'Fundings over time']
fig = make_subplots(rows=2, cols=2, subplot_titles = titles)

fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=1, col=2)
fig.add_trace(trace3, row=2, col=1)
fig.add_trace(trace4, row=2, col=2)

fig.update_layout(height=800, paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)', showlegend = False)
fig.show()
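
The counting behind `vbar` is `value_counts` followed by `head(n)`, then a reversal: Plotly draws the first category of a horizontal bar chart at the bottom, so reversing puts the largest bar on top. A sketch of that step on its own, with a hypothetical helper name `top_counts` and toy data:

```python
import pandas as pd

def top_counts(frame: pd.DataFrame, col: str, n: int = 15):
    """Hypothetical helper: (labels, counts) for the n most frequent values of col,
    reversed so the largest bar ends up at the top of a horizontal bar chart."""
    vc = frame[col].value_counts().head(n)  # sorted by count, descending
    return list(vc.index[::-1]), list(vc.values[::-1])

# Hypothetical toy region column
toy = pd.DataFrame({"Region": ["United States"] * 5 + ["United Kingdom"] * 2 + ["India"]})
labels, counts = top_counts(toy, "Region", n=2)
print(labels, counts)
```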

In [3]:
fig = px.histogram(x=np.log(df['Funding Amount (USD)']))
fig.update_layout(title="Distribution of Log(Funding Amount)", 
                       paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)',
                       xaxis_title="Log Scale - Funding Amount", yaxis_title="Counts")
fig.show()

> - The United States had the most tech fundings (1888), i.e. about 58%, followed by the United Kingdom (8%) and India (4%).  
> - Series A was the most common funding stage since 2020 (about 875 fundings).  
> - B2B Software, Artificial Intelligence, and Cloud were the most popular verticals by number of fundings.  
> - 2020 had relatively fewer fundings than the first half of 2021, with the peak in May 2021 (about 330).
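
Percentages like these come from dividing a count by a total, and the choice of denominator matters: 1888 out of 3575 funding rows is about 53%, while 1888 out of 3224 unique companies is about 59%. A sketch of the row-based share with `value_counts(normalize=True)`, on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy region column: 6 US, 2 UK, 1 India, 1 Germany
regions = pd.Series(["United States"] * 6 + ["United Kingdom"] * 2 + ["India", "Germany"])

counts = regions.value_counts()
shares = regions.value_counts(normalize=True) * 100  # percent of all rows (fundings)

print(pd.DataFrame({"fundings": counts, "share_%": shares.round(1)}))
```

To get a share of companies instead, deduplicate on the company column before counting.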

In [4]:
vc = df.sort_values("Funding Amount (USD)", ascending = False).head(20)
# Reverse so the largest funding is drawn at the top of the horizontal bar chart
labels = list(vc['Company'])[::-1]
categories = list(vc['Vertical'])[::-1]

values = list(vc['Funding Amount (USD)'])[::-1]
# Readable amount labels: $X.XXB at or above one billion, otherwise $XM
values1 = [int(_  / 1000000) for _ in values]
values1 = ["$" + str(round(_/1000, 2)) + "B" if _ >= 1000 else "$" + str(_) + "M" for _ in values1]

labels = [_ + "  | " +categories[i]+ " |  " for i, _ in enumerate(labels)]
labels = [_+" ("+values1[i]+") " for i,_ in enumerate(labels)]

trace5 = go.Bar(y=labels, x=values, orientation="h", marker=dict(color="#34ebc3"))
layout = go.Layout(title="Top Tech Fundings since Jan 2020", 
                       paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)',
                       xaxis_title="", yaxis_title="", width=650)
fig = go.Figure([trace5], layout=layout)
fig.update_xaxes(tickangle=45, tickfont=dict(color='crimson'))
fig.update_yaxes(tickangle=0, tickfont=dict(color='crimson'))
fig.show()
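
The label formatting in the cell above can be pulled into a small function, which makes it easy to check in isolation. A sketch with a hypothetical name, `format_usd`, following the same rules (truncate to whole millions, switch to billions at 1000M, round billions to two decimals):

```python
def format_usd(amount: float) -> str:
    """Hypothetical helper: format a dollar amount as $X.XXB at or above
    one billion, otherwise as $XM (millions truncated to an integer)."""
    millions = int(amount / 1_000_000)
    if millions >= 1000:
        return "$" + str(round(millions / 1000, 2)) + "B"
    return "$" + str(millions) + "M"

print([format_usd(a) for a in (1.9e9, 1.25e9, 2.5e8)])
```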

In [5]:
a = df.groupby("Vertical").agg({"Funding Amount (USD)" : "sum"})
a = a.reset_index()
a['funding'] = a['Funding Amount (USD)'].apply(lambda x : round(x / 1000000000, 2)) 
a = a.sort_values('funding', ascending = False)
a = a.head(15)


labels = list(a['Vertical'][::-1])
values = list(a['funding'][::-1])
labels = [_ + " ($"+str(values[i])+"B)    " for i,_ in enumerate(labels)]

trace5 = go.Bar(y=labels, x=values, orientation="h", marker=dict(color="#34ebc3"))
layout = go.Layout(title="Total Amount - Tech Fundings by Vertical", 
                       paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)',
                       xaxis_title="", yaxis_title="", width=650)
fig = go.Figure([trace5], layout=layout)
fig.update_xaxes(tickangle=45, tickfont=dict(color='crimson'))
fig.update_yaxes(tickangle=0, tickfont=dict(color='crimson'))
fig.show()

In [6]:
a = df.groupby("Region").agg({"Funding Amount (USD)" : "sum"})
a = a.reset_index()
a['funding'] = a['Funding Amount (USD)'].apply(lambda x : round(x / 1000000000, 2)) 
a = a.sort_values('funding', ascending = False)
a = a.head(15)


labels = list(a['Region'][::-1])
values = list(a['funding'][::-1])
labels = [_ + " ($"+str(values[i])+"B)    " for i,_ in enumerate(labels)]

trace5 = go.Bar(y=labels, x=values, orientation="h", marker=dict(color="#34ebc3"))
layout = go.Layout(title="Total Amount - Tech Fundings by Region", 
                       paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)',
                       xaxis_title="", yaxis_title="", width=650)
fig = go.Figure([trace5], layout=layout)
fig.update_xaxes(tickangle=45, tickfont=dict(color='crimson'))
fig.update_yaxes(tickangle=0, tickfont=dict(color='crimson'))
fig.show()
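
Cells 5 and 6 repeat the same aggregation with only the grouping column changed. A sketch of that step as one hypothetical helper, `total_by`, on toy data. Note that pandas' `groupby(...).sum()` returns 0 (not NaN) for a group whose amounts are all unknown, because its default `min_count` is 0.

```python
import pandas as pd

def total_by(frame: pd.DataFrame, col: str, n: int = 15) -> pd.Series:
    """Hypothetical helper: total funding per category in billions of USD, largest first."""
    totals = frame.groupby(col)["Funding Amount (USD)"].sum()  # NaN amounts are skipped
    return (totals / 1e9).round(2).sort_values(ascending=False).head(n)

# Hypothetical toy data; the UK row has an unknown amount
toy = pd.DataFrame({
    "Region": ["United States", "United States", "India", "United Kingdom"],
    "Funding Amount (USD)": [1.5e9, 0.5e9, 0.25e9, float("nan")],
})
result = total_by(toy, "Region")
print(result)
```

`total_by(df, "Vertical")` and `total_by(df, "Region")` would then feed the two bar charts above.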



In [7]:
a = df.groupby("Funding Date Converted").agg({"Funding Amount (USD)" : "sum"})
a = a.reset_index()
a['funding'] = a['Funding Amount (USD)'].apply(lambda x : round(x / 1000000000, 2)) 

labels = list(a['Funding Date Converted'])
values = list(a['funding'])

trace5 = go.Bar(x=labels, y=values, orientation="v", marker=dict(color="#34ebc3"))
layout = go.Layout(title="Total Amount (USD Billions) - Tech Fundings by Month", 
                       paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)',
                       xaxis_title="", yaxis_title="", height=450)
fig = go.Figure([trace5], layout=layout)
fig.update_yaxes(tickangle=0, tickfont=dict(color='crimson'))
fig.show()
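
Grouping on the exact timestamp works here because every date was set to the first of its month, but a month with no recorded fundings simply has no bar. A sketch of a swapped-in alternative (not the notebook's approach) that uses `resample("MS")` so empty months appear as zero, on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data with no fundings in February 2020
toy = pd.DataFrame({
    "Funding Date Converted": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-03-01"]),
    "Funding Amount (USD)": [1e9, 0.5e9, 2e9],
})

# "MS" = calendar-month bins labelled by month start; empty months sum to 0
monthly = (
    toy.set_index("Funding Date Converted")["Funding Amount (USD)"]
    .resample("MS")
    .sum()
    / 1e9
)
print(monthly)
```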



### 3. Bivariate Insights 

Let's now look at fundings across pairs of attributes: region, vertical, funding stage, and date.

In [8]:
def convert_to_size(x):
    if x < 10:
        return 6
    elif x < 50:
        return 10
    elif x < 100:
        return 15
    elif x < 200:
        return 18 
    else:
        return 25

def bubble(col1, col2):
    # Number of fundings for each (col1, col2) pair, ordered by col2
    vc = df.groupby([col1, col2]).agg({"index" : "count"}).reset_index()
    vc = vc.rename(columns = {"index" : "# Fundings"})
    vc = vc.sort_values(col2)

    vc['size'] = vc['# Fundings'].apply(lambda x : convert_to_size(x))
    trace1 = go.Scatter(x=vc[col1], y=vc[col2], mode='markers', marker=dict(colorscale='icefire', size=vc['size'], 
                                                                                    showscale=True, color=vc['# Fundings']))
    layout = go.Layout(title="Fundings by "+col1+" and " + col2, 
                       paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)',
                       xaxis_title="", yaxis_title="", height=800)
    fig = go.Figure([trace1], layout=layout)
    fig.update_xaxes(tickangle=45, tickfont=dict(color='crimson'))
    fig.update_yaxes(tickangle=0, tickfont=dict(color='crimson'))
    fig.show()
    
bubble('Region', 'Vertical')
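
`convert_to_size` is a step function: each count falls into a bin bounded by fixed thresholds, and each bin has a marker size. The same mapping can be written as a table lookup with the standard library's `bisect`. A sketch with thresholds and sizes copied from `convert_to_size`; the name `marker_size` is hypothetical.

```python
from bisect import bisect_right

# Exclusive upper bounds copied from convert_to_size, and the size used in each bin
THRESHOLDS = [10, 50, 100, 200]
SIZES = [6, 10, 15, 18, 25]

def marker_size(count: int) -> int:
    """Bubble size for a funding count: x < 10 -> 6, x < 50 -> 10, ..., x >= 200 -> 25."""
    return SIZES[bisect_right(THRESHOLDS, count)]

print([marker_size(c) for c in (1, 10, 49, 50, 150, 200, 500)])
```

`bisect_right` places a count equal to a threshold in the next bin, which matches the strict `<` comparisons in the original chain; changing the bins then only means editing the two lists.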

In [9]:
bubble('Funding Stage', 'Region')

In [10]:
bubble('Funding Stage', 'Vertical')

In [11]:
bubble('Funding Date Converted', 'Funding Stage')

In [12]:
bubble('Funding Date Converted', 'Vertical')

### 4. Fundings in USA, UK, and India

Let's take a closer look at tech fundings in the USA, UK, and India.

In [13]:
usa_df = df[df['Region'] == 'United States']
aggdf = usa_df.groupby(['Vertical', 'Funding Stage']).agg({"index"  : "count"}).reset_index()
aggdf = aggdf.rename(columns = {"index" : "Num Fundings"})

fig = px.treemap(aggdf, path=[px.Constant("United States"), 'Vertical', 'Funding Stage'], values='Num Fundings')
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()

In [14]:
uk_df = df[df['Region'] == 'United Kingdom']
aggdf = uk_df.groupby(['Vertical', 'Funding Stage']).agg({"index"  : "count"}).reset_index()
aggdf = aggdf.rename(columns = {"index" : "Num Fundings"})

fig = px.treemap(aggdf, path=[px.Constant("United Kingdom"), 'Vertical', 'Funding Stage'], values='Num Fundings')
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()

In [15]:
india_df = df[df['Region'] == 'India']
aggdf = india_df.groupby(['Vertical', 'Funding Stage']).agg({"index"  : "count"}).reset_index()
aggdf = aggdf.rename(columns = {"index" : "Num Fundings"})

fig = px.treemap(aggdf, path=[px.Constant("India"), 'Vertical', 'Funding Stage'], values='Num Fundings')
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()
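
The three cells above repeat the same filter-and-count steps for each country. A sketch of them as one hypothetical helper, `vertical_stage_counts`, on toy data. It counts rows with `.size()` instead of counting a column named `index`, so it does not depend on the CSV shipping such a column.

```python
import pandas as pd

def vertical_stage_counts(frame: pd.DataFrame, region: str) -> pd.DataFrame:
    """Hypothetical helper: number of fundings per (Vertical, Funding Stage) in one region."""
    subset = frame[frame["Region"] == region]
    return (
        subset.groupby(["Vertical", "Funding Stage"])
        .size()                            # row count per pair
        .reset_index(name="Num Fundings")
    )

# Hypothetical toy data
toy = pd.DataFrame({
    "Region": ["India", "India", "India", "United States"],
    "Vertical": ["Fintech", "Fintech", "Edtech", "Fintech"],
    "Funding Stage": ["Seed", "Seed", "Series A", "Seed"],
})
result = vertical_stage_counts(toy, "India")
print(result)
```

The result has the same columns as `aggdf`, so it can be passed to `px.treemap(..., path=[px.Constant(region), 'Vertical', 'Funding Stage'], values='Num Fundings')` exactly as in the cells above.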

### 5. Comparing USA vs Other Regions

In [16]:
df['isUSA'] = df['Region'].apply(lambda x : 1 if x == 'United States' else 0)

aggdf = df.groupby(["isUSA", "Vertical"]).agg({"index" : "count", "Funding Amount (USD)" : "mean"}).reset_index()
com = pd.pivot_table(aggdf, values = 'index', index=['Vertical'], columns = 'isUSA').reset_index().dropna()
com['total'] = com[0] + com[1]
com = com.sort_values('total', ascending = False).head(15)
com['Vertical'] = com['Vertical'].apply(lambda x : x + "    ")
fig = go.Figure(data=[
    # isUSA == 1 marks US rows, so column 1 holds the US counts and column 0 the rest
    go.Bar(name='USA Tech Fundings', x=com[1][::-1], y=com['Vertical'][::-1], orientation = 'h', marker=dict(color='red', opacity=0.6)),
    go.Bar(name='Non USA Tech Fundings', x=com[0][::-1], y=com['Vertical'][::-1], orientation = 'h', marker=dict(color='green', opacity=0.6))
])
fig.update_layout(barmode='group', height=700, paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)', title = 'Comparison: Count of USA vs Non USA Tech Fundings by Vertical')
fig.show()

In [17]:
aggdf = df.groupby(["isUSA", "Funding Stage"]).agg({"index" : "count", "Funding Amount (USD)" : "mean"}).reset_index()
com = pd.pivot_table(aggdf, values = 'index', index=['Funding Stage'], columns = 'isUSA').reset_index().dropna()
com['total'] = com[0] + com[1]
com = com.sort_values('total', ascending = False).head(15)
com['Funding Stage'] = com['Funding Stage'].apply(lambda x : x + "    ")
fig = go.Figure(data=[
    # isUSA == 1 marks US rows, so column 1 holds the US counts and column 0 the rest
    go.Bar(name='USA Tech Fundings', x=com[1][::-1], y=com['Funding Stage'][::-1], orientation = 'h', marker=dict(color='red', opacity=0.6)),
    go.Bar(name='Non USA Tech Fundings', x=com[0][::-1], y=com['Funding Stage'][::-1], orientation = 'h', marker=dict(color='green', opacity=0.6))
])
fig.update_layout(barmode='group', height=700, paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)', title = 'Comparison: Count of USA vs Non USA Tech Fundings by Funding Stage')
fig.show()

In [18]:
aggdf = df.groupby(["isUSA", "Funding Date"]).agg({"index" : "count", "Funding Amount (USD)" : "mean"}).reset_index()
com = pd.pivot_table(aggdf, values = 'index', index=['Funding Date'], columns = 'isUSA').reset_index().dropna()
com['total'] = com[0] + com[1]
com = com.sort_values('total', ascending = False)
com['Funding Date'] = com['Funding Date'].apply(lambda x : str(x) + "    ")
fig = go.Figure(data=[
    # isUSA == 1 marks US rows, so column 1 holds the US counts and column 0 the rest
    go.Bar(name='USA Tech Fundings', x=com[1][::-1], y=com['Funding Date'][::-1], orientation = 'h', marker=dict(color='red', opacity=0.6)),
    go.Bar(name='Non USA Tech Fundings', x=com[0][::-1], y=com['Funding Date'][::-1], orientation = 'h', marker=dict(color='green', opacity=0.6))
])
fig.update_layout(barmode='group', height=700, paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)', title = 'Comparison: Count of USA vs Non USA Tech Fundings by Month')
fig.show()
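
In cells 16 to 18, `isUSA` is 1 for US rows, so column `1` of each pivot holds the US counts; tracking which integer means which is easy to get wrong. A sketch of a swapped-in approach that labels the groups with strings and counts them with `pd.crosstab`, on hypothetical toy data. Unlike the `pivot_table(...).dropna()` pattern, `crosstab` fills missing combinations with 0, so a vertical seen in only one group is kept.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "Region": ["United States", "United States", "India", "United Kingdom", "United States"],
    "Vertical": ["AI", "Cloud", "AI", "AI", "AI"],
})

# Name the groups explicitly instead of using 0/1
group = toy["Region"].eq("United States").map({True: "USA", False: "Non USA"})

counts = pd.crosstab(toy["Vertical"], group)  # rows: Vertical; columns: "Non USA", "USA"
counts["total"] = counts.sum(axis=1)
counts = counts.sort_values("total", ascending=False)
print(counts)
```

The bars can then read `counts["USA"]` and `counts["Non USA"]` directly, with no chance of swapping the legend names.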

- More to come. Hope you like it! Feel free to check out the dataset: https://www.kaggle.com/shivamb/tech-company-fundings-2020-onwards/