<br><br>


<h1 style="text-align: center;"> Startup investment in the Nordic countries </h1>

<p style="text-align: center;"> by Omar Ismail, Arda Acikalin, Yemeskabeba Gessesse and Dmitry Tolonen </p> 

<br>

<img src="https://storage.googleapis.com/kaggle-datasets-images/517018/952128/223d16c25beb4f366ec5fa21e801deda/dataset-cover.jpg?t=2020-02-17-22-01-31" alt="Kaggle: Startup investment in the Nordic countries">

<br>

Based on the <b>StartUp Investments (Crunchbase) dataset</b> - Information about startup companies and investment via Crunchbase 
https://www.kaggle.com/arindam235/startup-investments-crunchbase

<br><br><br>    

        
### <u>Table of contents:</u>


    
[Introduction to the dataset](#Introduction-to-the-dataset)

[Challenges of the dataset and Jupyter collaboration](#Challenges-of-the-dataset-and-Jupyter-collaboration)

[Focusing our study: Startup investment in the Nordic countries](#Focusing-our-study:-Startup-investment-in-the-Nordic-countries)

[Data clean up stage](#Data-clean-up-stage)

[Visualisation: Status, scale and popular market segments for investors](#Visualisation:-Status,-scale-and-popular-market-segments-for-investors) 

[Visualisation: Startup status, Public grants, debt and a word about Finland](#Visualisation:-Startup-status,-Public-grants,-debt-and-a-word-about-Finland)
    
[Visualisation: Success stories, funding rounds and private funding](#Visualisation:-Success-stories,-funding-rounds-and-private-funding)

[In closing: conclusions and future research](#In-closing:-conclusions-and-future-opportunities-for-research)

<br>

### **Introduction to the dataset**

**StartUp Investments (Crunchbase):**
https://www.kaggle.com/arindam235/startup-investments-crunchbase


- The dataset we used was downloaded from Kaggle, but the data originates from Crunchbase, which is described as "a platform for finding business information about private and public companies." 

<br>

**The dataset consists mostly of financial data relating to startup investment, such as:**

- startup name 
- url
- market segment/category (e.g. software, biotech, health and fitness, real estate, search, mobile, education, transportation, finance etc.)
- status (operating, acquired, closed)
- location: country, city, region
- funding: total funding in usd, type of funding (public grant, seed, angel, debt taken, venture capital & so-called funding rounds)


<br>

- The Kaggle page description indicates an interest in seeing whether subsequent rounds of investment help a company move to a status of 'operating/closed/acquired', presumably from a status of starting up or seed/grant/angel funding. At face value, this sounds like an obvious question, but we were interested in digging deeper into the data on the industries that could possibly employ us and the funders that could possibly finance any startup ideas of ours.


In [None]:
# Importing necessary Python packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

### **Challenges of the dataset and Jupyter collaboration**
- The dataset is fairly large at 39 columns and 50k rows, so we knew a lot of the data would not readily serve our purposes. Some of the data was also redundant for us, such as duplicated date data. We tackled this by choosing to focus on a subset of countries and columns, as well as in the clean up stage. 
- In researching collaboration methods for our group, we landed on using Google's Colab tools, which is an online Jupyter environment linked to your Google account. We used this, Hangouts and WhatsApp for communication and had set up a Google Slides doc in case we had to present in PowerPoint/Slides format. 
- Although Colab has proven to be reliable, there have been some hiccups and therefore, we mainly used our Colab Jupyter document as a central repository of our current/overnight work as well as a sketchbook while meeting in Google Hangouts. We did our main data analysis on our own Jupyter environments on our computers. 
- Another thing to note about Colab: we stored our Colab work in Dmitry's Google Drive, which required frequent authentication for every user, so to solve this issue, the CSV file was moved to a website (dmitrytolonen.com in the read_csv cell), as this avoided the authentication issue and allowed everyone better access to the work file. 
- Finally, the dataset lacked some data we would have liked to have, such as more detail about the entities making acquisitions: whether they came from large, rich countries such as the US or China, or whether the Nordic IP remained in the region (with its potential employment and taxation ramifications).

In [None]:
pd.options.mode.chained_assignment = None
pd.set_option('display.max_columns', None)  # show every column when displaying a dataframe

In [None]:
investments = pd.read_csv('/kaggle/input/startup-investments-crunchbase/investments_VC.csv', encoding = "ISO-8859-1")

### **Focusing our study: Startup investment in the Nordic countries**

OK, let's get to work! We had too much data on our hands, so we first looked at the structure and contents of the data, such as the columns. We soon chose a subset of columns in line with our focus on the Nordics, avoiding some overlap, especially in the date-related columns.
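
As a minimal sketch of this narrowing step, the snippet below filters a small made-up frame down to the Nordic country codes with `isin`. The `toy` data and the `NORDIC_CODES` name are assumptions for illustration and are not part of the notebook; `isin` is shown because, unlike a label-based `.loc` lookup with a list, it does not raise an error if one of the codes happens to be absent.

```python
import pandas as pd

# Hypothetical stand-in for the Crunchbase frame (values are made up).
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "country_code": ["FIN", "USA", "SWE", None],
    "funding_rounds": [1, 3, 2, 1],
})

NORDIC_CODES = ["FIN", "SWE", "NOR", "DNK", "ISL"]  # assumed constant name

# Keep only the rows with a Nordic country code, and only the columns of interest.
nordic = toy.loc[toy["country_code"].isin(NORDIC_CODES), ["name", "country_code"]]
print(nordic)
```

Rows with a missing country code are simply left out, which may or may not be what a given study wants.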

In [None]:
investments.head()

# for a quick glance at the material's first five rows.

In [None]:
investments.info()

# using the Pandas info function, we get a summary of the dataframe's contents, 
# eg. dtypes, columns and their names, entries and non-null values.


In [None]:
investments.describe()
# for some statistics on the dataframe's non-null entries (49k+) as well as the mean, min, max values etc. - how the data is distributed. 

In [None]:
# Choosing the most suitable columns for our study
investments = investments[['name', ' market ',
       ' funding_total_usd ', 'status', 'country_code', 'region',
       'city', 'funding_rounds', 'founded_at','first_funding_at',
       'last_funding_at', 'seed', 'venture', 'equity_crowdfunding',
       'undisclosed', 'convertible_note', 'debt_financing', 'angel', 'grant',
       'private_equity', 'post_ipo_equity', 'post_ipo_debt',
       'secondary_market', 'product_crowdfunding', 'round_A', 'round_B',
       'round_C', 'round_D', 'round_E', 'round_F', 'round_G', 'round_H']]

In [None]:
# Further limiting our dataset to only nordic countries and setting the index from the default to 'country_code'
investments = investments.set_index('country_code')

In [None]:
# Using the Pandas loc function to access specific labelled rows and columns 
investments = investments.loc[['FIN', 'SWE', 'NOR', 'DNK', 'ISL'], :]

In [None]:
# For a random sample from the selected subset

investments.sample(10)

In [None]:
investments['round_F'].unique()
# We were interested in seeing whether there is much - or any - useful data in round_F - or, actually, after round_C

In [None]:
investments.head()

In [None]:
investments.reset_index()

### **Data clean up stage**

- At this stage, we wanted to make sure some of the spaces, commas, dtypes, date types and decimal places were correct and usable for our purposes. We also dropped columns with zero values (in particular, funding rounds) and grouped countries by country_code.
- We used the replace function for commas and spaces, the pd.to_datetime function for date conversions, groupby for grouping by the country_code label and, finally, the drop function for dropping columns with zero values. 
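
The clean-up steps described above can be sketched on a few made-up rows. The `raw` frame, its values and the extra `.str.strip()` call are assumptions added for illustration; the all-zero check is one possible way to find columns to drop, not necessarily how we picked `round_G` and `round_H`.

```python
import pandas as pd

# Hypothetical raw rows mimicking padded headers and comma-formatted amounts.
raw = pd.DataFrame({
    " funding_total_usd ": [" 1,750,000 ", " - ", "40,000"],
    "founded_at": ["2012-06-01", "not a date", "2010-01-15"],
    "round_G": [0, 0, 0],
})

clean = raw.copy()
clean.columns = clean.columns.str.replace(" ", "")         # strip spaces from headers
amounts = clean["funding_total_usd"].str.replace(",", "")   # drop thousands separators
# Unparseable entries such as "-" become NaN instead of raising an error.
clean["funding_total_usd"] = pd.to_numeric(amounts.str.strip(), errors="coerce")
# Unparseable dates become NaT.
clean["founded_at"] = pd.to_datetime(clean["founded_at"], errors="coerce")

# Drop numeric columns that contain only zeros.
numeric = clean.select_dtypes("number")
all_zero = [col for col in numeric.columns if (numeric[col] == 0).all()]
clean = clean.drop(columns=all_zero)
print(clean.dtypes)
```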


In [None]:
#cleaning column names and reformatting the data
#Column names had some empty spaces
investments.columns = investments.columns.str.replace(' ','')

In [None]:
#removed the commas from the total funding
investments.funding_total_usd = investments.funding_total_usd.str.replace(',','')

In [None]:
#it was string, now it is a float.
investments.funding_total_usd = pd.to_numeric(investments.funding_total_usd, errors='coerce')

In [None]:
# changing the data type for column founded_at to datetime type
investments['founded_at'] = pd.to_datetime(investments['founded_at'], errors = 'coerce' )

In [None]:
# changing the data type for column first_funding_at to datetime type
investments.first_funding_at = pd.to_datetime(investments.first_funding_at, format='%Y/%m/%d', errors='coerce')

In [None]:
# changing the data type for column last_funding_at to datetime type
investments.last_funding_at = pd.to_datetime(investments.last_funding_at, format='%Y/%m/%d', errors='coerce')

In [None]:
# Now we can groupby country code.
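# numeric_only=True skips the text and date columns, which recent pandas versions no longer drop silently
investments.groupby('country_code').mean(numeric_only=True)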

In [None]:
# deleting the columns round_G and round_H because all the countries have zero values
investments = investments.drop(["round_G", "round_H"], axis=1)

In [None]:
# rounding the mean values to 4 decimal places
investments.groupby('country_code').mean(numeric_only=True).round(4)

In [None]:
# rounding the values of the data frame to 4 decimal places
investments= investments.round(4)
investments

In [None]:
# We discuss whether to keep the region column instead of the city column, 
# as 'region' is clearer and less complex when we run data analysis

investments['region'].unique()

In [None]:
# So, our option would be to drop the city column as it DOES ADD lots of complex data when we run data analysis

investments['city'].unique()

In [None]:
# We decide to go with dropping the city column for clarity's sake.
investments = investments.drop(["city"], axis=1)


### **Visualisation: Status, scale and popular market segments for investors**
- Here's where we get into the fun part. In both tabular and visual form, we can inspect the data we have and make some observations and connections from our selected statistics. 
- We move closer into our central focus and compare the Nordic countries in a number of areas.

In [None]:
investments.head()

In [None]:
plt.subplots(figsize=(20,15))

# Correlations are computed on the numeric columns only
sns.heatmap(investments.corr(numeric_only=True), annot=True, linewidths=0.5);

# This heat map serves to give us a bigger picture of our investments data. 

In [None]:
# Move 'country_code' from the index back to a regular column

investments = investments.reset_index(level='country_code')

In [None]:
# Startup establishment over the last 20 founding years in the data for the Nordic countries (NC)

plt.rcParams['figure.figsize'] = 15,6
investments['name'].groupby(investments["founded_at"].dt.year).count().tail(20).plot(kind="bar")

ax = plt.gca()  # reuse the current axes; plt.axes() can create a new, empty one on top of the plot
ax.yaxis.grid()
plt.ylabel('Count')
plt.title("Founded distribution ", fontdict=None, position= [0.48,1.05], size = 'x-large')
plt.show()

In [None]:
# The total number of startups in each Nordic country

investments['country_code'].value_counts()

In [None]:
# The total number of startups in each Nordic country

plt.figure(figsize=(10,5))

sns.barplot(x=investments['country_code'].value_counts(), y=investments['country_code'].value_counts().index, palette='Reds_d')

ax = plt.gca()  # reuse the current axes; plt.axes() can create a new, empty one on top of the plot
ax.xaxis.grid()
plt.xlabel('Number of startups')
plt.ylabel('Nordic Countries')
plt.show()

In [None]:
# Top 10 startup status based on market sector

operating = investments[investments.status == 'operating']
acquired = investments[investments.status == 'acquired']
closed = investments[investments.status == 'closed']

In [None]:
operating_count  = operating['market'].value_counts()
operating_count = operating_count.head(10)

print('Operating')
print(operating_count)

In [None]:
acquired_count  = acquired['market'].value_counts()
acquired_count = acquired_count.head(10)

print('Acquired')
print(acquired_count)

In [None]:
closed_count  = closed['market'].value_counts()
closed_count = closed_count.head(10)

print('Closed')
print(closed_count)

In [None]:
# Startup status for each Nordic country

startup = investments.groupby('country_code').status.value_counts()

startup

In [None]:
# Startup status for each Nordic country for bar plot
startup1 = investments.groupby('country_code').status.value_counts().reset_index(name='counts')
sns.catplot(x="country_code", y="counts", hue="status", kind="bar", data=startup1, height=8.27, aspect=11.7/8.27)

In [None]:
# Relationship between operating, acquired and closed startups in each Nordic country - normalised to a 100 percent bar chart 

import matplotlib.ticker as mtick

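# group_keys=False keeps the original (country_code, status) index so that unstack() works as intended
investments.groupby(['country_code','status']).size().groupby(level=0, group_keys=False).apply(
    lambda x: 100 * x / x.sum()
).unstack().plot(kind='bar',stacked=True)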

plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter())
plt.legend(bbox_to_anchor=(1,1), frameon = True, fancybox = True, framealpha = 0.95, shadow = True, 
           borderpad = 1)
plt.show()
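
As an alternative sketch for the same 100 percent breakdown, `pd.crosstab` with `normalize="index"` gives the per-country shares directly. The `toy` frame below is made up for illustration and is not drawn from the dataset.

```python
import pandas as pd

# Hypothetical startups with a country and a status (values are made up).
toy = pd.DataFrame({
    "country_code": ["FIN", "FIN", "FIN", "FIN", "SWE", "SWE"],
    "status": ["operating", "operating", "operating", "closed", "operating", "acquired"],
})

# Row-normalised contingency table: each country's statuses sum to 100.
share = pd.crosstab(toy["country_code"], toy["status"], normalize="index") * 100
print(share.round(1))
# share.plot(kind="bar", stacked=True) would draw the 100 percent stacked bars.
```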

In [None]:
# Status of Nordic startups in donut form

chart = startup.plot(kind="pie", figsize=(20,15), autopct="%1.0f%%", rotatelabels=True )
cen_cir = plt.Circle((0,0),0.70, fc='w')
plt.gcf().gca().add_artist(cen_cir)

chart.set_ylabel('')
plt.title("Status of nordic startups", loc='left')
plt.show()

In [None]:
# Most funded startups/companies in Nordic countries

most_funded = investments.nlargest(20, ['funding_total_usd'])
most_funded

In [None]:
Spotify_founded_year = investments['founded_at'][investments['name']=="Spotify"].dt.year.values[0]
Symphogen_founded_year  = investments['founded_at'][investments['name']=="Symphogen"].dt.year.values[0]
Klarna_founded_year = investments['founded_at'][investments['name']=="Klarna"].dt.year.values[0]
Supercell_founded_year  = investments['founded_at'][investments['name']=="Supercell"].dt.year.values[0]
Rovio_founded_year  = investments['founded_at'][investments['name']=="Rovio Entertainment"].dt.year.values[0]

In [None]:
# Comparison of the founding dates of selected, successful Nordic startups

plt.rcParams['figure.figsize'] = 15,6
investments['name'][investments["founded_at"].dt.year >= 1995].groupby(investments["founded_at"].dt.year).count().plot(kind="line")
plt.ylabel('Count')

plt.axvline(Spotify_founded_year,color='blue',linestyle ="--")
plt.text(Spotify_founded_year+0.15, 50,"Spotify \n (2006)")

plt.axvline(Symphogen_founded_year,color='black',linestyle ="--")
plt.text(Symphogen_founded_year+0.15, 20,"Symphogen \n(2000)")

plt.axvline(Klarna_founded_year,color='orange',linestyle ="--")
plt.text(Klarna_founded_year-1.00, 35,"Klarna \n(2005)")

plt.axvline(Supercell_founded_year,color='red',linestyle ="--")
plt.text(Supercell_founded_year-1.30, 70,"Supercell \n(2010)")

plt.axvline(Rovio_founded_year,color='grey',linestyle ="--")
plt.text(Rovio_founded_year-1.30, 30,"Rovio \nEntertainment \n(2003)")

plt.title("When were the well-known companies founded?", fontdict=None, position= [0.48,1.05])
plt.show()


### **Visualisation: Startup status, Public grants, debt and a word about Finland**


In [None]:
# Total number of Nordic startups per market/industry with more than 1 million USD investment

most_high= investments[['market', 'name']][investments['funding_total_usd'] > 1000000].groupby(['market'], 
                                        as_index=False).count().sort_values('name', ascending=False)
most_high.head(20)
top20 = most_high.head(20)
top20

In [None]:
Nordm = sns.catplot(x="market", y="name",  kind="bar", data=top20, height=5.27, aspect=11.7/5.27)
Nordm.set_xticklabels(rotation=45, horizontalalignment='right')

In [None]:
# Total number of startups per Finnish market with more than 1 million USD investment

fin_high= investments[investments['country_code'] == 'FIN']

finh = fin_high[['market', 'name']][fin_high['funding_total_usd'] > 1000000].groupby(['market'], 
                                        as_index=False).count().sort_values('name', ascending=False)

Finl10=finh.head(10)
Finl10

In [None]:
Finm = sns.catplot(x="market", y="name",  kind="bar", data=Finl10, height=4.27, aspect=8.7/4.27,palette=sns.dark_palette("green"))
Finm.set_xticklabels(rotation=45, horizontalalignment='right')

In [None]:
# Total number of startups per Swedish market with more than 1 million USD investment

swe_high= investments[investments['country_code'] == 'SWE']

sweh = swe_high[['market', 'name']][swe_high['funding_total_usd'] > 1000000].groupby(['market'], 
                                        as_index=False).count().sort_values('name', ascending=False)

SWL10=sweh.head(10)
SWL10


In [None]:
SWm = sns.catplot(x="market", y="name",  kind="bar", data=SWL10, height=4.27, aspect=8.7/4.27,palette=sns.dark_palette("red"))

SWm.set_xticklabels(rotation=45, horizontalalignment='right')

In [None]:
# Total number of startups per Norwegian market with more than 1 million USD investment

nor_high= investments[investments['country_code'] == 'NOR']

norh = nor_high[['market', 'name']][nor_high['funding_total_usd'] > 1000000].groupby(['market'], 
                                        as_index=False).count().sort_values('name', ascending=False)

Nor10 =norh.head(10)
Nor10

In [None]:
NRm = sns.catplot(x="market", y="name",  kind="bar", data=Nor10, height=4.27, aspect=8.7/4.27,palette=sns.dark_palette("navy", reverse=True))

NRm.set_xticklabels(rotation=45, horizontalalignment='right')

In [None]:
# Total number of startups per Danish market with more than 1 million USD investment

dnk_high= investments[investments['country_code'] == 'DNK']

dnkh = dnk_high[['market', 'name']][dnk_high['funding_total_usd'] > 1000000].groupby(['market'], 
                                        as_index=False).count().sort_values('name', ascending=False)

DN10=dnkh.head(10)
DN10

In [None]:
DNm = sns.catplot(x="market", y="name",  kind="bar", data=DN10, height=4.27, aspect=8.7/4.27,palette=sns.diverging_palette(255, 133, l=60, n=7, center="dark"))

DNm.set_xticklabels(rotation=45, horizontalalignment='right')

In [None]:
# Total number of startups per Icelandic market with more than 1 million USD investment

isl_high= investments[investments['country_code'] == 'ISL']

islh = isl_high[['market', 'name']][isl_high['funding_total_usd'] > 1000000].groupby(['market'], 
                                        as_index=False).count().sort_values('name', ascending=False)

IS10=islh.head(10)
IS10

In [None]:
ISm = sns.catplot(x="market", y="name",  kind="bar", data=IS10, height=4.27, aspect=6.7/8.27,palette=sns.color_palette("BrBG", 7))

ISm.set_xticklabels(rotation=45, horizontalalignment='right')
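
The per-country cells above repeat the same filter-and-count pattern, so it could be wrapped in a small function. The helper name `top_markets`, its parameters and the `toy` data are hypothetical; note that this sketch returns a Series of counts via `value_counts` rather than the `market`/`name` DataFrame built above.

```python
import pandas as pd

def top_markets(df, country=None, min_funding=1_000_000, n=10):
    """Count startups per market above a funding threshold (hypothetical helper)."""
    subset = df if country is None else df[df["country_code"] == country]
    funded = subset[subset["funding_total_usd"] > min_funding]
    return funded["market"].value_counts().head(n)

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "country_code": ["FIN", "FIN", "FIN", "SWE", "SWE"],
    "market": ["Software", "Software", "Mobile", "Biotech", "Software"],
    "funding_total_usd": [2_000_000, 5_000_000, 500_000, 3_000_000, 1_500_000],
})

print(top_markets(toy, "FIN"))  # Finnish markets only
print(top_markets(toy))         # all countries in the frame
```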

In [None]:
# 833 largest  debt_financing in the Nordic startups 


LDF = investments.nlargest(833,'debt_financing')
ax = LDF.country_code.value_counts().plot(kind='pie',autopct='%.2f%%',figsize=(12,12))
add_circle = plt.Circle((0,0),0.7,color='white')
fig=plt.gcf()
fig.gca().add_artist(add_circle)
ax.set_title(' debt_financing by country_code')


In [None]:
# grant recipients by country_code

LG = investments.nlargest(833,'grant')
ax = LG.country_code.value_counts().plot(kind='pie',autopct='%.2f%%',figsize=(12,12))
figG=plt.gcf()
figG.gca()
ax.set_title(' grant recipient startups by country_code')
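
One thing worth checking with `nlargest`: it returns the top rows by value even when those values are zero, so a large `n` can pull in startups that received no grant at all. The sketch below, on made-up numbers, filters to positive amounts first. Whether that matches the intent of the charts above is an assumption.

```python
import pandas as pd

# Made-up grant amounts; most startups here received nothing.
toy = pd.DataFrame({
    "country_code": ["FIN", "SWE", "SWE", "NOR", "DNK"],
    "grant": [50_000, 0, 120_000, 0, 0],
})

# nlargest keeps zero rows once n exceeds the number of positive values.
top_n = toy.nlargest(4, "grant")

# Filtering first restricts the count to actual grant recipients.
recipients = toy[toy["grant"] > 0]
print(recipients["country_code"].value_counts())
```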



In [None]:
# 10 largest funding_total_usd and respective venture funding per market in Finland
gbf=investments[(investments['country_code'] == 'FIN')]

# numeric_only=True sums only the numeric columns (dates and text cannot be summed)
gg= gbf.groupby('market').sum(numeric_only=True)
LF = gg.nlargest(10,'funding_total_usd')
fg=LF.plot(kind ='bar', y=['funding_total_usd','venture'], figsize=(20,10))

fg.set_title('Startups\' 10 largest funding_total_usd and respective venture in Finland\'s market',fontsize=(20))

In [None]:
# The average funding_total_usd and respective venture funding by region in Finland
gbf=investments[(investments['country_code'] == 'FIN')]
gg= gbf.groupby('region').mean(numeric_only=True)
fg=gg.plot(kind ='bar', y=['funding_total_usd','venture'], figsize=(20,10))

fg.set_title('Startups\' average funding_total_usd and venture in Finland by region',fontsize=(20))


In [None]:
# The average funding from grant and debt_financing in Finland by region

gbf=investments[(investments['country_code'] == 'FIN')]
rg= gbf.groupby('region').mean(numeric_only=True)
fr=rg.plot(kind ='line', y=['grant','debt_financing'], figsize=(15,5))

fr.set_title('Startups\' grant and debt_financing in Finland by region',fontsize=(20))


In [None]:
# The 200 largest grants for startups in Finland, by region
fin =investments[(investments['country_code'] == 'FIN')]
Lfd = fin.nlargest(200,'grant')
ax = Lfd.region.value_counts().plot(kind='pie',autopct='%.2f%%',figsize=(16,20))
add_circle = plt.Circle((0,0),0.7,color='white')
figd=plt.gcf()
figd.gca().add_artist(add_circle)
ax.set_title('200 largest grant for startups in Finland by region')

### **Visualisation: Success stories, funding rounds and private funding**

In [None]:
# Checking differences in funding sources among the countries.
# The columns are selected before summing so that only the funding columns are aggregated.
funding_sources = ['seed',
                   'venture',
                   'equity_crowdfunding',
                   'undisclosed',
                   'convertible_note',
                   'debt_financing',
                   'angel',
                   'grant',
                   'private_equity',
                   'post_ipo_equity',
                   'post_ipo_debt',
                   'secondary_market',
                   'product_crowdfunding']
investments.groupby('country_code')[funding_sources].sum().plot(kind = 'bar', figsize = (20,12), width = 1)
plt.title('Funding sources in Nordic countries', size = 'x-large')

In [None]:
investments['funding_in_seed'] = investments['seed'].map(lambda x :1  if x > 0 else 0)

In [None]:
plt.rcParams['figure.figsize'] =4,4
labels = ['No funding','Get funding']
sizes = investments['funding_in_seed'].value_counts().tolist()
explode = (0, 0.1)
colors =  ['#ff9999','#99ff99'] 

plt.pie(sizes, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
plt.axis('equal')
plt.tight_layout()
plt.title("Startups got funding in seed stage", fontdict=None, position= [0.48,1.1], size = 'x-large')

plt.show()
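
A caveat on the label order used in these pie charts: `value_counts` sorts by frequency, so the positional labels only line up while unfunded startups are the majority. A minimal sketch of pinning the order with `reindex`, on made-up amounts where funded startups dominate:

```python
import pandas as pd

# Made-up seed amounts where funded startups are the majority.
seed = pd.Series([0, 25_000, 100_000, 40_000])

got_seed = (seed > 0).astype(int)

# value_counts sorts by frequency, so here the count for 1 comes first.
by_frequency = got_seed.value_counts().tolist()

# Reindexing pins the order to [0, 1], matching ['No funding', 'Get funding'].
by_label = got_seed.value_counts().reindex([0, 1], fill_value=0).tolist()
print(by_frequency, by_label)
```

With the reindexed counts, the first slice always corresponds to 'No funding', whichever group is larger.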

In [None]:
grouped_by_country = investments.groupby('country_code')
fin = grouped_by_country.get_group('FIN')
dnk = grouped_by_country.get_group('DNK')
isl = grouped_by_country.get_group('ISL')
nor = grouped_by_country.get_group('NOR')
swe = grouped_by_country.get_group('SWE')

fig, ax = plt.subplots(nrows=5, ncols=1, figsize = (5,30))

labels = ['No funding','Get funding']
sizes_fin = fin['funding_in_seed'].value_counts().tolist()
sizes_dnk = dnk['funding_in_seed'].value_counts().tolist()
sizes_isl = isl['funding_in_seed'].value_counts().tolist()
sizes_nor = nor['funding_in_seed'].value_counts().tolist()
sizes_swe = swe['funding_in_seed'].value_counts().tolist()
explode = (0, 0.1)
colors =  ['#ff9999','#99ff99'] 

ax[0].set_title("Finland")
ax[0].pie(sizes_fin, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
ax[1].set_title("Denmark")
ax[1].pie(sizes_dnk, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
ax[2].set_title("Iceland")
ax[2].pie(sizes_isl, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
ax[3].set_title("Norway")
ax[3].pie(sizes_nor, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
ax[4].set_title("Sweden")
ax[4].pie(sizes_swe, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
fig.suptitle('Startups got funding in seed stage in each Nordic country' , size = 'xx-large')
plt.show()


In [None]:
investments['funding_vc'] = investments['venture'].map(lambda v :1  if v > 0 else 0)

In [None]:
plt.rcParams['figure.figsize'] =3,3
labels = ['No funding','Get funding']
sizes = investments['funding_vc'].value_counts().tolist()
explode = (0, 0.1)
colors =  ['#ff9999','#99ff99'] 

plt.pie(sizes, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
plt.axis('equal')
plt.tight_layout()
plt.title("Startups got funding by VC", fontdict=None, position= [0.48,1.1], size = 'x-large')

plt.show()

In [None]:
grouped_by_country = investments.groupby('country_code')
fin = grouped_by_country.get_group('FIN')
dnk = grouped_by_country.get_group('DNK')
isl = grouped_by_country.get_group('ISL')
nor = grouped_by_country.get_group('NOR')
swe = grouped_by_country.get_group('SWE')

fig, ax = plt.subplots(nrows=5, ncols=1, figsize = (5,30))

labels = ['No funding','Get funding']
sizes_fin = fin['funding_vc'].value_counts().tolist()
sizes_dnk = dnk['funding_vc'].value_counts().tolist()
sizes_isl = isl['funding_vc'].value_counts().tolist()
sizes_nor = nor['funding_vc'].value_counts().tolist()
sizes_swe = swe['funding_vc'].value_counts().tolist()
explode = (0, 0.1)
colors =  ['#ff9999','#99ff99'] 

ax[0].set_title("Finland")
ax[0].pie(sizes_fin, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
ax[1].set_title("Denmark")
ax[1].pie(sizes_dnk, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
ax[2].set_title("Iceland")
ax[2].pie(sizes_isl, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
ax[3].set_title("Norway")
ax[3].pie(sizes_nor, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
ax[4].set_title("Sweden")
ax[4].pie(sizes_swe, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
fig.suptitle("Startups got funding by VC in each Nordic country" , size = 'xx-large')
plt.show()

In [None]:
investments['funding_angel'] = investments['angel'].map(lambda a :1  if a > 0 else 0)

In [None]:
plt.rcParams['figure.figsize'] =5,5
labels = ['No funding','Get funding']
sizes = investments['funding_angel'].value_counts().tolist()
explode = (0, 0.1)
colors =  ['#ff9999','#99ff99'] 

plt.pie(sizes, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
plt.axis('equal')
plt.tight_layout()
plt.title("Startups got funding by angels", fontdict=None, position= [0.48,1.1], size = 'x-large')

plt.show()

In [None]:
grouped_by_country = investments.groupby('country_code')
fin = grouped_by_country.get_group('FIN')
dnk = grouped_by_country.get_group('DNK')
isl = grouped_by_country.get_group('ISL')
nor = grouped_by_country.get_group('NOR')
swe = grouped_by_country.get_group('SWE')

fig, ax = plt.subplots(nrows=5, ncols=1, figsize = (5,30))

labels = ['No funding','Get funding']
sizes_fin = fin['funding_angel'].value_counts().tolist()
sizes_dnk = dnk['funding_angel'].value_counts().tolist()
sizes_isl = isl['funding_angel'].value_counts().tolist()
sizes_nor = nor['funding_angel'].value_counts().tolist()
sizes_swe = swe['funding_angel'].value_counts().tolist()
explode = (0, 0.1)
colors =  ['#ff9999','#99ff99'] 

ax[0].set_title("Finland")
ax[0].pie(sizes_fin, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
ax[1].set_title("Denmark")
ax[1].pie(sizes_dnk, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
ax[2].set_title("Iceland")
ax[2].pie(sizes_isl, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
ax[3].set_title("Norway")
ax[3].pie(sizes_nor, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
ax[4].set_title("Sweden")
ax[4].pie(sizes_swe, explode = explode, colors = colors ,labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=190)
fig.suptitle("Startups got funding by angels in each Nordic country" , size = 'xx-large')
plt.show()

In [None]:
# Total number of values and sum of each round (A - F) 

print('Total number of values in round_A: ', len(investments[investments['round_A'] != 0]))
print('Sum of round_A: $', investments['round_A'].sum())
print('')
print('Total number of values in round_B: ', len(investments[investments['round_B'] != 0]))
print('Sum of round_B: $', investments['round_B'].sum())
print('')
print('Total number of values in round_C: ', len(investments[investments['round_C'] != 0]))
print('Sum of round_C: $', investments['round_C'].sum())
print('')
print('Total number of values in round_D: ', len(investments[investments['round_D'] != 0]))
print('Sum of round_D: $', investments['round_D'].sum())
print('')
print('Total number of values in round_E: ', len(investments[investments['round_E'] != 0]))
print('Sum of round_E: $', investments['round_E'].sum())
print('')
print('Total number of values in round_F: ', len(investments[investments['round_F'] != 0]))
print('Sum of round_F: $', investments['round_F'].sum())

In [None]:
rounds = ['round_A','round_B','round_C','round_D','round_E','round_F']
amount = [investments['round_A'].sum(),
          investments['round_B'].sum(),
          investments['round_C'].sum(),
          investments['round_D'].sum(),
          investments['round_E'].sum(),
          investments['round_F'].sum()]
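
The per-round counts and sums can also be computed in one pass over a list of column names. The sketch below uses a made-up frame with three rounds; the `summary` name is an assumption for illustration.

```python
import pandas as pd

# Made-up round amounts for three startups.
toy = pd.DataFrame({
    "round_A": [1_000_000, 0, 2_500_000],
    "round_B": [0, 0, 4_000_000],
    "round_C": [0, 0, 0],
})

rounds = ["round_A", "round_B", "round_C"]
summary = pd.DataFrame({
    "nonzero_count": (toy[rounds] != 0).sum(),  # startups with any money in the round
    "total_usd": toy[rounds].sum(),             # total invested in the round
})
print(summary)
```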

In [None]:
plt.rcParams['figure.figsize'] = 15,8
height = amount
bars =  rounds
y_pos = np.arange(len(bars))

plt.bar(y_pos, height , width=0.7, color= ['goldenrod','tomato','olivedrab','teal','chocolate','seagreen'] )
plt.ticklabel_format(style = 'plain')
plt.xticks(y_pos, bars)
ax = plt.gca()  # reuse the current axes; plt.axes() can create a new, empty one on top of the plot
ax.yaxis.grid()
plt.title("Sum investment in each round", fontdict=None, position= [0.48,1.05], size = 'x-large')
plt.show()

### **In closing: conclusions and future opportunities for research**

<br>

1) **Cleanup and collaboration** 
- Online collaboration tools are only as strong as their weakest link: in Google Colab’s case, if all participants have full editing access to a file, but Colab keeps asking for authorisation, you need to start hacking solutions. 
- Try not to change software, tools or platforms, or update Python modules, in the middle of a project; it may cause problems or change your output in unexpected ways. 



2) **Status, scale and market segments**
- We can see that there is still a significantly larger volume of startups coming out of Sweden than Finland, and that the startups are largely driven by the software, biotech and mobile market segments. Certain large companies such as Spotify and Supercell account for a large part of this.
- The ratio of acquired startups (in 'status') to the total seems to be lower than one would expect based on the public perception of fast-moving startup culture.
- The top 5 funded startups can be found in entertainment (Spotify leading the way), biotech (Symphogen), payments (Klarna), clean technology (NorSun), and games (Supercell).
- According to our data, startup formation has had two spikes, in 2006 and 2012.  


3) **Public grants, debt and Finland**
- It is likely that taxation and incentives would play a part in investment. Unfortunately, this didn't form a part of our dataset, but is a worthy area of research in the future. Here, a comparison of neighbouring countries could yield an understanding e.g. of Finland's place in the startup investment arena.
- There were 11 startups each in the categories of Mobile and Software, and 7 in Biotech, with a higher than 1 million USD investment in the Finnish market. By comparison, the Nordic figures put the number of startups with over 1 million USD investment at 50 startups (Software), 46 startups (Biotech) and 32 (Mobile). We can see that the Nordic figures reveal a higher focus on the Biotech segment than in Finland.
- Startups seem to take more debt than receive grants in Finland, although the main investment type is still venture capital, and those grants are focused mostly in the Helsinki region, with Oulu coming in second and Turku third. The further we move into the periphery, the fewer the grants in an industry that is built on decentralisation.  
 

4) **Success stories, funding rounds and private funding**

- Data for interesting questions related to who is acquiring startups was thin. Yes, there was data related to how many startups had been acquired, but we would need further research to understand better the retention of IP in the Nordic countries or Finland.
- To this end, we would have liked to take advantage of such additional data to employ Machine Learning techniques for predictions of trends in startup investment.
- We can see that, overall, just over a quarter of startups got seed funding; the share is higher in Finland at 39.7 percent. Only Iceland has a higher share of startups receiving seed funding, at 43.8 percent. 
- When it comes to venture capital funding, however, Iceland is the lowest at 25 percent with Finland (48.5 percent) sitting roughly at the Nordic average of 46.9 percent. 
- Angel funding reached 8.9 percent of startups across the Nordic countries, with most countries close to this figure; Finland sits at 8.2 percent.
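
Shares like the ones quoted above can be computed without drawing a pie chart, since the mean of a boolean flag is the fraction of `True` values. This is a minimal sketch on made-up rows, so the printed numbers are illustrative and do not reproduce our results.

```python
import pandas as pd

# Made-up seed amounts for two countries.
toy = pd.DataFrame({
    "country_code": ["FIN", "FIN", "ISL", "ISL", "ISL", "ISL"],
    "seed": [10_000, 0, 0, 50_000, 0, 0],
})

# Share of startups with any seed funding, per country, in percent.
seed_share = (toy["seed"] > 0).groupby(toy["country_code"]).mean().mul(100).round(1)
print(seed_share)
```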
