# Who Is Investing in Cybersecurity?

Cybersecurity is red-hot and there is a lot of money being thrown at the industry. Where is the bulk of the money going?

### Data Sources

* Crunchbase

    - investment-2017.csv to investment-2019.csv
    
    - investment-XXX.csv {security, cybersecurity, cloud security, network security, privacy, blockchain, (bitcoin, ethereum, cryptocurrency), identity management, compliance}


* CyberSecurityVenures

### Ideas

* Is it worth seeing where the different investments are for each category?

* Who are the ones investing in security?

In [1]:
import requests
import re
import pandas as pd
import numpy as np

import datetime

pd.set_option('display.max_columns', 500)

import statsmodels.api as sm


## Prepare the main data set 

The main data set covers all companies that received some kind of funding.

### Editorial decision

The analysis covers a two-year period: the last half of 2017, all of 2018, and the first half of 2019.
    

In [17]:
input_data = pd.read_csv('investment-2017.csv')
input_data.shape

(641, 20)

In [18]:
#create master dataframe
all_data = input_data
all_data.shape

(641, 20)

In [19]:
input_data = pd.read_csv('investment-20181.csv')
input_data.shape

(928, 20)

In [20]:
all_data = pd.concat([all_data, input_data], axis = 0, join="outer")
all_data.shape

(1569, 20)

In [21]:
input_data = pd.read_csv('investment-20182.csv')
input_data.shape

(851, 20)

In [22]:
all_data = pd.concat([all_data, input_data], axis = 0, join="outer")
all_data.shape

(2420, 20)

In [23]:
input_data = pd.read_csv('investment-2019.csv')
input_data.shape

(789, 20)

In [24]:
all_data = pd.concat([all_data, input_data], axis = 0, join="outer")
all_data.shape

(3209, 20)
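
The four read-and-concatenate steps above could also be written as one loop. This is a minimal sketch, not the notebook's own code: `load_exports` is a hypothetical helper, and the in-memory CSV text stands in for the real export files so the example runs on its own. It also passes `ignore_index=True`, which the cells above do not, so the combined frame gets a fresh 0..n-1 index instead of labels that repeat across files.

```python
import io
import pandas as pd

def load_exports(sources):
    """Read each CSV export and stack them into one frame with a fresh index."""
    frames = [pd.read_csv(src) for src in sources]
    return pd.concat(frames, axis=0, join="outer", ignore_index=True)

# Tiny in-memory stand-ins for the export files (illustrative data only).
part_a = io.StringIO("Organization Name,Last Funding Amount\nAlpha,100\nBeta,\n")
part_b = io.StringIO("Organization Name,Last Funding Amount\nGamma,300\n")

combined = load_exports([part_a, part_b])
print(combined.shape)           # (3, 2)
print(combined.index.tolist())  # [0, 1, 2]
```

With the real files, the call would presumably take the four file names read above, for example `load_exports(['investment-2017.csv', 'investment-20181.csv', 'investment-20182.csv', 'investment-2019.csv'])`.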

In [25]:
all_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3209 entries, 0 to 788
Data columns (total 20 columns):
Organization Name                               3209 non-null object
Organization Name URL                           3209 non-null object
Categories                                      3209 non-null object
Headquarters Location                           3131 non-null object
Headquarters Regions                            2695 non-null object
Founded Date                                    2830 non-null object
Founded Date Precision                          2830 non-null object
Exit Date                                       109 non-null object
Exit Date Precision                             109 non-null object
Last Funding Date                               3209 non-null object
Last Funding Amount                             2212 non-null float64
Last Funding Amount Currency                    2212 non-null object
Last Funding Amount Currency (in USD)           2212 non-null float

### Cleanup and format master table

In [26]:
all_data['Founded Date']= pd.to_datetime(all_data['Founded Date']) 
all_data['Founded'] = pd.DatetimeIndex(all_data['Founded Date']).year
all_data['Last Funding Date'] = pd.to_datetime(all_data['Last Funding Date'])
all_data['Period']=pd.to_datetime(all_data['Last Funding Date']).dt.to_period('M')
all_data['Quarter']=all_data['Period'].astype(str)
all_data.Quarter.replace(['2019-01', '2019-02', '2019-03'], '20191Q', inplace=True)
all_data.Quarter.replace(['2019-04', '2019-05', '2019-06'], '20192Q', inplace=True)
all_data.Quarter.replace(['2019-07', '2019-08', '2019-09'], '20193Q', inplace=True)
all_data.Quarter.replace(['2018-01', '2018-02', '2018-03'], '20181Q', inplace=True)
all_data.Quarter.replace(['2018-04', '2018-05', '2018-06'], '20182Q', inplace=True)
all_data.Quarter.replace(['2018-07', '2018-08', '2018-09'], '20183Q', inplace=True)
all_data.Quarter.replace(['2018-10', '2018-11', '2018-12'], '20184Q', inplace=True)
all_data.Quarter.replace(['2017-07', '2017-08', '2017-09'], '20173Q', inplace=True)
all_data.Quarter.replace(['2017-10', '2017-11', '2017-12'], '20174Q', inplace=True)

all_data = all_data.drop(['Organization Name URL','Exit Date', 'Founded Date','Founded Date Precision','Exit Date Precision','Last Funding Amount Currency','Last Funding Amount','Last Equity Funding Amount', 'Last Equity Funding Amount Currency'], axis=1)
all_data.head(10)

Unnamed: 0,Organization Name,Categories,Headquarters Location,Headquarters Regions,Exit Date,Last Funding Date,Last Funding Amount Currency (in USD),Last Funding Type,Last Equity Funding Amount Currency (in USD),Funding Status,Top 5 Investors,Number of Investors,Founded,Period,Quarter
0,Recorded Future,"Analytics, Cyber Security, Machine Learning, R...","Somerville, Massachusetts, United States","Greater Boston Area, East Coast, New England",2019-05-30,2017-10-31,25000000.0,Series E,25000000.0,M&A,"Balderton Capital, GV, IA Ventures, Insight Pa...",8.0,2009.0,2017-10,20174Q
1,Binance Labs,"Cryptocurrency, Ethereum","Valletta, NA - Malta, Malta",European Union (EU),NaT,2017-11-01,,Initial Coin Offering,,,,,2017.0,2017-11,20174Q
2,Zscaler,"Cloud Security, Cyber Security, Enterprise Sof...","San Jose, California, United States","San Francisco Bay Area, Silicon Valley, West C...",2018-03-16,2017-08-08,,Secondary Market,110000000.0,IPO,"CapitalG, EquityZen, Lightspeed Venture Partne...",4.0,2008.0,2017-08,20173Q
3,Bitdefender,"Cloud Security, Cyber Security, Network Securi...","Bucharest, Bucuresti, Romania",European Union (EU),NaT,2017-12-01,180000000.0,Secondary Market,7000000.0,,"Balkan Accession Fund, Romanian-American Enter...",3.0,2001.0,2017-12,20174Q
4,Duo Security,"Cloud Security, Cyber Security, Enterprise Sof...","Ann Arbor, Michigan, United States","Great Lakes, Midwestern US",2018-08-02,2017-10-18,70000000.0,Series D,70000000.0,M&A,"GV, Index Ventures, Redpoint, True Ventures, W...",12.0,2009.0,2017-10,20174Q
5,Tenable Network Security,"Compliance, Network Security, Risk Management,...","Columbia, Maryland, United States","Greater Baltimore-Maryland Area, East Coast, S...",2018-07-26,2017-07-03,,Secondary Market,250000000.0,IPO,"Accel, In-Q-Tel, Insight Partners, SharesPost ...",4.0,2002.0,2017-07,20173Q
6,ForgeRock,"Cyber Security, Enterprise Software, Identity ...","San Francisco, California, United States","San Francisco Bay Area, West Coast, Western US",NaT,2017-09-05,88000000.0,Series D,88000000.0,Late Stage Venture,"Accel, Foundation Capital, Kohlberg Kravis Rob...",4.0,2010.0,2017-09,20173Q
7,Threat Stack,"Cloud Security, SaaS, Security","Boston, Massachusetts, United States","Greater Boston Area, East Coast, New England",NaT,2017-09-19,45000000.0,Series C,45000000.0,Late Stage Venture,"Atlas Venture, Eight Roads Ventures, Right Sid...",8.0,2012.0,2017-09,20173Q
8,Skybox Security,"Cloud Security, Cyber Security, Enterprise Sof...","San Jose, California, United States","San Francisco Bay Area, Silicon Valley, West C...",NaT,2017-10-25,150000000.0,Private Equity,150000000.0,Private Equity,"CVC Capital Partners, Lightspeed Venture Partn...",14.0,2002.0,2017-10,20174Q
9,Huobi,"Bitcoin, Blockchain, Cryptocurrency, FinTech","Singapore, Central Region, Singapore","Asia-Pacific (APAC), Association of Southeast ...",NaT,2017-11-01,1000000.0,Initial Coin Offering,1000000.0,,Node Capital,1.0,2013.0,2017-11,20174Q
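
The month-by-month `replace` calls in the cleanup cell above map each funding month to a label such as `20174Q`. As a sketch of an alternative, the same label can be derived from the date itself, which avoids listing every month and also covers months outside the ones enumerated. `quarter_label` is a hypothetical helper name, and the sample dates are illustrative only. It assumes the date column has no missing values, as is the case for the funding date here.

```python
import pandas as pd

def quarter_label(dates: pd.Series) -> pd.Series:
    """Return labels such as '20174Q' (year, quarter number, 'Q')."""
    dates = pd.to_datetime(dates)
    # Assumes no NaT values; missing dates would turn the year into a float.
    return dates.dt.year.astype(str) + dates.dt.quarter.astype(str) + "Q"

sample = pd.Series(["2017-10-31", "2018-03-16", "2019-05-30"])
labels = quarter_label(sample)
print(labels.tolist())  # ['20174Q', '20181Q', '20192Q']
```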


In [27]:
#rename fields for easier access
all_data = all_data.rename(columns={"Organization Name": "Organization", 
                   "Headquarters Location": "Location",
                   "Headquarters Regions" : "Region",
                   "Last Funding Date" : "FundingDate",
                   "Last Funding Amount Currency (in USD)" : "FundingAmount",
                   "Last Funding Type" : "FundingType",
                   "Last Equity Funding Amount Currency (in USD)" : "EquityAmount",
                   "Funding Status" : "Status",
                   "Top 5 Investors" : "Top5",
                   "Number of Investors" : "Investors"
                  })
all_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3209 entries, 0 to 788
Data columns (total 15 columns):
Organization     3209 non-null object
Categories       3209 non-null object
Location         3131 non-null object
Region           2695 non-null object
ExitDate         109 non-null datetime64[ns]
FundingDate      3209 non-null datetime64[ns]
FundingAmount    2212 non-null float64
FundingType      3209 non-null object
EquityAmount     2137 non-null float64
Status           2268 non-null object
Top5             2487 non-null object
Investors        2487 non-null float64
Founded          2830 non-null float64
Period           3209 non-null period[M]
Quarter          3209 non-null object
dtypes: datetime64[ns](2), float64(4), object(8), period[M](1)
memory usage: 401.1+ KB


In [30]:
#understanding the data
all_data.isnull().sum()

Organization        0
Categories          0
Location           78
Region            514
ExitDate         3100
FundingDate         0
FundingAmount     997
FundingType         0
EquityAmount     1072
Status            941
Top5              722
Investors         722
Founded           379
Period              0
Quarter             0
dtype: int64

### Editorial Decision
For research about investment amounts, there is no value in keeping private investments or other rounds with an undisclosed amount, so rows that have no funding amount are dropped. Of the rows affected, some had an exit (IPO, M&A, etc.), some are undisclosed, and some have odd exit dates.

In [31]:
all_data = all_data.dropna(subset=['FundingAmount'])
all_data.shape

(2212, 15)

In [32]:
all_data.isnull().sum()

Organization        0
Categories          0
Location           37
Region            371
ExitDate         2139
FundingDate         0
FundingAmount       0
FundingType         0
EquityAmount      104
Status            616
Top5              510
Investors         510
Founded           214
Period              0
Quarter             0
dtype: int64

### Format the investors and region fields

In [34]:
all_data["Invest"]= all_data["Top5"].str.split(",", n = -1) 
all_data.head(5)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


Unnamed: 0,Organization,Categories,Location,Region,ExitDate,FundingDate,FundingAmount,FundingType,EquityAmount,Status,Top5,Investors,Founded,Period,Quarter,Invest
0,Recorded Future,"Analytics, Cyber Security, Machine Learning, R...","Somerville, Massachusetts, United States","Greater Boston Area, East Coast, New England",2019-05-30,2017-10-31,25000000.0,Series E,25000000.0,M&A,"Balderton Capital, GV, IA Ventures, Insight Pa...",8.0,2009.0,2017-10,20174Q,"[Balderton Capital, GV, IA Ventures, Insigh..."
3,Bitdefender,"Cloud Security, Cyber Security, Network Securi...","Bucharest, Bucuresti, Romania",European Union (EU),NaT,2017-12-01,180000000.0,Secondary Market,7000000.0,,"Balkan Accession Fund, Romanian-American Enter...",3.0,2001.0,2017-12,20174Q,"[Balkan Accession Fund, Romanian-American Ent..."
4,Duo Security,"Cloud Security, Cyber Security, Enterprise Sof...","Ann Arbor, Michigan, United States","Great Lakes, Midwestern US",2018-08-02,2017-10-18,70000000.0,Series D,70000000.0,M&A,"GV, Index Ventures, Redpoint, True Ventures, W...",12.0,2009.0,2017-10,20174Q,"[GV, Index Ventures, Redpoint, True Venture..."
6,ForgeRock,"Cyber Security, Enterprise Software, Identity ...","San Francisco, California, United States","San Francisco Bay Area, West Coast, Western US",NaT,2017-09-05,88000000.0,Series D,88000000.0,Late Stage Venture,"Accel, Foundation Capital, Kohlberg Kravis Rob...",4.0,2010.0,2017-09,20173Q,"[Accel, Foundation Capital, Kohlberg Kravis ..."
7,Threat Stack,"Cloud Security, SaaS, Security","Boston, Massachusetts, United States","Greater Boston Area, East Coast, New England",NaT,2017-09-19,45000000.0,Series C,45000000.0,Late Stage Venture,"Atlas Venture, Eight Roads Ventures, Right Sid...",8.0,2012.0,2017-09,20173Q,"[Atlas Venture, Eight Roads Ventures, Right ..."
8,Skybox Security,"Cloud Security, Cyber Security, Enterprise Sof...","San Jose, California, United States","San Francisco Bay Area, Silicon Valley, West C...",NaT,2017-10-25,150000000.0,Private Equity,150000000.0,Private Equity,"CVC Capital Partners, Lightspeed Venture Partn...",14.0,2002.0,2017-10,20174Q,"[CVC Capital Partners, Lightspeed Venture Par..."
9,Huobi,"Bitcoin, Blockchain, Cryptocurrency, FinTech","Singapore, Central Region, Singapore","Asia-Pacific (APAC), Association of Southeast ...",NaT,2017-11-01,1000000.0,Initial Coin Offering,1000000.0,,Node Capital,1.0,2013.0,2017-11,20174Q,[Node Capital]
12,Bitwise Asset Management,"Blockchain, Cryptocurrency, Financial Services...","San Francisco, California, United States","San Francisco Bay Area, West Coast, Western US",NaT,2017-12-12,4000000.0,Seed,4000000.0,Seed,"Blockchain Capital, Collaborative Fund, Elad G...",15.0,2017.0,2017-12,20174Q,"[Blockchain Capital, Collaborative Fund, Ela..."
14,Instart,"Advertising, Cloud Computing, Cyber Security, ...","Palo Alto, California, United States","San Francisco Bay Area, Silicon Valley, West C...",NaT,2017-11-02,30000000.0,Series E,30000000.0,Late Stage Venture,"Andreessen Horowitz, Greylock Partners, StartX...",12.0,2010.0,2017-11,20174Q,"[Andreessen Horowitz, Greylock Partners, Sta..."
15,Lastline,"Network Security, Security","Redwood City, California, United States","San Francisco Bay Area, Silicon Valley, West C...",NaT,2017-07-12,28500000.0,Series C,28500000.0,Late Stage Venture,"Barracuda Networks, Dell Technologies Capital,...",9.0,2011.0,2017-07,20173Q,"[Barracuda Networks, Dell Technologies Capita..."
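
The warning printed with the output above usually means pandas cannot tell whether the frame being assigned to is an independent object or a view of an earlier one. Here `all_data` is the result of `dropna`. One common way to avoid it, sketched below on made-up rows, is to take an explicit `.copy()` after filtering and only then add columns. The names `raw` and `funded` are hypothetical.

```python
import pandas as pd

# Illustrative rows only; the middle one has no disclosed amount.
raw = pd.DataFrame({
    "FundingAmount": [25000000.0, None, 70000000.0],
    "Top5": ["Balderton Capital, GV", None, "GV, Index Ventures"],
})

# .copy() makes the filtered frame independent of `raw`,
# so adding a column is unambiguous and raises no warning.
funded = raw.dropna(subset=["FundingAmount"]).copy()
funded["Invest"] = funded["Top5"].str.split(", ")

print(funded["Invest"].tolist())
```

Splitting on `", "` rather than `","` also keeps leading spaces out of the investor names.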


In [40]:
tags = all_data.Invest.apply(pd.Series)
tags = tags.drop(5, axis=1)
tags = tags.rename(columns = lambda x : 'investor_' + str(x+1))
tags.head(5)

Unnamed: 0,investor_1,investor_2,investor_3,investor_4,investor_5
0,Balderton Capital,GV,IA Ventures,Insight Partners,MassMutual Ventures
3,Balkan Accession Fund,Romanian-American Enterprise Fund,Vitruvian Partners,,
4,GV,Index Ventures,Redpoint,True Ventures,Workday
6,Accel,Foundation Capital,Kohlberg Kravis Roberts,Meritech Capital Partners,
7,Atlas Venture,Eight Roads Ventures,Right Side Capital Management,Scale Venture Partners,Techstars
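
Turning the investor list into `investor_1` ... `investor_5` columns can also be done in one step with `str.split(..., expand=True)`, which tends to be faster than `apply(pd.Series)` on a few thousand rows. The sketch below uses made-up rows and is not the notebook's own code. Slicing to the first five columns replaces the explicit `drop(5, axis=1)`, so it also works when no row has a sixth entry.

```python
import pandas as pd

# Illustrative values shaped like the Top5 column (index labels are arbitrary).
top5 = pd.Series(
    ["Balderton Capital, GV, IA Ventures", "Node Capital", None],
    index=[0, 9, 12],
)

# expand=True returns one column per split position; short rows are padded with missing values.
investors = top5.str.split(", ", expand=True).iloc[:, :5]
investors.columns = ["investor_" + str(i + 1) for i in range(investors.shape[1])]

print(investors)
```

The same call would presumably work for the `Region` column, with a `region_` prefix.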


In [49]:
my_data = pd.concat([all_data[:], tags[:]], axis=1)
all_data = my_data
all_data.shape

(2212, 21)

In [51]:
all_data["Place"]= all_data["Region"].str.split(",", n = -1) 
all_data.head(5)

Unnamed: 0,Organization,Categories,Location,Region,ExitDate,FundingDate,FundingAmount,FundingType,EquityAmount,Status,Top5,Investors,Founded,Period,Quarter,Invest,investor_1,investor_2,investor_3,investor_4,investor_5,Place
0,Recorded Future,"Analytics, Cyber Security, Machine Learning, R...","Somerville, Massachusetts, United States","Greater Boston Area, East Coast, New England",2019-05-30,2017-10-31,25000000.0,Series E,25000000.0,M&A,"Balderton Capital, GV, IA Ventures, Insight Pa...",8.0,2009.0,2017-10,20174Q,"[Balderton Capital, GV, IA Ventures, Insigh...",Balderton Capital,GV,IA Ventures,Insight Partners,MassMutual Ventures,"[Greater Boston Area, East Coast, New England]"
3,Bitdefender,"Cloud Security, Cyber Security, Network Securi...","Bucharest, Bucuresti, Romania",European Union (EU),NaT,2017-12-01,180000000.0,Secondary Market,7000000.0,,"Balkan Accession Fund, Romanian-American Enter...",3.0,2001.0,2017-12,20174Q,"[Balkan Accession Fund, Romanian-American Ent...",Balkan Accession Fund,Romanian-American Enterprise Fund,Vitruvian Partners,,,[European Union (EU)]
4,Duo Security,"Cloud Security, Cyber Security, Enterprise Sof...","Ann Arbor, Michigan, United States","Great Lakes, Midwestern US",2018-08-02,2017-10-18,70000000.0,Series D,70000000.0,M&A,"GV, Index Ventures, Redpoint, True Ventures, W...",12.0,2009.0,2017-10,20174Q,"[GV, Index Ventures, Redpoint, True Venture...",GV,Index Ventures,Redpoint,True Ventures,Workday,"[Great Lakes, Midwestern US]"
6,ForgeRock,"Cyber Security, Enterprise Software, Identity ...","San Francisco, California, United States","San Francisco Bay Area, West Coast, Western US",NaT,2017-09-05,88000000.0,Series D,88000000.0,Late Stage Venture,"Accel, Foundation Capital, Kohlberg Kravis Rob...",4.0,2010.0,2017-09,20173Q,"[Accel, Foundation Capital, Kohlberg Kravis ...",Accel,Foundation Capital,Kohlberg Kravis Roberts,Meritech Capital Partners,,"[San Francisco Bay Area, West Coast, Western..."
7,Threat Stack,"Cloud Security, SaaS, Security","Boston, Massachusetts, United States","Greater Boston Area, East Coast, New England",NaT,2017-09-19,45000000.0,Series C,45000000.0,Late Stage Venture,"Atlas Venture, Eight Roads Ventures, Right Sid...",8.0,2012.0,2017-09,20173Q,"[Atlas Venture, Eight Roads Ventures, Right ...",Atlas Venture,Eight Roads Ventures,Right Side Capital Management,Scale Venture Partners,Techstars,"[Greater Boston Area, East Coast, New England]"


In [54]:
tags = all_data.Place.apply(pd.Series)
tags = tags.rename(columns = lambda x : 'region_' + str(x+1))
tags.head(5)

Unnamed: 0,region_1,region_2,region_3
0,Greater Boston Area,East Coast,New England
3,European Union (EU),,
4,Great Lakes,Midwestern US,
6,San Francisco Bay Area,West Coast,Western US
7,Greater Boston Area,East Coast,New England


In [55]:
my_data = pd.concat([all_data[:], tags[:]], axis=1)
all_data = my_data
all_data.shape

(2212, 25)

In [59]:
all_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2212 entries, 0 to 690
Data columns (total 25 columns):
Organization     2212 non-null object
Categories       2212 non-null object
Location         2175 non-null object
Region           1841 non-null object
ExitDate         73 non-null datetime64[ns]
FundingDate      2212 non-null datetime64[ns]
FundingAmount    2212 non-null float64
FundingType      2212 non-null object
EquityAmount     2108 non-null float64
Status           1596 non-null object
Top5             1702 non-null object
Investors        1702 non-null float64
Founded          1998 non-null float64
Period           2212 non-null period[M]
Quarter          2212 non-null object
Invest           1702 non-null object
investor_1       1702 non-null object
investor_2       1194 non-null object
investor_3       949 non-null object
investor_4       737 non-null object
investor_5       592 non-null object
Place            1841 non-null object
region_1         1841 non-null object
re

In [62]:
all_data = all_data.drop(['Region', 'FundingDate', 'Top5','Invest','Place'], axis=1)
all_data.head(5)

Unnamed: 0,Organization,Categories,Location,FundingAmount,FundingType,EquityAmount,Status,Investors,Founded,Period,Quarter,investor_1,investor_2,investor_3,investor_4,investor_5,region_1,region_2,region_3
0,Recorded Future,"Analytics, Cyber Security, Machine Learning, R...","Somerville, Massachusetts, United States",25000000.0,Series E,25000000.0,M&A,8.0,2009.0,2017-10,20174Q,Balderton Capital,GV,IA Ventures,Insight Partners,MassMutual Ventures,Greater Boston Area,East Coast,New England
3,Bitdefender,"Cloud Security, Cyber Security, Network Securi...","Bucharest, Bucuresti, Romania",180000000.0,Secondary Market,7000000.0,,3.0,2001.0,2017-12,20174Q,Balkan Accession Fund,Romanian-American Enterprise Fund,Vitruvian Partners,,,European Union (EU),,
4,Duo Security,"Cloud Security, Cyber Security, Enterprise Sof...","Ann Arbor, Michigan, United States",70000000.0,Series D,70000000.0,M&A,12.0,2009.0,2017-10,20174Q,GV,Index Ventures,Redpoint,True Ventures,Workday,Great Lakes,Midwestern US,
6,ForgeRock,"Cyber Security, Enterprise Software, Identity ...","San Francisco, California, United States",88000000.0,Series D,88000000.0,Late Stage Venture,4.0,2010.0,2017-09,20173Q,Accel,Foundation Capital,Kohlberg Kravis Roberts,Meritech Capital Partners,,San Francisco Bay Area,West Coast,Western US
7,Threat Stack,"Cloud Security, SaaS, Security","Boston, Massachusetts, United States",45000000.0,Series C,45000000.0,Late Stage Venture,8.0,2012.0,2017-09,20173Q,Atlas Venture,Eight Roads Ventures,Right Side Capital Management,Scale Venture Partners,Techstars,Greater Boston Area,East Coast,New England


In [63]:
all_data.isnull().sum()

Organization        0
Categories          0
Location           37
FundingAmount       0
FundingType         0
EquityAmount      104
Status            616
Investors         510
Founded           214
Period              0
Quarter             0
investor_1        510
investor_2       1018
investor_3       1263
investor_4       1475
investor_5       1620
region_1          371
region_2         1106
region_3         1294
dtype: int64
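
With the investors spread over `investor_1` ... `investor_5`, one possible way to approach the question "who are the ones investing in security?" is to stack those columns into a single one and count how often each name appears. This is only a sketch on made-up rows: `deals`, `long` and `deal_counts` are hypothetical names. It counts appearances among the listed top investors, not money contributed.

```python
import pandas as pd

# Illustrative rows only, shaped like the investor_N columns in the table above.
deals = pd.DataFrame({
    "Organization": ["Recorded Future", "Duo Security", "ForgeRock"],
    "investor_1": ["Balderton Capital", "GV", "Accel"],
    "investor_2": ["GV", "Index Ventures", "Foundation Capital"],
    "investor_3": ["IA Ventures", "Redpoint", None],
})

investor_cols = [c for c in deals.columns if c.startswith("investor_")]

# One row per (organization, investor) pair; empty slots are dropped.
long = (
    deals.melt(id_vars=["Organization"], value_vars=investor_cols, value_name="Investor")
    .dropna(subset=["Investor"])
    .assign(Investor=lambda d: d["Investor"].str.strip())
)

deal_counts = long["Investor"].value_counts()
print(deal_counts.head())
```

Summing `FundingAmount` per investor in the same way would attribute the whole round to every listed investor, so such totals would overstate individual contributions.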

Write to CSV for backup

In [64]:
all_data.to_csv('investment-complete.csv', index=False)

In [48]:
my_data.head(5)

Unnamed: 0,Organization,Categories,Location,Region,ExitDate,FundingDate,FundingAmount,FundingType,EquityAmount,Status,Top5,Investors,Founded,Period,Quarter,Invest,investor_1,investor_2,investor_3,investor_4,investor_5
0,Recorded Future,"Analytics, Cyber Security, Machine Learning, R...","Somerville, Massachusetts, United States","Greater Boston Area, East Coast, New England",2019-05-30,2017-10-31,25000000.0,Series E,25000000.0,M&A,"Balderton Capital, GV, IA Ventures, Insight Pa...",8.0,2009.0,2017-10,20174Q,"[Balderton Capital, GV, IA Ventures, Insigh...",Balderton Capital,GV,IA Ventures,Insight Partners,MassMutual Ventures
3,Bitdefender,"Cloud Security, Cyber Security, Network Securi...","Bucharest, Bucuresti, Romania",European Union (EU),NaT,2017-12-01,180000000.0,Secondary Market,7000000.0,,"Balkan Accession Fund, Romanian-American Enter...",3.0,2001.0,2017-12,20174Q,"[Balkan Accession Fund, Romanian-American Ent...",Balkan Accession Fund,Romanian-American Enterprise Fund,Vitruvian Partners,,
4,Duo Security,"Cloud Security, Cyber Security, Enterprise Sof...","Ann Arbor, Michigan, United States","Great Lakes, Midwestern US",2018-08-02,2017-10-18,70000000.0,Series D,70000000.0,M&A,"GV, Index Ventures, Redpoint, True Ventures, W...",12.0,2009.0,2017-10,20174Q,"[GV, Index Ventures, Redpoint, True Venture...",GV,Index Ventures,Redpoint,True Ventures,Workday
6,ForgeRock,"Cyber Security, Enterprise Software, Identity ...","San Francisco, California, United States","San Francisco Bay Area, West Coast, Western US",NaT,2017-09-05,88000000.0,Series D,88000000.0,Late Stage Venture,"Accel, Foundation Capital, Kohlberg Kravis Rob...",4.0,2010.0,2017-09,20173Q,"[Accel, Foundation Capital, Kohlberg Kravis ...",Accel,Foundation Capital,Kohlberg Kravis Roberts,Meritech Capital Partners,
7,Threat Stack,"Cloud Security, SaaS, Security","Boston, Massachusetts, United States","Greater Boston Area, East Coast, New England",NaT,2017-09-19,45000000.0,Series C,45000000.0,Late Stage Venture,"Atlas Venture, Eight Roads Ventures, Right Sid...",8.0,2012.0,2017-09,20173Q,"[Atlas Venture, Eight Roads Ventures, Right ...",Atlas Venture,Eight Roads Ventures,Right Side Capital Management,Scale Venture Partners,Techstars


In [None]:
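# 'Cats' is never created above; assumed here to be Categories split into a list
all_data['Cats'] = all_data['Categories'].str.split(', ')
tags = all_data.Cats.apply(pd.Series)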
#tags = tags.rename(columns = lambda x : 'tag_' + str(x))
tags.head(5)

In [None]:
my_data = pd.concat([all_data[:], tags[:]], axis=1)
my_data.shape

In [None]:
my_data.head(5)

In [None]:
my_data.info()

In [None]:
# Flag themes from the Categories text. Patterns that target the same column are
# combined with '|' so a second assignment no longer overwrites the first.
# None is used for non-matches because np.where would cast NaN to the string 'nan'.
all_data['Cloud']=np.where(all_data['Categories'].str.contains('[Cc]loud|aa[sS]', regex=True), "Yes", None)
all_data['AI']=np.where(all_data['Categories'].str.contains('[Aa]rtificial|[Ll]earning', regex=True), "Yes", None)
all_data['Network']=np.where(all_data['Categories'].str.contains('[Nn]etwork', regex=True), "Yes", None)
all_data['Compliance']=np.where(all_data['Categories'].str.contains('[Cc]ompliance|[Rr]isk', regex=True), "Yes", None)
all_data['SW']=np.where(all_data['Categories'].str.contains('[Ss]oftware|[Ss]aa[Ss]', regex=True), "Yes", None)
all_data['HW']=np.where(all_data['Categories'].str.contains('[Hh]ardware', regex=True), "Yes", None)
all_data['Mobile']=np.where(all_data['Categories'].str.contains('obile', regex=True), "Yes", None)
all_data.head(10)
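
The theme flags above could also be driven from a single mapping of theme name to pattern, which keeps each theme's keywords in one place. The sketch below is an assumption-laden variant, not the notebook's code. `THEMES` and `flag_themes` are hypothetical names, the keyword lists are a guess based on the patterns used above, matching is case-insensitive, and the result holds booleans rather than "Yes"/missing.

```python
import pandas as pd

# Hypothetical theme -> regex mapping, loosely following the patterns used above.
THEMES = {
    "Cloud": r"cloud|aas",
    "AI": r"artificial|learning",
    "Network": r"network",
    "Compliance": r"compliance|risk",
    "Mobile": r"mobile",
}

def flag_themes(categories: pd.Series) -> pd.DataFrame:
    """Return one boolean column per theme, True where the pattern matches."""
    return pd.DataFrame({
        theme: categories.str.contains(pattern, case=False, regex=True)
        for theme, pattern in THEMES.items()
    })

# Illustrative category strings only.
sample = pd.Series([
    "Cloud Security, SaaS, Security",
    "Analytics, Cyber Security, Machine Learning",
    "Compliance, Network Security, Risk Management",
])
flags = flag_themes(sample)
print(flags)
```

Boolean columns make later counts simple, for example `flags.sum()` for the number of companies per theme.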

In [None]:
all_data.Categories