## Objective :
You work for Spark Funds, an asset management company. Spark Funds wants to invest in a few companies. The CEO of Spark Funds wants to understand the global trends in investments so that she can make investment decisions effectively.


## Business and Data Understanding :
Spark Funds has two minor constraints for investments:

    - It wants to invest between 5 and 15 million USD per round of investment.

    - It wants to invest only in English-speaking countries because of the ease of communication with the companies it would invest in. For the analysis, consider a country to be English-speaking only if English is one of its official languages.
    
## Business objective: 
The objective is to identify the best sectors, countries, and a suitable investment type for making investments. The overall strategy is to invest where others are investing, implying that the 'best' sectors and countries are the ones 'where most investors are investing'. (Spark Funds wants to invest where most other investors are investing. This pattern is often observed among early stage startup investors.)

In [1]:
# Suppress warnings

import warnings
warnings.filterwarnings('ignore')

# Import the numpy and pandas packages

import numpy as np
import pandas as pd

## Task 1: Data Cleaning

-  ### Subtask 1.1: Import and read

Load the companies and rounds data (provided on the previous page) into two data frames and name them `companies` and `rounds2` respectively.

In [2]:
#Reading companies.txt with ISO-8859-1 encoding because of special characters, then dropping non-ASCII characters to resolve the mixed-encoding issue.
companies = pd.read_csv('../input/companies.txt',encoding='ISO-8859-1',sep='\t')
companies.permalink = companies.permalink.str.encode('ISO-8859-1').str.decode('ascii', 'ignore')
companies.name = companies.name.str.encode('ISO-8859-1').str.decode('ascii', 'ignore')
companies.head()

Unnamed: 0,permalink,name,homepage_url,category_list,status,country_code,state_code,region,city,founded_at
0,/Organization/-Fame,#fame,http://livfame.com,Media,operating,IND,16,Mumbai,Mumbai,
1,/Organization/-Qounter,:Qounter,http://www.qounter.com,Application Platforms|Real Time|Social Network...,operating,USA,DE,DE - Other,Delaware City,04-09-2014
2,/Organization/-The-One-Of-Them-Inc-,"(THE) ONE of THEM,Inc.",http://oneofthem.jp,Apps|Games|Mobile,operating,,,,,
3,/Organization/0-6-Com,0-6.com,http://www.0-6.com,Curated Web,operating,CHN,22,Beijing,Beijing,01-01-2007
4,/Organization/004-Technologies,004 Technologies,http://004gmbh.de/en/004-interact,Software,operating,USA,IL,"Springfield, Illinois",Champaign,01-01-2010
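
The encode/decode round trip used above can be seen on a toy Series. This is a minimal sketch with made-up strings (not rows from the dataset): every character outside ASCII is dropped, so a key looks the same regardless of how the special character was encoded in each file.

```python
import pandas as pd

# Hypothetical sample values; the real files are not needed for this sketch.
raw = pd.Series(["/Organization/Café-Uno", "/organization/plain"])

# Re-encode to bytes as ISO-8859-1, then decode as ASCII and silently
# drop every byte that is not valid ASCII.
cleaned = raw.str.encode("ISO-8859-1").str.decode("ascii", "ignore")

print(cleaned.tolist())
```

Dropping characters is lossy, so two names that differ only in non-ASCII characters would collide. Comparing the unique count before and after the cleanup is a cheap guard.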


In [3]:
rounds2 = pd.read_csv('../input/rounds2.csv',encoding='ISO-8859-1')
rounds2.company_permalink = rounds2.company_permalink.str.encode('ISO-8859-1').str.decode('ascii', 'ignore')
rounds2.head()

Unnamed: 0,company_permalink,funding_round_permalink,funding_round_type,funding_round_code,funded_at,raised_amount_usd
0,/organization/-fame,/funding-round/9a01d05418af9f794eebff7ace91f638,venture,B,05-01-2015,10000000.0
1,/ORGANIZATION/-QOUNTER,/funding-round/22dacff496eb7acb2b901dec1dfe5633,venture,A,14-10-2014,
2,/organization/-qounter,/funding-round/b44fbb94153f6cdef13083530bb48030,seed,,01-03-2014,700000.0
3,/ORGANIZATION/-THE-ONE-OF-THEM-INC-,/funding-round/650b8f704416801069bb178a1418776b,venture,B,30-01-2014,3406878.0
4,/organization/0-6-com,/funding-round/5727accaeaa57461bd22a9bdd945382d,venture,A,19-03-2008,2000000.0


-  ### Subtask 1.2: Understand the Dataset

    - How many unique companies are present in `rounds2`?
    - How many unique companies are present in `companies`?
    - Are there any companies in the `rounds2` file which are not present in `companies`? Answer yes or no: **Y/N**
    - Merge the two data frames so that all variables (columns) in the `companies` frame are added to the `rounds2` data frame. Name the merged frame `master_frame`. How many observations are present in `master_frame`?

In [4]:
#How many unique companies are present in rounds2?
rounds2['company_permalink'] = rounds2['company_permalink'].str.lower()
print(len(rounds2['company_permalink'].unique()))

#Reconfirming -
rounds2['company_permalink'].str.lower().describe()

66368


count                       114949
unique                       66368
top       /organization/solarflare
freq                            19
Name: company_permalink, dtype: object

In [5]:
# How many unique companies are present in companies?
companies['permalink'] = companies['permalink'].str.lower()
print(len(companies['permalink'].unique()))

#Reconfirming -
companies['permalink'].str.lower().describe()

66368


count                      66368
unique                     66368
top       /organization/pingwhen
freq                           1
Name: permalink, dtype: object
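
As a small illustration of why the permalinks are lower-cased before counting, the sketch below uses hypothetical keys that differ only in letter case. `nunique()` is an equivalent shorthand for `len(series.unique())` when there are no missing values.

```python
import pandas as pd

# Hypothetical permalinks that differ only in letter case.
links = pd.Series(["/organization/abc", "/ORGANIZATION/ABC", "/organization/xyz"])

case_sensitive = links.nunique()                # counts 'abc' twice
case_insensitive = links.str.lower().nunique()  # treats both spellings as one company

print(case_sensitive, case_insensitive)
```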

In [6]:
#Are there any companies in the rounds2 file which are not present in companies?
temp1 = pd.DataFrame(rounds2.company_permalink.unique())
temp2 = pd.DataFrame(companies.permalink.unique())
temp2.equals(temp1)

True

In [7]:
#Companies that appear in rounds2 but not in companies (an empty set means none are missing)
set(rounds2['company_permalink'].unique()).difference(set(companies['permalink'].unique()))

set()
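
Another way to run this membership check is a left merge with `indicator=True`, which labels each key by where it was found. The frames and keys below are toy stand-ins, not the real data.

```python
import pandas as pd

# Toy stand-ins for the two frames (hypothetical keys).
rounds_keys = pd.DataFrame(
    {"company_permalink": ["/organization/a", "/organization/b", "/organization/c"]}
)
company_keys = pd.DataFrame({"permalink": ["/organization/a", "/organization/b"]})

# indicator=True adds a '_merge' column telling where each key was found.
check = rounds_keys.merge(
    company_keys,
    how="left",
    left_on="company_permalink",
    right_on="permalink",
    indicator=True,
)

# 'left_only' marks keys present in the rounds frame but absent from companies.
missing = check.loc[check["_merge"] == "left_only", "company_permalink"].tolist()
print(missing)
```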

In [8]:
#Merge the two data frames so that all variables (columns) in the companies frame are added to the rounds2 data frame. Name the merged frame master_frame.
master_frame = pd.merge(rounds2, companies, how = 'left', left_on = 'company_permalink', right_on = 'permalink')
len(master_frame.index)

114949
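
The row count is unchanged after the merge because each company appears only once in `companies`. A hedged sketch of how that assumption could be enforced with the `validate` argument of `pd.merge`, using toy frames:

```python
import pandas as pd

# Toy frames: several funding rounds per company, one row per company.
rounds_toy = pd.DataFrame({
    "company_permalink": ["/organization/a", "/organization/a", "/organization/b"],
    "raised_amount_usd": [1_000_000.0, 2_500_000.0, 700_000.0],
})
companies_toy = pd.DataFrame({
    "permalink": ["/organization/a", "/organization/b"],
    "country_code": ["USA", "IND"],
})

# validate="many_to_one" raises MergeError if 'permalink' is duplicated
# on the right side, which would otherwise silently multiply rows.
merged = pd.merge(
    rounds_toy,
    companies_toy,
    how="left",
    left_on="company_permalink",
    right_on="permalink",
    validate="many_to_one",
)

print(merged.shape)
```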

-  ### Subtask 1.3: Cleaning the Data

    - Inspecting Null Values
    - Dropping unnecessary columns
    - Dropping unnecessary rows

In [9]:
#Inspecting the null values, column-wise
master_frame.isnull().sum(axis=0)

company_permalink              0
funding_round_permalink        0
funding_round_type             0
funding_round_code         83809
funded_at                      0
raised_amount_usd          19990
permalink                      0
name                           1
homepage_url                6134
category_list               3410
status                         0
country_code                8678
state_code                 10946
region                     10167
city                       10164
founded_at                 20521
dtype: int64

In [10]:
#Inspecting the percentage of null values, column-wise
print(round(100*(master_frame.isnull().sum()/len(master_frame.index)), 2))

company_permalink           0.00
funding_round_permalink     0.00
funding_round_type          0.00
funding_round_code         72.91
funded_at                   0.00
raised_amount_usd          17.39
permalink                   0.00
name                        0.00
homepage_url                5.34
category_list               2.97
status                      0.00
country_code                7.55
state_code                  9.52
region                      8.84
city                        8.84
founded_at                 17.85
dtype: float64
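
The same percentage can be computed more compactly, because the mean of a boolean mask is the fraction of `True` values. A minimal sketch on a toy frame:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "raised_amount_usd": [1.0, np.nan, 3.0, np.nan],
    "country_code": ["USA", "IND", None, "GBR"],
})

# Equivalent to 100 * toy.isnull().sum() / len(toy.index), rounded to 2 places.
null_pct = (toy.isnull().mean() * 100).round(2)
print(null_pct)
```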


- #### Dropping unnecessary columns 

For Spark Funds, the analysis is driven mainly by funding round type, category, and country. Many columns in `master_frame` are therefore not needed, so we drop them.

In [11]:
master_frame = master_frame.drop(['funding_round_code', 'funding_round_permalink', 'funded_at','permalink', 'homepage_url',
                                 'state_code', 'region', 'city', 'founded_at','status'], axis = 1)

In [12]:
#Inspecting the Null values percentage again after deletion, column-wise
print(round(100*(master_frame.isnull().sum()/len(master_frame.index)), 2))

company_permalink      0.00
funding_round_type     0.00
raised_amount_usd     17.39
name                   0.00
category_list          2.97
country_code           7.55
dtype: float64


- #### Dropping unnecessary rows

Some of the remaining columns of the `master_frame` dataframe still contain null values. Let's drop the rows in which `raised_amount_usd`, `country_code` or `category_list` is null and inspect the dataframe again.

In [13]:
#Dropping rows based on null columns
master_frame = master_frame[~(master_frame['raised_amount_usd'].isnull() | master_frame['country_code'].isnull() |
                             master_frame['category_list'].isnull())]

In [14]:
#Percentage of retained rows
print(100*(len(master_frame.index)/114949))

77.01589400516751


In [15]:
master_frame.shape

(88529, 6)
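
The boolean mask used above is equivalent to `dropna` with a `subset`. The sketch below checks this on a toy frame; the column names match the notebook, the values are invented.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "raised_amount_usd": [1.0, np.nan, 3.0, 4.0],
    "country_code": ["USA", "IND", None, "GBR"],
    "category_list": ["Media", "Apps", "Games", None],
})
required = ["raised_amount_usd", "country_code", "category_list"]

# Mask version, as in the cell above.
mask_version = toy[~(toy["raised_amount_usd"].isnull()
                     | toy["country_code"].isnull()
                     | toy["category_list"].isnull())]

# dropna version: drop a row if any of the required columns is null.
dropna_version = toy.dropna(subset=required)

print(dropna_version.equals(mask_version))
```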

## Task 2: Funding Type Analysis

-  ### Subtask 2.1: Retaining the rows with only four investment types.

Spark Funds wants to choose one of these four investment types (venture, angel, seed, and private equity) for each potential investment it makes. Let's see how many funding types are present in `master_frame` and then retain only the rows with the above-mentioned investment types.

In [16]:
#Observing the unique funding_round_type
master_frame.funding_round_type.value_counts()

venture                  47809
seed                     21095
debt_financing            6506
angel                     4400
grant                     1939
private_equity            1820
undisclosed               1345
convertible_note          1320
equity_crowdfunding       1128
post_ipo_equity            598
product_crowdfunding       330
post_ipo_debt              151
non_equity_assistance       60
secondary_market            28
Name: funding_round_type, dtype: int64

In [17]:
#Retaining the rows with only four investment types
master_frame = master_frame[(master_frame['funding_round_type'] == 'venture') 
                            | (master_frame['funding_round_type'] == 'seed')
                            | (master_frame['funding_round_type'] == 'angel')
                            | (master_frame['funding_round_type'] == 'private_equity')]
master_frame.head()

Unnamed: 0,company_permalink,funding_round_type,raised_amount_usd,name,category_list,country_code
0,/organization/-fame,venture,10000000.0,#fame,Media,IND
2,/organization/-qounter,seed,700000.0,:Qounter,Application Platforms|Real Time|Social Network...,USA
4,/organization/0-6-com,venture,2000000.0,0-6.com,Curated Web,CHN
7,/organization/0ndine-biomedical-inc,seed,43360.0,Ondine Biomedical Inc.,Biotechnology,CAN
8,/organization/0ndine-biomedical-inc,venture,719491.0,Ondine Biomedical Inc.,Biotechnology,CAN
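
The chained `==` comparisons can also be written with `isin`, which scales better if the list of wanted types changes. A minimal sketch with a toy column:

```python
import pandas as pd

toy = pd.DataFrame({
    "funding_round_type": [
        "venture", "grant", "seed", "angel", "debt_financing", "private_equity",
    ]
})
wanted = ["venture", "seed", "angel", "private_equity"]

# Keep only the rows whose funding type is in the wanted list.
kept = toy[toy["funding_round_type"].isin(wanted)]
print(kept["funding_round_type"].tolist())
```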


-  ### Subtask 2.2: Calculate the average investment amount for each of the four funding types.

    - Average funding amount of **venture** type
    - Average funding amount of **seed** type
    - Average funding amount of **angel** type
    - Average funding amount of **private_equity** type

In [18]:
#Converting $ to million $.
master_frame['raised_amount_usd'] = master_frame['raised_amount_usd']/1000000
master_frame.head()

Unnamed: 0,company_permalink,funding_round_type,raised_amount_usd,name,category_list,country_code
0,/organization/-fame,venture,10.0,#fame,Media,IND
2,/organization/-qounter,seed,0.7,:Qounter,Application Platforms|Real Time|Social Network...,USA
4,/organization/0-6-com,venture,2.0,0-6.com,Curated Web,CHN
7,/organization/0ndine-biomedical-inc,seed,0.04336,Ondine Biomedical Inc.,Biotechnology,CAN
8,/organization/0ndine-biomedical-inc,venture,0.719491,Ondine Biomedical Inc.,Biotechnology,CAN


In [19]:
#calculating average investment amount for each of the four funding types.
round(master_frame.groupby('funding_round_type').raised_amount_usd.mean(), 2)

funding_round_type
angel              0.97
private_equity    73.94
seed               0.75
venture           11.72
Name: raised_amount_usd, dtype: float64
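
To make the choice of funding type explicit, the averages above can be checked against the 5 to 15 million USD constraint. The sketch below copies the four means from the output and is only an illustration of the selection rule.

```python
import pandas as pd

# Mean round size in million USD, copied from the output above.
avg_by_type = pd.Series(
    {"angel": 0.97, "private_equity": 73.94, "seed": 0.75, "venture": 11.72},
    name="raised_amount_usd",
)

# Spark Funds' constraint: 5 to 15 million USD per round (bounds inclusive).
suitable = avg_by_type[avg_by_type.between(5, 15)]
print(suitable.index.tolist())
```

The mean is sensitive to a few very large rounds, so comparing it with `median()` on the real data may be a useful cross-check.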

In [20]:
#Retaining only the venture rows: venture is the only type whose average round size (11.72 million USD)
#falls within Spark Funds' range of 5 to 15 million USD per investment round
master_frame = master_frame[master_frame['funding_round_type'] == 'venture']

#Dropping the column 'funding_round_type', as every remaining row is of venture type from this point forward
master_frame = master_frame.drop(['funding_round_type'], axis = 1)

## Task 3: Country Analysis

-  ### Subtask 3.1: Analysing the countries based on investment amount

    - Spark Funds wants to see the top nine countries which have received the highest total funding (across ALL sectors for the chosen investment type)

    - For the chosen investment type, make a data frame named top9 with the top nine countries (based on the total investment amount each country has received)

In [21]:
top9 = master_frame.pivot_table(values = 'raised_amount_usd', index = 'country_code', aggfunc = 'sum')
top9 = top9.sort_values(by = 'raised_amount_usd', ascending = False)
top9 = top9.iloc[:9, ]
top9

country_code,raised_amount_usd
USA,420068.029342
CHN,39338.918773
GBR,20072.813004
IND,14261.508718
CAN,9482.217668
FRA,7226.851352
ISR,6854.350477
DEU,6306.921981
JPN,3167.647127
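
The pivot, sort and slice steps can be condensed into a `groupby` followed by `nlargest`. This is a sketch on invented numbers, taking the top two instead of the top nine:

```python
import pandas as pd

toy = pd.DataFrame({
    "country_code": ["USA", "USA", "CHN", "GBR", "IND", "GBR"],
    "raised_amount_usd": [10.0, 5.0, 8.0, 3.0, 2.0, 4.0],
})

# Total amount per country, then the n largest totals in descending order.
totals = toy.groupby("country_code")["raised_amount_usd"].sum()
top2 = totals.nlargest(2)  # use 9 on the real data
print(top2)
```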


In [22]:
#Retaining rows with only the USA, GBR and IND country codes, as Spark Funds wants to invest only in the top three English-speaking countries in top9
master_frame = master_frame[(master_frame['country_code'] == 'USA')
                            | (master_frame['country_code'] == 'GBR')
                            | (master_frame['country_code'] == 'IND')]

## Task 4: Sector Analysis 1

-  ### Subtask 4.1: Extract the primary sector of each category

Extract the primary sector value into the *category_list* column. According to the business rule, the first string before the vertical bar is considered the primary sector.

In [23]:
#Extracting the primary sector value
master_frame['category_list'] = master_frame['category_list'].apply(lambda x: x.split('|')[0])
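
A vectorised alternative to the `apply` call is the `.str` accessor. The sketch below uses category strings in the same pipe-separated shape as the dataset.

```python
import pandas as pd

cats = pd.Series(["Apps|Games|Mobile", "Media", "Application Platforms|Real Time"])

# Split on the vertical bar and keep the first element of each list.
primary = cats.str.split("|").str[0]
print(primary.tolist())
```

Unlike the lambda, the `.str` methods pass missing values through as `NaN` instead of raising an error, which matters only if null categories have not been dropped beforehand.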

-  ### Subtask 4.2: Map each primary sector to one of the eight main sectors

Use the mapping file 'mapping.csv' to map each primary sector to one of the eight main sectors (Note that ‘Others’ is also considered one of the main sectors)

In [24]:
#Reading mapping.csv file 
mapping = pd.read_csv('../input/mapping.csv')
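#Repairing category names in which 'na' appears as '0'; the second rule turns a resulting '2.na' back into '2.0'
mapping.category_list = mapping.category_list.replace({'0':'na', '2.na' :'2.0'}, regex=True)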
mapping.head()

Unnamed: 0,category_list,Automotive & Sports,Blanks,Cleantech / Semiconductors,Entertainment,Health,Manufacturing,"News, Search and Messaging",Others,"Social, Finance, Analytics, Advertising"
0,,0,1,0,0,0,0,0,0,0
1,3D,0,0,0,0,0,1,0,0,0
2,3D Printing,0,0,0,0,0,1,0,0,0
3,3D Technology,0,0,0,0,0,1,0,0,0
4,Accounting,0,0,0,0,0,0,0,0,1
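
The intent of the replacement above can be spelled out as two explicit, literal steps. The damaged names below are hypothetical examples of the pattern the code targets, and chained `str.replace` is a swapped-in technique, not the call used in the cell.

```python
import pandas as pd

# Hypothetical damaged names of the kind the replacement above targets.
damaged = pd.Series(["A0lytics", "Fi0nce", "Enterprise 2.0"])

repaired = (
    damaged
    .str.replace("0", "na", regex=False)      # 'A0lytics' -> 'Analytics'
    .str.replace("2.na", "2.0", regex=False)  # restore a genuine '2.0'
)
print(repaired.tolist())
```

Note that a leading `0` is restored as lowercase `na`, so a name that should begin with a capital `Na` would come back in lowercase. Whether that matters depends on how the keys are compared in the later merge.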


In [25]:
#Reshaping the mapping dataframe to merge with the master_frame dataframe. Using melt() function to unpivot the table.
mapping = pd.melt(mapping, id_vars =['category_list'], value_vars =['Manufacturing','Automotive & Sports',
                                                              'Cleantech / Semiconductors','Entertainment',
                                                             'Health','News, Search and Messaging','Others',
                                                             'Social, Finance, Analytics, Advertising']) 
mapping = mapping[~(mapping.value == 0)]
mapping = mapping.drop('value', axis = 1)
mapping = mapping.rename(columns = {"variable":"main_sector"})
mapping.head()

Unnamed: 0,category_list,main_sector
1,3D,Manufacturing
2,3D Printing,Manufacturing
3,3D Technology,Manufacturing
7,Advanced Materials,Manufacturing
15,Agriculture,Manufacturing


In [26]:
master_frame = master_frame.merge(mapping, how = 'left', on ='category_list')
master_frame.head()

Unnamed: 0,company_permalink,raised_amount_usd,name,category_list,country_code,main_sector
0,/organization/-fame,10.0,#fame,Media,IND,Entertainment
1,/organization/0xdata,20.0,H2O.ai,Analytics,USA,"Social, Finance, Analytics, Advertising"
2,/organization/0xdata,1.7,H2O.ai,Analytics,USA,"Social, Finance, Analytics, Advertising"
3,/organization/0xdata,8.9,H2O.ai,Analytics,USA,"Social, Finance, Analytics, Advertising"
4,/organization/1-mainstream,5.0,1 Mainstream,Apps,USA,"News, Search and Messaging"


In [27]:
#List of primary sectors which have no main sectors in the master_frame
print(master_frame[master_frame.main_sector.isnull()].category_list.unique())

['Nanotechnology' 'Natural Gas Uses' 'Natural Language Processing'
 'Adaptive Equipment' 'Racing' 'Specialty Retail'
 'Biotechnology and Semiconductor' 'Rapidly Expanding' 'Navigation'
 'Product Search' 'GreenTech' 'Retirement']


In [28]:
#Number of rows with NaN main_sector value
len(master_frame[master_frame.main_sector.isnull()])

161

In [29]:
#Retaining the rows which have main_sector values
master_frame = master_frame[~(master_frame.main_sector.isnull())]
len(master_frame.index)

38642
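
Before dropping the unmapped rows, it can help to see how many rows each unmapped category accounts for. A sketch with toy frames (the category and sector names are taken from the notebook, the rows are invented):

```python
import pandas as pd

frame = pd.DataFrame({"category_list": ["Media", "Racing", "Racing", "Apps"]})
lookup = pd.DataFrame({
    "category_list": ["Media", "Apps"],
    "main_sector": ["Entertainment", "News, Search and Messaging"],
})

merged = frame.merge(lookup, how="left", on="category_list")

# Row count per category that found no main sector in the lookup.
unmapped = merged.loc[merged["main_sector"].isnull(), "category_list"].value_counts()
print(unmapped)
```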

## Task 5: Sector Analysis 2

-  ### Subtask 5.1: Create DataFrames D1, D2, D3 based on three countries

    - Create three separate data frames D1, D2 and D3 for each of the three countries containing the observations of funding type FT falling within the 5-15 million USD range. The three data frames should contain:

        - All the columns of the master_frame along with the primary sector and the main sector

        - The total number (or count) of investments for each main sector in a separate column

        - The total amount invested in each main sector in a separate column

In [30]:
D1 = master_frame[(master_frame['country_code'] == 'USA') & 
             (master_frame['raised_amount_usd'] >= 5) & 
             (master_frame['raised_amount_usd'] <= 15)]
D1_gr = D1[['raised_amount_usd','main_sector']].groupby('main_sector').agg(['sum', 'count']).rename(
    columns={'sum':'Total_amount','count' : 'Total_count'})
D1 = D1.merge(D1_gr, how='left', on ='main_sector')
D1.head()

Unnamed: 0,company_permalink,raised_amount_usd,name,category_list,country_code,main_sector,"(raised_amount_usd, Total_amount)","(raised_amount_usd, Total_count)"
0,/organization/0xdata,8.9,H2O.ai,Analytics,USA,"Social, Finance, Analytics, Advertising",23807.376964,2714
1,/organization/1-mainstream,5.0,1 Mainstream,Apps,USA,"News, Search and Messaging",13959.567428,1582
2,/organization/128-technology,11.999347,128 Technology,Service Providers,USA,Others,26321.007002,2950
3,/organization/1366-technologies,15.0,1366 Technologies,Manufacturing,USA,Manufacturing,7258.553378,799
4,/organization/1366-technologies,5.0,1366 Technologies,Manufacturing,USA,Manufacturing,7258.553378,799
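
The tuple-style column names in the output come from merging a frame with single-level columns with one whose columns have two levels, and newer pandas releases may refuse such a merge. A possible alternative, sketched on toy data and assuming pandas 0.25 or later, is named aggregation, which yields flat column names:

```python
import pandas as pd

toy = pd.DataFrame({
    "main_sector": ["Others", "Others", "Health"],
    "raised_amount_usd": [5.0, 10.0, 7.5],
})

# Named aggregation: output column = (input column, aggregation function).
sector_stats = (
    toy.groupby("main_sector")
       .agg(Total_amount=("raised_amount_usd", "sum"),
            Total_count=("raised_amount_usd", "count"))
       .reset_index()
)

toy_with_stats = toy.merge(sector_stats, how="left", on="main_sector")
print(toy_with_stats.columns.tolist())
```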


In [31]:
D2 = master_frame[(master_frame['country_code'] == 'GBR') & 
             (master_frame['raised_amount_usd'] >= 5) & 
             (master_frame['raised_amount_usd'] <= 15)]
D2_gr = D2[['raised_amount_usd','main_sector']].groupby('main_sector').agg(['sum', 'count']).rename(
    columns={'sum':'Total_amount','count' : 'Total_count'})
D2 = D2.merge(D2_gr, how='left', on ='main_sector')
D2.head()

Unnamed: 0,company_permalink,raised_amount_usd,name,category_list,country_code,main_sector,"(raised_amount_usd, Total_amount)","(raised_amount_usd, Total_count)"
0,/organization/365scores,5.5,365Scores,Android,GBR,"Social, Finance, Analytics, Advertising",1089.404014,133
1,/organization/7digital,8.468328,7digital,Content Creators,GBR,Entertainment,482.784687,56
2,/organization/7digital,10.0,7digital,Content Creators,GBR,Entertainment,482.784687,56
3,/organization/90min,15.0,90min,Media,GBR,Entertainment,482.784687,56
4,/organization/90min,5.8,90min,Media,GBR,Entertainment,482.784687,56


In [32]:
D3 = master_frame[(master_frame['country_code'] == 'IND') & 
             (master_frame['raised_amount_usd'] >= 5) & 
             (master_frame['raised_amount_usd'] <= 15)]
D3_gr = D3[['raised_amount_usd','main_sector']].groupby('main_sector').agg(['sum', 'count']).rename(
    columns={'sum':'Total_amount','count' : 'Total_count'})
D3 = D3.merge(D3_gr, how='left', on ='main_sector')
D3.head()

Unnamed: 0,company_permalink,raised_amount_usd,name,category_list,country_code,main_sector,"(raised_amount_usd, Total_amount)","(raised_amount_usd, Total_count)"
0,/organization/-fame,10.0,#fame,Media,IND,Entertainment,280.83,33
1,/organization/21diamonds-india,6.369507,21Diamonds,E-Commerce,IND,Others,1013.409507,110
2,/organization/a-little-world,6.41,A LITTLE WORLD,Finance,IND,"Social, Finance, Analytics, Advertising",550.54955,60
3,/organization/adlabs-imagica,8.18,Adlabs Imagica,Entertainment,IND,Entertainment,280.83,33
4,/organization/agile,5.74,Agile,Finance,IND,"Social, Finance, Analytics, Advertising",550.54955,60
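
Since D1, D2 and D3 are built with the same steps, the logic could be wrapped in a function. `build_country_frame` is a hypothetical helper, not part of the notebook, and it uses `groupby(...).transform` instead of a merge to attach the per-sector totals.

```python
import pandas as pd

def build_country_frame(frame, country, low=5, high=15):
    """Rows for one country within [low, high] million USD, plus per-sector totals."""
    subset = frame[(frame["country_code"] == country)
                   & frame["raised_amount_usd"].between(low, high)].copy()
    by_sector = subset.groupby("main_sector")["raised_amount_usd"]
    # transform broadcasts each group's aggregate back onto its rows.
    subset["Total_amount"] = by_sector.transform("sum")
    subset["Total_count"] = by_sector.transform("count")
    return subset

toy = pd.DataFrame({
    "country_code": ["USA", "USA", "USA", "IND"],
    "main_sector": ["Others", "Others", "Health", "Others"],
    "raised_amount_usd": [5.0, 15.0, 20.0, 6.0],
})

usa = build_country_frame(toy, "USA")
print(usa)
```

On the real data, the three frames would then come from calling the helper with `'USA'`, `'GBR'` and `'IND'`.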


-  ### Subtask 5.2: Sector-wise Investment Analysis

    - For D1, D2, D3, analyse the below points :

In [33]:
#Total number of investments (count)
print(D1.raised_amount_usd.count())
print(D2.raised_amount_usd.count())
print(D3.raised_amount_usd.count())

12012
619
328


In [34]:
#Total amount of investment (million USD, since raised_amount_usd was rescaled to millions earlier)
print(round(D1.raised_amount_usd.sum(), 2))
print(round(D2.raised_amount_usd.sum(), 2))
print(round(D3.raised_amount_usd.sum(), 2))

107318.29
5365.23
2949.54


In [35]:
#Top sector, second-top, third-top for D1 (based on count of investments)
#Number of investments in the top, second-top, third-top sector in D1
D1_gr

main_sector,"(raised_amount_usd, Total_amount)","(raised_amount_usd, Total_count)"
Automotive & Sports,1454.104361,167
Cleantech / Semiconductors,21206.628192,2300
Entertainment,5099.197982,591
Health,8211.859357,909
Manufacturing,7258.553378,799
"News, Search and Messaging",13959.567428,1582
Others,26321.007002,2950
"Social, Finance, Analytics, Advertising",23807.376964,2714


In [36]:
#Top sector, second-top, third-top for D2 (based on count of investments)
#Number of investments in the top, second-top, third-top sector in D2
D2_gr

main_sector,"(raised_amount_usd, Total_amount)","(raised_amount_usd, Total_count)"
Automotive & Sports,167.051565,16
Cleantech / Semiconductors,1150.139665,128
Entertainment,482.784687,56
Health,214.53751,24
Manufacturing,361.940335,42
"News, Search and Messaging",615.746235,73
Others,1283.624289,147
"Social, Finance, Analytics, Advertising",1089.404014,133


In [37]:
#Top sector, second-top, third-top for D3 (based on count of investments)
#Number of investments in the top, second-top, third-top sector in D3
D3_gr

main_sector,"(raised_amount_usd, Total_amount)","(raised_amount_usd, Total_count)"
Automotive & Sports,136.9,13
Cleantech / Semiconductors,165.38,20
Entertainment,280.83,33
Health,167.74,19
Manufacturing,200.9,21
"News, Search and Messaging",433.834545,52
Others,1013.409507,110
"Social, Finance, Analytics, Advertising",550.54955,60

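Rather than reading the ranking off the table by eye, the top sectors can be extracted with `nlargest`. The sketch below copies the investment counts for IND from the table above.

```python
import pandas as pd

# Investment counts per main sector for IND, copied from the table above.
d3_counts = pd.Series({
    "Automotive & Sports": 13,
    "Cleantech / Semiconductors": 20,
    "Entertainment": 33,
    "Health": 19,
    "Manufacturing": 21,
    "News, Search and Messaging": 52,
    "Others": 110,
    "Social, Finance, Analytics, Advertising": 60,
}, name="Total_count")

# Three sectors with the most investments, in descending order.
top3 = d3_counts.nlargest(3)
print(top3)
```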

In [38]:
#For the top sector in USA, which company received the highest investment?
company = D1[D1['main_sector']=='Others']
company = company.pivot_table(values = 'raised_amount_usd', index = 'company_permalink', aggfunc = 'sum')
company = company.sort_values(by = 'raised_amount_usd', ascending = False).head()
print(company.head(1))

#For the second-top sector in USA, which company received the highest investment?
company = D1[D1['main_sector']=='Social, Finance, Analytics, Advertising']
company = company.pivot_table(values = 'raised_amount_usd', index = 'company_permalink', aggfunc = 'sum')
company = company.sort_values(by = 'raised_amount_usd', ascending = False).head()
print(company.head(1))

                           raised_amount_usd
company_permalink                           
/organization/virtustream               64.3
                           raised_amount_usd
company_permalink                           
/organization/shotspotter          67.933006


In [39]:
#For the top sector in GBR, which company received the highest investment?
company = D2[D2['main_sector']=='Others']
company = company.pivot_table(values = 'raised_amount_usd', index = 'company_permalink', aggfunc = 'sum')
company = company.sort_values(by = 'raised_amount_usd', ascending = False).head()
print(company.head(1))

#For the second-top sector in GBR, which company received the highest investment?
company = D2[D2['main_sector']=='Social, Finance, Analytics, Advertising']
company = company.pivot_table(values = 'raised_amount_usd', index = 'company_permalink', aggfunc = 'sum')
company = company.sort_values(by = 'raised_amount_usd', ascending = False).head()
print(company.head(1))

                              raised_amount_usd
company_permalink                              
/organization/electric-cloud               37.0
                                     raised_amount_usd
company_permalink                                     
/organization/celltick-technologies               37.5


In [40]:
#For the top sector in IND, which company received the highest investment?
company = D3[D3['main_sector']=='Others']
company = company.pivot_table(values = 'raised_amount_usd', index = 'company_permalink', aggfunc = 'sum')
company = company.sort_values(by = 'raised_amount_usd', ascending = False).head()
print(company.head(1))

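#News, Search and Messaging is the third-top sector for IND by count (52, behind Social, Finance, Analytics, Advertising with 60)
#Which company in that sector received the highest investment?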
company = D3[D3['main_sector']=='News, Search and Messaging']
company = company.pivot_table(values = 'raised_amount_usd', index = 'company_permalink', aggfunc = 'sum')
company = company.sort_values(by = 'raised_amount_usd', ascending = False).head()
print(company.head(1))

                            raised_amount_usd
company_permalink                            
/organization/firstcry-com               39.0
                                                raised_amount_usd
company_permalink                                                
/organization/gupshup-technology-india-pvt-ltd               33.0
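
The repeated filter, pivot and sort steps in the last three cells could be collected into one function. `top_company` is a hypothetical helper shown on toy data; it sums the amounts per company within a sector and returns the largest.

```python
import pandas as pd

def top_company(frame, sector):
    """Return (permalink, total) for the company with the largest summed amount in `sector`."""
    totals = (
        frame.loc[frame["main_sector"] == sector]
             .groupby("company_permalink")["raised_amount_usd"]
             .sum()
    )
    return totals.idxmax(), totals.max()

toy = pd.DataFrame({
    "company_permalink": [
        "/organization/a", "/organization/a", "/organization/b", "/organization/c",
    ],
    "main_sector": ["Others", "Others", "Others", "Health"],
    "raised_amount_usd": [6.0, 7.0, 12.0, 9.0],
})

name, total = top_company(toy, "Others")
print(name, total)
```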


## Analysis Result :

- #### Based on the data analysis performed, Spark Funds should invest in:

    - Funding type: `Venture`.
    - Countries: `USA`, `Britain` and `India`, in descending order of total venture funding received.
    - Top two sectors to invest in: `Others` and `Social, Finance, Analytics, Advertising`.