## Research Questions

#### What are the funding trends by sector and across countries?
#### Is there evidence of a funding bubble or funding winter in particular years?
#### Which startups raised a significant amount of money but eventually shut down?
#### In which parts of the world are the top-funded startups based?

### Import Libraries

In [2]:
import pandas as pd

### Load the Data

In [8]:
df = pd.read_csv(r"C:\Users\HP\Desktop\companies.csv")
df.head(2)

Unnamed: 0,id,Unnamed: 0.1,entity_type,entity_id,parent_id,name,normalized_name,permalink,category_code,status,...,first_milestone_at,last_milestone_at,milestones,relationships,created_by,created_at,updated_at,lat,lng,ROI
0,c:1,0,Company,1,,Wetpaint,wetpaint,/company/wetpaint,web,operating,...,2010-09-05,2013-09-18,5.0,17.0,initial-importer,2007-05-25 06:51:27,2013-04-13 03:29:00,47.606209,-122.332071,15.5
1,c:10,1,Company,10,,Flektor,flektor,/company/flektor,games_video,acquired,...,,,,6.0,initial-importer,2007-05-31 21:11:51,2008-05-23 23:23:14,34.021122,-118.396467,


#### Check the number of rows and columns

In [10]:
df.shape

(196553, 44)

#### Select the columns needed for analysis

In [21]:
required_columns = ['permalink', 'name', 'homepage_url', 'category_code',
                    'funding_total_usd', 'status', 'country_code', 'state_code',
                    'region', 'city', 'funding_rounds', 'founded_at']

# .copy() gives an independent frame, so later assignments do not touch the raw data
df = df[required_columns].copy()
df.head(3)

Unnamed: 0,permalink,name,homepage_url,category_code,funding_total_usd,status,country_code,state_code,region,city,funding_rounds,founded_at
0,/company/wetpaint,Wetpaint,http://wetpaint-inc.com,web,39750000.0,operating,USA,WA,Seattle,Seattle,3.0,2005-10-17
1,/company/flektor,Flektor,http://www.flektor.com,games_video,,acquired,USA,CA,Los Angeles,Culver City,,
2,/company/there,There,http://www.there.com,games_video,,acquired,USA,CA,SF Bay,San Mateo,,
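
As an alternative to loading all 44 columns and then subsetting, `read_csv` accepts a `usecols` argument that parses only the listed columns. The sketch below is a minimal illustration on a made-up in-memory CSV; `sample_csv` and `wanted` are hypothetical names, not part of the original notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for companies.csv, with one column ("id") we do not need.
sample_csv = io.StringIO(
    "id,permalink,name,category_code,funding_total_usd\n"
    "c:1,/company/wetpaint,Wetpaint,web,39750000.0\n"
    "c:10,/company/flektor,Flektor,games_video,\n"
)

wanted = ["permalink", "name", "category_code", "funding_total_usd"]

# usecols tells read_csv to parse only the listed columns.
subset = pd.read_csv(sample_csv, usecols=wanted)

print(subset.columns.tolist())
print(subset.shape)
```

On a file with roughly 196,000 rows this can reduce memory use, at the cost of having to know the column names up front.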


#### Count the number of duplicates

In [25]:
print(f"The number of duplicate rows is {df.duplicated().sum()}")

The number of duplicate rows is 38


#### Remove duplicates and confirm that they have been dropped

In [30]:
df = df.drop_duplicates()
df.duplicated().sum()

0
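
Before dropping duplicates it can help to look at them. By default `duplicated()` flags only the repeats, while `keep=False` flags every row in a duplicate group. The sketch below uses a small made-up frame (`toy` is a hypothetical name) rather than the real dataset.

```python
import pandas as pd

# Toy frame: the first and third rows are identical.
toy = pd.DataFrame({
    "permalink": ["/company/a", "/company/b", "/company/a"],
    "name": ["A", "B", "A"],
})

# Default behaviour: only the second occurrence is counted as a duplicate.
repeat_count = toy.duplicated().sum()

# keep=False marks every member of a duplicate group, which is handy for inspection.
dupe_groups = toy[toy.duplicated(keep=False)]

# drop_duplicates returns a new frame that keeps the first row of each group.
deduped = toy.drop_duplicates()

print(repeat_count, len(dupe_groups), len(deduped))
```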

#### Check for Null Values in the funding column

In [34]:
df['funding_total_usd'].isnull().sum()

168641

#### Drop rows with null funding values

In [39]:
df = df.dropna(subset=['funding_total_usd'])

#### Confirm that no null funding values remain

In [47]:
df['funding_total_usd'].isnull().sum()

0
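
Note that `dropna(subset=...)` only guarantees that the listed column is complete; other columns can still contain nulls. A quick way to see this is the share of missing values per column, sketched here on a made-up frame (`toy`, `missing_share` and `funded` are hypothetical names).

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "funding_total_usd": [1.0e6, np.nan, np.nan, 5.0e5],
    "founded_at": ["2005-10-17", None, None, None],
})

# isnull() gives a boolean frame; its column means are the missing fractions.
missing_share = toy.isnull().mean()

# Dropping on the funding column alone leaves nulls in founded_at untouched.
funded = toy.dropna(subset=["funding_total_usd"])

print(missing_share)
print(funded["founded_at"].isnull().sum())
```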

#### Rename the category_code column to sector

In [51]:
df.rename(columns = {'category_code':'sector'}, inplace = True)
df.head(3)

Unnamed: 0,permalink,name,homepage_url,sector,funding_total_usd,status,country_code,state_code,region,city,funding_rounds,founded_at
0,/company/wetpaint,Wetpaint,http://wetpaint-inc.com,web,39750000.0,operating,USA,WA,Seattle,Seattle,3.0,2005-10-17
13,/company/friendfeed,FriendFeed,http://friendfeed.com,web,5000000.0,acquired,USA,CA,SF Bay,Mountain View,1.0,2007-10-01
19,/company/fitbit,Fitbit,http://www.fitbit.com,health,68069200.0,operating,USA,CA,SF Bay,San Francisco,5.0,2007-10-01


#### Convert the values in selected text columns to lowercase

In [58]:
df[['name', 'region', 'city']] = df[['name', 'region', 'city']].apply(lambda x : x.str.lower())

df.head(2)

Unnamed: 0,permalink,name,homepage_url,sector,funding_total_usd,status,country_code,state_code,region,city,funding_rounds,founded_at
0,/company/wetpaint,wetpaint,http://wetpaint-inc.com,web,39750000.0,operating,USA,WA,seattle,seattle,3.0,2005-10-17
13,/company/friendfeed,friendfeed,http://friendfeed.com,web,5000000.0,acquired,USA,CA,sf bay,mountain view,1.0,2007-10-01
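
Assigning into a frame that was itself selected from another frame may raise a `SettingWithCopyWarning` in some pandas versions. One common way to avoid this, sketched below on made-up data, is to take an explicit `.copy()` before assigning. The names `raw`, `subset` and `text_cols` are hypothetical. The sketch also shows that `.str.lower()` leaves missing values as missing.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "id": ["c:1", "c:32"],
    "name": ["Wetpaint", "iHireHelp"],
    "city": ["Seattle", np.nan],
})

# .copy() makes the subset an independent frame before we assign into it.
subset = raw[["name", "city"]].copy()

text_cols = ["name", "city"]

# Lowercase the values column by column; NaN entries stay NaN.
subset[text_cols] = subset[text_cols].apply(lambda col: col.str.lower())

print(subset)
```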


#### Preview the cleaned data

In [67]:
df.head(10)

Unnamed: 0,permalink,name,homepage_url,sector,funding_total_usd,status,country_code,state_code,region,city,funding_rounds,founded_at
0,/company/wetpaint,wetpaint,http://wetpaint-inc.com,web,39750000.0,operating,USA,WA,seattle,seattle,3.0,2005-10-17
13,/company/friendfeed,friendfeed,http://friendfeed.com,web,5000000.0,acquired,USA,CA,sf bay,mountain view,1.0,2007-10-01
19,/company/fitbit,fitbit,http://www.fitbit.com,health,68069200.0,operating,USA,CA,sf bay,san francisco,5.0,2007-10-01
20,/company/mtpv,mtpv,http://www.mtpv.com,cleantech,10125293.0,operating,USA,TX,austin,austin,3.0,2003-01-01
24,/company/demandbase,demandbase,http://www.demandbase.com,analytics,33000000.0,operating,USA,CA,sf bay,san francisco,3.0,2006-01-01
26,/company/locatrix-communications,locatrix communications,http://locatrix.com,mobile,250000.0,operating,AUS,,sf bay,brisbane,1.0,2003-11-01
32,/company/ihirehelp,ihirehelp,http://www.iHireHelp.com,education,100000.0,operating,USA,NJ,new jersey - other,,1.0,2010-10-01
36,/company/cardiosolutions,cardiosolutions,http://www.cardiosolutionsinc.com,medical,11300000.0,operating,USA,MA,west bridgewater,west bridgewater,2.0,2006-01-01
40,/company/blend-biosciences,blend biosciences,,biotech,2800000.0,operating,,,unknown,,1.0,
41,/company/wevod,wevod,http://www.wevod.tv,games_video,414840.0,operating,FRA,,paris,paris,2.0,2006-05-04


#### Check the final number of rows and columns

In [69]:
df.shape

(27874, 12)
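
As a sanity check, the final row count can be reconciled with the counts reported in the earlier cells. The variable names below are hypothetical; the numbers are taken from the outputs above.

```python
# Counts reported by the earlier cells.
rows_loaded = 196553          # df.shape after loading
duplicates_dropped = 38       # df.duplicated().sum()
null_funding_dropped = 168641 # nulls in funding_total_usd after deduplication

expected_rows = rows_loaded - duplicates_dropped - null_funding_dropped
print(expected_rows)
```

The result matches the 27874 rows reported by `df.shape`, so no rows were lost other than through the two cleaning steps.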

#### Save the cleaned DataFrame to a CSV file

In [72]:
df.to_csv('cleaned_research_data.csv', index=False)
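
Writing with `index=False` means the file will not pick up an extra `Unnamed: 0` column when it is read back, unlike the raw file loaded earlier. The sketch below round-trips a small made-up frame through a temporary file and parses `founded_at` as a date, which is one possible starting point for the year-based research questions. The file name `cleaned_sample.csv` and the frame `cleaned` are hypothetical.

```python
import os
import tempfile

import pandas as pd

cleaned = pd.DataFrame({
    "name": ["wetpaint", "blend biosciences"],
    "funding_total_usd": [39750000.0, 2800000.0],
    "founded_at": ["2005-10-17", None],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cleaned_sample.csv")
    # index=False keeps the row index out of the file.
    cleaned.to_csv(path, index=False)
    # parse_dates converts founded_at to datetimes; missing dates become NaT.
    reloaded = pd.read_csv(path, parse_dates=["founded_at"])

# Founding year, with NaN where the date is missing.
founded_year = reloaded["founded_at"].dt.year

print(reloaded.dtypes)
print(founded_year)
```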