## Clean.ipynb

**AUTHOR:** Shiyan Boxer

**DATE:** Dec 28th, 2020


**DESCRIPTION:** Clean the dataset by doing the following:
- Remove companies with blank (NA) cells in funding_total_usd, status, country_code, or founded_year
- Make a new column for company “success” (1 = “operating” or “acquired” and 0 = “closed”)

**DEPENDENCIES:**
- Python 3.8.6
- pandas
- csv

#### Import dependencies

In [1]:
import pandas as pd
import os 
import csv

#### Create a new CSV file called after_investments.csv to work on (done manually)

In [4]:
df = pd.read_excel("C://Users//shiya//Documents//Startup-Success-Predictor-v2//after.xlsx")
df.head()

Unnamed: 0,permalink,name,homepage_url,category_list,market,funding_total_usd (divide by 1000),status,country_code,state_code,region,...,product_crowdfunding,round_A,round_B,round_C,round_D,round_E,round_F,round_G,round_H,success
0,/organization/advanced-northern-graphite-leade...,Advanced Northern Graphite Leaders,http://www.anglinc.ca,|Clean Technology|,Clean Technology,0,operating,CAN,AB,Sherwood Park,...,0,0,0,0,0,0,0,0,0,1
1,/organization/celebration-creation,Celebration Creation,http://www.celebrationcreation.ca,|Real Estate|,Real Estate,0,operating,CAN,AB,Calgary,...,0,0,0,0,0,0,0,0,0,1
2,/organization/justparts,JustParts,http://www.JustParts.com,|Auto|Marketplaces|E-Commerce|,Marketplaces,0,operating,CAN,AB,Thunder Bay,...,0,0,0,0,0,0,0,0,0,1
3,/organization/knighthaven,KnightHaven,http://www.knighthaven.com/,|Entertainment|Games|,Games,0,operating,CAN,AB,AB - Other,...,0,0,0,0,0,0,0,0,0,1
4,/organization/kotch-international-transportati...,Kotch International Transportation Design Spec...,http://www.kotchexotictours.com,|Transportation|,Transportation,0,operating,CAN,AB,AB - Other,...,0,0,0,0,0,0,0,0,0,1


#### Remove companies with NA cells
- Target columns: funding_total_usd, status, country_code, and founded_year
- `dropna` with `axis=0` and `how='any'` drops every row that contains at least one NA value; because no `subset` is passed, all columns are checked, not only the ones listed above

In [5]:
df = df.dropna(axis=0, how='any') # drop rows (axis = 0), if (any) NA appear
df.head()

Unnamed: 0,permalink,name,homepage_url,category_list,market,funding_total_usd (divide by 1000),status,country_code,state_code,region,...,product_crowdfunding,round_A,round_B,round_C,round_D,round_E,round_F,round_G,round_H,success
0,/organization/advanced-northern-graphite-leade...,Advanced Northern Graphite Leaders,http://www.anglinc.ca,|Clean Technology|,Clean Technology,0,operating,CAN,AB,Sherwood Park,...,0,0,0,0,0,0,0,0,0,1
1,/organization/celebration-creation,Celebration Creation,http://www.celebrationcreation.ca,|Real Estate|,Real Estate,0,operating,CAN,AB,Calgary,...,0,0,0,0,0,0,0,0,0,1
2,/organization/justparts,JustParts,http://www.JustParts.com,|Auto|Marketplaces|E-Commerce|,Marketplaces,0,operating,CAN,AB,Thunder Bay,...,0,0,0,0,0,0,0,0,0,1
3,/organization/knighthaven,KnightHaven,http://www.knighthaven.com/,|Entertainment|Games|,Games,0,operating,CAN,AB,AB - Other,...,0,0,0,0,0,0,0,0,0,1
4,/organization/kotch-international-transportati...,Kotch International Transportation Design Spec...,http://www.kotchexotictours.com,|Transportation|,Transportation,0,operating,CAN,AB,AB - Other,...,0,0,0,0,0,0,0,0,0,1
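
The difference between checking every column and checking only the listed columns can be sketched on a small made-up DataFrame. This is an illustration, not the notebook's own cell: the `toy` data, the `required` list, and the simplified column name `funding_total_usd` (the real file labels it `funding_total_usd (divide by 1000)`) are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Toy data: column names mirror the ones listed above, values are made up.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "homepage_url": [np.nan, "http://b.example", "http://c.example", "http://d.example"],
    "funding_total_usd": [100.0, np.nan, 250.0, 0.0],
    "status": ["operating", "closed", None, "acquired"],
    "country_code": ["CAN", "CAN", "USA", "CAN"],
    "founded_year": [2010.0, 2012.0, 2011.0, 2009.0],
})

# Hypothetical list of the columns that must be present.
required = ["funding_total_usd", "status", "country_code", "founded_year"]

# Count missing values per column before dropping anything.
missing_counts = toy.isna().sum()

# No subset: a row is dropped if ANY column holds a missing value.
dropped_any = toy.dropna(axis=0, how="any")

# subset=required: only the listed columns are checked.
dropped_subset = toy.dropna(axis=0, how="any", subset=required)

print(missing_counts)
print(dropped_any["name"].tolist())
print(dropped_subset["name"].tolist())
```

Company A is missing only `homepage_url`, so it is removed by the first call but kept by the second. Which behavior is wanted depends on whether a missing value in an unrelated column should disqualify a company.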


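#### Make new column for company “success”
- 1 = “operating” or “acquired” and 0 = “closed”
- `DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)` applies a function along an axis of the DataFrame
- The cell below calls `apply` on the single column `df['status']` (a Series), so the lambda runs once per value and returns 0 when the lowercased status contains 'closed' and 1 otherwise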

In [6]:
df['success'] = df['status'].apply(lambda x: 0 if 'closed' in x.lower() else 1)
df.head()

Unnamed: 0,permalink,name,homepage_url,category_list,market,funding_total_usd (divide by 1000),status,country_code,state_code,region,...,product_crowdfunding,round_A,round_B,round_C,round_D,round_E,round_F,round_G,round_H,success
0,/organization/advanced-northern-graphite-leade...,Advanced Northern Graphite Leaders,http://www.anglinc.ca,|Clean Technology|,Clean Technology,0,operating,CAN,AB,Sherwood Park,...,0,0,0,0,0,0,0,0,0,1
1,/organization/celebration-creation,Celebration Creation,http://www.celebrationcreation.ca,|Real Estate|,Real Estate,0,operating,CAN,AB,Calgary,...,0,0,0,0,0,0,0,0,0,1
2,/organization/justparts,JustParts,http://www.JustParts.com,|Auto|Marketplaces|E-Commerce|,Marketplaces,0,operating,CAN,AB,Thunder Bay,...,0,0,0,0,0,0,0,0,0,1
3,/organization/knighthaven,KnightHaven,http://www.knighthaven.com/,|Entertainment|Games|,Games,0,operating,CAN,AB,AB - Other,...,0,0,0,0,0,0,0,0,0,1
4,/organization/kotch-international-transportati...,Kotch International Transportation Design Spec...,http://www.kotchexotictours.com,|Transportation|,Transportation,0,operating,CAN,AB,AB - Other,...,0,0,0,0,0,0,0,0,0,1
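
As a minimal sketch of the labeling rule, the snippet below applies the same lambda to a made-up `status` column and compares it with a vectorized alternative built on the `.str` accessor. The `toy` DataFrame and the column names `success_apply` and `success_vectorized` are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Made-up statuses, including a capitalized variant.
toy = pd.DataFrame({"status": ["operating", "acquired", "closed", "Closed"]})

# Same rule as the cell above: 0 if the status contains "closed", else 1.
toy["success_apply"] = toy["status"].apply(lambda x: 0 if "closed" in x.lower() else 1)

# Vectorized equivalent: build a boolean mask, invert it, cast to int.
# na=False makes a missing status count as "not closed".
is_closed = toy["status"].str.lower().str.contains("closed", na=False)
toy["success_vectorized"] = (~is_closed).astype(int)

print(toy)
```

The two columns agree on this data. They differ on missing statuses: the lambda would raise an error on a NaN value, while the vectorized version would label it 1, so the lambda relies on the NA rows having been dropped first.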
