# Machine Learning Project - Kickstarter Data Set
## EDA Notebook
*Contributors: Max Langer, René Ebrecht, Jens Reich*

This is our first project in which we build a machine learning model from scratch on a previously unseen dataset.
The dataset covers Kickstarter projects from 2009 to 2019.

Our goal is to support our (fictional) stakeholder, PPC Consultants, with a model that predicts whether a Kickstarter project will be successful.
PPC Consultants advises potential project creators (PPCs) on how to get their projects off the ground as successfully as possible.
The value of our data product (the predictive model) is therefore to reveal opportunities, save time, and ultimately make money for both PPC Consultants and the PPCs.

In [1]:
# Import the organization modules
import pandas as pd
import numpy as np
# Import module to ignore warnings
import warnings
warnings.filterwarnings('ignore')
# Import the plot modules
import matplotlib.pyplot as plt
import seaborn as sns
# Import own scripts
from scripts.data_cleaning import (
    read_all_csvs,
    clean_data,
    create_csv,
)

In [5]:
# Create data frame from all single CSV files
df = read_all_csvs()
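
The helper `read_all_csvs` lives in our own `scripts.data_cleaning` module and is not shown in this notebook. As a rough illustration of the idea only, a function that combines several CSV files into one data frame could look like the sketch below. The name `read_all_csvs_sketch`, the default folder `data`, and the `*.csv` pattern are assumptions made for this example and do not describe the actual script.

```python
from pathlib import Path

import pandas as pd


def read_all_csvs_sketch(data_dir="data", pattern="*.csv"):
    """Read every CSV file in data_dir and stack them into one data frame.

    Hypothetical stand-in for scripts.data_cleaning.read_all_csvs;
    the folder name and file pattern are assumptions.
    """
    # Sort the paths so the row order is reproducible between runs
    paths = sorted(Path(data_dir).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files matching {pattern!r} in {data_dir!r}")

    # Read each file separately, then concatenate row-wise
    frames = [pd.read_csv(path) for path in paths]

    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

Calling `read_all_csvs_sketch("some_folder")` would return a single data frame containing the rows of all CSV files in that folder, assuming the files share the same columns.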

In [6]:
df.columns.to_list()

['backers_count',
 'blurb',
 'category',
 'converted_pledged_amount',
 'country',
 'created_at',
 'creator',
 'currency',
 'currency_symbol',
 'currency_trailing_code',
 'current_currency',
 'deadline',
 'disable_communication',
 'friends',
 'fx_rate',
 'goal',
 'id',
 'is_backing',
 'is_starrable',
 'is_starred',
 'launched_at',
 'location',
 'name',
 'permissions',
 'photo',
 'pledged',
 'profile',
 'slug',
 'source_url',
 'spotlight',
 'staff_pick',
 'state',
 'state_changed_at',
 'static_usd_rate',
 'urls',
 'usd_pledged',
 'usd_type']

In [None]:
# Clean the data
df = clean_data(df)

In [3]:
# Get a look at the data frame
df.head()

| | backers_count | goal | state | usd_pledged | days_launched_till_changed | days_prelaunch | days_total | project_name_len | creator_name_len | country_AT | ... | category_sub_wearables | category_sub_weaving | category_sub_web | category_sub_webcomics | category_sub_webseries | category_sub_woodworking | category_sub_workshops | category_sub_world music | category_sub_young adult | category_sub_zines |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21 | 200.0 | successful | 802.0 | 45 | 4 | 49 | 21 | 6 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 97 | 400.0 | successful | 2259.0 | 20 | 5 | 25 | 31 | 9 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 88 | 27224.0 | successful | 29638.0 | 30 | 9 | 39 | 60 | 13 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 193 | 40000.0 | successful | 49075.15252 | 42 | 3 | 45 | 25 | 3 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 20 | 1000.0 | failed | 549.0 | 30 | 2 | 32 | 30 | 11 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
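
The preview above contains derived columns such as `days_prelaunch`, `days_launched_till_changed`, `days_total`, `project_name_len`, and one-hot columns like `country_AT`. In the rows shown, `days_total` equals `days_prelaunch` plus `days_launched_till_changed`. The sketch below shows one way such features could be built. It is not the code of `clean_data`: the function names are hypothetical, and we assume here that `created_at`, `launched_at`, and `state_changed_at` are Unix timestamps in seconds.

```python
import pandas as pd

SECONDS_PER_DAY = 24 * 60 * 60


def add_duration_features(raw):
    """Derive whole-day durations, assuming Unix timestamps in seconds."""
    out = raw.copy()
    out["days_prelaunch"] = (
        out["launched_at"] - out["created_at"]
    ) // SECONDS_PER_DAY
    out["days_launched_till_changed"] = (
        out["state_changed_at"] - out["launched_at"]
    ) // SECONDS_PER_DAY
    out["days_total"] = (
        out["state_changed_at"] - out["created_at"]
    ) // SECONDS_PER_DAY
    return out


def add_name_length(raw):
    """Add the number of characters in the project name."""
    out = raw.copy()
    out["project_name_len"] = out["name"].str.len()
    return out


def one_hot_encode(raw, columns):
    """Replace each listed column with 0/1 indicator columns."""
    return pd.get_dummies(raw, columns=columns, dtype=int)


# Tiny made-up example, not taken from the Kickstarter data
toy = pd.DataFrame(
    {
        "name": ["Tiny lamp", "Board game"],
        "country": ["US", "AT"],
        "created_at": [0, 0],
        "launched_at": [4 * SECONDS_PER_DAY, 5 * SECONDS_PER_DAY],
        "state_changed_at": [49 * SECONDS_PER_DAY, 25 * SECONDS_PER_DAY],
    }
)

features = one_hot_encode(
    add_name_length(add_duration_features(toy)), ["country"]
)
print(features.drop(columns=["created_at", "launched_at", "state_changed_at"]))
```

Because each duration is floored to whole days separately, `days_total` can differ by one day from the sum of the other two columns on real timestamps; the toy values are exact multiples of a day, so the sum holds there.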


In [4]:
df.columns.to_list()

['backers_count',
 'goal',
 'state',
 'usd_pledged',
 'days_launched_till_changed',
 'days_prelaunch',
 'days_total',
 'project_name_len',
 'creator_name_len',
 'country_AT',
 'country_AU',
 'country_BE',
 'country_CA',
 'country_CH',
 'country_DE',
 'country_DK',
 'country_ES',
 'country_FR',
 'country_GB',
 'country_HK',
 'country_IE',
 'country_IT',
 'country_JP',
 'country_LU',
 'country_MX',
 'country_NL',
 'country_NO',
 'country_NZ',
 'country_SE',
 'country_SG',
 'country_US',
 'original_currency_AUD',
 'original_currency_CAD',
 'original_currency_CHF',
 'original_currency_DKK',
 'original_currency_EUR',
 'original_currency_GBP',
 'original_currency_HKD',
 'original_currency_JPY',
 'original_currency_MXN',
 'original_currency_NOK',
 'original_currency_NZD',
 'original_currency_SEK',
 'original_currency_SGD',
 'original_currency_USD',
 'disable_communication_False',
 'disable_communication_True',
 'is_starrable_False',
 'is_starrable_True',
 'spotlight_False',
 'spotlight_True',