# Kickstarter Initial Data Exploration

* **Data Source**: https://webrobots.io/kickstarter-datasets/

**NOTE 1**: We need to ensure that the variables we incorporate into the model do not introduce data leakage. For example, we should leave out the `staff_pick` variable, since staff are potentially picking projects that they believe are going to succeed.
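As a rough sketch of how the leakage concern in NOTE 1 could be handled, the snippet below drops columns that are only known once a campaign is running or finished. The column list and the helper name `drop_leaky_columns` are assumptions made for illustration; only `staff_pick` is named in the note itself, and the final list should be checked against the data dictionary.

```python
import pandas as pd

# Assumption: these columns are only populated during or after a campaign,
# so using them as features could leak the outcome. Only `staff_pick` is
# named in the note above; the rest are illustrative candidates.
LEAKY_COLUMNS = [
    "staff_pick",
    "spotlight",
    "backers_count",
    "pledged",
    "usd_pledged",
    "converted_pledged_amount",
    "state_changed_at",
]


def drop_leaky_columns(df: pd.DataFrame, leaky=LEAKY_COLUMNS) -> pd.DataFrame:
    """Return a copy of df without the columns flagged as leakage risks."""
    present = [col for col in leaky if col in df.columns]
    return df.drop(columns=present)


# Toy frame standing in for df_csv.
toy = pd.DataFrame({"goal": [500.0], "staff_pick": [False], "pledged": [86.0]})
features = drop_leaky_columns(toy)
```

The helper ignores listed columns that are absent, so it can be applied both to the raw file and to a reduced frame.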

**NOTE 2**: There is a data dictionary for the Kickstarter dataset in the references folder.

## INTRODUCTION
Kickstarter is a US-based global crowdfunding platform focused on bringing funding to creative projects.
Since the platform’s launch in 2009, the site has hosted over 159,000 successfully funded projects with over
15 million unique backers. Kickstarter uses an “all-or-nothing” funding system: funds are
only disbursed for projects that meet the original funding goal set by the creator.

## PROJECT OBJECTIVE
Kickstarter earns a 5% commission on projects that are successfully funded. Currently, fewer than 40% of
projects on the platform succeed. The objective is to predict which projects are likely to succeed so that they can be highlighted on the site, either through 'staff picks' or 'featured product' lists.

In [1]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
import os
import sys
import glob
import functools

src_dir = os.path.join(os.getcwd(), '..', '..', 'src')
sys.path.append(src_dir)

In [2]:
df_csv = pd.read_csv('../../data/02_intermediate/kick_deduped.csv')



In [3]:
len(df_csv)

331114

In [4]:
df_csv.columns

Index(['index', 'backers_count', 'blurb', 'category',
       'converted_pledged_amount', 'country', 'created_at', 'creator',
       'currency', 'currency_symbol', 'currency_trailing_code',
       'current_currency', 'deadline', 'disable_communication', 'friends',
       'fx_rate', 'goal', 'id', 'is_backing', 'is_starrable', 'is_starred',
       'last_update_published_at', 'launched_at', 'location', 'name',
       'permissions', 'pledged', 'slug', 'source_url', 'spotlight',
       'staff_pick', 'state', 'state_changed_at', 'static_usd_rate',
       'unread_messages_count', 'unseen_activity_count', 'urls', 'usd_pledged',
       'usd_type'],
      dtype='object')

In [6]:
df_csv.state.value_counts()

successful    139923
failed        121786
live           52690
canceled       15697
suspended       1018
Name: state, dtype: int64

**Dataset date span**

* 04/21/2009 @ 5:35pm (UTC) **_to_** 07/18/2019 @ 12:54am (UTC)
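The timestamp columns (`created_at`, `launched_at`, `deadline`, `state_changed_at`) hold Unix epoch seconds. A minimal sketch of converting them to timezone-aware datetimes is shown below. The two sample values are the `created_at` entries of the last and first rows displayed elsewhere in this notebook; the toy frame and the `created_at_dt` column name are assumptions for illustration.

```python
import pandas as pd

# Toy frame with two created_at values taken from the rows shown in this notebook.
toy = pd.DataFrame({"created_at": [1240335335, 1563411275]})

# Interpret the integers as seconds since the Unix epoch, in UTC.
toy["created_at_dt"] = pd.to_datetime(toy["created_at"], unit="s", utc=True)

# Earliest and latest creation times in the toy frame.
span = (toy["created_at_dt"].min(), toy["created_at_dt"].max())
print(span[0])  # 2009-04-21 17:35:35+00:00
print(span[1])  # 2019-07-18 00:54:35+00:00
```

These two timestamps agree with the date span listed above.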

In [101]:
df_csv.tail()

Unnamed: 0,index,backers_count,blurb,category,converted_pledged_amount,country,created_at,creator,currency,currency_symbol,...,spotlight,staff_pick,state,state_changed_at,static_usd_rate,unread_messages_count,unseen_activity_count,urls,usd_pledged,usd_type
331109,4011177,24,WHY MULTIPLE PROJECTS? to give you the choice ...,"{""id"":17,""name"":""Theater"",""slug"":""theater"",""po...",6575.0,US,1240588900,"{""id"":1802123423,""name"":""Accidental Nostalgia""...",USD,$,...,True,True,successful,1244174425,1.0,,,"{""web"":{""project"":""https://www.kickstarter.com...",6575.0,international
331110,7113608,55,Acclaimed cult pop entity My Teenage Stride is...,"{""urls"":{""web"":{""discover"":""http://www.kicksta...",,US,1240514851,"{""urls"":{""web"":{""user"":""https://www.kickstarte...",USD,$,...,True,True,successful,1266998407,1.0,,,"{""web"":{""project"":""https://www.kickstarter.com...",2450.0,
331111,3218244,60,Using a framework that ensures for resiliency ...,"{""id"":13,""name"":""Journalism"",""slug"":""journalis...",3425.0,US,1240456019,"{""id"":1782188740,""name"":""Enthusiastic Grad Stu...",USD,$,...,True,False,successful,1263801609,1.0,,,"{""web"":{""project"":""https://www.kickstarter.com...",3425.0,domestic
331112,6100244,114,"UPDATE: Shannon Powell, Walter Payton, Lucien ...","{""urls"":{""web"":{""discover"":""http://www.kicksta...",,US,1240366270,"{""urls"":{""web"":{""user"":""https://www.kickstarte...",USD,$,...,True,True,successful,1244185224,1.0,,,"{""web"":{""project"":""https://www.kickstarter.com...",4100.6,
331113,4556333,33,"William Brittelle's ""Television Landscape"" is ...","{""id"":40,""name"":""Indie Rock"",""slug"":""music/ind...",2000.0,US,1240335335,"{""id"":1505954783,""name"":""william brittelle"",""s...",USD,$,...,True,False,successful,1253030408,1.0,,,"{""web"":{""project"":""https://www.kickstarter.com...",2000.0,international


## COLUMN EXPLORATION

In [98]:
df_csv.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 331114 entries, 0 to 331113
Data columns (total 39 columns):
index                       331114 non-null int64
backers_count               331114 non-null int64
blurb                       331113 non-null object
category                    331114 non-null object
converted_pledged_amount    192238 non-null float64
country                     331114 non-null object
created_at                  331114 non-null int64
creator                     331114 non-null object
currency                    331114 non-null object
currency_symbol             331114 non-null object
currency_trailing_code      331114 non-null bool
current_currency            192238 non-null object
deadline                    331114 non-null int64
disable_communication       331114 non-null bool
friends                     2299 non-null object
fx_rate                     181535 non-null float64
goal                        331114 non-null float64
id                          3

In [99]:
df_csv.isnull().sum()

index                            0
backers_count                    0
blurb                            1
category                         0
converted_pledged_amount    138876
country                          0
created_at                       0
creator                          0
currency                         0
currency_symbol                  0
currency_trailing_code           0
current_currency            138876
deadline                         0
disable_communication            0
friends                     328815
fx_rate                     149579
goal                             0
id                               0
is_backing                  328815
is_starrable                127452
is_starred                  328815
last_update_published_at    331114
launched_at                      0
location                      1089
name                             0
permissions                 329000
pledged                          0
slug                             0
source_url          
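Raw null counts are easier to compare as a share of rows. The sketch below ranks columns by their fraction of missing values and keeps those above a cutoff. The 0.9 threshold and the helper name `mostly_missing` are assumptions, not choices made in this notebook.

```python
import numpy as np
import pandas as pd


def mostly_missing(df: pd.DataFrame, threshold: float = 0.9) -> pd.Series:
    """Return the share of missing values for columns at or above threshold."""
    missing_share = df.isnull().mean()  # fraction of nulls per column
    flagged = missing_share[missing_share >= threshold]
    return flagged.sort_values(ascending=False)


# Toy frame standing in for df_csv.
toy = pd.DataFrame(
    {
        "goal": [500.0, 1000.0, 250.0, 80.0],
        "friends": [np.nan, np.nan, np.nan, np.nan],
        "fx_rate": [1.0, np.nan, 1.0, np.nan],
    }
)
sparse = mostly_missing(toy)
```

Columns flagged this way (in the real data, likely candidates are `friends`, `is_backing`, `is_starred`, `permissions`, and `last_update_published_at`) can then be reviewed before deciding whether to drop them.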

### blurb

In [49]:
df_csv.blurb[99]

"Bubba Fontaine's 'Marshmallow People' on CD - a young man's dream of an alien landscape inhabited by marshmallow people."

### category

In [50]:
df_csv.category[99]

'{"id":14,"name":"Music","slug":"music","position":11,"color":10878931,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/music"}}}'

In [93]:
print('Number of Unique Categories: ', df_csv.category.nunique())

Number of Unique Categories:  368
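Because `category` is stored as a JSON string, the count above is over distinct strings rather than distinct category names. One possible way to pull out usable fields is sketched below. It assumes that a slug of the form `parent/child` marks a subcategory; the helper name, the output column names, and the second toy record are hypothetical.

```python
import json

import pandas as pd


def parse_category(raw: str) -> pd.Series:
    """Extract the category name and the top-level slug from a JSON string."""
    record = json.loads(raw)
    slug = record.get("slug", "")
    # Assumption: the part before the first "/" is the parent category.
    parent = slug.split("/")[0]
    return pd.Series(
        {"category_name": record.get("name"), "category_parent": parent}
    )


# Toy frame: the first record is shortened from the notebook output,
# the second is a made-up subcategory for illustration.
toy = pd.DataFrame(
    {
        "category": [
            '{"id":14,"name":"Music","slug":"music","position":11}',
            '{"id":999,"name":"Example Sub","slug":"music/example-sub"}',
        ]
    }
)
parsed = toy["category"].apply(parse_category)
```

Joining `parsed` back onto the original frame gives plain string columns that are easier to group on than the raw JSON.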


In [97]:
df_csv.groupby('category').sum()

Unnamed: 0_level_0,index,backers_count,converted_pledged_amount,created_at,currency_trailing_code,deadline,disable_communication,fx_rate,goal,id,last_update_published_at,launched_at,pledged,spotlight,staff_pick,state_changed_at,static_usd_rate,unread_messages_count,unseen_activity_count,usd_pledged
category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
"{""id"":1,""name"":""Art"",""slug"":""art"",""position"":1,""color"":16760235,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/art""}}}",9823796210,219731,15244987.0,4801289478169,2595.0,4817811493702,0.0,3168.480572,4.954839e+07,3497201937890,0.0,4809204171371,2.375853e+07,2032.0,307.0,4814464736182,3163.020001,0.0,0.0,1.524496e+07
"{""id"":10,""name"":""Food"",""slug"":""food"",""position"":8,""color"":16725570,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/food""}}}",10369343241,494116,47379801.0,4120197149831,2614.0,4141827802918,0.0,2890.527224,6.472890e+07,3106879894364,0.0,4133351499054,4.881344e+07,2290.0,581.0,4139789286460,2915.177032,0.0,0.0,4.740718e+07
"{""id"":11,""name"":""Film & Video"",""slug"":""film & video"",""position"":7,""color"":16734574,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/film%20&%20video""}}}",3751375953,90340,8482877.0,1720005912511,946.0,1726208577018,0.0,1158.138777,2.999147e+07,1260799446090,0.0,1722566742026,9.515358e+06,704.0,121.0,1724646638281,1157.612857,0.0,0.0,8.482580e+06
"{""id"":12,""name"":""Games"",""slug"":""games"",""position"":9,""color"":51627,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/games""}}}",2405989591,140700,11719527.0,1026406224636,545.0,1032043930342,0.0,701.383806,2.929956e+07,759598747715,0.0,1029969946604,1.378443e+07,357.0,64.0,1030942509896,700.735690,0.0,0.0,1.172723e+07
"{""id"":13,""name"":""Journalism"",""slug"":""journalism"",""position"":10,""color"":1228010,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/journalism""}}}",4491257439,83212,6192502.0,1795606191250,1131.0,1802479493959,0.0,1289.047080,3.413488e+07,1417324952446,0.0,1798505834843,7.758693e+06,440.0,150.0,1802377483546,1313.077082,0.0,0.0,6.202813e+06
"{""id"":14,""name"":""Music"",""slug"":""music"",""position"":11,""color"":10878931,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/music""}}}",8333877369,226497,16596022.0,4042454324065,2301.0,4060535937295,0.0,2694.748906,5.420054e+07,2946598035071,0.0,4052171481512,2.163469e+07,1819.0,301.0,4057550569324,2690.422908,0.0,0.0,1.659659e+07
"{""id"":15,""name"":""Photography"",""slug"":""photography"",""position"":12,""color"":58341,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/photography""}}}",6305999546,159745,16449301.0,2271688219040,1372.0,2280992597311,0.0,1697.409232,1.072107e+07,1775984169494,0.0,2276131613484,1.633612e+07,1545.0,401.0,2280652127556,1732.251315,0.0,0.0,1.643361e+07
"{""id"":16,""name"":""Technology"",""slug"":""technology"",""position"":14,""color"":6526716,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/technology""}}}",5548613888,643165,88719276.0,2419556795783,1271.0,2432919345684,0.0,1697.489786,9.864733e+07,1743112371836,0.0,2427791878242,1.040747e+08,905.0,204.0,2430499906846,1601.559263,0.0,0.0,8.430900e+07
"{""id"":17,""name"":""Theater"",""slug"":""theater"",""position"":15,""color"":16743775,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/theater""}}}",9177547834,147245,11581888.0,3431219345481,2246.0,3443187150298,0.0,2564.031042,1.481124e+07,2692084664962,0.0,3435931143419,1.241269e+07,2438.0,332.0,3442905410259,2587.629959,0.0,0.0,1.158206e+07
"{""id"":18,""name"":""Publishing"",""slug"":""publishing"",""position"":13,""color"":14867664,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/publishing""}}}",4142277668,159829,10438816.0,1956105838896,1006.0,1964969799391,0.0,1299.471872,2.413839e+07,1354625187322,0.0,1961076497742,1.536574e+07,769.0,255.0,1963181632805,1296.421604,0.0,0.0,1.044115e+07

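Summing every numeric column also adds up identifiers and timestamps (`id`, `created_at`, `deadline`, and so on), which has no useful interpretation. A sketch of a narrower aggregation is shown below. The choice of statistics and the `category_name` column (assumed to have been extracted from the JSON beforehand) are illustrative assumptions.

```python
import pandas as pd

# Toy frame standing in for df_csv with a parsed category name column.
toy = pd.DataFrame(
    {
        "category_name": ["Music", "Music", "Art"],
        "backers_count": [10, 30, 5],
        "usd_pledged": [100.0, 900.0, 50.0],
        "goal": [500.0, 1500.0, 200.0],
    }
)

# Aggregate only the columns where a sum or median is meaningful.
summary = toy.groupby("category_name").agg(
    projects=("usd_pledged", "size"),
    total_backers=("backers_count", "sum"),
    total_usd_pledged=("usd_pledged", "sum"),
    median_goal=("goal", "median"),
)
```

Note that `goal` is in each project's native currency, so its median is only comparable across categories after converting to a common currency.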

### converted_pledged_amount

In [51]:
df_csv.converted_pledged_amount[99]

86.0

### country

In [52]:
df_csv.country[99]

'US'

### created_at

In [100]:
df_csv.head()

Unnamed: 0,index,backers_count,blurb,category,converted_pledged_amount,country,created_at,creator,currency,currency_symbol,...,spotlight,staff_pick,state,state_changed_at,static_usd_rate,unread_messages_count,unseen_activity_count,urls,usd_pledged,usd_type
0,1323085,1,The story of a young man and the girl who brok...,"{""id"":40,""name"":""Indie Rock"",""slug"":""music/ind...",1.0,US,1563411275,"{""id"":744201718,""name"":""Josiah Gennell"",""slug""...",USD,$,...,False,False,live,1563414209,1.0,,,"{""web"":{""project"":""https://www.kickstarter.com...",1.0,domestic
1,1401860,0,Giuliano Clothing is on a mission to reinvent ...,"{""id"":9,""name"":""Fashion"",""slug"":""fashion"",""pos...",0.0,CA,1563405193,"{""id"":424525464,""name"":""Giuliano Clothing"",""sl...",CAD,$,...,False,False,live,1563421807,0.766254,,,"{""web"":{""project"":""https://www.kickstarter.com...",0.0,domestic
2,1326640,1,CincyPet Magazine is the official lifestyle pe...,"{""id"":49,""name"":""Periodicals"",""slug"":""publishi...",1.0,US,1563402428,"{""id"":135323763,""name"":""Susannah Maynard"",""slu...",USD,$,...,False,False,live,1563418513,1.0,,,"{""web"":{""project"":""https://www.kickstarter.com...",1.0,domestic
3,1485776,4,I'll be creating an enamel pin of Eleven from ...,"{""id"":1,""name"":""Art"",""slug"":""art"",""position"":1...",76.0,US,1563401054,"{""id"":1127711744,""name"":""Joelle Nagy"",""slug"":""...",USD,$,...,False,False,live,1563403806,1.0,,,"{""web"":{""project"":""https://www.kickstarter.com...",76.0,domestic
4,1504884,1,Creating packages for Girl Scout crafting acti...,"{""id"":26,""name"":""Crafts"",""slug"":""crafts"",""posi...",1.0,US,1563399312,"{""id"":1380518341,""name"":""Decca"",""slug"":""creati...",USD,$,...,False,False,live,1563402253,1.0,,,"{""web"":{""project"":""https://www.kickstarter.com...",1.0,international


In [53]:
df_csv.created_at[99]

1562980076

### creator

In [54]:
df_csv.creator[99]

'{"id":1124061927,"name":"Bubba Fontaine","slug":"bubbafontainemusic","is_registered":null,"chosen_currency":null,"is_superbacker":null,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/025/792/502/f215519ec39e30e9942d9b18adc4e955_original.jpg?ixlib=rb-2.1.0&w=40&h=40&fit=crop&v=1562979736&auto=format&frame=1&q=92&s=fa7563e08df710c73f6d313355f8d993","small":"https://ksr-ugc.imgix.net/assets/025/792/502/f215519ec39e30e9942d9b18adc4e955_original.jpg?ixlib=rb-2.1.0&w=160&h=160&fit=crop&v=1562979736&auto=format&frame=1&q=92&s=baa975fce069785ec797b82bb195927f","medium":"https://ksr-ugc.imgix.net/assets/025/792/502/f215519ec39e30e9942d9b18adc4e955_original.jpg?ixlib=rb-2.1.0&w=160&h=160&fit=crop&v=1562979736&auto=format&frame=1&q=92&s=baa975fce069785ec797b82bb195927f"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/bubbafontainemusic"},"api":{"user":"https://api.kickstarter.com/v1/users/1124061927?signature=1563510675.c47ec8ff4bb44768bb9b1c64efaabe2d4bfbb835"}}}'

### currency

In [55]:
df_csv.currency[99]

'USD'

### currency_symbol

In [56]:
df_csv.currency_symbol[99]

'$'

### currency_trailing_code

In [57]:
df_csv.currency_trailing_code[99]

True

### current_currency

In [58]:
df_csv.current_currency[99]

'USD'

### deadline

In [59]:
df_csv.deadline[99]

1565577279

### disable_communication

In [60]:
df_csv.disable_communication[99]

False

### friends

In [61]:
df_csv.friends[99]

nan

### fx_rate

In [62]:
df_csv.fx_rate[99]

1.0

### goal

In [63]:
df_csv.goal[99]

500.0

### id

In [64]:
df_csv.id[99]

310725143

### is_backing

In [65]:
df_csv.is_backing[99]

nan

### is_starrable

In [66]:
df_csv.is_starrable[99]

True

### is_starred

In [67]:
df_csv.is_starred[99]

nan

### last_update_published_at

In [68]:
df_csv.last_update_published_at[99]

nan

### launched_at

In [69]:
df_csv.launched_at[99]

1562985279
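Since `launched_at` and `deadline` are both epoch seconds, their difference gives the campaign length. The sketch below uses the two values shown for record 99; the `campaign_days` column name is an assumption.

```python
import pandas as pd

SECONDS_PER_DAY = 86400

# launched_at and deadline for record 99, as shown in this notebook.
toy = pd.DataFrame({"launched_at": [1562985279], "deadline": [1565577279]})

# Campaign length in days.
toy["campaign_days"] = (toy["deadline"] - toy["launched_at"]) / SECONDS_PER_DAY
print(toy["campaign_days"].iloc[0])  # 30.0
```

Unlike pledge totals, campaign length is fixed at launch, so it may be a candidate feature that avoids leakage.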

### location

In [70]:
df_csv.location[99]

'{"id":2427032,"name":"Indianapolis","slug":"indianapolis-in","short_name":"Indianapolis, IN","displayable_name":"Indianapolis, IN","localized_name":"Indianapolis","country":"US","state":"IN","type":"Town","is_root":false,"urls":{"web":{"discover":"https://www.kickstarter.com/discover/places/indianapolis-in","location":"https://www.kickstarter.com/locations/indianapolis-in"},"api":{"nearby_projects":"https://api.kickstarter.com/v1/discover?signature=1563484916.c9919dbb98ac98dcf4260648e86f32f5d9db0e8b&woe_id=2427032"}}}'

### name

In [71]:
df_csv.name[99]

'Marshmallow People'

### permissions

In [42]:
df_csv.permissions[99]

nan

### pledged

In [72]:
df_csv.pledged[99]

86.0

### slug

In [73]:
df_csv.slug[99]

'marshmallow-people'

### source_url

In [74]:
df_csv.source_url[99]

'https://www.kickstarter.com/discover/categories/music'

### spotlight

In [75]:
df_csv.spotlight[99]

False

### staff_pick

In [76]:
df_csv.staff_pick[99]

False

### state

In [77]:
df_csv.state[99]

'live'

In [78]:
df_csv.state.value_counts()

successful    139923
failed        121786
live           52690
canceled       15697
suspended       1018
Name: state, dtype: int64

### state_changed_at

In [79]:
df_csv.state_changed_at[99]

1562985280

### static_usd_rate

In [81]:
df_csv.static_usd_rate[99]

1.0

### unread_messages_count

In [82]:
df_csv.unread_messages_count[99]

nan

### unseen_activity_count

In [83]:
df_csv.unseen_activity_count[99]

nan

### urls

In [84]:
df_csv.urls[99]

'{"web":{"project":"https://www.kickstarter.com/projects/bubbafontainemusic/marshmallow-people?ref=discovery_category_newest","rewards":"https://www.kickstarter.com/projects/bubbafontainemusic/marshmallow-people/rewards"}}'

### usd_pledged

In [85]:
df_csv.usd_pledged[99]

86.0

### usd_type

In [86]:
df_csv.usd_type[99]

'domestic'

In [87]:
df_csv.usd_type.value_counts()

domestic         109611
international     80475
Name: usd_type, dtype: int64

## CAMPAIGN BREAKDOWN

In [90]:
print('Number of Completed Campaigns: ',len(df_csv.loc[df_csv['state'] != 'live']))

Number of Completed Campaigns:  278424
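The count above includes canceled and suspended campaigns. For a success-prediction model, one option is to keep only campaigns that ran to completion and encode the outcome as a binary label. The sketch below assumes that only `successful` and `failed` rows are wanted; the `target` column name is hypothetical.

```python
import pandas as pd

# Toy frame standing in for df_csv.
toy = pd.DataFrame(
    {"state": ["successful", "failed", "live", "canceled", "successful"]}
)

# Keep campaigns with a definitive outcome (assumption: canceled and
# suspended campaigns are excluded along with live ones).
finished = toy[toy["state"].isin(["successful", "failed"])].copy()

# Binary label: 1 for successful, 0 for failed.
finished["target"] = (finished["state"] == "successful").astype(int)

success_rate = finished["target"].mean()
```

Whether canceled campaigns should instead be treated as failures is a modeling decision that this sketch leaves open.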
