# Kickstarter Initial Data Exploration

* **Data Source**: https://webrobots.io/kickstarter-datasets/
* [Kickstarter Stats](https://www.kickstarter.com/help/stats)
* [Kickstarter Data Blog](https://kickstarter.engineering/tagged/data)

**NOTE 1**: The variables incorporated into the model must not cause data leakage. For example, the staff pick variable should be left out, since staff are potentially picking projects that they believe are going to succeed.
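
A minimal sketch of how the leakage note could be applied before modelling. Only `staff_pick` is named in the note; the other entries in `LEAKY_COLUMNS` are assumptions about which columns are only known during or after a campaign, and the helper name `drop_leaky_columns` is hypothetical.

```python
import pandas as pd

# Hypothetical list. Only staff_pick comes from the note above; the rest are
# assumed to be outcome-related and should be reviewed against the data dictionary.
LEAKY_COLUMNS = [
    "staff_pick",
    "spotlight",
    "backers_count",
    "pledged",
    "usd_pledged",
    "converted_pledged_amount",
    "state_changed_at",
]


def drop_leaky_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without any column listed in LEAKY_COLUMNS."""
    present = [col for col in LEAKY_COLUMNS if col in df.columns]
    return df.drop(columns=present)


# Tiny stand-in frame so the sketch runs without the real CSV.
sample = pd.DataFrame(
    {"goal": [5000.0], "staff_pick": [False], "pledged": [0.0], "state": ["live"]}
)
features = drop_leaky_columns(sample)
```

The target column `state` is kept here on purpose; it would be separated from the features at training time.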

**NOTE 2**: There is a data dictionary for the Kickstarter dataset in the references folder.

## INTRODUCTION
Kickstarter is a US-based global crowdfunding platform focused on bringing funding to creative projects.
Since the platform’s launch in 2009, the site has hosted over 159,000 successfully funded projects with over
15 million unique backers. Kickstarter uses an “all-or-nothing” funding system: funds are
only disbursed for projects that meet the original funding goal set by the creator.

## PROJECT OBJECTIVE
Kickstarter earns a 5% commission on projects that are successfully funded. Currently, fewer than 40% of
projects on the platform succeed. The objective is to predict which projects are likely to succeed so that they can be highlighted on the site through 'staff picks' or 'featured product' lists.

In [1]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
import os
import sys
import glob
import functools

src_dir = os.path.join(os.getcwd(), '..', '..', 'src')
sys.path.append(src_dir)

In [2]:
df_csv = pd.read_csv('../../data/02_intermediate/kick_deduped.csv')


In [3]:
df_csv.id.nunique()

326266

In [4]:
pd.set_option('display.max_columns', None)

In [5]:
len(df_csv)

332899
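
The row count (332,899) is larger than the number of unique ids (326,266), so 6,633 rows still share an id with another row even though the file is named `kick_deduped`. The sketch below shows one way to surface such rows; the sample frame is made up for illustration.

```python
import pandas as pd

# Made-up sample: id 2 appears twice with different states.
sample = pd.DataFrame(
    {
        "id": [1, 2, 2, 3],
        "state": ["live", "live", "successful", "failed"],
    }
)

# Rows beyond the first occurrence of each id.
n_extra = len(sample) - sample["id"].nunique()

# Every row whose id appears more than once, for manual inspection.
dupes = sample[sample["id"].duplicated(keep=False)].sort_values("id")
```

Inspecting `dupes` on the real data would show whether the repeated ids are the same project scraped in different states or exact duplicates.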

In [6]:
df_csv.columns

Index(['index', 'backers_count', 'blurb', 'category',
       'converted_pledged_amount', 'country', 'created_at', 'creator',
       'currency', 'currency_symbol', 'currency_trailing_code',
       'current_currency', 'deadline', 'disable_communication', 'friends',
       'fx_rate', 'goal', 'id', 'is_backing', 'is_starrable', 'is_starred',
       'last_update_published_at', 'launched_at', 'location', 'name',
       'permissions', 'pledged', 'slug', 'source_url', 'spotlight',
       'staff_pick', 'state', 'state_changed_at', 'static_usd_rate',
       'unread_messages_count', 'unseen_activity_count', 'urls', 'usd_pledged',
       'usd_type'],
      dtype='object')

In [7]:
df_csv.state.value_counts()

successful    147268
failed        129466
live           38382
canceled       16728
suspended       1055
Name: state, dtype: int64

**Dataset date span**

* 04/21/2009 @ 5:35pm (UTC) **_to_** 07/18/2019 @ 12:54am (UTC)
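
The timestamp columns are stored as Unix epoch seconds, so a span like the one above can be derived by converting them with `pd.to_datetime`. This is a sketch on two sample values taken from row 99 (`created_at` and `launched_at`), not the cell that produced the dates above.

```python
import pandas as pd

# Epoch-second values copied from row 99 of the dataset.
ts = pd.Series([1558615823, 1563383666], name="created_at")

# unit="s" interprets the integers as seconds since 1970-01-01; utc=True
# makes the result timezone-aware.
dt = pd.to_datetime(ts, unit="s", utc=True)

earliest, latest = dt.min(), dt.max()
```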

## COLUMN EXPLORATION

In [8]:
df_csv.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 332899 entries, 0 to 332898
Data columns (total 39 columns):
index                       332899 non-null int64
backers_count               332899 non-null int64
blurb                       332889 non-null object
category                    332899 non-null object
converted_pledged_amount    195183 non-null float64
country                     332899 non-null object
created_at                  332899 non-null int64
creator                     332899 non-null object
currency                    332899 non-null object
currency_symbol             332899 non-null object
currency_trailing_code      332899 non-null bool
current_currency            195183 non-null object
deadline                    332899 non-null int64
disable_communication       332899 non-null bool
friends                     1629 non-null object
fx_rate                     185035 non-null float64
goal                        332899 non-null float64
id                          3

In [9]:
df_csv.isnull().sum()

index                            0
backers_count                    0
blurb                           10
category                         0
converted_pledged_amount    137716
country                          0
created_at                       0
creator                          0
currency                         0
currency_symbol                  0
currency_trailing_code           0
current_currency            137716
deadline                         0
disable_communication            0
friends                     331270
fx_rate                     147864
goal                             0
id                               0
is_backing                  331270
is_starrable                126772
is_starred                  331270
last_update_published_at    332899
launched_at                      0
location                      1067
name                             1
permissions                 331401
pledged                          0
slug                             0
source_url          
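
Several columns (`friends`, `is_backing`, `is_starred`, `permissions`, `last_update_published_at`) are almost entirely null. A possible way to drop them is sketched below; the 0.9 threshold and the sample frame are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample with one fully-null column and one partially-null column.
sample = pd.DataFrame(
    {
        "goal": [5000.0, 3500.0, 2000.0, 18000.0],
        "friends": [np.nan, np.nan, np.nan, np.nan],
        "fx_rate": [1.0, np.nan, 0.766388, 1.0],
    }
)

# Share of missing values per column (0.0 to 1.0).
null_share = sample.isnull().mean()

THRESHOLD = 0.9  # hypothetical cut-off, not taken from the notebook
to_drop = null_share[null_share > THRESHOLD].index.tolist()
trimmed = sample.drop(columns=to_drop)
```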

### blurb

In [10]:
df_csv.blurb[99]

'A board book that introduces chemistry to toddlers and babies through use of comparison activities.'

### category

In [11]:
df_csv.category[99]

'{"id":46,"name":"Children\'s Books","slug":"publishing/children\'s books","position":5,"parent_id":18,"color":14867664,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/publishing/children\'s%20books"}}}'
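
The `category` column holds a JSON string per row, which is why 368 distinct values appear when it is treated as plain text. A minimal sketch of extracting the category name and the parent category from the slug follows; the output column names `category_name` and `category_parent` are hypothetical.

```python
import json

import pandas as pd

# Two shortened sample values in the same shape as the real column.
raw = pd.Series(
    [
        '{"id":46,"name":"Children\'s Books","slug":"publishing/children\'s books","position":5,"parent_id":18}',
        '{"id":1,"name":"Art","slug":"art","position":1}',
    ]
)

# Decode each JSON string into a dict.
parsed = raw.apply(json.loads)

category = pd.DataFrame(
    {
        "category_name": parsed.apply(lambda d: d["name"]),
        # The slug is "parent/child" for sub-categories and just "parent" otherwise.
        "category_parent": parsed.apply(lambda d: d["slug"].split("/")[0]),
    }
)
```

The same approach would apply to the other JSON-valued columns (`creator`, `location`, `urls`).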

In [12]:
print('Number of Unique Categories: ',len(df_csv.category.value_counts()))

Number of Unique Categories:  368


In [13]:
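# numeric_only=True keeps the sum to numeric/bool columns on recent pandas versions
df_csv.groupby('category').sum(numeric_only=True)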

category,index,backers_count,converted_pledged_amount,created_at,currency_trailing_code,deadline,disable_communication,fx_rate,goal,id,last_update_published_at,launched_at,pledged,spotlight,staff_pick,state_changed_at,static_usd_rate,unread_messages_count,unseen_activity_count,usd_pledged
"{""id"":1,""name"":""Art"",""slug"":""art"",""position"":1,""color"":16760235,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/art""}}}",10239628977,235320,16143174.0,4981043669202,2702.0,4997886764947,0.0,3284.727091,4.880073e+07,3604737222186,0.0,4989009601240,2.320235e+07,2359.0,364.0,4995083724363,3281.745545,0.0,0.0,1.614648e+07
"{""id"":10,""name"":""Food"",""slug"":""food"",""position"":8,""color"":16725570,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/food""}}}",10472786762,521985,50906578.0,4145785719233,2645.0,4167270576263,0.0,2902.696850,6.550208e+07,3126859368774,0.0,4158767742451,5.242832e+07,2342.0,599.0,4165328914419,2923.496007,0.0,0.0,5.092621e+07
"{""id"":11,""name"":""Film & Video"",""slug"":""film & video"",""position"":7,""color"":16734574,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/film%20&%20video""}}}",3720810538,92001,8826692.0,1714401438513,943.0,1720609035466,0.0,1154.184005,2.975086e+07,1247923642299,0.0,1716971961273,9.927728e+06,749.0,128.0,1719189830280,1154.523286,0.0,0.0,8.827748e+06
"{""id"":12,""name"":""Games"",""slug"":""games"",""position"":9,""color"":51627,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/games""}}}",2374681840,144166,12414722.0,1041395469760,551.0,1046949576314,0.0,711.302840,2.953965e+07,763618843122,0.0,1044847591716,1.456996e+07,405.0,69.0,1045953804632,709.954170,0.0,0.0,1.241442e+07
"{""id"":13,""name"":""Journalism"",""slug"":""journalism"",""position"":10,""color"":1228010,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/journalism""}}}",4561676115,83229,6135193.0,1788780160577,1123.0,1795582551932,0.0,1284.536709,5.418875e+07,1388504093726,0.0,1791653963735,7.575568e+06,437.0,136.0,1795540869534,1311.245953,0.0,0.0,6.153452e+06
"{""id"":14,""name"":""Music"",""slug"":""music"",""position"":11,""color"":10878931,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/music""}}}",8611535762,240493,17465043.0,4104570148076,2335.0,4123023583298,0.0,2738.602740,5.373037e+07,2943273975718,0.0,4114483392046,2.483216e+07,2062.0,309.0,4120649493377,2734.572242,0.0,0.0,1.747113e+07
"{""id"":15,""name"":""Photography"",""slug"":""photography"",""position"":12,""color"":58341,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/photography""}}}",6354727052,159332,16046829.0,2276151458923,1373.0,2285526587885,0.0,1696.388142,1.032199e+07,1761052626802,0.0,2280713410462,1.593430e+07,1563.0,402.0,2285235338943,1733.898515,0.0,0.0,1.603284e+07
"{""id"":16,""name"":""Technology"",""slug"":""technology"",""position"":14,""color"":6526716,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/technology""}}}",5707162465,627058,88563716.0,2429321822065,1275.0,2442625212155,0.0,1709.586003,9.683812e+07,1765355274651,0.0,2437466234304,1.050200e+08,947.0,199.0,2440316435462,1605.724950,0.0,0.0,8.447286e+07
"{""id"":17,""name"":""Theater"",""slug"":""theater"",""position"":15,""color"":16743775,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/theater""}}}",8861580528,141329,11278289.0,3311788063906,2160.0,3323380060007,0.0,2469.517396,1.433269e+07,2595476920009,0.0,3316424241917,1.211548e+07,2370.0,325.0,3323169397812,2495.190976,0.0,0.0,1.127845e+07
"{""id"":18,""name"":""Publishing"",""slug"":""publishing"",""position"":13,""color"":14867664,""urls"":{""web"":{""discover"":""http://www.kickstarter.com/discover/categories/publishing""}}}",3916705955,164383,10955116.0,1953979998686,1003.0,1962875975247,0.0,1297.969037,2.434361e+07,1370129702453,0.0,1958990900524,1.600592e+07,836.0,260.0,1961290339802,1294.807819,0.0,0.0,1.095671e+07


### converted_pledged_amount

In [14]:
df_csv.converted_pledged_amount[99]

418.0

### country

In [15]:
df_csv.country[99]

'US'

### created_at

In [16]:
df_csv.head()

Unnamed: 0,index,backers_count,blurb,category,converted_pledged_amount,country,created_at,creator,currency,currency_symbol,currency_trailing_code,current_currency,deadline,disable_communication,friends,fx_rate,goal,id,is_backing,is_starrable,is_starred,last_update_published_at,launched_at,location,name,permissions,pledged,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,unread_messages_count,unseen_activity_count,urls,usd_pledged,usd_type
0,1351799,0,"I'm just going to say it, I'm not special. I'm...","{""id"":263,""name"":""Apparel"",""slug"":""fashion/app...",0.0,US,1563159576,"{""id"":1309738689,""name"":""Dima01"",""slug"":""dima0...",USD,$,True,USD,1566018288,False,,1.0,5000.0,1893102245,,True,,,1563426288,"{""id"":2514971,""name"":""Wasilla"",""slug"":""wasilla...",Shirt and hat,,0.0,shirt-and-hat,https://www.kickstarter.com/discover/categorie...,False,False,live,1563426288,1.0,,,"{""web"":{""project"":""https://www.kickstarter.com...",0.0,domestic
1,1446990,568,for Tabletop Role Playing Games like Dungeons ...,"{""id"":34,""name"":""Tabletop Games"",""slug"":""games...",18969.0,US,1559509615,"{""id"":2117846298,""name"":""quEmpire Gaming"",""slu...",USD,$,True,USD,1563422100,False,,1.0,5000.0,1175125319,,False,,,1560651641,"{""id"":2423096,""name"":""Holland"",""slug"":""holland...",RPG Minimalist Creature Dice & Status / Condit...,,18969.0,rpg-minimalist-creature-dice-and-status-condit...,https://www.kickstarter.com/discover/categorie...,True,False,successful,1563422101,1.0,,,"{""web"":{""project"":""https://www.kickstarter.com...",18969.0,domestic
2,1401860,0,Giuliano Clothing is on a mission to reinvent ...,"{""id"":9,""name"":""Fashion"",""slug"":""fashion"",""pos...",0.0,CA,1563405193,"{""id"":424525464,""name"":""Giuliano Clothing"",""sl...",CAD,$,True,USD,1566013807,False,,0.766388,5000.0,1290757180,,True,,,1563421807,"{""id"":4118,""name"":""Toronto"",""slug"":""toronto-on...",Giuliano Clothing: Modern Fashion,,0.0,giuliano-clothing-modern-fashion,https://www.kickstarter.com/discover/categorie...,False,False,live,1563421807,0.766254,,,"{""web"":{""project"":""https://www.kickstarter.com...",0.0,domestic
3,1383926,80,We have a new album that we are ready to relea...,"{""id"":14,""name"":""Music"",""slug"":""music"",""positi...",3691.0,US,1561660600,"{""id"":467960938,""name"":""Drank The Gold"",""slug""...",USD,$,True,USD,1563420600,False,,1.0,3500.0,920424993,,False,,,1561781837,"{""id"":2489059,""name"":""Saratoga Springs"",""slug""...",Drank The Gold's new album: Sipped The Silver,,3691.0,drank-the-golds-new-album-sipped-the-silver,https://www.kickstarter.com/discover/categorie...,True,True,successful,1563420600,1.0,,,"{""web"":{""project"":""https://www.kickstarter.com...",3691.0,domestic
4,1370224,0,The film follows 4 frustrated campaigns as the...,"{""id"":293,""name"":""Drama"",""slug"":""film & video/...",0.0,US,1563392717,"{""id"":346253657,""name"":""Anthony Stephen Hamilt...",USD,$,True,USD,1566012181,False,,1.0,2000.0,255952264,,True,,,1563420181,"{""id"":2457170,""name"":""Nashville"",""slug"":""nashv...",A Period Piece DVD Funding,,0.0,a-period-piece-dvd-funding,https://www.kickstarter.com/discover/categorie...,False,False,live,1563420182,1.0,,,"{""web"":{""project"":""https://www.kickstarter.com...",0.0,domestic


In [17]:
df_csv.created_at[99]

1558615823

### creator

In [18]:
df_csv.creator[99]

'{"id":380624288,"name":"Stephanie Ryan","slug":"chemistrytoddlers","is_registered":null,"chosen_currency":null,"is_superbacker":null,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/025/241/786/ae6eed822ffcbbec5a7bb041e5c3e519_original.PNG?ixlib=rb-2.1.0&w=40&h=40&fit=crop&v=1558621743&auto=format&frame=1&q=92&s=886adaced6e78d615944738b80508079","small":"https://ksr-ugc.imgix.net/assets/025/241/786/ae6eed822ffcbbec5a7bb041e5c3e519_original.PNG?ixlib=rb-2.1.0&w=160&h=160&fit=crop&v=1558621743&auto=format&frame=1&q=92&s=6cac9555dc13b6e3af0dc1bc92066c22","medium":"https://ksr-ugc.imgix.net/assets/025/241/786/ae6eed822ffcbbec5a7bb041e5c3e519_original.PNG?ixlib=rb-2.1.0&w=160&h=160&fit=crop&v=1558621743&auto=format&frame=1&q=92&s=6cac9555dc13b6e3af0dc1bc92066c22"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/chemistrytoddlers"},"api":{"user":"https://api.kickstarter.com/v1/users/380624288?signature=1563506858.5be57fe5391bba66e85912f1d5f3c30e638f4ca0"}}}'

### currency

In [19]:
df_csv.currency[99]

'USD'

### currency_symbol

In [20]:
df_csv.currency_symbol[99]

'$'

### currency_trailing_code

In [21]:
df_csv.currency_trailing_code[99]

True

### current_currency

In [22]:
df_csv.current_currency[99]

'USD'

### deadline

In [23]:
df_csv.deadline[99]

1565975666

### disable_communication

In [24]:
df_csv.disable_communication[99]

False

### friends

In [25]:
df_csv.friends[99]

nan

### fx_rate

In [26]:
df_csv.fx_rate[99]

1.0

### goal

In [27]:
df_csv.goal[99]

18000.0

### id

In [28]:
df_csv.id[99]

903958683

### is_backing

In [29]:
df_csv.is_backing[99]

nan

### is_starrable

In [30]:
df_csv.is_starrable[99]

True

### is_starred

In [31]:
df_csv.is_starred[99]

nan

### last_update_published_at

In [32]:
df_csv.last_update_published_at[99]

nan

### launched_at

In [33]:
df_csv.launched_at[99]

1563383666
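
Since `launched_at` and `deadline` are both epoch seconds, campaign length can be derived by subtraction. The sketch below uses the two values shown for row 99; the column name `campaign_days` is hypothetical.

```python
import pandas as pd

SECONDS_PER_DAY = 86400

# launched_at and deadline values copied from row 99.
sample = pd.DataFrame({"launched_at": [1563383666], "deadline": [1565975666]})

# Difference in seconds converted to days.
sample["campaign_days"] = (
    sample["deadline"] - sample["launched_at"]
) / SECONDS_PER_DAY
```

For row 99 this gives a 30-day campaign. Unlike the pledged amounts, this value is known at launch.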

### location

In [34]:
df_csv.location[99]

'{"id":2375129,"name":"Carmel","slug":"carmel-hamilton-in","short_name":"Carmel, IN","displayable_name":"Carmel, IN","localized_name":"Carmel","country":"US","state":"IN","type":"Town","is_root":false,"urls":{"web":{"discover":"https://www.kickstarter.com/discover/places/carmel-hamilton-in","location":"https://www.kickstarter.com/locations/carmel-hamilton-in"},"api":{"nearby_projects":"https://api.kickstarter.com/v1/discover?signature=1563484945.e0f76b46c91e78ef9228a708d230633a814b10eb&woe_id=2375129"}}}'

### name

In [35]:
df_csv.name[99]

"Let's Learn About Chemistry: A Chemistry Book for Toddlers"

### permissions

In [36]:
df_csv.permissions[99]

nan

### pledged

In [37]:
df_csv.pledged[99]

418.0

### slug

In [38]:
df_csv.slug[99]

'lets-learn-about-chemistry-a-chemistry-book-for-toddlers'

### source_url

In [39]:
df_csv.source_url[99]

"https://www.kickstarter.com/discover/categories/publishing/children's%20books"

### spotlight

In [40]:
df_csv.spotlight[99]

False

### staff_pick

In [41]:
df_csv.staff_pick[99]

False

### state

In [42]:
df_csv.state[99]

'live'

In [43]:
df_csv.state.value_counts()

successful    147268
failed        129466
live           38382
canceled       16728
suspended       1055
Name: state, dtype: int64

### state_changed_at

In [44]:
df_csv.state_changed_at[99]

1563383668

### static_usd_rate

In [45]:
df_csv.static_usd_rate[99]

1.0
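
`goal` is recorded in the project's own currency, so goals are not directly comparable across countries. One possible normalisation, assuming `static_usd_rate` is the native-to-USD conversion factor (an assumption that should be checked against the data dictionary), is sketched here with a hypothetical `goal_usd` column.

```python
import pandas as pd

# Sample rows in the shape of the dataset: a USD project and a CAD project.
sample = pd.DataFrame(
    {
        "currency": ["USD", "CAD"],
        "goal": [18000.0, 5000.0],
        "static_usd_rate": [1.0, 0.766254],
    }
)

# Assumed conversion: native-currency goal times the static USD rate.
sample["goal_usd"] = sample["goal"] * sample["static_usd_rate"]
```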

### unread_messages_count

In [46]:
df_csv.unread_messages_count[99]

nan

### unseen_activity_count

In [47]:
df_csv.unseen_activity_count[99]

nan

### urls

In [48]:
df_csv.urls[99]

'{"web":{"project":"https://www.kickstarter.com/projects/chemistrytoddlers/lets-learn-about-chemistry-a-chemistry-book-for-toddlers?ref=discovery_category_newest","rewards":"https://www.kickstarter.com/projects/chemistrytoddlers/lets-learn-about-chemistry-a-chemistry-book-for-toddlers/rewards"}}'

### usd_pledged

In [49]:
df_csv.usd_pledged[99]

418.0

### usd_type

In [50]:
df_csv.usd_type[99]

'domestic'

In [51]:
df_csv.usd_type.value_counts()

domestic         112500
international     80890
Name: usd_type, dtype: int64

## CHECK OUT CAMPAIGN BREAKDOWN 

In [52]:
print('Number of Completed Campaigns: ',len(df_csv.loc[df_csv['state'] != 'live']))

Number of Completed Campaigns:  294517
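
For modelling, the completed campaigns still mix several end states. A minimal sketch of building a binary target is shown below. Keeping only `successful` and `failed` rows (and excluding `canceled` and `suspended`) is a modelling assumption, not something decided in this notebook, and the `target` column name is hypothetical.

```python
import pandas as pd

# Made-up sample covering the states seen in the dataset.
sample = pd.DataFrame(
    {"state": ["successful", "failed", "live", "canceled", "successful"]}
)

# Keep only campaigns with a clear outcome.
finished = sample[sample["state"].isin(["successful", "failed"])].copy()

# 1 for successful, 0 for failed.
finished["target"] = (finished["state"] == "successful").astype(int)

success_rate = finished["target"].mean()
```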
