# 10 Years of Crowdfunding on Kickstarter - Data Wrangling

# Table of contents
1. [Introduction](#introduction)
2. [Data Wrangling](#datawrangling)
    1. [Assessment](#assessment)
    2. [Data Summary](#datasummary)
    3. [Issues Summary](#issuessummary)
    4. [Cleaning Data](#cleaningdata)
    5. [Store Master](#storemaster)
3. [Wrangling Summary](#wranglingsummary)

# Introduction <a name="introduction"></a>
This notebook covers the data wrangling in preparation for the analysis "10 Years of Crowdfunding on Kickstarter".

First, I will gather the data needed for the analysis. Then, I will familiarize myself with the data to understand what information it contains and how I can use it. This involves assessing the data for quality and tidiness issues. The third step is cleaning the data: to prepare the records for analysis, I will correct or remove any corrupt, inaccurate, or unnecessary observations. Lastly, I will output CSV files containing a clean version of the data.

The Kickstarter data set was gathered in July 2019 from Web Robots, an automated web scraping service. Since 2016, Web Robots has published monthly data on all ongoing and completed Kickstarter projects, released to the public as CSV files on its website.

https://webrobots.io/kickstarter-datasets/

Web Robots does not explain how the data was scraped, nor does it document the individual data points. To fully understand the data and rule out erroneous information, I will need to cross-check this data set against Kickstarter's project archive, which is publicly accessible at www.kickstarter.com.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
import requests
import time
from csv import writer
import sys

# display settings
# uncomment the lines below to prevent pandas from truncating data frames
# pd.set_option('display.max_rows', 2500)
# pd.set_option('display.max_columns', 100)
# pd.set_option('display.max_colwidth', None)  # None = no limit; -1 is deprecated in recent pandas

# Data Wrangling <a name="datawrangling"></a>
Let's start by reading in all 57 files (`Kickstarter_0.csv` to `Kickstarter_56.csv`) that I downloaded from Web Robots and combining the data into one data frame.

In [2]:
# read in all files into one dataframe
file_name =  './data/Kickstarter_{}.csv'
kickstarter = pd.concat([pd.read_csv(file_name.format(i)) for i in range(57)])
kickstarter.reset_index(drop=True, inplace=True)
kickstarter.head()

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,created_at,creator,currency,currency_symbol,currency_trailing_code,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
0,48,Citizen Carpentry is building community around...,"{""id"":356,""name"":""Woodworking"",""slug"":""crafts/...",5528,US,1479389359,"{""id"":2044486203,""name"":""Marcis Curtis"",""is_re...",USD,$,True,...,citizen-carpentry-community-workshop-and-tool-...,https://www.kickstarter.com/discover/categorie...,False,False,canceled,1485878182,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",5528.0,domestic
1,0,El buscador sostenible para gente responsable,"{""id"":260,""name"":""Interactive Design"",""slug"":""...",0,ES,1469258234,"{""id"":1498616534,""name"":""Raül Gómez Freixa"",""i...",EUR,€,False,...,simbin,https://www.kickstarter.com/discover/categorie...,False,False,failed,1473524887,1.098901,"{""web"":{""project"":""https://www.kickstarter.com...",0.0,domestic
2,0,Would love to show everyone the beauty Eastern...,"{""id"":277,""name"":""Nature"",""slug"":""photography/...",0,US,1427808788,"{""id"":2019826059,""name"":""Carmen"",""is_registere...",USD,$,True,...,photographing-places-i-love-mountains-of-easte...,https://www.kickstarter.com/discover/categorie...,False,False,canceled,1429360155,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",0.0,domestic
3,44,A surrealist's guide to navigating the dark de...,"{""id"":42,""name"":""Pop"",""slug"":""music/pop"",""posi...",1782,US,1457626762,"{""id"":184955320,""name"":""T.S. Woodward"",""is_reg...",USD,$,True,...,how-to-breathe-underwater-in-the-black-box-deb...,https://www.kickstarter.com/discover/categorie...,True,False,successful,1467648766,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",1782.0,domestic
4,3,Continue to develop a user friendly one-person...,"{""id"":336,""name"":""Flight"",""slug"":""technology/f...",100,US,1396014888,"{""id"":760226815,""name"":""Chris Thomas"",""is_regi...",USD,$,True,...,backpackable-high-resolution-uav-habitat-mappi...,https://www.kickstarter.com/discover/categorie...,False,False,failed,1399697848,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",100.0,domestic
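
As an alternative to calling `reset_index` afterwards, `pd.concat(..., ignore_index=True)` builds a fresh 0 to n-1 index in one step. The sketch below also tags each row with the snapshot it came from; the `source_file` column is a hypothetical addition, not part of the Web Robots data, but it may help later when deciding which copy of a duplicated project to keep. In-memory CSV strings stand in for the downloaded files.

```python
import io

import pandas as pd

# Hypothetical stand-ins for ./data/Kickstarter_{i}.csv (two tiny snapshots)
snapshots = [
    "id,state\n101,live\n102,failed\n",
    "id,state\n101,successful\n103,canceled\n",
]

frames = []
for i, text in enumerate(snapshots):
    frame = pd.read_csv(io.StringIO(text))
    frame["source_file"] = i  # hypothetical column: which snapshot the row came from
    frames.append(frame)

# ignore_index=True replaces the per-file indices with a single RangeIndex
combined = pd.concat(frames, ignore_index=True)
print(combined)
```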


## Assessment <a name="assessment"></a>
First, let's get visually familiar with the data. A summary of how to interpret each variable is provided at the end of this section, followed by a summary of the quality and tidiness issues.

In [3]:
# show data frame
kickstarter

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,created_at,creator,currency,currency_symbol,currency_trailing_code,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
0,48,Citizen Carpentry is building community around...,"{""id"":356,""name"":""Woodworking"",""slug"":""crafts/...",5528,US,1479389359,"{""id"":2044486203,""name"":""Marcis Curtis"",""is_re...",USD,$,True,...,citizen-carpentry-community-workshop-and-tool-...,https://www.kickstarter.com/discover/categorie...,False,False,canceled,1485878182,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",5528.000000,domestic
1,0,El buscador sostenible para gente responsable,"{""id"":260,""name"":""Interactive Design"",""slug"":""...",0,ES,1469258234,"{""id"":1498616534,""name"":""Raül Gómez Freixa"",""i...",EUR,€,False,...,simbin,https://www.kickstarter.com/discover/categorie...,False,False,failed,1473524887,1.098901,"{""web"":{""project"":""https://www.kickstarter.com...",0.000000,domestic
2,0,Would love to show everyone the beauty Eastern...,"{""id"":277,""name"":""Nature"",""slug"":""photography/...",0,US,1427808788,"{""id"":2019826059,""name"":""Carmen"",""is_registere...",USD,$,True,...,photographing-places-i-love-mountains-of-easte...,https://www.kickstarter.com/discover/categorie...,False,False,canceled,1429360155,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",0.000000,domestic
3,44,A surrealist's guide to navigating the dark de...,"{""id"":42,""name"":""Pop"",""slug"":""music/pop"",""posi...",1782,US,1457626762,"{""id"":184955320,""name"":""T.S. Woodward"",""is_reg...",USD,$,True,...,how-to-breathe-underwater-in-the-black-box-deb...,https://www.kickstarter.com/discover/categorie...,True,False,successful,1467648766,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",1782.000000,domestic
4,3,Continue to develop a user friendly one-person...,"{""id"":336,""name"":""Flight"",""slug"":""technology/f...",100,US,1396014888,"{""id"":760226815,""name"":""Chris Thomas"",""is_regi...",USD,$,True,...,backpackable-high-resolution-uav-habitat-mappi...,https://www.kickstarter.com/discover/categorie...,False,False,failed,1399697848,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",100.000000,domestic
5,5,LOVE STORY IS ABOUT THE REAL LOVE THAT PEOPLE ...,"{""id"":44,""name"":""World Music"",""slug"":""music/wo...",265,US,1376248448,"{""id"":1430060595,""name"":""LeTroy"",""is_registere...",USD,$,True,...,love-story,https://www.kickstarter.com/discover/categorie...,False,False,failed,1379104056,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",265.000000,domestic
6,2,Constructed from the most technologically adva...,"{""id"":263,""name"":""Apparel"",""slug"":""fashion/app...",2,GB,1540592817,"{""id"":521314400,""name"":""chris bamber"",""is_regi...",GBP,£,False,...,mens-athleisure-innovative-fashion-sportswear-...,https://www.kickstarter.com/discover/categorie...,False,False,failed,1543262049,1.281307,"{""web"":{""project"":""https://www.kickstarter.com...",2.562614,domestic
7,9,"Trinaad, the sound of three Worlds which was p...","{""id"":44,""name"":""World Music"",""slug"":""music/wo...",570,FR,1491551387,"{""id"":2147136997,""name"":""JEAN DAVOISNE"",""is_re...",EUR,€,False,...,trinaad-the-sound-of-three-worlds-new-album,https://www.kickstarter.com/discover/categorie...,False,False,failed,1494225191,1.064353,"{""web"":{""project"":""https://www.kickstarter.com...",553.463513,domestic
8,0,Together we can bring a star back on the map o...,"{""id"":44,""name"":""World Music"",""slug"":""music/wo...",0,US,1374507930,"{""id"":533614462,""name"":""Basil Burdalas"",""is_re...",USD,$,True,...,detroit-revue-concert,https://www.kickstarter.com/discover/categorie...,False,False,failed,1378828200,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",0.000000,domestic
9,165,Songwriter Joel B. New is creating his first-e...,"{""id"":42,""name"":""Pop"",""slug"":""music/pop"",""posi...",7001,US,1447342675,"{""id"":1319958753,""name"":""Joel B. New"",""is_regi...",USD,$,True,...,lets-commit-to-a-murder-she-wrote-album-of-pop...,https://www.kickstarter.com/discover/categorie...,True,False,successful,1459948483,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",7001.000000,domestic


My first impression of the data is that it contains a lot of useful information. However, much of it cannot be understood at a glance: several columns hold JSON strings that were not fully extracted from the website.
To gain a better understanding, I will need to apply programmatic assessment techniques to this data set.
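
One way to make the JSON columns readable is to parse each string with `json.loads` and flatten the nested keys with `pd.json_normalize`. A minimal sketch on the two `category` strings inspected below (indices 192044 and 210071), assuming every value in the column is valid JSON:

```python
import json

import pandas as pd

# The two category strings shown in the assessment cells below
raw = pd.Series([
    '{"id":51,"name":"Software","slug":"technology/software","position":11,'
    '"parent_id":16,"color":6526716,"urls":{"web":{"discover":'
    '"http://www.kickstarter.com/discover/categories/technology/software"}}}',
    '{"id":1,"name":"Art","slug":"art","position":1,"color":16760235,'
    '"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/art"}}}',
])

# Parse each JSON string, then flatten nested keys into dotted column names;
# keys missing from a record (parent_id for "Art") become NaN
parsed = pd.json_normalize(raw.map(json.loads).tolist())
print(parsed[["id", "name", "slug", "parent_id", "urls.web.discover"]])
```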

In [4]:
kickstarter.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 212378 entries, 0 to 212377
Data columns (total 37 columns):
backers_count               212378 non-null int64
blurb                       212370 non-null object
category                    212378 non-null object
converted_pledged_amount    212378 non-null int64
country                     212378 non-null object
created_at                  212378 non-null int64
creator                     212378 non-null object
currency                    212378 non-null object
currency_symbol             212378 non-null object
currency_trailing_code      212378 non-null bool
current_currency            212378 non-null object
deadline                    212378 non-null int64
disable_communication       212378 non-null bool
friends                     204 non-null object
fx_rate                     212378 non-null float64
goal                        212378 non-null float64
id                          212378 non-null int64
is_backing                  204 
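
The `int64` columns `created_at` and `deadline` (like `state_changed_at`) appear to hold Unix timestamps. A minimal sketch, assuming they count seconds since 1970-01-01 in UTC, of converting them with `pd.to_datetime`, using the values of the first project (row 0) shown earlier:

```python
import pandas as pd

# created_at and state_changed_at of row 0 in the first output above
row = pd.DataFrame({"created_at": [1479389359], "state_changed_at": [1485878182]})

# unit="s" interprets the integers as seconds since the Unix epoch
for col in ["created_at", "state_changed_at"]:
    row[col] = pd.to_datetime(row[col], unit="s")

print(row)
# Subtracting two datetime columns yields a Timedelta column
print(row["state_changed_at"] - row["created_at"])
```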

In [5]:
# find duplicated projects
kickstarter[kickstarter.id.duplicated()].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 27240 entries, 1609 to 212376
Data columns (total 37 columns):
backers_count               27240 non-null int64
blurb                       27240 non-null object
category                    27240 non-null object
converted_pledged_amount    27240 non-null int64
country                     27240 non-null object
created_at                  27240 non-null int64
creator                     27240 non-null object
currency                    27240 non-null object
currency_symbol             27240 non-null object
currency_trailing_code      27240 non-null bool
current_currency            27240 non-null object
deadline                    27240 non-null int64
disable_communication       27240 non-null bool
friends                     102 non-null object
fx_rate                     27240 non-null float64
goal                        27240 non-null float64
id                          27240 non-null int64
is_backing                  102 non-null objec

In [6]:
# assess composition of category
kickstarter.loc[192044, 'category']

'{"id":51,"name":"Software","slug":"technology/software","position":11,"parent_id":16,"color":6526716,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/technology/software"}}}'

In [7]:
# assess category example with missing subcategory
kickstarter.loc[210071, 'category']

'{"id":1,"name":"Art","slug":"art","position":1,"color":16760235,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/art"}}}'
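
The `slug` encodes the category hierarchy as `parent/sub`; a top-level category such as `art` has no slash (and no `parent_id`). A minimal sketch, assuming every slug follows this pattern, of splitting it into two columns; the names `parent_category` and `sub_category` are hypothetical:

```python
import pandas as pd

# Slugs taken from the two category strings above
slugs = pd.Series(["technology/software", "art"])

# n=1 splits only at the first "/"; expand=True returns one column per part,
# padding rows without a subcategory with a missing value
parts = slugs.str.split("/", n=1, expand=True)
parts.columns = ["parent_category", "sub_category"]
print(parts)
```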

In [8]:
# assess composition of blurb
kickstarter.loc[22954, 'blurb']

"Separating friends from enemies is hard enough, but when the city is at stake, even speedsters don't always have the time they need!"

In [9]:
# show projects with a missing blurb; 6 of the 8 were canceled
kickstarter[kickstarter.blurb.isna()]

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,created_at,creator,currency,currency_symbol,currency_trailing_code,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
39268,0,,"{""id"":20,""name"":""Conceptual Art"",""slug"":""art/c...",0,US,1424103554,"{""id"":1316410093,""name"":""Rumi Forum"",""slug"":""i...",USD,$,True,...,international-festival-of-language-and-culture,https://www.kickstarter.com/discover/categorie...,False,False,canceled,1424449267,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",0.0,domestic
68694,39,,"{""id"":269,""name"":""Ready-to-wear"",""slug"":""fashi...",8675,DE,1504364375,"{""id"":1303591875,""name"":""Annabelle Deisler"",""i...",EUR,€,False,...,serious-business-collection,https://www.kickstarter.com/discover/categorie...,False,False,failed,1507625190,1.2037,"{""web"":{""project"":""https://www.kickstarter.com...",8873.674115,domestic
77878,0,,"{""id"":20,""name"":""Conceptual Art"",""slug"":""art/c...",0,US,1331063276,"{""id"":79887943,""name"":""Brian Mercer"",""is_regis...",USD,$,True,...,the-lineup-0,https://www.kickstarter.com/discover/categorie...,False,False,canceled,1331581327,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",0.0,domestic
91956,242,,"{""id"":339,""name"":""Sound"",""slug"":""technology/so...",54599,GB,1435830664,"{""id"":161070731,""name"":""ACWorldwide"",""slug"":""a...",GBP,£,False,...,star-wars-bluetooth-speakers,https://www.kickstarter.com/discover/categorie...,False,False,canceled,1443825021,1.516337,"{""web"":{""project"":""https://www.kickstarter.com...",54676.079907,domestic
104928,0,,"{""id"":286,""name"":""Spaces"",""slug"":""theater/spac...",0,US,1449537429,"{""id"":376626888,""name"":""Amanda Donnadio (delet...",USD,$,True,...,long-island-school-auditorium,https://www.kickstarter.com/discover/categorie...,False,False,canceled,1449571384,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",0.0,domestic
137586,2,,"{""id"":351,""name"":""Printing"",""slug"":""crafts/pri...",20,US,1406991938,"{""id"":2029667279,""name"":""Danger Grills"",""slug""...",USD,$,True,...,online-sticker-book-vending-machine,https://www.kickstarter.com/discover/categorie...,False,False,canceled,1408333920,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",20.0,domestic
139968,0,,"{""id"":21,""name"":""Digital Art"",""slug"":""art/digi...",0,US,1509679461,"{""id"":1454907110,""name"":""moe"",""is_registered"":...",USD,$,True,...,charivari,https://www.kickstarter.com/discover/categorie...,False,False,failed,1515800048,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",0.0,domestic
160790,0,,"{""id"":311,""name"":""Food Trucks"",""slug"":""food/fo...",0,US,1473363868,"{""id"":874463436,""name"":""LeMae Fitzwater"",""is_r...",USD,$,True,...,foragers-cuisine-food-truck,https://www.kickstarter.com/discover/categorie...,False,False,canceled,1473968146,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",0.0,domestic


In [10]:
# count projects per country
kickstarter.country.value_counts()

US    149350
GB     23816
CA      9960
AU      5002
DE      3628
FR      2862
MX      2524
IT      2472
ES      2120
NL      1838
SE      1465
HK      1192
NZ       947
DK       940
SG       753
CH       709
IE       672
BE       602
NO       516
AT       509
JP       432
LU        69
Name: country, dtype: int64

In [11]:
# assess composition of location
kickstarter[['country', 'location']]

|   | country | location |
|---|---|---|
| 0 | US | {"id":2486982,"name":"St. Louis","slug":"st-lo... |
| 1 | ES | {"id":753692,"name":"Barcelona","slug":"barcel... |
| 2 | US | {"id":2396395,"name":"Eastern","slug":"eastern... |
| 3 | US | {"id":2356940,"name":"Athens","slug":"athens-g... |
| 4 | US | {"id":2441116,"name":"Logan","slug":"logan-ut"... |
| 5 | US | {"id":2503713,"name":"Tallahassee","slug":"tal... |
| 6 | GB | {"id":32562,"name":"Preston","slug":"preston-l... |
| 7 | FR | {"id":29332634,"name":"France","slug":"france-... |
| 8 | US | {"id":2391585,"name":"Detroit","slug":"detroit... |
| 9 | US | {"id":2459115,"name":"New York","slug":"new-yo... |


In [12]:
# assess composition of creator
kickstarter.loc[100156, 'creator']

'{"id":171726442,"name":"Hetty ten Holt","slug":"gastvrijorganiseren","is_registered":null,"chosen_currency":null,"is_superbacker":null,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/008/774/231/c30a12b2feab3bf6fe1a0aca1e6955f2_original.JPG?ixlib=rb-2.1.0&w=40&h=40&fit=crop&v=1461543211&auto=format&frame=1&q=92&s=5e941d4d87d09175b56b82f2f606aa62","small":"https://ksr-ugc.imgix.net/assets/008/774/231/c30a12b2feab3bf6fe1a0aca1e6955f2_original.JPG?ixlib=rb-2.1.0&w=160&h=160&fit=crop&v=1461543211&auto=format&frame=1&q=92&s=19f7439d9347382ee9726516d01f6a46","medium":"https://ksr-ugc.imgix.net/assets/008/774/231/c30a12b2feab3bf6fe1a0aca1e6955f2_original.JPG?ixlib=rb-2.1.0&w=160&h=160&fit=crop&v=1461543211&auto=format&frame=1&q=92&s=19f7439d9347382ee9726516d01f6a46"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/gastvrijorganiseren"},"api":{"user":"https://api.kickstarter.com/v1/users/171726442?signature=1563510175.9af7d0de52614820829a7c7a0aa79bfcf15280f6"}}}'

In [13]:
# assess currency
kickstarter.currency.value_counts()

USD    149350
GBP     23816
EUR     14772
CAD      9960
AUD      5002
MXN      2524
SEK      1465
HKD      1192
NZD       947
DKK       940
SGD       753
CHF       709
NOK       516
JPY       432
Name: currency, dtype: int64
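
The currency counts mirror the country counts: USD matches the 149,350 US projects, GBP the 23,816 GB projects, and the 14,772 EUR projects equal the sum of DE, FR, IT, ES, NL, IE, BE, AT and LU. A minimal sketch of verifying the euro total and of checking the country-to-currency mapping row by row, on a few sample pairs taken from the outputs above:

```python
import pandas as pd

# Project counts per euro-area country, copied from the value_counts() output above
eurozone_counts = {"DE": 3628, "FR": 2862, "IT": 2472, "ES": 2120, "NL": 1838,
                   "IE": 672, "BE": 602, "AT": 509, "LU": 69}
print(sum(eurozone_counts.values()))  # 14772, the EUR count

# Row-level check on a few (country, currency) pairs seen in the outputs above
pairs = pd.DataFrame({
    "country":  ["US", "ES", "FR", "DE", "GB", "CA"],
    "currency": ["USD", "EUR", "EUR", "EUR", "GBP", "CAD"],
})
currencies_per_country = pairs.groupby("country")["currency"].nunique()

# Countries with more than one currency would indicate inconsistent records
print(currencies_per_country[currencies_per_country > 1])
```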

In [14]:
# assess currency_symbol
kickstarter.currency_symbol.value_counts()

$      169728
£       23816
€       14772
kr       2921
Fr        709
¥         432
Name: currency_symbol, dtype: int64
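
The symbol counts are coarser than the currency counts: `$` (169,728) equals the combined USD, CAD, AUD, MXN, HKD, NZD and SGD projects, and `kr` (2,921) covers SEK, DKK and NOK. The symbol alone therefore cannot identify a currency. A minimal sketch of exposing this with `pd.crosstab`, on a few sample pairs consistent with the outputs above:

```python
import pandas as pd

sample = pd.DataFrame({
    "currency":        ["USD", "CAD", "AUD", "SEK", "DKK", "EUR"],
    "currency_symbol": ["$",   "$",   "$",   "kr",  "kr",  "€"],
})

# Rows: symbols; columns: currencies; cells: number of projects
table = pd.crosstab(sample["currency_symbol"], sample["currency"])
print(table)

# Number of distinct currencies behind each symbol; > 1 means ambiguous
shared = (table > 0).sum(axis=1)
print(shared[shared > 1])
```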

In [15]:
# assess current currency
kickstarter.current_currency.value_counts()

USD    212258
CAD        48
NZD        36
SGD        12
HKD        12
EUR        12
Name: current_currency, dtype: int64

In [16]:
# assess fx_rate
kickstarter.fx_rate.value_counts()

1.000000    149279
1.243823     16308
1.123945     10241
1.241500      7501
0.766388      6677
1.121383      4515
0.703082      3453
0.764765      3274
0.052443      1692
0.700985      1545
0.106855      1027
0.052362       832
0.127989       786
0.150507       661
0.674044       621
0.735756       538
1.014894       502
0.106453       436
0.127997       402
0.116524       361
0.671022       326
0.009283       295
0.150179       278
0.735109       215
1.012785       206
0.116516       155
0.009245       137
1.483582        24
1.304822        21
1.360343        10
1.307592         7
7.812712         7
0.891756         7
1.466549         6
0.681983         3
1.137000         3
1.667465         3
1.466311         3
0.917397         3
1.845313         3
8.761042         3
1.688866         2
1.622967         2
7.912600         1
0.167003         1
0.916602         1
0.139427         1
0.114142         1
0.158529         1
0.223290         1
0.189882         1
Name: fx_rate, dtype: int64

In [17]:
# assess static usd rate
kickstarter.static_usd_rate.value_counts()

1.000000    149351
1.133748        76
1.274202        69
1.303101        61
1.269503        56
1.115888        55
1.273699        54
1.123154        54
1.123469        53
1.122209        52
1.137915        52
1.252199        50
1.303494        49
1.256915        47
1.257197        46
1.138650        46
1.122151        45
1.269856        45
1.330237        45
1.258101        44
1.123053        44
1.307198        43
1.714466        43
1.252194        43
1.378198        43
1.312465        43
1.158051        43
1.297308        43
1.128150        43
1.271399        42
             ...  
0.127771         1
0.731984         1
0.167877         1
0.108585         1
0.680825         1
0.690387         1
1.597449         1
0.009006         1
0.167422         1
0.151681         1
1.035051         1
1.255792         1
0.733811         1
0.008851         1
0.113263         1
0.052516         1
0.144165         1
1.019960         1
0.944988         1
1.011127         1
1.036554         1
1.586662    

In [18]:
# compare currency values
kickstarter[['country','currency', 'currency_symbol', 'currency_trailing_code', 'goal', 'usd_pledged', 'usd_type', 'converted_pledged_amount']]

|   | country | currency | currency_symbol | currency_trailing_code | goal | usd_pledged | usd_type | converted_pledged_amount |
|---|---|---|---|---|---|---|---|---|
| 0 | US | USD | $ | True | 14000.0 | 5528.000000 | domestic | 5528 |
| 1 | ES | EUR | € | False | 20000.0 | 0.000000 | domestic | 0 |
| 2 | US | USD | $ | True | 5000.0 | 0.000000 | domestic | 0 |
| 3 | US | USD | $ | True | 1500.0 | 1782.000000 | domestic | 1782 |
| 4 | US | USD | $ | True | 10000.0 | 100.000000 | domestic | 100 |
| 5 | US | USD | $ | True | 2500.0 | 265.000000 | domestic | 265 |
| 6 | GB | GBP | £ | False | 8000.0 | 2.562614 | domestic | 2 |
| 7 | FR | EUR | € | False | 6500.0 | 553.463513 | domestic | 570 |
| 8 | US | USD | $ | True | 35500.0 | 0.000000 | domestic | 0 |
| 9 | US | USD | $ | True | 6500.0 | 7001.000000 | domestic | 7001 |
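
Rows 6 and 7 suggest that `usd_pledged` is the pledged amount converted at `static_usd_rate`, while `converted_pledged_amount` is a rounded figure that does not always agree with it (553.46 vs. 570 for the French project). A minimal sketch, assuming `usd_pledged = pledged × static_usd_rate`, of recovering the amount in the project's own currency:

```python
import pandas as pd

# usd_pledged and static_usd_rate of rows 6 (GBP) and 7 (EUR), copied from the outputs above
rows = pd.DataFrame({
    "currency": ["GBP", "EUR"],
    "usd_pledged": [2.562614, 553.463513],
    "static_usd_rate": [1.281307, 1.064353],
})

# Assumption: usd_pledged = pledged (original currency) * static_usd_rate;
# rounding to cents absorbs the 6-decimal precision of the stored rate
rows["pledged_original"] = (rows["usd_pledged"] / rows["static_usd_rate"]).round(2)
print(rows)
```

If this relationship holds across the data set, the recovered amounts should match the original pledged figures; any large deviations would point to records where the assumption breaks.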


In [19]:
# assess disable communication
print(kickstarter.disable_communication.value_counts())
kickstarter[kickstarter.disable_communication == True]

False    211743
True        635
Name: disable_communication, dtype: int64


Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,created_at,creator,currency,currency_symbol,currency_trailing_code,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
14,0,Help fund my life long dream of becoming a Bus...,"{""id"":336,""name"":""Flight"",""slug"":""technology/f...",0,US,1421357956,"{""id"":291975465,""name"":""Rob Keith"",""is_registe...",USD,$,True,...,bush-pilot,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1422045322,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",0.000000,domestic
126,1,I want to get my life back on track and and I ...,"{""id"":361,""name"":""Web"",""slug"":""journalism/web""...",5,US,1425286575,"{""id"":796344264,""name"":""Anthony Roberts"",""is_r...",USD,$,True,...,my-life-and-how-i-want-to-better-it,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1429287330,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",5.000000,domestic
167,134,Imagine if you could physically step into your...,"{""id"":271,""name"":""Live Games"",""slug"":""games/li...",54418,US,1425250261,"{""id"":1719438806,""name"":""Harbinger Entertainme...",USD,$,True,...,camp-sidereus,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1430424322,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",54418.000000,domestic
618,2,Hi! I am currently a Student Teacher in the st...,"{""id"":323,""name"":""Academic"",""slug"":""publishing...",42,US,1417904774,"{""id"":164067136,""name"":""Bryan Norkus"",""is_regi...",USD,$,True,...,student-teacher-needs-a-vehicle-to-create-bett...,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1418234428,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",42.000000,domestic
1521,0,-,"{""id"":279,""name"":""Places"",""slug"":""photography/...",0,NL,1447361968,"{""id"":942196753,""name"":""Robin"",""is_registered""...",EUR,€,False,...,not-so-boring-photo-album,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1448312046,1.079139,"{""web"":{""project"":""https://www.kickstarter.com...",0.000000,domestic
1705,0,I am try to see the world and take pictures of...,"{""id"":279,""name"":""Places"",""slug"":""photography/...",0,US,1441512785,"{""id"":479966649,""name"":""brayton stavis"",""is_re...",USD,$,True,...,travel-to-london-to-see-the-architecture,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1441982447,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",0.000000,domestic
1917,2,"A BBQ in Salford, on 6th May, from 2pm in my b...","{""id"":304,""name"":""Bacon"",""slug"":""food/bacon"",""...",7,GB,1493657480,"{""id"":2118328009,""name"":""Claire Wicher"",""is_re...",GBP,£,False,...,bbq-in-salford,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1493752838,1.294951,"{""web"":{""project"":""https://www.kickstarter.com...",7.769706,domestic
1974,1,I want to have a BBQ at my friends house. He t...,"{""id"":304,""name"":""Bacon"",""slug"":""food/bacon"",""...",1,US,1432689363,"{""id"":242023762,""name"":""Ian"",""is_registered"":n...",USD,$,True,...,i-want-to-have-a-bbq-at-my-friends-house,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1433174652,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",1.000000,domestic
2120,4,"I tried selling my boat after moving, no luck....","{""id"":271,""name"":""Live Games"",""slug"":""games/li...",4,US,1436046901,"{""id"":291011486,""name"":""Shiftyjohnson"",""is_reg...",USD,$,True,...,watch-me-axe-my-boat-or-watch-me-make-someones...,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1436814613,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",4.000000,domestic
2141,1,"The farm house we rent is being sold July 1st,...","{""id"":271,""name"":""Live Games"",""slug"":""games/li...",100,US,1431444093,"{""id"":1814127414,""name"":""Kayla Simmons"",""is_re...",USD,$,True,...,farm-carnival,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1431564375,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",100.000000,domestic
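
All ten rows displayed with `disable_communication == True` have the state `suspended`. A minimal sketch of testing whether that holds for all 635 such projects with `pd.crosstab`; the sample below mirrors a few rows from the outputs above:

```python
import pandas as pd

sample = pd.DataFrame({
    "disable_communication": [True, True, True, False, False],
    "state": ["suspended", "suspended", "suspended", "canceled", "failed"],
})

# Count states separately for projects with and without communication disabled
print(pd.crosstab(sample["disable_communication"], sample["state"]))

# Share of communication-disabled projects that are suspended (1.0 = all of them)
share = (sample.loc[sample["disable_communication"], "state"] == "suspended").mean()
print(share)
```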


In [20]:
# assess friends
print(kickstarter.friends.value_counts())

[]    204
Name: friends, dtype: int64
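
`friends` is null in all but 204 rows, and every non-null value is the empty list `[]`, so the column carries no information. A minimal sketch of flagging such columns generically: `nunique` ignores missing values by default, so a column with at most one distinct non-null value is a candidate for removal. The tiny frame below is illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "friends": [np.nan, "[]", np.nan],
    "is_backing": [np.nan, False, True],
})

# nunique() skips NaN by default; <= 1 means the column is constant or empty
uninformative = df.columns[df.nunique() <= 1].tolist()
print(uninformative)

cleaned = df.drop(columns=uninformative)
```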


In [21]:
# show rows whose project id already appeared in an earlier row
kickstarter[kickstarter.id.duplicated()]

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,created_at,creator,currency,currency_symbol,currency_trailing_code,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
1609,1682,PreShow allows you to see movies - in theaters...,"{""id"":298,""name"":""Movie Theaters"",""slug"":""film...",56721,US,1550157390,"{""id"":454129649,""name"":""Stacy Spikes"",""is_regi...",USD,$,True,...,preshow-attend-first-run-movies-in-theaters-free,https://www.kickstarter.com/discover/categorie...,True,True,successful,1556369556,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",56721.000000,domestic
2018,91,Our idea is to provide a kit so you can make m...,"{""id"":307,""name"":""Drinks"",""slug"":""food/drinks""...",4801,GB,1498487085,"{""id"":1514980463,""name"":""Karl Holmstrom"",""is_r...",GBP,£,False,...,mead-in-ireland-mead-making-kit,https://www.kickstarter.com/discover/categorie...,True,False,successful,1501603152,1.302750,"{""web"":{""project"":""https://www.kickstarter.com...",4733.281466,domestic
2800,356,Never forget to check the weather again. This ...,"{""id"":28,""name"":""Product Design"",""slug"":""desig...",50887,AU,1542153523,"{""id"":1806793473,""name"":""Oli"",""is_registered"":...",AUD,$,True,...,a-poster-that-knows-the-weather-first-updating...,https://www.kickstarter.com/discover/categorie...,True,True,successful,1551724378,0.723941,"{""web"":{""project"":""https://www.kickstarter.com...",51937.932259,domestic
3014,58,Hjälp Alastor kickstarta Klassikerserien!,"{""id"":47,""name"":""Fiction"",""slug"":""publishing/f...",5554,SE,1529005929,"{""id"":1836356407,""name"":""Alastor Press"",""is_re...",SEK,kr,True,...,alastor-klassiker,https://www.kickstarter.com/discover/categorie...,True,False,successful,1544004579,0.110335,"{""web"":{""project"":""https://www.kickstarter.com...",5536.693154,domestic
3168,42,A brand new concept album from KeyStone A Capp...,"{""id"":42,""name"":""Pop"",""slug"":""music/pop"",""posi...",5056,US,1542246428,"{""id"":2041355785,""name"":""KeyStone A Cappella"",...",USD,$,True,...,keystone-a-cappellas-new-studio-album,https://www.kickstarter.com/discover/categorie...,True,False,successful,1547441941,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",5056.000000,domestic
3503,111,Documentary book about the lives of disabled p...,"{""id"":280,""name"":""Photobooks"",""slug"":""photogra...",8267,GB,1464115638,"{""id"":617736105,""name"":""Jadwiga Brontē"",""is_re...",GBP,£,False,...,invisible-people-of-belarus,https://www.kickstarter.com/discover/categorie...,True,False,successful,1468593246,1.418782,"{""web"":{""project"":""https://www.kickstarter.com...",8720.216198,domestic
3939,190,A local multiplayer video game where humans fl...,"{""id"":35,""name"":""Video Games"",""slug"":""games/vi...",4516,CA,1537813488,"{""id"":923731342,""name"":""Totema Studio"",""slug"":...",CAD,$,True,...,zombiotik,https://www.kickstarter.com/discover/categorie...,True,True,successful,1555721630,0.750328,"{""web"":{""project"":""https://www.kickstarter.com...",4530.699689,domestic
4394,2,Psycho Skull Music Documentary World Tour (UK),"{""id"":43,""name"":""Rock"",""slug"":""music/rock"",""po...",33,HK,1556809404,"{""id"":1916387565,""name"":""Zamu (OXR)"",""slug"":""z...",HKD,$,True,...,psycho-skull-road-to-sin-city-world-tour-uk,https://www.kickstarter.com/discover/categorie...,False,True,live,1559903503,0.127555,"{""web"":{""project"":""https://www.kickstarter.com...",33.164316,domestic
4969,11,New lifstyle apparel brand startup. For anyone...,"{""id"":263,""name"":""Apparel"",""slug"":""fashion/app...",304,US,1530531796,"{""id"":613065579,""name"":""Nick Mominee"",""slug"":""...",USD,$,True,...,space-force-brand-apparel,https://www.kickstarter.com/discover/categorie...,True,False,successful,1534608246,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",304.000000,domestic
5976,46,Andys Bread would love a cargo bike to deliver...,"{""id"":314,""name"":""Spaces"",""slug"":""food/spaces""...",2638,GB,1487709731,"{""id"":2017610037,""name"":""Andy Wright"",""is_regi...",GBP,£,False,...,andys-bread-cargo-bike-for-deliveries-around-l...,https://www.kickstarter.com/discover/categorie...,True,False,successful,1493117940,1.256697,"{""web"":{""project"":""https://www.kickstarter.com...",2595.078520,domestic
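
Because Web Robots publishes a new snapshot every month, the same project could appear in several files, for example once while `live` and again after it finished. A minimal sketch, assuming the row with the latest `state_changed_at` is the most up-to-date record, of keeping one row per project `id` (the values below are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 1, 3],
    "state": ["live", "failed", "successful", "canceled"],
    "state_changed_at": [1550000000, 1540000000, 1556369556, 1530000000],
})

# Sort so the latest state change comes last, keep that row per id,
# then restore the original row order
deduped = (
    df.sort_values("state_changed_at")
      .drop_duplicates(subset="id", keep="last")
      .sort_index()
)
print(deduped)
```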


In [22]:
# assess is_backing
kickstarter.is_backing.value_counts()

False    203
True       1
Name: is_backing, dtype: int64

In [23]:
# inspect the rows where is_backing is False
kickstarter[kickstarter.is_backing == False]

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,created_at,creator,currency,currency_symbol,currency_trailing_code,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
204,6,A strategy game with asymmetry of information ...,"{""id"":273,""name"":""Playing Cards"",""slug"":""games...",348,FR,1554459632,"{""id"":2044310129,""name"":""BrexitTheCardGame"",""i...",EUR,€,False,...,brexit-the-card-game,https://www.kickstarter.com/discover/categorie...,False,False,live,1563391539,1.125983,"{""api"":{""star"":""https://api.kickstarter.com/v1...",3.501807e+02,domestic
509,11,Charming Vault STL printable files for D&D ins...,"{""id"":34,""name"":""Tabletop Games"",""slug"":""games...",649,IT,1562919931,"{""id"":1061689088,""name"":""Heresy Lab"",""slug"":""h...",EUR,€,False,...,3d-printable-charming-adventurers-chibi-style-...,https://www.kickstarter.com/discover/categorie...,False,False,live,1563322760,1.127233,"{""api"":{""star"":""https://api.kickstarter.com/v1...",6.526679e+02,domestic
528,388,"An epic game of politics, ethics, and strategy...","{""id"":34,""name"":""Tabletop Games"",""slug"":""games...",50415,US,1548053451,"{""id"":486051251,""name"":""Zain Memon"",""slug"":""za...",USD,$,True,...,shasn-the-political-strategy-board-game-break-ks,https://www.kickstarter.com/discover/categorie...,False,True,live,1563285672,1.000000,"{""api"":{""star"":""https://api.kickstarter.com/v1...",3.398225e+04,
3105,465,"A stainless steel, PVD coated version of the M...","{""id"":28,""name"":""Product Design"",""slug"":""desig...",56826,US,1555356331,"{""id"":1194862319,""name"":""Dan Provost & Tom Ger...",USD,$,True,...,mark-one-apollo-11-limited-edition-space-pen,https://www.kickstarter.com/discover/categorie...,False,False,live,1563283150,1.000000,"{""api"":{""star"":""https://api.kickstarter.com/v1...",5.682600e+04,domestic
3386,78,8 INDICATORS - multifunctional super slim auto...,"{""id"":28,""name"":""Product Design"",""slug"":""desig...",39488,HK,1562326538,"{""id"":581662481,""name"":""BEHRENS ORIGINAL"",""slu...",HKD,$,True,...,multifunctional-super-slim-automatic-mechanica...,https://www.kickstarter.com/discover/categorie...,False,False,live,1562897261,0.127936,"{""api"":{""star"":""https://api.kickstarter.com/v1...",3.947198e+04,domestic
8432,1,A product designed to protect children in the ...,"{""id"":28,""name"":""Product Design"",""slug"":""desig...",1,US,1493072535,"{""id"":997102866,""name"":""Knarf - SafetySpout"",""...",USD,$,True,...,safetyspout,https://www.kickstarter.com/discover/categorie...,False,False,live,1562932334,1.000000,"{""api"":{""star"":""https://api.kickstarter.com/v1...",1.000000e+00,domestic
8694,32,An 80's Themed Retro Escape Room for VR (HTC V...,"{""id"":35,""name"":""Video Games"",""slug"":""games/vi...",996,US,1559774372,"{""id"":196982857,""name"":""LunaBeat"",""slug"":""luna...",USD,$,True,...,paranormal-detective-escape-from-the-80s,https://www.kickstarter.com/discover/categorie...,False,False,live,1563287486,1.000000,"{""api"":{""star"":""https://api.kickstarter.com/v1...",6.719900e+02,
9002,99,"Canine Kleptomaniacs is a simple, silly, addic...","{""id"":34,""name"":""Tabletop Games"",""slug"":""games...",3373,GB,1530252802,"{""id"":833075113,""name"":""Golden Ginty Games Ltd...",GBP,£,False,...,canine-kleptomaniacs,https://www.kickstarter.com/discover/categorie...,False,False,live,1563299521,1.256915,"{""api"":{""star"":""https://api.kickstarter.com/v1...",3.415037e+03,domestic
9234,2,Published under d20 OGL (3.x) and Pathfinder c...,"{""id"":34,""name"":""Tabletop Games"",""slug"":""games...",11,US,1560568384,"{""id"":1030442627,""name"":""Kurt Stoffer"",""slug"":...",USD,$,True,...,d20-random-extraplanar-encounter-generator-exc...,https://www.kickstarter.com/discover/categorie...,False,False,live,1563351894,1.000000,"{""api"":{""star"":""https://api.kickstarter.com/v1...",1.100000e+01,domestic
10678,4,Hide to survive ...,"{""id"":35,""name"":""Video Games"",""slug"":""games/vi...",12,MX,1557035923,"{""id"":1717691242,""name"":""KalaOh Games"",""slug"":...",MXN,$,True,...,find-me-horror-game,https://www.kickstarter.com/discover/categorie...,False,False,live,1563311428,0.052692,"{""api"":{""star"":""https://api.kickstarter.com/v1...",1.284626e+01,domestic


In [24]:
# assess is_starrable
kickstarter.is_starrable.value_counts()

False    206210
True       6168
Name: is_starrable, dtype: int64

In [25]:
# compare is_starrable, is_backing and is_starred for rows where is_starred is set
kickstarter[(kickstarter.is_starred == False) | (kickstarter.is_starred == True)][['is_backing', 'is_starrable','is_starred']]

Unnamed: 0,is_backing,is_starrable,is_starred
204,False,True,False
509,False,True,False
528,False,True,False
3105,False,True,False
3386,False,True,False
8432,False,True,False
8694,False,True,False
9002,False,True,False
9234,False,True,False
10678,False,True,False


In [26]:
# assess permission
kickstarter.permissions.value_counts()

[]    204
Name: permissions, dtype: int64

In [27]:
# compare friends, permissions, is_backing and is_starrable
kickstarter[kickstarter.permissions == "[]"][['friends', 'permissions', 'is_backing', 'is_starrable']]

Unnamed: 0,friends,permissions,is_backing,is_starrable
204,[],[],False,True
509,[],[],False,True
528,[],[],False,True
3105,[],[],False,True
3386,[],[],False,True
8432,[],[],False,True
8694,[],[],False,True
9002,[],[],False,True
9234,[],[],False,True
10678,[],[],False,True

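The outputs above suggest that _friends_, _permissions_, _is_backing_ and _is_starred_ are only filled in for the same handful of observations. Rather than checking this by eye on a truncated output, a minimal sketch (the helper name `same_null_pattern` is my own, hypothetical) compares the non-null masks of these columns across the whole frame:

```python
import pandas as pd

def same_null_pattern(df, cols):
    """Return True if all columns in `cols` are non-null in exactly the same rows."""
    mask = df[cols].notna()
    # compare every column's mask with the first column's mask, row by row
    return bool(mask.eq(mask.iloc[:, 0], axis=0).all().all())

# usage on the notebook's frame (result not shown here):
# same_null_pattern(kickstarter, ['friends', 'permissions', 'is_backing', 'is_starred'])
```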

In [28]:
# assess photo
kickstarter.loc[1435, 'photo']

'{"key":"assets/011/496/795/1396b180874abd417f3dc9c987b15d79_original.jpg","full":"https://ksr-ugc.imgix.net/assets/011/496/795/1396b180874abd417f3dc9c987b15d79_original.jpg?ixlib=rb-2.1.0&crop=faces&w=560&h=315&fit=crop&v=1463683598&auto=format&frame=1&q=92&s=6c01cb3bb9118a90d70fa2e8e5d857ab","ed":"https://ksr-ugc.imgix.net/assets/011/496/795/1396b180874abd417f3dc9c987b15d79_original.jpg?ixlib=rb-2.1.0&crop=faces&w=352&h=198&fit=crop&v=1463683598&auto=format&frame=1&q=92&s=4cd8e1b7bcab9d758c53a0df14e2ebe5","med":"https://ksr-ugc.imgix.net/assets/011/496/795/1396b180874abd417f3dc9c987b15d79_original.jpg?ixlib=rb-2.1.0&crop=faces&w=272&h=153&fit=crop&v=1463683598&auto=format&frame=1&q=92&s=26c8633ec34b4766b939941d5eb061d2","little":"https://ksr-ugc.imgix.net/assets/011/496/795/1396b180874abd417f3dc9c987b15d79_original.jpg?ixlib=rb-2.1.0&crop=faces&w=208&h=117&fit=crop&v=1463683598&auto=format&frame=1&q=92&s=ff8ecfb7859bf1591df1bb91b555fb39","small":"https://ksr-ugc.imgix.net/assets/011/

In [29]:
# assess profile
kickstarter.loc[1435, 'profile']

'{"id":508103,"project_id":508103,"state":"inactive","state_changed_at":1425915827,"name":null,"blurb":null,"background_color":null,"text_color":null,"link_background_color":null,"link_text_color":null,"link_text":null,"link_url":null,"show_feature_image":false,"background_image_opacity":0.8,"should_show_feature_image_section":true,"feature_image_attributes":{"image_urls":{"default":"https://ksr-ugc.imgix.net/assets/011/496/795/1396b180874abd417f3dc9c987b15d79_original.jpg?ixlib=rb-2.1.0&crop=faces&w=1552&h=873&fit=crop&v=1463683598&auto=format&frame=1&q=92&s=931303f2da3baa6031ddf097a444859b","baseball_card":"https://ksr-ugc.imgix.net/assets/011/496/795/1396b180874abd417f3dc9c987b15d79_original.jpg?ixlib=rb-2.1.0&crop=faces&w=560&h=315&fit=crop&v=1463683598&auto=format&frame=1&q=92&s=6c01cb3bb9118a90d70fa2e8e5d857ab"}}}'

In [30]:
# assess spotlight
kickstarter.spotlight.value_counts()

True     120950
False     91428
Name: spotlight, dtype: int64

In [31]:
# assess goals
kickstarter.goal.sort_values()

192795    1.000000e-02
172742    1.000000e+00
144883    1.000000e+00
202490    1.000000e+00
174668    1.000000e+00
126453    1.000000e+00
89168     1.000000e+00
193039    1.000000e+00
171527    1.000000e+00
162031    1.000000e+00
56783     1.000000e+00
126002    1.000000e+00
150882    1.000000e+00
60902     1.000000e+00
140890    1.000000e+00
212043    1.000000e+00
49478     1.000000e+00
164869    1.000000e+00
88900     1.000000e+00
188956    1.000000e+00
1583      1.000000e+00
191230    1.000000e+00
168363    1.000000e+00
108002    1.000000e+00
95524     1.000000e+00
108828    1.000000e+00
209164    1.000000e+00
18902     1.000000e+00
23311     1.000000e+00
112988    1.000000e+00
              ...     
155395    5.000000e+07
3011      5.000000e+07
192936    5.000000e+07
107414    5.500000e+07
2541      6.000000e+07
115889    6.800000e+07
144501    7.000000e+07
79796     7.300000e+07
80685     8.000000e+07
14549     9.000000e+07
16993     9.900000e+07
113469    1.000000e+08
140648    1

In [32]:
# assess projects with the highest and lowest goals
kickstarter.iloc[[150690, 8407, 141159, 164989]]

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,created_at,creator,currency,currency_symbol,currency_trailing_code,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
150690,52,I am lucky enough to have been invited to atte...,"{""id"":278,""name"":""People"",""slug"":""photography/...",2401,GB,1479321086,"{""id"":1542919473,""name"":""Nayru Suicide"",""is_re...",GBP,£,False,...,suicide-girls-argentina-shootfest-2017,https://www.kickstarter.com/discover/categorie...,True,False,successful,1482515412,1.249059,"{""web"":{""project"":""https://www.kickstarter.com...",2441.909407,domestic
8407,47,LP compilation per i 20 anni della Wallace Rec...,"{""id"":321,""name"":""Punk"",""slug"":""music/punk"",""p...",3406,IT,1542878451,"{""id"":1081860162,""name"":""Wallace Records"",""slu...",EUR,€,False,...,traccexx,https://www.kickstarter.com/discover/categorie...,True,True,successful,1548932401,1.134776,"{""web"":{""project"":""https://www.kickstarter.com...",3360.072447,domestic
141159,0,This is a feature film about one man's search ...,"{""id"":293,""name"":""Drama"",""slug"":""film & video/...",0,US,1468094944,"{""id"":900117272,""name"":""Jimmy Andrews; Directo...",USD,$,True,...,a-journey-through-pines,https://www.kickstarter.com/discover/categorie...,False,False,failed,1470690057,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",0.0,domestic
164989,143,From the author of the award-winning Jim Morga...,"{""id"":46,""name"":""Children's Books"",""slug"":""pub...",6540,US,1470521755,"{""id"":242477864,""name"":""James Raney"",""is_regis...",USD,$,True,...,the-lord-of-the-wolves-an-epic-childrens-fantasy,https://www.kickstarter.com/discover/categorie...,True,False,successful,1473815800,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",6540.0,domestic


The project with the lowest goal actually has a goal of USD 0. Due to a database restriction, it was saved as a very small number below USD 1 (0.01 in the sorted goals above).
https://www.kickstarter.com/projects/jerry/loveland-round-6-a-force-more-powerful

In [33]:
# assess distribution of goals
kickstarter.goal.value_counts()

5000.00       15447
10000.00      13527
1000.00       10128
3000.00        8668
2000.00        8645
500.00         8058
15000.00       7298
20000.00       6711
2500.00        6452
1500.00        6133
25000.00       5097
50000.00       4904
4000.00        4641
6000.00        4061
30000.00       4010
3500.00        3729
8000.00        3457
300.00         2701
7000.00        2633
12000.00       2591
7500.00        2470
100000.00      2451
600.00         2103
100.00         2052
250.00         1935
200.00         1927
1200.00        1871
35000.00       1705
800.00         1687
40000.00       1665
              ...  
13050.00          1
29011.00          1
5205.00           1
3978.00           1
5218.00           1
3637.00           1
58191.00          1
18450.00          1
808.00            1
10447.00          1
34240.00          1
7270.00           1
1523.00           1
14537.00          1
10290.00          1
1672.00           1
465000.00         1
7255.00           1
13824.00          1


In [34]:
# assess differences between converted pledged amount and usd pledged
kickstarter[kickstarter['converted_pledged_amount'] != kickstarter['usd_pledged']][['name','converted_pledged_amount', 'usd_pledged']]

Unnamed: 0,name,converted_pledged_amount,usd_pledged
6,Athleisure - Innovative Fashion / Sportswear c...,2,2.562614
7,"TRINAAD ""The sound of three Worlds"" new album",570,553.463513
10,ROOM ESCAPE VIP,696,679.171786
13,Gulcher Records Hat Trick,1522,1522.660000
15,Loftland's New Album!,8909,8909.540000
21,SEA Change Clothing Co. - Change Looks Good On...,8628,8618.630999
22,Moonspike | We're Going Back to the Moon,121890,119674.800519
24,"The first ""live"" performance in an Holographi...",12,12.155730
26,MyHereAfter interactive online memorials,88,89.188921
29,Alfie & Gaston. The dog café.,26,27.362990

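Some of the differences above are mere rounding, since _converted_pledged_amount_ seems to be stored as a whole number (e.g. 1522 vs. 1522.66), while others are larger (e.g. 570 vs. 553.46) and presumably stem from the different exchange rates used. A minimal sketch to separate the two cases; the tolerance of 1 dollar and the helper name are my own assumptions, not taken from the data source. Note that the comparison is only meaningful where the _current_currency_ is USD:

```python
import pandas as pd

def substantial_differences(df, tolerance=1.0):
    """Return rows whose converted_pledged_amount and usd_pledged differ by more
    than `tolerance` (an assumed cut-off for rounding noise, not from the source)."""
    diff = (df['converted_pledged_amount'] - df['usd_pledged']).abs()
    return df.loc[diff > tolerance, ['name', 'converted_pledged_amount', 'usd_pledged']]

# three rows copied from the output above
sample = pd.DataFrame({
    'name': ['Gulcher Records Hat Trick', "Loftland's New Album!", 'ROOM ESCAPE VIP'],
    'converted_pledged_amount': [1522, 8909, 696],
    'usd_pledged': [1522.66, 8909.54, 679.171786],
})
print(substantial_differences(sample))  # only ROOM ESCAPE VIP remains
```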

In [35]:
# assess state
kickstarter.state.value_counts()

successful    120950
failed         75541
canceled        8656
live            6596
suspended        635
Name: state, dtype: int64

In [36]:
# compare project status to spotlight and staff_pick   
statuses = ['successful', 'failed']
bools = [(True, True), (True, False), (False, True), (False, False)]
for status in statuses: 
    print("\n", status, ":\n")
    for comb in bools:
        df = kickstarter[kickstarter.state == status]
        df = df[df.spotlight == comb[0]]
        df = df[df.staff_pick == comb[1]]
        print(comb[0], comb[1], ":", len(df))


 successful :

True True : 24203
True False : 96747
False True : 0
False False : 0

 failed :

True True : 0
True False : 0
False True : 2287
False False : 73254


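Interestingly, all projects that were both spotlighted and awarded a 'Projects We Love' badge succeeded. The same is true for all projects that were only spotlighted. Conversely, all projects that received neither kind of promotion, or only the 'Projects We Love' badge, failed. Note that the number of spotlighted projects (120,950) equals the number of successful projects (120,950), which suggests that _spotlight_ is set as a consequence of a successful campaign rather than being a cause of its success.  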

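The nested loop above can be written more compactly with `pd.crosstab`, which counts every (_spotlight_, _staff_pick_) combination per state in a single table. Since the value counts above show exactly 120,950 spotlighted and 120,950 successful projects, the sketch also includes a check of whether _spotlight_ simply mirrors a successful state. Both helper names are hypothetical, and unlike the loop, `pd.crosstab` only lists combinations that actually occur in the filtered data:

```python
import pandas as pd

def promotion_table(df, states=('successful', 'failed')):
    """Cross-tabulate (spotlight, staff_pick) combinations against project state."""
    subset = df[df['state'].isin(states)]
    return pd.crosstab([subset['spotlight'], subset['staff_pick']], subset['state'])

def spotlight_mirrors_success(df):
    """True if spotlight is True for exactly the successful projects."""
    return bool((df['spotlight'] == (df['state'] == 'successful')).all())

# usage on the notebook's frame:
# promotion_table(kickstarter)
# spotlight_mirrors_success(kickstarter)
```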
In [37]:
# assess live projects
kickstarter[kickstarter.state == "live"]

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,created_at,creator,currency,currency_symbol,currency_trailing_code,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
26,3,An online memorial site acting as a all in one...,"{""id"":260,""name"":""Interactive Design"",""slug"":""...",88,GB,1560935695,"{""id"":1912022590,""name"":""John mitchell"",""slug""...",GBP,£,False,...,myhereafter-interactive-online-memorials,https://www.kickstarter.com/discover/categorie...,False,False,live,1561047260,1.256182,"{""web"":{""project"":""https://www.kickstarter.com...",89.188921,domestic
173,1,To bring battle royale into a real life atmosp...,"{""id"":271,""name"":""Live Games"",""slug"":""games/li...",1,US,1561646415,"{""id"":640205335,""name"":""Jai'lil Banks"",""slug"":...",USD,$,True,...,alternate-royale,https://www.kickstarter.com/discover/categorie...,False,False,live,1561836555,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",1.000000,
204,6,A strategy game with asymmetry of information ...,"{""id"":273,""name"":""Playing Cards"",""slug"":""games...",348,FR,1554459632,"{""id"":2044310129,""name"":""BrexitTheCardGame"",""i...",EUR,€,False,...,brexit-the-card-game,https://www.kickstarter.com/discover/categorie...,False,False,live,1563391539,1.125983,"{""api"":{""star"":""https://api.kickstarter.com/v1...",350.180663,domestic
224,1,Wireless electromechanical water-skier/wake-bo...,"{""id"":337,""name"":""Gadgets"",""slug"":""technology/...",1,US,1550170105,"{""id"":1075393741,""name"":""John"",""slug"":""fmpco"",...",USD,$,True,...,cross-wave-a-skier-down-visibility-system,https://www.kickstarter.com/discover/categorie...,False,False,live,1561593158,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",1.000000,domestic
231,1,Unoriginal premise of a dog who becomes a huma...,"{""id"":11,""name"":""Film & Video"",""slug"":""film & ...",0,CA,1559667747,"{""id"":1586832786,""name"":""GCIII"",""slug"":""gciiif...",CAD,$,True,...,good-boy,https://www.kickstarter.com/discover/categorie...,False,False,live,1559835912,0.747468,"{""web"":{""project"":""https://www.kickstarter.com...",0.964234,domestic
234,1,Revolutionizing the college experience using V...,"{""id"":332,""name"":""Apps"",""slug"":""technology/app...",1,US,1531455533,"{""id"":112449447,""name"":""Adulthood"",""slug"":""adu...",USD,$,True,...,virtual-reality-for-the-college-student-experi...,https://www.kickstarter.com/discover/categorie...,False,False,live,1562356196,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",1.000000,domestic
241,1,"A massive, online multiplayer videogame with a...","{""id"":35,""name"":""Video Games"",""slug"":""games/vi...",1,US,1561933248,"{""id"":1561113434,""name"":""Evan Carter"",""slug"":""...",USD,$,True,...,battle-for-spire-mmo,https://www.kickstarter.com/discover/categorie...,False,False,live,1561936271,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",1.000000,domestic
249,1,Production of the original anime. オリジナルのアニメの制作...,"{""id"":29,""name"":""Animation"",""slug"":""film & vid...",1,JP,1562217757,"{""id"":877029891,""name"":""中本　吉泰　Yoshihiro Nakamo...",JPY,¥,False,...,anime-crossroad-chronicle-produce-3,https://www.kickstarter.com/discover/categorie...,False,False,live,1562929919,0.009266,"{""web"":{""project"":""https://www.kickstarter.com...",1.009960,domestic
283,93,Inspire young scholars with fun STEM activitie...,"{""id"":46,""name"":""Children's Books"",""slug"":""pub...",3641,US,1561060191,"{""id"":1329216808,""name"":""PBS SoCal"",""slug"":""pb...",USD,$,True,...,pbs-kids-resource-boxes,https://www.kickstarter.com/discover/categorie...,False,True,live,1561651229,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",3641.000000,domestic
285,43,BECOME AN INVENTOR FOR A SUSTAINABLE FUTURE WI...,"{""id"":35,""name"":""Video Games"",""slug"":""games/vi...",3550,DE,1535628316,"{""id"":986242778,""name"":""Anke Petersen"",""is_reg...",EUR,€,False,...,hectarium-the-food-survival-eco-game,https://www.kickstarter.com/discover/categorie...,False,False,live,1561640261,1.135578,"{""web"":{""project"":""https://www.kickstarter.com...",3587.290144,domestic


In [38]:
# assess suspended projects
kickstarter[kickstarter.state == "suspended"]

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,created_at,creator,currency,currency_symbol,currency_trailing_code,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
14,0,Help fund my life long dream of becoming a Bus...,"{""id"":336,""name"":""Flight"",""slug"":""technology/f...",0,US,1421357956,"{""id"":291975465,""name"":""Rob Keith"",""is_registe...",USD,$,True,...,bush-pilot,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1422045322,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",0.000000,domestic
126,1,I want to get my life back on track and and I ...,"{""id"":361,""name"":""Web"",""slug"":""journalism/web""...",5,US,1425286575,"{""id"":796344264,""name"":""Anthony Roberts"",""is_r...",USD,$,True,...,my-life-and-how-i-want-to-better-it,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1429287330,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",5.000000,domestic
167,134,Imagine if you could physically step into your...,"{""id"":271,""name"":""Live Games"",""slug"":""games/li...",54418,US,1425250261,"{""id"":1719438806,""name"":""Harbinger Entertainme...",USD,$,True,...,camp-sidereus,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1430424322,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",54418.000000,domestic
618,2,Hi! I am currently a Student Teacher in the st...,"{""id"":323,""name"":""Academic"",""slug"":""publishing...",42,US,1417904774,"{""id"":164067136,""name"":""Bryan Norkus"",""is_regi...",USD,$,True,...,student-teacher-needs-a-vehicle-to-create-bett...,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1418234428,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",42.000000,domestic
1521,0,-,"{""id"":279,""name"":""Places"",""slug"":""photography/...",0,NL,1447361968,"{""id"":942196753,""name"":""Robin"",""is_registered""...",EUR,€,False,...,not-so-boring-photo-album,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1448312046,1.079139,"{""web"":{""project"":""https://www.kickstarter.com...",0.000000,domestic
1705,0,I am try to see the world and take pictures of...,"{""id"":279,""name"":""Places"",""slug"":""photography/...",0,US,1441512785,"{""id"":479966649,""name"":""brayton stavis"",""is_re...",USD,$,True,...,travel-to-london-to-see-the-architecture,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1441982447,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",0.000000,domestic
1917,2,"A BBQ in Salford, on 6th May, from 2pm in my b...","{""id"":304,""name"":""Bacon"",""slug"":""food/bacon"",""...",7,GB,1493657480,"{""id"":2118328009,""name"":""Claire Wicher"",""is_re...",GBP,£,False,...,bbq-in-salford,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1493752838,1.294951,"{""web"":{""project"":""https://www.kickstarter.com...",7.769706,domestic
1974,1,I want to have a BBQ at my friends house. He t...,"{""id"":304,""name"":""Bacon"",""slug"":""food/bacon"",""...",1,US,1432689363,"{""id"":242023762,""name"":""Ian"",""is_registered"":n...",USD,$,True,...,i-want-to-have-a-bbq-at-my-friends-house,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1433174652,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",1.000000,domestic
2120,4,"I tried selling my boat after moving, no luck....","{""id"":271,""name"":""Live Games"",""slug"":""games/li...",4,US,1436046901,"{""id"":291011486,""name"":""Shiftyjohnson"",""is_reg...",USD,$,True,...,watch-me-axe-my-boat-or-watch-me-make-someones...,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1436814613,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",4.000000,domestic
2141,1,"The farm house we rent is being sold July 1st,...","{""id"":271,""name"":""Live Games"",""slug"":""games/li...",100,US,1431444093,"{""id"":1814127414,""name"":""Kayla Simmons"",""is_re...",USD,$,True,...,farm-carnival,https://www.kickstarter.com/discover/categorie...,False,False,suspended,1431564375,1.000000,"{""web"":{""project"":""https://www.kickstarter.com...",100.000000,domestic


In [39]:
# assess urls
kickstarter['urls'].values

array(['{"web":{"project":"https://www.kickstarter.com/projects/2044486203/citizen-carpentry-community-workshop-and-tool-shar?ref=discovery_category_newest","rewards":"https://www.kickstarter.com/projects/2044486203/citizen-carpentry-community-workshop-and-tool-shar/rewards"}}',
       '{"web":{"project":"https://www.kickstarter.com/projects/1498616534/simbin?ref=discovery_category_newest","rewards":"https://www.kickstarter.com/projects/1498616534/simbin/rewards"}}',
       '{"web":{"project":"https://www.kickstarter.com/projects/2019826059/photographing-places-i-love-mountains-of-eastern-k?ref=discovery_category_newest","rewards":"https://www.kickstarter.com/projects/2019826059/photographing-places-i-love-mountains-of-eastern-k/rewards"}}',
       ...,
       '{"web":{"project":"https://www.kickstarter.com/projects/546114795/q2q-comics-volume-1?ref=discovery_category_newest","rewards":"https://www.kickstarter.com/projects/546114795/q2q-comics-volume-1/rewards"}}',
       '{"web":{"pro

In [40]:
# assess composition of urls
kickstarter.iloc[893]['urls']

'{"web":{"project":"https://www.kickstarter.com/projects/871534856/grafik-aircraft-the-sun-was-behind-us?ref=discovery_category_newest","rewards":"https://www.kickstarter.com/projects/871534856/grafik-aircraft-the-sun-was-behind-us/rewards"}}'

In [41]:
# assess usd type
kickstarter.usd_type.value_counts()

domestic         210842
international      1416
Name: usd_type, dtype: int64

The code below was used to assess duplicated entries. I found that most duplicated projects differ only because they were scraped from different source URLs: some URLs use the parent category, others the project's subcategory. Additionally, I detected duplicate observations caused by different values in _current_currency_ and, consequently, differing exchange rates. Fortunately, all relevant columns contain consistent values.

The code is commented out, since it is resource-intensive and may considerably slow down Jupyter Notebook.

In [42]:
# Assess duplicates
# duplicated_ids = kickstarter_clean[kickstarter_clean.project_id.duplicated()].project_id
# duplicated_projects = kickstarter_clean[kickstarter_clean.project_id.isin(duplicated_ids)].sort_values(by='id').reset_index(drop=True)
# len(duplicated_projects.current_currency)

# col_names = ['current_currency', 'fx_rate', 'static_usd_rate', 'converted_pledged_amount','usd_pledged', 'goal', 'source_url']
# for i in range(0, len(duplicated_projects), 2):
#     next_i = i + 1
#     if not (duplicated_projects.iloc[i]['source_url'] == duplicated_projects.iloc[next_i]['source_url']):
#         print(f"{i}")
# #         print(f"state1: {duplicated_projects.iloc[i]['state_changed_at']}")
# #         print(f"state2: {duplicated_projects.iloc[next_i]['state_changed_at']}")
# #         print(f"update1: {duplicated_projects.iloc[i]['last_update_at']}")
# #         print(f"update2: {duplicated_projects.iloc[next_i]['last_update_at']}")     
#         for col in col_names:
#             if not (duplicated_projects.iloc[i][col] == duplicated_projects.iloc[next_i][col]):
#                 print(f"{col}, {duplicated_projects.iloc[i][col]}", "\n", f"{duplicated_projects.iloc[next_i][col]}")
#                 print(f"cat1: {duplicated_projects.iloc[i]['category']}")
#                 print(f"cat2: {duplicated_projects.iloc[next_i]['category']}")
#                 print(f"sub1: {duplicated_projects.iloc[i]['subcategory']}")
#                 print(f"sub2: {duplicated_projects.iloc[next_i]['subcategory']}")

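A lighter alternative to the row-by-row loop above is to let pandas count the distinct values per column within each group of duplicated ids. Columns with more than one distinct value in a group are exactly the ones in which the copies differ. This is a sketch under the assumption that duplicates share the same id column (`project_id` in the commented code); the helper name is hypothetical:

```python
import pandas as pd

def differing_columns(df, id_col, cols):
    """For each duplicated id, flag which of `cols` hold more than one distinct value."""
    dupes = df[df[id_col].duplicated(keep=False)]
    # dropna=False so that a NaN in one copy and a value in the other counts as a difference
    return dupes.groupby(id_col)[cols].nunique(dropna=False).gt(1)

# usage on the cleaned frame from the commented code; .sum() counts,
# per column, how many duplicated ids have differing values:
# col_names = ['current_currency', 'fx_rate', 'static_usd_rate',
#              'converted_pledged_amount', 'usd_pledged', 'goal', 'source_url']
# differing_columns(kickstarter_clean, 'project_id', col_names).sum()
```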
## Data Summary<a name="datasummary"></a>

Above, I inspected each variable and its values. One of the main problems with this data is the missing documentation on how the Kickstarter data was scraped or generated. Some column names are not descriptive enough or are ambiguous. I interpreted each value's meaning by carefully comparing the data to project features on Kickstarter.com.   

**backers_count:** Number of people supporting a project by financially pledging (integer).  
**blurb:** A short project summary (string).  
**category:** A string representation of a dictionary containing information about the project's category. Most relevant are the category's "name" and "slug"; the slug also forms the rightmost part of the category URI. It is composed of a higher-level (parent) category, followed by a slash, followed by a subcategory (e.g. "food/spaces"). However, some observations lack a subcategory. In this case, slug and name contain the same value.   
**converted_pledged_amount:** Amount pledged, converted into the _current currency_ (see below).  
**country:** Project country (abbr.)  
**created_at:** The date a project was created in Unix epoch time integer.  
**creator:** A string representation of a dictionary describing the project creator, containing:  
- user id, 
- name: chosen user name, which doesn't follow a specific format. It may be an actual first and last name and/or a username or company name,
- slug: creator profile slug,
- multiple links to the user's avatars of different sizes, 
- the link to a user's profile in the form of: https://www.kickstarter.com/profile/ + user_id,
- a user's API endpoint (which is not accessible to us),
- "is_registered" and "chosen_currency", both of which contain "null" in every observation   


**currency:** The currency of a project's funding.  
**currency_symbol:** Currency symbol of the project's initial currency. 
**currency_trailing_code:** Contains boolean values. It's mostly "True", but "False" in the case of the following currencies: EUR, GBP, JPY and CHF. I was unable to identify its meaning. One assumption is that it is related to trailing zeros that occur during currency conversions. However, I don't see how this value affects our analysis.  
**current_currency:** Contains one of 5 possible currencies (USD, CAD, AUD, GBP and EUR). Although projects are usually advertised in the currency of the project's country, amounts in any other currency are converted into one of these five, usually USD.   
**deadline:** Project funding deadline as Unix epoch time integer.   
**disable_communication:** Contains a boolean. Presumably, it indicates whether a creator has disabled the option to be contacted. However, I cannot confirm its meaning beyond any doubt.  
**friends:** Mostly empty (NaN), but contains empty square brackets in a few cases.  
**fx_rate:** Stands for foreign exchange rate. It's the exchange rate used to convert the project's currency into the current currency at the time of the data scraping (float).  
**goal:** Funding goal in the project's currency (float).  
**id:** Internal Kickstarter id.  
**is_backing:** It is unclear what this feature refers to. Most values are empty. The observations with a value have in common that they are live and not spotlighted and, with at least one exception, not staff-picked.  
**is_starrable:** The meaning remains unclear to me. It contains boolean values, which are mostly "False".      
**is_starred:** Contains a few boolean values; the rest is NaN. Interestingly, in the observations inspected above, _is_starred_ and _is_backing_ contain the same values. The meaning remains unclear to me.  
**launched_at:** Time project opened for funding as Unix epoch time integer.   
**location:** String representation of a dictionary, containing location information:  
- _id:_ location id,  
- _name:_ city name,  
- _slug:_ location slug,  
- _short name:_ city + country or US-state    
- _displayable name:_ city, + country or US-state,  
- _localized name_: name of the city
- _country_: country name (abbreviated)    
- _state_: local state/region,  
- _type_: location type - town, city etc.  
- _is root_: False, the meaning is unclear,  
- _urls_: location specific search urls    

**name:** Project title      
**permissions:** Mostly empty; in a few cases it contains empty square brackets. These only occur in observations that also have non-null values in _friends_ and _is_backing_ (and, in the rows inspected above, _is_starrable_ is True).  
**photo:** JSON string containing urls, linking to project cover photos in multiple sizes: original, ed, med, little, small, thumb and 1024x576, 1536x864.  
**pledged:** Amount pledged in project currency    
**profile:** JSON string containing fundamental project information, which was extracted from the header of the project's homepage:  
- _id_ and _project id:_ both keys contain the same project id,
- _state:_ Refers to whether a project can still be followed after its funding has ended. Values are 'inactive' or 'active',  
- _state changed at:_ Refers to the last time the profile information changed, in Unix epoch time,  
- _name:_ project title (cf. _name_),  
- _blurb:_ Short campaign description; it may differ from the initial _blurb_ above,     
- _background color_, _text color_, _link background color_, _link text color_: color information as hex codes,
- _link text_: text in a button on a project's homepage,  
- _link url_: contains information on style attributes of links   
- _image urls_: contains information on image attributes and urls linking to images   

**slug:** Project URI slug.  
**source_url:** URI of the category page from which the project was scraped (either the parent category or the subcategory).    
**spotlight:** It is true for projects that Kickstarter promoted by spotlighting them on the landing page and optionally in Kickstarter's newsletter/social media channels.    
**staff_pick:** It is true for projects which were picked by staff and awarded the 'Projects We Love' badge.   
**state:** Refers to the current status of a project: successful, failed, canceled, live or suspended.      
**state_changed_at:** A Unix epoch time integer referring to when the status of the project has changed last, usually when the campaign ended.   
**static_usd_rate:** Conversion rate from project currency to USD. What exactly this exchange rate stands for and what date it relates to, remains unclear to me.   
**urls:** JSON string, containing:
- _project:_ link to the project campaign page, which ends with the query string '?ref=discovery_category_newest'
- _rewards:_ Link to project rewards for pledges   

**usd_pledged:** Total amount pledged in USD, converted using the above static USD conversion rate.   
**usd_type:** International vs. domestic dollar type. However, since US projects are often labeled as international, its meaning remains unclear to me.  

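Several columns above (_created_at_, _deadline_, _launched_at_, _state_changed_at_) hold Unix epoch seconds. A minimal sketch of converting them into readable UTC timestamps with `pd.to_datetime(..., unit='s')`; interpreting the values as seconds in UTC is my assumption, and the column list may need extending (e.g. by the _last updated at_ column):

```python
import pandas as pd

EPOCH_COLUMNS = ['created_at', 'deadline', 'launched_at', 'state_changed_at']

def epoch_to_datetime(df, columns=EPOCH_COLUMNS):
    """Return a copy of df with Unix epoch seconds converted to UTC timestamps."""
    out = df.copy()
    for col in columns:
        if col in out.columns:  # skip columns missing from this frame
            out[col] = pd.to_datetime(out[col], unit='s', utc=True)
    return out

# created_at and state_changed_at of the space-force-brand-apparel project shown earlier
sample = pd.DataFrame({'created_at': [1530531796], 'state_changed_at': [1534608246]})
print(epoch_to_datetime(sample))
# created_at: 2018-07-02 11:43:16 UTC, state_changed_at: 2018-08-18 16:04:06 UTC
```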

## Issues Summary <a name="issuessummary"></a>
### Tidiness Issues
- _category_: Contains multiple variables in the form of a string representation of a dictionary. Category name and category slug should be stored in separate columns.
- _creator_: Contains multiple variables in the form of a string representation of a dictionary. Creator id and creator name should be stored in separate columns.
- _location_: Contains multiple variables in the form of a string representation of a dictionary. City, state, displayable name and location type should be stored in separate columns.
- _photos_: Contains multiple variables in the form of a string representation of a dictionary. The image link of size 'ed' should be stored in a separate column.
- _profile_: Contains multiple variables in the form of a string representation of a dictionary. "Project id" and "profile changed at" should be stored in separate columns.
- _urls_: Contains multiple variables in the form of a string representation of a dictionary. There should be one single URL linking to a project.  
- Duplicate projects with different values in some of the columns.
- _Staff pick_ and _spotlight_ refer to the same concern, namely whether a project was promoted by Kickstarter. Therefore, they should be combined into one column.


### Quality Issues
- Erroneous values in _country_. They should match the value in _location_.  
- _created at_, _deadline_, _launched at_, _state changed at_ and _last update at_ are stored as Unix timestamps, which are not human-readable.
- Observations don't follow an ordered pattern. A chronological order would make the data easier to interpret.
- Dubious currency conversion: we can't rely on values in _static usd rate_ and _fx rate_ due to missing documentation and inconsistent values.
- _goal_ and _usd pledged_ are incomparable due to non-matching currencies and dubious exchange rates. 
- Missing description (_blurb_) in some projects.
- Erroneous data types: _country_, _currency_, _status_, _category_ and _subcategory_ should be of type category.
- Ambiguous column names: _name_ should refer to a project's title, and _pledged_ and _goal_ should state their currency.
- Dubious project statuses: project 2191564 is marked successful although pledged minus goal is negative, while projects 3434836, 2736214 and 445566 are marked failed in spite of a positive surplus of pledged over goal.
- Irrelevant features
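The dubious statuses listed above can be surfaced with a boolean mask instead of being spotted by hand. The sketch below is a minimal illustration on a hypothetical toy frame; it assumes the raw columns `state`, `goal` and `pledged`, with `goal` and `pledged` in the same (project) currency so they can be compared directly.

```python
import pandas as pd

# hypothetical toy data in the shape of the raw Kickstarter columns
projects = pd.DataFrame({
    'id':      [1, 2, 3, 4],
    'state':   ['successful', 'successful', 'failed', 'failed'],
    'goal':    [1000.0, 5000.0, 2000.0, 800.0],
    'pledged': [1200.0, 4000.0, 500.0, 950.0],
})

# surplus of pledged money over the goal, both in the project currency
surplus = projects['pledged'] - projects['goal']

# successful despite missing the goal, or failed despite a positive surplus
dubious = projects[((projects['state'] == 'successful') & (surplus < 0)) |
                   ((projects['state'] == 'failed') & (surplus > 0))]
print(dubious['id'].tolist())
```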

## Cleaning Data <a name="cleaningdata"></a>
We are now going to prepare our data for analysis. Any corrupt, inaccurate and unnecessary observations will be corrected or removed, with the goal of creating a master data frame I can use for analysis. 
I'm going to start by addressing structural (tidiness) issues first.

### Tidiness

In [43]:
# create copy of data frame
kickstarter_clean = kickstarter.copy()

**_category_ contains multiple variables in the form of a string representation of a dictionary. Category name and category slug should be stored in separate columns.**

**Define**

Rename _category_ to _initial category_. Convert each string into a dictionary using the json module. Extract the values of the keys 'name' and 'slug' from _initial category_ and store each value in a new, separate column: name in _subcategory_ and slug in _category_. From the slug, only keep the parent category, which is the term before the slash character.
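A minimal sketch of the parsing described above, applied to a single raw value and to a small Series. The string is modelled on the _initial category_ values shown later in this notebook; splitting the slug on '/' is an alternative to the regular expression used in the code cell below, not the author's method.

```python
import json
import pandas as pd

# raw category string in the format seen in the data set
raw = '{"id":294,"name":"Experimental","slug":"film & video/experimental"}'

category = json.loads(raw)
subcategory = category['name']                     # subcategory name
parent = category['slug'].split('/')[0].title()    # parent category from the slug

# the same parent extraction on a Series; slugs without '/' are kept whole
slugs = pd.Series(['film & video/experimental', 'games/tabletop games', 'art'])
parents = slugs.str.split('/').str[0].str.title()
print(subcategory, '|', parent, '|', parents.tolist())
```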

**Code**

In [44]:
# rename category
kickstarter_clean.rename(columns={"category": "initial_category"}, inplace=True)

# convert each JSON string to a dictionary
categories = kickstarter_clean['initial_category'].apply(json.loads)

# extract name and slug
kickstarter_clean['subcategory'] = [category['name'] for category in categories]
kickstarter_clean['category'] = [category['slug'] for category in categories]

# extract parent category from slug
kickstarter_clean['category'] = kickstarter_clean['category'].str.extract(r'(?P<category>^[a-zA-Z0-9 & _-]+)', expand=True)

# capitalize category
kickstarter_clean['category'] = kickstarter_clean['category'].str.title()

**Test**

In [45]:
# category was renamed to initial category
if 'initial_category' in kickstarter_clean.columns:
    print("Initial category found.")
else:
    print("Initial category NOT found.")

Initial category found.


In [46]:
# now we should have two additional columns: category and subcategory  
kickstarter_clean[['subcategory', 'category']].sample(10)

Unnamed: 0,subcategory,category
95665,Wearables,Technology
63334,Art,Art
36372,Illustration,Art
137029,Webseries,Film & Video
210632,Painting,Art
14671,Web,Journalism
111915,Software,Technology
376,Documentary,Film & Video
177446,Narrative Film,Film & Video
21286,Video,Journalism


In [47]:
# all categories and subcategories are capitalized and were correctly extracted
kickstarter_clean.category.value_counts()

Film & Video    27981
Music           27654
Technology      21398
Art             20981
Publishing      20595
Food            16573
Games           14119
Fashion         12350
Design           9002
Comics           8916
Photography      8277
Crafts           7388
Theater          7117
Journalism       5925
Dance            4102
Name: category, dtype: int64

In [48]:
kickstarter_clean.subcategory.value_counts()

Web                  4627
Product Design       4387
Tabletop Games       4066
Accessories          3633
Comic Books          3479
Comedy               3195
Illustration         3042
Graphic Novels       2998
Children's Books     2980
Apparel              2936
Gadgets              2936
Documentary          2907
Photobooks           2845
Drinks               2841
Shorts               2821
Restaurants          2733
Fiction              2721
Video Games          2716
Nonfiction           2714
Country & Folk       2700
Art Books            2649
Playing Cards        2645
Hardware             2643
Apps                 2638
Drama                2617
Rock                 2592
Indie Rock           2585
Mixed Media          2544
Pop                  2525
Classical Music      2522
                     ... 
Couture               338
Knitting              333
Fabrication Tools     328
Puzzles               324
Photo                 316
Publishing            311
Makerspaces           295
Film & Video

**_creator_ contains multiple variables in the form of a string representation of a dictionary. Creator id and creator name should be stored in separate columns.**

**Define**  
Some creator entries are not valid JSON, so we cannot use the json module to convert the string into a dictionary. For example, some creators use nicknames in unescaped quotation marks (e.g. "name":"Kat "NomadiKat" Vallera"). Instead, extract id and name manually using regular expressions. Then remove the remaining keys and quotation marks and convert _creator id_ into integers. Finally, store the new values in two new columns: _creator id_ and _creator name_.
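To see why the json module gives up on such entries, and how a regular expression can still recover the fields, here is a small sketch on a hypothetical creator string. It assumes, as the `cut_ending` helper below does, that the name is always followed by a `slug` or `is_registered` key; the lazy pattern with capture groups is an alternative to the fixed-width slicing used in the notebook.

```python
import json
import re

# hypothetical creator string with an unescaped nickname, modelled on the example above
raw = '{"id":123456789,"name":"Kat "NomadiKat" Vallera","slug":"nomadikat","is_registered":null}'

# the unescaped quotes end the JSON string early, so parsing fails
try:
    json.loads(raw)
    parsed = True
except json.JSONDecodeError:
    parsed = False

# capture only the digits after the first "id":
creator_id = int(re.search(r'"id":(\d+)', raw).group(1))

# lazily capture everything up to the next ","slug" or ","is_registered" key
name_match = re.search(r'"name":"(.*?)","(?:slug|is_registered)"', raw)
creator_name = name_match.group(1)
print(parsed, creator_id, creator_name)
```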

**Code**

In [49]:
# extract creator id and creator name from the raw creator string
kickstarter_clean['creator_id'] = kickstarter_clean['creator'].str.extract(r'("id[":]+\d+)', expand=False)
# greedy match up to the last '","'; the non-capturing group keeps extract() to a single column
kickstarter_clean['creator_name'] = kickstarter_clean['creator'].str.extract(r'("name":"[\S\s]+(?:","))', expand=False)

# remove keys and unnecessary double quotes and convert to the appropriate data types
kickstarter_clean['creator_id'] = [int(creator[5:]) for creator in kickstarter_clean['creator_id'].values]
kickstarter_clean['creator_name'] = [str(name)[8:-3] for name in kickstarter_clean['creator_name'].values]

# slice off the remainder of the string, starting from the 'slug' or 'is_registered' key
def cut_ending(name):
    i = name.find("slug")
    if i != -1:
        return name[:i - 3]
    i = name.find("is_registered")
    if i != -1:
        return name[:i - 3]
    return name

kickstarter_clean['creator_name'] = kickstarter_clean['creator_name'].apply(cut_ending)

**Test**

In [50]:
# We have two new columns. 
kickstarter_clean[['creator_id', 'creator_name']].sample(20)

Unnamed: 0,creator_id,creator_name
102791,558226010,Ty DeBellotte
92595,358625990,William Quigley
104483,1708415095,Gareth Price
48321,939969452,Jeroen van der Pol
121954,661334869,Yatta Golf
199447,535732307,Paul Taylor
45064,650035249,Remo
193569,707440663,The Nightwood Society
154944,2017027030,BSKi
22104,1940176907,Daniel Gohstand


In [51]:
#  creator_id is integer
kickstarter_clean.creator_id.sample(3)

86716      667065938
122391    1516866189
109969    1502070645
Name: creator_id, dtype: int64

In [52]:
# test whether any creator names are empty
(kickstarter_clean['creator_name'] == '').any()

False

**_location_ contains multiple variables in the form of a string representation of a dictionary. City, state, displayable name and location type should be stored in separate columns.**  

**_Erroneous values in country. They should match the value in location._**  

**Define**  
Start by removing observations without a location (NaN values); this simplifies converting the JSON strings into dictionaries with the json module. Rename the column referring to the project's _state_ to _status_ to avoid mixing it up with a location's _state_. Then extract the values of _name_, _state_, _displayable name_, _type_ and _country_ and store them in the columns _city_, _state_, _displ loc_, _loc type_ and _country_. Overwriting _country_ with the location's country also resolves the erroneous country values.
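Before overwriting _country_, it can be worth counting how many rows actually disagree with their location. A minimal sketch on two hypothetical rows (the location strings are shortened to the keys used here):

```python
import json
import pandas as pd

# hypothetical rows; the second project's country disagrees with its location
df = pd.DataFrame({
    'country': ['US', 'US'],
    'location': [
        '{"name":"Portland","state":"OR","displayable_name":"Portland, OR","type":"Town","country":"US"}',
        '{"name":"Rotterdam","state":"South Holland","displayable_name":"Rotterdam, Netherlands","type":"Town","country":"NL"}',
    ],
})

# parse each location string and pull out its country code
loc_country = df['location'].apply(json.loads).map(lambda loc: loc['country'])

# count rows whose country column disagrees with the location
n_mismatches = int((df['country'] != loc_country).sum())
print(n_mismatches)

# take the location's country as the authoritative value
df['country'] = loc_country
```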

**Code**

In [53]:
# drop observations without location
kickstarter_clean.dropna(subset=['location'], inplace=True)
kickstarter_clean.reset_index(drop=True, inplace=True)

# convert each JSON string to a dictionary
locations = kickstarter_clean['location'].apply(json.loads)

# rename state to status
kickstarter_clean.rename(columns={"state": "status"}, inplace=True)

# extract city name, state, displayable location and type
kickstarter_clean['city'] = [location['name'] for location in locations]
kickstarter_clean['state'] = [location['state'] for location in locations]
kickstarter_clean['displ_loc'] = [location['displayable_name'] for location in locations]
kickstarter_clean['loc_type'] = [location['type'] for location in locations]
kickstarter_clean['country'] = [location['country'] for location in locations]


**Test**

In [54]:
# renaming successful
kickstarter_clean.status.value_counts()

successful    120790
failed         75493
canceled        8645
live            6596
suspended        635
Name: status, dtype: int64

In [55]:
# there shouldn't be any null values left
kickstarter_clean.location.isna().any()

False

In [56]:
# new columns: 'city', 'state', 'displ_loc', 'loc_type' and the overwritten 'country'
kickstarter_clean[['city', 'state', 'displ_loc', 'loc_type', 'country', 'location']].sample(20)

Unnamed: 0,city,state,displ_loc,loc_type,country,location
78483,23705,VA,"Portsmouth, VA",Zip,US,"{""id"":12767474,""name"":""23705"",""slug"":null,""sho..."
110805,Los Angeles,CA,"Los Angeles, CA",Town,US,"{""id"":2442047,""name"":""Los Angeles"",""slug"":""los..."
71149,Nashville,TN,"Nashville, TN",Town,US,"{""id"":2457170,""name"":""Nashville"",""slug"":""nashv..."
211427,Burleson,TX,"Burleson, TX",Town,US,"{""id"":2372036,""name"":""Burleson"",""slug"":""burles..."
156624,San Jose,CA,"San Jose, CA",Town,US,"{""id"":2488042,""name"":""San Jose"",""slug"":""san-jo..."
205146,Portland,OR,"Portland, OR",Town,US,"{""id"":2475687,""name"":""Portland"",""slug"":""portla..."
164012,Raleigh,NC,"Raleigh, NC",Town,US,"{""id"":2478307,""name"":""Raleigh"",""slug"":""raleigh..."
23925,Frederick,MD,"Frederick, MD",Town,US,"{""id"":2372860,""name"":""Frederick"",""slug"":""frede..."
95929,Queens,NY,"Queens, NY",County,US,"{""id"":12589352,""name"":""Queens"",""slug"":""queens-..."
85320,Tucson,AZ,"Tucson, AZ",Town,US,"{""id"":2508428,""name"":""Tucson"",""slug"":""tucson-a..."


**_photos_ contains multiple variables in the form of a string representation of a dictionary. Image link of size 'ed' should be stored in a separate column.**

**Define**  
Convert _photo_ into a dictionary using the json module. Extract the image link stored under the key 'ed' and store the values in a separate column named _image_.

**Code**

In [57]:
# convert each JSON string to a dictionary
photos = kickstarter_clean['photo'].apply(json.loads)

In [58]:
# extract image of size 'ed' and store in new column 'image'
kickstarter_clean['image'] = [photo['ed'] for photo in photos]

**Test**

In [59]:
# show examples of how the image url was extracted
kickstarter_clean.image.sample(20)

59225     https://ksr-ugc.imgix.net/assets/016/643/903/5...
47516     https://ksr-ugc.imgix.net/assets/022/587/100/6...
82953     https://ksr-ugc.imgix.net/assets/023/988/861/7...
82786     https://ksr-ugc.imgix.net/assets/012/036/640/c...
118023    https://ksr-ugc.imgix.net/assets/019/725/039/c...
191203    https://ksr-ugc.imgix.net/assets/022/884/647/a...
39479     https://ksr-ugc.imgix.net/assets/024/018/278/c...
26334     https://ksr-ugc.imgix.net/assets/017/199/341/1...
174766    https://ksr-ugc.imgix.net/assets/016/624/529/3...
59803     https://ksr-ugc.imgix.net/assets/024/126/619/f...
32200     https://ksr-ugc.imgix.net/assets/025/296/376/0...
143416    https://ksr-ugc.imgix.net/assets/012/169/686/e...
164491    https://ksr-ugc.imgix.net/assets/012/050/245/9...
168476    https://ksr-ugc.imgix.net/assets/013/650/146/a...
191104    https://ksr-ugc.imgix.net/assets/011/993/010/2...
139905    https://ksr-ugc.imgix.net/assets/012/007/951/b...
101586    https://ksr-ugc.imgix.net/asse

In [60]:
# example value
kickstarter_clean.iloc[8354]['image']

'https://ksr-ugc.imgix.net/assets/011/322/046/eb9b895fee765ffaa4dfa45be1a02e21_original.jpg?ixlib=rb-2.1.0&crop=faces&w=352&h=198&fit=crop&v=1463680993&auto=format&frame=1&q=92&s=36babcdeac8ee840da70aa1fb42305fe'

**_profile_ contains multiple variables in the form of a string representation of a dictionary. "Project id" and "profile change at" should be stored in a separate column.**

**Define**  
Due to unescaped quotation marks within values, the string is not valid JSON. Therefore, extract the values of _project id_ and _state changed at_ using regular expressions. Then remove the remaining keys and quotation marks and convert both values into integers. Store the extracted values in new columns: _project id_ and _last update at_.
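A capture group around the digits makes the fixed offsets (`[13:]`, `[19:]`) unnecessary. This is an alternative sketch, not the notebook's code, run on hypothetical profile strings whose shape mirrors the sample output further down.

```python
import pandas as pd

# hypothetical profile strings; the second contains unescaped quotes and is invalid JSON
profiles = pd.Series([
    '{"id":3440211,"project_id":3440211,"state":"active","state_changed_at":1538478462}',
    '{"id":1100765,"project_id":1100765,"name":"A "quoted" title","state_changed_at":1433352322}',
])

# capture only the digits, so no slicing of the key is needed afterwards
project_id = profiles.str.extract(r'"project_id":(\d+)', expand=False).astype('int64')
last_update_at = profiles.str.extract(r'"state_changed_at":(\d+)', expand=False).astype('int64')
print(project_id.tolist(), last_update_at.tolist())
```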

**Code**

In [61]:
# extract project id from the raw profile string
kickstarter_clean['project_id'] = kickstarter_clean['profile'].str.extract(r'("project_id":\d+)', expand=False)

# remove key and unnecessary quotes and convert to integer
kickstarter_clean['project_id'] = [int(project_id[13:]) for project_id in kickstarter_clean['project_id'].values]

In [62]:
# extract the profile's state_changed_at timestamp
kickstarter_clean['last_update_at'] = kickstarter_clean['profile'].str.extract(r'("state_changed_at":\d+)', expand=False)

# remove key and unnecessary double quotes and convert to int
kickstarter_clean['last_update_at'] = [int(timestamp[19:]) for timestamp in kickstarter_clean['last_update_at'].values]


**Test**

In [63]:
#  show new project_id and last update at columns and compare values to previous column
kickstarter_clean[['project_id', 'profile', 'id', 'creator_id', 'location', 'last_update_at']].sample(1)

Unnamed: 0,project_id,profile,id,creator_id,location,last_update_at
60899,3440211,"{""id"":3440211,""project_id"":3440211,""state"":""ac...",957907239,928963061,"{""id"":733075,""name"":""Rotterdam"",""slug"":""rotter...",1538478462


In [64]:
# last update should be of type integer for now
kickstarter_clean['last_update_at'].sample()

185320    1425915837
Name: last_update_at, dtype: int64

**_Urls_ contains multiple variables in the form of a string representation of a dictionary. There should be one single URL linking to a project.**

**Define**  
Convert _urls_ into dictionaries using the json module and extract the project link stored under the nested key 'project' inside 'web'. Remove the query string from the end of the URL.
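Instead of replacing one specific query string, the standard library can drop whatever query (and fragment) a URL carries. A minimal sketch using `urllib.parse`; `strip_query` is a hypothetical helper and the project URL is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))


# hypothetical project URL in the format found in the urls column
print(strip_query('https://www.kickstarter.com/projects/123456789/some-project?ref=discovery_category_newest'))
```

URLs without a query string pass through unchanged, so the helper is safe to apply to the whole column.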

**Code**

In [65]:
# convert each JSON string to a dictionary
urls = kickstarter_clean['urls'].apply(json.loads)

# keep the project link and remove its query string
kickstarter_clean['url'] = [url['web']['project'].split('?')[0] for url in urls]

**Test**

In [66]:
# show examples of our newly created url feature
kickstarter_clean.url.sample(10)

176469    https://www.kickstarter.com/projects/120749198...
152539    https://www.kickstarter.com/projects/105641330...
88704     https://www.kickstarter.com/projects/unicornpa...
87013     https://www.kickstarter.com/projects/168124492...
170156    https://www.kickstarter.com/projects/200759112...
52282     https://www.kickstarter.com/projects/103250360...
192837    https://www.kickstarter.com/projects/174645931...
209494    https://www.kickstarter.com/projects/152909942...
55341     https://www.kickstarter.com/projects/elizka-br...
13429     https://www.kickstarter.com/projects/56814098/...
Name: url, dtype: object

**Duplicate projects with different values in some of the columns.**

**Define**  
The assessment revealed that duplicate entries result from inconsistent values in features that are irrelevant for our research. Consequently, it is sufficient to remove observations that share the same _project id_, keeping the first observation in each case.
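A toy illustration of the deduplication rule on hypothetical data: `duplicated(keep=False)` flags every copy so the differing columns can be inspected first, and `drop_duplicates(keep='first')` then keeps one row per project.

```python
import pandas as pd

# hypothetical duplicates: project 10 was scraped twice and differs only in an irrelevant column
df = pd.DataFrame({
    'project_id': [10, 10, 11],
    'static_usd_rate': [1.31, 1.29, 1.0],
})

# inspect every copy of a duplicated project before deciding what to keep
all_copies = df[df['project_id'].duplicated(keep=False)]

# keep the first observation of each project
deduped = df.drop_duplicates(subset=['project_id'], keep='first').reset_index(drop=True)
print(deduped)
```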

**Code**

In [67]:
kickstarter_clean.drop_duplicates(subset=['project_id'], keep='first', inplace=True)
kickstarter_clean.reset_index(drop=True, inplace=True)

**Test**

In [68]:
# There shouldn't be any duplicated project ids left.
len(kickstarter_clean[kickstarter_clean.project_id.duplicated()])

0

**_Staff pick_ and _spotlight_ refer to the same concern, namely whether a project was promoted by Kickstarter. Therefore, they should be combined into one column.**

**Define:**  
Create two additional columns for projects that received a) no support and b) full support. Fully supported projects are those that were both spotlighted on the landing page and awarded a "Projects We Love" badge. Fill both columns with False by default. Then compare _staff pick_ and _spotlight_: if both values are True for an observation, set _full support_ to "full support" and reset _staff pick_ and _spotlight_ to False so the project is not counted twice; if both values are False, set _no support_ to "no support".

In _staff pick_, replace all remaining True values by "Projects We Love", and in _spotlight_, replace all remaining True values by "spotlight". Melt the four columns into one column called _featured_. Remove all rows containing False in _featured_ as well as the placeholder column generated in the melt process. Finally, reset the index. 

**Code:**

In [69]:
len(kickstarter_clean)

184921

In [70]:
# add additional columns with placeholder values
kickstarter_clean['no support'] = False
kickstarter_clean['full support'] = False

# the four support columns will hold both False and string labels, so store them as object
support_columns = ['no support', 'full support', 'staff_pick', 'spotlight']
kickstarter_clean[support_columns] = kickstarter_clean[support_columns].astype(object)

# assign "no support" to all observations without any support,
# meaning the values are False in both staff_pick and spotlight
kickstarter_clean.loc[(kickstarter_clean['spotlight'] == False) & (kickstarter_clean['staff_pick'] == False), 
                      'no support'] = "no support" 

# assign "full support" to all observations with full support, 
# meaning the values are True in both staff_pick and spotlight
kickstarter_clean.loc[(kickstarter_clean['spotlight'] == True) & (kickstarter_clean['staff_pick'] == True), 
                      'full support'] = "full support" 
# reset staff_pick and spotlight to False for fully supported projects, so they are not counted twice after melting
kickstarter_clean.loc[(kickstarter_clean['spotlight'] == True) & (kickstarter_clean['staff_pick'] == True), 
                      ['spotlight', 'staff_pick']] = False

# In staff pick, replace all remaining True values by "Projects We Love"
kickstarter_clean.loc[(kickstarter_clean['staff_pick'] == True), 'staff_pick'] = "Projects We Love" 

# In spotlight, replace all remaining True values by "spotlight"
kickstarter_clean.loc[(kickstarter_clean['spotlight'] == True), 'spotlight'] = "spotlight"

# get column names and remove the 4 support related columns
column_names = kickstarter_clean.columns
column_names = column_names.drop(['no support', 'full support', 'staff_pick', 'spotlight'])

# melt columns into one column "featured"
kickstarter_clean = pd.melt(kickstarter_clean, id_vars=column_names, var_name='placeholder', value_name='featured')

# remove all observations containing False
kickstarter_clean = kickstarter_clean[kickstarter_clean['featured'] != False]

# remove placeholder and reset index
kickstarter_clean.drop(['placeholder'], axis=1, inplace=True)
kickstarter_clean.reset_index(drop=True, inplace=True)
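The melt-based approach above can be cross-checked against a more direct formulation with `np.select`, which assigns exactly one label per row from the two boolean flags. This is an alternative sketch on hypothetical data, not the notebook's method; it assumes `staff_pick` and `spotlight` are still plain booleans.

```python
import numpy as np
import pandas as pd

# hypothetical boolean flags covering all four combinations
df = pd.DataFrame({
    'staff_pick': [True, True, False, False],
    'spotlight':  [True, False, True, False],
})

staff = df['staff_pick']
spot = df['spotlight']

# the first matching condition wins; rows matching none get the default label
df['featured'] = np.select(
    [staff & spot, staff & ~spot, ~staff & spot],
    ['full support', 'Projects We Love', 'spotlight'],
    default='no support',
)
df = df.drop(columns=['staff_pick', 'spotlight'])
print(df)
```

Because every row receives exactly one label, the row count stays unchanged, which is the same property the `info()` check below verifies for the melt version.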

**Test:**

In [71]:
# we shouldn't have lost any observations 
kickstarter_clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 184921 entries, 0 to 184920
Data columns (total 48 columns):
backers_count               184921 non-null int64
blurb                       184913 non-null object
initial_category            184921 non-null object
converted_pledged_amount    184921 non-null int64
country                     184921 non-null object
created_at                  184921 non-null int64
creator                     184921 non-null object
currency                    184921 non-null object
currency_symbol             184921 non-null object
currency_trailing_code      184921 non-null bool
current_currency            184921 non-null object
deadline                    184921 non-null int64
disable_communication       184921 non-null bool
friends                     102 non-null object
fx_rate                     184921 non-null float64
goal                        184921 non-null float64
id                          184921 non-null int64
is_backing                  102 

In [72]:
# show example values of our new column
kickstarter_clean[['featured']].sample(10)

Unnamed: 0,featured
4305,spotlight
155066,no support
28055,spotlight
70232,spotlight
140870,no support
144328,no support
152932,no support
173463,full support
52391,spotlight
123207,no support


In [73]:
# show distribution of support
kickstarter_clean['featured'].value_counts()

no support          84621
spotlight           77741
full support        19747
Projects We Love     2812
Name: featured, dtype: int64

In [74]:
# the index was reset
kickstarter_clean.tail()

Unnamed: 0,backers_count,blurb,initial_category,converted_pledged_amount,country,created_at,creator,currency,currency_symbol,currency_trailing_code,...,creator_name,city,state,displ_loc,loc_type,image,project_id,last_update_at,url,featured
184916,96,A visual essay exploring humanity's relationsh...,"{""id"":294,""name"":""Experimental"",""slug"":""film &...",10834,GB,1404650625,"{""id"":71118245,""name"":""Louis-Jack Horton-Steph...",GBP,£,False,...,Louis-Jack Horton-Stephens,London,England,"London, UK",Town,https://ksr-ugc.imgix.net/assets/011/740/553/7...,1100765,1433352322,https://www.kickstarter.com/projects/louis-jac...,full support
184917,79,"The untold story of an inland sea, a small tow...","{""id"":30,""name"":""Documentary"",""slug"":""film & v...",5109,US,1422379803,"{""id"":1224433372,""name"":""Aaron Peterson // Cle...",USD,$,True,...,Aaron Peterson // Clear & Cold Cinema,Munising,MI,"Munising, MI",Town,https://ksr-ugc.imgix.net/assets/012/015/917/4...,1662891,1425915891,https://www.kickstarter.com/projects/122443337...,full support
184918,60,Thomas est un mari et un père irréprochable. M...,"{""id"":32,""name"":""Shorts"",""slug"":""film & video/...",8389,FR,1436436273,"{""id"":2030287359,""name"":""David GUIRAUD"",""slug""...",EUR,€,False,...,David GUIRAUD,Valbonne,Provence-Alpes-Cote d'Azur,"Valbonne, France",LocalAdmin,https://ksr-ugc.imgix.net/assets/012/192/019/6...,2005448,1436436273,https://www.kickstarter.com/projects/enproiefi...,full support
184919,215,Spoiled Milk is a USC Graduate thesis film abo...,"{""id"":32,""name"":""Shorts"",""slug"":""film & video/...",18291,US,1516668607,"{""id"":1960005274,""name"":""Florence Heller"",""is_...",USD,$,True,...,Florence Heller,Los Angeles,CA,"Los Angeles, CA",Town,https://ksr-ugc.imgix.net/assets/020/006/374/4...,3285357,1516668607,https://www.kickstarter.com/projects/196000527...,full support
184920,143,big trees broken hearts beautiful images,"{""id"":31,""name"":""Narrative Film"",""slug"":""film ...",10626,US,1324229394,"{""id"":1756210031,""name"":""Adele Romanski"",""is_r...",USD,$,True,...,Adele Romanski,Los Angeles,CA,"Los Angeles, CA",Town,https://ksr-ugc.imgix.net/assets/011/308/760/9...,65220,1425915804,https://www.kickstarter.com/projects/175621003...,full support


### Quality
**_created at_, _deadline_, _launched at_, _state changed at_ and _last update at_ are stored as Unix timestamps, which are not human-readable.**

**Define**  
Convert _created at_, _deadline_, _launched at_, _state changed at_ and _last update at_ into datetime format, interpreting the integers as seconds since the Unix epoch.  
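`pd.to_datetime` with `unit='s'` interprets integers as seconds since the Unix epoch and returns timezone-naive timestamps that represent UTC. A minimal sketch with hypothetical values, using the same column-wise `apply` pattern as the cell below:

```python
import pandas as pd

# hypothetical epoch values; 0 is the Unix epoch itself
df = pd.DataFrame({'created_at': [0, 1500000000], 'deadline': [86400, 1500086400]})

time_columns = ['created_at', 'deadline']
# apply() passes each column to pd.to_datetime, forwarding unit='s'
df[time_columns] = df[time_columns].apply(pd.to_datetime, unit='s')
print(df)
```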

**Code**

In [75]:
# convert Unix timestamps (seconds) to datetime
time_columns = ['created_at', 'launched_at', 'state_changed_at', 'deadline', 'last_update_at']
kickstarter_clean[time_columns] = kickstarter_clean[time_columns].apply(pd.to_datetime, unit='s')

**Test**

In [76]:
kickstarter_clean[['created_at', 'launched_at', 'state_changed_at', 'deadline', 'last_update_at']].sample(5)

Unnamed: 0,created_at,launched_at,state_changed_at,deadline,last_update_at
145218,2016-01-22 12:59:48,2016-03-09 08:51:59,2016-04-08 07:51:59,2016-04-08 07:51:59,2016-01-22 12:59:48
85785,2018-05-14 18:11:17,2018-05-22 22:30:28,2018-06-21 22:30:29,2018-06-21 22:30:28,2018-05-14 18:11:17
111907,2015-03-31 18:09:52,2015-04-16 17:22:47,2015-06-15 17:22:50,2015-06-15 17:22:47,2015-03-31 18:09:52
12673,2015-07-22 14:10:55,2015-08-07 08:38:02,2015-09-06 08:38:04,2015-09-06 08:38:02,2015-07-22 14:10:55
53810,2014-10-08 22:39:42,2014-11-02 22:27:04,2014-12-02 22:27:04,2014-12-02 22:27:04,2015-03-09 15:44:38


**Observations don't follow an ordered pattern. A chronological order would make the data easier to interpret.**

**Define**  
Sort observations in descending order based on the date a project was _launched at_.

**Code**

In [77]:
# sort values and reset index
kickstarter_clean.sort_values(by='launched_at', ascending=False, inplace=True)
kickstarter_clean.reset_index(drop=True, inplace=True)

**Test**

In [78]:
kickstarter_clean.launched_at.head(10)

0   2019-07-18 05:04:48
1   2019-07-18 03:50:07
2   2019-07-18 03:23:01
3   2019-07-18 03:20:47
4   2019-07-18 02:55:13
5   2019-07-18 02:55:09
6   2019-07-18 02:24:42
7   2019-07-18 01:43:29
8   2019-07-18 01:33:07
9   2019-07-18 00:57:35
Name: launched_at, dtype: datetime64[ns]

In [79]:
kickstarter_clean.launched_at.tail()

184916   2009-05-01 15:44:25
184917   2009-05-01 12:22:21
184918   2009-04-29 21:11:15
184919   2009-04-29 20:08:13
184920   2009-04-28 11:55:41
Name: launched_at, dtype: datetime64[ns]

**Dubious currency conversion: we can't rely on values in _static usd rate_ and _fx rate_ due to missing documentation and inconsistent values.**

**Define**  
As we found inconsistencies in the exchange rates, gather the exchange rates from exchangeratesapi.io, a free API that publishes the reference rates of the European Central Bank: 
https://exchangeratesapi.io/ .  

First, request the latest exchange rates using USD as the base currency. Generate a new column and match the current exchange rate to each project's currency: USD, GBP, EUR, CAD, AUD, MXN, SEK, NZD, DKK, HKD, CHF, SGD, NOK and JPY.

Generate a second column of historical exchange rates based on the dates of the fundraising. Historical rates give a more realistic impression of a campaign's value at the time it ran. Use a project's deadline as the date of the exchange rate. 
Write a function that requests the historical exchange rate based on a project's currency and deadline. The initial request may take several hours, so cache the historical exchange rates in a csv file to avoid running the requests multiple times. Finally, merge the historical exchange rates into the clean data frame.
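Many projects share a deadline date, so caching by date can cut the number of requests from one per project to one per distinct date. The sketch below is hypothetical: `make_rate_lookup` and `fake_fetch` are illustrative names, and the fake fetcher stands in for the HTTP request so the example runs offline. Since the API is queried with `base=USD`, a rate is the number of currency units per dollar, so converting an amount into USD means dividing by the rate.

```python
from functools import lru_cache


def make_rate_lookup(fetch_rates):
    """Wrap fetch_rates (date -> {currency: units per 1 USD}) with a per-date cache."""

    @lru_cache(maxsize=None)
    def rates_for(date):
        # one fetch per distinct date, however many projects share it
        return fetch_rates(date)

    def lookup(date, currency):
        if currency == 'USD':
            return 1.0
        return float(rates_for(date)[currency])

    return lookup


calls = []


def fake_fetch(date):
    # stand-in for the HTTP request; returns units of each currency per 1 USD
    calls.append(date)
    return {'EUR': 0.9, 'GBP': 0.8}


lookup = make_rate_lookup(fake_fetch)

# divide by the rate to convert an amount in EUR into USD
goal_eur = 900.0
goal_usd = goal_eur / lookup('2019-08-01', 'EUR')
gbp_rate = lookup('2019-08-01', 'GBP')  # served from the cache, no second fetch
print(round(goal_usd, 2), gbp_rate, calls)
```

In the notebook, `fetch_rates` would presumably wrap the request to the dated endpoint and return its 'rates' dictionary.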

**Code**

In [80]:
# request current exchange rates based on USD
url = 'https://api.exchangeratesapi.io/latest?base=USD'
response = requests.get(url)
response.raise_for_status()

current_fx_rates = response.json()
print(f"Current exchange rates base USD, date: {current_fx_rates['date']}.\n")
print(current_fx_rates)

Current exchange rates base USD, date: 2019-08-06.

{'rates': {'CAD': 1.3216233128, 'HKD': 7.8367748279, 'ISK': 122.0166264414, 'PHP': 52.0863502279, 'DKK': 6.6723875927, 'HUF': 290.8286403862, 'CZK': 22.9972289264, 'GBP': 0.8208635023, 'RON': 4.2281219272, 'SEK': 9.58854027, 'IDR': 14265.0040225261, 'INR': 70.8098685975, 'BRL': 3.941986234, 'RUB': 65.0734781443, 'HRK': 6.6005184589, 'JPY': 106.4628586752, 'THB': 30.7446142844, 'CHF': 0.9760436221, 'EUR': 0.8938946992, 'MYR': 4.1870027711, 'BGN': 1.7482792527, 'TRY': 5.5337445249, 'CNY': 7.0189505676, 'NOK': 8.8982747832, 'NZD': 1.52614642, 'ZAR': 14.8060248503, 'USD': 1.0, 'MXN': 19.5496558505, 'SGD': 1.380262805, 'AUD': 1.4719764012, 'ILS': 3.4940556003, 'KRW': 1213.3011531242, 'PLN': 3.8543845535}, 'base': 'USD', 'date': '2019-08-06'}


In [81]:
# create new column for current exchange rates
currencies = kickstarter_clean['currency'].values
kickstarter_clean['current_fx_rate(usd)'] = [float(current_fx_rates['rates'][currency]) for currency in currencies]

In [82]:
# utility to request the historical exchange rate of project i from the API
# note: relies on the globals new_projects, deadlines and url defined in the next cell
def get_fx_rate(i):
    rate = 1.0
    currency = new_projects.iloc[i]['currency']
    # request only if not USD
    if currency != 'USD':
        response = requests.get(url.format(deadlines[i]))
        response.raise_for_status()
        rate = float(response.json()['rates'][currency])
    return rate

In [83]:
start = time.time()
# load existing exchange rates file or create a new empty data frame
try:
    df_hist_rates = pd.read_csv('./data/exchange_rates.csv')
except FileNotFoundError:
    df_hist_rates = pd.DataFrame(columns=['project_id', 'hist_exchange_rate(usd)'])
    
# identify projects without historical exchange rates
# all projects that already had their exchange rate requested
old_projects = df_hist_rates['project_id'].values
# new projects with missing exchange rates
new_projects = kickstarter_clean[~kickstarter_clean['project_id'].isin(old_projects)].reset_index(drop=True) 
print(len(new_projects), "projects require request of historic exchange rate.")

# variables needed to request exchange rate from url
url = 'https://api.exchangeratesapi.io/{}?base=USD'
deadlines = [str(deadline)[:10] for deadline in new_projects['deadline'].values]

# request exchange rate for every new project
if len(new_projects) != 0: 
    new_rates = []
    print("Start request of historic exchange rates.")
    for i in range(len(new_projects)):
        try:
            rate = get_fx_rate(i)
            new_rates.append({'project_id' : new_projects.iloc[i]['project_id'], 
                         'hist_exchange_rate(usd)': rate})
            print(new_projects.iloc[i]['currency'], ' - ', rate) 
        except Exception:
            print("Unexpected error:", sys.exc_info()[0])

    # add new rates to data frame (DataFrame.append was removed in pandas 2.0)
    df_new_rates = pd.DataFrame(new_rates, columns=df_hist_rates.columns)
    df_hist_rates = pd.concat([df_hist_rates, df_new_rates], ignore_index=True)
    # write new data frame to file
    df_hist_rates.to_csv('./data/exchange_rates.csv', index=False, encoding='utf-8')

end = time.time()
print("Process finished. Time elapsed: ", round((end-start) / 60, 2), "min." )

8666 projects require request of historic exchange rate.
Start request of historic exchange rates.
USD  -  1.0
CAD  -  1.3216233128
USD  -  1.0
CAD  -  1.3216233128
USD  -  1.0
HKD  -  7.8367748279
...
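
`get_fx_rate` is defined earlier in the notebook and is not shown in this section. As an illustration only, the sketch below shows how a request URL could be built from a deadline (reusing the URL template from the cell above) and how a single rate could be pulled out of the JSON response. It assumes the API returns an object shaped like `{"base": "USD", "date": "...", "rates": {"CAD": ..., "EUR": ...}}`; the helper names `request_url` and `rate_from_payload` are hypothetical, and the rates in the example are made up.

```python
URL_TEMPLATE = 'https://api.exchangeratesapi.io/{}?base=USD'  # template used in the cell above


def request_url(deadline: str) -> str:
    """Build the request URL from the calendar date (first 10 chars) of a deadline string."""
    return URL_TEMPLATE.format(deadline[:10])


def rate_from_payload(payload: dict, currency: str) -> float:
    """Return how many units of `currency` one unit of the base currency buys.

    Assumed response shape: {"base": "USD", "date": "YYYY-MM-DD", "rates": {...}}.
    """
    base = payload.get("base", "USD")
    if currency == base:
        # the base currency converts to itself, even if it is missing from "rates"
        return 1.0
    rates = payload.get("rates", {})
    if currency not in rates:
        raise KeyError(f"no rate for {currency} on {payload.get('date')}")
    return float(rates[currency])


# made-up payload for illustration
sample = {"base": "USD", "rates": {"CAD": 1.32, "EUR": 0.89}}
print(request_url("2019-07-01T12:00:00.000000000"))
print(rate_from_payload(sample, "CAD"), rate_from_payload(sample, "USD"))
```

Raising a `KeyError` for an unknown currency lets the surrounding `try`/`except` in the request loop skip that project, so it is requested again on the next run.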

In [84]:
# read in exchange rates
exchange_rates = pd.read_csv('./data/exchange_rates.csv')

# join the exchange rates to kickstarter_clean; the left join keeps every project
kickstarter_clean = kickstarter_clean.merge(exchange_rates, on='project_id', how='left')

**Test**  

In [102]:
# values were added to data frame
len(df_hist_rates) == len(old_projects) + len(new_projects)

True

In [85]:
# the current and historic fx rates should now both be present; for USD projects both equal 1.0
kickstarter_clean[['current_fx_rate(usd)', 'hist_exchange_rate(usd)', 'currency']].sample(10)

| | current_fx_rate(usd) | hist_exchange_rate(usd) | currency |
|---|---|---|---|
| 1839 | 1.0 | 1.0 | USD |
| 105690 | 1.0 | 1.0 | USD |
| 142325 | 1.0 | 1.0 | USD |
| 49376 | 1.0 | 1.0 | USD |
| 17506 | 1.0 | 1.0 | USD |
| 80806 | 1.0 | 1.0 | USD |
| 162778 | 1.0 | 1.0 | USD |
| 18844 | 1.321623 | 1.315198 | CAD |
| 46583 | 1.471976 | 1.31717 | AUD |
| 159686 | 1.0 | 1.0 | USD |


In [86]:
# no missing values 
kickstarter_clean[['current_fx_rate(usd)', 'hist_exchange_rate(usd)']].isnull().any()

current_fx_rate(usd)       False
hist_exchange_rate(usd)    False
dtype: bool
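
The row-count and null checks above would not notice if `exchange_rates.csv` contained a project twice, since the left merge would then silently duplicate that project's row. A sketch of a stricter check on made-up toy frames: `validate="one_to_one"` makes `merge` raise if `project_id` is duplicated on either side, and `indicator=True` adds a `_merge` column that flags projects left without a rate.

```python
import pandas as pd

# toy stand-ins for kickstarter_clean and exchange_rates (values are made up)
projects = pd.DataFrame({"project_id": [1, 2, 3], "currency": ["USD", "CAD", "EUR"]})
rates = pd.DataFrame({"project_id": [1, 2], "hist_exchange_rate(usd)": [1.0, 1.32]})

merged = projects.merge(rates, on="project_id", how="left",
                        validate="one_to_one", indicator=True)

# rows marked "left_only" found no matching exchange rate
missing = merged.loc[merged["_merge"] == "left_only", "project_id"].tolist()
print(missing)
```

On the real data, an empty `missing` list together with an unchanged row count would confirm that every project received exactly one historic rate.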

**_goal_ and _usd pledged_ are incomparable due to non-matching currencies and dubious exchange rates.**

**Define**  
Convert the funding goal and the pledged amount into USD, once with the current exchange rate and once with the historic exchange rate at each project's deadline. 

**Code**

In [87]:
# convert project financial goal into usd
kickstarter_clean['goal_current_usd'] = kickstarter_clean['goal'] / kickstarter_clean['current_fx_rate(usd)']
kickstarter_clean['goal_hist_usd'] = kickstarter_clean['goal'] / kickstarter_clean['hist_exchange_rate(usd)']

# convert pledged amounts into usd
kickstarter_clean['pledged_current_usd'] = kickstarter_clean['pledged'] / kickstarter_clean['current_fx_rate(usd)']
kickstarter_clean['pledged_hist_usd'] = kickstarter_clean['pledged'] / kickstarter_clean['hist_exchange_rate(usd)']

**Test**

In [88]:
kickstarter_clean[['currency', 'current_fx_rate(usd)', 'hist_exchange_rate(usd)','goal', 'goal_current_usd', 'usd_pledged', 'pledged_current_usd', 'pledged_hist_usd']].sample(10)

| | currency | current_fx_rate(usd) | hist_exchange_rate(usd) | goal | goal_current_usd | usd_pledged | pledged_current_usd | pledged_hist_usd |
|---|---|---|---|---|---|---|---|---|
| 113790 | USD | 1.0 | 1.0 | 4500.0 | 4500.0 | 4528.0 | 4528.0 | 4528.0 |
| 11390 | USD | 1.0 | 1.0 | 10000.0 | 10000.0 | 2.0 | 2.0 | 2.0 |
| 182329 | USD | 1.0 | 1.0 | 3500.0 | 3500.0 | 4551.0 | 4551.0 | 4551.0 |
| 103135 | USD | 1.0 | 1.0 | 1000.0 | 1000.0 | 50.0 | 50.0 | 50.0 |
| 34746 | USD | 1.0 | 1.0 | 2500.0 | 2500.0 | 92.0 | 92.0 | 92.0 |
| 52896 | MXN | 19.549656 | 17.766592 | 250000.0 | 12787.948899 | 3393.67127 | 3094.683633 | 3405.267683 |
| 144316 | EUR | 0.893895 | 0.788706 | 7500.0 | 8390.25 | 13434.736068 | 11604.2751 | 13151.9267 |
| 118299 | GBP | 0.820864 | 0.631383 | 720.0 | 877.125122 | 1132.315061 | 877.734237 | 1141.14515 |
| 4586 | EUR | 0.893895 | 0.884877 | 10000.0 | 11187.0 | 11641.742576 | 11635.03935 | 11753.60505 |
| 66595 | USD | 1.0 | 1.0 | 1350.0 | 1350.0 | 1605.0 | 1605.0 | 1605.0 |
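
Because the rates were requested with `base=USD`, each rate states how many units of the project's currency one US dollar buys, so converting into USD divides by the rate (USD itself has a rate of 1.0). A minimal sketch with a hypothetical helper `to_usd`, checked against the EUR row in the sample above (a goal of 7,500 EUR at a current rate of 0.893895 gives about 8,390.25 USD):

```python
def to_usd(amount: float, units_per_usd: float) -> float:
    """Convert an amount in a project's currency to USD.

    Assumes the rate is quoted as currency units per 1 USD (base=USD),
    so the conversion divides rather than multiplies.
    """
    return amount / units_per_usd


print(round(to_usd(7500.0, 0.893895), 2))  # 8390.25
```

The same expression works element-wise on pandas Series, which is what the cell above relies on when it divides whole columns.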


In [89]:
kickstarter_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 184921 entries, 0 to 184920
Data columns (total 54 columns):
backers_count               184921 non-null int64
blurb                       184913 non-null object
initial_category            184921 non-null object
converted_pledged_amount    184921 non-null int64
country                     184921 non-null object
created_at                  184921 non-null datetime64[ns]
creator                     184921 non-null object
currency                    184921 non-null object
currency_symbol             184921 non-null object
currency_trailing_code      184921 non-null bool
current_currency            184921 non-null object
deadline                    184921 non-null datetime64[ns]
disable_communication       184921 non-null bool
friends                     102 non-null object
fx_rate                     184921 non-null float64
goal                        184921 non-null float64
id                          184921 non-null int64
is_backing    

**Missing description (blurb) in some projects**

**Define**  
Drop projects without a value in 'blurb' and reset the index.

**Code**

In [90]:
kickstarter_clean.dropna(subset=['blurb'], inplace=True)
kickstarter_clean.reset_index(drop=True, inplace=True)

**Test**

In [91]:
kickstarter_clean[kickstarter_clean.blurb.isna()]['blurb']

Series([], Name: blurb, dtype: object)

**Erroneous data types: country, currency, status, category and subcategory should be of type category**

**Define**  
Convert country, currency, status, category and subcategory to the category dtype.

**Code**

In [92]:
# convert categorical features to the pandas 'category' dtype
for col in ['country', 'currency', 'status', 'category', 'subcategory']:
    kickstarter_clean[col] = kickstarter_clean[col].astype('category')

**Test**

In [93]:
kickstarter_clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 184913 entries, 0 to 184912
Data columns (total 54 columns):
backers_count               184913 non-null int64
blurb                       184913 non-null object
initial_category            184913 non-null object
converted_pledged_amount    184913 non-null int64
country                     184913 non-null category
created_at                  184913 non-null datetime64[ns]
creator                     184913 non-null object
currency                    184913 non-null category
currency_symbol             184913 non-null object
currency_trailing_code      184913 non-null bool
current_currency            184913 non-null object
deadline                    184913 non-null datetime64[ns]
disable_communication       184913 non-null bool
friends                     102 non-null object
fx_rate                     184913 non-null float64
goal                        184913 non-null float64
id                          184913 non-null int64
is_backing
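
One practical benefit of the `category` dtype for columns like these, which repeat a handful of values across roughly 185k rows, is memory: pandas stores each distinct value once plus a small integer code per row. A sketch on a made-up column that compares the footprint with `Series.memory_usage(deep=True)`:

```python
import pandas as pd

# a repetitive string column, similar in spirit to 'currency' (values are made up)
codes = pd.Series(["USD", "GBP", "EUR", "CAD"] * 25_000)
as_category = codes.astype("category")

bytes_object = codes.memory_usage(deep=True)
bytes_category = as_category.memory_usage(deep=True)

print(bytes_object, bytes_category)
print(as_category.cat.categories.tolist())  # distinct values, stored once
```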

**Ambiguous column names: _name_ should refer to a project's title, and _pledged_ and _goal_ should indicate the currency they are measured in.**

**Define**  
Rename the column _name_ to 'project_name', _goal_ to 'goal_real' and _pledged_ to 'pledged_real', since both amounts are stated in the project's real (original) currency.

**Code**

In [94]:
# rename columns only; passing index=str would also convert the integer row labels to strings
kickstarter_clean.rename(columns={"name": "project_name", "goal": "goal_real", "pledged": "pledged_real"}, inplace=True)

**Test**

In [95]:
kickstarter_clean[['project_name', 'goal_real', "pledged_real"]].sample(5)

| | project_name | goal_real | pledged_real |
|---|---|---|---|
| 18966 | www.DocupletionForms.com & www.Documatic.Website! | 2000.0 | 1.0 |
| 47963 | UG Musiq | 100000.0 | 0.0 |
| 177690 | Razistan ("Land of Secrets"): a new outlet on ... | 12500.0 | 13310.0 |
| 100589 | Project-Nemesis "death in wonderland" | 10000.0 | 0.0 |
| 48486 | Istadarium: un calendario, un racconto, un reg... | 7500.0 | 10049.0 |


**Dubious project statuses: project 2191564 is marked successful although its pledged amount falls short of its goal, while projects 3434836, 2736214 and 445566 are marked failed although their pledges exceed their goals.**

**Define**  
Remove the projects with _project id_ 2191564, 3434836, 2736214 and 445566 from the data set: identify the row index of each of these project ids, drop those rows, then reset the row index.

**Code**

In [96]:
ids = [2191564, 3434836, 2736214, 445566]
indices = [kickstarter_clean[kickstarter_clean.project_id == id].index[0] for id in ids]
kickstarter_clean.drop(indices, inplace=True)
kickstarter_clean.reset_index(drop=True, inplace=True)

**Test**

In [97]:
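for project_id in ids:
    # check the column values; `in` on a Series alone tests its index labels
    if project_id in kickstarter_clean['project_id'].values:
        print("Test failed,", project_id, "in df.")
    else:
        print("ok")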

ok
ok
ok
ok
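
Looking up `.index[0]` for each id raises an `IndexError` if an id is not in the data frame, for example when the cell is run a second time. A boolean mask built with `Series.isin` avoids that. The sketch below uses a hypothetical helper `drop_projects` on a toy frame; note also that `value in series` tests a Series' index labels rather than its values, so value checks need `.values` or `.isin`.

```python
import pandas as pd


def drop_projects(df: pd.DataFrame, project_ids) -> pd.DataFrame:
    """Return df without rows whose project_id is in project_ids, with a fresh 0..n-1 index.

    Ids that are not present are simply ignored instead of raising.
    """
    keep = ~df["project_id"].isin(project_ids)
    return df.loc[keep].reset_index(drop=True)


toy = pd.DataFrame({"project_id": [445566, 10, 2191564, 20]})
cleaned = drop_projects(toy, [2191564, 3434836, 2736214, 445566])
print(cleaned["project_id"].tolist())  # [10, 20]
```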


In [98]:
kickstarter_clean.tail(1)

| | backers_count | blurb | initial_category | converted_pledged_amount | country | created_at | creator | currency | currency_symbol | currency_trailing_code | ... | project_id | last_update_at | url | featured | current_fx_rate(usd) | hist_exchange_rate(usd) | goal_current_usd | goal_hist_usd | pledged_current_usd | pledged_hist_usd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 184908 | 110 | Let's make the world's first crowd-funded book... | {"id":13,"name":"Journalism","slug":"journalis... | 3329 | US | 2009-04-27 04:44:17 | {"id":1504593825,"name":"We Make a Book","slug... | USD | $ | True | ... | 73 | 2015-03-09 15:43:20 | https://www.kickstarter.com/projects/nymab/new... | full support | 1.0 | 1.0 | 3000.0 | 3000.0 | 3329.0 | 3329.0 |


_**Irrelevant columns**_

**Define**  
Select only the columns needed for the analysis, reset the row index and store the result as master_df. The necessary columns, in order, are: 'project_id', 'project_name', 'url', 'blurb', 'category', 'subcategory', 'image', 'slug', 'created_at', 'launched_at', 'deadline', 'state_changed_at', 'last_update_at', 'status', 'creator_id', 'creator_name', 'country', 'city', 'state', 'displ_loc', 'loc_type', 'backers_count', 'featured', 'currency', 'goal_real', 'goal_current_usd', 'goal_hist_usd', 'pledged_real', 'pledged_current_usd', 'pledged_hist_usd', 'current_fx_rate(usd)', 'hist_exchange_rate(usd)'.

**Code**

In [99]:
# columns needed for the analysis, in their final order
master_columns = ['project_id', 'project_name', 'url', 'blurb', 'category', 'subcategory', 'image', 'slug',
                  'created_at', 'launched_at', 'deadline', 'state_changed_at', 'last_update_at', 'status',
                  'creator_id', 'creator_name', 'country', 'city', 'state', 'displ_loc', 'loc_type',
                  'backers_count', 'featured', 'currency',
                  'goal_real', 'goal_current_usd', 'goal_hist_usd',
                  'pledged_real', 'pledged_current_usd', 'pledged_hist_usd',
                  'current_fx_rate(usd)', 'hist_exchange_rate(usd)']
master_df = kickstarter_clean[master_columns].reset_index(drop=True)

**Test**

In [100]:
master_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 184909 entries, 0 to 184908
Data columns (total 32 columns):
project_id                 184909 non-null int64
project_name               184909 non-null object
url                        184909 non-null object
blurb                      184909 non-null object
category                   184909 non-null category
subcategory                184909 non-null category
image                      184909 non-null object
slug                       184909 non-null object
created_at                 184909 non-null datetime64[ns]
launched_at                184909 non-null datetime64[ns]
deadline                   184909 non-null datetime64[ns]
state_changed_at           184909 non-null datetime64[ns]
last_update_at             184909 non-null datetime64[ns]
status                     184909 non-null category
creator_id                 184909 non-null int64
creator_name               184909 non-null object
country                    184909 non-null ca

## Store master<a name="storemaster"></a>

In [101]:
# Store the clean data frame in four CSV files to limit file size and therefore allow pushing to GitHub
quarter = len(master_df) // 4
master_1_df = master_df.iloc[:quarter]
master_2_df = master_df.iloc[quarter:2*quarter]
master_3_df = master_df.iloc[2*quarter:3*quarter]
master_4_df = master_df.iloc[3*quarter:]
master_1_df.to_csv('./data/kickstarter_master1.csv', index=False, encoding='utf-8')
master_2_df.to_csv('./data/kickstarter_master2.csv', index=False, encoding='utf-8')
master_3_df.to_csv('./data/kickstarter_master3.csv', index=False, encoding='utf-8')
master_4_df.to_csv('./data/kickstarter_master4.csv', index=False, encoding='utf-8')
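
The four slices above follow a simple pattern: equal chunks of `len(master_df) // 4` rows, with the last chunk absorbing the remainder. A sketch that generalizes this to any number of parts (the helper `split_frame` is hypothetical) and checks on a toy frame that the chunks reassemble into the original:

```python
import pandas as pd


def split_frame(df: pd.DataFrame, n_parts: int) -> list:
    """Split df into n_parts consecutive row chunks; the last chunk takes the remainder."""
    size = len(df) // n_parts
    bounds = [i * size for i in range(n_parts)] + [len(df)]
    return [df.iloc[bounds[i]:bounds[i + 1]] for i in range(n_parts)]


toy = pd.DataFrame({"project_id": range(10)})
parts = split_frame(toy, 4)
print([len(p) for p in parts])  # [2, 2, 2, 4]
```

Reading the files back could look like `pd.concat([pd.read_csv(path) for path in paths], ignore_index=True)`. Since CSV does not store dtypes, the datetime and category columns would need to be restored after loading, for example with `parse_dates`.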

## Wrangling Summary<a name="wranglingsummary"></a>

The dataset initially consisted of 212,377 project observations, each with 37 data columns. I assessed the data set both visually and programmatically in this Jupyter Notebook. Due to a lack of documentation, I took samples of the data and compared project values with the Kickstarter archive, which is accessible online without any restrictions. The web scraping service provided no information about the project features, nor about how exactly the data was scraped or generated. Therefore, I spent quite some time understanding and interpreting each feature in the data set.  

The data set was quite messy: I identified 7 tidiness issues and 10 quality issues, so I took a programmatic approach to cleaning it. One of the main issues was JSON strings that had not been properly extracted. I extracted the most important values from each string and stored each value in a separate column. 

The second major issue was about 27k duplicate projects. Web Robots warned on their website about possible duplicates, a consequence of the search URLs they used to find the projects on Kickstarter. Thus, I removed all duplicates from the data set. 

The third major change to the data was the conversion of funding goals and pledged amounts from their original currency to USD, using two different exchange rates. The first is the current exchange rate (August 2019), which gives a general sense of what projects are worth today. However, to assess the real value of a campaign, I believe it is better to compare projects by their value at the time of their funding. Hence, I additionally converted goals and pledges into USD using the exchange rate on each project's funding deadline.

Apart from that, when collating project observations with Kickstarter's archive, I did not encounter any severe inconsistencies. However, I'd like to warn readers that I cannot guarantee the validity and completeness of the data and take no liability for misinterpretation due to a lack of documentation.

After cleaning, the data set was left with 184,909 project observations, each with 32 features describing a crowdfunding campaign. Due to GitHub's file size restrictions, the data was split into 4 CSV files.