# Want to Do a Kickstarter?

## An Analysis of Kickstarter Projects Using Python

This notebook explores a dataset of hundreds of thousands of Kickstarter projects to find out which kinds of projects succeed and which do not.

In [None]:
import pandas as pd
import numpy as np

main_dataset = pd.read_csv('ks-projects-201801.csv')

In [None]:
main_dataset

As we can see, we have information on over 370,000 projects that were posted on Kickstarter. First, we will run main_dataset.info() to see the dtype of each column. After that, we will drop any columns we do not need and divide this DataFrame into separate subsets (possibly separate DataFrames): successful projects, failed projects, and cancelled projects.
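Dropping columns can be done with DataFrame.drop. A minimal sketch, using a tiny made-up DataFrame; the choice of which columns count as "not needed" is an assumption for illustration only:

```python
import pandas as pd

# Made-up rows shaped loosely like the Kickstarter data (not real values)
projects = pd.DataFrame({
    "ID": [1, 2, 3],
    "name": ["Project A", "Project B", "Project C"],
    "currency": ["USD", "GBP", "USD"],
    "state": ["successful", "failed", "canceled"],
    "usd_pledged_real": [1200.0, 300.0, 50.0],
})

# Hypothetical choice of unneeded columns; drop() returns a new DataFrame
trimmed = projects.drop(columns=["ID", "currency"])
print(trimmed.columns.tolist())  # ['name', 'state', 'usd_pledged_real']
```

Because drop() returns a copy, the original DataFrame keeps all of its columns unless you reassign the result.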

In [None]:
# Gather basic info on the dtype of each column
main_dataset.info()

The output above gives us a general picture of the DataFrame. Knowing which columns are floats, ints, or objects (strings) will help us use the right operations later on, for example when parsing strings or converting values to integers.
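As an illustration of why dtypes matter, here is a minimal sketch of the two conversions mentioned above: parsing numbers stored as text, and turning a float column into integers. The data is made up; the real dataset's columns may already have suitable dtypes, so check the info() output first.

```python
import pandas as pd

# Made-up data: a numeric column stored as text, and a whole-number column stored as float
df = pd.DataFrame({
    "goal": ["1000", "2500.50", "n/a"],
    "backers": [10.0, 3.0, 0.0],
})

# Parse text to numbers; errors="coerce" turns unparseable values like "n/a" into NaN
df["goal"] = pd.to_numeric(df["goal"], errors="coerce")

# Convert float to int (only safe when the column has no NaN values)
df["backers"] = df["backers"].astype("int64")

print(df.dtypes)
```

Using errors="coerce" keeps one bad value from stopping the whole conversion; the cost is that you should count the resulting NaN values afterwards.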

In [None]:
# Rename some columns to clearer names
# Note: 'pledged' is in each project's original currency; 'usd_pledged_real' holds the USD conversion
main_dataset = main_dataset.rename(columns={"state" : "project_result", "backers" : "num_of_pledges", "usd pledged" : "usd_pledged", "pledged" : "total_amount_pledged"})

## Creating Two New Datasets

We will be making two new datasets:
* one for successful projects
* one for failed projects

We will then analyze each of the two datasets in more detail.
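The same split can be done in one pass with groupby, which also makes it easy to check which outcome labels exist (the original plan also mentions cancelled projects). A minimal sketch on made-up rows; by_result is a hypothetical name:

```python
import pandas as pd

# Made-up rows standing in for main_dataset after the rename
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "project_result": ["successful", "failed", "failed", "canceled", "successful"],
    "goal": [1000.0, 5000.0, 800.0, 2000.0, 1500.0],
})

# How many projects ended in each outcome?
print(df["project_result"].value_counts())

# One DataFrame per outcome, keyed by its label
by_result = {result: group for result, group in df.groupby("project_result")}
successful = by_result["successful"]
failed = by_result["failed"]
print(len(successful), len(failed))  # 2 2
```

This gives the same rows as the boolean .loc filters used in the notebook, but any extra outcome (such as cancelled projects) is available without writing another filter.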

In [None]:
#making two new data sets, success and fail, to further investigate each of them
failed_dataset = main_dataset.loc[main_dataset["project_result"] == "failed"]
successful_dataset = main_dataset.loc[main_dataset["project_result"] == "successful"]


In [None]:
# Display floats with two decimal places so the numbers are easier to read (the data itself is unchanged)
pd.options.display.float_format = '{:.2f}'.format

In [None]:
failed_dataset.describe(include='all')

In [None]:
successful_dataset.describe(include='all')

There is a lot of information to take in from the last few cells. We separated the main dataset into two datasets based on whether each project failed or succeeded. Then we ran each of them through the describe() method, which summarizes every column: count, mean, standard deviation, min, quartiles, and max for numeric columns, and count, number of unique values, most frequent value, and its frequency for text columns. To see the average dollar amount pledged to a successful or failed Kickstarter, look at the mean row of the usd_pledged_real column. For example, the average for a successful Kickstarter was $22,670.80, while the average for a failed Kickstarter was $1,320.60. That is a striking difference between the two. Visualizations will help us compare these groups more clearly.
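Rather than reading the means off two separate describe() tables, groupby can put them side by side in one table. A minimal sketch on made-up numbers (not the real results); with the notebook's data you would group main_dataset the same way:

```python
import pandas as pd

# Made-up rows; the real notebook would use main_dataset instead
df = pd.DataFrame({
    "project_result": ["successful", "failed", "successful", "failed", "canceled"],
    "goal": [5000.0, 20000.0, 3000.0, 50000.0, 10000.0],
    "usd_pledged_real": [6000.0, 500.0, 4000.0, 1500.0, 200.0],
})

# Keep the two outcomes being compared, then average each column per outcome
summary = (
    df[df["project_result"].isin(["successful", "failed"])]
    .groupby("project_result")[["goal", "usd_pledged_real"]]
    .mean()
)
print(summary)
```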

In [None]:
# Import matplotlib for the visualizations below
import matplotlib.pyplot as plt

## Comparison Graph of Successful and Failed Kickstarters

In [None]:
success_fail_graph = {
    'success_total_avg' : 24100,
    'success_goal_avg' : 9533,
    'fail_total_avg' : 1427,
    'fail_goal_avg' : 63175
}

#function that formats currencies on the x-axis in our graphs
def currency(x, pos):
    if x >= 1e6:
        s = '${:1.1f}M'.format(x*1e-6)
    else:
        s = '${:1.0f}K'.format(x*1e-3)
    return s


success_fail_values = list(success_fail_graph.values())
success_fail_keys = list(success_fail_graph.keys())
graph_mean = np.mean(success_fail_values)
fig, ax = plt.subplots()
ax.barh(success_fail_keys, success_fail_values)
labels = ax.get_xticklabels()
plt.setp(labels, rotation=45, horizontalalignment='right')
ax.set(xlabel='Average Amount', ylabel='Success/Fail Average', title='Average Goals/Pledged Amounts for Kickstarters')
ax.xaxis.set_major_formatter(currency)

After splitting the data into successful and failed Kickstarters, I took the average total pledged and the average goal from each dataset. Even at a surface level, there is a huge gap between the average amount pledged and the average goal in the failed dataset. The successful dataset also shows a large gap, but in the other direction: on average, successful projects raised well over their goal. It is possible that the goals for failed Kickstarters were simply too high to be realistic. The next step in this analysis is to look at the categories in each dataset and see whether some categories were more successful than others.
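One caveat worth checking: a mean is sensitive to a handful of extreme values, so a few projects with enormous goals could inflate the failed average on their own. A minimal sketch with made-up numbers shows how far the mean and median can drift apart; comparing failed_dataset['goal'].median() with its mean would be one way to test this on the real data.

```python
import pandas as pd

# Made-up goals: four modest projects and one extreme outlier
goals = pd.Series([1000.0, 2000.0, 3000.0, 5000.0, 1_000_000.0])

print(goals.mean())    # 202200.0, pulled up by the single huge goal
print(goals.median())  # 3000.0, the "typical" goal is much lower
```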

## Even More Datasets! Taking a Deeper Look at Kickstarter Categories

Now that we have a brief idea of the differences between successful and failed Kickstarters, we are going to take a closer look at their specific categories to see which ones do best.
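Counting categories among successful projects shows which categories are most common, but not how often a category's projects succeed. One way to measure that, sketched here on made-up rows, is the percentage of successful projects within each category, counting only projects that either succeeded or failed:

```python
import pandas as pd

# Made-up rows standing in for main_dataset after the rename
df = pd.DataFrame({
    "main_category": ["Music", "Music", "Music", "Games", "Games", "Art", "Art", "Art", "Art"],
    "project_result": ["successful", "failed", "successful", "failed", "failed",
                       "successful", "successful", "failed", "successful"],
})

# Only compare projects that reached a final success/fail outcome
decided = df[df["project_result"].isin(["successful", "failed"])]

# Mean of a True/False column = share of True values; * 100 gives a percentage
success_rate = (
    decided["project_result"].eq("successful")
    .groupby(decided["main_category"])
    .mean()
    .mul(100)
    .sort_values(ascending=False)
)
print(success_rate)
```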

In [None]:
# Keep only the columns needed to look at successful projects by category: name, categories, goal, and amount pledged
successful_categories = successful_dataset[['name', 'category', 'main_category', 'goal', 'total_amount_pledged']]
successful_categories.head(10)

In [None]:
# Use value_counts to see which main categories occur most often among successful projects (as a percentage)
main_category_dataset = (successful_categories['main_category'].value_counts(normalize=True) * 100)
print(main_category_dataset)
top_10_categories = successful_categories['main_category'].value_counts().head(10).index
top_10_categories

In [None]:
# Same breakdown for the more specific 'category' column (side categories)
side_category_dataset = (successful_categories['category'].value_counts(normalize=True, dropna=True) * 100)
print(side_category_dataset)
top_10_side_categories = successful_categories['category'].value_counts().head(10).index
top_10_side_categories

In [None]:
# Use a for loop to store each top main category and its average amount pledged in a dictionary

avg_main_category_pledged = {}

for categories in top_10_categories:
    main_cat_only = successful_dataset[successful_dataset["main_category"] == categories]

    # usd_pledged_real is in USD; total_amount_pledged ('pledged') mixes each project's own currency
    mean_amount = main_cat_only['usd_pledged_real'].mean()

    avg_main_category_pledged[categories] = int(mean_amount)

avg_main_category_pledged

In [None]:
avg_side_category_pledged = {}

for side_categories in top_10_side_categories:
    side_cat_only = successful_dataset[successful_dataset['category'] == side_categories]

    # usd_pledged_real is in USD; total_amount_pledged ('pledged') mixes each project's own currency
    side_mean_amount = side_cat_only['usd_pledged_real'].mean()

    avg_side_category_pledged[side_categories] = int(side_mean_amount)

avg_side_category_pledged
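The two loops above can also be written with groupby, which computes every category's mean in one pass. A minimal sketch on made-up rows; avg_pledged is a hypothetical name, averaging usd_pledged_real is an assumption, and for the side categories you would group by 'category' instead of 'main_category':

```python
import pandas as pd

# Made-up rows standing in for successful_dataset
df = pd.DataFrame({
    "main_category": ["Music", "Music", "Games", "Games", "Games", "Art"],
    "usd_pledged_real": [1000.0, 3000.0, 10000.0, 20000.0, 30000.0, 500.0],
})

# Top categories by count (here the top 2 instead of the top 10)
top_categories = df["main_category"].value_counts().head(2).index

avg_pledged = (
    df[df["main_category"].isin(top_categories)]
    .groupby("main_category")["usd_pledged_real"]
    .mean()
    .astype("int64")          # truncates like int() in the loop version
    .reindex(top_categories)  # keep the most-common-first order
    .to_dict()
)
print(avg_pledged)  # {'Games': 20000, 'Music': 2000}
```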

In [None]:
# Horizontal bar chart of the average amount pledged for each of the top 10 main categories
avg_cat_pleged_values = list(avg_main_category_pledged.values())
avg_cat_pleged_keys = list(avg_main_category_pledged.keys())
graph_mean = np.mean(avg_cat_pleged_values)
fig, ax = plt.subplots()
ax.barh(avg_cat_pleged_keys, avg_cat_pleged_values)
labels = ax.get_xticklabels()
plt.setp(labels, rotation=45, horizontalalignment='right')
ax.set(xlabel='Average Amount Pledged', ylabel='Main Categories', title='Average for Main Categories')
ax.xaxis.set_major_formatter(currency)

In [None]:
avg_side_cat_pleged_values = list(avg_side_category_pledged.values())
avg_side_cat_pleged_keys = list(avg_side_category_pledged.keys())
graph_mean = np.mean(avg_side_cat_pleged_values)
fig, ax = plt.subplots()
ax.barh(avg_side_cat_pleged_keys, avg_side_cat_pleged_values)
labels = ax.get_xticklabels()
plt.setp(labels, rotation=45, horizontalalignment='right')
ax.set(xlabel='Average Amount Pledged', ylabel='Side Categories', title='Average for Side Categories')
ax.xaxis.set_major_formatter(currency)
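The two plotting cells above repeat the same steps. A hypothetical helper, plot_avg_pledged, could take either dictionary and also sort the bars by value so the chart is easier to read. A sketch assuming matplotlib 3.x; the Agg backend line is only there so it runs outside a notebook, and the sample dictionary is made up:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def currency(x, pos):
    """Format an axis value as $K or $M (same logic as the notebook's formatter)."""
    if x >= 1e6:
        return '${:1.1f}M'.format(x * 1e-6)
    return '${:1.0f}K'.format(x * 1e-3)


def plot_avg_pledged(averages, ylabel, title):
    """Hypothetical helper: horizontal bar chart of {label: average}, sorted by value."""
    # Smallest first, so the largest bar ends up at the top of the chart
    items = sorted(averages.items(), key=lambda kv: kv[1])
    labels = [k for k, _ in items]
    values = [v for _, v in items]
    fig, ax = plt.subplots()
    ax.barh(labels, values)
    ax.set(xlabel='Average Amount Pledged', ylabel=ylabel, title=title)
    ax.xaxis.set_major_formatter(FuncFormatter(currency))
    plt.setp(ax.get_xticklabels(), rotation=45, horizontalalignment='right')
    return fig, ax


fig, ax = plot_avg_pledged({'Games': 20000, 'Music': 2000, 'Art': 500},
                           ylabel='Main Categories', title='Average for Main Categories')
```

In the notebook, the same helper could be called once with avg_main_category_pledged and once with avg_side_category_pledged.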
