### Vote Counting

* In this challenge, you are tasked with helping a small, rural town modernize its vote-counting process. (Up until now, Uncle Cleetus had been faithfully tallying the votes one by one, but unfortunately, his concentration isn't what it used to be.)

* You will be given a set of poll data called [election_data.csv](PyPoll/Resources/election_data.csv). The dataset is composed of three columns: `Voter ID`, `County`, and `Candidate`. Your task is to create a Python script that analyzes the votes and calculates each of the following:

  * The total number of votes cast

  * A complete list of candidates who received votes

  * The percentage of votes each candidate won

  * The total number of votes each candidate won

  * The winner of the election based on popular vote

* As an example, your analysis should look similar to the one below:

  ```text
  Election Results
  -------------------------
  Total Votes: 3521001
  -------------------------
  Khan: 63.000% (2218231)
  Correy: 20.000% (704200)
  Li: 14.000% (492940)
  O'Tooley: 3.000% (105630)
  -------------------------
  Winner: Khan
  -------------------------
  ```

* In addition, your final script should both print the analysis to the terminal and export a text file with the results.
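
One way to approach the steps above, offered as a minimal sketch rather than a reference solution: count one vote per row with a `Counter`, then build the report text once so the same string can be printed and written to a file. The function names (`tally_votes`, `format_results`, `run`) and the output path passed to `run` are hypothetical; only the `Candidate` column name comes from the dataset description.

```python
import csv
import io
from collections import Counter


def tally_votes(rows):
    """Count votes per candidate from an iterable of CSV dict rows."""
    counts = Counter()
    for row in rows:
        counts[row["Candidate"]] += 1
    return counts


def format_results(counts):
    """Build the report text in the layout of the example output."""
    if not counts:
        raise ValueError("no votes found")
    total = sum(counts.values())
    divider = "-" * 25
    lines = ["Election Results", divider, f"Total Votes: {total}", divider]
    # most_common() lists candidates from most to fewest votes
    for candidate, votes in counts.most_common():
        lines.append(f"{candidate}: {votes / total:.3%} ({votes})")
    winner = counts.most_common(1)[0][0]
    lines += [divider, f"Winner: {winner}", divider]
    return "\n".join(lines)


def run(csv_path, out_path):
    """Read the CSV, print the report, and write it to out_path (hypothetical helper)."""
    with open(csv_path, newline="") as f:
        report = format_results(tally_votes(csv.DictReader(f)))
    print(report)
    with open(out_path, "w") as f:
        f.write(report + "\n")
    return report


# Small in-memory sample (made-up rows) so the sketch runs without the real file
sample = io.StringIO(
    "Voter ID,County,Candidate\n"
    "1,Marsh,Khan\n"
    "2,Queen,Correy\n"
    "3,Marsh,Khan\n"
    "4,Bamoo,Li\n"
)
counts = tally_votes(csv.DictReader(sample))
report = format_results(counts)
print(report)
```

Reading the file row by row keeps memory use low even for a large dataset, since only the per-candidate counts are held in memory.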

## Hints and Considerations

* Consider what we've learned so far. To date, we've learned how to import modules like `csv`; to read and write files in various formats; to store contents in variables, lists, and dictionaries; to iterate through basic data structures; and to debug along the way. Using what we've learned, try to break down your tasks into discrete mini-objectives. This will be a _much_ better course of action than attempting to Google Search for a miracle.

* As you will discover, for some of these challenges, the datasets are quite large. This was done purposefully, as it showcases one of the limits of Excel-based analysis. While our first instinct, as data analysts, is often to head straight into Excel, creating scripts in Python can provide us with more robust options for handling "big data".

* Your scripts should work for each dataset provided. Run your script for each dataset separately to make sure that the code works for different data.

* Feel encouraged to work in groups, but don't shortchange yourself by copying someone else's work. You get what you put in, and the art of programming is extremely unforgiving to moochers. Dig your heels in, burn the midnight oil, and learn this while you can! These are skills that will pay dividends in your future career.

* Start early, and reach out for help often! Challenge yourself to identify _specific_ questions for your instructors and TAs. Don't resign yourself to simply saying, "I'm totally lost." Come prepared to show your effort and thought patterns; we'll be happy to help along the way.

* Always commit your work and back it up with GitHub pushes. You don't want to lose hours of your work because you didn't push it to GitHub every half hour or so.

In [1]:
import pandas as pd

In [4]:
# Read our Kickstarter data into pandas
df = pd.read_csv("Resources/KickstarterData.csv")

In [5]:
# Get a list of all of our columns for easy reference
list(df)

['id',
 'photo',
 'name',
 'blurb',
 'goal',
 'pledged',
 'state',
 'slug',
 'disable_communication',
 'country',
 'currency',
 'currency_symbol',
 'currency_trailing_code',
 'deadline',
 'state_changed_at',
 'created_at',
 'launched_at',
 'staff_pick',
 'is_starrable',
 'backers_count',
 'static_usd_rate',
 'usd_pledged',
 'creator',
 'location',
 'category',
 'profile',
 'spotlight',
 'urls',
 'source_url',
 'friends',
 'is_starred',
 'is_backing',
 'permissions']

In [25]:
# Extract "name", "goal", "pledged", "state", "country", "staff_pick",
# "backers_count", and "spotlight"
df2 = df[["name", "goal", "pledged", "state", "country", "staff_pick", "backers_count", "spotlight"]]
df2.head()

Unnamed: 0,name,goal,pledged,state,country,staff_pick,backers_count,spotlight
0,The Class Act Players Theatre Company Presents...,1500.0,2925.0,successful,US,False,17,True
1,MR INCREDIBLE by Camilla Whitehill - VAULT Fes...,2500.0,2936.0,successful,GB,True,15,True
2,RUN,1000.0,1200.0,successful,GB,False,30,True
3,9th International Meeting of Youth Theatre sap...,2000.0,2135.0,successful,IT,False,24,True
4,Get Conti to the Ed Fringe!,1000.0,1250.0,successful,GB,False,28,True


In [37]:
# Remove projects that made no money at all
df_subsetted = df2[df2["pledged"] > 0]
df_subsetted.head()

Unnamed: 0,name,goal,pledged,state,country,staff_pick,backers_count,spotlight
0,The Class Act Players Theatre Company Presents...,1500.0,2925.0,successful,US,False,17,True
1,MR INCREDIBLE by Camilla Whitehill - VAULT Fes...,2500.0,2936.0,successful,GB,True,15,True
2,RUN,1000.0,1200.0,successful,GB,False,30,True
3,9th International Meeting of Youth Theatre sap...,2000.0,2135.0,successful,IT,False,24,True
4,Get Conti to the Ed Fringe!,1000.0,1250.0,successful,GB,False,28,True


In [39]:
# Collect only those projects that were hosted in the US
df_subsetted = df_subsetted[df_subsetted["country"] == "US"]
df_subsetted.head()

Unnamed: 0,name,goal,pledged,state,country,staff_pick,backers_count,spotlight
0,The Class Act Players Theatre Company Presents...,1500.0,2925.0,successful,US,False,17,True
8,Forefront Festival 2015,7200.0,7230.0,successful,US,False,68,True
11,Hamlet the Hip-Hopera,9747.0,10103.0,successful,US,True,132,True
14,Pride Con,15000.0,15110.0,successful,US,False,60,True
15,En Garde Arts Emerging Artists Festival BOSSS,10000.0,10306.0,successful,US,True,80,True


In [49]:
# Create a new column holding the mean amount pledged across all remaining US projects
# (one overall value, truncated to an integer and repeated in every row)
df_subsetted["average_pledged"] = int(df_subsetted["pledged"].sum()/len(df_subsetted))
df_subsetted.head()

Unnamed: 0,name,goal,pledged,state,country,staff_pick,backers_count,spotlight,average_pledged
0,The Class Act Players Theatre Company Presents...,1500.0,2925.0,successful,US,False,17,True,4637
8,Forefront Festival 2015,7200.0,7230.0,successful,US,False,68,True,4637
11,Hamlet the Hip-Hopera,9747.0,10103.0,successful,US,True,132,True,4637
14,Pride Con,15000.0,15110.0,successful,US,False,60,True,4637
15,En Garde Arts Emerging Artists Festival BOSSS,10000.0,10306.0,successful,US,True,80,True,4637
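
The cell above stores a single overall mean in every row. If a per-project figure is wanted instead, one possible reading is the amount pledged per backer. The sketch below assumes that reading; `average_donation` is a hypothetical column name, and the two sample rows reuse values from the output above.

```python
import pandas as pd

# Two sample rows using values shown in the output above
toy = pd.DataFrame({
    "name": ["The Class Act Players Theatre Company Presents...", "Forefront Festival 2015"],
    "pledged": [2925.0, 7230.0],
    "backers_count": [17, 68],
})

# Element-wise division yields one value per project rather than one overall mean
toy["average_donation"] = toy["pledged"] / toy["backers_count"]
print(toy[["name", "average_donation"]])
```

A project with zero backers would produce `inf` or `NaN` here, so such rows would need to be filtered out or handled first.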


In [50]:
# Format the "average_pledged", "goal", and "pledged" columns as dollar strings
# with two decimal places (the columns hold text, not numbers, after this step)
df_subsetted["average_pledged"] = df_subsetted["average_pledged"].map("${:.2f}".format)
df_subsetted["goal"] = df_subsetted["goal"].map("${:.2f}".format)
df_subsetted["pledged"] = df_subsetted["pledged"].map("${:.2f}".format)
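
The format string used in this cell, `"${:.2f}"`, adds a dollar sign and two decimal places but no thousands separators. A minimal sketch of comma notation, shown on a plain list so it runs without the dataset; `to_dollars` is a hypothetical name, and the same callable could be passed to `Series.map` in the way the cell above passes `"${:.2f}".format`.

```python
# In the format spec, "," adds thousands separators and ".2f" fixes two decimal places
to_dollars = "${:,.2f}".format

amounts = [9747.0, 10103.0, 1300.0]  # sample values from the outputs in this notebook
formatted = [to_dollars(amount) for amount in amounts]
print(formatted)  # ['$9,747.00', '$10,103.00', '$1,300.00']
```

Because formatting turns numbers into strings, any sums or means are best computed before this step.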

In [52]:
# Calculate the total number of backers for all US projects
bc = df_subsetted["backers_count"].sum()
print(f"The total number of backers for all US projects is {bc}")

The total number of backers for all US projects is 89273


In [59]:
# Calculate the average number of backers for all US projects
bc = int(df_subsetted["backers_count"].mean())
print(f"The mean number of backers for all US projects is {bc}")

The mean number of backers for all US projects is 41


In [62]:
# Collect only those US campaigns that have been picked as a "Staff Pick"
df_subsetted = df_subsetted[df_subsetted["staff_pick"] == True]
df_subsetted.head()

Unnamed: 0,name,goal,pledged,state,country,staff_pick,backers_count,spotlight,average_pledged
11,Hamlet the Hip-Hopera,$9747.00,$10103.00,successful,US,True,132,True,$4637.00
15,En Garde Arts Emerging Artists Festival BOSSS,$10000.00,$10306.00,successful,US,True,80,True,$4637.00
39,"""Poor People"" at FringeNYC 2015",$5500.00,$5682.00,successful,US,True,34,True,$4637.00
44,Queen Mab's Steampunk and Fairie Street Festival,$1300.00,$3363.00,successful,US,True,62,True,$4637.00
45,RAFT: a new play by Emily Kitchens,$7500.00,$7826.00,successful,US,True,120,True,$4637.00


In [68]:
# Group the staff-picked US campaigns by state and count them (most of them were successful)
grouped_by_state = df_subsetted.groupby(["state"])
grouped_by_state[["staff_pick"]].sum()

state,staff_pick
canceled,6.0
failed,21.0
live,2.0
successful,145.0
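
These counts cover staff picks only, because the frame was filtered to `staff_pick == True` before grouping. Judging whether staff picks matter would also need the non-picks for comparison. A rough sketch of that comparison on hypothetical toy data (not the Kickstarter dataset), where `succeeded` is an assumed helper column:

```python
import pandas as pd

# Hypothetical toy data; the real comparison would use the US projects
# before the staff-pick filter is applied.
toy = pd.DataFrame({
    "state": ["successful", "failed", "successful", "failed", "successful", "failed"],
    "staff_pick": [True, True, True, False, False, False],
})

# Flag successful campaigns, then take the mean of that flag per group:
# the mean of a boolean column is the share of True values.
toy["succeeded"] = toy["state"] == "successful"
success_rate = toy.groupby("staff_pick")["succeeded"].mean()
print(success_rate)
```

Comparing the two rates side by side shows the relationship more directly than the raw counts of staff picks alone.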
