### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Our peak age demographic is 20-24 (44.8%), followed by 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [2]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# Raw data file
file_to_load = "purchase_data.csv"

# Read purchasing file and store into pandas data frame
purchase_data = pd.read_csv(file_to_load)

purchase_data.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


## Player Count

* Display the total number of players


In [6]:
player_count = len(purchase_data["SN"].unique())
player_count_df = pd.DataFrame({'Total Number of Players':[player_count]})
player_count_df

Unnamed: 0,Total Number of Players
0,576
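
As a small aside on the count above: `Series.nunique()` gives the same number as `len(...unique())` without building the intermediate array. The sketch below uses a hypothetical three-row frame (`toy` is not part of the assignment data) purely to show that the two agree.

```python
import pandas as pd

# Hypothetical toy data: three purchases made by two distinct players
toy = pd.DataFrame({
    "SN": ["Lisim78", "Lisim78", "Iskosia90"],
    "Price": [3.53, 1.56, 1.44],
})

# nunique() counts distinct screen names directly
player_count = toy["SN"].nunique()

# Same single-row summary frame as in the cell above
player_count_df = pd.DataFrame({"Total Number of Players": [player_count]})
print(player_count_df)
```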


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [9]:
#define some variables to call some pandas methods

unique_items = len(purchase_data["Item ID"].unique())
#CHECKPOINT #unique_items 

purchase_count = purchase_data["Purchase ID"].count()
#CHECKPOINT: #purchase_count

avg_price = (purchase_data["Price"].mean())
#CHECKPOINT #avg_price 

#total_rev sums the prices; the "$" and thousands separator are added when the summary is formatted below
total_rev = purchase_data["Price"].sum()
#CHECKPOINT #total_rev

#create the summary dataframe
purchasing_df = pd.DataFrame({"Number of Unique Items": [unique_items],"Average Price":[avg_price],"Number of Purchases":[purchase_count],"Total Revenue":[total_rev]})

#formatting da monies
purchasing_df["Average Price"] = purchasing_df["Average Price"].astype(float).map("${:,.2f}".format)
purchasing_df["Total Revenue"] = purchasing_df["Total Revenue"].astype(float).map("${:,.2f}".format)

#Display results
purchasing_df.head()


Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,183,$3.05,780,"$2,379.77"


## Gender Demographics

* Run basic calculations to obtain the count and percentage of players by gender


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [12]:
#creates a unique list of user info by removing duplicates
purchase_demographics = purchase_data.loc[:,["Gender", "SN", "Age"]].drop_duplicates(subset="SN", keep="first")

#create the two data frames... name the index "Gender" and reset it to allow for merging later
#(rename_axis + reset_index(name=...) does not depend on how value_counts() labels its result)
gender_counts = purchase_demographics["Gender"].value_counts()
gender_counts_df = gender_counts.rename_axis("Gender").reset_index(name="Count")
gender_percents_df = ((gender_counts/player_count)*100).rename_axis("Gender").reset_index(name="Percent")

#format percents purty
gender_percents_df["Percent"] = gender_percents_df["Percent"].astype(float).map("{:,.2f}%".format)
#CHECKPOINT gender_counts_df; gender_percents_df

#create and display the merged df
gender_merge_df = gender_counts_df.merge(gender_percents_df, on="Gender")
gender_merge_df

Unnamed: 0,Gender,Count,Percent
0,Male,484,84.03%
1,Female,81,14.06%
2,Other / Non-Disclosed,11,1.91%
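
The count and percent columns can also be built in one pass, which avoids the merge. This is only a sketch on hypothetical toy data (`toy`, `players` and `gender_summary` are illustrative names, not part of the notebook); it relies on `value_counts(normalize=True)` to produce the shares.

```python
import pandas as pd

# Hypothetical toy data: one row per purchase, "a1" buys twice
toy = pd.DataFrame({
    "SN": ["a1", "a1", "b2", "c3", "d4"],
    "Gender": ["Male", "Male", "Male", "Female", "Male"],
})

# One row per player, so repeat purchasers are counted once
players = toy.drop_duplicates(subset="SN")

counts = players["Gender"].value_counts()
percents = players["Gender"].value_counts(normalize=True) * 100

# Both Series share the same index, so they line up without a merge
gender_summary = pd.DataFrame({"Count": counts, "Percent": percents})
gender_summary.index.name = "Gender"
gender_summary = gender_summary.reset_index()

# Format for display only after the arithmetic is done
gender_summary["Percent"] = gender_summary["Percent"].map("{:.2f}%".format)
print(gender_summary)
```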



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, etc. by gender


* For normalized purchasing, divide total purchase value by purchase count, by gender


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [13]:
#create the grouping based on gender for performing calculations
gender_group = purchase_data.groupby("Gender")

#create the average price dataframe
avg_price = pd.DataFrame(gender_group["Price"].mean().astype(float).map("${:,.2f}".format))
avg_price_df = avg_price.rename(columns={"Price":"Average Purchase Price"})
#CHECKPOINT: avg_price_df

#create the total price dataframe
total_price = pd.DataFrame(gender_group["Price"].sum().astype(float).map("${:,.2f}".format))
total_price_df = total_price.rename(columns={"Price":"Total Purchase Value"})
#CHECKPOINT: total_price_df

#More merging... this time with nesting tho
gender_analysis_df = total_price_df.merge(gender_merge_df.merge(avg_price_df, on="Gender"),on="Gender")
gender_analysis_df = gender_analysis_df[["Gender","Count","Percent","Average Purchase Price","Total Purchase Value"]]

gender_analysis_df

Unnamed: 0,Gender,Count,Percent,Average Purchase Price,Total Purchase Value
0,Female,81,14.06%,$3.20,$361.94
1,Male,484,84.03%,$3.02,"$1,967.64"
2,Other / Non-Disclosed,11,1.91%,$3.35,$50.19
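
The table above has no purchase count per gender. A possible way to get all the per-gender figures from a single `groupby` is named aggregation, sketched here on hypothetical toy data. The column names and the extra `avg_total_per_person` column are assumptions for illustration; the instruction's own normalization (total value divided by purchase count) is the `avg_price` column.

```python
import pandas as pd

# Hypothetical toy data: player "a1" makes two purchases
toy = pd.DataFrame({
    "SN": ["a1", "a1", "b2", "c3"],
    "Gender": ["Male", "Male", "Male", "Female"],
    "Price": [1.00, 2.00, 3.00, 4.00],
})

# Named aggregation: each keyword becomes an output column
by_gender = toy.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_price=("Price", "mean"),      # total value / purchase count
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Assumed extra metric: total value divided by distinct players
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]
print(by_gender)
```

Keeping the values numeric until the very end means they can still be sorted or combined; the `"${:,.2f}"` formatting can be applied to a display copy afterwards.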


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [15]:
# Establish bins for ages
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

#categorize the players using pd.cut()
agebin_df = pd.cut(purchase_demographics["Age"], age_bins, labels=group_names)
#CHECKPOINT: agebin_df

#calculations for the bins
age_bin_counts = agebin_df.value_counts()
#CHECKPOINT: age_bin_counts

age_bin_percents = (agebin_df.value_counts()/player_count)*100
#CHECKPOINT: age_bin_percents

age_df = pd.DataFrame({"Count":age_bin_counts,"Percent":age_bin_percents}).rename_axis("Age Group").reset_index()
age_df["Percent"] = age_df["Percent"].astype(float).map("{:,.2f}%".format)
#age_df.sort_values("Age Group")
age_df

Unnamed: 0,Age Group,Count,Percent
0,20-24,258,44.79%
1,15-19,107,18.58%
2,25-29,77,13.37%
3,30-34,52,9.03%
4,35-39,31,5.38%
5,10-14,22,3.82%
6,<10,17,2.95%
7,40+,12,2.08%
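
The `9.90`-style edges work because ages are whole numbers. An alternative, sketched below on a hypothetical list of ages, is to use integer edges with `right=False` so that each bin is closed on the left, and to call `sort_index()` so the table follows the age order rather than the count order. The upper edge of `200` is an arbitrary assumption.

```python
import pandas as pd

# Hypothetical ages chosen to sit on the bin boundaries
ages = pd.Series([7, 9, 10, 14, 15, 24, 25, 39, 40, 45])

# Left-closed bins: [0, 10), [10, 15), ..., [40, 200)
age_bins = [0, 10, 15, 20, 25, 30, 35, 40, 200]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

age_group = pd.cut(ages, bins=age_bins, labels=group_names, right=False)

# The labels are an ordered categorical, so sort_index() follows bin order
counts = age_group.value_counts().sort_index()
print(counts)
```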


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, etc. in the table below


* Calculate Normalized Purchasing


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [16]:
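#add the age bin to the original data set
#NOTE: agebin_df is indexed by each player's first purchase only (it comes from purchase_demographics),
#so index alignment leaves repeat purchases without an Age Group and the groupby below skips them
purchase_data["Age Group"] = agebin_df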
#CHECKPOINT: purchase_data.head()

#create the grouping based on age for performing calculations
age_group = purchase_data.groupby("Age Group")

#create the average price dataframe
avg_age_price = pd.DataFrame(age_group["Price"].mean().astype(float).map("${:,.2f}".format))
avg_age_price_df = avg_age_price.rename(columns={"Price":"Average Purchase Price"})
#CHECKPOINT: avg_age_price_df

#create the total price dataframe
total_age_price = pd.DataFrame(age_group["Price"].sum().astype(float).map("${:,.2f}".format))
total_age_price_df = total_age_price.rename(columns={"Price":"Total Purchase Value"})
#CHECKPOINT: total_age_price_df

#Moooooorrrre merging... 
age_analysis_df = avg_age_price_df.merge(total_age_price_df.merge(age_df, on="Age Group"), on="Age Group")
#reorder the columns
age_analysis_df = age_analysis_df[["Age Group","Count","Percent","Average Purchase Price","Total Purchase Value"]]
age_analysis_df

Unnamed: 0,Age Group,Count,Percent,Average Purchase Price,Total Purchase Value
0,<10,17,2.95%,$3.39,$57.63
1,10-14,22,3.82%,$3.07,$67.64
2,15-19,107,18.58%,$3.10,$331.88
3,20-24,258,44.79%,$3.06,$790.39
4,25-29,77,13.37%,$2.91,$223.93
5,30-34,52,9.03%,$2.92,$151.92
6,35-39,31,5.38%,$3.51,$108.81
7,40+,12,2.08%,$3.04,$36.45
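
The totals in the table above add up to $1,768.65, while the overall revenue reported earlier is $2,379.77. That gap is consistent with repeat purchases being left out when the de-duplicated bins are assigned back by index. The sketch below reproduces the effect on hypothetical toy data and shows one way around it, binning every purchase row directly; `aligned` and `per_row` are illustrative names.

```python
import pandas as pd

# Hypothetical toy data: player "a1" makes two purchases
toy = pd.DataFrame({
    "SN": ["a1", "a1", "b2"],
    "Age": [22, 22, 31],
    "Price": [1.00, 2.00, 4.00],
})
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Bins built from de-duplicated players keep only the first-purchase row labels
players = toy.drop_duplicates(subset="SN")
player_bins = pd.cut(players["Age"], age_bins, labels=group_names)

# Assigning by index: row 1 (the repeat purchase) gets NaN
aligned = toy.assign(**{"Age Group": player_bins})

# Binning every purchase row: no row is left without a group
per_row = toy.assign(**{"Age Group": pd.cut(toy["Age"], age_bins, labels=group_names)})

total_aligned = aligned.groupby("Age Group", observed=True)["Price"].sum()
total_per_row = per_row.groupby("Age Group", observed=True)["Price"].sum()
print(total_aligned)
print(total_per_row)
```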


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [17]:
#create the grouping based on screen names for performing calculations
sn_group = purchase_data.groupby("SN")

#create a unique list of screen names? tbd
#sn = pd.DataFrame(sn_group["SN"].unique())

#create the value count price dataframe
sn_count = purchase_data["SN"].value_counts().rename_axis("SN").to_frame("Purchase Count")
#CHECKPOINT: sn_count

#create the average price dataframe
avg_sn_price = pd.DataFrame(sn_group["Price"].mean().astype(float).map("${:,.2f}".format))
avg_sn_price_df = avg_sn_price.rename(columns={"Price":"Average Purchase Price"})
#CHECKPOINT: avg_sn_price_df

#create the total price dataframe
total_sn_price = pd.DataFrame(sn_group["Price"].sum().astype(float).map("${:,.2f}".format))
total_sn_price_df = total_sn_price.rename(columns={"Price":"Total Purchase Value"})
#CHECKPOINT: total_sn_price_df

#value_counts() orders sn_count by purchase count (descending) and merge keeps the left order
sn_df = sn_count.merge(avg_sn_price_df, on="SN")
sn_df = sn_df.merge(total_sn_price_df, on="SN")

#display
sn_df.head()

SN,Purchase Count,Average Purchase Price,Total Purchase Value
Lisosia93,5,$3.79,$18.96
Idastidru52,4,$3.86,$15.45
Iral74,4,$3.40,$13.62
Strithenu87,3,$3.39,$10.18
Yathecal82,3,$2.07,$6.22
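
The preview above is ordered by purchase count, whereas the instructions ask for total purchase value in descending order. Because the dollar columns are already strings at that point, they cannot be sorted numerically. A minimal sketch of one alternative, on hypothetical toy data, is to aggregate, sort while the values are still numbers, and format a display copy last.

```python
import pandas as pd

# Hypothetical toy data: "a1" buys most often, "b2" spends the most
toy = pd.DataFrame({
    "SN": ["a1", "a1", "a1", "b2", "c3", "c3"],
    "Price": [1.00, 1.00, 1.00, 9.50, 2.00, 3.00],
})

top_spenders = (
    toy.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    .sort_values("Total Purchase Value", ascending=False)  # numeric sort
)

# Format a copy for display; top_spenders itself stays numeric
display_df = top_spenders.copy()
for col in ["Average Purchase Price", "Total Purchase Value"]:
    display_df[col] = display_df[col].map("${:,.2f}".format)
print(display_df.head())
```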


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [27]:
#pull three specific columns and index them by Item ID (rows are still one per purchase, not unique)
purchase_data_sub = pd.DataFrame(purchase_data.loc[:,["Item ID","Item Name","Price"]]).set_index('Item ID')

#group by Item ID and Item Name
item_group = purchase_data_sub.groupby(["Item ID","Item Name"])

#calculate some stuff: purchase count and total purchase value
item_count = pd.DataFrame(item_group.count()).rename(columns={"Price":"Purchase Count"})
item_total = pd.DataFrame(item_group.sum()).rename(columns={"Price":"Total Purchase Value"})

#merge to create the final data frame
item_summary = (purchase_data_sub.merge(item_total.merge(item_count, on="Item ID"), on="Item ID")).drop_duplicates()

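#formatting
#NOTE: sort_values returns a new DataFrame; the result is not assigned, so item_summary keeps its original order
item_summary.sort_values(by=['Purchase Count'])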
item_summary['Price'] = item_summary['Price'].astype(float).map("${:,.2f}".format)
item_summary['Total Purchase Value'] = item_summary['Total Purchase Value'].astype(float).map("${:,.2f}".format)
item_summary = item_summary.reset_index()
item_summary = item_summary[['Item ID','Item Name','Purchase Count','Price','Total Purchase Value']]

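#preview of the final data frame (not rendered, because it is not the last expression in the cell)
item_summary.head()

#purchase counts in descending order (this is the table rendered below)
item_count.sort_values(by=['Purchase Count'],ascending=False)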
#item_total.sort_values(by=['Total Purchase Value'],ascending=False)

Item ID,Item Name,Purchase Count
178,"Oathbreaker, Last Hope of the Breaking Storm",12
145,Fiery Glass Crusader,9
108,"Extraction, Quickblade Of Trembling Hands",9
82,Nirvana,9
19,"Pursuit, Cudgel of Necromancy",8
103,Singed Scalpel,8
75,Brutality Ivory Warmace,8
72,Winter's Bite,8
60,Wolf,8
59,"Lightning, Etcher of the King",8
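
The table above shows purchase counts only. A possible way to get purchase count, item price and total purchase value in a single frame, without the nested merges, is sketched below on hypothetical toy data. It assumes each item has one fixed price, so `"first"` is enough to recover it.

```python
import pandas as pd

# Hypothetical toy data: one row per purchase
toy = pd.DataFrame({
    "Item ID": [1, 1, 2, 3, 3, 3],
    "Item Name": ["Wolf", "Wolf", "Fury", "Nirvana", "Nirvana", "Nirvana"],
    "Price": [3.50, 3.50, 1.44, 4.90, 4.90, 4.90],
})

item_summary = (
    toy.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "first", "sum"])   # "first" assumes a fixed price per item
    .rename(columns={
        "count": "Purchase Count",
        "first": "Item Price",
        "sum": "Total Purchase Value",
    })
)

# Assign the sorted result; sort_values does not modify item_summary
most_popular = item_summary.sort_values("Purchase Count", ascending=False)
print(most_popular.head())
```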


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [43]:
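#NOTE: sort_values returns a new DataFrame and 'Total Purchase Value' already holds formatted strings,
#so the preview below is still in the original (unsorted) order
item_summary.sort_values(by=['Total Purchase Value'],ascending=False)
item_summary.head()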

Unnamed: 0,Item ID,Item Name,Purchase Count,Price,Total Purchase Value
0,108,"Extraction, Quickblade Of Trembling Hands",9,$3.53,$31.77
1,143,Frenzied Scimitar,6,$1.56,$9.36
2,92,Final Critic,8,$4.88,$39.04
3,100,Blindscythe,5,$3.27,$16.35
4,131,Fury,5,$1.44,$7.20
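
The preview above is not in descending order of total purchase value. Two things appear to be at play: `sort_values` returns a new frame rather than sorting in place, and sorting `"$"`-formatted strings compares characters, not amounts. The sketch below illustrates both on hypothetical values (`summary` is an illustrative frame, not the notebook's `item_summary`).

```python
import pandas as pd

# Hypothetical numeric summary
summary = pd.DataFrame({
    "Item Name": ["Wolf", "Fury", "Nirvana"],
    "Total Purchase Value": [9.36, 50.76, 101.00],
})

# Calling sort_values without assigning leaves the original untouched
summary.sort_values("Total Purchase Value", ascending=False)
unchanged_order = summary["Item Name"].tolist()

# Assigning the result gives the numerically sorted frame
most_profitable = summary.sort_values("Total Purchase Value", ascending=False)

# Sorting formatted strings is character-by-character: "$9.36" > "$50.76" > "$101.00"
as_text = summary.assign(
    **{"Total Purchase Value": summary["Total Purchase Value"].map("${:,.2f}".format)}
)
text_sorted = as_text.sort_values("Total Purchase Value", ascending=False)

print(most_profitable)
print(text_sorted)
```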
