### Heroes Of Pymoli Data Analysis
* The input used for this analysis includes 780 purchase records. Within the dataset there are 576 unique player IDs (screen names).

* The players fall into three gender groups: Male, Female, and Other / Non-Disclosed. The vast majority are male (84.03%), female players make up 14.06%, and the remaining 1.91% are Other / Non-Disclosed.

* The largest age group is 20-24 (44.8%), followed by 15-19 (18.60%) and 25-29 (13.4%).
* The best-selling item is Final Critic, with 13 purchases totaling $59.99.
* Male players account for most purchases (652 of 780), consistent with their 84% share of the player base.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# Raw data file
file_to_load = "Resources/purchase_data.csv"

# Read purchasing file and store into pandas data frame
df = pd.read_csv(file_to_load)
df.head(10)

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |
| 5 | 5 | Yalae81 | 22 | Male | 81 | Dreamkiss | 3.61 |
| 6 | 6 | Itheria73 | 36 | Male | 169 | Interrogator, Blood Blade of the Queen | 2.18 |
| 7 | 7 | Iskjaskst81 | 20 | Male | 162 | Abyssal Shard | 2.67 |
| 8 | 8 | Undjask33 | 22 | Male | 21 | Souleater | 1.1 |
| 9 | 9 | Chanosian48 | 35 | Other / Non-Disclosed | 136 | Ghastly Adamantite Protector | 3.58 |


## Player Count

* Display the total number of players

In [2]:
# Copy the player columns into a new data frame (duplicate players are removed in the next step)
players_df = df[["SN", "Gender", "Age"]]
players_df.head()

| | SN | Gender | Age |
|---|---|---|---|
| 0 | Lisim78 | Male | 20 |
| 1 | Lisovynya38 | Male | 40 |
| 2 | Ithergue48 | Male | 24 |
| 3 | Chamassasya86 | Male | 24 |
| 4 | Iskosia90 | Male | 23 |


In [3]:
# Keep one row per screen name (SN) to identify the unique players
unique_player_id = players_df.drop_duplicates('SN')
unique_player_id.head()

| | SN | Gender | Age |
|---|---|---|---|
| 0 | Lisim78 | Male | 20 |
| 1 | Lisovynya38 | Male | 40 |
| 2 | Ithergue48 | Male | 24 |
| 3 | Chamassasya86 | Male | 24 |
| 4 | Iskosia90 | Male | 23 |


In [4]:
# Count the unique players: count() returns non-null values per column, so 'SN' holds the player count
tot_players = unique_player_id.count()

print(tot_players['SN'])

576
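The same count can be obtained in one step with `Series.nunique()`. A minimal sketch on a small hypothetical sample (the screen names and prices below are made up, not taken from `purchase_data.csv`):

```python
import pandas as pd

# Hypothetical sample in the shape of purchase_data.csv: five purchases by three players
purchases = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerC"],
    "Price": [3.53, 1.56, 4.88, 3.27, 1.44],
})

# nunique() counts distinct screen names directly, without building a deduplicated frame
total_players = purchases["SN"].nunique()
print(total_players)  # 3
```

On the full file this should agree with `tot_players['SN']` above, since both count distinct `SN` values.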


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
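The cell below reports per-item figures. For the dataset-wide summary the task describes (unique items, average price, number of purchases, total revenue), one possible sketch is shown here. The sample rows are hypothetical and the output column names are illustrative choices:

```python
import pandas as pd

# Hypothetical five-row sample with the relevant purchase_data.csv columns
purchases = pd.DataFrame({
    "Item ID": [108, 143, 92, 108, 92],
    "Price": [3.53, 1.56, 4.88, 3.53, 4.88],
})

# One row of dataset-wide metrics, computed on numeric columns before any formatting
summary = pd.DataFrame({
    "Number of Unique Items": [purchases["Item ID"].nunique()],
    "Average Price": [purchases["Price"].mean()],
    "Number of Purchases": [len(purchases)],
    "Total Revenue": [purchases["Price"].sum()],
})

# Format a display copy so the numeric frame stays usable for later calculations
display_summary = summary.copy()
for col in ["Average Price", "Total Revenue"]:
    display_summary[col] = display_summary[col].map("${:,.2f}".format)
print(display_summary)
```

Keeping the formatted copy separate avoids turning numeric columns into strings that can no longer be sorted or summed.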


In [5]:
# Summary statistics (count, mean, spread, quartiles) for the numeric columns
df.describe()

| | Purchase ID | Age | Item ID | Price |
|---|---|---|---|---|
| count | 780.0 | 780.0 | 780.0 | 780.0 |
| mean | 389.5 | 22.714103 | 92.114103 | 3.050987 |
| std | 225.310896 | 6.659444 | 52.775943 | 1.169549 |
| min | 0.0 | 7.0 | 0.0 | 1.0 |
| 25% | 194.75 | 20.0 | 48.0 | 1.98 |
| 50% | 389.5 | 22.0 | 93.0 | 3.15 |
| 75% | 584.25 | 25.0 | 139.0 | 4.08 |
| max | 779.0 | 45.0 | 183.0 | 4.99 |


In [6]:
# Check to see if there are any rows with missing data
df.count()

Purchase ID    780
SN             780
Age            780
Gender         780
Item ID        780
Item Name      780
Price          780
dtype: int64

In [7]:
# Count the number of purchases of each item
Sum_Item_count = df.groupby('Item Name')['Item ID'].count()

# Sum the sales for each item
Sum_Item_price = df.groupby('Item Name')['Price'].sum()

# Average price paid for each item
Avg_Item_Price = df.groupby('Item Name')['Price'].mean()

# Combine the per-item results into a summary data frame
Summary_df = pd.DataFrame({"Item Count": Sum_Item_count,
                           "Total Sales": Sum_Item_price,
                           "Average Item Price": Avg_Item_Price})

# Format the total sales as currency
Summary_df['Total Sales'] = Summary_df['Total Sales'].astype(float).map(
    "${:,.2f}".format)

# Format the average item price as currency
Summary_df['Average Item Price'] = Summary_df['Average Item Price'].astype(float).map(
    "${:,.2f}".format)

# Show the first five rows of the summary table
Summary_df.head()

| Item Name | Item Count | Total Sales | Average Item Price |
|---|---|---|---|
| Abyssal Shard | 5 | $13.35 | $2.67 |
| Aetherius, Boon of the Blessed | 5 | $16.95 | $3.39 |
| Agatha | 6 | $18.48 | $3.08 |
| Alpha | 3 | $6.21 | $2.07 |
| Alpha, Oath of Zeal | 3 | $12.15 | $4.05 |


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed
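Counts and percentages can also come from `value_counts()` on the deduplicated players, where `normalize=True` returns shares instead of counts. A minimal sketch with hypothetical rows; the output column names are illustrative:

```python
import pandas as pd

# Hypothetical sample: five purchases by four distinct players
purchases = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerD"],
    "Gender": ["Male", "Male", "Male", "Female", "Other / Non-Disclosed"],
})

# Deduplicate on SN first so repeat buyers are counted once
players = purchases.drop_duplicates("SN")
counts = players["Gender"].value_counts()
percentages = players["Gender"].value_counts(normalize=True) * 100

# Both Series share the gender labels as their index, so they align row by row
gender_demographics = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percentages.map("{:.2f}%".format),
})
print(gender_demographics)
```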




In [8]:
# Count the unique players in each gender group
Gender_cnt = unique_player_id.groupby('Gender')['Gender'].count()

# Save the counts into a summary data frame
Gender_summary_df = pd.DataFrame({"Gender Count": Gender_cnt})
Gender_summary_df

| Gender | Gender Count |
|---|---|
| Female | 81 |
| Male | 484 |
| Other / Non-Disclosed | 11 |


In [9]:
# Get percentage for each of the genders
Gender_pct = (Gender_summary_df['Gender Count'] / tot_players['SN']) * 100

# Append the percentages to the summary table. 
Gender_summary_df["Gender Percentage"] = Gender_pct

# Format the percentage field to limit the number of positions after the decimal and add % sign
Gender_summary_df["Gender Percentage"] = Gender_summary_df["Gender Percentage"].map("{:.2f}%".format)

Gender_summary_df.head()

| Gender | Gender Count | Gender Percentage |
|---|---|---|
| Female | 81 | 14.06% |
| Male | 484 | 84.03% |
| Other / Non-Disclosed | 11 | 1.91% |



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
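The cell below covers purchase count, total, and average price by gender, but not the average purchase total per person. One way to get it is to divide each group's total by its number of distinct players. A minimal sketch using pandas named aggregation on hypothetical rows (the aggregate names are illustrative):

```python
import pandas as pd

# Hypothetical sample: two male players (one buys twice) and one female player
purchases = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC"],
    "Gender": ["Male", "Male", "Male", "Female"],
    "Price": [2.00, 4.00, 3.00, 5.00],
})

# Named aggregation computes every per-gender metric in a single groupby
by_gender = purchases.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_purchase_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
    unique_players=("SN", "nunique"),
)

# Average total per person = group total / number of distinct players in the group
by_gender["avg_total_per_person"] = by_gender["total_purchase_value"] / by_gender["unique_players"]
print(by_gender)
```

Dividing by `unique_players` rather than `purchase_count` is the difference between spend per person and spend per purchase.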

In [10]:
# Copy the purchase-related columns into a new data frame
purchase_df = df[["SN", "Price", "Item Name", "Gender", "Age"]]
purchase_df.head()

| | SN | Price | Item Name | Gender | Age |
|---|---|---|---|---|---|
| 0 | Lisim78 | 3.53 | Extraction, Quickblade Of Trembling Hands | Male | 20 |
| 1 | Lisovynya38 | 1.56 | Frenzied Scimitar | Male | 40 |
| 2 | Ithergue48 | 4.88 | Final Critic | Male | 24 |
| 3 | Chamassasya86 | 3.27 | Blindscythe | Male | 24 |
| 4 | Iskosia90 | 1.44 | Fury | Male | 23 |


In [11]:
# Count the number of purchases made by each gender
Gender_purchase_count = df.groupby('Gender')['Item Name'].count()

# Sum the sales for each gender
Gender_Sum_purchases = df.groupby('Gender')['Price'].sum()

# Average purchase price for each gender
Gender_Avg_Purchase = df.groupby('Gender')['Price'].mean()

# Combine the per-gender results into a summary data frame
Gender_Purchase_Summary_df = pd.DataFrame({"Items Purchased by Gender": Gender_purchase_count,
                                           "Total Sales": Gender_Sum_purchases,
                                           "Average Purchase": Gender_Avg_Purchase})

# Format the total sales as currency
Gender_Purchase_Summary_df['Total Sales'] = Gender_Purchase_Summary_df['Total Sales'].astype(float).map(
    "${:,.2f}".format)

# Format the average purchase as currency
Gender_Purchase_Summary_df['Average Purchase'] = Gender_Purchase_Summary_df['Average Purchase'].astype(float).map(
    "${:,.2f}".format)

# Display the summary table (one row per gender)
Gender_Purchase_Summary_df

| Gender | Items Purchased by Gender | Total Sales | Average Purchase |
|---|---|---|---|
| Female | 113 | $361.94 | $3.20 |
| Male | 652 | $1,967.64 | $3.02 |
| Other / Non-Disclosed | 15 | $50.19 | $3.35 |


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table
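The task asks for the number and percentage of players per age group, while the cell below groups purchases. A minimal sketch of the player-level version, reusing the notebook's bin edges and labels but applied to hypothetical sample rows:

```python
import pandas as pd

# Hypothetical sample: five purchases by four distinct players
purchases = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerD"],
    "Age": [17, 17, 22, 24, 41],
})

# Bin edges and labels copied from the notebook's age cell
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Bin unique players, not purchases, so repeat buyers are counted once
players = purchases.drop_duplicates("SN").copy()
players["Age Group"] = pd.cut(players["Age"], age_bins, labels=group_names)

# sort=False keeps the categories in bin order; empty bins appear with a count of 0
age_counts = players["Age Group"].value_counts(sort=False)
age_pct = (age_counts / len(players) * 100).round(2)
age_demographics = pd.DataFrame({"Total Count": age_counts,
                                 "Percentage of Players": age_pct})
print(age_demographics)
```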


In [12]:
# Establish bins for ages (edges at x.90 so whole-number ages land in the intended group)
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Label each purchase with the buyer's age group
df["Age Group"] = pd.cut(df["Age"], age_bins, labels=group_names)

# Note: the figures below count purchases, not unique players.
# observed=False keeps every age bin in the result, including empty ones.
# Count the number of purchases in each age group
Age_purchase_count = df.groupby('Age Group', observed=False)['Item Name'].count()

# Sum the sales for each age group
Age_Sum_purchases = df.groupby('Age Group', observed=False)['Price'].sum()

# Average purchase price for each age group
Age_Avg_Purchase = df.groupby('Age Group', observed=False)['Price'].mean()

# Combine the per-age-group results into a summary data frame
Age_Purchase_Summary_df = pd.DataFrame({"Item Count by Age Group": Age_purchase_count,
                                        "Total Sales": Age_Sum_purchases,
                                        "Average Purchase": Age_Avg_Purchase})

# Format the total sales as currency
Age_Purchase_Summary_df['Total Sales'] = Age_Purchase_Summary_df['Total Sales'].astype(float).map(
    "${:,.2f}".format)

# Format the average purchase as currency
Age_Purchase_Summary_df['Average Purchase'] = Age_Purchase_Summary_df['Average Purchase'].astype(float).map(
    "${:,.2f}".format)

# Display all eight age groups
Age_Purchase_Summary_df

| Age Group | Item Count by Age Group | Total Sales | Average Purchase |
|---|---|---|---|
| <10 | 23 | $77.13 | $3.35 |
| 10-14 | 28 | $82.78 | $2.96 |
| 15-19 | 136 | $412.89 | $3.04 |
| 20-24 | 365 | $1,114.06 | $3.05 |
| 25-29 | 101 | $293.00 | $2.90 |
| 30-34 | 73 | $214.00 | $2.93 |
| 35-39 | 41 | $147.67 | $3.60 |
| 40+ | 13 | $38.24 | $2.94 |


## Purchasing Analysis (User)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below

* Create a summary data frame to hold the results

* Optional: give the displayed data cleaner formatting

* Display the summary data frame
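The three separate `groupby('SN')` calls in the cell below can be collapsed into one with named aggregation. A minimal sketch on hypothetical rows; the aggregate names are illustrative, and the results stay numeric so they can still be sorted:

```python
import pandas as pd

# Hypothetical sample: three players with one to three purchases each
purchases = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerC", "PlayerC"],
    "Price": [3.00, 4.00, 2.50, 1.00, 2.00, 3.00],
})

# One groupby with named aggregation replaces three separate groupby calls
by_user = purchases.groupby("SN").agg(
    purchase_count=("Price", "count"),
    avg_purchase_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
)
print(by_user)
```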

In [13]:
# Count the number of purchases made by each user
User_purchase_count = df.groupby('SN')['Item Name'].count()

# Sum the purchases for each user
User_Sum_purchases = df.groupby('SN')['Price'].sum()

# Average purchase price for each user
User_Avg_Purchase = df.groupby('SN')['Price'].mean()

# Combine the per-user results into a summary data frame
User_Purchase_Summary_df = pd.DataFrame({"Item Count by User": User_purchase_count,
                                         "Total Sales": User_Sum_purchases,
                                         "Average Purchase": User_Avg_Purchase})

# Format the total sales as currency
User_Purchase_Summary_df['Total Sales'] = User_Purchase_Summary_df['Total Sales'].astype(float).map(
    "${:,.2f}".format)

# Format the average purchase as currency
User_Purchase_Summary_df['Average Purchase'] = User_Purchase_Summary_df['Average Purchase'].astype(float).map(
    "${:,.2f}".format)

# Preview the first ten rows of the summary table
User_Purchase_Summary_df.head(10)

| SN | Item Count by User | Total Sales | Average Purchase |
|---|---|---|---|
| Adairialis76 | 1 | $2.28 | $2.28 |
| Adastirin33 | 1 | $4.48 | $4.48 |
| Aeda94 | 1 | $4.91 | $4.91 |
| Aela59 | 1 | $4.32 | $4.32 |
| Aelaria33 | 1 | $1.79 | $1.79 |
| Aelastirin39 | 2 | $7.29 | $3.64 |
| Aelidru27 | 1 | $1.09 | $1.09 |
| Aelin32 | 3 | $8.98 | $2.99 |
| Aelly27 | 2 | $6.79 | $3.39 |
| Aellynun67 | 1 | $3.74 | $3.74 |


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
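Sorting must happen on the numeric totals, before they are formatted as currency strings. Sorting strings compares them character by character, so `"$9.50"` ranks above `"$15.75"`. A minimal sketch with hypothetical player totals that shows the difference:

```python
import pandas as pd

# Hypothetical per-player totals (numeric)
totals = pd.Series({"PlayerA": 9.50, "PlayerB": 15.75, "PlayerC": 12.40}, name="Total Sales")
formatted = totals.map("${:,.2f}".format)

# Sorting the formatted strings compares characters: "$9..." beats "$1..."
wrong_order = formatted.sort_values(ascending=False).index.tolist()

# Sorting the numbers first and formatting afterwards gives the true ranking
right_order = totals.sort_values(ascending=False).index.tolist()
top_spenders = totals.sort_values(ascending=False).map("${:,.2f}".format)

print(wrong_order)  # ['PlayerA', 'PlayerB', 'PlayerC']
print(right_order)  # ['PlayerB', 'PlayerC', 'PlayerA']
```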



In [14]:
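# Sort by the numeric per-user totals from the previous cell. "Total Sales" now holds
# formatted strings such as "$9.50", which would sort character by character
# ("$9.50" would rank above "$10.00").
User_Purchase_Summary_df = User_Purchase_Summary_df.loc[
    User_Sum_purchases.sort_values(ascending=False).index]
User_Purchase_Summary_df.head(10)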

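| SN | Item Count by User | Total Sales | Average Purchase |
|---|---|---|---|
| Haillyrgue51 | 3 | $9.50 | $3.17 |
| Phistym51 | 2 | $9.50 | $4.75 |
| Lamil79 | 2 | $9.29 | $4.64 |
| Aina42 | 3 | $9.22 | $3.07 |
| Saesrideu94 | 2 | $9.18 | $4.59 |
| Arin32 | 2 | $9.09 | $4.54 |
| Rarallo90 | 3 | $9.05 | $3.02 |
| Baelollodeu94 | 2 | $9.03 | $4.51 |
| Aelin32 | 3 | $8.98 | $2.99 |
| Lisopela58 | 3 | $8.86 | $2.95 |

This output was produced by sorting the formatted "Total Sales" strings, which compares characters rather than amounts. Any player who spent $10.00 or more would be ranked below $9.50, so this list may not show the true top spenders.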


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
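The cell below groups by `Item Name` only and leaves the result in alphabetical order. A minimal sketch of the grouping and sort the task describes, keyed on both `Item ID` and `Item Name`, using hypothetical items and illustrative aggregate names:

```python
import pandas as pd

# Hypothetical sample: three items with different purchase counts
purchases = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 3, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Bow", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.50, 1.50, 1.50],
})

# Group on both columns so items that share a name but differ in ID stay separate
items = purchases.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
)

# Most popular first: sort on the numeric purchase count
most_popular = items.sort_values("purchase_count", ascending=False)
print(most_popular)
```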



In [15]:
# Count the number of purchases of each item
Item_purchase_count = df.groupby('Item Name')['Item ID'].count()

# Sum the sales for each item
Item_Sum_purchases = df.groupby('Item Name')['Price'].sum()

# Average price paid for each item
Item_Avg_Purchase = df.groupby('Item Name')['Price'].mean()

# Combine the per-item results into a summary data frame
Item_Purchase_Summary_df = pd.DataFrame({"Item Count by Type": Item_purchase_count,
                                         "Total Sales": Item_Sum_purchases,
                                         "Average Purchase": Item_Avg_Purchase})

# Format the total sales as currency
Item_Purchase_Summary_df['Total Sales'] = Item_Purchase_Summary_df['Total Sales'].astype(float).map(
    "${:,.2f}".format)

# Format the average purchase as currency
Item_Purchase_Summary_df['Average Purchase'] = Item_Purchase_Summary_df['Average Purchase'].astype(float).map(
    "${:,.2f}".format)

# Preview the first ten rows of the summary table
Item_Purchase_Summary_df.head(10)

| Item Name | Item Count by Type | Total Sales | Average Purchase |
|---|---|---|---|
| Abyssal Shard | 5 | $13.35 | $2.67 |
| Aetherius, Boon of the Blessed | 5 | $16.95 | $3.39 |
| Agatha | 6 | $18.48 | $3.08 |
| Alpha | 3 | $6.21 | $2.07 |
| Alpha, Oath of Zeal | 3 | $12.15 | $4.05 |
| Alpha, Reach of Ending Hope | 1 | $3.58 | $3.58 |
| Amnesia | 6 | $13.08 | $2.18 |
| Apocalyptic Battlescythe | 6 | $11.82 | $1.97 |
| Arcane Gem | 3 | $11.37 | $3.79 |
| Avenger | 6 | $20.64 | $3.44 |


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
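Ranking by total purchase value works the same way as ranking by count, as long as the sort runs on the numeric totals and currency formatting is applied afterwards to a display copy. A minimal sketch on hypothetical items (aggregate names are illustrative):

```python
import pandas as pd

# Hypothetical sample: the most-purchased item is not necessarily the most profitable
purchases = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 3, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Bow", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.50, 1.50, 1.50],
})

items = purchases.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
)

# Sort on the numeric total first ...
most_profitable = items.sort_values("total_purchase_value", ascending=False)

# ... then format a copy for display
formatted = most_profitable.copy()
for col in ["item_price", "total_purchase_value"]:
    formatted[col] = formatted[col].map("${:,.2f}".format)
print(formatted)
```

Here Shield (one purchase at $4.50) outranks Bow (two purchases at $1.50), which a count-based sort would order the other way.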



In [16]:
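# Sort by purchase count, which ranks items by popularity.
# ("Total Sales" now holds formatted strings, so it cannot be sorted numerically here.)
Item_Purchase_Summary_df = Item_Purchase_Summary_df.sort_values("Item Count by Type", ascending=False)
Item_Purchase_Summary_df.head()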

| Item Name | Item Count by Type | Total Sales | Average Purchase |
|---|---|---|---|
| Final Critic | 13 | $59.99 | $4.61 |
| Oathbreaker, Last Hope of the Breaking Storm | 12 | $50.76 | $4.23 |
| Persuasion | 9 | $28.99 | $3.22 |
| Nirvana | 9 | $44.10 | $4.90 |
| Extraction, Quickblade Of Trembling Hands | 9 | $31.77 | $3.53 |
