# House of Pymoli
* **Author:** Felipe Murillo
* **Date:** April 17, 2020

* **Description:** Analyze the data for the most recent fantasy game: Heroes of Pymoli

* **Required Input Files:** /Resources/purchase_data.csv


## **>> Observable Trends**

1. Male players make the clear majority of purchases (652 of 780); however, female players spend more per capita ($4.47 vs. $4.07)

2. The largest group of players is in their early 20s (20-24 years old), at about 45% of all players

3. Players aged 35-39 spend the most per capita ($4.76), followed by kids under 10 years old ($4.54)

4. No individual player spent more than $20 in total on video game items; the top spender's purchases add up to $18.96

5. The item "Final Critic" is both the most popular item (13 purchases) and the highest-grossing item ($59.99 in total purchase value)

***

## **>> Configure Dependencies**

In [2]:
import pandas as pd
import os

## **>> Import Data**

In [3]:
csvfile = os.path.join("..","Resources/purchase_data.csv")

purchase_data = pd.read_csv(csvfile)

purchase_data.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


## **>> Player Count**

In [107]:
# Obtain list of unique players
unique_players = purchase_data["SN"].value_counts()
# Count # of unique players
no_unique_players = unique_players.count()
print(f"\033[1m Total number of players: \033[94m{no_unique_players}")

Total number of players: 576
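The unique-player count above chains `value_counts()` and `count()`. A minimal sketch of a more direct route, using `Series.nunique()`, is shown here; the rows in `sample` are made-up illustration data and are not taken from `purchase_data.csv`.

```python
import pandas as pd

# Hypothetical sample rows (not from purchase_data.csv), same column names
sample = pd.DataFrame({
    "SN": ["Ana1", "Bo2", "Ana1", "Cy3"],
    "Price": [1.00, 2.50, 3.00, 4.00],
})

# nunique() counts distinct non-null values in one step
player_count = sample["SN"].nunique()

# Same result as the value_counts().count() pattern used in the notebook
assert player_count == sample["SN"].value_counts().count()

print(f"Total number of players: {player_count}")
```

Both forms should give the same number; `nunique()` simply avoids building the intermediate frequency table.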


## **>> Purchasing Analysis Total**

In [108]:
# Calculate number of unique items
no_unique_items = purchase_data["Item Name"].value_counts().count()

In [109]:
# Calculate the average purchase price
avg_purchase_price = purchase_data["Price"].mean()

In [110]:
# Calculate number of unique purchases
no_purchases = purchase_data["Purchase ID"].value_counts().count()

In [111]:
# Calculate the total revenue of the game
total_revenue = purchase_data["Price"].sum()

### **<u>Purchasing Summary</u>**

In [112]:
# Compile summary table into a dataframe
purchase_summary_df = pd.DataFrame({
                                "No. of Unique Items":[no_unique_items],
                                "Avg. Price":[avg_purchase_price],
                                "No. of Purchases":[no_purchases],
                                "Total Revenue":[total_revenue]
                                })

# Format the summary table
purchase_summary_df["Avg. Price"] = purchase_summary_df["Avg. Price"].map("${:.2f}".format)
purchase_summary_df["Total Revenue"] = purchase_summary_df["Total Revenue"].map("${:,.2f}".format)

# Display the formatted summary table
purchase_summary_df.style.hide_index().set_properties(**{'text-align': 'center'})

| No. of Unique Items | Avg. Price | No. of Purchases | Total Revenue |
|---|---|---|---|
| 179 | $3.05 | 780 | $2,379.77 |


## **>> Gender Demographics**

### **Percentage and Count of Male Players**

In [148]:
# Filter purchase data for male players
malePlayers = purchase_data.loc[purchase_data["Gender"] == 'Male']

# Find no. of unique male players
no_male_players = malePlayers["SN"].value_counts().count()

# Calculate percentage of male players
malePercent = no_male_players/no_unique_players*100

### **Percentage and Count of Female Players**

In [149]:
# Filter purchase data for female players
femalePlayers = purchase_data.loc[purchase_data["Gender"] == 'Female']

# Find no. of unique female players
no_female_players = femalePlayers["SN"].value_counts().count()

# Calculate percentage of female players
femalePercent = no_female_players/no_unique_players*100

### **Percentage and Count of Other / Non-Disclosed**

In [156]:
# Filter purchase data for other / non-disclosed players
otherPlayers = purchase_data.loc[purchase_data["Gender"] == 'Other / Non-Disclosed']

# Find no. of unique other / non-disclosed players
no_other_players = otherPlayers["SN"].value_counts().count()

# Calculate percentage of other / non-disclosed players
otherPercent = no_other_players/no_unique_players*100

### **<u>Gender Demographics Summary</u>**

In [152]:
# Compile gender demographics summary table into a dataframe
gender_summary_df = pd.DataFrame({
                                "Gender":['Male','Female','Other / Non-Disclosed'],
                                "Total Count":[no_male_players,no_female_players,no_other_players],
                                "Percentage of Players":[malePercent, femalePercent,otherPercent],
                                })

# Format the summary table
gender_summary_df["Total Count"] = gender_summary_df["Total Count"].map("{:,}".format)
gender_summary_df["Percentage of Players"] = gender_summary_df["Percentage of Players"].map("{:.2f}%".format)


# Display the formatted summary table
gender_summary_df.set_index("Gender").style.set_properties(**{'text-align': 'center'})

| Gender | Total Count | Percentage of Players |
|---|---|---|
| Male | 484 | 84.03% |
| Female | 81 | 14.06% |
| Other / Non-Disclosed | 11 | 1.91% |
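The three per-gender filters above repeat the same steps. As a hedged alternative, the sketch below gets the counts and percentages in one pass with `groupby` and `nunique`. The `sample` frame is invented for illustration, and `gender_demo` is a hypothetical name that does not appear in the notebook.

```python
import pandas as pd

# Hypothetical sample rows (not from purchase_data.csv)
sample = pd.DataFrame({
    "SN": ["Ana1", "Bo2", "Ana1", "Cy3", "Di4"],
    "Gender": ["Female", "Male", "Female", "Male", "Other / Non-Disclosed"],
})

# Unique players per gender; repeat purchases by one player count once
gender_counts = sample.groupby("Gender")["SN"].nunique()

# Share of all unique players, in percent
gender_demo = pd.DataFrame({
    "Total Count": gender_counts,
    "Percentage of Players": gender_counts / sample["SN"].nunique() * 100,
}).sort_values("Total Count", ascending=False)

print(gender_demo)
```

This picks up every value in the `Gender` column automatically, so no category has to be hard-coded.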


## **>> Purchasing Analysis (Gender)**

### **Purchase Count**

In [162]:
# Count purchases made by different gender groups
mPurchCnt = malePlayers["Purchase ID"].value_counts().count()
fPurchCnt = femalePlayers["Purchase ID"].value_counts().count()
oPurchCnt = otherPlayers["Purchase ID"].value_counts().count()

### **Average Purchase Price**

In [163]:
# Average purchase price by gender
mAvgPrice = malePlayers["Price"].mean()
fAvgPrice = femalePlayers["Price"].mean()
oAvgPrice = otherPlayers["Price"].mean()

### **Total Purchase Value**

In [164]:
# Total purchase price by gender
mTotPrice = malePlayers["Price"].sum()
fTotPrice = femalePlayers["Price"].sum()
oTotPrice = otherPlayers["Price"].sum()

### **Average Purchase Total per Person by Gender**

In [165]:
# Avg purchase total per person by gender
mAvgPP = mTotPrice/no_male_players
fAvgPP = fTotPrice/no_female_players
oAvgPP = oTotPrice/no_other_players

### **<u>Purchasing Analysis (Gender) Summary</u>**

In [174]:
# Compile purchasing analysis (gender) summary table into a dataframe
p_gender_summary_df = pd.DataFrame({
                                "Gender":['Male','Female','Other / Non-Disclosed'],
                                "Purchase Count":[mPurchCnt,fPurchCnt,oPurchCnt],
                                "Avg. Purchase Price":[mAvgPrice, fAvgPrice,oAvgPrice],
                                "Total Purchase Value":[mTotPrice, fTotPrice,oTotPrice],
                                "Avg. Purchase Total per Person":[mAvgPP, fAvgPP,oAvgPP]
                                })

# Format the summary table
p_gender_summary_df["Avg. Purchase Price"] = p_gender_summary_df["Avg. Purchase Price"].map("${:,.2f}".format)
p_gender_summary_df["Total Purchase Value"] = p_gender_summary_df["Total Purchase Value"].map("${:,.2f}".format)
p_gender_summary_df["Avg. Purchase Total per Person"] = p_gender_summary_df["Avg. Purchase Total per Person"].map("${:,.2f}".format)

# Display the formatted summary table
p_gender_summary_df.sort_values("Gender").set_index("Gender").style.set_properties(**{'text-align': 'center'})

| Gender | Purchase Count | Avg. Purchase Price | Total Purchase Value | Avg. Purchase Total per Person |
|---|---|---|---|---|
| Female | 113 | $3.20 | $361.94 | $4.47 |
| Male | 652 | $3.02 | $1,967.64 | $4.07 |
| Other / Non-Disclosed | 15 | $3.35 | $50.19 | $4.56 |
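The four per-gender metrics above can also be computed together. The following is a minimal sketch using pandas named aggregation; the `sample` data and the snake_case column names are assumptions made for this example, not part of the notebook.

```python
import pandas as pd

# Hypothetical sample rows (not from purchase_data.csv)
sample = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["Ana1", "Bo2", "Ana1", "Cy3", "Di4"],
    "Gender": ["Female", "Male", "Female", "Male", "Male"],
    "Price": [2.00, 1.00, 4.00, 3.00, 5.00],
})

# One pass over the data: each keyword is output_name=(column, function)
by_gender = sample.groupby("Gender").agg(
    purchase_count=("Purchase ID", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Per-capita spend divides by unique players, not by purchases
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]

print(by_gender)
```

Dividing by unique players rather than by purchase count is what separates "Avg. Purchase Total per Person" from the average purchase price.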


## **>> Age Demographics**

In [231]:
# Create an age bin
bins = [0,9,14,19,24,29,34,39,purchase_data["Age"].max()]

# Create bin labels for age groups
grp_labels = ["<10","10-14","15-19","20-24","25-29","30-34","35-39","40+"]

# Add age group column to purchase data
purchase_data["Age Group"] = pd.cut(purchase_data["Age"],bins,labels=grp_labels)

# Count number of customers in each age group
grpCnt = []
# Cycle through each age group
for grp in grp_labels:
    # Make temporary groups
    grpFilter = pd.DataFrame(purchase_data.loc[purchase_data["Age Group"]==grp])
    # Count unique individuals within group
    grpCnt.append(grpFilter["SN"].value_counts().count())

# Display age demographics
age_summary_df = pd.DataFrame({
                                "Age Group":grp_labels,
                                "Total Count":grpCnt,
                                "Percentage of Players":grpCnt/no_unique_players*100,
                                })

# Format Summary Table
age_summary_df["Percentage of Players"] = age_summary_df["Percentage of Players"].map("{:.2f}%".format)

# Display age demographics summary table
age_summary_df.set_index("Age Group").style.set_properties(**{'text-align': 'center'})

| Age Group | Total Count | Percentage of Players |
|---|---|---|
| <10 | 17 | 2.95% |
| 10-14 | 22 | 3.82% |
| 15-19 | 107 | 18.58% |
| 20-24 | 258 | 44.79% |
| 25-29 | 77 | 13.37% |
| 30-34 | 52 | 9.03% |
| 35-39 | 31 | 5.38% |
| 40+ | 12 | 2.08% |
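The age groups above depend on how `pd.cut` treats bin edges. By default each bin is open on the left and closed on the right, so an age of 9 falls in "<10" and an age of 10 falls in "10-14". The sketch below checks this on a handful of made-up ages (the `ages` values are illustrative only).

```python
import pandas as pd

# Hypothetical ages chosen to sit on or next to the bin edges
ages = pd.Series([7, 9, 10, 24, 25, 45])

# Same edges and labels as the notebook; the last edge is the maximum age
bins = [0, 9, 14, 19, 24, 29, 34, 39, ages.max()]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Default right=True: intervals are (0, 9], (9, 14], ..., (39, max]
groups = pd.cut(ages, bins, labels=labels)

print(groups.tolist())
# → ['<10', '<10', '10-14', '20-24', '25-29', '40+']
```

Using the maximum age as the last edge assumes the oldest player is over 39. If that were not the case, the edges would no longer be strictly increasing and `pd.cut` would raise an error.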


## **>> Purchasing Analysis (Age)**

In [246]:
# Initialize lists containing purchasing data points
age_purchCnt = []
age_avgPurchPrice = []
age_totPurchVal = []
age_avgPurchTot = []

# Cycle through each age group
for grp in grp_labels:
    
    # Create an age group filter
    grpFilter = pd.DataFrame(purchase_data.loc[purchase_data["Age Group"]==grp])
    
    # Calculate purchase count for each age group
    age_purchCnt.append(grpFilter["Purchase ID"].value_counts().count())
                        
    # Calculate average purchase price for each age group
    age_avgPurchPrice.append(grpFilter["Price"].mean())
                             
    # Calculate total purchase value for each age group
    age_totPurchVal.append(grpFilter["Price"].sum())
    
                             
# Calculate average purchase total per person for each age group
age_avgPurchTot = [x/y for x,y in zip(age_totPurchVal, age_summary_df["Total Count"])]
                             

### **<u>Purchasing Analysis (Age) Summary</u>**

In [419]:
# Compile purchasing analysis (age) summary table into a dataframe
p_age_summary_df = pd.DataFrame({
                                "Age Ranges":grp_labels,
                                "Purchase Count":age_purchCnt,
                                "Avg. Purchase Price":age_avgPurchPrice,
                                "Total Purchase Value":age_totPurchVal,
                                "Avg. Purchase Total per Person":age_avgPurchTot
                                })

# Format the summary table
p_age_summary_df["Avg. Purchase Price"] = p_age_summary_df["Avg. Purchase Price"].map("${:,.2f}".format)
p_age_summary_df["Total Purchase Value"] = p_age_summary_df["Total Purchase Value"].map("${:,.2f}".format)
p_age_summary_df["Avg. Purchase Total per Person"] = p_age_summary_df["Avg. Purchase Total per Person"].map("${:,.2f}".format)

# Display the formatted summary table
p_age_summary_df.set_index("Age Ranges").style.set_properties(**{'text-align': 'center'})

| Age Ranges | Purchase Count | Avg. Purchase Price | Total Purchase Value | Avg. Purchase Total per Person |
|---|---|---|---|---|
| <10 | 23 | $3.35 | $77.13 | $4.54 |
| 10-14 | 28 | $2.96 | $82.78 | $3.76 |
| 15-19 | 136 | $3.04 | $412.89 | $3.86 |
| 20-24 | 365 | $3.05 | $1,114.06 | $4.32 |
| 25-29 | 101 | $2.90 | $293.00 | $3.81 |
| 30-34 | 73 | $2.93 | $214.00 | $4.12 |
| 35-39 | 41 | $3.60 | $147.67 | $4.76 |
| 40+ | 13 | $2.94 | $38.24 | $3.19 |


## **>> Top Spenders**

In [38]:
# Obtain list of unique spenders
spenders = purchase_data.groupby(["SN"])

# Calculate total purchases made by indiv. spenders
spenders_totPurch = spenders["Price"].sum()

# Count number of purchases made by indiv. spenders
spenders_purchCnt = spenders["Purchase ID"].count()

# Calculate average purchase price
spenders_avgPurch = [i/j for i,j in zip(spenders_totPurch, spenders_purchCnt)]

### **<u>Top Spenders Summary</u>**

In [39]:
# Compile top spender summary table into a dataframe
spender_summary_df = pd.DataFrame({
                                "Purchase Count":spenders_purchCnt,
                                "Avg. Purchase Price":spenders_avgPurch,
                                "Total Purchase Value":spenders_totPurch
                                })

# Sort from top spender to lowest spender
spender_summary_sorted = spender_summary_df.sort_values(by="Total Purchase Value",ascending = False)

# Format the sorted summary table
spender_summary_sorted["Avg. Purchase Price"] = spender_summary_df["Avg. Purchase Price"].map("${:,.2f}".format)
spender_summary_sorted["Total Purchase Value"] = spender_summary_df["Total Purchase Value"].map("${:,.2f}".format)


# Display the formatted summary table; provide top 5 spenders
spender_summary_sorted.head().style.set_properties(**{'text-align': 'center'})

| SN | Purchase Count | Avg. Purchase Price | Total Purchase Value |
|---|---|---|---|
| Lisosia93 | 5 | $3.79 | $18.96 |
| Idastidru52 | 4 | $3.86 | $15.45 |
| Chamjask73 | 3 | $4.61 | $13.83 |
| Iral74 | 4 | $3.40 | $13.62 |
| Iskadarya95 | 3 | $4.37 | $13.10 |
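The top-spender ranking above sorts the whole table and then takes the head. A hedged alternative is `DataFrame.nlargest`, which returns the top rows directly while the values are still numeric. The `sample` rows and the snake_case column names below are invented for this sketch.

```python
import pandas as pd

# Hypothetical sample rows (not from purchase_data.csv)
sample = pd.DataFrame({
    "SN": ["Ana1", "Bo2", "Ana1", "Cy3", "Cy3"],
    "Price": [2.00, 1.00, 4.00, 3.00, 3.50],
})

# Per-player purchase count, average price, and total spend
spenders = sample.groupby("SN")["Price"].agg(
    purchase_count="count",
    avg_price="mean",
    total_value="sum",
)

# Top 2 players by total spend (the notebook shows the top 5)
top_spenders = spenders.nlargest(2, "total_value")

print(top_spenders)
```

Ranking has to happen before any `"${:,.2f}"` formatting, because formatted values are strings and would sort alphabetically rather than numerically.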


## **>> Most Popular Items**

In [5]:
# Create item dataframe
items_db = purchase_data[["Item ID","Item Name","Price"]]

# Group by Item ID and Item Name
grouped_items_db = items_db.groupby(by=["Item ID","Item Name"])

# Calculate number of items in group
noItems = grouped_items_db["Item Name"].count()

# Add total purchase price for each item
totalItemPrice = grouped_items_db["Price"].sum()

# Average price paid per item (total purchase value / purchase count)
priceItem = [x/y for x,y in zip(totalItemPrice,noItems)]

### **<u>Most Popular Items Summary</u>**

In [6]:
# Compile most popular item summary table into a dataframe
item_summary_df = pd.DataFrame({
                                "Purchase Count":noItems,
                                "Item Price":priceItem,
                                "Total Purchase Value":totalItemPrice
                                })

# Sort from most purchased to least purchased item
item_summary_sorted = item_summary_df.sort_values(by="Purchase Count",ascending = False)

# Format the sorted summary table
item_summary_sorted["Item Price"] = item_summary_df["Item Price"].map("${:,.2f}".format)
item_summary_sorted["Total Purchase Value"] = item_summary_df["Total Purchase Value"].map("${:,.2f}".format)


# Display the formatted summary table; provide top 5 most popular items
item_summary_sorted.head().style.set_properties(**{'text-align': 'center'})

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 92 | Final Critic | 13 | $4.61 | $59.99 |
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $4.23 | $50.76 |
| 145 | Fiery Glass Crusader | 9 | $4.58 | $41.22 |
| 132 | Persuasion | 9 | $3.22 | $28.99 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | $3.53 | $31.77 |


## **>> Most Profitable Items**

### **<u>Most Profitable Items Summary</u>**

In [7]:
# Sort from highest to lowest total purchase value
item_summary_profit = item_summary_df.sort_values(by="Total Purchase Value",ascending = False)

# Format the sorted summary table
item_summary_profit["Item Price"] = item_summary_df["Item Price"].map("${:,.2f}".format)
item_summary_profit["Total Purchase Value"] = item_summary_df["Total Purchase Value"].map("${:,.2f}".format)


# Display the formatted summary table; provide top 5 most profitable items
item_summary_profit.head().style.set_properties(**{'text-align': 'center'})

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 92 | Final Critic | 13 | $4.61 | $59.99 |
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $4.23 | $50.76 |
| 82 | Nirvana | 9 | $4.90 | $44.10 |
| 145 | Fiery Glass Crusader | 9 | $4.58 | $41.22 |
| 103 | Singed Scalpel | 8 | $4.35 | $34.80 |
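The "Item Price" column in the item tables is computed as total purchase value divided by purchase count, so it is an average whenever an item was sold at more than one price. For example, the data preview lists a Final Critic purchase at 4.88, while the summary shows $4.61. A minimal sketch for flagging such items follows, using made-up rows and hypothetical item names.

```python
import pandas as pd

# Hypothetical sample rows (not from purchase_data.csv)
sample = pd.DataFrame({
    "Item ID": [1, 1, 2],
    "Item Name": ["Sword", "Sword", "Bow"],
    "Price": [4.88, 4.19, 2.00],
})

# Number of distinct prices recorded for each item
price_variants = sample.groupby(["Item ID", "Item Name"])["Price"].nunique()

# Items sold at more than one price; their "Item Price" is an average
varying = price_variants[price_variants > 1]

print(varying)
```

If this check returns any rows on the real data, the "Item Price" column would be more accurately read as an average price.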


***