#### Created by: Sabrina Karam

## Heroes of Pymoli Data Analysis

- There are 576 active players in the game. The majority are male (84% of the population), while female players make up approximately 14%.

- The peak age demographic is 20-24 years old (47%), followed by teenagers aged 15-19 (~18%).

In [1]:
#set up the dependencies
import pandas as pd
import numpy as np

In [2]:
#create a file path
heroes_path = "Resources/purchase_data.csv"

In [3]:
#read the file and create a table
purchase_game = pd.read_csv(heroes_path)
purchase_game_df = pd.DataFrame(purchase_game)
purchase_game_df.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


In [4]:
#determine the number of players
sum_of_players = len(purchase_game_df["SN"].unique())
sum_of_players

player_count = pd.DataFrame({"Total Players": sum_of_players}, index = [0])
player_count

Unnamed: 0,Total Players
0,576


### Purchase Analysis

- Ran basic calculations to determine the number of unique items, the average price, and other totals, then collected the results in a DataFrame


- Cleaned up the DataFrame formatting


- Displayed the DataFrame
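
As a minimal sketch of the steps above, the same summary can be built on a small made-up table. `toy_df` and its values are hypothetical and only mirror the column names of `purchase_data.csv`. The dollar formatting is applied to a copy so the numeric values stay usable for later calculations.

```python
import pandas as pd

# Hypothetical toy data (not the real purchase file)
toy_df = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3],
    "SN": ["Ann1", "Bob2", "Ann1", "Cy3"],
    "Item ID": [10, 11, 10, 12],
    "Price": [1.50, 2.00, 1.50, 3.00],
})

# One-row summary holding the raw numeric results
summary = pd.DataFrame({
    "Number of Unique Items": [toy_df["Item ID"].nunique()],
    "Average Price": [toy_df["Price"].mean()],
    "Number of Purchases": [len(toy_df)],
    "Total Revenue": [toy_df["Price"].sum()],
})

# Format a copy for display; the numeric summary is left untouched
display_summary = summary.copy()
for column in ["Average Price", "Total Revenue"]:
    display_summary[column] = display_summary[column].map("${:,.2f}".format)

print(display_summary)
```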

In [5]:
#calculate total number of unique items
unique_items = len(purchase_game_df["Item ID"].unique())
unique_items

179

In [6]:
#calculate the total number of purchases
purchase_total = purchase_game_df["Purchase ID"].count()
purchase_total

780

In [7]:
#calculate the total revenue for game
total_revenue = purchase_game_df["Price"].sum()
total_revenue

2379.77

In [8]:
#calculate the average price
average_prices1 = purchase_game_df["Price"].mean()
average_prices1

3.050987179487176

In [9]:
#create a summary for the data as a frame format
purchase_analysis_df = pd.DataFrame({"Number of Unique Items": unique_items, "Average Prices" : average_prices1, "Number of Purchases" : purchase_total, "Total Revenue" : total_revenue}, index=[0])
purchase_analysis_df["Average Prices"] = purchase_analysis_df["Average Prices"].map("${:,.2f}".format)
purchase_analysis_df["Total Revenue"] = purchase_analysis_df["Total Revenue"].map("${:,.2f}".format)
purchase_analysis_df

#reset the column order for the summary data frame
organize_purchase_analysis_df = purchase_analysis_df[["Number of Unique Items" , "Average Prices" , "Number of Purchases" , "Total Revenue"]]
organize_purchase_analysis_df

Unnamed: 0,Number of Unique Items,Average Prices,Number of Purchases,Total Revenue
0,179,$3.05,780,"$2,379.77"


### GENDER DEMOGRAPHICS

- Ran calculations for the count and percentage of players by gender, then created a DataFrame to hold all results


- Percentage and Count of Male & Female Players


- Percentage and Count of Other/Non-Disclosed Genders


- Display the DataFrame
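
A point worth illustrating for the steps above: the purchase file has one row per purchase, so a player who buys twice appears twice. The sketch below uses a hypothetical `toy_df` to show one way of counting each player once (via `drop_duplicates` on `SN`) and how the percentages overshoot 100% if purchase rows are divided by the player count.

```python
import pandas as pd

# Hypothetical toy data: Ann1 and Cy3 each made more than one purchase
toy_df = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cy3", "Cy3", "Dee4"],
    "Gender": ["Female", "Female", "Male", "Male", "Male",
               "Other / Non-Disclosed"],
})

# Keep one row per player so each player is counted once
players = toy_df.drop_duplicates(subset="SN")
counts = players["Gender"].value_counts()
percentages = counts / counts.sum() * 100

gender_summary = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percentages,
})
print(gender_summary)

# Pitfall: purchase rows divided by the number of players
naive_pct = toy_df["Gender"].value_counts() / len(players) * 100
print(naive_pct.sum())  # exceeds 100 because repeat buyers are double-counted
```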

In [10]:
#calculate gender differences between players
#keep one row per player (SN) so each player is counted once, not once per purchase
unique_players_df = purchase_game_df.drop_duplicates(subset="SN")
gender_counts = unique_players_df["Gender"].value_counts()

gender_percentage = (gender_counts/sum_of_players)*100

#combine the formatted percentages and the counts into one data frame
final_gender_df = pd.DataFrame({"Percentage of Players": gender_percentage.map("{:,.2f}%".format), "Total Counts": gender_counts})
final_gender_df

Unnamed: 0,Percentage of Players,Total Counts
Male,84.03%,484
Female,14.06%,81
Other / Non-Disclosed,1.91%,11


In [None]:
#use the group by function to separate the data into fields with Gender values
gender_purchased_data_df = purchase_game_df.groupby(["Gender"])

#count the number of purchases made by each gender
gender_purchased_data_df["Purchase ID"].count().head(10)

In [None]:
#calculate the total purchase value by gender
total_purchase_value = gender_purchased_data_df["Price"].sum()
total_purchase_value.head()
dlr_total_purchase_value = total_purchase_value.map("${:,.2f}".format)
dlr_total_purchase_value.head()

In [None]:
#calculate the average purchase price by gender
average_purchase_price = gender_purchased_data_df["Price"].mean()
average_purchase_price.head()
dlr_average_purchase_price = average_purchase_price.map("${:,.2f}".format)
dlr_average_purchase_price.head()

In [None]:
#total purchase value divided by the number of unique players (SN) of each gender
#(dividing by purchase count would only repeat the average purchase price)
normalized_totals = total_purchase_value/gender_purchased_data_df["SN"].nunique()
dlr_normalized_totals = normalized_totals.map("${:,.2f}".format)
dlr_normalized_totals.head()

In [None]:
#organize the summary data for genders, and organize all columns in a Data Frame
organize_gender_purchase_data_df = pd.DataFrame(gender_purchased_data_df["Purchase ID"].count())
organize_gender_purchase_data_df["Average Purchase Price"] = dlr_average_purchase_price
organize_gender_purchase_data_df["Total Purchase Values"] = dlr_total_purchase_value
organize_gender_purchase_data_df["Normalized Totals"] = dlr_normalized_totals
organize_gender_purchase_data_df

In [None]:
#calculate the summary of purchases DataFrame and group it by gender and rename columns
summary_gender_data_df = organize_gender_purchase_data_df.rename(columns={"Purchase ID": "Purchase Count"})
summary_gender_data_df

### AGE DEMOGRAPHIC ANALYSIS

- Created bins for the varying age brackets


- Calculate numbers and percentages by varying age groups


- Create the summary DataFrame to hold all results of calculations


- Display the Age Demographics as a table
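
The binning step above can be sketched with `pd.cut`. The ages below are made up, and the integer bin edges and the upper bound of 200 are assumptions chosen for this illustration. With the default `right=True`, each bin includes its right edge, so an age of 19 lands in "15-19" and an age of 20 in "20-24".

```python
import pandas as pd

# Hypothetical ages, one per player
ages = pd.Series([7, 12, 19, 20, 24, 25, 39, 40], name="Age")

# Bin edges: (0, 9], (9, 14], ..., (39, 200]; 200 is an arbitrary upper bound
age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
group_names = ["<10", "10-14", "15-19", "20-24",
               "25-29", "30-34", "35-39", "40+"]

age_groups = pd.cut(ages, bins=age_bins, labels=group_names)

# sort=False keeps the bins in age order instead of ordering by frequency
counts = age_groups.value_counts(sort=False)
percentages = counts / counts.sum() * 100

age_summary = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percentages,
})
print(age_summary)
```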

In [None]:
#create bins where the purchase data will be located
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]

In [None]:
#create names for the eight bins
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
#keep one row per player (SN) so the counts below are players, not purchases
group_by_age_purchase_data_df = purchase_game_df.drop_duplicates(subset="SN").copy()
group_by_age_purchase_data_df["Age Summary"] = pd.cut(group_by_age_purchase_data_df["Age"], age_bins, labels=group_names)
group_by_age_purchase_data_df

In [None]:
#calculate a group using the bins
group_by_age_purchase_data_df = group_by_age_purchase_data_df.groupby("Age Summary")
group_by_age_purchase_data_df.count()

In [None]:
#utilize the new DataFrame
age_summary_df = pd.DataFrame(group_by_age_purchase_data_df.count())
age_summary_df

In [None]:
#calculations on columns of the summary DataFrame
age_summary_df["Purchase ID"] = (age_summary_df["Purchase ID"]/sum_of_players)*100
age_summary_df

In [None]:
#reformat numbers to percentages
age_summary_df["Purchase ID"] = age_summary_df["Purchase ID"].map("{:,.2f}%".format)
age_summary_df

In [None]:
#reformat the table to only include the Purchase ID and SN columns and write into DataFrame
org_age_summary_df = age_summary_df[["Purchase ID", "SN"]]
org_age_summary_df

In [None]:
#rename the columns for age demographics using the .rename command
final_group_summary_df = org_age_summary_df.rename(columns={"Purchase ID":"Percentage of Players", "SN":"Total Count"})
final_group_summary_df

### TOP SPENDERS ANALYSIS

- Run all the basic calculations to create results in a table


- Create a DataFrame for the summary of all results


- Create a cleaner format DataFrame for the summary
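
One detail matters when ranking spenders: sorting has to happen while the totals are still numbers. The sketch below, on a hypothetical `toy_df`, ranks players by numeric total and then shows what happens when the dollar-formatted strings are sorted instead. Text comparison puts "$9.97" ahead of "$18.96".

```python
import pandas as pd

# Hypothetical toy purchases
toy_df = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cy3", "Cy3", "Cy3"],
    "Price": [4.00, 5.97, 18.96, 1.00, 1.00, 1.00],
})

# Per-player purchase count, average price, and total value
spenders = toy_df.groupby("SN")["Price"].agg(["count", "mean", "sum"])
spenders = spenders.rename(columns={
    "count": "Purchase Count",
    "mean": "Average Purchase Price",
    "sum": "Total Purchase Value",
})

# Sort on the numeric totals first, format afterwards
top_spenders = spenders.sort_values("Total Purchase Value", ascending=False)
print(top_spenders)

# Pitfall: sorting the formatted strings compares text, not amounts
as_text = spenders["Total Purchase Value"].map("${:,.2f}".format)
text_sorted = as_text.sort_values(ascending=False)
print(text_sorted)
```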

In [None]:
#start again from the original purchase data before grouping by player ("SN")
original_purchased_data_df = pd.DataFrame(purchase_game)
original_purchased_data_df.head()

In [None]:
#group the "SN"
group_SN_top_df = original_purchased_data_df.groupby("SN")
group_SN_top_df.count()

In [None]:
#start to work with the data created by new DataFrame
analysis_by_spendor_df = pd.DataFrame(group_SN_top_df["Purchase ID"].count())
analysis_by_spendor_df

In [None]:
#get the total purchase value by SN
total_value_purchase_SN = group_SN_top_df["Price"].sum()
total_value_purchase_SN
dlr_total_value_purchase_SN = total_value_purchase_SN.map("${:,.2f}".format)
dlr_total_value_purchase_SN

In [None]:
#calculate the average purchase price by SN
average_purchase_SN = group_SN_top_df["Price"].mean()
average_purchase_SN
dlr_average_purchase_SN = average_purchase_SN.map("${:,.2f}".format)
dlr_average_purchase_SN

In [None]:
#organize the summary Top Spender data and get all columns to organize a new DataFrame with additional columns
analysis_by_spendor_df["Average Purchase Price"] = dlr_average_purchase_SN
analysis_by_spendor_df["Total Purchase Value"] = dlr_total_value_purchase_SN
analysis_by_spendor_df

In [None]:
#summary of the top spender analysis grouped by SN, with renamed columns
summary_SN_purchased_data_df = analysis_by_spendor_df.rename(columns={"Purchase ID":"Purchase Count"})
#"Total Purchase Value" holds formatted strings, so order the rows by the numeric totals instead
top5_spendors_df = summary_SN_purchased_data_df.loc[total_value_purchase_SN.sort_values(ascending=False).index]
top5_spendors_df.head()

In [None]:
#group by item id and item name
group_top_item_df = original_purchased_data_df.groupby(["Item ID", "Item Name"])
group_top_item_df.count()

### MOST PROFITABLE ITEMS

- Sort the table of total purchase values created in the previous step in descending order

- Give the data cleaner formatting

- Display a preview of the newly created DataFrame
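
The most popular item and the most profitable item need not be the same, which is why the table is sorted twice. A minimal sketch on hypothetical data follows. The snake_case column names are illustrative, not the ones used in this notebook.

```python
import pandas as pd

# Hypothetical toy purchases: the potion sells most often, the sword earns most
toy_df = pd.DataFrame({
    "Item ID": [1, 1, 2, 3, 3, 3],
    "Item Name": ["Sword", "Sword", "Shield", "Potion", "Potion", "Potion"],
    "Price": [4.50, 4.50, 7.00, 1.25, 1.25, 1.25],
})

# Named aggregation: one row per (Item ID, Item Name) pair
items = toy_df.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "mean"),
    total_value=("Price", "sum"),
)

most_popular = items.sort_values("purchase_count", ascending=False)
most_profitable = items.sort_values("total_value", ascending=False)

print(most_popular.head())
print(most_profitable.head())
```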

In [None]:
#work with new data in DataFrame
analysis_by_item_df = pd.DataFrame(group_top_item_df["Purchase ID"].count())
analysis_by_item_df

In [None]:
#calculate the new total purchase value by item
total_item_purchase_value = group_top_item_df["Price"].sum()
total_item_purchase_value
dlr_total_item_purchase_value = total_item_purchase_value.map("${:,.2f}".format)
dlr_total_item_purchase_value

In [None]:
#calculate the purchase price by item
item_purchase_price = group_top_item_df["Price"].mean()
item_purchase_price
dlr_item_purchase_price = item_purchase_price.map("${:,.2f}".format)
dlr_item_purchase_price

In [None]:
#re organize the item data into a summary and create columns for a new DataFrame
analysis_by_item_df["Item Price"] = dlr_item_purchase_price
analysis_by_item_df["Total Purchase Value"] = dlr_total_item_purchase_value
analysis_by_item_df

In [None]:
#create a summary of most popular item analysis grouped by item
item_sum_purchased_data_df = analysis_by_item_df.rename(columns={"Purchase ID":"Purchase Count"})
top5_items_df = item_sum_purchased_data_df.sort_values("Purchase Count", ascending=False)
top5_items_df.head()

In [None]:
#create a total purchase price column in number form
analysis_by_item_df["Total Purchase Value"] = total_item_purchase_value
analysis_by_item_df

In [None]:
#rename the Purchase ID column on the numeric table, then sort by total purchase value
top5_items_df = analysis_by_item_df.rename(columns={"Purchase ID":"Purchase Count"}).sort_values("Total Purchase Value", ascending=False)


#format the total purchase price to be in dollar form after sorting
top5_items_df["Total Purchase Value"] = top5_items_df["Total Purchase Value"].map("${:,.2f}".format)
top5_items_df.head()

### CONCLUSIONS:

- Males make up the largest share of the active player population at around 84%, while female players account for around 14%.


- The peak age demographic was 20-24 years old, which made up ~47% of the player base. The next largest group was 15-19 year olds, at around 17%.


- There were 183 unique items offered in the game. The most profitable was "Oathbreaker, Last Hope of the Breaking Storm", which brought in about 51 dollars, followed by "Nirvana" and "Fiery Glass Crusader" at about 44 and 41 dollars respectively. Preferences varied across the 576 players, with no strong link between particular players and particular items.


- The average purchase was around 3 dollars, and the top spenders spent around 3x as much, around 9 dollars, on in-game purchases. Total revenue from all items sold was around 2,400 dollars.