### Heroes Of Pymoli Data Analysis
* Of the 573 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic falls between 20-24 (44.8%), with secondary groups falling between 15-19 (18.6%) and 25-29 (13.4%).

-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.json"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_json(file_to_load)



In [2]:
purchase_data.head()

| | Age | Gender | Item ID | Item Name | Price | SN |
|---|---|---|---|---|---|---|
| 0 | 38 | Male | 165 | Bone Crushing Silver Skewer | 3.37 | Aelalis34 |
| 1 | 21 | Male | 119 | Stormbringer, Dark Blade of Ending Misery | 2.32 | Eolo46 |
| 2 | 34 | Male | 174 | Primitive Blade | 2.46 | Assastnya25 |
| 3 | 21 | Male | 92 | Final Critic | 1.36 | Pheusrical25 |
| 4 | 23 | Male | 63 | Stormfury Mace | 1.27 | Aela59 |


## Player Count

* Display the total number of players


In [3]:
# Keep one row per player: someone who bought several items is counted once
Unique_Players = purchase_data.loc[:, ["Gender", "SN", "Age"]]
Unique_Players = Unique_Players.drop_duplicates()
Num_Players = len(Unique_Players)

Total_Players = {"Total Players": [Num_Players]}
Total = pd.DataFrame(Total_Players)
Total


| | Total Players |
|---|---|
| 0 | 573 |
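
As a cross-check on the count above, the number of players can also be read directly from the `SN` column. This is a minimal sketch that assumes each screen name identifies exactly one player; the `toy` frame is made-up illustration data, not the real purchase file.

```python
import pandas as pd

# Hypothetical toy data: three purchases made by two players
toy = pd.DataFrame({
    "SN": ["Aela59", "Aela59", "Eolo46"],
    "Gender": ["Male", "Male", "Male"],
    "Age": [23, 23, 21],
})

# Approach used in the notebook: drop duplicate (Gender, SN, Age) rows
by_dedup = len(toy[["Gender", "SN", "Age"]].drop_duplicates())

# Alternative: count distinct screen names directly
by_nunique = toy["SN"].nunique()

print(by_dedup, by_nunique)
```

If the two numbers ever disagree on real data, some screen name appears with more than one gender or age, which is worth inspecting before reporting a player count.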


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [4]:

# Basic totals across all purchases
number_of_unique_items = len(purchase_data["Item Name"].unique())
average_price = purchase_data["Price"].mean()
total_purchase = purchase_data["Item ID"].count()
total_revenue = purchase_data["Price"].sum()

# Collect the results in a one-row summary data frame
summary_data = {
    "Number of Unique Items": [number_of_unique_items],
    "Average Price": [average_price],
    "Total Purchase": [total_purchase],
    "Total revenue": [total_revenue],
}
df_total_calc = pd.DataFrame(summary_data)
df_total_calc

| | Number of Unique Items | Average Price | Total Purchase | Total revenue |
|---|---|---|---|---|
| 0 | 179 | 2.931192 | 780 | 2286.33 |
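
The optional formatting step is not applied to the summary above. One possible way to do it is sketched here, using the four values copied from the displayed table rather than the real data file.

```python
import pandas as pd

# Values copied from the summary table above
summary = pd.DataFrame({
    "Number of Unique Items": [179],
    "Average Price": [2.931192],
    "Total Purchase": [780],
    "Total revenue": [2286.33],
})

# Format the currency columns as strings. Do this last: once formatted,
# the columns hold text and can no longer be used in arithmetic.
formatted = summary.copy()
formatted["Average Price"] = formatted["Average Price"].map("${:,.2f}".format)
formatted["Total revenue"] = formatted["Total revenue"].map("${:,.2f}".format)

print(formatted)
```

Working on a copy keeps the numeric `summary` frame available for any later calculation.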


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [5]:
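# Count rows per gender. purchase_data has one row per purchase, so these
# counts are purchases per gender (they sum to 780), not unique players.
gender_group = purchase_data.groupby(["Gender"])
gender_count = gender_group["SN"].count()

# Percentage relative to the 573 unique players. Because the numerator counts
# purchases, the percentages can exceed 100; grouping Unique_Players instead
# would give per-player figures.
each_gender_percent = round((gender_count / Num_Players) * 100, 2)

# Build the summary data frame
gender_df = pd.DataFrame({"Total Count": gender_count, "Percentage of Players": each_gender_percent})
gender_df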




| Gender | Total Count | Percentage of Players |
|---|---|---|
| Female | 136 | 23.73 |
| Male | 633 | 110.47 |
| Other / Non-Disclosed | 11 | 1.92 |
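
The percentages above add up to more than 100 because the counts are purchases while the denominator is unique players. A per-player version could look like the sketch below. It assumes a player's gender is the same on all of their rows, and the `toy` frame is hypothetical data used only to keep the example self-contained.

```python
import pandas as pd

# Hypothetical toy data: four purchases made by three players
toy = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3"],
    "Gender": ["Male", "Male", "Female", "Male"],
})

# One row per player, then count players (not purchases) per gender
players = toy.drop_duplicates(subset="SN")
gender_count = players["Gender"].value_counts()
gender_pct = (gender_count / len(players) * 100).round(2)

gender_demo = pd.DataFrame({
    "Total Count": gender_count,
    "Percentage of Players": gender_pct,
})
print(gender_demo)
```

With numerator and denominator both measured in players, the percentage column sums to 100 apart from rounding.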



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [6]:
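# Group every purchase row by gender
gen_groupby = purchase_data.groupby(["Gender"])

# Number of purchases per gender
pur_gen_count = gen_groupby["Price"].count()

# Average price per purchase
avr_price = round(gen_groupby["Price"].mean(), 2)

# Total purchase value per gender
ttl_val = round(gen_groupby["Price"].sum(), 2)

# gender_df["Total Count"] holds purchase counts, so this ratio repeats the
# average purchase price; dividing by the number of unique players per gender
# would give a true per-person figure.
each_person = ttl_val / gender_df["Total Count"]

pur_df = pd.DataFrame({"Purchase Count": pur_gen_count, "Average Purchase Price": avr_price, "Total Purchase Value": ttl_val,
                       "Avg Total Purchase per Person": each_person})
pur_df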


| Gender | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase per Person |
|---|---|---|---|---|
| Female | 136 | 2.82 | 382.91 | 2.815515 |
| Male | 633 | 2.95 | 1867.68 | 2.950521 |
| Other / Non-Disclosed | 11 | 3.25 | 35.74 | 3.249091 |
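
In the table above, "Avg Total Purchase per Person" matches "Average Purchase Price" because both divide by the number of purchases. The sketch below shows one way to separate the two by also counting distinct players per gender. The `toy` data and the snake_case column names are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data: player A1 buys twice, B2 and C3 buy once each
toy = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3"],
    "Gender": ["Male", "Male", "Female", "Male"],
    "Price": [1.00, 3.00, 2.00, 4.00],
})

# Named aggregation: purchases, revenue and distinct players per gender
by_gender = toy.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Per-purchase average divides by purchases; per-person divides by players
by_gender["avg_price"] = by_gender["total_value"] / by_gender["purchase_count"]
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]

print(by_gender)
```

The two averages differ only for groups where at least one player made more than one purchase, as the Male row of the toy data illustrates.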


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [7]:
# Create bins and labels for the age groups
ply_d = purchase_data.loc[:, ["Gender", "SN", "Age"]]

bins = [0, 9.9, 14.9, 19.9, 24.9, 29.9, 34.9, 39.9, 1000]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
ply_d["Age Ranges"] = pd.cut(ply_d["Age"], bins, labels=labels)

# ply_d still has one row per purchase (duplicates are not dropped), so these
# counts are purchases per age group. Dividing by Num_Players therefore gives
# percentages that sum to more than 100.
ttl_age = ply_d["Age Ranges"].value_counts()
per_age = round((ttl_age / Num_Players * 100), 2)

# Build the summary data frame and format the percentage column
dem_age = pd.DataFrame({"Total Count": ttl_age, "Percentage of Players": per_age})
dem_age["Percentage of Players"] = dem_age["Percentage of Players"].map("{:.2f}%".format)
dem_age = dem_age.sort_index()
dem_age

| | Total Count | Percentage of Players |
|---|---|---|
| <10 | 28 | 4.89% |
| 10-14 | 35 | 6.11% |
| 15-19 | 133 | 23.21% |
| 20-24 | 336 | 58.64% |
| 25-29 | 125 | 21.82% |
| 30-34 | 64 | 11.17% |
| 35-39 | 42 | 7.33% |
| 40+ | 17 | 2.97% |
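
The counts above add up to 780, the number of purchases, because the rows are binned before duplicate players are removed. The sketch below removes duplicates first and uses left-closed integer bin edges with `right=False`, which avoids the `9.9`-style offsets. The `toy` data and the upper edge of 200 are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data: player A1 appears twice, the others once
toy = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4"],
    "Age": [9, 9, 10, 24, 40],
})

# One row per player before binning
players = toy.drop_duplicates(subset="SN")

# Left-closed edges: [0, 10), [10, 15), ..., [40, 200)
bins = [0, 10, 15, 20, 25, 30, 35, 40, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
age_group = pd.cut(players["Age"], bins=bins, labels=labels, right=False)

# Count players per group in label order, then convert to percentages
age_count = age_group.value_counts().sort_index()
age_pct = (age_count / len(players) * 100).round(2)

print(pd.DataFrame({"Total Count": age_count, "Percentage of Players": age_pct}))
```

Because `pd.cut` returns a categorical, `value_counts()` also lists groups with no players, and `sort_index()` orders the rows by label order rather than alphabetically.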


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [8]:
# Create bins and labels for the age groups
bins = [0, 9.9, 14.9, 19.9, 24.9, 29.9, 34.9, 39.9, 1000]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
purchase_data["Age Ranges"] = pd.cut(purchase_data["Age"], bins, labels=labels)

# observed=False keeps every age group, including any with no purchases
pur_data_groupby = purchase_data.groupby(by="Age Ranges", observed=False)

pur_count = pur_data_groupby["Price"].count()
avg_pur_price = round(pur_data_groupby["Price"].mean(), 2)
total_pur = round(pur_data_groupby["Price"].sum(), 2)
# Dividing by the purchase count repeats the average purchase price; a true
# per-person figure would divide by the number of unique players in each group.
avg_pur_per = round(total_pur / pur_count, 2)

pur_analysis = pd.DataFrame({"Purchase Count": pur_count, "Average Purchase Price": avg_pur_price, "Total Purchase Value": total_pur,
                             "Avg Total Purchase per Person": avg_pur_per})

# Format the currency columns for display
for col in ["Average Purchase Price", "Total Purchase Value", "Avg Total Purchase per Person"]:
    pur_analysis[col] = pur_analysis[col].map("${:.2f}".format)

pur_analysis = pur_analysis.sort_index()
pur_analysis


| Age Ranges | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase per Person |
|---|---|---|---|---|
| <10 | 28 | $2.98 | $83.46 | $2.98 |
| 10-14 | 35 | $2.77 | $96.95 | $2.77 |
| 15-19 | 133 | $2.91 | $386.42 | $2.91 |
| 20-24 | 336 | $2.91 | $978.77 | $2.91 |
| 25-29 | 125 | $2.96 | $370.33 | $2.96 |
| 30-34 | 64 | $3.08 | $197.25 | $3.08 |
| 35-39 | 42 | $2.84 | $119.40 | $2.84 |
| 40+ | 17 | $3.16 | $53.75 | $3.16 |
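
The last two columns above are identical because both divide the total by the purchase count. One way to obtain a per-person figure by age group is sketched below, binning the purchases and then counting distinct screen names within each bin. The `toy` data, the bin edges, and the snake_case column names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data: A1 (age 22) buys twice, B2 and C3 buy once each
toy = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3"],
    "Age": [22, 22, 23, 31],
    "Price": [1.00, 3.00, 2.00, 4.00],
})

bins = [0, 10, 15, 20, 25, 30, 35, 40, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
toy["Age Ranges"] = pd.cut(toy["Age"], bins=bins, labels=labels, right=False)

# observed=True drops age groups that have no purchases in the toy data
by_age = toy.groupby("Age Ranges", observed=True).agg(
    purchase_count=("Price", "count"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)
by_age["avg_total_per_person"] = by_age["total_value"] / by_age["players"]

print(by_age)
```

Passing `observed=False` instead would keep the empty age groups as rows, which may be preferable when the table should always list all eight ranges.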


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [9]:
# Select the Price column before aggregating so that non-numeric columns
# (Gender, Item Name, Age Ranges) are never summed or averaged
price_by_user = purchase_data.groupby("SN")["Price"]

ttl_user = price_by_user.sum()

avg_user = round(price_by_user.mean(), 2)

count_user = price_by_user.count()

top_spend = pd.DataFrame({"Total Purchase Value": ttl_user, "Item Price": avg_user, "Purchase Count": count_user})

top_spend["Item Price"] = top_spend["Item Price"].map("${:.2f}".format)

top_spend.sort_values("Total Purchase Value", ascending=False).head(10)


| SN | Total Purchase Value | Item Price | Purchase Count |
|---|---|---|---|
| Undirrala66 | 17.06 | $3.41 | 5 |
| Saedue76 | 13.56 | $3.39 | 4 |
| Mindimnya67 | 12.74 | $3.18 | 4 |
| Haellysu29 | 12.73 | $4.24 | 3 |
| Eoda93 | 11.58 | $3.86 | 3 |
| Isursti83 | 11.05 | $3.68 | 3 |
| Isurria36 | 11.01 | $3.67 | 3 |
| Eusri70 | 10.55 | $3.52 | 3 |
| Aerithllora36 | 10.45 | $3.48 | 3 |
| Yasriphos60 | 10.4 | $3.47 | 3 |
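
The three per-player statistics can also be computed in a single pass over the grouped `Price` column. The sketch below is one possible arrangement on hypothetical `toy` data; the column label "Average Purchase Price" is an assumption and differs from the "Item Price" label used in the table above.

```python
import pandas as pd

# Hypothetical toy data: three players with different buying patterns
toy = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "C3", "C3"],
    "Price": [1.00, 3.00, 5.00, 1.00, 1.00, 1.00],
})

# Aggregate count, mean and sum in one call, then give the columns
# report-style names
spenders = (
    toy.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
)

# Sort while the values are still numeric, then take the top rows
top = spenders.sort_values("Total Purchase Value", ascending=False).head(2)
print(top)
```

Sorting before any currency formatting matters: once a column holds strings such as `"$10.40"`, sorting compares text rather than amounts.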


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [10]:
pop_item_group = purchase_data.groupby(["Item ID", "Item Name"])
pop_pur_count = pop_item_group["Price"].count()
pop_price = pop_item_group["Price"].mean()
tot_price = pop_item_group["Price"].sum()

df_pop_item = pd.DataFrame({"Purchase Count": pop_pur_count, "Total Purchase Value": tot_price, "Item Price": pop_price})
df_pop_item["Item Price"] = df_pop_item["Item Price"].map("${:.2f}".format)
df_pop_item.sort_values("Purchase Count", ascending = False).head()

| Item ID | Item Name | Purchase Count | Total Purchase Value | Item Price |
|---|---|---|---|---|
| 39 | Betrayal, Whisper of Grieving Widows | 11 | 25.85 | $2.35 |
| 84 | Arcane Gem | 11 | 24.53 | $2.23 |
| 31 | Trickster | 9 | 18.63 | $2.07 |
| 175 | Woeful Adamantite Claymore | 9 | 11.16 | $1.24 |
| 13 | Serenity | 9 | 13.41 | $1.49 |
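
Several items in the table above share the same purchase count, so their relative order depends on how the sort handles ties. The sketch below breaks ties by total purchase value, using a second sort key; the `toy` items are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: Sword and Axe are each bought twice, Bow once
toy = pd.DataFrame({
    "Item ID": [1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Axe", "Axe", "Bow"],
    "Price": [2.00, 2.00, 3.00, 3.00, 9.00],
})

items = toy.groupby(["Item ID", "Item Name"])["Price"].agg(["count", "sum", "mean"])
items.columns = ["Purchase Count", "Total Purchase Value", "Item Price"]

# Primary key: purchase count; tie-breaker: total purchase value
ranked = items.sort_values(
    ["Purchase Count", "Total Purchase Value"], ascending=False
)
print(ranked)
```

Any other tie-breaker, such as item name, works the same way; the point is to make the ordering of tied rows explicit.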


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [11]:
df_pop_item.sort_values("Total Purchase Value", ascending=False).head()

| Item ID | Item Name | Purchase Count | Total Purchase Value | Item Price |
|---|---|---|---|---|
| 34 | Retribution Axe | 9 | 37.26 | $4.14 |
| 115 | Spectral Diamond Doomblade | 7 | 29.75 | $4.25 |
| 32 | Orenmir | 6 | 29.7 | $4.95 |
| 103 | Singed Scalpel | 6 | 29.22 | $4.87 |
| 107 | Splitter, Foe Of Subtlety | 8 | 28.88 | $3.61 |
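
As an alternative to `sort_values(...).head()`, pandas offers `DataFrame.nlargest`, which selects the top rows by a numeric column in one call. The sketch below applies it to four rows copied from the two item tables above; it is an illustration on that small excerpt, not a rerun of the full analysis.

```python
import pandas as pd

# Four rows copied from the popularity and profitability tables above
items = pd.DataFrame({
    "Item Name": [
        "Retribution Axe",
        "Spectral Diamond Doomblade",
        "Betrayal, Whisper of Grieving Widows",
        "Arcane Gem",
    ],
    "Purchase Count": [9, 7, 11, 11],
    "Total Purchase Value": [37.26, 29.75, 25.85, 24.53],
})

# Top two items by revenue; the column must still be numeric
top2 = items.nlargest(2, "Total Purchase Value")
print(top2)
```

The excerpt also shows that the most purchased items are not necessarily the most profitable ones: both items with 11 purchases rank below the two higher-priced items by total value.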
