### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).

-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

## Player Count

* Display the total number of players


In [2]:
# Number of unique players, identified by screen name (SN)
total_players = purchase_data["SN"].nunique()

# Summary DataFrame
summary_df_players = pd.DataFrame({"Total Players": [total_players]})
summary_df_players

| | Total Players |
|---|---|
| 0 | 576 |


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
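
The optional formatting step can be sketched as follows. This is only an illustration: the `summary` frame and its values are hypothetical stand-ins, not results from the purchase data. It formats a display-only copy so the underlying numbers remain usable for further calculations.

```python
import pandas as pd

# Hypothetical totals standing in for the notebook's real results.
summary = pd.DataFrame({
    "Number of Unique Items": [3],
    "Average Price": [2.5],
    "Total Revenue": [1234.5],
})

# Format a copy: the formatted columns become strings, so keeping the
# original frame numeric avoids surprises in later arithmetic.
display_summary = summary.copy()
for column in ["Average Price", "Total Revenue"]:
    display_summary[column] = display_summary[column].map("${:,.2f}".format)

print(display_summary)
```

Formatting last, and on a copy, is a design choice: once a column holds strings such as `$2.50`, sums and sorts on it no longer behave numerically.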


In [3]:
purchase_summary = purchase_data.describe()
purchase_summary

| | Purchase ID | Age | Item ID | Price |
|---|---|---|---|---|
| count | 780.0 | 780.0 | 780.0 | 780.0 |
| mean | 389.5 | 22.714103 | 91.755128 | 3.050987 |
| std | 225.310896 | 6.659444 | 52.697702 | 1.169549 |
| min | 0.0 | 7.0 | 0.0 | 1.0 |
| 25% | 194.75 | 20.0 | 47.75 | 1.98 |
| 50% | 389.5 | 22.0 | 92.0 | 3.15 |
| 75% | 584.25 | 25.0 | 138.0 | 4.08 |
| max | 779.0 | 45.0 | 183.0 | 4.99 |


In [4]:
# Pull the price statistics from the describe() output
price_max = purchase_summary.loc["max", "Price"]
price_min = purchase_summary.loc["min", "Price"]
price_avg = purchase_summary.loc["mean", "Price"]

# describe() has no total, so sum the Price column directly
price_sum = purchase_data["Price"].sum()

# Summary DataFrame
summary_df_price = pd.DataFrame({"Price": ["Maximum", "Minimum", "Average", "Sum"],
                                 "Amount ($)": [price_max, price_min, price_avg, price_sum]})
summary_df_price.set_index("Price", inplace=True)
summary_df_price

| Price | Amount ($) |
|---|---|
| Maximum | 4.99 |
| Minimum | 1.0 |
| Average | 3.050987 |
| Sum | 2379.77 |


In [5]:
# Total items purchased (one row per purchase)
items_total = purchase_data["Item ID"].count()

# Number of unique items
items_unique = purchase_data["Item ID"].nunique()

# Summary DataFrame
summary_df_items = pd.DataFrame({"Items": ["Unique", "Total"],
                                 "Count": [items_unique, items_total]})
summary_df_items.set_index("Items", inplace=True)
summary_df_items

| Items | Count |
|---|---|
| Unique | 179 |
| Total | 780 |


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed
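
Because the purchase log holds one row per purchase, a player who bought several items appears several times. One way to count each player once is sketched below. The `purchases` frame and its screen names are hypothetical; only the `SN` and `Gender` column names come from the dataset.

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase, so players can repeat.
purchases = pd.DataFrame({
    "SN": ["Ana1", "Ana1", "Bo22", "Cy33", "Dee4"],
    "Gender": ["Female", "Female", "Male", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are not counted twice.
players = purchases.drop_duplicates(subset="SN")

# Count players per gender, then express each count as a share of all players.
counts = players["Gender"].value_counts()
percentages = (counts / counts.sum() * 100).round(2)

gender_summary = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percentages,
})
print(gender_summary)
```

Looking the results up by label (for example `gender_summary.loc["Male"]`) rather than by position keeps the code correct even if the category order changes.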




In [6]:
# Group by gender
gender = purchase_data.groupby("Gender")

# Unique players per gender
gender_count = gender["SN"].nunique()

# Look up each group by label rather than by position
female_count = gender_count["Female"]
male_count = gender_count["Male"]
other_count = gender_count["Other / Non-Disclosed"]

# Share of all players per gender
percent_gender = (gender_count / total_players) * 100

female_percent = percent_gender["Female"]
male_percent = percent_gender["Male"]
other_percent = percent_gender["Other / Non-Disclosed"]

# Summary DataFrame
summary_df_gender = pd.DataFrame({"Gender": ["Female", "Male", "Other/Non-Disclosed"],
                                  "Count": [female_count, male_count, other_count],
                                  "Percentages %": [female_percent, male_percent, other_percent]})
summary_df_gender.set_index("Gender", inplace=True)
summary_df_gender

| Gender | Count | Percentages % |
|---|---|---|
| Female | 81 | 14.0625 |
| Male | 484 | 84.027778 |
| Other/Non-Disclosed | 11 | 1.909722 |



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
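
The instructions mention an average purchase total per person, which requires dividing each group's revenue by its number of unique players rather than by its number of purchases. A minimal sketch, assuming a hypothetical `purchases` frame with the dataset's `SN`, `Gender`, and `Price` columns (the result column names are illustrative):

```python
import pandas as pd

# Hypothetical purchase log; Ana1 buys twice.
purchases = pd.DataFrame({
    "SN": ["Ana1", "Ana1", "Bo22", "Cy33"],
    "Gender": ["Female", "Female", "Male", "Male"],
    "Price": [1.00, 3.00, 2.00, 4.00],
})

# Named aggregation: each keyword becomes an output column,
# built from a (source column, aggregation) pair.
by_gender = purchases.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_purchase_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
    unique_players=("SN", "nunique"),
)

# Per-person average: total revenue divided by unique players in the group.
by_gender["avg_total_per_person"] = (
    by_gender["total_purchase_value"] / by_gender["unique_players"]
)
print(by_gender)
```

In this toy data the Female group has an average purchase price of 2.00 but an average total per person of 4.00, because one player made both purchases.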

In [7]:
# Select the Price column before aggregating so non-numeric columns are skipped
price_by_gender = purchase_data.groupby("Gender")["Price"]

# Average purchase price by gender
purchase_avg = price_by_gender.mean()

# Minimum purchase price by gender
purchase_min = price_by_gender.min()

# Maximum purchase price by gender
purchase_max = price_by_gender.max()

# Total purchase value by gender
purchase_sum = price_by_gender.sum()

summary_df_purchase = pd.DataFrame({"Average": purchase_avg, "Minimum": purchase_min, "Maximum": purchase_max, "Sum": purchase_sum})
summary_df_purchase

| Gender | Average | Minimum | Maximum | Sum |
|---|---|---|---|---|
| Female | 3.203009 | 1.0 | 4.9 | 361.94 |
| Male | 3.017853 | 1.0 | 4.99 | 1967.64 |
| Other / Non-Disclosed | 3.346 | 1.33 | 4.75 | 50.19 |


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table
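
When using `pd.cut()`, labels are matched to intervals by position, so `n + 1` bin edges need exactly `n` labels, and each label should describe the interval it is paired with. The sketch below uses hypothetical five-year bands and made-up ages; it is an illustration of the technique, not the binning used in this notebook.

```python
import pandas as pd

# Made-up ages for illustration.
ages = pd.Series([7, 10, 14, 15, 19, 20, 24, 25, 39, 40, 45], name="Age")

# With the default right=True, each interval includes its right edge:
# (0, 9], (9, 14], (14, 19], ... so an age of 19 lands in "15-19".
bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

age_group = pd.cut(ages, bins=bins, labels=labels)

# sort=False keeps the bands in their defined order; empty bands show as 0.
counts = age_group.value_counts(sort=False)
percentages = (counts / counts.sum() * 100).round(2)

print(pd.DataFrame({"Total Count": counts, "Percentage": percentages}))
```

Choosing a large final edge (here 200, an arbitrary assumption) makes the last band open-ended in practice, which is what a label such as `40+` implies.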


In [8]:
# Create bins. pd.cut pairs labels with intervals by position, so these edges
# produce [0, 18], (18, 19], (19, 22], (22, 25], and (25, 45].
# NOTE: the labels below do not match those edges; for example, "45+" covers ages 26-45.
bins = [0, 18, 19, 22, 25, 45]
age_labels = ["0-18", "19-22", "23-25", "25-45", "45+"]

# Work on a copy so the new column does not leak into purchase_data
age_df = purchase_data.copy()
age_df["Age Group"] = pd.cut(age_df["Age"], bins, labels=age_labels, include_lowest=True)

# Count purchases per age group and compute each group's share of all purchases
age_counts = age_df.groupby("Age Group", observed=False)["Age"].count()
summary_df_age = pd.DataFrame({"Total": age_counts,
                               "Percentage %": (age_counts / age_counts.sum() * 100).round(2)})
summary_df_age

| Age Group | Total | Percentage % |
|---|---|---|
| 0-18 | 164 | 21.03 |
| 19-22 | 23 | 2.95 |
| 23-25 | 231 | 29.62 |
| 25-45 | 193 | 24.74 |
| 45+ | 169 | 21.67 |


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [9]:
# Select the Price column before aggregating so non-numeric columns are skipped
price_by_age = age_df.groupby("Age Group", observed=False)["Price"]

pa_sum = price_by_age.sum()
pa_max = price_by_age.max()
pa_min = price_by_age.min()
pa_mean = price_by_age.mean()

summary_df_purchase_2 = pd.DataFrame({"Average Purchase ($)": pa_mean, "Minimum Purchase ($)": pa_min, "Maximum Purchase ($)": pa_max, "Total Purchase ($)": pa_sum})
summary_df_purchase_2

| Age Group | Average Purchase ($) | Minimum Purchase ($) | Maximum Purchase ($) | Total Purchase ($) |
|---|---|---|---|---|
| 0-18 | 3.065976 | 1.01 | 4.94 | 502.82 |
| 19-22 | 3.042609 | 1.02 | 4.91 | 69.98 |
| 23-25 | 3.038571 | 1.0 | 4.99 | 701.91 |
| 25-45 | 3.077979 | 1.0 | 4.99 | 594.05 |
| 45+ | 3.023728 | 1.02 | 4.94 | 511.01 |


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
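
The steps above can be sketched in a single chained expression. The `purchases` frame and its values are hypothetical, and the output column names are one possible choice rather than a required layout.

```python
import pandas as pd

# Hypothetical purchase log with repeat buyers.
purchases = pd.DataFrame({
    "SN": ["Ana1", "Ana1", "Bo22", "Cy33", "Cy33", "Cy33"],
    "Price": [4.00, 3.50, 4.99, 1.00, 2.00, 1.50],
})

spenders = (
    purchases.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])           # one column per statistic
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    .sort_values("Total Purchase Value", ascending=False)
)

# Preview the top of the ranking.
print(spenders.head(2))
```

Note that the most frequent buyer (Cy33, three purchases) is not the top spender here, which is why the sort is on total value rather than on count.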



In [10]:
# Total spend per player
spenders = purchase_data.groupby("SN")["Price"].sum()

# Sort by total spend, highest first
top_spenders = spenders.sort_values(ascending=False)

# Summary of the ten highest spenders
summary_top_spenders = pd.DataFrame({"Top Spenders": top_spenders.head(10)})
summary_top_spenders.index.names = ["User"]
summary_top_spenders

| User | Top Spenders |
|---|---|
| Lisosia93 | 18.96 |
| Idastidru52 | 15.45 |
| Chamjask73 | 13.83 |
| Iral74 | 13.62 |
| Iskadarya95 | 13.1 |
| Ilarin91 | 12.7 |
| Ialallo29 | 11.84 |
| Tyidaim51 | 11.83 |
| Lassilsala30 | 11.51 |
| Chadolyla44 | 11.46 |


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
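
A possible shape for this summary is sketched below with a hypothetical `purchases` frame. It assumes each item sells at a single fixed price, so the first observed price is taken as the item price; if prices vary per purchase, a mean would be the safer choice.

```python
import pandas as pd

# Hypothetical purchase log; item names and prices are made up.
purchases = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Axe", "Axe", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.50, 4.50, 3.25],
})

items = (
    purchases[["Item ID", "Item Name", "Price"]]
    .groupby(["Item ID", "Item Name"])["Price"]
    .agg(
        purchase_count="count",
        item_price="first",           # assumes one fixed price per item
        total_purchase_value="sum",
    )
    .sort_values("purchase_count", ascending=False)
)
print(items.head())
```

Grouping by both `Item ID` and `Item Name` keeps the name visible in the result and also separates any items that happen to share a name under different IDs.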



In [11]:
# Purchase count per item
pop_item = purchase_data.groupby(["Item ID", "Item Name"])["Price"].count()

# Sort by purchase count, highest first
pop_item_2 = pop_item.sort_values(ascending=False)

# Summary of the ten most purchased items
summary_pop_item = pd.DataFrame({"Most Popular Item": pop_item_2.head(10)})
summary_pop_item

| Item ID | Item Name | Most Popular Item |
|---|---|---|
| 92 | Final Critic | 13 |
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 |
| 132 | Persuasion | 9 |
| 82 | Nirvana | 9 |
| 145 | Fiery Glass Crusader | 9 |
| 60 | Wolf | 8 |
| 34 | Retribution Axe | 8 |
| 37 | Shadow Strike, Glory of Ending Hope | 8 |
| 59 | Lightning, Etcher of the King | 8 |


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
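
If a per-item summary already exists, re-ranking it by revenue does not require regrouping. The sketch below uses `DataFrame.nlargest` on a hypothetical `item_summary` frame; the item names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical per-item summary, indexed by item name.
item_summary = pd.DataFrame(
    {
        "Purchase Count": [3, 2, 1],
        "Total Purchase Value": [6.00, 9.00, 3.25],
    },
    index=pd.Index(["Sword", "Axe", "Bow"], name="Item Name"),
)

# nlargest returns the top n rows by the given column, already sorted descending.
most_profitable = item_summary.nlargest(2, "Total Purchase Value")
print(most_profitable)
```

In this toy data the most purchased item (Sword) is not the most profitable one (Axe), which is the distinction this section is meant to surface.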



In [12]:
# Total purchase value per item
profit_item = purchase_data.groupby(["Item ID", "Item Name"])["Price"].sum()

# Sort by total purchase value, highest first
profit_item_2 = profit_item.sort_values(ascending=False)

# Summary of the ten most profitable items
summary_profit_item = pd.DataFrame({"Total Purchase Value ($)": profit_item_2.head(10)})
summary_profit_item

| Item ID | Item Name | Total Purchase Value ($) |
|---|---|---|
| 92 | Final Critic | 59.99 |
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 50.76 |
| 82 | Nirvana | 44.1 |
| 145 | Fiery Glass Crusader | 41.22 |
| 103 | Singed Scalpel | 34.8 |
| 59 | Lightning, Etcher of the King | 33.84 |
| 108 | Extraction, Quickblade Of Trembling Hands | 31.77 |
| 78 | Glimmer, Ender of the Moon | 30.8 |
| 72 | Winter's Bite | 30.16 |
| 132 | Persuasion | 28.99 |
