### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic falls between 20 and 29 (58.16%), with secondary groups falling between 10 and 19 (22.40%) and between 30 and 39 (14.41%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [293]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# Load the purchasing file (.csv)
file = "purchase_data.csv"

# Read purchasing file and store into Pandas data frame
purchase_df = pd.read_csv(file)
purchase_df.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


## Player Count

* Display the total number of players


In [129]:
# determine number of players based on unique SN
player_count = len(purchase_df["SN"].unique())

player_count

576

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [127]:
no_items = len(purchase_df["Item ID"].unique()) # number of unique items purchased
no_purchases = purchase_df["Purchase ID"].count() # number of purchases
total_sales = "${:,.2f}".format(purchase_df["Price"].sum()) # total of price
ave_price = "${:.2f}".format(purchase_df["Price"].sum() / no_purchases) # average price per purchase
summary_purchases = pd.DataFrame({"Number of unique items":[no_items],
                        "Number of purchases": [no_purchases],
                        "Total Sales (USD)": [total_sales],
                        "Average Price (USD)": [ave_price]})
summary_purchases

| | Number of unique items | Number of purchases | Total Sales (USD) | Average Price (USD) |
|---|---|---|---|---|
| 0 | 183 | 780 | \$2,379.77 | \$3.05 |


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [394]:
# drop duplicate player entries and group players by gender
gender_groups = purchase_df.drop_duplicates("SN").groupby("Gender")

# extract series for counts and for percentages by gender
gender_count = gender_groups.size()
gender_pct = round(((gender_groups.size() / player_count) * 100),2)

# create a dataframe containing gender counts and proportions
gender_demographics = pd.DataFrame(dict(gender_count = gender_count, gender_pct = gender_pct))

# rename the columns for clarity
gender_demographics = gender_demographics.rename(columns = {"gender_count":"Number of Players",
                                                            "gender_pct":"Proportion (%)"})

gender_demographics

| Gender | Number of Players | Proportion (%) |
|---|---|---|
| Female | 81 | 14.06 |
| Male | 484 | 84.03 |
| Other / Non-Disclosed | 11 | 1.91 |
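
As a cross-check on the grouped computation above, the same counts and percentages can probably be obtained with `Series.value_counts`. The sketch below is a minimal illustration on a small made-up frame (`sample_df` and its rows are hypothetical, not taken from `purchase_data.csv`), so its numbers are illustrative only.

```python
import pandas as pd

# Hypothetical sample data using the same column names as purchase_df
sample_df = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are not counted twice
players = sample_df.drop_duplicates("SN")

# Counts per gender, and the same counts expressed as percentages
counts = players["Gender"].value_counts()
pcts = (players["Gender"].value_counts(normalize=True) * 100).round(2)

gender_check = pd.DataFrame({"Number of Players": counts,
                             "Proportion (%)": pcts})
print(gender_check)
```

Because `normalize=True` divides by the number of rows in `players`, there is no need to pass a separate player count.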



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [401]:
# group players and purchases by gender (no dropped duplicates)
gender_groups2 = purchase_df.groupby("Gender")

# number of purchases per gender
purchase_count = gender_groups2["Purchase ID"].count()

# average number of purchases per player by gender
ave_no_purch = round((purchase_count / gender_count),2)

# total purchase price per gender
tot_purch_gender = gender_groups2["Price"].sum()

# average total purchase price per person by gender
ave_purch_person = round((tot_purch_gender / gender_count),2)

# average purchase price by gender
ave_purch_gender = round((tot_purch_gender / purchase_count),2)

# summary table
gender_purchases = pd.DataFrame(dict(purchase_count = purchase_count, 
                                     ave_no_purch = ave_no_purch,
                                     tot_purch_gender = tot_purch_gender,
                                     ave_purch_gender = ave_purch_gender,
                                     ave_purch_person = ave_purch_person
                                     ))

# rename columns for clarity
gender_purchases = gender_purchases.rename(columns = {"purchase_count": "Total Number of Purchases",
                                                      "ave_no_purch": "Number of Purchases Per Person",
                                                      "tot_purch_gender": "Total Cost of Purchases (USD)",
                                                      "ave_purch_gender": "Average Cost of Purchases (USD)",
                                                      "ave_purch_person": "Ave Total Purchase Per Person (USD)"
                                                      })

gender_purchases

| Gender | Total Number of Purchases | Number of Purchases Per Person | Total Cost of Purchases (USD) | Average Cost of Purchases (USD) | Ave Total Purchase Per Person (USD) |
|---|---|---|---|---|---|
| Female | 113 | 1.4 | 361.94 | 3.2 | 4.47 |
| Male | 652 | 1.35 | 1967.64 | 3.02 | 4.07 |
| Other / Non-Disclosed | 15 | 1.36 | 50.19 | 3.35 | 4.56 |
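
The per-gender figures above are built from several separate series. One possible alternative, sketched here on hypothetical data (`sample_df` and the short column names are made up for illustration), is pandas named aggregation, which computes the purchase count, unique player count, total, and average price in a single `groupby` call.

```python
import pandas as pd

# Hypothetical sample data using the same column names as purchase_df
sample_df = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3"],
    "Gender": ["Male", "Male", "Female", "Male"],
    "Price": [1.00, 3.00, 2.50, 4.00],
})

# One pass over the groups: each keyword names an output column
summary = sample_df.groupby("Gender").agg(
    purchases=("Price", "count"),   # number of purchases
    players=("SN", "nunique"),      # number of distinct players
    total=("Price", "sum"),         # total spent
    avg_price=("Price", "mean"),    # average price per purchase
)

# Average total spent per player, derived from the aggregated columns
summary["avg_total_per_person"] = summary["total"] / summary["players"]
summary = summary.round(2)
print(summary)
```

Counting distinct `SN` values inside the aggregation avoids having to carry a separate de-duplicated series between cells.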


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [373]:
# Establish the bin edges; the open-ended top edge keeps the "40+" label accurate
bins = [0, 10, 20, 30, 40, np.inf]

# Establish the groups
group_names = ["<10", "10–19", "20–29", "30–39","40+"]

# Insert a column with Age Group in the dataframe
purchase_df["Age Grp"] = pd.cut(purchase_df["Age"], bins, right = False, labels = group_names)

# Group the players by Age Group
age_unique = purchase_df.drop_duplicates("SN").groupby("Age Grp")

# Count the number of players falling into each bin
size_age_grps = age_unique["Age"].count()

# Proportion of players per bin
pct_age_grps = round(((size_age_grps / player_count) * 100),2)

# Summary table
age_groups = pd.DataFrame(dict(size_age_grps = size_age_grps, 
                                     pct_age_grps = pct_age_grps
                                     ))
# Rename the columns
age_groups = age_groups.rename(columns = {"size_age_grps": "Number of players",
                                          "pct_age_grps": "Proportion (%)"})
age_groups

| Age Grp | Number of players | Proportion (%) |
|---|---|---|
| <10 | 17 | 2.95 |
| 10–19 | 129 | 22.4 |
| 20–29 | 335 | 58.16 |
| 30–39 | 83 | 14.41 |
| 40+ | 12 | 2.08 |
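
To see how `pd.cut` treats ages that sit exactly on a bin edge, the sketch below bins a handful of made-up ages (the `ages` series is hypothetical). With `right=False` each bin is closed on the left and open on the right, so an age of 10 lands in `10–19` rather than `<10`. The top edge here is `float("inf")` so that the `40+` label has no upper limit.

```python
import pandas as pd

# Hypothetical ages chosen to sit on or near the bin edges
ages = pd.Series([9, 10, 19, 20, 40, 45])

bins = [0, 10, 20, 30, 40, float("inf")]
labels = ["<10", "10–19", "20–29", "30–39", "40+"]

# right=False makes each interval [lower, upper): the lower edge is included
age_grp = pd.cut(ages, bins, right=False, labels=labels)
print(age_grp.tolist())
```

Any age that falls outside the outermost edges is returned as `NaN`, which is why an open-ended top edge is a reasonable choice for a catch-all label.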


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [402]:
# Group the purchases by age
age_purch = purchase_df.groupby("Age Grp")

# Total number of purchases by age
age_purch_count = age_purch["Purchase ID"].count()

# Average number of purchases by age
ave_purch_count = round((age_purch_count / size_age_grps),2)

# Total purchase price by age group
age_total_purch = age_purch["Price"].sum()

# Average purchase price by age group
age_ave_purch = round((age_total_purch / age_purch_count),2)

# Average total purchase per person by age group
age_ave_person_purch = round((age_total_purch / size_age_grps),2)

# Summary table
age_purchases = pd.DataFrame(dict(age_purch_count = age_purch_count,
                                  ave_purch_count = ave_purch_count,
                                  age_total_purch = age_total_purch,
                                  age_ave_purch = age_ave_purch,
                                  age_ave_person_purch = age_ave_person_purch,
                                  ))

# Rename the columns
age_purchases = age_purchases.rename(columns = {"age_purch_count": "Total Number of Purchases",
                                                "ave_purch_count": "Number of Purchases Per Person",
                                                "age_total_purch": "Total Cost of Purchases (USD)",
                                                "age_ave_purch": "Average Cost of Purchases (USD)",
                                                "age_ave_person_purch": "Average Purchase Per Person (USD)"})
age_purchases

| Age Grp | Total Number of Purchases | Number of Purchases Per Person | Total Cost of Purchases (USD) | Average Cost of Purchases (USD) | Average Purchase Per Person (USD) |
|---|---|---|---|---|---|
| <10 | 23 | 1.35 | 77.13 | 3.35 | 4.54 |
| 10–19 | 164 | 1.27 | 495.67 | 3.02 | 3.84 |
| 20–29 | 466 | 1.39 | 1407.06 | 3.02 | 4.2 |
| 30–39 | 114 | 1.37 | 361.67 | 3.17 | 4.36 |
| 40+ | 13 | 1.08 | 38.24 | 2.94 | 3.19 |
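
For the optional "cleaner formatting" step, one simple approach is to map a format string over the currency columns. The sketch below is a minimal example on a two-row frame whose values are copied from the table above; the names `table` and `formatted` are hypothetical.

```python
import pandas as pd

# Two rows copied from the age summary, enough to show the formatting
table = pd.DataFrame(
    {"Total Cost of Purchases (USD)": [77.13, 1407.06],
     "Average Cost of Purchases (USD)": [3.35, 3.02]},
    index=["<10", "20–29"],
)

# Format a copy so the numeric originals stay available for calculations
formatted = table.copy()
for col in formatted.columns:
    formatted[col] = formatted[col].map("${:,.2f}".format)

print(formatted)
```

Formatting turns the values into strings, so it is best applied as the last step, after every numeric calculation and sort has been done.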


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [439]:
# Group the players by SN
big_spender = purchase_df.groupby("SN")

# Calculate cost of purchase for each player
tot_purchase_SN = big_spender["Price"].sum()

# Calculate number of purchases for each player
count_purchase_SN = big_spender["Price"].count()

# Calculate the average cost of purchase for each player
ave_purchase_SN = round((tot_purchase_SN / count_purchase_SN),2)

# Combine in a dataframe
big_spender2 = pd.DataFrame(dict(count_purchase_SN = count_purchase_SN,
                                 tot_purchase_SN = tot_purchase_SN,
                                 ave_purchase_SN = ave_purchase_SN
                                ))

# Rename the column headers
big_spender2 = big_spender2.rename(columns = {"count_purchase_SN": "Number of Purchases",
                                              "tot_purchase_SN":"Total Purchase Cost (USD)",
                                              "ave_purchase_SN": "Ave Purchase Cost (USD)"})

# Sort the players by purchase cost in descending order
big_spender2 = big_spender2.sort_values("Total Purchase Cost (USD)",ascending = False)

# Get the top 10 biggest spenders
top10_spender = big_spender2[0:10]
top10_spender

| SN | Number of Purchases | Total Purchase Cost (USD) | Ave Purchase Cost (USD) |
|---|---|---|---|
| Lisosia93 | 5 | 18.96 | 3.79 |
| Idastidru52 | 4 | 15.45 | 3.86 |
| Chamjask73 | 3 | 13.83 | 4.61 |
| Iral74 | 4 | 13.62 | 3.4 |
| Iskadarya95 | 3 | 13.1 | 4.37 |
| Ilarin91 | 3 | 12.7 | 4.23 |
| Ialallo29 | 3 | 11.84 | 3.95 |
| Tyidaim51 | 3 | 11.83 | 3.94 |
| Lassilsala30 | 3 | 11.51 | 3.84 |
| Chadolyla44 | 3 | 11.46 | 3.82 |
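
A more compact way to get a top-N ranking, shown here as a sketch on hypothetical data (`sample_df` is made up), is to aggregate per player and then call `DataFrame.nlargest`, which sorts and truncates in one step.

```python
import pandas as pd

# Hypothetical purchases using the same column names as purchase_df
sample_df = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "C3", "C3"],
    "Price": [1.00, 3.00, 2.50, 4.00, 1.50, 2.00],
})

# Purchase count, total spend, and average spend for each player
per_player = sample_df.groupby("SN")["Price"].agg(["count", "sum", "mean"]).round(2)

# Keep the two players with the highest total spend, largest first
top2 = per_player.nlargest(2, "sum")
print(top2)
```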


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
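
The steps above could be sketched as follows. This is a minimal example, not the notebook's own solution: the item names and prices come from the preview rows at the top of the notebook, but the purchase counts in `sample_df` are hypothetical, and taking the mean as the item price assumes each item is always sold at a single price.

```python
import pandas as pd

# Hypothetical purchases using the same column names as purchase_df
sample_df = pd.DataFrame({
    "Item ID": [92, 92, 108, 143, 143, 143],
    "Item Name": ["Final Critic", "Final Critic",
                  "Extraction, Quickblade Of Trembling Hands",
                  "Frenzied Scimitar", "Frenzied Scimitar", "Frenzied Scimitar"],
    "Price": [4.88, 4.88, 3.53, 1.56, 1.56, 1.56],
})

# Retrieve only the columns needed for the item analysis
items = sample_df[["Item ID", "Item Name", "Price"]]

# Group by item and compute count, price, and total value
item_summary = items.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "mean"),    # assumes one price per item
    total_value=("Price", "sum"),
).round(2)

item_summary = item_summary.rename(columns={
    "purchase_count": "Purchase Count",
    "item_price": "Item Price (USD)",
    "total_value": "Total Purchase Value (USD)",
})

# Most popular items first
popular = item_summary.sort_values("Purchase Count", ascending=False)
print(popular.head())
```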



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
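
Re-sorting by total purchase value could look like the sketch below. The `item_summary` frame here is a small hypothetical stand-in for the item summary table (its purchase counts are made up), built inline so the example runs on its own.

```python
import pandas as pd

# Hypothetical item summary indexed by Item ID and Item Name
item_summary = pd.DataFrame(
    {"Purchase Count": [2, 1, 3],
     "Item Price (USD)": [4.88, 3.53, 1.56],
     "Total Purchase Value (USD)": [9.76, 3.53, 4.68]},
    index=pd.MultiIndex.from_tuples(
        [(92, "Final Critic"),
         (108, "Extraction, Quickblade Of Trembling Hands"),
         (143, "Frenzied Scimitar")],
        names=["Item ID", "Item Name"],
    ),
)

# Most profitable items first
profitable = item_summary.sort_values("Total Purchase Value (USD)",
                                      ascending=False)
print(profitable.head())
```

In this made-up data the most purchased item (ID 143) is not the most profitable one (ID 92), which is why the two rankings are worth reporting separately.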

