# Heroes Of Pymoli Data Analysis
Like many others in its genre, the game is free-to-play, but players are encouraged to purchase optional items that enhance their playing experience. As a first task, the company would like you to generate a report that breaks down the game's purchasing data into meaningful insights.

-----
## Observable Trends
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).


* The peak age demographic is 20-24 (44.79%), with secondary groups at 15-19 (18.58%) and 25-29 (13.37%).


* The age group that spends the most in total is 20-24, with a total purchase value of $1,114.06 and an average total purchase per person of $4.32. In contrast, the group with the highest average total purchase per person is 35-39, at $4.76, with a total purchase value of $147.67.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "./datasets/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)
purchase_data.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


## Player Count
> Display the total number of players

In [2]:
print("Total number of players is " + str(purchase_data["SN"].nunique()) + ".")

Total number of players is 576.


## Purchasing Analysis (Total)
* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [3]:
def overall_summary_printer(dataframe) -> pd.core.frame.DataFrame:
    """return a one-row dataframe with the overall purchase summary
    """
    aggregation = {"Item Name":"nunique","Price":"sum","Purchase ID":"count"}
    summary = pd.DataFrame(dataframe.agg(aggregation)).T
    summary["average_price"] = dataframe.Price.mean()

    # formatting
    summary = summary.rename(columns = {"Item Name":"Number of Unique Items",
                                        "Price":"Total Revenue",
                                        "Purchase ID":"Number of Purchases",
                                        "average_price":"Average Price"})

    # cast with astype: calling int() on a Series is deprecated in recent pandas
    summary["Number of Unique Items"] = summary["Number of Unique Items"].astype(int)
    summary["Number of Purchases"] = summary["Number of Purchases"].astype(int)
    summary["Total Revenue"] = summary["Total Revenue"].map("${:.2f}".format)
    summary["Average Price"] = summary["Average Price"].map("${:.2f}".format)
    return summary
    
overall_summary_printer(purchase_data)

| | Number of Unique Items | Total Revenue | Number of Purchases | Average Price |
|---|---|---|---|---|
| 0 | 179 | $2379.77 | 780 | $3.05 |
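
The summary above comes down to four aggregations over the purchase rows. As a minimal sketch, assuming a small made-up frame (`toy`, whose rows are hypothetical) with the same column names as `purchase_data`, the same figures can be computed directly:

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror purchase_data.
toy = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3],
    "SN": ["a", "b", "a", "c"],
    "Item Name": ["Fury", "Fury", "Blindscythe", "Final Critic"],
    "Price": [1.44, 1.44, 3.27, 4.88],
})

# One row, one column per metric.
summary = pd.DataFrame({
    "Number of Unique Items": [toy["Item Name"].nunique()],
    "Total Revenue": [toy["Price"].sum()],
    "Number of Purchases": [toy["Purchase ID"].count()],
    "Average Price": [toy["Price"].mean()],
})
print(summary)
```

Building the frame from scalar aggregates keeps each column's dtype numeric, so no integer cast is needed before formatting.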


## Gender Demographics
* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed

In [4]:
def gender_summary(dataframe) -> pd.core.frame.DataFrame:
    """print out count and percentage of players by gender
    """
    gender_summary  = (dataframe.groupby("Gender").agg({"SN":"nunique"})
                       .rename(columns= {"SN":"Total Count"})
                       .sort_values(by = "Total Count", ascending = False)
                       .assign(percent_of_player = lambda df: df["Total Count"]/df["Total Count"].sum()*100)
                       .rename(columns = {"percent_of_player":"Percentage of Players"}))

    # formatting
    gender_summary["Percentage of Players"] = gender_summary["Percentage of Players"].map("{:.2f}%".format)
    return gender_summary

gender_summary(purchase_data)

| Gender | Total Count | Percentage of Players |
|---|---|---|
| Male | 484 | 84.03% |
| Female | 81 | 14.06% |
| Other / Non-Disclosed | 11 | 1.91% |
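
Because `purchase_data` holds one row per purchase, a player who bought several items appears several times, which is why the function above counts unique `SN` values rather than rows. A minimal sketch on hypothetical toy data (the frame `toy` and its screen names are made up for illustration) shows the difference:

```python
import pandas as pd

# Hypothetical purchases: player "a" bought twice, "c" three times.
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "c", "c"],
    "Gender": ["Male", "Male", "Female", "Male", "Male", "Male"],
})

# Counting rows measures purchases, not players.
rows_per_gender = toy["Gender"].value_counts()

# Counting unique screen names measures players.
players_per_gender = toy.groupby("Gender")["SN"].nunique()
share = players_per_gender / players_per_gender.sum() * 100

print(rows_per_gender)
print(players_per_gender)
print(share.round(2))
```

In this toy frame there are five male purchase rows but only two male players, so a row count would overstate the male share.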


## Purchasing Analysis (Gender)
* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [5]:
def purchase_by_gender_summary(dataframe) -> pd.core.frame.DataFrame:
    """return purchase information by gender in dataframe format
    """

    aggregation = {"Item Name":"count","Price":"sum","SN":"nunique"}
    gender_purchase = (dataframe.groupby("Gender").agg(aggregation)
                       .rename(columns= {"Item Name":"Purchase Count",
                                         "Price":"Total Purchase Value",
                                         "SN":"total_count"})
                       .assign(average_purchase_price = lambda df: df["Total Purchase Value"]/df["Purchase Count"],
                               avg_total_purchase_per_person = lambda df: df["Total Purchase Value"]/df["total_count"])
                       .rename(columns = {"average_purchase_price":"Average Purchase Price",
                                          "avg_total_purchase_per_person":"Avg Total Purchase Per Person"}))
    gender_purchase = gender_purchase.drop("total_count", axis = 1)

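    # formatting: map column by column, which avoids the deprecated DataFrame.applymap
    for column in gender_purchase.columns[1:4]:
        gender_purchase[column] = gender_purchase[column].map("${:.2f}".format)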
    return gender_purchase

purchase_by_gender_summary(purchase_data)

| Gender | Purchase Count | Total Purchase Value | Average Purchase Price | Avg Total Purchase Per Person |
|---|---|---|---|---|
| Female | 113 | $361.94 | $3.20 | $4.47 |
| Male | 652 | $1967.64 | $3.02 | $4.07 |
| Other / Non-Disclosed | 15 | $50.19 | $3.35 | $4.56 |
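
The two averages in this table use different denominators: the average purchase price divides the total by the number of purchases, while the average total per person divides it by the number of unique players. The sketch below recomputes both from the totals reported above and the player counts from the Gender Demographics table. Since those inputs are already rounded, treat it as a sanity check rather than a rerun of the analysis; the frame name `reported` is introduced here for illustration only.

```python
import pandas as pd

# Figures copied from the two gender tables in this report.
reported = pd.DataFrame(
    {
        "Purchase Count": [113, 652, 15],
        "Total Purchase Value": [361.94, 1967.64, 50.19],
        "Total Count": [81, 484, 11],  # unique players per gender
    },
    index=["Female", "Male", "Other / Non-Disclosed"],
)

# Per purchase: total value divided by number of purchases.
reported["Average Purchase Price"] = (
    reported["Total Purchase Value"] / reported["Purchase Count"]
)
# Per person: total value divided by number of unique players.
reported["Avg Total Purchase Per Person"] = (
    reported["Total Purchase Value"] / reported["Total Count"]
)

print(reported.round(2))
```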


## Age Demographics
* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table

In [6]:
# create label and bins
label = ["<10","10-14","15-19","20-24","25-29","30-34","35-39","40+"]
bin_to_fill = [0,9,14,19,24,29,34,39,100]

def binning_data(labels: list, bins: list, dataframe) -> None:
    """add an age_category column to the dataframe in place, using the given labels and bins
    """
    # use the arguments rather than the module-level label / bin_to_fill variables
    dataframe["age_category"] = pd.cut(x=dataframe["Age"], bins=bins, labels=labels, include_lowest=True)
    return None

def age_category_count_summary(dataframe) -> pd.core.frame.DataFrame:
    """print out count and percentage of players in different age category
    """

    summary_age_category = (dataframe.groupby("age_category").agg({"SN":"nunique"})
                            .assign(percent_of_player = lambda df: df["SN"]/df["SN"].sum()*100)
                            .rename(columns = {"SN":"Total Count","percent_of_player":"Percentage of Players"}))

    summary_age_category["Percentage of Players"] = summary_age_category["Percentage of Players"].map("{:.2f}%".format)
    return summary_age_category

binning_data(label, bin_to_fill, purchase_data)
age_category_count_summary(purchase_data)

| age_category | Total Count | Percentage of Players |
|---|---|---|
| <10 | 17 | 2.95% |
| 10-14 | 22 | 3.82% |
| 15-19 | 107 | 18.58% |
| 20-24 | 258 | 44.79% |
| 25-29 | 77 | 13.37% |
| 30-34 | 52 | 9.03% |
| 35-39 | 31 | 5.38% |
| 40+ | 12 | 2.08% |
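
With `pd.cut`, each bin is closed on the right by default, so the edges used above place age 9 in `<10` and age 14 in `10-14`. A small sketch with a made-up list of boundary ages (the `ages` series is hypothetical) shows where each one lands:

```python
import pandas as pd

label = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
bin_to_fill = [0, 9, 14, 19, 24, 29, 34, 39, 100]

# Hypothetical ages chosen to sit on or next to the bin edges.
ages = pd.Series([7, 9, 10, 14, 15, 24, 25, 39, 40])

# Default right=True gives intervals (0, 9], (9, 14], ...;
# include_lowest=True additionally closes the first interval at 0.
categories = pd.cut(x=ages, bins=bin_to_fill, labels=label, include_lowest=True)

for age, category in zip(ages, categories):
    print(age, "->", category)
```

An age above 100 would fall outside the last edge and come back as `NaN`, so the upper bound is worth revisiting if the data could contain such values.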


## Purchasing Analysis (Age)
* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [7]:
def age_category_purchase_summary(dataframe) -> pd.core.frame.DataFrame:
    """print out purchase information of players in different age category
    """
    
    aggregation = {"Item Name":"count","Price":"sum","SN":"nunique"}
    age_purchase = (dataframe.groupby("age_category").agg(aggregation)
                    .assign(avg_purchase_price = lambda df: df["Price"]/df["Item Name"],
                            total_purchase_per_person  = lambda df: df["Price"]/df["SN"])
                    .rename(columns= {"Item Name":"Purchase Count",
                                      "Price":"Total Purchase Value",
                                      "SN":"total_count","avg_purchase_price":"Average Purchase Price",
                                      "total_purchase_per_person":"Average Total Purchase Per Person"}))

    age_purchase = age_purchase.drop("total_count", axis = 1)

    # formatting: map column by column, which avoids the deprecated DataFrame.applymap
    for column in age_purchase.columns[1:4]:
        age_purchase[column] = age_purchase[column].map("${:.2f}".format)
    return age_purchase

age_category_purchase_summary(purchase_data)

| age_category | Purchase Count | Total Purchase Value | Average Purchase Price | Average Total Purchase Per Person |
|---|---|---|---|---|
| <10 | 23 | $77.13 | $3.35 | $4.54 |
| 10-14 | 28 | $82.78 | $2.96 | $3.76 |
| 15-19 | 136 | $412.89 | $3.04 | $3.86 |
| 20-24 | 365 | $1114.06 | $3.05 | $4.32 |
| 25-29 | 101 | $293.00 | $2.90 | $3.81 |
| 30-34 | 73 | $214.00 | $2.93 | $4.12 |
| 35-39 | 41 | $147.67 | $3.60 | $4.76 |
| 40+ | 13 | $38.24 | $2.94 | $3.19 |
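
Two things can be read off this table: which group spends the most in total, and which spends the most per player. As a hedged cross-check, the sketch below re-enters the reported (rounded) figures into a frame named `age_table`, a name introduced here for illustration, and finds both groups with `idxmax`. It also confirms that the group totals add up to the overall revenue reported earlier.

```python
import pandas as pd

# Values copied from the age table in this report.
age_table = pd.DataFrame(
    {
        "Total Purchase Value": [77.13, 82.78, 412.89, 1114.06,
                                 293.00, 214.00, 147.67, 38.24],
        "Average Total Purchase Per Person": [4.54, 3.76, 3.86, 4.32,
                                              3.81, 4.12, 4.76, 3.19],
    },
    index=["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"],
)

# idxmax returns the index label of the largest value in a column.
top_total_group = age_table["Total Purchase Value"].idxmax()
top_per_person_group = age_table["Average Total Purchase Per Person"].idxmax()
revenue_from_groups = age_table["Total Purchase Value"].sum()

print(top_total_group, top_per_person_group, round(revenue_from_groups, 2))
```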


## Top Spenders
* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame

In [8]:
def top_spender_summary(dataframe, num_of_row: int) -> pd.core.frame.DataFrame: 
    """return the first num_of_row spenders, ranked by purchase count
    """
    aggregation = {"Item Name":"count","Price":"sum","SN":"nunique"}
    top_spender_summary = (dataframe.groupby("SN").agg(aggregation)
                           .assign(avg_purchase_price = lambda df: df["Price"]/df["Item Name"])
                           .rename(columns= {"Item Name":"Purchase Count",
                                             "Price":"Total Purchase Value",
                                             "SN":"total_count","avg_purchase_price":"Average Purchase Price"})
                           .sort_values(by = "Purchase Count", ascending =False))
    top_spender_summary = top_spender_summary.drop("total_count", axis = 1)

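    # formatting: map column by column, which avoids the deprecated DataFrame.applymap
    for column in top_spender_summary.columns[1:3]:
        top_spender_summary[column] = top_spender_summary[column].map("${:.2f}".format)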
    return top_spender_summary.head(num_of_row)

top_spender_summary(purchase_data, 5)

| SN | Purchase Count | Total Purchase Value | Average Purchase Price |
|---|---|---|---|
| Lisosia93 | 5 | $18.96 | $3.79 |
| Iral74 | 4 | $13.62 | $3.40 |
| Idastidru52 | 4 | $15.45 | $3.86 |
| Asur53 | 3 | $7.44 | $2.48 |
| Inguron55 | 3 | $11.11 | $3.70 |
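
The preview above is ordered by purchase count, while the task list asks for total purchase value in descending order; the two rankings can differ. A minimal sketch on hypothetical data (the frame `toy`, the screen names, and the snake_case column names are all made up for illustration) shows how the sort key changes the result:

```python
import pandas as pd

# Hypothetical purchases: "a" buys often but cheaply, "b" buys once at a high price.
toy = pd.DataFrame({
    "SN": ["a", "a", "a", "b", "c", "c"],
    "Price": [1.0, 1.0, 1.0, 4.5, 2.0, 2.0],
})

# Named aggregation: one output column per (input column, function) pair.
spenders = (
    toy.groupby("SN")
    .agg(purchase_count=("Price", "count"),
         total_purchase_value=("Price", "sum"))
    .assign(average_purchase_price=lambda df: df["total_purchase_value"]
            / df["purchase_count"])
)

by_count = spenders.sort_values(by="purchase_count", ascending=False)
by_total = spenders.sort_values(by="total_purchase_value", ascending=False)

print(by_count)
print(by_total)
```

Sorting before any string formatting matters: once the money columns are turned into `"$…"` strings, they sort alphabetically rather than numerically.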


## Most Popular Items
* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame

In [9]:
def the_most_summary(dataframe, num_of_row: int, column_name: str) -> pd.core.frame.DataFrame: 
    """print out specified number of rows with list of items and sort by the columns name
    """
    column = column_name
    retrieved_data = dataframe[["Item ID","Item Name","Price"]]
    retrieved_summary = (retrieved_data.groupby(["Item ID","Item Name"]).agg({"Item Name":"count","Price":"mean"})
                         .assign(total_purchase_value = lambda df: df["Item Name"] * df["Price"])
                         .rename(columns= {"Item Name":"Purchase Count",
                                           "Price":"Item Price","total_purchase_value":"Total Purchase Price"})
                         .sort_values(by = column, ascending = False))

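    # formatting: map column by column, which avoids the deprecated DataFrame.applymap
    for column in retrieved_summary.columns[1:3]:
        retrieved_summary[column] = retrieved_summary[column].map("${:.2f}".format)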
    return retrieved_summary.head(num_of_row)

the_most_summary(purchase_data,5, "Purchase Count")

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Price |
|---|---|---|---|---|
| 92 | Final Critic | 13 | $4.61 | $59.99 |
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $4.23 | $50.76 |
| 145 | Fiery Glass Crusader | 9 | $4.58 | $41.22 |
| 132 | Persuasion | 9 | $3.22 | $28.99 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | $3.53 | $31.77 |


## Most Profitable Items
* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame

In [10]:
the_most_summary(purchase_data,5, "Total Purchase Price")

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Price |
|---|---|---|---|---|
| 92 | Final Critic | 13 | $4.61 | $59.99 |
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $4.23 | $50.76 |
| 82 | Nirvana | 9 | $4.90 | $44.10 |
| 145 | Fiery Glass Crusader | 9 | $4.58 | $41.22 |
| 103 | Singed Scalpel | 8 | $4.35 | $34.80 |
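
In both item tables, `Item Price` is the mean price within each item group and `Total Purchase Price` is the count multiplied by that mean, which equals the plain sum of prices. When an item's price varies between rows, the rounded price times the count will not match the displayed total exactly (for example, 13 × $4.61 is $59.93, not $59.99). A minimal sketch, assuming made-up items and prices in a frame called `toy`:

```python
import pandas as pd

# Hypothetical data: "Item A" was sold at two different prices.
toy = pd.DataFrame({
    "Item ID": [1, 1, 2, 2, 2],
    "Item Name": ["Item A", "Item A", "Item B", "Item B", "Item B"],
    "Price": [4.88, 4.19, 1.44, 1.44, 1.44],
})

items = toy.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
)

# count * mean reproduces the sum for every group (up to float error).
recomputed = items["purchase_count"] * items["item_price"]

most_popular = items.sort_values(by="purchase_count", ascending=False)
most_profitable = items.sort_values(by="total_purchase_value", ascending=False)

print(most_popular)
print(most_profitable)
```

Here the most popular item (three purchases) is not the most profitable one, which is the same distinction the two tables above draw.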


## Final conclusion
> Although males make up the majority of this game's players, the average total spending per person is $0.40 higher for female players ($4.47) than for male players ($4.07).

* Players are mainly 20-24 years old; this group accounts for 44.79% of all players.
* The players on the top 5 spending list are 20-25 years old.
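
The first two figures in this conclusion follow from the tables above. As a quick arithmetic check using only values reported in this notebook (the variable names are introduced here for illustration):

```python
# Per-person averages from the Purchasing Analysis (Gender) table.
female_per_person = 4.47
male_per_person = 4.07
gap = round(female_per_person - male_per_person, 2)

# Player counts from the Age Demographics table and the Player Count cell.
players_20_24 = 258
total_players = 576
share_20_24 = round(players_20_24 / total_players * 100, 2)

print(f"Female players spend ${gap:.2f} more per person than male players.")
print(f"The 20-24 group makes up {share_20_24:.2f}% of players.")
```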