### Heroes Of Pymoli Data Analysis
* Of the 1163 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* The peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [None]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data_df = pd.read_csv(file_to_load, encoding="utf-8")

## Player Count

* Display the total number of players
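
The dataset has one row per purchase, so counting rows would overcount players who bought more than once. A minimal sketch of the difference, using a made-up toy table (the `toy_df` rows are illustrative, not taken from `purchase_data.csv`):

```python
import pandas as pd

# Toy purchase log (made-up rows): one row per purchase, not per player.
toy_df = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cy3"],
    "Price": [1.50, 2.00, 3.25, 4.00],
})

row_count = len(toy_df)                 # number of purchases
player_count = toy_df["SN"].nunique()   # number of distinct screen names
print(row_count, player_count)
```

Here `Ann1` bought twice, so the purchase count and the player count differ.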


In [None]:
# Count distinct players by screen name (SN); a player can appear in several purchase rows
player_count = purchase_data_df["SN"].nunique()
# Place the count into a one-row summary DataFrame
player_count_table = pd.DataFrame({"Total Players": [player_count]})
player_count_table

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
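
One way to follow these steps is to build the summary from numeric values first and apply currency formatting only to a display copy, so the numbers stay usable for later calculations. The sketch below assumes a made-up `toy_df`; the column names mirror the real dataset, but the rows are illustrative.

```python
import pandas as pd

# Toy data (made-up rows): item 1 was purchased twice.
toy_df = pd.DataFrame({
    "Item ID": [1, 2, 1, 3],
    "Price": [1.00, 2.00, 1.00, 4.00],
})

# One-row summary built from single-element lists.
summary = pd.DataFrame({
    "Number of Unique Items": [toy_df["Item ID"].nunique()],
    "Average Price": [toy_df["Price"].mean()],
    "Number of Purchases": [len(toy_df)],
    "Total Revenue": [toy_df["Price"].sum()],
})

# Format a copy for display; `summary` keeps its numeric dtypes.
display_summary = summary.copy()
for column in ["Average Price", "Total Revenue"]:
    display_summary[column] = display_summary[column].map("${:,.2f}".format)

print(display_summary)
```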


In [None]:
# Obtain the number of unique items and the number of purchases
unique_items = purchase_data_df["Item ID"].nunique()
number_of_purchases = purchase_data_df["Purchase ID"].count()
# Obtain total revenue
total_revenue = purchase_data_df["Price"].sum()
# Obtain average price per purchase
average_price = total_revenue / number_of_purchases

# Create a one-row summary DataFrame from a dictionary of single-element lists
purchasing_analysis = pd.DataFrame({
    "Number of Unique Items": [unique_items],
    "Average Price": [average_price],
    "Number of Purchases": [number_of_purchases],
    "Total Revenue": [total_revenue]
})
# Format the price columns as currency
purchasing_analysis["Average Price"] = purchasing_analysis["Average Price"].map("${:.2f}".format)
purchasing_analysis["Total Revenue"] = purchasing_analysis["Total Revenue"].map("${:.2f}".format)

purchasing_analysis

## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed
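
Because each row is a purchase, gender shares should be computed per player rather than per row. A possible sketch using `value_counts(normalize=True)` on de-duplicated players is shown here; the `toy_df` rows are made up for illustration.

```python
import pandas as pd

# Toy data (made-up rows): Ann1 appears twice because she made two purchases.
toy_df = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cy3", "Dee4"],
    "Gender": ["Female", "Male", "Female", "Male", "Male"],
})

# Keep one row per player before counting.
players = toy_df[["SN", "Gender"]].drop_duplicates()

counts = players["Gender"].value_counts()
# normalize=True returns fractions that sum to 1; scale to percentages.
percentages = (players["Gender"].value_counts(normalize=True) * 100).round(2)

demographics = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percentages,
})
print(demographics)
```

Without the `drop_duplicates()` step, the female share in this toy table would be computed from two of five rows instead of one of four players.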




In [None]:
# Keep one row per player: select the SN and Gender columns, then drop duplicate rows
gender_name_df = purchase_data_df[["SN", "Gender"]].drop_duplicates()
# Count the players of each gender after removing duplicates
gender_count = gender_name_df["Gender"].value_counts()
# Share of players of each gender, as a percentage rounded to two decimals
gender_percentage = (gender_count / len(gender_name_df) * 100).round(2)
data_gender = {"Total Count": gender_count, "Percentage of Players": gender_percentage}
# Create a new DataFrame for the Gender Demographics results
gender_demographics = pd.DataFrame(data_gender)
gender_demographics


## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
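
The per-gender figures can also be computed in a single pass with pandas named aggregation. This is a sketch on made-up rows, not the notebook's required approach; the output column names (`purchase_count`, `avg_price`, and so on) are illustrative choices.

```python
import pandas as pd

# Toy data (made-up rows): Ann1 made two purchases.
toy_df = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cy3"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Price": [1.00, 2.00, 3.00, 4.00],
})

by_gender = toy_df.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    unique_players=("SN", "nunique"),  # players, not purchase rows
)

# Average total spend per player divides by unique players, not by purchases.
by_gender["avg_total_per_person"] = (
    by_gender["total_value"] / by_gender["unique_players"]
)
print(by_gender)
```

Dividing by `unique_players` rather than `purchase_count` is what distinguishes the average total per person from the average purchase price.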

In [None]:
# Group the purchases by gender
gender_groups = purchase_data_df.groupby("Gender")
# Purchase count, average purchase price, and total purchase value by gender
purchase_count = gender_groups["Purchase ID"].count()
avg_purchase_price = gender_groups["Price"].mean().round(2)
total_purchase_value = gender_groups["Price"].sum()
# Count unique players (not purchase rows) by gender
player_count_by_gender = gender_groups["SN"].nunique()
# Average total spend per unique player
avg_purchase_total_per_person = (total_purchase_value / player_count_by_gender).round(2)

# Create the Purchasing Analysis (Gender) summary DataFrame, with prices formatted as currency
purchasing_analysis = pd.DataFrame({
    "Purchase Count": purchase_count,
    "Average Purchase Price": avg_purchase_price.map("${:.2f}".format),
    "Total Purchase Value": total_purchase_value.map("${:.2f}".format),
    "Avg Total Purchase per Person": avg_purchase_total_per_person.map("${:.2f}".format)
})
purchasing_analysis

## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table
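
When choosing bin edges, note that `pd.cut` uses right-closed intervals by default, so the lowest edge itself is not included in the first bin. The sketch below illustrates this with made-up ages and a shortened, hypothetical set of labels.

```python
import pandas as pd

# Made-up ages chosen to sit on or next to bin edges.
ages = pd.Series([1, 9, 10, 24, 25, 45])
labels = ["<10", "10-24", "25+"]

# Lowest edge of 1: the first interval is (1, 9], so age 1 falls outside it.
cut_from_one = pd.cut(ages, bins=[1, 9, 24, 90], labels=labels)

# Lowest edge of 0: the first interval is (0, 9], so age 1 is included.
cut_from_zero = pd.cut(ages, bins=[0, 9, 24, 90], labels=labels)

print(pd.DataFrame({"age": ages, "from_one": cut_from_one, "from_zero": cut_from_zero}))
```

Values that fall outside every interval are returned as `NaN`, so they silently drop out of later counts.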


In [None]:
# Define the bin edges; pd.cut intervals are right-inclusive, so the lowest edge is 0 to include ages 1-9
bins = [0, 9, 14, 19, 24, 29, 34, 39, 90]
# Define the labels for the bins
group_names = ["<10","10-14","15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Categorize the existing players using the age bins
purchase_data_df["age_group"] = pd.cut(purchase_data_df["Age"], bins, labels=group_names)
# Keep one row per player by removing duplicate rows
age_group_df = purchase_data_df[["SN", "age_group"]].drop_duplicates()

# Calculate the numbers by age group
numbers_by_age_group = age_group_df["age_group"].value_counts()

# Calculate the percentages by age group
percentages_by_age_group = (numbers_by_age_group / player_count * 100).round(2)

# Create a new DataFrame for the Age Demographics results
age_demographics = pd.DataFrame({
    "Total Count": numbers_by_age_group,
    "Percentage of Players": percentages_by_age_group
})
# Sort by age group, since value_counts orders by count
age_demographics = age_demographics.sort_index()
age_demographics

## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [None]:
# Define the bin edges; pd.cut intervals are right-inclusive, so the lowest edge is 0 to include ages 1-9
bins = [0, 9, 14, 19, 24, 29, 34, 39, 90]
# Define the labels for the bins
group_names = ["<10","10-14","15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Categorize the existing players using the age bins
purchase_data_df["age_group"] = pd.cut(purchase_data_df["Age"], bins, labels=group_names)

# Keep every purchase row here (no de-duplication), then group by age group
purchase_data_age_group_df = purchase_data_df[["SN", "age_group", "Price"]]
age_groups = purchase_data_age_group_df.groupby("age_group", observed=False)

# Purchase count by age group
purchase_count = age_groups["Price"].count()

# Total purchase value and average purchase price by age group
purchase_price_by_age_group = age_groups["Price"].sum()
avg_purchase_price = (purchase_price_by_age_group / purchase_count).round(2)

# Count unique players (not purchase rows) by age group
player_count_by_age_group = age_groups["SN"].nunique()
# Average total spend per unique player
avg_total_purchase_per_person = purchase_price_by_age_group / player_count_by_age_group

# Create the Purchasing Analysis (Age) summary DataFrame, with prices formatted as currency
purchasing_analysis = pd.DataFrame({
    "Purchase Count": purchase_count,
    "Average Purchase Price": avg_purchase_price.map("${:.2f}".format),
    "Total Purchase Value": purchase_price_by_age_group.map("${:.2f}".format),
    "Avg Total Purchase per Person": avg_total_purchase_per_person.map("${:.2f}".format)
})

purchasing_analysis

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
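
The order of the sort and format steps matters: once prices are turned into strings such as `"$10.00"`, sorting compares text rather than numbers. A small sketch on made-up rows shows the difference (the `toy_df` values are illustrative only):

```python
import pandas as pd

# Toy data (made-up rows): Bob2 is the top spender by total value.
toy_df = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cy3"],
    "Price": [4.00, 10.00, 5.00, 2.50],
})

spenders = toy_df.groupby("SN")["Price"].agg(["count", "mean", "sum"])

# Correct order: sort the numeric totals first, then format for display.
numeric_sorted = spenders.sort_values("sum", ascending=False)
top_by_number = numeric_sorted.index[0]

# Formatting first turns totals into strings, and "$9.00" sorts above "$10.00".
formatted = spenders["sum"].map("${:.2f}".format)
top_by_string = formatted.sort_values(ascending=False).index[0]

print(top_by_number, top_by_string)
```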


In [None]:
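# Purchase count and total purchase value for each player (SN)
purchase_count = purchase_data_df.groupby(["SN"])["Item ID"].count()
total_purchase_value = purchase_data_df.groupby(["SN"])["Price"].sum()
# Average purchase price for each player
avg_purchase_price = round(total_purchase_value / purchase_count, 2)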

#Create a summary data frame to hold the results
top_spenders = pd.DataFrame({
    "Purchase Count": purchase_count,
     "Average Purchase Price": avg_purchase_price,
     "Total Purchase Value": total_purchase_value
})

# Sort the total purchase value column in descending order
top_spenders = top_spenders.sort_values("Total Purchase Value", ascending= False)
top_spenders["Average Purchase Price"] = top_spenders["Average Purchase Price"].map("${:.2f}".format)
top_spenders["Total Purchase Value"] = top_spenders["Total Purchase Value"].map("${:.2f}".format)
# Display a preview of the summary data frame
top_spenders.head()

## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
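
Grouping by two columns produces a table indexed by `(Item ID, Item Name)` pairs, and the same numeric table can then be sorted in different ways. The following sketch uses made-up items and illustrative output column names to show that the most purchased item is not necessarily the one with the highest total value.

```python
import pandas as pd

# Toy data (made-up rows): Sword sells often, Axe sells once at a high price.
toy_df = pd.DataFrame({
    "Item ID": [1, 2, 1, 3, 1],
    "Item Name": ["Sword", "Bow", "Sword", "Axe", "Sword"],
    "Price": [3.00, 5.00, 3.00, 10.00, 3.00],
})

items = toy_df.groupby(["Item ID", "Item Name"])["Price"].agg(
    purchase_count="count",
    item_price="mean",
    total_value="sum",
)

# Sort the same numeric table two ways; format only after sorting.
most_popular = items.sort_values("purchase_count", ascending=False)
most_profitable = items.sort_values("total_value", ascending=False)

print(most_popular.head())
print(most_profitable.head())
```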



In [None]:
# Retrieve the Item ID, Item Name, and Item Price columns
popular_items_df = purchase_data_df[["Item ID", "Item Name", "Price"]]
# Group the prices by Item ID and Item Name
item_groups = popular_items_df.groupby(["Item ID", "Item Name"])["Price"]
# Purchase count, item price (mean of the recorded prices), and total purchase value per item
purchase_count = item_groups.count()
item_price = item_groups.mean().round(2)
total_purchase_value = item_groups.sum().round(2)
# Total Purchase Value stays numeric here so the table can still be sorted by value
most_popular_items = pd.DataFrame({
    "Purchase Count": purchase_count,
    "Item Price": item_price.map("${:.2f}".format),
    "Total Purchase Value": total_purchase_value
})

# Sort the purchase count column in descending order, then format the totals as currency
most_popular_items_sortby_pc = most_popular_items.sort_values("Purchase Count", ascending=False)
most_popular_items_sortby_pc["Total Purchase Value"] = most_popular_items_sortby_pc["Total Purchase Value"].map("${:.2f}".format)
# Display a preview of the summary data frame
most_popular_items_sortby_pc.head()

## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [None]:
#Sort the table by total purchase value in descending order
most_profitable_items = most_popular_items.sort_values("Total Purchase Value", ascending= False)
most_profitable_items["Total Purchase Value"] = most_profitable_items["Total Purchase Value"].map("${:.2f}".format)

# Display a preview of the data frame
most_profitable_items.head()