### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic is 20-24 (44.8%), followed by the 15-19 (18.6%) and 25-29 (13.4%) groups.
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)
purchase_data.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


## Player Count

* Display the total number of players


In [2]:
#Total number of players.
totalNoOfPlayers = len(purchase_data["SN"].unique())
totalNoOfPlayers_df = pd.DataFrame([
    {"Total Players":totalNoOfPlayers}
])

totalNoOfPlayers_df

Unnamed: 0,Total Players
0,576


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
noOfItems = len(purchase_data["Item ID"].unique())
totalPrice = purchase_data["Price"].sum()
numberOfPurchases = len(purchase_data)
averagePrice = totalPrice / numberOfPurchases
# Total revenue is simply the sum of all prices; there is no need to rebuild it from the average.
totalRevenue = totalPrice

total_purchasing_analysis = pd.DataFrame({
    "Number of Unique Items":[noOfItems],
    "Average Price":round(averagePrice,2),
    "Number of Purchases":numberOfPurchases,
    "Total Revenue":totalRevenue
})

total_purchasing_analysis

Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,183,3.05,780,2379.77


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [4]:
#Establish the % and the total count of all the players

def getGenderDemographs(gender):
    '''
        Get the Gender Demographics for the following:
        1. Percentage and count of Male Players.
        2. Percentage and count of Female Players.
        3. Percentage and count of Other / Non-disclosed.
    '''
    gender_data = purchase_data.loc[purchase_data["Gender"] == gender, :]
    totalPlayers = len(gender_data["SN"].unique())
    totalPlayers_perc = round(((totalPlayers)/(totalNoOfPlayers))*100, 2)
    return totalPlayers, totalPlayers_perc
    
totalNoOfMalePlayers, totalNoOfMalePlayers_percentage = getGenderDemographs("Male")
totalNoOfFemalePlayers, totalNoOfFemalePlayers_percentage = getGenderDemographs("Female")
totalNoOfOtherPlayers, totalNoOfOtherPlayers_percentage = getGenderDemographs("Other / Non-Disclosed")

#Construct the Demographics Dictionary
demo_dicts = {'Total Count':[totalNoOfMalePlayers, totalNoOfFemalePlayers, totalNoOfOtherPlayers], 
        'Percentage of Players':[totalNoOfMalePlayers_percentage, totalNoOfFemalePlayers_percentage, 
                                 totalNoOfOtherPlayers_percentage]}

#Construct the Demograph Dataframe
gender_demographs_df = pd.DataFrame(demo_dicts, index=['Male', 'Female', 'Other / Non-Disclosed'])
gender_demographs_df

Gender,Total Count,Percentage of Players
Male,484,84.03
Female,81,14.06
Other / Non-Disclosed,11,1.91



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [5]:
#Analyze the Purchase for all the genders.
def genderPurchasingAnalysis(gender):
    '''
        Return the following:
        1. Purchase Count.
        2. Average Purchase Price.
        3. Total Purchase Value.
        4. Average Purchase Total per Person. 
    '''
    #Filter the genders to create gender specific dataframes.
    gender_df = purchase_data[purchase_data["Gender"] == gender]
    
    purchase_count = len(gender_df)
    total_purchase_value = round(gender_df['Price'].sum(), 2)
    average_purchase_price = round(((total_purchase_value) / purchase_count),2)
    average_purchase_total_person = round(total_purchase_value / (len(gender_df['SN'].unique())), 2)
    
    #Return the values
    return (purchase_count, total_purchase_value, average_purchase_price, average_purchase_total_person)

purchaseDataFemale = genderPurchasingAnalysis("Female")
purchaseDataMale = genderPurchasingAnalysis("Male")
purchaseDataOthers = genderPurchasingAnalysis("Other / Non-Disclosed")
    
#Construct the Purchasing Analysis Dictionary
purchase_dicts = {'Purchase Count':[purchaseDataFemale[0], purchaseDataMale[0], purchaseDataOthers[0]], 
        'Average Purchase Price':[purchaseDataFemale[2], purchaseDataMale[2],purchaseDataOthers[2]],
        'Total Purchase Value':[purchaseDataFemale[1], purchaseDataMale[1], purchaseDataOthers[1]],
        'Average Total Purchase Per Person':[purchaseDataFemale[3], purchaseDataMale[3], purchaseDataOthers[3]]
}

#Construct the Purchasing Analysis Dataframe
gender_purchasing_df = pd.DataFrame(purchase_dicts, index=['Female', 'Male', 'Other / Non-Disclosed'])

gender_purchasing_df["Average Purchase Price"] = gender_purchasing_df["Average Purchase Price"].astype(float).map(
    "${:,.2f}".format)
gender_purchasing_df["Total Purchase Value"] = gender_purchasing_df["Total Purchase Value"].astype(float).map(
    "${:,.2f}".format)
gender_purchasing_df["Average Total Purchase Per Person"] = gender_purchasing_df["Average Total Purchase Per Person"].astype(float).map(
    "${:,.2f}".format)
gender_purchasing_df

Gender,Purchase Count,Average Purchase Price,Total Purchase Value,Average Total Purchase Per Person
Female,113,$3.20,$361.94,$4.47
Male,652,$3.02,"$1,967.64",$4.07
Other / Non-Disclosed,15,$3.35,$50.19,$4.56
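
As an alternative to calling a helper once per gender, the same summary can be sketched with a single `groupby`. The `sample` frame below is made-up illustrative data, not rows from `purchase_data.csv`; only the column names (`SN`, `Gender`, `Price`) come from the notebook, and `gender_summary` is a hypothetical name.

```python
import pandas as pd

# Hypothetical sample rows (not taken from purchase_data.csv).
sample = pd.DataFrame({
    "SN": ["a1", "a1", "b2", "c3", "d4"],
    "Gender": ["Male", "Male", "Female", "Male", "Female"],
    "Price": [1.00, 2.00, 3.00, 4.00, 5.00],
})

grouped = sample.groupby("Gender")

# One row per gender: purchase count, average price, and total value.
gender_summary = pd.DataFrame({
    "Purchase Count": grouped["Price"].count(),
    "Average Purchase Price": grouped["Price"].mean(),
    "Total Purchase Value": grouped["Price"].sum(),
})

# Divide by the number of distinct players, not by the number of purchases.
gender_summary["Average Total Purchase Per Person"] = (
    gender_summary["Total Purchase Value"] / grouped["SN"].nunique()
)

print(gender_summary.round(2))
```

Keeping the values numeric until the final display step means the frame can still be sorted or aggregated afterwards.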


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [6]:
#Create a bin list to factor the ages. The last edge is open-ended so every age of 40 or more is covered.
age_bins = [0, 10, 15, 20, 25, 30, 35, 40, np.inf]

#Create Labels for the age_bins
age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

#Categorize the existing players using the age bins.
#right=False makes each bin include its lower edge, so age 20 falls in "20-24" rather than "15-19".
purchase_data["Age Summary"] = pd.cut(purchase_data["Age"], age_bins, labels=age_labels, right=False)
purchase_data.head()


Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price,Age Summary
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53,20-24
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56,40+
2,2,Ithergue48,24,Male,92,Final Critic,4.88,20-24
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27,20-24
4,4,Iskosia90,23,Male,131,Fury,1.44,20-24


In [8]:
totalPlayers = gender_demographs_df["Total Count"].sum()

def getAgeAnalysis(frm, to):
    '''
        Get the Age analysis from:
        1. <10
        2. 10-14
        3. 15-19
        4. 20-24
        5. 25-29
        6. 30-34
        7. 35-39
        8. 40+
    '''
    
    if(to!=0):
        age_group_total = (purchase_data[(purchase_data["Age"]>=frm) & (purchase_data["Age"]<to)])["SN"].unique()
    else:
        age_group_total = (purchase_data[(purchase_data["Age"]>=frm)])["SN"].unique()
        
    age_group_total_perc = round(((len(age_group_total) / totalPlayers) * 100),2)
    return (len(age_group_total), age_group_total_perc)

ageLessThan10_df = getAgeAnalysis(0, 10)
ageFrom10To14_df = getAgeAnalysis(10, 15)
ageFrom15To19_df = getAgeAnalysis(15, 20)
ageFrom20To24_df = getAgeAnalysis(20, 25)
ageFrom25To29_df = getAgeAnalysis(25, 30)
ageFrom30To34_df = getAgeAnalysis(30, 35)
ageFrom35To39_df = getAgeAnalysis(35, 40)
ageFrom40To44_df = getAgeAnalysis(40, 0)


#Construct the Age Demographics Dictionary
age_analysis_dicts = {'Total Count':[
                                 ageLessThan10_df[0], ageFrom10To14_df[0], ageFrom15To19_df[0], 
                                 ageFrom20To24_df[0], ageFrom25To29_df[0], ageFrom30To34_df[0],
                                 ageFrom35To39_df[0], ageFrom40To44_df[0]
                                ], 
        'Percentage of Players':[
                                 ageLessThan10_df[1], ageFrom10To14_df[1], ageFrom15To19_df[1], 
                                 ageFrom20To24_df[1], ageFrom25To29_df[1], ageFrom30To34_df[1],
                                 ageFrom35To39_df[1], ageFrom40To44_df[1]
        ]
}

#Construct the Age Demographics Dataframe
age_analysis_df = pd.DataFrame(age_analysis_dicts, index=['<10', '10-14', '15-19','20-24','25-29','30-34','35-39','40+'])
age_analysis_df



Age Group,Total Count,Percentage of Players
<10,17,2.95
10-14,22,3.82
15-19,107,18.58
20-24,258,44.79
25-29,77,13.37
30-34,52,9.03
35-39,31,5.38
40+,12,2.08
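
The instructions above hint at `pd.cut()`, so here is a minimal sketch of how the same age table could be produced from binned data instead of one filter per range. The rows in `sample` are invented for illustration, and the sketch assumes each `SN` always reports the same age, so keeping one row per player is safe. `age_summary` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names come from the notebook.
sample = pd.DataFrame({
    "SN": ["a1", "a1", "b2", "c3", "d4", "e5"],
    "Age": [20, 20, 9, 24, 40, 15],
})

age_bins = [0, 10, 15, 20, 25, 30, 35, 40, np.inf]
age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Count each player once, even if they made several purchases.
players = sample.drop_duplicates(subset="SN").copy()

# right=False makes every bin include its lower edge: [20, 25) is "20-24".
players["Age Group"] = pd.cut(
    players["Age"], bins=age_bins, labels=age_labels, right=False
)

# sort=False keeps the groups in bin order rather than ordering by count.
counts = players["Age Group"].value_counts(sort=False)
age_summary = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": (counts / len(players) * 100).round(2),
})

print(age_summary)
```

With the default `right=True`, an age of exactly 20 would land in the bin labelled "15-19", so the `right=False` argument matters for labels of this form.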


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [34]:
def getPurchaseByAge(frm, to):
    '''
        Get the purchasing analysis for one of these age groups:
        1. <10
        2. 10-14
        3. 15-19
        4. 20-24
        5. 25-29
        6. 30-34
        7. 35-39
        8. 40+
    '''
    
    if(to!=0):
        purch_age_df = (purchase_data[(purchase_data["Age"]>=frm) & (purchase_data["Age"]<to)])
    else:
        purch_age_df = (purchase_data[(purchase_data["Age"]>=frm)])
      
    return (len(purch_age_df), round(purch_age_df["Price"].sum()/len(purch_age_df),2), 
            round(purch_age_df["Price"].sum(),2), 
            round(purch_age_df["Price"].sum() / len(purch_age_df["SN"].unique()),2))

ageLessThan10_df = getPurchaseByAge(0, 10)
ageFrom10To14_df = getPurchaseByAge(10, 15)
ageFrom15To19_df = getPurchaseByAge(15, 20)
ageFrom20To24_df = getPurchaseByAge(20, 25)
ageFrom25To29_df = getPurchaseByAge(25, 30)
ageFrom30To34_df = getPurchaseByAge(30, 35)
ageFrom35To39_df = getPurchaseByAge(35, 40)
ageFrom40To44_df = getPurchaseByAge(40, 0)

#Construct the Purchasing Analysis Dictionary
purchase_analysis_dicts = {'Purchase Count':[
                                 ageLessThan10_df[0], ageFrom10To14_df[0], ageFrom15To19_df[0], 
                                 ageFrom20To24_df[0], ageFrom25To29_df[0], ageFrom30To34_df[0],
                                 ageFrom35To39_df[0], ageFrom40To44_df[0]
                                ], 
        'Average Purchase Price':[
                                 ageLessThan10_df[1], ageFrom10To14_df[1], ageFrom15To19_df[1], 
                                 ageFrom20To24_df[1], ageFrom25To29_df[1], ageFrom30To34_df[1],
                                 ageFrom35To39_df[1], ageFrom40To44_df[1]
                                ],
        'Total Purchase Value':[
                                ageLessThan10_df[2], ageFrom10To14_df[2], ageFrom15To19_df[2], 
                                ageFrom20To24_df[2], ageFrom25To29_df[2], ageFrom30To34_df[2],
                                ageFrom35To39_df[2], ageFrom40To44_df[2]            
                               ],
        'Average Total Purchase Per Person':[
                                ageLessThan10_df[3], ageFrom10To14_df[3], ageFrom15To19_df[3], 
                                ageFrom20To24_df[3], ageFrom25To29_df[3], ageFrom30To34_df[3],
                                ageFrom35To39_df[3], ageFrom40To44_df[3]  
                            ]
}

#Construct the Purchasing Analysis Dataframe
purchase_analysis_df = pd.DataFrame(purchase_analysis_dicts, index=['<10', '10-14', '15-19','20-24','25-29','30-34','35-39','40+'])
purchase_analysis_df




Age Group,Purchase Count,Average Purchase Price,Total Purchase Value,Average Total Purchase Per Person
<10,23,3.35,77.13,4.54
10-14,28,2.96,82.78,3.76
15-19,136,3.04,412.89,3.86
20-24,365,3.05,1114.06,4.32
25-29,101,2.90,293.00,3.81
30-34,73,2.93,214.00,4.12
35-39,41,3.6,147.67,4.76
40+,13,2.94,38.24,3.19


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
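
The steps above could be sketched roughly as follows. This is one possible approach, not the notebook's own solution: the `sample` rows are invented, and `top_spenders` and `preview` are hypothetical names. Only the `SN` and `Price` columns are taken from the data set.

```python
import pandas as pd

# Hypothetical sample rows (not taken from purchase_data.csv).
sample = pd.DataFrame({
    "SN": ["a1", "a1", "b2", "c3", "c3", "c3"],
    "Price": [1.50, 3.50, 4.00, 1.00, 1.00, 1.00],
})

# Aggregate per player, then sort by total spend, highest first.
top_spenders = (
    sample.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    .sort_values("Total Purchase Value", ascending=False)
)

# Format a copy for display so the numeric frame stays sortable.
preview = top_spenders.head().copy()
for col in ["Average Purchase Price", "Total Purchase Value"]:
    preview[col] = preview[col].map("${:,.2f}".format)

print(preview)
```

Sorting before formatting matters: once the values are strings such as `$5.00`, sorting would compare text rather than numbers.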



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
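
A minimal sketch of these steps is shown below, using invented item rows rather than the real data. It assumes each item sells at a single price, so the mean of `Price` within a group can stand in for the item price. `item_summary` and `most_popular` are hypothetical names.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the notebook.
sample = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.50, 4.50, 1.25],
})

# Retrieve only the columns needed for the item analysis.
items = sample[["Item ID", "Item Name", "Price"]]

# Group by ID and name, then compute count, price, and total value per item.
item_summary = (
    items.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",
        "sum": "Total Purchase Value",
    })
)

# Most popular items: highest purchase count first.
most_popular = item_summary.sort_values("Purchase Count", ascending=False)

print(most_popular.head())
```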



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
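
As a rough sketch, the re-sort could look like the following. The `item_summary` frame here is a hand-built, hypothetical stand-in for a per-item summary table (its values are invented), and `most_profitable` and `preview` are hypothetical names.

```python
import pandas as pd

# Hypothetical per-item summary, indexed by (Item ID, Item Name).
item_summary = pd.DataFrame(
    {
        "Purchase Count": [3, 2, 1],
        "Item Price": [2.00, 4.50, 1.25],
        "Total Purchase Value": [6.00, 9.00, 1.25],
    },
    index=pd.MultiIndex.from_tuples(
        [(1, "Sword"), (2, "Shield"), (3, "Bow")],
        names=["Item ID", "Item Name"],
    ),
)

# Most profitable items: highest total purchase value first.
most_profitable = item_summary.sort_values("Total Purchase Value", ascending=False)

# Apply currency formatting to a copy, after sorting on the numeric values.
preview = most_profitable.head().copy()
for col in ["Item Price", "Total Purchase Value"]:
    preview[col] = preview[col].map("${:,.2f}".format)

print(preview)
```

In this made-up sample the most purchased item is not the most profitable one, which is why the two rankings are worth reporting separately.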

