### Heroes Of Pymoli Data Analysis
* Of the 576 unique players, the vast majority are male (84%), with a smaller but notable proportion of female players (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [30]:

import pandas as pd
import numpy as np


In [31]:
purchase = "Resources/purchase_data.csv"

In [32]:
purchase_data = pd.read_csv(purchase)
purchase_data.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


In [19]:
player_demographics = purchase_data.loc[:, ["SN", "Gender", "Age"]].drop_duplicates()
player_demographics.head()

| | SN | Gender | Age |
|---|---|---|---|
| 0 | Lisim78 | Male | 20 |
| 1 | Lisovynya38 | Male | 40 |
| 2 | Ithergue48 | Male | 24 |
| 3 | Chamassasya86 | Male | 24 |
| 4 | Iskosia90 | Male | 23 |


In [33]:
player_count = purchase_data["SN"].nunique()
player_count

576


## Player Count

* Display the total number of players


In [27]:
player_display_df = pd.DataFrame({"Total Players": [purchase_data["SN"].nunique()]})
player_display_df

| | Total Players |
|---|---|
| 0 | 576 |


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [49]:
unique_items=purchase_data["Item Name"].nunique()
unique_items

179

In [50]:
average_price=purchase_data["Price"].mean()
average_price

3.050987179487176

In [51]:
total_purchase=purchase_data["Purchase ID"].nunique()
total_purchase

780

In [52]:
total_revenue=purchase_data["Price"].sum()
total_revenue

2379.77

In [56]:
purchasing_df = pd.DataFrame({"Number of Unique Items": [unique_items], "Average Price": [average_price],
                              "Number of Purchases": [total_purchase], "Total Revenue": [total_revenue]})

purchasing_df

| | Number of Unique Items | Average Price | Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 179 | 3.050987 | 780 | 2379.77 |

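The "Optional: give the displayed data cleaner formatting" step can be sketched as follows, assuming plain string formatting is acceptable for display. The one-row frame is hypothetical: it copies the values from the summary output above. The names `summary` and `formatted` are illustrative. The formatting is applied to a copy because `$` strings can no longer be used in arithmetic.

```python
import pandas as pd

# Hypothetical one-row summary; values copied from the purchasing_df output above
summary = pd.DataFrame({"Number of Unique Items": [179],
                        "Average Price": [3.050987179487176],
                        "Number of Purchases": [780],
                        "Total Revenue": [2379.77]})

# Format a copy so the numeric summary stays usable for further calculations
formatted = summary.copy()
for col in ["Average Price", "Total Revenue"]:
    # Render currency columns as strings such as "$2,379.77"
    formatted[col] = formatted[col].map("${:,.2f}".format)

print(formatted)
```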

## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [57]:
players = purchase_data.drop_duplicates("SN")
player_count_gender = players["Gender"].value_counts()
player_count_gender

Male                     484
Female                    81
Other / Non-Disclosed     11
Name: Gender, dtype: int64

In [58]:
gender_percentage = player_count_gender / len(players) * 100
gender_percentage

Male                     84.027778
Female                   14.062500
Other / Non-Disclosed     1.909722
Name: Gender, dtype: float64

In [59]:
gender_demographics_df = pd.DataFrame({"Number of Players": player_count_gender,
                                       "Percentage": gender_percentage.round(2)})
gender_demographics_df

| | Number of Players | Percentage |
|---|---|---|
| Male | 484 | 84.03 |
| Female | 81 | 14.06 |
| Other / Non-Disclosed | 11 | 1.91 |



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [63]:
gender_purchasing=purchase_data.groupby("Gender")["Purchase ID"].count()
gender_purchasing

Gender
Female                   113
Male                     652
Other / Non-Disclosed     15
Name: Purchase ID, dtype: int64

In [65]:
total_price_gender=purchase_data.groupby("Gender")["Price"].sum()
total_price_gender

Gender
Female                    361.94
Male                     1967.64
Other / Non-Disclosed      50.19
Name: Price, dtype: float64

In [124]:
avg_purchase_gender=purchase_data.groupby("Gender")["Price"].mean()
avg_purchase_gender

Gender
Female                   3.203009
Male                     3.017853
Other / Non-Disclosed    3.346000
Name: Price, dtype: float64

In [125]:

unique_players_gender = purchase_data.groupby("Gender")["SN"].nunique()
avgtot_price = total_price_gender / unique_players_gender
avgtot_price

Gender
Female                   4.468395
Male                     4.065372
Other / Non-Disclosed    4.562727
dtype: float64

In [127]:
purchase_gender_analysis = pd.DataFrame({"Purchase Count": gender_purchasing,
                                         "Average Purchase Price": avg_purchase_gender,
                                         "Total Purchase Value": total_price_gender,
                                         "Average Total Purchase per Person": avgtot_price})

purchase_gender_analysis.head()

| Gender | Purchase Count | Average Purchase Price | Total Purchase Value | Average Total Purchase per Person |
|---|---|---|---|---|
| Female | 113 | 3.203009 | 361.94 | 4.468395 |
| Male | 652 | 3.017853 | 1967.64 | 4.065372 |
| Other / Non-Disclosed | 15 | 3.346 | 50.19 | 4.562727 |


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [129]:
bins = [0, 9, 14, 19, 24, 29, 34, 39, 150]
labels = ["<10", "10 - 14", "15 - 19", "20 - 24", "25 - 29", "30 - 34", "35 - 39", "40+"]
purchase_data["Age Group"] = pd.cut(purchase_data["Age"], bins=bins, labels=labels)

purchase_count = purchase_data.groupby("Age Group", observed=False)["Purchase ID"].count().rename("Purchase Count")
purchase_count

Age Group
<10         23
10 - 14     28
15 - 19    136
20 - 24    365
25 - 29    101
30 - 34     73
35 - 39     41
40+         13
Name: Purchase Count, dtype: int64

In [131]:
total_purchase_value = purchase_data.groupby("Age Group", observed=False)["Price"].sum()
total_purchase_value

Age Group
<10          77.13
10 - 14      82.78
15 - 19     412.89
20 - 24    1114.06
25 - 29     293.00
30 - 34     214.00
35 - 39     147.67
40+          38.24
Name: Price, dtype: float64

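The Age Demographics instructions ask for the number and percentage of *players* in each age group, whereas the cells above aggregate purchases. Here is a minimal sketch, assuming the same `SN` and `Age` columns as purchase_data.csv. The small frame is made-up sample data, and the screen names are hypothetical. Deduplicating on `SN` before binning keeps repeat buyers from being counted twice. `pd.cut` closes bins on the right by default, so the edges `[0, 9, 14, ...]` place a 10-year-old in `10 - 14` rather than `<10`.

```python
import pandas as pd

# Hypothetical sample data; "Ana12" bought twice but is one player
purchase_data = pd.DataFrame({
    "SN":    ["Ana12", "Ana12", "Bo7", "Cy33", "Di40"],
    "Age":   [10, 10, 23, 24, 41],
    "Price": [1.00, 2.00, 3.00, 4.00, 5.00],
})

# Right-inclusive edges: (0, 9], (9, 14], ..., (39, 150]
bins = [0, 9, 14, 19, 24, 29, 34, 39, 150]
labels = ["<10", "10 - 14", "15 - 19", "20 - 24", "25 - 29", "30 - 34", "35 - 39", "40+"]

# One row per player, so repeat purchases do not inflate the counts
players = purchase_data.drop_duplicates("SN").copy()
players["Age Group"] = pd.cut(players["Age"], bins=bins, labels=labels)

# sort=False keeps the bins in label order, including empty ones
age_counts = players["Age Group"].value_counts(sort=False)
age_demographics = pd.DataFrame({
    "Total Count": age_counts,
    "Percentage of Players": (age_counts / len(players) * 100).round(2),
})
print(age_demographics)
```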

## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person, etc., for each age group


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
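The steps above could be combined roughly as follows. This is a sketch on a hypothetical sample frame with the `Purchase ID`, `SN`, `Age`, and `Price` columns from purchase_data.csv, using the same age bins as the Age Demographics step. The average total per person divides each group's revenue by its number of *distinct* players, not by its number of purchases.

```python
import pandas as pd

# Hypothetical sample data standing in for purchase_data.csv
purchase_data = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN":    ["Ana12", "Ana12", "Bo7", "Cy33", "Di40"],
    "Age":   [10, 10, 23, 24, 41],
    "Price": [1.00, 2.00, 3.00, 4.00, 5.00],
})

bins = [0, 9, 14, 19, 24, 29, 34, 39, 150]
labels = ["<10", "10 - 14", "15 - 19", "20 - 24", "25 - 29", "30 - 34", "35 - 39", "40+"]
purchase_data["Age Group"] = pd.cut(purchase_data["Age"], bins=bins, labels=labels)

# observed=False keeps empty age groups in the result
grouped = purchase_data.groupby("Age Group", observed=False)
total_value = grouped["Price"].sum()

age_purchasing = pd.DataFrame({
    "Purchase Count": grouped["Purchase ID"].count(),
    "Average Purchase Price": grouped["Price"].mean(),
    "Total Purchase Value": total_value,
    # Revenue per distinct player in each age group
    "Avg Total Purchase per Person": total_value / grouped["SN"].nunique(),
})
print(age_purchasing)
```

Empty age groups show a purchase count of 0 and `NaN` averages, which makes gaps in the data visible instead of silently dropping rows.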

## Top Spenders

* Run basic calculations to obtain purchase count, average purchase price, and total purchase value for each player (SN)


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
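Here is a minimal sketch of the Top Spenders table, assuming only the `SN` and `Price` columns are needed. The sample rows and screen names are hypothetical. Grouping by `SN` gives one row per player, and sorting on the numeric total (before any currency formatting) puts the biggest spenders first.

```python
import pandas as pd

# Hypothetical sample data standing in for purchase_data.csv
purchase_data = pd.DataFrame({
    "SN":    ["Ana12", "Ana12", "Bo7", "Cy33", "Cy33", "Cy33"],
    "Price": [1.50, 2.50, 4.25, 1.00, 1.00, 1.00],
})

grouped = purchase_data.groupby("SN")["Price"]

# One row per player, sorted by total spend, largest first
top_spenders = pd.DataFrame({
    "Purchase Count": grouped.count(),
    "Average Purchase Price": grouped.mean(),
    "Total Purchase Value": grouped.sum(),
}).sort_values("Total Purchase Value", ascending=False)

print(top_spenders.head())
```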



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
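The Most Popular Items steps can be sketched as follows. The item IDs, names, and prices come from the `purchase_data.head()` preview near the top, but how often each item was bought is made up. The sketch assumes each `Item ID` has a single price, so it takes the first price seen per group. If that assumption does not hold for the real data, a mean or max would be the safer choice.

```python
import pandas as pd

# Hypothetical purchases; items and prices taken from the preview rows above
purchase_data = pd.DataFrame({
    "Item ID":   [108, 143, 108, 92, 108, 92],
    "Item Name": ["Extraction, Quickblade Of Trembling Hands", "Frenzied Scimitar",
                  "Extraction, Quickblade Of Trembling Hands", "Final Critic",
                  "Extraction, Quickblade Of Trembling Hands", "Final Critic"],
    "Price":     [3.53, 1.56, 3.53, 4.88, 3.53, 4.88],
})

# Retrieve only the columns the table needs
items = purchase_data[["Item ID", "Item Name", "Price"]]
grouped = items.groupby(["Item ID", "Item Name"])["Price"]

popular_items = pd.DataFrame({
    "Purchase Count": grouped.count(),
    # Assumes one price per item, so the first price seen is the item price
    "Item Price": grouped.first(),
    "Total Purchase Value": grouped.sum(),
}).sort_values("Purchase Count", ascending=False)

print(popular_items.head())
```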



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
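Here is a minimal sketch of the Most Profitable Items step. It starts from a hypothetical item summary shaped like the Most Popular Items table (the item names `Item A` to `Item C` and their values are made up). The table is re-sorted by total purchase value, and only then are the currency columns formatted for display.

```python
import pandas as pd

# Hypothetical item summary with the same columns as the Most Popular Items table
popular_items = pd.DataFrame(
    {"Purchase Count": [3, 2, 1],
     "Item Price": [1.00, 4.88, 9.99],
     "Total Purchase Value": [3.00, 9.76, 9.99]},
    index=pd.MultiIndex.from_tuples(
        [(1, "Item A"), (2, "Item B"), (3, "Item C")],
        names=["Item ID", "Item Name"]),
)

# Sort on the numeric column first
most_profitable = popular_items.sort_values("Total Purchase Value", ascending=False)

# Then format a copy of the currency columns for display only
display_df = most_profitable.copy()
for col in ["Item Price", "Total Purchase Value"]:
    display_df[col] = display_df[col].map("${:,.2f}".format)

print(display_df.head())
```

Sorting before formatting matters. Once the values are `$` strings they sort lexicographically, so in descending order `"$9.99"` would come before `"$10.00"`.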

