### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).

-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [2]:
# Dependencies and Setup
import pandas as pd
import os
import numpy as np

# File to Load (Remember to Change These)
HeroesOfPymoli = os.path.join("Resources","purchase_data.csv")

# Read Purchasing File and store into Pandas data frame
purchase_data_df = pd.read_csv(HeroesOfPymoli)

purchase_data_df.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


In [3]:
purchase_data_df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 780 entries, 0 to 779
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Purchase ID  780 non-null    int64  
 1   SN           780 non-null    object 
 2   Age          780 non-null    int64  
 3   Gender       780 non-null    object 
 4   Item ID      780 non-null    int64  
 5   Item Name    780 non-null    object 
 6   Price        780 non-null    float64
dtypes: float64(1), int64(3), object(3)
memory usage: 42.8+ KB


In [4]:
purchase_data_df.describe()


Unnamed: 0,Purchase ID,Age,Item ID,Price
count,780.0,780.0,780.0,780.0
mean,389.5,22.714103,91.755128,3.050987
std,225.310896,6.659444,52.697702,1.169549
min,0.0,7.0,0.0,1.0
25%,194.75,20.0,47.75,1.98
50%,389.5,22.0,92.0,3.15
75%,584.25,25.0,138.0,4.08
max,779.0,45.0,183.0,4.99


## Player Count

* Display the total number of players


In [5]:
purchase_data_df.describe(include='all')


Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
count,780.0,780,780.0,780,780.0,780,780.0
unique,,576,,3,,179,
top,,Lisosia93,,Male,,Final Critic,
freq,,5,,652,,13,
mean,389.5,,22.714103,,91.755128,,3.050987
std,225.310896,,6.659444,,52.697702,,1.169549
min,0.0,,7.0,,0.0,,1.0
25%,194.75,,20.0,,47.75,,1.98
50%,389.5,,22.0,,92.0,,3.15
75%,584.25,,25.0,,138.0,,4.08
max,779.0,,45.0,,183.0,,4.99


In [6]:
# Total number of Players:
Total_Players = purchase_data_df["SN"].nunique()
Total_Players

576

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [7]:
# Display all columns of the df
purchase_data_df.columns


Index(['Purchase ID', 'SN', 'Age', 'Gender', 'Item ID', 'Item Name', 'Price'], dtype='object')

In [8]:
# Number of unique items (a count, based on distinct item names)

unique_items_count = purchase_data_df['Item Name'].nunique()
unique_items_count

179

In [9]:
# Average Price
Average_Price = round(purchase_data_df["Price"].mean(),2)
Average_Price


3.05

In [10]:
# Number of purchases
Number_of_Purchases = purchase_data_df['Purchase ID'].count()
Number_of_Purchases

780

In [11]:
# Total_revenue
total_revenue = purchase_data_df["Price"].sum()
total_revenue

2379.77

In [12]:
unique_gender_list = purchase_data_df['Gender'].value_counts()

unique_gender_list

Male                     652
Female                   113
Other / Non-Disclosed     15
Name: Gender, dtype: int64

In [13]:
Average_purchase_price = purchase_data_df["Price"].mean()
round(Average_purchase_price,2)

3.05

In [14]:
total_purchases = purchase_data_df["Purchase ID"].count()
total_purchases

780

In [15]:
total_revenue = purchase_data_df["Price"].sum()
total_revenue

2379.77
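
The individual values computed above can be collected into the summary data frame the instructions ask for. The following is a minimal sketch, not the notebook's own solution: `sample_df` is a hypothetical stand-in for `purchase_data_df`, and the column labels of `summary_df` are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_data.csv
sample_df = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3],
    "SN": ["PlayerA", "PlayerB", "PlayerA", "PlayerC"],
    "Item ID": [10, 11, 10, 12],
    "Item Name": ["Sword", "Shield", "Sword", "Bow"],
    "Price": [2.00, 3.50, 2.00, 4.50],
})

# One-row summary frame holding the totals (column labels are assumptions)
summary_df = pd.DataFrame({
    "Number of Unique Items": [sample_df["Item Name"].nunique()],
    "Average Price": [sample_df["Price"].mean()],
    "Number of Purchases": [sample_df["Purchase ID"].count()],
    "Total Revenue": [sample_df["Price"].sum()],
})

# Format currency columns as strings, for display only
for col in ["Average Price", "Total Revenue"]:
    summary_df[col] = summary_df[col].map("${:,.2f}".format)

print(summary_df)
```

Formatting converts the numeric columns to strings, so any further arithmetic should happen before the formatting step.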

## Gender Demographics

In [16]:
gender_demo_df = purchase_data_df[['Purchase ID','SN','Gender']]
gender_demo_df.head()

Unnamed: 0,Purchase ID,SN,Gender
0,0,Lisim78,Male
1,1,Lisovynya38,Male
2,2,Ithergue48,Male
3,3,Chamassasya86,Male
4,4,Iskosia90,Male


In [17]:
gender_demo_results = gender_demo_df.groupby('Gender')['SN'].agg(['nunique'])
gender_demo_results['Percentage of Players'] = gender_demo_results['nunique']/gender_demo_results['nunique'].sum()
# gender_demo_results['Percentage of Players'].style.format("{:.2%}")
gender_demo_results

Unnamed: 0_level_0,nunique,Percentage of Players
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,81,0.140625
Male,484,0.840278
Other / Non-Disclosed,11,0.019097
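
The table above shows the player share as a raw fraction. One possible way to display it as a percentage is to map a format string over the column, since the `.style` accessor is a DataFrame feature rather than a Series one. This is a sketch under assumptions: `sample_df` and the column labels are hypothetical.

```python
import pandas as pd

# Hypothetical sample: one row per purchase, so players can repeat
sample_df = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerD"],
    "Gender": ["Male", "Male", "Female", "Male", "Male"],
})

# Count each player once per gender, then compute the share of all players
gender_counts = sample_df.groupby("Gender")["SN"].nunique()
gender_summary = pd.DataFrame({
    "Total Count": gender_counts,
    "Percentage of Players": gender_counts / gender_counts.sum(),
})

# Convert the fraction to a display string such as "75.00%"
gender_summary["Percentage of Players"] = (
    gender_summary["Percentage of Players"].map("{:.2%}".format)
)
print(gender_summary)
```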



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [18]:
purchasing_analysis_gender_df = purchase_data_df.groupby('Gender')['SN'].agg(['count'])

purchasing_analysis_gender_df

Unnamed: 0_level_0,count
Gender,Unnamed: 1_level_1
Female,113
Male,652
Other / Non-Disclosed,15


In [19]:
purchasing_analysis_gender_df = purchase_data_df.groupby('Gender')['Price'].agg(['mean'])

purchasing_analysis_gender_df

Unnamed: 0_level_0,mean
Gender,Unnamed: 1_level_1
Female,3.203009
Male,3.017853
Other / Non-Disclosed,3.346


In [20]:
purchasing_analysis_gender_df = purchase_data_df.groupby('Gender')['Price'].agg(['sum'])

purchasing_analysis_gender_df

Unnamed: 0_level_0,sum
Gender,Unnamed: 1_level_1
Female,361.94
Male,1967.64
Other / Non-Disclosed,50.19


In [33]:
# Average total purchase per person by gender: total revenue / unique players
purchasing_analysis_gender_df = purchase_data_df.groupby('Gender')

avg_total_per_person = purchasing_analysis_gender_df["Price"].sum() / purchasing_analysis_gender_df["SN"].nunique()
print(avg_total_per_person)
purchasing_analysis_gender_df

Gender
Female                   4.468395
Male                     4.065372
Other / Non-Disclosed    4.562727
dtype: float64


<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001F48484CAF0>

In [None]:
# Count each male player once, even if they made several purchases
male_players_df = purchase_data_df.loc[purchase_data_df['Gender'] == "Male"]
male_count = male_players_df["SN"].nunique()
male_count

In [None]:
# Total_Players was computed in the Player Count section
male_player_percentage = male_count / Total_Players * 100
print(f'{round(male_player_percentage, 2)} %')

In [None]:
gender_groups = purchase_data_df.groupby('Gender')

# agg() returns a new summary frame, so assign it before formatting its columns
purchasing_analysis_gender_df = gender_groups.agg(Purchase_Count=pd.NamedAgg(column='Purchase ID', aggfunc='count'),
                                                  Average_Purchase_Price=pd.NamedAgg(column='Price', aggfunc='mean'),
                                                  Total_Purchase_Value=pd.NamedAgg(column='Price', aggfunc='sum'))

# Format the currency columns as strings, for display only
purchasing_analysis_gender_df["Average_Purchase_Price"] = purchasing_analysis_gender_df["Average_Purchase_Price"].map("${:.2f}".format)
purchasing_analysis_gender_df["Total_Purchase_Value"] = purchasing_analysis_gender_df["Total_Purchase_Value"].map("${:.2f}".format)
purchasing_analysis_gender_df
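
The instructions for this section also ask for the average purchase total per person. A minimal sketch of one way to fold that into the same summary follows, assuming hypothetical sample data (`sample_df`) and illustrative column names; it uses the tuple form of named aggregation, which is equivalent to `pd.NamedAgg`.

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_data.csv
sample_df = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerD"],
    "Gender": ["Male", "Male", "Female", "Male", "Male"],
    "Price": [2.00, 4.00, 3.00, 1.00, 5.00],
})

# Aggregate purchases per gender; Unique_Players counts each SN once
gender_summary = sample_df.groupby("Gender").agg(
    Purchase_Count=("Purchase ID", "count"),
    Average_Purchase_Price=("Price", "mean"),
    Total_Purchase_Value=("Price", "sum"),
    Unique_Players=("SN", "nunique"),
)

# Average total per person divides revenue by players, not by purchases
gender_summary["Avg_Total_Purchase_per_Person"] = (
    gender_summary["Total_Purchase_Value"] / gender_summary["Unique_Players"]
)
print(gender_summary.round(2))
```

Dividing by unique players rather than by purchase count is what separates this column from the average purchase price.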

## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [22]:
purchase_data_df.describe()


Unnamed: 0,Purchase ID,Age,Item ID,Price
count,780.0,780.0,780.0,780.0
mean,389.5,22.714103,91.755128,3.050987
std,225.310896,6.659444,52.697702,1.169549
min,0.0,7.0,0.0,1.0
25%,194.75,20.0,47.75,1.98
50%,389.5,22.0,92.0,3.15
75%,584.25,25.0,138.0,4.08
max,779.0,45.0,183.0,4.99


## Purchasing Analysis (Age)

In [23]:
bins = [0,9, 14, 19, 24, 29, 34, 39, 45]
bin_labels = ["<10", "10-14", "15-19","20-24","25-29","30-34","35-39","40+"]

In [25]:
purchase_data_df["Age_Bins"] = pd.cut(purchase_data_df["Age"],bins=bins,labels=bin_labels)
purchase_data_df

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price,Age_Bins
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53,20-24
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56,40+
2,2,Ithergue48,24,Male,92,Final Critic,4.88,20-24
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27,20-24
4,4,Iskosia90,23,Male,131,Fury,1.44,20-24
...,...,...,...,...,...,...,...,...
775,775,Aethedru70,21,Female,60,Wolf,3.54,20-24
776,776,Iral74,21,Male,164,Exiled Doomblade,1.63,20-24
777,777,Yathecal72,20,Male,67,"Celeste, Incarnation of the Corrupted",3.46,20-24
778,778,Sisur91,7,Male,92,Final Critic,4.19,<10


In [26]:
age_groups = purchase_data_df.groupby('Age_Bins')
age_groups['Purchase ID'].count()

Age_Bins
<10       23
10-14     28
15-19    136
20-24    365
25-29    101
30-34     73
35-39     41
40+       13
Name: Purchase ID, dtype: int64
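
The counts above are per purchase, so a player who bought several items is counted several times. For the age demographics table, one option is to drop duplicate screen names before binning. The sketch below assumes hypothetical sample data (`sample_df`) and illustrative column labels, and reuses the bin edges defined earlier.

```python
import pandas as pd

# Hypothetical sample: PlayerA appears twice because of two purchases
sample_df = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerD"],
    "Age": [22, 22, 17, 24, 41],
})

bins = [0, 9, 14, 19, 24, 29, 34, 39, 45]
bin_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Keep one row per player so repeat purchasers are not counted twice
players = sample_df.drop_duplicates(subset="SN").copy()
players["Age_Bins"] = pd.cut(players["Age"], bins=bins, labels=bin_labels)

# sort_index orders the rows by age bin rather than by count
age_counts = players["Age_Bins"].value_counts().sort_index()
age_summary = pd.DataFrame({
    "Total Count": age_counts,
    "Percentage of Players": (age_counts / age_counts.sum() * 100).round(2),
})
print(age_summary)
```

Note that the last bin edge is hard-coded to 45, the maximum age in this dataset; ages above the last edge would fall outside every bin and become `NaN`.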

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
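
The steps listed above could be sketched as follows. This is an illustrative outline rather than the expected solution: `sample_df` is hypothetical sample data, and the summary column names are assumptions.

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_data.csv
sample_df = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerD"],
    "Age": [22, 22, 17, 24, 41],
    "Price": [2.00, 4.00, 3.00, 1.00, 5.00],
})

bins = [0, 9, 14, 19, 24, 29, 34, 39, 45]
bin_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
sample_df["Age_Bins"] = pd.cut(sample_df["Age"], bins=bins, labels=bin_labels)

# observed=True keeps only the age bins that actually contain purchases
age_purchase_summary = sample_df.groupby("Age_Bins", observed=True).agg(
    Purchase_Count=("Purchase ID", "count"),
    Average_Purchase_Price=("Price", "mean"),
    Total_Purchase_Value=("Price", "sum"),
    Unique_Players=("SN", "nunique"),
)

# Revenue per distinct player in each age bin
age_purchase_summary["Avg_Total_Purchase_per_Person"] = (
    age_purchase_summary["Total_Purchase_Value"]
    / age_purchase_summary["Unique_Players"]
)
print(age_purchase_summary.round(2))
```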

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
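
A minimal sketch of the top-spenders steps, assuming hypothetical sample data (`sample_df`) and illustrative column names: group by screen name, aggregate, then sort by total purchase value.

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_data.csv
sample_df = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerD"],
    "Price": [2.00, 4.00, 3.00, 1.00, 5.00],
})

# One row per player, sorted so the biggest spender comes first
spender_summary = (
    sample_df.groupby("SN")
    .agg(
        Purchase_Count=("Purchase ID", "count"),
        Average_Purchase_Price=("Price", "mean"),
        Total_Purchase_Value=("Price", "sum"),
    )
    .sort_values("Total_Purchase_Value", ascending=False)
)

# head() gives the preview requested in the instructions
print(spender_summary.head())
```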



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
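
The grouping described above might look like the sketch below. The sample rows and column names are hypothetical, and taking the mean of `Price` as the item price is an assumption that only holds if each item sells at a single price.

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_data.csv
sample_df = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "Item ID": [10, 11, 10, 12, 10],
    "Item Name": ["Sword", "Shield", "Sword", "Bow", "Sword"],
    "Price": [2.00, 3.50, 2.00, 4.50, 2.00],
})

# Retrieve the item columns, then group by both ID and name
item_summary = (
    sample_df[["Item ID", "Item Name", "Price"]]
    .groupby(["Item ID", "Item Name"])
    .agg(
        Purchase_Count=("Price", "count"),
        Item_Price=("Price", "mean"),  # assumes one price per item
        Total_Purchase_Value=("Price", "sum"),
    )
)

# Most popular items first
most_popular = item_summary.sort_values("Purchase_Count", ascending=False)
print(most_popular.head())
```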



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
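
As a sketch of this last step, the same per-item summary can be sorted by total purchase value instead of purchase count. The sample rows and column names below are hypothetical; the data is chosen so that the most profitable item differs from the most purchased one.

```python
import pandas as pd

# Hypothetical sample: Sword sells most often, Bow earns the most
sample_df = pd.DataFrame({
    "Item ID": [10, 10, 10, 12, 12, 11],
    "Item Name": ["Sword", "Sword", "Sword", "Bow", "Bow", "Shield"],
    "Price": [2.00, 2.00, 2.00, 4.50, 4.50, 3.50],
})

item_summary = sample_df.groupby(["Item ID", "Item Name"]).agg(
    Purchase_Count=("Price", "count"),
    Item_Price=("Price", "mean"),  # assumes one price per item
    Total_Purchase_Value=("Price", "sum"),
)

# Sort numerically first; formatting turns the values into strings
most_profitable = item_summary.sort_values(
    "Total_Purchase_Value", ascending=False
).copy()
for col in ["Item_Price", "Total_Purchase_Value"]:
    most_profitable[col] = most_profitable[col].map("${:,.2f}".format)

print(most_profitable.head())
```

Sorting before formatting matters: once the values are strings, `sort_values` would order them lexically rather than numerically.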

