### Heroes Of Pymoli Data Analysis
* The data covers 780 purchases made by 576 unique players. The large majority of purchases come from male players (84%), with a smaller but notable share from female players (14%).

* Purchases peak in the 20-24 age group (46.8% of all purchases), followed by the 15-19 (17.4%) and 25-29 (12.9%) groups.

-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [2]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# Raw data file
file_to_load = "Resources/purchase_data.csv"

# Read purchasing file and store into pandas data frame
purchase_data = pd.read_csv(file_to_load)
purchase_data.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


## Player Count

* Display the total number of players


In [3]:
player_count = purchase_data['SN'].value_counts()

players = len(player_count)

players

576
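
As a cross-check on the count above, `Series.nunique()` returns the number of distinct screen names in one call. A minimal sketch on a made-up frame (the `toy` data is hypothetical and not taken from `purchase_data.csv`):

```python
import pandas as pd

# Hypothetical purchases: "Ana" buys twice, so there are 3 rows but only 2 players.
toy = pd.DataFrame({"SN": ["Ana", "Bo", "Ana"], "Price": [1.00, 2.50, 3.25]})

via_value_counts = len(toy["SN"].value_counts())  # approach used in the cell above
via_nunique = toy["SN"].nunique()                 # same count, one call

print(via_value_counts, via_nunique)
```

Both approaches count each screen name once, however many purchases it has.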

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [4]:
unique_items = purchase_data['Item ID'].value_counts()
item = (len(unique_items))
#print (len(unique_items))
avg_price =  purchase_data['Price'].mean()

#print (avg_price)

purchase = purchase_data['Purchase ID'].count()
#print (purchase)

revenue = purchase_data['Price'].sum()

#print (revenue)

summary = pd.DataFrame({'Number of Unique Items': [item],
                       'Average Price': [avg_price],
                       'Number of Purchases': [purchase],
                       'Total Revenue': [revenue]})

summary['Average Price'] = summary['Average Price'].map("${:,.2f}".format)
summary['Total Revenue'] = summary['Total Revenue'].map("${:,.2f}".format)
summary

Unnamed: 0,Average Price,Number of Purchases,Number of Unique Items,Total Revenue
0,$3.05,780,183,"$2,379.77"
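
Mapping a format string over a column, as in the cell above, replaces the numbers with text, so later arithmetic or sorting on that column no longer behaves numerically. One possible pattern is to format a copy and keep the numeric frame intact. The sketch below assumes rounded values from the summary table, and the names `summary_num` and `display_copy` are illustrative:

```python
import pandas as pd

# Rounded values from the summary table, used here only for illustration.
summary_num = pd.DataFrame({"Average Price": [3.05], "Total Revenue": [2379.77]})

# Format a copy for display; summary_num stays numeric for later calculations.
display_copy = summary_num.copy()
for col in ["Average Price", "Total Revenue"]:
    display_copy[col] = display_copy[col].map("${:,.2f}".format)

print(display_copy)
```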


## Gender Demographics

* Run basic calculations to obtain the count and percentage for each gender


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [5]:
# Count purchase rows per gender (a player with several purchases is counted once per purchase)
gender_count = purchase_data['Gender'].value_counts()

gender_count


Male                     652
Female                   113
Other / Non-Disclosed     15
Name: Gender, dtype: int64

In [40]:
# gender_count holds purchases per gender, so the denominator is the total number of purchases
percentage = (gender_count / gender_count.sum()) * 100
print(percentage)

percentage.to_frame()

Male                     83.589744
Female                   14.487179
Other / Non-Disclosed     1.923077
Name: Gender, dtype: float64


Unnamed: 0,Gender
Male,83.589744
Female,14.487179
Other / Non-Disclosed,1.923077


In [41]:
gender_summary = pd.DataFrame({"Percentage of Purchases": percentage,
                       "Total Counts": gender_count})

gender_summary['Percentage of Purchases'] = gender_summary['Percentage of Purchases'].map("{0:,.2f}".format)
gender_summary

Unnamed: 0,Percentage of Purchases,Total Counts
Male,83.59,652
Female,14.49,113
Other / Non-Disclosed,1.92,15
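
`gender_count` is built from purchase rows, so a player who buys three times is counted three times. If the goal is the share of players rather than purchases, one option is to drop repeat rows per screen name before counting. A sketch under that assumption, using a made-up `toy` frame and assuming each `SN` maps to a single `Gender`:

```python
import pandas as pd

# Hypothetical data: "Ana" appears three times but is one player.
toy = pd.DataFrame({
    "SN": ["Ana", "Ana", "Ana", "Bo", "Cy", "Dee"],
    "Gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
})

# Keep one row per player, then count and convert to percentages.
players_only = toy.drop_duplicates(subset="SN")
player_counts = players_only["Gender"].value_counts()
player_pct = player_counts / player_counts.sum() * 100

gender_players = pd.DataFrame({
    "Total Count": player_counts,
    "Percentage of Players": player_pct.round(2),
})
print(gender_players)
```

Because the denominator is the sum of the per-gender player counts, the percentages always total 100.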



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, etc. by gender


* For normalized purchasing, divide total purchase value by purchase count, by gender


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [8]:
avg_purchase = purchase_data.groupby(['Gender'])

avg_purchase1 = avg_purchase['Price'].mean()
#print(avg_purchase1)


total_purchase = avg_purchase['Price'].sum()
#print(total_purchase)

normal = total_purchase / gender_count

#print(normal)

genderpur_summary = pd.DataFrame({"Purchase Counts": gender_count,
                       "Average Purchase Price": avg_purchase1,
                        "Total Purchase Value": total_purchase,
                        "Normalized Totals": normal})
genderpur_summary['Average Purchase Price'] = genderpur_summary['Average Purchase Price'].map("${:,.2f}".format)
genderpur_summary["Total Purchase Value"] = genderpur_summary["Total Purchase Value"].map("${:,.2f}".format)
genderpur_summary["Normalized Totals"] = genderpur_summary["Normalized Totals"].map("${:,.2f}".format)
genderpur_summary
#gender_summary = pd.DataFrame({"Avg Purchase": avg_purchase,
                       #"Total Counts": gender_count})

Unnamed: 0,Average Purchase Price,Normalized Totals,Purchase Counts,Total Purchase Value
Female,$3.20,$3.20,113,$361.94
Male,$3.02,$3.02,652,"$1,967.64"
Other / Non-Disclosed,$3.35,$3.35,15,$50.19
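
The per-gender calculations above can also be collected in a single `groupby(...).agg(...)` call using named aggregation. A sketch on made-up data (the `toy` frame and the output column names are illustrative):

```python
import pandas as pd

# Hypothetical purchases.
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male"],
    "Price": [1.00, 3.00, 2.50, 2.00],
})

# One pass over the groups produces count, mean, and sum together.
by_gender = toy.groupby("Gender")["Price"].agg(
    purchase_count="count",
    average_price="mean",
    total_value="sum",
)

# Total divided by purchase count, as the instructions describe.
by_gender["normalized_total"] = by_gender["total_value"] / by_gender["purchase_count"]
print(by_gender)
```

Dividing the total by the purchase count is the definition of the mean, which is why the "Normalized Totals" and "Average Purchase Price" columns in the table above are identical.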


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [47]:
# Establish bins for ages
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Label every purchase row with its age group
purchase_data['age range'] = pd.cut(purchase_data['Age'], age_bins, labels=group_names)

# Purchases per age group
age_count = purchase_data['age range'].value_counts()

print(age_count)

# age_count holds purchases per group, so the denominator is the total number of purchases
age_percentage = (age_count / age_count.sum()) * 100

genderage_summary = pd.DataFrame({"Percentage of Purchases": age_percentage,
                       "Total Counts": age_count})
genderage_summary["Percentage of Purchases"] = genderage_summary["Percentage of Purchases"].map("{:,.2f}".format)
genderage_summary

20-24    365
15-19    136
25-29    101
30-34     73
35-39     41
10-14     28
<10       23
40+       13
Name: age range, dtype: int64


Unnamed: 0,Percentage of Purchases,Total Counts
20-24,46.79,365
15-19,17.44,136
25-29,12.95,101
30-34,9.36,73
35-39,5.26,41
10-14,3.59,28
<10,2.95,23
40+,1.67,13
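
`pd.cut` uses right-closed intervals by default, which is why the bin edges above end in `.90`. An alternative, sketched here with made-up ages, is to pass whole-number edges with `right=False` so that each bin includes its lower edge and excludes its upper one:

```python
import numpy as np
import pandas as pd

# Hypothetical ages chosen to sit on the bin boundaries.
ages = pd.Series([7, 10, 14, 15, 24, 25, 39, 40, 45])

edges = [0, 10, 15, 20, 25, 30, 35, 40, np.inf]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# right=False makes the intervals [0, 10), [10, 15), ..., [40, inf).
groups = pd.cut(ages, bins=edges, labels=labels, right=False)
print(groups.tolist())
```

With this layout the edges match the label names directly, and non-integer ages such as 14.95 would still land in the intended bin.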


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, etc. in the table below


* Calculate Normalized Purchasing


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [58]:
# Reuse the age bins from the Age Demographics cell
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

purchase_data['age range'] = pd.cut(purchase_data['Age'], age_bins, labels=group_names)

app_age = purchase_data.groupby(['age range'])
avg_purchaseage = app_age['Price'].mean()
print(avg_purchaseage)

tot_purchaseage = app_age['Price'].sum()
print(tot_purchaseage)

normal_purage = tot_purchaseage / age_count

genderagepur_summary = pd.DataFrame({"Purchase Count": age_count,
                            "Average Purchase Price": avg_purchaseage, 
                           "Total Purchase Value": tot_purchaseage,
                                    "Normalized Totals": normal_purage})
genderagepur_summary["Average Purchase Price"] = genderagepur_summary["Average Purchase Price"].map("${:,.2f}".format)
genderagepur_summary["Total Purchase Value"] = genderagepur_summary["Total Purchase Value"].map("${:,.2f}".format)
genderagepur_summary["Normalized Totals"] = genderagepur_summary["Normalized Totals"].map("${:,.2f}".format)
genderagepur_summary

age range
<10      3.353478
10-14    2.956429
15-19    3.035956
20-24    3.052219
25-29    2.900990
30-34    2.931507
35-39    3.601707
40+      2.941538
Name: Price, dtype: float64
age range
<10        77.13
10-14      82.78
15-19     412.89
20-24    1114.06
25-29     293.00
30-34     214.00
35-39     147.67
40+        38.24
Name: Price, dtype: float64


Unnamed: 0,Average Purchase Price,Normalized Totals,Purchase Count,Total Purchase Value
10-14,$2.96,$2.96,28,$82.78
15-19,$3.04,$3.04,136,$412.89
20-24,$3.05,$3.05,365,"$1,114.06"
25-29,$2.90,$2.90,101,$293.00
30-34,$2.93,$2.93,73,$214.00
35-39,$3.60,$3.60,41,$147.67
40+,$2.94,$2.94,13,$38.24
<10,$3.35,$3.35,23,$77.13


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [98]:
SN_count = purchase_data['SN'].value_counts()

SN_count

purchaseSN = purchase_data.groupby(['SN'])

avg_purchaseSN = purchaseSN['Price'].mean()
#print(avg_purchaseSN)


total_purchaseSN = purchaseSN['Price'].sum()
#print(total_purchaseSN)


topspender_summary = pd.DataFrame({"Purchase Counts": SN_count,
                       "Average Purchase Price": avg_purchaseSN,
                        "Total Purchase Value": total_purchaseSN})


topspender_sort = topspender_summary.sort_values('Total Purchase Value',ascending=False)

topspender_sort['Average Purchase Price'] = topspender_sort['Average Purchase Price'].map("${:,.2f}".format)
topspender_sort["Total Purchase Value"] = topspender_sort["Total Purchase Value"].map("${:,.2f}".format)
topspender_sort.head()

Unnamed: 0,Average Purchase Price,Purchase Counts,Total Purchase Value
Lisosia93,$3.79,5,$18.96
Idastidru52,$3.86,4,$15.45
Chamjask73,$4.61,3,$13.83
Iral74,$3.40,4,$13.62
Iskadarya95,$4.37,3,$13.10
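
The top-spender table can also be produced with one aggregation followed by `DataFrame.nlargest`, which sorts and truncates in a single step. A sketch on made-up data (the `toy` frame and column names are illustrative):

```python
import pandas as pd

# Hypothetical purchases by three players.
toy = pd.DataFrame({
    "SN": ["Ana", "Bo", "Ana", "Cy", "Bo", "Ana"],
    "Price": [4.00, 1.50, 3.00, 2.00, 1.00, 2.00],
})

spenders = toy.groupby("SN")["Price"].agg(
    purchase_count="count",
    average_price="mean",
    total_value="sum",
)

# Keep the two players with the highest total spend, largest first.
top_two = spenders.nlargest(2, "total_value")
print(top_two)
```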


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [143]:
#Item_count = purchase_data['Item ID'].value_counts()

#Item_count

item = purchase_data.groupby(['Item ID', 'Item Name'])



avg_purchaseitem = item['Price'].mean()
#print(avg_purchaseitem)


total_purchaseitem = item['Price'].sum()
#print(total_purchaseitem)


topitem_summary = pd.DataFrame({"Item Price": avg_purchaseitem,
                                "Total Purchase Value": total_purchaseitem})
                              #"Purchase Count": Item_count})


topitem_summary['Item Price'] = topitem_summary['Item Price'].map("${:,.2f}".format)
topitem_summary["Total Purchase Value"] = topitem_summary["Total Purchase Value"].map("${:,.2f}".format)

topitem_summary.head()

Item ID,Item Name,Item Price,Total Purchase Value
0,Splinter,$1.28,$5.12
1,Crucifer,$3.26,$9.78
2,Verdict,$2.48,$14.88
3,Phantomlight,$2.49,$14.94
4,Bloodlord's Fetish,$1.70,$8.50
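
The summary above leaves out the purchase count that the instructions ask to sort by. One way to include it is to aggregate count, mean, and sum together and then sort on the count. A sketch on made-up data (item names, prices, and output column names are hypothetical):

```python
import pandas as pd

# Hypothetical purchases of three items.
toy = pd.DataFrame({
    "Item ID": [1, 2, 1, 3, 1, 2],
    "Item Name": ["Axe", "Bow", "Axe", "Cap", "Axe", "Bow"],
    "Price": [2.00, 5.00, 2.00, 1.00, 2.00, 5.00],
})

items = toy.groupby(["Item ID", "Item Name"])["Price"].agg(
    purchase_count="count",
    item_price="mean",
    total_value="sum",
)

# Most popular items first; values stay numeric until display time.
most_popular = items.sort_values("purchase_count", ascending=False)
print(most_popular.head())
```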


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [141]:
# topitem_summary already stores "$"-formatted strings, so rank items by the
# numeric totals from the previous cell and reorder the formatted table to match.
profit_order = total_purchaseitem.sort_values(ascending=False).index
topitem_sort = topitem_summary.reindex(profit_order)

topitem_sort.head()

Item ID,Item Name,Item Price,Total Purchase Value
178,"Oathbreaker, Last Hope of the Breaking Storm",$4.23,$50.76
82,Nirvana,$4.90,$44.10
145,Fiery Glass Crusader,$4.58,$41.22
92,Final Critic,$4.88,$39.04
103,Singed Scalpel,$4.35,$34.80
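
Ranking by total purchase value only works on numeric values. Once the totals are formatted as text, they sort character by character. A small illustration using three totals taken from the tables above (the short index labels are abbreviations chosen for this sketch):

```python
import pandas as pd

totals = pd.Series([9.78, 50.76, 14.88], index=["Crucifer", "Oathbreaker", "Verdict"])
as_text = totals.map("${:,.2f}".format)

# Numeric sort ranks by value; text sort compares "$9...", "$5...", "$1..." by character.
numeric_order = totals.sort_values(ascending=False).index.tolist()
text_order = as_text.sort_values(ascending=False).index.tolist()

print(numeric_order)
print(text_order)
```

The text sort places the $9.78 item ahead of the $50.76 item, which is why sorting should come before formatting.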
