### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84.03%). There is also a smaller but notable proportion of female players (14.06%).

* Our peak age demographic is 20-24 (44.79%), with secondary groups at 15-19 (18.58%) and 25-29 (13.37%).

-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
%matplotlib notebook
import pandas as pd
import numpy as np

# File to load (change the path if the CSV lives elsewhere)
pdata = "purchase_data.csv"

# Read the purchasing file into a pandas DataFrame and preview the first rows
pdatadf = pd.read_csv(pdata)
pdatadf.head(10)

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |
| 5 | 5 | Yalae81 | 22 | Male | 81 | Dreamkiss | 3.61 |
| 6 | 6 | Itheria73 | 36 | Male | 169 | Interrogator, Blood Blade of the Queen | 2.18 |
| 7 | 7 | Iskjaskst81 | 20 | Male | 162 | Abyssal Shard | 2.67 |
| 8 | 8 | Undjask33 | 22 | Male | 21 | Souleater | 1.1 |
| 9 | 9 | Chanosian48 | 35 | Other / Non-Disclosed | 136 | Ghastly Adamantite Protector | 3.58 |
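
Several item names contain commas, so it is worth confirming that the CSV parser keeps them in one column. The sketch below is a minimal check, assuming the file quotes such names the way the preview suggests; it parses three rows copied from the preview through `io.StringIO` instead of reading `purchase_data.csv` itself.

```python
import io
import pandas as pd

# Three rows copied from the preview; the quoting of the first item name is an assumption
csv_text = '''Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,Ithergue48,24,Male,92,Final Critic,4.88
'''
df = pd.read_csv(io.StringIO(csv_text))

# A quoted name keeps its embedded comma and stays in a single column
print(df.shape)
print(df.loc[0, "Item Name"])

# Total number of missing values across the whole frame
print(df.isnull().sum().sum())
```

The same shape and missing-value checks can be run on `pdatadf` right after loading.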


## Player Count

* Display the total number of players


In [2]:
# Count distinct screen names so repeat buyers are counted once
count = pdatadf["SN"].nunique()
countdf = pd.DataFrame({"# Total Players": count}, index=[0])
countdf

| | # Total Players |
|---|---|
| 0 | 576 |
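
As a cross-check on the player count, `Series.nunique()` and the length of `value_counts()` should agree. This is a minimal sketch on made-up data; the screen names are hypothetical and not taken from `purchase_data.csv`.

```python
import pandas as pd

# Hypothetical sample: three purchases made by two distinct players
sample = pd.DataFrame({"SN": ["Aela59", "Aela59", "Bron12"],
                       "Price": [1.50, 2.25, 3.00]})

# nunique() counts distinct screen names, so the repeat buyer counts once
player_count = sample["SN"].nunique()

# value_counts() has one row per distinct name, so its length is the same number
via_value_counts = len(sample["SN"].value_counts())

summary = pd.DataFrame({"# Total Players": [player_count]})
print(summary)
```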


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
# Count unique items
# NOTE: this counts unique screen names (players), which is why the table shows 576;
# the number of unique items would be pdatadf['Item ID'].nunique()
uitems = len(pdatadf['SN'].value_counts())

# Average purchase price
avgprice = pdatadf['Price'].mean()

# Number of purchases
prch = pdatadf['Item Name'].count()

# Total revenue
rev = pdatadf['Price'].sum()

padf = pd.DataFrame({'# of Unique Items': [uitems], 'AVG Price': [avgprice],
                    'Total Purchases': [prch], 'Total Revenue': [rev]})
padf

| | # of Unique Items | AVG Price | Total Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 576 | 3.050987 | 780 | 2379.77 |
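
The summary draws on counts taken over different columns, and distinct items and distinct players are separate quantities. The sketch below shows the difference on made-up data; the rows are hypothetical, and only the column names follow the notebook.

```python
import pandas as pd

# Hypothetical sample: four purchases, three players, two distinct items
sample = pd.DataFrame({
    "SN": ["Aela59", "Aela59", "Bron12", "Cyra07"],
    "Item ID": [1, 2, 1, 1],
    "Price": [2.00, 3.00, 2.00, 2.00],
})

totals = pd.DataFrame({
    "# of Unique Items": [sample["Item ID"].nunique()],  # distinct items
    "AVG Price": [sample["Price"].mean()],               # mean over all purchases
    "Total Purchases": [len(sample)],                    # one row per purchase
    "Total Revenue": [sample["Price"].sum()],
})
print(totals)

# Distinct players is a different number from distinct items
print(sample["SN"].nunique())
```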


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [4]:
# Group purchases by gender and count distinct values per column
gdf = pdatadf.groupby(['Gender'])
ngdf = gdf.nunique()

# Unique players per gender, and across all genders
gcount = ngdf['SN']
tgen = gcount.sum()

# Share of players per gender
pgen = gcount / tgen

# Summary frame
demodf = pd.DataFrame({'% of Players': pgen, '#': gcount})
demodf['% of Players'] = demodf['% of Players'].map("{:,.2%}".format)
demodf

| Gender | % of Players | # |
|---|---|---|
| Female | 14.06% | 81 |
| Male | 84.03% | 484 |
| Other / Non-Disclosed | 1.91% | 11 |
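
An alternative route to the same kind of table is to drop repeat buyers first and then let `value_counts` produce both the counts and the shares. This is a sketch on made-up data, assuming each screen name maps to a single gender.

```python
import pandas as pd

# Hypothetical sample: five purchases by four players
sample = pd.DataFrame({
    "SN": ["Aela59", "Aela59", "Bron12", "Cyra07", "Dorn33"],
    "Gender": ["Female", "Female", "Male", "Male", "Male"],
})

# Keep one row per player so repeat buyers do not inflate the counts
players = sample.drop_duplicates("SN")

counts = players["Gender"].value_counts()
shares = players["Gender"].value_counts(normalize=True)

demo = pd.DataFrame({"#": counts, "% of Players": shares})
demo["% of Players"] = demo["% of Players"].map("{:.2%}".format)
print(demo)
```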



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [5]:
# Purchase count per gender (computed for reference; not shown in the table)
pcount = gdf['Price'].count()
# Average and total purchase price per gender
avgprice = gdf['Price'].mean()
totprice = gdf['Price'].sum()
# Average total spent per unique player
fprice = totprice / gcount

# Summary frame
Genanalysis = pd.DataFrame({'Average Price': avgprice, 'Purchase Price' : totprice,
                           'Fixed Prices Total': fprice})
# Optional currency formatting:
# Genanalysis['Fixed Prices Total'] = Genanalysis['Fixed Prices Total'].map("${:,.2f}".format)
Genanalysis

| Gender | Average Price | Purchase Price | Fixed Prices Total |
|---|---|---|---|
| Female | 3.203009 | 361.94 | 4.468395 |
| Male | 3.017853 | 1967.64 | 4.065372 |
| Other / Non-Disclosed | 3.346 | 50.19 | 4.562727 |
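
The per-gender figures can also be computed in one pass with named aggregation. The sketch below uses made-up rows, and the output column names (`purchase_count`, `avg_total_per_person`, and so on) are illustrative choices, not names from the notebook.

```python
import pandas as pd

# Hypothetical sample: one player with two purchases, two players with one each
sample = pd.DataFrame({
    "SN": ["Aela59", "Aela59", "Bron12", "Cyra07"],
    "Gender": ["Female", "Female", "Male", "Male"],
    "Price": [1.00, 3.00, 2.00, 4.00],
})

by_gender = sample.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Average total per person divides by unique players, not by purchases
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]
print(by_gender)
```

Dividing by unique players is what separates the per-person total from the plain average price: a player who buys twice raises the former but not the latter.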


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [6]:
# One row per player so that repeat buyers are counted once;
# .copy() avoids pandas' SettingWithCopyWarning when the new column is added
newdf = pdatadf.drop_duplicates('SN').copy()

# Bin edges are right-inclusive: (0, 9], (9, 14], ..., (39, 100]
bins = [0, 9, 14, 19, 24, 29, 34, 39, 100]
group = ["Under 10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

newdf["Age Brackets"] = pd.cut(newdf["Age"], bins, labels=group)
agedf = newdf.groupby(["Age Brackets"])

# Number of players per age bracket
page = newdf["Age Brackets"].value_counts()

# Share of all players
perage = page / count

agedemo = pd.DataFrame({'Player Count': page, "% of Players": perage})
agedemo['% of Players'] = agedemo['% of Players'].map("{:,.2%}".format)

agedemo

| | Player Count | % of Players |
|---|---|---|
| 20-24 | 258 | 44.79% |
| 15-19 | 107 | 18.58% |
| 25-29 | 77 | 13.37% |
| 30-34 | 52 | 9.03% |
| 35-39 | 31 | 5.38% |
| 10-14 | 22 | 3.82% |
| Under 10 | 17 | 2.95% |
| 40+ | 12 | 2.08% |
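
`pd.cut` needs exactly one label per interval, and by default each interval includes its right edge. The sketch below uses the notebook's bin edges with labels that match each interval; the label names are an assumption about what each bin is meant to be called, and the ages are made up to probe the boundaries.

```python
import pandas as pd

bins = [0, 9, 14, 19, 24, 29, 34, 39, 100]
labels = ["Under 10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# pd.cut requires one label per interval, i.e. one fewer label than bin edges
assert len(labels) == len(bins) - 1

# Boundary ages: each bin is right-inclusive, so 9 falls in (0, 9] and 10 in (9, 14]
ages = pd.Series([9, 10, 19, 20, 24, 25, 40])
brackets = pd.cut(ages, bins, labels=labels)
print(brackets.tolist())
```

Probing the edges this way makes a mislabelled or shifted bracket easy to spot before the counts are computed.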


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [7]:
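# agedf groups the de-duplicated frame, so each player contributes one purchase here
# Average and total price per age bracket
avgpage = agedf['Price'].mean()
aprice = agedf['Price'].sum()

# Total value divided by players per bracket; with one row per player
# this equals the average price
fixedage = aprice / page

# Summary frame, ordered from the youngest to the oldest bracket
aaly = pd.DataFrame({"# Purchase": page, "AVG Price": avgpage,
                     "Total Value": aprice, "Fixed Totals": fixedage})
aaly = aaly.reindex(group)
aaly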



| | # Purchase | AVG Price | Total Value | Fixed Totals |
|---|---|---|---|---|
| Under 10 | 17 | 3.39 | 57.63 | 3.39 |
| 10-14 | 22 | 3.074545 | 67.64 | 3.074545 |
| 15-19 | 107 | 3.101682 | 331.88 | 3.101682 |
| 20-24 | 258 | 3.063527 | 790.39 | 3.063527 |
| 25-29 | 77 | 2.908182 | 223.93 | 2.908182 |
| 30-34 | 52 | 2.921538 | 151.92 | 2.921538 |
| 35-39 | 31 | 3.51 | 108.81 | 3.51 |
| 40+ | 12 | 3.0375 | 36.45 | 3.0375 |
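
The instructions for this section call for binning the full purchase frame, so that every purchase counts while the per-person total still divides by unique players. The sketch below shows that approach on made-up rows; the screen names, ages, prices, and output column names are all hypothetical.

```python
import pandas as pd

# Hypothetical sample: one 22-year-old with two purchases, plus two single buyers
sample = pd.DataFrame({
    "SN": ["Aela59", "Aela59", "Bron12", "Cyra07"],
    "Age": [22, 22, 23, 31],
    "Price": [1.00, 3.00, 2.00, 4.00],
})

bins = [0, 9, 14, 19, 24, 29, 34, 39, 100]
labels = ["Under 10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
sample["Age Brackets"] = pd.cut(sample["Age"], bins, labels=labels)

# observed=True keeps only the brackets that actually occur in the data
by_age = sample.groupby("Age Brackets", observed=True).agg(
    purchase_count=("Price", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)
by_age["avg_total_per_person"] = by_age["total_value"] / by_age["players"]
print(by_age)
```

With repeat purchases included, the purchase count and the player count diverge, and so do the average price and the per-person total.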


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [8]:
# Select the Price column once, then aggregate it per screen name
spengrp = pdatadf.groupby(['SN'])['Price']

# Total spent per user
totsn = spengrp.sum()

# Average purchase price per user
avgsn = spengrp.mean()

# Number of purchases per user
cntsn = spengrp.count()

# Summary frame; sort on the numeric totals before formatting them as strings
spendf = pd.DataFrame({"Total Amount Spent": totsn, 
                       "Avg Purchase in $": avgsn,
                      "# of Purchases": cntsn})
sortdf = spendf.sort_values("Total Amount Spent", ascending=False)
sortdf["Avg Purchase in $"]=sortdf["Avg Purchase in $"].map("${:,.2f}".format)
sortdf["Total Amount Spent"]=sortdf["Total Amount Spent"].map("${:,.2f}".format)
sortdf.head(10)

| SN | Total Amount Spent | Avg Purchase in $ | # of Purchases |
|---|---|---|---|
| Lisosia93 | $18.96 | $3.79 | 5 |
| Idastidru52 | $15.45 | $3.86 | 4 |
| Chamjask73 | $13.83 | $4.61 | 3 |
| Iral74 | $13.62 | $3.40 | 4 |
| Iskadarya95 | $13.10 | $4.37 | 3 |
| Ilarin91 | $12.70 | $4.23 | 3 |
| Ialallo29 | $11.84 | $3.95 | 3 |
| Tyidaim51 | $11.83 | $3.94 | 3 |
| Lassilsala30 | $11.51 | $3.84 | 3 |
| Chadolyla44 | $11.46 | $3.82 | 3 |
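
The order of operations matters here: once totals are formatted as strings, they sort alphabetically, not numerically. The sketch below demonstrates this on made-up data with hypothetical screen names.

```python
import pandas as pd

# Hypothetical sample: three players with different purchase histories
sample = pd.DataFrame({
    "SN": ["Aela59", "Aela59", "Bron12", "Cyra07", "Cyra07", "Cyra07"],
    "Price": [4.00, 4.50, 9.00, 4.00, 3.00, 3.50],
})

spend = sample.groupby("SN")["Price"].agg(["count", "mean", "sum"])
spend.columns = ["# of Purchases", "Avg Purchase in $", "Total Amount Spent"]

# Sort while the totals are still numeric, then keep the top two spenders
top = spend.sort_values("Total Amount Spent", ascending=False).head(2)

# Formatting comes last: as strings, "$9.00" would sort above "$10.50"
shown = top.copy()
shown["Total Amount Spent"] = shown["Total Amount Spent"].map("${:,.2f}".format)
shown["Avg Purchase in $"] = shown["Avg Purchase in $"].map("${:,.2f}".format)
print(shown)
```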


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [19]:
# Group the Price column by item
items = pdatadf.groupby(["Item ID", "Item Name"])["Price"]

# Purchase count, total value, and average price per item
countid = items.count()
totid = items.sum()
avgid = items.mean()

itemdf = pd.DataFrame({"Item Count": countid, "Total Price": totid,
                     "Avg Purchase Price": avgid})
# Sort by item count while the price columns are still numeric
srtitemdf = itemdf.sort_values("Item Count", ascending=False)

# Add $ to the price columns
srtitemdf["Total Price"] = srtitemdf["Total Price"].map("${:,.2f}".format)
srtitemdf["Avg Purchase Price"] = srtitemdf["Avg Purchase Price"].map("${:,.2f}".format)

srtitemdf.head(10)

| Item ID | Item Name | Item Count | Total Price | Avg Purchase Price |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $50.76 | $4.23 |
| 145 | Fiery Glass Crusader | 9 | $41.22 | $4.58 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | $31.77 | $3.53 |
| 82 | Nirvana | 9 | $44.10 | $4.90 |
| 19 | Pursuit, Cudgel of Necromancy | 8 | $8.16 | $1.02 |
| 103 | Singed Scalpel | 8 | $34.80 | $4.35 |
| 75 | Brutality Ivory Warmace | 8 | $19.36 | $2.42 |
| 72 | Winter's Bite | 8 | $30.16 | $3.77 |
| 60 | Wolf | 8 | $28.32 | $3.54 |
| 59 | Lightning, Etcher of the King | 8 | $33.84 | $4.23 |
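
Grouping on both `Item ID` and `Item Name` gives a two-level index, so each row is addressed by an `(id, name)` tuple. The sketch below shows this on made-up items; the item names and prices are hypothetical.

```python
import pandas as pd

# Hypothetical sample: three items bought 3, 2, and 1 times
sample = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Rusty Dagger", "Rusty Dagger", "Rusty Dagger",
                  "Oak Staff", "Oak Staff", "Iron Helm"],
    "Price": [1.00, 1.00, 1.00, 4.00, 4.00, 2.50],
})

# Aggregate the Price column per item, then give the columns readable names
items = sample.groupby(["Item ID", "Item Name"])["Price"].agg(["count", "sum", "mean"])
items = items.rename(columns={"count": "Item Count",
                              "sum": "Total Price",
                              "mean": "Avg Purchase Price"})

# Most popular first; rows are keyed by (Item ID, Item Name)
popular = items.sort_values("Item Count", ascending=False)
print(popular)
print(popular.index[0])
```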


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [20]:
# Re-sort the numeric summary (itemdf), not the formatted copy, so totals order by value
srtitemdf = itemdf.sort_values("Total Price", ascending=False)

# Add $ to the price columns
srtitemdf["Total Price"] = srtitemdf["Total Price"].map("${:,.2f}".format)
srtitemdf["Avg Purchase Price"] = srtitemdf["Avg Purchase Price"].map("${:,.2f}".format)

srtitemdf.head(10)

| Item ID | Item Name | Item Count | Total Price | Avg Purchase Price |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $50.76 | $4.23 |
| 82 | Nirvana | 9 | $44.10 | $4.90 |
| 145 | Fiery Glass Crusader | 9 | $41.22 | $4.58 |
| 92 | Final Critic | 8 | $39.04 | $4.88 |
| 103 | Singed Scalpel | 8 | $34.80 | $4.35 |
| 59 | Lightning, Etcher of the King | 8 | $33.84 | $4.23 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | $31.77 | $3.53 |
| 78 | Glimmer, Ender of the Moon | 7 | $30.80 | $4.40 |
| 72 | Winter's Bite | 8 | $30.16 | $3.77 |
| 60 | Wolf | 8 | $28.32 | $3.54 |
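
Popularity and profitability are two rankings of the same numeric summary. The sketch below uses `DataFrame.nlargest` as a shorthand for sorting in descending order and taking a preview; the per-item figures are made up.

```python
import pandas as pd

# Hypothetical per-item summary, already aggregated and still numeric
items = pd.DataFrame(
    {"Item Count": [3, 2, 1], "Total Price": [3.00, 8.00, 2.50]},
    index=pd.Index(["Rusty Dagger", "Oak Staff", "Iron Helm"], name="Item Name"),
)

# Top two by number of purchases
by_count = items.nlargest(2, "Item Count")

# Top two by total purchase value; a cheap, popular item can rank lower here
by_value = items.nlargest(2, "Total Price")

print(by_count)
print(by_value)
```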
