### Heroes Of Pymoli Data Analysis
* Of the 1163 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic falls between 20-24 (44.8%), with secondary groups falling between 15-19 (18.6%) and 25-29 (13.4%).

-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
df = pd.read_csv(file_to_load)

## Player Count

* Display the total number of players


In [2]:
# Total number of players (unique screen names) = 576
# value_counts() lists the number of purchases per screen name, one row per player
sn=df['SN'].copy()
sn.value_counts()

Lisosia93          5
Iral74             4
Idastidru52        4
Inguron55          3
Ilarin91           3
Pheodaisun84       3
Chadolyla44        3
Chamjask73         3
Hiaral50           3
Sondastsda82       3
Umolrian85         3
Iskadarya95        3
Tyidaim51          3
Rarallo90          3
Aelin32            3
Lisopela58         3
Asur53             3
Chamimla85         3
Iri67              3
Lassilsala30       3
Phyali88           3
Chanastnya43       3
Haillyrgue51       3
Silaera56          3
Lisim78            3
Hada39             3
Strithenu87        3
Idai61             3
Yathecal82         3
Tyisur83           3
                  ..
Yaliru88           1
Tyialisti80        1
Tyarithn67         1
Chanirrasta87      1
Eoralrap26         1
Seolollo93         1
Sundista37         1
Lirtirra37         1
Iljask75           1
Adairialis76       1
Fironon91          1
Eyista89           1
Pheutherin27       1
Dyally87           1
Iduelis31          1
Yadaisuir65        1
Lisista27    
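
The listing above shows purchases per screen name rather than a single total. A minimal sketch of producing the player count directly, using a small made-up frame in place of `purchase_data.csv` (the sample rows and the `Total Players` column name are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample standing in for purchase_data.csv
sample = pd.DataFrame({
    "SN": ["Lisosia93", "Iral74", "Lisosia93", "Aelin32"],
    "Price": [3.79, 3.40, 4.10, 2.50],
})

# nunique() counts distinct screen names, so repeat buyers are counted once
total_players = sample["SN"].nunique()

# Wrap the scalar in a one-row frame so it displays like the other summaries
player_count = pd.DataFrame({"Total Players": [total_players]})
print(player_count)
```

Applied to the real data, `df['SN'].nunique()` should give the same figure as the number of rows in the `value_counts()` result.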

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
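# Number of distinct items sold and mean purchase price
items=df['Item ID'].nunique()
avgprice=df['Price'].mean()

# One-row summary frame, with the price formatted as currency
purcha = pd.DataFrame({'Items':[items],'AvePrice':[avgprice]})
purcha['AvePrice'] = purcha['AvePrice'].map('${:,.2f}'.format)
purcha.head()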

|   | AvePrice | Items |
|---|---|---|
| 0 | $3.05 | 183 |


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [4]:
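# Note: each row of df is one purchase, so these counts are per purchase;
# players who bought more than once are counted more than once.
gen=df['Gender'].copy()

# Naming the Series directly avoids depending on the column name
# that value_counts() produces, which differs between pandas versions
genc=gen.value_counts().rename('Count').to_frame()

genp=gen.value_counts(normalize=True).rename('Percentage').to_frame()
genp['Percentage'] = (genp['Percentage']*100).map('{0:.2f}%'.format)

gend = pd.concat([genc, genp], axis=1)
gend.head()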

|   | Count | Percentage |
|---|---|---|
| Male | 652 | 83.59% |
| Female | 113 | 14.49% |
| Other / Non-Disclosed | 15 | 1.92% |
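
The counts in this table are taken over purchase rows, so a player with several purchases contributes several times. A hedged sketch of a per-player version is shown below on a made-up sample; it assumes each screen name maps to a single gender, and the sample rows are illustrative only:

```python
import pandas as pd

# Hypothetical sample: A1 and C3 each made two purchases
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "C3", "D4"],
    "Gender": ["Male", "Male", "Female", "Male", "Male",
               "Other / Non-Disclosed"],
})

# Keep one row per player so repeat purchases do not inflate the counts
players = sample.drop_duplicates(subset="SN")

counts = players["Gender"].value_counts()
shares = players["Gender"].value_counts(normalize=True) * 100

# Both Series share the same index (the gender labels), so they align
gender_summary = pd.DataFrame({
    "Count": counts,
    "Percentage": shares.map("{:.2f}%".format),
})
print(gender_summary)
```

On the real data, the same pattern would start from `df.drop_duplicates(subset='SN')`, and the counts would then sum to the number of unique players rather than the number of purchases.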



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [5]:
# Per gender: purchase count, unique buyers, mean price and total revenue
gpa = df.groupby('Gender').agg({'Purchase ID':'count',
                                'SN': 'nunique',
                                'Price':['mean', 'sum']})
# Flatten the two-level column index down to the aggregation names
gpa.columns=gpa.columns.get_level_values(1)

gpa = gpa.rename(columns={'count':'Purchases',
                          'mean':'Aveprice',
                          'sum':'Ttl'})

# Average total spent per unique buyer, computed before formatting
gpa['Avg Per Person'] = gpa['Ttl']/gpa['nunique']
gpa['Aveprice'] = gpa['Aveprice'].map('${:.2f}'.format)
gpa['Ttl'] = gpa['Ttl'].map('${:,.2f}'.format)
gpa['Avg Per Person'] = gpa['Avg Per Person'].map('${:.2f}'.format)
gpa = gpa.drop('nunique', axis=1)
gpa


| Gender | Aveprice | Ttl | Purchases | Avg Per Person |
|---|---|---|---|---|
| Female | $3.20 | $361.94 | 113 | $4.47 |
| Male | $3.02 | $1,967.64 | 652 | $4.07 |
| Other / Non-Disclosed | $3.35 | $50.19 | 15 | $4.56 |
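
As an alternative to flattening the two-level column index after `agg`, pandas (0.25 and later) supports named aggregation, which sets the output column names up front. A minimal sketch on made-up rows; the sample data and the output column names are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample with the same columns the notebook uses
sample = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3],
    "SN": ["A1", "A1", "B2", "C3"],
    "Gender": ["Male", "Male", "Female", "Male"],
    "Price": [2.00, 4.00, 3.00, 1.00],
})

# Each keyword is an output column: (source column, aggregation)
summary = sample.groupby("Gender").agg(
    Purchases=("Purchase ID", "count"),
    Players=("SN", "nunique"),
    AvgPrice=("Price", "mean"),
    Total=("Price", "sum"),
)

# Total spent divided by unique buyers, kept numeric for later sorting
summary["Avg Per Person"] = summary["Total"] / summary["Players"]
print(summary)
```

Keeping the values numeric until the final display step also makes later sorting and arithmetic straightforward.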


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [6]:
# Bin edges 0, 10, 20, 30, 40, 50
# Note: pd.cut closes bins on the right by default, so age 10 falls in '0-9'
# and age 20 in '10-19'; pass right=False for bins that match the labels exactly
ab = np.linspace(0,50,6)
labels = ['0-9','10-19','20-29','30-39','40-49']

# Work on an explicit copy to avoid SettingWithCopyWarning
aggrp=df[['SN','Age']].copy()
aggrp['Age_Group']=pd.cut(aggrp['Age'],ab,labels=labels)
aggrp=aggrp.groupby('Age_Group').agg({'SN':'count'})
aggrp=aggrp.rename(columns={'SN':'Ttl_Count'})

# Percentage of the total count (previously divided by a hard-coded 10)
ttl=aggrp['Ttl_Count'].sum()
aggrp['%Players']=(aggrp['Ttl_Count']/ttl*100).map('{0:.2f}%'.format)
aggrp

| Age_Group | Ttl_Count | %Players |
|---|---|---|
| 0-9 | 32 | 4.10% |
| 10-19 | 254 | 32.56% |
| 20-29 | 402 | 51.54% |
| 30-39 | 85 | 10.90% |
| 40-49 | 7 | 0.90% |
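
The labels used with `pd.cut` in this section (`'0-9'`, `'10-19'`, ...) imply bins that include their lower edge, while `pd.cut` includes the upper edge by default. A small sketch of the difference, using made-up ages chosen to sit on the bin boundaries:

```python
import pandas as pd

# Hypothetical ages, including the boundary values 10 and 20
ages = pd.Series([9, 10, 19, 20, 45])
bins = [0, 10, 20, 30, 40, 50]
labels = ["0-9", "10-19", "20-29", "30-39", "40-49"]

# Default (right=True): intervals are (0, 10], (10, 20], ...
# so age 10 is labelled "0-9" and age 20 is labelled "10-19"
default_groups = pd.cut(ages, bins, labels=labels)

# right=False: intervals are [0, 10), [10, 20), ... which matches the labels
left_closed = pd.cut(ages, bins, labels=labels, right=False)

print(pd.DataFrame({
    "Age": ages,
    "right=True": default_groups,
    "right=False": left_closed,
}))
```

Whether this changes the counts for the real data depends on how many players are exactly 10, 20, 30 or 40 years old.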


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [7]:
pba=df.loc[:,['Age','Purchase ID','SN','Price']]

# Reuse the age bins and labels defined in the Age Demographics cell
pba['Age Group'] = pd.cut(pba['Age'],ab,labels=labels)
pba=pba.groupby('Age Group').agg({'Purchase ID':'count',
                                  'SN':'nunique',
                                  'Price':['mean','sum']})
# Flatten the two-level column index down to the aggregation names
pba.columns=pba.columns.get_level_values(1)
pba=pba.rename(columns={'mean':'avgprice',
                        'sum':'total$'})

# Average total spent per unique buyer, computed before formatting
pba['avg$perpers']=pba['total$']/pba['nunique']
pba['avgprice']=pba['avgprice'].map('${:.2f}'.format)
pba['total$']=pba['total$'].map('${:,.2f}'.format)
pba['avg$perpers']=pba['avg$perpers'].map('${:.2f}'.format)
pba.index.name = None
pba=pba.drop(columns='nunique')

pba


|   | avgprice | total$ | count | avg$perpers |
|---|---|---|---|---|
| 0-9 | $3.40 | $108.96 | 32 | $4.54 |
| 10-19 | $3.06 | $778.16 | 254 | $4.07 |
| 20-29 | $2.99 | $1,203.06 | 402 | $4.13 |
| 30-39 | $3.15 | $268.06 | 85 | $4.25 |
| 40-49 | $3.08 | $21.53 | 7 | $3.08 |


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [8]:
tops=df.loc[:,['SN','Purchase ID','Price']]

tops=tops.groupby('SN').agg({'Purchase ID':'count','Price':['mean','sum']})

tops.columns=tops.columns.get_level_values(1)
tops=tops.rename(columns={'mean':'AveragePrice',
                          'sum':'TotalSpent'}).sort_values('TotalSpent',ascending=False)

tops['AveragePrice']=tops['AveragePrice'].map('${:.2f}'.format)
tops['TotalSpent']=tops['TotalSpent'].map('${:,.2f}'.format)
tops.head()


| SN | AveragePrice | TotalSpent | count |
|---|---|---|---|
| Lisosia93 | $3.79 | $18.96 | 5 |
| Idastidru52 | $3.86 | $15.45 | 4 |
| Chamjask73 | $4.61 | $13.83 | 3 |
| Iral74 | $3.40 | $13.62 | 4 |
| Iskadarya95 | $4.37 | $13.10 | 3 |


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [9]:
pop=df.loc[:,['Item ID','Item Name','Purchase ID','Price']]

pop=pop.groupby(['Item ID','Item Name']).agg({'Purchase ID':'count','Price':['mean','sum']})
pop=pop.rename(columns={'mean':'price',
                        'sum':'cost'})

pop.columns=pop.columns.get_level_values(1)

pop=pop.sort_values('count',ascending=False)
pop['price']=pop['price'].map('${:.2f}'.format)
pop['cost']=pop['cost'].map('${:,.2f}'.format)
pop.head()

| Item ID | Item Name | price | cost | count |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | $4.23 | $50.76 | 12 |
| 145 | Fiery Glass Crusader | $4.58 | $41.22 | 9 |
| 108 | Extraction, Quickblade Of Trembling Hands | $3.53 | $31.77 | 9 |
| 82 | Nirvana | $4.90 | $44.10 | 9 |
| 19 | Pursuit, Cudgel of Necromancy | $1.02 | $8.16 | 8 |


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [10]:
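# Caution: 'cost' already holds formatted strings such as '$9.98', so this sort
# is lexicographic rather than numeric. The rows below are therefore not the
# highest-value items (item 178 at $50.76 is missing, for example). To rank by
# value, sort on the numeric totals before applying the currency format.
prof=pop.sort_values('cost',ascending=False)
prof.head()

| Item ID | Item Name | price | cost | count |
|---|---|---|---|---|
| 63 | Stormfury Mace | $4.99 | $9.98 | 2 |
| 29 | Chaos, Ender of the End | $1.98 | $9.90 | 5 |
| 173 | Stormfury Longsword | $4.93 | $9.86 | 2 |
| 1 | Crucifer | $3.26 | $9.78 | 3 |
| 38 | The Void, Vengeance of Dark Magic | $2.37 | $9.48 | 4 |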
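
Sorting a column of currency strings compares text character by character, which is why `$9.98` ranks ahead of larger totals in this table. The sketch below contrasts that with sorting the numeric values first and formatting afterwards. The three rows reuse item names and totals that appear in this notebook's output; the frame itself is a stand-in, not the full data:

```python
import pandas as pd

# Three items with totals taken from the tables in this notebook
items = pd.DataFrame({
    "Item Name": ["Oathbreaker, Last Hope of the Breaking Storm",
                  "Nirvana",
                  "Stormfury Mace"],
    "cost": [50.76, 44.10, 9.98],
})

# Formatting first turns 'cost' into strings, so the sort is lexicographic
as_text = items.assign(cost=items["cost"].map("${:,.2f}".format))
text_sorted = as_text.sort_values("cost", ascending=False)

# Sorting the numeric column first, then formatting, ranks by actual value
value_sorted = items.sort_values("cost", ascending=False)
value_sorted["cost"] = value_sorted["cost"].map("${:,.2f}".format)

print(text_sorted)
print(value_sorted)
```

In the notebook, the equivalent change would be to sort the item summary by total purchase value before the `map('${:,.2f}'.format)` step is applied.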
