### Heroes Of Pymoli Data Analysis
* Of the 576 unique players, the vast majority are male: male players account for about 84% of purchases. Female players make up a smaller but notable share (about 14%).

* The peak age group is 20-24 (46.79% of purchases), followed by 15-19 (17.44%) and 25-29 (12.95%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [3]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

## Player Count

* Display the total number of players


In [4]:
SN_data = purchase_data['SN']
unique_SN_item = np.unique(SN_data)
unique_SN_frame = pd.DataFrame([len(unique_SN_item)],columns=['Total Players'])

unique_SN_frame.head()

|  | Total Players |
|---|---|
| 0 | 576 |

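As a possible shortcut for the player count, pandas can count distinct values directly. The sketch below uses a small hypothetical sample (the screen names are made up and are not taken from `purchase_data.csv`); only the `SN` column name comes from the notebook.

```python
import pandas as pd

# Hypothetical sample rows, NOT taken from purchase_data.csv
sample = pd.DataFrame({"SN": ["Aela59", "Bran12", "Aela59", "Cyra88"]})

# nunique() counts distinct screen names, so a repeat buyer counts once
total_players = sample["SN"].nunique()

# Wrap the scalar in a one-row frame, mirroring the notebook's output shape
player_count = pd.DataFrame({"Total Players": [total_players]})
print(player_count)
```

On the real data the same call would be `purchase_data["SN"].nunique()`, which should agree with `len(np.unique(...))` as long as the column has no missing values.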

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [13]:
uni_ItemID = np.unique(purchase_data['Item ID'])

average_price = np.mean(np.array(purchase_data['Price']))
total_purchase = len(purchase_data['Purchase ID'])
sum_price = np.sum(np.array(purchase_data['Price']))

Purchasing_Analysis_frame = pd.DataFrame([])
Purchasing_Analysis_frame['Number of Unique Items'] = [len(uni_ItemID)]
Purchasing_Analysis_frame['Average Price'] = ["$"+str(np.round(average_price*100)/100)]
Purchasing_Analysis_frame['Number of Purchases'] = [total_purchase]
Purchasing_Analysis_frame['Total Revenue'] = ["$"+str(np.round(sum_price*100)/100)]

Purchasing_Analysis_frame.head()

|  | Number of Unique Items | Average Price | Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 183 | $3.05 | 780 | $2379.77 |

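The summary steps listed for this section can also be sketched with pandas methods and a format string. The sample rows below are hypothetical; the column names `Item ID` and `Price` are the ones used in the notebook. The numbers stay numeric until the final display step, so they remain usable for further calculations.

```python
import pandas as pd

# Hypothetical sample rows, NOT taken from purchase_data.csv
sample = pd.DataFrame({
    "Item ID": [1, 2, 2, 3],
    "Price": [1.50, 2.25, 2.25, 4.00],
})

# Build the one-row summary from numeric values
summary = pd.DataFrame({
    "Number of Unique Items": [sample["Item ID"].nunique()],
    "Average Price": [sample["Price"].mean()],
    "Number of Purchases": [len(sample)],
    "Total Revenue": [sample["Price"].sum()],
})

# Format the money columns as strings only for display
for col in ["Average Price", "Total Revenue"]:
    summary[col] = summary[col].map("${:,.2f}".format)

print(summary)
```

A format string such as `"${:,.2f}"` always keeps two decimals, whereas `"$" + str(np.round(x * 100) / 100)` drops trailing zeros, which is how a value like `$3.2` can appear in a later table.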

## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [8]:
uni_Gender = np.unique(purchase_data['Gender'])

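# Note: this is the number of purchase rows (780), not the number of unique players (576),
# so the percentages below describe purchases rather than players
total_Players = len(purchase_data['Gender'])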
Gender_data = []
for gender in uni_Gender:
    gender_tmp = purchase_data.loc[(purchase_data['Gender'] == gender),'Gender']
    Gender_data.append([gender,len(gender_tmp),np.round(np.double(len(gender_tmp))*10000/total_Players)/100])

Gender_Demographics_frame = pd.DataFrame(Gender_data,columns=['','Total Count','Percentage of Players'])

Gender_Demographics_frame.head()

|  |  | Total Count | Percentage of Players |
|---|---|---|---|
| 0 | Female | 113 | 14.49 |
| 1 | Male | 652 | 83.59 |
| 2 | Other / Non-Disclosed | 15 | 1.92 |

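The counts above add up to 780, the number of purchases, so each row counts purchases rather than players. To count each player once, one option is to drop duplicate screen names first. This is a minimal sketch on hypothetical rows, and it assumes every `SN` maps to a single `Gender` value, which the source does not confirm.

```python
import pandas as pd

# Hypothetical sample rows, NOT taken from purchase_data.csv
sample = pd.DataFrame({
    "SN": ["Aela59", "Aela59", "Bran12", "Cyra88", "Dorn07"],
    "Gender": ["Female", "Female", "Male", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are not counted twice
players = sample.drop_duplicates(subset="SN")

# Count players per gender and convert to a percentage of all players
counts = players["Gender"].value_counts()
percent = (counts / len(players) * 100).round(2)

gender_demo = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percent,
})
print(gender_demo)
```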


## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [11]:
Gender_data = []
for gender in uni_Gender:
    gender_tmp = purchase_data.loc[(purchase_data['Gender'] == gender),'Price']
    sn_tmp = np.unique(purchase_data.loc[(purchase_data['Gender'] == gender), 'SN'])
    tmp_data = []
    for sn in sn_tmp:
        gender_sn_data = purchase_data.loc[(purchase_data['Gender'] == gender) & (purchase_data['SN']==sn), 'Price']
        tmp_data.append(np.sum(gender_sn_data))

    Gender_data.append([gender,len(gender_tmp),"$"+str(np.round(np.mean(gender_tmp)*100)/100), "$"+str(np.sum(gender_tmp)),"$"+str(np.round(np.mean(tmp_data)*100)/100)])

Purchasing_Analysis_frame = pd.DataFrame(Gender_data,columns=['Gender','Purchase Count','Average Purchase Price','Total Purchase Value','Avg Total Purchase per Person'])

Purchasing_Analysis_frame.head()


|  | Gender | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase per Person |
|---|---|---|---|---|---|
| 0 | Female | 113 | $3.2 | $361.94 | $4.47 |
| 1 | Male | 652 | $3.02 | $1967.64 | $4.07 |
| 2 | Other / Non-Disclosed | 15 | $3.35 | $50.19 | $4.56 |

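The nested loops over gender and screen name can likely be replaced by a single `groupby`. The sketch below runs on hypothetical rows and is an alternative formulation, not the notebook's own method. The per-person average is the group's total spend divided by its number of distinct buyers.

```python
import pandas as pd

# Hypothetical sample rows, NOT taken from purchase_data.csv
sample = pd.DataFrame({
    "SN": ["Aela59", "Aela59", "Bran12", "Cyra88"],
    "Gender": ["Female", "Female", "Male", "Male"],
    "Price": [2.00, 4.00, 3.00, 5.00],
})

grouped = sample.groupby("Gender")

# Purchase count, mean price and total spend per gender in one pass
by_gender = grouped["Price"].agg(["count", "mean", "sum"])
by_gender.columns = [
    "Purchase Count",
    "Average Purchase Price",
    "Total Purchase Value",
]

# Total spend divided by the number of distinct buyers in each group
by_gender["Avg Total Purchase per Person"] = (
    by_gender["Total Purchase Value"] / grouped["SN"].nunique()
)
print(by_gender)
```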

## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [15]:
bins = [[0,9],[10,14],[15,19],[20,24],[25,29],[30,34],[35,39],[40,100]]
Age_data = []
for age in bins:
    Age_tmp = purchase_data.loc[(purchase_data['Age'] >= age[0]) & (purchase_data['Age'] <= age[1]), 'Price']
    if age[0]==0:
        Age_data.append([str(age[1]+1)+"<", len(Age_tmp),
                         np.round(np.double(len(Age_tmp)) * 10000 / total_Players) / 100])
    elif age[1]==100:
        Age_data.append([ str(age[0])+"+",len(Age_tmp),np.round(np.double(len(Age_tmp))*10000/total_Players)/100])
    else:
        Age_data.append([str(age[0]) + "-" + str(age[1]), len(Age_tmp),
                         np.round(np.double(len(Age_tmp)) * 10000 / total_Players) / 100])

Age_Demographics = pd.DataFrame(Age_data,columns=['','Total Count','Percentage of Players'])

Age_Demographics

|  |  | Total Count | Percentage of Players |
|---|---|---|---|
| 0 | 10< | 23 | 2.95 |
| 1 | 10-14 | 28 | 3.59 |
| 2 | 15-19 | 136 | 17.44 |
| 3 | 20-24 | 365 | 46.79 |
| 4 | 25-29 | 101 | 12.95 |
| 5 | 30-34 | 73 | 9.36 |
| 6 | 35-39 | 41 | 5.26 |
| 7 | 40+ | 13 | 1.67 |

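The hint in this section mentions `pd.cut()`, which the cell above does not use. A minimal sketch of that approach on hypothetical ages follows. The bin edges reproduce the same ranges as the manual bins, the `<10` label and the upper edge of 200 are assumptions chosen for illustration, and the percentages are relative to the rows in the sample.

```python
import pandas as pd

# Hypothetical sample ages, NOT taken from purchase_data.csv
sample = pd.DataFrame({"Age": [7, 12, 19, 20, 24, 45]})

# Right-inclusive edges: (0, 9], (9, 14], ..., (39, 200]
edges = [0, 9, 14, 19, 24, 29, 34, 39, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# include_lowest=True also places an age of exactly 0 in the first bin
sample["Age Group"] = pd.cut(
    sample["Age"], bins=edges, labels=labels, include_lowest=True
)

# observed=False keeps empty bins in the result with a count of 0
age_counts = sample.groupby("Age Group", observed=False).size()
percent = (age_counts / len(sample) * 100).round(2)

age_demo = pd.DataFrame({
    "Total Count": age_counts,
    "Percentage of Players": percent,
})
print(age_demo)
```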

## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [16]:
bins = [[0,9],[10,14],[15,19],[20,24],[25,29],[30,34],[35,39],[40,100]]
Age_data = []
for age in bins:
    Age_tmp = purchase_data.loc[(purchase_data['Age'] >= age[0]) & (purchase_data['Age'] <= age[1]), 'Price']

    sn_tmp = np.unique(purchase_data.loc[(purchase_data['Age'] >= age[0]) & (purchase_data['Age'] <= age[1]), 'SN'])
    tmp_data = []
    for sn in sn_tmp:
        age_sn_data = purchase_data.loc[(purchase_data['Age'] >= age[0]) & (purchase_data['Age'] <= age[1]) & (purchase_data['SN'] == sn), 'Price']
        tmp_data.append(np.sum(age_sn_data))

    if age[0]==0:
        m_str = str(age[1]+1)+"<"
    elif age[1]==100:
        m_str = str(age[0]) + "+"
    else:
        m_str = str(age[0]) + "-" + str(age[1])

    Age_data.append([m_str, len(Age_tmp),"$"+str(np.round(np.mean(Age_tmp) * 100) / 100),"$"+str(np.round(np.sum(Age_tmp)*100)/100),"$"+str(np.round(np.mean(tmp_data)*100)/100)])

Purchasing_Analysis_frame  = pd.DataFrame(Age_data,columns=['','Purchase Count','Average Purchase Price','Total Purchase Value','Avg Total Purchase per Person'])

Purchasing_Analysis_frame

|  |  | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase per Person |
|---|---|---|---|---|---|
| 0 | 10< | 23 | $3.35 | $77.13 | $4.54 |
| 1 | 10-14 | 28 | $2.96 | $82.78 | $3.76 |
| 2 | 15-19 | 136 | $3.04 | $412.89 | $3.86 |
| 3 | 20-24 | 365 | $3.05 | $1114.06 | $4.32 |
| 4 | 25-29 | 101 | $2.9 | $293.0 | $3.81 |
| 5 | 30-34 | 73 | $2.93 | $214.0 | $4.12 |
| 6 | 35-39 | 41 | $3.6 | $147.67 | $4.76 |
| 7 | 40+ | 13 | $2.94 | $38.24 | $3.19 |

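Binning and aggregation can also be combined: bin the ages with `pd.cut()`, then group on the resulting column. The sketch below uses hypothetical rows and named aggregation; the intermediate column names (`purchase_count`, `buyers`, and so on) are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Hypothetical sample rows, NOT taken from purchase_data.csv
sample = pd.DataFrame({
    "SN": ["Aela59", "Aela59", "Bran12", "Cyra88"],
    "Age": [22, 22, 23, 31],
    "Price": [2.00, 3.00, 4.00, 1.00],
})

edges = [0, 9, 14, 19, 24, 29, 34, 39, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
sample["Age Group"] = pd.cut(
    sample["Age"], bins=edges, labels=labels, include_lowest=True
)

# observed=True drops age groups that have no purchases in the sample
by_age = sample.groupby("Age Group", observed=True).agg(
    purchase_count=("Price", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    buyers=("SN", "nunique"),
)

# Per-person average: total spend over distinct buyers in the bin
by_age["avg_per_person"] = by_age["total_value"] / by_age["buyers"]

by_age = by_age.drop(columns="buyers").rename(columns={
    "purchase_count": "Purchase Count",
    "avg_price": "Average Purchase Price",
    "total_value": "Total Purchase Value",
    "avg_per_person": "Avg Total Purchase per Person",
})
print(by_age)
```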

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [18]:
uni_sn = np.unique(purchase_data['SN'])
SN_data = []
for sn in uni_sn:
    tmp_sn = purchase_data.loc[(purchase_data['SN'] == sn), 'Price']
    SN_data.append([sn,len(tmp_sn),np.mean(tmp_sn),np.sum(tmp_sn)])

Top_Spenders_frame  = pd.DataFrame(SN_data,columns=['SN','Purchase Count','Average Purchase Price','Total Purchase Value'])
Top_Spenders_frame = Top_Spenders_frame.sort_values(by='Total Purchase Value',ascending=False)

Top_Spenders_frame.head()

|  | SN | Purchase Count | Average Purchase Price | Total Purchase Value |
|---|---|---|---|---|
| 360 | Lisosia93 | 5 | 3.792 | 18.96 |
| 246 | Idastidru52 | 4 | 3.8625 | 15.45 |
| 106 | Chamjask73 | 3 | 4.61 | 13.83 |
| 275 | Iral74 | 4 | 3.405 | 13.62 |
| 281 | Iskadarya95 | 3 | 4.366667 | 13.1 |

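The per-player loop can probably be expressed as one grouped aggregation followed by a sort. This is a sketch on hypothetical rows; it yields the same three columns as the cell above, with `SN` as the index instead of a regular column.

```python
import pandas as pd

# Hypothetical sample rows, NOT taken from purchase_data.csv
sample = pd.DataFrame({
    "SN": ["Aela59", "Aela59", "Bran12", "Cyra88", "Cyra88", "Cyra88"],
    "Price": [2.00, 3.00, 4.50, 1.00, 1.00, 1.00],
})

top_spenders = (
    sample.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    # Highest total spend first
    .sort_values("Total Purchase Value", ascending=False)
)
print(top_spenders.head())
```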

## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [21]:
uni_ID = np.unique(purchase_data['Item ID'])
ID_data = []
for item_id in uni_ID:  # item_id avoids shadowing the built-in id()
    tmp_id = purchase_data.loc[(purchase_data['Item ID'] == item_id), 'Price']
    tmp_idName = purchase_data.loc[(purchase_data['Item ID'] == item_id), 'Item Name']
    ID_data.append([item_id,np.array(tmp_idName)[0],len(tmp_id),np.mean(tmp_id),np.sum(tmp_id)])

Most_Popular_Items_frame  = pd.DataFrame(ID_data,columns=['Item ID','Item Name','Purchase Count','Item Price','Total Purchase Value'])
Most_Popular_Items_frame = Most_Popular_Items_frame.sort_values(by='Purchase Count',ascending=False)

Most_Popular_Items_frame.head()

|  | Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|---|
| 177 | 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | 4.23 | 50.76 |
| 144 | 145 | Fiery Glass Crusader | 9 | 4.58 | 41.22 |
| 107 | 108 | Extraction, Quickblade Of Trembling Hands | 9 | 3.53 | 31.77 |
| 81 | 82 | Nirvana | 9 | 4.9 | 44.1 |
| 19 | 19 | Pursuit, Cudgel of Necromancy | 8 | 1.02 | 8.16 |

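The instructions for this section ask for a group-by on Item ID and Item Name, which the cell above emulates with a loop. A minimal sketch of the grouped version on hypothetical items follows. As in the loop, `Item Price` is the mean price per item, which equals the listed price only if that price never changes.

```python
import pandas as pd

# Hypothetical sample rows, NOT taken from purchase_data.csv
sample = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 3, 3],
    "Item Name": ["Rusty Dagger", "Rusty Dagger", "Rusty Dagger",
                  "Oak Shield", "Iron Helm", "Iron Helm"],
    "Price": [1.00, 1.00, 1.00, 4.00, 2.50, 2.50],
})

popular = (
    sample.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",
        "sum": "Total Purchase Value",
    })
    # Most frequently purchased items first
    .sort_values("Purchase Count", ascending=False)
)
print(popular.head())
```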

## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [22]:
# Use a separate name so the popularity-sorted frame is not overwritten
Most_Profitable_Items_frame = Most_Popular_Items_frame.sort_values(by='Total Purchase Value',ascending=False)

Most_Profitable_Items_frame.head()

|  | Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|---|
| 177 | 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | 4.23 | 50.76 |
| 81 | 82 | Nirvana | 9 | 4.9 | 44.1 |
| 144 | 145 | Fiery Glass Crusader | 9 | 4.58 | 41.22 |
| 91 | 92 | Final Critic | 8 | 4.88 | 39.04 |
| 102 | 103 | Singed Scalpel | 8 | 4.35 | 34.8 |

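For the optional cleaner formatting, one approach is to select the top rows with `nlargest` and format a copy, leaving the numeric frame untouched for later sorting. The item summary below is hypothetical and stands in for the table built in the previous section.

```python
import pandas as pd

# Hypothetical item summary, NOT taken from purchase_data.csv
items = pd.DataFrame({
    "Item Name": ["Rusty Dagger", "Oak Shield", "Iron Helm"],
    "Purchase Count": [3, 1, 2],
    "Total Purchase Value": [3.00, 4.00, 5.00],
})

# Top two items by revenue, returned in descending order
most_profitable = items.nlargest(2, "Total Purchase Value")

# Format a copy for display so the numeric values stay sortable
shown = most_profitable.copy()
shown["Total Purchase Value"] = shown["Total Purchase Value"].map(
    "${:,.2f}".format
)
print(shown)
```

Formatting before sorting would turn the values into strings, and strings sort lexically (`"$9.00"` after `"$10.00"`), so the sketch sorts first and formats last.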