### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
purchase_data = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchases_df = pd.read_csv(purchase_data)
purchases_df.head()

|   | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


## Player Count

* Display the total number of players


In [2]:
unique_player = purchases_df["SN"].nunique()

total_players = purchases_df["SN"].unique()
len(total_players)

576

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
unique_items = purchases_df["Item Name"].unique()
len(unique_items)

179

In [4]:
# Determine the average purchase price
average_price = purchases_df["Price"].mean()
average_price

3.050987179487176

In [5]:
# Determine total number of purchases 
total_purchases = purchases_df["Purchase ID"].unique()
len(total_purchases)

780

In [6]:
# Determine Total Revenue
total_revenue = purchases_df["Price"].sum()
total_revenue

2379.77
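
The instructions above also ask for a summary data frame, which the cells do not build. A minimal sketch of one way to assemble it follows. The `sample_df` rows and the `summarize_purchases` helper are hypothetical stand-ins for illustration; in the notebook the loaded purchases frame would be passed instead.

```python
import pandas as pd

# Hypothetical sample rows standing in for Resources/purchase_data.csv
sample_df = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3],
    "SN": ["Lisim78", "Lisovynya38", "Ithergue48", "Lisim78"],
    "Item Name": ["Extraction, Quickblade Of Trembling Hands",
                  "Frenzied Scimitar", "Final Critic", "Final Critic"],
    "Price": [3.53, 1.56, 4.88, 4.88],
})


def summarize_purchases(df):
    """Collect the headline purchasing figures in a one-row DataFrame."""
    return pd.DataFrame([{
        "Number of Unique Items": df["Item Name"].nunique(),
        "Average Price": df["Price"].mean(),
        "Number of Purchases": len(df),
        "Total Revenue": df["Price"].sum(),
    }])


summary_df = summarize_purchases(sample_df)

# Format currency on a copy so the numeric values stay available for later math
display_df = summary_df.copy()
for col in ["Average Price", "Total Revenue"]:
    display_df[col] = display_df[col].map("${:,.2f}".format)

print(display_df)
```

Formatting a copy rather than the original keeps the numeric columns usable if further calculations are needed.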

## Gender Demographics

In [7]:
# Determine the possible genders used by the players
gender_demographics = purchases_df["Gender"].unique()
gender_demographics

array(['Male', 'Other / Non-Disclosed', 'Female'], dtype=object)

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [8]:
# List the distinct gender values again (per-gender player counts are computed in the cells below)
gender_counts = purchases_df["Gender"].unique()
gender_counts

array(['Male', 'Other / Non-Disclosed', 'Female'], dtype=object)

In [9]:
# Group the players by their screen names and genders
gender_group_df = purchases_df.groupby('Gender')["SN"].unique()
print(gender_group_df)

Gender
Female                   [Lisassa64, Reunasu60, Reulae52, Assosia88, Ph...
Male                     [Lisim78, Lisovynya38, Ithergue48, Chamassasya...
Other / Non-Disclosed    [Chanosian48, Siarithria38, Haerithp41, Sundim...
Name: SN, dtype: object


In [10]:
# Determine the count of female players (select by label rather than by position)
female_count = len(gender_group_df["Female"])
female_count

81

In [11]:
# Determine the percentage of female players
female_perc=female_count/len(total_players)*100
female_perc

14.0625

In [12]:
# Determine the count of male players
male_count = len(gender_group_df["Male"])
male_count

484

In [13]:
# Determine the percentage of male players
male_perc=male_count/len(total_players)*100
male_perc

84.02777777777779

In [14]:
# Determine the count of Other / Non-Disclosed
other_count = len(gender_group_df["Other / Non-Disclosed"])
other_count

11

In [15]:
# Determine the percentage of Other / Non-Disclosed (stored separately so the count is not overwritten)
other_perc = other_count / len(total_players) * 100
other_perc

1.9097222222222223
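
The per-gender counts and percentages above can also be gathered into a single table. The following is a sketch only: `sample_df` holds hypothetical rows and `gender_demographics` is an illustrative helper name, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample rows; the notebook would pass the loaded purchases DataFrame
sample_df = pd.DataFrame({
    "SN": ["Lisim78", "Lisim78", "Lisassa64", "Chanosian48", "Ithergue48"],
    "Gender": ["Male", "Male", "Female", "Other / Non-Disclosed", "Male"],
})


def gender_demographics(df):
    """Count each player once, then tabulate counts and percentages by gender."""
    players = df.drop_duplicates(subset="SN")
    counts = players["Gender"].value_counts()
    percentages = (counts / counts.sum() * 100).round(2)
    return pd.DataFrame({
        "Total Count": counts,
        "Percentage of Players": percentages,
    })


demographics_df = gender_demographics(sample_df)
print(demographics_df)
```

Dropping duplicate screen names first matters because the purchase file has one row per purchase, so a player who bought twice would otherwise be counted twice.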

## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [16]:
total_players_df = pd.DataFrame([unique_player],columns =["Total Players"])
total_players_df

|   | Total Players |
|---|---|
| 0 | 576 |


In [17]:
purchasing_analysis_df = pd.DataFrame()
gender_purch_group_df = purchases_df.groupby(['Gender'])['Price']
purchasing_analysis_df['Purchase Count'] = gender_purch_group_df.size()
purchasing_analysis_df['Average Purchase Price'] = gender_purch_group_df.mean().round(2).map("${:,.2f}".format)
purchasing_analysis_df['Total Purchase Value'] = gender_purch_group_df.sum().map("${:,.2f}".format)
unique_df = purchases_df.groupby(['SN', 'Gender'])['Price'].sum()
unique_df = unique_df.reset_index()
purchasing_analysis_df['Avg Total Purchase Per Person'] = unique_df.groupby(['Gender'])['Price'].mean().map("${:,.2f}".format)

purchasing_analysis_df

| Gender | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase Per Person |
|---|---|---|---|---|
| Female | 113 | $3.20 | $361.94 | $4.47 |
| Male | 652 | $3.02 | $1,967.64 | $4.07 |
| Other / Non-Disclosed | 15 | $3.35 | $50.19 | $4.56 |


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [18]:
# Create the age bins; each label is the upper edge of its bin.
# Ages above 34 fall outside the last bin, so pd.cut returns NaN for them.
bins = [0, 11, 15, 19, 23, 27, 31, 34]
group_labels = ["11", "15", "19", "23", "27", "31", "34"]

# Keep one row per player so each person is counted once
group_bin_df = purchases_df.drop_duplicates(['SN', 'Gender'], keep='first')
group_bin_df.head()


|   | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


In [19]:
# Preview the age bin assigned to each purchase row (NaN marks an age outside the bins)
pd.cut(purchases_df["Age"], bins, labels=group_labels).head()


0     23
1    NaN
2     27
3     27
4     23
Name: Age, dtype: category
Categories (7, object): [11 < 15 < 19 < 23 < 27 < 31 < 34]

In [20]:
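# Group the purchases by gender and show the per-column maximum.
# The GroupBy object gets its own name so purchases_df remains a DataFrame for later cells.
gender_groups = purchases_df.groupby("Gender")
gender_groups.max()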

| Gender | Purchase ID | SN | Age | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|
| Female | 775 | Yoishirrala98 | 40 | 183 | Yearning Mageblade | 4.9 |
| Male | 779 | Zontibe81 | 45 | 183 | Yearning Mageblade | 4.99 |
| Other / Non-Disclosed | 747 | Sundim98 | 38 | 163 | Warped Iron Scimitar | 4.75 |
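
The age demographics table requested in this section is not produced by the cells above. One possible approach is sketched below, assuming five-year bins that cover every age; the bin edges, labels, `sample_df` rows, and the `age_demographics` helper are all hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical bin edges and labels. pd.cut uses right-inclusive intervals by
# default, so the interval (19, 24] captures ages 20 through 24.
age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Hypothetical sample rows; the last row repeats a player to show deduplication
sample_df = pd.DataFrame({
    "SN": ["Lisim78", "Lisovynya38", "Ithergue48", "Chamassasya86", "Iskosia90", "Lisim78"],
    "Age": [20, 40, 24, 24, 23, 20],
})


def age_demographics(df):
    """Bin unique players by age and report counts and percentages per bin."""
    players = df.drop_duplicates(subset="SN").copy()
    players["Age Group"] = pd.cut(players["Age"], bins=age_bins, labels=age_labels)
    counts = players["Age Group"].value_counts(sort=False)
    percentages = (counts / len(players) * 100).round(2)
    return pd.DataFrame({
        "Total Count": counts,
        "Percentage of Players": percentages,
    })


age_df = age_demographics(sample_df)
print(age_df)
```

Setting the last edge well above the oldest player avoids the NaN values that appear when an age exceeds the final bin.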


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
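
No code accompanies this section in the notebook. The steps listed above could be sketched as follows, under the assumption that the same style of summary used for the gender analysis is wanted. The bin edges, labels, `sample_df` rows, and the `purchasing_by_age` helper are hypothetical.

```python
import pandas as pd

# Hypothetical bin edges and labels covering every age (right-inclusive intervals)
age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Hypothetical sample purchases: one row per purchase
sample_df = pd.DataFrame({
    "SN": ["Lisim78", "Lisim78", "Lisovynya38", "Ithergue48"],
    "Age": [20, 20, 40, 24],
    "Price": [3.53, 1.47, 1.56, 4.88],
})


def purchasing_by_age(df):
    """Summarize purchase count, average price, total value, and spend per person by age bin."""
    binned = df.copy()
    binned["Age Group"] = pd.cut(binned["Age"], bins=age_bins, labels=age_labels)

    # observed=True keeps only the age groups that actually occur in the data
    grouped = binned.groupby("Age Group", observed=True)
    summary = grouped["Price"].agg(["count", "mean", "sum"])
    summary.columns = ["Purchase Count", "Average Purchase Price", "Total Purchase Value"]

    # Divide each group's total by its number of distinct players
    players_per_group = grouped["SN"].nunique()
    summary["Avg Total Purchase Per Person"] = (
        summary["Total Purchase Value"] / players_per_group
    )
    return summary


age_purchases_df = purchasing_by_age(sample_df)
print(age_purchases_df.round(2))
```

The summary is kept numeric here; currency formatting with `"${:,.2f}".format` can be applied to a copy for display, as in the gender analysis.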

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [21]:
# purchases_df must be a DataFrame here; reload it if an earlier cell rebound it to a GroupBy object
if not isinstance(purchases_df, pd.DataFrame):
    purchases_df = pd.read_csv(purchase_data)

# Aggregate each player's purchases by screen name
top_spenders_df = pd.DataFrame()
snSpenders_df = purchases_df.groupby(['SN'])['Price']
top_spenders_df['Purchase Count'] = snSpenders_df.size()
top_spenders_df['Average Purchase Price'] = snSpenders_df.mean().round(2)
top_spenders_df['Total Purchase Value'] = snSpenders_df.sum()
top_spenders_df = top_spenders_df.sort_values(by='Total Purchase Value', ascending=False)

top_spenders_df.head(10)

## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [22]:
# purchases_df must be a DataFrame here; reload it if an earlier cell rebound it to a GroupBy object
if not isinstance(purchases_df, pd.DataFrame):
    purchases_df = pd.read_csv(purchase_data)

popular_items_df = pd.DataFrame()
ItemId_df = purchases_df.groupby(['Item ID','Item Name'])['Price']
# Purchase count is the number of rows per item; the mean price gives the item price
popular_items_df['Purchase Count'] = ItemId_df.size()
popular_items_df['Item Price'] = ItemId_df.mean().round(2)
popular_items_df['Total Purchase Value'] = ItemId_df.sum()
popular_items_df = popular_items_df.sort_values(by='Purchase Count', ascending=False)

popular_items_df.head(10)

## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [23]:
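# purchases_df must be a DataFrame here; reload it if an earlier cell rebound it to a GroupBy object
if not isinstance(purchases_df, pd.DataFrame):
    purchases_df = pd.read_csv(purchase_data)

# Rebuild the item summary, then rank by total purchase value instead of purchase count
profitable_items_df = pd.DataFrame()
ItemId_df = purchases_df.groupby(['Item ID','Item Name'])['Price']
profitable_items_df['Purchase Count'] = ItemId_df.size()
profitable_items_df['Item Price'] = ItemId_df.mean().round(2)
profitable_items_df['Total Purchase Value'] = ItemId_df.sum()
profitable_items_df = profitable_items_df.sort_values(by='Total Purchase Value', ascending=False)

profitable_items_df.head(10)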