### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Our peak age demographic is 20-24 (44.8%), followed by 15-19 (18.6%) and 25-29 (13.4%).  
-----

### Omar Eltorai's Observations from the Analysis
* Although players who do not identify as male (Female or Other / Non-Disclosed) make up a small share of the player base (14% Female, 2% Other / Non-Disclosed), they spend more per person on average (USD4.47 for Female and USD4.56 for Other / Non-Disclosed) than male players (USD4.07).

* The most popular item by total sales is "Oathbreaker, Last Hope of the Breaking Storm", which sold 12 copies (USD4.23 each) for a total of USD50.76. The second most popular item by total sales is Nirvana, which sold 9 copies (USD4.90 each) for a total of USD44.10.

* The player who made the most purchases was Lisosia93, with a total of 5 purchases.  
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

## Player Count

* Display the total number of players


In [2]:
# Show all the column headers to better understand the data
purchase_data.columns



Index(['Purchase ID', 'SN', 'Age', 'Gender', 'Item ID', 'Item Name', 'Price'], dtype='object')

In [3]:
# Count the non-null values in each column to check for gaps or missing data
purchase_data.count()

Purchase ID    780
SN             780
Age            780
Gender         780
Item ID        780
Item Name      780
Price          780
dtype: int64

In [4]:
# Count the rows per SN to see if a single player made multiple purchases
purchase_data["SN"].value_counts()

Lisosia93         5
Iral74            4
Idastidru52       4
Iskadarya95       3
Raesty92          3
Saedaiphos46      3
Haillyrgue51      3
Lassilsala30      3
Zontibe81         3
Hada39            3
Chamimla85        3
Chamjask73        3
Phyali88          3
Inguron55         3
Asur53            3
Sondastsda82      3
Tyisur83          3
Rarallo90         3
Aina42            3
Ialallo29         3
Umolrian85        3
Idai61            3
Lisopela58        3
Saistyphos30      3
Phaena87          3
Ilarin91          3
Siallylis44       3
Aelin32           3
Hiaral50          3
Strithenu87       3
                 ..
Yasur35           1
Haerith37         1
Alaesu91          1
Chanadar44        1
Marughi89         1
Lisirra25         1
Alaesu77          1
Adastirin33       1
Firon67           1
Lisassala98       1
Mindjasksya61     1
Ceoral34          1
Jiskimsda56       1
Tyaelly53         1
Pheusrical25      1
Lisista63         1
Alaephos75        1
Farusrian86       1
Tyirinu79         1


In [5]:
# Create a df which holds only the unique players' SN
unique_players = pd.DataFrame(purchase_data["SN"].value_counts())

# Rename the columns for the new dataframe
unique_players.reset_index(inplace=True)
unique_players.columns = ["SN","Total Players"]

# Check to see the new df was created
unique_players.head()

,SN,Total Players
0,Lisosia93,5
1,Iral74,4
2,Idastidru52,4
3,Iskadarya95,3
4,Raesty92,3


In [6]:
# Tally how many players made each number of purchases (1, 2, 3, ...)
total_players_table = pd.DataFrame(unique_players["Total Players"].value_counts())

# Summing those tallies gives the number of unique players
total_players_table.sum()

Total Players    576
dtype: int64
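The cells above arrive at 576 by tallying purchases per screen name and then summing the tallies. As a minimal sketch of a shorter route, assuming the same `SN` column, `nunique()` counts distinct screen names directly. The `sample` frame below is a hypothetical stand-in for `purchase_data`, not the real file.

```python
import pandas as pd

# Hypothetical miniature of purchase_data (the real file has 780 rows).
sample = pd.DataFrame({
    "SN": ["Lisosia93", "Iral74", "Lisosia93", "Hada39"],
    "Price": [3.53, 1.56, 4.88, 3.27],
})

# nunique() counts distinct screen names, so repeat buyers count once.
total_players = sample["SN"].nunique()

# Wrap the scalar in a one-row frame for display, as the notebook does elsewhere.
player_count = pd.DataFrame({"Total Players": [total_players]})
print(player_count)
```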

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [7]:
# Number of unique items, average price, number of purchases, total revenue
###total items
total_items = pd.DataFrame(purchase_data["Item Name"].value_counts())
total_items.reset_index(inplace=True)
total_items.columns = ["Item Name","Count"]
total_items_table = len(total_items)

###average price
average_price_table = purchase_data["Price"].mean()

###number of purchases
number_purchases_table = len(purchase_data["Purchase ID"])

### total revenue
total_revenue_table = purchase_data["Price"].sum()

In [8]:
# Create a dataframe with all of the values requested and populate the correct headers
purchase_analysis = pd.DataFrame({
    "Number of Unique Items": [total_items_table], 
    "Average Price": [average_price_table], 
    "Number of Purchases": [number_purchases_table], 
    "Total Revenue": [total_revenue_table]})
purchase_analysis

,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,179,3.050987,780,2379.77
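The summary above counts distinct values of `Item Name`. A minimal sketch of the same summary, assuming hypothetical sample data, shows that counting `Item ID` instead can give a different figure whenever two IDs share a name, and applies the optional currency formatting with `str.format`.

```python
import pandas as pd

# Hypothetical sample: item IDs 1 and 3 deliberately share the name "Fury".
sample = pd.DataFrame({
    "Item ID": [1, 2, 2, 3],
    "Item Name": ["Fury", "Nirvana", "Nirvana", "Fury"],
    "Price": [1.00, 4.00, 4.00, 2.00],
})

summary = pd.DataFrame({
    "Unique Item Names": [sample["Item Name"].nunique()],
    "Unique Item IDs": [sample["Item ID"].nunique()],
    "Average Price": [sample["Price"].mean()],
    "Number of Purchases": [len(sample)],
    "Total Revenue": [sample["Price"].sum()],
})

# Format a copy so the numeric values stay available for later calculations.
formatted = summary.copy()
for col in ["Average Price", "Total Revenue"]:
    formatted[col] = formatted[col].map("${:,.2f}".format)
print(formatted)
```

Formatting a copy rather than the original keeps the numeric columns usable for further arithmetic.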


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [9]:
# First get an understanding of the gender data
purchase_data["Gender"].value_counts()



Male                     652
Female                   113
Other / Non-Disclosed     15
Name: Gender, dtype: int64

In [10]:
###Create a dataframe with unique players and gender, drop duplicates
SN_gender_dups = (pd.DataFrame({"SN": purchase_data["SN"],
                         "Gender": purchase_data["Gender"]}).sort_values(by=["SN"]))

SN_gender = pd.DataFrame(SN_gender_dups.drop_duplicates())
SN_gender.head(10)

,SN,Gender
467,Adairialis76,Male
142,Adastirin33,Female
388,Aeda94,Male
28,Aela59,Male
630,Aelaria33,Male
766,Aelastirin39,Male
705,Aelidru27,Male
52,Aelin32,Male
43,Aelly27,Male
286,Aellynun67,Male


In [11]:
###Sum of all players
players_sum = (len(SN_gender["SN"]))

###Sum of all male, female, other players
male_count = SN_gender["Gender"].eq('Male').sum()
female_count = SN_gender["Gender"].eq('Female').sum()
other_ND_count = SN_gender["Gender"].eq('Other / Non-Disclosed').sum()

###Create percentages of all male, female, other players
male_perc = ((male_count / players_sum)*100).round(2).astype(str)+'%'
female_perc = ((female_count / players_sum)*100).round(2).astype(str)+'%'
other_ND_perc = ((other_ND_count / players_sum)*100).round(2).astype(str)+'%'

In [12]:
###Create output table
# Create a dataframe with all of the values requested and populate the correct headers
gender_analysis = pd.DataFrame({
    "Total Count": [male_count, female_count, other_ND_count], 
    "Percentage of Players": [male_perc, female_perc, other_ND_perc]})
gender_analysis.rename(index={0: 'Male',
                             1: 'Female',
                             2: 'Other / Non-Disclosed'})

Gender,Total Count,Percentage of Players
Male,484,84.03%
Female,81,14.06%
Other / Non-Disclosed,11,1.91%
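The gender table is built from one row per player, not one row per purchase. A minimal sketch of that idea, assuming each screen name always reports the same gender and using hypothetical sample rows, deduplicates on `SN` and lets `value_counts()` produce the counts in one step.

```python
import pandas as pd

# Hypothetical sample: player "A1" appears twice because of two purchases.
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are not over-counted.
players = sample.drop_duplicates(subset="SN")

counts = players["Gender"].value_counts()
percents = (counts / counts.sum() * 100).round(2)

gender_demo = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percents,
})
print(gender_demo)
```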



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [55]:
# Create dataframes with the purchase info for each gender
male_purchase = purchase_data.loc[lambda purchase_data: purchase_data["Gender"] == 'Male', ["Price"]]
male_purchase_count = male_purchase["Price"].count()
male_purchase_avgpx = male_purchase["Price"].mean()
male_purchase_total = male_purchase["Price"].sum()
male_purchase_pp = ((male_purchase["Price"].sum()) / (male_count))

female_purchase = purchase_data.loc[lambda purchase_data: purchase_data["Gender"] == 'Female', ["Price"]]
female_purchase_count = female_purchase["Price"].count()
female_purchase_avgpx = female_purchase["Price"].mean()
female_purchase_total = female_purchase["Price"].sum()
female_purchase_pp = ((female_purchase["Price"].sum()) / (female_count))

other_ND_purchase = purchase_data.loc[lambda purchase_data: purchase_data["Gender"] == 'Other / Non-Disclosed', ["Price"]]
other_ND_purchase_count = other_ND_purchase["Price"].count()
other_ND_purchase_avgpx = other_ND_purchase["Price"].mean()
other_ND_purchase_total = other_ND_purchase["Price"].sum()
other_ND_purchase_pp = ((other_ND_purchase["Price"].sum()) / (other_ND_count))
# .style.format('${:.2f}')

In [56]:
male_purchase_count

652

In [57]:
# Create dataframe to cleanly display values
purchasing_analysis_gender = pd.DataFrame({"Purchase Count": [male_purchase_count, female_purchase_count, other_ND_purchase_count],
                                   "Average Purchase Price": [male_purchase_avgpx, female_purchase_avgpx, other_ND_purchase_avgpx],
                                   "Total Purchase Price": [male_purchase_total, female_purchase_total, other_ND_purchase_total],
                                   "Average Total per Person": [male_purchase_pp, female_purchase_pp, other_ND_purchase_pp]})
purchasing_analysis_gender = purchasing_analysis_gender.rename(index={0: 'Male',
                                 1: 'Female',
                                 2: 'Other / Non-Disclosed'})

###Comment to Grader --> I do not know why the data information (e.g., dtype: int64) is shown in the table, and I struggled to find a way to remove it

In [58]:
purchasing_analysis_gender.style.format({"Average Purchase Price": "${:.2f}",
                                        "Total Purchase Price": "${:.2f}",
                                        "Average Total per Person": "${:.2f}"})

Gender,Purchase Count,Average Purchase Price,Total Purchase Price,Average Total per Person
Male,652,$3.02,$1967.64,$4.07
Female,113,$3.20,$361.94,$4.47
Other / Non-Disclosed,15,$3.35,$50.19,$4.56


Gender,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Female,113,$3.20,$361.94,$4.47
Male,652,$3.02,"$1,967.64",$4.07
Other / Non-Disclosed,15,$3.35,$50.19,$4.56
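In the table above, the per-person average is the total purchase value divided by the number of unique players of that gender, for example 1967.64 / 484 ≈ 4.07 for male players. As a minimal sketch, assuming hypothetical sample rows and snake_case column names chosen only for illustration, a single `groupby` with named aggregations can compute all three groups at once instead of repeating the block per gender.

```python
import pandas as pd

# Hypothetical sample: "A1" bought twice, so Male has 3 purchases from 2 players.
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3"],
    "Gender": ["Male", "Male", "Female", "Male"],
    "Price": [2.00, 4.00, 3.00, 6.00],
})

by_gender = sample.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),  # unique players, not purchases
)

# Per-person spend divides by unique players, not by purchase count.
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]
print(by_gender)
```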


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [19]:
# Create the bins
bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]

# Create bin labels
bin_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

In [20]:
# Create df with unique players and their ages
age_demographics_dups = (pd.DataFrame({"SN": purchase_data["SN"],
                         "Age": purchase_data["Age"]}))
                         
age_demographics = pd.DataFrame(age_demographics_dups.drop_duplicates())
age_demographics.sort_values(by=["SN"])

# Cut data and put into bins
pd.cut(age_demographics["Age"], bins, labels=bin_labels)

# Age Demographics view
age_demographics.head(10)

,SN,Age
0,Lisim78,20
1,Lisovynya38,40
2,Ithergue48,24
3,Chamassasya86,24
4,Iskosia90,23
5,Yalae81,22
6,Itheria73,36
7,Iskjaskst81,20
8,Undjask33,22
9,Chanosian48,35


In [21]:
# Place the data series into a new column inside of the age_demographics df
age_demographics["Age Group"] = pd.cut(age_demographics["Age"], bins, labels=bin_labels)
age_demographics.head()

,SN,Age,Age Group
0,Lisim78,20,20-24
1,Lisovynya38,40,40+
2,Ithergue48,24,20-24
3,Chamassasya86,24,20-24
4,Iskosia90,23,20-24


In [22]:
# Create a GroupBy object based upon "Age Group"
age_group = age_demographics.groupby("Age Group")

# Find how many rows fall into each bin
print(age_group["SN"].count())


Age Group
<10       17
10-14     22
15-19    107
20-24    258
25-29     77
30-34     52
35-39     31
40+       12
Name: SN, dtype: int64


In [32]:
# Create table to present data requested
age_demo_total_count = age_demographics["SN"].value_counts().sum()
age_demo_bin_count = age_demographics["Age Group"].value_counts()
age_demo_bin_pct = (((age_demo_bin_count) / (age_demo_total_count))*100).round(2).astype(str)+'%'

age_demo_bin_table = pd.DataFrame({"Total Count": age_demo_bin_count,
                                "Percent of Players": age_demo_bin_pct})
age_demo_bin_table.sort_index()

Age Group,Total Count,Percent of Players
<10,17,2.95%
10-14,22,3.82%
15-19,107,18.58%
20-24,258,44.79%
25-29,77,13.37%
30-34,52,9.03%
35-39,31,5.38%
40+,12,2.08%


Age Group,Total Count,Percentage of Players
<10,17,2.95
10-14,22,3.82
15-19,107,18.58
20-24,258,44.79
25-29,77,13.37
30-34,52,9.03
35-39,31,5.38
40+,12,2.08
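By default `pd.cut` treats each bin as open on the left and closed on the right, so the edges `[0, 9, 14, ...]` mean (0, 9], (9, 14] and so on. The short sketch below uses the notebook's own `bins` and `bin_labels` with a handful of hypothetical ages to show where boundary values land.

```python
import pandas as pd

bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
bin_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Hypothetical ages chosen to sit on or next to bin edges.
ages = pd.Series([9, 10, 19, 20, 24, 40])

# Right-inclusive bins: 9 falls in (0, 9], 10 falls in (9, 14], and so on.
groups = pd.cut(ages, bins, labels=bin_labels)
print(groups.tolist())

# sort_index() on the categorical index keeps the rows in bin order.
counts = groups.value_counts().sort_index()
print(counts)
```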


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [24]:
# Using the previously defined bins and labels, group the purchase_data
purchase_age_bin = pd.cut(purchase_data["Age"], bins, labels=bin_labels)

# Place the data series into a new column inside of the purchase_data df
purchase_data["Age Group"] = pd.cut(purchase_data["Age"], bins, labels=bin_labels)
purchase_data.head()


,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price,Age Group
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53,20-24
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56,40+
2,2,Ithergue48,24,Male,92,Final Critic,4.88,20-24
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27,20-24
4,4,Iskosia90,23,Male,131,Fury,1.44,20-24


In [35]:
# Calculate the purchase count, avg purchase px, total purchase value, avg total purchase per person
# First pull only the data needed from purchase_data df: Purchase ID, Age Group, Price
purchase_data_age_binned = pd.DataFrame(purchase_data.loc[:, ["Purchase ID", "Age Group", "Price"]])

# Find the values requested
purchase_data_age_grouped = purchase_data_age_binned.groupby("Age Group")
purchase_data_age_grouped_count = purchase_data_age_grouped["Price"].count() 
purchase_data_age_grouped_avpx = '$' + purchase_data_age_grouped["Price"].mean().round(2).astype(str)
purchase_data_age_grouped_total = '$' + purchase_data_age_grouped["Price"].sum().round(2).astype(str)
purchase_data_age_grouped_avgpp = '$' + (purchase_data_age_grouped["Price"].sum() / age_demographics["Age Group"].value_counts()).round(2).astype(str)

# Create table to present the values requested
purchase_data_age_group_table = pd.DataFrame({"Purchase Count": purchase_data_age_grouped_count,
                                             "Average Purchase Price": purchase_data_age_grouped_avpx,
                                             "Total Purchase Price": purchase_data_age_grouped_total,
                                             "Avg Total Purchase/Person": purchase_data_age_grouped_avgpp})
purchase_data_age_group_table


Age Group,Purchase Count,Average Purchase Price,Total Purchase Price,Avg Total Purchase/Person
10-14,28,$2.96,$82.78,$3.76
15-19,136,$3.04,$412.89,$3.86
20-24,365,$3.05,$1114.06,$4.32
25-29,101,$2.9,$293.0,$3.81
30-34,73,$2.93,$214.0,$4.12
35-39,41,$3.6,$147.67,$4.76
40+,13,$2.94,$38.24,$3.19
<10,23,$3.35,$77.13,$4.54


Age Group,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
10-14,28,$2.96,$82.78,$3.76
15-19,136,$3.04,$412.89,$3.86
20-24,365,$3.05,"$1,114.06",$4.32
25-29,101,$2.90,$293.00,$3.81
30-34,73,$2.93,$214.00,$4.12
35-39,41,$3.60,$147.67,$4.76
40+,13,$2.94,$38.24,$3.19
<10,23,$3.35,$77.13,$4.54
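The tables above list the `<10` row last instead of first. One way to keep rows in bin order, sketched here under the assumption of hypothetical sample rows and illustrative column names, is to group directly on the categorical `Age Group` column and count unique players in the same aggregation.

```python
import pandas as pd

bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
bin_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Hypothetical sample: player "A1" (age 8) made two purchases.
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3"],
    "Age": [8, 8, 22, 41],
    "Price": [1.00, 3.00, 2.50, 4.00],
})
sample["Age Group"] = pd.cut(sample["Age"], bins, labels=bin_labels)

# Grouping on the categorical column orders groups by category, not by text.
# observed=True drops bins that have no purchases in this small sample.
by_age = sample.groupby("Age Group", observed=True).agg(
    purchase_count=("Price", "count"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)
by_age["avg_total_per_person"] = by_age["total_value"] / by_age["players"]
print(by_age)
```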


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [49]:
# First pull only the data needed from purchase_data df: SN, Purchase ID, Price
top_spenders = pd.DataFrame(purchase_data.loc[:, ["SN", "Purchase ID", "Price"]])

# Find the values requested (kept numeric so the sort below is by value, not by text)
top_spenders_grouped = top_spenders.groupby("SN")
top_spenders_grouped_count = top_spenders_grouped["Price"].count()
top_spenders_grouped_avpx = top_spenders_grouped["Price"].mean()
top_spenders_grouped_total = top_spenders_grouped["Price"].sum()

# Create table to present the values requested
top_spenders_grouped_table = pd.DataFrame({"Purchase Count": top_spenders_grouped_count,
                                             "Average Purchase Price": top_spenders_grouped_avpx,
                                             "Total Purchase Price": top_spenders_grouped_total})

# Sort by total purchase value in descending order, then add the $ formatting
top_spenders_table = top_spenders_grouped_table.sort_values(by='Total Purchase Price', ascending=False)
top_spenders_table['Average Purchase Price'] = top_spenders_table['Average Purchase Price'].map('${:.2f}'.format)
top_spenders_table['Total Purchase Price'] = top_spenders_table['Total Purchase Price'].map('${:.2f}'.format)
top_spenders_table.head()

SN,Purchase Count,Average Purchase Price,Total Purchase Price
Lisosia93,5,$3.79,$18.96
Idastidru52,4,$3.86,$15.45
Chamjask73,3,$4.61,$13.83
Iral74,4,$3.40,$13.62
Iskadarya95,3,$4.37,$13.10


SN,Purchase Count,Average Purchase Price,Total Purchase Value
Lisosia93,5,$3.79,$18.96
Idastidru52,4,$3.86,$15.45
Chamjask73,3,$4.61,$13.83
Iral74,4,$3.40,$13.62
Iskadarya95,3,$4.37,$13.10


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [93]:
# First pull only the data needed from purchase_data df: Item ID, Item Name, Price
popular_items = pd.DataFrame(purchase_data.loc[:, ["Item ID", "Item Name", "Price"]])

# Find the values requested
popular_items_grouped = popular_items.groupby(["Item ID", "Item Name"])
popular_items_grouped_name = popular_items_grouped["Item Name"]
popular_items_grouped_count = popular_items_grouped["Price"].count().round(0) 
popular_items_grouped_avpx = '$' + popular_items_grouped["Price"].mean().round(2).astype(str)
popular_items_grouped_total = '$' + popular_items_grouped["Price"].sum().round(2).astype(str)

# Create table to present the values requested
popular_items_grouped_table = pd.DataFrame({"Purchase Count": popular_items_grouped_count,
                                            "Item Price": popular_items_grouped_avpx,
                                            "Total Purchase Price": popular_items_grouped_total})
popular_items_table = popular_items_grouped_table.sort_values(by='Purchase Count', ascending=False)
popular_items_table.head()

Item ID,Item Name,Purchase Count,Item Price,Total Purchase Price
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
145,Fiery Glass Crusader,9,$4.58,$41.22
108,"Extraction, Quickblade Of Trembling Hands",9,$3.53,$31.77
82,Nirvana,9,$4.9,$44.1
19,"Pursuit, Cudgel of Necromancy",8,$1.02,$8.16


Item ID,Item Name,Purchase Count,Item Price,Total Purchase Value
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
145,Fiery Glass Crusader,9,$4.58,$41.22
108,"Extraction, Quickblade Of Trembling Hands",9,$3.53,$31.77
82,Nirvana,9,$4.90,$44.10
19,"Pursuit, Cudgel of Necromancy",8,$1.02,$8.16
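Three items in the table above tie at 9 purchases, so their relative order depends on how the sort treats ties. A minimal sketch, assuming hypothetical items and illustrative column names, passes a second sort key so that tied items are ordered by total purchase value.

```python
import pandas as pd

# Hypothetical sample: items B and C tie on purchase count (2 each).
sample = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3, 3],
    "Item Name": ["A", "A", "A", "B", "B", "C", "C"],
    "Price": [1.00, 1.00, 1.00, 4.00, 4.00, 2.00, 2.00],
})

items = sample.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "first"),  # assumes one price per item
    total_value=("Price", "sum"),
)

# Sort by count first, then break ties by total value, both descending.
ranked = items.sort_values(["purchase_count", "total_value"], ascending=False)
print(ranked)
```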


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [102]:
# Rebuild the item table from purchase_data, keeping the totals numeric so the sort is by value
profit_items = pd.DataFrame(purchase_data.loc[:, ["Item ID", "Item Name", "Price"]])

# Find the values requested
profit_items_grouped = profit_items.groupby(["Item ID", "Item Name"])
profit_items_grouped_name = profit_items_grouped["Item Name"]
profit_items_grouped_count = profit_items_grouped["Price"].count().round(0) 
profit_items_grouped_avpx = profit_items_grouped["Price"].mean().round(2)
profit_items_grouped_total = profit_items_grouped["Price"].sum().round(2)

# Create table to present the values requested
profit_items_grouped_table = pd.DataFrame({"Purchase Count": profit_items_grouped_count,
                                            "Item Price": '$' + profit_items_grouped_avpx.astype(str),
                                            "Total Purchase Price": profit_items_grouped_total})
profit_items_table = profit_items_grouped_table.sort_values(by='Total Purchase Price', ascending=False)

# Clean final table format (add in the $ signs)
profit_items_table['Total Purchase Price'] = '$' + profit_items_table['Total Purchase Price'].astype(str)
profit_items_table.head()


Item ID,Item Name,Purchase Count,Item Price,Total Purchase Price
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
82,Nirvana,9,$4.9,$44.1
145,Fiery Glass Crusader,9,$4.58,$41.22
92,Final Critic,8,$4.88,$39.04
103,Singed Scalpel,8,$4.35,$34.8


Item ID,Item Name,Purchase Count,Item Price,Total Purchase Value
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
82,Nirvana,9,$4.90,$44.10
145,Fiery Glass Crusader,9,$4.58,$41.22
92,Final Critic,8,$4.88,$39.04
103,Singed Scalpel,8,$4.35,$34.80
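The cell above sorts on the numeric totals and only afterwards prepends the `$` sign. The sketch below illustrates why that order matters, using three totals taken from the tables in this notebook (the index labels are shortened item names): once the values are strings they sort character by character, so `$8.16` ranks above `$50.76`.

```python
import pandas as pd

# Totals from the item tables above; labels are abbreviated item names.
totals = pd.Series(
    [50.76, 8.16, 44.10],
    index=["Oathbreaker", "Pursuit", "Nirvana"],
)

# Formatting first turns the values into text, so the sort is lexicographic.
as_text = "$" + totals.round(2).astype(str)
text_order = as_text.sort_values(ascending=False).index.tolist()
print(text_order)

# Sorting first, then formatting, ranks by value and keeps two decimals.
numeric_first = totals.sort_values(ascending=False).map("${:,.2f}".format)
print(numeric_first)
```

Using `"${:,.2f}".format` also keeps trailing zeros, so 44.10 displays as `$44.10` and not `$44.1`.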
