### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

In [2]:
# File to Load (Remember to Change These)
file = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
df = pd.read_csv(file)

df.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


In [3]:
#Per Alexander's suggestion, check column types and null counts with info()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 780 entries, 0 to 779
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Purchase ID  780 non-null    int64  
 1   SN           780 non-null    object 
 2   Age          780 non-null    int64  
 3   Gender       780 non-null    object 
 4   Item ID      780 non-null    int64  
 5   Item Name    780 non-null    object 
 6   Price        780 non-null    float64
dtypes: float64(1), int64(3), object(3)
memory usage: 42.8+ KB


In [4]:
#Per Alexander's suggestion, always check the shape (rows, columns) of the data frame
df.shape

(780, 7)

* Display the total number of players


In [5]:
#Total number of players, counted as unique screen names (SN)
df.SN.nunique()

576

In [6]:
#Creating Summary One data frame
ttl_plyrs = df.SN.nunique()

Smmry1 = pd.DataFrame()
Smmry1["Total Players"] = [ttl_plyrs]

Smmry1

| | Total Players |
|---|---|
| 0 | 576 |


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [7]:
#Obtain number of unique items
df["Item ID"].nunique()

179

In [8]:
#Obtain average price
df.Price.mean()

3.050987179487176

In [9]:
#Obtain sum of all Purchases
df.Price.sum()

2379.77

In [10]:
#Obtain the total number of Purchases
len(df)

780

In [11]:
#Create the Summary Two data frame to hold the four values
ttl_itms = df["Item ID"].nunique()
ttl_rws = len(df)
ttl_prc = df.Price.sum()
avg_prc = df.Price.mean()

#Build the summary with descriptive column names (no number formatting is applied here)
Smmry2 = pd.DataFrame()
Smmry2["Number of Unique Items"] = [ttl_itms]
Smmry2["Number of Purchases"] = [ttl_rws]
Smmry2["Average Price"] = [avg_prc]
Smmry2["Total Revenue"] = [ttl_prc]

Smmry2

| | Number of Unique Items | Number of Purchases | Average Price | Total Revenue |
|---|---|---|---|---|
| 0 | 179 | 780 | 3.050987 | 2379.77 |
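The summary above is left unformatted. One possible way to handle the optional formatting step is to map a currency format string over a copy, which keeps the numeric original usable for further math. This sketch runs on a one-row frame rebuilt from the printed values; the names `smmry2` and `formatted` are illustrative.

```python
import pandas as pd

# One-row summary rebuilt from the values printed above (illustrative stand-in for Smmry2)
smmry2 = pd.DataFrame({
    "Number of Unique Items": [179],
    "Number of Purchases": [780],
    "Average Price": [3.050987179487176],
    "Total Revenue": [2379.77],
})

# Format a copy for display so the numeric columns in smmry2 stay usable
formatted = smmry2.copy()
for col in ["Average Price", "Total Revenue"]:
    formatted[col] = formatted[col].map("${:,.2f}".format)

print(formatted)
```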


## Gender Demographics

In [12]:
#Displaying data frame
df.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


In [13]:
#Keep only the player-level columns (screen name, age, gender)
cols = ["SN", "Age", "Gender"]

plyrs = df.loc[:, cols]
plyrs.head()

| | SN | Age | Gender |
|---|---|---|---|
| 0 | Lisim78 | 20 | Male |
| 1 | Lisovynya38 | 40 | Male |
| 2 | Ithergue48 | 24 | Male |
| 3 | Chamassasya86 | 24 | Male |
| 4 | Iskosia90 | 23 | Male |


In [32]:
#Drop duplicate rows because one screen name (SN) can make several purchases
plyrs = plyrs.drop_duplicates().reset_index(drop=True)
plyrs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 576 entries, 0 to 575
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   SN      576 non-null    object
 1   Age     576 non-null    int64 
 2   Gender  576 non-null    object
dtypes: int64(1), object(2)
memory usage: 13.6+ KB


In [19]:
#Group By Gender
plyrs.groupby("Gender").size()

Gender
Female                    81
Male                     484
Other / Non-Disclosed     11
dtype: int64

In [16]:
#Share of players by gender, as a fraction of all unique players (multiply by 100 for a percentage)
plyrs.groupby("Gender").size()/len(plyrs)

Gender
Female                   0.140625
Male                     0.840278
Other / Non-Disclosed    0.019097
dtype: float64

In [20]:
#value_counts() sorts by count in descending order, which is why Male now appears first
plyrs.Gender.value_counts()

Male                     484
Female                    81
Other / Non-Disclosed     11
Name: Gender, dtype: int64

In [21]:
#Same shares as above, sorted by count in descending order, which is why Male now appears first
plyrs.Gender.value_counts() / len(plyrs)

Male                     0.840278
Female                   0.140625
Other / Non-Disclosed    0.019097
Name: Gender, dtype: float64

In [25]:
#Create two columns (Total Count and Percentage of Players) by grouping on Gender
#Note: groupby sorts the index alphabetically, so the row order differs from value_counts(), which sorts by count
plyr_cnt = plyrs.groupby("Gender").size()
plyr_prcnt = plyrs.groupby("Gender").size()/len(plyrs)

# This is concatenating two series - Source: https://stackoverflow.com/a/18062521
summ3 = pd.concat([plyr_cnt, plyr_prcnt], axis=1)
summ3.columns = ["Total Count", 'Percentage of Players']

summ3

| Gender | Total Count | Percentage of Players |
|---|---|---|
| Female | 81 | 0.140625 |
| Male | 484 | 0.840278 |
| Other / Non-Disclosed | 11 | 0.019097 |
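The Percentage of Players column holds fractions (0.140625 rather than 14.06%). Below is a sketch of one way to display it as a percentage, rebuilt from the counts printed above. Formatting converts the column to strings, so it is best done as the last step, after any sorting or arithmetic.

```python
import pandas as pd

# Counts per gender taken from the output above (576 unique players in total)
plyr_cnt = pd.Series({"Female": 81, "Male": 484, "Other / Non-Disclosed": 11})

summ3 = pd.DataFrame({
    "Total Count": plyr_cnt,
    "Percentage of Players": plyr_cnt / plyr_cnt.sum(),
})

# "{:.2%}" multiplies by 100 and appends a percent sign; the column becomes strings
summ3["Percentage of Players"] = summ3["Percentage of Players"].map("{:.2%}".format)
print(summ3)
```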


In [26]:
df.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


In [31]:
#Purchase count and average price per gender
#Source: https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.core.groupby.DataFrameGroupBy.agg.html
df.groupby("Gender").agg({"Purchase ID": "count", "Price": "mean"})

| Gender | Purchase ID | Price |
|---|---|---|
| Female | 113 | 3.203009 |
| Male | 652 | 3.017853 |
| Other / Non-Disclosed | 15 | 3.346 |



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
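The following is a minimal sketch of these steps. It assumes a tiny hypothetical sample with the same `SN`, `Gender`, and `Price` columns as purchase_data.csv; the rows are invented and do not come from the dataset, and the summary column names are also assumptions. The per-person figure divides each gender's total by its number of unique screen names rather than by its number of purchases, mirroring the drop-duplicates step used for the player counts.

```python
import pandas as pd

# Hypothetical sample rows (not from the real dataset); A1 buys twice
df = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
    "Price": [2.00, 4.00, 3.00, 1.00, 5.00],
})

grouped = df.groupby("Gender")
summary = pd.DataFrame({
    "Purchase Count": grouped["Price"].count(),
    "Average Purchase Price": grouped["Price"].mean(),
    "Total Purchase Value": grouped["Price"].sum(),
})

# Per-person average: total spend divided by unique players (SN), not by purchases
summary["Avg Total Purchase per Person"] = (
    summary["Total Purchase Value"] / grouped["SN"].nunique()
)
print(summary)
```

With this sample, Male has three purchases totalling 7.00 from two players, so the per-person average is 3.50 while the average purchase price is about 2.33.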

## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table
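One way to sketch the age binning uses a hypothetical unique-player table shaped like `plyrs` above. The bin edges and labels (five-year bands from 10 to 39) are assumptions. `pd.cut` intervals are right-inclusive by default, so for integer ages the edge pair 9 and 14 captures ages 10 through 14.

```python
import pandas as pd

# Hypothetical unique players (invented ages, one row per SN)
plyrs = pd.DataFrame({
    "SN": ["A1", "B2", "C3", "D4", "E5"],
    "Age": [7, 12, 22, 24, 41],
})

# Assumed bin edges; intervals are (left, right], e.g. (9, 14] holds ages 10-14
bins = [0, 9, 14, 19, 24, 29, 34, 39, 150]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
plyrs["Age Group"] = pd.cut(plyrs["Age"], bins=bins, labels=labels)

# sort=False keeps the bin order; empty categories are reported with a count of 0
age_counts = plyrs["Age Group"].value_counts(sort=False)
age_demo = pd.DataFrame({
    "Total Count": age_counts,
    "Percentage of Players": (age_counts / len(plyrs) * 100).round(2),
})
print(age_demo)
```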


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
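Below is a sketch of the age-based purchasing table, assuming hypothetical purchase rows and assumed five-year bins. The bins are applied to every purchase rather than to unique players, so the per-person average again divides by the number of unique `SN` values in each group. `observed=True` leaves empty age groups out of the result.

```python
import pandas as pd

# Hypothetical purchase rows (one row per purchase, invented values)
df = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3"],
    "Age": [22, 22, 12, 41],
    "Price": [2.00, 3.00, 4.00, 1.50],
})

# Assumed bin edges and labels
bins = [0, 9, 14, 19, 24, 29, 34, 39, 150]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
df["Age Group"] = pd.cut(df["Age"], bins=bins, labels=labels)

# observed=True keeps only age groups that actually contain purchases
grouped = df.groupby("Age Group", observed=True)
age_purchases = pd.DataFrame({
    "Purchase Count": grouped["Price"].count(),
    "Average Purchase Price": grouped["Price"].mean(),
    "Total Purchase Value": grouped["Price"].sum(),
})
age_purchases["Avg Total Purchase per Person"] = (
    age_purchases["Total Purchase Value"] / grouped["SN"].nunique()
)
print(age_purchases)
```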

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
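Here is a sketch of the top-spenders table, assuming hypothetical purchase rows keyed by `SN`. Sorting uses the numeric `Total Purchase Value` column. Any currency formatting should come afterwards, because sorting strings like "$10.00" would be lexical rather than numeric.

```python
import pandas as pd

# Hypothetical purchase rows (invented values)
df = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "C3", "C3"],
    "Price": [2.00, 3.00, 4.00, 1.00, 1.00, 1.50],
})

grouped = df.groupby("SN")["Price"]
top_spenders = pd.DataFrame({
    "Purchase Count": grouped.count(),
    "Average Purchase Price": grouped.mean(),
    "Total Purchase Value": grouped.sum(),
}).sort_values("Total Purchase Value", ascending=False)

# Preview the biggest spenders
print(top_spenders.head())
```

C3 makes the most purchases here but ranks last, since ranking is by total spend rather than by purchase count.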



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, average item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
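Below is a sketch of the most-popular-items table. It uses hypothetical items ("Item A" to "Item C") rather than real rows from the dataset. Grouping on both `Item ID` and `Item Name` keeps the name next to the ID in a two-level index. `Item Price` is taken as the mean price per item, which equals the listed price if an item's price never varies.

```python
import pandas as pd

# Hypothetical purchases of three invented items
df = pd.DataFrame({
    "Item ID": [1, 1, 2, 3, 3, 3],
    "Item Name": ["Item A", "Item A", "Item B", "Item C", "Item C", "Item C"],
    "Price": [4.00, 4.00, 10.00, 1.00, 1.00, 1.00],
})

# Retrieve only the item columns, then group on ID and name together
items = df[["Item ID", "Item Name", "Price"]]
grouped = items.groupby(["Item ID", "Item Name"])["Price"]
popular = pd.DataFrame({
    "Purchase Count": grouped.count(),
    "Item Price": grouped.mean(),
    "Total Purchase Value": grouped.sum(),
}).sort_values("Purchase Count", ascending=False)

print(popular.head())
```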



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
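The sketch below is self-contained and re-sorts an item table by total purchase value, using a small hypothetical set of items. Sorting runs on the numeric column first. Currency formatting is then applied to a copy, so the order stays numeric. With these sample rows, the most-purchased item (Item C) is not the most profitable one.

```python
import pandas as pd

# Hypothetical purchases of three invented items
df = pd.DataFrame({
    "Item ID": [1, 1, 2, 3, 3, 3],
    "Item Name": ["Item A", "Item A", "Item B", "Item C", "Item C", "Item C"],
    "Price": [4.00, 4.00, 10.00, 1.00, 1.00, 1.00],
})

grouped = df.groupby(["Item ID", "Item Name"])["Price"]
items = pd.DataFrame({
    "Purchase Count": grouped.count(),
    "Item Price": grouped.mean(),
    "Total Purchase Value": grouped.sum(),
})

# Sort on the numeric totals, then format a copy for display
profitable = items.sort_values("Total Purchase Value", ascending=False)
display_tbl = profitable.copy()
for col in ["Item Price", "Total Purchase Value"]:
    display_tbl[col] = display_tbl[col].map("${:,.2f}".format)

print(display_tbl.head())
```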

