# **Heroes Of Pymoli**

Analyze purchase data for the game Heroes of Pymoli, starting by loading it into a Pandas data frame

In [None]:
# Dependencies and Setup
import pandas as pd
import os

In [None]:
# Identify a variable for the path needed to retrieve the data
pymoli_path = os.path.join('Resources', 'purchase_data.csv')

# Read purchase_data.csv file and store into Pandas DataFrame
gaming_df = pd.read_csv(pymoli_path)

# View the values in first 5 rows of data frame
gaming_df.head()

In [None]:
# Identify all column headers in the data
gaming_df.columns

## Player Count

Using the data frame created from purchase_data.csv, identify the number of players
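
As a small illustration of this counting step, here is a sketch on a made-up frame rather than purchase_data.csv (the screen names and prices are hypothetical). One player can appear on several purchase rows, so the count is taken over distinct screen names, not rows.

```python
import pandas as pd

# Hypothetical purchase rows: one row per purchase, so a player can repeat
sample_df = pd.DataFrame({
    'SN': ['Lisim78', 'Lisim78', 'Aelalis34', 'Iral74'],
    'Price': [3.53, 1.56, 4.88, 2.10],
})

# nunique() counts distinct screen names (missing values are ignored by default)
sample_players_count = sample_df['SN'].nunique()

print(len(sample_df))         # 4 purchase rows
print(sample_players_count)   # 3 distinct players
```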

In [None]:
# Calculate number of players using unique values for screen name column
players_count = len(gaming_df['SN'].unique())

# Load the players_count in a data frame to display
players_count_df = pd.DataFrame({
    'Total Players':[players_count]
})

# View the data frame
players_count_df

## Purchasing Analysis (Total)

Perform basic calculations on purchasing data to be displayed in a summary data frame
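
This section sets `pd.options.display.float_format`, a global display option that applies to every float shown afterwards. A possible alternative, sketched here on hypothetical data, is to format only the currency columns on a display copy. The sketch assumes one row per purchase, so the purchase count is simply the row count.

```python
import pandas as pd

# Hypothetical purchases, for illustration only
sample_df = pd.DataFrame({
    'Item ID': [1, 2, 2, 3],
    'Price': [1.5, 2.25, 2.25, 4.0],
})

sample_summary_df = pd.DataFrame({
    'Number of Unique Items': [sample_df['Item ID'].nunique()],
    'Average Price': [sample_df['Price'].mean()],
    'Number of Purchases': [len(sample_df)],
    'Total Revenue': [sample_df['Price'].sum()],
})

# map() returns strings, so format a copy and keep the numeric frame for calculations
display_df = sample_summary_df.copy()
for col in ['Average Price', 'Total Revenue']:
    display_df[col] = display_df[col].map('${:,.2f}'.format)

print(display_df)
```

The numeric `sample_summary_df` stays usable for later arithmetic, while `display_df` holds the formatted strings.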

In [None]:
# Apply formatting for the $ values
pd.options.display.float_format = '${:,.2f}'.format

In [None]:
# Create variables and perform calculations:
# Number of Unique Items, Average Price, Number of Purchases, Total Revenue
unique_items = len(gaming_df['Item ID'].unique())
average_price = gaming_df['Price'].mean()
num_purchases = len(gaming_df['Purchase ID'].unique())
total_revenue = gaming_df['Price'].sum()

In [None]:
# Create a summary_df to hold the results of above calculations
summary_df = pd.DataFrame({
    'Number of Unique Items':[unique_items],
    'Average Price':[average_price],
    'Number of Purchases':[num_purchases],       
    'Total Revenue':[total_revenue]
})

summary_df

## Gender Demographics

Identify the count for the three different gender classifications in the data and display the count and percentage of total in a data frame
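
A minimal sketch of the count-and-percentage idea, using hypothetical players and gender labels. It relies on `value_counts(normalize=True)` to get shares directly, which is one option among several and not necessarily how the cells in this section do it.

```python
import pandas as pd

# Hypothetical purchase rows; players a1 and d4 each bought twice
sample_df = pd.DataFrame({
    'SN': ['a1', 'a1', 'b2', 'c3', 'd4', 'd4'],
    'Gender': ['Male', 'Male', 'Female', 'Male',
               'Other / Non-Disclosed', 'Other / Non-Disclosed'],
})

# Keep one row per player so repeat buyers are not counted twice
sample_players = sample_df.drop_duplicates(subset=['SN'])

counts = sample_players['Gender'].value_counts()
shares = sample_players['Gender'].value_counts(normalize=True) * 100

# Both Series share the gender labels as their index, so they align by label
sample_gender_df = pd.DataFrame({
    'Total Count': counts,
    'Percentage of Players': shares,
})

print(sample_gender_df)
```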

In [None]:
# Identify the players who made purchases and include demographics
players_df = gaming_df[['SN', 'Gender', 'Age']]

# Remove duplicate players from players_df (based on the values of the 'SN' column)
players_df = players_df.drop_duplicates(subset=['SN'])

players_df

In [None]:
# Count the results for each gender
count_gender = players_df['Gender'].value_counts()

# Create a data frame to display count of males, females and others/non-disclosed
# value_counts() names its result 'Gender' before pandas 2.0 and 'count' from 2.0 on,
# so name the Series explicitly instead of renaming a column that may not exist
count_gender_df = count_gender.rename('Total Count').to_frame()

In [None]:
# Apply a format for any float for the section to come
pd.options.display.float_format = '{:,.2f}%'.format

In [None]:
# Calculate and format percentages to add to the data frame
count = len(players_df['SN'])

for_percentage_gender = count_gender / count * 100
count_gender_df['Percentage of Players'] = for_percentage_gender

count_gender_df


## Purchasing Analysis (Gender)

Create a data frame to display the purchase count, average purchase price, total purchase value, and average total purchase per person, by gender
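
The same four figures can be sketched in a single `groupby(...).agg(...)` call with named aggregation. The data and the output column names below are hypothetical. The per-person average divides by the number of distinct players in each group, not by the number of purchases.

```python
import pandas as pd

# Hypothetical purchases: a1 and c3 are repeat buyers
sample_df = pd.DataFrame({
    'SN': ['a1', 'a1', 'b2', 'c3', 'c3', 'c3'],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Male', 'Male'],
    'Price': [2.0, 4.0, 3.0, 1.0, 2.0, 3.0],
})

sample_by_gender = sample_df.groupby('Gender').agg(
    purchase_count=('Price', 'count'),
    average_price=('Price', 'mean'),
    total_value=('Price', 'sum'),
    players=('SN', 'nunique'),   # distinct players, not purchase rows
)

# Average total spend per person = total value / distinct players
sample_by_gender['avg_total_per_person'] = (
    sample_by_gender['total_value'] / sample_by_gender['players']
)

print(sample_by_gender)
```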

In [None]:
# Group all purchases by gender (this creates a GroupBy object, not a data frame)
gaming_by_gender_df = gaming_df.groupby(['Gender'])

# Use count() to preview the number of non-null values per column in each group
gaming_by_gender_df.count()

In [None]:
# Create data frame to display the gender analysis beginning with the gender distribution for purchase count
# Name the Series explicitly so the column header is 'Purchase Count' in any pandas version
count_gender_all_df = gaming_df['Gender'].value_counts().rename('Purchase Count').to_frame()

count_gender_all_df

In [None]:
# Create a variable to hold the purchase count value to use in calculations
count_by_gender = count_gender_all_df['Purchase Count'] 

# Calculate average purchase price and total purchase using gaming_by_gender_df 
average_purchase_price_by_gender = gaming_by_gender_df['Price'].mean()
total_purchase_by_gender = gaming_by_gender_df['Price'].sum()

# Calculate average total purchase per person by again considering only the number of unique individuals 
average_purchase_per_person = total_purchase_by_gender / count_gender

In [None]:
# Apply formatting for the values to be displayed in the data frame
pd.options.display.float_format = '${:,.2f}'.format

In [None]:
# Complete the data frame by adding the calculated values as additional columns
count_gender_all_df['Average Purchase Price'] = average_purchase_price_by_gender
count_gender_all_df['Total Purchase Value'] = total_purchase_by_gender
count_gender_all_df['Average Total Purchase per Person'] = average_purchase_per_person

# Note: An alternate way to format a single column is .map, for example
# count_gender_all_df['Average Purchase Price'].map('${:,.2f}'.format)

# Display the data frame
count_gender_all_df

## Age Demographics

Display an Age Demographics Table by using bins to calculate count and percentages for each age group
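
To see how the bin edges behave, here is a short sketch with hypothetical ages. With the default `right=True`, `pd.cut` treats each bin as open on the left and closed on the right, so the edges 9, 14, 19 and so on belong to the lower group.

```python
import pandas as pd

bins = [0, 9, 14, 19, 24, 29, 34, 39, 99]
labels = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']

# Hypothetical ages chosen to sit on and next to the bin edges
sample_ages = pd.Series([7, 9, 10, 19, 20, 24, 40, 45])

sample_ranges = pd.cut(sample_ages, bins, labels=labels)

print(sample_ranges.tolist())
# ['<10', '<10', '10-14', '15-19', '20-24', '20-24', '40+', '40+']
```

One consequence of these intervals is that an age of exactly 0 falls outside the first bin and becomes missing unless `include_lowest=True` is passed.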

In [None]:
# Update the formatting for percentages in the following blocks of code
pd.options.display.float_format = '{:.2f}%'.format

In [None]:
# Create age bins for the players to span from 0 to > max age
# max_age = players_df['Age'].max(), result is 45
# Keep one row per player, matching how players_df was de-duplicated
players = gaming_df[['SN', 'Age', 'Gender']].drop_duplicates(subset=['SN'])

bins = [0, 9, 14, 19, 24, 29, 34, 39, 99]
group_names = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']

# Bin the ages from the same frame that receives the new column
players['Age Range'] = pd.cut(players['Age'], bins, labels=group_names)

players

In [None]:
# With the bins created, group the players by age to evaluate count and percentage for each bin
age_groups = players.groupby('Age Range').count().Age

# Create a variable to hold the percentage calculated for each age group
# count is a variable already holding the total number of unique players
percentage_of_players = age_groups / count * 100

# Create a data frame to display the counts and percentages for the age ranges
age_groups_df = pd.DataFrame({
    'Total Count':age_groups, 
    'Percentage of Players':percentage_of_players
})

# View the age demographics table that includes counts and percentages for Age bins
age_groups_df

## Purchasing Analysis (Age)

Using bins for each age group, create and display a summary data frame that includes calculations for the purchase count, the average purchase price, the total purchase value and the average purchase total per person 

In [None]:
# Update the formatting for currency in this section
pd.options.display.float_format = '${:,.2f}'.format

# Data frame starting point, gaming_df
gaming_df

In [None]:
# For all purchases create age bins for the players to span from 0 to > max age 
# max_age = players_df['Age'].max(), result is 45
bins_purchases = [0, 9, 14, 19, 24, 29, 34, 39, 99]
group_names_purchases = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']

# Add a column to gaming_df that includes the Age Range, using the bins defined in this cell
gaming_df['Age Range'] = pd.cut(gaming_df['Age'], bins_purchases, labels=group_names_purchases)

# Display the data frame to see the additional column Age Range
gaming_df.head()

In [None]:
# Group all purchases by age range and select Price for the calculations below
age_groups_purchases = gaming_df.groupby('Age Range').Price

# Create a data frame including the info from age_groups_purchases
age_groups_purchases_df = pd.DataFrame({
    'Purchase Count': age_groups_purchases.count(),
    'Average Purchase Price': age_groups_purchases.mean(),
    'Total Purchase Value': age_groups_purchases.sum(),  
})

# Display the data frame
age_groups_purchases_df 

In [None]:
# Calculate the average purchase per person and add it as final column of the data frame
average_purchase_per_person = age_groups_purchases.sum() / age_groups
age_groups_purchases_df['Average Total Purchase per Person'] = average_purchase_per_person

# Display the data frame including the final column
age_groups_purchases_df

## Top Spenders

Using the data frame for all purchases, gaming_df, as the starting point, group by screen name to display a summary data frame that includes each player's purchase count, average purchase price, and total dollar amount of purchases (sorting in descending order by total purchase value)
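
A compact sketch of the same idea on hypothetical data: aggregate `Price` per screen name, then take the top rows by total. `nlargest` is used here as a shortcut for sorting in descending order and taking the head.

```python
import pandas as pd

# Hypothetical purchases by three players
sample_df = pd.DataFrame({
    'SN': ['a1', 'a1', 'b2', 'c3', 'c3', 'c3'],
    'Price': [2.0, 4.0, 7.5, 1.0, 2.0, 3.5],
})

# One row per player with count, mean and sum of Price
sample_spenders = sample_df.groupby('SN')['Price'].agg(['count', 'mean', 'sum'])
sample_spenders.columns = ['Purchase Count', 'Average Purchase Price', 'Total Purchase Value']

# Top two spenders by total purchase value
top_two = sample_spenders.nlargest(2, 'Total Purchase Value')

print(top_two)
```

In this sample the player with the most purchases (c3) is not the top spender, which is why the sort key matters.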

In [None]:
# Group spending by player
summarized_by_SN = gaming_df.groupby('SN').Price

# Create a data frame to hold spending by player and purchase data
summarized_by_SN_df = pd.DataFrame({
    'Purchase Count': summarized_by_SN.count(),
    'Average Purchase Price': summarized_by_SN.mean(),
    'Total Purchase Value': summarized_by_SN.sum(),
})

# Sort by Total Purchase Value, descending (ascending is the default order)
sorted_by_purchases_df = summarized_by_SN_df.sort_values('Total Purchase Value', ascending=False)

# Format from previous df is carrying forward as needed
# Display the first five values for the data frame
sorted_by_purchases_df.head()

## Most Popular Items

Display the item IDs, item names, and item prices in a data frame; calculate the purchase count, the average item price and total purchase value for each item (sort by purchase count in descending order to identify the most popular items) to also display in a data frame
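
Using the mean of `Price` as the item price assumes each item always sells at a single price. The sketch below, on hypothetical items, groups by both keys and adds a distinct-price count as a sanity check for that assumption. It also shows that the most popular item need not be the one with the highest total value.

```python
import pandas as pd

# Hypothetical items and prices, for illustration only
sample_df = pd.DataFrame({
    'Item ID': [1, 1, 2, 3, 3, 3],
    'Item Name': ['Sword', 'Sword', 'Shield', 'Bow', 'Bow', 'Bow'],
    'Price': [4.0, 4.0, 2.5, 1.5, 1.5, 1.5],
})

sample_items = sample_df.groupby(['Item ID', 'Item Name'])['Price'].agg(
    ['count', 'mean', 'sum', 'nunique']
)
sample_items.columns = ['Purchase Count', 'Item Price',
                        'Total Purchase Value', 'Distinct Prices']

# Items sold at more than one price; for these the mean is not a true list price
mixed_price_items = sample_items[sample_items['Distinct Prices'] > 1]

most_popular = sample_items.sort_values('Purchase Count', ascending=False)
highest_value = sample_items.sort_values('Total Purchase Value', ascending=False)

print(most_popular.head())
print(highest_value.head())
```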

In [None]:
# Create a data frame from gaming_df that includes Item ID, Item Name and Item Price 
gaming_simple_df = gaming_df[['Item ID', 'Item Name', 'Price']]

# Format from previous df is carrying forward as needed
# View the first and last rows of the data frame
gaming_simple_df

In [None]:
# Create a data frame after grouping by Item ID and Item Name
grouped_gaming_simple = gaming_simple_df.groupby(['Item ID', 'Item Name']).Price

grouped_gaming_simple_df = pd.DataFrame({
    'Purchase Count':grouped_gaming_simple.count(),
    'Item Price':grouped_gaming_simple.mean(),
    'Total Purchase Value':grouped_gaming_simple.sum()
})

# Display first and last rows of the data frame
grouped_gaming_simple_df

In [None]:
# Sort the data frame by Purchase Count, highest to lowest
sorted_by_count_df = grouped_gaming_simple_df.sort_values('Purchase Count', ascending=False)

# Display the first five rows (those items purchased most often)
sorted_by_count_df.head()

## Most Profitable Items

To identify the biggest money-makers, sort the table above by total purchase value in descending order and display the preview of the data frame

In [None]:
# Sort the data frame above by total purchase value, to see which items bring in the most money
sorted_by_value_df = grouped_gaming_simple_df.sort_values('Total Purchase Value', ascending=False)

# Display the five items that made the most money
sorted_by_value_df.head()

# Observations about gaming data for Heroes of Pymoli

##### Following analysis of purchase data for Heroes of Pymoli, these observations can be made:

- Nearly 85% of the players are male.
- About 85% of all players are 15-34 years of age.
- Players aged 20-24 make up about 45% of all players.
- Purchases by players aged 20-24 total nearly as much as those of all other age groups combined.
- The items Final Critic, Oathbreaker, and Fiery Glass Crusader are both popular and profitable.