### Heroes Of Pymoli Data Analysis
* Of the 1163 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [None]:
# import modules pandas and os
import pandas as pd
import os

# file path to open purchase data
file_purchase_data = os.path.join("Resources", "purchase_data.csv")

# Read purchase data file and store in dataframe
purchase_data_df = pd.read_csv(file_purchase_data)
#preview purchase data DataFrame
purchase_data_df

## Player Count

* Display the total number of players


In [None]:
#count the number of unique players
player_count = len(purchase_data_df["SN"].unique())
#add the total player count to a dataframe
total_players_df = pd.DataFrame({"Total Players": [player_count]})
#display the total number of players and align-right the data
total_players_df.style.set_properties(**{'text-align': 'right'})
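
As a side note on the count above, `Series.nunique()` gives the same result as `len(...unique())` when no screen names are missing, but the two differ once missing values appear. The sketch below uses a small made-up table (the player names are hypothetical, not taken from `purchase_data.csv`) to show the difference.

```python
import pandas as pd

# Hypothetical mini purchase log; "SN" is the player screen name, as in the notebook.
purchases = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", None],
    "Price": [3.50, 1.50, 4.00, 2.00],
})

# nunique() counts distinct values and skips missing ones by default.
player_count = purchases["SN"].nunique()

# len(unique()) counts the missing value as its own entry.
count_with_missing = len(purchases["SN"].unique())

print(player_count, count_with_missing)
```

If the real data has no missing screen names, either form should report the same total.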


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.
* Create a summary data frame to hold the results
* Optional: give the displayed data cleaner formatting
* Display the summary data frame


In [None]:
#collect purchasing data: unique item count, average purchase price, total number of purchases, total revenue from purchases
item_count = len(purchase_data_df["Item Name"].unique())
average_purchase_price = round(purchase_data_df["Price"].mean(),2)
number_of_purchases = len(purchase_data_df["Purchase ID"])
total_revenue = purchase_data_df["Price"].sum()
#add purchasing data to a dataframe
purchasing_analysis_df = pd.DataFrame({"Number of Unique Items": [item_count],
                                       "Average Price": [average_purchase_price],
                                       "Number of Purchases": [number_of_purchases],
                                       "Total Revenue": [total_revenue]
                                      })
#format currency data in purchasing analysis dataframe
purchasing_analysis_df["Average Price"] = purchasing_analysis_df["Average Price"].map("${:,.2f}".format)
purchasing_analysis_df["Total Revenue"] = purchasing_analysis_df["Total Revenue"].map("${:,.2f}".format)

#display the summary of the purchasing analysis and align-right
purchasing_analysis_df.style.set_properties(**{'text-align': 'right'})
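
The currency formatting used above turns numbers into text, which matters in later sections where tables are sorted. The following sketch, built on made-up totals rather than the real purchase data, shows why sorting is best done before formatting.

```python
import pandas as pd

# Hypothetical totals, not taken from purchase_data.csv.
summary = pd.DataFrame({"Total": [9.5, 120.0, 35.25]}, index=["a", "b", "c"])

# Formatting with map() turns the floats into strings.
formatted = summary["Total"].map("${:,.2f}".format)

# Sorting the strings compares characters, so "$9.50" lands above "$120.00".
sorted_as_text = formatted.sort_values(ascending=False).tolist()

# Sorting the numbers first and formatting afterwards keeps the true order.
sorted_then_formatted = (
    summary["Total"].sort_values(ascending=False).map("${:,.2f}".format).tolist()
)

print(sorted_as_text)
print(sorted_then_formatted)
```

This is why the later cells sort each table first and apply the `"${:,.2f}"` format as the final step.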


## Gender Demographics

* Percentage and Count of Male Players
* Percentage and Count of Female Players
* Percentage and Count of Other / Non-Disclosed


In [None]:
#add age bins to purchase data for future use in Age Demographics analysis
#the last bin is open-ended so every player aged 40 and up lands in "40+"
age_bin = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
age_edges = [0, 10, 15, 20, 25, 30, 35, 40, float("inf")]
age_list = purchase_data_df["Age"]
age_group = pd.cut(age_list, age_edges, right=False, labels=age_bin)
#create a new purchase dataframe (a copy, so purchase_data_df is left unchanged) with age group added
purchase_data_with_age_group_df = purchase_data_df.copy()
purchase_data_with_age_group_df["Age Group"] = age_group

#pull player data and create a dataframe with a unique list of players
player_data_with_age_group_df = purchase_data_with_age_group_df.loc[:,["SN", "Age", "Gender", "Age Group"]]
player_data_with_age_group_df = player_data_with_age_group_df.sort_values("SN").drop_duplicates(keep = 'first')
total_players = len(player_data_with_age_group_df)
#create a gender dataframe using "Gender" as the index to collect gender demographic data
gender_df = player_data_with_age_group_df.set_index("Gender")
#create lists of players for each gender
female_players = gender_df.loc["Female", "SN"]
male_players = gender_df.loc["Male", "SN"]
other_players = gender_df.loc["Other / Non-Disclosed", "SN"]
#create a demographics dataframe for gender data
demograph_gend_df = pd.DataFrame({"Gender": ["Male", "Female", "Other / Non-Disclosed"],   
                             "Total Count": [len(male_players), len(female_players), len(other_players)],
                             "Percentage of Players": [len(male_players) / total_players, len(female_players) / total_players, len(other_players) / total_players]
                            })

#set the index to "Gender"
demograph_gend_df = demograph_gend_df.set_index("Gender")

#format percentage data in demographics by gender dataframe
demograph_gend_df["Percentage of Players"] = demograph_gend_df["Percentage of Players"].map("{:,.2%}".format)

#display the Demographics based on gender and right-align data
demograph_gend_df.style.set_properties(**{'text-align': 'right'})
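
The gender counts and percentages can also be obtained without looking up each gender label by hand. Here is a minimal sketch using `value_counts()` on a de-duplicated player list; the table and its player names are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical purchase rows; one player can appear several times.
purchases = pd.DataFrame({
    "SN": ["P1", "P1", "P2", "P3", "P4", "P4"],
    "Gender": ["Male", "Male", "Female", "Male",
               "Other / Non-Disclosed", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are not counted twice.
players = purchases.drop_duplicates(subset="SN")

# Absolute counts and shares per gender; both are indexed by the gender label.
counts = players["Gender"].value_counts()
shares = players["Gender"].value_counts(normalize=True)

demographics = pd.DataFrame({"Total Count": counts, "Percentage of Players": shares})
print(demographics)
```

The percentage column stays numeric here, so the `"{:,.2%}"` format can still be applied as a last step.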



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender
* Create a summary data frame to hold the results
* Optional: give the displayed data cleaner formatting
* Display the summary data frame

In [None]:
#create new purchasing dataframes for each gender
purch_by_male_df = purchase_data_with_age_group_df.loc[purchase_data_with_age_group_df["Gender"] == "Male",:]
purch_by_female_df = purchase_data_with_age_group_df.loc[purchase_data_with_age_group_df["Gender"] == "Female",:]
purch_by_other_df = purchase_data_with_age_group_df.loc[purchase_data_with_age_group_df["Gender"] == "Other / Non-Disclosed",:]
#group purchases for each player within each gender dataframe
group_by_male_df = purch_by_male_df.groupby(["SN"])
group_by_female_df = purch_by_female_df.groupby(["SN"])
group_by_other_df = purch_by_other_df.groupby(["SN"])
#record the sum of each player's total purchases
total_by_male = group_by_male_df["Price"].sum()
total_by_female = group_by_female_df["Price"].sum()
total_by_other = group_by_other_df["Price"].sum()
#create a dataframe to store the purchasing analysis by gender data
purch_analysis_gend_df = pd.DataFrame({"Gender": ["Male", "Female", "Other / Non-Disclosed"],
                                       "Purchase Count": [len(purch_by_male_df), len(purch_by_female_df), len(purch_by_other_df)],
                                       "Average Purchase Price": [round(purch_by_male_df["Price"].sum() / len(purch_by_male_df),2), round(purch_by_female_df["Price"].sum() / len(purch_by_female_df),2),
                                                                  round(purch_by_other_df["Price"].sum() / len(purch_by_other_df),2)], 
                                       "Total Purchase Value": [purch_by_male_df["Price"].sum(), purch_by_female_df["Price"].sum(), purch_by_other_df["Price"].sum()],
                                       "Avg Total Purchase per Person": [round(total_by_male.sum() / len(total_by_male),2), round(total_by_female.sum() / len(total_by_female),2),
                                                                         round(total_by_other.sum() / len(total_by_other),2)]
                                      })

#set the index to "Gender"
purch_analysis_gend_df = purch_analysis_gend_df.set_index("Gender")

#format currency data in purchasing analysis by gender dataframe
purch_analysis_gend_df["Average Purchase Price"] = purch_analysis_gend_df["Average Purchase Price"].map("${:,.2f}".format)
purch_analysis_gend_df["Total Purchase Value"] = purch_analysis_gend_df["Total Purchase Value"].map("${:,.2f}".format)
purch_analysis_gend_df["Avg Total Purchase per Person"] = purch_analysis_gend_df["Avg Total Purchase per Person"].map("${:,.2f}".format)


#display the purchasing analysis by gender and align-right
purch_analysis_gend_df.style.set_properties(**{'text-align': 'right'})
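
The same four measures can be computed with a single `groupby("Gender")` instead of one filtered DataFrame per gender. The sketch below assumes a small made-up purchase table; the per-person average is taken as total spend divided by the number of distinct players in each group.

```python
import pandas as pd

# Hypothetical purchases, not taken from purchase_data.csv.
purchases = pd.DataFrame({
    "SN": ["P1", "P1", "P2", "P3", "P4"],
    "Gender": ["Male", "Male", "Female", "Male", "Female"],
    "Price": [1.00, 3.00, 2.00, 4.00, 4.00],
})

by_gender = purchases.groupby("Gender")

summary = pd.DataFrame({
    "Purchase Count": by_gender["Price"].count(),
    "Average Purchase Price": by_gender["Price"].mean(),
    "Total Purchase Value": by_gender["Price"].sum(),
})

# Per-person average: total spend divided by the number of distinct players.
summary["Avg Total Purchase per Person"] = (
    summary["Total Purchase Value"] / by_gender["SN"].nunique()
)
print(summary)
```

With this shape, adding another gender label to the data needs no code changes, since the groups come from the column itself.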


## Age Demographics

* Establish bins for ages
* Categorize the existing players using the age bins. Hint: use pd.cut()
* Calculate the numbers and percentages by age group
* Create a summary data frame to hold the results
* Optional: round the percentage column to two decimal points
* Display Age Demographics Table


In [None]:
#create an age dataframe using "Age Group" as the index to collect age demographic data
age_df = player_data_with_age_group_df.set_index("Age Group")
#collect a list of players for each age group
under10 = age_df.loc["<10","SN"]
tento14 = age_df.loc["10-14","SN"]
fifteento19 = age_df.loc["15-19","SN"]
twentyto24 = age_df.loc["20-24","SN"]
twenty5to29 = age_df.loc["25-29","SN"]
thirtyto34 = age_df.loc["30-34","SN"]
thirty5to39 = age_df.loc["35-39","SN"]
fourtyplus = age_df.loc["40+","SN"]
#create a demographics dataframe for age data
demograph_age_df = pd.DataFrame({"Age Group": ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"],
                                "Total Count": [len(under10), len(tento14), len(fifteento19), len(twentyto24), len(twenty5to29), len(thirtyto34), len(thirty5to39), len(fourtyplus)],
                                 "Percentage of Players": [len(under10)/total_players, len(tento14)/total_players, len(fifteento19)/total_players, len(twentyto24)/total_players, 
                                                           len(twenty5to29)/total_players, len(thirtyto34)/total_players, len(thirty5to39)/total_players, len(fourtyplus)/total_players]
                                })

#set the index to "Age Group"
demograph_age_df = demograph_age_df.set_index("Age Group")

#format percentage data in demographics by age dataframe
demograph_age_df["Percentage of Players"] = demograph_age_df["Percentage of Players"].map("{:,.2%}".format)

#display the Demographics based on age and align-right
demograph_age_df.style.set_properties(**{'text-align': 'right'})
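
To make the binning step concrete, here is a small worked example of `pd.cut()` with the same age labels. The ages and the open-ended final edge are assumptions chosen for illustration; they show how boundary ages such as 10 and 40 are assigned when `right=False`.

```python
import pandas as pd

# Hypothetical ages chosen to hit the bin boundaries.
ages = pd.Series([7, 10, 14, 24, 25, 39, 40, 45, 52])

edges = [0, 10, 15, 20, 25, 30, 35, 40, float("inf")]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# right=False makes each bin include its left edge and exclude its right edge,
# so age 10 falls in "10-14" and age 40 falls in "40+".
groups = pd.cut(ages, bins=edges, right=False, labels=labels)

# sort=False keeps the age groups in label order instead of ordering by count.
counts = groups.value_counts(sort=False)
print(counts)
```

Age groups with no players still appear in `counts` with a value of 0, because the result is categorical.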

## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age
* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below
* Create a summary data frame to hold the results
* Optional: give the displayed data cleaner formatting
* Display the summary data frame

In [None]:
#create new purchasing dataframes for each age group
purch_by_under10 = purchase_data_with_age_group_df.loc[purchase_data_with_age_group_df["Age Group"] == "<10",:]
purch_by_tento14 = purchase_data_with_age_group_df.loc[purchase_data_with_age_group_df["Age Group"] == "10-14",:]
purch_by_fifteento19 = purchase_data_with_age_group_df.loc[purchase_data_with_age_group_df["Age Group"] == "15-19",:]
purch_by_twentyto24 = purchase_data_with_age_group_df.loc[purchase_data_with_age_group_df["Age Group"] == "20-24",:]
purch_by_twenty5to29 = purchase_data_with_age_group_df.loc[purchase_data_with_age_group_df["Age Group"] == "25-29",:]
purch_by_thirtyto34 = purchase_data_with_age_group_df.loc[purchase_data_with_age_group_df["Age Group"] == "30-34",:]
purch_by_thirty5to39 = purchase_data_with_age_group_df.loc[purchase_data_with_age_group_df["Age Group"] == "35-39",:]
purch_by_fourtyplus = purchase_data_with_age_group_df.loc[purchase_data_with_age_group_df["Age Group"] == "40+",:]

#group purchases for each player within each age group dataframe
group_by_under10 = purch_by_under10.groupby(["SN"])
group_by_tento14 = purch_by_tento14.groupby(["SN"])
group_by_fifteento19 = purch_by_fifteento19.groupby(["SN"])
group_by_twentyto24 = purch_by_twentyto24.groupby(["SN"])
group_by_twenty5to29 = purch_by_twenty5to29.groupby(["SN"])
group_by_thirtyto34 = purch_by_thirtyto34.groupby(["SN"])
group_by_thirty5to39 = purch_by_thirty5to39.groupby(["SN"])
group_by_fourtyplus = purch_by_fourtyplus.groupby(["SN"])

#record the sum of each player's total purchases
total_by_under10 = group_by_under10["Price"].sum()
total_by_tento14 = group_by_tento14["Price"].sum()
total_by_fifteento19 = group_by_fifteento19["Price"].sum()
total_by_twentyto24 = group_by_twentyto24["Price"].sum()
total_by_twenty5to29 = group_by_twenty5to29["Price"].sum()
total_by_thirtyto34 = group_by_thirtyto34["Price"].sum()
total_by_thirty5to39 = group_by_thirty5to39["Price"].sum()
total_by_fourtyplus = group_by_fourtyplus["Price"].sum()

#create a dataframe to store the purchasing analysis by age group data
purch_analysis_age_df = pd.DataFrame({"Age Group": ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"],
                                       "Purchase Count": [len(purch_by_under10), len(purch_by_tento14), len(purch_by_fifteento19), len(purch_by_twentyto24),
                                                        len(purch_by_twenty5to29),len(purch_by_thirtyto34),len(purch_by_thirty5to39),len(purch_by_fourtyplus)],
                                       "Average Purchase Price": [round(purch_by_under10["Price"].sum()/len(purch_by_under10),2),
                                                                  round(purch_by_tento14["Price"].sum()/len(purch_by_tento14),2),
                                                                  round(purch_by_fifteento19["Price"].sum()/len(purch_by_fifteento19),2), 
                                                                  round(purch_by_twentyto24["Price"].sum()/len(purch_by_twentyto24),2), 
                                                                  round(purch_by_twenty5to29["Price"].sum()/len(purch_by_twenty5to29),2), 
                                                                  round(purch_by_thirtyto34["Price"].sum()/len(purch_by_thirtyto34),2), 
                                                                  round(purch_by_thirty5to39["Price"].sum()/len(purch_by_thirty5to39),2), 
                                                                  round(purch_by_fourtyplus["Price"].sum()/len(purch_by_fourtyplus),2)],
                                       "Total Purchase Value": [purch_by_under10["Price"].sum(), purch_by_tento14["Price"].sum(), purch_by_fifteento19["Price"].sum(), 
                                                                purch_by_twentyto24["Price"].sum(), purch_by_twenty5to29["Price"].sum(), purch_by_thirtyto34["Price"].sum(),
                                                                purch_by_thirty5to39["Price"].sum(), purch_by_fourtyplus["Price"].sum()],
                                       "Avg Total Purchase per Person": [round(total_by_under10.sum()/len(total_by_under10),2),
                                                                         round(total_by_tento14.sum()/len(total_by_tento14),2),
                                                                         round(total_by_fifteento19.sum()/len(total_by_fifteento19),2), 
                                                                         round(total_by_twentyto24.sum()/len(total_by_twentyto24),2), 
                                                                         round(total_by_twenty5to29.sum()/len(total_by_twenty5to29),2), 
                                                                         round(total_by_thirtyto34.sum()/len(total_by_thirtyto34),2), 
                                                                         round(total_by_thirty5to39.sum()/len(total_by_thirty5to39),2), 
                                                                         round(total_by_fourtyplus.sum()/len(total_by_fourtyplus),2)]
                                     })

#set the index to "Age Group"
purch_analysis_age_df = purch_analysis_age_df.set_index("Age Group")

#format currency data in purchase analysis by age dataframe
purch_analysis_age_df["Average Purchase Price"] = purch_analysis_age_df["Average Purchase Price"].map("${:,.2f}".format)
purch_analysis_age_df["Total Purchase Value"] = purch_analysis_age_df["Total Purchase Value"].map("${:,.2f}".format)
purch_analysis_age_df["Avg Total Purchase per Person"] = purch_analysis_age_df["Avg Total Purchase per Person"].map("${:,.2f}".format)

#display the purchasing analysis by age group and align-right
purch_analysis_age_df.style.set_properties(**{'text-align': 'right'})
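
The eight per-age-group frames above could be collapsed into one `groupby` on the binned column. The following is only a sketch on made-up data with a reduced, hypothetical set of bins; `observed=False` is passed so that age groups without any purchases still show up in the summary.

```python
import pandas as pd

# Hypothetical purchases; nobody here is younger than 20.
purchases = pd.DataFrame({
    "SN": ["P1", "P1", "P2", "P3"],
    "Age": [22, 22, 23, 41],
    "Price": [2.00, 4.00, 3.00, 5.00],
})
purchases["Age Group"] = pd.cut(
    purchases["Age"],
    bins=[0, 20, 25, 40, float("inf")],
    right=False,
    labels=["<20", "20-24", "25-39", "40+"],
)

# observed=False keeps age groups that have no purchases in the result.
grouped = purchases.groupby("Age Group", observed=False)

summary = grouped["Price"].agg(["count", "mean", "sum"])
summary.columns = ["Purchase Count", "Average Purchase Price", "Total Purchase Value"]

# Per-person average: total spend divided by distinct players in the group.
summary["Avg Total Purchase per Person"] = (
    summary["Total Purchase Value"] / grouped["SN"].nunique()
)
print(summary)
```

Empty groups report a purchase count of 0 and missing averages, which makes gaps in the data visible rather than hiding them.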


## Top Spenders

* Run basic calculations to obtain the results in the table below
* Create a summary data frame to hold the results
* Sort the total purchase value column in descending order
* Optional: give the displayed data cleaner formatting
* Display a preview of the summary data frame


In [None]:
#group the price of every purchase by player
grouped_price_by_player = purchase_data_with_age_group_df.groupby("SN")["Price"]

#calculate total count of purchases, average purchase price, and total value of purchases for each player
#building all three columns from the same groupby keeps every value aligned with its player ("SN" becomes the index)
top_spender_df = pd.DataFrame({"Purchase Count": grouped_price_by_player.count(),
                               "Average Purchase Price": grouped_price_by_player.mean().round(2),
                               "Total Purchase Value": grouped_price_by_player.sum()
                              })

#sort top spenders from highest Total purchase value to lowest
top_spender_df = top_spender_df.sort_values("Total Purchase Value", ascending = False)

#format currency data in top spenders dataframe
top_spender_df["Average Purchase Price"] = top_spender_df["Average Purchase Price"].map("${:,.2f}".format)
top_spender_df["Total Purchase Value"] = top_spender_df["Total Purchase Value"].map("${:,.2f}".format)

#display the top 5 spenders and align-right
top_spender_df.head().style.set_properties(**{'text-align': 'right'})


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns
* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value
* Create a summary data frame to hold the results
* Sort the purchase count column in descending order
* Optional: give the displayed data cleaner formatting
* Display a preview of the summary data frame


In [None]:
#create a dataframe with all item purchases
purchases_by_item_df = purchase_data_with_age_group_df[["Purchase ID", "Item ID", "Item Name", "Price"]]

#group item purchases by "Item Name" and calculate purchase count, average price, and total purchase value for each item
#taking every column from one groupby keeps them aligned; if a name is listed under several IDs, the first ID is shown
grouped_by_item = purchases_by_item_df.groupby("Item Name")
popular_items_df = pd.DataFrame({"Item ID": grouped_by_item["Item ID"].first(),
                                 "Purchase Count": grouped_by_item["Price"].count(),
                                 "Average Item Price": grouped_by_item["Price"].mean().round(2),
                                 "Total Purchase Value": grouped_by_item["Price"].sum()
                                }).reset_index()

#order the columns as Item ID, Item Name, then the calculated values
popular_items_df = popular_items_df[["Item ID", "Item Name", "Purchase Count", "Average Item Price", "Total Purchase Value"]]

#copy Popular items dataframe for use in Profitable items analysis
profitable_items_df = popular_items_df.copy()

#set index to "Item ID"
popular_items_df = popular_items_df.set_index("Item ID")

#sort popular items from highest Purchase count to lowest
popular_items_df = popular_items_df.sort_values("Purchase Count", ascending = False)

#format currency data in popular items dataframe
popular_items_df["Average Item Price"] = popular_items_df["Average Item Price"].map("${:,.2f}".format)
popular_items_df["Total Purchase Value"] = popular_items_df["Total Purchase Value"].map("${:,.2f}".format)

#display the top 10 most popular items
popular_items_df.head(10).style.set_properties(**{'text-align': 'right'})
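
Grouping by "Item Name" alone merges purchases whenever one name is listed under more than one "Item ID". Whether that happens in the real data is not confirmed here; the sketch below uses made-up items to show how to check for it and how the counts change when grouping by both columns.

```python
import pandas as pd

# Hypothetical items: "Iron Blade" is listed under two different IDs.
purchases = pd.DataFrame({
    "Item ID": [1, 2, 2, 3],
    "Item Name": ["Iron Blade", "Iron Blade", "Iron Blade", "Oak Staff"],
    "Price": [4.00, 3.00, 3.00, 2.00],
})

# Count how many distinct IDs each name maps to and keep names with more than one.
ids_per_name = purchases.groupby("Item Name")["Item ID"].nunique()
shared_names = ids_per_name[ids_per_name > 1].index.tolist()

# Purchase counts when grouping by name only versus by ID and name together.
by_name = purchases.groupby("Item Name")["Price"].count()
by_id_and_name = purchases.groupby(["Item ID", "Item Name"])["Price"].count()

print(shared_names)
print(by_name)
print(by_id_and_name)
```

If `shared_names` comes back empty on the real data, both groupings give the same table; otherwise the choice of grouping changes which item ranks first.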

## Most Profitable Items

* Sort the above table by total purchase value in descending order
* Optional: give the displayed data cleaner formatting
* Display a preview of the data frame


In [None]:
#profitable_items_df was created in the previous cell from the unformatted popular items data

#set index to "Item ID"
profitable_items_df = profitable_items_df.set_index("Item ID")

#sort profitable items from highest Total Purchase value to lowest
profitable_items_df = profitable_items_df.sort_values("Total Purchase Value", ascending = False)

#format currency data in profitable items dataframe
profitable_items_df["Average Item Price"] = profitable_items_df["Average Item Price"].map("${:,.2f}".format)
profitable_items_df["Total Purchase Value"] = profitable_items_df["Total Purchase Value"].map("${:,.2f}".format)

#display the top 10 most profitable items
profitable_items_df.head(10).style.set_properties(**{'text-align': 'right'})
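
This step depends on having an unformatted version of the items table to sort. As a general caution, plain assignment of a DataFrame creates a second name for the same object, while `.copy()` creates an independent one. The sketch below illustrates the difference with made-up values.

```python
import pandas as pd

# Hypothetical values, not taken from purchase_data.csv.
original = pd.DataFrame({"Total Purchase Value": [10.0, 2.5]})

alias = original                # second name for the same object
independent = original.copy()   # separate object with its own data

# Formatting through the alias also changes `original`,
# because both names point to one DataFrame.
alias["Total Purchase Value"] = alias["Total Purchase Value"].map("${:,.2f}".format)

same_object = alias is original
original_changed = original["Total Purchase Value"].tolist()
copy_unchanged = independent["Total Purchase Value"].tolist()

print(same_object, original_changed, copy_unchanged)
```

Methods such as `set_index()` and `sort_values()` return new DataFrames by default, so an alias only causes trouble when columns are modified in place.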

In [None]:
#print trends based on final analysis
print("""
Based on the final analysis, three purchasing trends stand out for the game Heroes of Pymoli:


 - Female and other/non-disclosed players make slightly more purchases per player than male players.
   For those purchases, the average purchase price is also higher than for purchases made by male players.
      - Female players made an average of 1.40 purchases per player with an average purchase price of $3.20 per purchase.
      - Other/non-disclosed players made an average of 1.36 purchases per player with an average purchase price of $3.35 per purchase.
      - Male players made an average of 1.35 purchases per player with an average purchase price of $3.02 per purchase.

 - The largest share of players falls within the age group 20-24 (45% of players), and this group is the most likely to purchase an item in the game.
   The average purchase price for players aged 20-24 matches the average purchase price for all players across the entire game.
      - Players within the age group 20-24 made an average of 1.41 purchases per player with an average purchase price of $3.05 per purchase.
      - The age group with the highest average purchase price is players aged 35-39, at $3.60 per purchase.

 - The most popular and the most profitable item is the Final Critic. It was purchased more often than any other item and produced more revenue than any other item.
      - The Final Critic was among the highest-priced items. Purchase prices for items range between $1.00 and $4.99.
      - The purchase price for the Final Critic averaged $4.61 per purchase, so the higher price was not a deterrent.

""")
