#### HeroesOfPymoli
* The HeroesOfPymoli project analyzes purchasing data for the fantasy game Heroes of Pymoli and generates a report that breaks the data down into meaningful insights. We also identify at least three trends based on the final report.
* Dataset : purchase_data.csv
* Columns : 
    | S.No. | Name | Description |
    | --- | --- | --- |
    |1. |Purchase ID| Unique ID of each purchase made by a player.|
    |2. |SN| Screen name (in-game name) of each player.|
    |3. |Age| Age of the player.|
    |4. |Gender| Gender of the player.|
    |5. |Item ID| ID of the item purchased by the player.|
    |6. |Item Name| Name of the item.|
    |7. |Price| Price paid for the item.|

* The file purchase_data.csv is loaded into a DataFrame named purchase_data.



In [1]:
# Dependencies and Setup
import pandas as pd

In [2]:
# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)
purchase_data.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


### Exploring the Dataset
* The info() output lists the data type of each column and the number of rows.
* The dataset has 780 rows and no null values.

In [3]:
purchase_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 780 entries, 0 to 779
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Purchase ID  780 non-null    int64  
 1   SN           780 non-null    object 
 2   Age          780 non-null    int64  
 3   Gender       780 non-null    object 
 4   Item ID      780 non-null    int64  
 5   Item Name    780 non-null    object 
 6   Price        780 non-null    float64
dtypes: float64(1), int64(3), object(3)
memory usage: 33.6+ KB
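The "no null values" claim can also be checked explicitly rather than read off the info() output. The sketch below uses a small made-up DataFrame (the `sample` rows are illustrative, not taken from purchase_data.csv) and counts nulls per column with `isnull().sum()`.

```python
import pandas as pd

# Hypothetical miniature stand-in for purchase_data (made-up rows)
sample = pd.DataFrame({
    "SN": ["PlayerA", "PlayerB", "PlayerA"],
    "Age": [20, 40, 20],
    "Price": [3.53, 1.56, 4.88],
})

n_rows, n_cols = sample.shape          # number of rows and columns
null_counts = sample.isnull().sum()    # null count for each column
has_nulls = bool(null_counts.any())    # True if any column contains a null

print(n_rows, n_cols, has_nulls)
```

Running the same three lines against purchase_data should report 780 rows and no nulls if the info() output above is accurate.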


## Player Count

* The SN column contains duplicate names because a player can make more than one purchase, so we count unique names to get the total number of players who made a purchase.
* A total of 576 players purchased items in the game.


In [4]:
# Count each player once, even if they made several purchases
total_players = purchase_data["SN"].nunique()
Total_players = pd.DataFrame({"Total Player": [total_players]})
Total_players


Unnamed: 0,Total Player
0,576
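The difference between counting purchases and counting players is easy to see on a tiny example. This is a minimal sketch with made-up screen names; `nunique()` counts each distinct name once, while `len()` counts every row.

```python
import pandas as pd

# Made-up rows: PlayerA appears twice because they made two purchases
sample = pd.DataFrame({"SN": ["PlayerA", "PlayerB", "PlayerA", "PlayerC"]})

n_purchases = len(sample)            # one row per purchase
n_players = sample["SN"].nunique()   # one count per distinct player

print(n_purchases, n_players)
```

In the real dataset the same two numbers are 780 purchases and 576 players.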


## Purchasing Analysis (Total)

* The game earned a total revenue of 2,379.77 dollars from purchases of 179 unique items.
* The total number of purchases made is 780.
* The average purchase price is $3.05.

In [5]:
Number_of_unique_items = len(purchase_data["Item Name"].unique())
Average_purchase_price = round(purchase_data["Price"].mean(),2)
Total_number_of_purchases = purchase_data["Purchase ID"].count()
Total_revenue = purchase_data["Price"].sum()

summary_data_frame = pd.DataFrame({
    "Number of Unique Items":[Number_of_unique_items] , 
    "Average Price": f'${Average_purchase_price:.2f}', 
    "Number of Purchases" : Total_number_of_purchases, 
    "Total Revenue":f'${Total_revenue:.2f}' 
})
summary_data_frame

Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,179,$3.05,780,$2379.77
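The three summary figures are linked by simple arithmetic: the average purchase price is the total revenue divided by the number of purchases. The sketch below re-derives the average from the two totals reported in the table above; the variable names are illustrative.

```python
# Totals as reported in the summary table above
total_revenue = 2379.77
number_of_purchases = 780

# Average purchase price = total revenue / number of purchases
average_price = total_revenue / number_of_purchases

print(f"${average_price:.2f}")    # two decimal places
print(f"${total_revenue:,.2f}")   # thousands separator for readability
```

Formatting with an explicit `.2f` keeps the displayed value at two decimals even when floating-point addition leaves extra digits in the sum.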


## Gender Demographics

* Of the 576 players, 84.03% are male and 14.06% are female.
* The remaining 1.91% of players fall under Other / Non-Disclosed.




In [6]:
# Keep one row per player so each player is counted once
unique_purchase_data = purchase_data.drop_duplicates("SN")
gender_counts = unique_purchase_data["Gender"].value_counts()

# Build the table from the counts directly; this avoids relying on the column
# name that value_counts() produces, which differs between pandas versions
gender_demographic = pd.DataFrame({
    "Total Count": gender_counts,
    "Percentage": gender_counts / total_players * 100
})

gender_demographic["Percentage"] = gender_demographic["Percentage"].map("{:.2f}%".format) 
gender_demographic


Unnamed: 0,Total Count,Percentage
Male,484,84.03%
Female,81,14.06%
Other / Non-Disclosed,11,1.91%
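As an alternative to dividing by the player total by hand, `value_counts(normalize=True)` returns the shares directly. This is a minimal sketch on made-up rows (the `sample` data and names are illustrative), not the notebook's own cell.

```python
import pandas as pd

# Made-up purchases; PlayerA bought twice and must be counted once
sample = pd.DataFrame({
    "SN": ["PlayerA", "PlayerB", "PlayerA", "PlayerC", "PlayerD"],
    "Gender": ["Male", "Female", "Male", "Male", "Other / Non-Disclosed"],
})

players = sample.drop_duplicates("SN")                         # one row per player
counts = players["Gender"].value_counts()                      # players per gender
shares = players["Gender"].value_counts(normalize=True) * 100  # share in percent

summary = pd.DataFrame({
    "Total Count": counts,
    "Percentage": shares.map("{:.2f}%".format),
})
print(summary)
```

Both Series share the same index (the gender labels), so pandas aligns them row by row when the table is built.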



## Purchasing Analysis (Gender)

* Grouped by gender, male players made the most purchases: 652, with a total purchase value of 1,967.64 dollars.
* Female players made 113 purchases, with a total purchase value of 361.94 dollars.
* The average total purchase per person is similar across genders, ranging from 4.07 dollars for male players to 4.56 dollars for Other / Non-Disclosed players.

In [7]:
gender_group = purchase_data.groupby("Gender")
purchase_count = pd.Series(gender_group["Purchase ID"].count())
average_purchase_price = pd.Series(gender_group["Price"].mean())
Purchase_total_per_gender = pd.Series(gender_group["Price"].sum())
Average_purchase_total_per_person = pd.Series(gender_group["Price"].sum()/gender_demographic["Total Count"])


Purchasing_analysis_Gender = pd.DataFrame({
    "Purchase Count" : purchase_count ,
    "Average Purchase Price":average_purchase_price ,
    "Total Purchase Value" : Purchase_total_per_gender,
    "Avg Total Purchase per Person" : Average_purchase_total_per_person
})

Purchasing_analysis_Gender["Average Purchase Price"] = Purchasing_analysis_Gender["Average Purchase Price"].map("${:.2f}".format) 
Purchasing_analysis_Gender["Total Purchase Value"] = Purchasing_analysis_Gender["Total Purchase Value"].map("${:,.2f}".format) 
Purchasing_analysis_Gender["Avg Total Purchase per Person"] = Purchasing_analysis_Gender["Avg Total Purchase per Person"].map("${:.2f}".format) 

Purchasing_analysis_Gender

Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Female,113,$3.20,$361.94,$4.47
Male,652,$3.02,"$1,967.64",$4.07
Other / Non-Disclosed,15,$3.35,$50.19,$4.56
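The per-gender table can also be produced in one pass with named aggregation, counting unique players inside the same groupby instead of reusing a table from an earlier cell. This is a sketch on made-up rows; the column names on the left of each `=` are illustrative choices.

```python
import pandas as pd

# Made-up purchases: PlayerA (Male) bought twice
sample = pd.DataFrame({
    "SN": ["PlayerA", "PlayerB", "PlayerA", "PlayerC"],
    "Gender": ["Male", "Female", "Male", "Male"],
    "Price": [2.00, 3.00, 4.00, 1.00],
})

by_gender = sample.groupby("Gender").agg(
    purchase_count=("Price", "count"),   # number of purchases
    average_price=("Price", "mean"),     # mean price per purchase
    total_value=("Price", "sum"),        # total spent
    players=("SN", "nunique"),           # distinct players in the group
)

# Average total spent per player = total value / number of distinct players
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]
print(by_gender)
```

Keeping the values numeric until the very end, and formatting them as dollar strings only for display, leaves the table usable for further sorting or arithmetic.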


## Age Demographics

* After dividing the "Age" column into 8 bins, we have a much clearer understanding of which age groups make up the player base.
* The 20-24 age group has the highest share of players (44.79%), followed by 15-19 (18.58%) and 25-29 (13.37%).
* This indicates that players in their late teens and twenties are the most likely to purchase items in the game to enhance their gaming experience.
* The data also includes players aged 40 and above (2.08%), which shows the game has some popularity among older players as well.


In [8]:
# One row per player so each player is counted once
unique_purchase_data_new = purchase_data.drop_duplicates("SN").copy()
# Bin edges are right-inclusive; ages above 50 would fall outside the last bin
age_bins = [0,9,14,19,24,29,34,39,50]

label_names = ["<10","10-14","15-19","20-24","25-29","30-34","35-39","40+"]
unique_purchase_data_new["Age-Groups"] = pd.cut(unique_purchase_data_new["Age"],age_bins,labels=label_names,include_lowest = True)
age_group = unique_purchase_data_new.groupby("Age-Groups")
age_count = age_group["Age-Groups"].count()
age_percentage = (age_count/total_players) * 100

age_demographics = pd.DataFrame({
    "Total-Count" : age_count,
    "Percentage of Players": age_percentage
})

age_demographics["Percentage of Players"] = age_demographics["Percentage of Players"].map("{:.2f}%".format) 

age_demographics

Unnamed: 0_level_0,Total-Count,Percentage of Players
Age-Groups,Unnamed: 1_level_1,Unnamed: 2_level_1
<10,17,2.95%
10-14,22,3.82%
15-19,107,18.58%
20-24,258,44.79%
25-29,77,13.37%
30-34,52,9.03%
35-39,31,5.38%
40+,12,2.08%
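One detail of the binning is worth illustrating: with a top edge of 50, any age above 50 would not fall into the "40+" group and would become NaN. Whether the real data contains such ages is not shown here, so the sketch below uses made-up ages and an open-ended top edge (`float("inf")`) as one possible safeguard.

```python
import pandas as pd

# Made-up ages, including one above 50
ages = pd.Series([7, 12, 19, 20, 24, 45, 63])
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Capped top edge, as in the cell above: 63 falls outside every bin
capped = pd.cut(ages, bins=[0, 9, 14, 19, 24, 29, 34, 39, 50],
                labels=labels, include_lowest=True)

# Open-ended top edge: every age of 40 or more lands in "40+"
open_ended = pd.cut(ages, bins=[0, 9, 14, 19, 24, 29, 34, 39, float("inf")],
                    labels=labels, include_lowest=True)

counts = open_ended.value_counts(sort=False)  # counts in label order
print(capped.isna().sum(), open_ended.isna().sum())
print(counts)
```

Because the bins are right-inclusive, an age of 19 falls in "15-19" and an age of 24 falls in "20-24", which matches the labels.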


## Purchasing Analysis (Age)

* Next we use the same age groups to analyze the purchasing data. Here each row is a purchase rather than a player.
* The same trend appears: the 20-24 age group has the highest total purchase value, 1,114.06 dollars, followed by the 15-19 age group with 412.89 dollars.
* The 35-39 age group leads in both average purchase price (3.60 dollars) and average total purchase per person (4.76 dollars).

In [9]:
# Same age bins as in Age Demographics, applied to every purchase rather than every player
age_bins = [0,9,14,19,24,29,34,39,50]

label_names = ["<10","10-14","15-19","20-24","25-29","30-34","35-39","40+"]
purchase_data["Age-Groups"] = pd.cut(purchase_data["Age"],age_bins,labels=label_names,include_lowest = True)
complete_age_group = purchase_data.groupby("Age-Groups")
complete_age_count = complete_age_group["Age-Groups"].count()
average_purchase_price = complete_age_group["Price"].mean()
Total_purchase_price = complete_age_group["Price"].sum()
# Divide by the number of unique players per age group (age_count from the Age Demographics cell)
avg_purchase_total_per_person = Total_purchase_price/age_count


purchasing_analysis = pd.DataFrame({
    "Purchase Count" :complete_age_count ,
    "Average Purchase Price" :average_purchase_price,
    "Total Purchase Value" :Total_purchase_price,
    "Avg Total Purchase per Person" :avg_purchase_total_per_person
})
purchasing_analysis["Average Purchase Price"] = purchasing_analysis["Average Purchase Price"].map("${:,.2f}".format) 
purchasing_analysis["Total Purchase Value"] = purchasing_analysis["Total Purchase Value"].map("${:,.2f}".format) 
purchasing_analysis["Avg Total Purchase per Person"] = purchasing_analysis["Avg Total Purchase per Person"].map("${:,.2f}".format) 
purchasing_analysis

Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Age-Groups,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
<10,23,$3.35,$77.13,$4.54
10-14,28,$2.96,$82.78,$3.76
15-19,136,$3.04,$412.89,$3.86
20-24,365,$3.05,"$1,114.06",$4.32
25-29,101,$2.90,$293.00,$3.81
30-34,73,$2.93,$214.00,$4.12
35-39,41,$3.60,$147.67,$4.76
40+,13,$2.94,$38.24,$3.19


## Top Spenders

* We grouped the dataset by player name (SN) to find the players with the highest total purchase value, along with their purchase counts.
* The highest total purchase value for any player is 18.96 dollars, at an average of 3.79 dollars per item, followed by 15.45 dollars at an average of 3.86 dollars per item.
* The maximum number of purchases made by any player is 5, followed by 4 and 3.



In [10]:
group_SN = purchase_data.groupby("SN")
purchase_count_by_SN = pd.Series(group_SN["SN"].count())
avg_price = pd.Series(group_SN["Price"].mean())
total_price_SN = pd.Series(group_SN["Price"].sum())

top_spenders = pd.DataFrame({
    "Purchase Count" : purchase_count_by_SN,
    "Average Purchase Price" :avg_price,
    "Total Purchase Value" :total_price_SN
})

top_spenders["Average Purchase Price"] = top_spenders["Average Purchase Price"].map("${:,.2f}".format) 

top_spenders.sort_values(by="Total Purchase Value",ascending=False).head()

Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value
SN,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Lisosia93,5,$3.79,18.96
Idastidru52,4,$3.86,15.45
Chamjask73,3,$4.61,13.83
Iral74,4,$3.40,13.62
Iskadarya95,3,$4.37,13.1
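The top-spenders ranking boils down to a per-player aggregation followed by picking the largest totals. The sketch below shows the same idea on made-up rows, using `nlargest` in place of sorting and taking the head; the names and prices are illustrative.

```python
import pandas as pd

# Made-up purchases for three hypothetical players
sample = pd.DataFrame({
    "SN": ["PlayerA", "PlayerB", "PlayerA", "PlayerC", "PlayerB", "PlayerA"],
    "Price": [4.00, 1.50, 3.00, 2.25, 1.00, 2.00],
})

# Purchase count, average price and total spent per player
spenders = sample.groupby("SN")["Price"].agg(["count", "mean", "sum"])

# Two players with the highest total spend, largest first
top = spenders.nlargest(2, "sum")
print(top)
```

Ranking should happen while the totals are still numeric; once a column has been mapped to strings such as "$18.96", sorting it compares text rather than amounts.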


## Most Popular Items

* We grouped the dataset by item ID and item name to find the most popular items in the game.
* "Final Critic" takes first place with a purchase count of 13 at an average price of 4.61 dollars. Next is "Oathbreaker, Last Hope of the Breaking Storm" with a purchase count of 12 at a price of 4.23 dollars. Then come "Fiery Glass Crusader", "Persuasion" and "Extraction, Quickblade Of Trembling Hands", each with a purchase count of 9.


In [11]:
popular_items = purchase_data[["Item ID","Item Name","Price"]]
group_popular_items = popular_items.groupby(["Item ID","Item Name"])
purchase_count_popular_items = group_popular_items["Price"].count()
# Mean price per item; an item sold at more than one price shows its average
Item_price_popular_items = group_popular_items["Price"].mean()
# Keep the sums as a Series so they align with the other columns by index
total_purchase_value_popular_items = group_popular_items["Price"].sum()

most_popular_items= pd.DataFrame({
    "Purchase Count" :purchase_count_popular_items,
    "Item Price" :Item_price_popular_items,
    "Total Purchase Value": total_purchase_value_popular_items
})

most_popular_items["Item Price"] = most_popular_items["Item Price"].map("${:,.2f}".format)

most_popular_items.sort_values(by="Purchase Count",ascending=False).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Purchase Count,Item Price,Total Purchase Value
Item ID,Item Name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
92,Final Critic,13,$4.61,59.99
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,50.76
145,Fiery Glass Crusader,9,$4.58,41.22
132,Persuasion,9,$3.22,28.99
108,"Extraction, Quickblade Of Trembling Hands",9,$3.53,31.77


## Most Profitable Items

* Here we sort the same item table by a different factor, "Total Purchase Value".
* The top two items are the same as in the popularity table because they have the highest purchase counts, but the items after them change.
* "Nirvana" follows the second-place item with a total purchase value of 44.10 dollars from just 9 purchases.
* Other items maintain a good total purchase value even with a lower purchase count, such as "Singed Scalpel" (34.80 dollars from 8 purchases).



In [12]:

most_popular_items.sort_values(by="Total Purchase Value",ascending=False).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Purchase Count,Item Price,Total Purchase Value
Item ID,Item Name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
92,Final Critic,13,$4.61,59.99
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,50.76
82,Nirvana,9,$4.90,44.1
145,Fiery Glass Crusader,9,$4.58,41.22
103,Singed Scalpel,8,$4.35,34.8
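Several items tie at 9 purchases, so which of them appear in a five-row popularity table depends on how ties are ordered. The sketch below rebuilds a small table from only the seven items shown in the two outputs above (the rest of the dataset is omitted) and breaks ties by total purchase value; this tie-breaking rule is an assumption for illustration, not the notebook's own method.

```python
import pandas as pd

# Counts and totals copied from the two item tables above (seven items only)
items = pd.DataFrame(
    {
        "Purchase Count": [13, 12, 9, 9, 9, 9, 8],
        "Total Purchase Value": [59.99, 50.76, 41.22, 28.99, 31.77, 44.10, 34.80],
    },
    index=[
        "Final Critic",
        "Oathbreaker, Last Hope of the Breaking Storm",
        "Fiery Glass Crusader",
        "Persuasion",
        "Extraction, Quickblade Of Trembling Hands",
        "Nirvana",
        "Singed Scalpel",
    ],
)

# Sort by purchase count first, then by total value to order the ties
ranked = items.sort_values(
    by=["Purchase Count", "Total Purchase Value"], ascending=False
)
top5 = ranked.head()
print(top5)
```

With this ordering "Nirvana" moves into third place among the popular items as well, since it also has 9 purchases and the highest total value within the tie.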


### Conclusions

* The game is most popular among male players (84.03% of players) and among players aged 15-29. These groups also account for most of the revenue.
* The most any player has spent on items is 18.96 dollars.
* The most purchased items are all priced above the average purchase price of 3.05 dollars (from 3.22 to 4.61 dollars), so a higher price does not appear to discourage purchases. The company may therefore be able to increase the prices of some items to raise the average purchase price.
* The item "Nirvana", despite its high price of 4.90 dollars, has a fairly high number of purchases (9) and could match the top items purchased in the game. The company could increase the presence of "Nirvana" in the game to drive more purchases.
