# Heroes of Pymoli: Game Analytics

In this project we analyze data for the fictional free-to-play (F2P) game "Heroes of Pymoli". As in many other games built on the free-to-play model, players are encouraged to purchase optional items that enhance their playing experience.

## Goal: 

Generate a report breaking down the game's purchasing data into meaningful insights.

## Importing libraries and loading data

In [294]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from plotly import graph_objects as go
import plotly.express as px
from scicolorscales import *
from plotly.subplots import make_subplots

In [3]:
# Loading the data set
df = pd.read_csv("data/purchase_data.csv")

In [4]:
df.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


## 1. What is the total number of players?

In [5]:
df.SN.nunique()

576

We have 576 paying players.

## 2. Purchasing Analysis

2.1. Which items are purchased most often?

In [9]:
df["Item Name"].value_counts()[0:10]

Final Critic                                    13
Oathbreaker, Last Hope of the Breaking Storm    12
Fiery Glass Crusader                             9
Persuasion                                       9
Extraction, Quickblade Of Trembling Hands        9
Nirvana                                          9
Pursuit, Cudgel of Necromancy                    8
Retribution Axe                                  8
Lightning, Etcher of the King                    8
Singed Scalpel                                   8
Name: Item Name, dtype: int64

There are 179 distinct items sold in this cohort. The number of purchases and the total purchase value per item are broken down below:

In [21]:
purchases_item = (df.groupby("Item Name")
                    .agg({"Item ID": "count", "Price": "sum"})
                    .reset_index()
                    .sort_values("Item ID", ascending= False)
                    .rename(columns = {"Item ID": "no_purchases", "Price": "value_purchases"}))

In [22]:
purchases_item.head(10)

Unnamed: 0,Item Name,no_purchases,value_purchases
56,Final Critic,13,59.99
93,"Oathbreaker, Last Hope of the Breaking Storm",12,50.76
98,Persuasion,9,28.99
92,Nirvana,9,44.1
51,"Extraction, Quickblade Of Trembling Hands",9,31.77
55,Fiery Glass Crusader,9,41.22
105,"Pursuit, Cudgel of Necromancy",8,8.16
23,Brutality Ivory Warmace,8,19.36
125,Singed Scalpel,8,34.8
122,"Shadow Strike, Glory of Ending Hope",8,25.28


In [291]:
fig = px.bar(purchases_item.head(9), 
             x= "Item Name", 
             y = "no_purchases",
             color = "no_purchases",
             color_continuous_scale = oslorev)

fig.update_layout(
    title_text = 'Heroes of Pymoli: Most popular items')

fig.show() 



2.2. What is the average purchase price of the items?

In [41]:
(purchases_item["value_purchases"]/purchases_item["no_purchases"]).mean()

3.0385562001623447

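Averaged across items, with each item weighted equally, the price is $3.04. This differs slightly from the average price per transaction, which is total revenue divided by the number of purchases ($2379.77 / 780, about $3.05).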
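The difference between averaging over items and averaging over transactions can be illustrated with a small sketch. The `toy` DataFrame and its item names are hypothetical and are not taken from `purchase_data.csv`; they only show why the two averages diverge when some items sell more often than others.

```python
import pandas as pd

# Hypothetical toy data, not from purchase_data.csv
toy = pd.DataFrame({
    "Item Name": ["Sword", "Sword", "Sword", "Shield"],
    "Price": [2.0, 2.0, 2.0, 4.0],
})

# Average over items: every item counts once, however often it sold
per_item_price = toy.groupby("Item Name")["Price"].mean()
avg_per_item = per_item_price.mean()        # (2.0 + 4.0) / 2 = 3.0

# Average over transactions: every purchase counts once
avg_per_purchase = toy["Price"].mean()      # (2.0 * 3 + 4.0) / 4 = 2.5

print(avg_per_item, avg_per_purchase)
```

In the notebook's data the two figures are close ($3.04 vs. about $3.05) because purchases are spread fairly evenly across items.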

2.3. What is the total number of purchases?

In [42]:
len(df)

780

In this cohort we had 780 transactions.

2.4. What was the total revenue?

In [45]:
df.Price.sum()

2379.77

This cohort generated $2379.77 in revenue.

2.5. **Bonus**: Which items brought in the most revenue?

In [292]:
fig = px.bar(purchases_item.sort_values("value_purchases", ascending= False).head(6), 
             x= "Item Name", 
             y = "value_purchases",
             color = "value_purchases",
             color_continuous_scale = oslorev)

fig.update_layout(
    title_text = 'Heroes of Pymoli: Most profitable items')

fig.show() 


## 3. Gender demographics

3.1. What is the breakdown between male and female players?

In [55]:
gender_share = df.groupby("Gender").agg({"SN": "nunique"}).reset_index()

# Share of each gender among all unique players (576 in this cohort)
gender_share["%"] = 100 * gender_share["SN"] / df["SN"].nunique()

In [56]:
gender_share

Unnamed: 0,Gender,SN,%
0,Female,81,14.0625
1,Male,484,84.027778
2,Other / Non-Disclosed,11,1.909722


In [79]:
palette = ["rgb(255,228,225)", 'rgb(105,105,105)', 'black']

fig = go.Figure(data=[go.Pie(labels=gender_share["Gender"], 
                             values=gender_share["SN"], 
                             hole=.3,
                             marker_colors= palette)])

fig.update(layout_title_text='Share of players by Gender')


fig.show()


About 84% of the players are male, 14% are female, and about 2% selected "Other" or did not disclose their gender. The paying player base of Heroes of Pymoli is therefore predominantly male.
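The gender shares can also be derived without a hard-coded player total. The sketch below uses hypothetical toy data and assumes that every screen name (`SN`) is associated with exactly one gender, which should be checked on the real data before relying on it.

```python
import pandas as pd

# Hypothetical toy purchases: player "a" bought twice
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "d"],
    "Gender": ["Male", "Male", "Female", "Male", "Male"],
})

# Keep one row per player so repeat buyers are not counted twice
players = toy.drop_duplicates(subset="SN")

# normalize=True returns fractions; multiply by 100 for percentages
share = players["Gender"].value_counts(normalize=True) * 100
print(share)
```

Because the denominator comes from the data itself, the percentages stay correct if the data set is refreshed with a different number of players.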

## 4. What is the breakdown of in-app purchases (IAPs) by gender?

In [98]:
# Count purchases and compute the total and mean price per gender
purchases_gender = (df.groupby("Gender")
                      .agg({"Item ID": "count", "Price": ["sum", "mean"]})
                      .reset_index())
# Drop the top level of the MultiIndex columns, keeping the aggregation names
purchases_gender.columns = purchases_gender.columns.get_level_values(1)
# Rename columns ("Gender" is left with an empty name after flattening)
purchases_gender.rename(columns={"": "Gender",
                                 "count": "purchases",
                                 "sum": "value_purchases",
                                 "mean": "average_purchase"},
                        inplace=True)
# Sort by number of purchases, descending
purchases_gender.sort_values("purchases", ascending=False, inplace=True)

In [143]:
purchases_gender

Unnamed: 0,Gender,purchases,value_purchases,average_purchase
1,Male,652,1967.64,3.017853
0,Female,113,361.94,3.203009
2,Other / Non-Disclosed,15,50.19,3.346
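A table of this shape can also be built with pandas named aggregation, which avoids the MultiIndex columns and the rename step. This is an alternative sketch on hypothetical toy data, not the code used to produce the table above.

```python
import pandas as pd

# Hypothetical toy purchases
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Female"],
    "Item ID": [1, 2, 3],
    "Price": [1.0, 3.0, 5.0],
})

# Named aggregation: output_column=(input_column, aggregation)
summary = (toy.groupby("Gender")
              .agg(purchases=("Item ID", "count"),
                   value_purchases=("Price", "sum"),
                   average_purchase=("Price", "mean"))
              .reset_index()
              .sort_values("purchases", ascending=False))
print(summary)
```

The resulting columns are already flat and carry their final names, so no `get_level_values` or `rename` call is needed.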


In [305]:
fig = make_subplots(rows=2, 
                    cols=1,
                    subplot_titles=("IAPs by gender", "Average IAP amount by gender"))

fig.add_trace(go.Bar(x=purchases_gender["Gender"], 
                             y=purchases_gender["value_purchases"], 
                             marker_color= ['rgb(105,105,105)', "rgb(255,228,225)", 'black']), 1,1)

fig.add_trace(go.Bar(x=purchases_gender["Gender"], 
                             y=purchases_gender["average_purchase"], 
                             marker_color= ['rgb(105,105,105)', "rgb(255,228,225)", 'black']), 2,1)

fig.update_yaxes(title_text="USD")

fig.update_layout(height=700, width=700,
                  title_text="IAP breakdown by gender",
                  showlegend=False)

fig.show()


The plots above show that while most IAP revenue comes from male players, the average value per purchase is higher among female players and players in the "Other / Non-Disclosed" group.

## 5. Age Demographics

First we will bucket our data by age groups:

In [203]:
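# Inclusive (low, high) age ranges: (10, 14), (15, 19), ..., (35, 39)
buckets = [(i, i+4) for i in range(10, 40, 5)]

def bins(age):
    """Map an age to its age-group label."""
    if age < 10:
        return "< 10"
    elif age >= 40:
        # Note: this label also covers age 40 itself
        return "> 40"
    else:
        for low, high in buckets:
            if low <= age <= high:
                return (low, high)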

In [204]:
df["Age_group"]= df.Age.apply(bins)

In [205]:
df.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price,Age_group
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53,"(20, 24)"
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56,> 40
2,2,Ithergue48,24,Male,92,Final Critic,4.88,"(20, 24)"
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27,"(20, 24)"
4,4,Iskosia90,23,Male,131,Fury,1.44,"(20, 24)"
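The same bucketing can be sketched with `pd.cut`, which returns an ordered categorical. The edges below mirror the age ranges used in this notebook, but the string labels (for example `"20-24"` and `">= 40"`) are an assumption of this sketch and differ from the tuple labels produced by `bins`.

```python
import numpy as np
import pandas as pd

# Hypothetical ages covering every bucket boundary
ages = pd.Series([7, 10, 14, 15, 24, 39, 40, 45])

# Left-closed intervals: [-inf, 10), [10, 15), ..., [35, 40), [40, inf)
edges = [-np.inf, 10, 15, 20, 25, 30, 35, 40, np.inf]
labels = ["< 10", "10-14", "15-19", "20-24",
          "25-29", "30-34", "35-39", ">= 40"]

age_group = pd.cut(ages, bins=edges, labels=labels, right=False)
print(age_group.tolist())
```

Since the categories are ordered, grouping by such a column lists the age groups in ascending order, with `"< 10"` first, without any manual reindexing.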


In [240]:
purchases_age = df.groupby("Age_group").agg({"Purchase ID":"count", 
                                             "Price": ["sum", "mean"]}).reset_index()

In [241]:
purchases_age.columns=purchases_age.columns.get_level_values(1)
#rename columns
purchases_age.rename(columns= {"": "Age", 
                               "count": "purchases", 
                               "sum":"value_purchases", 
                               "mean":"average_purchase"},
                     inplace= True)
#sorting values
#purchases_age.sort_values("purchases", ascending = False, inplace= True)

In [242]:
purchases_age

Unnamed: 0,Age,purchases,value_purchases,average_purchase
0,"(10, 14)",28,82.78,2.956429
1,"(15, 19)",136,412.89,3.035956
2,"(20, 24)",365,1114.06,3.052219
3,"(25, 29)",101,293.0,2.90099
4,"(30, 34)",73,214.0,2.931507
5,"(35, 39)",41,147.67,3.601707
6,< 10,23,77.13,3.353478
7,> 40,13,38.24,2.941538


In [245]:
# Move the "< 10" row (position 6) to the front so that
# the age groups are listed in ascending order
idx = list(range(1, 9))
idx[6] = 0
purchases_age.index = idx
purchases_age.sort_index(axis=0, inplace=True)

In [261]:
fig = px.bar(purchases_age, 
             x= "Age", 
             y = "value_purchases",
             color = "purchases",
             color_continuous_scale = oslorev)

fig.update_layout(
    title_text = 'Heroes of Pymoli: IAP breakdown by age')

fig.show() 

Players between 20 and 24 years old make the most IAPs. However, IAPs made by players between 35 and 39 years old have the highest average value:

In [290]:
fig = px.scatter(purchases_age, 
                 y="average_purchase", 
                 x="purchases", 
                 size="value_purchases",
                 #text= "Age",
                 color="Age",
                 hover_name="Age", 
                 size_max=40)

fig.add_annotation(
            x=200,
            y=3.3,
            text="hi")

fig.update_layout(
    title_text = 'Average IAP value vs. number of IAPs')


fig.show()


## 6. Top spenders

Which players in this cohort spent the most?



In [315]:
spenders = df.groupby("SN").agg({"Purchase ID": "count", 
                      "Price": "sum"}).sort_values("Price", ascending= False).reset_index()

In [316]:
spenders["Purchase ID"].value_counts()

1    414
2    124
3     35
4      2
5      1
Name: Purchase ID, dtype: int64

In [317]:
spenders.head(5)

Unnamed: 0,SN,Purchase ID,Price
0,Lisosia93,5,18.96
1,Idastidru52,4,15.45
2,Chamjask73,3,13.83
3,Iral74,4,13.62
4,Iskadarya95,3,13.1


In [320]:
# share (%) of players who made more than one purchase:
# 124 + 35 + 2 + 1 = 162 of 576 players
(124+35+2+1)/576*100

28.125
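The repeat-buyer share above is computed from counts typed in by hand. A minimal sketch of deriving it directly from the purchase counts is shown below; the `toy` data is hypothetical and only illustrates the pattern.

```python
import pandas as pd

# Hypothetical toy purchases: "a" bought twice, "c" three times
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "c", "c", "d"],
    "Purchase ID": range(7),
})

# Number of purchases per player
purchases_per_player = toy.groupby("SN")["Purchase ID"].count()

# The mean of a boolean Series is the fraction of True values
repeat_share = (purchases_per_player > 1).mean() * 100
print(repeat_share)
```

Applied to the notebook's `spenders` table, the same expression on `spenders["Purchase ID"]` should reproduce the 28.125% figure, since 162 of 576 players made more than one purchase.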

Our top spender is Lisosia93, who made 5 purchases in this cohort for a total of $18.96.

# Key Insights

* Heroes of Pymoli has 576 paying players in this cohort, of whom more than 4 in 5 are male. Even though male players account for most of the IAPs, the average amount spent per purchase is higher among female players and players in the "Other / Non-Disclosed" group.

* The 20-24 age group is the largest source of purchases (365 of 780) and of revenue ($1114.06). However, players between 35 and 39 years old spend the most per transaction ($3.60 on average). A possible explanation is the greater financial stability typically reached at that age. Note that some paying users under 10 years old make, on average, more expensive purchases ($3.35) than the most active age group ($3.05). We may want to review the purchase safeguards offered by the App Store and Google Play, and also ask whether our game is suitable for users under 10 years old.

* In this cohort we had 780 transactions and $2379.77 in revenue. Our top spender made 5 IAPs in this timeframe, and more than 1 in 4 paying players (about 28%) made more than one purchase. This repeat-purchase rate suggests good engagement among our paying users.