# Data Analysis on Heroes of Pymoli
----
## Conclusion


> Data analysis was performed on the "Heroes of Pymoli" data set (purchase_data.json, purchase_data2.json), and the following observations were made:

----
### Trend 1:
Female players make up a far smaller share of buyers than male players (17.45% versus 81.15%). It may be worth focusing on in-game activities targeted at a female audience.

| Gender Demographics   | Percentage of Players | Total Count | 
|-----------------------|-----------------------|-------------| 
| Female                | 17.45%                | 100         | 
| Male                  | 81.15%                | 465         | 
| Other / Non-Disclosed | 1.40%                 | 8           | 
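
The percentages in the table follow directly from the counts. As a quick check, this sketch recomputes them using only the counts shown above (the variable names are illustrative, not taken from the notebook):

```python
# Player counts per gender, copied from the table above
counts = {"Female": 100, "Male": 465, "Other / Non-Disclosed": 8}

total_players = sum(counts.values())

# Share of each group as a percentage of all players, rounded to 2 places
shares = {gender: round(100 * n / total_players, 2) for gender, n in counts.items()}

for gender, share in shares.items():
    print(f"{gender}: {share:.2f}%")
```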


----
### Trend 2:
Players aged 40+ are the smallest age group, at only 1.92% of players (11 of 573), while the 20-24 group alone accounts for 45.20%. This is probably because marketing is aimed at a single age group, and that strategy should be re-evaluated.


| Age Demographics | Percentage of Players | Total Count | 
|------------------|-----------------------|-------------| 
| <10              | 3.32%                 | 19          | 
| 10-14            | 4.01%                 | 23          | 
| 15-19            | 17.45%                | 100         | 
| 20-24            | 45.20%                | 259         | 
| 25-29            | 15.18%                | 87          | 
| 30-34            | 8.20%                 | 47          | 
| 35-39            | 4.71%                 | 27          | 
| 40+              | 1.92%                 | 11          | 
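
Player share is only part of the picture. Dividing each group's total purchase value by its player count (both figures come from the age tables in this report) gives the spend per player. The sketch below does that arithmetic; the variable names are illustrative:

```python
# (total purchase value in dollars, player count) per age group,
# copied from the age tables in this report
age_groups = {
    "<10":   (83.46, 19),
    "10-14": (96.95, 23),
    "15-19": (386.42, 100),
    "20-24": (978.77, 259),
    "25-29": (370.33, 87),
    "30-34": (197.25, 47),
    "35-39": (119.40, 27),
    "40+":   (53.75, 11),
}

# Spend per player, rounded to cents
per_player = {group: round(value / players, 2)
              for group, (value, players) in age_groups.items()}

highest = max(per_player, key=per_player.get)
print(highest, per_player[highest])
```

On these figures the 40+ group is the smallest but spends the most per player, which is worth keeping in mind when evaluating the marketing strategy.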





----
### Trend 3:
'Betrayal, Whisper of Grieving Widows' is one of the most purchased items (11 purchases), but it earns less revenue than several less popular items. It may be time to increase its price.

| Item ID | Item Name                            | Item Price | Total Purchase Value | Purchase Count | 
|---------|--------------------------------------|------------|----------------------|----------------| 
| 39      | Betrayal, Whisper of Grieving Widows | \$2.35      | \$25.85               | 11             | 
| 84      | Arcane Gem                           | \$2.23      | \$24.53               | 11             | 
| 31      | Trickster                            | \$2.07      | \$18.63               | 9              | 
| 175     | Woeful Adamantite Claymore           | \$1.24      | \$11.16               | 9              | 
| 13      | Serenity                             | \$1.49      | \$13.41               | 9              | 

For comparison, here are the five items with the highest total purchase value:

| Item ID | Item Name                  | Item Price | Total Purchase Value | Purchase Count | 
|---------|----------------------------|------------|----------------------|----------------| 
| 34      | Retribution Axe            | \$4.14      | \$37.26               | 9              | 
| 115     | Spectral Diamond Doomblade | \$4.25      | \$29.75               | 7              | 
| 32      | Orenmir                    | \$4.95      | \$29.70               | 6              | 
| 103     | Singed Scalpel             | \$4.87      | \$29.22               | 6              | 
| 107     | Splitter, Foe Of Subtlety  | \$3.61      | \$28.88               | 8              | 
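
To give a rough sense of scale, the sketch below computes the price at which 'Betrayal, Whisper of Grieving Widows' would match the revenue of the top-earning item, Retribution Axe. It assumes the purchase count stays at 11 after the price change, which is a simplification: demand may well fall as the price rises.

```python
# Figures copied from the two tables above
current_price = 2.35     # 'Betrayal, Whisper of Grieving Widows'
purchase_count = 11
target_revenue = 37.26   # Retribution Axe, the top-earning item

# Price needed to reach the target revenue.
# Assumption: the purchase count is unchanged by the new price.
required_price = round(target_revenue / purchase_count, 2)
increase_pct = round(100 * (required_price / current_price - 1), 1)

print(f"Required price: ${required_price:.2f} (+{increase_pct}%)")
```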


----
## Tasks completed:

**Player Count**

- Total Number of Players

**Purchasing Analysis (Total)**

- Number of Unique Items
- Average Purchase Price
- Total Number of Purchases
- Total Revenue
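
Each of these four figures is a single pandas call. A sketch on a made-up frame with the same `Item ID` and `Price` columns as the purchase data (the rows are hypothetical):

```python
import pandas as pd

# Hypothetical purchases, not taken from purchase_data.json
purchases = pd.DataFrame({
    "Item ID": [10, 10, 11, 12],
    "Price": [1.50, 1.50, 3.00, 2.00],
})

summary = {
    "Number of Unique Items": purchases["Item ID"].nunique(),
    "Average Purchase Price": round(purchases["Price"].mean(), 2),
    "Total Number of Purchases": len(purchases),
    "Total Revenue": round(purchases["Price"].sum(), 2),
}
print(summary)
```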

**Gender Demographics**

- Percentage and Count of Male Players
- Percentage and Count of Female Players
- Percentage and Count of Other / Non-Disclosed

**Purchasing Analysis (Gender)**

- Each of the following, broken down by gender:
    - Purchase Count
    - Average Purchase Price
    - Total Purchase Value
    - Normalized Totals
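
A minimal sketch of the per-gender breakdown, assuming a frame with `SN`, `Gender`, and `Price` columns as in the purchase data. The rows are made up for illustration, and the normalized total is taken to mean total purchase value divided by the number of unique players in the group, which is how the notebook computes it:

```python
import pandas as pd

# Hypothetical rows in the same shape as the purchase data
purchases = pd.DataFrame({
    "SN": ["a1", "a1", "b2", "c3", "c3", "d4"],
    "Gender": ["Male", "Male", "Female", "Male", "Male", "Female"],
    "Price": [1.00, 3.00, 2.00, 4.00, 2.00, 5.00],
})

# Named aggregation gives readable column names directly
by_gender = purchases.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    average_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Normalized total: spend per unique player in each group
by_gender["normalized_total"] = by_gender["total_value"] / by_gender["players"]
print(by_gender.round(2))
```

Named aggregation avoids the step of flattening multi-level column names after `agg`.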

**Age Demographics**

- Each of the following, broken into 5-year bins (e.g. &lt;10, 10-14, 15-19, etc.):
    - Purchase Count
    - Average Purchase Price
    - Total Purchase Value
    - Normalized Totals
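
The notebook derives its bin edges from the data. A simpler variant with fixed edges is sketched here; the ages are made up, and using `np.inf` as the last edge is one way to get an open-ended `40+` bin:

```python
import numpy as np
import pandas as pd

# Hypothetical ages chosen to hit the bin boundaries
ages = pd.Series([7, 10, 14, 15, 24, 39, 40, 45])

edges = [0, 10, 15, 20, 25, 30, 35, 40, np.inf]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# right=False makes each bin closed on the left: [10, 15) holds ages 10 to 14
groups = pd.cut(ages, bins=edges, labels=labels, right=False)
print(groups.tolist())
```

With an infinite upper edge, no separate correction is needed for the maximum age.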

**Top Spenders**

- Identify the top 5 spenders in the game by total purchase value, then list (in a table):
    - SN
    - Purchase Count
    - Average Purchase Price
    - Total Purchase Value

**Most Popular Items**

- Identify the 5 most popular items by purchase count, then list (in a table):
    - Item ID
    - Item Name
    - Purchase Count
    - Item Price
    - Total Purchase Value

**Most Profitable Items**

- Identify the 5 most profitable items by total purchase value, then list (in a table):
    - Item ID
    - Item Name
    - Purchase Count
    - Item Price
    - Total Purchase Value
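
Both item rankings can come from one per-item aggregate, sorted two ways. A sketch on made-up purchases (the item names and prices are hypothetical, and `n=2` stands in for the top 5 used in the report):

```python
import pandas as pd

# Hypothetical purchases, not taken from purchase_data.json
purchases = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Dagger", "Dagger", "Dagger", "Axe", "Axe", "Staff"],
    "Price": [1.00, 1.00, 1.00, 4.00, 4.00, 2.50],
})

# One row per item: how often it sold, its price, and what it earned
items = purchases.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "max"),
    total_value=("Price", "sum"),
)

most_popular = items.nlargest(2, "purchase_count")
most_profitable = items.nlargest(2, "total_value")

print(most_popular)
print(most_profitable)
```

In this toy data the cheap item sells most often while the expensive one earns the most, the same pattern the report notes in Trend 3.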

---
## Changelog
* 07-Feb-2018 


In [78]:
import numpy as np
import pandas as pd

In [79]:
df = pd.read_json('purchase_data.json')
#df = pd.read_json('purchase_data2.json')

In [80]:
# Get a sense of the data
df.head()

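|   | Age | Gender | Item ID | Item Name                                 | Price | SN           |
|---|-----|--------|---------|-------------------------------------------|-------|--------------|
| 0 | 38  | Male   | 165     | Bone Crushing Silver Skewer               | 3.37  | Aelalis34    |
| 1 | 21  | Male   | 119     | Stormbringer, Dark Blade of Ending Misery | 2.32  | Eolo46       |
| 2 | 34  | Male   | 174     | Primitive Blade                           | 2.46  | Assastnya25  |
| 3 | 21  | Male   | 92      | Final Critic                              | 1.36  | Pheusrical25 |
| 4 | 23  | Male   | 63      | Stormfury Mace                            | 1.27  | Aela59       |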


In [81]:
# Preparing data: rename the Gender column to Gender Demographics

df = df.rename(columns={'Gender' : 'Gender Demographics' })

df.head()

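|   | Age | Gender Demographics | Item ID | Item Name                                 | Price | SN           |
|---|-----|---------------------|---------|-------------------------------------------|-------|--------------|
| 0 | 38  | Male                | 165     | Bone Crushing Silver Skewer               | 3.37  | Aelalis34    |
| 1 | 21  | Male                | 119     | Stormbringer, Dark Blade of Ending Misery | 2.32  | Eolo46       |
| 2 | 34  | Male                | 174     | Primitive Blade                           | 2.46  | Assastnya25  |
| 3 | 21  | Male                | 92      | Final Critic                              | 1.36  | Pheusrical25 |
| 4 | 23  | Male                | 63      | Stormfury Mace                            | 1.27  | Aela59       |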


In [82]:
# Preparing data (continued): add an Age Demographics group
# Bin edges are computed from the data rather than hard-coded

my_source_series = df['Age']
subtract_label_display = True   # label bins '10-14' rather than '10-15'
bin_precision = 5               # bin width in years
upper_bin = int(bin_precision * np.ceil(my_source_series.max() / bin_precision))
lower_bin = int(bin_precision * np.floor(my_source_series.min() / bin_precision))

# Bin edges from lower_bin to upper_bin inclusive, in steps of bin_precision
myList = list(range(lower_bin, upper_bin + bin_precision, bin_precision))

upper_subtract = 1 if subtract_label_display else 0

# One 'low-high' label per pair of adjacent edges
myLabelList = ['{}-{}'.format(low, high - upper_subtract)
               for low, high in zip(myList[:-1], myList[1:])]

# Open-ended labels for the lowest and highest bins, e.g. '<10' and '40+'
myLabelList[0] = '<' + str(myList[1])
myLabelList[-1] = str(myList[-2]) + '+'

# right=False makes each bin closed on the left: [low, high)
df['Age Demographics'] = pd.cut(my_source_series, myList, labels=myLabelList, include_lowest=True, right=False)

# Upper limit correction: an age equal to upper_bin falls outside the last
# half-open bin, so assign it to the top label explicitly
df.loc[df['Age'] == upper_bin, 'Age Demographics'] = myLabelList[-1]


df.head()

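|   | Age | Gender Demographics | Item ID | Item Name                                 | Price | SN           | Age Demographics |
|---|-----|---------------------|---------|-------------------------------------------|-------|--------------|------------------|
| 0 | 38  | Male                | 165     | Bone Crushing Silver Skewer               | 3.37  | Aelalis34    | 35-39            |
| 1 | 21  | Male                | 119     | Stormbringer, Dark Blade of Ending Misery | 2.32  | Eolo46       | 20-24            |
| 2 | 34  | Male                | 174     | Primitive Blade                           | 2.46  | Assastnya25  | 30-34            |
| 3 | 21  | Male                | 92      | Final Critic                              | 1.36  | Pheusrical25 | 20-24            |
| 4 | 23  | Male                | 63      | Stormfury Mace                            | 1.27  | Aela59       | 20-24            |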


In [83]:
# Create a separate per-player data frame (one row per player, duplicates dropped)
result_list = ['SN','Age','Gender Demographics','Age Demographics']
df_personal = df.loc[:,result_list].drop_duplicates()
df_personal.head()

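|   | SN           | Age | Gender Demographics | Age Demographics |
|---|--------------|-----|---------------------|------------------|
| 0 | Aelalis34    | 38  | Male                | 35-39            |
| 1 | Eolo46       | 21  | Male                | 20-24            |
| 2 | Assastnya25  | 34  | Male                | 30-34            |
| 3 | Pheusrical25 | 21  | Male                | 20-24            |
| 4 | Aela59       | 23  | Male                | 20-24            |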


In [84]:
# Total number of players #result

total_players = df_personal['SN'].count()
Result_PlayerCount = pd.DataFrame([{'Total Players' : total_players}])         
Result_PlayerCount

|   | Total Players |
|---|---------------|
| 0 | 573           |


In [85]:
# Run Aggregation on purchasing data 

agg_dict = { 'Item ID': ['nunique', 'count'] ,'Price': ['mean', 'sum' ] }
df_PurchasingAnalysis = df.agg(agg_dict)

df_PurchasingAnalysis

|         | Item ID | Price    |
|---------|---------|----------|
| count   | 780.0   | NaN      |
| mean    | NaN     | 2.931192 |
| nunique | 183.0   | NaN      |
| sum     | NaN     | 2286.33  |


In [86]:
# Purchasing Analysis (Total) #result

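# Look up each value by label rather than position, so the result
# does not depend on the row order of the aggregated frame
d1 =  {'Number of Unique Items' : df_PurchasingAnalysis.loc['nunique', 'Item ID'],
       'Average Price' : df_PurchasingAnalysis.loc['mean', 'Price'],
       'Number of Purchases' : df_PurchasingAnalysis.loc['count', 'Item ID'],
       'Total Revenue' : df_PurchasingAnalysis.loc['sum', 'Price']
      }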

Result_PurchasingAnalysis= pd.DataFrame([d1])

Result_PurchasingAnalysis["Total Revenue"] = Result_PurchasingAnalysis["Total Revenue"].map("$ {:,.2f}".format)
Result_PurchasingAnalysis["Average Price"] = Result_PurchasingAnalysis["Average Price"].map("$ {:,.2f}".format)
Result_PurchasingAnalysis





|   | Average Price | Number of Purchases | Number of Unique Items | Total Revenue |
|---|---------------|---------------------|------------------------|---------------|
| 0 | \$ 2.93       | 780.0               | 183.0                  | \$ 2,286.33   |


In [87]:
# Gender Demographics #result

agg_dict = { 'SN': ['count'] }
groupbytype_GenderDemographics = df_personal.groupby(['Gender Demographics'])
Result_GenderDemographics = groupbytype_GenderDemographics.agg(agg_dict)
Result_GenderDemographics.columns = ["_".join(x) for x in Result_GenderDemographics.columns.ravel()]

Result_GenderDemographics = Result_GenderDemographics.rename(columns={'SN_count' : 'Percentage of Players' })
Result_GenderDemographics['Total Count'] = Result_GenderDemographics['Percentage of Players']
Result_GenderDemographics['Percentage of Players'] = round(100*Result_GenderDemographics['Percentage of Players']/total_players,2)

Result_GenderDemographics["Percentage of Players"] = Result_GenderDemographics["Percentage of Players"].map("{0:,.2f}%".format)

Result_GenderDemographics

| Gender Demographics   | Percentage of Players | Total Count |
|-----------------------|-----------------------|-------------|
| Female                | 17.45%                | 100         |
| Male                  | 81.15%                | 465         |
| Other / Non-Disclosed | 1.40%                 | 8           |


In [88]:
# Purchasing Analysis (Gender) #result
agg_dict = { 'SN': ['nunique'],'Price': ['mean', 'sum' ,'count']  }
groupbytype_GenderDemographics1 = df.groupby(['Gender Demographics'])
Result_GenderDemographics1 = groupbytype_GenderDemographics1.agg(agg_dict)
Result_GenderDemographics1.columns = ["_".join(x) for x in Result_GenderDemographics1.columns.ravel()]
Result_GenderDemographics1 = Result_GenderDemographics1.rename(columns={'SN_nunique' : 'Number of Players' , 'Price_count' : 'Purchase Count','Price_sum' : 'Total Purchase Value','Price_mean' : 'Average Purchase Price' })
Result_GenderDemographics1['Normalized Total'] = round(Result_GenderDemographics1['Total Purchase Value']/Result_GenderDemographics1['Number of Players'],2)

Result_GenderDemographics1.drop('Number of Players', axis=1 ,  inplace=True)

Result_GenderDemographics1["Total Purchase Value"] = Result_GenderDemographics1["Total Purchase Value"].map("$ {:,.2f}".format)
Result_GenderDemographics1["Average Purchase Price"] = Result_GenderDemographics1["Average Purchase Price"].map("$ {:,.2f}".format)
Result_GenderDemographics1["Normalized Total"] = Result_GenderDemographics1["Normalized Total"].map("$ {:,.2f}".format)


Result_GenderDemographics1




| Gender Demographics   | Average Purchase Price | Total Purchase Value | Purchase Count | Normalized Total |
|-----------------------|------------------------|----------------------|----------------|------------------|
| Female                | \$ 2.82                | \$ 382.91            | 136            | \$ 3.83          |
| Male                  | \$ 2.95                | \$ 1,867.68          | 633            | \$ 4.02          |
| Other / Non-Disclosed | \$ 3.25                | \$ 35.74             | 11             | \$ 4.47          |


In [89]:
# Age Demographics #result

agg_dict = { 'SN': ['count'] }
groupbytype_AgeDemographics = df_personal.groupby(['Age Demographics'])
Result_AgeDemographics = groupbytype_AgeDemographics.agg(agg_dict)
Result_AgeDemographics.columns = ["_".join(x) for x in Result_AgeDemographics.columns.ravel()]

Result_AgeDemographics = Result_AgeDemographics.rename(columns={'SN_count' : 'Percentage of Players' })
Result_AgeDemographics['Total Count'] = Result_AgeDemographics['Percentage of Players']
Result_AgeDemographics['Percentage of Players'] = round(100*Result_AgeDemographics['Percentage of Players']/total_players,2)

Result_AgeDemographics["Percentage of Players"] = Result_AgeDemographics["Percentage of Players"].map("{0:,.2f}%".format)


Result_AgeDemographics

| Age Demographics | Percentage of Players | Total Count |
|------------------|-----------------------|-------------|
| <10              | 3.32%                 | 19          |
| 10-14            | 4.01%                 | 23          |
| 15-19            | 17.45%                | 100         |
| 20-24            | 45.20%                | 259         |
| 25-29            | 15.18%                | 87          |
| 30-34            | 8.20%                 | 47          |
| 35-39            | 4.71%                 | 27          |
| 40+              | 1.92%                 | 11          |


In [90]:
# Purchasing Analysis (Age) #result
agg_dict = { 'SN': ['nunique'],'Price': ['mean', 'sum' ,'count']  }
groupbytype_AgeDemographics1 = df.groupby(['Age Demographics'])
Result_AgeDemographics1 = groupbytype_AgeDemographics1.agg(agg_dict)
Result_AgeDemographics1.columns = ["_".join(x) for x in Result_AgeDemographics1.columns.ravel()]
Result_AgeDemographics1 = Result_AgeDemographics1.rename(columns={'SN_nunique' : 'Number of Players' , 'Price_count' : 'Purchase Count','Price_sum' : 'Total Purchase Value','Price_mean' : 'Average Purchase Price' })
Result_AgeDemographics1['Normalized Total'] = round(Result_AgeDemographics1['Total Purchase Value']/Result_AgeDemographics1['Number of Players'],2)

Result_AgeDemographics1.drop('Number of Players', axis=1 ,  inplace=True)

Result_AgeDemographics1["Total Purchase Value"] = Result_AgeDemographics1["Total Purchase Value"].map("$ {:,.2f}".format)
Result_AgeDemographics1["Average Purchase Price"] = Result_AgeDemographics1["Average Purchase Price"].map("$ {:,.2f}".format)
Result_AgeDemographics1["Normalized Total"] = Result_AgeDemographics1["Normalized Total"].map("$ {:,.2f}".format)



Result_AgeDemographics1






| Age Demographics | Average Purchase Price | Total Purchase Value | Purchase Count | Normalized Total |
|------------------|------------------------|----------------------|----------------|------------------|
| <10              | \$ 2.98                | \$ 83.46             | 28             | \$ 4.39          |
| 10-14            | \$ 2.77                | \$ 96.95             | 35             | \$ 4.22          |
| 15-19            | \$ 2.91                | \$ 386.42            | 133            | \$ 3.86          |
| 20-24            | \$ 2.91                | \$ 978.77            | 336            | \$ 3.78          |
| 25-29            | \$ 2.96                | \$ 370.33            | 125            | \$ 4.26          |
| 30-34            | \$ 3.08                | \$ 197.25            | 64             | \$ 4.20          |
| 35-39            | \$ 2.84                | \$ 119.40            | 42             | \$ 4.42          |
| 40+              | \$ 3.16                | \$ 53.75             | 17             | \$ 4.89          |


In [91]:
# Top Spenders (SN) #result
agg_dict = { 'SN': ['nunique'],'Price': ['mean', 'sum' ,'count']  }
groupbytype_SNDemographics1 = df.groupby(['SN'])
Result_SNDemographics1 = groupbytype_SNDemographics1.agg(agg_dict)
Result_SNDemographics1.columns = ["_".join(x) for x in Result_SNDemographics1.columns.ravel()]
Result_SNDemographics1 = Result_SNDemographics1.rename(columns={'SN_nunique' : 'Number of Players' , 'Price_count' : 'Purchase Count','Price_sum' : 'Total Purchase Value','Price_mean' : 'Average Purchase Price' })

Result_SNDemographics1.drop('Number of Players', axis=1 ,  inplace=True)

Result_SNDemographicssorted=Result_SNDemographics1.sort_values(['Total Purchase Value'], ascending= False).head(5)

Result_SNDemographicssorted["Total Purchase Value"] = Result_SNDemographicssorted["Total Purchase Value"].map("$ {:,.2f}".format)
Result_SNDemographicssorted["Average Purchase Price"] = Result_SNDemographicssorted["Average Purchase Price"].map("$ {:,.2f}".format)

Result_SNDemographicssorted





| SN          | Average Purchase Price | Total Purchase Value | Purchase Count |
|-------------|------------------------|----------------------|----------------|
| Undirrala66 | \$ 3.41                | \$ 17.06             | 5              |
| Saedue76    | \$ 3.39                | \$ 13.56             | 4              |
| Mindimnya67 | \$ 3.18                | \$ 12.74             | 4              |
| Haellysu29  | \$ 4.24                | \$ 12.73             | 3              |
| Eoda93      | \$ 3.86                | \$ 11.58             | 3              |


In [92]:
# Most Popular Items #result

agg_dict = {'Price': ['max', 'sum' ,'count']    }
groupbytype_ItemDemographics1= df.groupby(['Item ID','Item Name'])
Result_ItemDemographics1= groupbytype_ItemDemographics1.agg(agg_dict) 
Result_ItemDemographics1.columns = ["_".join(x) for x in Result_ItemDemographics1.columns.ravel()]

Result_ItemDemographics1 = Result_ItemDemographics1.rename(columns={'Price_count' : 'Purchase Count','Price_sum' : 'Total Purchase Value','Price_max' : 'Item Price' })
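# Recompute the total as count x item price; this equals the summed price only if each item sold at a single price
Result_ItemDemographics1['Total Purchase Value'] = Result_ItemDemographics1['Purchase Count'] * Result_ItemDemographics1['Item Price']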

Result_ItemDemographicssorted=Result_ItemDemographics1.sort_values(['Purchase Count'], ascending= False).head(5)

Result_ItemDemographicssorted["Total Purchase Value"] = Result_ItemDemographicssorted["Total Purchase Value"].map("$ {:,.2f}".format)
Result_ItemDemographicssorted["Item Price"] = Result_ItemDemographicssorted["Item Price"].map("$ {:,.2f}".format)

Result_ItemDemographicssorted

| Item ID | Item Name                            | Item Price | Total Purchase Value | Purchase Count |
|---------|--------------------------------------|------------|----------------------|----------------|
| 39      | Betrayal, Whisper of Grieving Widows | \$ 2.35    | \$ 25.85             | 11             |
| 84      | Arcane Gem                           | \$ 2.23    | \$ 24.53             | 11             |
| 31      | Trickster                            | \$ 2.07    | \$ 18.63             | 9              |
| 175     | Woeful Adamantite Claymore           | \$ 1.24    | \$ 11.16             | 9              |
| 13      | Serenity                             | \$ 1.49    | \$ 13.41             | 9              |


In [93]:
# Most Profitable Items #result

Result_ItemDemographicssorted2=Result_ItemDemographics1.sort_values(['Total Purchase Value'], ascending= False).head(5)

Result_ItemDemographicssorted2["Total Purchase Value"] = Result_ItemDemographicssorted2["Total Purchase Value"].map("$ {:,.2f}".format)
Result_ItemDemographicssorted2["Item Price"] = Result_ItemDemographicssorted2["Item Price"].map("$ {:,.2f}".format)

Result_ItemDemographicssorted2

| Item ID | Item Name                  | Item Price | Total Purchase Value | Purchase Count |
|---------|----------------------------|------------|----------------------|----------------|
| 34      | Retribution Axe            | \$ 4.14    | \$ 37.26             | 9              |
| 115     | Spectral Diamond Doomblade | \$ 4.25    | \$ 29.75             | 7              |
| 32      | Orenmir                    | \$ 4.95    | \$ 29.70             | 6              |
| 103     | Singed Scalpel             | \$ 4.87    | \$ 29.22             | 6              |
| 107     | Splitter, Foe Of Subtlety  | \$ 3.61    | \$ 28.88             | 8              |
