## Heroes of Pymoli Data Analysis

###### On Gender Ratio
- Of the 576 players, about 84% are male, 14% are female, and 2% are Other / Non-Disclosed.

- On average, female players spend about 18 cents more per purchase than male players (\$3.20 vs. \$3.02). Further research should consider whether growing the female playerbase could return a higher profit on item sales.

###### Comparing Age and Gender Demographics
- We could test the gender-ratio points above by looking at our largest age demographic, 20-24. A deeper dive could find the gender ratio within that demographic. That would show whether female players there spend significantly more on average, or whether they are even the majority of that demographic.

- It is not safe to assume that the largest gender demographic is also the majority in the largest age demographic. The gender ratio of spenders within each age demographic is therefore worth examining for another perspective on the dataset.
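A cross-tabulation of gender against age group is one way to start on this. The sketch below is minimal and rests on assumptions. It uses the `SN`, `Gender`, `Age Group`, and `Price` columns that this notebook builds later (`Age Group` only exists after the age-binning step). The rows are hypothetical, not taken from `purchase_data.csv`.

```python
import pandas as pd

# Hypothetical rows standing in for purchase_data.csv after age binning
df = pd.DataFrame({
    "SN": ["a1", "a1", "b2", "c3", "d4", "e5"],
    "Gender": ["Female", "Female", "Male", "Male", "Female", "Male"],
    "Age Group": ["20-24", "20-24", "20-24", "15-19", "15-19", "20-24"],
    "Price": [4.00, 2.00, 3.00, 1.50, 3.50, 2.50],
})

cells = df.groupby(["Age Group", "Gender"])

# Average purchase price for each (age group, gender) pair, one column per gender
avg_price = cells["Price"].mean().unstack()

# Distinct players in each pair, to see which gender makes up most of an age group
player_counts = cells["SN"].nunique().unstack()

print(avg_price.round(2))
print(player_counts)
```

On the real data, `Age Group` is a categorical column. Passing `observed=True` to `groupby` keeps empty age/gender combinations out of the result.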

###### The 15-19 and 25-29 Age Demographics
- Interestingly, the (most likely) non-working 15-19 demographic spends more than the working 25-29 demographic, both per purchase (\$3.04 vs. \$2.90) and in total (\$412.89 vs. \$293.00). We could assume that the 20-24 demographic between them spends the most simply because it has so many players. However, the two demographics on either side of that peak are of comparable size (107 and 77 players).
- Despite that similarity, the 25-29 demographic, which should be working and therefore have money to spend on itself, spends less than 15-19 year olds, who are likely either earning teenage wages or using their parents' credit cards.
- So while a deeper look at these two demographics may not help our Pymoli analysis in and of itself, it could help with other projects. If other spending datasets use similar demographics, we could analyze these groups' spending habits there to see whether any solid conclusions hold, especially on how to get the 20-24 demographic to spend more.
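The comparison above mixes group size with spending habits. One way to separate them is to split a group's total spend into three factors: number of players, purchases per player, and average price per purchase. The sketch below is minimal and uses hypothetical data. It assumes the `SN`, `Age Group`, and `Price` columns used in this notebook.

```python
import pandas as pd

# Hypothetical purchases for two age groups (illustrative values only)
df = pd.DataFrame({
    "SN": ["t1", "t1", "t2", "t3", "y1", "y2"],
    "Age Group": ["15-19", "15-19", "15-19", "15-19", "25-29", "25-29"],
    "Price": [3.0, 3.5, 2.5, 3.0, 2.0, 4.0],
})

g = df.groupby("Age Group")
summary = pd.DataFrame({
    "Players": g["SN"].nunique(),
    "Purchases": g["Price"].count(),
    "Average Price": g["Price"].mean(),
    "Total": g["Price"].sum(),
})
summary["Purchases per Player"] = summary["Purchases"] / summary["Players"]
summary["Spend per Player"] = summary["Total"] / summary["Players"]

# Identity check: players x purchases per player x average price == total
rebuilt = summary["Players"] * summary["Purchases per Player"] * summary["Average Price"]
assert ((rebuilt - summary["Total"]).abs() < 1e-9).all()

print(summary.round(2))
```

If one group's higher total comes mostly from the "Players" factor, the difference is about audience size rather than spending habits.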


###### On Items Analysis
- The game's most valuable item is "Oathbreaker, Last Hope of the Breaking Storm", which has both the highest total purchase value and the most purchases across the whole playerbase.

- The average sale price is around \$3, so it is interesting that both of our top-5 item lists consist mostly of items priced over \$4.

- However, it's not safe to assume that items over \$4 are the most popular, as "Pursuit, Cudgel of Necromancy" sells for \$1.02 and is one of the top 5 most popular items.
   
- It's useful to see which items had the most sales or purchases, but we would need more data on these items to pinpoint what drives their sales. For example, item use rates by player count and by playtime could be valuable metrics.
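One quick way to check whether higher-priced items tend to be more popular is a rank correlation between item price and purchase count. The sketch below is minimal. It computes Spearman's rank correlation as the Pearson correlation of the ranks, which keeps it to plain pandas. The item values are hypothetical, not the real item table.

```python
import pandas as pd

# Hypothetical item-level summary: one row per item
items = pd.DataFrame({
    "Item Price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Purchase Count": [2, 4, 3, 8, 9],
})

# Spearman rank correlation = Pearson correlation of the ranks
rho = items["Item Price"].rank().corr(items["Purchase Count"].rank())
print(round(rho, 3))  # 0.9
```

A value near 1 would suggest pricier items sell more often. A value near 0 would match the "Pursuit, Cudgel of Necromancy" counterexample above. With only 183 items, any such correlation is suggestive at best.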

###### Conclusions
- Look into ways Heroes of Pymoli or similar future projects can further appeal to the female playerbase while maintaining its male playerbase.
- Compare the age and gender demographics together, for example to find which gender spends the most in each age group, to see how we can better appeal to the female playerbase.
- Note that the 25-29 young-adult working demographic seems to spend less on average than the two largely school-aged demographics before it (15-19 and 20-24), and check whether this pattern holds in data analyses for similar games.
- Find ways to better understand why the top-selling and most popular items are so popular, so that future items can be just as appealing or better. Also check whether any of these items are hurting the game experience and therefore driving spending away from the game.

```python
# Import dependencies
import pandas as pd
```

```python
# Path to the purchase data file
file = "Resources/purchase_data.csv"
```

```python
# Read the file with pandas
df = pd.read_csv(file)
# df.head()
```

```python
# List all columns
df.columns
```

```
Index(['Purchase ID', 'SN', 'Age', 'Gender', 'Item ID', 'Item Name', 'Price'], dtype='object')
```

```python
# Identify any incomplete rows
# df.count()
```

```python
# Check for repeated SNs before counting players
# df['SN'].value_counts().head()
```

## Player Count

```python
# Count the unique screen names (SN) to get the number of players
players = df["SN"].nunique()
player_count = pd.DataFrame({"Player Count": players}, index=[0])
player_count
```

|   | Player Count |
|---|---|
| 0 | 576 |


## Purchasing Analysis (total)

```python
# Compute the number of unique items, average price, number of purchases, and total revenue
uniqueitems = len(df['Item ID'].unique())
averageprice = df['Price'].mean()
purchases = len(df['Item ID'])
revenue = df['Price'].sum()

# Create a new one-row dataframe with those four columns
purchase_analysis = pd.DataFrame({"Number of Unique Items": uniqueitems,
                                 "Average Price": averageprice,
                                 "Number of Purchases": purchases,
                                 "Total Revenue": revenue}, index=[0])
purchase_analysis["Average Price"] = purchase_analysis["Average Price"].map("${:,.2f}".format)
purchase_analysis
```

|   | Average Price | Number of Purchases | Number of Unique Items | Total Revenue |
|---|---|---|---|---|
| 0 | \$3.05 | 780 | 183 | 2379.77 |


## Gender Demographics

```python
# Create a dataframe with one row per player (no duplicate SNs)
df_unique = df.drop_duplicates(['SN'])
```

```python
# Count the players in each gender
df_unique['Gender'].value_counts()
```

```
Male                     484
Female                    81
Other / Non-Disclosed     11
Name: Gender, dtype: int64
```

```python
# Group the unique-player dataframe by gender
gender_group = df_unique.groupby(['Gender'])

gendercount = gender_group["Gender"].count()
genderpercent = (gendercount * 100) / players

gender_table = pd.DataFrame({'Percentage of Players': genderpercent.map("{:,.2f}".format),
                             'Total Count': gendercount})
gender_table
```

| Gender | Percentage of Players | Total Count |
|---|---|---|
| Female | 14.06 | 81 |
| Male | 84.03 | 484 |
| Other / Non-Disclosed | 1.91 | 11 |
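As an alternative, `value_counts(normalize=True)` returns shares directly, so the percentages need no separate division. The sketch below uses a hypothetical de-duplicated player table. One caveat: `normalize=True` divides by the number of non-missing `Gender` values. That matches dividing by `players` only if no player's gender is missing.

```python
import pandas as pd

# Hypothetical de-duplicated player table (one row per SN)
players_df = pd.DataFrame({
    "SN": ["a", "b", "c", "d"],
    "Gender": ["Male", "Male", "Female", "Male"],
})

counts = players_df["Gender"].value_counts()
percent = players_df["Gender"].value_counts(normalize=True) * 100

gender_table = pd.DataFrame({"Percentage of Players": percent.round(2),
                             "Total Count": counts})
print(gender_table)
```

Keeping the percentages numeric (rounded rather than string-formatted) also lets the table be sorted or plotted later.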


## Purchasing Analysis (Gender)

```python
# Group the original dataframe (all purchases) by gender
gender_purchase = df.groupby(['Gender'])
# gender_purchase.head(3)
```

```python
# Compute "Purchase Count", "Average Purchase Price", "Total Purchase Value", and "Normalized Totals"
genpurchase = gender_purchase["Gender"].count()
genavg = gender_purchase["Price"].mean()
gentot = gender_purchase["Price"].sum()
# Total divided by purchase count; this equals the average purchase price,
# which is why those two columns are identical below
gennorm = gentot / genpurchase

gender_analysis = pd.DataFrame({"Purchase Count": genpurchase,
                               "Average Purchase Price": genavg.map("${:,.2f}".format),
                               "Total Purchase Value": gentot.map("${:,.2f}".format),
                               "Normalized Totals": gennorm.map("${:,.2f}".format)})
gender_analysis
```

| Gender | Average Purchase Price | Normalized Totals | Purchase Count | Total Purchase Value |
|---|---|---|---|---|
| Female | \$3.20 | \$3.20 | 113 | \$361.94 |
| Male | \$3.02 | \$3.02 | 652 | \$1,967.64 |
| Other / Non-Disclosed | \$3.35 | \$3.35 | 15 | \$50.19 |
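Dividing the total by the purchase count reproduces the average purchase price. The "Normalized Totals" column above therefore adds no new information. One common reading of "normalized" is total purchase value per distinct player, and that reading is an assumption, not something the notebook states. The sketch below shows this per-player reading with hypothetical rows and the notebook's `SN`, `Gender`, and `Price` columns.

```python
import pandas as pd

# Hypothetical purchase rows
df = pd.DataFrame({
    "SN": ["f1", "f1", "f2", "m1", "m2", "m3"],
    "Gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "Price": [4.0, 2.0, 3.0, 3.0, 2.0, 1.0],
})

by_gender = df.groupby("Gender")
total = by_gender["Price"].sum()
unique_players = by_gender["SN"].nunique()

# Spend per player: total purchase value divided by distinct players, not by purchases
per_player = total / unique_players
print(per_player.map("${:,.2f}".format))
```

Here the per-player figure differs from the per-purchase average whenever players in a group buy more than one item.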


## Age Demographics

```python
# Create the age-group bins (each bin includes its right edge)
# Earlier bins, written before the updated Resources files and example were provided:
# bins = [0, 9, 14, 19, 24, 29, 34, 39, 9999]
bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]

# Labels for each bin, in order
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
```
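Another way to write these bins is with whole-number edges and `right=False`. That makes each bin half-open, `[0, 10)`, `[10, 15)`, and so on, so the edges read the same as the labels. An age of 0 also lands in `<10`, whereas the `(0, 9.90]` bin above would leave it unassigned. The sketch below uses hypothetical ages.

```python
import numpy as np
import pandas as pd

ages = pd.Series([7, 10, 14, 15, 24, 25, 39, 40, 45])

# Whole-number edges with right=False give half-open bins [0, 10), [10, 15), ..., [40, inf)
bins = [0, 10, 15, 20, 25, 30, 35, 40, np.inf]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

age_groups = pd.cut(ages, bins, labels=group_names, right=False)
print(age_groups.tolist())
```

For the integer ages in this dataset, both versions should assign every player the same group.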

```python
# Work on an explicit copy so adding a column does not raise SettingWithCopyWarning
df_unique = df_unique.copy()
# Use the unique-player dataframe to assign each player an age group
df_unique["Age Group"] = pd.cut(df_unique["Age"], bins, labels=group_names)
# df_unique.head()
```


```python
# Group the unique-player dataframe by age group
age_group = df_unique.groupby(['Age Group'])

agecount = age_group['Age Group'].count()
agepercent = (agecount * 100) / players

age_table = pd.DataFrame({'Percentage of Players': agepercent.map("{:,.2f}".format),
                          'Total Count': agecount})
age_table
```

| Age Group | Percentage of Players | Total Count |
|---|---|---|
| <10 | 2.95 | 17 |
| 10-14 | 3.82 | 22 |
| 15-19 | 18.58 | 107 |
| 20-24 | 44.79 | 258 |
| 25-29 | 13.37 | 77 |
| 30-34 | 9.03 | 52 |
| 35-39 | 5.38 | 31 |
| 40+ | 2.08 | 12 |


## Purchasing Analysis (Age)

```python
# Apply the same bins to the unfiltered dataframe (all purchases)
df["Age Group"] = pd.cut(df["Age"], bins, labels=group_names)
# df.head()
# Group by age group
age_purchase = df.groupby(['Age Group'])
# age_purchase.head()
```

```python
# Compute "Purchase Count", "Average Purchase Price", "Total Purchase Value", and "Normalized Totals"
agepurch = age_purchase["Age Group"].count()
ageavg = age_purchase["Price"].mean()
agetot = age_purchase["Price"].sum()
# Total divided by purchase count; as with gender, this equals the average purchase price
agenorm = agetot / agepurch

age_analysis = pd.DataFrame({"Purchase Count": agepurch,
                               "Average Purchase Price": ageavg.map("${:,.2f}".format),
                               "Total Purchase Value": agetot.map("${:,.2f}".format),
                               "Normalized Totals": agenorm.map("${:,.2f}".format)})
age_analysis
```

| Age Group | Average Purchase Price | Normalized Totals | Purchase Count | Total Purchase Value |
|---|---|---|---|---|
| <10 | \$3.35 | \$3.35 | 23 | \$77.13 |
| 10-14 | \$2.96 | \$2.96 | 28 | \$82.78 |
| 15-19 | \$3.04 | \$3.04 | 136 | \$412.89 |
| 20-24 | \$3.05 | \$3.05 | 365 | \$1,114.06 |
| 25-29 | \$2.90 | \$2.90 | 101 | \$293.00 |
| 30-34 | \$2.93 | \$2.93 | 73 | \$214.00 |
| 35-39 | \$3.60 | \$3.60 | 41 | \$147.67 |
| 40+ | \$2.94 | \$2.94 | 13 | \$38.24 |


## Top Spenders

```python
# Group all purchases by player (SN)
spenders = df.groupby(['SN'])

userspurch = spenders["SN"].count()
usersavg = spenders["Price"].mean()
userstot = spenders["Price"].sum()

# Build the table and sort players by number of purchases
top_spenders = pd.DataFrame({"Purchase Count": userspurch,
                               "Average Purchase Price": usersavg.map("${:,.2f}".format),
                               "Total Purchase Value": userstot.map("${:,.2f}".format)}).sort_values(["Purchase Count"], ascending=False)
top_spenders.head()
```

| SN | Average Purchase Price | Purchase Count | Total Purchase Value |
|---|---|---|---|
| Lisosia93 | \$3.79 | 5 | \$18.96 |
| Iral74 | \$3.40 | 4 | \$13.62 |
| Idastidru52 | \$3.86 | 4 | \$15.45 |
| Asur53 | \$2.48 | 3 | \$7.44 |
| Inguron55 | \$3.70 | 3 | \$11.11 |
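The table above ranks players by purchase count. Ranking by total spend instead works only while that column is still numeric, so the sort has to happen before formatting. The sketch below is minimal and uses hypothetical rows. It aggregates in one `agg` call, sorts on the numeric total, and formats a display copy afterwards.

```python
import pandas as pd

# Hypothetical purchases
df = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "c", "c"],
    "Price": [9.00, 9.50, 10.00, 1.00, 1.00, 1.00],
})

spenders = df.groupby("SN")["Price"].agg(["count", "mean", "sum"])
spenders.columns = ["Purchase Count", "Average Purchase Price", "Total Purchase Value"]

# Sort on the numeric total first, then format the currency columns for display
top = spenders.sort_values("Total Purchase Value", ascending=False)
display_top = top.copy()
for col in ["Average Purchase Price", "Total Purchase Value"]:
    display_top[col] = display_top[col].map("${:,.2f}".format)
print(display_top.head())
```

In this toy data, sorting by count puts player `c` first, but sorting by total puts `a` first. The two orderings can disagree on the real data too.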


## Most Popular Items

```python
# Group purchases by item to build the item list
favitems = df.groupby(['Item ID','Item Name'])
# favitems.head()
```

```python
# Purchase count, price, and total purchase value for each item
favpurch = favitems['Price'].count()
# Item price taken as the mean of its purchase prices
# (equal to the listed price if an item is always sold at one price)
favprice = favitems['Price'].mean()
favtot = favitems['Price'].sum()

itemsales = pd.DataFrame({"Purchase Count": favpurch,
                          "Item Price": favprice.map("${:,.2f}".format),
                          "Total Purchase Value": favtot})

popular_items = itemsales.sort_values("Purchase Count", ascending=False)
# Format Total Purchase Value only on this sorted copy, so itemsales keeps a numeric column for later sorting
popular_items["Total Purchase Value"] = popular_items["Total Purchase Value"].map("${:,.2f}".format)

popular_items.head()
```

| Item ID | Item Name | Item Price | Purchase Count | Total Purchase Value |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | \$4.23 | 12 | \$50.76 |
| 145 | Fiery Glass Crusader | \$4.58 | 9 | \$41.22 |
| 108 | Extraction, Quickblade Of Trembling Hands | \$3.53 | 9 | \$31.77 |
| 82 | Nirvana | \$4.90 | 9 | \$44.10 |
| 19 | Pursuit, Cudgel of Necromancy | \$1.02 | 8 | \$8.16 |
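Using the mean as the item price only makes sense if each item is always sold at one price. That can be checked with `nunique()` per item. When the check passes, `first()` reads the price directly. The sketch below uses hypothetical rows, with one item deliberately sold at two prices.

```python
import pandas as pd

# Hypothetical purchase rows; item 2 was sold at two different prices
df = pd.DataFrame({
    "Item ID": [1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [4.00, 4.00, 1.00, 1.50, 2.00],
})

items = df.groupby(["Item ID", "Item Name"])["Price"]

# Items whose purchases were not all at one price
price_variants = items.nunique()
inconsistent = price_variants[price_variants > 1]
print(inconsistent)

# When every item has a single price, first() and mean() agree
item_price = items.first()
```

If `inconsistent` comes back empty on the real data, `mean()` and `first()` give the same "Item Price" column.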


## Most Profitable Items

```python
# Double-check that Total Purchase Value is still numeric
itemsales.dtypes
```

```
Item Price               object
Purchase Count            int64
Total Purchase Value    float64
dtype: object
```

```python
# Sort by the numeric Total Purchase Value, then format it for display
profit_items = itemsales.sort_values("Total Purchase Value", ascending=False)
profit_items["Total Purchase Value"] = profit_items["Total Purchase Value"].map("${:,.2f}".format)
profit_items.head()
```

| Item ID | Item Name | Item Price | Purchase Count | Total Purchase Value |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | \$4.23 | 12 | \$50.76 |
| 82 | Nirvana | \$4.90 | 9 | \$44.10 |
| 145 | Fiery Glass Crusader | \$4.58 | 9 | \$41.22 |
| 92 | Final Critic | \$4.88 | 8 | \$39.04 |
| 103 | Singed Scalpel | \$4.35 | 8 | \$34.80 |
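The "format after sorting" pattern in the last two cells matters because formatted currency values are strings. Strings sort character by character, so `"$9.00"` ranks above `"$50.76"`. The sketch below uses hypothetical totals and shows the difference.

```python
import pandas as pd

totals = pd.Series([9.00, 50.76, 10.50], index=["A", "B", "C"])
formatted = totals.map("${:,.2f}".format)

# Sorting formatted strings compares characters, so "$9.00" ranks above "$50.76"
print(formatted.sort_values(ascending=False).index.tolist())  # ['A', 'B', 'C']
# Sorting the numeric values gives the intended order
print(totals.sort_values(ascending=False).index.tolist())     # ['B', 'C', 'A']
```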
