### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Our peak age demographic is 20-24 (44.8%), followed by 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [77]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# Raw data file
file_to_load = "Resources/purchase_data.csv"

# Read purchasing file and store into pandas data frame
pdata = pd.read_csv(file_to_load)
pdata.dtypes

Purchase ID      int64
SN              object
Age              int64
Gender          object
Item ID          int64
Item Name       object
Price          float64
dtype: object

## Player Count

* Display the total number of players


In [78]:
# Each player has one screen name (SN), so the number of unique SNs is the player count
total_player_num = pdata["SN"].nunique()
total_player = pd.DataFrame({"Total Player": [total_player_num]})
print(total_player)

# List the unique player names
uni_names = pd.DataFrame({"Player names : ": pdata["SN"].unique()})
print(uni_names)

   Total Player
0           576
    Player names : 
0           Lisim78
1       Lisovynya38
2        Ithergue48
3     Chamassasya86
4         Iskosia90
5           Yalae81
6         Itheria73
7       Iskjaskst81
8         Undjask33
9       Chanosian48
10        Inguron55
11     Haisrisuir60
12     Saelaephos52
13      Assjaskan73
14      Saesrideu94
15        Lisassa64
16        Lisirra25
17        Zontibe81
18        Reunasu60
19        Chamalo71
20     Iathenudil29
21    Phiarithdeu40
22     Siarithria38
23         Eyrian71
24          Siala43
25        Lisirra87
26       Lirtossa84
27          Eusri44
28           Aela59
29          Tyida79
..              ...
546    Chanosiaya39
547       Assylla81
548       Aidaira26
549        Eudanu84
550      Chamiman85
551     Tyialisti80
552       Marundi65
553         Eusur90
554   Mindirranya33
555    Phiallylis33
556          Isty55
557     Frichilsa31
558      Chanista95
559      Aellyria80
560   Rastynusuir31
561        Iljask75
562     

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [79]:
# Number of distinct items sold
purchase = pdata["Item ID"].nunique()

# Average price across all purchases, rounded to cents
price = np.round(pdata["Price"].mean(), decimals=2)

# Each Purchase ID identifies one purchase
purchase_number = pdata["Purchase ID"].nunique()

# Total revenue across all purchases
revenue = pdata["Price"].sum()

total_analysis = pd.DataFrame({"Number of Unique Items": [purchase],
                               "Average Price": [price],
                               "Number of Purchase": [purchase_number],
                               "Total Revenue": [revenue]})
print(total_analysis)

   Number of Unique Items  Average Price  Number of Purchase  Total Revenue
0                     183           3.05                 780        2379.77
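
For the optional formatting step, one possible approach is to map a format string over the currency columns. The sketch below rebuilds the summary from the figures printed above; `formatted` is an illustrative name, and the thousands separator is a presentation choice rather than something the instructions require.

```python
import pandas as pd

# Summary values copied from the printed output above
summary = pd.DataFrame({
    "Number of Unique Items": [183],
    "Average Price": [3.05],
    "Number of Purchase": [780],
    "Total Revenue": [2379.77],
})

# Format a copy so the numeric frame stays usable for further calculations
formatted = summary.copy()
for col in ["Average Price", "Total Revenue"]:
    formatted[col] = formatted[col].map("${:,.2f}".format)

print(formatted)
```

Formatting a copy keeps the original numeric columns available, since the formatted columns hold strings and can no longer be summed or averaged.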


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [80]:
# Each row of pdata is one purchase, so this counts purchase rows per gender
# (sorted largest first); a player with several purchases appears once per purchase
gender = pdata["Gender"].value_counts()

# Express each count as a share of all purchase rows so the shares sum to 100
total_count = pd.DataFrame({
    "Total Counts": gender,
    "Percentage of Purchases": (gender / gender.sum() * 100).round(2),
}).rename_axis(None)
print(total_count)

                       Total Counts  Percentage of Purchases
Male                            652                    83.59
Female                          113                    14.49
Other / Non-Disclosed            15                     1.92
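
Because each row of the purchase file is one purchase, calling `value_counts()` on the raw `Gender` column tallies purchases rather than players. A per-player tally would first keep one row per `SN`. The following is a minimal sketch on made-up data; the `toy` frame, its screen names, and its values are hypothetical and are not taken from `purchase_data.csv`.

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase, so a player can appear twice
toy = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Bob2", "Dee4"],
    "Gender": ["Female", "Female", "Male", "Male", "Male", "Other / Non-Disclosed"],
    "Price": [1.50, 2.00, 3.25, 4.00, 1.25, 2.75],
})

# Keep one row per player before counting, so repeat buyers are counted once
players = toy.drop_duplicates(subset="SN")
player_counts = players["Gender"].value_counts()

gender_demo = pd.DataFrame({
    "Total Count": player_counts,
    "Percentage of Players": (player_counts / player_counts.sum() * 100).round(2),
})
print(gender_demo)
```

Dividing by the sum of the de-duplicated counts guarantees that the percentages add up to 100.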



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [81]:
# Group purchases by gender. groupby sorts its labels alphabetically, so the
# table is built from the labelled results rather than from positions.
by_gender = pdata.groupby("Gender")["Price"]

p_analysis = pd.DataFrame({
    "Purchase Count": by_gender.count(),
    "Average Purchase Price": by_gender.mean(),
    "Total Purchase Value": by_gender.sum(),
}).rename_axis(None)

# Largest group first
p_analysis = p_analysis.sort_values("Purchase Count", ascending=False)

# Format the currency columns with two decimals
for col in ["Average Purchase Price", "Total Purchase Value"]:
    p_analysis[col] = p_analysis[col].map("${:.2f}".format)

print(p_analysis.to_string())

                       Purchase Count Average Purchase Price Total Purchase Value
Male                              652                  $3.02             $1967.64
Female                            113                  $3.20              $361.94
Other / Non-Disclosed              15                  $3.35               $50.19
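
The "avg. purchase total per person" figure is not the same as the average purchase price: it divides each gender's total spend by its number of unique players rather than by its number of purchases. A minimal sketch on made-up data, assuming `SN` identifies a player (the `toy` frame and its values are hypothetical):

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase
toy = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Bob2", "Dee4"],
    "Gender": ["Female", "Female", "Male", "Male", "Male", "Other / Non-Disclosed"],
    "Price": [1.50, 2.00, 3.25, 4.00, 1.25, 2.75],
})

by_gender = toy.groupby("Gender")

total_value = by_gender["Price"].sum()        # total spend per gender
unique_players = by_gender["SN"].nunique()    # distinct players per gender

# Total spend divided by the number of distinct players
avg_total_per_person = (total_value / unique_players).round(2)
print(avg_total_per_person)
```

Here the two male players spend 8.50 in total across three purchases, so the per-person figure (4.25) is higher than the per-purchase average.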


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [82]:
# Establish bins for ages
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Work on a copy so the raw purchase table is left unchanged
ageall = pdata.copy()
ageall["Range"] = pd.cut(ageall["Age"], age_bins, labels=group_names)  # age bin for each purchase row

# Count purchase rows in each bin (a player is counted once per purchase)
showage = ageall.groupby("Range", observed=False)["Age"].count().to_frame("Total Count")

# Share of all purchase rows, rounded to two decimals
showage["percent"] = (showage["Total Count"] / showage["Total Count"].sum() * 100).round(2)
print(showage)

       Total Count  percent
Range                      
<10             23     2.95
10-14           28     3.59
15-19          136    17.44
20-24          365    46.79
25-29          101    12.95
30-34           73     9.36
35-39           41     5.26
40+             13     1.67
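
The bullets above ask for counts and percentages of players, while the purchase file holds one row per purchase. A per-player age table would de-duplicate on `SN` before binning. The sketch below uses made-up data and, as an assumption of this example, integer bin edges with `right=False` so that each bin includes its lower edge (for instance, age 10 falls in "10-14"); the `toy` frame and the `edges` list are hypothetical.

```python
import pandas as pd

# Hypothetical purchase log: repeat buyers appear more than once
toy = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Dee4", "Dee4", "Eve5"],
    "Age": [22, 22, 9, 24, 15, 15, 40],
})

edges = [0, 10, 15, 20, 25, 30, 35, 40, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# One row per player, then assign each player to a left-closed age bin
players = toy.drop_duplicates(subset="SN").copy()
players["Range"] = pd.cut(players["Age"], bins=edges, labels=labels, right=False)

# size() with observed=False keeps empty bins in the table
age_counts = players.groupby("Range", observed=False).size()

age_demo = pd.DataFrame({
    "Total Count": age_counts,
    "Percentage of Players": (age_counts / len(players) * 100).round(2),
})
print(age_demo)
```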


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [83]:
# Total spend and number of purchases in each age bin
ageana = ageall.groupby("Range", observed=False)["Price"].agg(["sum", "count"])
ageana = ageana.rename(columns={"sum": "Total Purchase Price", "count": "Purchase Count"})

# Average price per purchase: total spend divided by number of purchases
ageana["Average Purchase Price"] = ageana["Total Purchase Price"] / ageana["Purchase Count"]

# Format the currency columns with two decimals
for col in ["Total Purchase Price", "Average Purchase Price"]:
    ageana[col] = ageana[col].map("${:.2f}".format)
print(ageana)

      Total Purchase Price  Purchase Count Average Purchase Price
Range                                                            
<10                 $77.13              23                  $3.35
10-14               $82.78              28                  $2.96
15-19              $412.89             136                  $3.04
20-24             $1114.06             365                  $3.05
25-29              $293.00             101                  $2.90
30-34              $214.00              73                  $2.93
35-39              $147.67              41                  $3.60
40+                 $38.24              13                  $2.94

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [84]:
# Total spend and number of purchases per player (SN)
spend_result = pdata.groupby("SN")["Price"].agg(["sum", "count"])
spend_result = spend_result.rename(columns={"sum": "Total Purchase Value", "count": "Purchase Count"})

# Keep the five biggest spenders, sorted by total spend in descending order
spend_result = spend_result.nlargest(5, "Total Purchase Value")

# Round the total, then derive the average price per purchase from it
spend_result["Total Purchase Value"] = spend_result["Total Purchase Value"].round(2)
spend_result["Average Purchase Price"] = (spend_result["Total Purchase Value"] / spend_result["Purchase Count"]).round(2)

# Format the currency columns with two decimals
for col in ["Total Purchase Value", "Average Purchase Price"]:
    spend_result[col] = spend_result[col].map("${:.2f}".format)
print(spend_result)

            Total Purchase Value  Purchase Count Average Purchase Price
SN                                                                     
Lisosia93                 $18.96               5                  $3.79
Idastidru52               $15.45               4                  $3.86
Chamjask73                $13.83               3                  $4.61
Iral74                    $13.62               4                  $3.40
Iskadarya95               $13.10               3                  $4.37


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
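
The steps above could be sketched as follows. This is a minimal example on made-up data, not the notebook's own solution: the `toy_items` frame and its item names are hypothetical, and taking the mean price per group as the "item price" assumes each item is sold at a single price.

```python
import pandas as pd

# Hypothetical purchases: one row per purchase
toy_items = pd.DataFrame({
    "Item ID": [1, 1, 2, 3, 3, 3],
    "Item Name": ["Toy Sword", "Toy Sword", "Toy Shield", "Toy Bow", "Toy Bow", "Toy Bow"],
    "Price": [2.50, 2.50, 4.00, 1.25, 1.25, 1.25],
})

# Group by item and aggregate the price column three ways
item_summary = (
    toy_items.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={"count": "Purchase Count",
                     "mean": "Item Price",
                     "sum": "Total Purchase Value"})
)

# Most popular items first
most_popular = item_summary.sort_values("Purchase Count", ascending=False)
print(most_popular.head())
```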



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
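
One way to carry out these steps is to reuse the per-item summary and sort on the total instead of the count. The sketch below is self-contained and uses made-up data; the `toy_items` frame and its item names are hypothetical.

```python
import pandas as pd

# Hypothetical purchases: one row per purchase
toy_items = pd.DataFrame({
    "Item ID": [1, 1, 2, 3, 3, 3],
    "Item Name": ["Toy Sword", "Toy Sword", "Toy Shield", "Toy Bow", "Toy Bow", "Toy Bow"],
    "Price": [2.50, 2.50, 4.00, 1.25, 1.25, 1.25],
})

# Per-item purchase count, price, and total value
item_summary = (
    toy_items.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={"count": "Purchase Count",
                     "mean": "Item Price",
                     "sum": "Total Purchase Value"})
)

# Most profitable items first
most_profitable = item_summary.sort_values("Total Purchase Value", ascending=False)
print(most_profitable.head())
```

In this toy data the most purchased item (three sales at 1.25) is the least profitable, which is why the two rankings are worth showing separately.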

