# Potential Player
## Objective
Identify users who do not currently play any Nexon game but have a high likelihood of playing one.

## Methodology
Nexon has 3 games, each with its own player profile. We will try to identify players who share a similar profile, such as similar purchasing and gaming behavior.  
The assumption is that if, for example, most MapleStory players spend only about 30 hours a week playing, then the nature of the game or its community may suit people who can devote about 30 hours a week to gaming. People whose weekly play time differs substantially are therefore likely a poorer fit for MapleStory and less likely to play it.  
Here we will create user profiles (features) and build a binary classification model for each Nexon game (MapleStory, Mabinogi, and Vindictus) to estimate how likely a player is to play it. Our targets are players with a high predicted probability who have not played a Nexon game before.
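
As a rough illustration of the scoring idea above, the sketch below fits a binary classifier for one game and ranks the users who do not play it by predicted probability. Everything in it is an assumption made for illustration: the feature names (`total_hours`, `pct_mmorpg`), the toy values, the label column `plays_maplestory`, and the choice of scikit-learn's `LogisticRegression`. None of these come from the dataset or from a final modeling decision.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical per-user features and label; names and values are illustrative only.
features = pd.DataFrame(
    {
        "total_hours": [30.0, 28.0, 35.0, 2.0, 400.0, 31.0, 1.5, 29.0],
        "pct_mmorpg": [0.90, 0.80, 0.85, 0.10, 0.05, 0.70, 0.00, 0.75],
        "plays_maplestory": [1, 1, 1, 0, 0, 0, 0, 1],
    },
    index=pd.Index([101, 102, 103, 104, 105, 106, 107, 108], name="id"),
)

feature_cols = ["total_hours", "pct_mmorpg"]
X = features[feature_cols]
y = features["plays_maplestory"]

# One binary classifier per game: 1 = plays the game, 0 = does not.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Score only the users who do not play the game, then rank them.
non_players = features[y == 0]
scores = pd.Series(
    model.predict_proba(non_players[feature_cols])[:, 1],  # P(label == 1)
    index=non_players.index,
    name="p_maplestory",
).sort_values(ascending=False)

print(scores)
```

The users at the top of `scores` would be the candidate "potential players" for that game. In practice the same procedure would be repeated for each of the three games, with the candidate pool restricted to users who play none of them.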

## Steps
1. Label games with their relevant genre to slim down the number of available features
2. Feature generation, such as purchases in genre Y, play time for genre X, % of time playing genre X, # of games played, etc.
3. Model building
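
Steps 1 and 2 can be sketched on toy data in the same long format as the dataset (`id`, `game`, `action`, `value`). The game names, the `genre_of` mapping, and the output column names are hypothetical placeholders; the real genre labels still have to be assigned.

```python
import pandas as pd

# Toy records in the same long format as the dataset: id, game, action, value.
records = pd.DataFrame(
    [
        (1, "Game A", "purchase", 1.0),
        (1, "Game A", "play", 30.0),
        (1, "Game B", "purchase", 1.0),
        (1, "Game B", "play", 10.0),
        (2, "Game B", "purchase", 1.0),
        (2, "Game B", "play", 5.0),
        (2, "Game C", "purchase", 1.0),
    ],
    columns=["id", "game", "action", "value"],
)

# Step 1: hypothetical game-to-genre labels.
genre_of = {"Game A": "MMORPG", "Game B": "FPS", "Game C": "MMORPG"}
records["genre"] = records["game"].map(genre_of)

play = records[records["action"] == "play"]
purchase = records[records["action"] == "purchase"]

# Step 2: hours played per user and genre, and the share of each genre.
hours_by_genre = play.pivot_table(
    index="id", columns="genre", values="value", aggfunc="sum", fill_value=0.0
)
pct_by_genre = hours_by_genre.div(hours_by_genre.sum(axis=1), axis=0)

# Number of purchases per user and genre.
purch_by_genre = purchase.pivot_table(
    index="id", columns="genre", values="value", aggfunc="count", fill_value=0
)

features = pd.concat(
    [
        hours_by_genre.add_prefix("hours_"),
        pct_by_genre.add_prefix("pct_hours_"),
        purch_by_genre.add_prefix("purchased_"),
        play.groupby("id")["game"].nunique().rename("n_games_played"),
        purchase.groupby("id")["game"].nunique().rename("n_games_purchased"),
    ],
    axis=1,
).fillna(0)  # users with purchases but no play records get zeros

print(features)
```

The result has one row per user, which is the shape a classifier in step 3 would expect. Aggregating by genre rather than by individual game keeps the number of columns small, which is the point of step 1.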

## Brainstorm/Notes
1. Look into people who play low-res games (games with earlier publication dates), as they may have hardware limitations
2. Look into people who play MMORPGs exclusively

## Obstacles
1. The data has no timestamps.
    * Unable to distinguish between players who play many hours in a short period and those who play a moderate amount over a long horizon.
    * Unable to tell whether a certain game was played recently or whether a player has moved on to a different genre of games.

In [1]:
import pandas as pd
import utility

In [2]:
# read data
df = pd.read_csv('data/steam-200k.csv', header=None)\
       .rename(columns={0:'id', 1:'game', 2:'action', 3:'value'})\
       .iloc[:, :4] # drop last column
df = df.drop_duplicates()
df.head(3)

Unnamed: 0,id,game,action,value
0,151603712,The Elder Scrolls V Skyrim,purchase,1.0
1,151603712,The Elder Scrolls V Skyrim,play,273.0
2,151603712,Fallout 4,purchase,1.0


## EDA Phase 1
Getting familiar with the dataset without any specific direction

In [3]:
# the actual EDA code is in utility.py
# eda_phase1 is called here to keep the notebook readable
utility.eda_phase1(df)

Number of records:          199293
Number of unique User ID:   12393
Number of games:            5155

	Action Count: 
purchase    128804
play         70489
Name: action, dtype: int64

Number of Purchasing Player: 11350
% of Purchasing Player: 92.0%

	Summary Statistics on Play Time:
count    70489.000000
mean        48.878063
std        229.335236
min          0.100000
25%          1.000000
50%          4.500000
75%         19.100000
max      11754.000000
Name: value, dtype: float64

	Summary Statistics on Number of Games Played:
count    70477.000000
mean        56.972388
std         73.658005
min          1.000000
25%          8.000000
50%         30.000000
75%         78.000000
max        498.000000
Name: game_freq, dtype: float64

	Summary Statistics on Number of Games Purchase:
count    128804.000000
mean        132.569672
std         182.063415
min           1.000000
25%          15.000000
50%          61.000000
75%         164.000000
max        1068.000000
Name: purch_freq, dty