<a href="https://colab.research.google.com/github/mhdSharuk/Data-Science-Bowl-2K19/blob/master/DSB_2019.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Notes

**1) Description of all the columns**





*   **event_id** - Randomly generated unique identifier for the event type. Maps to event_id column in specs table.
*   **game_session** - Randomly generated unique identifier grouping events within a single game or video play session.
*   **timestamp** - Client-generated datetime
*   **event_data** - Semi-structured JSON-formatted string containing the event's parameters. Default fields are: event_count, event_code, and game_time; otherwise fields are determined by the event type.
*   **installation_id** - Randomly generated unique identifier grouping game sessions within a single installed application instance.
*   **event_count** - Incremental counter of events within a game session (starting at 1). Extracted from event_data.
*   **event_code** - Identifier of the event 'class'. Unique per game, but may be duplicated across games. E.g. event code '2000' always identifies the 'Start Game' event for all games. Extracted from event_data.
*   **game_time** - Time in milliseconds since the start of the game session. Extracted from event_data.
*   **title** - Title of the game or video.
*   **type** - Media type of the game or video. Possible values are: 'Game', 'Assessment', 'Activity', 'Clip'.
*   **world** - The section of the application the game or video belongs to. Helpful to identify the educational curriculum goals of the media. Possible values are: 'NONE' (at the app's start screen), 'TREETOPCITY' (Length/Height), 'MAGMAPEAK' (Capacity/Displacement), 'CRYSTALCAVES' (Weight).
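
Since event_count, event_code, and game_time are described as being extracted from event_data, a minimal sketch of that extraction may help. The toy rows and the helper name `parse_event_data` are assumptions made for illustration; the JSON strings are modelled on the samples shown in this notebook.

```python
import json

import pandas as pd

# Toy rows shaped like the event tables (illustrative values only).
events = pd.DataFrame({
    "event_id": ["27253bdc", "b2dba42b"],
    "event_data": [
        '{"event_code": 2000, "event_count": 1}',
        '{"description": "Drag the shovel to the molds to fill them up!", '
        '"event_code": 3010, "event_count": 47, "game_time": 29685}',
    ],
})

DEFAULT_FIELDS = ("event_count", "event_code", "game_time")


def parse_event_data(raw):
    """Hypothetical helper: pull the three default fields out of one JSON string."""
    payload = json.loads(raw)
    # .get() returns None when a field is absent (some Clip events omit game_time).
    return {key: payload.get(key) for key in DEFAULT_FIELDS}


parsed = pd.DataFrame([parse_event_data(raw) for raw in events["event_data"]])
print(parsed)
```

On the full training table a row-by-row `json.loads` is likely to be slow, so this is best read as an illustration of the field layout rather than a tuned pipeline step.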






**2) Group the data to count the number of attempts per installation_id**


*   train_data.groupby(['game_session','installation_id'],as_index =False)['title'].agg({'value_counts'}).rename(columns={'value_counts':'Total_no'}).head()

*   test_data.groupby(['game_session','installation_id'])['title'].agg({'value_counts'}).rename(columns={'value_counts':'Total_no'}).index.get_level_values(2)
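
The same counts can probably be obtained more directly with `groupby(...).size()`. The sketch below runs on a small made-up frame; `toy` and its values are assumptions, and only the column names come from the notes above.

```python
import pandas as pd

# Made-up events: installation "a" has two sessions, "b" has one.
toy = pd.DataFrame({
    "installation_id": ["a", "a", "a", "b", "b"],
    "game_session": ["s1", "s1", "s2", "s3", "s3"],
    "title": ["Chow Time", "Chow Time", "Chow Time",
              "Sandcastle Builder (Activity)", "Sandcastle Builder (Activity)"],
})

# Number of event rows per (installation, session, title) combination.
events_per_session = (
    toy.groupby(["installation_id", "game_session", "title"])
    .size()
    .reset_index(name="Total_no")
)

# Number of distinct sessions each installation played.
sessions_per_installation = toy.groupby("installation_id")["game_session"].nunique()

print(events_per_session)
print(sessions_per_installation)
```

Using `reset_index(name=...)` keeps the grouping keys as ordinary columns, which avoids looking up MultiIndex levels by position.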


**3) Event Codes Meaning**

*   2000 : Start of the game
*   3010 : Voice description of what to do in the game
*   3110 : Starting of game with the voice description in the background
*   4070 : Player starting to play the game
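
One way to keep these notes next to the data is a lookup dictionary applied with `Series.map`. This is only a sketch: the dictionary name `EVENT_CODE_NOTES` is an assumption, the descriptions are the informal ones listed above, and 9999 is a made-up code standing in for anything not covered by the notes.

```python
import pandas as pd

# Informal meanings copied from the notes above (not an official specification).
EVENT_CODE_NOTES = {
    2000: "Start of the game",
    3010: "Voice description of what to do in the game",
    3110: "Starting of game with the voice description in the background",
    4070: "Player starting to play the game",
}

# 9999 is a hypothetical code used to show the fallback label.
codes = pd.Series([2000, 3010, 3110, 4070, 9999], name="event_code")

# Codes missing from the dictionary map to NaN, which is replaced by "unknown".
labels = codes.map(EVENT_CODE_NOTES).fillna("unknown")
print(pd.concat([codes, labels.rename("note")], axis=1))
```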




# Mounting Google Drive

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Importing Required Modules

In [2]:
!pip install catboost



In [0]:
import pandas as pd
pd.set_option('display.max_columns', 10000)
pd.set_option('display.max_rows', 28445)
import numpy as np
import os
import gc
import matplotlib.pyplot as plt
from tqdm import tqdm_notebook
import json
import pprint
import seaborn as sns
from catboost import CatBoostClassifier,CatBoostRegressor,Pool,cv
import xgboost as xgb
import lightgbm as lgb

# Helper Functions

In [0]:
def read_file():
  """Load the competition CSV files from the mounted Google Drive."""
  # Drive is mounted at /content/drive above; use one absolute base path for every file.
  base = '/content/drive/My Drive'
  specs = pd.read_csv(f'{base}/specs.csv')
  train_labels = pd.read_csv(f'{base}/train_labels.csv.zip')
  train = pd.read_csv(f'{base}/train.csv.zip')
  test = pd.read_csv(f'{base}/test.csv.zip')
  sub = pd.read_csv(f'{base}/sample_submission.csv')

  return specs,train_labels,train,test,sub

In [0]:
specs,train_labels,train,test,sub = read_file()

In [0]:
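def get_datetime(df):
  """Add date, month, hour, minute and day_of_week columns derived from timestamp (in place)."""
  df['timestamp'] = pd.to_datetime(df['timestamp'])
  df['date'] = df['timestamp'].dt.date
  df['month'] = df['timestamp'].dt.month
  df['hour'] = df['timestamp'].dt.hour
  df['minute'] = df['timestamp'].dt.minute
  df['day_of_week'] = df['timestamp'].dt.dayofweek  # Monday = 0, Sunday = 6

# Derive the calendar features, then drop the raw timestamp column.
for df in [train,test]:
  get_datetime(df)
  df.pop('timestamp')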

In [0]:
def gametime_to(df):
  """Add game_time (milliseconds) converted to seconds and to minutes (in place)."""
  df['game_time_seconds'] = df['game_time']/1000
  df['game_time_minutes'] = df['game_time']/60000

gametime_to(train)
gametime_to(test)
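
As a quick sanity check of the unit conversion, the sketch below applies the same arithmetic to a few game_time values taken from the first training rows; the frame name `toy` is an assumption.

```python
import pandas as pd

# game_time is in milliseconds: 1 s = 1000 ms and 1 min = 60000 ms.
toy = pd.DataFrame({"game_time": [0, 53, 6972]})
toy["game_time_seconds"] = toy["game_time"] / 1000
toy["game_time_minutes"] = toy["game_time"] / 60000

# 6972 ms should come out as 6.972 s and 0.1162 min.
print(toy)
```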

# Exploratory Data Analysis

In [7]:
specs.head()

Unnamed: 0,event_id,info,args
0,2b9272f4,The end of system-initiated feedback (Correct)...,"[{""name"":""game_time"",""type"":""int"",""info"":""mill..."
1,df4fe8b6,The end of system-initiated feedback (Incorrec...,"[{""name"":""game_time"",""type"":""int"",""info"":""mill..."
2,3babcb9b,The end of system-initiated instruction event ...,"[{""name"":""game_time"",""type"":""int"",""info"":""mill..."
3,7f0836bf,The end of system-initiated instruction event ...,"[{""name"":""game_time"",""type"":""int"",""info"":""mill..."
4,ab3136ba,The end of system-initiated instruction event ...,"[{""name"":""game_time"",""type"":""int"",""info"":""mill..."


In [17]:
pprint.pprint(json.loads(specs['args'][99]))

[{'info': 'millisecond count since start of game',
  'name': 'game_time',
  'type': 'int'},
 {'info': 'the number of the round that is has just finished',
  'name': 'round',
  'type': 'int'},
 {'info': 'list of layers of sand added to the jar from bottom to top of the '
          'jar completed this round:\n'
          '[{“color”: string – color of sand in first layer, “amount”: int – '
          'amount of sand in the first layer}, {“color”: string – color of '
          'sand in second layer, “amount”: int – amount of sand in the second '
          'layer}, etc.]',
  'name': 'jar',
  'type': 'array->object'},
 {'info': 'duration of the round in milliseconds',
  'name': 'duration',
  'type': 'int'},
 {'info': 'session event counter', 'name': 'event_count', 'type': 'int'},
 {'info': 'event class identifier', 'name': 'event_code', 'type': 'int'}]


In [38]:
print(f'Number of rows : {train.shape[0]}, Number of columns : {train.shape[1]}')
print()
train.head(50)

Number of rows : 11341042, Number of columns : 17



Unnamed: 0,event_id,game_session,event_data,installation_id,event_count,event_code,game_time,title,type,world,date,month,hour,minute,day_of_week,game_time_seconds,game_time_minutes
0,27253bdc,45bb1e1b6b50c07b,"{""event_code"": 2000, ""event_count"": 1}",0001e90f,1,2000,0,Welcome to Lost Lagoon!,Clip,NONE,2019-09-06,9,17,53,4,0.0,0.0
1,27253bdc,17eeb7f223665f53,"{""event_code"": 2000, ""event_count"": 1}",0001e90f,1,2000,0,Magma Peak - Level 1,Clip,MAGMAPEAK,2019-09-06,9,17,54,4,0.0,0.0
2,77261ab5,0848ef14a8dc6892,"{""version"":""1.0"",""event_count"":1,""game_time"":0...",0001e90f,1,2000,0,Sandcastle Builder (Activity),Activity,MAGMAPEAK,2019-09-06,9,17,54,4,0.0,0.0
3,b2dba42b,0848ef14a8dc6892,"{""description"":""Let's build a sandcastle! Firs...",0001e90f,2,3010,53,Sandcastle Builder (Activity),Activity,MAGMAPEAK,2019-09-06,9,17,54,4,0.053,0.000883
4,1bb5fbdb,0848ef14a8dc6892,"{""description"":""Let's build a sandcastle! Firs...",0001e90f,3,3110,6972,Sandcastle Builder (Activity),Activity,MAGMAPEAK,2019-09-06,9,17,55,4,6.972,0.1162
5,1325467d,0848ef14a8dc6892,"{""coordinates"":{""x"":583,""y"":605,""stage_width"":...",0001e90f,4,4070,9991,Sandcastle Builder (Activity),Activity,MAGMAPEAK,2019-09-06,9,17,55,4,9.991,0.166517
6,1325467d,0848ef14a8dc6892,"{""coordinates"":{""x"":601,""y"":570,""stage_width"":...",0001e90f,5,4070,10622,Sandcastle Builder (Activity),Activity,MAGMAPEAK,2019-09-06,9,17,55,4,10.622,0.177033
7,1325467d,0848ef14a8dc6892,"{""coordinates"":{""x"":250,""y"":665,""stage_width"":...",0001e90f,6,4070,11255,Sandcastle Builder (Activity),Activity,MAGMAPEAK,2019-09-06,9,17,55,4,11.255,0.187583
8,1325467d,0848ef14a8dc6892,"{""coordinates"":{""x"":279,""y"":629,""stage_width"":...",0001e90f,7,4070,11689,Sandcastle Builder (Activity),Activity,MAGMAPEAK,2019-09-06,9,17,55,4,11.689,0.194817
9,1325467d,0848ef14a8dc6892,"{""coordinates"":{""x"":839,""y"":654,""stage_width"":...",0001e90f,8,4070,12272,Sandcastle Builder (Activity),Activity,MAGMAPEAK,2019-09-06,9,17,55,4,12.272,0.204533


In [42]:
pprint.pprint(json.loads(train['event_data'][48]))

{'description': 'Drag the shovel to the molds to fill them up!',
 'event_code': 3010,
 'event_count': 47,
 'game_time': 29685,
 'identifier': 'Dot_DragShovel',
 'media_type': 'audio',
 'total_duration': 2070}


In [51]:
pprint.pprint(json.loads(train['event_data'][1360]))

{'description': 'These dinosaurs are awfully thirsty. Fill in the clouds to '
                'make it rain. That will fill up the pond with water!',
 'event_code': 3010,
 'event_count': 2,
 'game_time': 67,
 'identifier': 'Buddy_DinosaursAwfullyThirsty,Buddy_FillClouds',
 'media_type': 'audio',
 'total_duration': 6820}


In [0]:
exp = train.iloc[1357:1539,:]

In [0]:
exp.tail()

In [62]:
print(f'Number of rows : {test.shape[0]}, Number of columns : {test.shape[1]}')
print()
test.head()

Number of rows : 1156414, Number of columns : 15



Unnamed: 0,event_id,game_session,event_data,installation_id,event_count,event_code,game_time,title,type,world,date,month,year,day_of_week,hour
0,27253bdc,0ea9ecc81a565215,"{""event_code"": 2000, ""event_count"": 1}",00abaee7,1,2000,0,Welcome to Lost Lagoon!,Clip,NONE,2019-09-10,9,2019,1,16
1,27253bdc,c1ea43d8b8261d27,"{""event_code"": 2000, ""event_count"": 1}",00abaee7,1,2000,0,Magma Peak - Level 1,Clip,MAGMAPEAK,2019-09-10,9,2019,1,16
2,27253bdc,7ed86c6b72e725e2,"{""event_code"": 2000, ""event_count"": 1}",00abaee7,1,2000,0,Magma Peak - Level 2,Clip,MAGMAPEAK,2019-09-10,9,2019,1,16
3,27253bdc,7e516ace50e7fe67,"{""event_code"": 2000, ""event_count"": 1}",00abaee7,1,2000,0,Crystal Caves - Level 1,Clip,CRYSTALCAVES,2019-09-10,9,2019,1,16
4,7d093bf9,a022c3f60ba547e7,"{""version"":""1.0"",""round"":0,""event_count"":1,""ga...",00abaee7,1,2000,0,Chow Time,Game,CRYSTALCAVES,2019-09-10,9,2019,1,16


In [42]:
print(sub['installation_id'].nunique())
print(sub['installation_id'].nunique() == sub['installation_id'].shape[0])

1000
True
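
The uniqueness check above compares `nunique()` with the row count. An equivalent and arguably more direct option is `Series.is_unique`, with `duplicated` to list any repeated ids. The sketch uses made-up ids, so the variable names and values are assumptions.

```python
import pandas as pd

# Made-up installation ids: one clean column and one with a repeat.
clean_ids = pd.Series(["id_a", "id_b", "id_c"], name="installation_id")
dirty_ids = pd.Series(["id_a", "id_b", "id_a"], name="installation_id")

# is_unique is True only when no value appears more than once.
clean_ok = clean_ids.is_unique
dirty_ok = dirty_ids.is_unique

# keep=False marks every occurrence of a repeated value, not just the later ones.
repeated = dirty_ids[dirty_ids.duplicated(keep=False)].unique().tolist()

print(clean_ok, dirty_ok, repeated)
```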


In [63]:
print(test['installation_id'].nunique())
print(train['installation_id'].nunique())

1000
17000


In [64]:
print(train['game_session'].nunique())

303319
