Competitors will be challenged to predict scores on in-game assessments and create an algorithm that will lead to better-designed games and improved learning outcomes.

The intent of the competition is to **use the gameplay data to forecast how many attempts a child will take to pass a given assessment** (an incorrect answer is counted as an attempt).

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import HTML
import warnings

In [2]:
# Read in the data CSV files
train = pd.read_csv('../input/data-science-bowl-2019/train.csv')
train_labels = pd.read_csv('../input/data-science-bowl-2019/train_labels.csv')
test = pd.read_csv('../input/data-science-bowl-2019/test.csv')
specs = pd.read_csv('../input/data-science-bowl-2019/specs.csv')
ss = pd.read_csv('../input/data-science-bowl-2019/sample_submission.csv')
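
train.csv holds over 11 million rows, so loading it with pandas' default dtypes can use a lot of memory. The sketch below shows one possible mitigation, passing explicit `dtype`, `usecols` and `parse_dates` arguments to `pd.read_csv`. The `EVENT_DTYPES` map and the `read_events` helper are hypothetical, and the dtype choices are assumptions rather than anything the competition prescribes.

```python
import io

import pandas as pd

# Hypothetical dtype map for train.csv / test.csv. These choices are assumptions:
# "category" for repeated strings, 32-bit ints for small counters and codes.
EVENT_DTYPES = {
    "event_id": "category",
    "game_session": "category",
    "installation_id": "category",
    "event_count": "int32",
    "event_code": "int32",
    "game_time": "int64",
    "title": "category",
    "type": "category",
    "world": "category",
}


def read_events(source, usecols=None):
    """Read an events CSV with compact dtypes, optionally keeping only `usecols`."""

    def wanted(col):
        return usecols is None or col in usecols

    dtypes = {col: dt for col, dt in EVENT_DTYPES.items() if wanted(col)}
    dates = ["timestamp"] if wanted("timestamp") else None
    return pd.read_csv(source, usecols=usecols, dtype=dtypes, parse_dates=dates)


# Toy input: the first train.csv row from the train.head(2) output in this notebook.
SAMPLE = (
    "event_id,game_session,timestamp,event_data,installation_id,"
    "event_count,event_code,game_time,title,type,world\n"
    "27253bdc,45bb1e1b6b50c07b,2019-09-06T17:53:46.937Z,"
    '"{""event_code"": 2000, ""event_count"": 1}",0001e90f,'
    "1,2000,0,Welcome to Lost Lagoon!,Clip,NONE\n"
)

events = read_events(io.StringIO(SAMPLE))
print(events.dtypes)
```

On the real data this would be called as `read_events('../input/data-science-bowl-2019/train.csv')`. Whether the savings justify the extra code depends on the available RAM.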

**Exploratory Data Analysis**

**train.csv & test.csv**

These are the main data files which contain the gameplay events.

**event_id** - Randomly generated unique identifier for the event type. Maps to event_id column in specs table.

**game_session** - Randomly generated unique identifier grouping events within a single game or video play session.

**timestamp** - Client-generated datetime.

**event_data** - Semi-structured JSON-formatted string containing the event's parameters. Default fields are: event_count, event_code, and game_time; other fields are determined by the event type.

**installation_id** - Randomly generated unique identifier grouping game sessions within a single installed application instance.

**event_count** - Incremental counter of events within a game session (starting at 1). Extracted from event_data.

**event_code** - Identifier of the event 'class'. Unique per game, but may be duplicated across games. For example, event code '2000' always identifies the 'Start Game' event for all games. Extracted from event_data.

**game_time** - Time in milliseconds since the start of the game session. Extracted from event_data.

**title** - Title of the game or video.

**type** - Media type of the game or video. Possible values are: 'Game', 'Assessment', 'Activity', 'Clip'.

**world** - The section of the application the game or video belongs to. Helpful to identify the educational curriculum goals of the media. Possible values are: 'NONE' (at the app's start screen), 'TREETOPCITY' (Length/Height), 'MAGMAPEAK' (Capacity/Displacement), 'CRYSTALCAVES' (Weight).
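
Since event_data is a JSON string, any field beyond the default columns has to be parsed out of it. A minimal sketch with the standard `json` module follows; the `expand_event_data` helper is hypothetical, and the `"correct"` field on the toy row is an illustrative assumption about what an event-type-specific field looks like.

```python
import json

import pandas as pd


def expand_event_data(events: pd.DataFrame, keys) -> pd.DataFrame:
    """Parse each event_data JSON string and return the requested keys as columns.

    A key that an event does not carry comes back as None, because apart from
    event_count, event_code and game_time the fields depend on the event type.
    """
    parsed = events["event_data"].map(json.loads)
    return pd.DataFrame(
        {key: parsed.map(lambda d, k=key: d.get(k)) for key in keys},
        index=events.index,
    )


# Toy events; the "correct" field on the second row is an illustrative assumption.
toy = pd.DataFrame({
    "event_code": [2000, 4100],
    "event_data": [
        '{"event_code": 2000, "event_count": 1, "game_time": 0}',
        '{"event_code": 4100, "event_count": 7, "game_time": 5230, "correct": false}',
    ],
})

fields = expand_event_data(toy, ["event_code", "game_time", "correct"])
# The extracted event_code should agree with the ready-made column.
print((fields["event_code"] == toy["event_code"]).all())
print(fields)
```

Calling `json.loads` on all 11 million train rows is slow, so in practice it is probably worth filtering to the event codes of interest before parsing.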

In [3]:
train.head(2)

|   | event_id | game_session | timestamp | event_data | installation_id | event_count | event_code | game_time | title | type | world |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27253bdc | 45bb1e1b6b50c07b | 2019-09-06T17:53:46.937Z | {"event_code": 2000, "event_count": 1} | 0001e90f | 1 | 2000 | 0 | Welcome to Lost Lagoon! | Clip | NONE |
| 1 | 27253bdc | 17eeb7f223665f53 | 2019-09-06T17:54:17.519Z | {"event_code": 2000, "event_count": 1} | 0001e90f | 1 | 2000 | 0 | Magma Peak - Level 1 | Clip | MAGMAPEAK |


In [4]:
test.head(2)

|   | event_id | game_session | timestamp | event_data | installation_id | event_count | event_code | game_time | title | type | world |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27253bdc | 0ea9ecc81a565215 | 2019-09-10T16:50:24.910Z | {"event_code": 2000, "event_count": 1} | 00abaee7 | 1 | 2000 | 0 | Welcome to Lost Lagoon! | Clip | NONE |
| 1 | 27253bdc | c1ea43d8b8261d27 | 2019-09-10T16:50:55.503Z | {"event_code": 2000, "event_count": 1} | 00abaee7 | 1 | 2000 | 0 | Magma Peak - Level 1 | Clip | MAGMAPEAK |


**specs.csv**

This file gives the specification of the various event types.

**event_id** - Global unique identifier for the event type. Joins to event_id column in events table.

**info** - Description of the event.

**args** - JSON formatted string of event arguments. Each argument contains:

**name** - Argument name.

**type** - Type of the argument (string, int, number, object, array).

**info** - Description of the argument.
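
Because args is a JSON array of name/type/info objects, it can be flattened into one row per (event_id, argument) to see which arguments each event type declares. The `explode_spec_args` helper below is a hypothetical sketch, and the toy argument lists and descriptions are made up for illustration.

```python
import json

import pandas as pd


def explode_spec_args(specs: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (event_id, argument) parsed from the args JSON column."""
    rows = [
        {"event_id": event_id, "name": arg["name"], "type": arg["type"], "info": arg["info"]}
        for event_id, args_json in zip(specs["event_id"], specs["args"])
        for arg in json.loads(args_json)
    ]
    return pd.DataFrame(rows, columns=["event_id", "name", "type", "info"])


# Toy specs rows; the argument lists and descriptions are made up for illustration.
toy_specs = pd.DataFrame({
    "event_id": ["2b9272f4", "df4fe8b6"],
    "info": ["toy event A", "toy event B"],
    "args": [
        '[{"name":"game_time","type":"int","info":"toy description"},'
        ' {"name":"event_count","type":"int","info":"toy description"}]',
        '[{"name":"game_time","type":"int","info":"toy description"}]',
    ],
})

args = explode_spec_args(toy_specs)
print(args)
# How many event types declare each argument name
print(args.groupby("name")["event_id"].nunique())
```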

In [5]:
specs.head(2)

|   | event_id | info | args |
|---|---|---|---|
| 0 | 2b9272f4 | The end of system-initiated feedback (Correct)... | [{"name":"game_time","type":"int","info":"mill... |
| 1 | df4fe8b6 | The end of system-initiated feedback (Incorrec... | [{"name":"game_time","type":"int","info":"mill... |


**train_labels.csv**

This file demonstrates how to compute the ground truth for the assessments in the training set.
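
The sketch below rebuilds num_correct, num_incorrect and accuracy from the raw events. It assumes the rule given in the competition's data description: an assessment attempt is logged as event_code 4100 for every assessment except Bird Measurer, which uses 4110, and the attempt's event_data carries `"correct": true` or `false`. Accuracy is taken as num_correct / (num_correct + num_incorrect), which agrees with the two train_labels rows shown below. The `count_attempts` helper and the toy events are hypothetical.

```python
import json

import numpy as np
import pandas as pd

KEYS = ["game_session", "installation_id", "title"]


def count_attempts(events: pd.DataFrame) -> pd.DataFrame:
    """Rebuild num_correct, num_incorrect and accuracy per assessment session.

    Assumption: attempts are event_code 4100, except for Bird Measurer
    (event_code 4110), and each attempt's event_data has a "correct" flag.
    """
    attempt_code = np.where(events["title"] == "Bird Measurer (Assessment)", 4110, 4100)
    is_attempt = (events["type"] == "Assessment") & (events["event_code"] == attempt_code)
    attempts = events.loc[is_attempt, KEYS].copy()
    correct = events.loc[is_attempt, "event_data"].map(
        lambda s: json.loads(s).get("correct") is True
    ).astype(bool)
    attempts["num_correct"] = correct.astype(int)
    attempts["num_incorrect"] = (~correct).astype(int)
    counts = attempts.groupby(KEYS, as_index=False, sort=False)[
        ["num_correct", "num_incorrect"]
    ].sum()
    counts["accuracy"] = counts["num_correct"] / (
        counts["num_correct"] + counts["num_incorrect"]
    )
    return counts


# Toy events: s1 fails once then passes; s2 (Bird Measurer) fails once via 4110.
toy_events = pd.DataFrame({
    "game_session": ["s1", "s1", "s1", "s2", "s2", "g1"],
    "installation_id": ["i1"] * 6,
    "title": ["Mushroom Sorter (Assessment)"] * 3
    + ["Bird Measurer (Assessment)"] * 2
    + ["Air Show"],
    "type": ["Assessment"] * 5 + ["Game"],
    "event_code": [2000, 4100, 4100, 4100, 4110, 4100],
    "event_data": [
        '{"event_code": 2000}',
        '{"event_code": 4100, "correct": false}',
        '{"event_code": 4100, "correct": true}',
        '{"event_code": 4100, "correct": true}',  # ignored: Bird Measurer uses 4110
        '{"event_code": 4110, "correct": false}',
        '{"event_code": 4100, "correct": true}',  # ignored: not an assessment
    ],
})
print(count_attempts(toy_events))
```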

In [6]:
train_labels.head(2)

|   | game_session | installation_id | title | num_correct | num_incorrect | accuracy | accuracy_group |
|---|---|---|---|---|---|---|---|
| 0 | 6bdf9623adc94d89 | 0006a69f | Mushroom Sorter (Assessment) | 1 | 0 | 1.0 | 3 |
| 1 | 77b8ee947eb84b4e | 0006a69f | Bird Measurer (Assessment) | 0 | 11 | 0.0 | 0 |
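
accuracy_group is the target to predict. As stated in the competition description, it is 3 if the assessment was solved on the first attempt, 2 if on the second, 1 if it took three or more attempts, and 0 if it was never solved. The two rows above fit this rule (1 correct and 0 incorrect gives 3; 0 correct and 11 incorrect gives 0). A vectorised sketch, with a hypothetical `accuracy_group` helper and two extra toy rows:

```python
import numpy as np
import pandas as pd


def accuracy_group(num_correct, num_incorrect):
    """Map attempt counts to accuracy_group (3, 2, 1 or 0)."""
    num_correct = np.asarray(num_correct)
    num_incorrect = np.asarray(num_incorrect)
    # np.select returns the choice of the first condition that holds.
    return np.select(
        [num_correct == 0, num_incorrect == 0, num_incorrect == 1],
        [0, 3, 2],
        default=1,
    )


# The two train_labels rows shown above, plus two toy rows.
labels = pd.DataFrame({
    "num_correct": [1, 0, 1, 1],
    "num_incorrect": [0, 11, 1, 5],
})
labels["accuracy_group"] = accuracy_group(labels["num_correct"], labels["num_incorrect"])
print(labels)
```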


**sample_submission.csv**

A sample submission in the correct format.

In [7]:
ss.head(3)

|   | installation_id | accuracy_group |
|---|---|---|
| 0 | 00abaee7 | 3 |
| 1 | 01242218 | 3 |
| 2 | 017c5718 | 3 |
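
The submission has one row per installation_id with an integer accuracy_group. As a sketch of producing that shape, the hypothetical helper below uses a naive constant baseline (the most frequent accuracy_group in the training labels). This baseline is only an illustration, not a method proposed in this notebook.

```python
import pandas as pd


def constant_submission(test_events: pd.DataFrame, train_labels: pd.DataFrame) -> pd.DataFrame:
    """Naive baseline: predict the most frequent training accuracy_group everywhere."""
    most_common = int(train_labels["accuracy_group"].mode().iloc[0])
    installation_ids = pd.unique(test_events["installation_id"])
    return pd.DataFrame({
        "installation_id": installation_ids,
        "accuracy_group": most_common,
    })


# Toy inputs standing in for test.csv and train_labels.csv.
toy_test = pd.DataFrame({"installation_id": ["00abaee7", "00abaee7", "01242218"]})
toy_labels = pd.DataFrame({"accuracy_group": [3, 3, 0, 2]})

submission = constant_submission(toy_test, toy_labels)
print(submission)
# On the real data one would then write: submission.to_csv("submission.csv", index=False)
```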


In [8]:
isinstance(train, pd.DataFrame)

True

In [9]:
train_shape = train.shape
test_shape = test.shape
specs_shape = specs.shape
train_labels_shape = train_labels.shape
ss_shape = ss.shape

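# Label each (rows, columns) shape so the output is self-explanatory
print('test:', test_shape)
print('train:', train_shape)
print('specs:', specs_shape)
print('train_labels:', train_labels_shape)
print('sample_submission:', ss_shape)

test: (1156414, 11)
train: (11341042, 11)
specs: (386, 3)
train_labels: (17690, 7)
sample_submission: (1000, 2)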


In [10]:
train.isnull().sum()

event_id           0
game_session       0
timestamp          0
event_data         0
installation_id    0
event_count        0
event_code         0
game_time          0
title              0
type               0
world              0
dtype: int64

In [12]:
train.groupby(['title']).count()

| title | event_id | game_session | timestamp | event_data | installation_id | event_count | event_code | game_time | type | world |
|---|---|---|---|---|---|---|---|---|---|---|
| 12 Monkeys | 4124 | 4124 | 4124 | 4124 | 4124 | 4124 | 4124 | 4124 | 4124 | 4124 |
| Air Show | 306239 | 306239 | 306239 | 306239 | 306239 | 306239 | 306239 | 306239 | 306239 | 306239 |
| All Star Sorting | 509344 | 509344 | 509344 | 509344 | 509344 | 509344 | 509344 | 509344 | 509344 | 509344 |
| Balancing Act | 5522 | 5522 | 5522 | 5522 | 5522 | 5522 | 5522 | 5522 | 5522 | 5522 |
| Bird Measurer (Assessment) | 190164 | 190164 | 190164 | 190164 | 190164 | 190164 | 190164 | 190164 | 190164 | 190164 |
| Bottle Filler (Activity) | 1004068 | 1004068 | 1004068 | 1004068 | 1004068 | 1004068 | 1004068 | 1004068 | 1004068 | 1004068 |
| Bubble Bath | 458972 | 458972 | 458972 | 458972 | 458972 | 458972 | 458972 | 458972 | 458972 | 458972 |
| Bug Measurer (Activity) | 446430 | 446430 | 446430 | 446430 | 446430 | 446430 | 446430 | 446430 | 446430 | 446430 |
| Cart Balancer (Assessment) | 163343 | 163343 | 163343 | 163343 | 163343 | 163343 | 163343 | 163343 | 163343 | 163343 |
| Cauldron Filler (Assessment) | 181925 | 181925 | 181925 | 181925 | 181925 | 181925 | 181925 | 181925 | 181925 | 181925 |
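
Because train has no missing values, `.count()` repeats the same number in every column. A more compact summary, sketched below with a hypothetical `events_and_sessions` helper, reports the row count once and adds the number of distinct game sessions per group via pandas named aggregation.

```python
import pandas as pd


def events_and_sessions(events: pd.DataFrame, by: str) -> pd.DataFrame:
    """Rows (events) and distinct game sessions per value of `by`, largest first."""
    summary = events.groupby(by).agg(
        events=("event_id", "size"),
        sessions=("game_session", "nunique"),
    )
    return summary.sort_values("events", ascending=False)


# Toy events; on the real data this would be events_and_sessions(train, "title").
toy = pd.DataFrame({
    "event_id": ["a", "b", "c", "d", "e"],
    "game_session": ["s1", "s1", "s2", "s3", "s3"],
    "title": ["Air Show", "Air Show", "Air Show", "12 Monkeys", "12 Monkeys"],
})
print(events_and_sessions(toy, "title"))
```

The same call works for the world, type or event_count groupings; `train["title"].value_counts()` alone would give just the events column.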


In [13]:
train.groupby(['world']).count()

| world | event_id | game_session | timestamp | event_data | installation_id | event_count | event_code | game_time | title | type |
|---|---|---|---|---|---|---|---|---|---|---|
| CRYSTALCAVES | 3232546 | 3232546 | 3232546 | 3232546 | 3232546 | 3232546 | 3232546 | 3232546 | 3232546 | 3232546 |
| MAGMAPEAK | 5023687 | 5023687 | 5023687 | 5023687 | 5023687 | 5023687 | 5023687 | 5023687 | 5023687 | 5023687 |
| NONE | 23578 | 23578 | 23578 | 23578 | 23578 | 23578 | 23578 | 23578 | 23578 | 23578 |
| TREETOPCITY | 3061231 | 3061231 | 3061231 | 3061231 | 3061231 | 3061231 | 3061231 | 3061231 | 3061231 | 3061231 |


In [14]:
train.groupby(['event_count']).count()

| event_count | event_id | game_session | timestamp | event_data | installation_id | event_code | game_time | title | type | world |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 303319 | 303319 | 303319 | 303319 | 303319 | 303319 | 303319 | 303319 | 303319 | 303319 |
| 2 | 117340 | 117340 | 117340 | 117340 | 117340 | 117340 | 117340 | 117340 | 117340 | 117340 |
| 3 | 116006 | 116006 | 116006 | 116006 | 116006 | 116006 | 116006 | 116006 | 116006 | 116006 |
| 4 | 113633 | 113633 | 113633 | 113633 | 113633 | 113633 | 113633 | 113633 | 113633 | 113633 |
| 5 | 112482 | 112482 | 112482 | 112482 | 112482 | 112482 | 112482 | 112482 | 112482 | 112482 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3364 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3365 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3366 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3367 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
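
Since event_count restarts at 1 in every game session, the 303,319 rows with event_count 1 should roughly match the number of sessions, assuming each session's first event was recorded. The long tail ending at 3,367 means at least one session logged that many events. A per-session view is sketched below; `session_lengths` is a hypothetical helper that treats the highest event_count in a session as its length.

```python
import pandas as pd


def session_lengths(events: pd.DataFrame) -> pd.Series:
    """Events per game session, taken as the highest event_count seen in it."""
    return events.groupby("game_session")["event_count"].max()


# Toy events: s1 has 3 events, s2 has 1.
toy = pd.DataFrame({
    "game_session": ["s1", "s1", "s1", "s2"],
    "event_count": [1, 2, 3, 1],
})
lengths = session_lengths(toy)
print(lengths.describe())
# Sessions that logged a first event vs. distinct sessions overall
print((toy["event_count"] == 1).sum(), toy["game_session"].nunique())
```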


In [15]:
train.groupby(['type']).count()

| type | event_id | game_session | timestamp | event_data | installation_id | event_count | event_code | game_time | title | world |
|---|---|---|---|---|---|---|---|---|---|---|
| Activity | 4436728 | 4436728 | 4436728 | 4436728 | 4436728 | 4436728 | 4436728 | 4436728 | 4436728 | 4436728 |
| Assessment | 925345 | 925345 | 925345 | 925345 | 925345 | 925345 | 925345 | 925345 | 925345 | 925345 |
| Clip | 183676 | 183676 | 183676 | 183676 | 183676 | 183676 | 183676 | 183676 | 183676 | 183676 |
| Game | 5795293 | 5795293 | 5795293 | 5795293 | 5795293 | 5795293 | 5795293 | 5795293 | 5795293 | 5795293 |
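
Dividing these counts by the 11,341,042 train rows gives the share of each media type: Game about 51.1%, Activity about 39.1%, Assessment about 8.2% and Clip about 1.6%. The sketch below computes those shares with `value_counts(normalize=True)` and uses `pd.crosstab` to show how types are spread across worlds. The helper names are hypothetical and the toy rows stand in for `train`.

```python
import pandas as pd


def type_by_world(events: pd.DataFrame) -> pd.DataFrame:
    """Event counts for each (type, world) pair."""
    return pd.crosstab(events["type"], events["world"])


def type_share(events: pd.DataFrame) -> pd.Series:
    """Fraction of all events that belong to each media type."""
    return events["type"].value_counts(normalize=True)


# Toy events; on the real data pass `train` instead.
toy = pd.DataFrame({
    "type": ["Clip", "Game", "Game", "Assessment"],
    "world": ["NONE", "MAGMAPEAK", "MAGMAPEAK", "CRYSTALCAVES"],
})
print(type_by_world(toy))
print(type_share(toy))
```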
