## Play video games at the age of 3 to survive school and life! ;-)

This year the Kaggle community is asked whether media can help children aged 3 to 5 build skills that help them succeed in school and life. We are going to explore this topic by analysing anonymous data from the *PBS KIDS Measure Up!* app to **predict scores on in-game assessments**. Along the way we might discover what has to change in the game to obtain better-designed games and higher learning outcomes.

Be careful on your journey! Our data may also include sessions from adult family members who play this game as well! ;-) So don't expect it to originate solely from kids aged 3 to 5.

In [None]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

from IPython.display import HTML
HTML('<iframe width="750" height="420" src="https://www.youtube.com/embed/4gc7LlVGwc8" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')

# Prepare to start

## Loading packages

In [None]:
import numpy as np 
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
sns.set()

import plotly as py
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from wordcloud import WordCloud

init_notebook_mode(connected=True) 

## Loading data

In [None]:
basepath = "../input/data-science-bowl-2019/"

In [None]:
from os import listdir
listdir(basepath)

In [None]:
N = 1000000

train = pd.read_csv(basepath + "train.csv", nrows=N)
test = pd.read_csv(basepath + "test.csv", nrows=N)
train_labels = pd.read_csv(basepath + "train_labels.csv", nrows=N)
submission = pd.read_csv(basepath + "sample_submission.csv", nrows=N)
specs = pd.read_csv(basepath + "specs.csv", nrows=N)

# Sneak a peek

### Submission format

In [None]:
submission.head()

* We are asked to predict the **accuracy_group** given an **installation_id**.
* Each application install has its own id. Uh-oh! This means that brothers, sisters, friends, adults ... can all play under the same id! Expect some noise to be present! ;-)
* The **accuracy_group reflects the number of attempts a player needed to pass a given assessment**:
    * 3 - solved on the first attempt
    * 2 - solved on the second attempt
    * 1 - solved after 3 or more attempts
    * 0 - never solved
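
The mapping in the list above can be sketched as a small function. This is only an illustration: the helper name `accuracy_group` and its input, a list of booleans with one entry per attempt, are assumptions made for this example and not part of the competition data, where the labels already come precomputed in `train_labels.csv`.

```python
def accuracy_group(attempt_results):
    """Map attempt outcomes (True = solved) to the 0-3 accuracy group.

    `attempt_results` is a hypothetical input format used only for this
    sketch: one boolean per attempt, in chronological order.
    """
    for attempt_number, solved in enumerate(attempt_results, start=1):
        if solved:
            if attempt_number == 1:
                return 3  # solved on the first attempt
            if attempt_number == 2:
                return 2  # solved on the second attempt
            return 1      # solved after three or more attempts
    return 0              # never solved (or never attempted)


print(accuracy_group([False, True]))
```

An empty list is treated as "never solved" here, which is a design choice of this sketch rather than something the data description states.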

### Training data

In [None]:
train.head()

* A lot of historical data!

In [None]:
train.shape

In [None]:
train.isnull().sum().sum()

In [None]:
train_labels.head()

### Test data

In [None]:
test.head()

### Specs

In [None]:
specs.head()

In [None]:
train.describe()

In [None]:
train['game_time'].unique()

In [None]:
train.describe(include=[object,bool])

In [None]:
train['event_id'].value_counts()

In [None]:
train['event_data'].value_counts()

In [None]:
train['installation_id'].value_counts()

In [None]:
train['timestamp'].value_counts(normalize=True)

**Sorting**

In [None]:
train.sort_values(by='event_id', ascending=False).head()

**Groupby**

In [None]:
print(train.groupby('event_id').groups)

In [None]:
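# Count missing values per column and report only the columns that have any
null_cnt = train.isnull().sum().sort_values()
print('null count:', null_cnt[null_cnt > 0])
# Drop every row that contains at least one missing value
train.dropna(inplace=True)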

**Visualisations**

In [None]:
train_types = train["type"].value_counts()

In [None]:
train_worlds = train["world"].value_counts()

In [None]:
fig, ax = plt.subplots(1,2,figsize=(20,5))
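# Pass the column as x=... explicitly; recent seaborn versions no longer
# treat the first positional argument as the x variable
sns.countplot(x=train.world, palette="Pastel2", ax=ax[0]);
sns.countplot(x=train.type, palette="Pastel1", ax=ax[1]);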

"CRYSTALCAVES" and "TREETOPCITY" ratio 
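
To put a number on that comparison, one option is to look at normalised value counts. The sketch below runs on a tiny made-up frame (`toy` and its rows are purely illustrative, not real data); in the notebook the same call could be applied to `train["world"]`.

```python
import pandas as pd

# Toy stand-in for train["world"]; the rows are invented for illustration
toy = pd.DataFrame(
    {"world": ["CRYSTALCAVES", "TREETOPCITY", "TREETOPCITY", "TREETOPCITY"]}
)

# Share of events per world (fractions that sum to 1)
shares = toy["world"].value_counts(normalize=True)

# Ratio of CRYSTALCAVES events to TREETOPCITY events
ratio = shares["CRYSTALCAVES"] / shares["TREETOPCITY"]

print(shares)
print("ratio:", ratio)
```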

In [None]:
eventbyinstallation = train.groupby(["installation_id"])["event_code"].nunique()

fig = px.histogram(x=eventbyinstallation,
                   title='Unique Event Code Count by Installation Id',
                   opacity=0.8,
                   color_discrete_sequence=['indianred']
                  )

fig.update_layout(
    yaxis_title_text='',
    xaxis_title_text='',
    height=500, width=800
)

fig.update_traces(marker_line_color='rgb(8,48,107)',
                  marker_line_width=1.5, opacity=0.8
                 )

fig.show()

In [None]:
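# Number of events per installation, sorted from most to least active
event_counts = train.groupby("installation_id")["event_id"].count().sort_values(ascending=False)
# Split at rank 382: part 1 is the tail of less active installs, part 2 the 382 most active ones
event_id_by_ins_id_1 = event_counts.iloc[382:]
event_id_by_ins_id_2 = event_counts.iloc[:382]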

fig = make_subplots(rows=1, cols=2)

trace1 = go.Histogram(x=event_id_by_ins_id_1,
                 marker_color='#FF9999',
                 opacity=0.2,
                 nbinsx=40
                     )

trace2 = go.Histogram(x=event_id_by_ins_id_2,
                 marker_color='#9999CC',
                 opacity=0.75,
                 nbinsx=40
                )

fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 2)


fig.update_layout(
    height=500, width=800, showlegend=False,
    title='Event Count by Installation Id',
  )
fig['layout']['xaxis1'].update(title='Part 1: 0-5k')
fig['layout']['xaxis2'].update(title='Part 2: 5k-60k')

fig.update_traces(marker_line_color='rgb(8,48,107)',
                  marker_line_width=1.5, opacity=0.8
                 )

fig.show()

In [None]:
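df_events = train.loc[:, ['timestamp', 'event_id', 'game_time']].copy()
# read_csv loads timestamps as strings, so parse them before using the .dt accessor
df_events['timestamp'] = pd.to_datetime(df_events['timestamp'])
df_events["date"] = df_events['timestamp'].dt.date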

In [None]:
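# Series.dt.weekday_name was removed in pandas 1.0; day_name() replaces it
df_events["weekdays"] = pd.to_datetime(df_events['timestamp']).dt.day_name()

gametime_wdays = df_events.groupby(['weekdays'])['game_time'].agg('sum')
# Order the bars Monday to Sunday instead of alphabetically
gametime_wdays = gametime_wdays.reindex(['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'])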

fig = px.bar(x=gametime_wdays.index, y=gametime_wdays.values)

fig.update_traces(marker_color='mediumvioletred', marker_line_color='rgb(8,48,107)',
                  marker_line_width=2, opacity=0.7
                 )

fig.update_layout(title='Total Game Time By Day',
                   xaxis_title='Weekdays',
                   yaxis_title='Total',
                   width=600, height=400
                 )
fig.show()