# Introduction

The fifth annual Data Science Bowl will analyze digital gameplay to help build more effective educational media tools for children. The competition looks at advances in early childhood education. Its results should lead to better-designed games and improved learning outcomes, giving children, parents, caregivers, and educators around the globe insight into how young children learn through media and which approaches best help them build foundational learning skills.
To better understand these challenges and develop the most effective approaches to high-quality early educational media, Booz Allen Hamilton and Kaggle launched the fifth annual Data Science Bowl, the world's largest data science competition focused on social good.
Participants will be provided with anonymous gameplay data from the PBS KIDS Measure Up! app, which was developed as a part of the CPB-PBS Ready to Learn Initiative with funding from the U.S. Department of Education. They will be tasked with **creating algorithms that utilize information about how players use the app to determine what they know and are learning from the experience**, in order to discover important relationships between their engagement with educational media and learning. The insights gleaned from these solutions will help PBS KIDS and other organizations create new solutions, content and products that help ensure each and every user has the best chance to learn important skills, helping improve childhood learning access and achievement.

**Learning Path**
Exposure ---> Exploration ---> Practice ---> Demonstration (i.e., demonstration of knowledge)
In the PBS KIDS Measure Up! app, **children ages 3 to 5** learn early **STEM concepts focused on length, width, capacity, and weight** while going on an adventure through Treetop City, Magma Peak, and Crystal Caves. Joined by their favorite PBS KIDS characters from Dinosaur Train, Peg + Cat, and Sid the Science Kid, children can also collect rewards and unlock digital toys as they play. At the same time, caregivers can monitor and expand upon what their child is learning using a free companion app: PBS KIDS Super Vision.
Parents can track the skills in which their kids excel, and the skills where they may need more practice. The app also provides tips and related activity ideas to extend learning into daily activities and family time.
In the PBS KIDS Measure Up! app, children navigate a map and complete levels of four media types:
  1. Clips (Exposure)
     - Short interstitial/introductory clips
     - Longer clips (2-3 minutes) that familiarize the child with a problem
  2. Activities (Exploration): open-ended play with no set objective, built around cause and effect
  3. Games (Practice): goal-oriented problem solving, with the option to replay
  4. Assessments (Demonstration): measure the player's knowledge and skills; the numbers of correct and incorrect attempts determine the accuracy group. The five assessments are:
       * Bird Measurer
       * Cart Balancer
       * Cauldron Filler
       * Chest Sorter
       * Mushroom Sorter
world - The section of the application that the game or video belongs to, which helps identify the educational curriculum goals of the media. Possible values are 'NONE' (at the app's start screen), 'TREETOPCITY' (Length/Height), 'MAGMAPEAK' (Capacity/Displacement), and 'CRYSTALCAVES' (Weight).
The intent of the competition is to use the gameplay data to forecast how many attempts a child will take to pass a given assessment (an incorrect answer is counted as an attempt).
Each application install is represented by an installation_id. This will typically correspond to one child, but you should expect noise from issues such as shared devices.
Note that the training set contains many installation_ids which never took assessments, whereas every installation_id in the test set made an attempt on at least one assessment.
The outcomes in this competition are grouped into 4 groups (labeled accuracy_group in the data):

3: the assessment was solved on the first attempt

2: the assessment was solved on the second attempt

1: the assessment was solved after 3 or more attempts

0: the assessment was never solved
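The grouping can be written as a small function of an assessment's correct and incorrect attempt counts (the `num_correct` and `num_incorrect` columns of `train_labels`). This is a sketch that assumes a solved assessment ends at its single correct attempt, so the number of attempts is `num_incorrect + 1`; the function name `accuracy_group` is illustrative.

```python
def accuracy_group(num_correct: int, num_incorrect: int) -> int:
    """Map an assessment's attempt counts to the competition's accuracy_group.

    Assumes a solved assessment has exactly one correct attempt, so the
    number of attempts taken is num_incorrect + 1.
    """
    if num_correct == 0:
        return 0          # never solved
    attempts = num_incorrect + 1
    if attempts == 1:
        return 3          # solved on the first attempt
    if attempts == 2:
        return 2          # solved on the second attempt
    return 1              # solved after 3 or more attempts


# The first three calls mirror rows of train_labels.head() shown later
print(accuracy_group(1, 0))   # 3
print(accuracy_group(0, 11))  # 0
print(accuracy_group(1, 1))   # 2
print(accuracy_group(1, 4))   # 1
```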

# Load the required Libraries

In [1]:
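import numpy as np
from tqdm import tqdm
import json
import pandas as pd
import os
import gc
#import lightgbm as lgb
# Local helper module (not shown); provides Convert_LabelEncoder,
# random_forest_param_selection and qwk used below
from training import *
# Names used unqualified later in the notebook
import datetime
from time import time
from scipy import stats
import xgboost as xgb
from catboost import CatBoostClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import KFold, StratifiedKFold
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

%load_ext autoreload
%autoreload 2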

In [2]:
import catboost

# Download the data using Kaggle API

In [37]:
# Download the dataset
#!kaggle competitions list
#!kaggle competitions download -c data-science-bowl-2019
# Check submission status
!kaggle competitions submissions -c data-science-bowl-2019


No submissions found


# Read the input Data

In [61]:
# Read each input file and report its shape
print('Reading train.csv file....')
train = pd.read_csv('data/train.csv')
print('Training.csv file have {} rows and {} columns'.format(train.shape[0], train.shape[1]))

print('Reading test.csv file....')
test = pd.read_csv('data/test.csv')
print('Test.csv file have {} rows and {} columns'.format(test.shape[0], test.shape[1]))

print('Reading train_labels.csv file....')
train_labels = pd.read_csv('data/train_labels.csv')
print('Train_labels.csv file have {} rows and {} columns'.format(train_labels.shape[0], train_labels.shape[1]))

print('Reading specs.csv file....')
specs = pd.read_csv('data/specs.csv')
print('Specs.csv file have {} rows and {} columns'.format(specs.shape[0], specs.shape[1]))

print('Reading sample_submission.csv file....')
sample_submission = pd.read_csv('data/sample_submission.csv')
print('Sample_submission.csv file have {} rows and {} columns'.format(sample_submission.shape[0], sample_submission.shape[1]))

Reading train.csv file....
Training.csv file have 11341042 rows and 11 columns
Reading test.csv file....
Test.csv file have 1156414 rows and 11 columns
Reading train_labels.csv file....
Train_labels.csv file have 17690 rows and 7 columns
Reading specs.csv file....
Specs.csv file have 386 rows and 3 columns
Reading sample_submission.csv file....
Sample_submission.csv file have 1000 rows and 2 columns


# Data Processing 

In [37]:
# Memory usage
train.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11341042 entries, 0 to 11341041
Data columns (total 11 columns):
event_id           object
game_session       object
timestamp          object
event_data         object
installation_id    object
event_count        int64
event_code         int64
game_time          int64
title              object
type               object
world              object
dtypes: int64(3), object(8)
memory usage: 8.1 GB


In [38]:
test.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1156414 entries, 0 to 1156413
Data columns (total 11 columns):
event_id           1156414 non-null object
game_session       1156414 non-null object
timestamp          1156414 non-null object
event_data         1156414 non-null object
installation_id    1156414 non-null object
event_count        1156414 non-null int64
event_code         1156414 non-null int64
game_time          1156414 non-null int64
title              1156414 non-null object
type               1156414 non-null object
world              1156414 non-null object
dtypes: int64(3), object(8)
memory usage: 852.0 MB


In [39]:
# Integer codes for each categorical column, built from the union of train and test values.
# pd.concat is used because Series.append no longer exists in pandas 2.x.
def build_mapping(col):
    values = pd.concat([train[col], test[col]]).unique()
    return dict(zip(values, range(len(values))))

session_dict = build_mapping('game_session')
installation_dict = build_mapping('installation_id')
title_dict = build_mapping('title')
type_dict = build_mapping('type')
world_dict = build_mapping('world')
event_dict = build_mapping('event_id')

In [62]:
# Replace the string categories with their integer codes.
# train_labels only has a 'title' column, so only the columns present in each frame are mapped.
mappings = {
    'title': title_dict,
    'world': world_dict,
    'type': type_dict,
    # 'game_session': session_dict,
    # 'installation_id': installation_dict,
    # 'event_id': event_dict,
}
for df in [train, test, train_labels]:
    for col, mapping in mappings.items():
        if col in df.columns:
            df[col] = df[col].map(mapping)


In [63]:
train_labels.head()

|   | game_session | installation_id | title | num_correct | num_incorrect | accuracy | accuracy_group |
|---|---|---|---|---|---|---|---|
| 0 | 6bdf9623adc94d89 | 0006a69f | 17 | 1 | 0 | 1.0 | 3 |
| 1 | 77b8ee947eb84b4e | 0006a69f | 24 | 0 | 11 | 0.0 | 0 |
| 2 | 901acc108f55a5a1 | 0006a69f | 17 | 1 | 0 | 1.0 | 3 |
| 3 | 9501794defd84e4d | 0006a69f | 17 | 1 | 1 | 0.5 | 2 |
| 4 | a9ef3ecb3d1acc6a | 0006a69f | 24 | 1 | 0 | 1.0 | 3 |


In [64]:
train.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11341042 entries, 0 to 11341041
Data columns (total 11 columns):
event_id           object
game_session       object
timestamp          object
event_data         object
installation_id    object
event_count        int64
event_code         int64
game_time          int64
title              int64
type               int64
world              int64
dtypes: int64(6), object(5)
memory usage: 6.2 GB
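Mapping `title`, `type` and `world` to integers cut the footprint from 8.1 GB to 6.2 GB. A sketch of an alternative for low-cardinality string columns is pandas' `category` dtype, which stores each distinct string once plus a small integer code per row. The toy frame below stands in for `train.csv`; the sizes it prints describe that toy data only.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the low-cardinality string columns of train.csv
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    'world': rng.choice(['NONE', 'TREETOPCITY', 'MAGMAPEAK', 'CRYSTALCAVES'], n),
    'type': rng.choice(['Clip', 'Activity', 'Game', 'Assessment'], n),
})

before = df.memory_usage(deep=True).sum()
# category keeps one copy of each distinct string plus a small integer code per row
df_cat = df.astype('category')
after = df_cat.memory_usage(deep=True).sum()

print(f'{before / 1e6:.1f} MB -> {after / 1e6:.1f} MB')
# The integer codes (comparable to world_dict / type_dict) remain available
print(df_cat['world'].cat.codes.head())
```

`pd.read_csv` also accepts `dtype={'world': 'category'}`, which applies the conversion while reading instead of afterwards.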


In [65]:
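# Event code that marks an assessment attempt for each title:
# 4100 for every title, except Bird Measurer, which logs its attempts as 4110
activities_map = {code: 4100 for code in title_dict.values()}
activities_map[title_dict['Bird Measurer (Assessment)']] = 4110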
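`feature_engineering` further down counts correct and incorrect attempts by checking whether each attempt's `event_data` string contains the substring `'true'` or `'false'`, which would also match any other field holding those words. A sketch that parses the JSON instead, assuming attempt events carry a boolean `"correct"` key; the session rows below are made up for illustration.

```python
import json
import pandas as pd

# Hypothetical miniature of one assessment session; real event_data strings
# carry more fields, but attempt events (4100, or 4110 for Bird Measurer)
# are assumed to include a boolean "correct" key.
session = pd.DataFrame({
    'event_code': [2000, 4100, 4100, 4100, 4070],
    'event_data': [
        '{"event_code": 2000, "event_count": 1}',
        '{"correct": false, "event_code": 4100, "event_count": 5}',
        '{"correct": false, "event_code": 4100, "event_count": 9}',
        '{"correct": true, "event_code": 4100, "event_count": 14}',
        '{"event_code": 4070, "event_count": 15}',
    ],
})

attempt_code = 4100  # would be 4110 for 'Bird Measurer (Assessment)'
attempts = session[session['event_code'] == attempt_code]

# Parse each JSON payload and read its "correct" flag
correct_flags = attempts['event_data'].map(lambda s: json.loads(s)['correct'])
num_correct = int(correct_flags.sum())
num_incorrect = int((~correct_flags).sum())
print(num_correct, num_incorrect)  # 1 2
```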

In [78]:
def extract_time_features(df):
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df['month'] = df['timestamp'].dt.month
    df['hour'] = df['timestamp'].dt.hour
    df['year'] = df['timestamp'].dt.year
    df['dayofweek'] = df['timestamp'].dt.dayofweek
    # ISO week number (dt.weekofyear no longer exists in pandas 2.x)
    df['weekofyear'] = df['timestamp'].dt.isocalendar().week.astype('int64')
    return df

In [79]:
train = extract_time_features(train)
test = extract_time_features(test)

In [138]:
test.head()

|   | event_id | game_session | timestamp | event_data | installation_id | event_count | event_code | game_time | title | type | world | month | hour | year | dayofweek | weekofyear |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27253bdc | 0ea9ecc81a565215 | 2019-09-10 16:50:24.910000+00:00 | {"event_code": 2000, "event_count": 1} | 00abaee7 | 1 | 2000 | 0 | 0 | 0 | 0 | 9 | 16 | 2019 | 1 | 37 |
| 1 | 27253bdc | c1ea43d8b8261d27 | 2019-09-10 16:50:55.503000+00:00 | {"event_code": 2000, "event_count": 1} | 00abaee7 | 1 | 2000 | 0 | 1 | 0 | 1 | 9 | 16 | 2019 | 1 | 37 |
| 2 | 27253bdc | 7ed86c6b72e725e2 | 2019-09-10 16:51:51.805000+00:00 | {"event_code": 2000, "event_count": 1} | 00abaee7 | 1 | 2000 | 0 | 4 | 0 | 1 | 9 | 16 | 2019 | 1 | 37 |
| 3 | 27253bdc | 7e516ace50e7fe67 | 2019-09-10 16:53:12.825000+00:00 | {"event_code": 2000, "event_count": 1} | 00abaee7 | 1 | 2000 | 0 | 28 | 0 | 3 | 9 | 16 | 2019 | 1 | 37 |
| 4 | 7d093bf9 | a022c3f60ba547e7 | 2019-09-10 16:54:12.115000+00:00 | {"version":"1.0","round":0,"event_count":1,"ga... | 00abaee7 | 1 | 2000 | 0 | 29 | 2 | 3 | 9 | 16 | 2019 | 1 | 37 |


In [145]:
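def extracting_duration(durations):
    """Summarize a session's game_time values (cumulative milliseconds).

    Returns (mean gap between consecutive events, last game_time value,
    std of the gaps); mean and std stay 0 unless there are at least two gaps.
    """
    dur_std = 0
    dur_sum = 0
    dur_mean = 0
    if len(durations) != 0:
        # game_time is cumulative, so its last value is the total elapsed time
        dur_sum = durations.iloc[-1]
        duration_norm = durations.diff().dropna()
        if len(duration_norm) >= 2:
            dur_std = duration_norm.std()
            dur_mean = duration_norm.mean()
    return dur_mean, dur_sum, dur_std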
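`extracting_duration` is called once per session inside a Python loop. The same three statistics can be sketched for all sessions at once with `groupby`, assuming events are already in chronological order within each session; the toy frame and the `step` column name are illustrative.

```python
import pandas as pd

# Toy events: game_time is the cumulative time (ms) since the session started
events = pd.DataFrame({
    'game_session': ['a', 'a', 'a', 'a', 'b', 'b'],
    'game_time':    [0, 1000, 3000, 6000, 0, 500],
})

# Gap between consecutive events within each session (NaN for the first event)
events['step'] = events.groupby('game_session')['game_time'].diff()

summary = events.groupby('game_session').agg(
    dur_sum=('game_time', 'last'),   # final cumulative time, like durations.iloc[-1]
    dur_mean=('step', 'mean'),       # NaN gaps are skipped
    dur_std=('step', 'std'),
)

# Mirror extracting_duration: mean/std stay 0 when a session has fewer than two gaps
n_steps = events.groupby('game_session')['step'].count()
summary.loc[n_steps < 2, ['dur_mean', 'dur_std']] = 0
print(summary)
```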


def feature_engineering(user_sample, test_data=False):
    """Build one feature row per assessment session of a single installation_id.

    Cum_* and cum_* values are taken before the current session is added, so
    each row only summarizes earlier sessions. With test_data=True only the
    last session (the assessment to predict) is returned.
    """
    output = []
    Cum_Assess, Cum_Activity, Cum_Clip, Cum_Game = 0, 0, 0, 0
    cum_corr, cum_incorr, cum_acc = 0, 0, 0
    cum_dur_assess, cum_dur_clip, cum_dur_game, cum_dur_activity = 0, 0, 0, 0
    counter = 0
    cum_acc_group = []
    # Iterate through each session of one installation_id (sort=False keeps the original order)
    for session_name, session in user_sample.groupby('game_session', sort=False):

        # Features of this session, starting from the running totals so far
        features = {'Clip': 0, 'Activity': 0,
                    'Assess': 0, 'Game': 0,
                    'Cum_Clip': Cum_Clip, 'Cum_Activity': Cum_Activity,
                    'Cum_Assess': Cum_Assess, 'Cum_Game': Cum_Game,
                    'cum_dur_clip': cum_dur_clip, 'cum_dur_asses': cum_dur_assess,
                    'cum_dur_activity': cum_dur_activity, 'cum_dur_game': cum_dur_game}

        features['installation_id'] = session['installation_id'].iloc[0]
        features['game_session'] = session_name
        # event_count of the last event = number of events in the session (all codes and types)
        features['event_counter'] = session.iloc[-1]['event_count']

        # Session type, title and world are constant within a session
        features['type'] = session['type'].iloc[0]
        features['title'] = session['title'].iloc[0]
        features['world'] = session['world'].iloc[0]

        # Keep only the assessment-attempt events (4100, or 4110 for Bird Measurer)
        all_attempts = session.query(
            f'event_code == {activities_map[features["title"]]}')

        if features['type'] == type_dict['Assessment']:
            # Number of attempts in this assessment
            features['Assess'] += len(all_attempts)
            Cum_Assess += features['Assess']

            # Timing of the attempts
            features['assess_dur_mean'], features['assess_dur_sum'], \
                features['assess_dur_std'] = extracting_duration(
                    all_attempts['game_time'])
            cum_dur_assess += features['assess_dur_sum']

            # Number of correct attempts
            features['cum_corr'] = cum_corr
            features['correct'] = all_attempts['event_data'].str.contains(
                'true').sum()
            cum_corr += features['correct']

            # Number of incorrect attempts
            features['cum_incorrect'] = cum_incorr
            features['incorrect'] = all_attempts['event_data'].str.contains(
                'false').sum()
            cum_incorr += features['incorrect']

            # Mean accuracy and mean accuracy group over the previous assessments
            features['cum_acc'] = cum_acc / counter if counter > 0 else 0
            features['mean_acc_group'] = sum(cum_acc_group) / counter if counter > 0 else 0
            counter += 1
            features['acc'] = features['correct'] / features['Assess'] \
                if features['Assess'] != 0 else 0
            cum_acc += features['acc']

            # Accuracy group, following the outcome definitions above
            if features['acc'] == 0:
                features['acc_group'] = 0
            elif features['acc'] == 1:
                features['acc_group'] = 3
            elif features['acc'] == 0.5:
                features['acc_group'] = 2
            else:
                features['acc_group'] = 1
            cum_acc_group.append(features['acc_group'])

        elif features['type'] == type_dict['Clip']:
            # Count clips
            features['Clip'] += 1
            Cum_Clip += features['Clip']

            # Durations use all of the session's events, since all_attempts
            # only holds assessment-attempt codes
            features['clip_dur_mean'], features['clip_dur_sum'], \
                features['clip_dur_std'] = extracting_duration(
                    session['game_time'])
            cum_dur_clip += features['clip_dur_sum']

        elif features['type'] == type_dict['Activity']:
            # Count activities
            features['Activity'] += 1
            Cum_Activity += features['Activity']

            # Durations over all of the session's events
            features['activity_dur_mean'], features['activity_dur_sum'], \
                features['activity_dur_std'] = extracting_duration(
                    session['game_time'])
            cum_dur_activity += features['activity_dur_sum']

        elif features['type'] == type_dict['Game']:
            # Count games
            features['Game'] += 1
            Cum_Game += features['Game']

            # Durations over all of the session's events
            features['game_dur_mean'], features['game_dur_sum'], \
                features['game_dur_std'] = extracting_duration(
                    session['game_time'])
            cum_dur_game += features['game_dur_sum']

        # Keep assessments with at least one attempt (every session for the test set)
        if features['Assess'] > 0 or test_data:
            output.append(features)
    if test_data:
        return output[-1]
    return output
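`feature_engineering` walks each installation's sessions in a Python loop (the train set below takes about 32 minutes). Cumulative "before this session" counts, like `Cum_Clip` or `Cum_Assess`, can also be sketched with vectorized `groupby` operations. This assumes a hypothetical session-level table with one row per `game_session` in chronological order; the `prior_*` column names are illustrative.

```python
import pandas as pd

# One row per session, in chronological order within each installation
sessions = pd.DataFrame({
    'installation_id': ['x', 'x', 'x', 'x', 'y', 'y'],
    'type': ['Clip', 'Game', 'Assessment', 'Assessment', 'Clip', 'Assessment'],
    'attempts': [0, 0, 1, 3, 0, 2],
})

# One-hot the session type so each column counts sessions of that type
counts = pd.get_dummies(sessions['type']).astype(int)

# Totals *before* each session: running sum minus the current row, matching
# how feature_engineering snapshots its Cum_* values before updating them
prior = counts.groupby(sessions['installation_id']).cumsum() - counts
prior.columns = ['prior_' + c for c in prior.columns]

# Attempts made in earlier sessions (the counterpart of Cum_Assess)
prior['prior_attempts'] = (
    sessions.groupby('installation_id')['attempts'].cumsum() - sessions['attempts']
)

result = pd.concat([sessions, prior], axis=1)
print(result[result['type'] == 'Assessment'])
```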

In [147]:
groups_train = train.groupby('installation_id', sort = False)
g_train = groups_train.get_group('0006a69f')
#g_train = groups_train.get_group(installation_dict['0006a69f'])
ss = pd.DataFrame(feature_engineering(g_train, False))
ss.T

# groups = test.groupby('installation_id', sort = False)
# g_test = groups.get_group('00abaee7')
# ss = pd.DataFrame(feature_engineering(g_test, True), index=[0])
# ss.T

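|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Clip | 0 | 0 | 0 | 0 | 0 |
| Activity | 0 | 0 | 0 | 0 | 0 |
| Assess | 1 | 11 | 1 | 2 | 1 |
| Game | 0 | 0 | 0 | 0 | 0 |
| Cum_Clip | 11 | 14 | 14 | 24 | 28 |
| Cum_Activity | 3 | 4 | 4 | 9 | 10 |
| Cum_Assess | 0 | 1 | 12 | 13 | 15 |
| Cum_Game | 4 | 6 | 6 | 10 | 13 |
| cum_dur_clip | 0 | 0 | 0 | 0 | 0 |
| cum_dur_asses | 0 | 31011 | 121043 | 139069 | 162112 |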



In [91]:
train['event_code'].nunique()

42

## Process train set

In [150]:
# Apply feature_engineering to each installation_id in the train set
groups = train.groupby('installation_id', sort=False)
temp_out = []
for ins_id, user_sample in tqdm(groups):
    temp_out += feature_engineering(user_sample)
df_train = pd.DataFrame(temp_out)
print(df_train.shape)
# The rows should line up one-to-one with train_labels
df_train['installation_id'].equals(train_labels['installation_id'])

100%|██████████| 17000/17000 [32:18<00:00, 16.12it/s]  


(17690, 29)


True
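The `equals` check confirms that the `installation_id` column lines up row by row. A stricter sketch joins on `game_session` and compares the engineered `acc_group` with the labelled `accuracy_group`. It is shown on toy frames; in the notebook, `df_train` and `train_labels` would take the place of `features` and `labels`.

```python
import pandas as pd

# Toy stand-ins for the engineered features and the official labels
features = pd.DataFrame({
    'game_session': ['s1', 's2', 's3'],
    'acc_group': [3, 0, 2],
})
labels = pd.DataFrame({
    'game_session': ['s1', 's2', 's3'],
    'accuracy_group': [3, 0, 2],
})

# validate='one_to_one' raises MergeError if a game_session repeats on either side;
# indicator=True adds a _merge column showing which side each row came from
merged = features.merge(labels, on='game_session', how='outer',
                        validate='one_to_one', indicator=True)

missing = merged[merged['_merge'] != 'both']
mismatched = merged[(merged['_merge'] == 'both') &
                    (merged['acc_group'] != merged['accuracy_group'])]
print(len(missing), len(mismatched))  # 0 0
```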

In [None]:
df_train.head()

## Process test set

In [148]:
temp_data = []
for ins_id, user_sample in tqdm(test.groupby('installation_id', sort=False)):
    a = feature_engineering(user_sample, test_data = True)
    temp_data.append(a)
    
df_test = pd.DataFrame(temp_data)
del temp_data
print(df_test.shape)
df_test['installation_id'].equals(sample_submission['installation_id'])

100%|██████████| 1000/1000 [02:03<00:00,  8.11it/s]

(1000, 29)

True

In [149]:
df_test.head().T

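|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Clip | 0 | 0 | 0 | 0 | 0 |
| Activity | 0 | 0 | 0 | 0 | 0 |
| Assess | 0 | 0 | 0 | 0 | 0 |
| Game | 0 | 0 | 0 | 0 | 0 |
| Cum_Clip | 14 | 29 | 6 | 10 | 17 |
| Cum_Activity | 7 | 11 | 2 | 2 | 1 |
| Cum_Assess | 1 | 11 | 0 | 0 | 0 |
| Cum_Game | 3 | 12 | 0 | 1 | 6 |
| cum_dur_clip | 0 | 0 | 0 | 0 | 0 |
| cum_dur_asses | 22737 | 144871 | 0 | 0 | 0 |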


In [151]:
df_test.to_csv('data_compiled/df_test2.csv', index = False)
df_train.to_csv('data_compiled/df_train2.csv', index = False)
del train, test

# Read the clean data

In [3]:
# Read the engineered feature tables and report their shapes
print('Reading df_train.csv file....')
df_train = pd.read_csv('data_compiled/df_train.csv')
print('df_train.csv file have {} rows and {} columns'\
      .format(df_train.shape[0], df_train.shape[1]))

print('Reading df_test.csv file....')
df_test = pd.read_csv('data_compiled/df_test.csv')
print('df_test.csv file have {} rows and {} columns'\
      .format(df_test.shape[0], df_test.shape[1]))

Reading df_train.csv file....
df_train.csv file have 17690 rows and 29 columns
Reading df_test.csv file....
df_test.csv file have 1000 rows and 29 columns


In [4]:
df_train.columns

Index(['Activity', 'Assess', 'Clip', 'Cum_Activity', 'Cum_Assess', 'Cum_Clip',
       'Cum_Game', 'Game', 'acc', 'acc_group', 'assess_dur_mean',
       'assess_dur_std', 'assess_dur_sum', 'correct', 'cum_acc', 'cum_corr',
       'cum_dur_activity', 'cum_dur_asses', 'cum_dur_clip', 'cum_dur_game',
       'cum_incorrect', 'event_counter', 'game_session', 'incorrect',
       'installation_id', 'mean_acc_group', 'title', 'type', 'world'],
      dtype='object')

# Training Step

In [5]:
x_cols = [col for col in df_train.columns if col not in 
          ['correct', 'incorrect', 'acc_group', 
           'installation_id', 'game_session' ,'type',
          'acc']]
y_col = ['acc_group']
x_encoder = ['title', 'world']

## Encode categorical variables as integer labels

In [6]:
print(df_train.shape, df_test.shape)
df_train, df_test = Convert_LabelEncoder(df_train, df_test, x_encoder)
print(df_train.shape, df_test.shape)

(17690, 29) (1000, 29)
(17690, 29) (1000, 29)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
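`Convert_LabelEncoder` comes from the local `training` module, which is not shown; the unchanged shape (29 columns) indicates label encoding rather than one-hot dummies. Below is a hypothetical stand-in that fits each encoder on train and test together, so both frames share the same codes, and copies the frames first so assignments do not target a slice (the situation the SettingWithCopyWarning above refers to). scikit-learn's `LabelEncoder` is meant for targets; `OrdinalEncoder` is the feature-oriented equivalent.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def label_encode(train_df, test_df, cols):
    """Hypothetical stand-in for Convert_LabelEncoder: encode each column
    with categories fitted on train and test together."""
    train_df, test_df = train_df.copy(), test_df.copy()  # work on copies, not slices
    for col in cols:
        enc = LabelEncoder()
        enc.fit(pd.concat([train_df[col], test_df[col]]))
        train_df[col] = enc.transform(train_df[col])
        test_df[col] = enc.transform(test_df[col])
    return train_df, test_df

train_df = pd.DataFrame({'title': [17, 24, 17], 'world': [2, 2, 3]})
test_df = pd.DataFrame({'title': [30, 17], 'world': [1, 2]})
tr, te = label_encode(train_df, test_df, ['title', 'world'])
print(tr)
print(te)
```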


In [13]:
df_test.shape

(1000, 29)

## Train random forest model

In [47]:
RF_mdl = random_forest_param_selection(df_train[x_cols], 
                                       df_train[y_col].values.ravel(),
                                       nfolds = 5, n_jobs = -1)

The training roc_auc_score is: 0.772
The best parameters are: {'n_estimators': 860, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_features': 'sqrt', 'max_depth': 20, 'bootstrap': False}
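`random_forest_param_selection` also lives in the unshown `training` module. The reported parameters (`n_estimators`, `min_samples_split`, `min_samples_leaf`, `max_features`, `max_depth`, `bootstrap`) look like the result of a randomized hyperparameter search, so here is a hypothetical sketch of such a helper built on scikit-learn's `RandomizedSearchCV`. The search space, the one-vs-rest AUC scoring and the synthetic data are assumptions, not the author's settings; since the competition is scored with quadratic weighted kappa, AUC is only a proxy here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

def rf_random_search(X, y, nfolds=5, n_iter=10, n_jobs=-1, seed=42):
    """Hypothetical random-forest hyperparameter search (not the author's grid)."""
    param_dist = {
        'n_estimators': [50, 100],
        'max_depth': [5, 10, 20, None],
        'min_samples_split': [2, 5, 10],
        'min_samples_leaf': [1, 2, 4],
        'max_features': ['sqrt', 'log2'],
        'bootstrap': [True, False],
    }
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=seed),
        param_distributions=param_dist,
        n_iter=n_iter,
        cv=nfolds,
        scoring='roc_auc_ovr',   # one-vs-rest AUC over the 4 accuracy groups
        n_jobs=n_jobs,
        random_state=seed,
    )
    search.fit(X, y)
    print('Best CV score: {:.3f}'.format(search.best_score_))
    print('Best parameters:', search.best_params_)
    return search.best_estimator_

# Small synthetic 4-class problem so the example runs quickly
X_demo, y_demo = make_classification(n_samples=300, n_features=8, n_informative=5,
                                     n_classes=4, random_state=0)
rf_best = rf_random_search(X_demo, y_demo, nfolds=3, n_iter=3, n_jobs=1)
print(rf_best.predict(X_demo[:5]))
```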


In [51]:
pred_RF = RF_mdl.predict(df_test[x_cols])

In [32]:
def model(X_train, y_train, final_test, n_splits=3):
    """Stratified K-fold XGBoost. Returns the last fold's booster and test
    predictions from the class probabilities summed over all folds."""
    scores = []
    pars = {
        'colsample_bytree': 0.8,
        'learning_rate': 0.08,
        'max_depth': 10,
        'subsample': 1,
        'objective': 'multi:softprob',
        'num_class': 4,
        'eval_metric': 'mlogloss',
        'min_child_weight': 3,
        'gamma': 0.25,
        # The number of trees is set by num_boost_round below; 'n_estimators'
        # belongs to the scikit-learn wrapper and is ignored by xgb.train
    }
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    y_pre = np.zeros((len(final_test), 4), dtype=float)
    final_test = xgb.DMatrix(final_test)

    for train_index, val_index in kf.split(X_train, y_train):
        train_X = X_train.iloc[train_index]
        val_X = X_train.iloc[val_index]
        train_y = y_train[train_index]
        val_y = y_train[val_index]
        xgb_train = xgb.DMatrix(train_X, train_y)
        xgb_eval = xgb.DMatrix(val_X, val_y)

        xgb_model = xgb.train(pars,
                              xgb_train,
                              num_boost_round=1000,
                              evals=[(xgb_train, 'train'), (xgb_eval, 'val')],
                              verbose_eval=False,
                              early_stopping_rounds=20)

        # predict() uses every boosted round that was trained; passing
        # iteration_range=(0, xgb_model.best_iteration + 1) would stop at the best one
        pred_val = xgb_model.predict(xgb.DMatrix(val_X)).argmax(axis=1)
        score = cohen_kappa_score(pred_val, val_y, weights='quadratic')
        scores.append(score)
        print('cohen_kappa_score :', score)

        # Accumulate class probabilities over the folds
        y_pre += xgb_model.predict(final_test)

    pred = y_pre.argmax(axis=1)
    print('Mean score:', np.mean(scores))

    return xgb_model, pred

In [33]:
xgb_model,pred=model(df_train[x_cols], 
                     df_train[y_col].values.ravel(),
                     df_test[x_cols],5)

cohen_kappa_score : 0.7914736619911242
cohen_kappa_score : 0.8106888472525314
cohen_kappa_score : 0.8150414011794884
cohen_kappa_score : 0.8064879568131574
cohen_kappa_score : 0.8085913394454539
Mean score: 0.806456641336351
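The fold scores above come from `cohen_kappa_score(..., weights='quadratic')`, i.e. quadratic weighted kappa (QWK), the same metric the CatBoost loop reports through the `qwk` helper from `training`, whose code is not shown. With observed confusion matrix $O$, expected matrix $E$ (outer product of the two label histograms, scaled to the same total) and weights $w_{ij} = (i-j)^2/(N-1)^2$, QWK is $1 - \sum w_{ij} O_{ij} / \sum w_{ij} E_{ij}$. A from-scratch sketch, checked against scikit-learn on a small made-up example:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def quadratic_weighted_kappa(y_true, y_pred, n_classes=4):
    """QWK = 1 - sum(W * O) / sum(W * E) with quadratic weights W."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    # Observed confusion matrix O (rows: true class, columns: predicted class)
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # Quadratic disagreement weights: 0 on the diagonal, 1 for classes 0 vs 3
    i, j = np.indices((n_classes, n_classes))
    W = (i - j) ** 2 / (n_classes - 1) ** 2
    # Expected matrix E under independence, scaled to the same total as O
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (W * O).sum() / (W * E).sum()

y_true = [3, 0, 3, 2, 3, 1, 0, 2]
y_pred = [3, 1, 3, 2, 2, 1, 0, 3]
print(quadratic_weighted_kappa(y_true, y_pred))
print(cohen_kappa_score(y_true, y_pred, weights='quadratic'))
```

Because the weights are symmetric, swapping the two arguments does not change the score, which is why the loop above can pass predictions first.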


In [7]:
def make_classifier(iterations=6000):
    """CatBoost multiclass model evaluated with weighted kappa; training stops
    after 500 iterations without improvement on the eval set."""
    clf = CatBoostClassifier(
        loss_function='MultiClass',
        eval_metric='WKappa',
        task_type='CPU',
        # learning_rate=0.01,
        iterations=iterations,
        od_type='Iter',
        # depth=4,
        early_stopping_rounds=500,
        # l2_leaf_reg=10,
        # border_count=96,
        random_seed=42,
        # use_best_model=use_best_model,
    )
    return clf

In [21]:
X = df_train[x_cols]
y = df_train[y_col]
# oof holds the out-of-fold prediction for every training row
oof = np.zeros(len(X))
NFOLDS = 5
# KFold splits the rows into 5 train/validation partitions. Every row is used for
# validation exactly once, so the OOF score estimates performance on unseen data;
# more folds give each model more training data at the cost of more training runs.
folds = KFold(n_splits=NFOLDS, shuffle=True, random_state=2019)
training_start_time = time()
models = []
for fold, (trn_idx, test_idx) in enumerate(folds.split(X, y)):
    # each iteration yields the row indices of the training and validation parts
    start_time = time()
    print(f'Training on fold {fold+1}')
    clf = make_classifier()
    # .loc accepts these positional indices because df_train has a default RangeIndex
    clf.fit(X.loc[trn_idx, x_cols], y.loc[trn_idx],
            eval_set=(X.loc[test_idx, x_cols], y.loc[test_idx]),
            use_best_model=True, verbose=500)

    # store this fold's predictions for its validation rows
    oof[test_idx] = clf.predict(X.loc[test_idx, x_cols]).reshape(len(test_idx))
    models.append(clf)
    print('Fold {} finished in {}'.format(fold + 1, str(datetime.timedelta(seconds=time() - start_time))))
    print('____________________________________________________________________________________________\n')

print('-' * 30)
# Score the complete OOF predictions against the true labels with quadratic weighted kappa
print('OOF QWK:', qwk(y, oof))
print('-' * 30)

Training on fold 1
0:	learn: 0.7539531	test: 0.7433976	best: 0.7433976 (0)	total: 15.2ms	remaining: 1m 31s
500:	learn: 0.8189450	test: 0.8050395	best: 0.8053527 (498)	total: 5.13s	remaining: 56.4s
1000:	learn: 0.8415761	test: 0.8125321	best: 0.8129725 (975)	total: 10.2s	remaining: 51.1s
1500:	learn: 0.8555910	test: 0.8122922	best: 0.8150058 (1347)	total: 15.7s	remaining: 47s
Stopped by overfitting detector  (500 iterations wait)

bestTest = 0.8150057806
bestIteration = 1347

Shrink model to first 1348 iterations.
Fold 1 finished in 0:00:20.005299
____________________________________________________________________________________________

Training on fold 2
0:	learn: 0.7477863	test: 0.7383514	best: 0.7383514 (0)	total: 16.5ms	remaining: 1m 38s
500:	learn: 0.8204107	test: 0.8049331	best: 0.8053302 (487)	total: 5.75s	remaining: 1m 3s
1000:	learn: 0.8441128	test: 0.8110192	best: 0.8129067 (927)	total: 11.6s	remaining: 57.7s
1500:	learn: 0.8559205	test: 0.8144395	best: 0.8158388 (1403)	tot
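The folds above are drawn with a plain `KFold` over assessment rows, so assessments from the same `installation_id` can land in both the training and validation parts of a fold. Because the cumulative features make such rows strongly related, the OOF QWK may be optimistic if the test installations are not in the training set. A sketch of grouping the folds by installation with scikit-learn's `GroupKFold`, on toy data:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data: 10 assessments from 4 installations (stand-ins for df_train rows)
installation_id = np.array(['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd'])
X_demo = np.arange(10).reshape(-1, 1)
y_demo = np.array([3, 0, 3, 2, 3, 1, 0, 2, 3, 3])

gkf = GroupKFold(n_splits=4)
val_groups = []
for fold, (trn_idx, val_idx) in enumerate(gkf.split(X_demo, y_demo, groups=installation_id)):
    # No installation appears on both sides of the split
    assert set(installation_id[trn_idx]).isdisjoint(installation_id[val_idx])
    val_groups.append(sorted(set(installation_id[val_idx])))
    print(f'fold {fold + 1}: validation installations {val_groups[-1]}')
```

In the CatBoost loop this would amount to `folds = GroupKFold(n_splits=NFOLDS)` and `folds.split(X, y, groups=df_train['installation_id'])`; scikit-learn 1.0 and later also provide `StratifiedGroupKFold` to keep the class balance across folds.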

In [29]:
from scipy import stats

# Predict on the test set with each fold model, then take the majority vote
predictions = []
for fold_model in models:
    predictions.append(fold_model.predict(df_test[x_cols]))
predictions = np.concatenate(predictions, axis=1)
print(predictions.shape)
predictions = stats.mode(predictions, axis=1)[0].reshape(-1)
print(predictions.shape)

(1000, 5)
(1000,)
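The cell above combines the fold models by majority vote over their predicted classes (hard voting). A common alternative is soft voting: average each model's class probabilities and take the argmax, which lets a confident model outweigh two hesitant ones. A sketch with made-up probabilities:

```python
import numpy as np

# Hypothetical class probabilities from 3 fold models for 2 test rows (4 classes)
fold_probs = np.array([
    [[0.05, 0.05, 0.44, 0.46], [0.60, 0.20, 0.10, 0.10]],
    [[0.05, 0.05, 0.43, 0.47], [0.55, 0.25, 0.10, 0.10]],
    [[0.02, 0.03, 0.85, 0.10], [0.30, 0.40, 0.20, 0.10]],
])  # shape (n_models, n_rows, n_classes)

# Hard voting: each model votes for its most likely class
votes = fold_probs.argmax(axis=2)                      # shape (n_models, n_rows)
hard = np.array([np.bincount(row_votes, minlength=4).argmax()
                 for row_votes in votes.T])            # ties go to the lower class

# Soft voting: average the probabilities over models, then take the argmax
soft = fold_probs.mean(axis=0).argmax(axis=1)

print(hard, soft)  # [3 0] [2 0]
```

With the CatBoost fold models, `fold_probs` could be built as `np.stack([m.predict_proba(df_test[x_cols]) for m in models])`; this is a sketch and has not been checked against the scores reported above.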

In [35]:
df_train.head(10).T

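|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Activity | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Assess | 1 | 11 | 1 | 2 | 1 | 1 | 4 | 2 | 1 | 2 |
| Clip | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Cum_Activity | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Cum_Assess | 0 | 1 | 12 | 13 | 15 | 0 | 1 | 5 | 0 | 0 |
| Cum_Clip | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Cum_Game | 0 | 4 | 4 | 4 | 8 | 0 | 0 | 0 | 0 | 0 |
| Game | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| acc | 1 | 0 | 1 | 0.5 | 1 | 1 | 0 | 0.5 | 1 | 0.5 |
| acc_group | 3 | 0 | 3 | 2 | 3 | 3 | 0 | 2 | 3 | 2 |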
