<center><img src="https://i.imgur.com/HrWLO8e.png"></center>


## Weights and Biases
Each W&B project has a dashboard that contains information about all the experiments in that project. Here's an example dashboard of a project.
![6fBE0hz%20-%20Imgur.png](https://i.imgur.com/6fBE0hz.png)

In [None]:
import wandb
wandb.init(project="riiid-challenge-wb", name="exploration")

In [None]:
%%time

# Import the Rapids suite here - takes about 1.5 mins

import sys
!cp ../input/rapids/rapids.0.15.0 /opt/conda/envs/rapids.tar.gz
!cd /opt/conda/envs/ && tar -xzvf rapids.tar.gz > /dev/null
sys.path = ["/opt/conda/envs/rapids/lib/python3.7/site-packages"] + sys.path
sys.path = ["/opt/conda/envs/rapids/lib/python3.7"] + sys.path
sys.path = ["/opt/conda/envs/rapids/lib"] + sys.path 
!cp /opt/conda/envs/rapids/lib/libxgboost.so /opt/conda/lib/


In [None]:
# Regular Libraries
import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.image as mpimg
from tabulate import tabulate
import missingno as msno 
from IPython.display import display_html
from PIL import Image
import gc
import cv2
from scipy.stats import pearsonr
import tqdm

import pydicom # for DICOM images
from skimage.transform import resize
import copy
import re

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

# Color Palette
custom_colors = ['#00FFE2', '#00FDFF', '#00BCFF', '#0082FF', '#8000FF', '#B300FF', '#F400FF']
sns.palplot(sns.color_palette(custom_colors))

# Set Style
sns.set_style("whitegrid")
sns.despine(left=True, bottom=True)

# Set tick size
plt.rc('xtick',labelsize=12)
plt.rc('ytick',labelsize=12)

*📌Note: We can't use `Dask-cuDF` because we only have 1 worker and Memory: 13.96 in the Kaggle GPU Accelerator. With more than 1 worker, `Dask` would have performed even better :)*

In [None]:
# Rapids Imports
import cudf
import cupy # CuPy is an open-source array library accelerated with NVIDIA CUDA.


from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster()
client = Client(cluster)
client

<img src="https://i.imgur.com/NvHmO3L.png">

<div class="alert alert-block alert-info">
In this section we'll use the <code>cudf</code> and <code>cupy</code> libraries provided by RAPIDS, combined with <code>numpy</code> for the plotting part. The notebook currently runs in about 3 minutes.
</div>

# 1. train.csv

* `row_id`: (int64) ID code for the row.
* `timestamp`: (int64) the time in milliseconds between this user interaction and the first event completion from that user.
* `user_id`: (int32) ID code for the user.
* `content_id`: (int16) ID code for the user interaction
* `content_type_id`: (bool) 0 if the event was a question being posed to the user, 1 if the event was the user watching a lecture.
* `task_container_id`: (int16) ID code for the *batch of questions or lectures*. (eg. a user might see three questions in a row before seeing the explanations for any of them - those three would all share a task_container_id)
* `user_answer`: (int8) the user's answer to the question, if any. Read -1 as null, for lectures.
* `answered_correctly`: (int8) if the user responded correctly. Read -1 as null, for lectures.
* `prior_question_elapsed_time`: (float32) The average time in milliseconds it took a user to answer each question in the previous question bundle, ignoring any lectures in between (is null for a user's first question bundle or lecture)
* `prior_question_had_explanation`: (bool) Whether or not the user saw an explanation and the correct response(s) after answering the previous question bundle, ignoring any lectures in between. The value is shared across a single question bundle, and is null for a user's first question bundle or lecture. Typically the first several questions a user sees were part of an onboarding diagnostic test where they did not get any feedback.
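
The dtypes listed above are deliberately narrow. As a rough illustration of why that matters, the sketch below adds up the bytes one row occupies with these dtypes and compares it with a hypothetical baseline where every column is stored as a 64-bit value. It assumes 1 byte per `bool` and ignores null masks and any library overhead, so treat the numbers as an estimate only.

```python
# Width in bytes of each dtype named in the column list above.
# Assumption: a bool is counted as 1 byte; real libraries may store it differently.
BYTES_PER_DTYPE = {"int64": 8, "int32": 4, "int16": 2, "int8": 1,
                   "float32": 4, "bool": 1}

narrow_dtypes = {
    "row_id": "int64",
    "timestamp": "int64",
    "user_id": "int32",
    "content_id": "int16",
    "content_type_id": "bool",
    "task_container_id": "int16",
    "user_answer": "int8",
    "answered_correctly": "int8",
    "prior_question_elapsed_time": "float32",
    "prior_question_had_explanation": "bool",
}


def bytes_per_row(dtypes):
    """Sum the width in bytes of every column in one row."""
    return sum(BYTES_PER_DTYPE[dtype] for dtype in dtypes.values())


narrow = bytes_per_row(narrow_dtypes)
# Hypothetical baseline: every column stored as a 64-bit value
wide = 8 * len(narrow_dtypes)

print("Narrow dtypes: {} bytes/row".format(narrow))
print("All 64-bit:    {} bytes/row".format(wide))
```

With GPU memory as the limiting resource, the smaller per-row footprint is what lets the whole table fit on a single device.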

In [None]:
%%time

# Read in data
dtypes = {
    "row_id": "int64",
    "timestamp": "int64",
    "user_id": "int32",
    "content_id": "int16",
    "content_type_id": "boolean",
    "task_container_id": "int16",
    "user_answer": "int8",
    "answered_correctly": "int8",
    "prior_question_elapsed_time": "float32", 
    "prior_question_had_explanation": "int8"
}

train = cudf.read_csv('../input/riiid-test-answer-prediction/train.csv', dtype=dtypes)

# # Drop "row_id" column as it doesn't give any information
# # (drop returns a new dataframe, so no inplace=True when reassigning)
# train = train.drop(columns=["row_id"])

> 📌Note: Only 2 columns have missing data, `prior_question_elapsed_time` and `prior_question_had_explanation` (as explained in the documentation, `NULL` values are present for a user's first question bundle).
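
As a small, CPU-only illustration of the missing-value handling in this section, the sketch below counts the missing entries of a toy column and replaces them with the sentinel `-1`. The values are made up for the example, and `None` stands in for `NULL`.

```python
# Toy column with missing entries (None stands in for NULL); values are invented
prior_question_elapsed_time = [None, 24000.0, 18000.0, None,
                               21000.0, 30000.0, 17000.0, 26000.0]

total = len(prior_question_elapsed_time)
missing = sum(value is None for value in prior_question_elapsed_time)
missing_pct = missing / total * 100

# "{:.2f}" always prints two decimals; "{:.2}" would mean two significant digits
print("prior_question_elapsed_time has: {:,} ({:.2f}%) missing values.".format(
    missing, missing_pct))

# Replace missing entries with the sentinel -1 (elapsed times are never negative)
filled = [-1 if value is None else value for value in prior_question_elapsed_time]
print(filled)
```

A sentinel keeps the column numeric and lets tree-based models treat "no previous bundle" as its own case, at the cost of distorting statistics such as the mean if the sentinel is not excluded first.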

In [None]:
# Data Information
print("Rows: {:,}".format(len(train)), "\n" +
      "Columns: {}".format(len(train.columns)))

# Find Missing Data if any
total = len(train)

for column in train.columns:
    n_missing = train[column].isna().sum()
    if n_missing != 0:
        # "{:.2f}" prints two decimals ("{:.2}" would mean two significant digits)
        print("{} has: {:,} ({:.2f}%) missing values.".format(column, n_missing,
                                                              (n_missing/total)*100))
        
        
# Fill in missing values with "-1"
train["prior_question_elapsed_time"] = train["prior_question_elapsed_time"].fillna(-1)
train["prior_question_had_explanation"] = train["prior_question_had_explanation"].fillna(-1)

train.head()

## 1.1 Columns individual analysis

* numerical features (distplot): `timestamp`, `prior_question_elapsed_time`
* categorical features (distplot): `user_id` count, `content_id` count, `task_container_id` count
* categorical features (barplot): `user_answer` count, `answered_correctly` count, `prior_question_had_explanation` count

### Predefined functions 📂

Because RAPIDS cannot (yet) be used for visualization, we need to preprocess the data, convert it to numpy arrays, and plot it afterwards.
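
For categorical columns with many values, what gets plotted is the distribution of the per-category counts rather than the raw values. A plain-Python sketch of that preprocessing step, using a made-up `user_id` column and `collections.Counter` as a stand-in for `value_counts()`:

```python
from collections import Counter
from statistics import mean, median

# Toy `user_id` column: one entry per interaction (IDs are invented)
user_id = [115, 115, 115, 124, 124, 2746, 2746, 2746, 2746, 5382]

# Stand-in for value_counts(): number of rows per distinct user
counts = Counter(user_id)
values = list(counts.values())

# The same summary that is printed before plotting
summary = {"mean": mean(values), "median": median(values), "max": max(values)}
print(summary)
```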

In [None]:
def distplot_features(df, feature, title, color = custom_colors[4], categorical=True):
    '''Takes a column from the GPU dataframe and plots the distribution (after count).'''
    
    if categorical:
        values = cupy.asnumpy(df[feature].value_counts().values)
    else:
        values = cupy.asnumpy(df[feature].values)
        
    print('Mean: {:,}'.format(np.mean(values)), "\n"
          'Median: {:,}'.format(np.median(values)), "\n"
          'Max: {:,}'.format(np.max(values)))

    
    fig = plt.figure(figsize = (18, 3))
    
    if categorical:
        sns.distplot(values, hist=False, color = color, kde_kws = {'lw':3})
    else:
        # To speed up the process
        sns.distplot(values[::250000], hist=False, color = color, kde_kws = {'lw':3})
    
    plt.title(title, fontsize=15)
    plt.show();
    
    del values
    gc.collect()
    return fig

In [None]:
def barplot_features(df, feature, title, palette = custom_colors[2:]):
    '''Takes the numerical columns (with less than 10 categories) and plots the barplot.'''
    
    # We need to extract both the name of the category and the no. of appearances
    counts = df[feature].value_counts().reset_index()
    index = cupy.asnumpy(counts["index"].values)
    values = cupy.asnumpy(counts[feature].values)

    fig = plt.figure(figsize = (18, 3))
    # Use the `palette` argument instead of always falling back to the default
    sns.barplot(x = index, y = values, palette = palette)
    plt.title(title, fontsize=15)
    plt.show();
    
    del index, values
    gc.collect()
    return fig

### Inspect numerical features

In [None]:
numerical_features = ['timestamp', 'prior_question_elapsed_time']

for feature in numerical_features:
    fig = distplot_features(train, feature=feature, title = feature + " distribution", color = custom_colors[1], categorical=False)
    wandb.log({ feature + " distribution": fig})

### Inspect Categorical Features: many values

In [None]:
categorical_features = ['user_id', 'content_id', 'task_container_id']

for feature in categorical_features:
    fig = distplot_features(train, feature=feature, title = feature + " countplot distribution", color = custom_colors[4], categorical=True)
    wandb.log({feature + " countplot distribution": fig})

### Inspect Categorical Features: few values

> There are only a few cases where content_type_id = 1 (meaning lectures), which is good: we're not supposed to predict those anyway.

In [None]:
categorical_for_bar = ['content_type_id', 'user_answer', 
                       'answered_correctly', 'prior_question_had_explanation']

for feature in categorical_for_bar:
    fig = barplot_features(train, feature=feature, title = feature + " barplot")
    wandb.log({feature + " barplot": fig})

## View the plots saved in W&B dashboard
You can see your live dashboard as you log metrics and plots by simply calling `wandb.run`. It displays the dashboard of the currently executing run.

In [None]:
wandb.run

## 1.2 Data Processing

> 📌Note: The **outliers** might strongly influence the future models, so we need to handle them carefully. However, erasing the outliers could remove up to 10% of the data, which is valuable information for training our models.
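
The outlier rule used for `timestamp` is the upper fence Q3 + 1.5 × IQR. The sketch below applies the same rule to a handful of invented timestamps with the standard library only, and converts the boundary from milliseconds to hours. It uses `statistics.quantiles` with linear interpolation as a CPU stand-in for the percentile call in the notebook.

```python
from statistics import quantiles

MS_PER_HOUR = 3.6e6

# Toy timestamps in milliseconds: seven events in the first 6 hours, one after 100 hours
timestamps = [0, 3_600_000, 7_200_000, 10_800_000, 14_400_000,
              18_000_000, 21_600_000, 360_000_000]

# method="inclusive" interpolates linearly between data points
q1, _, q3 = quantiles(timestamps, n=4, method="inclusive")
iqr = q3 - q1

# Only the upper boundary is considered
outlier_boundary = q3 + 1.5 * iqr

outliers = [t for t in timestamps if t >= outlier_boundary]
outlier_share = len(outliers) / len(timestamps) * 100

print("Boundary: {:,.0f} ms = {:.1f} hrs".format(
    outlier_boundary, outlier_boundary / MS_PER_HOUR))
print("{:.2f}% of the rows would be erased.".format(outlier_share))
```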


In [None]:
# Total rows we started with
total = len(train)
feature = "timestamp"

# Compute Outliers
Q1 = cupy.percentile(train[feature].values, q = 25).item()
Q3 = cupy.percentile(train[feature].values, q = 75).item()
IQR = Q3 - Q1

# We'll look only at the upper interval outliers
outlier_boundary = Q3 + 1.5*IQR

print('Timestamp: around {:.2f}% of the data would be erased.'.format(len(train[train[feature] >= outlier_boundary])/total * 100), 
      "\n"+
      'The outlier boundary is {:,}, which means {:,.2f} hrs, which means {:,.2f} days.'.format(outlier_boundary, (outlier_boundary / 3.6e+6),
                                                                                       (outlier_boundary / 3.6e+6)/24))

gc.collect()

> 📌Note: However, I would erase all pupils (`user_id`) that have fewer than 5 appearances in the data (no prediction can be made on these students)
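
A minimal CPU sketch of this filtering step, on invented rows: count how often each `user_id` appears, collect the IDs below the threshold, and keep only the rows of the remaining users. `MIN_APPEARANCES` is a name introduced here for the threshold of 5.

```python
from collections import Counter

MIN_APPEARANCES = 5  # threshold from the note above (name is illustrative)

# Toy rows as (user_id, answered_correctly); values are invented
rows = [(1, 1), (1, 0), (1, 1), (1, 1), (1, 0), (1, 1),
        (2, 0), (2, 1),
        (3, 1), (3, 1), (3, 0), (3, 1), (3, 1)]

# Number of rows per user, then the users that appear too rarely
appearances = Counter(user_id for user_id, _ in rows)
ids_to_erase = {user_id for user_id, n in appearances.items() if n < MIN_APPEARANCES}

kept_rows = [row for row in rows if row[0] not in ids_to_erase]
erased = len(rows) - len(kept_rows)
erased_pct = erased / len(rows) * 100

print("We erased {:,} rows meaning {:.3f}% of all data.".format(erased, erased_pct))
```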

In [None]:
# Select ids to erase
user_counts = train["user_id"].value_counts().reset_index()
ids_to_erase = user_counts[user_counts["user_id"] < 5]["index"].values

# Erase the ids
# NOTE: the [:1000] slice keeps only the first 1,000 remaining rows,
# so the count printed below also includes the rows dropped by the slice
new_train = train[~train['user_id'].isin(ids_to_erase)][:1000]

print("We erased {:,} rows meaning {:.3f}% of all data.".format(len(train)-len(new_train), (1 - len(new_train)/len(train))*100))
del ids_to_erase
# del train

In [None]:
# Count how many times the user answered correctly out of all available times
user_performance = train.groupby("user_id").agg({ 'row_id': ['count'], 'answered_correctly': ['sum'] }).reset_index()
user_performance.columns = ["user_id", "total_count", "correct_count"]
user_performance["performance"] = user_performance["correct_count"] / user_performance["total_count"]

# Create intervals for number of appearances
# between 0 and 1000, 1000 and 2500 and 2500+
def condition(x):
    if x <= 1000:
        return 0
    elif x <= 2500:
        return 1
    else:
        return 2
    
user_performance["total_interval"] = user_performance["total_count"].applymap(condition)

> 📌Note: So yes, the *average* performance increases along with the number of times a student appears in the data.
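
To make the comparison concrete, here is a small sketch of how per-user performance can be bucketed by number of appearances and averaged per bucket. The per-user totals are invented for illustration and do not reproduce the real data; the bucket edges (1000 and 2500) are the ones used in this section.

```python
from collections import defaultdict


def interval(total_count):
    """Bucket a user's number of rows: 0 for <=1000, 1 for 1001-2500, 2 for more."""
    if total_count <= 1000:
        return 0
    elif total_count <= 2500:
        return 1
    return 2


# Toy per-user aggregates: user_id -> (total_count, correct_count)
user_stats = {10: (200, 100), 11: (800, 480), 12: (1500, 975),
              13: (3000, 2250), 14: (4000, 2800)}

# Collect each user's share of correct answers under its bucket
performance_by_interval = defaultdict(list)
for user_id, (total_count, correct_count) in user_stats.items():
    performance_by_interval[interval(total_count)].append(correct_count / total_count)

# Average performance per bucket (the height of each bar in the bar plot)
average_performance = {bucket: sum(values) / len(values)
                       for bucket, values in sorted(performance_by_interval.items())}
print(average_performance)
```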

In [None]:
# Convert to numpy arrays (so we can plot)
x = cupy.asnumpy(user_performance["total_interval"].values)
y = cupy.asnumpy(user_performance["performance"].values)

# Plot
fig = plt.figure(figsize = (18, 4))
sns.barplot(x = x, y = y, palette = custom_colors[1:])
plt.title("Performance over number of appearances", fontsize = 15)
plt.xticks([0, 1, 2], ['<1000', '1000-2500', '2500+']);

wandb.log({"Performance over number of appearances": fig})


In [None]:
wandb.run

# W&B Artifacts
You can store different versions of your datasets and models in the cloud as Artifacts. Think of an Artifact as a folder of data to which we can add individual files and then upload to the cloud as part of our W&B project, which also supports automatic versioning of datasets and models. Artifacts also track the training pipelines as DAGs. Here's an example of an artifacts graph.
![artifacts](https://i.imgur.com/QQULnpP.gif)
## 1.4 Save and delete

> To keep the notebook as light as possible and to not overload the memory, we save the `train` data in .parquet format (lighter, takes about 7 seconds to upload using `cudf`) and delete the dataframes.


In [None]:
# Checkpoint: save to .parquet
print("Length of new_train", len(new_train))
new_train.to_parquet('new_train.parquet')
!ls

In [None]:
# Save it as a dataset artifact on W&B
artifact = wandb.Artifact(name="train_data", type="dataset")
artifact.add_file("new_train.parquet")
wandb.log_artifact(artifact)

In [None]:
# Clean the environment
del train, new_train
gc.collect()
!rm new_train.parquet
!ls

In [None]:
import wandb
run = wandb.init()

artifact = run.use_artifact('ivangoncharov/riiid-challenge-wb/train_data:v0', type='dataset')
artifact_dir = artifact.download()

!ls

# 2. questions.csv

* `question_id`: foreign key for the train/test `content_id` column, when the content type is question (0).
* `bundle_id`: code for which questions are served together.
* `correct_answer`: the answer to the question. Can be compared with the train user_answer column to check if the user was right.
* `part`: the relevant section of the TOEIC test.
* `tags`: one or more detailed tag codes for the question. The meaning of the tags will not be provided, but these codes are sufficient for clustering the questions together.
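
As the `correct_answer` description says, this column can be compared with `user_answer` from the train table. A small sketch of that check in plain Python, with invented IDs and answers; lectures are returned as `-1`, mirroring how `answered_correctly` encodes them.

```python
# Toy questions table: question_id -> correct_answer (values are invented)
correct_answer = {5692: 3, 5716: 2, 128: 0}

# Toy train rows: (content_id, content_type_id, user_answer)
# content_type_id 0 = question, 1 = lecture (lectures carry user_answer = -1)
train_rows = [(5692, 0, 3), (5716, 0, 1), (128, 0, 0), (89, 1, -1)]


def was_correct(content_id, content_type_id, user_answer):
    """Return 1/0 for questions and -1 for lectures."""
    if content_type_id == 1:
        return -1
    return int(user_answer == correct_answer[content_id])


recomputed = [was_correct(*row) for row in train_rows]
print(recomputed)
```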

> The Test of English for International Communication (TOEIC) is an international standardized test of English language proficiency for non-native speakers.

In [None]:
questions = cudf.read_csv('../input/riiid-test-answer-prediction/questions.csv')

# Data Information
print("Rows: {:,}".format(len(questions)), "\n" +
      "Columns: {}".format(len(questions.columns)))

# Find Missing Data if any
total = len(questions)

for column in questions.columns:
    n_missing = questions[column].isna().sum()
    if n_missing != 0:
        print("{} has: {:,} ({:.2f}%) missing values.".format(column, n_missing,
                                                              (n_missing/total)*100))
        
        
# Fill in missing values with "-1" (`tags` is a string column, so the fill value is a string too)
questions["tags"] = questions["tags"].fillna("-1")

questions.head()

## 2.1 Inspect the columns

* categorical features (distplot): `question_id` count, `bundle_id` count, `tags` count
* categorical features (barplot): `correct_answer`, `part`

In [None]:
# ----- question_id -----

# The table contains an equal number of IDs for each question
print('There is a total of {:,} IDs.'.format(len(questions['question_id'].value_counts())), "\n")

# ----- bundle_id -----
print('There are {:,} unique bundle IDs.'.format(questions['bundle_id'].nunique()))

> 📌Note: The majority of the questions are from part 5. If this distribution doesn't match the `test` set, there might be some issues :)

In [None]:
for feature in ['part', 'correct_answer']:
    fig = barplot_features(questions, feature=feature, title=feature + " - barplot distribution")
    wandb.log({feature + " - barplot distribution": fig})
fig = distplot_features(questions, 'tags', title = "Tags - Count Distribution", color = custom_colors[0], categorical=True)
wandb.log({"Tags - Count Distribution": fig})

### Save and delete

In [None]:
# Checkpoint: save to parquet
artifact =  wandb.Artifact(name="more_train_data", type="dataset")
questions.to_parquet('questions.parquet')
artifact.add_file('questions.parquet')

In [None]:
del questions
gc.collect()

# 3. lectures.csv

* `lecture_id`: foreign key for the train/test `content_id` column, when the content type is lecture (1).
* `part`: top level category code for the lecture.
* `tag`: one tag code for the lecture. The meaning of the tags will not be provided, but these codes are sufficient for clustering the lectures together.
* `type_of`: brief description of the core purpose of the lecture (`string` - so this data needs to be treated a bit differently)

*no missing values*

In [None]:
lectures = cudf.read_csv('../input/riiid-test-answer-prediction/lectures.csv')

# Encode 'type_of' column
lectures['type_of'], codes = lectures['type_of'].factorize()

# Data Information
print("Rows: {:,}".format(len(lectures)), "\n" +
      "Columns: {}".format(len(lectures.columns)))
lectures.head()

## 3.1 Inspect the columns

In [None]:
# ----- lecture_id -----
# The table contains an equal number of IDs for each lecture
print('There is a total of {:,} IDs.'.format(len(lectures['lecture_id'].value_counts())), "\n")

# There are 151 unique tags
print('There are a total of {:,} unique tags IDs.'.format(len(lectures['tag'].value_counts())))

> 📌Note: Again, part 5 is very prominent.

In [None]:
for feature in ['part', 'type_of']:
    fig = barplot_features(lectures, feature=feature, title=feature + " - barplot distribution")
    wandb.log({feature + " - barplot distribution": fig})

## 3.3 Save and delete

In [None]:
lectures.to_parquet("lectures.parquet")
artifact.add_file("lectures.parquet")
del lectures
gc.collect()

## View the dashboard in real time

In [None]:
wandb.run

<img src="https://i.imgur.com/3cBHzEF.png">

> Let's look again at the structure of our data:
<img src="https://i.imgur.com/gjuzFkl.png" width=550>

<div class="alert alert-block alert-success">
<p><b>This section uses the <code>cuML</code> package and XGBoost to compute the predictions.</b></p>
</div>

In [None]:
cudf.set_allocator("managed")

In [None]:
%%time
# Import the data
train = cudf.read_parquet("../input/riiid-answer-correctness-prediction-rapids/new_train.parquet")
questions = cudf.read_parquet("../input/riiid-answer-correctness-prediction-rapids/questions.parquet")

# We won't load the lectures, as we are not supposed to predict for these rows

In [None]:
%%time
# Let's exclude all observations where (content_type_id = 1) & (answered_correctly = -1)
train = train[train['content_type_id'] != 1]
train = train[train['answered_correctly'] != -1].reset_index(drop=True)

# 1. Feature Engineering

In [None]:
# Parameters
train_percent = 0.1
total_len = len(train)

In [None]:
# Split data into train data & feature engineering data (to use for past performance)
# Timestamp is in descending order - meaning that the last 10% observations have
# the biggest chance of having had some performance recorded before
# so looking at the performance in the past we'll try to predict the performance now

features_df = train.iloc[ : int(total_len*(1-train_percent))]
train_df = train.iloc[int(total_len*(1-train_percent)) : ]

## 1.1 Feature Engineering - Create Data

In [None]:
%%time
# --- STUDENT ANSWERS ---
# Group by student
user_answers = features_df[features_df['answered_correctly']!=-1].\
                            groupby('user_id').\
                            agg({'answered_correctly': ['sum', 'mean', 'min', 
                                                        'max', 'count', 'median', 
                                                        'std', 'var']}).\
                            reset_index()

user_answers.columns = ['user_id', 'user_sum', 'user_mean', 'user_min', 'user_max', 
                        'user_count', 'user_median', 'user_std', 'user_var']


# --- CONTENT ID ANSWERS ---
# Group by content
content_answers = features_df[features_df['answered_correctly']!=-1].\
                            groupby('content_id').\
                            agg({'answered_correctly': ['sum', 'mean', 'min', 
                                                        'max', 'count', 'median', 
                                                        'std', 'var']}).\
                            reset_index()

content_answers.columns = ['content_id', 'content_sum', 'content_mean', 'content_min', 
                           'content_max', 'content_count', 'content_median', 'content_std', 
                           'content_var']

> Save FE data; we will use it for the `test` set too :)
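
The idea behind the split and the aggregates can be sketched without a GPU: the first 90% of the rows act as history from which per-user statistics are built, and only the remaining rows are used for training. The rows below are invented, and only `sum`, `count` and `mean` are computed to keep the example short.

```python
from collections import defaultdict

train_percent = 0.1  # share of rows kept for training

# Toy rows in file order: (user_id, answered_correctly); values are invented
rows = [(1, 1), (1, 0), (2, 1), (1, 1), (2, 1),
        (3, 0), (2, 0), (1, 1), (3, 1), (4, 1)]

split_at = int(len(rows) * (1 - train_percent))
feature_rows = rows[:split_at]   # history used to build the aggregates
train_rows = rows[split_at:]     # rows the model is trained on

# Per-user sum and count of correct answers, computed on the history only
totals = defaultdict(lambda: {"user_sum": 0, "user_count": 0})
for user_id, answered_correctly in feature_rows:
    totals[user_id]["user_sum"] += answered_correctly
    totals[user_id]["user_count"] += 1

# Add the mean as a derived statistic
user_answers = {
    user_id: {**stats, "user_mean": stats["user_sum"] / stats["user_count"]}
    for user_id, stats in totals.items()
}
print(user_answers)
```

Because the aggregates never see the training rows, a user's target value cannot leak into that user's own features. In this toy data, user 4 appears only in the training part and therefore has no aggregates at all.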

In [None]:
user_answers.to_parquet('user_answers.parquet')
content_answers.to_parquet('content_answers.parquet')
artifact.add_file('user_answers.parquet')
artifact.add_file('content_answers.parquet')

## Save the artifacts to cloud
We have used artifacts to track all the files that we've pre-processed. Now let's log these artifacts so that we don't have to repeat these steps.

In [None]:
wandb.log_artifact(artifact)

In [None]:
del train, questions
gc.collect()

# Download the uploaded artifacts

In [None]:
import wandb
run = wandb.init()

artifact = run.use_artifact('authors/riiid-challenge-wb/train_data:v0', type='dataset')
artifact_dir = artifact.download()

In [None]:
wandb.run.finish()

## 1.2 Predefined Functions for Preprocessing

> Combine new features with the `train_df`
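
Combining works as a left join: every training row is kept, and rows whose key has no past aggregates get `-1` in the new feature columns. A minimal plain-Python sketch with invented data (the helper `combine` and its arguments are illustrative names, not part of the notebook):

```python
# Toy per-user aggregates built from past data (values are invented)
user_answers = {1: {"user_sum": 3, "user_mean": 0.75},
                2: {"user_sum": 2, "user_mean": 0.5}}
feature_names = ["user_sum", "user_mean"]

# Toy training rows; user 4 never appears in the past data
train_rows = [{"user_id": 1, "answered_correctly": 1},
              {"user_id": 4, "answered_correctly": 0}]


def combine(rows, aggregates, names, fill_value=-1):
    """Left join: keep every row and fill features of unseen keys with fill_value."""
    combined = []
    for row in rows:
        stats = aggregates.get(row["user_id"], {})
        features = {name: stats.get(name, fill_value) for name in names}
        combined.append({**row, **features})
    return combined


combined_rows = combine(train_rows, user_answers, feature_names)
for row in combined_rows:
    print(row)
```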

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
# Features for ML
features_to_keep = ['user_sum', 'user_mean', 'user_min', 'user_max', 
                        'user_count', 'user_median', 'user_std', 'user_var',
                   'content_sum', 'content_mean', 'content_min', 
                           'content_max', 'content_count', 'content_median', 'content_std', 
                           'content_var']
target = 'answered_correctly'
all_features = features_to_keep.copy()
all_features.append(target)


# We need to convert True-False variables to integers
def to_bool(x):
    '''For the string variables.'''
    if x == False:
        return 0
    else:
        return 1

    
def combine_features(data = None):
    '''Combine the features with the Train/Test data.'''
    
    # Add "past" information
    features_data = data.merge(user_answers, how = 'left', on = 'user_id')
    features_data = features_data.merge(content_answers, how = 'left', on = 'content_id')

    # Apply
    features_data['content_type_id'] = features_data['content_type_id'].applymap(to_bool)
    features_data['prior_question_had_explanation'] = features_data['prior_question_had_explanation'].applymap(to_bool)

    # Fill in missing spots
    features_data.fillna(value = -1, inplace = True)
    
    return features_data


# Scaling the data did not perform as I expected - so for now we will exclude it
def scale_data(features_data=None, train=True, features_to_keep=None, target=None):
    '''Scales the provided data - if the data is for training, excludes the target column.
    It also chooses the features used in the prediction.'''
    
    data_for_standardization = features_data[features_to_keep]
    matrix = data_for_standardization.as_matrix()
    scaled_matrix = StandardScaler().fit_transform(matrix)
    
    scaled_data = cudf.DataFrame(scaled_matrix)
    scaled_data.columns = data_for_standardization.columns
    
    # We don't want to scale the target also
    if train:
        scaled_data[target] = features_data[target]
        
    return scaled_data

## 1.3 Apply Functions - getting data ready

In [None]:
%%time

train_df = combine_features(data=train_df)
# train_df = scale_data(features_data=train_df, train=True, features_to_keep=features_to_keep, target=target)

# Comment this if you're scaling
train_df = train_df[all_features]

print("Observations in train: {:,}".format(len(train_df)))
train_df.head()

# 2. XGBoost Model

In [None]:
# RAPIDS roc_auc_score is 16x faster than sklearn. - cdeotte
from cuml.metrics import roc_auc_score
from cuml.preprocessing.model_selection import train_test_split
import xgboost
import pickle

In [None]:
# Features, target and train/test split
X = train_df[features_to_keep]
y = train_df[target]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, 
                                                    shuffle=False, random_state=13, stratify=y)

## 2.1 Baseline Model ;)

### Helper Function that runs multiple models

In [None]:
import wandb

default_params = {
    'max_depth' : 4,
    'max_leaves' : 2**4,
    'tree_method' : 'gpu_hist',
    'grow_policy' : 'lossguide',
    'eta': 0.001
}

In [None]:

def train_xgb_model():
    '''Trains an XGB and returns the trained model + ROC value.'''
    wandb.init(project="riiid-challenge-wb", name="Baseline-xgboost", config=default_params)
    config = wandb.config
    params = {
    'max_depth' : config.max_depth,
    'max_leaves' : config.max_leaves,
    'tree_method' : config.tree_method,
    'grow_policy' : config.grow_policy,
    'eta' : config.eta,
    'objective': "reg:logistic"
    }
    
    # Create DMatrix - is optimized for both memory efficiency and training speed.
    train_matrix = xgboost.DMatrix(data = X_train, label = y_train)
    
    # Create & Train the model
    model = xgboost.train(params, dtrain=train_matrix, callbacks=[wandb.xgboost.wandb_callback()])

    # Make prediction
    predicts = model.predict(xgboost.DMatrix(X_test))
    roc = roc_auc_score(y_test.astype('int32'), predicts)
    wandb.log({"ROC": roc})
    print(" - ROC: {:.5}".format(roc))
    
    return model, roc


In [None]:
%%time

model1, roc1 = train_xgb_model()

<div class="alert alert-block alert-info">
<p><b>We have an ROC AUC score of 0.71628 in less than 10 seconds.</b></p>
<p>Incredible.</p>
</div>
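
For readers unfamiliar with the metric: the ROC AUC can be read as the share of (positive, negative) pairs in which the positive example receives the higher predicted score. The sketch below computes it that way on invented labels and scores. It is a readable quadratic-time illustration, not the GPU implementation used in this notebook.

```python
def roc_auc(labels, scores):
    """Share of (positive, negative) pairs where the positive gets the higher score.

    Ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and predicted scores (invented)
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.4, 0.35, 0.8, 0.2, 0.7]

auc = roc_auc(y_true, y_score)
print("ROC AUC: {:.5f}".format(auc))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why the metric is insensitive to how many positives there are overall.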

In [None]:
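# save model to file
with open("baseline_model.pickle.dat", "wb") as model_file:
    pickle.dump(model1, model_file)
artifact = wandb.Artifact(name="trained_models", type="model")
artifact.add_file("baseline_model.pickle.dat")
# Log the artifact so it is uploaded before the run finishes
wandb.log_artifact(artifact)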

In [None]:
wandb.run.finish()

# 3. LightGBM Model


In [None]:
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn import metrics
import lightgbm as lgbm
import gc
import pickle


In [None]:
# We'll do a train | validation | test situation
train, test = train_test_split(train_df, test_size=0.3, shuffle=False, random_state=13)

train = train.to_pandas()
test = test.to_pandas()

In [None]:
import numpy as np
# -----------
n_splits = 4
# -----------

skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=13)

oof = np.zeros(len(train))
predictions = np.zeros(len(test))

# Conversion to CPU data
skf_split = skf.split(X=train[features_to_keep], y=cupy.asnumpy(train[target].values))

In [None]:
param = {
        'num_leaves': 80,
        'max_bin': 250,
        'min_data_in_leaf': 11,
        'learning_rate': 0.01,
        'min_sum_hessian_in_leaf': 0.00245,
        'bagging_fraction': 1.0, 
        'bagging_freq': 5, 
        'feature_fraction': 0.05,
        'lambda_l1': 4.972,
        'lambda_l2': 2.276,
        'min_gain_to_split': 0.65,
        'max_depth': 14,
        'save_binary': True,
        'seed': 1337,
        'feature_fraction_seed': 1337,
        'bagging_seed': 1337,
        'drop_seed': 1337,
        'data_random_seed': 1337,
        'objective': 'binary',
        'boosting_type': 'gbdt',
        'verbose': 1,
        'metric': 'auc',
        'is_unbalance': True,
        'boost_from_average': False,
        'device': 'gpu',
        'gpu_platform_id': 0,
        'gpu_device_id': 0
    }

In [None]:
%%time
import wandb
from wandb.lightgbm import wandb_callback

# Training Loop
counter = 1
for train_index, valid_index in skf_split:
    wandb.init(project="riiid-challenge-wb", group='lightGBM', 
               name='gbm'+str(counter), config=param)
    print("==== Fold {} ====".format(counter))
    
    lgbm_train = lgbm.Dataset(data = train.iloc[train_index, :][features_to_keep].values,
                              label = train.iloc[train_index, :][target].values,
                              feature_name = features_to_keep,
                              free_raw_data = False)
    
    lgbm_valid = lgbm.Dataset(data = train.iloc[valid_index, :][features_to_keep].values,
                              label = train.iloc[valid_index, :][target].values,
                              feature_name = features_to_keep,
                              free_raw_data = False)
    
    lgbm_2 = lgbm.train(params = param, train_set = lgbm_train, valid_sets = [lgbm_valid],
                        early_stopping_rounds = 12, num_boost_round=100, verbose_eval=25, 
                        callbacks=[wandb_callback()])
    
    
    # X_valid to predict
    oof[valid_index] = lgbm_2.predict(train.iloc[valid_index][features_to_keep].values, 
                                      num_iteration = lgbm_2.best_iteration)
    predictions += lgbm_2.predict(test[features_to_keep], 
                                  num_iteration = lgbm_2.best_iteration) / n_splits
    
    counter += 1
    wandb.run.finish()

# W&B Reports
Reports let you organize visualizations, describe your findings, and share updates with collaborators.
## Use Cases
* **Notes**: Add a graph with a quick note to yourself.
* **Collaboration**: Share findings with your colleagues.
* **Work log**: Track what you've tried, and plan next steps.

Check out this W&B report by OpenAI: [How the OpenAI Robotics Team Uses W&B Reports](https://wandb.ai/openai/published-work/Learning-Dexterity-End-to-End--VmlldzoxMTUyMDQ)