# GPU-accelerated LightGBM

This kernel explores a GPU-accelerated LightGBM (LGBM) model for predicting customer transactions.

## Notebook Content
1. [Re-compile LGBM with GPU support](#1)
1. [Loading the data](#2)
1. [Training the model on CPU](#3)
1. [Training the model on GPU](#4)
1. [Submission](#5)

<a id="1"></a> 
## 1. Re-compile LGBM with GPU support
In the Kaggle notebook settings, set `Internet` to `Internet connected` and `GPU` to `GPU on`.

We first remove the existing CPU-only LightGBM library and clone the latest GitHub repository, including its submodules.

In [1]:
!rm -r /opt/conda/lib/python3.6/site-packages/lightgbm
!git clone --recursive https://github.com/Microsoft/LightGBM

Cloning into 'LightGBM'...
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 12618 (delta 0), reused 1 (delta 0), pack-reused 12616
Receiving objects: 100% (12618/12618), 9.26 MiB | 0 bytes/s, done.
Resolving deltas: 100% (8903/8903), done.
Submodule 'include/boost/compute' (https://github.com/boostorg/compute) registered for path 'compute'
Cloning into '/kaggle/working/LightGBM/compute'...
remote: Enumerating objects: 14, done.        
remote: Counting objects: 100% (14/14), done.        
remote: Compressing objects: 100% (12/12), done.        
remote: Total 21670 (delta 3), reused 5 (delta 1), pack-reused 21656        
Receiving objects: 100% (21670/21670), 8.51 MiB | 0 bytes/s, done.
Resolving deltas: 100% (17523/17523), done.
Submodule path 'compute': checked out '509ebe4a9282eec8a92c65ce3bbc1925f1fdbe07'


Next, the Boost development library must be installed.

In [2]:
!apt-get install -y -qq libboost-all-dev

debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 36956 files and directories currently installed.)
Preparing to unpack .../00-python2.7_2.7.13-2+deb9u3_amd64.deb ...
Unpacking python2.7 (2.7.13-2+deb9u3) over (2.7.13-2+deb9u2) ...
Preparing to unpack .../01-libpython2.7-stdlib_2.7.13-2+deb9u3_amd64.deb ...
Unpacking libpython2.7-stdlib:amd64 (2.7.13-2+deb9u3) over (2.7.13-2+deb9u2) ...
Preparing to unpack .../02-python2.7-minimal_2.7.13-2+deb9u3_amd64.deb ...
Unpacking python2.7-minimal (2.7.13-2+deb9u3) over (2.7.13-2+deb9u2) ...
Preparing to unpack .../03-libpython2.7-minimal_2.7.13-2+deb9u3_amd64.deb ...
Unpacking libpython2.7-minimal:amd64 (2.7.13-2+deb9u3) over (2.7.13-2+deb9u2) ...
Selecting previously unselected package libpython3.5-minimal:amd64.
Preparing to unpack .../04-libpython3.5-minimal_3.5.3-1+deb9u1_amd64.deb ...
Unpacking libpython3.5-minimal:amd64 (3.5.3-1+deb9u1) ...
Selecting previously unselected package 

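The next step is to build LightGBM with GPU support (`-DUSE_GPU=1`, pointing CMake at the OpenCL library and headers that ship with CUDA) and then re-install the Python package from the compiled library.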

In [3]:
%%bash
cd LightGBM
rm -r build
mkdir build
cd build
cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ ..
make -j$(nproc)

-- The C compiler identification is GNU 6.3.0
-- The CXX compiler identification is GNU 6.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp  
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_

rm: cannot remove 'build': No such file or directory


In [4]:
!cd LightGBM/python-package/;python3 setup.py install --precompile

running install
running build
running build_py
creating build
creating build/lib
creating build/lib/lightgbm
copying lightgbm/callback.py -> build/lib/lightgbm
copying lightgbm/basic.py -> build/lib/lightgbm
copying lightgbm/plotting.py -> build/lib/lightgbm
copying lightgbm/engine.py -> build/lib/lightgbm
copying lightgbm/__init__.py -> build/lib/lightgbm
copying lightgbm/sklearn.py -> build/lib/lightgbm
copying lightgbm/libpath.py -> build/lib/lightgbm
copying lightgbm/compat.py -> build/lib/lightgbm
running egg_info
creating lightgbm.egg-info
writing lightgbm.egg-info/PKG-INFO
writing dependency_links to lightgbm.egg-info/dependency_links.txt
writing requirements to lightgbm.egg-info/requires.txt
writing top-level names to lightgbm.egg-info/top_level.txt
writing manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
no previously-included directories found matching 'build'
wr

Finally, register the NVIDIA OpenCL library in `/etc/OpenCL/vendors` so that the OpenCL loader can find it, and remove the cloned repository.

In [5]:
!mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
!rm -r LightGBM
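
On Linux, the OpenCL loader discovers vendor drivers through `.icd` files in `/etc/OpenCL/vendors`, which is why the cell above writes `nvidia.icd`. As a minimal sketch, the registration can be checked from Python before training. The helper name `icd_is_registered` is hypothetical and not part of LightGBM or OpenCL; the path and library name are the ones used in the cell above.

```python
from pathlib import Path

# Location written by the cell above.
NVIDIA_ICD = Path("/etc/OpenCL/vendors/nvidia.icd")


def icd_is_registered(icd_path=NVIDIA_ICD, library="libnvidia-opencl.so.1"):
    """Return True if the ICD file exists and names the expected library."""
    path = Path(icd_path)
    if not path.is_file():
        return False
    # The file holds a single line: the name of the vendor's OpenCL library.
    return path.read_text().strip() == library


print("NVIDIA OpenCL ICD registered:", icd_is_registered())
```

If this prints `False`, GPU training is likely to fail because no OpenCL platform is visible to LightGBM.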

<a id="2"></a> 
## 2. Loading the data

In [6]:
import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedKFold
import lightgbm as lgb
from sklearn import metrics
import gc

pd.set_option('display.max_columns', 200)

In [7]:
train_df = pd.read_csv('../input/train.csv')
test_df = pd.read_csv('../input/test.csv')

# Optionally extract a subset for quick testing
# train_df = train_df[:1000]

<a id="3"></a>
## 3. Training the model on CPU

In [8]:
param = {
        'num_leaves': 10,
        'max_bin': 127,
        'min_data_in_leaf': 11,
        'learning_rate': 0.02,
        'min_sum_hessian_in_leaf': 0.00245,
        'bagging_fraction': 1.0, 
        'bagging_freq': 5, 
        'feature_fraction': 0.05,
        'lambda_l1': 4.972,
        'lambda_l2': 2.276,
        'min_gain_to_split': 0.65,
        'max_depth': 14,
        'save_binary': True,
        'seed': 1337,
        'feature_fraction_seed': 1337,
        'bagging_seed': 1337,
        'drop_seed': 1337,
        'data_random_seed': 1337,
        'objective': 'binary',
        'boosting_type': 'gbdt',
        'verbose': 1,
        'metric': 'auc',
        'is_unbalance': True,
        'boost_from_average': False,
    }

In [9]:
%%time
nfold = 2

target = 'target'
predictors = train_df.columns.values.tolist()[2:]

skf = StratifiedKFold(n_splits=nfold, shuffle=True, random_state=2019)

oof = np.zeros(len(train_df))
predictions = np.zeros(len(test_df))

i = 1
for train_index, valid_index in skf.split(train_df, train_df.target.values):
    print("\nfold {}".format(i))
    xg_train = lgb.Dataset(train_df.iloc[train_index][predictors].values,
                           label=train_df.iloc[train_index][target].values,
                           feature_name=predictors,
                           free_raw_data = False
                           )
    xg_valid = lgb.Dataset(train_df.iloc[valid_index][predictors].values,
                           label=train_df.iloc[valid_index][target].values,
                           feature_name=predictors,
                           free_raw_data = False
                           )   

    
    clf = lgb.train(param, xg_train, 5000, valid_sets = [xg_valid], verbose_eval=50, early_stopping_rounds = 50)
    oof[valid_index] = clf.predict(train_df.iloc[valid_index][predictors].values, num_iteration=clf.best_iteration) 
    
    predictions += clf.predict(test_df[predictors], num_iteration=clf.best_iteration) / nfold
    i = i + 1

print("\n\nCV AUC: {:<0.2f}".format(metrics.roc_auc_score(train_df.target.values, oof)))


fold 1
Training until validation scores don't improve for 50 rounds.
[50]	valid_0's auc: 0.833349
[100]	valid_0's auc: 0.845073
[150]	valid_0's auc: 0.856228
[200]	valid_0's auc: 0.862104
[250]	valid_0's auc: 0.866523
[300]	valid_0's auc: 0.869727
[350]	valid_0's auc: 0.871099
[400]	valid_0's auc: 0.872739
[450]	valid_0's auc: 0.873664
[500]	valid_0's auc: 0.874754
[550]	valid_0's auc: 0.875929
[600]	valid_0's auc: 0.877111
[650]	valid_0's auc: 0.878102
[700]	valid_0's auc: 0.879515
[750]	valid_0's auc: 0.880267
[800]	valid_0's auc: 0.881096
[850]	valid_0's auc: 0.881631
[900]	valid_0's auc: 0.882316
[950]	valid_0's auc: 0.883168
[1000]	valid_0's auc: 0.883537
[1050]	valid_0's auc: 0.884257
[1100]	valid_0's auc: 0.884868
[1150]	valid_0's auc: 0.885484
[1200]	valid_0's auc: 0.885916
[1250]	valid_0's auc: 0.886372
[1300]	valid_0's auc: 0.886782
[1350]	valid_0's auc: 0.887268
[1400]	valid_0's auc: 0.887776
[1450]	valid_0's auc: 0.888263
[1500]	valid_0's auc: 0.888574
[1550]	valid_0's auc
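
The training call above passes `verbose_eval` and `early_stopping_rounds` directly to `lgb.train`. LightGBM 4.0 removed these keyword arguments in favour of callbacks, so the cell only runs as written on older releases. The following is a sketch of a version-tolerant wrapper, not the notebook's original code; `uses_callback_api` and `train_fold` are hypothetical helper names, and the rounds and logging period mirror the values used above.

```python
def uses_callback_api(version):
    """Return True for LightGBM releases (4.0 and later) that require callbacks."""
    major = int(version.split(".")[0])
    return major >= 4


def train_fold(params, train_set, valid_set,
               num_boost_round=5000, patience=50, log_every=50):
    """Train one fold with early stopping on either old or new LightGBM."""
    import lightgbm as lgb  # imported here so the helper above has no dependency

    if uses_callback_api(lgb.__version__):
        # Newer API: early stopping and logging are expressed as callbacks.
        callbacks = [
            lgb.early_stopping(stopping_rounds=patience),
            lgb.log_evaluation(period=log_every),
        ]
        return lgb.train(params, train_set, num_boost_round,
                         valid_sets=[valid_set], callbacks=callbacks)

    # Older API, as used in this notebook.
    return lgb.train(params, train_set, num_boost_round,
                     valid_sets=[valid_set],
                     verbose_eval=log_every,
                     early_stopping_rounds=patience)
```

Inside the fold loop, `clf = train_fold(param, xg_train, xg_valid)` would then replace the direct `lgb.train` call.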

<a id="4"></a>
## 4. Training the model on GPU

First, check the GPU availability.

In [10]:
!nvidia-smi

Fri Mar  8 00:27:19 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla P100-PCIE...  On   | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage    
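
The same check can be done programmatically, which is convenient when a notebook should fall back to the CPU if no GPU is attached. This is a minimal sketch using only the standard library and the `--query-gpu` option of `nvidia-smi`; the function name `list_gpus` is hypothetical.

```python
import shutil
import subprocess


def list_gpus():
    """Return one 'name, total memory' string per visible NVIDIA GPU, or []."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools not installed
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []  # tool present but no usable device
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No NVIDIA GPU visible; use device='cpu' instead.")
```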

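To train on the GPU, add the following parameters. `device` selects the GPU tree learner, while `gpu_platform_id` and `gpu_device_id` select the OpenCL platform and the device within it: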

        'device': 'gpu',
        'gpu_platform_id': 0,
        'gpu_device_id': 0
        
        

In [11]:
param = {
        'num_leaves': 10,
        'max_bin': 127,
        'min_data_in_leaf': 11,
        'learning_rate': 0.02,
        'min_sum_hessian_in_leaf': 0.00245,
        'bagging_fraction': 1.0, 
        'bagging_freq': 5, 
        'feature_fraction': 0.05,
        'lambda_l1': 4.972,
        'lambda_l2': 2.276,
        'min_gain_to_split': 0.65,
        'max_depth': 14,
        'save_binary': True,
        'seed': 1337,
        'feature_fraction_seed': 1337,
        'bagging_seed': 1337,
        'drop_seed': 1337,
        'data_random_seed': 1337,
        'objective': 'binary',
        'boosting_type': 'gbdt',
        'verbose': 1,
        'metric': 'auc',
        'is_unbalance': True,
        'boost_from_average': False,
        'device': 'gpu',
        'gpu_platform_id': 0,
        'gpu_device_id': 0
    }
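
Because the GPU configuration differs from the CPU one only in these three keys, it can also be derived from the CPU dictionary instead of being retyped, which keeps the two runs comparable. A minimal sketch follows; `with_gpu` is a hypothetical helper, and `cpu_param` here is a shortened stand-in for the full dictionary from section 3.

```python
def with_gpu(cpu_param, platform_id=0, device_id=0):
    """Return a copy of cpu_param with the GPU keys added."""
    gpu_param = dict(cpu_param)  # shallow copy, the CPU dict is left untouched
    gpu_param.update({
        'device': 'gpu',
        'gpu_platform_id': platform_id,
        'gpu_device_id': device_id,
    })
    return gpu_param


# Shortened stand-in for the CPU parameters defined earlier.
cpu_param = {'objective': 'binary', 'metric': 'auc', 'max_bin': 127}
gpu_param = with_gpu(cpu_param)

# Only the three GPU keys should differ between the two dictionaries.
print(sorted(set(gpu_param) - set(cpu_param)))
```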

In [12]:
%%time
nfold = 2

target = 'target'
predictors = train_df.columns.values.tolist()[2:]

skf = StratifiedKFold(n_splits=nfold, shuffle=True, random_state=2019)

oof = np.zeros(len(train_df))
predictions = np.zeros(len(test_df))

i = 1
for train_index, valid_index in skf.split(train_df, train_df.target.values):
    print("\nfold {}".format(i))
    xg_train = lgb.Dataset(train_df.iloc[train_index][predictors].values,
                           label=train_df.iloc[train_index][target].values,
                           feature_name=predictors,
                           free_raw_data = False
                           )
    xg_valid = lgb.Dataset(train_df.iloc[valid_index][predictors].values,
                           label=train_df.iloc[valid_index][target].values,
                           feature_name=predictors,
                           free_raw_data = False
                           )   

    
    clf = lgb.train(param, xg_train, 5000, valid_sets = [xg_valid], verbose_eval=50, early_stopping_rounds = 50)
    oof[valid_index] = clf.predict(train_df.iloc[valid_index][predictors].values, num_iteration=clf.best_iteration) 
    
    predictions += clf.predict(test_df[predictors], num_iteration=clf.best_iteration) / nfold
    i = i + 1

print("\n\nCV AUC: {:<0.2f}".format(metrics.roc_auc_score(train_df.target.values, oof)))


fold 1
Training until validation scores don't improve for 50 rounds.
[50]	valid_0's auc: 0.833349
[100]	valid_0's auc: 0.845073
[150]	valid_0's auc: 0.856228
[200]	valid_0's auc: 0.862104
[250]	valid_0's auc: 0.866523
[300]	valid_0's auc: 0.869727
[350]	valid_0's auc: 0.871099
[400]	valid_0's auc: 0.872739
[450]	valid_0's auc: 0.873664
[500]	valid_0's auc: 0.874754
[550]	valid_0's auc: 0.875929
[600]	valid_0's auc: 0.877111
[650]	valid_0's auc: 0.878102
[700]	valid_0's auc: 0.879515
[750]	valid_0's auc: 0.880267
[800]	valid_0's auc: 0.881096
[850]	valid_0's auc: 0.881631
[900]	valid_0's auc: 0.882316
[950]	valid_0's auc: 0.883168
[1000]	valid_0's auc: 0.883537
[1050]	valid_0's auc: 0.884257
[1100]	valid_0's auc: 0.884868
[1150]	valid_0's auc: 0.885484
[1200]	valid_0's auc: 0.885916
[1250]	valid_0's auc: 0.886372
[1300]	valid_0's auc: 0.886782
[1350]	valid_0's auc: 0.887268
[1400]	valid_0's auc: 0.887776
[1450]	valid_0's auc: 0.888263
[1500]	valid_0's auc: 0.888574
[1550]	valid_0's auc
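
The `%%time` magic reports each run in its own cell. To compare the CPU and GPU runs side by side, the wall-clock time can be captured in variables instead. The sketch below uses a hypothetical `timed` helper and cheap stand-in workloads; in the notebook, the two training loops would be wrapped in functions and passed in their place.

```python
import time


def timed(label, fn, *args, **kwargs):
    """Call fn, print its wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print("{}: {:.2f} s".format(label, elapsed))
    return result, elapsed


# Stand-in workloads; replace with the CPU and GPU training functions.
_, cpu_seconds = timed("cpu stand-in", sum, range(1000000))
_, gpu_seconds = timed("gpu stand-in", sum, range(500000))

# Guard against a zero denominator on very fast stand-ins.
print("speed-up: {:.2f}x".format(cpu_seconds / max(gpu_seconds, 1e-9)))
```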

<a id="5"></a>
## 5. Submission

In [13]:
sub_df = pd.DataFrame({"ID_code": test_df.ID_code.values})
sub_df["target"] = predictions
sub_df[:10]

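|   | ID_code | target   |
|---|---------|----------|
| 0 | test_0  | 0.398256 |
| 1 | test_1  | 0.639685 |
| 2 | test_2  | 0.581272 |
| 3 | test_3  | 0.610626 |
| 4 | test_4  | 0.221492 |
| 5 | test_5  | 0.015191 |
| 6 | test_6  | 0.042432 |
| 7 | test_7  | 0.549254 |
| 8 | test_8  | 0.021228 |
| 9 | test_9  | 0.048925 |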


In [14]:
sub_df.to_csv("lightgbm_gpu.csv", index=False)
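
Before uploading, it can be worth confirming that the file has the two expected columns and that every prediction is a probability. This is a minimal sketch using only the standard library; `check_submission` is a hypothetical helper, and the column names are the ones used in the cell above.

```python
import csv


def check_submission(path, expected_rows=None):
    """Validate header and value range of a submission file; return row count."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames != ["ID_code", "target"]:
            raise ValueError("unexpected columns: {}".format(reader.fieldnames))
        rows = 0
        for row in reader:
            probability = float(row["target"])
            # NaN fails this comparison as well, so it is rejected too.
            if not 0.0 <= probability <= 1.0:
                raise ValueError("target out of range for {}".format(row["ID_code"]))
            rows += 1
    if expected_rows is not None and rows != expected_rows:
        raise ValueError("expected {} rows, found {}".format(expected_rows, rows))
    return rows
```

In this notebook, the call would be `check_submission("lightgbm_gpu.csv", expected_rows=len(test_df))`.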