This kernel is based on:
- https://www.kaggle.com/vinhnguyen/gpu-acceleration-for-lightgbm

## In the Kaggle notebook settings, remember to enable:

```GPU on```

```Internet connected```

## Recompile LightGBM with GPU support

We first remove the existing CPU-only LightGBM library and clone the latest GitHub repo.

In [None]:
!rm -r /opt/conda/lib/python3.6/site-packages/lightgbm
!git clone --recursive https://github.com/Microsoft/LightGBM

In [None]:
!apt-get install -y -qq libboost-all-dev

The next step is to build and reinstall LightGBM with GPU support.

In [None]:
%%bash
cd LightGBM
rm -rf build  # -f: no error if build/ does not exist yet
mkdir build
cd build
cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ ..
make -j$(nproc)

In [None]:
!cd LightGBM/python-package/;python3 setup.py install --precompile
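
After the reinstall, it can be worth confirming that Python can still locate the package before going further. The sketch below uses only the standard library; `is_installed` is a hypothetical helper name, not part of LightGBM, and it only checks that the module can be found, not that GPU support was compiled in.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


print("lightgbm importable:", is_installed("lightgbm"))
```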

Finally, register the NVIDIA driver with OpenCL so that it works properly, and clean up the cloned repository.

In [None]:
!mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
!rm -r LightGBM
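
The `nvidia.icd` file written above tells the OpenCL loader which vendor driver library to load; the loader discovers drivers by reading the `.icd` files in `/etc/OpenCL/vendors`. As a rough sanity check, the registered drivers can be listed from Python. This is a minimal sketch; `list_opencl_icds` is a hypothetical helper name, and it only reads the files rather than verifying that the named libraries actually load.

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the driver library name it contains."""
    root = Path(vendors_dir)
    if not root.is_dir():
        return {}  # no vendors directory means no registered drivers
    return {p.name: p.read_text().strip() for p in sorted(root.glob("*.icd"))}


print(list_opencl_icds())
```

If the step above succeeded, the result should include an `nvidia.icd` entry pointing at `libnvidia-opencl.so.1`.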

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import lightgbm as lgb
from sklearn import metrics
from sklearn.metrics import mean_squared_error
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import gc
gc.enable()
import os
import warnings
warnings.filterwarnings('ignore')
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

In [None]:
train_df = pd.read_csv('../input/train.csv')
test_df = pd.read_csv('../input/test.csv')

In [None]:
test_df.shape, train_df.shape

In [None]:
train_df.head()

In [None]:
test_df.head()

In [None]:
train_df.isnull().values.any()

In [None]:
test_df.isnull().values.any()

In [None]:
features = [c for c in train_df.columns if c not in ['ID_code', 'target']]
target = train_df['target']

# Train model on GPU

First, check that a GPU is available.

In [None]:
!nvidia-smi
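
The same check can be made programmatically, for example to fail early when the notebook was started without a GPU. The sketch below assumes `nvidia-smi` is on the `PATH` when a GPU is attached; `gpu_visible` is a hypothetical helper name, and `nvidia-smi -L` simply lists the detected GPUs.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if nvidia-smi exists and reports at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # driver tools not installed or not on PATH
    try:
        result = subprocess.run(
            [exe, "-L"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```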

To train on the GPU, set the following parameters:

```
'device': 'gpu',
'gpu_platform_id': 0,
'gpu_device_id': 0
```
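
One way to keep CPU and GPU runs comparable is to derive the GPU parameter dictionary from a shared base. The helper below is a minimal sketch, not part of LightGBM; `with_gpu` and the example `cpu_params` are hypothetical names used only for illustration.

```python
def with_gpu(params, platform_id=0, device_id=0):
    """Return a copy of params with the three GPU settings added."""
    gpu_params = dict(params)  # copy, so the CPU settings stay untouched
    gpu_params.update({
        "device": "gpu",
        "gpu_platform_id": platform_id,
        "gpu_device_id": device_id,
    })
    return gpu_params


cpu_params = {"objective": "binary", "metric": "auc"}
gpu_params = with_gpu(cpu_params)
print(gpu_params)
```

Because the base dictionary is copied, the same `cpu_params` can still be used for a CPU run to compare timings.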

### Parameters for GPU training

In [None]:
param = {
        'bagging_freq': 5,
        'bagging_fraction': 0.38,
        'boost_from_average':'false',
        'boost': 'gbdt',
        'feature_fraction': 0.045,
        'learning_rate': 0.01,
        'max_depth': -1,  
        'metric':'auc',
        'min_data_in_leaf': 80,
        'min_sum_hessian_in_leaf': 10.0,
        'num_leaves': 13,
        'num_threads': 8,
        'tree_learner': 'serial',
        'objective': 'binary', 
        'verbosity': 1,
        'device': 'gpu',
        'gpu_platform_id': 0,
        'gpu_device_id': 0
    }

## Run Model

In [None]:
num_round = 100000
kfold = 12

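# shuffle=True so that random_state takes effect; recent scikit-learn versions
# raise a ValueError if random_state is set while shuffle=False.
folds = StratifiedKFold(n_splits=kfold, shuffle=True, random_state=99999)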
oof = np.zeros(len(train_df))
predictions = np.zeros(len(test_df))

In [None]:
%%time
for fold_, (trn_idx, val_idx) in enumerate(folds.split(train_df.values, target.values)):
    print("Fold {}".format(fold_))
    trn_data = lgb.Dataset(train_df.iloc[trn_idx][features], label=target.iloc[trn_idx])
    val_data = lgb.Dataset(train_df.iloc[val_idx][features], label=target.iloc[val_idx])
    clf = lgb.train(param, trn_data, num_round, valid_sets = [trn_data, val_data], verbose_eval=5000, early_stopping_rounds = 1000)
    oof[val_idx] = clf.predict(train_df.iloc[val_idx][features], num_iteration=clf.best_iteration)
    predictions += clf.predict(test_df[features], num_iteration=clf.best_iteration) / folds.n_splits
    del trn_data,val_data
    gc.collect()
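
The training call above passes `verbose_eval` and `early_stopping_rounds` directly to `lgb.train`. LightGBM 4.0 and later removed those keyword arguments in favor of callbacks, so a build from a recent checkout may reject them. The sketch below shows one possible equivalent; `make_callbacks` is a hypothetical helper name, and it takes the imported `lightgbm` module as an argument so the snippet itself has no hard dependency.

```python
def make_callbacks(lgb_module, stopping_rounds=1000, log_period=5000):
    """Build callbacks matching early_stopping_rounds=1000 and verbose_eval=5000."""
    return [
        lgb_module.early_stopping(stopping_rounds=stopping_rounds),
        lgb_module.log_evaluation(period=log_period),
    ]


# Usage inside the fold loop (names as in the cell above):
# clf = lgb.train(param, trn_data, num_round,
#                 valid_sets=[trn_data, val_data],
#                 callbacks=make_callbacks(lgb))
```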

In [None]:
print("\n>> CV score: {:<8.5f}".format(roc_auc_score(target, oof)))

# Submission

In [None]:
submission = pd.DataFrame({"ID_code": test_df.ID_code.values})
submission["target"] = predictions
submission.to_csv("submission.csv", index=False)

In [None]:
submission.head()