# Environment Sanity Check #

Click the _Runtime_ dropdown at the top of the page, then _Change Runtime Type_ and confirm the instance type is _GPU_.

Check the output of `!nvidia-smi` to make sure you've been allocated a Tesla T4, P4, or P100.

In [None]:
!nvidia-smi
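
If you prefer to check the allocated GPU in code rather than by reading the `nvidia-smi` table, the sketch below is one way to do it. The helper names `is_supported_gpu` and `query_gpu_names` are hypothetical (they are not part of RAPIDS or Colab), and the supported list simply mirrors the three models named above.

```python
import re
import subprocess

# GPU models named in the text above; this list is an assumption taken from that sentence.
SUPPORTED_GPUS = ("T4", "P4", "P100")

def is_supported_gpu(name):
    """Return True if a GPU name such as 'Tesla P100-PCIE-16GB' contains a supported model."""
    tokens = re.split(r"[\s-]+", name.upper())
    return any(token in SUPPORTED_GPUS for token in tokens)

def query_gpu_names():
    """Ask nvidia-smi for GPU names; return an empty list if the tool is missing or fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

for gpu in query_gpu_names():
    status = "OK" if is_supported_gpu(gpu) else "not in the list above"
    print(gpu, "->", status)
```

On a machine without an NVIDIA driver the loop prints nothing, which is itself a sign that the runtime type still needs to be changed.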

#Setup:
This notebook was built on RAPIDS 0.13 stable and is based on this [DataCamp Tutorial](https://www.datacamp.com/community/tutorials/xgboost-in-python). It has also been tested and works on 0.19 stable.

#Setup:
The setup script:
1. Updates gcc in Colab
1. Installs Conda
1. Installs the current stable version of the RAPIDS libraries, as well as some external libraries, including:
  1. cuDF
  1. cuML
  1. cuGraph
  1. cuSpatial
  1. cuSignal
  1. BlazingSQL
  1. xgboost
1. Copies the RAPIDS .so files into the current working directory, a necessary workaround for RAPIDS+Colab integration.


In [None]:
# This gets the RAPIDS-Colab install files and checks your GPU.  Run this and the next cell only.
# Please read the output of this cell.  If your Colab Instance is not RAPIDS compatible, it will warn you and give you remediation steps.
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/env-check.py

In [None]:
# This will update the Colab environment and restart the kernel.  Don't run the next cell until you see the session crash.
!bash rapidsai-csp-utils/colab/update_gcc.sh
import os
os._exit(00)

In [None]:
# This will install CondaColab.  This will restart your kernel one last time.  Run this cell by itself and only run the next cell once you see the session crash.
import condacolab
condacolab.install()

In [None]:
# you can now run the rest of the cells as normal
import condacolab
condacolab.check()

In [None]:
# Installing RAPIDS is now 'python rapidsai-csp-utils/colab/install_rapids.py <release> <packages>'
# The <release> options are 'stable' and 'nightly'.  Leaving it blank or adding any other words will default to stable.
!python rapidsai-csp-utils/colab/install_rapids.py stable
import os
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
os.environ['CONDA_PREFIX'] = '/usr/local'

In [None]:
import cudf
import pandas as pd

import pynvml
import numpy as np
import xgboost as xgb


# Load the data from scikit-learn, then split it into training and test sets
## Load data (load_boston was removed in scikit-learn 1.2, so this needs an older scikit-learn)
from sklearn.datasets import load_boston
boston = load_boston()
pdata = pd.DataFrame(boston.data)
data = cudf.from_pandas(pdata)

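## Split into training and test sets
## Note: boston.data holds only the 13 feature columns, so column 12 (used as y below)
## is the last feature, LSTAT; the median house price is in boston.target.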
from cuml import train_test_split
X, y = data.iloc[:,:-1],data.iloc[:,12]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
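
To make the effect of `test_size=0.2` and `random_state=123` concrete, here is a minimal pure-Python sketch of a seeded 80/20 split. It is an illustration only, not cuML's implementation: `split_rows` is a hypothetical helper, and the exact rows it selects will differ from what `train_test_split` picks.

```python
import random

def split_rows(rows, test_size=0.2, seed=123):
    """Shuffle row indices with a fixed seed, then hold out a test_size fraction."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)       # same seed -> same shuffle every run
    n_test = int(round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

rows = list(range(10))
train, test = split_rows(rows)
print(len(train), len(test))  # 8 2
```

Fixing the seed is what makes the split reproducible, so the RMSE reported later can be compared across runs.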


In [None]:
# XGBoost Time!
## Create the train and test DMatrix objects (features first, then labels)
dtrain = xgb.DMatrix(X_train, y_train)

dtest = xgb.DMatrix(X_test, y_test)

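## Train the model
trained_model = xgb.train(
    {
        'learning_rate': 0.1,
        'colsample_bytree': 0.3,
        'max_depth': 5,
        'objective': 'reg:squarederror',  # current name for the deprecated 'reg:linear'
        'n_estimators': 10,    # not a booster parameter; num_boost_round sets the number of rounds
        'alpha': 10,
        'silent': True,        # superseded by 'verbosity'
        'verbose_eval': True,  # a keyword argument of xgb.train, not a booster parameter
        'tree_method': 'gpu_hist',
    },
    dtrain,
    num_boost_round=100,
    evals=[(dtrain, 'train')],
)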

## Predict on the test set
prediction = trained_model.predict(dtest)

Parameters: { n_estimators, silent, verbose_eval } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[0]	train-rmse:12.56733
[1]	train-rmse:11.53973
[2]	train-rmse:10.59949
[3]	train-rmse:9.75142
[4]	train-rmse:9.02392
[5]	train-rmse:8.35438
[6]	train-rmse:7.74325
[7]	train-rmse:7.24983
[8]	train-rmse:6.77004
[9]	train-rmse:6.34923
[10]	train-rmse:5.89383
[11]	train-rmse:5.57719
[12]	train-rmse:5.28025
[13]	train-rmse:5.00459
[14]	train-rmse:4.79962
[15]	train-rmse:4.62024
[16]	train-rmse:4.43369
[17]	train-rmse:4.19113
[18]	train-rmse:4.05108
[19]	train-rmse:3.94673
[20]	train-rmse:3.83690
[21]	train-rmse:3.71147
[22]	train-rmse:3.55044
[23]	train-rmse:3.44966
[24]	train-rmse:3.32412
[25]	train-rmse:3.21364
[26]	train-rmse:3.15714
[27]	train-rmse:3.08512
[28]	train-rmse:3.03503
[
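
The warning in the output above says that `n_estimators`, `silent`, and `verbose_eval` might not be used. A cleaned-up parameter dictionary could look like the sketch below; it is a suggestion rather than the notebook author's configuration, and mapping `silent: True` to `verbosity: 0` is an assumption about the intended behavior.

```python
# Booster parameters only; settings that belong to xgb.train itself are left out.
params = {
    'learning_rate': 0.1,
    'colsample_bytree': 0.3,
    'max_depth': 5,
    'objective': 'reg:squarederror',  # replaces the deprecated 'reg:linear'
    'alpha': 10,                      # L1 regularization on leaf weights
    'verbosity': 0,                   # assumed replacement for 'silent': True
    'tree_method': 'gpu_hist',
}

# The number of trees and the per-round logging are arguments of xgb.train:
# trained_model = xgb.train(params, dtrain, num_boost_round=100,
#                           evals=[(dtrain, 'train')], verbose_eval=True)
```

With `xgb.train`, the number of boosting rounds comes from `num_boost_round`, so the `n_estimators` entry had no effect on the 100 rounds requested.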

In [None]:
# Form and test predictions from xgboost output
## MSE requires values be float32
y_test = y_test.astype(np.float32)

## Test the predictions with RMSE; compare the result to sklearn and pandas.
from cuml.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, prediction))

## Boston prices are in units of $1000, but y here is feature column 12 (LSTAT),
## so this RMSE is in that feature's units (a percentage).  Let's see what we get...
print("RMSE: ", rmse)

RMSE:  4.4297647
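
For reference, the quantity computed by `np.sqrt(mean_squared_error(y_test, prediction))` is the square root of the mean squared difference between targets and predictions. The pure-Python sketch below spells that out on toy numbers; `rmse` is a hypothetical helper, not a cuML function.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Errors of 3 and 4 give sqrt((9 + 16) / 2) = sqrt(12.5)
print(rmse([3.0, 5.0], [0.0, 1.0]))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.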
