# Environment Sanity Check #

Click the _Runtime_ dropdown at the top of the page, then _Change Runtime Type_ and confirm the instance type is _GPU_.

Check the output of `!nvidia-smi` to make sure you've been allocated a Tesla T4, P4, or P100.

In [1]:
!nvidia-smi

'nvidia-smi' is not recognized as an internal or external command,
operable program or batch file.
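
The message above is what a Windows shell prints when `nvidia-smi` is not on the PATH, which suggests this run took place on a local machine rather than on a Colab GPU runtime. Below is a minimal sketch of the same check done from Python, returning an empty list instead of failing. The helper name `detect_gpus` is hypothetical; `--query-gpu=name` and `--format=csv,noheader` are standard `nvidia-smi` options.

```python
import shutil
import subprocess


def detect_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is not installed or not on the PATH.
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        # The tool exists but could not talk to a driver, or timed out.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = detect_gpus()
if gpus:
    print("GPUs found:", ", ".join(gpus))
else:
    print("No NVIDIA GPU visible; switch the runtime type to GPU before continuing.")
```

An empty result means the remaining cells, which import GPU libraries, are unlikely to work in the current environment.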


# Setup #
This setup script:

1. Checks that the GPU is RAPIDS compatible
1. Installs the **current stable version** of RAPIDS AI's core libraries using pip:
   1. cuDF
   1. cuML
   1. cuGraph
   1. xgboost

**This will complete in about 3-4 minutes**

Please use the [RAPIDS Conda Colab Template notebook](https://colab.research.google.com/drive/1TAAi_szMfWqRfHVfjGSqnGVLr_ztzUM9) if you need to install any of the RAPIDS extended libraries, such as:
- cuSpatial
- cuSignal
- cuxFilter
- cuCIM

OR
- nightly versions of any library
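
The install script in the next cell targets Colab, so it can help to confirm first that the notebook is actually running there. Below is a minimal sketch, assuming the common convention that a Colab kernel has the `google.colab` package loaded at startup. The helper name `running_in_colab` is hypothetical.

```python
import sys


def running_in_colab():
    """Heuristic check (an assumption): Colab kernels load google.colab at startup."""
    return "google.colab" in sys.modules


if running_in_colab():
    print("Colab detected; the pip install script should apply.")
else:
    print("Not running in Colab; follow the RAPIDS install guide for your platform instead.")
```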


In [2]:
# This gets the RAPIDS-Colab install files and checks your GPU.  Run this and the next cell only.
# Please read the output of this cell.  If your Colab instance is not RAPIDS compatible, it will warn you and give you remediation steps.
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/pip-install.py


'git' is not recognized as an internal or external command,
operable program or batch file.
python: can't open file 'C:\Users\phamp\OneDrive\Phu\Python\6. Machine Learning with Python\BaiTap\Chapter 8\rapidsai-csp-utils\colab\pip-install.py': [Errno 2] No such file or directory


# RAPIDS is now installed on Colab.  
You can copy your code into the cells below, or run them as they are to validate your RAPIDS installation and version.  
# Enjoy!

In [None]:
import cudf
cudf.__version__

'23.06.00'

In [None]:
import cuml
cuml.__version__

'23.06.00'

In [None]:
import cugraph
cugraph.__version__

'23.06.01'
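
The three version checks above can also be collapsed into one cell that reports a missing library instead of stopping at the first failed import. This is a minimal sketch; the helper name `installed_versions` is hypothetical.

```python
import importlib


def installed_versions(names):
    """Map each module name to its __version__ string, or None if the import fails."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # ImportError, or a driver/runtime error raised while importing.
            versions[name] = None
        else:
            # Fall back to "unknown" for modules that expose no __version__.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in installed_versions(["cudf", "cuml", "cugraph"]).items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```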

# Next Steps #

For an overview of how you can access and work with your own datasets in Colab, check out [this guide](https://towardsdatascience.com/3-ways-to-load-csv-files-into-colab-7c14fcbdcb92).

For more RAPIDS examples, check out our RAPIDS notebooks repos:
1. https://github.com/rapidsai/notebooks
2. https://github.com/rapidsai/notebooks-contrib

In [None]:
# KNN Using sklearn
# Import necessary modules
import time
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
# pandas loads and holds the data on the CPU
start = time.time()
# Loading the training data
Data_train = pd.read_csv('train_sample.csv')
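# Create feature and target arrays for training data
# NOTE: column 1 is used as the target and is also inside the feature slice 1:
# (the same slices are reused for the test set and the cuML run below);
# adjust them to your file's layout before judging accuracy
X_train = Data_train.iloc[:,1:]
Y_train = Data_train.iloc[:,1]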
# Loading the testing data
Data_test = pd.read_csv('test_sample.csv')
# Create feature and target arrays for testing data
X_test = Data_test.iloc[:,1:]
Y_test = Data_test.iloc[:,1]
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, Y_train)
# Predict on dataset which model has not seen before
Y = knn.predict(X_test)
end = time.time()
CPU_time = end-start
print("Time taken on CPU = ",CPU_time)

# KNN Using cuml
# Import necessary modules (time is already imported above)
from cuml.neighbors import KNeighborsClassifier
import cudf
#cudf is NVIDIA's GPU accelerated Pandas-like library
start = time.time()
# Loading the training data
Data_train = cudf.read_csv('train_sample.csv')
# Create feature and target arrays for training data
X_train = Data_train.iloc[:,1:]
Y_train = Data_train.iloc[:,1]
# Loading the testing data
Data_test = cudf.read_csv('test_sample.csv')
# Create feature and target arrays for testing data
X_test = Data_test.iloc[:,1:]
Y_test = Data_test.iloc[:,1]
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, Y_train)
# Predict on dataset which model has not seen before
Y = knn.predict(X_test)
end = time.time()
GPU_time = end-start
print("Time taken on GPU = ",GPU_time)
# Comparing sklearn and cuml processing times
print("CPU Time to GPU time ratio: ",CPU_time/GPU_time)

Time taken on CPU =  1.1782793998718262
Time taken on GPU =  0.8428521156311035
CPU Time to GPU time ratio:  1.397966948198931


In [None]:
import numpy as np
import pandas as pd

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression, LinearRegression, Ridge
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.datasets import make_moons

from cuml.linear_model import (LogisticRegression as LogisticRegression_GPU,
                               LinearRegression as LinearRegression_gpu,
                              Ridge as Ridge_gpu)
from cuml.svm import SVC as SVC_gpu
from cuml.ensemble import RandomForestClassifier as RandomForestClassifier_gpu
from cuml.neighbors import (KNeighborsClassifier as KNeighborsClassifier_gpu,
                            KNeighborsRegressor as KNeighborsRegressor_gpu)

from time import time
from timeit import Timer, timeit

In [None]:
X, y  = datasets.make_classification(n_samples=40000)
X = X.astype(np.float32)
y = y.astype(np.float32)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
start = time()
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
# Predict on dataset which model has not seen before
Y = knn.predict(X_test)
end = time()
CPU_time = end-start
print("Time taken on CPU = ",CPU_time)

# KNN on the GPU using cuml
start = time()

knn = KNeighborsClassifier_gpu(n_neighbors=7)
knn.fit(X_train, y_train)
# Predict on dataset which model has not seen before
Y = knn.predict(X_test)
end = time()
GPU_time = end-start
print("Time taken on GPU = ",GPU_time)

# Comparing sklearn and cuml processing times
print("CPU Time to GPU time ratio: ",CPU_time/GPU_time)

Time taken on CPU =  1.6784124374389648
Time taken on GPU =  0.2896006107330322
CPU Time to GPU time ratio:  5.7956108351795095
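
A speedup is only meaningful if both estimators give comparable answers, and the cell above overwrites `Y` with the GPU predictions, so the two results are never compared. Below is a minimal sketch of an agreement check, assuming the CPU and GPU predictions were kept in separate NumPy arrays. The helper name `agreement` and the sample arrays are illustrative.

```python
import numpy as np


def agreement(pred_a, pred_b):
    """Fraction of positions where two prediction arrays hold the same label."""
    pred_a = np.asarray(pred_a)
    pred_b = np.asarray(pred_b)
    if pred_a.shape != pred_b.shape:
        raise ValueError("prediction arrays must have the same shape")
    return float(np.mean(pred_a == pred_b))


# Illustrative labels only, not output from the benchmark above.
cpu_pred = np.array([0, 1, 1, 0, 1], dtype=np.float32)
gpu_pred = np.array([0, 1, 0, 0, 1], dtype=np.float32)
print(f"Agreement: {agreement(cpu_pred, gpu_pred):.2f}")
```

In the benchmark this would mean storing the two `knn.predict(X_test)` results under different names, for example `Y_cpu` and `Y_gpu` (hypothetical names), and passing both to the helper.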
