# GPU Accelerated Linear Regression in RAPIDS
#### By Unknown Author, Paul Hendricks
-------

While the world’s data doubles each year, CPU computing has hit a brick wall with the end of Moore’s law. For the same reasons that scientific computing and deep learning have turned to NVIDIA GPU acceleration, data analytics and machine learning are ideal candidates for GPU acceleration.

NVIDIA created RAPIDS, an open-source data analytics and machine learning acceleration platform that leverages GPUs to accelerate computations. RAPIDS is based on Python, has pandas-like and Scikit-Learn-like interfaces, is built on the Apache Arrow in-memory data format, and can scale from a single GPU to multiple GPUs to multiple nodes. RAPIDS integrates easily into the world’s most popular Python-based data science workflows. RAPIDS accelerates data science end-to-end, from data prep to machine learning to deep learning. And through Arrow, Spark users can easily move data into the RAPIDS platform for acceleration.

This notebook compares a CPU implementation and a GPU implementation of Linear Regression. It includes code examples for doing Linear Regression using RAPIDS cuDF and cuML.

**Table of Contents**

* Introduction to Linear Regression
* Setup
* Generating Data
* Benchmarking: Comparing GPU and CPU
* Conclusion

## Linear Regression

To be edited.

## Setup

This notebook was tested using the following Docker containers:

* `rapidsai/rapidsai:0.6-cuda10.0-devel-ubuntu18.04-gcc7-py3.7` from [DockerHub](https://hub.docker.com/r/rapidsai/rapidsai)
* `rapidsai/rapidsai-nightly:0.6-cuda10.0-devel-ubuntu18.04-gcc7-py3.7` from [DockerHub](https://hub.docker.com/r/rapidsai/rapidsai-nightly)

This notebook was run on an NVIDIA Tesla V100 GPU. Please be aware that your system may be different, and you may need to modify the code or install packages to run the examples below.

If you think you have found a bug or an error, please file an issue here: https://github.com/rapidsai/notebooks/issues

Before we begin, let's check out our hardware setup by running the `nvidia-smi` command.

In [None]:
!nvidia-smi

Next, let's see what CUDA version we have:

In [None]:
!nvcc --version
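
Since your system may differ from the one described above, it can help to check from Python which of the notebook's dependencies are present before running the remaining cells. The following is a minimal sketch using only the standard library; the `check_environment` helper and the `REQUIRED_PACKAGES` list are hypothetical names introduced here for illustration and are not part of RAPIDS.

```python
import importlib.util
import shutil

# Packages imported later in this notebook; adjust the list for your environment.
REQUIRED_PACKAGES = ["numpy", "pandas", "sklearn", "cudf", "cuml"]


def check_environment(packages=REQUIRED_PACKAGES):
    """Return a dict mapping each package or tool name to True if it was found."""
    # find_spec returns None when a top-level package cannot be located.
    status = {name: importlib.util.find_spec(name) is not None for name in packages}
    # shutil.which returns None when the executable is not on PATH.
    status["nvidia-smi"] = shutil.which("nvidia-smi") is not None
    status["nvcc"] = shutil.which("nvcc") is not None
    return status


for name, found in check_environment().items():
    print(f"{name:12s} {'found' if found else 'MISSING'}")
```

Anything reported as missing would need to be installed, or the corresponding cells adapted, before continuing.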

## Let's Begin: Linear Regression
### Imports
Let's start with our imports.

In [None]:
import numpy as np
import pandas as pd

import cudf
import os

### Helper Functions

In [None]:
from timeit import default_timer

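class Timer(object):
    """Context manager that measures elapsed wall-clock time in seconds."""

    def __init__(self):
        self._timer = default_timer

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *args):
        self.stop()

    def start(self):
        """Start the timer."""
        # Store the timestamp under its own name so it does not overwrite
        # the start() method; this keeps the timer reusable.
        self.start_time = self._timer()

    def stop(self):
        """Stop the timer. Calculate the interval in seconds."""
        self.end_time = self._timer()
        self.interval = self.end_time - self.start_time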
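
The `Timer` class above is written as a context manager: `__enter__` starts the clock and `__exit__` stops it, leaving the elapsed seconds in `interval`. The sketch below shows the same pattern in use. It defines its own small `SketchTimer` class (a hypothetical name, used only so the block runs on its own) rather than relying on the cell above.

```python
from timeit import default_timer


class SketchTimer:
    """Context manager that records elapsed wall-clock seconds in `interval`."""

    def __enter__(self):
        self.start_time = default_timer()
        return self

    def __exit__(self, *exc_info):
        self.interval = default_timer() - self.start_time
        return False  # do not suppress exceptions raised inside the block


# Time an arbitrary piece of work by wrapping it in a `with` block.
with SketchTimer() as t:
    total = sum(i * i for i in range(100_000))

print(f"sum of squares = {total}, took {t.interval:.6f} s")
```

Unlike the `%%time` cell magic, a context manager like this can time a few lines inside a cell and keeps the result in a variable for later comparison.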

In [None]:
import gzip


def load_data(nrows, ncols, cached='../data/mortgage/mortgage.npy.gz'):
    """Return (features, label) DataFrames with nrows rows and up to ncols feature columns."""
    if os.path.exists(cached):
        print('use mortgage data')
        with gzip.open(cached) as f:
            X = np.load(f)
        # Column index 4 is 'adj_remaining_months_to_maturity', used as the label.
        # Extract it before dropping it from X so the label does not leak into the features.
        y = X[:, 4:5]
        X = np.delete(X, 4, axis=1)
        # Sample nrows row indices with replacement (the upper bound is exclusive).
        rindices = np.random.randint(0, X.shape[0], nrows)
        X = X[rindices, :ncols]
        y = y[rindices]
    else:
        print('use random data')
        X = np.random.rand(nrows, ncols)
        y = np.random.rand(nrows, 1)

    df_X = pd.DataFrame({'fea%d' % i: X[:, i] for i in range(X.shape[1])})
    df_y = pd.DataFrame({'fea%d' % i: y[:, i] for i in range(y.shape[1])})

    return df_X, df_y

In [None]:
from sklearn.metrics import mean_squared_error


def array_equal(a, b, threshold=2e-3, with_sign=True):
    """Return True if the MSE between a and b is below threshold."""
    a = to_nparray(a).ravel()
    b = to_nparray(b).ravel()
    if not with_sign:
        # Compare magnitudes only, ignoring sign flips.
        a, b = np.abs(a), np.abs(b)
    error = mean_squared_error(a, b)
    return error < threshold


def to_nparray(x):
    """Convert NumPy, pandas, or cuDF inputs to a NumPy array."""
    if isinstance(x, (np.ndarray, pd.DataFrame, pd.Series)):
        return np.array(x)
    elif isinstance(x, np.float64):
        return np.array([x])
    elif isinstance(x, (cudf.DataFrame, cudf.Series)):
        return x.to_pandas().values
    return x

Now that we have our helper functions, let's compare the speed and results of Scikit-Learn's CPU implementation with those of the RAPIDS cuML GPU implementation.

In [None]:
%%time


nrows = 2**20
ncols = 399

X, y = load_data(nrows,ncols)
print('data', X.shape)
print('label', y.shape)

Although cuML's OLS interface is very similar to Scikit-Learn's, cuML does not use some of its parameters, such as `copy_X` and `n_jobs`. cuML also provides two OLS implementations, one based on singular value decomposition (SVD) and one based on eigendecomposition. The eigendecomposition-based implementation is very fast but introduces very small errors in the coefficients, which are negligible for most applications. The SVD-based implementation is more stable but slower.
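
To make the difference between the two approaches concrete, here is a minimal NumPy sketch of the underlying mathematics. It is not cuML's implementation: the function names `ols_eig` and `ols_svd`, the synthetic data, and the omission of an intercept are all assumptions made for illustration. The eigendecomposition route works on the small matrix $X^T X$, which is cheap but squares the condition number of the problem; the SVD route factorizes $X$ directly.

```python
import numpy as np

# Synthetic, well-conditioned data (illustrative only; no intercept term).
rng = np.random.default_rng(0)
n_rows, n_cols = 1000, 5
X = rng.random((n_rows, n_cols))
true_coef = np.arange(1.0, n_cols + 1.0)
y = X @ true_coef + 0.01 * rng.standard_normal(n_rows)


def ols_eig(X, y):
    """Solve the normal equations (X^T X) w = X^T y via eigendecomposition."""
    gram = X.T @ X  # small (n_cols x n_cols) symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(gram)
    # For full-rank X: (X^T X)^{-1} = V diag(1 / lambda) V^T
    return eigvecs @ ((eigvecs.T @ (X.T @ y)) / eigvals)


def ols_svd(X, y):
    """Solve least squares from the thin SVD X = U diag(s) V^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # For full-rank X: w = V diag(1 / s) U^T y
    return Vt.T @ ((U.T @ y) / s)


w_eig = ols_eig(X, y)
w_svd = ols_svd(X, y)
print("eig coefficients:", w_eig)
print("svd coefficients:", w_svd)
print("max abs difference:", np.max(np.abs(w_eig - w_svd)))
```

On well-conditioned data like this, the two sets of coefficients differ only by floating-point noise; the gap can grow when the columns of `X` are nearly collinear.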

### Get MSE for Scikit-Learn

In [None]:
from sklearn import linear_model as sklGLM
from cuml import LinearRegression as cumlOLS
from cuml import Ridge as cumlRidge

In [None]:
fit_intercept = True
normalize = False
algorithm = "eig" # eig: eigen decomposition based method, svd: singular value decomposition based method.

In [None]:
%%time


reg_sk = sklGLM.LinearRegression(fit_intercept=fit_intercept, normalize=normalize)
result_sk = reg_sk.fit(X, y)

In [None]:
%%time


y_sk = reg_sk.predict(X)
error_sk = mean_squared_error(y,y_sk)

### Get MSE for cuML

In [None]:
%%time


X_cudf = cudf.DataFrame.from_pandas(X)
y_cudf = y.values
y_cudf = y_cudf[:,0]
y_cudf = cudf.Series(y_cudf)

In [None]:
%%time


reg_cuml = cumlOLS(fit_intercept=fit_intercept, normalize=normalize, algorithm=algorithm)
result_cuml = reg_cuml.fit(X_cudf, y_cudf)

In [None]:
%%time


y_cuml = reg_cuml.predict(X_cudf)
y_cuml = to_nparray(y_cuml).ravel()
error_cuml = mean_squared_error(y,y_cuml)

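## Final Comparison Between Scikit-Learn and cuML
Both models solve the same least-squares problem, so their MSE results should be nearly identical. The expected values are close to 0 (about 1.0e-7 to 1.0e-14), although the exact figures depend on the data that `load_data` returned. Despite the similar answers, you should see a **massive reduction in the times reported by `%%time`** when using **RAPIDS cuML** versus **Scikit-Learn**. Go RAPIDS!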

In [None]:
print("SKL MSE(y):")
print(error_sk)
print("CUML MSE(y):")
print(error_cuml)
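
Beyond comparing each model's MSE against the labels, the two sets of predictions can be compared with each other, which is what the `array_equal` helper defined earlier does with its default threshold of 2e-3. The sketch below reproduces that idea with NumPy only. The arrays are small hypothetical stand-ins for `y_sk` and `y_cuml`, and the `mse` function is written here for illustration.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between two arrays of equal length."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))


# Hypothetical stand-ins for the labels and the two models' predictions.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
pred_a = np.array([1.1, 1.9, 3.0, 4.2])
pred_b = pred_a + 1e-6  # a second model that differs only by tiny numerical noise

print("MSE model A vs labels:", mse(y_true, pred_a))
print("MSE model B vs labels:", mse(y_true, pred_b))
# Comparing the predictions to each other isolates implementation differences.
print("MSE A vs B:", mse(pred_a, pred_b))
print("agree within 2e-3:", mse(pred_a, pred_b) < 2e-3)
```

In the notebook itself, the equivalent check would pass the Scikit-Learn and cuML predictions to `array_equal`.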