# Stochastic Gradient Descent (SGD) 

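Stochastic gradient descent is an iterative algorithm that optimizes an objective function by updating the model parameters from gradients estimated on samples of the dataset rather than on the full dataset. cuML's implementation is mini-batch SGD (MBSGD), which computes each update from a small batch of samples and which Scikit-learn does not implement.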

The cuML implementation accepts array-like inputs, either on the host as NumPy arrays or on the device as Numba arrays or other `__cuda_array_interface__`-compliant objects, as well as cuDF DataFrames.

For information about cuDF, refer to the [cuDF documentation](https://rapidsai.github.io/projects/cudf/en/latest/).

For information about cuML's mini-batch SGD implementation, refer to the [cuML API documentation](https://rapidsai.github.io/projects/cuml/en/latest/api.html#stochastic-gradient-descent).
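To make the mini-batch idea concrete, here is a minimal pure-Python sketch of mini-batch SGD for linear regression with a squared loss. It is an illustration only, not cuML's implementation: the function name `minibatch_sgd`, the fixed learning rate `eta`, and the toy data are all assumptions made for this example, and it omits the penalty term and stopping tolerance used later in this notebook.

```python
import random


def minibatch_sgd(X, y, batch_size=4, eta=0.05, epochs=200, seed=0):
    """Fit y ~ X @ w + b by mini-batch SGD on the loss 0.5 * (pred - y)**2."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    indices = list(range(n_samples))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, n_samples, batch_size):
            batch = indices[start:start + batch_size]
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for i in batch:
                pred = sum(wj * xj for wj, xj in zip(w, X[i])) + b
                residual = pred - y[i]
                for j in range(n_features):
                    grad_w[j] += residual * X[i][j]
                grad_b += residual
            # Average the gradient over the batch, then take one step.
            scale = eta / len(batch)
            w = [wj - scale * gj for wj, gj in zip(w, grad_w)]
            b -= scale * grad_b
    return w, b


# Hypothetical noiseless toy data generated from y = 2*x0 - 3*x1 + 1.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(64)]
y = [2.0 * x0 - 3.0 * x1 + 1.0 for x0, x1 in X]

w, b = minibatch_sgd(X, y)
print(w, b)
```

Setting `batch_size=1` in this sketch gives one update per sample, while `batch_size=len(X)` gives full-batch gradient descent; mini-batch SGD sits between the two.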

In [1]:
import os

import numpy as np

import pandas as pd
import cudf as gd

from sklearn.model_selection import train_test_split

from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error

from cuml.linear_model import MBSGDRegressor as cumlSGD
from sklearn.linear_model import SGDRegressor as skSGD

## Define Parameters

In [2]:
n_samples = 2**20
n_features = 399

learning_rate = 'adaptive'
penalty = 'elasticnet'
# Note: scikit-learn 1.2 and later name this loss 'squared_error'.
loss = 'squared_loss'
max_iter = 500
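
The `'elasticnet'` penalty chosen above mixes L1 and L2 regularization. As a rough sketch, the function below follows the convention scikit-learn documents for its SGD estimators, where `alpha` scales the whole term and `l1_ratio` sets the mix. The helper name `elastic_net_penalty` is hypothetical, the defaults shown are scikit-learn's documented ones (this notebook sets neither parameter, so each library's own defaults apply), and it is an assumption here that cuML scales the term the same way.

```python
def elastic_net_penalty(w, alpha=0.0001, l1_ratio=0.15):
    """Elastic-net penalty: alpha * (l1_ratio * L1 + (1 - l1_ratio) * 0.5 * L2^2)."""
    l1 = sum(abs(wj) for wj in w)           # L1 norm of the weights
    l2_squared = sum(wj * wj for wj in w)   # squared L2 norm of the weights
    return alpha * (l1_ratio * l1 + (1.0 - l1_ratio) * 0.5 * l2_squared)


# Hypothetical weights, with an even mix of L1 and L2.
penalty_value = elastic_net_penalty([1.0, -2.0], alpha=0.1, l1_ratio=0.5)
print(penalty_value)
```

With `l1_ratio=1.0` the expression reduces to a pure L1 penalty, and with `l1_ratio=0.0` to a pure L2 penalty.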

## Generate Data

### Host

In [3]:
%%time
X,y = make_regression(n_samples=n_samples, n_features=n_features, random_state=0)

X = pd.DataFrame(X)
y = pd.Series(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=0)

CPU times: user 31.1 s, sys: 14.3 s, total: 45.4 s
Wall time: 36.1 s


### GPU

In [4]:
%%time
X_cudf = gd.DataFrame.from_pandas(X_train)
X_cudf_test = gd.DataFrame.from_pandas(X_test)

y_cudf = gd.Series(y_train)

CPU times: user 8.76 s, sys: 3.12 s, total: 11.9 s
Wall time: 11.9 s


## Scikit-learn Model

### Fit 

In [5]:
%%time
sk_sgd = skSGD(learning_rate=learning_rate, 
               eta0=0.07,
               max_iter=max_iter,
               tol=0.001,
               fit_intercept=True,
               penalty=penalty,
               loss=loss)

sk_sgd.fit(X_train, y_train)

CPU times: user 2min 23s, sys: 2.86 s, total: 2min 26s
Wall time: 2min 26s


### Predict

In [6]:
%%time
y_sk = sk_sgd.predict(X_test)

CPU times: user 700 ms, sys: 516 ms, total: 1.22 s
Wall time: 196 ms


### Evaluate

In [10]:
error_sk = mean_squared_error(y_test,y_sk)

## cuML Model

### Fit

In [7]:
%%time
cu_sgd = cumlSGD(learning_rate=learning_rate, 
                 eta0=0.07, 
                 epochs=max_iter,
                 batch_size=512,
                 tol=0.001, 
                 penalty=penalty, 
                 loss=loss)

cu_sgd.fit(X_cudf, y_cudf)

CPU times: user 14.3 s, sys: 5.86 s, total: 20.2 s
Wall time: 15.3 s


### Predict

In [8]:
%%time
y_pred = cu_sgd.predict(X_cudf_test).to_array().ravel()

CPU times: user 264 ms, sys: 12 ms, total: 276 ms
Wall time: 268 ms


### Evaluate

In [None]:
error_cu = mean_squared_error(y_test,y_pred)

## Compare Results

In [12]:
print("SKL MSE(y): %s" % error_sk)
print("CUML MSE(y): %s" % error_cu)

SKL MSE(y): 0.0003004119904696674
CUML MSE(y): 0.00029979383392877496
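
To put the figures reported above side by side, the following sketch computes the fit-time ratio and the relative gap between the two errors. All numbers are copied by hand from the cell outputs in this notebook; the variable names are introduced only for this example.

```python
# Figures copied from the cell outputs above.
sk_fit_seconds = 2 * 60 + 26   # scikit-learn fit wall time: 2min 26s
cu_fit_seconds = 15.3          # cuML fit wall time: 15.3 s
error_sk = 0.0003004119904696674
error_cu = 0.00029979383392877496

# Ratio of fit wall times and relative difference between the two MSEs.
speedup = sk_fit_seconds / cu_fit_seconds
relative_gap = abs(error_sk - error_cu) / error_sk

print("Fit speed-up: %.1fx" % speedup)
print("Relative MSE gap: %.2f%%" % (100 * relative_gap))
```

On these numbers the cuML fit is roughly 9.5 times faster, with the two errors within about 0.2% of each other. The fit timing does not include the 11.9 s spent copying the data to the GPU, and it reflects a single run on one machine.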
