# Ridge Regression Demo
Ridge regression extends linear regression by adding an L2 penalty on the coefficients. The penalty can reduce the variance of the estimated coefficients and improves the conditioning of the problem.
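
To make both effects concrete, here is a minimal NumPy sketch. It is not how cuML or scikit-learn solve the problem internally; it only solves the ridge normal equations `(X^T X + alpha * I) w = X^T y` on a small synthetic dataset with two nearly collinear columns. The helper name `ridge_fit`, the data, and `alpha = 1.0` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

# Two nearly collinear columns make X^T X ill-conditioned.
x1 = rng.standard_normal(n)
X = np.column_stack([x1,
                     x1 + 1e-3 * rng.standard_normal(n),
                     rng.standard_normal(n)])
y = X @ np.array([1.0, 1.0, 0.5]) + 0.1 * rng.standard_normal(n)


def ridge_fit(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y with no intercept term."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ y)


alpha = 1.0
w_ols = ridge_fit(X, y, 0.0)      # alpha = 0 is ordinary least squares
w_ridge = ridge_fit(X, y, alpha)  # L2-penalized solution

# Condition number of the linear system with and without the penalty.
cond_ols = np.linalg.cond(X.T @ X)
cond_ridge = np.linalg.cond(X.T @ X + alpha * np.eye(p))

print("coefficient norm (OLS, ridge):", np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
print("condition number (OLS, ridge):", cond_ols, cond_ridge)
```

The penalty shifts every eigenvalue of `X^T X` up by `alpha`, which is why the ridge system is better conditioned and its coefficient vector has a smaller norm than the unpenalized one.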

The model accepts array-like inputs, either on the host as NumPy arrays or on the device (as Numba arrays or any `__cuda_array_interface__`-compliant object), as well as cuDF DataFrames.

For information about cuDF, refer to the [cuDF documentation](https://rapidsai.github.io/projects/cudf/en/latest/).

For information about cuML's ridge regression implementation, refer to the [cuML documentation](https://rapidsai.github.io/projects/cuml/en/latest/index.html).

In [None]:
import os

import numpy as np

import pandas as pd
import cudf as gd

from cuml.linear_model import Ridge as cuRidge
from sklearn.linear_model import Ridge as skRidge

from sklearn.datasets import make_regression

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

## Define Parameters

In [1]:
n_samples = 2**20
n_features = 399

## Generate Data

In [None]:
%%time
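X, y = make_regression(n_samples=n_samples, n_features=n_features, random_state=0)

# pandas objects are needed later for gd.DataFrame.from_pandas and y_train.values
X, y = pd.DataFrame(X), pd.DataFrame(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)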

## Scikit-learn Model

### Fit

In [None]:
%%time
skridge = skRidge(fit_intercept=False,
                  normalize=True,
                  alpha=0.1)

skridge.fit(X_train, y_train)

### Predict

In [None]:
%%time
sk_predict = skridge.predict(X_test)
error_sk = mean_squared_error(y_test, sk_predict)

## cuML Model

In [None]:
%%time
X_cudf = gd.DataFrame.from_pandas(X_train)
X_cudf_test = gd.DataFrame.from_pandas(X_test)

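# Flatten the single-column target to 1-D, then copy it to the GPU as a cuDF Series
y_cudf = y_train.values
y_cudf = y_cudf[:, 0]

y_cudf = gd.Series(y_cudf)  # cudf is imported under the alias gd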

### Fit

In [None]:
%%time
# Fit the cuML ridge regression model on the training set.
# 'eig' is the faster solver, but 'svd' is more accurate.
curidge = cuRidge(fit_intercept=False,
                  normalize=True,
                  solver='svd',
                  alpha=0.1)

curidge.fit(X_cudf, y_cudf)
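
The speed versus accuracy trade-off mentioned in the solver comment can be illustrated with plain NumPy. The sketch below is not cuML's GPU implementation. It only shows the two mathematical routes to the same ridge solution: an eigendecomposition of the small `X^T X` matrix, and an SVD of `X` itself. The synthetic data and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(200)
alpha = 0.1

# Route 1: eigendecomposition of the (n_features x n_features) Gram matrix.
# X^T X = V diag(evals) V^T, so w = V diag(1 / (evals + alpha)) V^T X^T y.
evals, V = np.linalg.eigh(X.T @ X)
w_eig = V @ ((V.T @ (X.T @ y)) / (evals + alpha))

# Route 2: thin SVD of X, which never forms X^T X.
# X = U diag(s) V^T, so w = V diag(s / (s^2 + alpha)) U^T y.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
w_svd = Vt.T @ ((s / (s**2 + alpha)) * (U.T @ y))

# Reference: direct solve of the regularized normal equations.
w_direct = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

print(np.allclose(w_eig, w_svd), np.allclose(w_svd, w_direct))
print(np.linalg.cond(X), np.linalg.cond(X.T @ X))
```

Forming `X^T X` squares the condition number of `X`, so the eigendecomposition route loses more precision on ill-conditioned data, while it only has to factor a small square matrix. On well-conditioned data such as this example the two routes agree to floating-point tolerance.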

### Predict

In [None]:
%%time
cu_predict = curidge.predict(X_cudf_test).to_array()
error_cu = mean_squared_error(y_test, cu_predict)

## Evaluate Results

In [None]:
print("SKL MSE(y):")
print(error_sk)
print("CUML MSE(y):")
print(error_cu)
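
Comparing the two printed numbers by eye is error-prone, so one option is to report their relative difference. The sketch below uses NumPy only. The helpers `mse` and `relative_gap` are hypothetical names, not part of cuML or scikit-learn, and the small arrays are stand-ins for `y_test`, `sk_predict`, and `cu_predict`. They are not results from this notebook.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error; both inputs are flattened so that an (n, 1)
    target and an (n,) prediction do not broadcast to an (n, n) array."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))


def relative_gap(a, b):
    """Relative difference between two non-negative scores."""
    denom = max(abs(a), abs(b))
    return 0.0 if denom == 0 else abs(a - b) / denom


# Stand-in data only: a column-vector target and two prediction vectors.
y_true = np.array([[1.0], [2.0], [3.0]])
pred_a = np.array([1.1, 1.9, 3.2])
pred_b = np.array([1.1, 1.9, 3.2000001])

gap = relative_gap(mse(y_true, pred_a), mse(y_true, pred_b))
print("relative MSE gap:", gap)
```

In the notebook, the same check would be `relative_gap(error_sk, error_cu)`. What counts as an acceptable gap depends on the solver and the floating-point precision in use.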