# Linear Regression

**Linear Regression** is a simple machine learning model in which the response `y` is modelled as a linear combination of the predictors in `X`, plus an optional intercept.
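Concretely, for $n$ samples and $p$ features, the fitted model and the ordinary least squares (OLS) objective solved by both libraries in this notebook can be written as below. The symbols $\beta_0$ (intercept), $\beta$ (coefficients), $\mathbf{1}$ (all-ones vector), $\bar{x}$ and $\bar{y}$ (column and target means) are notation introduced here for illustration:

$$
\hat{y} = \beta_0 \mathbf{1} + X\beta, \qquad
(\hat\beta_0, \hat\beta) = \arg\min_{\beta_0,\,\beta} \lVert y - \beta_0 \mathbf{1} - X\beta \rVert_2^2
$$

Centering the data, $\tilde{X} = X - \mathbf{1}\bar{x}^\top$ and $\tilde{y} = y - \bar{y}\mathbf{1}$, removes the intercept from the problem and leaves the normal equations

$$
\tilde{X}^\top \tilde{X}\,\hat\beta = \tilde{X}^\top \tilde{y}, \qquad \hat\beta_0 = \bar{y} - \bar{x}^\top \hat\beta .
$$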

The model accepts array-like inputs in host memory (NumPy arrays) or in device memory (Numba device arrays or any object implementing `__cuda_array_interface__`), as well as cuDF DataFrames.

For information about cuDF, refer to the [cuDF documentation](https://rapidsai.github.io/projects/cudf/en/latest/).

For information about cuML's linear regression API, refer to the [cuML documentation](https://rapidsai.github.io/projects/cuml/en/0.11.0/api.html#cuml.LinearRegression).

## Define Parameters

In [None]:
n_samples = 2**20
n_features = 399

random_state = 23
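These parameters produce a fairly large dense feature matrix. A quick size estimate helps confirm that the data fits in GPU memory before generating it. The sketch below shows both 32-bit and 64-bit floats because the `dtype` that `make_regression` produces is an assumption here; the helper `dense_matrix_bytes` is hypothetical.

```python
# Back-of-the-envelope memory estimate for a dense n_samples x n_features matrix.
# Assumption: the element type is not fixed here, so float32 and float64 are both shown.
n_samples = 2**20
n_features = 399


def dense_matrix_bytes(rows: int, cols: int, itemsize: int) -> int:
    """Hypothetical helper: bytes needed for a dense rows x cols matrix."""
    return rows * cols * itemsize


for name, itemsize in [("float32", 4), ("float64", 8)]:
    nbytes = dense_matrix_bytes(n_samples, n_features, itemsize)
    print(f"{name}: {nbytes / 2**30:.2f} GiB")  # float32: 1.56 GiB, float64: 3.12 GiB
```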

## Generate Data

In [None]:
import cudf
from cuml import make_regression
from cuml import train_test_split

In [None]:
%%time
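# Generate a synthetic regression problem directly in GPU memory
X, y = make_regression(n_samples=n_samples, n_features=n_features, random_state=random_state)

# Wrap the device arrays in cuDF objects; y is a single-column matrix, so take column 0 as a Series
X = cudf.DataFrame.from_gpu_matrix(X)
y = cudf.DataFrame.from_gpu_matrix(y)[0]

# Hold out 20% of the rows as a test set
X_cudf, X_cudf_test, y_cudf, y_cudf_test = train_test_split(X, y, test_size=0.2, random_state=random_state)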
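`train_test_split` shuffles the rows and holds out a fraction of them for testing. The NumPy sketch below illustrates the idea in host memory. The helper `shuffle_split` is hypothetical, and its rounding rule (test count rounded up, as scikit-learn does) is an assumption that may not match cuML's implementation exactly.

```python
import math

import numpy as np


def shuffle_split(X, y, test_size=0.2, random_state=None):
    """Hypothetical sketch of a shuffled train/test split on NumPy arrays."""
    n = X.shape[0]
    n_test = math.ceil(test_size * n)          # round the test count up
    rng = np.random.default_rng(random_state)
    perm = rng.permutation(n)                  # random row order
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    # The same index arrays are applied to X and y, so rows stay paired
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


X_demo = np.arange(20, dtype=np.float32).reshape(10, 2)  # row i is [2i, 2i + 1]
y_demo = np.arange(10, dtype=np.float32)                 # target for row i is i
X_tr, X_te, y_tr, y_te = shuffle_split(X_demo, y_demo, test_size=0.2, random_state=23)
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)
```

Shuffling before splitting matters whenever the rows have some ordering; otherwise the test set would not be representative of the training data.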

In [None]:
# Copy dataset from GPU memory to host memory.
# This is done to later compare CPU and GPU results.
X_train = X_cudf.to_pandas()
X_test = X_cudf_test.to_pandas()
y_train = y_cudf.to_pandas()
y_test = y_cudf_test.to_pandas()

## Scikit-learn Model

### Fit, predict and evaluate

In [None]:
from sklearn.linear_model import LinearRegression

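# Ordinary least squares with an intercept; n_jobs=-1 requests all CPU cores.
# Note: `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2;
# drop this argument on newer versions.
ols_sk = LinearRegression(fit_intercept=True,
                          normalize=True,
                          n_jobs=-1)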

%time _ = ols_sk.fit(X_train, y_train)
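For OLS with an intercept, an affine rescaling of each feature column changes the coefficients but not the fitted predictions, so `normalize=True` should mainly affect numerical conditioning rather than the model's output. The NumPy sketch below checks this on small random data. It assumes `normalize` means "subtract the column mean, then divide by the column's L2 norm", and it uses `np.linalg.lstsq` rather than either library's solver; `ols_fit_predict` is a hypothetical helper.

```python
import numpy as np


def ols_fit_predict(X_fit, y_fit, X_eval):
    """Fit OLS with an intercept via least squares and return predictions for X_eval."""
    A = np.column_stack([np.ones(len(X_fit)), X_fit])    # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    return np.column_stack([np.ones(len(X_eval)), X_eval]) @ coef


rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 5))
y_toy = X_toy @ rng.normal(size=5) + 3.0 + 0.1 * rng.normal(size=200)
X_new = rng.normal(size=(10, 5))

# Center with the training means, then divide by the L2 norm of each centered column
mean = X_toy.mean(axis=0)
scale = np.linalg.norm(X_toy - mean, axis=0)
pred_raw = ols_fit_predict(X_toy, y_toy, X_new)
pred_norm = ols_fit_predict((X_toy - mean) / scale, y_toy, (X_new - mean) / scale)

print(np.allclose(pred_raw, pred_norm))  # True
```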

In [None]:
%%time
predict_sk = ols_sk.predict(X_test)

In [None]:
from sklearn.metrics import r2_score

r2_score_sk = r2_score(y_test, predict_sk)
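The coefficient of determination compares the residual sum of squares with the total variation of the targets, $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$. A minimal NumPy version makes the formula concrete; the function name `r2` is hypothetical and it handles only a single, non-constant target.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (single target)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


print(round(r2([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]), 4))  # 0.9486
```

A score of 1 means perfect predictions, and predicting the mean of `y` for every row gives 0.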

## cuML Model

### Fit, predict and evaluate

In [None]:
from cuml.linear_model import LinearRegression

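# algorithm='eig' solves the least-squares problem via an eigendecomposition
# of the covariance matrix (faster, but less numerically robust than 'svd')
ols_cuml = LinearRegression(fit_intercept=True,
                            normalize=True,
                            algorithm='eig')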

%time _ = ols_cuml.fit(X_cudf, y_cudf)
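The NumPy sketch below illustrates the general eigendecomposition technique behind `algorithm='eig'` on centered data: it eigendecomposes the Gram matrix $\tilde{X}^\top \tilde{X}$, solves the normal equations in the eigenbasis, and recovers the intercept from the means. It is an illustration of the method, not cuML's actual CUDA implementation; `ols_eig` is a hypothetical name.

```python
import numpy as np


def ols_eig(X, y):
    """OLS with intercept via eigendecomposition of the centered Gram matrix."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean              # centering absorbs the intercept
    gram = Xc.T @ Xc                             # symmetric p x p matrix
    eigvals, eigvecs = np.linalg.eigh(gram)      # gram = V diag(w) V^T
    rhs = eigvecs.T @ (Xc.T @ yc)                # project X^T y onto the eigenvectors
    coef = eigvecs @ (rhs / eigvals)             # beta = V diag(1/w) V^T X^T y
    intercept = y_mean - x_mean @ coef
    return coef, intercept


rng = np.random.default_rng(1)
X_toy = rng.normal(size=(500, 4))
true_coef = np.array([1.5, -2.0, 0.5, 3.0])
y_toy = X_toy @ true_coef + 4.0                  # noise-free targets
coef, intercept = ols_eig(X_toy, y_toy)
print(np.round(coef, 6), round(float(intercept), 6))
```

Dividing by the eigenvalues assumes $\tilde{X}^\top \tilde{X}$ is well conditioned; near-zero eigenvalues amplify rounding errors, which is why SVD-based solvers are generally preferred when stability matters more than speed.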

In [None]:
%%time
predict_cuml = ols_cuml.predict(X_cudf_test)

In [None]:
from cuml.metrics.regression import r2_score

r2_score_cuml = r2_score(y_cudf_test, predict_cuml)

## Compare Results

In [None]:
print("R^2 score (SKL):  %s" % r2_score_sk)
print("R^2 score (cuML): %s" % r2_score_cuml)