# Intel® Extension for Scikit-learn Linear Regression for YearPredictionMSD dataset

In [1]:
from timeit import default_timer as timer
from sklearn import metrics
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [2]:
data = pd.read_csv('data/YearPredictionMSD.txt', header=None)
x = data.iloc[:, 1:].to_numpy(dtype=np.float32)
y = data.iloc[:, 0].to_numpy(dtype=np.float32)
x_train, x_test, y_train, y_test = train_test_split(x, y, shuffle=False,
                                                    train_size=463715,
                                                    test_size=51630)

In [4]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=0)
x_train.shape, x_test.shape, y_train.shape, y_test.shape
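Note that the second `train_test_split` call overwrites the result of the first one, so the 90/10 shuffled split is what the rest of the notebook uses. The shapes it produces can be estimated with plain arithmetic. This is a rough sketch that assumes the row count implied by the first split (463715 + 51630) and assumes scikit-learn rounds a fractional `test_size` up to a whole number of samples.

```python
from math import ceil

# Row count implied by the explicit sizes in the first split above.
n_samples = 463715 + 51630

# Assumption: a float test_size is rounded up, and the rest goes to training.
test_size = 0.1
n_test = ceil(test_size * n_samples)
n_train = n_samples - n_test

print(f"train rows: {n_train}, test rows: {n_test}")
```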

### Normalize the data

In [6]:
from sklearn.preprocessing import MinMaxScaler, StandardScaler
scaler_x = MinMaxScaler()
scaler_y = StandardScaler()

In [8]:
scaler_x.fit(x_train)
x_train = scaler_x.transform(x_train)
x_test = scaler_x.transform(x_test)

scaler_y.fit(y_train.reshape(-1, 1))
y_train = scaler_y.transform(y_train.reshape(-1, 1)).ravel()
y_test = scaler_y.transform(y_test.reshape(-1, 1)).ravel()
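To make the scaling step concrete, here is a dependency-free sketch of the arithmetic behind the two scalers, applied to a single toy column. The helper names and the toy values are illustrative and not part of the notebook. As in the cells above, the statistics are taken from the training data only and then reused for the test data.

```python
from statistics import fmean, pstdev

def min_max_scale(values, lo, hi):
    """Map values so that lo -> 0 and hi -> 1 (a constant column keeps a scale of 1)."""
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]

def standardize(values, mean, std):
    """Shift to zero mean and scale to unit variance."""
    return [(v - mean) / std for v in values]

# Toy feature column: fit on train, apply to both train and test.
feature_train = [2.0, 4.0, 6.0]
feature_test = [3.0, 8.0]
lo, hi = min(feature_train), max(feature_train)
scaled_train = min_max_scale(feature_train, lo, hi)
scaled_test = min_max_scale(feature_test, lo, hi)  # may leave the [0, 1] range

# Toy target column: population standard deviation (ddof=0), fit on train only.
target_train = [1990.0, 2000.0, 2010.0]
mean, std = fmean(target_train), pstdev(target_train)
target_scaled = standardize(target_train, mean, std)

print(scaled_train, scaled_test)
print(target_scaled)
```

Test values that lie outside the training range map outside [0, 1], which is expected when the scaler is fitted on the training split alone.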

### Patch original scikit-learn with Intel® Extension for Scikit-learn
Intel Extension for Scikit-learn (previously known as daal4py) contains drop-in replacement functionality for the stock scikit-learn package. You can take advantage of the performance optimizations of Intel Extension for Scikit-learn by adding just two lines of code before the usual scikit-learn imports:

In [10]:
from sklearnex import patch_sklearn
patch_sklearn()

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


Patching with Intel(R) Extension for Scikit-learn affects the performance of specific scikit-learn functionality. Refer to the [list of supported algorithms and parameters](https://intel.github.io/scikit-learn-intelex/algorithms.html) for details. When unsupported parameters are used, the package falls back to the original scikit-learn. If the patching does not cover your scenarios, [submit an issue on GitHub](https://github.com/intel/scikit-learn-intelex/issues).
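If the same notebook has to run on machines where the extension may not be installed, the patching call can be guarded. The sketch below is one possible approach, not something the extension requires. `try_patch_sklearn` is a hypothetical helper name, and only `patch_sklearn` comes from the package itself.

```python
import importlib.util

def try_patch_sklearn() -> bool:
    """Patch scikit-learn if sklearnex is installed; report whether patching ran."""
    if importlib.util.find_spec("sklearnex") is None:
        return False  # extension not installed: stock scikit-learn stays in use
    from sklearnex import patch_sklearn
    patch_sklearn()
    return True

patched = try_patch_sklearn()
# Import estimators only after this point so that patched classes are picked up.
print("patched scikit-learn" if patched else "stock scikit-learn")
```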

Training of the Linear Regression algorithm with Intel(R) Extension for Scikit-learn for YearPredictionMSD dataset

In [11]:
from sklearn.linear_model import LinearRegression
params = {
    "n_jobs": -1,
    "copy_X": False
}

In [13]:
start = timer()
model = LinearRegression(**params).fit(x_train, y_train)
f"Intel(R) extension for Scikit-learn time: {(timer() - start):.2f} s"

'Intel(R) extension for Scikit-learn time: 0.03 s'
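A single measurement of a few hundredths of a second is sensitive to noise. One common way to get a steadier number is to repeat the call and keep the fastest run. The helper below is a minimal sketch. `best_of` is a hypothetical name, and the summation is only a stand-in workload so the block runs on its own.

```python
from timeit import default_timer as timer

def best_of(fn, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` calls of fn."""
    timings = []
    for _ in range(repeats):
        start = timer()
        fn()
        timings.append(timer() - start)
    return min(timings)

# Stand-in workload; in the notebook this would wrap the fit call instead.
elapsed = best_of(lambda: sum(i * i for i in range(100_000)))
print(f"best of 5: {elapsed:.4f} s")
```

In the notebook, `fn` could be something like `lambda: LinearRegression(**params).fit(x_train.copy(), y_train)`. The copy is a cautious choice because `copy_X` is `False` in the parameters above, which allows the input to be overwritten.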

Predict and get a result of the Linear Regression algorithm with Intel(R) Extension for Scikit-learn

In [14]:
y_predict = model.predict(x_test)
mse_metric_p = metrics.mean_squared_error(y_test, y_predict)
mse_metric_p

0.77168185
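Because the target was standardized earlier, this MSE is expressed in units of the training-set variance rather than in squared years. The sketch below shows the metric itself and how it could be converted back. The toy values and `sigma_years` are made up for illustration. In the notebook the corresponding factor would be `scaler_y.scale_[0]`.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy standardized targets and predictions (illustrative values only).
y_true_scaled = [0.5, -1.0, 1.5]
y_pred_scaled = [0.0, -0.5, 1.0]
mse_scaled = mean_squared_error(y_true_scaled, y_pred_scaled)

# Hypothetical standard deviation of the original target, in years.
sigma_years = 10.0
mse_years = mse_scaled * sigma_years ** 2  # back to squared years
rmse_years = mse_years ** 0.5              # typical error in years

print(mse_scaled, mse_years, rmse_years)
```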

### Train the same algorithm with original scikit-learn
To cancel the optimizations, we call *unpatch_sklearn* and re-import the LinearRegression class.

In [15]:
from sklearnex import unpatch_sklearn
unpatch_sklearn()

Training of the Linear Regression algorithm with original scikit-learn library for YearPredictionMSD dataset

In [16]:
from sklearn.linear_model import LinearRegression

In [17]:
start = timer()
model = LinearRegression(**params).fit(x_train, y_train)
f"Original Scikit-learn time: {(timer() - start):.2f} s"

'Original Scikit-learn time: 0.70 s'

Predict and get a result of the Linear Regression algorithm with original Scikit-learn

In [18]:
y_predict = model.predict(x_test)
mse_metric_o = metrics.mean_squared_error(y_test, y_predict)
mse_metric_o

0.77168566

### Compare MSE metric of patched scikit-learn and original

In [None]:
print(f"mse metric of unpatched scikit-learn: {mse_metric_o}")
print(f"mse metric of patched scikit-learn: {mse_metric_p}")
print(f"ratio: {mse_metric_p/mse_metric_o}")
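The comparison cell has no recorded output, so here is a small sketch that repeats it with the two MSE values shown earlier in the notebook. The relative tolerance of `1e-4` is an arbitrary choice for illustration, not a threshold defined by either library.

```python
from math import isclose

# MSE values copied from the cell outputs above.
mse_patched = 0.77168185
mse_original = 0.77168566

ratio = mse_patched / mse_original
same_quality = isclose(mse_patched, mse_original, rel_tol=1e-4)

print(f"ratio: {ratio:.6f}")
print(f"within tolerance: {same_quality}")
```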

### With scikit-learn-intelex patching you can:

- Use your existing scikit-learn code for training and prediction with minimal changes (a couple of lines of code);
- Train and predict with scikit-learn models faster;
- Get the same quality;
- Get a speedup of about **23** times in this run (0.70 s versus 0.03 s for training).
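The speedup figure follows from the two training times reported above (0.03 s patched, 0.70 s original). Since both timings are printed with two decimals, the sketch below also computes the range that the rounding alone allows. It assumes each printed value is within half a hundredth of a second of the true timing.

```python
# Training times copied from the cell outputs above, in seconds.
t_patched = 0.03
t_original = 0.70

speedup = t_original / t_patched

# Rounding to two decimals leaves +/- 0.005 s of uncertainty on each timing.
half_step = 0.005
lowest = (t_original - half_step) / (t_patched + half_step)
highest = (t_original + half_step) / (t_patched - half_step)

print(f"speedup: {speedup:.1f}x (between {lowest:.1f}x and {highest:.1f}x)")
```

The spread is wide because the patched run is so short, which is one reason to repeat the measurement before quoting a single number.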