# Optimizing Kaggle kernels using Intel(R) Extension for Scikit-learn*

For classical machine learning algorithms, we often use the most popular Python library, [scikit-learn](https://scikit-learn.org/stable/). We use it to fit models and search for optimal parameters, but a scikit-learn run can take hours, if not days. Speeding up this process is something anyone who uses scikit-learn would be interested in.

I want to show you how to get results faster without changing your code. To do this, we will use another Python library, **[scikit-learn-intelex](https://github.com/intel/scikit-learn-intelex)**. It accelerates scikit-learn and does not require you to change code written for scikit-learn.

I will use a Kaggle notebook in which training a KNN model and predicting with it took **almost 35 minutes**.

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, KFold
import matplotlib.pyplot as plt

In [None]:
train = pd.read_csv('../input/digit-recognizer/train.csv')
test = pd.read_csv('../input/digit-recognizer/test.csv')

x_train = train[train.columns[1:]]
x_test = test
y_train = train[train.columns[0]]

train.head()

Let's move training and prediction into a separate function:

In [None]:
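def train_predict():
    # Import inside the function so that, after patching, the call picks up
    # the accelerated KNeighborsClassifier instead of the stock one.
    from sklearn.neighbors import KNeighborsClassifier
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(x_train, y_train)
    return knn.predict(x_test)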

In [None]:
%%time
y_pred_original = train_predict()
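
The `%%time` cell magic only exists inside a notebook. Outside one, the same measurement can be sketched with the standard library. The helper name `timed` below is hypothetical and not part of the original notebook, and the workload is a stand-in rather than the KNN model.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload for illustration; any callable can be timed this way.
total, seconds = timed(sum, range(1_000_000))
print(f"sum finished in {seconds:.4f} s")
```

In this notebook the call would look like `y_pred, seconds = timed(train_predict)`, which keeps the elapsed time in a variable so that two runs can be compared later.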

Training and prediction took **almost 35 minutes**. Let's try scikit-learn-intelex. First, install it:

In [None]:
!pip install scikit-learn-intelex --progress-bar off >> /tmp/pip_sklearnex.log

To get optimizations, patch scikit-learn using scikit-learn-intelex:

In [None]:
from sklearnex import patch_sklearn
patch_sklearn()
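
Patching only affects estimators that are imported after `patch_sklearn()` has run, which is why `train_predict` imports `KNeighborsClassifier` inside the function. If the same notebook may also run where scikit-learn-intelex is not installed, a guarded version of the patch step could look like the sketch below. The function name `enable_acceleration` is hypothetical; only `patch_sklearn` comes from the library.

```python
def enable_acceleration():
    """Patch scikit-learn if scikit-learn-intelex is installed.

    Returns True when the patch was applied and False when the
    package is missing, so the notebook still runs on stock scikit-learn.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        return False
    patch_sklearn()
    return True


accelerated = enable_acceleration()
print("scikit-learn-intelex patch applied:", accelerated)

# Import scikit-learn estimators only after this point, for example:
# from sklearn.neighbors import KNeighborsClassifier
```

To go back to stock scikit-learn, the library also provides `unpatch_sklearn()`; estimators need to be re-imported after calling it.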

Let's run the same code to train the KNN model and predict:

In [None]:
%%time
y_pred_optimized = train_predict()

This time, training and prediction took **a little over a minute**, which saved us **more than half an hour**! Let's make sure that the quality has not changed (a result of 1.0 means the two sets of predictions are identical):

In [None]:
np.mean(y_pred_optimized == y_pred_original)

Save the result. The search for the best model is now much faster.

In [None]:
sub = pd.read_csv('../input/digit-recognizer/sample_submission.csv')
sub.Label = y_pred_optimized
sub.to_csv('submission_sklearnex.csv', index=False)
sub.head()

With scikit-learn-intelex patching you can:

- Use your scikit-learn code for training and inference without modification.
- Train and predict scikit-learn models up to **35 times faster**.
- Get the same quality of predictions as stock scikit-learn.
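
The "same quality" point can be checked in more detail than a single mean. The sketch below is a minimal example, assuming two equal-length sequences of predicted labels. It reports the fraction of matching predictions and the positions where they differ. The function name `agreement` and the sample labels are made up for illustration and are not results from this notebook.

```python
def agreement(pred_a, pred_b):
    """Return (fraction of equal predictions, indices where they differ)."""
    n = len(pred_a)
    if n == 0 or n != len(pred_b):
        raise ValueError("inputs must be non-empty and of equal length")
    mismatches = [i for i, (a, b) in enumerate(zip(pred_a, pred_b)) if a != b]
    return (n - len(mismatches)) / n, mismatches


# Made-up labels for illustration only.
fraction, differing = agreement([3, 1, 4, 1, 5], [3, 1, 4, 7, 5])
print(f"matching: {fraction:.2f}, differing positions: {differing}")
```

Passing the two prediction arrays from the stock and patched runs works the same way, and an empty list of differing positions means the predictions are identical.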

*Please upvote if you like it.*