<div style="background-color:rgba(0, 167, 255, 0.6);border-radius:5px;display:fill">
    <h1><center>Tabular Playground Series - Jan 2022</center></h1>
</div>

<center><a><img src="https://i.ibb.co/PWvpT9F/header.png" alt="header" border="0" width=800 height=400 class="center"></a></center>

<h1> Fast AutoML and Intel® Extension for Scikit-learn* - Kaggle Tabular Playground Series - January 2022 </h1>

AutoML greatly simplifies building high-quality models, but it can be slow, especially on large problems. In this notebook, we show how to accelerate the AutoML framework PyCaret with Intel® Extension for Scikit-learn*, which speeds up Scikit-learn's algorithms seamlessly: one pip package to install and two lines of code.

This notebook solves a regression task (predicting `num_sold`), but you can use it as a template for many other competitions with a few changes, depending on the task type (binary or multiclass classification) and your needs.

I will show you how to **speed up** your kernel without changing your code using **Intel® Extension for Scikit-learn**.

### Intel® Extension for Scikit-learn installation:

In [None]:
!pip install scikit-learn-intelex -q --progress-bar off

<div style="background-color:rgba(0, 167, 255, 0.6);border-radius:5px;display:fill">
    <h1><center>Importing Libraries and Data</center></h1>
</div>

### Import Libraries

In [None]:
import pandas as pd
import numpy as np
import warnings
import gc
from IPython.display import HTML
warnings.filterwarnings("ignore")

from timeit import default_timer as timer
import matplotlib.pyplot as plt

random_state = 42

### Reading Data

In [None]:
PATH_TRAIN      = '../input/tabular-playground-series-jan-2022/train.csv'
PATH_TEST       = '../input/tabular-playground-series-jan-2022/test.csv'
PATH_SUBMISSION = '../input/tabular-playground-series-jan-2022/sample_submission.csv'

In [None]:
train_data = pd.read_csv(PATH_TRAIN)
test_data  = pd.read_csv(PATH_TEST)
submission = pd.read_csv(PATH_SUBMISSION)

In [None]:
train_data[:5]

In [None]:
train_data.info()

<center><a><img src="https://cdn.analyticsvidhya.com/wp-content/uploads/2020/05/Screenshot-from-2020-05-13-18-30-22.png" alt="header" border="0" width=300 height=200 class="center"></a></center>

### PyCaret Installation

In [None]:
!pip install pycaret --user > /dev/null 2>&1

In [None]:
from pycaret.regression import *

In [None]:
setup(data = train_data, 
      target = 'num_sold',
      silent = True,
      ignore_features = ['row_id'],
      fold = 2)

<div style="background-color:rgba(0, 167, 255, 0.6);border-radius:5px;display:fill">
    <h1><center>RandomForest</center></h1>
</div>

## RandomForest with default Scikit-learn

In [None]:
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators = 250, random_state = random_state)

In [None]:
tDefS = timer()
rf = create_model(rf, fold = 5)
tDefE = timer()

In [None]:
print("Total fitting Random Forest time with default Scikit-learn: {} seconds".format(tDefE - tDefS))
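
The start/stop timer pattern used above can be wrapped in a small helper so that each measurement becomes a single `with` block. The sketch below is one possible way to do it; `timed` and the `timings` dictionary are hypothetical names, not part of PyCaret or Scikit-learn, and the summation only stands in for the real `create_model` call.

```python
from contextlib import contextmanager
from timeit import default_timer as timer


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = timer()
    try:
        yield
    finally:
        # Record the elapsed time even if the block raises an exception.
        results[label] = timer() - start


# Hypothetical usage: the summation stands in for create_model(rf, fold = 5).
timings = {}
with timed("rf_default", timings):
    total = sum(i * i for i in range(100_000))

print("Elapsed: {:.4f} seconds".format(timings["rf_default"]))
```

Keeping all durations in one dictionary also makes it harder to mix up start and end variables when the same measurement is repeated for several models.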

## RandomForest with optimized Scikit-learn

### Accelerate Scikit-learn with two lines of code:

In [None]:
from sklearnex import patch_sklearn
patch_sklearn()

Set up logging to track accelerated cases:

In [None]:
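import logging

# Attach a file handler to the root logger so patch messages are written to log.txt
logger = logging.getLogger()
fh = logging.FileHandler('log.txt')
fh.setLevel(logging.DEBUG)  # same as the numeric level 10
logger.addHandler(fh)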
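
In Python's `logging` module, a record is checked first against the level of the logger it is emitted on and then against the level of each handler. The numeric value 10 corresponds to `logging.DEBUG`. The sketch below shows this with a throwaway logger and a temporary file; the logger name and the message text are made up for the demo, and how sklearnex itself emits its messages depends on the installed version, which is not covered here.

```python
import logging
import os
import tempfile

# Write to a temporary file so the demo does not touch log.txt.
log_path = os.path.join(tempfile.mkdtemp(), "demo_log.txt")

demo_logger = logging.getLogger("demo_patch_logger")  # hypothetical logger name
demo_logger.setLevel(logging.INFO)   # the logger drops anything below INFO
demo_logger.propagate = False        # keep demo records out of the root logger

handler = logging.FileHandler(log_path)
handler.setLevel(logging.DEBUG)      # the handler accepts DEBUG and above
demo_logger.addHandler(handler)

demo_logger.info("example: running accelerated version on CPU")
demo_logger.debug("this record is below the logger level and is dropped")

# Flush and detach the handler before reading the file back.
handler.close()
demo_logger.removeHandler(handler)

with open(log_path) as f:
    contents = f.read()

print(contents)
```

Only the INFO record reaches the file: the DEBUG record is filtered by the logger before the handler ever sees it.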

Don't forget to re-import the modules so that the patch takes effect:

In [None]:
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators = 250, random_state = random_state)
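
Why the re-import matters can be shown with a toy example. The sketch assumes that patching works by replacing attributes on a module, so a name imported before the patch still points at the old class. Everything here (`fake_lib`, `SlowModel`, `FastModel`) is made up for illustration and is not sklearnex code.

```python
import types

# A stand-in module with a single class attribute; all names are hypothetical.
fake_lib = types.ModuleType("fake_lib")


class SlowModel:
    backend = "stock"


class FastModel:
    backend = "accelerated"


fake_lib.Model = SlowModel

# Equivalent of running "from fake_lib import Model" BEFORE patching.
Model_before = fake_lib.Model

# Patching swaps the attribute on the module.
fake_lib.Model = FastModel

# Equivalent of re-running "from fake_lib import Model" AFTER patching.
Model_after = fake_lib.Model

print(Model_before.backend)  # stock
print(Model_after.backend)   # accelerated
```

The name bound before the swap keeps referring to the original class, which is why the import statement is executed again after `patch_sklearn()`.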

In [None]:
tOptS = timer()
rf = create_model(rf, fold = 5)
tOptE = timer()

In [None]:
print("Total fitting Random Forest time with optimized Scikit-learn: {} seconds".format(tOptE - tOptS))

### List of algorithms which are accelerated by sklearnex

In [None]:
!cat log.txt | grep 'running accelerated version' | sort | uniq
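
If shell commands are not available in your environment, the same filtering can be done in plain Python. The function below is a rough equivalent of the `grep | sort | uniq` pipeline; `accelerated_entries` is a hypothetical helper, and the sample lines are made up, so the exact message format written by sklearnex may differ.

```python
def accelerated_entries(lines, marker="running accelerated version"):
    """Return the sorted, de-duplicated lines that contain the marker."""
    return sorted({line.rstrip("\n") for line in lines if marker in line})


# Made-up sample lines for illustration only.
sample = [
    "sklearn.ensemble.RandomForestRegressor.fit: running accelerated version on CPU\n",
    "sklearn.ensemble.RandomForestRegressor.predict: running accelerated version on CPU\n",
    "sklearn.ensemble.RandomForestRegressor.fit: running accelerated version on CPU\n",
    "some unrelated log line\n",
]

for entry in accelerated_entries(sample):
    print(entry)
```

To apply it to the real log, pass an open file object instead of the list, for example inside `with open('log.txt') as f:`.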

In [None]:
rf_speedup = round((tDefE - tDefS) / (tOptE - tOptS), 2)
HTML(f'<h2>RandomForest speedup: {rf_speedup}x</h2>'
     f'(from {round((tDefE - tDefS), 2)} to {round((tOptE - tOptS), 2)} seconds)')
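
The speedup reported above is simply the ratio of the default run time to the optimized run time. A minimal sketch of that calculation as a reusable function follows; `speedup` is a hypothetical helper, and the numbers are made up rather than measured in this notebook.

```python
def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized run was than the baseline."""
    if optimized_seconds <= 0:
        raise ValueError("optimized_seconds must be positive")
    return round(baseline_seconds / optimized_seconds, 2)


# Made-up timings for illustration only, not results from this notebook.
example = speedup(120.0, 30.0)
print(f"{example}x")  # 4.0x
```

A value above 1 means the optimized run was faster; a value below 1 means it was slower than the default.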

In [None]:
gc.collect()
!> log.txt

<div style="background-color:rgba(0, 167, 255, 0.6);border-radius:5px;display:fill">
    <h1><center>SVR</center></h1>
</div>

## SVR with default Scikit-learn

In [None]:
from sklearnex import unpatch_sklearn
unpatch_sklearn()

In [None]:
from sklearn.svm import SVR
svr = SVR()

In [None]:
tDefS = timer()
svr = create_model(svr, fold = 5)
tDefE = timer()

In [None]:
tDefTS = timer()
svr = tune_model(svr, fold = 5)
tDefTE = timer()

In [None]:
print("Total fitting SVR time with default Scikit-learn: {} seconds".format(tDefE - tDefS))

In [None]:
print("Total tuning SVR time with default Scikit-learn: {} seconds".format(tDefTE - tDefTS))

## SVR with optimized Scikit-learn

In [None]:
from sklearnex import patch_sklearn
patch_sklearn()

In [None]:
from sklearn.svm import SVR
svr = SVR()

In [None]:
tOptS = timer()
svr = create_model(svr, fold = 5)
tOptE = timer()

In [None]:
tOptTS = timer()
svr = tune_model(svr, fold = 5)
tOptTE = timer()

In [None]:
svr.get_params()

In [None]:
print("Total fitting SVR time with optimized Scikit-learn: {} seconds".format(tOptE - tOptS))

In [None]:
print("Total tuning SVR time with optimized Scikit-learn: {} seconds".format(tOptTE - tOptTS))

### List of algorithms which are accelerated by sklearnex

In [None]:
!cat log.txt | grep 'running accelerated version' | sort | uniq

In [None]:
svr_fit_speedup = round((tDefE - tDefS) / (tOptE - tOptS), 2)
HTML(f'<h2>SVR fitting speedup: {svr_fit_speedup}x</h2>'
     f'(from {round((tDefE - tDefS), 2)} to {round((tOptE - tOptS), 2)} seconds)')

In [None]:
svr_tune_speedup = round((tDefTE - tDefTS) / (tOptTE - tOptTS), 2)
HTML(f'<h2>SVR tuning speedup: {svr_tune_speedup}x</h2>'
     f'(from {round((tDefTE - tDefTS), 2)} to {round((tOptTE - tOptTS), 2)} seconds)')

<div style="background-color:rgba(0, 167, 255, 0.6);border-radius:5px;display:fill">
    <h1><center>Blending with boosting</center></h1>
</div>

In [None]:
cat = create_model('catboost')
light = create_model('lightgbm')
xgboost = create_model('xgboost')
blender_specific = blend_models(estimator_list = [rf, cat, light, xgboost])
final_model = finalize_model(blender_specific)
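
For regression, blending combines the base models by averaging their predictions. The sketch below illustrates that idea in plain Python under the assumption of a simple (optionally weighted) mean; `blend_predictions` is a hypothetical helper and the numbers are made up, so this is not PyCaret's implementation.

```python
def blend_predictions(per_model_predictions, weights=None):
    """Average predictions row by row across models, optionally weighted."""
    n_models = len(per_model_predictions)
    if weights is None:
        weights = [1.0] * n_models
    total_weight = sum(weights)
    blended = []
    # zip(*...) groups the predictions of all models for the same row.
    for row in zip(*per_model_predictions):
        blended.append(sum(w * p for w, p in zip(weights, row)) / total_weight)
    return blended


# Made-up predictions from three hypothetical models for two rows.
preds = [
    [100.0, 200.0],
    [110.0, 190.0],
    [120.0, 210.0],
]
print(blend_predictions(preds))
```

Averaging tends to help most when the base models make different kinds of errors, which is one reason to mix a random forest with several boosting models.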

<div style="background-color:rgba(0, 167, 255, 0.6);border-radius:5px;display:fill">
    <h1><center>Prediction</center></h1>
</div>

In [None]:
predict = predict_model(final_model, test_data)
predict[:5]

In [None]:
submission['num_sold'] = predict['Label']

In [None]:
submission.to_csv("submit.csv", index = False)

<div style="background-color:rgba(0, 167, 255, 0.6);border-radius:5px;display:fill">
    <h1><center>Conclusion</center></h1>
</div>

**Intel® Extension for Scikit-learn** gives you opportunities to:
* Use your Scikit-learn code for training and inference without modification.
* Speed up your kernel.

*Please upvote if you liked it.*

<div style="background-color:rgba(0, 167, 255, 0.6);border-radius:5px;display:fill">
    <h1><center>Other notebooks with sklearnex usage</center></h1>
</div>

### [[predict sales] Stacking with scikit-learn-intelex](https://www.kaggle.com/alexeykolobyanin/predict-sales-stacking-with-scikit-learn-intelex)

### [[TPS-Aug] NuSVR with Intel Extension for Sklearn](https://www.kaggle.com/alexeykolobyanin/tps-aug-nusvr-with-intel-extension-for-sklearn)

### [Using scikit-learn-intelex for What's Cooking](https://www.kaggle.com/kppetrov/using-scikit-learn-intelex-for-what-s-cooking?scriptVersionId=58739642)

### [Fast KNN using  scikit-learn-intelex for MNIST](https://www.kaggle.com/kppetrov/fast-knn-using-scikit-learn-intelex-for-mnist?scriptVersionId=58738635)

### [Fast SVC using scikit-learn-intelex for MNIST](https://www.kaggle.com/kppetrov/fast-svc-using-scikit-learn-intelex-for-mnist?scriptVersionId=58739300)

### [Fast SVC using scikit-learn-intelex for NLP](https://www.kaggle.com/kppetrov/fast-svc-using-scikit-learn-intelex-for-nlp?scriptVersionId=58739339)

### [Fast AutoML with Intel Extension for Scikit-learn](https://www.kaggle.com/lordozvlad/fast-automl-with-intel-extension-for-scikit-learn)

### [[Titanic] AutoML with Intel Extension for Sklearn](https://www.kaggle.com/lordozvlad/titanic-automl-with-intel-extension-for-sklearn)