In [None]:
# Upgrade Oracle ADS to pick up latest features and maintain compatibility with Oracle Cloud Infrastructure.

!pip install -U oracle-ads

Oracle Data Science service sample notebook.

Copyright (c) 2021, 2022 Oracle, Inc. All rights reserved. Licensed under the [Universal Permissive License v 1.0](https://oss.oracle.com/licenses/upl).

---

# <font color="red">Intel Extension for Scikit-Learn</font>
<p style="margin-left:10%; margin-right:10%;">by the <font color="teal">Oracle Cloud Infrastructure Data Science Service.</font></p>

---

# Overview:

This notebook demonstrates an easy way to speed up scikit-learn models using Intel-provided Python accelerators. The acceleration comes from the Intel(R) oneAPI Data Analytics Library (oneDAL), which supplies optimized implementations of common machine learning algorithms. The Intel Extension for Scikit-learn was created to give data scientists the easiest way to get better performance while using the familiar `scikit-learn` package.

Compatible conda pack: [Intel Extension for Scikit-learn 2021.3.0](https://docs.oracle.com/iaas/data-science/using/conda-sklearn-fam.htm) for CPU on Python 3.7 (version 1.0)

## Contents:

- <a href='#intro'>Check for an Intel-based Shape</a>
- <a href='#prepare'>Prepare the Data</a>
- <a href='#default'>Train a K-Means Model Using `sklearn`</a>
- <a href='#scikit-learn-intelex'>Train K-Means Model Using the `scikit-learn-intelex` Accelerator</a>
- <a href='#unpatch'>Unpatch `scikit-learn-intelex` from `sklearn`</a>
- <a href="#reference">References</a>

---


Datasets are provided as a convenience. Datasets are considered third-party content and are not considered materials under your agreement with Oracle.

---


<a id='intro'></a>
### Check for an Intel-based Shape

Ensure that this notebook is running on an instance with an Intel CPU. The next cell reads the CPU brand string and stops with an error if the shape is not Intel-based.

In [None]:
import cpuinfo

shape_name = cpuinfo.get_cpu_info()["brand_raw"]

assert "Intel" in shape_name, "Switch to a VM shape with Intel"
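The `assert` above halts the notebook on any non-Intel shape. If a softer check is preferred, the vendor test can be factored into a small helper. The following is a minimal sketch: `is_intel_cpu` is a hypothetical name, and the brand strings are illustrative examples rather than values read from a real instance.

```python
def is_intel_cpu(brand: str) -> bool:
    """Return True when a CPU brand string names an Intel processor."""
    # Case-insensitive match, since brand strings vary in capitalization.
    return "intel" in brand.lower()


# Illustrative brand strings (assumed, not read from a real instance).
examples = [
    "Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz",
    "AMD EPYC 7551 32-Core Processor",
]
for brand in examples:
    status = "Intel shape" if is_intel_cpu(brand) else "not an Intel shape"
    print(f"{brand}: {status}")
```

In the notebook, the helper would be called as `is_intel_cpu(cpuinfo.get_cpu_info()["brand_raw"])`, which lets a cell print a warning instead of raising.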

Load the necessary modules:

In [None]:
from sklearnex import patch_sklearn, unpatch_sklearn
import importlib
import logging
import numpy as np
import sklearn
import time
import warnings

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

warnings.filterwarnings("ignore")
logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.ERROR)

<a id='prepare'></a>
### Prepare the Data

The data is prepared using scikit-learn's `make_blobs()` function, which generates isotropic Gaussian blobs for clustering.

In [None]:
rows, cols = 1000, 150
X, y = make_blobs(n_samples=rows, n_features=cols, centers=8, random_state=42)

<a id='default'></a>
### Train a K-Means Model Using `sklearn`

Use `sklearn` to train a K-Means model on a dataset:

In [None]:
estimator = KMeans(n_clusters=8)
print("Module being used: " + estimator.__module__)

t0 = time.perf_counter()
trained = estimator.fit(X)
fit_elapsed = str(time.perf_counter() - t0)

print("Training took " + fit_elapsed + " seconds")

In [None]:
t0 = time.perf_counter()
preds = trained.predict([[1] * 150])
predict_elapsed = str(time.perf_counter() - t0)

print("Prediction took " + predict_elapsed + " seconds")
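A single `time.perf_counter()` measurement can be noisy, especially for a call as short as one prediction. One common remedy is to repeat the call and keep the fastest run. The sketch below assumes a hypothetical helper named `best_of` and uses `sum` as a stand-in workload so that it runs without scikit-learn.

```python
import time


def best_of(fn, *args, repeats=5, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times; return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        # Keep the fastest run, which is the least disturbed by background noise.
        best = min(best, time.perf_counter() - t0)
    return best, result


# Stand-in workload so the sketch runs without scikit-learn.
elapsed, total = best_of(sum, range(100_000), repeats=3)
print(f"Best of 3 runs: {elapsed:.6f} seconds (result={total})")
```

In this notebook the same helper could wrap the estimator calls, for example `fit_seconds, trained = best_of(estimator.fit, X)`.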

<a id='scikit-learn-intelex'></a>
### Train K-Means Model Using the `scikit-learn-intelex` Accelerator

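To use oneDAL as the underlying solver, use `scikit-learn-intelex` to dynamically patch the `sklearn` estimators. The patched estimators keep the same API and are expected to produce equivalent results, but faster. Names that were imported from `sklearn` before patching still point to the stock classes, so the `sklearn` modules must be imported again after the patching is complete.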

In [None]:
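patch_sklearn()
# Re-import after patching so that KMeans resolves to the accelerated class
from sklearn.cluster import KMeans

estimator = KMeans(n_clusters=8)

# After patching, this should indicate scikit-learn-intelex is being used
print("Module being used: " + estimator.__module__)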

In [None]:
t0 = time.perf_counter()
trained = estimator.fit(X)
fit_elapsed = str(time.perf_counter() - t0)

print("Training took " + fit_elapsed + " seconds")

In [None]:
t0 = time.perf_counter()
preds = trained.predict([[1] * 150])
predict_elapsed = str(time.perf_counter() - t0)

print("Prediction took " + predict_elapsed + " seconds")

Compare the training and prediction times reported for stock `sklearn` with those reported for `scikit-learn-intelex`. The accelerated version is expected to be faster, although the size of the gain depends on the dataset size and the compute shape.
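The comparison can be made concrete with two small helpers: one that turns a pair of timings into a speedup factor, and one that checks whether two runs found the same clusters even when the cluster IDs are numbered differently. This is a minimal sketch. The names `speedup` and `same_partition` are hypothetical, and the timings and labels below are made-up values for illustration, not measured results.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Return how many times faster the accelerated run was."""
    return baseline_seconds / accelerated_seconds


def same_partition(labels_a, labels_b) -> bool:
    """True when two label sequences describe the same clusters under a renaming."""
    if len(labels_a) != len(labels_b):
        return False
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # setdefault records the first pairing; a later mismatch breaks the one-to-one mapping.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Made-up timings in seconds, for illustration only.
print(f"Speedup: {speedup(2.0, 0.5):.1f}x")
# Same grouping with renamed cluster IDs, then a genuinely different grouping.
print(same_partition([0, 0, 1, 2], [2, 2, 0, 1]))
print(same_partition([0, 0, 1, 2], [1, 0, 0, 2]))
```

To apply the check here, the `labels_` attribute of each fitted estimator would need to be saved under separate names before `trained` is overwritten. Because `KMeans` is created without a fixed `random_state`, two runs may also converge to different solutions, so a `False` result does not by itself indicate a problem with the accelerator.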

<a id='unpatch'></a>
### Unpatch `scikit-learn-intelex` from `sklearn`

To return to stock `sklearn`, unpatch `scikit-learn-intelex`, reload `sklearn`, and import the relevant `sklearn` modules again:

In [None]:
unpatch_sklearn()
sklearn = importlib.reload(sklearn)
# Re-import the relevant modules so that names such as KMeans point to stock scikit-learn again
from sklearn.cluster import KMeans
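If an exception interrupts the notebook between patching and unpatching, the session is left in the patched state. One way to guard against that is to pair the two calls in a context manager. The sketch below is an assumption rather than part of `scikit-learn-intelex`: `patched` is a hypothetical helper that accepts any pair of callables, and stand-in lambdas are used so that it runs without the extension installed.

```python
from contextlib import contextmanager


@contextmanager
def patched(patch, unpatch):
    """Run patch() on entry and always run unpatch() on exit."""
    patch()
    try:
        yield
    finally:
        # Runs even when the body raises, so the patch is never left applied.
        unpatch()


# Stand-in callables so the sketch runs without scikit-learn-intelex installed.
events = []
with patched(lambda: events.append("patched"), lambda: events.append("unpatched")):
    events.append("fit")
print(events)
```

In this notebook the call would be `with patched(patch_sklearn, unpatch_sklearn):`, with the `from sklearn.cluster import KMeans` statement placed inside the block. Names imported inside the block stay bound to the accelerated classes afterwards, so the re-import step described above is still needed once the block exits.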

<a id="reference"></a>
# References

- [ADS Library Documentation](https://accelerated-data-science.readthedocs.io/en/latest/index.html)
- [Data Science YouTube Videos](https://www.youtube.com/playlist?list=PLKCk3OyNwIzv6CWMhvqSB_8MLJIZdO80L)
- [OCI Data Science Documentation](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm)
- [Oracle Data & AI Blog](https://blogs.oracle.com/datascience/)