
Allow users to use scikit-learn-intelex on the API inference backend #251

Open
adrinjalali opened this issue Dec 13, 2022 · 11 comments

@adrinjalali
Member

This requirement comes from the contract between Hugging Face and Intel.

We should have a way to allow users to utilize scikit-learn-intelex at inference time. This requires a few steps, among them:

  • Check if models trained with scikit-learn can be served with scikit-learn-intelex and vice versa.

Project homepage: https://intel.github.io/scikit-learn-intelex/
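
For context, the extension is enabled by patching scikit-learn before the estimators are imported. Here is a minimal sketch (the estimator is only illustrative):

# Minimal sketch of enabling scikit-learn-intelex: patch_sklearn() has to run
# before the scikit-learn imports so that the patched classes are picked up.
from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=10_000, random_state=0)
clf = LogisticRegression().fit(X, y)
# If the patch covers this estimator, the fitted object comes from a
# daal4py/sklearnex module rather than from plain sklearn.
print(type(clf).__module__)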

@BenjaminBossan
Collaborator

BenjaminBossan commented Jan 16, 2023

  • Check if models trained with scikit-learn can be served with scikit-learn-intelex and vice versa.

I started with this point and found it's possible to load sklearn (w/o intelex) models with intelex and vice versa. Here are some quick test results:

I created two scripts, make-sklearn.py and make-intelex.py, that fit and save a pure sklearn model and an sklearn+intelex model, respectively. The scripts load-sklearn.py and load-intelex.py load a model (fitted either with or without intelex) and call predict_proba on it, with the latter script patching sklearn with intelex. All four combinations seem to work (the 200000 in the calls is just the sample size):

$ python make-sklearn.py 200000
Fit time:	14.487415

$ python make-intelex.py 200000
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Fit time:	14.715771


$ python load-sklearn.py 200000 sklearn.pickle
Predict time:	2.506398

$ python load-sklearn.py 200000 intelex.pickle
Predict time:	2.360727


$ python load-intelex.py 200000 sklearn.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time:	2.501361

$ python load-intelex.py 200000 intelex.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time:	2.408000

Interestingly, I couldn't see any speed advantage of using intelex (even with a higher sample size), but that might just be due to this particular model or this particular hardware (Intel® Xeon(R) CPU E3-1231 v3 @ 3.40GHz × 8).

Scripts:
# make-sklearn.py

import pickle
import sys
import timeit

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler, PolynomialFeatures

def main(n_samples):
    X, y = make_classification(n_samples=n_samples, random_state=0)

    model = Pipeline([
        ('features', FeatureUnion([
            ('scale', StandardScaler()),
            ('poly', PolynomialFeatures()),
        ])),
        ('clf', LogisticRegression()),
    ])

    out = timeit.timeit("model.fit(X, y)", number=5, globals=locals())
    print(f"Fit time:\t{out:.6f}")

    with open('sklearn.pickle', 'wb') as f:
        pickle.dump(model, f)

if __name__ == '__main__':
    main(int(sys.argv[1]))


# make-intelex.py

import pickle
import sys
import timeit

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearnex import patch_sklearn

patch_sklearn()

def main(n_samples):
    X, y = make_classification(n_samples=n_samples, random_state=0)

    model = Pipeline([
        ('features', FeatureUnion([
            ('scale', StandardScaler()),
            ('poly', PolynomialFeatures()),
        ])),
        ('clf', LogisticRegression()),
    ])

    out = timeit.timeit("model.fit(X, y)", number=5, globals=locals())
    print(f"Fit time:\t{out:.6f}")

    with open('intelex.pickle', 'wb') as f:
        pickle.dump(model, f)

if __name__ == '__main__':
    main(int(sys.argv[1]))


# load-sklearn.py

import pickle
import sys
import timeit

from sklearn.datasets import make_classification

def main(n_samples, fname):
    X, y = make_classification(n_samples=n_samples, random_state=0)

    with open(fname, 'rb') as f:
        model = pickle.load(f)

    out = timeit.timeit("model.predict_proba(X)", number=5, globals=locals())
    print(f"Predict time:\t{out:.6f}")

if __name__ == '__main__':
    main(int(sys.argv[1]), sys.argv[2])


# load-intelex.py

import pickle
import sys
import timeit

from sklearn.datasets import make_classification
from sklearnex import patch_sklearn

patch_sklearn()

def main(n_samples, fname):
    X, y = make_classification(n_samples=n_samples, random_state=0)

    with open(fname, 'rb') as f:
        model = pickle.load(f)

    out = timeit.timeit("model.predict_proba(X)", number=5, globals=locals())
    print(f"Predict time:\t{out:.6f}")

if __name__ == '__main__':
    main(int(sys.argv[1]), sys.argv[2])

BenjaminBossan added a commit to BenjaminBossan/skops that referenced this issue Jan 17, 2023
Partially solves skops-dev#251

This is one part of the work required to solve the mentioned issue. The
other part will have to be added on the API inference side, once this PR
is finalized.

Description

Added the option use_intelex to hub_utils.init. It adds a new entry to
config.json which can later be used by the inference API to decide
whether to run it with intelex or not.

On top of that, if use_intelex=True, scikit-learn-intelex will be added
as a dependency to the requirements if not already there. Moreover, if
metadata for a model card is loaded, a scikit-learn-intelex tag will be
added.
@adrinjalali
Member Author

Yeah, the speedups only show up for a few models and in some specific cases, not always.

@BenjaminBossan
Collaborator

Yeah, the speedups only show up for a few models and in some specific cases, not always.

According to this article, logistic regression should be faster, though. I didn't study the benchmark in detail, so I'm not sure what differs here, but in the end it doesn't really matter, I guess.

adrinjalali pushed a commit that referenced this issue Jan 23, 2023
Partially solves #251

This is one part of the work required to solve the mentioned issue. The other part will have to be added on the API inference side, once this PR is finalized.

## Description

Added the option `use_intelex` to `hub_utils.init`. It adds a new entry to `config.json` which can later be used by the inference API to decide whether to run it with intelex or not.

On top of that, if `use_intelex=True`, scikit-learn-intelex will be added as a dependency to the requirements if not already there. Moreover, if metadata for a model card is loaded, a `"scikit-learn-intelex"` tag will be added.
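
As a usage illustration, here is a hypothetical sketch of initializing a repo with the new option. Apart from `use_intelex`, the keyword arguments shown (model, requirements, dst, task, data) are assumptions about the existing `hub_utils.init` signature:

import numpy as np

from skops import hub_utils

# Small sample of the model's input data, as expected by init (hypothetical shape).
X_sample = np.random.rand(5, 4)

hub_utils.init(
    model="intelex.pickle",          # model fitted under the intelex patch
    requirements=["scikit-learn"],   # scikit-learn-intelex is appended if missing
    dst="hub_repo",                  # local folder where the repo files are written
    task="tabular-classification",
    data=X_sample,
    use_intelex=True,                # stored in config.json for the inference API
)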
@adrinjalali
Member Author

During the call with Intel, one of the next steps we talked about was to have an example where using scikit-learn-intelex would help on the inference side.

@napetrov here's an example of how we write examples for our docs: https://github.com/skops-dev/skops/blob/main/examples/plot_model_card.py, and it gets rendered on this page. It'd be nice if your folks opened a PR here with an example; we're happy to review it.

@adrinjalali
Member Author

@napetrov since we're not sure how the patching works on the intelex side, here's a question:

what happens if a user trains a model with sklearn, saves it (with pickle, for instance), and then in a new process (i.e. the hub's backend) runs the patch from intelex and loads the model? Would they ever end up using intelex?
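
To make the scenario concrete, here is a minimal sketch, reusing the sklearn.pickle produced by the scripts above:

# Process 1: train and pickle with stock scikit-learn (see make-sklearn.py above).

# Process 2, e.g. the hub's backend: patch first, then unpickle and predict.
import pickle

from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.datasets import make_classification

with open("sklearn.pickle", "rb") as f:
    model = pickle.load(f)

X, _ = make_classification(n_samples=1000, random_state=0)
# The question: does this call dispatch to the intelex implementation, or does
# it keep using the stock class the model was pickled with?
proba = model.predict_proba(X)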

@napetrov

Interestingly, I couldn't see any speed advantage of using intelex (even with a higher sample size), but that might just be due to this particular model or this particular hardware (Intel® Xeon(R) CPU E3-1231 v3 @ 3.40GHz × 8).

patch_sklearn() function should be called prior to sklearn exports to make it work. So no acceleration observed because stock scikit-learn is used.

@BenjaminBossan
Collaborator

patch_sklearn() function should be called prior to sklearn exports to make it work

Ah, you mean prior to sklearn imports, right? Thanks, good catch. I changed the scripts to call patch_sklearn() first, like so:

import pickle
import sys
import timeit

from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
...

However, the results were still pretty much the same as before:

$ python make-sklearn.py 200000
Fit time:	14.736895

$ python make-intelex.py 200000
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Fit time:	15.247209


$ python load-sklearn.py 200000 sklearn.pickle
Predict time:	2.538356

$ python load-sklearn.py 200000 intelex.pickle
Predict time:	2.392418


$ python load-intelex.py 200000 sklearn.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time:	2.561795

$ python load-intelex.py 200000 intelex.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time:	2.453521

@napetrov So I assume that the patch-first requirement is only relevant during training, right? When loading a model that was trained with the patch, the import order would not matter?

Also, is there some way we can verify that a loaded estimator correctly uses intelex under the hood?

@napetrov

Sorry for the delayed response. We have not been looking at models that much, so I wanted to see what actually happens.

First, to see if the extension has been used, you can enable verbose mode by setting the environment variable SKLEARNEX_VERBOSE=INFO.
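
As a sketch of what that could look like in one of the load scripts above (this assumes that setting the variable inside the process, before sklearnex is imported, is equivalent to exporting SKLEARNEX_VERBOSE=INFO in the shell):

import os
os.environ["SKLEARNEX_VERBOSE"] = "INFO"  # must be set before sklearnex is used

from sklearnex import patch_sklearn
patch_sklearn()

import pickle

from sklearn.datasets import make_classification

with open("intelex.pickle", "rb") as f:
    model = pickle.load(f)

X, _ = make_classification(n_samples=1000, random_state=0)
# If the accelerated code path is taken, lines like
# "SKLEARNEX INFO: ... running accelerated version on CPU" are printed.
model.predict_proba(X)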

@BenjaminBossan - yes, currently this would only impact training, as inference is defined by the model class. It is interesting that the results do not change between stock and intelex; I need to look at the script.

In terms of what would happen for stock models - the answer is currently nothing, as the models have different types:

>>> type(stock)
<class 'sklearn.decomposition._pca.PCA'>
>>> type(intel)
<class 'daal4py.sklearn.decomposition._pca.PCA'>

So even when intelex is enabled on top of a stock model, nothing happens, while the intelex model runs the accelerated version:

>>> res = stock.transform(X)
>>> res = intel.transform(X)
SKLEARNEX INFO: sklearn.decomposition.PCA.transform: running accelerated version on CPU
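
Based on the class modules above, a small check on the loading side could tell the two apart - a sketch, not an existing API; the module prefixes are just what the example above shows:

from sklearn.decomposition import PCA

def uses_intelex(estimator):
    # Estimators fitted under the intelex patch come from daal4py/sklearnex
    # modules (e.g. daal4py.sklearn.decomposition._pca.PCA above); stock
    # estimators come from plain sklearn modules.
    module = type(estimator).__module__
    return module.startswith(("daal4py", "sklearnex"))

print(uses_intelex(PCA()))  # False for a stock estimator

For a Pipeline, one would have to inspect the individual steps, since the Pipeline object itself always comes from sklearn.pipeline regardless of the patch.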

But for simple models such as PCA, linear models, and several others, there is not much difference between the two besides the object class and the scikit-learn version. So it should be possible for us to pick up the stock version as well - I'm not sure yet, however, how this should look from the user's perspective.

So for now, usage would be limited to inference with intelex models only, but we can extend this.

Models

   >>> s
b'\x80\x04\x95\x94\x03\x00\x00\x00\x00\x00\x00\x8c\x1asklearn.decomposition._pca\x94\x8c\x03PCA\x94\x93\x94)\x81\x94}\x94(\x8c\x0cn_components\x94K\x03\x8c\x04copy\x94\x88\x8c\x06whiten\x94\x89\x8c\nsvd_solver\x94\x8c\x04auto\x94\x8c\x03tol\x94G\x00\x00\x00\x00\x00\x00\x00\x00\x8c\x0eiterated_power\x94h\t\x8c\rn_oversamples\x94K\n\x8c\x1apower_iteration_normalizer\x94h\t\x8c\x0crandom_state\x94N\x8c\x0en_features_in_\x94K\x04\x8c\x0f_fit_svd_solver\x94\x8c\x04full\x94\x8c\x05mean_\x94\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x04\x85\x94h\x16\x8c\x05dtype\x94\x93\x94\x8c\x02f8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x89C a,\xf9\xc5\x92_\x17@D\x19\xbd-ku\x08@\xb0\xf1\xd2Mb\x10\x0e@\x9a\xb5:&x0\xf3?\x94t\x94b\x8c\x0fnoise_variance_\x94h\x13\x8c\x06scalar\x94\x93\x94h"C\x08\xfe\xb7E\x03:h\x98?\x94\x86\x94R\x94\x8c\nn_samples_\x94K\x96\x8c\x0bn_features_\x94K\x04\x8c\x0bcomponents_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03K\x04\x86\x94h"\x89C`}\x96;:\xf5 \xd7?\x88\xda\xaeyD\xa3\xb5\xbf*\xf8\x7fy\xd8i\xeb?\x17\xaa\x11\xd05\xee\xd6?F!st\xc6\x02\xe5?\xd3wf\x83{]\xe7? D]N\x131\xc6\xbf\x90\xa4\x03`\xb9R\xb3\xbf*\xf8\x14\x11\xfd\x9f\xe2\xbf\xd2\x87\xa6\xe4\x15"\xe3?t)m\x1c5\x84\xb3?X\xf6\xb4zsw\xe1?\x94t\x94b\x8c\rn_components_\x94K\x03\x8c\x13explained_variance_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\r\x03\x9c1\xb8\xe9\x10@\xc0P\x06\xc7\xd5\x0f\xcf?\xc6\xbc\xeb\xac\x89\x05\xb4?\x94t\x94b\x8c\x19explained_variance_ratio_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18Tv-\x01z\x96\xed?)\xca\x00\xb3\x87+\xab?\xde\x894\xb7X\x83\x91?\x94t\x94b\x8c\x10singular_values_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\xc8\x12\xee\x01\x97\x199@\x11-\xe4\x81v\r\x18@-\x8c\x82\xcb7O\x0b@\x94t\x94b\x8c\x10_sklearn_version\x94\x8c\x051.1.0\x94ub.'
>>> i
b'\x80\x04\x95\x81\x03\x00\x00\x00\x00\x00\x00\x8c"daal4py.sklearn.decomposition._pca\x94\x8c\x03PCA\x94\x93\x94)\x81\x94}\x94(\x8c\x0cn_components\x94K\x03\x8c\x04copy\x94\x88\x8c\x06whiten\x94\x89\x8c\nsvd_solver\x94\x8c\x04auto\x94\x8c\x03tol\x94G\x00\x00\x00\x00\x00\x00\x00\x00\x8c\x0eiterated_power\x94h\t\x8c\rn_oversamples\x94K\n\x8c\x1apower_iteration_normalizer\x94h\t\x8c\x0crandom_state\x94N\x8c\x0en_features_in_\x94K\x04\x8c\x0f_fit_svd_solver\x94\x8c\x04full\x94\x8c\x05mean_\x94\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x04\x85\x94h\x16\x8c\x05dtype\x94\x93\x94\x8c\x02f8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x89C b,\xf9\xc5\x92_\x17@E\x19\xbd-ku\x08@\xad\xf1\xd2Mb\x10\x0e@\x98\xb5:&x0\xf3?\x94t\x94b\x8c\x0fnoise_variance_\x94h\x13\x8c\x06scalar\x94\x93\x94h"C\x08\xc7\xb7E\x03:h\x98?\x94\x86\x94R\x94\x8c\nn_samples_\x94K\x96\x8c\x0bn_features_\x94K\x04\x8c\x0bcomponents_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03K\x04\x86\x94h"\x89C`^\x96;:\xf5 \xd7?\x94\xdb\xaeyD\xa3\xb5\xbf+\xf8\x7fy\xd8i\xeb?%\xaa\x11\xd05\xee\xd6?\x96!st\xc6\x02\xe5?\x89wf\x83{]\xe7?vD]N\x131\xc6\xbfG\xa3\x03`\xb9R\xb3\xbf\xdb\xf8\x14\x11\xfd\x9f\xe2\xbf\xed\x88\xa6\xe4\x15"\xe3?\x074m\x1c5\x84\xb3?7\xf4\xb4zsw\xe1?\x94t\x94b\x8c\rn_components_\x94K\x03\x8c\x13explained_variance_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\xf8\x02\x9c1\xb8\xe9\x10@\xa8N\x06\xc7\xd5\x0f\xcf?k\xbf\xeb\xac\x89\x05\xb4?\x94t\x94b\x8c\x19explained_variance_ratio_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18[v-\x01z\x96\xed?|\xc8\x00\xb3\x87+\xab?H\x8c4\xb7X\x83\x91?\x94t\x94b\x8c\x10singular_values_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\xb9\x12\xee\x01\x97\x199@A,\xe4\x81v\r\x18@\xfb\x8d\x82\xcb7O\x0b@\x94t\x94bub.'


@BenjaminBossan
Collaborator

@napetrov thank you for clarifying

First, to see if the extension has been used, you can enable verbose mode by setting the environment variable SKLEARNEX_VERBOSE=INFO.

@adrinjalali Do we want to enable this in the sklearn inference docker images by default? It should not hurt when intelex is not being used, and if it is, it may help us with debugging in the future.

So for now, usage would be limited to inference with intelex models only, but we can extend this.

Okay, got it. For users, it means that if they want to benefit from intelex, they need to train their models with it. That's not a big ask, so I think it's good as is.

@napetrov

Okay, got it. For users, it means that if they want to benefit from intelex, they need to train their models with it. That's not a big ask, so I think it's good as is.

Yes, but I think we can make things better, especially if the models are mostly identical. I'll be looking into this.

@adrinjalali
Member Author

@adrinjalali Do we want to enable this in the sklearn inference docker images by default? It should not hurt when intelex is not being used, and if it is, it may help us with debugging in the future.

For debugging purposes that makes sense @BenjaminBossan, but I don't think users care enough, or that we have the tools to show users the information. We also shouldn't warn users about this, since the outputs they're getting are correct either way. So I'd say we're good as things are on the backend side.
