
Allow users to use scikit-learn-intelex on the API inference backend #251

Open
adrinjalali opened this issue Dec 13, 2022 · 11 comments

@adrinjalali
Member

This requirement comes from the contract between Hugging Face and Intel.

We should have a way to allow users to utilize scikit-learn-intelex at inference time. This requires a few steps, among them:

  • Check if models trained with scikit-learn can be served with scikit-learn-intelex and vice versa.

Project homepage: https://intel.github.io/scikit-learn-intelex/
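
For context, the extension is enabled by patching scikit-learn before the estimators are imported. Here is a minimal sketch (the estimator is only illustrative):

# Minimal sketch of enabling scikit-learn-intelex: patch_sklearn() has to run
# before the scikit-learn imports so that the patched classes are picked up.
from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=10_000, random_state=0)
clf = LogisticRegression().fit(X, y)
# If the patch covers this estimator, the fitted object comes from a
# daal4py/sklearnex module rather than from plain sklearn.
print(type(clf).__module__)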

@BenjaminBossan
Collaborator

BenjaminBossan commented Jan 16, 2023

  • Check if models trained with scikit-learn can be served with scikit-learn-intelex and vice versa.

I started with this point and found it's possible to load sklearn (w/o intelex) models with intelex and vice versa. Here are some quick test results:

I created two scripts, make-sklearn.py and make-intelex.py, that fit and save a pure sklearn model and an sklearn+intelex model, respectively. The scripts load-sklearn.py and load-intelex.py load a model (fitted either with or without intelex) and call predict_proba on it, with the latter script patching sklearn with intelex. All four combinations seem to work (the 200000 in the calls is just the sample size):

$ python make-sklearn.py 200000
Fit time:	14.487415

$ python make-intelex.py 200000
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Fit time:	14.715771


$ python load-sklearn.py 200000 sklearn.pickle
Predict time:	2.506398

$ python load-sklearn.py 200000 intelex.pickle
Predict time:	2.360727


$ python load-intelex.py 200000 sklearn.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time:	2.501361

$ python load-intelex.py 200000 intelex.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time:	2.408000

Interestingly, I couldn't see any speed advantage of using intelex (even with a higher sample size), but that might just be due to this particular model or this particular hardware (Intel® Xeon(R) CPU E3-1231 v3 @ 3.40GHz × 8).

Scripts:
# make-sklearn.py

import pickle
import sys
import timeit

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler, PolynomialFeatures

def main(n_samples):
    X, y = make_classification(n_samples=n_samples, random_state=0)

    model = Pipeline([
        ('features', FeatureUnion([
            ('scale', StandardScaler()),
            ('poly', PolynomialFeatures()),
        ])),
        ('clf', LogisticRegression()),
    ])

    out = timeit.timeit("model.fit(X, y)", number=5, globals=locals())
    print(f"Fit time:\t{out:.6f}")

    with open('sklearn.pickle', 'wb') as f:
        pickle.dump(model, f)

if __name__ == '__main__':
    main(int(sys.argv[1]))


# make-intelex.py

import pickle
import sys
import timeit

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearnex import patch_sklearn

patch_sklearn()

def main(n_samples):
    X, y = make_classification(n_samples=n_samples, random_state=0)

    model = Pipeline([
        ('features', FeatureUnion([
            ('scale', StandardScaler()),
            ('poly', PolynomialFeatures()),
        ])),
        ('clf', LogisticRegression()),
    ])

    out = timeit.timeit("model.fit(X, y)", number=5, globals=locals())
    print(f"Fit time:\t{out:.6f}")

    with open('intelex.pickle', 'wb') as f:
        pickle.dump(model, f)

if __name__ == '__main__':
    main(int(sys.argv[1]))


# load-sklearn.py

import pickle
import sys
import timeit

from sklearn.datasets import make_classification

def main(n_samples, fname):
    X, y = make_classification(n_samples=n_samples, random_state=0)

    with open(fname, 'rb') as f:
        model = pickle.load(f)

    out = timeit.timeit("model.predict_proba(X)", number=5, globals=locals())
    print(f"Predict time:\t{out:.6f}")

if __name__ == '__main__':
    main(int(sys.argv[1]), sys.argv[2])


# load-intelex.py

import pickle
import sys
import timeit

from sklearn.datasets import make_classification
from sklearnex import patch_sklearn

patch_sklearn()

def main(n_samples, fname):
    X, y = make_classification(n_samples=n_samples, random_state=0)

    with open(fname, 'rb') as f:
        model = pickle.load(f)

    out = timeit.timeit("model.predict_proba(X)", number=5, globals=locals())
    print(f"Predict time:\t{out:.6f}")

if __name__ == '__main__':
    main(int(sys.argv[1]), sys.argv[2])

BenjaminBossan added a commit to BenjaminBossan/skops that referenced this issue Jan 17, 2023
Partially solves skops-dev#251

This is one part of the work required to solve the mentioned issue. The
other part will have to be added on the API inference side, once this PR
is finalized.

Description

Added the option use_intelex to hub_utils.init. It adds a new entry to
config.json which can later be used by the inference API to decide
whether to run it with intelex or not.

On top of that, if use_intelex=True, scikit-learn-intelex will be added
as a dependency to the requirements if not already there. Moreover, if
metadata for a model card is loaded, a scikit-learn-intelex tag will be
added.
@adrinjalali
Member Author

Yeah, the speedups only show up for a few models and in some specific cases, not always.

@BenjaminBossan
Collaborator

Yeah, the speedups only show up for a few models and in some specific cases, not always.

According to this article, logistic regression should be faster, though. I didn't study the benchmark in detail, so I'm not sure what differs here, but in the end it doesn't really matter, I guess.

adrinjalali pushed a commit that referenced this issue Jan 23, 2023
Partially solves #251

This is one part of the work required to solve the mentioned issue. The other part will have to be added on the API inference side, once this PR is finalized.

## Description

Added the option `use_intelex` to `hub_utils.init`. It adds a new entry to `config.json` which can later be used by the inference API to decide whether to run it with intelex or not.

On top of that, if `use_intelex=True`, scikit-learn-intelex will be added as a dependency to the requirements if not already there. Moreover, if metadata for a model card is loaded, a `"scikit-learn-intelex"` tag will be added.
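
As a usage illustration, here is a hypothetical sketch of initializing a repo with the new option. Apart from `use_intelex`, the keyword arguments shown (model, requirements, dst, task, data) are assumptions about the existing `hub_utils.init` signature:

import numpy as np

from skops import hub_utils

# Small sample of the model's input data, as expected by init (hypothetical shape).
X_sample = np.random.rand(5, 4)

hub_utils.init(
    model="intelex.pickle",          # model fitted under the intelex patch
    requirements=["scikit-learn"],   # scikit-learn-intelex is appended if missing
    dst="hub_repo",                  # local folder where the repo files are written
    task="tabular-classification",
    data=X_sample,
    use_intelex=True,                # stored in config.json for the inference API
)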
@adrinjalali
Member Author

During the call with Intel, one of the next steps we talked about was to have an example where using scikit-learn-intelex would help on the inference side.

@napetrov here's an example of how we write examples for our docs: https://github.com/skops-dev/skops/blob/main/examples/plot_model_card.py, and it gets rendered on this page. It'd be nice if your folks opened a PR here with an example; we're happy to review it.

@adrinjalali
Member Author

@napetrov since we're not sure how the patching works on the intelex side, here's a question:

what happens if a user trains a model with sklearn, saves it (with pickle, for instance), and then in a new process (i.e. the hub's backend) runs the patch from intelex and loads the model? Would they ever end up using intelex?
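
To make the scenario concrete, here is a minimal sketch, reusing the sklearn.pickle produced by the scripts above:

# Process 1: train and pickle with stock scikit-learn (see make-sklearn.py above).

# Process 2, e.g. the hub's backend: patch first, then unpickle and predict.
import pickle

from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.datasets import make_classification

with open("sklearn.pickle", "rb") as f:
    model = pickle.load(f)

X, _ = make_classification(n_samples=1000, random_state=0)
# The question: does this call dispatch to the intelex implementation, or does
# it keep using the stock class the model was pickled with?
proba = model.predict_proba(X)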

@napetrov

Interestingly, I couldn't see any speed advantage of using intelex (even with a higher sample size), but that might just be due to this particular model or this particular hardware (Intel® Xeon(R) CPU E3-1231 v3 @ 3.40GHz × 8).

patch_sklearn() function should be called prior to sklearn exports to make it work. So no acceleration observed because stock scikit-learn is used.

@BenjaminBossan
Collaborator

patch_sklearn() function should be called prior to sklearn exports to make it work

Ah, you mean prior to sklearn imports, right? Thanks, good catch. I changed the scripts to call patch_sklearn() first, like so:

import pickle
import sys
import timeit

from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
...

However, the results were still pretty much the same as before:

$ python make-sklearn.py 200000
Fit time:	14.736895

$ python make-intelex.py 200000
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Fit time:	15.247209


$ python load-sklearn.py 200000 sklearn.pickle
Predict time:	2.538356

$ python load-sklearn.py 200000 intelex.pickle
Predict time:	2.392418


$ python load-intelex.py 200000 sklearn.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time:	2.561795

$ python load-intelex.py 200000 intelex.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time:	2.453521

@napetrov So I assume that the patch-first requirement is only relevant during training, right? When loading a model that was trained with the patch, the import order would not matter?

Also, is there some way we can verify that a loaded estimator correctly uses intelex under the hood?

@napetrov

Sorry for the delayed response. We have not been looking at models that much, so I wanted to see what actually happens.

First, to see if the extension has been used, you can enable verbose mode by setting the environment variable SKLEARNEX_VERBOSE=INFO.
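
As a sketch of what that could look like in one of the load scripts above (this assumes that setting the variable inside the process, before sklearnex is imported, is equivalent to exporting SKLEARNEX_VERBOSE=INFO in the shell):

import os
os.environ["SKLEARNEX_VERBOSE"] = "INFO"  # must be set before sklearnex is used

from sklearnex import patch_sklearn
patch_sklearn()

import pickle

from sklearn.datasets import make_classification

with open("intelex.pickle", "rb") as f:
    model = pickle.load(f)

X, _ = make_classification(n_samples=1000, random_state=0)
# If the accelerated code path is taken, lines like
# "SKLEARNEX INFO: ... running accelerated version on CPU" are printed.
model.predict_proba(X)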

@BenjaminBossan - yes, currently this would only impact training, as inference is defined by the model class. It is interesting that the results do not change between stock and intelex; I need to look at the script.

In terms of what would happen for stock models - the answer is currently nothing, as the models have different types:

>>> type(stock)
<class 'sklearn.decomposition._pca.PCA'>
>>> type(intel)
<class 'daal4py.sklearn.decomposition._pca.PCA'>

So even when intelex is enabled on top of a stock model, nothing happens, while the intelex model runs the accelerated version:

>>> res = stock.transform(X)
>>> res = intel.transform(X)
SKLEARNEX INFO: sklearn.decomposition.PCA.transform: running accelerated version on CPU
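
Based on the class modules above, a small check on the loading side could tell the two apart - a sketch, not an existing API; the module prefixes are just what the example above shows:

from sklearn.decomposition import PCA

def uses_intelex(estimator):
    # Estimators fitted under the intelex patch come from daal4py/sklearnex
    # modules (e.g. daal4py.sklearn.decomposition._pca.PCA above); stock
    # estimators come from plain sklearn modules.
    module = type(estimator).__module__
    return module.startswith(("daal4py", "sklearnex"))

print(uses_intelex(PCA()))  # False for a stock estimator

For a Pipeline, one would have to inspect the individual steps, since the Pipeline object itself always comes from sklearn.pipeline regardless of the patch.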

But for simple models such as PCA, linear models, and several others, there is not much difference between the two besides the object class and the scikit-learn version. So it should be possible for us to pick up the stock version as well - I'm not sure yet, however, how this should look from the user's perspective.

So for now, usage would be limited to inference with intelex models only, but we can extend this.

Models

   >>> s
b'\x80\x04\x95\x94\x03\x00\x00\x00\x00\x00\x00\x8c\x1asklearn.decomposition._pca\x94\x8c\x03PCA\x94\x93\x94)\x81\x94}\x94(\x8c\x0cn_components\x94K\x03\x8c\x04copy\x94\x88\x8c\x06whiten\x94\x89\x8c\nsvd_solver\x94\x8c\x04auto\x94\x8c\x03tol\x94G\x00\x00\x00\x00\x00\x00\x00\x00\x8c\x0eiterated_power\x94h\t\x8c\rn_oversamples\x94K\n\x8c\x1apower_iteration_normalizer\x94h\t\x8c\x0crandom_state\x94N\x8c\x0en_features_in_\x94K\x04\x8c\x0f_fit_svd_solver\x94\x8c\x04full\x94\x8c\x05mean_\x94\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x04\x85\x94h\x16\x8c\x05dtype\x94\x93\x94\x8c\x02f8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x89C a,\xf9\xc5\x92_\x17@D\x19\xbd-ku\x08@\xb0\xf1\xd2Mb\x10\x0e@\x9a\xb5:&x0\xf3?\x94t\x94b\x8c\x0fnoise_variance_\x94h\x13\x8c\x06scalar\x94\x93\x94h"C\x08\xfe\xb7E\x03:h\x98?\x94\x86\x94R\x94\x8c\nn_samples_\x94K\x96\x8c\x0bn_features_\x94K\x04\x8c\x0bcomponents_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03K\x04\x86\x94h"\x89C`}\x96;:\xf5 \xd7?\x88\xda\xaeyD\xa3\xb5\xbf*\xf8\x7fy\xd8i\xeb?\x17\xaa\x11\xd05\xee\xd6?F!st\xc6\x02\xe5?\xd3wf\x83{]\xe7? D]N\x131\xc6\xbf\x90\xa4\x03`\xb9R\xb3\xbf*\xf8\x14\x11\xfd\x9f\xe2\xbf\xd2\x87\xa6\xe4\x15"\xe3?t)m\x1c5\x84\xb3?X\xf6\xb4zsw\xe1?\x94t\x94b\x8c\rn_components_\x94K\x03\x8c\x13explained_variance_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\r\x03\x9c1\xb8\xe9\x10@\xc0P\x06\xc7\xd5\x0f\xcf?\xc6\xbc\xeb\xac\x89\x05\xb4?\x94t\x94b\x8c\x19explained_variance_ratio_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18Tv-\x01z\x96\xed?)\xca\x00\xb3\x87+\xab?\xde\x894\xb7X\x83\x91?\x94t\x94b\x8c\x10singular_values_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\xc8\x12\xee\x01\x97\x199@\x11-\xe4\x81v\r\x18@-\x8c\x82\xcb7O\x0b@\x94t\x94b\x8c\x10_sklearn_version\x94\x8c\x051.1.0\x94ub.'
>>> i
b'\x80\x04\x95\x81\x03\x00\x00\x00\x00\x00\x00\x8c"daal4py.sklearn.decomposition._pca\x94\x8c\x03PCA\x94\x93\x94)\x81\x94}\x94(\x8c\x0cn_components\x94K\x03\x8c\x04copy\x94\x88\x8c\x06whiten\x94\x89\x8c\nsvd_solver\x94\x8c\x04auto\x94\x8c\x03tol\x94G\x00\x00\x00\x00\x00\x00\x00\x00\x8c\x0eiterated_power\x94h\t\x8c\rn_oversamples\x94K\n\x8c\x1apower_iteration_normalizer\x94h\t\x8c\x0crandom_state\x94N\x8c\x0en_features_in_\x94K\x04\x8c\x0f_fit_svd_solver\x94\x8c\x04full\x94\x8c\x05mean_\x94\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x04\x85\x94h\x16\x8c\x05dtype\x94\x93\x94\x8c\x02f8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x89C b,\xf9\xc5\x92_\x17@E\x19\xbd-ku\x08@\xad\xf1\xd2Mb\x10\x0e@\x98\xb5:&x0\xf3?\x94t\x94b\x8c\x0fnoise_variance_\x94h\x13\x8c\x06scalar\x94\x93\x94h"C\x08\xc7\xb7E\x03:h\x98?\x94\x86\x94R\x94\x8c\nn_samples_\x94K\x96\x8c\x0bn_features_\x94K\x04\x8c\x0bcomponents_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03K\x04\x86\x94h"\x89C`^\x96;:\xf5 \xd7?\x94\xdb\xaeyD\xa3\xb5\xbf+\xf8\x7fy\xd8i\xeb?%\xaa\x11\xd05\xee\xd6?\x96!st\xc6\x02\xe5?\x89wf\x83{]\xe7?vD]N\x131\xc6\xbfG\xa3\x03`\xb9R\xb3\xbf\xdb\xf8\x14\x11\xfd\x9f\xe2\xbf\xed\x88\xa6\xe4\x15"\xe3?\x074m\x1c5\x84\xb3?7\xf4\xb4zsw\xe1?\x94t\x94b\x8c\rn_components_\x94K\x03\x8c\x13explained_variance_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\xf8\x02\x9c1\xb8\xe9\x10@\xa8N\x06\xc7\xd5\x0f\xcf?k\xbf\xeb\xac\x89\x05\xb4?\x94t\x94b\x8c\x19explained_variance_ratio_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18[v-\x01z\x96\xed?|\xc8\x00\xb3\x87+\xab?H\x8c4\xb7X\x83\x91?\x94t\x94b\x8c\x10singular_values_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\xb9\x12\xee\x01\x97\x199@A,\xe4\x81v\r\x18@\xfb\x8d\x82\xcb7O\x0b@\x94t\x94bub.'


@BenjaminBossan
Collaborator

@napetrov thank you for clarifying

First, to see if the extension has been used, you can enable verbose mode by setting the environment variable SKLEARNEX_VERBOSE=INFO.

@adrinjalali Do we want to enable this in the sklearn inference docker images by default? It should not hurt when intelex is not being used, and if it is, it may help us with debugging in the future.

So for now, usage would be limited to inference with intelex models only, but we can extend this.

Okay, got it. For users, it means that if they want to benefit from intelex, they need to train their models with it. That's not a big ask, so I think it's good as is.

@napetrov

Okay, got it. For users, it means that if they want to benefit from intelex, they need to train their models with it. That's not a big ask, so I think it's good as is.

Yes, but I think we can make things better, especially if the models are mostly identical. I'll be looking into this.

@adrinjalali
Member Author

@adrinjalali Do we want to enable this in the sklearn inference docker images by default? It should not hurt when intelex is not being used, and if it is, it may help us with debugging in the future.

For debugging purposes that makes sense @BenjaminBossan, but I don't think users care enough, or that we have the tools to show users the information. We also shouldn't warn users about this, since the outputs they're getting are correct either way. So I'd say we're good as things are on the backend side.
