Add example of creating a static transformer for transfer learning using joblib or onnx deserialisation #13288

Open
jnothman opened this issue Feb 26, 2019 · 6 comments

@jnothman
Member

Often we want to transfer knowledge learnt from large-scale unlabelled data (e.g. via decomposition or dictionary learning) to a smaller-scale supervised learning problem. We want to include such a pre-trained transformation in a Pipeline within a grid search.

This is a key use case for a freezing API (#8370), but a simpler solution that is possible today is to serialise the pre-trained model and implement an estimator that loads it inside a pipeline.

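import joblib
from sklearn.base import BaseEstimator, TransformerMixin

# BaseEstimator supplies get_params/set_params, which grid search needs for cloning.
class StaticTransformer(TransformerMixin, BaseEstimator):
    def __init__(self, pickle_path):
        self.pickle_path = pickle_path  # path to the serialised, pre-trained model
    def fit(self, X, y=None):
        # Nothing is learnt from X; fit only loads the pre-trained model.
        self.model_ = joblib.load(self.pickle_path)
        return self
    def transform(self, X):
        return self.model_.transform(X)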

We would like to see an example of this paradigm of transfer learning, perhaps using ONNX for serialisation, added to our example gallery.
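
A minimal sketch of what such an example might look like, assuming joblib for serialisation and PCA as the pre-trained transformation. The synthetic data, the file name and the parameter grid are illustrative assumptions, not part of the proposal.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class StaticTransformer(TransformerMixin, BaseEstimator):
    """Loads a pre-trained transformer from disk instead of fitting one."""

    def __init__(self, pickle_path):
        self.pickle_path = pickle_path

    def fit(self, X, y=None):
        # Nothing is learnt from X; the serialised model is loaded as-is.
        self.model_ = joblib.load(self.pickle_path)
        return self

    def transform(self, X):
        return self.model_.transform(X)


rng = np.random.RandomState(0)

# Stand-in for large-scale unlabelled data (synthetic, for illustration only).
X_unlabelled = rng.normal(size=(1000, 20))
pretrained = PCA(n_components=5, random_state=0).fit(X_unlabelled)

# Serialise the pre-trained model to a temporary location (hypothetical file name).
pca_path = os.path.join(tempfile.mkdtemp(), "pretrained_pca.joblib")
joblib.dump(pretrained, pca_path)

# Smaller labelled problem that reuses the pre-trained transformation.
X_small = rng.normal(size=(60, 20))
y_small = (X_small[:, 0] > 0).astype(int)

pipe = Pipeline([
    ("pretrained", StaticTransformer(pca_path)),
    ("clf", LogisticRegression()),
])

# Only the classifier is tuned; the transformer is reloaded unchanged in every fold.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X_small, y_small)
print(search.best_params_)
```

Because `fit` only reloads the file, every cross-validation fold sees the same transformation, which is the behaviour a freezing API would otherwise provide.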

Ping @xadupre

@xadupre

xadupre commented Feb 26, 2019

This is still unfinished work, but here is what I ended up with: https://github.com/xadupre/scikit-onnxruntime/blob/master/skonnxrt/sklapi/onnx_transformer.py. The backend is onnxruntime, but that's something you could make optional. Here is a very basic example using it for transfer learning with deep learning: https://xadupre.github.io/scikit-onnxruntime/auto_examples/plot_transfer_learning.html#sphx-glr-auto-examples-plot-transfer-learning-py.

@jnothman
Member Author

jnothman commented Feb 27, 2019 via email

@sdpython
Contributor

What if I modify my code to use this model in a pipeline and run a cross-validation where one of the parameters is the ONNX model? Say we want to choose between two models for transfer learning and cross-validate that choice. Would that be an interesting scenario?
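
The scenario can be sketched roughly as follows, with the path to the serialised model treated as a grid-search parameter. This sketch assumes joblib files rather than ONNX models to stay dependency-free; the two candidate models (PCA and TruncatedSVD), the synthetic data and the file names are illustrative assumptions.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class StaticTransformer(TransformerMixin, BaseEstimator):
    """Loads a pre-trained transformer from the path given as a parameter."""

    def __init__(self, pickle_path):
        self.pickle_path = pickle_path

    def fit(self, X, y=None):
        self.model_ = joblib.load(self.pickle_path)
        return self

    def transform(self, X):
        return self.model_.transform(X)


rng = np.random.RandomState(0)
X_unlabelled = rng.normal(size=(500, 20))  # synthetic stand-in data

# Two candidate pre-trained models, each serialised to its own file.
tmp_dir = tempfile.mkdtemp()
pca_path = os.path.join(tmp_dir, "pca.joblib")
svd_path = os.path.join(tmp_dir, "svd.joblib")
joblib.dump(PCA(n_components=5, random_state=0).fit(X_unlabelled), pca_path)
joblib.dump(TruncatedSVD(n_components=5, random_state=0).fit(X_unlabelled), svd_path)

X_small = rng.normal(size=(60, 20))
y_small = (X_small[:, 0] > 0).astype(int)

pipe = Pipeline([
    ("pretrained", StaticTransformer(pca_path)),
    ("clf", LogisticRegression()),
])

# The choice of pre-trained model is cross-validated like any other parameter.
param_grid = {"pretrained__pickle_path": [pca_path, svd_path]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_small, y_small)
print(search.best_params_["pretrained__pickle_path"])
```

The same grid would presumably work with an ONNX-backed transformer, provided the model path is exposed as a constructor parameter.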

@jnothman
Member Author

jnothman commented Feb 27, 2019 via email

@adrinjalali
Member

I remember we decided against a FrozenEstimator meta-estimator, which could use either joblib or ONNX as the backend in sklearn, but I don't remember why. I guess it makes sense to have it and let the user choose the ONNX backend, which would have a soft dependency on the ONNX libraries, and otherwise use joblib.
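
One way the backend choice could look is sketched below. `FrozenTransformer` and its `backend` parameter are hypothetical names, not an existing sklearn API. The ONNX branch assumes a model with a single float32 input whose first output is the transformed data, and only the joblib branch is exercised here.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler


class FrozenTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical sketch: fit only loads a serialised model from `path`."""

    def __init__(self, path, backend="joblib"):
        self.path = path
        self.backend = backend

    def fit(self, X, y=None):
        if self.backend == "joblib":
            self.model_ = joblib.load(self.path)
        elif self.backend == "onnx":
            # Soft dependency: imported only when this backend is requested.
            import onnxruntime

            self.session_ = onnxruntime.InferenceSession(
                self.path, providers=["CPUExecutionProvider"]
            )
            # Assumption: the model takes a single input tensor.
            self.input_name_ = self.session_.get_inputs()[0].name
        else:
            raise ValueError(f"Unknown backend: {self.backend!r}")
        return self

    def transform(self, X):
        if self.backend == "joblib":
            return self.model_.transform(X)
        # Assumption: float32 input, first output holds the transformed data.
        X32 = np.asarray(X, dtype=np.float32)
        return self.session_.run(None, {self.input_name_: X32})[0]


# Usage with the joblib backend: the scaler keeps the statistics it was
# trained with, even though fit is later called on different data.
X = np.arange(12, dtype=float).reshape(6, 2)
path = os.path.join(tempfile.mkdtemp(), "scaler.joblib")
joblib.dump(StandardScaler().fit(X), path)

frozen = FrozenTransformer(path).fit(X[:2])
out = frozen.transform(X)
```

Importing onnxruntime inside `fit` keeps the dependency optional, so users who stay on joblib never need it installed.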

@adrinjalali
Member

Playing around and borrowing a bunch from @xadupre's code, I put a draft here: https://gist.github.com/adrinjalali/de9ac56c61f3931b38b24e577f54d083
