Add ONNX support #142
Conversation
- Refactoring of convert; creation of ir_converter dir containing the converters for the IRs; add onnxml tree models to supported
- Some refactoring of the other tree converters
- Fix ONNXML operator mapping returning None instead of raising an exception; some renaming; fix a couple of bugs in tree_ensembles
- Add ONNX to converter API
- Add check that only the ONNX backend can be a target for ONNX models; some renaming
- Working on classification tasks
- Fix setup for reordering trees in multiclass tasks; fix bug for onnxml
- Some import clean up
- Test only HB files
Codecov Report
@@ Coverage Diff @@
## master #142 +/- ##
==========================================
- Coverage 94.54% 93.33% -1.22%
==========================================
Files 20 24 +4
Lines 1211 1545 +334
Branches 202 285 +83
==========================================
+ Hits 1145 1442 +297
- Misses 40 63 +23
- Partials 26 40 +14
Continue to review full report at Codecov.
initial comments, more tomorrow ;)
.github/workflows/pythonapp.yml
Outdated
@@ -49,7 +49,7 @@ jobs:
        export LDFLAGS="$LDFLAGS -Wl,-rpath,/usr/local/opt/libomp/lib -L/usr/local/opt/libomp/lib -lomp"
      - name: Install extra dependencies
        run: |
-         pip install .[extra]
+         pip install --no-cache-dir .[extra,onnx]
i think i determined that we don't need --no-cache-dir
here. was it giving some error?
Nope, I just copied from the version we had before. Let me remove it.
    not (onnx_ml_tools_installed() and onnx_installed()), reason="ONNXML tests require ONNX, ORT and ONNXMLTOOLS"
)
def test_lightgbm_pytorch(self):
    X = [[0, 1], [1, 1], [2, 0]]
Does data size impact anything we need to test?
I think I have a mix of tests using random numbers and fixed arrays for that reason. The fixed-array tests are taken from the LightGBM converter tests.
hummingbird/ml/convert.py
Outdated
"""
Function returning whether the input model is an ONNX model or not.
"""
return "onnx" in model.__module__ and "graph" in dir(model)
maybe something like
>>> type(onnx_ml_model).__name__
'ModelProto'
or
>>> isinstance(onnx_ml_model, onnx.onnx_ONNX_REL_1_6_ml_pb2.ModelProto)
True
i guess for the second one, the version isn't known. but we could check just the first part
Yes the version can change. Should I go with the first one?
yah, i think the first one is cleanest, although i don't feel strongly :)
.github/workflows/pythonapp.yml
Outdated
@@ -61,7 +61,7 @@ jobs:
        pytest
      - name: Coverage
        run: |
-         coverage run -a -m pytest tests
+         coverage run -a -m pytest tests --ignore=tests/test_no_extra_install.py
and this part should no longer be needed due to the skips, although it doesn't hurt. (maybe you just did a revert on this file?)
yep
My doubt was whether coverage will get messed up, since we generate coverage once with some dependencies off and again with them on, while running the same tests.
next batch....
extra_config: Extra configurations to be used by the individual operator converters.
    The set of supported extra configurations can be found at `hummingbird.ml.supported`

Examples:
    >>> pytorch_model = convert(sklearn_model, "pytorch")
    >>> pytorch_model = convert(sklearn_model, "torch")
We still need some comment somewhere (not necessarily here) about how these are equivalent, and that this is a non-breaking change. I'm being picky because folks have >100 forks of HB, and I don't want to break anything or confuse anyone like last time w/pypi! (our blog, etc has this as 'pytorch' so i just want to make sure we communicate this change and that it is backward compat.)
We can add an item on this in the next release description.
classes = post_transform = None

for attr in model.origin.attribute:
    if attr.name == "nodes_falsenodeids":
Maybe put a comment about how these strings are from https://xadupre.github.io/skl2onnx/onnx_ops.html or something
Good point! Let me add a link.
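For readers unfamiliar with the ONNX-ML spec, attribute names like `nodes_falsenodeids` come from the `TreeEnsembleClassifier` operator definition: the exporter flattens every tree into parallel integer/float lists stored as protobuf attributes. A minimal sketch of such a parsing loop, using a hypothetical `Attr` namedtuple as a stand-in for the real `AttributeProto` objects (so it runs without `onnx` installed):

```python
from collections import namedtuple

# Stand-in for an ONNX AttributeProto; the real objects come from
# model.origin.attribute and expose .name plus typed value fields.
Attr = namedtuple("Attr", ["name", "ints"])

# Attribute names taken from the ONNX-ML TreeEnsembleClassifier spec.
_WANTED = ("nodes_falsenodeids", "nodes_truenodeids", "classlabels_int64s")

def parse_tree_attributes(attributes):
    """Collect the TreeEnsembleClassifier attributes this sketch cares about."""
    out = {}
    for attr in attributes:
        if attr.name in _WANTED:
            out[attr.name] = list(attr.ints)
    return out

attrs = [
    Attr("nodes_falsenodeids", [2, 0, 0]),
    Attr("nodes_truenodeids", [1, 0, 0]),
    Attr("classlabels_int64s", [0, 1]),
    Attr("post_transform", []),  # present in real models, ignored by this sketch
]
print(parse_tree_attributes(attrs))
```

The real converter handles many more attributes (thresholds, modes, weights), but the pattern is the same: match on the spec-defined attribute name, then copy the typed payload.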
LGTM, waiting to merge #145 first.
If the tests pass it should be good to go. They just bumped PyTorch to 1.5.1. There is a bug in the ONNX exporter for GEMM: I thought they fixed it, but with 1.5.1 it's still failing. I enabled
* add onnx_installed in utils; refactoring of convert; creation of ir_converter dir containing the converters for the IRs; add onnxml tree models to supported
* add documentation
* add tree ensemble converters; some refactoring of the other tree converters
* Tree ensemble regressor now points to gbdt implementation
* Add check that only onnx backend can be target for onnx models
* Add check in convert for supported input model format \ backend; working on classification tasks
* Add example notebook for ONNX
* rename pytorch into torch
This PR tries to fix the coverage problem of #86.