
Add ONNX support #142

Merged
merged 91 commits into master on Jun 18, 2020

Conversation

interesaaat
Collaborator

This PR tries to fix the coverage problem of #86.

- refactoring of convert
- creation of ir_converter dir containing the converters for the IRs
- add onnxml tree models to supported
- some refactoring of the other tree converters
- fix ONNXML operator mapping returning None instead of raising an exception
- fix a couple of bugs in tree_ensembles
- add check that only the onnx backend can be a target for onnx models
- some renaming
- fix setup for reordering trees in multiclass tasks
- fix bug for onnxml
- some import clean-up
- test only HB files
codecov-commenter commented Jun 16, 2020

Codecov Report

Merging #142 into master will decrease coverage by 1.21%.
The diff coverage is 90.53%.


@@            Coverage Diff             @@
##           master     #142      +/-   ##
==========================================
- Coverage   94.54%   93.33%   -1.22%     
==========================================
  Files          20       24       +4     
  Lines        1211     1545     +334     
  Branches      202      285      +83     
==========================================
+ Hits         1145     1442     +297     
- Misses         40       63      +23     
- Partials       26       40      +14     
Flag Coverage Δ
#unittests 93.33% <90.53%> (-1.22%) ⬇️
Impacted Files Coverage Δ
hummingbird/ml/convert.py 83.65% <80.00%> (-7.12%) ⬇️
hummingbird/ml/_utils.py 81.42% <84.21%> (+1.03%) ⬆️
hummingbird/ml/ir_converters/linked_node.py 90.17% <90.17%> (ø)
...bird/ml/operator_converters/onnx_tree_ensembles.py 91.81% <91.81%> (ø)
hummingbird/ml/ir_converters/topology.py 92.00% <92.00%> (ø)
hummingbird/ml/supported.py 92.18% <96.55%> (+2.44%) ⬆️
hummingbird/ml/__init__.py 100.00% <100.00%> (ø)
hummingbird/ml/ir_converters/__init__.py 100.00% <100.00%> (ø)
hummingbird/ml/operator_converters/__init__.py 100.00% <100.00%> (ø)
...ummingbird/ml/operator_converters/_gbdt_commons.py 100.00% <100.00%> (ø)
... and 9 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e56f88...6620ab0. Read the comment docs.

@ksaur ksaur left a comment

initial comments, more tomorrow ;)

@@ -49,7 +49,7 @@ jobs:
         export LDFLAGS="$LDFLAGS -Wl,-rpath,/usr/local/opt/libomp/lib -L/usr/local/opt/libomp/lib -lomp"
     - name: Install extra dependencies
       run: |
-        pip install .[extra]
+        pip install --no-cache-dir .[extra,onnx]
Collaborator

i think i determined that we don't need --no-cache-dir here. was it giving some error?

Collaborator Author

Nope, I just copied from the version we had before. Let me remove it.

not (onnx_ml_tools_installed() and onnx_installed()), reason="ONNXML test require ONNX, ORT and ONNXMLTOOLS"
)
def test_lightgbm_pytorch(self):
X = [[0, 1], [1, 1], [2, 0]]
Collaborator

Does datasize impact anything we need to test?

Collaborator Author

I think I have a mix of tests using random numbers and fixed arrays for that reason. The fixed-array tests are taken from the LightGBM converter tests.
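The mix the author describes can be sketched with stdlib tools only (the shapes and the seeded generator below are illustrative, not taken from the PR's test suite):

```python
import random

# Fixed array: reproduces exact edge cases, as in the LightGBM converter tests.
X_fixed = [[0, 1], [1, 1], [2, 0]]

# Random data: exercises a larger input size; seeded so runs are reproducible.
random.seed(0)
X_random = [[random.random() for _ in range(20)] for _ in range(100)]

print(len(X_fixed), len(X_random))  # 3 100
```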

"""
Function returning whether the input model is an ONNX model or not.
"""
return "onnx" in model.__module__ and "graph" in dir(model)
Collaborator

maybe something like

>>> type(onnx_ml_model).__name__
'ModelProto'

or

>>> isinstance(onnx_ml_model, onnx.onnx_ONNX_REL_1_6_ml_pb2.ModelProto)
True

Collaborator

i guess for the second one, the version isn't known. but we could check just the first part

Collaborator Author

Yes the version can change. Should I go with the first one?

Collaborator

yah, i think the first one is cleanest, although i don't feel strongly :)
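The two detection styles under discussion can be sketched side by side. The stub `ModelProto` class below is hypothetical, standing in for onnx's generated proto class so the predicates can be exercised without onnx installed:

```python
def is_onnx_model(model):
    # Heuristic from the PR: module name plus presence of a 'graph' attribute.
    return "onnx" in type(model).__module__ and "graph" in dir(model)

def is_onnx_model_by_name(model):
    # Reviewer's suggestion: the type name is stable across onnx releases,
    # unlike a version-specific module path such as onnx_ONNX_REL_1_6_ml_pb2.
    return type(model).__name__ == "ModelProto"

class ModelProto:  # hypothetical stand-in for onnx.ModelProto
    graph = None

ModelProto.__module__ = "onnx.stub_pb2"  # mimic onnx's generated module name

m = ModelProto()
print(is_onnx_model(m), is_onnx_model_by_name(m))  # True True
```

The name-based check avoids pinning the code to a particular onnx release, which is exactly the version concern raised above.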

@@ -61,7 +61,7 @@ jobs:
         pytest
     - name: Coverage
       run: |
-        coverage run -a -m pytest tests
+        coverage run -a -m pytest tests --ignore=tests/test_no_extra_install.py
Collaborator

and this part should no longer be needed due to the skips, although it doesn't hurt. (maybe you just did a revert on this file?)

Collaborator Author

yep

Collaborator Author

My concern was whether the coverage numbers would get skewed, since we generate coverage once with some dependencies off and then again with them on, running the same tests.

hummingbird/ml/convert.py (resolved)
hummingbird/ml/ir_converters/linked_node.py (resolved)
@ksaur ksaur left a comment

next batch....

hummingbird/ml/convert.py (resolved)
hummingbird/ml/convert.py (outdated, resolved)
extra_config: Extra configurations to be used by the individual operator converters.
The set of supported extra configurations can be found at `hummingbird.ml.supported`

Examples:
>>> pytorch_model = convert(sklearn_model, 'pytorch')
>>> pytorch_model = convert(sklearn_model, 'torch')
Collaborator

We still need some comment somewhere (not necessarily here) about how these are equivalent, and that this is a non-breaking change. I'm being picky because folks have >100 forks of HB, and I don't want to break anything or confuse anyone like last time w/pypi! (our blog, etc has this as 'pytorch' so i just want to make sure we communicate this change and that it is backward compat.)

Collaborator Author

We can add an item on this in the next release description.
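One backward-compatible way to keep both spellings working is a simple alias table; this is a hypothetical sketch (the `_BACKEND_ALIASES` table and `normalize_backend` helper are illustrative, not Hummingbird's actual API):

```python
# Hypothetical alias table so that the old 'pytorch' backend name keeps
# working after the rename to 'torch' -- a non-breaking change for callers.
_BACKEND_ALIASES = {"pytorch": "torch"}

def normalize_backend(backend):
    # Case-insensitive lookup; unknown names pass through unchanged so the
    # existing "unsupported backend" error path still fires for them.
    backend = backend.lower()
    return _BACKEND_ALIASES.get(backend, backend)

print(normalize_backend("pytorch"), normalize_backend("torch"))  # torch torch
```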

hummingbird/ml/ir_converters/linked_node.py (outdated, resolved)
hummingbird/ml/ir_converters/linked_node.py (resolved)
hummingbird/ml/operator_converters/onnx_tree_ensembles.py (outdated, resolved)
classes = post_transform = None

for attr in model.origin.attribute:
if attr.name == "nodes_falsenodeids":
Collaborator

Maybe put a comment about how these strings are from https://xadupre.github.io/skl2onnx/onnx_ops.html or something

Collaborator Author

Good point! Let me add a link.
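The strings being matched come from the ONNX-ML TreeEnsemble* operator spec. A minimal sketch of gathering them into a dict (the `Attr` namedtuple is a hypothetical stand-in for an ONNX AttributeProto with an ints payload):

```python
from collections import namedtuple

# Hypothetical stand-in for an ONNX AttributeProto carrying an ints payload.
Attr = namedtuple("Attr", ["name", "ints"])

def collect_tree_attrs(attributes):
    # Attribute names follow the ONNX-ML TreeEnsembleClassifier spec
    # (nodes_falsenodeids, nodes_truenodeids, nodes_featureids, ...).
    wanted = {"nodes_falsenodeids", "nodes_truenodeids", "nodes_featureids"}
    return {a.name: list(a.ints) for a in attributes if a.name in wanted}

attrs = [Attr("nodes_falsenodeids", [2, 0, 0]), Attr("post_transform", [])]
print(collect_tree_attrs(attrs))  # {'nodes_falsenodeids': [2, 0, 0]}
```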

hummingbird/ml/supported.py (resolved)
@ksaur ksaur self-requested a review June 17, 2020 22:51
@ksaur ksaur left a comment

LGTM, waiting to merge #145 first.

@interesaaat
Collaborator Author

If the tests pass it should be good to go. They just bumped PyTorch to 1.5.1. There is a bug in the ONNX exporter for GEMM: I thought they had fixed it, but with 1.5.1 it's still failing. I enabled tree_trav by default.

@ksaur ksaur merged commit 4cb113a into master Jun 18, 2020
@ksaur ksaur deleted the mainterl/onnxml2 branch June 18, 2020 22:12
interesaaat added a commit that referenced this pull request Jun 18, 2020
* add onnx_installed in utils
refactoring of convert
creation of ir_converter dir containing the converters for the irs
add onnxml tree models to supported

* add documentation

* add tree ensemble converters
some refactoring of the other tree converters

* Tree ensemble regressor now points to gbdt implementation

* Add check that only onnx backend can be target for onnx models

* Add check in convert for supported input model format / backend
Working on classification tasks

* Add example notebook for ONNX

* rename pytorch into torch
interesaaat added a commit that referenced this pull request Aug 18, 2020 (same commit message as above)
3 participants