Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes GBT Classifier Issue #648 #653

Merged
merged 11 commits into from
Oct 2, 2023
Merged

Fixes GBT Classifier Issue #648 #653

merged 11 commits into from
Oct 2, 2023

Conversation

yungcero
Copy link
Contributor

Closes issue #648. There is an issue within spark conversion that does not allow models to be saved during conversion if Spark is running in cluster mode.

Cluster mode is what is used widely with Spark contexts, it is highly unlikely that production environments would run jobs on a single node as that would defeat the purpose of using a distributed infrastructure for training.

The changes requires ONNX_DFS_PATH to be set in the spark config, and pulls that value to use as a root distributed file directory from which a temporary directory will be generated from. The temporary directory will be where the model is saved and read from, so that conversions to ONNX will work.

This only fixes GBT Classifier conversion. However, it is likely that other tree based models that use the function will also be able to be saved due to this fix.

@yungcero
Copy link
Contributor Author

@jcwchen let me know if there is any questions regarding this issue and changes requested

@jcwchen
Copy link
Member

jcwchen commented Sep 21, 2023

cc maintainers @xiaowuhu, @xadupre for review. Thanks!

@xadupre
Copy link
Collaborator

xadupre commented Sep 21, 2023

Can you sign off your commit and resolve the style issues?

xadupre and others added 3 commits September 21, 2023 13:08
…o float (onnx#637)

Signed-off-by: Donald Tolley <tolleybot@gmail.com>
Signed-off-by: Xavier Dupre <xadupre@microsoft.com>
Co-authored-by: Donald Tolley <tolleybot@gmail.com>
Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
Signed-off-by: Xavier Dupre <xadupre@microsoft.com>
Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
* verify onnx 1.14.1 rc2

Signed-off-by: jcwchen <jacky82226@gmail.com>

* Bump ONNX 1.14.1

Signed-off-by: jcwchen <jacky82226@gmail.com>

---------

Signed-off-by: jcwchen <jacky82226@gmail.com>
Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
…able saving and reading of models from Spark, a requirement for GBTClassifier tree conversion

Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
@yungcero
Copy link
Contributor Author

yep @xadupre. Just signed off the commits, accidentally rebased it to the last 6 commits so it included others commit in there as well. Sorry about that, tried to rebase again to only have my changes. Will look at the style issues and see what i have to change as well

yungcero and others added 3 commits September 21, 2023 13:38
Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
@yungcero
Copy link
Contributor Author

@xadupre style issues fixed using black

@yungcero
Copy link
Contributor Author

hey! just wanted to check in. @xadupre @xiaowuhu any other changes or questions about these changes?

@@ -5,6 +5,7 @@
from ..common.onnx_ex import get_maximum_opset_supported
from ..common._topology import convert_topology
from ._parse import parse_sparkml
from . import operator_converters

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'operator_converters' is not used.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed or else this error will fail on Spark GBT conversion: ValueError: Unsupported shape calculation for operator pyspark.ml.classification.GBTClassificationModel

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. It is weird we did not catch this error until now.

Copy link
Contributor Author

@yungcero yungcero Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this was a weird one. Because when I do a pip3 install on the remote onnxmltools, it throws the AnalysisException: Unable to infer schema for Parquet. It must be specified manually. error that i opened up the issue #648 for. When we forked the repo to put up a fix, the unsupported shape calculation error threw until we added in the import. We assumed that it was due to Spark capabilities being built out still or potentially it gets added in when packaged? But yeah weird error

yungcero and others added 2 commits September 26, 2023 11:36
Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
fix: Fixed formatting style to pass ruff tests
@yungcero
Copy link
Contributor Author

Alright all formatting changes should pass now. Ruff will however fail on "unused import operator_converter". However, if that is not used, all GBT classifiers will fail with this error: ValueError: Unsupported shape calculation for operator pyspark.ml.classification.GBTClassificationModel

@xadupre xadupre merged commit 606b41e into onnx:main Oct 2, 2023
13 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants