-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes GBT Classifier Issue #648 #653
Conversation
@jcwchen let me know if there is any questions regarding this issue and changes requested |
Can you sign off your commit and resolve the style issues? |
…o float (onnx#637) Signed-off-by: Donald Tolley <tolleybot@gmail.com> Signed-off-by: Xavier Dupre <xadupre@microsoft.com> Co-authored-by: Donald Tolley <tolleybot@gmail.com> Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
* verify onnx 1.14.1 rc2 Signed-off-by: jcwchen <jacky82226@gmail.com> * Bump ONNX 1.14.1 Signed-off-by: jcwchen <jacky82226@gmail.com> --------- Signed-off-by: jcwchen <jacky82226@gmail.com> Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
…able saving and reading of models from Spark, a requirement for GBTClassifier tree conversion Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
yep @xadupre. Just signed off the commits, accidentally rebased it to the last 6 commits so it included others commit in there as well. Sorry about that, tried to rebase again to only have my changes. Will look at the style issues and see what i have to change as well |
Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
fix: reformat style
@xadupre style issues fixed using black |
@@ -5,6 +5,7 @@ | |||
from ..common.onnx_ex import get_maximum_opset_supported | |||
from ..common._topology import convert_topology | |||
from ._parse import parse_sparkml | |||
from . import operator_converters |
Check notice
Code scanning / CodeQL
Unused import Note
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed or else this error will fail on Spark GBT conversion: ValueError: Unsupported shape calculation for operator pyspark.ml.classification.GBTClassificationModel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. It is weird we did not catch this error until now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this was a weird one. Because when I do a pip3 install
on the remote onnxmltools, it throws the AnalysisException: Unable to infer schema for Parquet. It must be specified manually.
error that i opened up the issue #648 for. When we forked the repo to put up a fix, the unsupported shape calculation error
threw until we added in the import. We assumed that it was due to Spark capabilities being built out still or potentially it gets added in when packaged? But yeah weird error
Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
fix: Fixed formatting style to pass ruff tests
Alright all formatting changes should pass now. Ruff will however fail on "unused import operator_converter". However, if that is not used, all GBT classifiers will fail with this error: |
Closes issue #648. There is an issue within spark conversion that does not allow models to be saved during conversion if Spark is running in cluster mode.
Cluster mode is what is used widely with Spark contexts, it is highly unlikely that production environments would run jobs on a single node as that would defeat the purpose of using a distributed infrastructure for training.
The changes requires ONNX_DFS_PATH to be set in the spark config, and pulls that value to use as a root distributed file directory from which a temporary directory will be generated from. The temporary directory will be where the model is saved and read from, so that conversions to ONNX will work.
This only fixes GBT Classifier conversion. However, it is likely that other tree based models that use the function will also be able to be saved due to this fix.