SparkML GBTClassifier fails to convert to ONNX #648
Comments
Found some bugs while walking through the source code, in how the Spark model conversion is handled. I will work on a fix and push up an MR for review.
Going to clean up the Spark code on our forked repo, then I will open a pull request to merge it back into the main onnxmltools repo.
xadupre added a commit that referenced this issue on Oct 2, 2023
* Check if base_score is available and it is a string type convert it to float (#637)
  Signed-off-by: Donald Tolley <tolleybot@gmail.com>
  Signed-off-by: Xavier Dupre <xadupre@microsoft.com>
  Co-authored-by: Donald Tolley <tolleybot@gmail.com>
  Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
* signed (#639)
  Signed-off-by: Xavier Dupre <xadupre@microsoft.com>
  Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
* Bump ONNX 1.14.1 in CI pipelines (#644)
  * verify onnx 1.14.1 rc2
    Signed-off-by: jcwchen <jacky82226@gmail.com>
  * Bump ONNX 1.14.1
    Signed-off-by: jcwchen <jacky82226@gmail.com>
  ---------
  Signed-off-by: jcwchen <jacky82226@gmail.com>
  Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
* fix (dev): Working start to address issue #648. This will help enable saving and reading of models from Spark, a requirement for GBTClassifier tree conversion
  Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
* feat: Allow conversions of SparkML models to ONNX using cluster mode
  Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
* fix: fix bug that did not fully create temp paths
  Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
* fix: reformat style
  Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
* fix: Fixed formatting style to pass ruff tests
  Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
---------
Signed-off-by: Donald Tolley <tolleybot@gmail.com>
Signed-off-by: Xavier Dupre <xadupre@microsoft.com>
Signed-off-by: James Cao <james.cao@ironwoodcyber.com>
Signed-off-by: jcwchen <jacky82226@gmail.com>
Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
Co-authored-by: Donald Tolley <tolleybot@gmail.com>
Co-authored-by: Chun-Wei Chen <jacky82226@gmail.com>
I'll close this since the PR fixing it was merged.
Awesome. Thank you and sounds good! @xadupre
Hi all - I am running into an issue where I am unable to convert a Spark ML GBTClassifier to ONNX. I am not sure whether this is an initialization issue or a feature that has not been added yet, so I would like some insight into a solution, or an update so that I can help contribute if this is not supported yet. The code I am using to replicate the error is taken from the onnxmltools/test folder for the Spark ML GBT classifier. I have also reproduced the error with other models I trained, where the label column is of double type (0.0/1.0) and the feature column is created using VectorAssembler.
lib versions:
onnxmltools: 1.11.2
onnxconverter-common: 1.13.0
pyspark: 3.3.2
The conversion fails in the convert_sparkml method, specifically in this module https://github.com/onnx/onnxmltools/blob/main/onnxmltools/convert/sparkml/operator_converters/tree_ensemble_common.py on line 65, when it tries to read the saved model back as Parquet. This is the error message:
AnalysisException: Unable to infer schema for Parquet. It must be specified manually.
Any help is appreciated. Thanks in advance!
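One way to narrow this down, as a minimal sketch rather than the converter's actual logic: as far as I know, Spark usually raises this AnalysisException when the path it is asked to read as Parquet contains no data files, for example because the directory is empty or is not the location the model was really written to. The helper below only inspects a locally visible directory. The function name `parquet_part_files` and the `data` subdirectory are assumptions made for illustration, not something taken from onnxmltools.

```python
from pathlib import Path
from typing import List
import tempfile


def parquet_part_files(model_dir: str, subdir: str = "data") -> List[str]:
    """List non-empty Parquet part files under model_dir/subdir.

    Assumption: the saved model is on a locally visible filesystem and
    its tree data lives in a `data` subdirectory; adjust `subdir` if not.
    An empty result suggests Spark would have no file to infer a schema from.
    """
    data_dir = Path(model_dir) / subdir
    if not data_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in data_dir.glob("*.parquet")
        if p.is_file() and p.stat().st_size > 0
    )


if __name__ == "__main__":
    # Simulate a saved-model directory that holds only a _SUCCESS marker.
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "data"
        data_dir.mkdir()
        (data_dir / "_SUCCESS").touch()
        print(parquet_part_files(tmp))  # no part files yet

        # Add a stand-in part file (placeholder bytes, not real Parquet).
        (data_dir / "part-00000.snappy.parquet").write_bytes(b"PAR1")
        print(parquet_part_files(tmp))
```

If the list comes back empty for the temporary path the converter used, the save step probably wrote somewhere else (for example to a distributed filesystem when running in cluster mode) or did not write at all, which would be consistent with the error above.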