[SW-1502] Upgrade to mojo2 library v2.1.3 #1429
Conversation
@mmalohlava locally verified that 2.1.3 works as expected; it is just not released on Maven yet.
ml/build.gradle (outdated)

```diff
@@ -21,12 +21,11 @@ dependencies {
     compile "ai.h2o:mojo2-runtime-api:${mojoPipelineVersion}"
     runtime "ai.h2o:mojo2-runtime-impl:${mojoPipelineVersion}"
```
I was asked to test it. There is a breaking conflict between versions of the `com.google.protobuf` library: hadoop-commons (Hadoop 2.7) uses version 2.5, while the new MOJO uses version 3.7. It worked for me to make the dependency `compile` and to include and relocate `com.google.protobuf` within the assembly jar.
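A minimal sketch of that relocation, assuming the Gradle Shadow plugin is used to build the assembly jar (the plugin version and the shaded package name are illustrative, not taken from this PR):

```groovy
// build.gradle sketch -- illustrative only, not the actual Sparkling Water build
plugins {
    id 'com.github.johnrengelman.shadow' version '5.2.0'
}

dependencies {
    // make the MOJO runtime a compile dependency so it is bundled into the fat jar
    compile "ai.h2o:mojo2-runtime-impl:${mojoPipelineVersion}"
}

shadowJar {
    // move the bundled protobuf 3.x out of the way of Hadoop's protobuf 2.5
    relocate 'com.google.protobuf', 'ai.h2o.shaded.com.google.protobuf'
}
```

Relocation rewrites the protobuf 3.x classes (and all references to them inside the jar) into a private package, so Hadoop's protobuf 2.5 and the MOJO's protobuf 3.7 can coexist on the same classpath.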
Can you share reproducible code, please? It works on this build in Spark local mode with:
```python
from pysparkling.ml import *

path = "/Users/kuba/devel/repos/sparkling-water/ml/src/test/resources/mojo2data/pipeline.mojo"
settings = H2OMOJOSettings(namedMojoOutputColumns=False)
mojo = H2OMOJOPipelineModel.createFromMojo(path, settings)
data = spark.read.csv("/Users/kuba/devel/repos/sparkling-water/examples/smalldata/prostate/prostate.csv", header=True)
mojo.transform(data).collect()
```
Is this `pipeline.mojo` file produced by the latest version of DAI? I will send you the script that I tested with ;-)
I see! This pipeline is using the TOML config file -> I guess this error occurs only for proto-based pipelines :) Will add a test for it and apply your suggestion, thanks!
Note: protobuf will be shadowed in the mojo library, but that won't solve the issue of exceeding the PyPI upload size limit, as we still bundle the libraries into our fat jar.
Force-pushed from bf15285 to 4b27c97 (compare)
Thanks @mn-mikke, I merged your changes into this PR and also ported MM's test into our tests. Passing now. It also helped me discover a different bug: when the MOJO expects String and we have parsed the data into more specific types, the conversion fails. So when the MOJO expects a string on a specific column, we now convert it to String.
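The string-conversion fix described above can be sketched in plain Python like this (function and parameter names are hypothetical, not the actual Sparkling Water implementation):

```python
# Hypothetical sketch: before handing rows to the MOJO, cast any column whose
# expected MOJO type is "string" to str, so values already parsed into more
# specific types (int, float, ...) do not fail the conversion.

def coerce_to_expected_types(rows, expected_types):
    """Return rows with values cast to str wherever the MOJO schema expects a string."""
    coerced = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            if expected_types.get(col) == "string" and value is not None:
                new_row[col] = str(value)  # e.g. 65 -> "65"
            else:
                new_row[col] = value
        coerced.append(new_row)
    return coerced
```

In the PySpark setting the same idea would be a `cast` to `StringType` on the affected columns before calling `transform`.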
- Add test - Relocation of com.google.protobuf for MOJO2
LGTM