
[SW-1502] Upgrade to mojo2 library v2.1.3 #1429

Merged: 5 commits merged into master on Aug 14, 2019

Conversation

jakubhava
Contributor

No description provided.

@jakubhava
Contributor Author

@mmalohlava I verified locally that 2.1.3 works as expected; it just hasn't been released to Maven yet.

ml/build.gradle Outdated
@@ -21,12 +21,11 @@ dependencies {


compile "ai.h2o:mojo2-runtime-api:${mojoPipelineVersion}"
runtime "ai.h2o:mojo2-runtime-impl:${mojoPipelineVersion}"
Collaborator

I was asked to test it. There is a breaking conflict between versions of the com.google.protobuf library: hadoop-commons (Hadoop 2.7) uses version 2.5, while the new MOJO runtime uses version 3.7. What worked for me was to declare the dependency as compile and to include and relocate com.google.protobuf within the assembly jar.
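
A minimal sketch of the relocation approach described above, assuming the Gradle Shadow plugin is used to build the assembly jar (the plugin version and the relocated package prefix are illustrative, not taken from this PR):

// build.gradle (Groovy DSL) -- hypothetical excerpt, not the project's actual configuration
plugins {
    id 'java'
    id 'com.github.johnrengelman.shadow' version '5.1.0'
}

dependencies {
    // declare the mojo2 runtime as a compile dependency so it ends up in the fat jar;
    // mojoPipelineVersion is assumed to be defined elsewhere in the project's build scripts
    compile "ai.h2o:mojo2-runtime-impl:${mojoPipelineVersion}"
}

shadowJar {
    // move the bundled protobuf 3.x classes into a private package so they cannot
    // clash with the protobuf 2.5 classes pulled in by hadoop-commons
    relocate 'com.google.protobuf', 'ai.h2o.shaded.com.google.protobuf'
}

Relocating the classes inside the assembly jar lets Hadoop keep resolving its own protobuf 2.5 while the MOJO runtime uses the renamed 3.x classes.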

@jakubhava (Contributor Author) commented on Aug 13, 2019

Can you share a reproducible example, please? It works on this build in Spark local mode with:

# assumes a pyspark shell/session where the SparkSession is already available as `spark`
from pysparkling.ml import *

path = "/Users/kuba/devel/repos/sparkling-water/ml/src/test/resources/mojo2data/pipeline.mojo"
settings = H2OMOJOSettings(namedMojoOutputColumns=False)
mojo = H2OMOJOPipelineModel.createFromMojo(path, settings)

data = spark.read.csv("/Users/kuba/devel/repos/sparkling-water/examples/smalldata/prostate/prostate.csv", header=True)
mojo.transform(data).collect()

Collaborator

Was this pipeline.mojo file produced by the latest version of DAI? I will send you the script that I tested with ;-)

@jakubhava (Contributor Author)

I see! This pipeline uses the toml config file -> I guess the error occurs for proto-based pipelines :) I will add a test for it and apply your suggestion, thanks!

@jakubhava (Contributor Author)

Note: protobuf will be shaded in the mojo library, but that won't solve the issue of exceeding the PyPI upload size limit, since we still bundle the libraries into our fat jar.

@jakubhava force-pushed the jh/sw-1502 branch 2 times, most recently from bf15285 to 4b27c97 on August 14, 2019 at 07:04
@jakubhava
Contributor Author

Thanks @mn-mikke, I merged your changes into this PR and also ported MM's test into our test suite. It is passing now.

It also helped me discover a different bug -> when the mojo expects String and we have parsed the data into more specific types, the conversion fails. So when the mojo expects String for a specific column, we now convert that column to String.

build.gradle (review thread resolved)
@mn-mikke (Collaborator) left a comment

LGTM

@mn-mikke merged commit 5a94935 into master on Aug 14, 2019
@jakubhava deleted the jh/sw-1502 branch on August 19, 2019 at 09:57