[SW-1502] Upgrade to mojo2 library v2.1.3 #1429

Merged: 5 commits, Aug 14, 2019
1 change: 1 addition & 0 deletions assembly/build.gradle
@@ -34,6 +34,7 @@ shadowJar {

relocate 'javassist', 'ai.h2o.javassist'
relocate 'com.google.common', 'ai.h2o.com.google.common'
relocate 'com.google.protobuf', 'ai.h2o.com.google.protobuf'
relocate 'org.eclipse.jetty', 'ai.h2o.org.eclipse.jetty'
relocate 'org.eclipse.jetty.orbit', 'ai.h2o.org.eclipse.jetty.orbit'

12 changes: 1 addition & 11 deletions build.gradle
@@ -178,17 +178,7 @@ configure(subprojects) { project ->
maven {
url "http://h2o-release.s3.amazonaws.com/h2o/master/$h2oBuild/maven/repo/"
}

// We add the local nexus repository because of mojo2. It is defined as the last one so all the artifacts are
// resolved from public repositories first
maven {
url "$localNexusLocation/releases/"
credentials {
username project.findProperty("localNexusUsername") ?: "<NA>"
password project.findProperty("localNexusPassword") ?: "<NA>"
}
}


if (sparkVersion.endsWith("-SNAPSHOT")) {
maven {
url "https://repository.apache.org/content/repositories/snapshots/"
37 changes: 23 additions & 14 deletions doc/src/site/sphinx/deployment/scoring_mojo_pipeline.rst
@@ -1,41 +1,43 @@
Using the MOJO Scoring Pipeline with Spark/Sparkling Water
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MOJO scoring pipeline artifacts can be used in Spark to deploy predictions in parallel using the Sparkling Water API. This section shows how to load and run predictions on the MOJO scoring pipeline in Spark using Scala and the Python API.
MOJO scoring pipeline artifacts can be used in Spark to carry out predictions in parallel
using the Sparkling Water API. This section shows how to load and run predictions on the
MOJO scoring pipeline in Spark using Scala and the Python API.

In the event that you upgrade H2O Driverless AI, we have good news! Sparkling Water is backwards compatible with MOJO versions produced by older Driverless AI versions.
**Note**: Sparkling Water is backwards compatible with MOJO versions produced by different Driverless AI versions.

Requirements
''''''''''''

- You must have a Spark cluster with the Sparkling Water JAR file passed to Spark.
- To run with PySparkling, you must have the PySparkling zip file.

The H2OContext does not have to be created if you only want to run predictions on MOJOs using Spark. This is because they are written to be independent of the H2O run-time.
The H2OContext does not have to be created if you only want to run predictions on MOJOs using Spark.
This is because the scoring is independent of the H2O run-time.

Preparing Your Environment
''''''''''''''''''''''''''
In order to use the MOJO scoring pipeline, the Driverless AI license has to be passed to Spark.
This can be achieved via the ``--jars`` argument of the Spark launcher scripts.

Both PySparkling and Sparkling Water need to be started with some extra configurations in order to enable the MOJO scoring pipeline. Examples are provided below. Specifically, you must pass the path of the H2O Driverless AI license to the Spark ``--jars`` argument. Additionally, you need to add to the same ``--jars`` configuration the path to the MOJO scoring pipeline implementation JAR file ``mojo2-runtime.jar``. This file is proprietary and is not part of the resulting Sparkling Water assembly JAR file.

**Note**: In local Spark mode, please use ``--driver-class-path`` to specify the path to the license file and the MOJO Pipeline JAR file.
**Note**: In local Spark mode, please use ``--driver-class-path`` to specify the path to the license file.

PySparkling
'''''''''''

First, start PySpark with all the required dependencies. The following command passes the license file and the MOJO scoring pipeline implementation library to the
``--jars`` argument and also specifies the path to the PySparkling Python library.
First, start PySpark with the PySparkling Python package and the Driverless AI license.

.. code:: bash

./bin/pyspark --jars license.sig,mojo2-runtime.jar --py-files pysparkling.zip
./bin/pyspark --jars license.sig --py-files pysparkling.zip

Alternatively, you can download the official Sparkling Water distribution from the `H2O Download page <https://www.h2o.ai/download/>`__. Please follow the steps on the
Sparkling Water download page. Once you are in the Sparkling Water directory, you can call:

.. code:: bash

./bin/pysparkling --jars license.sig,mojo2-runtime.jar
./bin/pysparkling --jars license.sig


At this point, you should have a PySpark interactive terminal available where you can try out predictions. If you would like to productionalize the scoring process, you can use the same configuration, except instead of using ``./bin/pyspark``, you would use ``./bin/spark-submit`` to submit your job to a cluster.
@@ -72,14 +74,21 @@ At this point, you should have available a PySpark interactive terminal where yo
Sparkling Water
'''''''''''''''

First, start Spark with all the required dependencies. The following command passes the license file and the MOJO scoring pipeline implementation library
``mojo2-runtime.jar`` to the ``--jars`` argument and also specifies the path to the Sparkling Water assembly JAR.
First, start Spark with the Sparkling Water Scala assembly and the Driverless AI license.

.. code:: bash

./bin/spark-shell --jars license.sig,sparkling-water-assembly.jar

Alternatively, you can download the official Sparkling Water distribution from the `H2O Download page <https://www.h2o.ai/download/>`__. Please follow the steps on the
Sparkling Water download page. Once you are in the Sparkling Water directory, you can call:

.. code:: bash

./bin/spark-shell --jars license.sig,mojo2-runtime.jar,sparkling-water-assembly.jar
./bin/sparkling-shell --jars license.sig


At this point, you should have a Sparkling Water interactive terminal available where you can try out predictions. If you would like to productionalize the scoring process, you can use the same configuration, except instead of using ``./bin/spark-shell``, you would use ``./bin/spark-submit`` to submit your job to a cluster.
At this point, you should have a Sparkling Water interactive terminal available where you can carry out predictions. If you would like to productionalize the scoring process, you can use the same configuration, except instead of using ``./bin/spark-shell``, you would use ``./bin/spark-submit`` to submit your job to a cluster.

.. code:: scala

6 changes: 2 additions & 4 deletions gradle.properties
@@ -8,12 +8,10 @@ h2oMajorName=yau
h2oBuild=2
# Scala version for Sparkling Water. By default, for 2.11 we use 2.11.12.
scalaBaseVersion=2.11
# Internal Nexus location
localNexusLocation=http://nexus:8081/nexus/repository
# Version of Mojo Pipeline library
mojoPipelineVersion=0.13.16
mojoPipelineVersion=2.1.3
# Defines whether to run tests with Driverless AI mojo pipelines
# These tests require license and runtime jar artifact
# These tests require the Driverless AI license
testMojoPipeline=false
# Disable execution of the gradle daemon
org.gradle.daemon=false
1 change: 0 additions & 1 deletion gradle/shadowRuntime.gradle
@@ -18,7 +18,6 @@ configurations {
exclude group: 'commons-io', module: 'commons-io' // a dependency of org.apache.spark:spark-core_2.11
exclude group: 'commons-logging', module: 'commons-logging' // a dependency of org.apache.hadoop:hadoop-auth
exclude group: 'log4j', module: 'log4j' // a dependency of org.apache.hadoop:hadoop-auth
exclude group: 'com.google.protobuf' // a dependency of org.apache.hadoop:hadoop-common
exclude group: 'com.fasterxml.jackson.core', module: 'jackson-core'
// a dependency of org.apache.spark:spark-sql_2.11
exclude group: 'org.apache.httpcomponents' // a dependency of org.apache.hadoop:hadoop-auth
5 changes: 2 additions & 3 deletions jenkins/Jenkinsfile-release
@@ -274,12 +274,11 @@ def updateRelVersionStage() {
def build() {
return { params ->
stage('Build') {
withCredentials([usernamePassword(credentialsId: "LOCAL_NEXUS", usernameVariable: 'LOCAL_NEXUS_USERNAME', passwordVariable: 'LOCAL_NEXUS_PASSWORD'),
usernamePassword(credentialsId: "SIGNING_KEY", usernameVariable: 'SIGN_KEY', passwordVariable: 'SIGN_PASSWORD'),
withCredentials([usernamePassword(credentialsId: "SIGNING_KEY", usernameVariable: 'SIGN_KEY', passwordVariable: 'SIGN_PASSWORD'),
file(credentialsId: 'release-secret-key-ring-file', variable: 'RING_FILE_PATH')]) {
sh """
activate_java_8
./gradlew dist -Pspark=${params.spark} -Pversion=${getVersion(params)} -PlocalNexusUsername=$LOCAL_NEXUS_USERNAME -PlocalNexusPassword=$LOCAL_NEXUS_PASSWORD -PdoRelease -Psigning.keyId=${SIGN_KEY} -Psigning.secretKeyRingFile=${RING_FILE_PATH} -Psigning.password=
./gradlew dist -Pspark=${params.spark} -Pversion=${getVersion(params)} -PdoRelease -Psigning.keyId=${SIGN_KEY} -Psigning.secretKeyRingFile=${RING_FILE_PATH} -Psigning.password=
"""
}
}
10 changes: 3 additions & 7 deletions jenkins/sparklingWaterPipeline.groovy
@@ -244,13 +244,9 @@ def buildAndLint() {
return { config ->
stage('QA: Build and Lint - ' + config.backendMode) {
try {
withCredentials([usernamePassword(credentialsId: "LOCAL_NEXUS", usernameVariable: 'LOCAL_NEXUS_USERNAME', passwordVariable: 'LOCAL_NEXUS_PASSWORD')]) {
sh """
${config.gradleCmd} clean build -x check scalaStyle -PlocalNexusUsername=$LOCAL_NEXUS_USERNAME -PlocalNexusPassword=$LOCAL_NEXUS_PASSWORD
"""
if (config.runIntegTests.toBoolean()) {
stash "sw-build-${config.sparkMajorVersion}"
}
sh "${config.gradleCmd} clean build -x check scalaStyle"
if (config.runIntegTests.toBoolean()) {
stash "sw-build-${config.sparkMajorVersion}"
}
} finally {
arch 'assembly/build/reports/dependency-license/**/*'
9 changes: 4 additions & 5 deletions ml/build.gradle
@@ -21,12 +21,11 @@ dependencies {


compile "ai.h2o:mojo2-runtime-api:${mojoPipelineVersion}"
compile "ai.h2o:mojo2-runtime-impl:${mojoPipelineVersion}"

testCompile "ai.h2o:mojo2-runtime-api:${mojoPipelineVersion}"

if (project.property("testMojoPipeline") == "true") {
testRuntime "ai.h2o:mojo2-runtime-impl:${mojoPipelineVersion}"
}

testCompile "ai.h2o:mojo2-runtime-impl:${mojoPipelineVersion}"

// And use scalatest for Scala testing
testCompile "org.scalatest:scalatest_${scalaBaseVersion}:2.2.1"
testCompile "junit:junit:4.11"
Binary file not shown.
20 changes: 2 additions & 18 deletions py/build.gradle
@@ -45,28 +45,17 @@ python {
}

configurations {
mojopipeline
s3deps
sparklingWaterAssemblyJar
}

dependencies {
sparklingWaterAssemblyJar project(path: ':sparkling-water-assembly', configuration: 'shadow')

if (project.property("testMojoPipeline") == "true") {
mojopipeline "ai.h2o:mojo2-runtime-impl:${mojoPipelineVersion}"
}


s3deps "com.amazonaws:aws-java-sdk:1.7.4"
s3deps "org.apache.hadoop:hadoop-aws:2.7.3"
}

task resolveTestRuntimeDependencies {
doLast {
project.configurations.mojopipeline.resolve()
}
}

//
// Create a file with version for Python dist task
//
@@ -286,11 +275,7 @@ def createUnitTestArgs() {
"${distPython.archiveFile.get()}",
"spark.ext.h2o.backend.cluster.mode=${detectBackendClusterMode()}"
]

if (project.property("testMojoPipeline") == "true") {
args.add("spark.jars=${configurations.mojopipeline.resolve().join(",")}")
}


gradle.taskGraph.whenReady {
if (gradle.taskGraph.hasTask(":${project.name}:hadoopSmokeTests")) {
args.add("spark.jars=${configurations.s3deps.resolve().join(",")}")
@@ -393,4 +378,3 @@ distPython.dependsOn createVersionFile
build.dependsOn distPython
test.dependsOn testPython
integTest.dependsOn integTestPython

19 changes: 19 additions & 0 deletions py/tests/tests_unit_mojo_pipeline.py
@@ -111,6 +111,25 @@ def test_mojo_dai_pipeline_serialize(self):
assert preds[3][0] == 65.78772654671035
assert preds[4][0] == 66.11327967814829

def testMojoPipelineProtoBackendWithoutError(self):
mojo = H2OMOJOPipelineModel.createFromMojo(
"file://" + os.path.abspath("../ml/src/test/resources/proto_based_pipeline.mojo"))

data = [(2.0,'male',0.41670000553131104,111361,6.449999809265137,'A19'),
(1.0,'female',0.33329999446868896,110413,6.4375,'A14'),
(1.0,'female',0.16670000553131104,111320,6.237500190734863,'A21'),
(1.0,'female',2.0,111361,6.237500190734863,'A20'),
(3.0,'female',1.0,110152,6.75,'A14'),
(1.0,'male',0.666700005531311,110489,6.85830020904541,'A10'),
(3.0,'male',0.33329999446868896,111320,0.0,'A11'),
(3.0,'male',2.0,110413,6.85830020904541,'A24'),
(1.0,'female',1.0,110489,3.170799970626831,'A21'),
(1.0,'female',0.33329999446868896,111240,0.0,'A14')
]
rdd = self._spark.sparkContext.parallelize(data)
df = self._spark.createDataFrame(rdd, ['pclass', 'sex', 'age', 'ticket', 'fare', 'cabin'])
prediction = mojo.transform(df)
prediction.collect()

if __name__ == '__main__':
generic_test_utils.run_tests([H2OMojoPipelineTest], file_name="py_unit_tests_mojo_pipeline_report")
1 change: 1 addition & 0 deletions scoring/build.gradle
@@ -8,6 +8,7 @@ dependencies {
compileOnly project(":sparkling-water-macros")

compile "ai.h2o:mojo2-runtime-api:${mojoPipelineVersion}"
compile "ai.h2o:mojo2-runtime-impl:${mojoPipelineVersion}"
compile "ai.h2o:h2o-genmodel:${h2oVersion}"
compile project(":sparkling-water-utils")
}
@@ -54,6 +54,9 @@ class H2OMOJOPipelineModel(override val uid: String) extends H2OMOJOModelBase[H2
1
} else if (colType != Type.Bool && colType.isnumeric && colData.toString.toLowerCase() == "false") {
0
} else if (colType.isAssignableFrom(classOf[String]) && !colData.isInstanceOf[String]) {
// MOJO expects String, but we have DataFrame with different column type, cast to String
colData.toString
} else {
colData
}
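The coercion logic this hunk extends can be sketched in plain Python. The predicate arguments here are illustrative, not the mojo2 API: boolean-like text headed for a numeric column becomes 0/1, and the new branch stringifies non-String values headed for a String column.

```python
def coerce_cell(value, is_numeric, is_bool, expects_string):
    """Illustrative mirror of the MOJO input coercion in the diff above."""
    text = str(value).lower()
    if is_numeric and not is_bool and text == "true":
        return 1          # boolean-like text fed to a numeric column
    if is_numeric and not is_bool and text == "false":
        return 0
    if expects_string and not isinstance(value, str):
        return str(value)  # MOJO expects String; cast other types
    return value           # pass through unchanged


# e.g. an integer ticket number fed to a String column:
print(coerce_cell(111361, False, False, True))  # prints 111361
```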
@@ -105,7 +108,13 @@

builder.addRow(rowBuilder)
val output = mojoPipeline.transform(builder.toMojoFrame)
val predictions = output.getColumnNames.zipWithIndex.map { case (_, i) =>
val predictedVal = output.getColumnData(i).asInstanceOf[Array[Double]]
val columnOutput = output.getColumnData(i)
val predictedVal = columnOutput match {
case floats: Array[Float] =>
floats.map(_.asInstanceOf[Double])
case _ =>
columnOutput.asInstanceOf[Array[Double]]
}
if (predictedVal.length != 1) {
throw new RuntimeException("Invalid state, we predict on each row by row, independently at this moment.")
}
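mojo2 v2.x can return a prediction column as ``Array[Float]``, which the new ``match`` widens to doubles before the single-value check. A rough NumPy analogue of that widening (assuming NumPy is available; this is not the Sparkling Water API):

```python
import numpy as np

def column_as_doubles(column):
    """Widen a float32 prediction column to float64, as the diff's match
    does for Array[Float]; other arrays pass through unchanged."""
    arr = np.asarray(column)
    if arr.dtype == np.float32:
        return arr.astype(np.float64)
    return arr
```

As in the Scala code, each scoring row is still expected to yield exactly one value per output column.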