[SW-617] Support for exporting mojo to hdfs #494
So, this is the new pull request with only my changes. The last comment from @mmalohlava was that "the output stream handling should be handled by our Persist layer". I can update the code to use the persist layer.
Yup, I just wanted to add that comment. If you can do that, that would be great, and we can merge after that :)
If you can make these last changes that would be great, thanks a lot!
}
} else {
  val destFile = new File(destination)
  val fos = new FileOutputStream(destFile)
The persist manager should be used here as well:
val p: Persist = H2O.getPM.getPersistForURI(destinationURI)
val os: OutputStream = p.create(destination.toString, true)
I am currently having some problems when I try to build:
sparkling-water git:(master) ✗ ./gradlew clean build
Starting a Gradle Daemon, 1 busy and 1 stopped Daemons could not be reused, use --status for details
> Task :sparkling-water-repl:compileScala
Pruning sources from previous analysis, due to incompatible CompileSetup.
> Task :sparkling-water-core:compileScala
Pruning sources from previous analysis, due to incompatible CompileSetup.
/Users/taausmi1/Documents/git/sparkling-water/core/src/main/scala/org/apache/spark/h2o/AnnouncementService.scala:120: class DefaultHttpClient in package client is deprecated: see corresponding Javadoc for more information.
val httpClient = new DefaultHttpClient()
^
one warning found
> Task :sparkling-water-app-streaming:compileScala
Pruning sources from previous analysis, due to incompatible CompileSetup.
> Task :sparkling-water-app-streaming:scalaStyle
Found 0 warnings
Found 0 errors
> Task :sparkling-water-core:compileTestScala
Pruning sources from previous analysis, due to incompatible CompileSetup.
> Task :sparkling-water-ml:compileScala
Pruning sources from previous analysis, due to incompatible CompileSetup.
> Task :sparkling-water-examples:compileScala
Pruning sources from previous analysis, due to incompatible CompileSetup.
> Task :sparkling-water-assembly:shadowJar
The SimpleWorkResult type has been deprecated and is scheduled to be removed in Gradle 5.0. Please use WorkResults.didWork() instead.
> Task :sparkling-water-core:compileIntegTestScala
Pruning sources from previous analysis, due to incompatible CompileSetup.
> Task :sparkling-water-core:integTest
water.sparkling.itest.local.H2OContextLocalClusterSuite > verify H2O cloud building on local cluster FAILED
java.lang.RuntimeException at H2OContextLocalClusterSuite.scala:29
1 test completed, 1 failed
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':sparkling-water-core:integTest'.
> There were failing tests. See the report at: file:///Users/taausmi1/Documents/git/sparkling-water/core/build/reports/tests/integTest/index.html
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
* Get more help at https://help.gradle.org
BUILD FAILED in 3m 15s
50 actionable tasks: 46 executed, 4 up-to-date
If I check the report at file:///Users/taausmi1/Documents/git/sparkling-water/core/build/reports/tests/integTest/index.html
I find the following exception:
java.lang.RuntimeException: Cloud size under 3
at water.H2O.waitForCloudSize(H2O.java:1691)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:117)
at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:121)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:355)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:371)
I would appreciate any support. Thanks.
This might happen for several reasons; the most common is that a Spark executor died.
I would suggest just building the code using ./gradlew build -x test
and leaving the testing to our testing infrastructure.
I get exactly the same issue running ./gradlew build -x test
Oops, sorry, I meant ./gradlew build -x check
OK, now it looks a bit different, but I still get the following issue:
➜ sparkling-water git:(master) ✗ ./gradlew build -x check
FAILURE: Build failed with an exception.
* Where:
Build file '/Users/taausmi1/Documents/git/sparkling-water/py/build.gradle' line: 185
* What went wrong:
Execution failed for task ':sparkling-water-py:distPython'.
> java.io.FileNotFoundException: /Users/taausmi1/Documents/git/sparkling-water/py/build/pkg/h2o/__init__.py (No such file or directory)
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
* Get more help at https://help.gradle.org
BUILD FAILED in 1s
45 actionable tasks: 7 executed, 38 up-to-date
I would suggest getting more familiar with the build process just by experimenting; it can be part of the contribution to sparkling-water :) However, in this case, please rebase your PR on the Sparkling Water master and, most importantly, make sure that you recreate the h2o.whl which the build asks you to download, as it looks like the build is still pointing to an old h2o.whl.
Now it seems to work, but it's quite confusing.
When I try to build, I get the following information:
Please specify:
- H2O_HOME to point to H2O Git repo version 3.16.0.2
or
- H2O_PYTHON_WHEEL to point to downloaded H2O Python Wheel package version 3.16.0.2
For example:
mkdir -p $(pwd)/private/
curl -s http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/2/Python/h2o-3.16.0.2-py2.py3-none-any.whl > $(pwd)/private/h2o.whl
export H2O_PYTHON_WHEEL=$(pwd)/private/h2o.whl
It says to specify H2O_HOME or H2O_PYTHON_WHEEL. If I specify H2O_HOME, I get the error above; if I specify H2O_PYTHON_WHEEL instead, the build succeeds.
try {
  val fs = FileSystem.get(sc.hadoopConfiguration)
  val output = fs.create(new Path(destination))
  val os = new BufferedOutputStream(output)
And here (again, just example code):
val p: Persist = H2O.getPM.getPersistForURI(destinationURI)
val os: OutputStream = p.create(destination.toString, true)
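For readers following the two review suggestions, a slightly fuller sketch of writing a MOJO through the persist layer might look like the following. The object and method names (`MojoExportSketch`, `exportMojo`) are hypothetical and are not the code from this PR. The sketch also assumes that `model.getMojo.writeTo(OutputStream)` is available in the H2O version in use. Only `getPersistForURI` and `create` are taken from the comments above.

```scala
import java.io.OutputStream
import java.net.URI

import hex.Model
import water.H2O
import water.persist.Persist

// Hypothetical helper for illustration; not the PR's actual implementation.
object MojoExportSketch {

  def exportMojo(model: Model[_, _, _], destination: URI, overwrite: Boolean = true): URI = {
    // Let the persist manager pick the backend that matches the destination URI.
    val persist: Persist = H2O.getPM.getPersistForURI(destination)
    val os: OutputStream = persist.create(destination.toString, overwrite)
    try {
      // Assumption: the MOJO writer can stream directly into an OutputStream.
      model.getMojo.writeTo(os)
    } finally {
      // Close the stream even if writing the MOJO fails.
      os.close()
    }
    destination
  }
}
```

Closing the stream in a `finally` block is a design choice of this sketch, so that a failed export does not leak an open handle on the target file system.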
@miuma2 it seems you removed your code when you introduced the persist manager. Can you please keep the original code you created and use the persist manager in the 2 places we pointed out in the comments? Thanks!
@jakubhava, I'm not sure we need that. The persist layer already checks whether the destination is HDFS or not. I tested what I wrote by passing both an hdfs URI and a file URI, and in both cases the function I just wrote exports the model. Or do you think that something is missing?
@miuma2 sorry for the confusion, you are actually right! This is the cleanest solution. Thanks a lot for the contribution!
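The idea that one code path can serve both hdfs and file destinations can be illustrated with a small, dependency-free sketch of scheme-based dispatch. This is not H2O's persist manager, which has its own logic. `DestinationScheme` and `backendFor` are hypothetical names, and the URIs in the usage note are made-up examples.

```scala
import java.net.URI

// Hypothetical illustration of choosing a storage backend from a destination URI.
object DestinationScheme {

  def backendFor(destination: String): String = {
    // getScheme returns null when the destination has no scheme, e.g. a plain path.
    val scheme = Option(URI.create(destination).getScheme).map(_.toLowerCase)
    scheme match {
      case Some("hdfs")        => "hdfs"
      case Some("file") | None => "local"
      case Some(other)         => other // any other scheme is passed through unchanged
    }
  }
}
```

With this sketch, `DestinationScheme.backendFor("hdfs://namenode:8020/models/model.zip")` selects the HDFS branch, while `DestinationScheme.backendFor("file:///tmp/model.zip")` and a bare path such as `/tmp/model.zip` both fall into the local branch. This is why the caller would not need a separate `if`/`else` for each file system.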
Let me merge it tomorrow
Running final tests before merging.
Thanks for contributing, @miuma2!
Thanks also for your support, @jakubhava. It would also make sense to implement exportPOJOModel accordingly, using the persist layer. Do you agree? If so, I can make a new pull request for that.
That would be great, I absolutely agree! If you have time, your help is very welcome! Kuba
(cherry picked from commit eb81879)