[SW-617] Support for exporting mojo to hdfs #494

Merged: 2 commits, Dec 14, 2017

@miuma2 miuma2 commented Dec 12, 2017

No description provided.

@miuma2
Contributor Author

miuma2 commented Dec 12, 2017

So, this is the new pull request with only my changes. The last comment from @mmalohlava was that "the output stream handling should be handled by our Persist layer". I can update the code to use the Persist layer.

@jakubhava
Contributor

Yup, I just wanted to add that comment. If you can do that, that would be great, and we can merge after that :)

@jakubhava jakubhava left a comment

If you can make these last changes that would be great, thanks a lot!

}
} else {
val destFile = new File(destination)
val fos = new FileOutputStream(destFile)
Contributor

The Persist manager should be used here as well:

val p: Persist = H2O.getPM.getPersistForURI(destinationURI)
val os: OutputStream = p.create(destination.toString, true)

Contributor Author

I am currently having some problems when I try to build:

sparkling-water git:(master) ✗ ./gradlew clean build
Starting a Gradle Daemon, 1 busy and 1 stopped Daemons could not be reused, use --status for details

> Task :sparkling-water-repl:compileScala 
Pruning sources from previous analysis, due to incompatible CompileSetup.

> Task :sparkling-water-core:compileScala 
Pruning sources from previous analysis, due to incompatible CompileSetup.
/Users/taausmi1/Documents/git/sparkling-water/core/src/main/scala/org/apache/spark/h2o/AnnouncementService.scala:120: class DefaultHttpClient in package client is deprecated: see corresponding Javadoc for more information.
  val httpClient = new DefaultHttpClient()
                       ^
one warning found

> Task :sparkling-water-app-streaming:compileScala 
Pruning sources from previous analysis, due to incompatible CompileSetup.

> Task :sparkling-water-app-streaming:scalaStyle 
Found 0 warnings
Found 0 errors

> Task :sparkling-water-core:compileTestScala 
Pruning sources from previous analysis, due to incompatible CompileSetup.

> Task :sparkling-water-ml:compileScala 
Pruning sources from previous analysis, due to incompatible CompileSetup.

> Task :sparkling-water-examples:compileScala 
Pruning sources from previous analysis, due to incompatible CompileSetup.

> Task :sparkling-water-assembly:shadowJar 
The SimpleWorkResult type has been deprecated and is scheduled to be removed in Gradle 5.0. Please use WorkResults.didWork() instead.

> Task :sparkling-water-core:compileIntegTestScala 
Pruning sources from previous analysis, due to incompatible CompileSetup.

> Task :sparkling-water-core:integTest 

water.sparkling.itest.local.H2OContextLocalClusterSuite > verify H2O cloud building on local cluster FAILED
    java.lang.RuntimeException at H2OContextLocalClusterSuite.scala:29

1 test completed, 1 failed


FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':sparkling-water-core:integTest'.
> There were failing tests. See the report at: file:///Users/taausmi1/Documents/git/sparkling-water/core/build/reports/tests/integTest/index.html

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

BUILD FAILED in 3m 15s
50 actionable tasks: 46 executed, 4 up-to-date

If I check the report at file:///Users/taausmi1/Documents/git/sparkling-water/core/build/reports/tests/integTest/index.html, I find the following exception:

java.lang.RuntimeException: Cloud size under 3
	at water.H2O.waitForCloudSize(H2O.java:1691)
	at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:117)
	at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:121)
	at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:355)
	at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:371)

I would appreciate any support. Thanks.

Contributor

This might happen for several reasons; the most common is that a Spark executor died.

I would suggest just building the code using ./gradlew build -x test and leaving the testing to our testing infrastructure.

Contributor Author

I get exactly the same issue running ./gradlew build -x test

Contributor

Oops, sorry, I meant ./gradlew build -x check

Contributor Author

@miuma2 miuma2 Dec 12, 2017

OK, now it looks a bit different, but I still get the following issue:

➜  sparkling-water git:(master) ✗ ./gradlew build -x check

FAILURE: Build failed with an exception.

* Where:
Build file '/Users/taausmi1/Documents/git/sparkling-water/py/build.gradle' line: 185

* What went wrong:
Execution failed for task ':sparkling-water-py:distPython'.
> java.io.FileNotFoundException: /Users/taausmi1/Documents/git/sparkling-water/py/build/pkg/h2o/__init__.py (No such file or directory)

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

BUILD FAILED in 1s
45 actionable tasks: 7 executed, 38 up-to-date

Contributor

I would suggest getting more familiar with the build process just by experimenting; it can be part of the contribution to Sparkling Water :) In this case, however, please rebase your PR on the Sparkling Water master and, most importantly, make sure that you recreate the h2o.whl which the build asks you to download, as it looks like the build is still pointing to the old h2o.whl.

Contributor Author

Now it seems to work, though it's quite confusing.

When I try to build, I get the following message:

Please specify:
       - H2O_HOME to point to H2O Git repo version 3.16.0.2
      or
       - H2O_PYTHON_WHEEL to point to downloaded H2O Python Wheel package version 3.16.0.2
         For example:
  
          mkdir -p $(pwd)/private/
          curl -s http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/2/Python/h2o-3.16.0.2-py2.py3-none-any.whl > $(pwd)/private/h2o.whl
          export H2O_PYTHON_WHEEL=$(pwd)/private/h2o.whl

It says to specify H2O_HOME or H2O_PYTHON_WHEEL. If I specify H2O_HOME, I get the error above; if I specify H2O_PYTHON_WHEEL instead, the build succeeds.

try {
val fs = FileSystem.get(sc.hadoopConfiguration)
val output = fs.create(new Path(destination))
val os = new BufferedOutputStream(output)
Contributor

And here (again, just example code):

val p: Persist = H2O.getPM.getPersistForURI(destinationURI)
val os: OutputStream = p.create(destination.toString, true)

@jakubhava
Contributor

@miuma2 it seems like you removed your code when you introduced the Persist manager. Can you please keep the original code you created and use the Persist manager in the 2 places we pointed out in the comments? Thanks!

@miuma2
Contributor Author

miuma2 commented Dec 13, 2017

@jakubhava, I'm not sure we need that. The Persist layer already checks whether the destination is on HDFS or not. I tested what I wrote by passing both an hdfs URI and a file URI. In both cases, the function I just wrote exports the model. Or do you think that something is missing?
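
The idea of letting one layer pick the storage backend from the destination URI can be sketched roughly as follows. This is only an illustration under stated assumptions, not H2O's Persist implementation: `Backend`, `LocalBackend`, `FakeHdfsBackend`, `Dispatch`, and `exportBytes` are hypothetical names, and the "hdfs" backend here just writes to memory instead of talking to a real cluster.

```scala
import java.io.{ByteArrayOutputStream, FileOutputStream, IOException, OutputStream}
import java.net.URI
import java.nio.file.{Files, Paths}

// Hypothetical stand-in for a persistence backend; NOT H2O's Persist class.
trait Backend {
  def create(uri: URI, overwrite: Boolean): OutputStream
}

// Backend for local files (URIs with the "file" scheme or no scheme at all).
object LocalBackend extends Backend {
  def create(uri: URI, overwrite: Boolean): OutputStream = {
    val path = if (uri.getScheme == null) Paths.get(uri.getPath) else Paths.get(uri)
    if (!overwrite && Files.exists(path)) {
      throw new IOException(s"$path already exists")
    }
    new FileOutputStream(path.toFile)
  }
}

// In-memory backend used here in place of a real HDFS client (an assumption
// made so the sketch runs without Hadoop on the classpath).
object FakeHdfsBackend extends Backend {
  val written = scala.collection.mutable.Map.empty[String, ByteArrayOutputStream]
  def create(uri: URI, overwrite: Boolean): OutputStream = {
    val os = new ByteArrayOutputStream()
    written(uri.toString) = os
    os
  }
}

object Dispatch {
  // Choose the backend from the URI scheme, so callers need no if/else.
  def backendFor(uri: URI): Backend = Option(uri.getScheme) match {
    case None | Some("file") => LocalBackend
    case Some("hdfs")        => FakeHdfsBackend
    case Some(other) =>
      throw new IllegalArgumentException(s"Unsupported scheme: $other")
  }

  // Write the bytes to the destination and always close the stream.
  def exportBytes(bytes: Array[Byte], destination: URI): Unit = {
    val os = backendFor(destination).create(destination, overwrite = true)
    try os.write(bytes) finally os.close()
  }
}
```

With a layer like this, the calling code is identical for both kinds of destination, which is why a separate local-file branch becomes unnecessary.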

@jakubhava
Contributor

jakubhava commented Dec 13, 2017

@miuma2 sorry for the confusion, you are actually right! This is the cleanest solution. Thanks a lot for the contribution!

@jakubhava jakubhava left a comment

Let me merge it tomorrow

@jakubhava
Contributor

Running final tests before merging

@jakubhava jakubhava merged commit eb81879 into h2oai:master Dec 14, 2017
@jakubhava
Contributor

Thanks for contributing @miuma2 !!

@miuma2
Contributor Author

miuma2 commented Dec 15, 2017

Thanks also for your support, @jakubhava. It would also make sense to implement exportPOJOModel accordingly, using the Persist layer. Do you agree? If yes, I can make a new pull request for that.

@jakubhava
Contributor

That would be great, I absolutely agree! If you have time, your help is very welcome!

Kuba

jakubhava pushed a commit that referenced this pull request Jan 3, 2018
jakubhava pushed a commit that referenced this pull request Jan 3, 2018
jakubhava pushed a commit that referenced this pull request Jan 3, 2018