Intermittent failure in creating H2O cloud #5037

Closed
exalate-issue-sync bot opened this issue May 22, 2023 · 5 comments

@exalate-issue-sync

Though I get a cloud eventually, it still does fail now and then.
So I'll make a habit out of reporting the stack traces, so you know it still has rough edges.

The incantation code:

from pysparkling import *
conf = (H2OConf(sc)
        .use_auto_cluster_start()
        .set_yarn_queue("spark-analytics")
        .set_num_of_external_h2o_nodes(8)
        .set_mapper_xmx("10G")
        )

context = H2OContext.getOrCreate(sc, conf)

Many times this works, but today I got:

Py4JJavaError: An error occurred while calling z:org.apache.spark.h2o.JavaH2OContext.getOrCreate.
: java.io.FileNotFoundException: notify_sparkling-water-bteeuwen_155098482 (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at scala.io.Source$.fromFile(Source.scala:91)
at scala.io.Source$.fromFile(Source.scala:76)
at scala.io.Source$.fromFile(Source.scala:54)
at org.apache.spark.h2o.backends.external.ExternalH2OBackend.launchH2OOnYarn(ExternalH2OBackend.scala:75)
at org.apache.spark.h2o.backends.external.ExternalH2OBackend.init(ExternalH2OBackend.scala:109)
at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:102)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:279)
at org.apache.spark.h2o.H2OContext.getOrCreate(H2OContext.scala)
at org.apache.spark.h2o.JavaH2OContext.getOrCreate(JavaH2OContext.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)

I reran the code, and it worked.
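
Since a plain rerun succeeded, the manual retry can be automated as a stopgap. The following is a minimal sketch, not part of Sparkling Water: `retry` and its parameters are hypothetical names, and it assumes that calling `H2OContext.getOrCreate(sc, conf)` again after a failed attempt is safe, which this thread suggests but does not confirm. It only hides the intermittent failure; it does not explain it.

```python
import time


def retry(create, attempts=3, delay_seconds=0.0, retry_on=(Exception,)):
    """Call create() until it succeeds or the attempts are used up.

    create        -- zero-argument callable that builds and returns the context
    attempts      -- total number of tries, including the first one
    delay_seconds -- pause between a failed try and the next one
    retry_on      -- exception types that trigger another try
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return create()
        except retry_on as error:
            last_error = error
            # Report every failure so the intermittent error is not silently lost.
            print("attempt %d of %d failed: %s" % (attempt, attempts, error))
            if attempt < attempts:
                time.sleep(delay_seconds)
    # All attempts failed: re-raise the most recent error.
    raise last_error


# Hypothetical usage with the snippet above (sc and conf as defined there):
# context = retry(lambda: H2OContext.getOrCreate(sc, conf),
#                 attempts=3, delay_seconds=30)
```

Printing each failed attempt keeps the stack-trace message visible, so failures can still be reported even when a later attempt succeeds.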

@exalate-issue-sync

Avkash Chauhan commented: #90282 (https://support.h2o.ai/helpdesk/tickets/90282) - creating h2o cloud fails

@exalate-issue-sync

Jakub Hava commented: This most likely means that the H2O cluster failed to start for some reason, so the notify file was never created; that is why we see this exception.

The solution, at least for now, is to log the H2O cluster startup transparently in the Sparkling Water application. Then we can see why H2O failed to start and investigate further.
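
To illustrate the idea of surfacing the launcher's output rather than a bare `FileNotFoundException`, here is a rough Python sketch. It is not the Sparkling Water implementation (the stack trace shows that the real code is Scala, in `ExternalH2OBackend.launchH2OOnYarn`). The function name `launch_and_read_notify`, the launcher command, and the assumption that the notify file holds a short piece of text are all hypothetical.

```python
import os
import subprocess


def launch_and_read_notify(command, notify_path):
    """Run a launcher command and return the text of its notify file.

    If the launcher exits with an error or never writes the notify file,
    raise an error that includes the launcher's captured output, so the
    reason for the failed startup is visible to the caller.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured output
        universal_newlines=True,   # decode the output as text
    )
    if result.returncode != 0 or not os.path.isfile(notify_path):
        raise RuntimeError(
            "cluster launcher did not produce %r (exit code %d); "
            "launcher output:\n%s"
            % (notify_path, result.returncode, result.stdout)
        )
    with open(notify_path) as handle:
        return handle.read().strip()
```

The design choice is simply to check for the file before opening it and to attach whatever the launcher printed to the raised error, so a failed startup can be diagnosed from the error alone.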

@exalate-issue-sync

Jakub Hava commented: Marking this problem as fixed. If creating the H2O cloud in automatic mode on YARN ever fails again, the cause is now visible in the logs, which was not supported before. A new issue should be reported in that case.

@DinukaH2O

JIRA Issue Migration Info

Jira Issue: SW-308
Assignee: Jakub Hava
Reporter: Avkash Chauhan
State: Resolved
Fix Version: 1.6.9
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#201

@hasithjp

JIRA Issue Migration Info Cont'd

Jira Issue Created Date: 2017-01-16T12:47:39.000-0800
