Unable to create h2o_context on Databricks using R and Scala #1193

Closed
sasikiran opened this issue May 12, 2019 · 36 comments

Comments

@sasikiran

I'm trying to use Sparkling Water on Azure Databricks and I'm not able to create an h2o_context. I tried this in both R and Scala on the same cluster.

R Sample code:

install.packages("sparklyr")
install.packages("rsparkling")
install.packages("h2o", type="source", repos="https://h2o-release.s3.amazonaws.com/h2o/rel-yates/2/R")

library(rsparkling)
library(sparklyr)

sc <- spark_connect(method="databricks")
h2o_context(sc)

Scala sample code:

import org.apache.spark.h2o._
val hc = H2OContext.getOrCreate(spark)

Configuration details

  • Azure Databricks version: 5.3 ML (includes Apache Spark 2.4.0, Scala 2.11)
  • Driver type: Standard_DS3_V2: 14.0 GB Memory, 4 Cores, 0.75 DBU
  • Min workers: 2
  • Max workers: 8
  • Enable autoscaling: Yes
  • Sparkling Water library: sparkling_water_assembly_2_11_2_4_10_all.jar

Error log:
Error : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 14.0 failed 4 times, most recent failure: Lost task 1.3 in stage 14.0 (TID 144, 10.139.64.6, executor 18): ExecutorLostFailure (executor 18 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2355) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2343) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2342) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2342) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1096) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1096) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1096) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2574) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2522) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2510) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:893) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2233) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2255) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2274) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2299) at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:961) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:379) at org.apache.spark.rdd.RDD.collect(RDD.scala:960) at org.apache.spark.h2o.backends.internal.InternalBackendUtils$class.startH2O(InternalBackendUtils.scala:196) at org.apache.spark.h2o.backends.internal.InternalBackendUtils$.startH2O(InternalBackendUtils.scala:306) at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:104) at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:129) at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:403) at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:438) at org.apache.spark.h2o.H2OContext.getOrCreate(H2OContext.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sparklyr.Invoke.invoke(invoke.scala:139) at sparklyr.StreamHandler.handleMethodCall(stream.scala:123) at sparklyr.StreamHandler.read(stream.scala:66) at sparklyr.BackendHandler.channelRead0(handler.scala:51) at sparklyr.BackendHandler.channelRead0(handler.scala:4) at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:138) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) at java.lang.Thread.run(Thread.java:748)

@jakubhava
Contributor

jakubhava commented May 12, 2019 via email

@sasikiran
Author

Hi @jakubhava, I created a new Databricks cluster without autoscaling and set the number of workers to 4. I ran the same code in R and Scala, and the same error occurred.

Databricks configuration:
Databricks runtime: 5.3 ML (includes Apache Spark 2.4.0, Scala 2.11)
Enable autoscaling: No
Python version: 3
Worker type: Standard_DS3_v2, 14.0 GB Memory, 4 Cores, 0.75 DBU
Number of workers: 4
Driver type: Standard_DS3_v2, 14.0 GB Memory, 4 Cores, 0.75 DBU
Sparkling Water library: sparkling_water_assembly_2_11_2_4_10_all.jar

Error log:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 5.0 failed 4 times, most recent failure: Lost task 3.3 in stage 5.0 (TID 156, 10.139.64.7, executor 17): ExecutorLostFailure (executor 17 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2355) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2343) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2342) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2342) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1096) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1096) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1096) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2574) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2522) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2510) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:893) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2233) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2255) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2274) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2299) at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:961) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:379) at org.apache.spark.rdd.RDD.collect(RDD.scala:960) at org.apache.spark.h2o.backends.internal.InternalBackendUtils$class.startH2O(InternalBackendUtils.scala:196) at org.apache.spark.h2o.backends.internal.InternalBackendUtils$.startH2O(InternalBackendUtils.scala:306) at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:104) at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:129) at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:403) at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:419) at linea79e6cc413f242d48e96cfbe9622892d25.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3675229254105724:3) at linea79e6cc413f242d48e96cfbe9622892d25.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-3675229254105724:48) at linea79e6cc413f242d48e96cfbe9622892d25.$read$$iw$$iw$$iw$$iw.<init>(command-3675229254105724:50) at linea79e6cc413f242d48e96cfbe9622892d25.$read$$iw$$iw$$iw.<init>(command-3675229254105724:52) at linea79e6cc413f242d48e96cfbe9622892d25.$read$$iw$$iw.<init>(command-3675229254105724:54) at linea79e6cc413f242d48e96cfbe9622892d25.$read$$iw.<init>(command-3675229254105724:56) at linea79e6cc413f242d48e96cfbe9622892d25.$read.<init>(command-3675229254105724:58) at 
linea79e6cc413f242d48e96cfbe9622892d25.$read$.<init>(command-3675229254105724:62) at linea79e6cc413f242d48e96cfbe9622892d25.$read$.<clinit>(command-3675229254105724) at linea79e6cc413f242d48e96cfbe9622892d25.$eval$.$print$lzycompute(<notebook>:7) at linea79e6cc413f242d48e96cfbe9622892d25.$eval$.$print(<notebook>:6) at linea79e6cc413f242d48e96cfbe9622892d25.$eval.$print(<notebook>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:793) at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1054) at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:645) at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:644) at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31) at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19) at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:644) at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:576) at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:572) at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:199) at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply$mcV$sp(ScalaDriverLocal.scala:190) at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:190) at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:190) at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:590) at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:545) at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:190) at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$8.apply(DriverLocal.scala:323) at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$8.apply(DriverLocal.scala:303) at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:235) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:230) at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:47) at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:268) at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:47) at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:303) at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:591) at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:591) at scala.util.Try$.apply(Try.scala:192) at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:586) at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:477) at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:544) at 
com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:383) at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:330) at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:216) at java.lang.Thread.run(Thread.java:748)

@jakubhava
Contributor

Can you please share the YARN logs?

@sasikiran
Author

I am running this on Databricks. There is no YARN integration.

@jakubhava
Contributor

I believe this is not true. Internally, Databricks runs its Spark on Hadoop as well.

What we need are the full logs from the executors and the driver.

@sasikiran
Author

I found these logs under the Spark driver logs. Are these sufficient?
Log4j: http://txt.do/1d0iv
StdErr: https://file.io/OPri5A
StdOut: https://file.io/G37Zar

@jakubhava
Contributor

Thank you @sasikiran, will have a look

@jakubhava
Contributor

Could you please try the nightly build from this link? We changed some internal configuration which may influence this behavior: https://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.4/nightly/91/index.html

Thank you!

@sasikiran
Author

Thanks @jakubhava. I used the nightly build on the same cluster with the same code. The error remains the same. Do you want me to share the logs again?

@jakubhava
Contributor

OK, thank you! I think it is fine; we will think about what we can do.

@jakubhava
Contributor

Hi @sasikiran, we have been working on a better solution. Can you please give it another try with this nightly release? This one should do: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.4/nightly/98/index.html

@sasikiran
Author

Hi @jakubhava, thanks for the update. I tried it out using the new nightly build. I am getting a different error this time. I tried it in both R and Scala.

org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) at org.apache.spark.h2o.backends.internal.InternalH2OBackend$$anonfun$startH2OWorkers$1.apply(InternalH2OBackend.scala:161) at org.apache.spark.h2o.backends.internal.InternalH2OBackend$$anonfun$startH2OWorkers$1.apply(InternalH2OBackend.scala:159) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) at org.apache.spark.h2o.backends.internal.InternalH2OBackend$.startH2OWorkers(InternalH2OBackend.scala:159) at org.apache.spark.h2o.backends.internal.InternalH2OBackend$.org$apache$spark$h2o$backends$internal$InternalH2OBackend$$startH2OCluster(InternalH2OBackend.scala:97) at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:76) at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:129) at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:403) at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:419) at line5a625531dac549d782510f57d8dab78a25.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3675229254105724:3) at line5a625531dac549d782510f57d8dab78a25.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-3675229254105724:48) at line5a625531dac549d782510f57d8dab78a25.$read$$iw$$iw$$iw$$iw.<init>(command-3675229254105724:50) at line5a625531dac549d782510f57d8dab78a25.$read$$iw$$iw$$iw.<init>(command-3675229254105724:52)

Here are the detailed logs:
stderr - https://file.io/LFuxcr
stdout - https://file.io/zlrMRu
log4j - https://file.io/e1hVZU

@jakubhava
Contributor

Thank you, this is helpful! This is still a fairly new change, so we need to iterate.

However, the main error happens on the executor machines. Could you please send us the logs from the executors?

@sasikiran
Author

I was able to find these. Please let me know if these are good enough.

Executor 1 log: https://file.io/NZsDMH
Executor 2 log: https://file.io/kjlqAT

@jakubhava
Contributor

Thanks, I had a look inside but did not see any useful logs. I will spend some time on it and try to reproduce it myself.

@jakubhava
Contributor

jakubhava commented May 28, 2019

I created a testing environment on Azure Databricks with the same configuration you have, but still can't reproduce it. I'm testing in Scala with the simple code below:

import org.apache.spark.h2o._
val hc = H2OContext.getOrCreate(spark)

Quick question: are you specifying any additional Spark options?

What the error above says is that the driver did not get a response from an H2O worker node, so the H2O worker node probably didn't start. The logs you sent do not contain information about this behavior.

But you should be able to get the right logs for the failing case if you go to the Spark UI, click on Executors, and send us the stdout & stderr from the executors. That would help us a lot.
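
For reference, a minimal Scala sketch of passing an explicit H2OConf when creating the context, so the failing workers leave more detail in the executor stdout/stderr before giving up; the property names used here (spark.ext.h2o.cloud.timeout, spark.ext.h2o.node.log.level) are assumptions to verify against the Sparkling Water configuration documentation for your release:

import org.apache.spark.h2o._

// Sketch only: raise the cloud-up timeout and worker log verbosity.
// Property names are assumptions; check the Sparkling Water docs for your release.
val conf = new H2OConf(spark)
  .set("spark.ext.h2o.cloud.timeout", "120000")  // ms to wait for the H2O cloud to form
  .set("spark.ext.h2o.node.log.level", "DEBUG")  // more verbose H2O worker logs
val hc = H2OContext.getOrCreate(spark, conf)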

@jakubhava
Contributor

These screenshots should help with getting the right logs
Screen Shot 2019-05-28 at 3 55 51 PM
Screen Shot 2019-05-28 at 3 55 57 PM

@sasikiran
Author

Hi @jakubhava, I think there is some conflict between the Databricks runtime and Sparkling Water. I created a new cluster with Databricks runtime 5.3 (includes Apache Spark 2.4.0, Scala 2.11) and ran the code. It worked. Then I went back to my previous cluster, which uses 5.3 ML (includes Apache Spark 2.4.0, Scala 2.11), and ran the same code. It failed with "Exception thrown in awaitResult:".

As a workaround, I will use Sparkling Water on Databricks runtime 5.3.

Thanks so much!

@jakubhava
Contributor

Hi @sasikiran thanks so much, this information will help us with debugging!

@jakubhava
Contributor

jakubhava commented May 30, 2019

@sasikiran just FYI: I was able to reproduce the issue

The core issue is:

java.lang.IllegalAccessError: tried to access class ml.dmlc.xgboost4j.java.NativeLibLoader from class hex.tree.xgboost.XGBoostExtension
	at hex.tree.xgboost.XGBoostExtension.initXgboost(XGBoostExtension.java:68)
	at hex.tree.xgboost.XGBoostExtension.isEnabled(XGBoostExtension.java:49)
	at water.ExtensionManager.isEnabled(ExtensionManager.java:189)
	at water.ExtensionManager.registerCoreExtensions(ExtensionManager.java:103)
	at water.H2O.main(H2O.java:2002)
	at water.H2OStarter.start(H2OStarter.java:22)
	at water.H2OStarter.start(H2OStarter.java:47)
	at org.apache.spark.h2o.backends.internal.InternalH2OBackend$.startH2OWorker(InternalH2OBackend.scala:123)
	at org.apache.spark.h2o.backends.internal.H2ORpcEndpoint$$anonfun$receiveAndReply$1.applyOrElse(H2ORpcEndpoint.scala:52)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:105)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:226)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

@sasikiran
Author

sasikiran commented May 31, 2019 via email

@PyramidDelta

Hi @jakubhava, do you have any workarounds or recommendations for this issue?

@jakubhava
Contributor

Adding @honzasterba, as I discussed this with him. Do you remember the issue with XGBoost conflicts we discussed on Slack?

@honzasterba

I think this was caused by there being two XGBoost jars on the classpath, one from H2O and one from another library; this caused various issues with XGBoost classes being present twice.

@jakubhava
Contributor

Yup, exactly. Do you have any workaround in mind?

@honzasterba

  1. Figure out where the second XGBoost is coming from and whether you can somehow prevent it from getting on the classpath (see the sketch after this list).
  2. If not, then we will have to fix it on our side (by a package rename or something similar).
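
To act on point 1, a minimal Scala sketch (run in a notebook cell) that asks the JVM where it loaded each of the two colliding classes from; the class names are taken from the IllegalAccessError stack trace above:

// Sketch only: print the jar each colliding class was loaded from, to spot where
// the second xgboost4j copy on the classpath comes from.
Seq("ml.dmlc.xgboost4j.java.NativeLibLoader", "hex.tree.xgboost.XGBoostExtension")
  .foreach { name =>
    val location =
      try Class.forName(name).getProtectionDomain.getCodeSource.getLocation
      catch { case t: Throwable => t }  // class missing or no code source
    println(s"$name -> $location")
  }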

@PyramidDelta

PyramidDelta commented Aug 29, 2019

I see, thank you.

One more question: which versions of Spark and Scala were used for the latest Sparkling Water release? It seems that I've found the issue with XGBoost and fixed it, but now I'm facing some issues with the compiler.

I'm using Spark 2.4.3 with Scala 2.11.12 on Win10 x64 with Hadoop 2.7.1 via winutils. Should I create a new issue for this problem?

java.lang.NoClassDefFoundError: scala/tools/nsc/interpreter/InteractiveReader at water.api.scalaInt.ScalaCodeHandler.createInterpreterInPool(ScalaCodeHandler.scala:145) at water.api.scalaInt.ScalaCodeHandler$$anonfun$initializeInterpreterPool$1.apply(ScalaCodeHandler.scala:139) at water.api.scalaInt.ScalaCodeHandler$$anonfun$initializeInterpreterPool$1.apply(ScalaCodeHandler.scala:138) at scala.collection.immutable.Range.foreach(Range.scala:160) at water.api.scalaInt.ScalaCodeHandler.initializeInterpreterPool(ScalaCodeHandler.scala:138) at water.api.scalaInt.ScalaCodeHandler.<init>(ScalaCodeHandler.scala:42) at water.api.scalaInt.ScalaCodeHandler$.registerEndpoints(ScalaCodeHandler.scala:171) at water.api.CoreRestAPI$.registerEndpoints(CoreRestAPI.scala:32) at water.api.RestAPIManager.register(RestAPIManager.scala:39) at water.api.RestAPIManager.registerAll(RestAPIManager.scala:31) at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:76) at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:128) at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:396) at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:431) at test.spark.assembly.job.Test$$anonfun$working$1.apply(Test.scala:76) at test.spark.assembly.job.Test$$anonfun$working$1.apply(Test.scala:69) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.lang.ClassNotFoundException: scala.tools.nsc.interpreter.InteractiveReader at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 23 more
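
As a side note on the NoClassDefFoundError above: the scala.tools.nsc.interpreter classes ship in the scala-compiler artifact rather than scala-library, so one thing to check (an assumption, not a fix confirmed in this thread) is whether that artifact is on the application classpath; a minimal sbt-style sketch:

// Sketch only: make scala.tools.nsc.interpreter.* available at runtime,
// matching the Scala version the project is built with.
libraryDependencies += "org.scala-lang" % "scala-compiler" % scalaVersion.value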

@jakubhava
Contributor

Yes, please create a new issue for this one, thanks

@BhushG

BhushG commented Dec 25, 2019

@MrHadgehog How did you fix the XGBoost issue? I'm having the same issue in my project. I have the following XGBoost dependency in my project along with the H2O Sparkling Water dependency.

ml.dmlc:xgboost4j-spark:0.82
ai.h2o:sparkling-water-package_2.11:3.28.0.1-1-2.3

@BhushG

BhushG commented Dec 26, 2019

@jakubhava Is there any way I can avoid this XGBoost error: java.lang.IllegalAccessError: tried to access class ml.dmlc.xgboost4j.java.NativeLibLoader from class hex.tree.xgboost.XGBoostExtension?

@PyramidDelta

PyramidDelta commented Dec 31, 2019

@MrHadgehog How did you fix the XGBoost issue? I'm having the same issue in my project. I have the following XGBoost dependency in my project along with the H2O Sparkling Water dependency.

ml.dmlc:xgboost4j-spark:0.82
ai.h2o:sparkling-water-package_2.11:3.28.0.1-1-2.3

Hi @BhushG, sorry for the late reply. I removed the ml.dmlc.xgboost4j dependencies (and any similar ones) from the entire project. After that, the issue was solved.

@BhushG

BhushG commented Jan 5, 2020

@MrHadgehog @jakubhava Hi, I found a workaround for this problem. Apparently, the order of the dependencies matters in Maven, so I moved the H2O dependency above the xgboost4j-spark dependency and it worked:

    <dependency>
        <groupId>ai.h2o</groupId>
        <artifactId>sparkling-water-package_2.11</artifactId>
        <version>${h2o.automl.version}</version>
    </dependency>
    <dependency>
        <groupId>ml.dmlc</groupId>
        <artifactId>xgboost4j-spark</artifactId>
        <version>${xgboost4j.spark.verion}</version>
    </dependency>

Thus H2O doesn't throw any error related to XGBoost.

@garimagupta25

Hi, how do I resolve this XGBoost conflict error in Azure Databricks?

@garimagupta25

Py4JJavaError: An error occurred while calling o411.getOrCreate.
: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:431)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend$.$anonfun$startH2OWorkers$1(InternalH2OBackend.scala:183)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend$.org$apache$spark$h2o$backends$internal$InternalH2OBackend$$startH2OWorkers(InternalH2OBackend.scala:181)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend.startH2OCluster(InternalH2OBackend.scala:48)
at org.apache.spark.h2o.H2OContext.<init>(H2OContext.scala:85)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:509)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: Boxed Error
at scala.concurrent.impl.Promise$.resolver(Promise.scala:87)
at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:79)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
at scala.concurrent.Promise.tryFailure(Promise.scala:112)
at scala.concurrent.Promise.tryFailure$(Promise.scala:112)
at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:214)
at org.apache.spark.rpc.netty.NettyRpcEnv.onSuccess$1(NettyRpcEnv.scala:223)
at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$askAbortable$7(NettyRpcEnv.scala:246)
at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$askAbortable$7$adapted(NettyRpcEnv.scala:246)
at org.apache.spark.rpc.netty.RpcOutboxMessage.onSuccess(Outbox.scala:90)
at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:194)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
Caused by: java.lang.IllegalAccessError: tried to access class ml.dmlc.xgboost4j.java.NativeLibLoader from class hex.tree.xgboost.XGBoostExtension
at hex.tree.xgboost.XGBoostExtension.initXgboost(XGBoostExtension.java:70)
at hex.tree.xgboost.XGBoostExtension.isEnabled(XGBoostExtension.java:51)
at water.ExtensionManager.isEnabled(ExtensionManager.java:189)
at water.ExtensionManager.registerCoreExtensions(ExtensionManager.java:103)
at water.H2O.main(H2O.java:2158)
at water.H2OStarter.start(H2OStarter.java:22)
at water.H2OStarter.start(H2OStarter.java:52)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend$.startH2OWorker(InternalH2OBackend.scala:153)
at org.apache.spark.h2o.backends.internal.H2ORpcEndpoint$$anonfun$receiveAndReply$1.applyOrElse(H2ORpcEndpoint.scala:58)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.$anonfun$run$1(SparkThreadLocalForwardingThreadPoolExecutor.scala:104)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:68)
at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured$(SparkThreadLocalForwardingThreadPoolExecutor.scala:54)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:101)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.run(SparkThreadLocalForwardingThreadPoolExecutor.scala:104)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more

@mn-mikke
Collaborator

@garimagupta25 Can you use a plain non-ML version of Databricks runtime?

@CarlaFernandez

Hello, bumping this issue again. I'm working on a Databricks cluster which has the XGBoost4j libraries installed. I'm trying to create a dummy model using PySparkling Water and running into the same issues described above. I cannot uninstall the XGBoost libraries, since they're needed for other apps running on our cluster, and I cannot switch the cluster mode either.

Thus, the solution proposed by @honzasterba is the only option for my team to use PySparkling in our current setup:

  2. If not, then we will have to fix it on our side (by a package rename or something similar).

Is it feasible to do that? I'm currently using PySparkling Water version 3.1 and the issue still remains.
