`:cp` command takes 10s of minutes with no output, fails with mysterious error #298

Open
ryan-williams opened this Issue Aug 18, 2015 · 1 comment


ryan-williams commented Aug 18, 2015

That JAR exists, and I've successfully added it to the classpath in YARN-backed notebooks like the one where I see the error above. I don't know why the command took so long, why it ultimately failed, or why `sc` was no longer in scope afterward.


ryan-williams commented Aug 18, 2015

All my ~100 executors seemed to be failing to communicate with the driver in the above app; they all had stack traces like:

15/08/18 00:16:07 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 58410.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:146)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:245)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:97)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:159)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:65)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        ... 4 more

I have no idea why that would happen; I regularly run Spark apps via {adam,spark}-{submit,shell} on this YARN cluster with the same config params with no issues.
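For anyone reading the trace: the `Caused by` line is a bounded wait expiring, i.e. the executor blocked on a future for the driver's endpoint reference and gave up after 120 seconds. The sketch below is only a plain-Java analogue of that pattern, not Spark's code; `BoundedWaitDemo` and `timesOut` are made-up names, and the 100 ms timeout is arbitrary.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedWaitDemo {
    // Waits up to timeoutMillis for a reply; returns true if the wait expired first.
    public static boolean timesOut(CompletableFuture<String> reply, long timeoutMillis)
            throws Exception {
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            // Same exception class as in the trace: java.util.concurrent.TimeoutException
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // A future nobody completes stands in for a driver that never answers.
        CompletableFuture<String> neverAnswered = new CompletableFuture<>();
        System.out.println(timesOut(neverAnswered, 100));

        // An already-completed future stands in for a reachable driver.
        System.out.println(timesOut(CompletableFuture.completedFuture("driver-ref"), 100));
    }
}
```

The point is only that the exception says the other side never replied within the window; it does not say why the driver was unreachable.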

Update: I originally included a further repro case here, but it had an incorrect path to my JAR in the `:cp` command. The only symptom of that typo was that `:cp` finished without the JAR I wanted being added to the classpath, which was confusing. Also, the `:cp` command started a second YARN app, re-ran a Spark job I had run previously, then tore down the second YARN app and started a third, which seems like strange behavior to me.
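Since a mistyped path produced no error at all, a cheap guard is to check the path before handing it to `:cp`. This is a minimal sketch, assuming the JAR lives on the local filesystem of the machine running the notebook; `JarPathCheck` and `looksLikeUsableJar` are hypothetical names, not part of the notebook or Spark.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class JarPathCheck {
    // True only if the path ends in ".jar" and names an existing, readable regular file.
    public static boolean looksLikeUsableJar(String path) {
        Path p = Paths.get(path);
        return path.endsWith(".jar") && Files.isRegularFile(p) && Files.isReadable(p);
    }

    public static void main(String[] args) throws Exception {
        // A temporary empty file stands in for a real JAR; only the path is checked.
        Path tmp = Files.createTempFile("example", ".jar");
        System.out.println(looksLikeUsableJar(tmp.toString()));           // existing file
        System.out.println(looksLikeUsableJar(tmp.toString() + ".typo")); // mistyped path
        Files.delete(tmp);
    }
}
```

This only verifies that the file is present and readable; it does not open the archive or confirm that the classes end up on the classpath.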
