
ContextFrontend: Ask worker connection for context failed #543

Closed

rajexp opened this issue Mar 27, 2019 · 1 comment

rajexp commented Mar 27, 2019

Jobs complete successfully on most occasions, but recently the Mist server failed jobs with the error "executor was terminated". After that, the Mist server returned 500 errors for other jobs for a certain duration.
The recorded logs are provided below:

    2019-03-27 02:24:36 WARN  ReliableDeliverySupervisor:131 - Association with remote system [akka.tcp://mist-info-provider@127.0.0.1:38177] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://mist-info-provider@127.0.0.1:38177]] Caused by: [Connection refused: /127.0.0.1:38177]
    2019-03-27 02:24:37 WARN  RemoteWatcher:131 - Detected unreachable: [akka.tcp://mist-worker-Big-Query-3-v1_66b0b2a4-624b-4d52-b947-de1445870c80-pool-1@x.x.x.x:46424]  
    2019-03-27 02:24:37 WARN  RemoteWatcher:131 - Detected unreachable: [akka.tcp://mist-worker-Big-Query-1-v1_46079ac9-5e6b-44c0-a736-4eca7735d41d-pool-1@y.y.y.y:40868]  
    2019-03-27 02:24:37 WARN  RemoteWatcher:131 - Detected unreachable: [akka.tcp://mist-info-provider@127.0.0.1:38177]
    2019-03-27 02:24:37 INFO  JobActor:107 - Job fa70b0a9-617b-4de1-b71a-8dcef2f25f55 completed with error  
    2019-03-27 02:24:37 INFO  JobActor:107 - Job 6cd0a160-05ae-4ad5-bc4b-6abb0bac063d completed with error  
    2019-03-27 02:24:37 INFO  SharedConnector:107 - Releasing connection: requested 0, pooled 0, in use 1, starting: 0  
    2019-03-27 02:24:37 INFO  SharedConnector:107 - Releasing connection: requested 0, pooled 0, in use 0, starting: 0  
    2019-03-27 02:24:37 INFO  SharedConnector:107 - Released unused connection  
    2019-03-27 02:24:37 INFO  ContextFrontend:107 - Context Context-1 - move to inactive state  
    2019-03-27 02:24:37 INFO  ContextFrontend:107 - Context Context-3 - move to inactive state  
    2019-03-27 02:24:37 ERROR RestartSupervisor:143 - Reference for FunctionInfoProvider was terminated. Restarting

I am also getting the following error continuously in the Mist logs:

2019-03-27 04:00:01 INFO  ContextFrontend:107 - Context-1 - connected state(active connections: 0, max: 1)  
2019-03-27 04:00:09 ERROR SharedConnector:159 - Could not start worker connection  
java.lang.RuntimeException: Process terminated with error java.lang.RuntimeException: Process exited with status code 1 and out: Ivy Default Cache set to: /home/cassandra/.ivy2/cache;The jars for the packages stored in: /home/cassandra/.ivy2/jars;:: loading settings :: url = jar:file:/cassandra/spark2.2.1/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml;org.apache.hadoop#hadoop-aws added as a dependency;org.apache.hadoop#hadoop-client added as a dependency;com.typesafe#config added as a dependency;:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0;	confs: [default];	found org.apache.hadoop#hadoop-aws;2.7.4 in spark-list;	found org.apache.hadoop#hadoop-common;2.7.4 in spark-list;	found org.apache.hadoop#hadoop-annotations;2.7.4 in spark-list;	found com.google.guava#guava;11.0.2 in spark-list;	found com.google.code.findbugs#jsr305;3.0.0 in spark-list;	found commons-cli#commons-cli;1.2 in spark-list;	found org.apache.commons#commons-math3;3.1.1 in spark-list;	found xmlenc#xmlenc;0.52 in spark-list;	found commons-httpclient#commons-httpclient;3.1 in spark-list;	found commons-logging#commons-logging;1.1.3 in spark-list;	found commons-codec#commons-codec;1.4 in spark-list;	found commons-io#commons-io;2.4 in spark-list;	found commons-net#commons-net;3.1 in spark-list;	found commons-collections#commons-collections;3.2.2 in spark-list;	found javax.servlet#servlet-api;2.5 in spark-list;	found org.mortbay.jetty#jetty;6.1.26 in spark-list;	found org.mortbay.jetty#jetty-util;6.1.26 in spark-list
	at io.hydrosphere.mist.master.execution.workers.WorkerRunner$DefaultRunner$$anonfun$continueSetup$1$1.applyOrElse(WorkerRunner.scala:39)
	at io.hydrosphere.mist.master.execution.workers.WorkerRunner$DefaultRunner$$anonfun$continueSetup$1$1.applyOrElse(WorkerRunner.scala:39)
	at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:138)
	at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:136)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
	at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)  
2019-03-27 04:00:09 ERROR ContextFrontend:159 - Ask new worker connection for Context-2 failed  
java.lang.RuntimeException: Process terminated with error java.lang.RuntimeException: Process exited with status code 1 and out: Ivy Default Cache set to: /home/cassandra/.ivy2/cache;The jars for the packages stored in: /home/cassandra/.ivy2/jars;:: loading settings :: url = jar:file:/cassandra/spark2.2.1/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml;org.apache.hadoop#hadoop-aws added as a dependency;org.apache.hadoop#hadoop-client added as a dependency;com.typesafe#config added as a dependency;:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0;	confs: [default];	found org.apache.hadoop#hadoop-aws;2.7.4 in spark-list;	found org.apache.hadoop#hadoop-common;2.7.4 in spark-list;	found org.apache.hadoop#hadoop-annotations;2.7.4 in spark-list;	found com.google.guava#guava;11.0.2 in spark-list;	found com.google.code.findbugs#jsr305;3.0.0 in spark-list;	found commons-cli#commons-cli;1.2 in spark-list;	found org.apache.commons#commons-math3;3.1.1 in spark-list;	found xmlenc#xmlenc;0.52 in spark-list;	found commons-httpclient#commons-httpclient;3.1 in spark-list;	found commons-logging#commons-logging;1.1.3 in spark-list;	found commons-codec#commons-codec;1.4 in spark-list;	found commons-io#commons-io;2.4 in spark-list;	found commons-net#commons-net;3.1 in spark-list;	found commons-collections#commons-collections;3.2.2 in spark-list;	found javax.servlet#servlet-api;2.5 in spark-list;	found org.mortbay.jetty#jetty;6.1.26 in spark-list;	found org.mortbay.jetty#jetty-util;6.1.26 in spark-list
	at io.hydrosphere.mist.master.execution.workers.WorkerRunner$DefaultRunner$$anonfun$continueSetup$1$1.applyOrElse(WorkerRunner.scala:39)
	at io.hydrosphere.mist.master.execution.workers.WorkerRunner$DefaultRunner$$anonfun$continueSetup$1$1.applyOrElse(WorkerRunner.scala:39)
	at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:138)
	at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:136)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
	at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)  
2019-03-27 04:00:09 INFO  ContextFrontend:107 - Context-2 - connected state(active connections: 0, max: 1)  
2019-03-27 04:00:09 INFO  SharedConnector:107 - Pool is empty and we are able to start new one connection: inUse size :0

What is the possible cause? Is it related to some configuration issue? If so, why is it not happening for all jobs?

dos65 (Contributor) commented Mar 29, 2019

There are probably some errors in the context configuration.
Could you check the additional process logs in the logs directory? There should be log files with names like local-worker-$context-name.
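A minimal shell sketch for locating those worker logs, assuming a `logs` directory and the `local-worker-<context-name>` naming hinted at above (the directory path and exact file names are assumptions; adjust them to your Mist installation):

```shell
# find_worker_log: print the worker log file(s) for a given context, if present.
# Assumes files named local-worker-<context>... inside the given log directory.
find_worker_log() {
  local logdir="$1" context="$2"
  ls "$logdir" 2>/dev/null | grep "^local-worker-${context}"
}

# Demo with a throwaway directory standing in for Mist's logs directory.
tmp=$(mktemp -d)
touch "$tmp/local-worker-Context-2.log" "$tmp/mist.log"
find_worker_log "$tmp" "Context-2"   # prints local-worker-Context-2.log
```

Once the file is found, `tail`ing it (and grepping for `ERROR` or `Exception`) should reveal why the worker process exited with status code 1.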

dos65 closed this as completed Jul 30, 2019