Unable to change elasticsearchClusterName #416

Closed
nberthet opened this issue Dec 1, 2015 · 13 comments

@nberthet

nberthet commented Dec 1, 2015

Hi,

I haven't experimented with different combinations yet, but I was trying to deploy the framework using the Marathon JSON example.

I tried to set both the frameworkName and elasticsearchClusterName. The framework started properly, but it was unable to start the Elasticsearch nodes. It seems a ZooKeeper path is being computed wrongly (note the additional /); please see the stack trace I included.

EDIT: after playing around with the options, I can confirm it's only about the elasticsearchClusterName; changing the frameworkName is fine.

EDIT: After verification, it seems the problem happens anyway, whether or not I specify the cluster name / framework name.

My marathon json

{
  "id": "/system/frameworks/elasticsearch-scheduler",
  "container": {
    "docker": {
      "image": "mesos/elasticsearch-scheduler",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 31100, "hostPort": 31100, "protocol": "tcp" }
      ]
    }
  },
  "args": [
    "--zookeeperMesosUrl", "zk://zk1:2181,zk2:2181,zk3:2181/mesos",
    "--frameworkName", "elk-elasticsearch",
    "--elasticsearchClusterName", "elk-elasticsearch"
  ],
  "cpus": 0.2,
  "mem": 512.0,
  "env": {
    "JAVA_OPTS": "-Xms128m -Xmx256m"
  },
  "instances": 1
}

Stacktrace found in the framework logs

[INFO] 2015-12-01 04:18:38,794 org.apache.mesos.elasticsearch.scheduler.state.ClusterState getTaskList - Unable to get key for cluster state due to invalid frameworkID.
java.io.IOException: Unable to get zNode
    at org.apache.mesos.elasticsearch.scheduler.state.SerializableZookeeperState.get(SerializableZookeeperState.java:51)
    at org.apache.mesos.elasticsearch.scheduler.state.ClusterState.getTaskList(ClusterState.java:44)
    at org.apache.mesos.elasticsearch.scheduler.state.ClusterState.getGuiTaskList(ClusterState.java:57)
    at org.apache.mesos.elasticsearch.scheduler.ElasticsearchScheduler.getTasks(ElasticsearchScheduler.java:42)
    at org.apache.mesos.elasticsearch.scheduler.controllers.SearchProxyController.stats(SearchProxyController.java:38)
    at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:137)
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:110)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:776)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:705)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:959)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:893)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:967)
    at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:858)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:622)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:843)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:77)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:85)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1091)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:668)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1521)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1478)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: Failed to get '/elk-elasticsearch/elk-elasticsearch//stateList' in ZooKeeper: bad arguments
    at org.apache.mesos.state.AbstractState$FetchFuture.get(Native Method)
    at org.apache.mesos.state.AbstractState$FetchFuture.get(AbstractState.java:226)
    at org.apache.mesos.elasticsearch.scheduler.state.SerializableZookeeperState.get(SerializableZookeeperState.java:29)
    ... 48 more

@nberthet changed the title from "Unable to change frameworkName and elasticsearchClusterName" to "Unable to change elasticsearchClusterName" on Dec 1, 2015

@philwinder
Contributor

Hi @nberthet, thanks for getting in touch.

Your problem has nothing to do with the CLI parameters; what you have put is correct. The problem is that the framework is unable to register with the master. This is usually caused by not being able to connect to ZooKeeper or to the master. There is an open issue to sanity check the framework state before this method is called, which would prevent this stack trace. But the real issue is the inability to register as a framework. Check IP addresses, hostnames, hostname resolution, DNS, firewall, auth, etc.

#410
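
The doubled slash in the failing ZooKeeper path is consistent with an empty framework ID being joined into the key. Here is a minimal sketch of that effect, assuming the key is laid out as /<frameworkName>/<clusterName>/<frameworkID>/<key>. The layout is inferred from the logged path, and `state_key` and `is_valid_key` are hypothetical names, not the scheduler's actual code:

```python
def state_key(framework_name: str, cluster_name: str, framework_id: str, key: str) -> str:
    """Join the parts into a ZooKeeper-style path (hypothetical layout)."""
    return "/" + "/".join([framework_name, cluster_name, framework_id, key])


def is_valid_key(path: str) -> bool:
    """Reject paths that contain an empty segment such as '//'."""
    return path.startswith("/") and "" not in path[1:].split("/")


# Before registration succeeds the framework ID is still empty, which yields '//'.
unregistered = state_key("elk-elasticsearch", "elk-elasticsearch", "", "stateList")

# With a non-empty ID (an example value in the Mesos framework ID format) the path is well formed.
registered = state_key(
    "elk-elasticsearch",
    "elk-elasticsearch",
    "a64dbeff-b981-49f7-ad32-c8e13c00e18d-0005",
    "stateList",
)

print(unregistered, is_valid_key(unregistered))
print(registered, is_valid_key(registered))
```

A guard like `is_valid_key` is only one way to surface the problem earlier; the real cause remains the missing registration.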

Thanks, Phil

@nberthet
Author

nberthet commented Dec 1, 2015

Hi @philwinder,

thanks for the quick heads up.

Actually, I schedule the framework from Marathon and the task starts "properly". After that, I can see the framework registered in Mesos; that is where I accessed the framework web UI from.

Also, according to netstat, I can see connections established to both the Mesos master and ZooKeeper:

CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS              PORTS                      NAMES
a6f9b340903c        mesos/elasticsearch-scheduler   "/tmp/start-scheduler"   17 minutes ago      Up 17 minutes       0.0.0.0:31100->31100/tcp   mesos-adce775b-a017-4309-8208-20440bce5d76-S22.b39bfdd4-b177-42d7-bfa6-f6ed56212c1b
docker exec a6 netstat -pan | grep EST
tcp        0      0 172.17.0.96:43552       192.168.118.22:2181     ESTABLISHED 5/java          
tcp        0      0 172.17.0.96:43548       192.168.118.22:2181     ESTABLISHED 5/java          
tcp        0      0 172.17.0.96:49693       192.168.118.20:5050     ESTABLISHED 5/java 

Anything else I can look at?

@philwinder
Contributor

Because you started it from Marathon, Marathon will always keep the scheduler alive. Just because the scheduler is "on" doesn't mean it is working. All I can say is that the scheduler can't register the Elasticsearch framework for some reason. As to why, there could be a number of reasons, and each user will have different problems. It is usually hostname resolution that causes the problem.

Ah, I see that you are running the scheduler in "BRIDGE" mode. Are you sure your master can route to your scheduler? Have you got a dns/service discovery mechanism set up so that the master can obtain the correct IP address of the scheduler? If not, and I suspect not, then try running the scheduler in HOST mode, so that the scheduler takes the IP address of the slave it is running on.
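
As a rough illustration of the suggested change, the sketch below rewrites a Marathon app definition from BRIDGE to HOST networking. The helper name `to_host_network` is hypothetical, and it assumes port mappings are not needed in HOST mode:

```python
import copy
import json


def to_host_network(app: dict) -> dict:
    """Return a copy of a Marathon app definition that uses Docker HOST networking."""
    result = copy.deepcopy(app)
    docker = result["container"]["docker"]
    docker["network"] = "HOST"
    # Assumption: port mappings are a BRIDGE-mode setting, so drop them for HOST mode.
    docker.pop("portMappings", None)
    return result


bridge_app = {
    "id": "/system/frameworks/elasticsearch-scheduler",
    "container": {
        "docker": {
            "image": "mesos/elasticsearch-scheduler",
            "network": "BRIDGE",
            "portMappings": [
                {"containerPort": 31100, "hostPort": 31100, "protocol": "tcp"}
            ],
        }
    },
}

host_app = to_host_network(bridge_app)
print(json.dumps(host_app, indent=2))
```

The original definition is left untouched, so both variants can be compared side by side before redeploying.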

Thanks, Phil

@nberthet
Author

nberthet commented Dec 1, 2015

Hi @philwinder,

I can confirm that running with HOST works. I'm just trying to understand why, because the port mapping was correct and all the config was done with IPs rather than hostnames.

What connectivity (i.e. ports) is required between the framework and the Mesos master, and what is being advertised?

@philwinder
Contributor

OK great. So the scheduler needs to communicate with the master. The scheduler requests to register the framework, then the master acknowledges this. This is done asynchronously, so the scheduler needs to be routable from the master. To do that, the scheduler must advertise itself on the correct hostname/IP address, which is normally taken from the mesos-agent hostname. Unfortunately, that is the hostname of the agent machine, and not of the scheduler.

So in short, you would need to have a proxy running on the agent to proxy the traffic to the scheduler, although I haven't tested this.

I will add a task to update the docs to recommend host mode, as BRIDGE mode is probably more trouble than it's worth for the scheduler. Thanks.
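
One way to test the "routable from the master" requirement is a plain TCP connect from the master host to the address the scheduler advertises. This is a minimal sketch; `can_connect` is a hypothetical helper, and a successful TCP connect is only a necessary condition, not proof that registration will succeed:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hostnames.
        return False
```

Run it on the master with the scheduler's advertised address and port, for example `can_connect("<scheduler-ip>", <scheduler-port>)`, using the values the master prints for the scheduler in its own log.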

@nberthet
Author

nberthet commented Dec 2, 2015

Hi @philwinder,

Thanks for your help. I'll give the bridged configuration another shot whenever I have some time to experiment.

At least for the time being everything is peachy

@zoza1982

I am running in host mode and getting the same error. The log clearly says at the beginning that it successfully connected to ZooKeeper.

Mesos version: 0.25.0-0.2.70.ubuntu1404 (tried 0.26 at first, then downgraded to what you guys tested on)

es.json:

cloud-user@mesos-master:~$ cat es.json 
{
  "id": "/system/frameworks/elasticsearch-scheduler",
  "container": {
    "docker": {
      "image": "mesos/elasticsearch-scheduler",
      "network": "HOST"
    }
  },
  "args": [
    "--zookeeperMesosUrl", "zk://192.168.3.156:2181,192.168.3.157:2181,192.168.3.158:2181/mesos",
    "--frameworkName", "elk-elasticsearch",
    "--elasticsearchClusterName", "webex"
  ],
  "cpus": 0.2,
  "mem": 512.0,
  "env": {
    "JAVA_OPTS": "-Xms128m -Xmx256m"
  },
  "instances": 1
}
cloud-user@mesos-master:~$ 

Here is the full log from start of the container till the error:

cloud-user@mesos-slave-9590a52c-077b-4d7c-8fa2-9667f73e3416:~$ sudo docker logs mesos-a64dbeff-b981-49f7-ad32-c8e13c00e18d-S0.c8b95d4e-15d2-4ad5-87c9-cf72ce8bc29c | more
2016-01-12 06:25:15,482:5(0x7f615bba8700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
[INFO] 2016-01-12 06:25:15,429 org.apache.mesos.elasticsearch.scheduler.Configuration getMesosStateZKURL - Zookeeper framework option is blank, using Zookeeper for Mesos: zk://192.168.3.156:2181,192.168.3.157:2181,192.168.3.158:2181/mesos
2016-01-12 06:25:15,482:5(0x7f615bba8700):ZOO_INFO@log_env@716: Client environment:host.name=mesos-slave-9590a52c-077b-4d7c-8fa2-9667f73e3416
2016-01-12 06:25:15,482:5(0x7f615bba8700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2016-01-12 06:25:15,482:5(0x7f615bba8700):ZOO_INFO@log_env@724: Client environment:os.arch=3.13.0-68-generic
2016-01-12 06:25:15,483:5(0x7f615bba8700):ZOO_INFO@log_env@725: Client environment:os.version=#111-Ubuntu SMP Fri Nov 6 18:17:06 UTC 2015
2016-01-12 06:25:15,483:5(0x7f615bba8700):ZOO_INFO@log_env@733: Client environment:user.name=(null)
2016-01-12 06:25:15,483:5(0x7f615bba8700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2016-01-12 06:25:15,483:5(0x7f615bba8700):ZOO_INFO@log_env@753: Client environment:user.dir=/
2016-01-12 06:25:15,483:5(0x7f615bba8700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=192.168.3.156:2181,192.168.3.157:2181,192.168.3.158:2181 sessionTimeout=20000 watcher=0x7f6192225600 sessionId=0 sessionPasswd=<null> context=0x7f61440031d0 flags=0
2016-01-12 06:25:15,487:5(0x7f6152ffd700):ZOO_INFO@check_events@1703: initiated connection to server [192.168.3.156:2181]
2016-01-12 06:25:15,492:5(0x7f6152ffd700):ZOO_INFO@check_events@1750: session establishment complete on server [192.168.3.156:2181], sessionId=0x152346d1a500004, negotiated timeout=20000
SLF4J: Class path contains multiple SLF4J bindings.
[DEBUG] 2016-01-12 06:25:15,516 org.apache.mesos.elasticsearch.scheduler.FrameworkInfoFactory setWebuiUrl - Setting webuiUrl to http://mesos-slave-9590a52c-077b-4d7c-8fa2-9667f73e3416:31100
SLF4J: Found binding in [jar:file:/tmp/elasticsearch-mesos-scheduler.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/tmp/elasticsearch-mesos-scheduler.jar!/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] 2016-01-12 06:25:15,810 org.apache.mesos.elasticsearch.scheduler.Main logStarting - Starting Main v0.5.2 on mesos-slave-9590a52c-077b-4d7c-8fa2-9667f73e3416 with PID 5 (/tmp/elasticsearch-mesos-scheduler.jar started by root in /)
[DEBUG] 2016-01-12 06:25:15,811 org.apache.mesos.elasticsearch.scheduler.Main logStarting - Running with Spring Boot v0.5.2, Spring v0.5.2
[DEBUG] 2016-01-12 06:25:16,629 org.jboss.logging make - Logging Provider: org.jboss.logging.Log4jLoggerProvider
[INFO] 2016-01-12 06:25:16,630 org.hibernate.validator.internal.util.Version <clinit> - HV000001: Hibernate Validator 5.1.3.Final
[DEBUG] 2016-01-12 06:25:16,637 org.hibernate.validator.internal.engine.resolver.DefaultTraversableResolver detectJPA - Cannot find javax.persistence.Persistence on classpath. Assuming non JPA 2 environment. All properties will per default be traversable.
[DEBUG] 2016-01-12 06:25:16,643 org.hibernate.validator.internal.engine.ConfigurationImpl messageInterpolator - Setting custom MessageInterpolator of type org.springframework.validation.beanvalidation.LocaleContextMessageInterpolator
[DEBUG] 2016-01-12 06:25:16,644 org.hibernate.validator.internal.engine.ConfigurationImpl constraintValidatorFactory - Setting custom ConstraintValidatorFactory of type org.springframework.validation.beanvalidation.SpringConstraintValidatorFactory
[DEBUG] 2016-01-12 06:25:16,645 org.hibernate.validator.internal.engine.ConfigurationImpl parameterNameProvider - Setting custom ParameterNameProvider of type com.sun.proxy.$Proxy25
[DEBUG] 2016-01-12 06:25:16,650 org.hibernate.validator.internal.xml.ValidationXmlParser getInputStream - Trying to load META-INF/validation.xml for XML based Validator configuration.
[DEBUG] 2016-01-12 06:25:16,657 org.hibernate.validator.internal.xml.ValidationXmlParser getInputStream - No META-INF/validation.xml found. Using annotation based configuration only.
[INFO] 2016-01-12 06:25:17,191 org.apache.catalina.core.StandardService log - Starting service Tomcat
[INFO] 2016-01-12 06:25:17,191 org.apache.catalina.core.StandardEngine log - Starting Servlet Engine: Apache Tomcat/8.0.23
[INFO] 2016-01-12 06:25:17,375 org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] log - Initializing Spring embedded WebApplicationContext
[DEBUG] 2016-01-12 06:25:18,174 org.hibernate.validator.internal.engine.resolver.DefaultTraversableResolver detectJPA - Cannot find javax.persistence.Persistence on classpath. Assuming non JPA 2 environment. All properties will per default be traversable.
[DEBUG] 2016-01-12 06:25:18,175 org.hibernate.validator.internal.engine.ConfigurationImpl messageInterpolator - Setting custom MessageInterpolator of type org.springframework.validation.beanvalidation.LocaleContextMessageInterpolator
[DEBUG] 2016-01-12 06:25:18,176 org.hibernate.validator.internal.engine.ConfigurationImpl constraintValidatorFactory - Setting custom ConstraintValidatorFactory of type org.springframework.validation.beanvalidation.SpringConstraintValidatorFactory
[DEBUG] 2016-01-12 06:25:18,176 org.hibernate.validator.internal.engine.ConfigurationImpl parameterNameProvider - Setting custom ParameterNameProvider of type com.sun.proxy.$Proxy25
[DEBUG] 2016-01-12 06:25:18,177 org.hibernate.validator.internal.xml.ValidationXmlParser getInputStream - Trying to load META-INF/validation.xml for XML based Validator configuration.
2016-01-12 06:25:18,461:5(0x7f6153fff700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2016-01-12 06:25:18,461:5(0x7f6153fff700):ZOO_INFO@log_env@716: Client environment:host.name=mesos-slave-9590a52c-077b-4d7c-8fa2-9667f73e3416
2016-01-12 06:25:18,462:5(0x7f6153fff700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
I0112 06:25:18.461844    20 sched.cpp:164] Version: 0.25.0
2016-01-12 06:25:18,462:5(0x7f6153fff700):ZOO_INFO@log_env@724: Client environment:os.arch=3.13.0-68-generic
2016-01-12 06:25:18,462:5(0x7f6153fff700):ZOO_INFO@log_env@725: Client environment:os.version=#111-Ubuntu SMP Fri Nov 6 18:17:06 UTC 2015
2016-01-12 06:25:18,462:5(0x7f6153fff700):ZOO_INFO@log_env@733: Client environment:user.name=(null)
2016-01-12 06:25:18,462:5(0x7f6153fff700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2016-01-12 06:25:18,462:5(0x7f6153fff700):ZOO_INFO@log_env@753: Client environment:user.dir=/
2016-01-12 06:25:18,462:5(0x7f6153fff700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=192.168.3.156:2181,192.168.3.157:2181,192.168.3.158:2181 sessionTimeout=10000 watcher=0x7f6192225600 sessionId=0 sessionPasswd=<null> context=0x7f6134000db0 flags=0
2016-01-12 06:25:18,468:5(0x7f6130ffe700):ZOO_INFO@check_events@1703: initiated connection to server [192.168.3.158:2181]
2016-01-12 06:25:18,470:5(0x7f6130ffe700):ZOO_INFO@check_events@1750: session establishment complete on server [192.168.3.158:2181], sessionId=0x352346d16640004, negotiated timeout=10000
I0112 06:25:18.470600    22 group.cpp:331] Group process (group(1)@127.0.1.1:36615) connected to ZooKeeper
I0112 06:25:18.470690    22 group.cpp:805] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0112 06:25:18.470741    22 group.cpp:403] Trying to create path '/mesos' in ZooKeeper
I0112 06:25:18.472239    22 detector.cpp:156] Detected a new leader: (id='1')
I0112 06:25:18.472376    28 group.cpp:674] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
I0112 06:25:18.473311    28 detector.cpp:481] A new leading master (UPID=master@192.168.3.159:5050) is detected
I0112 06:25:18.473372    28 sched.cpp:262] New master detected at master@192.168.3.159:5050
I0112 06:25:18.473510    28 sched.cpp:272] No credentials provided. Attempting to register without authentication
[DEBUG] 2016-01-12 06:25:18,178 org.hibernate.validator.internal.xml.ValidationXmlParser getInputStream - No META-INF/validation.xml found. Using annotation based configuration only.
[INFO] 2016-01-12 06:25:18,431 org.apache.coyote.http11.Http11NioProtocol log - Initializing ProtocolHandler ["http-nio-31100"]
[INFO] 2016-01-12 06:25:18,440 org.apache.coyote.http11.Http11NioProtocol log - Starting ProtocolHandler ["http-nio-31100"]
[INFO] 2016-01-12 06:25:18,445 org.apache.tomcat.util.net.NioSelectorPool log - Using a shared selector for servlet write/read
[INFO] 2016-01-12 06:25:18,460 org.apache.mesos.elasticsearch.scheduler.Main logStarted - Started Main in 2.834 seconds (JVM running for 3.437)
[INFO] 2016-01-12 06:25:18,461 org.apache.mesos.elasticsearch.scheduler.Configuration getFrameworkZKURL - Zookeeper framework option is blank, using Zookeeper for Mesos: zk://192.168.3.156:2181,192.168.3.157:2181,192.168.3.158:2181/mesos
[INFO] 2016-01-12 06:25:18,461 class org.apache.mesos.elasticsearch.scheduler.ElasticsearchScheduler run - Starting ElasticSearch on Mesos - [numHwNodes: 3, zk mesos: zk://192.168.3.156:2181,192.168.3.157:2181,192.168.3.158:2181/mesos, zk framework: 192.168.3.156:2181,192.168.3.157:2181,192.168.3.158:2181, ram:256.0]
[INFO] 2016-01-12 06:25:19,897 org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] log - Initializing Spring FrameworkServlet 'dispatcherServlet'
[INFO] 2016-01-12 06:25:19,934 org.apache.mesos.elasticsearch.scheduler.state.ClusterState getTaskList - Unable to get key for cluster state due to invalid frameworkID.
java.io.IOException: Unable to get zNode
    at org.apache.mesos.elasticsearch.scheduler.state.SerializableZookeeperState.get(SerializableZookeeperState.java:51)
    at org.apache.mesos.elasticsearch.scheduler.state.ClusterState.getTaskList(ClusterState.java:44)
    at org.apache.mesos.elasticsearch.scheduler.state.ClusterState.getGuiTaskList(ClusterState.java:57)
    at org.apache.mesos.elasticsearch.scheduler.ElasticsearchScheduler.getTasks(ElasticsearchScheduler.java:42)
    at org.apache.mesos.elasticsearch.scheduler.controllers.SearchProxyController.stats(SearchProxyController.java:38)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:137)
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:110)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:776)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:705)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:959)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:893)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:967)
    at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:858)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:622)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:843)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:77)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:85)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1091)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:668)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1521)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1478)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: Failed to get '/elk-elasticsearch/webex//stateList' in ZooKeeper: bad arguments

Any ideas, @philwinder?
Thanks

@zoza1982

I was looking at #411. Could it be because I don't use hostnames at all? My config is only IP based. See above.

@philwinder
Contributor

Hi @zoza1982
As in the comments above, you are getting this error because the framework cannot register with the master. What is causing this is specific to your system configuration.

In the upcoming release, which should be landing within the next couple of days, I have added some sanity checks to ensure the framework is registered. This will remove this obscure ZooKeeper error and replace it with a simpler "the framework cannot register with the master" type message.

I can't see from your logs what your specific problem is. I would recommend waiting a couple of days and trying the new 0.7.0 version. If that doesn't work, then we can look again. Unfortunately, the framework registration code is an async callback from Mesos, so it's hard to know why it hasn't registered. The only thing you can do is trawl through the master logs and try to find an error message that relates to the ES framework.
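
Trawling the master logs can be narrowed down with a small filter. The sketch below keeps only warning, error, and fatal lines that mention a given framework name, assuming the glog-style prefix (severity letter followed by MMDD) seen in Mesos log output. The function name `framework_problems` is hypothetical, and the sample lines are Mesos master log lines quoted elsewhere in this thread:

```python
import re

# glog-style prefix: severity letter (I, W, E, F) followed by MMDD and a space.
SEVERITY = re.compile(r"^([IWEF])\d{4} ")


def framework_problems(log_lines, framework_name):
    """Return warning, error, and fatal lines that mention the framework name."""
    problems = []
    for line in log_lines:
        match = SEVERITY.match(line)
        if match and match.group(1) in "WEF" and framework_name in line:
            problems.append(line)
    return problems


sample_log = [
    "I0112 17:11:40.827235  3088 master.cpp:2179] Received SUBSCRIBE call for framework 'elasticsearch' at scheduler-a95171e8-e9db-48b1-9065-553e630645a2@127.0.1.1:53203",
    "W0112 17:11:40.827325  3088 master.hpp:1532] Master attempted to send message to disconnected framework a64dbeff-b981-49f7-ad32-c8e13c00e18d-0003 (elasticsearch) at scheduler-a95171e8-e9db-48b1-9065-553e630645a2@127.0.1.1:53203",
    "E0112 17:11:40.827643  3089 socket.hpp:174] Shutdown failed on fd=30: Transport endpoint is not connected [107]",
]

for hit in framework_problems(sample_log, "elasticsearch"):
    print(hit)
```

Lines that do not name the framework, such as the socket error in the sample, are skipped by this filter, so it is worth reading the neighbouring lines of each hit as well.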

@zoza1982

Thank you, @philwinder. I will wait for the 0.7 release and then come back here.
Can you tell me what you meant by required DNS records in #411? I wonder if it has something to do with that.

@philwinder
Contributor

To cut a long story short, if you are using hostnames, then the hostnames need to be resolvable. Otherwise Mesos and ES won't know the address behind the hostname.

In 0.7 we've added the option to overwrite the hostnames with IP addresses, like Mesos can. This might help your situation.

But either way, I would recommend always thinking about your network. E.g. "is that routable, is that on a different subnet, is that hostname resolvable", etc. It is the hardest thing to get right in clusters, especially when using docker.
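
The "is that hostname resolvable" question can be checked on each node with the standard resolver. This is a minimal sketch; `resolve` is a hypothetical helper that only looks at IPv4 results:

```python
import socket


def resolve(hostname: str):
    """Return the sorted IPv4 addresses a hostname resolves to, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

Running `resolve(socket.gethostname())` on an agent shows which address that machine's own hostname maps to, and it is worth comparing the result with the address other nodes use to reach it.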

@zoza1982

Here is my case:

  1. Everything is in the same subnet (no ACL/firewall).
  2. Definitely routable (tested it manually too).
  3. I don't use hostnames, just IPs (hence my question above). ***
  4. Running the container in the host network.

I just shrunk the cluster to 1 Mesos master and 1 slave to narrow down the troubleshooting, and manually defined hostnames in /etc/hosts on both. What names do you suggest I put? mesos.master? zookeepers?

@zoza1982

I fixed it! Here is the issue.

By looking very closely at the master logs, I found something weird.

This keeps looping:

I0112 17:11:40.827235  3088 master.cpp:2179] Received SUBSCRIBE call for framework 'elasticsearch' at scheduler-a95171e8-e9db-48b1-9065-553e630645a2@127.0.1.1:53203
I0112 17:11:40.827296  3088 master.cpp:2250] Subscribing framework elasticsearch with checkpointing enabled and capabilities [  ]
I0112 17:11:40.827309  3088 master.cpp:2260] Framework a64dbeff-b981-49f7-ad32-c8e13c00e18d-0003 (elasticsearch) at scheduler-a95171e8-e9db-48b1-9065-553e630645a2@127.0.1.1:53203 already subscribed, resending acknowledgement
W0112 17:11:40.827325  3088 master.hpp:1532] Master attempted to send message to disconnected framework a64dbeff-b981-49f7-ad32-c8e13c00e18d-0003 (elasticsearch) at scheduler-a95171e8-e9db-48b1-9065-553e630645a2@127.0.1.1:53203
E0112 17:11:40.827643  3089 socket.hpp:174] Shutdown failed on fd=30: Transport endpoint is not connected [107]
I0112 17:11:43.330909  3087 master.cpp:4967] Sending 1 offers to framework a64dbeff-b981-49f7-ad32-c8e13c00e18d-0000 (marathon) at scheduler-d3606a2c-3815-4c61-bced-37ffe309f503@127.0.1.1:49789
I0112 17:11:43.334976  3085 master.cpp:3300] Processing DECLINE call for offers: [ a64dbeff-b981-49f7-ad32-c8e13c00e18d-O18705 ] for framework a64dbeff-b981-49f7-ad32-c8e13c00e18d-0000 (marathon) at scheduler-d3606a2c-3815-4c61-bced-37ffe309f503@127.0.1.1:49789
I0112 17:11:43.335073  3085 hierarchical.hpp:1103] Recovered cpus(*):3.8; mem(*):6449; disk(*):42827; ports(*):[31000-31676, 31678-32000] (total: cpus(*):4; mem(*):6961; disk(*):42827; ports(*):[31000-32000], allocated: cpus(*):0.2; mem(*):512; ports(*):[31677-31677]) on slave a64dbeff-b981-49f7-ad32-c8e13c00e18d-S0 from framework a64dbeff-b981-49f7-ad32-c8e13c00e18d-0000

Notice that from the master's perspective the Elasticsearch framework is at IP address 127.0.1.1:53203. That can't be right; it should be the slave's IP address.
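
This symptom can be spotted mechanically by pulling the scheduler addresses out of the master log and flagging loopback ones. This is a minimal sketch, assuming the `scheduler-<uuid>@<ip>:<port>` form shown in the log lines in this thread. The name `scheduler_addresses` is hypothetical:

```python
import ipaddress
import re

# Matches the address part of a scheduler UPID such as scheduler-<uuid>@127.0.1.1:53203.
UPID_ADDRESS = re.compile(r"scheduler-[0-9a-f-]+@(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def scheduler_addresses(log_lines):
    """Return the set of (ip, is_loopback) pairs for scheduler UPIDs found in the log."""
    found = set()
    for line in log_lines:
        for ip, _port in UPID_ADDRESS.findall(line):
            found.add((ip, ipaddress.ip_address(ip).is_loopback))
    return found


master_log = [
    "I0112 17:11:40.827235  3088 master.cpp:2179] Received SUBSCRIBE call for framework 'elasticsearch' at scheduler-a95171e8-e9db-48b1-9065-553e630645a2@127.0.1.1:53203",
    "I0112 17:26:25.314343  3081 master.cpp:2618] Processing REQUEST call for framework a64dbeff-b981-49f7-ad32-c8e13c00e18d-0005 (elasticsearch) at scheduler-942bf412-66c3-404f-b376-c6d602522529@192.168.3.160:53164",
]

for ip, is_loopback in sorted(scheduler_addresses(master_log)):
    print(ip, "loopback" if is_loopback else "ok")
```

A loopback address here means the master would try to answer the scheduler on its own loopback interface, which matches the "disconnected framework" warning.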

So I went to /etc/hosts on the slave and saw a very odd line, which was there by default (Ubuntu):

127.0.1.1  mesos-slave-9590a52c-077b-4d7c-8fa2-9667f73e3416.cisco.com        mesos-slave-9590a52c-077b-4d7c-8fa2-9667f73e3416 mesos-slave

So I corrected it to:

127.0.1.1  localhost
192.168.3.160  mesos-slave-9590a52c-077b-4d7c-8fa2-9667f73e3416.cisco.com        mesos-slave-9590a52c-077b-4d7c-8fa2-9667f73e3416 mesos-slave

And in the Mesos master logs I now see :-)

I0112 17:26:25.314343  3081 master.cpp:2618] Processing REQUEST call for framework a64dbeff-b981-49f7-ad32-c8e13c00e18d-0005 (elasticsearch) at scheduler-942bf412-66c3-404f-b376-c6d602522529@192.168.3.160:53164

And everything else started working. 👍
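
To check other nodes for the same /etc/hosts problem, a small parser can report whether a hostname is mapped to a loopback address. This is a minimal sketch with a hypothetical helper `loopback_entries`; the two sample texts are the before and after entries from this comment:

```python
import ipaddress


def loopback_entries(hosts_text: str, hostname: str):
    """Return the loopback addresses that /etc/hosts-style text assigns to hostname."""
    hits = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        address, *names = line.split()
        if hostname not in names:
            continue
        try:
            if ipaddress.ip_address(address).is_loopback:
                hits.append(address)
        except ValueError:
            continue  # first field is not an IP address; skip the malformed line
    return hits


before = (
    "127.0.1.1  mesos-slave-9590a52c-077b-4d7c-8fa2-9667f73e3416.cisco.com"
    "        mesos-slave-9590a52c-077b-4d7c-8fa2-9667f73e3416 mesos-slave\n"
)
after = (
    "127.0.1.1  localhost\n"
    "192.168.3.160  mesos-slave-9590a52c-077b-4d7c-8fa2-9667f73e3416.cisco.com"
    "        mesos-slave-9590a52c-077b-4d7c-8fa2-9667f73e3416 mesos-slave\n"
)

print(loopback_entries(before, "mesos-slave"))
print(loopback_entries(after, "mesos-slave"))
```

On a real node the inputs would be the contents of /etc/hosts and the value of `socket.gethostname()`.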
