This repository has been archived by the owner. It is now read-only.

Plugin does not reattempt discovery after UnknownHostException in KubernetesClient #70

Closed
das-vinculum opened this Issue Jan 27, 2017 · 2 comments

das-vinculum commented Jan 27, 2017

When using Elasticsearch 2.3.5 with the matching version of the plugin, the DNS name of the Kubernetes API (kubernetes.default.svc) is not resolved on the first attempt, which causes the node to fail to join the cluster. Unfortunately, the node neither reattempts joining the cluster nor shuts down in that case.

After manually restarting the pod, it recovers: the DNS name resolves and the node joins the existing cluster.

This is the corresponding log.

[2017-01-27 09:55:00,649][INFO ][node                     ] [elasticsearch-2] version[2.3.5], pid[1], build[90f439f/2016-07-27T10:36:52Z]
[2017-01-27 09:55:00,650][INFO ][node                     ] [elasticsearch-2] initializing ...
[2017-01-27 09:55:02,923][INFO ][plugins                  ] [elasticsearch-2] modules [reindex, lang-expression, lang-groovy], plugins [kopf, cloud-kubernetes], sites [kopf]
[2017-01-27 09:55:03,034][INFO ][env                      ] [elasticsearch-2] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/xvdba)]], net usable_space [934gb], net total_space [984.1gb], spins? [possibly], types [ext4]
[2017-01-27 09:55:03,034][INFO ][env                      ] [elasticsearch-2] heap size [990.7mb], compressed ordinary object pointers [true]
[2017-01-27 09:55:12,144][INFO ][node                     ] [elasticsearch-2] initialized
[2017-01-27 09:55:12,144][INFO ][node                     ] [elasticsearch-2] starting ...
[2017-01-27 09:55:12,627][INFO ][transport                ] [elasticsearch-2] publish_address {10.10.81.3:9300}, bound_addresses {0.0.0.0:9300}
[2017-01-27 09:55:12,632][INFO ][discovery                ] [elasticsearch-2] graylog2/gGarC_0CTri6gLvNI4bUWQ
[2017-01-27 09:55:23,937][WARN ][io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider] [elasticsearch-2] Exception caught during discovery: An error has occurred.
io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:57)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:125)
	at io.fabric8.elasticsearch.cloud.kubernetes.KubernetesAPIServiceImpl.endpoints(KubernetesAPIServiceImpl.java:35)
	at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.readNodes(KubernetesUnicastHostsProvider.java:112)
	at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.lambda$buildDynamicNodes$0(KubernetesUnicastHostsProvider.java:80)
	at java.security.AccessController.doPrivileged(Native Method)
	at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.buildDynamicNodes(KubernetesUnicastHostsProvider.java:79)
	at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:335)
	at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.ping(UnicastZenPing.java:240)
	at org.elasticsearch.discovery.zen.ping.ZenPingService.ping(ZenPingService.java:106)
	at org.elasticsearch.discovery.zen.ping.ZenPingService.pingAndWait(ZenPingService.java:84)
	at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:886)
	at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:350)
	at org.elasticsearch.discovery.zen.ZenDiscovery.access$4800(ZenDiscovery.java:91)
	at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1237)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Name or service not known
	at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
	at java.net.InetAddress.getAllByName(InetAddress.java:1192)
	at java.net.InetAddress.getAllByName(InetAddress.java:1126)
	at com.squareup.okhttp.Dns$1.lookup(Dns.java:39)
	at com.squareup.okhttp.internal.http.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:175)
	at com.squareup.okhttp.internal.http.RouteSelector.nextProxy(RouteSelector.java:141)
	at com.squareup.okhttp.internal.http.RouteSelector.next(RouteSelector.java:83)
	at com.squareup.okhttp.internal.http.StreamAllocation.findConnection(StreamAllocation.java:174)
	at com.squareup.okhttp.internal.http.StreamAllocation.findHealthyConnection(StreamAllocation.java:126)
	at com.squareup.okhttp.internal.http.StreamAllocation.newStream(StreamAllocation.java:95)
	at com.squareup.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:281)
	at com.squareup.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:224)
	at com.squareup.okhttp.Call.getResponse(Call.java:286)
	at com.squareup.okhttp.Call$ApplicationInterceptorChain.proceed(Call.java:243)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils$3.intercept(HttpClientUtils.java:110)
	at com.squareup.okhttp.Call$ApplicationInterceptorChain.proceed(Call.java:232)
	at com.squareup.okhttp.Call.getResponseWithInterceptorChain(Call.java:205)
	at com.squareup.okhttp.Call.execute(Call.java:80)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:210)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:205)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:510)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:118)
	... 16 more
[2017-01-27 09:55:25,526][WARN ][io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider] [elasticsearch-2] Exception caught during discovery: An error has occurred.
io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:57)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:125)
	at io.fabric8.elasticsearch.cloud.kubernetes.KubernetesAPIServiceImpl.endpoints(KubernetesAPIServiceImpl.java:35)
	at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.readNodes(KubernetesUnicastHostsProvider.java:112)
	at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.lambda$buildDynamicNodes$0(KubernetesUnicastHostsProvider.java:80)
	at java.security.AccessController.doPrivileged(Native Method)
	at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.buildDynamicNodes(KubernetesUnicastHostsProvider.java:79)
	at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:335)
	at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$2.doRun(UnicastZenPing.java:249)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: kubernetes.default.svc
	at java.net.InetAddress.getAllByName0(InetAddress.java:1280)
	at java.net.InetAddress.getAllByName(InetAddress.java:1192)
	at java.net.InetAddress.getAllByName(InetAddress.java:1126)
	at com.squareup.okhttp.Dns$1.lookup(Dns.java:39)
	at com.squareup.okhttp.internal.http.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:175)
	at com.squareup.okhttp.internal.http.RouteSelector.nextProxy(RouteSelector.java:141)
	at com.squareup.okhttp.internal.http.RouteSelector.next(RouteSelector.java:83)
	at com.squareup.okhttp.internal.http.StreamAllocation.findConnection(StreamAllocation.java:174)
	at com.squareup.okhttp.internal.http.StreamAllocation.findHealthyConnection(StreamAllocation.java:126)
	at com.squareup.okhttp.internal.http.StreamAllocation.newStream(StreamAllocation.java:95)
	at com.squareup.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:281)
	at com.squareup.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:224)
	at com.squareup.okhttp.Call.getResponse(Call.java:286)
	at com.squareup.okhttp.Call$ApplicationInterceptorChain.proceed(Call.java:243)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils$3.intercept(HttpClientUtils.java:110)
	at com.squareup.okhttp.Call$ApplicationInterceptorChain.proceed(Call.java:232)
	at com.squareup.okhttp.Call.getResponseWithInterceptorChain(Call.java:205)
	at com.squareup.okhttp.Call.execute(Call.java:80)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:210)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:205)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:510)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:118)
	... 11 more
[2017-01-27 09:55:27,030][WARN ][io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider] [elasticsearch-2] Exception caught during discovery: An error has occurred.
io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:57)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:125)
	at io.fabric8.elasticsearch.cloud.kubernetes.KubernetesAPIServiceImpl.endpoints(KubernetesAPIServiceImpl.java:35)
	at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.readNodes(KubernetesUnicastHostsProvider.java:112)
	at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.lambda$buildDynamicNodes$0(KubernetesUnicastHostsProvider.java:80)
	at java.security.AccessController.doPrivileged(Native Method)
	at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.buildDynamicNodes(KubernetesUnicastHostsProvider.java:79)
	at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:335)
	at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$2$1.doRun(UnicastZenPing.java:253)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: kubernetes.default.svc
	at java.net.InetAddress.getAllByName0(InetAddress.java:1280)
	at java.net.InetAddress.getAllByName(InetAddress.java:1192)
	at java.net.InetAddress.getAllByName(InetAddress.java:1126)
	at com.squareup.okhttp.Dns$1.lookup(Dns.java:39)
	at com.squareup.okhttp.internal.http.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:175)
	at com.squareup.okhttp.internal.http.RouteSelector.nextProxy(RouteSelector.java:141)
	at com.squareup.okhttp.internal.http.RouteSelector.next(RouteSelector.java:83)
	at com.squareup.okhttp.internal.http.StreamAllocation.findConnection(StreamAllocation.java:174)
	at com.squareup.okhttp.internal.http.StreamAllocation.findHealthyConnection(StreamAllocation.java:126)
	at com.squareup.okhttp.internal.http.StreamAllocation.newStream(StreamAllocation.java:95)
	at com.squareup.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:281)
	at com.squareup.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:224)
	at com.squareup.okhttp.Call.getResponse(Call.java:286)
	at com.squareup.okhttp.Call$ApplicationInterceptorChain.proceed(Call.java:243)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils$3.intercept(HttpClientUtils.java:110)
	at com.squareup.okhttp.Call$ApplicationInterceptorChain.proceed(Call.java:232)
	at com.squareup.okhttp.Call.getResponseWithInterceptorChain(Call.java:205)
	at com.squareup.okhttp.Call.execute(Call.java:80)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:210)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:205)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:510)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:118)
	... 11 more
[2017-01-27 09:55:27,135][INFO ][cluster.service          ] [elasticsearch-2] new_master {elasticsearch-2}{gGarC_0CTri6gLvNI4bUWQ}{10.10.81.3}{10.10.81.3:9300}{master=true}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2017-01-27 09:55:27,148][INFO ][http                     ] [elasticsearch-2] publish_address {10.10.81.3:9200}, bound_addresses {0.0.0.0:9200}
[2017-01-27 09:55:27,149][INFO ][node                     ] [elasticsearch-2] started
[2017-01-27 09:55:27,235][INFO ][gateway                  ] [elasticsearch-2] recovered [0] indices into cluster_state

Thanks for the plugin, it saves us a lot of time.

pires commented Jan 27, 2017

I don't think it's the responsibility of the plug-in to retry, but rather of the discovery engine. Is there anything to the contrary?

das-vinculum commented Jan 27, 2017

I am not sure. The plugin returns an empty list and does not throw the exception.

I found out that the unicast hosts providers in Elasticsearch for AWS, Azure and GCE do it the same way. See the GceUnicastHostsProvider, for example: https://github.com/elastic/elasticsearch/blob/2.3/plugins/cloud-gce/src/main/java/org/elasticsearch/discovery/gce/GceUnicastHostsProvider.java#L99
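
To illustrate the behaviour described above, here is a minimal, self-contained sketch of the "catch, log, return an empty list" pattern. It is not the plugin's or Elasticsearch's actual code: the class EmptyOnFailureSketch, the EndpointLookup interface and the example address are hypothetical stand-ins, and only the method name buildDynamicNodes and the log message are taken from the log above.

```java
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EmptyOnFailureSketch {

    /** Hypothetical stand-in for the call that asks the Kubernetes API for endpoints. */
    interface EndpointLookup {
        List<String> endpoints() throws Exception;
    }

    /**
     * Assumed shape of the provider behaviour: a failed lookup is logged
     * and turned into an empty host list instead of being rethrown.
     */
    static List<String> buildDynamicNodes(EndpointLookup lookup) {
        try {
            return new ArrayList<>(lookup.endpoints());
        } catch (Exception e) {
            System.err.println("Exception caught during discovery: " + e);
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        // Simulate the DNS failure from the log: the caller sees no exception.
        List<String> failed = buildDynamicNodes(() -> {
            throw new UnknownHostException("kubernetes.default.svc");
        });
        System.out.println("hosts after failed lookup: " + failed);

        // Simulate a successful lookup with a made-up address.
        List<String> ok = buildDynamicNodes(() -> Arrays.asList("10.10.81.4:9300"));
        System.out.println("hosts after successful lookup: " + ok);
    }
}
```

In this sketch the caller cannot tell a failed lookup from a cluster with no other nodes, so any retry has to come from whoever calls the provider again.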

So @pires is right, the plugin should not be responsible for retrying.

I am sorry for the inconvenience.
