"No ElasticSearch Node Available" #312

Closed
nicolaifsf opened this Issue Jun 23, 2016 · 69 comments

@nicolaifsf

nicolaifsf commented Jun 23, 2016

Please use the following questions as a guideline to help me answer
your issue/question without further inquiry. Thank you.

Which version of Elastic are you using?

[ ] elastic.v2 (for Elasticsearch 1.x)
[x] elastic.v3 (for Elasticsearch 2.x)

Please describe the expected behavior

NewClient(elastic.SetURL("http://:9200"))
would correctly generate a new Client object connecting to the node

Please describe the actual behavior

"no ElasticSearch node available"

Any steps to reproduce the behavior?

elastic.NewClient(elastic.SetURL("http://:9200"))

@olivere

Owner

olivere commented Jun 24, 2016

Does it work with "http://127.0.0.1:9200"?

@olivere

Owner

olivere commented Jun 24, 2016

What does curl 'http://:9200/_nodes/http?pretty' return? Elastic uses a process called Sniffing to determine the other nodes of the cluster; maybe they're unavailable?

@olivere

Owner

olivere commented Jun 25, 2016

Please respond to the issue if the question remains. Otherwise I'll close the issue. Thank you.

@nicolaifsf

Author

nicolaifsf commented Jun 25, 2016

I apologize for the delay. I made a mistake in formatting the markdown here.
The issue is that when I do something like elastic.NewClient(elastic.SetURL("http://someaddresshere:9200")),
a curl to that address works just fine, but it still tells me that no Elasticsearch node is available.
Thank you for your prompt responses and for maintaining this repo so well.

@olivere

Owner

olivere commented Jun 25, 2016

Can you please try what I wrote in the 2nd comment?

@nicolaifsf

Author

nicolaifsf commented Jun 25, 2016

elastic.NewClient(elastic.SetURL("http://127.0.0.1:9200")) works

@olivere

Owner

olivere commented Jun 25, 2016

What does curl 'http://:9200/_nodes/http?pretty' return?

@nicolaifsf

Author

nicolaifsf commented Jun 25, 2016

{
  "cluster_name" : "nicolaifsf-dev",
  "nodes" : {
    "e7123kDbSdmTdB-j3YV5iQ" : {
      "name" : "nico_node01-dev",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1",
      "version" : "2.3.3",
      "build" : "218bdf1",
      "http_address" : "127.0.0.1:9200",
      "http" : {
        "bound_address" : [ "[fe80::1]:9200", "[::1]:9200", "127.0.0.1:9200" ],
        "publish_address" : "127.0.0.1:9200",
        "max_content_length_in_bytes" : 104857600
      }
    }
  }
}

@nicolaifsf

Author

nicolaifsf commented Jun 25, 2016

I ended up using SimpleClient and it worked; however, I just thought I should let you know that I was having some issues with NewClient. Are there any major differences between the two that could have been the root cause?

@olivere

Owner

olivere commented Jun 25, 2016

SimpleClient simply disables a few things, like automatically finding new nodes added to your cluster via the Sniffing procedure I linked to above. You can disable sniffing via options in NewClient, as described in the Wiki.

Not sure what's wrong with your setup. The cluster output is looking good; sniffing should pick up the setting in http_address.
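
For anyone hitting this: what the two approaches look like in code, as a minimal sketch (the URL is a placeholder for your own node; elastic.v3 matches the version used in this issue):

package main

import (
    "log"

    elastic "gopkg.in/olivere/elastic.v3"
)

func main() {
    // Option 1: full client, but with sniffing disabled, so elastic
    // keeps using the URL you pass in instead of the node addresses
    // reported by the cluster's Nodes API.
    client, err := elastic.NewClient(
        elastic.SetURL("http://127.0.0.1:9200"), // placeholder
        elastic.SetSniff(false),
    )
    if err != nil {
        log.Fatal(err)
    }
    _ = client

    // Option 2: NewSimpleClient, which starts without sniffing
    // and health checks in the first place.
    simple, err := elastic.NewSimpleClient(elastic.SetURL("http://127.0.0.1:9200"))
    if err != nil {
        log.Fatal(err)
    }
    _ = simple
}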

@nicolaifsf

Author

nicolaifsf commented Jun 26, 2016

Ah well, thanks for your help!

@olivere olivere closed this Jun 26, 2016

@arglucas

arglucas commented Sep 16, 2016

I have the same issue: using curl directly against my node on localhost works, including the _nodes call mentioned above.

I used SetSniff(false) and it all started working. What information can I provide that would help with debugging?

@olivere

Owner

olivere commented Sep 16, 2016

This problem is most probably due to Elastic picking up the address returned by the Nodes API, which simply isn't routable. This is e.g. the case for ES running inside a Docker container. Does it work with sniffing disabled? Refer to the wiki for details.


@olivere

Owner

olivere commented Sep 16, 2016

What is the output from the Nodes API call mentioned earlier?


@jairwen

jairwen commented Nov 1, 2016

Same issue here.

I ran some tests with two machines on one LAN that can ping each other fine: Elasticsearch on PC1 and my Go program on PC2. I tried running the Go program on both PC1 and PC2. I got "no ElasticSearch node available" no matter where I put the Go program, with or without SetSniff(false), until I changed to elastic.NewSimpleClient().

I ran curl 'http://localhost:9200/_nodes/http?pretty' on PC1:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "testcluster",
  "nodes" : {
    "wsu6A6VUSSyAWTfVu-S0gg" : {
      "name" : "node1",
      "transport_address" : "192.168.0.103:9300",
      "host" : "192.168.0.103",
      "ip" : "192.168.0.103",
      "version" : "5.0.0",
      "build_hash" : "253032b",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "http" : {
        "bound_address" : [
          "[::]:9200"
        ],
        "publish_address" : "192.168.0.103:9200",
        "max_content_length_in_bytes" : 104857600
      }
    }
  }
}

I noticed that the http bound address is "[::]:9200".
So I changed network.host from 0.0.0.0 to the exact IP address of PC1, 192.168.0.103,
and curl 'http://192.168.0.103:9200/_nodes/http?pretty' gave me:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "testcluster",
  "nodes" : {
    "wsu6A6VUSSyAWTfVu-S0gg" : {
      "name" : "node1",
      "transport_address" : "192.168.0.103:9300",
      "host" : "192.168.0.103",
      "ip" : "192.168.0.103",
      "version" : "5.0.0",
      "build_hash" : "253032b",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "http" : {
        "bound_address" : [
          "192.168.0.103:9200"
        ],
        "publish_address" : "192.168.0.103:9200",
        "max_content_length_in_bytes" : 104857600
      }
    }
  }
}

But NewClient(elastic.SetURL("http://192.168.0.103:9200")) still does not work. Any suggestions?

@olivere

Owner

olivere commented Nov 1, 2016

@jairwen Which version of elastic are you using? v2, v3 or v5?

@jairwen

jairwen commented Nov 2, 2016

I am using v5.
curl localhost:9200

{
  "name" : "node-1",
  "cluster_name" : "dnscluster",
  "cluster_uuid" : "OS9mmFYmRLC2gUklA3WrcA",
  "version" : {
    "number" : "5.0.0",
    "build_hash" : "253032b",
    "build_date" : "2016-10-26T05:11:34.737Z",
    "build_snapshot" : false,
    "lucene_version" : "6.2.0"
  },
  "tagline" : "You Know, for Search"
}


@olivere

Owner

olivere commented Nov 2, 2016

@jairwen Yes, that's Elasticsearch 5.0. But are you using elastic.v5? What does this little program print for you?

package main

import (
    "log"

    elastic "gopkg.in/olivere/elastic.v5"
)

func main() {
    _, err := elastic.NewClient()
    if err != nil {
        log.Fatalf("Connect failed: %v", err)
    }
    log.Print("Connected")
}
@jairwen

jairwen commented Nov 2, 2016

Oh! My bad. Sorry for missing the release of .v5. I was still using the .v3 I got earlier.
It works like a charm now.

@olivere

Owner

olivere commented Nov 2, 2016

No worries... you're welcome. It'll probably be FAQ issue no. 1 :-)

The problem with the sniffing process (and hence the "no ElasticSearch node available" error) is that Elasticsearch changed the return structure of the Nodes Info API at least 4 times. :-(
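
The practical rule is to pick the client major version that matches your Elasticsearch major version, then verify at runtime. A minimal sketch, assuming elastic.v5 against a local ES 5.x node (ElasticsearchVersion asks the given URL for its version string):

package main

import (
    "log"

    // Client major version must match the ES major version:
    // elastic.v2 targets ES 1.x, elastic.v3 targets ES 2.x,
    // elastic.v5 targets ES 5.x (see the checklist at the top).
    elastic "gopkg.in/olivere/elastic.v5"
)

func main() {
    client, err := elastic.NewClient() // defaults to http://127.0.0.1:9200
    if err != nil {
        log.Fatalf("Connect failed: %v", err)
    }
    version, err := client.ElasticsearchVersion("http://127.0.0.1:9200")
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("Connected to Elasticsearch %s", version)
}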

@mahmoudhossam

mahmoudhossam commented Nov 3, 2016

I have this problem with elasticsearch 5 on Arch Linux

I'm using elastic.v5

output of curl 'http://localhost:9200/_nodes/http?pretty'

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "pb-aU3TCRBKm6MgRqKLnoQ" : {
      "name" : "pb-aU3T",
      "transport_address" : "192.168.1.209:9300",
      "host" : "192.168.1.209",
      "ip" : "192.168.1.209",
      "version" : "5.0.0",
      "build_hash" : "253032b",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "http" : {
        "bound_address" : [
          "[::]:9200"
        ],
        "publish_address" : "192.168.1.209:9200",
        "max_content_length_in_bytes" : 104857600
      }
    }
  }
}

Edit: restarting the server seems to fix the problem.

@olivere

Owner

olivere commented Nov 3, 2016

@mahmoudhossam Do you have a reproducible test case for me?

@mahmoudhossam

mahmoudhossam commented Nov 4, 2016

Unfortunately, no.

I'd be happy to supply more information when it happens again, but I'm not sure what causes that exactly.

@mremond

mremond commented Jan 3, 2017

Just for the record: I think Amazon Elasticsearch Service does the load balancing to the nodes itself, so discovering nodes does not return any transport addresses, despite the fact that several nodes are available. Disabling sniffing seems to be the right option in that case.

I hope this helps.

@gm42

gm42 commented Jan 25, 2017

Also affected by this issue. I am using:

package main

import (
	"fmt"
	"log"
	"net"
	"os"

	"gopkg.in/olivere/elastic.v3"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintf(os.Stderr, "ERROR: no Elasticsearch hosts specified\n")
		os.Exit(1)
	}
	args := os.Args[1:]
	for _, arg := range args {
		r, err := net.LookupHost(arg)
		if err != nil {
			fmt.Fprintf(os.Stderr, "ERROR: %v\n", err)
			os.Exit(2)
		}
		fmt.Fprintf(os.Stdout, "%q resolved as: %v\n", arg, r)

		// now perform Elasticsearch-specific ping
		es, err := elastic.NewSimpleClient(elastic.SetURL(arg))
		if err != nil {
			log.Fatalf("error creating elastic client to %q: %v", arg, err)
		}

		_, err = es.IndexNames()
		if err != nil {
			log.Fatalf("error fetching indices from %q: %v", arg, err)
		}
	}
}

Using SetSniff(false) doesn't change anything.

This is the nodes output:

{
  "cluster_name" : "es.example.com",
  "nodes" : {
    "ECBFak3BSuaeNjiwzGziQA" : {
      "name" : "es-1.example.com",
      "transport_address" : "192.168.96.101:9300",
      "host" : "192.168.96.101",
      "ip" : "192.168.96.101",
      "version" : "2.4.3",
      "build" : "d38a34e",
      "http_address" : "192.168.96.101:9200",
      "http" : {
        "bound_address" : [ "[::]:9200" ],
        "publish_address" : "192.168.96.101:9200",
        "max_content_length_in_bytes" : 104857600
      }
    },
    "VpQlXfBHSa2qJSVMn4AIDg" : {
      "name" : "es-2.example.com",
      "transport_address" : "192.168.107.119:9300",
      "host" : "192.168.107.119",
      "ip" : "192.168.107.119",
      "version" : "2.4.3",
      "build" : "d38a34e",
      "http_address" : "192.168.107.119:9200",
      "http" : {
        "bound_address" : [ "[::]:9200" ],
        "publish_address" : "192.168.107.119:9200",
        "max_content_length_in_bytes" : 104857600
      }
    },
    "6Wxtssi8S4a-Dl9W8QNYdQ" : {
      "name" : "es-3.example.com",
      "transport_address" : "192.168.115.89:9300",
      "host" : "192.168.115.89",
      "ip" : "192.168.115.89",
      "version" : "2.4.3",
      "build" : "d38a34e",
      "http_address" : "192.168.115.89:9200",
      "http" : {
        "bound_address" : [ "[::]:9200" ],
        "publish_address" : "192.168.115.89:9200",
        "max_content_length_in_bytes" : 104857600
      }
    }
  }
}
@olivere

Owner

olivere commented Jan 25, 2017

@gm42 What are you using as arguments? Any chance you got bitten by this?

@gm42

gm42 commented Jan 25, 2017

@olivere I am using a URL like this: http://es.example.com:9200

It's a DNS entry with 3 addresses behind it.

@olivere

Owner

olivere commented Jan 25, 2017

@gm42 Hmm... I can't reproduce this locally: your example code works fine here.

What elastic does is try to find the http.publish_address (or https.publish_address) for each node, put it in a list, and keep watching that list, periodically looking for new entries and ignoring those that failed, etc. You may not like that, but that's what all the official clients do, last time I checked.

Elasticsearch has changed the return structure of the /_nodes/http endpoint several times between the different versions (sometimes even in minor updates), hence I always get a heart attack when I read this error message ;-)

So if the executable performing HTTP requests is able to connect to the 3 nodes http://192.168.96.101:9200, http://192.168.107.119:9200, and http://192.168.115.89:9200, then all should be fine.

If you're using containers, this inability to talk to each other is typically the cause of the "No Elasticsearch node available" error.
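
A quick way to check that routability condition from the machine the client runs on, as a sketch using only the standard library (the URL is a placeholder): fetch the same /_nodes/http endpoint and print each node's publish_address, which are exactly the addresses the sniffer will try to use.

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

func main() {
    resp, err := http.Get("http://127.0.0.1:9200/_nodes/http") // placeholder
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    var data struct {
        Nodes map[string]struct {
            HTTP struct {
                PublishAddress string `json:"publish_address"`
            } `json:"http"`
        } `json:"nodes"`
    }
    if err := json.NewDecoder(resp.Body).Decode(&data); err != nil {
        log.Fatal(err)
    }
    for id, node := range data.Nodes {
        // Each of these addresses must be reachable from the client,
        // or sniffing will report "no Elasticsearch node available".
        fmt.Printf("node %s publishes %s\n", id, node.HTTP.PublishAddress)
    }
}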

@olivere

Owner

olivere commented Mar 21, 2018

@Single430 Can you try this first? Otherwise, disabling sniffing works most of the time.

a-magdy added a commit to credcollective/monstache that referenced this issue Apr 25, 2018

Update elastic.NewClient to elastic.NewClient(elastic.SetSniff(false)).
By default elastic "sniffs" nodes and accesses them by the IP returned from each node, but when Elasticsearch is hosted in Docker, that IP is not reachable from the host by default.

Links:

- olivere/elastic#312 (comment)
- https://github.com/olivere/elastic/wiki/Connection-Problems#how-to-figure-out-connection-problems
- https://github.com/olivere/elastic/wiki/Sniffing
@hzjiangjian

hzjiangjian commented May 30, 2018

Hey guys, I ran into this error recently and it took me about 2 days to work out. Here is what I did; maybe it will be helpful for you.

First of all, I use govendor to manage my Go dependencies. Since I use Elasticsearch 6.1.3, I aimed to get the latest version of elastic, which is v6. I did this with govendor: "govendor fetch github.com/olivere/elastic". But I actually got a vendor.json like this:

"package": [
{
"checksumSHA1": "JU+kcMNB8CPi7DJmxK5E3vpNxcg=",
"path": "github.com/olivere/elastic",
"revision": "eb69fbabfd6151d6c0f6d3d850d4f61edcff3763",
"revisionTime": "2016-10-08T09:19:33Z"
},
{
"checksumSHA1": "AmcqvKDAUGvVsrf8NDCQ/B8cMj4=",
"path": "github.com/olivere/elastic/uritemplates",
"revision": "eb69fbabfd6151d6c0f6d3d850d4f61edcff3763",
"revisionTime": "2016-10-08T09:19:33Z"
},
{
"checksumSHA1": "O5dZe+m70S7vbkgePsMLrUc86DA=",
"path": "gopkg.in/olivere/elastic.v2/uritemplates",
"revision": "2cd27b9ae892f6b4dd8dd6d73489797c99e9c6fa",
"revisionTime": "2018-05-17T14:35:08Z"
}
],

Which means I got elastic.v2?! Then I went to look at the code in olivere/elastic/client.go:

const (
    // Version is the current version of Elastic.
    Version = "2.0.23"

    // DefaultURL is the default endpoint of Elasticsearch on the local machine.
    // It is used e.g. when initializing a new Client without a specific URL.
    DefaultURL = "http://127.0.0.1:9200"
)

It's true!!!

So there is a version mismatch between my Elasticsearch (6.1.3) and the elastic client (2.0.23).

I think this problem is mostly caused by a version mismatch. You need to check the client version (in olivere/elastic/client.go) against the version of Elasticsearch you actually use. In my case, I think there must be some error with govendor; maybe it does not know about the latest version (6.0.0) of github.com/olivere/elastic. I decided to move to dep, because govendor hurt me!
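
A cheap guard against this kind of mismatch: print the client's package-level Version constant (the one quoted from client.go above) at startup and compare it with your server version. A sketch, assuming the gopkg.in path for the v6 client:

package main

import (
    "fmt"

    elastic "gopkg.in/olivere/elastic.v6" // the client version you intended to vendor
)

func main() {
    // If this prints 2.0.23 while you expected a 6.x client, your
    // vendoring tool resolved the wrong revision.
    fmt.Println("elastic client version:", elastic.Version)
}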

@olivere

Owner

olivere commented May 30, 2018

@hzjiangjian Seems like an issue with govendor. Use dep if you can: It's doing the right thing (tm).

@hzjiangjian

hzjiangjian commented May 30, 2018

@olivere That's right ! It works well with dep.

@thoellrich

thoellrich commented Jun 24, 2018

I ran into the same issue, and even after setting elastic.SetSniff(false) I would still get "No active connection" errors.

It turned out that I also needed to set elastic.SetHealthcheck(false), because the Elasticsearch server (via elastic.io/found.io) did not respond to HEAD requests, and as a result the connection was marked bad (see the comment further down).
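
For reference, the combination described above looks like this, as a sketch with placeholder credentials and URL (note the follow-up below, where the health-check part turned out to be a red herring):

package main

import (
    "log"

    elastic "gopkg.in/olivere/elastic.v6"
)

func main() {
    client, err := elastic.NewClient(
        elastic.SetURL("https://user:password@example.us-east-1.aws.found.io:9243"), // placeholder
        elastic.SetSniff(false),       // keep using the configured URL
        elastic.SetHealthcheck(false), // skip the HEAD-based health checks
    )
    if err != nil {
        log.Fatal(err)
    }
    _ = client
}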

@olivere

Owner

olivere commented Jun 25, 2018

@thoellrich Interesting. We use Elastic Cloud (https://www.elastic.co/cloud) in one project, and we disabled neither sniffing nor health checks. This is a project hosted on Google Cloud, using a hosted Elasticsearch Cloud cluster. Everything just works. Can you go into a bit more detail?

@thoellrich

thoellrich commented Jun 25, 2018

@olivere - I guess I need to revise my statement above: the health-check indeed does not seem to make a difference in the test case I created at: https://gist.github.com/thoellrich/fbe650e8e23b78f866ff82034bcf75a4

For my Elastic Cloud server (AWS), I see the following output:

$ go run main.go --url="https://<username>:<password>@<xxx>.us-east-1.aws.found.io:<ppp>"
health=true, sniff=true - err=no active connection found: no Elasticsearch node available
health=true, sniff=false - version=6.2.4, version_err=<nil>, index_count=82, index_err=<nil>
health=false, sniff=true - version=6.2.4, version_err=<nil>, index_count=0, index_err=Get http://172.25.aaa.bbb:ccc/_all/_settings: dial tcp 172.25.aaa.bbb:ccc: i/o timeout
health=false, sniff=false - version=6.2.4, version_err=<nil>, index_count=82, index_err=<nil>

Not sure what I saw yesterday. I'll try to recreate that and if it fails again, will report back.

@olivere

Owner

olivere commented Jun 25, 2018

@thoellrich I'm not an expert on AWS: It's not related to AWS signing, is it? See this recipe which I tested against an Elasticsearch service on AWS a few weeks ago.

@thoellrich

thoellrich commented Jun 25, 2018

Thanks for the pointer, and no, this is certainly not the issue: request signing is only necessary if you talk to AWS's Elasticsearch service. I talk to an Elastic Cloud server which happens to be hosted in AWS. Confusing, I know :-)
This post compares and contrasts the two: https://www.elastic.co/blog/hosted-elasticsearch-services-roundup-elastic-cloud-and-amazon-elasticsearch-service

@olivere

Owner

olivere commented Jun 25, 2018

Oh, I see, you're hosting your own Elasticsearch cluster.

The key to making sniffing and health checks work is that the node IPs returned from the cluster must be routable/accessible from your application. I often see people start their cluster in Docker and then use something like 127.0.0.1 or 192.168.1.1 in the connection URL, only to see the cluster returning private node IPs like 172.aaa.bbb.ccc, which are not routable from the host. When sniffing is enabled, the 192.168.1.1 is only the starting point for drilling into the cluster: elastic uses http://192.168.1.1:9200/_nodes/http?pretty=true to retrieve information about the nodes and then uses those IPs to try to connect to all nodes in round-robin mode; that obviously fails if the IPs are not accessible from your application. It generally works well when e.g. your application lives inside a scheduler like Kubernetes and your ES cluster lives there as well, so routing will succeed.

@thoellrich

thoellrich commented Jun 25, 2018

Sorry to have to correct you again: I'm not running those instances, Elastic Cloud is.
When you sign up with Elastic Cloud you can pick which cloud (AWS, GCP, Kubernetes) you want to be used for the instances (see https://www.elastic.co/cloud/elasticsearch-service). I have no control over the public and/or private IP address space used for them. But it makes perfect sense that sniffing won't work for inaccessible IP addresses.

@olivere

Owner

olivere commented Jun 25, 2018

Oh well. I'm on vacation and just flying over the comments. I'm sorry.

@thoellrich

thoellrich commented Jun 25, 2018

Please, no need to be sorry at all. The package works as expected. I posted something that turned out to be a red herring. My turn to say sorry and wish you a nice vacation!

@wangzi19870227

wangzi19870227 commented Jul 5, 2018

Solution One:
client, err := elastic.NewClient(elastic.SetSniff(false), elastic.SetURL("http://127.0.0.1:9200"))

Solution Two:
client, err := elastic.NewSimpleClient(elastic.SetURL("http://127.0.0.1:9200"))

@bruceadams

bruceadams commented Jan 10, 2019

I'm running in a large environment, with many Elasticsearch clusters (all v5), all provided by https://www.elastic.co/cloud on AWS (in us-east). Most of our services are written in Java and have no trouble reaching all of our clusters. Recently, I've introduced a small service written in Go and using this library. It works most of the time, but consistently fails to connect to just a few target clusters. Again, these clusters are being actively used by Java services. The clusters are up and reachable. Our connection information is in the same form for the clusters that work and the ones that do not.

Is there any way for olivere/elastic.v5 to provide more information about what is failing?

I've turned on Error, Info and Trace logging, but the connection failures do not log anything; they just return the error health check timeout: no Elasticsearch node available and no additional information. (Successful connections do log activity, so my log setup appears to be correct. Maybe the logging setup only takes effect after the connection is successful?) I feel completely blind trying to diagnose what is happening. Any hints would be greatly appreciated!

@bruceadams

bruceadams commented Jan 10, 2019

Looking for differences between the clusters that fail and the ones that succeed, the failures appear to happen only for our oldest clusters: clusters that were upgraded in place from Elastic v2 to v5. Maybe that is relevant? 🤷

Filling in some details: the call that returns the health check timeout: no Elasticsearch node available error is:

	return elastic.NewClient(
		elastic.SetURL(c.URLs),
		elastic.SetBasicAuth(c.User, c.Pass),
		elastic.SetSniff(false),
		elastic.SetHttpClient(buildHTTPClient(c.Cert)),
		elastic.SetRetrier(&elasticRetrier{}),
		elastic.SetErrorLog(&elasticLogger{Level: "error"}),
		elastic.SetInfoLog(&elasticLogger{Level: "info"}),
		elastic.SetTraceLog(&elasticLogger{Level: "trace"}),
	)
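
For context, elasticRetrier and elasticLogger above are my own types, not part of the library; elastic only requires the Logger interface, which has a single Printf method. A minimal sketch of what the logger piece can look like:

import "log"

// elasticLogger adapts the standard log package to elastic's Logger
// interface, tagging each line with a level.
type elasticLogger struct {
    Level string
}

// Printf satisfies elastic.Logger and forwards to the standard logger.
func (l *elasticLogger) Printf(format string, v ...interface{}) {
    log.Printf("["+l.Level+"] "+format, v...)
}
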
@thoellrich

thoellrich commented Jan 10, 2019

Given that this works sometimes, I'd give the following two a try (see the sketch after this list):

  • Use elastic.SetHealthcheckTimeoutStartup() and elastic.SetHealthcheckTimeout() to give the health check more time before it considers an instance dead.
  • Use elastic.SetHealthcheck(false) to disable health checks altogether (not recommended by Oliver).
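
A sketch of both suggestions in one place (the URL and durations are placeholders, using the elastic.v5 client as above):

package main

import (
    "log"
    "time"

    elastic "gopkg.in/olivere/elastic.v5"
)

func main() {
    client, err := elastic.NewClient(
        elastic.SetURL("https://user:password@cluster.example.com:9243"), // placeholder
        elastic.SetSniff(false),
        // First suggestion: give the initial and periodic health
        // checks more time before they mark the node as dead.
        elastic.SetHealthcheckTimeoutStartup(30 * time.Second),
        elastic.SetHealthcheckTimeout(10 * time.Second),
        // Second suggestion (disable health checks entirely; not
        // recommended): elastic.SetHealthcheck(false),
    )
    if err != nil {
        log.Fatal(err)
    }
    _ = client
}
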
@bruceadams

bruceadams commented Jan 10, 2019

@thoellrich I have tried increasing the health check timeouts (a lot) and can see (from the context of other log messages in my code) that the connection failure takes as long as whatever the timeout is set to. I have not tried completely disabling the health check; that is an interesting idea.

@olivere

Owner

olivere commented Jan 11, 2019

@bruceadams Hmm... I'm a bit concerned about the fact that the Java clients seem to work fine while this (unofficial) Go client has problems. May I ask if the Java clients still use the TransportClient, or do they already use the REST client? The former will be deprecated in 7.0 and be removed in 8.0 AFAIK, so using the REST client is preferred.

Anyway, I've tried to create this library with the same logic that the other official clients use, i.e. health checks and client-side sniffing. Sniffing, if enabled, will use the URL you provide to call into the Elasticsearch Cluster API at startup and find all nodes in the cluster, then use the HTTP(S) addresses returned from that API. Healthchecks, if enabled, will periodically do the same and mark nodes as healthy or dead. It should also log this via the provided loggers, so I'm not exactly sure why your logging doesn't seem to work. In our production clusters (also mostly v5 as of now), our log files clearly log nodes going up and down when e.g. updating a cluster:

2018/08/25 00:16:21 elastic: http://a.b.c.d:9200 is dead [status=503]
2018/08/25 00:16:21 elastic: http://a.b.c.d:9200 is dead [status=503]
2018/08/25 00:16:21 elastic: http://a.b.c.d:9200 is dead [status=503]
2018/08/25 00:16:23 elastic: http://a.b.c.d:9200 is dead
2018/08/25 00:17:22 elastic: http://a.b.c.d:9200 is dead
2018/08/25 00:18:22 elastic: http://a.b.c.d:9200 is dead

We're using Elastic Cloud as well for a specific application, and we initially got a lot of 429 Too Many Requests on it, but that wasn't the driver's fault.

So, if you're sure that the problem is with healthchecks, you could simply disable them and see if it makes a difference. The driver should work fine even without it, as the Client.PerformRequest method will try to revive a random node even if all nodes are marked as dead.

@thoellrich Thanks for jumping in and helping. I was a bit busy this week, so sorry for the delay.

@olivere olivere reopened this Jan 11, 2019

@olivere

Owner

olivere commented Jan 11, 2019

@bruceadams What does curl 'http://<cluster-url>:9200/_nodes/http?pretty=true' return?

@bruceadams

bruceadams commented Jan 11, 2019

Oh my. False alarm here. Sorry!

What a mess. Manually doing various actions works fine; doing the same in my real service (written in Go and using this library) fails. It turns out that we also changed how we deal with credentials for clusters many months ago. Older clusters have a different scheme, and our Go code (completely unrelated to our use of this library) was mishandling the credentials for our older clusters. The upgrade from Elastic v2 to v5 just happened to occur around the same time that we changed credential handling. The upgraded clusters are irrelevant to the troubles I've had, and this library is not misbehaving, other than that I couldn't get it to tell me that Elastic was returning a 401 Unauthorized.

Maybe I'll open a separate issue for enhanced error messages and/or logging for connection failures?

@olivere

Owner

olivere commented Jan 17, 2019

Okay. If you found a bug, go ahead and file an issue. I'll close this for now.

@olivere olivere closed this Jan 17, 2019

@tehmoon

tehmoon commented Jan 31, 2019

Reopening this issue because I had the same problem and found out how to fix it.
First off, thank you for this amazing library. I use it on all my projects.

I most likely had a DNS issue with Docker containers and linking. If I used the IP address of the Docker container in both the library and cURL, it would work (regardless of whether I disabled sniffing or not).
But if I used something like --link elasticsearch to link to the container named with --name elasticsearch, and used http://elasticsearch:9200, it would not work with the library while it did work with cURL (again, regardless of the sniffing).

This is due to golang/go#22846: the /etc/nsswitch.conf file was not present, and without it the Go http library does not look in /etc/hosts by default.

I am using an image based on Alpine for this Go project, and the file was not there. Simply adding the /etc/nsswitch.conf file with the recommended content fixed the problem, which was never caused by this library. I did not try other kinds of images, so feel free to check whether the file is there.

I hope it will help other peeps.
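
For reference, the content recommended in that Go issue is a single line telling the resolver to consult /etc/hosts before DNS:

# /etc/nsswitch.conf
hosts: files dns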

@benkauffman

benkauffman commented Feb 16, 2019

I fought with this problem for about 3 hours and finally tried the official package.
It worked on the first try: https://github.com/elastic/go-elasticsearch
It doesn't look super developed, as it's only been out for a couple of weeks, but it works for what I need.
