Improved, unified, sniffing heuristics #24871

andrewvc · 2017-05-24T19:45:14Z

Currently, sniffing can be problematic because it relies on the presence of an HTTP endpoint to decide that a node is sniffable. Part of this has bled through in #12792. It's also complicated in that it's hard or impossible for users to define arbitrary sniffing criteria.

Goals

I think a good set of sniffing heuristics would have the following properties:

Simple: It should do the right thing for most people out of the box without them having to think too much about it
Powerful: It should let users who want to customize it be able to
Safe: It should make doing bad things (e.g. directing traffic toward master only nodes) difficult or impossible.

The Proposal

I propose that clients alter their sniffing logic to one of two modes:

Match everything except dedicated masters
Match everything, filtered by node metadata, except dedicated masters

So, we would use the same node metadata that is normally used for rack awareness to restrict sniffing based on a custom user-defined property. A client, such as the logstash elasticsearch output, might expose the following config options:

elasticsearch {
  sniffing => true
  sniffing_filter => { my_key => myvalue }
}

A client would construct the list of sniffed nodes by:

Querying the /_nodes/http API
Including only nodes that have either:
1. a roles array that includes data
2. a roles array that does not include master
Filter that list down further based on exact matches of key == value in the node metadata if present. If the user has not enabled this option then all results will be returned.

I think this algorithm meets all the goals discussed earlier.
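
As a rough sketch of the steps above, the selection could look something like the following in Python. The response shape (a top-level "nodes" map whose entries carry "roles", "attributes" and "http.publish_address") is an assumption about what /_nodes/http returns, and select_sniffed_nodes and attribute_filter are hypothetical names used only for illustration, not part of any existing client.

```python
from typing import Any, Dict, List, Optional


def select_sniffed_nodes(
    nodes_response: Dict[str, Any],
    attribute_filter: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return the HTTP addresses of nodes that clients should send traffic to.

    nodes_response is assumed to be the parsed JSON body of /_nodes/http.
    attribute_filter is a hypothetical user option: every key must match
    the node's attributes exactly. None or {} disables the filter.
    """
    selected = []
    for node_info in nodes_response.get("nodes", {}).values():
        roles = node_info.get("roles", [])
        # Keep the node if it holds data or is not master-eligible at all;
        # this drops dedicated masters only.
        if not ("data" in roles or "master" not in roles):
            continue

        attributes = node_info.get("attributes", {})
        if attribute_filter and any(
            attributes.get(key) != value
            for key, value in attribute_filter.items()
        ):
            continue

        # Nodes without an HTTP address cannot be reached by a REST client.
        address = node_info.get("http", {}).get("publish_address")
        if address is not None:
            selected.append(address)
    return selected
```

With no filter, a node whose roles are only ["master"] is skipped while ["master", "data"], ["data"] and ["ingest"] nodes are kept. Passing something like {"rack": "r1"} narrows the list further to nodes whose attributes match exactly.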


abeyad · 2017-05-24T19:55:05Z

@javanna WDYT?

andrewvc · 2017-05-24T20:12:01Z

It would also be nice to discuss which nodes should be queried for this data.

I would propose that clients:

Be provided with a seed node list to connect to
Hit the /_nodes/http endpoint on one of those nodes
Perform subsequent sniffing requests against nodes on the sniffed list
Be allowed to re-sniff the seed list if the sniffed nodes list fails

The idea here is that it would be advisable for users to seed their clients with stable nodes, which in many installations will be the master nodes. It is also advisable that clients try to keep load off the master nodes, hence making them a last resort.

There is a risk here: in large installs, if all the data nodes were to disappear, a thundering herd of /_nodes/http requests could overwhelm the master nodes. I imagine that would only fill their request queues, though, and would stabilize rather quickly.
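
To make the seed-list idea concrete, here is a minimal sketch in Python. SeedSniffer and fetch_nodes are hypothetical names; fetch_nodes stands in for "hit /_nodes/http on this host and return the filtered host list", and is assumed to raise ConnectionError when the host is unreachable.

```python
from typing import Callable, List


class SeedSniffer:
    """Sniff from previously sniffed nodes first, seeds only as a last resort."""

    def __init__(self, seed_nodes: List[str],
                 fetch_nodes: Callable[[str], List[str]]) -> None:
        self.seed_nodes = list(seed_nodes)
        self.sniffed_nodes: List[str] = []
        # Hypothetical callable: host -> list of sniffable hosts.
        self.fetch_nodes = fetch_nodes

    def sniff(self) -> List[str]:
        # On the first call sniffed_nodes is empty, so the seeds are used.
        # Afterwards the seeds (often masters) are tried only if every
        # sniffed node fails, which keeps routine load off them.
        for host in self.sniffed_nodes + self.seed_nodes:
            try:
                nodes = self.fetch_nodes(host)
            except ConnectionError:
                continue  # this host is down, try the next candidate
            if nodes:
                self.sniffed_nodes = nodes
                return nodes
        raise RuntimeError("no sniffed or seed node returned a node list")
```

This does nothing about the thundering herd concern; a real client would probably want some backoff or jitter before falling back to the seeds.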

javanna · 2017-05-24T20:22:06Z

This seems like a general discussion on what Elasticsearch REST clients should do compared to what they do now in terms of sniffing. We should hear what @elastic/es-clients think of this proposal.

javanna · 2017-05-24T20:28:54Z

I thought that the official clients already allowed plugging in custom selectors so that users can choose which nodes get selected. The java low level REST client doesn't support this yet but there's #21888 for it. The default behaviour when sniffing should probably be adjusted, as I think all nodes are selected by the official clients while master-only nodes should be skipped. Apart from that, I am not sure what else should change, provided that the notion of selector is exposed and pluggable in every client.

andrewvc · 2017-05-24T20:33:21Z

@javanna I took a look at the python and ruby clients, I can't find any support for this. Maybe I'm missing something? I also don't believe beats supports this either. Maybe @honzakral @karmi and @andrewkroh can confirm?

Perhaps other clients support this? At any rate, the forced omission of master-only nodes would be a great thing to standardize.

andrewkroh · 2017-05-24T20:40:13Z

I also don't believe beats supports this either.

Correct, Beats does not currently have sniffing support. @7AC is working on go-elasticsearch which Beats will eventually use. Having support for this in the client library would be nice.

GlenRSmith · 2017-05-24T20:51:29Z

elastic/elasticsearch-py@8966902

andrewvc · 2017-05-24T20:54:26Z

Thanks @GlenRSmith , it looks like the python client rejects master nodes. That's great! It would still be great if we could standardize on node settings filtering as well.

I mean, maybe this has been tackled before, but ideally clients would all have identical sniffing algorithms, which is what I'm trying to get at here.

GlenRSmith · 2017-05-24T21:48:04Z

@andrewvc of course, and that's very much the broad intent - uniformity and parity among the low-level clients. Striving for that on this particular item isn't objectionable.

honzakral · 2017-05-25T07:07:35Z

Currently there is the default behavior (filtering out master-only nodes) and the ability to override the callback used to filter out the nodes. In python all you have to do is supply your own implementation of the get_host_info (0) function and introduce any logic you wish to have; this is documented (1).

Since this is advanced functionality I think that is sufficient. So far I haven't had a single request for anything more structured/systematic. This approach is common to all the clients.

0 - https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/transport.py#L11-L28
1 - https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/transport.py#L48-L50
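
For illustration, a custom callback along those lines might look like the sketch below. It assumes the callback is called with (node_info, host) and returns either the host dict or None to skip the node, as the linked get_host_info does; make_host_info_callback and the "rack" attribute are hypothetical names for this example.

```python
from typing import Any, Callable, Dict, Optional

HostInfo = Dict[str, Any]


def make_host_info_callback(
    required_attributes: Dict[str, str],
) -> Callable[[Dict[str, Any], HostInfo], Optional[HostInfo]]:
    """Build a callback that skips dedicated masters and non-matching nodes."""

    def host_info_callback(node_info: Dict[str, Any],
                           host: HostInfo) -> Optional[HostInfo]:
        roles = node_info.get("roles", [])
        # Skip dedicated masters (master role without the data role).
        if "master" in roles and "data" not in roles:
            return None
        # Skip nodes whose attributes do not match every required pair.
        attributes = node_info.get("attributes", {})
        for key, value in required_attributes.items():
            if attributes.get(key) != value:
                return None
        return host

    return host_info_callback


# Assumed usage with elasticsearch-py (not executed here):
#   Elasticsearch(hosts, sniff_on_start=True,
#                 host_info_callback=make_host_info_callback({"rack": "r1"}))
```

Note that this treats ["master", "ingest"] as a dedicated master too, which is slightly stricter than comparing the roles list to exactly ["master"].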

russcam · 2017-05-25T07:31:41Z

There's documentation for the sniffing behaviour of the .NET clients, which includes the ability to specify a predicate to determine which nodes in the cluster API calls should be executed on.

karmi · 2017-05-25T11:46:25Z

The Ruby client doesn't do any filtering right now; there's an old open issue: elastic/elasticsearch-ruby#251. So far nobody has shown any interest in the feature, though, so I've put it on the back burner. I've realized that the Sniffer class is not injectable via the constructor, which is maybe an oversight in our consistent "pluggability" of everything; I'd like to add it.

The Ruby client, like all the official clients, supports supplying a custom connection selector, which allows people to programmatically select nodes based on arbitrary criteria, e.g. the node attributes.

Regarding the feature itself, I'm not sure I grasp what exactly is being suggested -- it looks to me like a feature request to add a sniffing_filter option to the clients. I think along the same lines as @honzakral: it's more or less advanced functionality, which people can implement via custom component classes and pass to the client.

andrewvc · 2017-05-25T13:25:27Z

Having spoken to a variety of client authors, the consensus is not to have this discussion on the ES repo since this isn't an ES issue. Closing.

abeyad added :Java High Level REST Client discuss labels May 24, 2017

abeyad assigned javanna May 24, 2017

javanna removed their assignment May 24, 2017

javanna removed the :Java High Level REST Client label May 24, 2017

andrewvc mentioned this issue May 24, 2017

Get rid of http.enabled #12792

Closed

andrewvc mentioned this issue May 25, 2017

Dealing with proposed http.enabled changes on ES's side logstash-plugins/logstash-output-elasticsearch#599

Closed

colings86 added :Core/Infra/REST API REST infrastructure and utilities >enhancement labels May 25, 2017

andrewvc closed this as completed May 25, 2017
