
File based Unicast Host Provider #20323

Closed
bleskes opened this issue Sep 5, 2016 · 10 comments

Comments

Projects
None yet
9 participants
@bleskes (Member) commented Sep 5, 2016

The unicast host list determines the set of ports and IPs an Elasticsearch node pings when it starts up and tries to find the cluster. The same list is used when the master is lost and an election starts.

It has long been requested that this list be updatable via an API. That presents a challenge since, unlike other setting changes, this information cannot easily be serialized into the cluster state, as it needs to be available when the node is, for example, restarted.

An alternative is for us to develop a plugin that, similar to the current EC2 discovery plugin, monitors a separate (configurable) file and supplies the list of ports/IPs stored in it when needed (i.e., when pinging starts).

The format of the file should be simple: one hostname/IP per line (in the same format the current yml file supports).
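For illustration, such a file might look like the following (the addresses, hostname, and port here are made up):

```
10.0.0.1
10.0.0.2:9301
es-master-3.example.internal
```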

@djschny (Contributor) commented Sep 6, 2016

Can it be a URL? That way folks could point it at a git repo URL, a file on disk, an Apache-hosted file, etc., to make globally updating the cluster much easier.

@nik9000 (Contributor) commented Sep 6, 2016

> That way folks could point it at a git repo URL, a file on disk, an Apache-hosted file, etc., to make globally updating the cluster much easier.

I feel like a local file is the most stable thing, and the best thing if you have any sort of configuration management system. I get that we could get HTTP for free with the JVM, but this opens up a whole can of worms I'd rather not deal with, like "what about https? what about certs? what about git://?"

@jasontedor (Member) commented Sep 6, 2016

> Can it be a URL? That way folks could point it at a git repo URL, a file on disk, an Apache-hosted file, etc., to make globally updating the cluster much easier.

These can't easily be watched, but a file on disk can. And if we could connect to arbitrary resources via a URI, we'd have a whole ton of complexity to worry about when the service is down. Finally, every respectable configuration management system can source from these suggestions and push updated configs to the relevant nodes. I really think we can and should just keep it simple here.

@bleskes (Member, Author) commented Sep 6, 2016

> I really think we can and should just keep it simple here.

+1

Note that URLs can bring a whole can of worms: what if they time out? How long should we wait? Should we cache the last results? Etc.

> These can't easily be watched, but a file on disk can.

We don't really need to watch (although it's possible) - we can just read the file content when a ping round starts.

@s1monw (Contributor) commented Sep 6, 2016

++ to keeping it simple! The way to manage a file in a central/global way is known and proven; let's not reinvent the wheel.

@jasontedor (Member) commented Sep 7, 2016

> We don't really need to watch (although possible) - we can just read the file content when a ping round starts.

Sure, that's fine, but I do think it's operationally friendlier to emit a log line communicating that the new configuration file was picked up.

@muradm commented Sep 7, 2016

Is this feature intended to close the gap with discovery tools like Consul, etcd, etc.? Either way, what is the intended use case?
I'm asking because just last week, while trying to solve cluster bootstrap with Consul and Docker overlay networking, I had to do pretty dirty workarounds with environment variables and /etc/hosts... The initial description doesn't sound like it solves the discovery problem very much.

@bleskes (Member, Author) commented Sep 7, 2016

> close the gap with discovery tools like Consul, etcd, etc.?

Care to describe the specifics of that gap? Without that it's hard to give a definitive answer.

@dakrone added the >feature label and removed the :Core/Infra/Plugins label Sep 7, 2016

@muradm commented Sep 8, 2016

Elasticsearch currently relies on zen unicast discovery, where I have to provide the list of hosts in advance. But in an automated environment I don't know the actual IP addresses or hostnames that will be assigned to servers, containers, VMs, etc. Explaining my last case:

This basically leads to ugly configurations like the one below.

```yaml
  logs-es:
    container_name: logs-es-${HOSTNAME}
    hostname: logs-es-${HOSTNAME}
    logging:
      options:
        tag: logs-es
    image: elasticsearch:5.0.0-alpha5
    volumes_from:
      - logs-es-data
    command: >
      -Ecluster.name=rwb-logs -Enode.data=true -Enode.master=true
      -Ediscovery.zen.minimum_master_nodes=2
      -Ediscovery.zen.ping.unicast.hosts=logs-es-lab1,logs-es-lab2,logs-es-lab3
    restart: unless-stopped
    environment:
      SERVICE_NAME: logs-es
      SERVICE_9200_TAGS: ${HOSTNAME},traefik.frontend.rule=Host:logs-es
      SERVICE_9300_TAGS: ${HOSTNAME},traefik.enable=false
      ES_JAVA_OPTS: "-Xms2G -Xmx2G"
    depends_on:
      - registrator
    extra_hosts:
      - "logs-es-lab1:10.20.5.1"
      - "logs-es-lab2:10.20.5.2"
      - "logs-es-lab3:10.20.5.3"
    networks:
      service:
        ipv4_address: ${LOGS_ES_HOSTADDR}
```

This is a snippet from a long docker-compose.yml file. With the current Elasticsearch I have to specify hostnames and IP addresses. There is already another related issue, #14441. As mentioned above, the main problem I think should be solved is that we don't know IP addresses or hostnames in advance, which is actually a basic property of a scalable environment.

Consul (or any similar tool) manages DNS entries for hosts, containers, etc., and also for services. In this example, the three Elasticsearch instances, once booted, will have the following entries in DNS:

```
# nslookup logs-es-9200.service.consul
Server:     10.10.110.31
Address:    10.10.110.31#53

Name:   logs-es-9200.service.consul
Address: 10.20.5.1
Name:   logs-es-9200.service.consul
Address: 10.20.5.3
Name:   logs-es-9200.service.consul
Address: 10.20.5.2
```

With proper tagging in Consul, I also get per-instance lookups.

```
# nslookup lab1.logs-es-9200.service.consul
Server:     10.10.110.31
Address:    10.10.110.31#53

Name:   lab1.logs-es-9200.service.consul
Address: 10.20.5.1

# nslookup lab2.logs-es-9200.service.consul
Server:     10.10.110.31
Address:    10.10.110.31#53

Name:   lab2.logs-es-9200.service.consul
Address: 10.20.5.2

# nslookup lab3.logs-es-9200.service.consul
Server:     10.10.110.31
Address:    10.10.110.31#53

Name:   lab3.logs-es-9200.service.consul
Address: 10.20.5.3
```

From the above, I can generate any type of configuration file. For instance, the consul-template tool can generate it; for etcd it is confd. I believe similar tools exist for other environments. So the problem here is not in the configuration file itself.

Because of #14441, I have to do:

```yaml
    extra_hosts:
      - "logs-es-lab1:10.20.5.1"
      - "logs-es-lab2:10.20.5.2"
      - "logs-es-lab3:10.20.5.3"
    networks:
      service:
        ipv4_address: ${LOGS_ES_HOSTADDR}
```

I.e., I preassign hostnames and IP addresses, because when no instance is running at all, if I put the following in the zen unicast hosts list

```
[ "lab1.logs-es-9200.service.consul", "lab2.logs-es-9200.service.consul", "lab3.logs-es-9200.service.consul" ]
```

or in any other configuration file, whichever is provided, Elasticsearch will not work. It will fail because DNS will return an error, as it can't find the host.
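The behavior being asked for here would amount to skipping unresolvable names instead of failing outright. A hedged sketch of that idea (the hostnames are illustrative, and this is not what Elasticsearch did at the time):

```python
import socket


def resolve_hosts(hosts):
    """Resolve each hostname, skipping any that DNS cannot find."""
    resolved = []
    for host in hosts:
        try:
            resolved.append((host, socket.gethostbyname(host)))
        except socket.gaierror:
            # e.g. a lab4 node that has not booted yet, so Consul has
            # no DNS entry for it; skip it rather than aborting the
            # whole ping round.
            continue
    return resolved
```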

Let's say lab4 is going to start; once the service changes in the Consul catalog, consul-template will update the configuration to:

```
[ "lab1.logs-es-9200.service.consul", "lab2.logs-es-9200.service.consul", "lab3.logs-es-9200.service.consul", "lab4.logs-es-9200.service.consul" ]
```

The configuration-generating tool can, if necessary, restart processes. In gossip-like clusters, even a restart is not needed.
If it's really necessary to inform the process about the update, you can implement kill -HUP to force re-reading of the file; that can also be done by the configuration-generating tool.

So basically, if we are running in a service-discoverable environment, we can, one way or another, generate the configuration file and inform the process or restart it.

Of course, maybe I don't see the intended use case for externalizing the unicast hosts list. Instead of doing that, I would appreciate #14441 being solved, or one of these becoming more standard and part of the base:

None of them work out of the box. However, in my opinion:

  • DNS is pretty standard. Just resolve a service domain name and get the list of cluster members. Since DNS is UDP-based, there's no can of worms like those mentioned above. There is, of course, still the DNS query timeout, but that is really standard.
  • A KV store is almost standard; many environments implement one (Consul, Docker, etcd, etc.). How interoperable they are needs to be investigated, of course.

In any case, it's 2016... a single configuration file? :-)

@abeyad (Contributor) commented Sep 8, 2016

The goal is indeed to enable specifying the hosts for unicast discovery via a separate file, and to have that file be dynamically updated outside of Elasticsearch. For example, if the IP addresses of some of your hosts changed and you want to re-publish the list of unicast hosts to ping during discovery, that would now be possible with this plugin.

abeyad pushed a commit to abeyad/elasticsearch that referenced this issue Sep 13, 2016

Ali Beyad
File-based discovery plugin
This commit introduces a new plugin for file-based unicast hosts
discovery. This allows specifying the unicast hosts participating
in discovery through a `unicast_hosts.txt` file located in the
config directory. The plugin will use the hosts specified in this
file as the set of hosts to ping during discovery.

The format of the `unicast_hosts.txt` file is to have one host/port
entry per line. The hosts file is read and parsed every time
discovery makes ping requests, thus a new version of the file that
is published to the config directory will automatically be picked
up.

Closes elastic#20323

abeyad pushed a commit that referenced this issue Sep 14, 2016

Ali Beyad
File-based discovery plugin (#20394)
This commit introduces a new plugin for file-based unicast hosts
discovery. This allows specifying the unicast hosts participating
in discovery through a `unicast_hosts.txt` file located in the
`config/discovery-file` directory. The plugin will use the hosts 
specified in this file as the set of hosts to ping during discovery.

The format of the `unicast_hosts.txt` file is to have one host/port
entry per line. The hosts file is read and parsed every time
discovery makes ping requests, thus a new version of the file that
is published to the config directory will automatically be picked
up.

Closes #20323
