Method to wait for healthy cluster, that can be called before ES connection exists #778

melissachang · 2018-04-25T17:47:22Z

This code:

  es = Elasticsearch(["elasticsearch:9200"])
  es.cluster.health(wait_for_status='yellow')

Fails with:

index_test_data_1  | Traceback (most recent call last):
index_test_data_1  |   File "indexer.py", line 74, in <module>
index_test_data_1  |     main()
index_test_data_1  |   File "indexer.py", line 55, in main
index_test_data_1  |     es = init_elasticsearch()
index_test_data_1  |   File "indexer.py", line 34, in init_elasticsearch
index_test_data_1  |     es.cluster.health(wait_for_status='yellow')
index_test_data_1  |   File "/usr/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
index_test_data_1  |     return func(*args, params=params, **kwargs)
index_test_data_1  |   File "/usr/local/lib/python2.7/site-packages/elasticsearch/client/cluster.py", line 33, in health
index_test_data_1  |     'health', index), params=params)
index_test_data_1  |   File "/usr/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 314, in perform_request
index_test_data_1  |     status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
index_test_data_1  |   File "/usr/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 175, in perform_request
index_test_data_1  |     raise ConnectionError('N/A', str(e), e)
index_test_data_1  | elasticsearch.exceptions.ConnectionError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7fa33e09f110>: Failed to establish a new connection: [Errno 111] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7fa33e09f110>: Failed to establish a new connection: [Errno 111] Connection refused)

I use docker-compose to start index_test_data (a Python script) and elasticsearch at the same time. The ConnectionError occurs because nothing is listening on port 9200 yet.
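`[Errno 111] Connection refused` (ECONNREFUSED on Linux) means the TCP connection itself was rejected because no process was listening on elasticsearch:9200 yet. It says nothing about cluster health. Below is a minimal stdlib-only sketch that checks only this first condition. The helper name `port_is_open` is hypothetical and is not part of elasticsearch-py.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    This only proves that a listener exists. Elasticsearch may still be
    starting up and not yet able to answer requests.
    """
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (errno 111 on Linux), timeouts and DNS
        # failures (e.g. a compose service name that does not resolve yet)
        # are all OSError subclasses.
        return False
```

In the situation shown in the traceback, `port_is_open("elasticsearch", 9200)` would return False. Passing the port check is necessary but not sufficient: you still have to wait for the cluster to reach the desired health.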

I would like a method like this:

es.wait_for_status('yellow')
es = Elasticsearch(["elasticsearch:9200"])

The method would swallow the ConnectionErrors and block until Elasticsearch is up.


melissachang · 2018-04-25T17:56:01Z

The code already exists here; it just needs to be exposed through the client.

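    import time
    from unittest import SkipTest  # assumed import; the original excerpt omitted it

    from elasticsearch.exceptions import ConnectionError


    def wait_for_yellow(client, nowait=False):  # hypothetical wrapper around the original excerpt
        # wait for yellow status
        for _ in range(1 if nowait else 100):
            try:
                client.cluster.health(wait_for_status='yellow')
                return client
            except ConnectionError:
                time.sleep(.1)
        else:
            # timeout: every attempt failed to connect
            raise SkipTest("Elasticsearch failed to start.")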
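The same retry loop can also live in application code rather than in the client. Here is a minimal sketch, assuming the caller passes in the exception types to retry on. For elasticsearch-py that would be `elasticsearch.exceptions.ConnectionError`, which is a separate class from Python's builtin `ConnectionError`. The name `wait_until_ready` is hypothetical. Unlike a blanket `except`, the sketch lets every other exception propagate immediately and re-raises the last connection error once the deadline passes, so a misconfigured cluster is still reported.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_until_ready(
    check: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    timeout: float = 60.0,
    interval: float = 0.5,
) -> T:
    """Call check() until it returns, retrying only on `retry_on` exceptions.

    Any other exception propagates at once. After `timeout` seconds the most
    recent retryable error is re-raised instead of being swallowed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return check()
        except retry_on:
            if time.monotonic() >= deadline:
                raise  # give up and surface the most recent error
            time.sleep(interval)
```

In the docker-compose setup from this issue, you would create `es = Elasticsearch(["elasticsearch:9200"])` and then call `wait_until_ready(lambda: es.cluster.health(wait_for_status="yellow"), retry_on=(ConnectionError,))`, with `ConnectionError` imported from `elasticsearch.exceptions`. It is fine to create the client first: the traceback above shows that the error is raised by the `cluster.health` call, not by the constructor.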

fxdgear · 2018-04-25T18:33:12Z

Thanks for the issue @melissachang, and I'm sorry to hear you're having some troubles.

This is something other people have come across as well (please see #715). But building this into the client is not something I feel would add any value. If someone configures their ES instances incorrectly, we'll end up swallowing errors and nothing will get reported. (Even if we built a timeout into it, we'd have to wait until the timeout expires before the application reports that the cluster is not reachable.)

I feel that this is something that should be solved in the application code that implements the Elasticsearch client and not something that should be built into the client itself.

The standard pattern for connecting to any service in Docker is to build a "wait" into the application code that waits for the service to start before continuing. Please refer to the Docker documentation on controlling startup order.

I have in the past created this bash script which will wait for elasticsearch before starting the command in your container:

#!/bin/bash

set -e

host="$1"
shift
cmd="$@"

# First wait for ES to start accepting connections...
until curl --output /dev/null --silent --head --fail "$host"; do
    printf '.'
    sleep 1
done

# ...then wait for the root endpoint to answer with HTTP 200.
# ("|| true" keeps set -e from aborting the script when a single request fails.)
response=$(curl --write-out '%{http_code}' --silent --output /dev/null "$host" || true)

until [ "$response" = "200" ]; do
    >&2 echo "Elastic Search is unavailable - sleeping"
    sleep 1
    response=$(curl --write-out '%{http_code}' --silent --output /dev/null "$host" || true)
done


# next wait for ES status to turn to Green
health="$(curl -fsSL "$host/_cat/health?h=status" || true)"
health="$(echo "$health" | sed -r 's/^[[:space:]]+|[[:space:]]+$//g')" # trim whitespace (otherwise we'll have "green ")

until [ "$health" = 'green' ]; do
    >&2 echo "Elastic Search is unavailable - sleeping"
    sleep 1
    health="$(curl -fsSL "$host/_cat/health?h=status" || true)"
    health="$(echo "$health" | sed -r 's/^[[:space:]]+|[[:space:]]+$//g')" # trim whitespace
done

>&2 echo "Elastic Search is up"
exec $cmd

You can use this script in your Dockerfile with:

CMD ["/code/wait-for-elasticsearch.sh", "http://elasticsearch:9200", "--", "binary", "command", "sub-command"]

Similarly, you can use this script in your docker-compose.yml by overriding the service's command.
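The script above waits for exactly `green`. A single-node cluster whose indices are configured with replicas can never allocate those replicas, so it stays `yellow` and the loop never ends. Elasticsearch's own `wait_for_status` parameter instead treats its argument as a minimum (green > yellow > red). Below is a stdlib-only Python sketch of the same polling with that "at least" comparison. It assumes plain HTTP without authentication, as in `http://elasticsearch:9200` above, and a 5-second per-request timeout chosen arbitrarily. `poll_cluster_health` and `status_at_least` are hypothetical helper names, not elasticsearch-py APIs.

```python
import json
import time
import urllib.request

# Cluster health colours, from least to most healthy.
_RANK = {"red": 0, "yellow": 1, "green": 2}


def status_at_least(actual: str, wanted: str) -> bool:
    """True if `actual` is `wanted` or better (green > yellow > red)."""
    return _RANK.get(actual.strip(), -1) >= _RANK[wanted]


def poll_cluster_health(host: str, wanted: str = "yellow",
                        timeout: float = 60.0, interval: float = 1.0) -> str:
    """Poll GET {host}/_cluster/health until the status is `wanted` or better.

    Returns the status that satisfied the check, or raises TimeoutError
    once `timeout` seconds have passed.
    """
    if wanted not in _RANK:
        raise ValueError(f"unknown status {wanted!r}")
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{host}/_cluster/health", timeout=5) as resp:
                status = json.load(resp)["status"]
            if status_at_least(status, wanted):
                return status
        except (OSError, ValueError, KeyError):
            # OSError covers refused connections, urllib's URLError/HTTPError
            # and socket timeouts; ValueError/KeyError cover a response that
            # is not the expected JSON yet.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{host} did not reach {wanted!r} within {timeout}s")
        time.sleep(interval)
```

Raising instead of looping forever addresses the concern raised earlier in the thread: if the cluster never becomes reachable, the failure is still reported once the timeout expires.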

melissachang · 2018-04-25T18:59:30Z

The docker-compose documentation says that the container must wait for the service to be ready. It doesn't say anything about service clients providing tools to make this easier, so my suggestion is compatible with that docker-compose page. It just means the wait would be one line of application code instead of 20.

If someone configures their ES instances incorrectly, we'll end up swallowing errors and nothing will get reported. (even if we built a timeout into it, we have to wait till the timeout expires before the application reports that the cluster is not reachable).

I don't understand this part. Calling this method would be optional.

fxdgear · 2018-04-25T19:40:25Z

Maybe I misread your initial comment regarding the swallowing of errors. For that I apologize. I read your example code:

es.wait_for_status('yellow')
es = Elasticsearch(["elasticsearch:9200"])

I initially thought this wouldn't work, because es isn't instantiated until after wait_for_status is called. So I assumed that was a typo and that you wanted wait_for_status built into the initialization of the Elasticsearch client object.

But I'm still going to argue against adding this to the client itself. I don't see it providing any benefit outside of local development with docker-compose, so we would have to add testing, regression, and maintenance for a feature that's very narrow in scope. That being said, if you want to create a PR, I'll happily review it and offer feedback.

The better solution is to put this kind of logic outside the client:

create a wait mechanism in the container itself
add the aforementioned wait for yellow status to the application

melissachang · 2018-04-25T20:01:14Z

local development with docker-compose

Minor clarification: it's a bit broader than that, just "with docker-compose". I've needed this for:

Local development with docker-compose
CircleCI testing with docker-compose
Possibly deployment on Google Cloud Platform (not sure yet)

Leocete · 2019-11-26T11:00:12Z

@fxdgear, I've got the same problem as @melissachang, and your script worked! Thank you very much :)

But I've got one question.
Here is the command line from my docker-compose.yml file:
command: ["./wait-for-elasticsearch.sh", "http://elasticsearch:9200", "npm run start-dev"]

The last line of the script, exec $cmd, seems to execute all of the commands passed in, and judging by the console output it works perfectly:

server | Elastic Search is unavailable - sleeping
server | Elastic Search is up
server | 
server | > server@0.0.0 start-dev /app
server | > NODE_ENV=development ./node_modules/nodemon/bin/nodemon.js server.js

Why didn't it execute the shell script again, and instead only execute the command that comes after it (npm run start-dev)?
Shouldn't the 'Elastic Search is up' message be displayed twice, since at the end of the shell script we're calling all the scripts again (including the script itself)?

Ocramius · 2020-02-11T09:58:08Z

Just contributing back here, since I needed a script to monitor elasticsearch startup, but with a timeout:

#!/usr/bin/env php
<?php

declare(strict_types=1);

namespace WaitForElasticsearch;

use InvalidArgumentException;
use UnexpectedValueException;
use function curl_close;
use function curl_errno;
use function curl_error;
use function curl_exec;
use function curl_getinfo;
use function curl_init;
use function curl_setopt;
use function error_log;
use function getenv;
use function is_string;
use function microtime;
use function sprintf;
use function usleep;
use const CURLINFO_HTTP_CODE;
use const CURLOPT_HEADER;
use const CURLOPT_RETURNTRANSFER;
use const CURLOPT_TIMEOUT_MS;
use const CURLOPT_URL;

// Note: this is a dependency-less file that only relies on ext-curl to function. We do not want any dependencies in
//       here, since the system may not yet be in functional state at this stage.
(static function () : void {
    $elasticsearch = getenv('ELASTICSEARCH_URL');

    if (! is_string($elasticsearch)) {
        throw new InvalidArgumentException('Missing "ELASTICSEARCH_URL" environment variable');
    }

    $timeLimit     = (float) (getenv('ELASTICSEARCH_WAIT_TIMEOUT_SECONDS') ?: 60.0);
    $retryInterval = (int) ((((float) getenv('ELASTICSEARCH_RETRY_INTERVAL_SECONDS')) ?: 0.5) * 1000000);
    $start         = microtime(true);
    $elapsedTime   = static function () use ($start) : float {
        return microtime(true) - $start;
    };
    $remainingTime = static function () use ($elapsedTime, $timeLimit) : float {
        return $timeLimit - $elapsedTime();
    };

    while ($remainingTime() > 0) {
        $curl = curl_init();

        // @see https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html
        curl_setopt($curl, CURLOPT_HEADER, 0);
        curl_setopt($curl, CURLOPT_TIMEOUT_MS, 500);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt(
            $curl,
            CURLOPT_URL,
            $elasticsearch . sprintf('/_cluster/health?wait_for_status=yellow&timeout=%ds', (int) $timeLimit)
        );

        $response = curl_exec($curl);

        $errorCode    = curl_errno($curl);
        $errorMessage = curl_error($curl);
        $statusCode   = curl_getinfo($curl, CURLINFO_HTTP_CODE);

        curl_close($curl);

        if ($errorCode === 0) {
            /** @noinspection ForgottenDebugOutputInspection */
            error_log(sprintf('ElasticSearch connection succeeded after %.2f seconds', $elapsedTime()));

            if ($statusCode === 200) {
                /** @noinspection ForgottenDebugOutputInspection */
                error_log(sprintf(
                    'ElasticSearch status is (at least) yellow after %.2f seconds with response: %s',
                    $elapsedTime(),
                    $response
                ));

                return;
            }

            /** @noinspection ForgottenDebugOutputInspection */
            error_log(sprintf(
                'ElasticSearch status is pending after %.2f seconds with response code %d',
                $elapsedTime(),
                $statusCode
            ));
        }

        if ($errorCode !== 0) {
            /** @noinspection ForgottenDebugOutputInspection */
            error_log(sprintf(
                'Failed to contact ElasticSearch: curl error "%s", code %d, retrying for another %.2f seconds',
                $errorMessage,
                $errorCode,
                $remainingTime()
            ));
        }

        usleep($retryInterval);
    }

    throw new UnexpectedValueException(sprintf('Failed to connect to Elasticsearch after %.2f seconds', $elapsedTime()));
})();

Feel free to grab, butcher or burn it with fire :-)

The backend starts up too fast, before Elasticsearch's indexes are actually up. This causes an error: (node:1) UnhandledPromiseRejectionWarning: ConnectionError: connect ECONNREFUSED 127.0.0.1:9200 So we use wait-for-elasticsearch.sh to wait until Elasticsearch is actually up and accepting connections before starting the backend. For some reason our cluster status is always "yellow", so I had to modify the original script from "green". See: elastic/elasticsearch-py#778

fxdgear closed this as completed Apr 25, 2018

melissachang mentioned this issue May 11, 2018

Add Dockerfile for indexer.py DataBiosphere/data-explorer-indexers#13

Merged

stuartshay mentioned this issue Jan 25, 2021

Site Notes stuartshay/AzureDevOpsKats#347

Open

bodo-hugo-barwich mentioned this issue Sep 20, 2021

Unable to index MiniCPAN metacpan/metacpan-docker#47

Open
