Method to wait for healthy cluster, that can be called before ES connection exists #778

Closed
melissachang opened this issue Apr 25, 2018 · 7 comments

Comments

@melissachang

This code:

  es = Elasticsearch(["elasticsearch:9200"])
  es.cluster.health(wait_for_status='yellow')

Fails with:

index_test_data_1  | Traceback (most recent call last):
index_test_data_1  |   File "indexer.py", line 74, in <module>
index_test_data_1  |     main()
index_test_data_1  |   File "indexer.py", line 55, in main
index_test_data_1  |     es = init_elasticsearch()
index_test_data_1  |   File "indexer.py", line 34, in init_elasticsearch
index_test_data_1  |     es.cluster.health(wait_for_status='yellow')
index_test_data_1  |   File "/usr/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
index_test_data_1  |     return func(*args, params=params, **kwargs)
index_test_data_1  |   File "/usr/local/lib/python2.7/site-packages/elasticsearch/client/cluster.py", line 33, in health
index_test_data_1  |     'health', index), params=params)
index_test_data_1  |   File "/usr/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 314, in perform_request
index_test_data_1  |     status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
index_test_data_1  |   File "/usr/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 175, in perform_request
index_test_data_1  |     raise ConnectionError('N/A', str(e), e)
index_test_data_1  | elasticsearch.exceptions.ConnectionError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7fa33e09f110>: Failed to establish a new connection: [Errno 111] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7fa33e09f110>: Failed to establish a new connection: [Errno 111] Connection refused)

I use docker-compose to start index_test_data (a Python script) and elasticsearch at the same time. The ConnectionError occurs because nothing is listening on port 9200 yet.

I would like a method like this:

es.wait_for_status('yellow')
es = Elasticsearch(["elasticsearch:9200"])

That would swallow the ConnectionErrors and block until Elasticsearch is up.

@melissachang
Author

The code already exists here; it just needs to be exposed through the client.

    # wait for yellow status
    for _ in range(1 if nowait else 100):
        try:
            client.cluster.health(wait_for_status='yellow')
            return client
        except ConnectionError:
            time.sleep(.1)
    else:
        # timeout
        raise SkipTest("Elasticsearch failed to start.")

@fxdgear
Contributor

fxdgear commented Apr 25, 2018

Thanks for the issue @melissachang, and I'm sorry to hear you're having some troubles.

This is something other people have come across as well (please see #715). But building this into the client is not something I feel would add any value. If someone configures their ES instances incorrectly, we'll end up swallowing errors and nothing will get reported. (Even if we built a timeout into it, we would have to wait until the timeout expires before the application reports that the cluster is not reachable.)

I feel that this is something that should be solved in the application code that implements the Elasticsearch client and not something that should be built into the client itself.

The standard pattern for connecting to any service in docker is to build a "wait" into the application code that waits for a service to start before continuing. Please reference the Docker documentation on controlling startup order.

I have in the past created this bash script which will wait for elasticsearch before starting the command in your container:

#!/bin/bash

set -e

host="$1"
shift
cmd="$@"


until $(curl --output /dev/null --silent --head --fail "$host"); do
    printf '.'
    sleep 1
done

# First wait for ES to start...
response=$(curl $host)

until [ "$response" = "200" ]; do
    response=$(curl --write-out %{http_code} --silent --output /dev/null "$host")
    >&2 echo "Elastic Search is unavailable - sleeping"
    sleep 1
done


# next wait for ES status to turn to Green
health="$(curl -fsSL "$host/_cat/health?h=status")"
health="$(echo "$health" | sed -r 's/^[[:space:]]+|[[:space:]]+$//g')" # trim whitespace (otherwise we'll have "green ")

until [ "$health" = 'green' ]; do
    health="$(curl -fsSL "$host/_cat/health?h=status")"
    health="$(echo "$health" | sed -r 's/^[[:space:]]+|[[:space:]]+$//g')" # trim whitespace (otherwise we'll have "green ")
    >&2 echo "Elastic Search is unavailable - sleeping"
    sleep 1
done

>&2 echo "Elastic Search is up"
exec $cmd

You can use this script in your Dockerfile with:

CMD ["/code/wait-for-elasticsearch.sh", "http://elasticsearch:9200", "--", "binary", "command", "sub-command"]

or similarly you can use this script in your docker-compose.yml by overriding the command in a similar fashion.
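
For a Python container that has no curl available, the same polling idea can be sketched with the standard library alone. This is only an illustration of the approach in the bash script above, not part of elasticsearch-py: the function names `fetch_status` and `wait_for_health`, the default timeouts, and the `wanted` parameter are assumptions made for this example. It polls the same `/_cat/health?h=status` endpoint and treats any connection failure as "not ready yet".

```python
import time
import urllib.error
import urllib.request

# Cluster health levels ordered from worst to best.
_LEVELS = {"red": 0, "yellow": 1, "green": 2}


def fetch_status(base_url, request_timeout=2.0):
    """GET <base_url>/_cat/health?h=status; return the trimmed body, or None if unreachable."""
    url = base_url.rstrip("/") + "/_cat/health?h=status"
    try:
        with urllib.request.urlopen(url, timeout=request_timeout) as resp:
            return resp.read().decode("utf-8").strip()
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return None


def wait_for_health(base_url, wanted="green", timeout=60.0, interval=1.0, fetch=fetch_status):
    """Poll until the cluster reports at least `wanted`; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(base_url)
        if status is not None and _LEVELS.get(status, -1) >= _LEVELS[wanted]:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "Elasticsearch at %s did not reach %r (last status: %r)" % (base_url, wanted, status)
            )
        time.sleep(interval)
```

Unlike the bash script, this version gives up after `timeout` seconds instead of looping forever, and `wanted="yellow"` also accepts a green cluster, which may matter for clusters that never turn green.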

@fxdgear fxdgear closed this as completed Apr 25, 2018
@melissachang
Author

The docker-compose documentation says that the container must wait for the service to be ready. It doesn't say anything about service clients providing tools to make this easier, so my suggestion is compatible with that docker-compose page. It would just mean one line of application code instead of 20.

"If someone configures their ES instances incorrectly, we'll end up swallowing errors and nothing will get reported. (Even if we built a timeout into it, we would have to wait until the timeout expires before the application reports that the cluster is not reachable.)"

I don't understand this part. Calling this method would be optional.

@fxdgear
Contributor

fxdgear commented Apr 25, 2018

Maybe I misread your initial comment regarding the swallowing of errors. For that I apologize. I read your example code:

es.wait_for_status('yellow')
es = Elasticsearch(["elasticsearch:9200"])

And initially thought that this won't work because es doesn't get instantiated till after we call wait_for_status. So I thought that was a typo and you wanted the wait_for_status built into the initialization of the Elasticsearch client object.

But I'm still going to argue against adding this into the client itself. I don't see this providing any other benefit outside of local development with docker-compose. So now we have to have testing, regression, and maintenance for a feature that's so narrow in scope. That being said if you want to create a PR I'll happily review it for you and offer feedback.

The better solution is to put this kind of logic outside the client:

  1. create a wait mechanism in the container itself
  2. add the aforementioned wait for yellow status to the application
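
A minimal sketch of the second option, kept in application code rather than in the client. The helper name `wait_until_ready` and its parameters are hypothetical; it is written generically so that it does not depend on any particular client version. It retries a callable while it raises one of the given exception types and re-raises the last error once the timeout has passed, so a misconfigured host is still reported.

```python
import time


def wait_until_ready(check, retry_on, timeout=30.0, interval=0.1):
    """Call `check()` until it succeeds; retry only on the exception types in `retry_on`.

    Returns whatever `check()` returns. Once `timeout` seconds have elapsed,
    the most recent exception is re-raised instead of being swallowed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return check()
        except retry_on:
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)
```

With the client from the original report, `check` would be something like `lambda: es.cluster.health(wait_for_status='yellow')` and `retry_on` would be `(elasticsearch.exceptions.ConnectionError,)`, the exception class shown in the traceback above. Note that this class is the client's own and is not the built-in `ConnectionError`, which is why the exception types are passed in explicitly.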

@melissachang
Author

"local development with docker-compose"

Minor clarification: it's a bit broader than that, just "with docker-compose". I've needed this for:

  • Local development with docker-compose
  • CircleCI testing with docker-compose
  • Possibly deployment on Google Cloud Platform (not sure yet)

@Leocete

Leocete commented Nov 26, 2019

@fxdgear, I've got the same problem as @melissachang, and your script worked! Thank you very much :)

But I've got one question.
Here is the command line from my docker-compose.yml file:
command: ["./wait-for-elasticsearch.sh", "http://elasticsearch:9200", "npm run start-dev"]

At the end of the script there is a command, exec $cmd, which as I understand it executes all the shell scripts, and judging by the console output it works perfectly:

server | Elastic Search is unavailable - sleeping
server | Elastic Search is up
server | 
server | >server@0.0.0 start-dev /app
server | > NODE_ENV=development ./node_modules/nodemon/bin/nodemon.js server.js 

Why didn't it execute the shell script again, and instead only executed the command that comes after it (npm run start-dev)?
Shouldn't the 'Elastic Search is up' message be displayed twice, since at the end of the shell script we're calling all the scripts again (including the script itself)?
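
One way to see why the message appears only once is to trace what the script does with its arguments. The sketch below mimics `host="$1"; shift; cmd="$@"` and the unquoted `exec $cmd` in Python; the function name `split_wrapper_args` is made up for this illustration, and the whitespace split only approximates shell word splitting. The script's own name is the zeroth argument, so it never ends up in `cmd`, and `exec` replaces the shell process with that command rather than starting the script again.

```python
def split_wrapper_args(argv):
    """Mimic the argument handling of wait-for-elasticsearch.sh.

    argv[0] is the script itself, argv[1] is the host, and everything after
    that is the command to run once Elasticsearch is up.
    """
    host = argv[1]              # host="$1"
    remaining = argv[2:]        # shift
    cmd = " ".join(remaining)   # cmd="$@"
    # An unquoted $cmd is split on whitespace before exec runs it.
    return host, cmd.split()


host, command = split_wrapper_args(
    ["./wait-for-elasticsearch.sh", "http://elasticsearch:9200", "npm run start-dev"]
)
print(host)     # http://elasticsearch:9200
print(command)  # ['npm', 'run', 'start-dev']
```

In Python the counterpart of the final `exec $cmd` would be `os.execvp(command[0], command)`, which likewise replaces the current process and never returns to the wrapper.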

@Ocramius

Just contributing back here, since I needed a script to monitor elasticsearch startup, but with a timeout:

#!/usr/bin/env php
<?php

declare(strict_types=1);

namespace WaitForElasticsearch;

use InvalidArgumentException;
use UnexpectedValueException;
use function curl_close;
use function curl_errno;
use function curl_error;
use function curl_exec;
use function curl_getinfo;
use function curl_init;
use function curl_setopt;
use function error_log;
use function getenv;
use function is_string;
use function microtime;
use function sprintf;
use function usleep;
use const CURLINFO_HTTP_CODE;
use const CURLOPT_HEADER;
use const CURLOPT_RETURNTRANSFER;
use const CURLOPT_TIMEOUT_MS;
use const CURLOPT_URL;

// Note: this is a dependency-less file that only relies on ext-curl to function. We do not want any dependencies in
//       here, since the system may not yet be in functional state at this stage.
(static function () : void {
    $elasticsearch = getenv('ELASTICSEARCH_URL');

    if (! is_string($elasticsearch)) {
        throw new InvalidArgumentException('Missing "ELASTICSEARCH_URL" environment variable');
    }

    $timeLimit     = (float) (getenv('ELASTICSEARCH_WAIT_TIMEOUT_SECONDS') ?: 60.0);
    $retryInterval = (int) ((((float) getenv('ELASTICSEARCH_RETRY_INTERVAL_SECONDS')) ?: 0.5) * 1000000);
    $start         = microtime(true);
    $elapsedTime   = static function () use ($start) : float {
        return microtime(true) - $start;
    };
    $remainingTime = static function () use ($elapsedTime, $timeLimit) : float {
        return $timeLimit - $elapsedTime();
    };

    while ($remainingTime() > 0) {
        $curl = curl_init();

        // @see https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html
        curl_setopt($curl, CURLOPT_HEADER, 0);
        curl_setopt($curl, CURLOPT_TIMEOUT_MS, 500);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt(
            $curl,
            CURLOPT_URL,
            $elasticsearch . sprintf('/_cluster/health?wait_for_status=yellow&timeout=%ds', (int) $timeLimit)
        );

        $response = curl_exec($curl);

        $errorCode    = curl_errno($curl);
        $errorMessage = curl_error($curl);
        $statusCode   = curl_getinfo($curl, CURLINFO_HTTP_CODE);

        curl_close($curl);

        if ($errorCode === 0) {
            /** @noinspection ForgottenDebugOutputInspection */
            error_log(sprintf('ElasticSearch connection succeeded after %.2f seconds', $elapsedTime()));

            if ($statusCode === 200) {
                /** @noinspection ForgottenDebugOutputInspection */
                error_log(sprintf(
                    'ElasticSearch status is (at least) yellow after %.2f seconds with response: %s',
                    $elapsedTime(),
                    $response
                ));

                return;
            }

            /** @noinspection ForgottenDebugOutputInspection */
            error_log(sprintf(
                'ElasticSearch status is pending after %.2f seconds with response code %d',
                $elapsedTime(),
                $statusCode
            ));
        }

        if ($errorCode !== 0) {
            /** @noinspection ForgottenDebugOutputInspection */
            error_log(sprintf(
                'Failed to contact ElasticSearch: curl error "%s", code %d, retrying for another %.2f seconds',
                $errorMessage,
                $errorCode,
                $remainingTime()
            ));
        }

        usleep($retryInterval);
    }

    throw new UnexpectedValueException(sprintf('Failed to connect to Elasticsearch after %.2f seconds', $elapsedTime()));
})();

Feel free to grab, butcher or burn it with fire :-)

alanorth added a commit to ilri/OpenRXV that referenced this issue Aug 30, 2021
The backend starts up too fast, before Elasticsearch's indexes are
actually up. This causes an error:

    (node:1) UnhandledPromiseRejectionWarning: ConnectionError: connect ECONNREFUSED 127.0.0.1:9200

So we use wait-for-elasticsearch.sh to wait until Elasticsearch is
actually up and accepting connections before starting the backend.
For some reason our cluster status is always "yellow", so I had to
modify the original script from "green".

See: elastic/elasticsearch-py#778