Manage Elasticsearch nodes with dedicated subcommands #830

danielmitterdorfer · 2019-11-27T14:46:54Z

With this commit we introduce three new subcommands to Rally:

install: To install a single Elasticsearch node locally
start: To start an Elasticsearch node that has been previously installed
stop: To stop a running Elasticsearch node

To run a benchmark, users first issue install followed by start on all
nodes. Afterwards, they run the benchmark using the benchmark-only pipeline.
Finally, they invoke the stop command on all nodes to shut down the cluster.

To ensure that system metrics are stored consistently (i.e. they all carry the
same metadata, such as race id and race timestamp), we expose the race id as a
command line parameter and defer writing any system metrics until the stop
command is invoked. At that point we attempt to read the race metadata that the
benchmark has previously written to the Elasticsearch metrics store for that
race id, and merge it into the system metrics as we write them.
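A minimal sketch of the merge step, assuming the system metrics are plain dictionaries and the race metadata has already been fetched from the metrics store for the given race id. The helper name and the field names (`race-id`, `race-timestamp`, `disk_io_write_bytes`) are hypothetical and not Rally's actual metrics schema.

```python
from typing import Any, Dict, List, Optional

Doc = Dict[str, Any]


def merge_race_metadata(system_metrics: List[Doc], race_meta: Optional[Doc]) -> List[Doc]:
    """Return copies of the system metric documents enriched with race metadata.

    Hypothetical helper for illustration only; it is not Rally's implementation.
    """
    if not race_meta:
        # No metadata found for this race id: write the metrics unchanged.
        return [dict(doc) for doc in system_metrics]
    merged = []
    for doc in system_metrics:
        enriched = dict(doc)
        # Copy race-level fields (e.g. race id, race timestamp) onto every document,
        # keeping any value the node already recorded under the same key.
        for key, value in race_meta.items():
            enriched.setdefault(key, value)
        merged.append(enriched)
    return merged


if __name__ == "__main__":
    metrics = [{"name": "disk_io_write_bytes", "value": 1024}]
    meta = {"race-id": "b1228394-998f-413d-b454-adef7e0f2a7b", "race-timestamp": "20191127T144654Z"}
    print(merge_race_metadata(metrics, meta))
```

Using `setdefault` means node-local values win over race-level ones; the opposite precedence would be an equally valid choice.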

The current implementation is considered a new experimental addition to the
existing mechanism to manage clusters with the intention to eventually replace
it. The command line interface is specific to Zen discovery and subject to
change as we learn more about its use.

Closes #697


danielmitterdorfer · 2019-11-27T14:56:56Z

To help with the review, here are some test commands:

# benchmark a single-node cluster
esrally install --quiet --distribution-version=7.4.2 --build-type=tar --node-name="rally-node-0" --master-nodes="rally-node-0" --network-host="127.0.0.1" --seed-hosts="127.0.0.1:39300"

# TODO: Capture the installation id from the previous command
esrally start --installation-id=c2e6d5fb-0405-4f39-9213-cf8799a760cc --race-id=b1228394-998f-413d-b454-adef7e0f2a7b

esrally --pipeline=benchmark-only --target-host=127.0.0.1:39200 --track=geonames --challenge=append-no-conflicts-index-only --on-error=abort --race-id=b1228394-998f-413d-b454-adef7e0f2a7b

esrally stop --installation-id=c2e6d5fb-0405-4f39-9213-cf8799a760cc --race-id=b1228394-998f-413d-b454-adef7e0f2a7b
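The start, benchmark and stop commands above must share one race id. A minimal Python sketch that generates a fresh id and assembles those command lines; generating it with `uuid4` is an assumption based on the UUID shape of the example ids, and the installation id remains a placeholder because capturing it from `install` is still a TODO.

```python
import shlex
import uuid
from typing import List


def new_race_id() -> str:
    # The race ids above have the shape of random (version 4) UUIDs.
    return str(uuid.uuid4())


def race_commands(installation_id: str, race_id: str) -> List[List[str]]:
    """Assemble the start / benchmark / stop command lines shown above."""
    inst = f"--installation-id={installation_id}"
    race = f"--race-id={race_id}"
    return [
        ["esrally", "start", inst, race],
        ["esrally", "--pipeline=benchmark-only", "--target-host=127.0.0.1:39200",
         "--track=geonames", "--challenge=append-no-conflicts-index-only",
         "--on-error=abort", race],
        ["esrally", "stop", inst, race],
    ]


if __name__ == "__main__":
    # Placeholder: replace with the installation id reported by `esrally install`.
    installation_id = "c2e6d5fb-0405-4f39-9213-cf8799a760cc"
    for argv in race_commands(installation_id, new_race_id()):
        print(shlex.join(argv))
```

In a shell, the same id can be produced with `RACE_ID=$(python3 -c 'import uuid; print(uuid.uuid4())')` and passed as `--race-id="${RACE_ID}"`.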

To test with a Docker container (note --build-type=docker):

esrally install --quiet --distribution-version=7.4.2 --build-type=docker --node-name="rally-node-0" --master-nodes="rally-node-0" --network-host="127.0.0.1" --seed-hosts="127.0.0.1:39300"

All other commands (start, stop) are identical to the scenario above. Rally's Docker support has (and has always had) several restrictions, e.g. you can only spin up a single node. This PR does not change that.

It is also possible to use user-tags. Just specify them when running the benchmark:


esrally --pipeline=benchmark-only --target-host=127.0.0.1:39200 --track=geonames --challenge=append-no-conflicts-index-only --on-error=abort --race-id=de604d0d-926c-4764-bbef-d8bc564755ae --user-tag="with-user-tags:true"

The user tags should also show up in the system metrics.

To benchmark a source build, you need to modify the install command (use --skip-build to skip the build):

esrally install --revision=latest --node-name="rally-node-0" --network-host="127.0.0.1" --master-nodes="rally-node-0" --seed-hosts="127.0.0.1:39300"

For a multi-node cluster, install multiple nodes but reference the other node(s) as seed hosts:

esrally install --distribution-version=7.4.2 --node-name="rally-node-0" --network-host="127.0.0.1" --master-nodes="rally-node-0,rally-node-1" --seed-hosts="127.0.0.1:39300,127.0.0.1:39301"

esrally install --distribution-version=7.4.2 --node-name="rally-node-1" --network-host="127.0.0.1" --http-port=39201 --master-nodes="rally-node-0,rally-node-1" --seed-hosts="127.0.0.1:39300,127.0.0.1:39301"

Note that we always use the default HTTP port (39200). To customize it, specify e.g. --http-port=19200. The transport port is always 100 ports above the HTTP port (19300 in that example).
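Because the transport port is derived from the HTTP port, two nodes on the same host need distinct HTTP ports so that their transport ports (and thus the `--seed-hosts` entries) differ. A small sketch of that arithmetic; the helper names are hypothetical.

```python
from typing import List

DEFAULT_HTTP_PORT = 39200
TRANSPORT_PORT_OFFSET = 100  # transport port = HTTP port + 100


def transport_port(http_port: int = DEFAULT_HTTP_PORT) -> int:
    return http_port + TRANSPORT_PORT_OFFSET


def seed_hosts(host: str, http_ports: List[int]) -> str:
    """Build a --seed-hosts value from the HTTP ports of nodes on one host."""
    return ",".join(f"{host}:{transport_port(port)}" for port in http_ports)


if __name__ == "__main__":
    print(transport_port(19200))
    print(seed_hosts("127.0.0.1", [39200, 39201]))
```

With HTTP ports 39200 and 39201 this yields `127.0.0.1:39300,127.0.0.1:39301`, the seed host list used in the multi-node example.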

danielmitterdorfer · 2019-11-27T15:58:15Z

integration-test.sh

+    random_build_type build_type
+
+    # for Docker we force the most recent distribution as we don't have Docker images for all versions that are tested
+    if [[ "$build_type" == "docker" ]]; then


While we kill Rally processes in kill_related_es_processes, we don't do the same for any running Docker containers. I think we might need to stop Docker containers as well, but I wonder whether we can find a robust way to identify the correct container.

This is possible. What we need to do is:

Change docker-compose.yml.j2 to further specify a label like so:

diff --git a/esrally/resources/docker-compose.yml.j2 b/esrally/resources/docker-compose.yml.j2
index c5bc10a..19e72e9 100644
--- a/esrally/resources/docker-compose.yml.j2
+++ b/esrally/resources/docker-compose.yml.j2
@@ -4,6 +4,8 @@ services:
     cap_add:
       - IPC_LOCK
     image: "{{docker_image}}:{{es_version}}"
+    labels:
+      io.rally.description: "Label to help identify Elasticsearch containers launched by Rally"
     {%- if docker_cpu_count is defined %}
     cpu_count: {{docker_cpu_count}}
     {%- endif %}

Use the label to retrieve the ID(s) of the container(s) launched by Rally with a filter like docker ps --filter "label=io.rally.description" --format "{{.ID}}" (multiple IDs are separated by newlines).
This matches all containers that carry the label key. To be more targeted, we could also match the label value, which could optionally be set via a j2 variable in the compose file; however, this would require an optional argument for the start subcommand (e.g. --docker-label="...") to override the j2 variable.

Thanks! I'll implement this in the integration test. I think it's fine to use the less targeted approach and rely on a static label in docker-compose.yml.j2.

Added in 9ef5660.
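The integration test itself is a bash script; the following is a Python transcription of the same idea, for illustration only. It assumes the `docker` CLI is on the PATH; `label_filter`, `rally_container_ids` and `stop_rally_containers` are hypothetical helpers. The optional `value` shows the more targeted `label=<key>=<value>` filter mentioned above.

```python
import subprocess
from typing import List, Optional

RALLY_LABEL = "io.rally.description"


def label_filter(key: str, value: Optional[str] = None) -> str:
    # "label=<key>" matches any value; "label=<key>=<value>" is more targeted.
    return f"label={key}" if value is None else f"label={key}={value}"


def parse_container_ids(output: str) -> List[str]:
    # `docker ps --format "{{.ID}}"` prints one container id per line.
    return [line.strip() for line in output.splitlines() if line.strip()]


def rally_container_ids(value: Optional[str] = None) -> List[str]:
    result = subprocess.run(
        ["docker", "ps", "--filter", label_filter(RALLY_LABEL, value), "--format", "{{.ID}}"],
        check=True, capture_output=True, text=True)
    return parse_container_ids(result.stdout)


def stop_rally_containers(value: Optional[str] = None) -> None:
    # Stop every running container that carries Rally's label.
    ids = rally_container_ids(value)
    if ids:
        subprocess.run(["docker", "stop", *ids], check=True)
```

Parsing is kept separate from the `docker` call so it can be checked without a Docker daemon.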

dliappis

This is a huge amount of work at a very high level of quality! Thank you.

I am leaving an initial batch of comments; I've reviewed up until the rally.py file.

docs/cluster_management.rst

dliappis · 2019-11-28T13:19:25Z

docs/cluster_management.rst

+
+    esrally stop --installation-id="69ffcfee-6378-4090-9e93-87c9f8ee59a7" --race-id="${RACE_ID}"
+
+If you only want to shutdown the node but don't want to delete the node and the data, pass ``--preserve-install`` additionally.


"pass additionally --preserve-install" instead?

dliappis · 2019-11-28T13:33:34Z

esrally/mechanic/launcher.py

+    def stop(self, nodes, metrics_store):
+        self.logger.info("Shutting down [%d] nodes running in Docker on this host.", len(nodes))
+        for node in nodes:
+            # readd meta-data - we already did this on startup but in case dedicated subcommands are used for


s/readd/read

I really meant this as: "add the meta-data again". But this was only valid in an earlier commit so I'll remove the comment entirely.

esrally/mechanic/launcher.py

esrally/mechanic/mechanic.py


dliappis

Finished the review, left a few more comments.

esrally/rally.py

dliappis · 2019-12-02T14:10:42Z

integration-test.sh

drawlerr

Looks good! Some minor comments on arguments for now.

drawlerr · 2019-12-02T14:25:52Z

esrally/rally.py

+    stop_parser.add_argument(
+        "--installation-id",
+        required=True,
+        help="The id of the installation to start",


s/start/stop

Good catch!
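A minimal argparse sketch of the `start` and `stop` subcommands with the corrected help text. It is a simplification, not Rally's actual rally.py: it only uses flags that appear in this PR (`--installation-id`, `--race-id`, `--preserve-install`), the subcommand help strings are invented, and `--race-id` is required only by `start`, following the later change in this PR (dbbbdb4).

```python
import argparse


def create_arg_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="esrally")
    subparsers = parser.add_subparsers(title="subcommands", dest="subcommand")

    start_parser = subparsers.add_parser("start", help="Start a previously installed node")
    start_parser.add_argument(
        "--installation-id",
        required=True,
        help="The id of the installation to start")
    start_parser.add_argument(
        "--race-id",
        required=True,
        help="Define a unique id for this race.")

    stop_parser = subparsers.add_parser("stop", help="Stop a running node")
    stop_parser.add_argument(
        "--installation-id",
        required=True,
        help="The id of the installation to stop")
    stop_parser.add_argument(
        "--preserve-install",
        action="store_true",
        help="Keep the node and its data after shutting it down")
    return parser


if __name__ == "__main__":
    print(create_arg_parser().parse_args(["stop", "--installation-id=abc"]))
```

argparse never applies `default` to an argument marked `required=True`, so the `default=""` in the reviewed snippet has no effect and can be dropped.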

drawlerr · 2019-12-02T14:34:30Z

esrally/rally.py

+        "--race-id",
+        required=True,
+        help="Define a unique id for this race.",
+        default="")


Presumably, the "stop" command shouldn't care about races, as it just stops nodes?

Upon stopping a node, Rally needs to store metrics and aggregate them. To have consistent metadata across all nodes involved in a benchmark, we need the race id to tie them together, which is why this command line parameter is needed. We could probably store it when a node is started, but I've opted for this simpler approach here. In fact, we could even skip it upon node startup because we only store metrics when a node is stopped, but this felt like leaking implementation details into the command line interface, which I wanted to avoid.

I like the approach of storing the active race ID. Any chance we could include it in this PR?

Good idea, it definitely makes the interface simpler and easier to use. I've implemented this in dbbbdb4.
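One way to store the active race id, sketched under assumptions: `start` writes the id next to the installation and `stop` reads it back, so `stop` no longer needs `--race-id`. The file name `race.json` and both helpers are hypothetical; the actual change in dbbbdb4 may keep the id elsewhere (for example alongside the pickled node state).

```python
import json
import os
import tempfile

RACE_FILE = "race.json"  # hypothetical file name


def store_race_id(installation_dir: str, race_id: str) -> None:
    # Called on `start`: remember which race this node belongs to.
    with open(os.path.join(installation_dir, RACE_FILE), "w", encoding="utf-8") as f:
        json.dump({"race-id": race_id}, f)


def load_race_id(installation_dir: str) -> str:
    # Called on `stop`: recover the race id to tag the system metrics.
    with open(os.path.join(installation_dir, RACE_FILE), encoding="utf-8") as f:
        return json.load(f)["race-id"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as install_dir:
        store_race_id(install_dir, "b1228394-998f-413d-b454-adef7e0f2a7b")
        print(load_race_id(install_dir))
```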

danielmitterdorfer · 2019-12-03T07:43:12Z

@dliappis, @drawlerr thanks for your feedback. I've addressed it now. Can you please have another look?

dliappis · 2019-12-03T08:22:56Z

data-params-hot.json

@@ -0,0 +1,9 @@
+{


This file was added by accident?

removed in 947c110

dliappis · 2019-12-03T08:23:02Z

data-params-warm.json

@@ -0,0 +1,9 @@
+{


This file was added by accident?

removed in 947c110

dliappis · 2019-12-03T08:23:19Z

master-params.json

@@ -0,0 +1,8 @@
+{


This file was added by accident?

removed in 947c110

dliappis

LGTM

drawlerr

Thanks, looks good to me!

danielmitterdorfer · 2019-12-04T11:05:50Z

Thanks to both of you for the review! :)

With this commit we introduce a new `put-settings` operation that can be used to update cluster settings via the REST API. We also deprecate the track property `cluster-settings`, which had a similar purpose but put the cluster settings into `elasticsearch.yml` instead of updating them via the API. This is now problematic because we will move away from integrated cluster management (see also #830). Instead, settings that need to persist in `elasticsearch.yml` should be added via `--car-params`, and per-track settings should be applied via the cluster settings API. Relates elastic/rally-tracks#93 Relates #831
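The Elasticsearch cluster update settings API (`PUT /_cluster/settings`) accepts dynamic settings under a `persistent` or `transient` key. A minimal sketch of such a request body; `cluster_settings_body` is a hypothetical helper, not part of Rally, and settings that Elasticsearch only reads at node startup still belong in `elasticsearch.yml` via `--car-params`.

```python
import json
from typing import Any, Dict


def cluster_settings_body(settings: Dict[str, Any], persistent: bool = True) -> str:
    """Build a request body for PUT /_cluster/settings."""
    scope = "persistent" if persistent else "transient"
    return json.dumps({scope: settings})


if __name__ == "__main__":
    # indices.recovery.max_bytes_per_sec is a dynamic cluster setting.
    print(cluster_settings_body({"indices.recovery.max_bytes_per_sec": "100mb"}))
```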

danielmitterdorfer added 14 commits November 21, 2019 14:56

Manage nodes via subcommands (WIP)

c05d278

Keep disk I/O state in memory

26943f0

As we pickle the node, in-memory state of telemetry devices is persisted anyway and there is no need to store it in an additional file.
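A minimal illustration of why no extra file is needed: pickling an object also serializes the state of the objects it references. `Node` and `DiskIo` are hypothetical stand-ins, not Rally's classes.

```python
import os
import pickle
import tempfile
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DiskIo:
    # Hypothetical telemetry device: I/O counters sampled when the node starts.
    start_counters: Dict[str, int] = field(default_factory=dict)


@dataclass
class Node:
    name: str
    telemetry: List[DiskIo] = field(default_factory=list)


def save_node(node: Node, path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(node, f)


def load_node(path: str) -> Node:
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "node.pickle")
        save_node(Node("rally-node-0", [DiskIo({"write_bytes": 4096})]), path)
        # The telemetry state survives the round trip without a separate file.
        print(load_node(path).telemetry[0].start_counters)
```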

Fix tests

4dedb52

Further cleanups

29ae368

Reduce hacks and add an integration test

a7d4cc0

More cleanups

b90a8f1

Tests for DockerLauncher

af9a597

Merge remote-tracking branch 'origin/master' into node-mgmt

3fd6a58

Platform-neutral Java

529454e

Fine-tune logging

d66b6ff

Also escape Java path

6d98cf9

Check effective user id only on Posix systems

bf0fb93

Adjust paths for launching

fcea15e

Some cleanups

d44210d

danielmitterdorfer added the enhancement, :Telemetry and :Benchmark Candidate Management labels Nov 27, 2019

danielmitterdorfer added this to the 1.4.0 milestone Nov 27, 2019

danielmitterdorfer requested review from dliappis, drawlerr and ebadyano November 27, 2019 14:46

danielmitterdorfer self-assigned this Nov 27, 2019

danielmitterdorfer mentioned this pull request Nov 27, 2019

Allow to manage Elasticsearch nodes separately from benchmarking #697

Closed


Randomize integration tests on build type

fb59b6f

danielmitterdorfer commented Nov 27, 2019


Add user docs

93383a0

danielmitterdorfer added the highlight label Nov 28, 2019

dliappis reviewed Nov 28, 2019


danielmitterdorfer mentioned this pull request Nov 29, 2019

Expose API for cluster settings #831

Merged

dliappis reviewed Dec 2, 2019


drawlerr reviewed Dec 2, 2019


danielmitterdorfer added 4 commits December 3, 2019 08:37

Restore --keep-cluster-running

c4a5e38

Shutdown lingering Docker containers in integration-tests

9ef5660

Doc fixes

a1d6136

Only require race id on startup

dbbbdb4

danielmitterdorfer requested review from dliappis and drawlerr December 3, 2019 07:43

Also remove lingering Docker containers

b76e5b0

dliappis reviewed Dec 3, 2019


Uncommit

947c110

dliappis self-requested a review December 3, 2019 09:00

dliappis approved these changes Dec 3, 2019


drawlerr approved these changes Dec 3, 2019


danielmitterdorfer merged commit 2df73e9 into elastic:master Dec 4, 2019

danielmitterdorfer deleted the node-mgmt branch December 4, 2019 11:07

danielmitterdorfer mentioned this pull request Jan 17, 2020

Store system metrics if race metadata are present #874

Merged
