Enable transport compression between nodes to reduce DTS costs #73497

Closed
10 of 12 tasks
dakrone opened this issue May 27, 2021 · 3 comments
Labels
:Distributed/Network Http and internode communication implementations >enhancement Meta Team:Distributed Meta label for distributed team

Comments

@dakrone
Member

dakrone commented May 27, 2021

Summary

To reduce DTS costs for cross-zone data transfer, we should investigate enabling transport compression between nodes by default.

We could compress bulk requests between the coordinating node and the primary shard. We would want to investigate a lightweight compression algorithm such as LZ4, which is light on CPU usage, so that compression does not have a deleterious impact on indexing throughput.
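One rough way to get a feel for the potential savings is to compress a bulk-style payload and compare sizes. The sketch below is only an illustration: LZ4 is not part of the Java standard library, so it swaps in DEFLATE at its fastest level via `java.util.zip.Deflater`, and the payload, class name, and method names are all made up for this example. It says nothing about CPU cost or about the ratios a real cluster would see.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionEstimate {

    // Compresses the input with DEFLATE (fastest level) and returns the compressed size in bytes.
    // DEFLATE is a stand-in here; the issue proposes LZ4, which needs a third-party library.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[4096];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    // Builds a hypothetical bulk-style payload: newline-delimited JSON with repeated field names.
    static byte[] sampleBulkPayload(int docs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < docs; i++) {
            sb.append("{\"index\":{\"_index\":\"logs\"}}\n");
            sb.append("{\"host\":\"web-").append(i % 10)
              .append("\",\"status\":200,\"message\":\"request ").append(i).append(" ok\"}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = sampleBulkPayload(1000);
        int compressed = deflatedSize(payload);
        System.out.printf("raw=%d bytes, deflated=%d bytes%n", payload.length, compressed);
    }
}
```

Repeated field names in source documents are what make this kind of traffic compress well, which is why the proposal targets bulk requests first.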

7.14

7.15

7.16

Maybe Later:

  • Transition to the LZ4 frame format as opposed to the custom lz4-java block format
@dakrone dakrone added >enhancement :Distributed/Network Http and internode communication implementations labels May 27, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label May 27, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@Tim-Brooks Tim-Brooks self-assigned this May 27, 2021
Tim-Brooks added a commit that referenced this issue Jun 29, 2021
This commit is related to #73497. It adds two new settings. The first setting
is transport.compression_scheme. This setting allows the user to
configure LZ4 or DEFLATE as the transport compression. Additionally, it
modifies transport.compress to support the value indexing_data. When
this setting is set to indexing_data, only messages that are primarily
composed of raw source data will be compressed. These are bulk, operations
recovery, and shard changes messages.
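The behavior described in this commit message can be sketched roughly as follows. This is not the Elasticsearch implementation: the class, enums, and the `isRawSourceMessage` flag are hypothetical names chosen for illustration, and the sketch assumes `transport.compress` also accepts plain `true` and `false` alongside `indexing_data`.

```java
import java.util.Locale;

public class CompressionDecision {

    // Assumed values for transport.compress: on, off, and the new indexing_data level.
    enum Compress { TRUE, INDEXING_DATA, FALSE }

    // Schemes the commit message names for transport.compression_scheme.
    enum Scheme { LZ4, DEFLATE }

    // Hypothetical parser for a setting value such as "indexing_data".
    static Compress parseCompress(String value) {
        return Compress.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Hypothetical parser for a setting value such as "lz4" or "deflate".
    static Scheme parseScheme(String value) {
        return Scheme.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Decides whether a message is compressed. isRawSourceMessage is a made-up flag
    // standing for "bulk, operations recovery, or shard changes message".
    static boolean shouldCompress(Compress setting, boolean isRawSourceMessage) {
        switch (setting) {
            case TRUE:
                return true;
            case INDEXING_DATA:
                return isRawSourceMessage;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        Compress setting = parseCompress("indexing_data");
        Scheme scheme = parseScheme("lz4");
        System.out.println("bulk request compressed: " + shouldCompress(setting, true) + " (" + scheme + ")");
        System.out.println("other request compressed: " + shouldCompress(setting, false));
    }
}
```

Keeping the "which messages" decision separate from the "which scheme" choice mirrors the two settings in the commit message: one selects the level, the other selects the algorithm.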
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this issue Jul 6, 2021
This is related to elastic#73497. Currently replica requests are wrapped in a
concrete replica shard request. This leads to the transport layer not
properly identifying them as replica index_data requests and not
compressing them properly. This commit resolves this bug.
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this issue Aug 12, 2021
In 7.15, we intend for the `indexing_data` compression level and the
compression scheme `lz4` to no longer be experimental. This commit
updates the documentation to reflect this.

Relates to elastic#73497.
Tim-Brooks added a commit that referenced this issue Aug 12, 2021
In 7.15, we intend for the indexing_data compression level and the
compression scheme lz4 to no longer be experimental. This commit
updates the documentation to reflect this. Additionally, it adds
missing docs for the cluster.remote.*.transport.compression_scheme
setting.

Relates to #73497.
Tim-Brooks added a commit that referenced this issue Aug 13, 2021
This is related to #73497. Currently, we only use the configured
transport.compression_scheme setting when compressing a request or a
response. Additionally, the cluster.remote.*.transport.compression_scheme
setting is ignored. This commit fixes this behavior by respecting the
per-cluster setting. Additionally, it resolves confusion around inbound
and outbound connections by always responding with the same scheme that
was received. This allows remote connections to have different schemes
than local connections.
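A minimal sketch of the selection rule this commit message describes: requests use the scheme configured for their target, and responses mirror whatever scheme the request arrived with. All names below are hypothetical, and the fallback from an unconfigured remote cluster to the local scheme is an assumption made for the example, not something the commit message states.

```java
import java.util.HashMap;
import java.util.Map;

public class SchemeSelection {

    enum Scheme { LZ4, DEFLATE }

    // Scheme for connections inside the local cluster (transport.compression_scheme).
    private final Scheme localScheme;

    // Per-cluster overrides keyed by a hypothetical remote cluster alias.
    private final Map<String, Scheme> remoteSchemes = new HashMap<>();

    SchemeSelection(Scheme localScheme) {
        this.localScheme = localScheme;
    }

    void setRemoteScheme(String clusterAlias, Scheme scheme) {
        remoteSchemes.put(clusterAlias, scheme);
    }

    // Outbound request: a null alias means a local connection.
    // Assumption: a remote cluster without its own setting falls back to the local scheme.
    Scheme schemeForRequest(String clusterAliasOrNull) {
        if (clusterAliasOrNull == null) {
            return localScheme;
        }
        return remoteSchemes.getOrDefault(clusterAliasOrNull, localScheme);
    }

    // Response: always answer with the scheme the request was received with.
    Scheme schemeForResponse(Scheme requestScheme) {
        return requestScheme;
    }

    public static void main(String[] args) {
        SchemeSelection selection = new SchemeSelection(Scheme.LZ4);
        selection.setRemoteScheme("cluster_two", Scheme.DEFLATE);
        System.out.println("local request: " + selection.schemeForRequest(null));
        System.out.println("remote request: " + selection.schemeForRequest("cluster_two"));
        System.out.println("response to DEFLATE request: " + selection.schemeForResponse(Scheme.DEFLATE));
    }
}
```

Mirroring the inbound scheme means the responding node does not need to know how the sender was configured, which is what lets remote and local connections differ.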
Tim-Brooks added a commit that referenced this issue Aug 17, 2021
This commit enables LZ4 transport compression by default at the
indexing_data level.

Relates to #73497.
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this issue Aug 18, 2021
Currently we use the custom lz4-block scheme when compressing data. This
scheme automatically calculates and writes a checksum when compressing.
We do not actually read this checksum when decompressing so it is
unnecessary. This commit resolves this by not writing a no-op checksum.
This will break arbitrary decompressors. However, since the lz4 block
format is not an official format anyway, this should be fine.

Relates to elastic#73497.
@repantis repantis added the Meta label Sep 2, 2021
@Tim-Brooks
Contributor

This will be closed by #80165.

@Tim-Brooks
Contributor

Closing as the final required task is tracked.
