Disable TCP compression when shipping Lucene segments during peer recovery #33844
Comments
Pinging @elastic/es-core-infra
Pinging @elastic/es-distributed
+1 to making transport options take precedence over the setting. It's not completely obvious to me that Lucene files don't need further compression: Lucene files are indeed compressed, but in a way that still provides quick (often random) access, which limits the effectiveness of the compression techniques that can be used. Re-compressing on top of Lucene might bring further space savings. It might still be the wrong trade-off, especially on a local network, but I wanted to clarify this point.
I agree that transport options set explicitly on the request should override the global setting.
cc @original-brownbear @tbrooks8
@ywelsch @jasontedor I experimented with compression performance today, and there's hard data from @DaveCTurner in a linked issue (more details there). Both my experiments and that data suggest that compression severely limits network throughput, to around 20-30 MB/s (~160-240 Mbit/s).
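A measurement along these lines can be sketched with the standard library (a rough stand-in only: DEFLATE approximates the transport compression of that era, and the payload mix is invented, so the resulting number is machine-dependent and illustrative, not the benchmark from the thread):

```python
# Time a single-threaded DEFLATE pass over a mixed payload and derive
# an effective compression throughput in MB/s.
import os
import time
import zlib

# Mix of incompressible (random) and highly compressible bytes,
# loosely imitating a mix of compressed and plain index files.
payload = os.urandom(4 << 20) + b"json-ish repetitive payload " * (1 << 16)

start = time.perf_counter()
compressed = zlib.compress(payload, 6)
elapsed = time.perf_counter() - start

mb = len(payload) / (1024 * 1024)
print(f"compressed {mb:.1f} MB in {elapsed:.3f}s -> {mb / elapsed:.1f} MB/s")
```

If the measured compression throughput is below the network's line rate, compression becomes the bottleneck for a sequential, single-stream recovery, which is the effect the 20-30 MB/s figure describes.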
@ywelsch thanks for pointing out the issue with the data I worked from :) This makes the situation a little closer, I guess :)
We discussed this during FixitFriday today. While re-compression on top of Lucene can bring further space savings (needs more benchmarks, but numbers between 10% and 50% reduction in size were mentioned), it comes at the cost of additional CPU usage, which 1) can have a negative impact on indexing/search performance, as recovery is supposed to be a background task, and 2) due to the way we sequentially send files during recovery, can limit the throughput of single-shard recoveries. While we're less enthusiastic about adding a new configuration option just for recoveries, we want to explore always disabling compression for phase 1 of recoveries, even if …
@ywelsch I looked into 2.4 and it seems the same issue this PR points out is present there as well: the global setting overrides the per-request option exactly as it still does, see https://github.com/elastic/elasticsearch/blob/2.4/core/src/main/java/org/elasticsearch/transport/netty/NettyTransport.java#L832. So while we did have the option (removed in #15235) to turn compression off for recovery, it had no effect in practice with compression enabled, as far as I can tell. => Seems like this issue isn't new at all. That said, in the real world I'd still vote for turning off compression when shipping the segments, even with compression globally enabled:
=> IMO it's consistent to turn off compression for data that gets little benefit out of it in this special case.
Also: regardless of what we decide about how we want to transfer the segment data, shouldn't we fix this so that the options actually override the global setting (or remove the option)?
* The individual setting can only negate the global compress setting, since `org.elasticsearch.transport.TcpTransport#canCompress` ensures that compression only ever happens if global compression is enabled, regardless of the `TransportRequestOptions`
* Disables compression of segment files during recovery, to bring code and comment in line with each other
* Fixes elastic#33844
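A loose sketch of the precedence bug and the proposed fix (this is not the Elasticsearch code; the tri-state enum and function names are invented for illustration):

```python
# Model the interaction between the global transport.tcp.compress flag
# and a per-request compression option.
from enum import Enum

class RequestCompression(Enum):
    UNSET = 0  # request did not specify; fall back to the global setting
    TRUE = 1
    FALSE = 2

def can_compress_current(global_compress: bool, request: RequestCompression) -> bool:
    # Behavior described in the issue: the request-level option is
    # effectively discarded; the global setting alone decides.
    return global_compress

def can_compress_proposed(global_compress: bool, request: RequestCompression) -> bool:
    # Proposed behavior: an explicitly set request option takes precedence,
    # and only an unset option falls back to the global setting.
    if request is RequestCompression.TRUE:
        return True
    if request is RequestCompression.FALSE:
        return False
    return global_compress
```

The key case is a request that explicitly sets compression off while the global flag is on: the current logic still compresses, while the proposed logic respects the request, which is exactly what recovery's `withCompress(false)` needs.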
Created a PR with the above suggestion: #34959
I ran a compression test with different datasets and chunk sizes. These indices use the default codec without force merge. Below is the result.
The details can be found here. /cc @ywelsch
@dnhatn thanks for running this test. A few observations:
Another observation is that the compression ratio varies a lot depending on the file extension. Out of curiosity, would it be possible to figure out whether to enable compression dynamically, e.g. by starting out compressing half the chunks and then decreasing or increasing this ratio dynamically depending on whichever performs faster?
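The dynamic idea above could be sketched as a small feedback controller (entirely hypothetical: the class name, EWMA smoothing, and step size are invented; only the half-and-half starting point comes from the comment):

```python
# Compress a fraction of chunks, track effective throughput per mode,
# and nudge the fraction toward whichever mode is currently faster.
import random

class AdaptiveCompression:
    def __init__(self, fraction=0.5, step=0.05):
        self.fraction = fraction              # share of chunks to compress
        self.step = step
        self.rates = {True: 1.0, False: 1.0}  # EWMA of MB/s per mode

    def should_compress(self) -> bool:
        # Randomize so both modes keep getting fresh measurements.
        return random.random() < self.fraction

    def record(self, compressed: bool, mb: float, seconds: float) -> None:
        rate = mb / seconds
        self.rates[compressed] = 0.8 * self.rates[compressed] + 0.2 * rate
        # Move toward the faster mode, clamped to [0, 1].
        if self.rates[True] > self.rates[False]:
            self.fraction = min(1.0, self.fraction + self.step)
        else:
            self.fraction = max(0.0, self.fraction - self.step)
```

On a fast LAN the controller would drift toward sending plain chunks; on a slow or metered link where compression wins, it would drift the other way, without any static configuration.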
Today file-chunks are sent sequentially one by one in peer-recovery. This is a correct choice since the implementation is straightforward and recovery is network-bound most of the time. However, if the connection is encrypted, we might not be able to saturate the network pipe because encrypting/decrypting is CPU-bound rather than network-bound. With this commit, a source node can send multiple file-chunks (default to 2) without waiting for the acknowledgments from the target. Below are the benchmark results for PMC and NYC_taxis.

- PMC (20.2 GB)

| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
| --------- | -------- | -------- | -------- | -------- | -------- |
| Plain     | 184s     | 137s     | 106s     | 105s     | 106s     |
| TLS       | 346s     | 294s     | 176s     | 153s     | 117s     |
| Compress  | 1556s    | 1407s    | 1193s    | 1183s    | 1211s    |

- NYC_Taxis (38.6 GB)

| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
| --------- | -------- | -------- | -------- | -------- | -------- |
| Plain     | 321s     | 249s     | 191s     | *        | *        |
| TLS       | 618s     | 539s     | 323s     | 290s     | 213s     |
| Compress  | 2622s    | 2421s    | 2018s    | 2029s    | n/a      |

Relates #33844
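The windowed sending that commit message describes can be modeled with a minimal, hypothetical sketch (`send_and_wait_ack` and the semaphore-based window are invented stand-ins for the real transport layer, not Elasticsearch code):

```python
# Keep up to `window` file-chunks in flight instead of waiting for each
# acknowledgment before sending the next chunk.
import threading
from concurrent.futures import ThreadPoolExecutor

def send_file_chunks(chunks, send_and_wait_ack, window=2):
    """Send chunks in submission order, allowing `window` unacked chunks."""
    slots = threading.Semaphore(window)
    with ThreadPoolExecutor(max_workers=window) as pool:
        futures = []
        for chunk in chunks:
            slots.acquire()  # block while the window is full

            def task(c=chunk):
                try:
                    return send_and_wait_ack(c)
                finally:
                    slots.release()  # free a slot once this chunk is acked

            futures.append(pool.submit(task))
        return [f.result() for f in futures]
```

With `window=1` this degenerates to today's strictly sequential behavior; with `window=2` or more, the per-chunk round-trip and CPU cost (TLS, compression) overlap with the next chunk's transfer, which is where the table's TLS improvements come from.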
@dnhatn What was the network bandwidth between the nodes in your benchmark? Or was it …
@jasontedor It's a 12GiB connection between two GCP instances. |
Have you done benchmarks on slower networks or networks with high latency?
@jasontedor Not yet. The above result is a pure compression test. |
I think an option to disable … Some metrics: we index about 500K docs per second, 24/7, and rely on … Our use case would benefit greatly from separate options for compression during online/offline replication and recovery.
I'm closing this issue since there's been quite a bit of related work to optimize recoveries since it was opened.
Shipping Lucene segments during peer recovery is configured to set the transport compress option to false:

elasticsearch/server/src/main/java/org/elasticsearch/indices/recovery/RemoteRecoveryTargetHandler.java
Lines 70 to 71 in 7f473b6

Unfortunately, this is overridden when `transport.tcp.compress` is enabled, discarding the request-level option:

elasticsearch/server/src/main/java/org/elasticsearch/transport/TcpTransport.java
Lines 873 to 875 in 7f473b6

This means that when `transport.tcp.compress` is enabled, the (already compressed) Lucene files are unnecessarily compressed a second time, slowing peer recovery.

My preference would be that transport options that were explicitly set (`withCompress(true)` or `withCompress(false)`) take precedence over `transport.tcp.compress`.

Relates to https://discuss.elastic.co/t/elasticsearch-6-3-0-shard-recovery-is-slow/140940/