
Make Ccr recovery file chunk size configurable #38370

Merged
merged 6 commits into elastic:master on Feb 5, 2019

Conversation
@tbrooks8 (Contributor) commented on Feb 4, 2019

This commit adds a byte setting `ccr.indices.recovery.chunk_size`. This
setting configures the size of the file chunks requested while recovering
from a remote cluster.
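For illustration, a minimal sketch of how a byte-size setting like this is typically registered with Elasticsearch's `Setting` API. The class name, default value, and properties here are assumptions, not necessarily what this PR does:

```java
// Hypothetical sketch: class name, default, and properties are illustrative.
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Setting.Property;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;

public final class CcrRecoverySettingsSketch {

    // Byte-size setting controlling how large each file chunk requested
    // from the remote cluster is during recovery.
    public static final Setting<ByteSizeValue> RECOVERY_CHUNK_SIZE =
        Setting.byteSizeSetting(
            "ccr.indices.recovery.chunk_size",
            new ByteSizeValue(64, ByteSizeUnit.KB), // assumed default
            Property.Dynamic,
            Property.NodeScope);

    private CcrRecoverySettingsSketch() {}
}
```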

tbrooks8 added some commits Feb 4, 2019

WIP
@elasticmachine (Collaborator) commented on Feb 4, 2019 (comment minimized)

@tbrooks8 (Contributor, Author) commented on Feb 4, 2019

I don't know if our plan is to actually merge this, but I pushed it up here. It removes the awaits-fix annotation from a test that only failed because a different test was failing (I use that test here). If we do not actually merge this PR, I will make that change elsewhere.

@dliappis - You can use this branch for benchmarking. The setting is `ccr.indices.recovery.chunk_size`. Theoretically you should see increases in performance as you increase that setting. I sanity-checked the setting by running in debug mode and ensuring that it is used when set.
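For reference, a hypothetical sketch of how the follower's chunk size could be bumped from Java via the high-level REST client. The helper class and the choice of persistent (vs. transient) settings are assumptions; the benchmarks below perform the equivalent cluster-settings update through Rally:

```java
// Hypothetical helper (ChunkSizeUpdater is not from this PR): bumps the
// chunk size on the follower cluster before starting recovery from remote.
import java.io.IOException;

import org.elasticsearch.action.admin.cluster.settings.ClusterUpdateSettingsRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;

public class ChunkSizeUpdater {

    // e.g. setChunkSize(followerClient, "1mb")
    static void setChunkSize(RestHighLevelClient followerClient, String size) throws IOException {
        ClusterUpdateSettingsRequest request = new ClusterUpdateSettingsRequest()
            .persistentSettings(Settings.builder()
                .put("ccr.indices.recovery.chunk_size", size)
                .build());
        followerClient.cluster().putSettings(request, RequestOptions.DEFAULT);
    }
}
```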

@tbrooks8 (Contributor, Author) commented on Feb 4, 2019

If we do want to add this setting to production, I will probably need to add a maximum size (since I do the `Math.toIntExact` call).
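A sketch of the concern (`ChunkSizeBounds` is a hypothetical class, and the bounds are illustrative): `Math.toIntExact` throws `ArithmeticException` once the long byte count exceeds `Integer.MAX_VALUE` (~2GB), so an unbounded setting can fail at request time instead of at settings-validation time.

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Setting.Property;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;

public final class ChunkSizeBounds {

    // The file-chunk request wants an int; this throws for values > ~2GB.
    static int chunkSizeBytes(ByteSizeValue chunkSize) {
        return Math.toIntExact(chunkSize.getBytes());
    }

    // A min/max-bounded setting rejects oversized values up front instead.
    static final Setting<ByteSizeValue> BOUNDED_CHUNK_SIZE =
        Setting.byteSizeSetting(
            "ccr.indices.recovery.chunk_size",
            new ByteSizeValue(64, ByteSizeUnit.KB), // default (illustrative)
            new ByteSizeValue(1, ByteSizeUnit.KB),  // min (illustrative)
            new ByteSizeValue(1, ByteSizeUnit.GB),  // max (illustrative)
            Property.Dynamic, Property.NodeScope);

    private ChunkSizeBounds() {}
}
```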

tbrooks8 added some commits Feb 4, 2019

@dliappis (Contributor) commented on Feb 5, 2019

Using the geopoint track and a 1MB chunk size, the time for the entire recovery-from-remote task (this includes the recovery from remote itself, plus CCR fetching whatever remaining operations were above the global checkpoint) went down to:

0:14:04.368000, i.e. ~14 minutes.

Earlier, with the default 64KB, the same operation took 3:59:44.900000 (approx. 4 hours).

The latency of the connection is ~100-130ms.
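As a back-of-the-envelope check (my own arithmetic, assuming roughly one chunk in flight per recovery stream and an RTT of ~120ms), per-stream throughput is bounded by chunk size over round-trip time:

$$\text{throughput per stream} \approx \frac{\text{chunk size}}{\text{RTT}}, \qquad \frac{64\ \text{KB}}{0.12\ \text{s}} \approx 0.53\ \text{MB/s}, \qquad \frac{1\ \text{MB}}{0.12\ \text{s}} \approx 8.3\ \text{MB/s}$$

That ~16x per-stream ratio is consistent with the observed ~17x speedup (about 4 hours down to about 14 minutes).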

The doc counts and store sizes can be seen below:

green open leader tzH5IiTRQEKEGOQTRkuIiQ 3 1 60844404 0 8.3gb 4.9gb

green open follower y4trGB0kTNSeum5h109XxA 3 1 60844404 0 6.8gb 3.4gb

Network stats on follower

[network traffic graphs for Node 0, Node 1, and Node 2 omitted]

Rally summary

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/


[INFO] Racing on track [geopoint], challenge [autogenerated-ids-then-recovery-from-remote] and car ['external'] with version [7.0.0-SNAPSHOT].

Running setup-delete-remote-follower-index                                     [100% done]
Running setup-delete-leader-index                                              [100% done]
Running check-cluster-health                                                   [100% done]
Running create-leader-indices                                                  [100% done]
Running wait-green                                                             [100% done]
Running bulk-leader-index-autogenerated-ids                                    [100% done]
Running refresh-after-bulk                                                     [100% done]
Running flush-all-indices                                                      [100% done]
Running join-ccr-clusters                                                      [100% done]
Running disable-recovery-rate-limiting-on-leader                               [100% done]
Running disable-recovery-rate-limiting-on-follower                             [100% done]
Running set-recovery-file-chunk-size-to-1MB-on-follower                        [100% done]
Running start-following-from-leader                                            [100% done]
Running wait-for-followers-to-sync                                             [100% done]
[INFO] Keeping benchmark candidate including index at (may need several GB).

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|   Lap |                                                         Metric |                                Task |      Value |   Unit |
|------:|---------------------------------------------------------------:|------------------------------------:|-----------:|-------:|
|   All |                     Cumulative indexing time of primary shards |                                     |    11.5354 |    min |
|   All |             Min cumulative indexing time across primary shards |                                     |     3.6826 |    min |
|   All |          Median cumulative indexing time across primary shards |                                     |    3.88147 |    min |
|   All |             Max cumulative indexing time across primary shards |                                     |    3.97132 |    min |
|   All |            Cumulative indexing throttle time of primary shards |                                     |          0 |    min |
|   All |    Min cumulative indexing throttle time across primary shards |                                     |          0 |    min |
|   All | Median cumulative indexing throttle time across primary shards |                                     |          0 |    min |
|   All |    Max cumulative indexing throttle time across primary shards |                                     |          0 |    min |
|   All |                        Cumulative merge time of primary shards |                                     |    10.2602 |    min |
|   All |                       Cumulative merge count of primary shards |                                     |        196 |        |
|   All |                Min cumulative merge time across primary shards |                                     |    2.26245 |    min |
|   All |             Median cumulative merge time across primary shards |                                     |    3.85265 |    min |
|   All |                Max cumulative merge time across primary shards |                                     |     4.1451 |    min |
|   All |               Cumulative merge throttle time of primary shards |                                     |     4.7776 |    min |
|   All |       Min cumulative merge throttle time across primary shards |                                     |   0.789417 |    min |
|   All |    Median cumulative merge throttle time across primary shards |                                     |    1.87313 |    min |
|   All |       Max cumulative merge throttle time across primary shards |                                     |    2.11505 |    min |
|   All |                      Cumulative refresh time of primary shards |                                     |   0.633883 |    min |
|   All |                     Cumulative refresh count of primary shards |                                     |        364 |        |
|   All |              Min cumulative refresh time across primary shards |                                     |   0.206733 |    min |
|   All |           Median cumulative refresh time across primary shards |                                     |   0.210983 |    min |
|   All |              Max cumulative refresh time across primary shards |                                     |   0.216167 |    min |
|   All |                        Cumulative flush time of primary shards |                                     |       0.32 |    min |
|   All |                       Cumulative flush count of primary shards |                                     |         18 |        |
|   All |                Min cumulative flush time across primary shards |                                     |     0.0934 |    min |
|   All |             Median cumulative flush time across primary shards |                                     |   0.101533 |    min |
|   All |                Max cumulative flush time across primary shards |                                     |   0.125067 |    min |
|   All |                                             Total Young Gen GC |                                     |     28.026 |      s |
|   All |                                               Total Old Gen GC |                                     |          0 |      s |
|   All |                                                     Store size |                                     |    8.38973 |     GB |
|   All |                                                  Translog size |                                     |    3.18834 |     GB |
|   All |                                         Heap used for segments |                                     |    31.3554 |     MB |
|   All |                                       Heap used for doc values |                                     | 0.00350189 |     MB |
|   All |                                            Heap used for terms |                                     |    29.2354 |     MB |
|   All |                                            Heap used for norms |                                     |          0 |     MB |
|   All |                                           Heap used for points |                                     |   0.810921 |     MB |
|   All |                                    Heap used for stored fields |                                     |    1.30553 |     MB |
|   All |                                                  Segment count |                                     |         54 |        |
|   All |                                                 Min Throughput | bulk-leader-index-autogenerated-ids |     265589 | docs/s |
|   All |                                              Median Throughput | bulk-leader-index-autogenerated-ids |     269994 | docs/s |
|   All |                                                 Max Throughput | bulk-leader-index-autogenerated-ids |     273376 | docs/s |
|   All |                                        50th percentile latency | bulk-leader-index-autogenerated-ids |     131.36 |     ms |
|   All |                                        90th percentile latency | bulk-leader-index-autogenerated-ids |    182.432 |     ms |
|   All |                                        99th percentile latency | bulk-leader-index-autogenerated-ids |    249.803 |     ms |
|   All |                                      99.9th percentile latency | bulk-leader-index-autogenerated-ids |     457.52 |     ms |
|   All |                                       100th percentile latency | bulk-leader-index-autogenerated-ids |    567.282 |     ms |
|   All |                                   50th percentile service time | bulk-leader-index-autogenerated-ids |     131.36 |     ms |
|   All |                                   90th percentile service time | bulk-leader-index-autogenerated-ids |    182.432 |     ms |
|   All |                                   99th percentile service time | bulk-leader-index-autogenerated-ids |    249.803 |     ms |
|   All |                                 99.9th percentile service time | bulk-leader-index-autogenerated-ids |     457.52 |     ms |
|   All |                                  100th percentile service time | bulk-leader-index-autogenerated-ids |    567.282 |     ms |
|   All |                                                     error rate | bulk-leader-index-autogenerated-ids |          0 |      % |
|   All |                                                 Min Throughput |                   flush-all-indices |       1.94 |  ops/s |
|   All |                                              Median Throughput |                   flush-all-indices |       1.94 |  ops/s |
|   All |                                                 Max Throughput |                   flush-all-indices |       1.94 |  ops/s |
|   All |                                       100th percentile latency |                   flush-all-indices |    515.485 |     ms |
|   All |                                  100th percentile service time |                   flush-all-indices |    515.485 |     ms |
|   All |                                                     error rate |                   flush-all-indices |          0 |      % |
|   All |                                                 Min Throughput |          wait-for-followers-to-sync |          0 |  ops/s |
|   All |                                              Median Throughput |          wait-for-followers-to-sync |          0 |  ops/s |
|   All |                                                 Max Throughput |          wait-for-followers-to-sync |          0 |  ops/s |
|   All |                                       100th percentile latency |          wait-for-followers-to-sync |     844368 |     ms |
|   All |                                  100th percentile service time |          wait-for-followers-to-sync |     844368 |     ms |
|   All |                                                     error rate |          wait-for-followers-to-sync |          0 |      % |


----------------------------------
[INFO] SUCCESS (took 1167 seconds)
----------------------------------
@dliappis (Contributor) commented on Feb 5, 2019

One clarification: the small peak in the network graphs towards the end is due to peer recovery (within the follower cluster) and is not related to recovery from remote or CCR.

@dliappis (Contributor) commented on Feb 5, 2019

Using a 5MB chunk size

An additional experiment with a 5MB chunk size (same track, same settings, except for using 0 replicas this time to avoid the network spikes at the end of the benchmark).

  • Time taken for recovery from remote: 0:04:47.406000 (down from ~14 minutes when using the 1MB chunk). Each node was hovering between 5.6-6.7MB/s of network traffic; see the back-of-the-envelope note below.
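Applying the same per-stream arithmetic as above (again an assumption, with RTT ~0.12s):

$$\frac{5\ \text{MB}}{0.12\ \text{s}} \approx 41.7\ \text{MB/s per stream}$$

This is well above the observed 5.6-6.7MB/s per node, which suggests that at this chunk size something other than round-trip latency (e.g. bandwidth or disk) is becoming the limiting factor.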
Rally summary
Auto-updating Rally from origin

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/


[INFO] Racing on track [geopoint], challenge [autogenerated-ids-then-recovery-from-remote] and car ['external'] with version [7.0.0-SNAPSHOT].

Running setup-delete-remote-follower-index                                     [100% done]
Running setup-delete-leader-index                                              [100% done]
Running check-cluster-health                                                   [100% done]
Running create-leader-indices                                                  [100% done]
Running wait-green                                                             [100% done]
Running bulk-leader-index-autogenerated-ids                                    [100% done]
Running refresh-after-bulk                                                     [100% done]
Running flush-all-indices                                                      [100% done]
Running join-ccr-clusters                                                      [100% done]
Running disable-recovery-rate-limiting-on-leader                               [100% done]
Running disable-recovery-rate-limiting-on-follower                             [100% done]
Running set-recovery-file-chunk-size-on-follower                               [100% done]
Running start-following-from-leader                                            [100% done]
Running wait-for-followers-to-sync                                             [100% done]
[INFO] Keeping benchmark candidate including index at (may need several GB).

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|   Lap |                                                         Metric |                                Task |      Value |   Unit |
|------:|---------------------------------------------------------------:|------------------------------------:|-----------:|-------:|
|   All |                     Cumulative indexing time of primary shards |                                     |     11.493 |    min |
|   All |             Min cumulative indexing time across primary shards |                                     |    3.64602 |    min |
|   All |          Median cumulative indexing time across primary shards |                                     |    3.89948 |    min |
|   All |             Max cumulative indexing time across primary shards |                                     |    3.94748 |    min |
|   All |            Cumulative indexing throttle time of primary shards |                                     |          0 |    min |
|   All |    Min cumulative indexing throttle time across primary shards |                                     |          0 |    min |
|   All | Median cumulative indexing throttle time across primary shards |                                     |          0 |    min |
|   All |    Max cumulative indexing throttle time across primary shards |                                     |          0 |    min |
|   All |                        Cumulative merge time of primary shards |                                     |    3.88052 |    min |
|   All |                       Cumulative merge count of primary shards |                                     |         41 |        |
|   All |                Min cumulative merge time across primary shards |                                     |    1.24525 |    min |
|   All |             Median cumulative merge time across primary shards |                                     |    1.28603 |    min |
|   All |                Max cumulative merge time across primary shards |                                     |    1.34923 |    min |
|   All |               Cumulative merge throttle time of primary shards |                                     |      1.408 |    min |
|   All |       Min cumulative merge throttle time across primary shards |                                     |     0.4442 |    min |
|   All |    Median cumulative merge throttle time across primary shards |                                     |     0.4598 |    min |
|   All |       Max cumulative merge throttle time across primary shards |                                     |      0.504 |    min |
|   All |                      Cumulative refresh time of primary shards |                                     |   0.187767 |    min |
|   All |                     Cumulative refresh count of primary shards |                                     |         84 |        |
|   All |              Min cumulative refresh time across primary shards |                                     |  0.0553833 |    min |
|   All |           Median cumulative refresh time across primary shards |                                     |  0.0585667 |    min |
|   All |              Max cumulative refresh time across primary shards |                                     |  0.0738167 |    min |
|   All |                        Cumulative flush time of primary shards |                                     |   0.509167 |    min |
|   All |                       Cumulative flush count of primary shards |                                     |         18 |        |
|   All |                Min cumulative flush time across primary shards |                                     |     0.1652 |    min |
|   All |             Median cumulative flush time across primary shards |                                     |   0.167733 |    min |
|   All |                Max cumulative flush time across primary shards |                                     |   0.176233 |    min |
|   All |                                             Total Young Gen GC |                                     |     14.692 |      s |
|   All |                                               Total Old Gen GC |                                     |          0 |      s |
|   All |                                                     Store size |                                     |    4.12151 |     GB |
|   All |                                                  Translog size |                                     |     1.5017 |     GB |
|   All |                                         Heap used for segments |                                     |    27.0411 |     MB |
|   All |                                       Heap used for doc values |                                     | 0.00583649 |     MB |
|   All |                                            Heap used for terms |                                     |    25.1515 |     MB |
|   All |                                            Heap used for norms |                                     |          0 |     MB |
|   All |                                           Heap used for points |                                     |   0.784356 |     MB |
|   All |                                    Heap used for stored fields |                                     |    1.09942 |     MB |
|   All |                                                  Segment count |                                     |         62 |        |
|   All |                                                 Min Throughput | bulk-leader-index-autogenerated-ids |     362048 | docs/s |
|   All |                                              Median Throughput | bulk-leader-index-autogenerated-ids |     378572 | docs/s |
|   All |                                                 Max Throughput | bulk-leader-index-autogenerated-ids |     381745 | docs/s |
|   All |                                        50th percentile latency | bulk-leader-index-autogenerated-ids |     83.327 |     ms |
|   All |                                        90th percentile latency | bulk-leader-index-autogenerated-ids |    106.878 |     ms |
|   All |                                        99th percentile latency | bulk-leader-index-autogenerated-ids |    164.788 |     ms |
|   All |                                      99.9th percentile latency | bulk-leader-index-autogenerated-ids |    2031.68 |     ms |
|   All |                                       100th percentile latency | bulk-leader-index-autogenerated-ids |    2324.91 |     ms |
|   All |                                   50th percentile service time | bulk-leader-index-autogenerated-ids |     83.327 |     ms |
|   All |                                   90th percentile service time | bulk-leader-index-autogenerated-ids |    106.878 |     ms |
|   All |                                   99th percentile service time | bulk-leader-index-autogenerated-ids |    164.788 |     ms |
|   All |                                 99.9th percentile service time | bulk-leader-index-autogenerated-ids |    2031.68 |     ms |
|   All |                                  100th percentile service time | bulk-leader-index-autogenerated-ids |    2324.91 |     ms |
|   All |                                                     error rate | bulk-leader-index-autogenerated-ids |          0 |      % |
|   All |                                                 Min Throughput |                   flush-all-indices |       3.33 |  ops/s |
|   All |                                              Median Throughput |                   flush-all-indices |       3.33 |  ops/s |
|   All |                                                 Max Throughput |                   flush-all-indices |       3.33 |  ops/s |
|   All |                                       100th percentile latency |                   flush-all-indices |     300.37 |     ms |
|   All |                                  100th percentile service time |                   flush-all-indices |     300.37 |     ms |
|   All |                                                     error rate |                   flush-all-indices |          0 |      % |
|   All |                                                 Min Throughput |          wait-for-followers-to-sync |          0 |  ops/s |
|   All |                                              Median Throughput |          wait-for-followers-to-sync |          0 |  ops/s |
|   All |                                                 Max Throughput |          wait-for-followers-to-sync |          0 |  ops/s |
|   All |                                       100th percentile latency |          wait-for-followers-to-sync |     287406 |     ms |
|   All |                                  100th percentile service time |          wait-for-followers-to-sync |     287406 |     ms |
|   All |                                                     error rate |          wait-for-followers-to-sync |          0 |      % |


---------------------------------
[INFO] SUCCESS (took 546 seconds)
---------------------------------

Network stats on the follower

[network traffic graphs for Node 0, Node 1, and Node 2 omitted]

@dliappis (Contributor) left a comment

LGTM (also see the benchmarking results).

tbrooks8 referenced this pull request on Feb 5, 2019:

Implement CCR bootstrap from remote #35975 (open, 28 of 33 tasks complete)
@tbrooks8 (Contributor, Author) commented on Feb 5, 2019

@elasticmachine run elasticsearch-ci/default-distro

ywelsch approved these changes on Feb 5, 2019

tbrooks8 merged commit 4a15e2b into elastic:master on Feb 5, 2019

7 checks passed:

  CLA: Commit author is a member of Elasticsearch
  elasticsearch-ci/1: Build finished.
  elasticsearch-ci/2: Build finished.
  elasticsearch-ci/default-distro: Build finished.
  elasticsearch-ci/docbldesx: Build finished.
  elasticsearch-ci/oss-distro-docs: Build finished.
  elasticsearch-ci/packaging-sample: Build finished.

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 5, 2019

Merge branch 'master' into retention-leases-recovery
* master: (23 commits)
  Lift retention lease expiration to index shard (elastic#38380)
  Make Ccr recovery file chunk size configurable (elastic#38370)
  Prevent CCR recovery from missing documents (elastic#38237)
  re-enables awaitsfixed datemath tests (elastic#38376)
  Types removal fix FullClusterRestartIT warnings (elastic#38445)
  Make sure to reject mappings with type _doc when include_type_name is false. (elastic#38270)
  Updates the grok patterns to be consistent with logstash (elastic#27181)
  Ignore type-removal warnings in XPackRestTestHelper (elastic#38431)
  testHlrcFromXContent() should respect assertToXContentEquivalence() (elastic#38232)
  add basic REST test for geohash_grid (elastic#37996)
  Remove DiscoveryPlugin#getDiscoveryTypes (elastic#38414)
  Fix the clock resolution to millis in GetWatchResponseTests (elastic#38405)
  Throw AssertionError when no master (elastic#38432)
  `if_seq_no` and `if_primary_term` parameters aren't wired correctly in REST Client's CRUD API (elastic#38411)
  Enable CronEvalToolTest.testEnsureDateIsShownInRootLocale (elastic#38394)
  Fix failures in BulkProcessorIT#testGlobalParametersAndBulkProcessor. (elastic#38129)
  SQL: Implement CURRENT_DATE (elastic#38175)
  Mute testReadRequestsReturnLatestMappingVersion (elastic#38438)
  [ML] Report index unavailable instead of waiting for lazy node (elastic#38423)
  Update Rollup Caps to allow unknown fields (elastic#38339)
  ...

colings86 added the v7.0.0-beta1 label and removed the v7.0.0 label on Feb 7, 2019

tbrooks8 added a commit to tbrooks8/elasticsearch that referenced this pull request Feb 8, 2019

Make Ccr recovery file chunk size configurable (elastic#38370)
This commit adds a byte setting `ccr.indices.recovery.chunk_size`. This
setting configures the size of the file chunks requested while recovering
from a remote cluster.

dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Feb 12, 2019

Make Ccr recovery file chunk size configurable (elastic#38370)
This commit adds a byte setting `ccr.indices.recovery.chunk_size`. This
setting configures the size of the file chunks requested while recovering
from a remote cluster.