Prepare for removal of the CMS garbage collector #46973

danielmitterdorfer · 2019-09-23T12:02:34Z

Context

According to JEP 363 (originally draft JEP 8229049), the CMS garbage collector will be removed from OpenJDK (discussion of the patch and commit). The corresponding discussion on the OpenJDK mailing list mentions JDK 14 as the target release. There is currently no release date for JDK 14, but based on the past release cadence we should expect it in late March 2020. As Elasticsearch currently recommends the CMS garbage collector, we should be prepared for its removal. I'm raising this as a meta-issue to collect our thoughts on what is missing to run Elasticsearch out of the box with a garbage collector other than CMS.

Prior work

We have already investigated G1GC as an alternative garbage collector for Elasticsearch in #33685 and have recently adjusted the out-of-the-box settings in #46169 to improve the effectiveness of Elasticsearch's real-memory circuit breaker with G1GC.

Tasks

The following list may be incomplete, please add new tasks as needed:

Ensure we don't use CMS for JDK 14 anymore: Restrict support for CMS to pre-JDK 14 (#49123).
Investigate whether we always want to use G1 as the garbage collector (it might make sense to ergonomically choose parallel GC for smaller heaps); see the linked comment on Prepare for removal of the CMS garbage collector (#46973).
Update our default JVM configuration to not use CMS: Restrict support for CMS to pre-JDK 14 (#49123).
Update our reference documentation to remove any references to CMS (e.g. our bootstrap checks docs).
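
To make the first two tasks concrete, here is a minimal sketch of the selection logic they describe: keep CMS only before JDK 14, and optionally pick parallel GC for small heaps. The function name, the `prefer_parallel_for_small_heaps` switch and the 2 GiB cutoff are hypothetical and only for illustration; nothing in this issue settles on a threshold, and this is not how Elasticsearch itself is configured.

```python
from typing import List

# Hypothetical cutoff for a "small" heap; the issue does not propose a number.
SMALL_HEAP_BYTES = 2 * 1024**3


def choose_gc_flags(
    jdk_major: int,
    heap_bytes: int,
    prefer_parallel_for_small_heaps: bool = False,
) -> List[str]:
    """Return the JVM GC flag to use for a given JDK version and heap size."""
    # CMS is only available before JDK 14 (JEP 363 removes it).
    if jdk_major < 14:
        return ["-XX:+UseConcMarkSweepGC"]
    # Open question from the task list: parallel GC for smaller heaps.
    if prefer_parallel_for_small_heaps and heap_bytes < SMALL_HEAP_BYTES:
        return ["-XX:+UseParallelGC"]
    # Otherwise fall back to G1.
    return ["-XX:+UseG1GC"]


if __name__ == "__main__":
    one_gib = 1024**3
    print(choose_gc_flags(13, one_gib))
    print(choose_gc_flags(14, one_gib))
    print(choose_gc_flags(14, one_gib, prefer_parallel_for_small_heaps=True))
```

Keeping the small-heap rule behind a switch reflects that it is still under investigation rather than decided.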

elasticmachine · 2019-09-23T12:02:36Z

Pinging @elastic/es-core-infra

ebadyano · 2019-10-24T15:29:41Z

I ran a few experiments with JDK 13 and G1GC, using the nyc_taxis single-node benchmark in our original nightly benchmarks environment:

With a 4G heap, I previously triggered the circuit breaker when running with JDK 12 and G1GC enabled. With JDK 13 the benchmark runs to completion with both the default (1G) and 4G heaps, although the total accumulated GC pauses are slightly higher than with CMS. Indexing throughput regresses by ~5%.
With an 8G heap, I see an improvement in GC pauses, and interestingly indexing throughput seems to improve by ~8%, at least with nyc_taxis-1-node.
I tried running with -XX:+UseStringDeduplication + G1GC, and for some reason the circuit breaker is triggered with a 4G heap (I didn't try a 1G heap). With an 8G heap I see slightly higher accumulated GC pauses with -XX:+UseStringDeduplication, and indexing throughput regresses by ~6% compared to G1GC without -XX:+UseStringDeduplication.
I also tried http_logs with JDK 13 and an 8G heap: indexing throughput for G1GC is more or less on par with CMS, sometimes slightly higher, sometimes slightly lower.

ebadyano · 2019-11-18T23:24:21Z

Logs for the nyc_taxis benchmark with a 4g heap + G1GC on JDK 12 (fails intermittently with the circuit breaker; logs cover both failed and completed runs) and for a JDK 13 run:
https://drive.google.com/open?id=16clDwJohf50-KA1YFuLzrc7WooUwi9Zg

ebadyano · 2019-12-06T15:15:40Z

Tried -XX:+UseParallelOldGC with default and 1g heap with http_logs; it keeps triggering the circuit breaker. With indices.breaker.total.limit: 99% it takes a bit longer to trigger the circuit breaker, but it still gets there during indexing. With indices.breaker.total.limit: 99.9% it triggers the circuit breaker intermittently during queries.
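
For a sense of scale, the sketch below computes how much heap is left above the breaker limit for the two settings tried above. It assumes indices.breaker.total.limit is applied as a simple percentage of the maximum heap and that "1g" means 1 GiB; the helper names are made up for this illustration.

```python
def breaker_limit_bytes(heap_bytes: int, limit_percent: float) -> int:
    """Heap usage (in bytes) at which the total breaker would trip."""
    return int(heap_bytes * limit_percent / 100)


def headroom_bytes(heap_bytes: int, limit_percent: float) -> int:
    """Heap left between the breaker limit and the maximum heap size."""
    return heap_bytes - breaker_limit_bytes(heap_bytes, limit_percent)


if __name__ == "__main__":
    heap = 1024**3  # assumed meaning of the 1g heap used in the runs above
    for percent in (99.0, 99.9):
        mib = headroom_bytes(heap, percent) / 1024**2
        print(f"limit {percent}%: about {mib:.2f} MiB of headroom")
```

Under these assumptions, even the 99.9% setting leaves only about a mebibyte of slack, so raising the limit further mostly trades breaker trips for a higher risk of running out of memory.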

ebadyano · 2019-12-06T15:22:05Z

When Parallel GC does finish, indexing and queries seem to be faster with a 1g heap.
http_logs G1GC vs Parallel:

|                                                        Metric |                                        Task |   Baseline |   Contender |     Diff |    Unit |
|--------------------------------------------------------------:|--------------------------------------------:|-----------:|------------:|---------:|--------:|
|                    Cumulative indexing time of primary shards |                                             |    128.667 |     105.868 | -22.7986 |     min |
|             Min cumulative indexing time across primary shard |                                             |     0.4068 |      0.3474 |  -0.0594 |     min |
|          Median cumulative indexing time across primary shard |                                             |    1.26412 |      0.9528 | -0.31132 |     min |
|             Max cumulative indexing time across primary shard |                                             |     18.991 |     15.6193 | -3.37172 |     min |
|           Cumulative indexing throttle time of primary shards |                                             |          0 |           0 |        0 |     min |
|    Min cumulative indexing throttle time across primary shard |                                             |          0 |           0 |        0 |     min |
| Median cumulative indexing throttle time across primary shard |                                             |          0 |           0 |        0 |     min |
|    Max cumulative indexing throttle time across primary shard |                                             |          0 |           0 |        0 |     min |
|                       Cumulative merge time of primary shards |                                             |    111.338 |     105.281 | -6.05687 |     min |
|                      Cumulative merge count of primary shards |                                             |        526 |         523 |       -3 |         |
|                Min cumulative merge time across primary shard |                                             |  0.0459833 |   0.0406833 |  -0.0053 |     min |
|             Median cumulative merge time across primary shard |                                             |   0.279417 |      0.2484 | -0.03102 |     min |
|                Max cumulative merge time across primary shard |                                             |    22.1314 |     21.0968 | -1.03457 |     min |
|              Cumulative merge throttle time of primary shards |                                             |    68.1129 |     65.1518 | -2.96117 |     min |
|       Min cumulative merge throttle time across primary shard |                                             |          0 |           0 |        0 |     min |
|    Median cumulative merge throttle time across primary shard |                                             |          0 |           0 |        0 |     min |
|       Max cumulative merge throttle time across primary shard |                                             |     14.112 |     14.4817 |  0.36973 |     min |
|                     Cumulative refresh time of primary shards |                                             |    13.0514 |     13.4064 |  0.35507 |     min |
|                    Cumulative refresh count of primary shards |                                             |       2130 |        2293 |      163 |         |
|              Min cumulative refresh time across primary shard |                                             |    0.04335 |      0.0387 | -0.00465 |     min |
|           Median cumulative refresh time across primary shard |                                             |     0.1217 |    0.139367 |  0.01767 |     min |
|              Max cumulative refresh time across primary shard |                                             |    1.97307 |     1.96867 |  -0.0044 |     min |
|                       Cumulative flush time of primary shards |                                             |    4.22462 |       4.856 |  0.63138 |     min |
|                      Cumulative flush count of primary shards |                                             |        117 |         115 |       -2 |         |
|                Min cumulative flush time across primary shard |                                             |      0.001 |     0.00415 |  0.00315 |     min |
|             Median cumulative flush time across primary shard |                                             |  0.0108167 |   0.0139667 |  0.00315 |     min |
|                Max cumulative flush time across primary shard |                                             |   0.840933 |      0.9375 |  0.09657 |     min |
|                                            Total Young Gen GC |                                             |    105.258 |      88.501 |  -16.757 |       s |
|                                              Total Old Gen GC |                                             |          0 |      21.663 |   21.663 |       s |
|                                                    Store size |                                             |    19.1857 |     18.9982 | -0.18753 |      GB |
|                                                 Translog size |                                             | 1.7928e-06 |  1.7928e-06 |        0 |      GB |
|                                        Heap used for segments |                                             |    105.664 |     102.275 | -3.38937 |      MB |
|                                      Heap used for doc values |                                             | 0.00476456 |  0.00476456 |        0 |      MB |
|                                           Heap used for terms |                                             |    98.7018 |     95.3662 | -3.33561 |      MB |
|                                           Heap used for norms |                                             | 0.00213623 |  0.00213623 |        0 |      MB |
|                                          Heap used for points |                                             |          0 |           0 |        0 |      MB |
|                                   Heap used for stored fields |                                             |    6.95544 |     6.90169 | -0.05376 |      MB |
|                                                 Segment count |                                             |         35 |          35 |        0 |         |
|                                                Min Throughput |                                index-append |     150090 |      161244 |  11153.6 |  docs/s |
|                                             Median Throughput |                                index-append |     163942 |      170634 |  6692.13 |  docs/s |
|                                                Max Throughput |                                index-append |     195448 |      199597 |  4149.01 |  docs/s |
|                                       50th percentile latency |                                index-append |    234.193 |     208.972 | -25.2208 |      ms |
|                                       90th percentile latency |                                index-append |     359.03 |     339.721 | -19.3083 |      ms |
|                                       99th percentile latency |                                index-append |    1038.53 |     989.105 | -49.4283 |      ms |
|                                     99.9th percentile latency |                                index-append |    1858.92 |     1756.26 |  -102.66 |      ms |
|                                    99.99th percentile latency |                                index-append |    2853.51 |     2383.23 | -470.275 |      ms |
|                                      100th percentile latency |                                index-append |    2880.77 |     2675.52 | -205.252 |      ms |
|                                  50th percentile service time |                                index-append |    234.193 |     208.972 | -25.2208 |      ms |
|                                  90th percentile service time |                                index-append |     359.03 |     339.721 | -19.3083 |      ms |
|                                  99th percentile service time |                                index-append |    1038.53 |     989.105 | -49.4283 |      ms |
|                                99.9th percentile service time |                                index-append |    1858.92 |     1756.26 |  -102.66 |      ms |
|                               99.99th percentile service time |                                index-append |    2853.51 |     2383.23 | -470.275 |      ms |
|                                 100th percentile service time |                                index-append |    2880.77 |     2675.52 | -205.252 |      ms |
|                                                    error rate |                                index-append |          0 |           0 |        0 |       % |
|                                                Min Throughput |                                     default |    8.01299 |     8.01308 |    9e-05 |   ops/s |
|                                             Median Throughput |                                     default |    8.01403 |     8.01398 |   -6e-05 |   ops/s |
|                                                Max Throughput |                                     default |    8.01519 |     8.01524 |    5e-05 |   ops/s |
|                                       50th percentile latency |                                     default |    5.36415 |     5.00933 | -0.35483 |      ms |
|                                       90th percentile latency |                                     default |     5.6521 |     5.43838 | -0.21372 |      ms |
|                                       99th percentile latency |                                     default |    7.69529 |     5.78511 | -1.91017 |      ms |
|                                      100th percentile latency |                                     default |    8.21218 |     7.35293 | -0.85925 |      ms |
|                                  50th percentile service time |                                     default |    5.06884 |     4.70835 | -0.36049 |      ms |
|                                  90th percentile service time |                                     default |    5.35518 |     5.12218 | -0.23299 |      ms |
|                                  99th percentile service time |                                     default |    7.38002 |     5.47012 |  -1.9099 |      ms |
|                                 100th percentile service time |                                     default |    7.89408 |     7.03901 | -0.85508 |      ms |
|                                                    error rate |                                     default |          0 |           0 |        0 |       % |
|                                                Min Throughput |                                        term |    50.0332 |     50.0609 |  0.02773 |   ops/s |
|                                             Median Throughput |                                        term |    50.0368 |      50.061 |  0.02415 |   ops/s |
|                                                Max Throughput |                                        term |    50.0405 |      50.061 |  0.02057 |   ops/s |
|                                       50th percentile latency |                                        term |    12.0435 |     6.72208 | -5.32139 |      ms |
|                                       90th percentile latency |                                        term |    12.6216 |      7.9926 | -4.62904 |      ms |
|                                       99th percentile latency |                                        term |    16.6301 |     11.4386 | -5.19152 |      ms |
|                                      100th percentile latency |                                        term |    20.4803 |     11.9234 | -8.55687 |      ms |
|                                  50th percentile service time |                                        term |    11.8239 |     6.58407 |  -5.2398 |      ms |
|                                  90th percentile service time |                                        term |    12.3903 |     7.85069 |  -4.5396 |      ms |
|                                  99th percentile service time |                                        term |    16.4101 |     11.2991 | -5.11099 |      ms |
|                                 100th percentile service time |                                        term |    20.2541 |     11.7813 | -8.47277 |      ms |
|                                                    error rate |                                        term |          0 |           0 |        0 |       % |
|                                                Min Throughput |                                       range |    1.00494 |     1.00494 |    1e-05 |   ops/s |
|                                             Median Throughput |                                       range |    1.00658 |     1.00658 |    1e-05 |   ops/s |
|                                                Max Throughput |                                       range |    1.00983 |     1.00985 |    2e-05 |   ops/s |
|                                       50th percentile latency |                                       range |    16.4738 |     15.6901 |  -0.7837 |      ms |
|                                       90th percentile latency |                                       range |    17.1529 |      15.946 | -1.20687 |      ms |
|                                       99th percentile latency |                                       range |    19.1744 |     16.3289 | -2.84554 |      ms |
|                                      100th percentile latency |                                       range |    19.4498 |     19.1965 | -0.25334 |      ms |
|                                  50th percentile service time |                                       range |    15.2952 |     14.5162 |   -0.779 |      ms |
|                                  90th percentile service time |                                       range |    15.9749 |       14.76 | -1.21491 |      ms |
|                                  99th percentile service time |                                       range |    17.9938 |     15.1449 | -2.84887 |      ms |
|                                 100th percentile service time |                                       range |    18.2629 |     18.0331 | -0.22988 |      ms |
|                                                    error rate |                                       range |          0 |           0 |        0 |       % |
|                                                Min Throughput |                                  hourly_agg |   0.200472 |    0.200461 |   -1e-05 |   ops/s |
|                                             Median Throughput |                                  hourly_agg |   0.200628 |    0.200614 |   -1e-05 |   ops/s |
|                                                Max Throughput |                                  hourly_agg |   0.200941 |    0.200935 |   -1e-05 |   ops/s |
|                                       50th percentile latency |                                  hourly_agg |    2637.86 |     2671.83 |  33.9609 |      ms |
|                                       90th percentile latency |                                  hourly_agg |     2661.4 |     2707.89 |  46.4864 |      ms |
|                                       99th percentile latency |                                  hourly_agg |    2675.39 |     2725.14 |  49.7542 |      ms |
|                                      100th percentile latency |                                  hourly_agg |    2676.09 |     2730.02 |  53.9239 |      ms |
|                                  50th percentile service time |                                  hourly_agg |    2635.29 |     2669.32 |  34.0258 |      ms |
|                                  90th percentile service time |                                  hourly_agg |    2658.85 |     2705.36 |  46.5122 |      ms |
|                                  99th percentile service time |                                  hourly_agg |    2672.83 |     2722.64 |  49.8052 |      ms |
|                                 100th percentile service time |                                  hourly_agg |     2673.5 |     2727.54 |  54.0336 |      ms |
|                                                    error rate |                                  hourly_agg |          0 |           0 |        0 |       % |
|                                                Min Throughput |                                      scroll |    25.0149 |      25.014 | -0.00091 | pages/s |
|                                             Median Throughput |                                      scroll |    25.0315 |     25.0239 | -0.00756 | pages/s |
|                                                Max Throughput |                                      scroll |    25.1304 |     25.1191 | -0.01125 | pages/s |
|                                       50th percentile latency |                                      scroll |    777.278 |     810.028 |  32.7497 |      ms |
|                                       90th percentile latency |                                      scroll |    819.226 |     830.047 |  10.8214 |      ms |
|                                       99th percentile latency |                                      scroll |    829.322 |     844.551 |  15.2294 |      ms |
|                                      100th percentile latency |                                      scroll |    842.365 |     849.221 |  6.85621 |      ms |
|                                  50th percentile service time |                                      scroll |    776.894 |     809.656 |  32.7618 |      ms |
|                                  90th percentile service time |                                      scroll |    818.816 |     829.658 |   10.842 |      ms |
|                                  99th percentile service time |                                      scroll |    828.862 |     844.167 |  15.3056 |      ms |
|                                 100th percentile service time |                                      scroll |    841.959 |     848.848 |  6.88915 |      ms |
|                                                    error rate |                                      scroll |          0 |           0 |        0 |       % |
|                                                Min Throughput |                         desc_sort_timestamp |   0.501638 |    0.501645 |    1e-05 |   ops/s |
|                                             Median Throughput |                         desc_sort_timestamp |   0.501969 |    0.501972 |        0 |   ops/s |
|                                                Max Throughput |                         desc_sort_timestamp |   0.502457 |    0.502459 |        0 |   ops/s |
|                                       50th percentile latency |                         desc_sort_timestamp |    34.9101 |     32.3529 | -2.55717 |      ms |
|                                       90th percentile latency |                         desc_sort_timestamp |    36.8483 |     35.2537 | -1.59454 |      ms |
|                                       99th percentile latency |                         desc_sort_timestamp |    40.2969 |       37.66 | -2.63687 |      ms |
|                                      100th percentile latency |                         desc_sort_timestamp |    41.9122 |     38.7657 | -3.14649 |      ms |
|                                  50th percentile service time |                         desc_sort_timestamp |    32.7502 |     30.2327 | -2.51751 |      ms |
|                                  90th percentile service time |                         desc_sort_timestamp |    34.6973 |     33.0975 | -1.59983 |      ms |
|                                  99th percentile service time |                         desc_sort_timestamp |    38.1306 |     35.5072 | -2.62336 |      ms |
|                                 100th percentile service time |                         desc_sort_timestamp |    39.8577 |     38.3515 | -1.50618 |      ms |
|                                                    error rate |                         desc_sort_timestamp |          0 |           0 |        0 |       % |
|                                                Min Throughput |                          asc_sort_timestamp |    0.50162 |    0.501645 |    3e-05 |   ops/s |
|                                             Median Throughput |                          asc_sort_timestamp |   0.501942 |    0.501971 |    3e-05 |   ops/s |
|                                                Max Throughput |                          asc_sort_timestamp |    0.50241 |    0.502459 |    5e-05 |   ops/s |
|                                       50th percentile latency |                          asc_sort_timestamp |    62.5424 |     32.2299 | -30.3125 |      ms |
|                                       90th percentile latency |                          asc_sort_timestamp |    70.6123 |     34.3233 | -36.2889 |      ms |
|                                       99th percentile latency |                          asc_sort_timestamp |    73.8659 |     37.8268 | -36.0391 |      ms |
|                                      100th percentile latency |                          asc_sort_timestamp |    80.2303 |     39.4474 | -40.7829 |      ms |
|                                  50th percentile service time |                          asc_sort_timestamp |    60.4461 |     30.0738 | -30.3724 |      ms |
|                                  90th percentile service time |                          asc_sort_timestamp |    68.4823 |     32.1535 | -36.3288 |      ms |
|                                  99th percentile service time |                          asc_sort_timestamp |    71.7583 |     35.6573 | -36.1009 |      ms |
|                                 100th percentile service time |                          asc_sort_timestamp |    78.1952 |     37.2704 | -40.9248 |      ms |
|                                                    error rate |                          asc_sort_timestamp |          0 |           0 |        0 |       % |
|                                                Min Throughput | desc-sort-timestamp-after-force-merge-1-seg |   0.501474 |    0.501463 |   -1e-05 |   ops/s |
|                                             Median Throughput | desc-sort-timestamp-after-force-merge-1-seg |   0.501778 |    0.501825 |    5e-05 |   ops/s |
|                                                Max Throughput | desc-sort-timestamp-after-force-merge-1-seg |   0.502238 |    0.502286 |    5e-05 |   ops/s |
|                                       50th percentile latency | desc-sort-timestamp-after-force-merge-1-seg |    226.611 |     167.949 | -58.6622 |      ms |
|                                       90th percentile latency | desc-sort-timestamp-after-force-merge-1-seg |    236.714 |     174.961 | -61.7524 |      ms |
|                                       99th percentile latency | desc-sort-timestamp-after-force-merge-1-seg |    242.694 |     215.673 | -27.0209 |      ms |
|                                      100th percentile latency | desc-sort-timestamp-after-force-merge-1-seg |    244.922 |     267.275 |  22.3528 |      ms |
|                                  50th percentile service time | desc-sort-timestamp-after-force-merge-1-seg |    224.641 |     165.933 | -58.7081 |      ms |
|                                  90th percentile service time | desc-sort-timestamp-after-force-merge-1-seg |    234.722 |      172.93 | -61.7924 |      ms |
|                                  99th percentile service time | desc-sort-timestamp-after-force-merge-1-seg |    240.715 |     213.656 | -27.0583 |      ms |
|                                 100th percentile service time | desc-sort-timestamp-after-force-merge-1-seg |    242.962 |     265.244 |  22.2815 |      ms |
|                                                    error rate | desc-sort-timestamp-after-force-merge-1-seg |          0 |           0 |        0 |       % |
|                                                Min Throughput |  asc-sort-timestamp-after-force-merge-1-seg |   0.501561 |    0.501605 |    4e-05 |   ops/s |
|                                             Median Throughput |  asc-sort-timestamp-after-force-merge-1-seg |   0.501875 |    0.501926 |    5e-05 |   ops/s |
|                                                Max Throughput |  asc-sort-timestamp-after-force-merge-1-seg |   0.502342 |    0.502398 |    6e-05 |   ops/s |
|                                       50th percentile latency |  asc-sort-timestamp-after-force-merge-1-seg |    122.138 |     79.3329 | -42.8048 |      ms |
|                                       90th percentile latency |  asc-sort-timestamp-after-force-merge-1-seg |    129.641 |     83.0229 | -46.6184 |      ms |
|                                       99th percentile latency |  asc-sort-timestamp-after-force-merge-1-seg |     143.98 |     87.5347 | -56.4454 |      ms |
|                                      100th percentile latency |  asc-sort-timestamp-after-force-merge-1-seg |    145.287 |     87.9334 | -57.3537 |      ms |
|                                  50th percentile service time |  asc-sort-timestamp-after-force-merge-1-seg |    120.054 |     77.2313 | -42.8229 |      ms |
|                                  90th percentile service time |  asc-sort-timestamp-after-force-merge-1-seg |    127.579 |     80.8976 | -46.6814 |      ms |
|                                  99th percentile service time |  asc-sort-timestamp-after-force-merge-1-seg |    141.903 |      85.767 | -56.1359 |      ms |
|                                 100th percentile service time |  asc-sort-timestamp-after-force-merge-1-seg |    143.198 |     85.8129 |  -57.385 |      ms |
|                                                    error rate |  asc-sort-timestamp-after-force-merge-1-seg |          0 |           0 |        0 |       % |
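
The Diff column is absolute, so here is a small sketch that turns a few rows of the table above into relative changes and sums the GC time per run. It assumes Baseline is G1GC and Contender is Parallel GC, as the order in the caption suggests; the values are copied from the table and the helper name is only illustrative.

```python
def pct_change(baseline: float, contender: float) -> float:
    """Relative change of contender versus baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (contender - baseline) / baseline * 100.0


# (baseline, contender) pairs copied from the http_logs table.
rows = {
    "Median Throughput, index-append (docs/s)": (163942, 170634),
    "50th percentile latency, asc_sort_timestamp (ms)": (62.5424, 32.2299),
    "50th percentile latency, hourly_agg (ms)": (2637.86, 2671.83),
}

# Young + old generation GC time per run, in seconds.
baseline_gc_total = 105.258 + 0
contender_gc_total = 88.501 + 21.663

if __name__ == "__main__":
    for name, (baseline, contender) in rows.items():
        print(f"{name}: {pct_change(baseline, contender):+.1f}%")
    print(f"Total GC time: {baseline_gc_total:.3f} s vs {contender_gc_total:.3f} s")
```

Summing young and old generation time shows that the contender spends slightly more time in GC overall, even though its young generation time is lower.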

nyc_taxis 1G heap G1GC vs Parallel:

|                                                        Metric |                Task |    Baseline |   Contender |     Diff |   Unit |
|--------------------------------------------------------------:|--------------------:|------------:|------------:|---------:|-------:|
|                    Cumulative indexing time of primary shards |                     |     237.357 |     208.174 | -29.1829 |    min |
|             Min cumulative indexing time across primary shard |                     |     237.357 |     208.174 | -29.1829 |    min |
|          Median cumulative indexing time across primary shard |                     |     237.357 |     208.174 | -29.1829 |    min |
|             Max cumulative indexing time across primary shard |                     |     237.357 |     208.174 | -29.1829 |    min |
|           Cumulative indexing throttle time of primary shards |                     |           0 |           0 |        0 |    min |
|    Min cumulative indexing throttle time across primary shard |                     |           0 |           0 |        0 |    min |
| Median cumulative indexing throttle time across primary shard |                     |           0 |           0 |        0 |    min |
|    Max cumulative indexing throttle time across primary shard |                     |           0 |           0 |        0 |    min |
|                       Cumulative merge time of primary shards |                     |     76.6012 |      73.989 | -2.61217 |    min |
|                      Cumulative merge count of primary shards |                     |         236 |         245 |        9 |        |
|                Min cumulative merge time across primary shard |                     |     76.6012 |      73.989 | -2.61217 |    min |
|             Median cumulative merge time across primary shard |                     |     76.6012 |      73.989 | -2.61217 |    min |
|                Max cumulative merge time across primary shard |                     |     76.6012 |      73.989 | -2.61217 |    min |
|              Cumulative merge throttle time of primary shards |                     |     16.5041 |     16.8715 |   0.3674 |    min |
|       Min cumulative merge throttle time across primary shard |                     |     16.5041 |     16.8715 |   0.3674 |    min |
|    Median cumulative merge throttle time across primary shard |                     |     16.5041 |     16.8715 |   0.3674 |    min |
|       Max cumulative merge throttle time across primary shard |                     |     16.5041 |     16.8715 |   0.3674 |    min |
|                     Cumulative refresh time of primary shards |                     |      1.4489 |     1.16713 | -0.28177 |    min |
|                    Cumulative refresh count of primary shards |                     |         105 |         104 |       -1 |        |
|              Min cumulative refresh time across primary shard |                     |      1.4489 |     1.16713 | -0.28177 |    min |
|           Median cumulative refresh time across primary shard |                     |      1.4489 |     1.16713 | -0.28177 |    min |
|              Max cumulative refresh time across primary shard |                     |      1.4489 |     1.16713 | -0.28177 |    min |
|                       Cumulative flush time of primary shards |                     |     2.11675 |     2.26642 |  0.14967 |    min |
|                      Cumulative flush count of primary shards |                     |          28 |          33 |        5 |        |
|                Min cumulative flush time across primary shard |                     |     2.11675 |     2.26642 |  0.14967 |    min |
|             Median cumulative flush time across primary shard |                     |     2.11675 |     2.26642 |  0.14967 |    min |
|                Max cumulative flush time across primary shard |                     |     2.11675 |     2.26642 |  0.14967 |    min |
|                                            Total Young Gen GC |                     |     151.055 |     101.125 |   -49.93 |      s |
|                                              Total Old Gen GC |                     |           0 |      44.269 |   44.269 |      s |
|                                                    Store size |                     |     25.3155 |     25.2885 | -0.02698 |     GB |
|                                                 Translog size |                     | 5.12227e-08 | 5.12227e-08 |        0 |     GB |
|                                        Heap used for segments |                     |      57.188 |     56.0933 | -1.09465 |     MB |
|                                      Heap used for doc values |                     |   0.0374985 |   0.0377731 |  0.00027 |     MB |
|                                           Heap used for terms |                     |     52.9576 |     51.8433 | -1.11429 |     MB |
|                                           Heap used for norms |                     |           0 |           0 |        0 |     MB |
|                                          Heap used for points |                     |           0 |           0 |        0 |     MB |
|                                   Heap used for stored fields |                     |     4.19288 |     4.21225 |  0.01937 |     MB |
|                                                 Segment count |                     |          32 |          34 |        2 |        |
|                                                Min Throughput |               index |       73110 |       81399 |  8289.03 | docs/s |
|                                             Median Throughput |               index |     76809.5 |     85367.3 |  8557.86 | docs/s |
|                                                Max Throughput |               index |     84554.2 |     94064.6 |  9510.38 | docs/s |
|                                       50th percentile latency |               index |     944.755 |     854.654 | -90.1011 |     ms |
|                                       90th percentile latency |               index |     1638.11 |     1517.59 | -120.521 |     ms |
|                                       99th percentile latency |               index |     3025.67 |     2792.63 |  -233.04 |     ms |
|                                     99.9th percentile latency |               index |     7393.48 |     5759.41 | -1634.06 |     ms |
|                                    99.99th percentile latency |               index |     11106.5 |      8920.5 | -2186.04 |     ms |
|                                      100th percentile latency |               index |     11211.6 |     9027.56 | -2184.04 |     ms |
|                                  50th percentile service time |               index |     944.755 |     854.654 | -90.1011 |     ms |
|                                  90th percentile service time |               index |     1638.11 |     1517.59 | -120.521 |     ms |
|                                  99th percentile service time |               index |     3025.67 |     2792.63 |  -233.04 |     ms |
|                                99.9th percentile service time |               index |     7393.48 |     5759.41 | -1634.06 |     ms |
|                               99.99th percentile service time |               index |     11106.5 |      8920.5 | -2186.04 |     ms |
|                                 100th percentile service time |               index |     11211.6 |     9027.56 | -2184.04 |     ms |
|                                                    error rate |               index |           0 |           0 |        0 |      % |
|                                                Min Throughput |             default |     3.01982 |     3.01976 |   -7e-05 |  ops/s |
|                                             Median Throughput |             default |     3.02937 |     3.02932 |   -5e-05 |  ops/s |
|                                                Max Throughput |             default |      3.0573 |     3.05632 | -0.00098 |  ops/s |
|                                       50th percentile latency |             default |      10.227 |     10.3453 |   0.1182 |     ms |
|                                       90th percentile latency |             default |        11.5 |      11.346 | -0.15395 |     ms |
|                                       99th percentile latency |             default |     13.8499 |     12.6521 | -1.19781 |     ms |
|                                      100th percentile latency |             default |     13.8753 |      13.911 |  0.03564 |     ms |
|                                  50th percentile service time |             default |     9.70619 |     9.82953 |  0.12333 |     ms |
|                                  90th percentile service time |             default |     10.9778 |     10.8312 | -0.14668 |     ms |
|                                  99th percentile service time |             default |     13.3356 |     12.1355 | -1.20017 |     ms |
|                                 100th percentile service time |             default |     13.3526 |     13.4993 |  0.14666 |     ms |
|                                                    error rate |             default |           0 |           0 |        0 |      % |
|                                                Min Throughput |               range |     1.00262 |     1.00283 |  0.00021 |  ops/s |
|                                             Median Throughput |               range |     1.00393 |     1.00427 |  0.00034 |  ops/s |
|                                                Max Throughput |               range |     1.00876 |     1.00839 | -0.00037 |  ops/s |
|                                       50th percentile latency |               range |     609.087 |     573.153 | -35.9342 |     ms |
|                                       90th percentile latency |               range |     613.388 |     580.881 | -32.5071 |     ms |
|                                       99th percentile latency |               range |     637.052 |     586.129 | -50.9224 |     ms |
|                                      100th percentile latency |               range |     642.315 |     608.689 | -33.6253 |     ms |
|                                  50th percentile service time |               range |     608.499 |     572.526 | -35.9734 |     ms |
|                                  90th percentile service time |               range |     612.795 |     580.258 | -32.5368 |     ms |
|                                  99th percentile service time |               range |     636.462 |     585.509 | -50.9529 |     ms |
|                                 100th percentile service time |               range |     641.723 |     608.066 |  -33.657 |     ms |
|                                                    error rate |               range |           0 |           0 |        0 |      % |
|                                                Min Throughput | distance_amount_agg |      2.0134 |     2.01334 |   -6e-05 |  ops/s |
|                                             Median Throughput | distance_amount_agg |     2.01994 |     2.01995 |    1e-05 |  ops/s |
|                                                Max Throughput | distance_amount_agg |     2.03945 |     2.03944 |   -1e-05 |  ops/s |
|                                       50th percentile latency | distance_amount_agg |     6.47976 |     6.39307 | -0.08669 |     ms |
|                                       90th percentile latency | distance_amount_agg |     6.94396 |     6.70241 | -0.24155 |     ms |
|                                       99th percentile latency | distance_amount_agg |     7.78218 |     7.41149 | -0.37069 |     ms |
|                                      100th percentile latency | distance_amount_agg |     7.79633 |     11.7932 |  3.99688 |     ms |
|                                  50th percentile service time | distance_amount_agg |     5.79209 |     5.70696 | -0.08513 |     ms |
|                                  90th percentile service time | distance_amount_agg |       6.257 |     6.01192 | -0.24508 |     ms |
|                                  99th percentile service time | distance_amount_agg |     7.09178 |     6.71757 | -0.37421 |     ms |
|                                 100th percentile service time | distance_amount_agg |     7.11111 |     11.1103 |  3.99915 |     ms |
|                                                    error rate | distance_amount_agg |           0 |           0 |        0 |      % |
|                                                Min Throughput |       autohisto_agg |     1.50269 |     1.50184 | -0.00085 |  ops/s |
|                                             Median Throughput |       autohisto_agg |     1.50467 |     1.50336 | -0.00131 |  ops/s |
|                                                Max Throughput |       autohisto_agg |     1.50976 |     1.50661 | -0.00315 |  ops/s |
|                                       50th percentile latency |       autohisto_agg |     446.781 |     517.428 |  70.6471 |     ms |
|                                       90th percentile latency |       autohisto_agg |     481.706 |     541.319 |  59.6129 |     ms |
|                                       99th percentile latency |       autohisto_agg |     492.219 |     558.418 |  66.1988 |     ms |
|                                      100th percentile latency |       autohisto_agg |     493.117 |     560.282 |  67.1646 |     ms |
|                                  50th percentile service time |       autohisto_agg |      446.39 |     517.088 |  70.6973 |     ms |
|                                  90th percentile service time |       autohisto_agg |     481.297 |     540.975 |   59.678 |     ms |
|                                  99th percentile service time |       autohisto_agg |     491.891 |     558.076 |  66.1854 |     ms |
|                                 100th percentile service time |       autohisto_agg |       492.7 |     559.941 |  67.2413 |     ms |
|                                                    error rate |       autohisto_agg |           0 |           0 |        0 |      % |
|                                                Min Throughput |  date_histogram_agg |     1.50311 |     1.50189 | -0.00122 |  ops/s |
|                                             Median Throughput |  date_histogram_agg |     1.50456 |     1.50343 | -0.00113 |  ops/s |
|                                                Max Throughput |  date_histogram_agg |     1.50859 |     1.50718 | -0.00141 |  ops/s |
|                                       50th percentile latency |  date_histogram_agg |     455.839 |     507.236 |  51.3972 |     ms |
|                                       90th percentile latency |  date_histogram_agg |     491.334 |     540.334 |  48.9998 |     ms |
|                                       99th percentile latency |  date_histogram_agg |      497.47 |     547.114 |  49.6444 |     ms |
|                                      100th percentile latency |  date_histogram_agg |     497.498 |      549.48 |  51.9821 |     ms |
|                                  50th percentile service time |  date_histogram_agg |     455.453 |     506.898 |  51.4447 |     ms |
|                                  90th percentile service time |  date_histogram_agg |     490.931 |     539.977 |  49.0459 |     ms |
|                                  99th percentile service time |  date_histogram_agg |     497.062 |     546.891 |  49.8297 |     ms |
|                                 100th percentile service time |  date_histogram_agg |     497.082 |     549.136 |  52.0533 |     ms |
|                                                    error rate |  date_histogram_agg |           0 |           0 |        0 |      % |
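
To make the deltas in the table easier to compare across metrics with different units, the absolute differences can be turned into relative changes. The snippet below is a minimal sketch, assuming the two numeric columns are the baseline and the contender of a Rally comparison; the helper name `percent_change` is hypothetical and not part of Rally.

```python
def percent_change(baseline: float, contender: float) -> float:
    """Relative change of the contender versus the baseline, in percent."""
    if baseline == 0:
        # e.g. "Total Old Gen GC" has a baseline of 0, so no ratio exists
        raise ValueError("relative change is undefined for a zero baseline")
    return (contender - baseline) / baseline * 100.0


# Values copied from the table above (assumed order: baseline, contender).
rows = {
    "median indexing throughput (docs/s)": (76809.5, 85367.3),
    "99th percentile indexing latency (ms)": (3025.67, 2792.63),
    "total young gen GC time (s)": (151.055, 101.125),
}

for name, (baseline, contender) in rows.items():
    print(f"{name}: {percent_change(baseline, contender):+.1f}%")
```

For the median indexing throughput this gives roughly +11%.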

ebadyano · 2019-12-12T20:13:37Z

1G heap experiments:
http_logs: with Parallel GC, indexing throughput is ~4% higher than with G1GC. Latency is the same for most queries, except that *force-merge-1-seg query latency is 35% better.
nyc_taxis: query latency is more or less the same; indexing throughput is 11% better.
Parallel GC seems to have an advantage over G1GC on smaller heaps, but even with indices.breaker.total.limit: 99.9% we still sometimes trigger the circuit breaker with Parallel GC, whereas we do not with G1GC. @danielmitterdorfer Do we want to investigate further if we want to use Parallel on smaller heaps?

danielmitterdorfer · 2019-12-13T08:33:01Z

> @danielmitterdorfer Do we want to investigate further if we want to use Parallel on smaller heaps?

Thanks for doing these experiments, Evgenia! IMHO we are good for now with what we have, and we can tackle this question separately at a later point. I'd expect ParallelGC to require thorough investigation on other fronts as well to ensure cluster stability, since this collector works quite differently from CMS and G1 (it is blocking instead of concurrent).
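
To illustrate what an ergonomic choice of collector could look like, here is a minimal sketch. It is not Elasticsearch's actual startup logic: the function name `choose_gc_flag` and the `parallel_below_bytes` threshold are hypothetical, and only the three JVM flags themselves are real HotSpot options. The sketch assumes CMS stays the default before JDK 14 and that Parallel GC is opt-in for small heaps.

```python
from typing import Optional

GIB = 1024 ** 3


def choose_gc_flag(jdk_major: int, heap_bytes: int,
                   parallel_below_bytes: Optional[int] = None) -> str:
    """Pick a garbage collector flag from the JDK version and heap size.

    parallel_below_bytes is a hypothetical cut-off: heaps smaller than it
    get Parallel GC. None disables Parallel GC entirely.
    """
    if jdk_major < 14:
        # CMS is still available before JDK 14 (assumed default here).
        return "-XX:+UseConcMarkSweepGC"
    if parallel_below_bytes is not None and heap_bytes < parallel_below_bytes:
        return "-XX:+UseParallelGC"
    return "-XX:+UseG1GC"


print(choose_gc_flag(13, 1 * GIB))
print(choose_gc_flag(14, 1 * GIB))
print(choose_gc_flag(14, 1 * GIB, parallel_below_bytes=2 * GIB))
```

Keeping the threshold optional reflects the open question in this thread: the circuit breaker still tripped occasionally with Parallel GC, so the blocking collector would be opt-in rather than a default.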

danielmitterdorfer added >enhancement :Core/Infra/Core Core issues without another label labels Sep 23, 2019

danielmitterdorfer added the Meta label Sep 23, 2019

danielmitterdorfer mentioned this issue Oct 9, 2019

Add network stats telemetry device elastic/rally#786

Closed

ebadyano self-assigned this Oct 9, 2019

jasontedor mentioned this issue Oct 18, 2019

JDK 14 and the removal of CMS #48242

Closed

jasontedor mentioned this issue Nov 14, 2019

Restrict support for CMS to pre-JDK 14 #49123

Merged

ebadyano mentioned this issue Dec 31, 2019

[DOCS] Update reference documentation that mentions CMS #50542

Merged

ebadyano added a commit that referenced this issue Jan 7, 2020

[DOCS] Update reference documentation that mentions CMS (#50542)

bb736f7

Relates to #46973

ebadyano closed this as completed Jan 8, 2020

SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this issue Jan 23, 2020

[DOCS] Update reference documentation that mentions CMS (elastic#50542)

c241318

Relates to elastic#46973

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

jrodewig added a commit that referenced this issue Dec 2, 2020

[DOCS] Update reference documentation that mentions CMS (#50542) (#65733)

ea7d8d1

Relates to #46973 Co-authored-by: Evgenia Badyanova <evgenia.badiyanova@elastic.co>

jrodewig added a commit that referenced this issue Dec 2, 2020

[DOCS] Update reference documentation that mentions CMS (#50542) (#65734)

b829176

Relates to #46973 Co-authored-by: Evgenia Badyanova <evgenia.badiyanova@elastic.co>

jrodewig pushed a commit that referenced this issue Dec 2, 2020

[DOCS] Update reference documentation that mentions CMS (#50542)

0df3a49

Relates to #46973

jrodewig pushed a commit that referenced this issue Dec 2, 2020

[DOCS] Update reference documentation that mentions CMS (#50542)

f450b7a

Relates to #46973

jrodewig pushed a commit that referenced this issue Dec 2, 2020

[DOCS] Update reference documentation that mentions CMS (#50542)

6c833f1

Relates to #46973

2lambda123 pushed a commit to 2lambda123/elastic-elasticsearch that referenced this issue May 2, 2024

[DOCS] Update reference documentation that mentions CMS

33246b3

Relates to elastic/elasticsearch#46973

2lambda123 pushed a commit to 2lambda123/elastic-elasticsearch that referenced this issue May 3, 2024

[DOCS] Update reference documentation that mentions CMS (#50542)

96df984

Relates to elastic/elasticsearch#46973
