Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore date aggregation performance in UTC case #38221

Merged
merged 29 commits into from Feb 11, 2019

Conversation

Projects
None yet
6 participants
@spinscale
Copy link
Member

commented Feb 1, 2019

The nightly benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes #37826

Benchmark comparison (on my notebook):

Before:

RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava      avgt   30  140,989 ± 14,582  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda      avgt   30   11,981 ±  0,066  ns/op

final results comparing against the joda implementations (all benchmarks taken on my osx notebook, so take with a grain of salt)

Benchmark                                                 Mode  Cnt    Score    Error  Units
RoundingBenchmark.timeIntervalRoundingJava                avgt   30  226,556 ±  1,931  ns/op
RoundingBenchmark.timeIntervalRoundingJoda                avgt   30  328,073 ±  2,480  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJava  avgt   30   88,096 ±  1,887  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJoda  avgt   30  358,745 ± 22,062  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJava            avgt   30  133,689 ±  0,774  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJoda            avgt   30  357,008 ± 19,957  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJava       avgt   30   11,695 ±  0,040  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJoda       avgt   30   11,905 ±  0,056  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava      avgt   30    3,189 ±  0,021  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda      avgt   30    5,465 ±  0,026  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJava    avgt   30    4,509 ±  0,059  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJoda    avgt   30   31,511 ±  0,414  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJava    avgt   30    2,991 ±  0,010  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJoda    avgt   30    6,456 ±  0,040  ns/op

I also added a duelling test class that runs against the old implementation to see if there are any inconsistencies. I ran this class a couple of million times on my linux without failure.

Restore date aggregation performance in UTC case
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes #37826
@elasticmachine

This comment has been minimized.

Copy link

commented Feb 1, 2019

@spinscale spinscale added the WIP label Feb 2, 2019

spinscale added some commits Feb 2, 2019

Add more tests
current results show slowdowns in quarter of year and year of century
parsing, the rest is faster/similar in this case.

RoundingBenchmark.timeIntervalRoundingJava                avgt   10  227,639 ± 12,084  ns/op
RoundingBenchmark.timeIntervalRoundingJoda                avgt   10  329,288 ± 15,914  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJava  avgt   10   87,215 ±  5,923  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJoda  avgt   10  370,219 ± 73,419  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJava            avgt   10  135,884 ±  3,506  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJoda            avgt   10  327,793 ± 27,550  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJava       avgt   10   11,724 ±  0,266  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJoda       avgt   10   11,785 ±  0,069  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava      avgt   10   57,007 ±  1,023  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda      avgt   10   56,605 ±  0,725  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJava    avgt   10   57,992 ±  1,289  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJoda    avgt   10   29,979 ±  0,574  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJava    avgt   10   56,557 ±  0,684  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJoda    avgt   10    6,344 ±  0,035  ns/op
Speed up year of century parsing.
Use joda time code which solely uses milliseconds and never converts to
any other joda time field, thus being super fast.

Benchmark                                               Mode  Cnt  Score   Error  Units
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJava  avgt   10  3,378 ± 0,036  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJoda  avgt   10  6,460 ± 0,027  ns/op
Bring quarter of year for UTC case up to speed again
Benchmark                                                 Mode  Cnt    Score    Error  Units
RoundingBenchmark.timeIntervalRoundingJava                avgt   30  223,906 ±  1,248  ns/op
RoundingBenchmark.timeIntervalRoundingJoda                avgt   30  337,049 ± 14,682  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJava  avgt   30   85,784 ±  1,719  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJoda  avgt   30  364,764 ± 19,697  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJava            avgt   30  133,247 ±  1,022  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJoda            avgt   30  356,372 ± 17,340  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJava       avgt   30   11,778 ±  0,044  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJoda       avgt   30   11,922 ±  0,060  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava      avgt   30   52,529 ±  3,865  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda      avgt   30   61,306 ±  1,581  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJava    avgt   30   22,152 ±  0,164  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJoda    avgt   30   30,734 ±  0,099  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJava    avgt   30    3,352 ±  0,021  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJoda    avgt   30    6,453 ±  0,072  ns/op

@spinscale spinscale removed the WIP label Feb 2, 2019

spinscale added some commits Feb 2, 2019

Fix month of year benchmark, where joda is still much faster
due to object creation in java time

Benchmark                                             Mode  Cnt   Score   Error  Units
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava  avgt   30  18,864 ± 0,636  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda  avgt   30   5,597 ± 0,117  ns/op
speed up month of year
now java is same or faster everywhere

Benchmark                                                 Mode  Cnt    Score   Error  Units
RoundingBenchmark.timeIntervalRoundingJava                avgt   10  228,960 ± 4,507  ns/op
RoundingBenchmark.timeIntervalRoundingJoda                avgt   10  324,240 ± 1,546  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJava  avgt   10   87,284 ± 0,857  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJoda  avgt   10  358,485 ± 4,671  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJava            avgt   10  137,124 ± 3,098  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJoda            avgt   10  327,330 ± 2,473  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJava       avgt   10   11,772 ± 0,088  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJoda       avgt   10   11,907 ± 0,132  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava      avgt   10    3,538 ± 0,037  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda      avgt   10    5,567 ± 0,145  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJava    avgt   10    4,872 ± 0,126  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJoda    avgt   10   30,843 ± 2,215  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJava    avgt   10    3,382 ± 0,071  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJoda    avgt   10    6,466 ± 0,065  ns/op

@spinscale spinscale force-pushed the spinscale:1902-fix-utc-aggs-performance branch from 9cfb69a to 59db0ae Feb 2, 2019

@spinscale

This comment has been minimized.

Copy link
Member Author

commented Feb 3, 2019

rally run of nyc_taxis against a recent master commit (on a linux desktop machine)

|   All |                   Min Throughput |       autohisto_agg |      1.2 |  ops/s |
|   All |                Median Throughput |       autohisto_agg |     1.21 |  ops/s |
|   All |                   Max Throughput |       autohisto_agg |     1.22 |  ops/s |
|   All |          50th percentile latency |       autohisto_agg |  33052.1 |     ms |
|   All |          90th percentile latency |       autohisto_agg |  45437.1 |     ms |
|   All |          99th percentile latency |       autohisto_agg |  48204.9 |     ms |
|   All |         100th percentile latency |       autohisto_agg |  48514.8 |     ms |
|   All |     50th percentile service time |       autohisto_agg |  812.235 |     ms |
|   All |     90th percentile service time |       autohisto_agg |  816.315 |     ms |
|   All |     99th percentile service time |       autohisto_agg |  821.443 |     ms |
|   All |    100th percentile service time |       autohisto_agg |  821.872 |     ms |
|   All |                       error rate |       autohisto_agg |        0 |      % |
|   All |                   Min Throughput |  date_histogram_agg |     1.22 |  ops/s |
|   All |                Median Throughput |  date_histogram_agg |     1.22 |  ops/s |
|   All |                   Max Throughput |  date_histogram_agg |     1.23 |  ops/s |
|   All |          50th percentile latency |  date_histogram_agg |  32331.2 |     ms |
|   All |          90th percentile latency |  date_histogram_agg |  44873.6 |     ms |
|   All |          99th percentile latency |  date_histogram_agg |  47697.7 |     ms |
|   All |         100th percentile latency |  date_histogram_agg |  48008.8 |     ms |
|   All |     50th percentile service time |  date_histogram_agg |  816.826 |     ms |
|   All |     90th percentile service time |  date_histogram_agg |  822.268 |     ms |
|   All |     99th percentile service time |  date_histogram_agg |  828.293 |     ms |
|   All |    100th percentile service time |  date_histogram_agg |  830.532 |     ms |
|   All |                       error rate |  date_histogram_agg |        0 |      % |

same run against this branch

|   All |                   Min Throughput |       autohisto_agg |         2 |  ops/s |
|   All |                Median Throughput |       autohisto_agg |         2 |  ops/s |
|   All |                   Max Throughput |       autohisto_agg |         2 |  ops/s |
|   All |          50th percentile latency |       autohisto_agg |   466.499 |     ms |
|   All |          90th percentile latency |       autohisto_agg |   489.928 |     ms |
|   All |          99th percentile latency |       autohisto_agg |   503.417 |     ms |
|   All |         100th percentile latency |       autohisto_agg |    512.86 |     ms |
|   All |     50th percentile service time |       autohisto_agg |   466.408 |     ms |
|   All |     90th percentile service time |       autohisto_agg |   489.372 |     ms |
|   All |     99th percentile service time |       autohisto_agg |   503.324 |     ms |
|   All |    100th percentile service time |       autohisto_agg |   512.777 |     ms |
|   All |                       error rate |       autohisto_agg |         0 |      % |
|   All |                   Min Throughput |  date_histogram_agg |         2 |  ops/s |
|   All |                Median Throughput |  date_histogram_agg |         2 |  ops/s |
|   All |                   Max Throughput |  date_histogram_agg |         2 |  ops/s |
|   All |          50th percentile latency |  date_histogram_agg |    449.92 |     ms |
|   All |          90th percentile latency |  date_histogram_agg |   465.649 |     ms |
|   All |          99th percentile latency |  date_histogram_agg |    483.38 |     ms |
|   All |         100th percentile latency |  date_histogram_agg |   485.472 |     ms |
|   All |     50th percentile service time |  date_histogram_agg |   449.809 |     ms |
|   All |     90th percentile service time |  date_histogram_agg |    465.46 |     ms |
|   All |     99th percentile service time |  date_histogram_agg |   483.307 |     ms |
|   All |    100th percentile service time |  date_histogram_agg |   485.251 |     ms |
|   All |                       error rate |  date_histogram_agg |         0 |      % |
@danielmitterdorfer
Copy link
Member

left a comment

I left a few comments. The macrobenchmark results look promising indeed. I did not check the parts that you took from Joda time but I wonder whether we need to attribute this more clearly.

@@ -48,23 +54,13 @@
private final ZoneId zoneId = ZoneId.of("Europe/Amsterdam");
private final DateTimeZone timeZone = DateUtils.zoneIdToDateTimeZone(zoneId);

private final long timestamp = 1548879021354L;

This comment has been minimized.

Copy link
@danielmitterdorfer

danielmitterdorfer Feb 4, 2019

Member

Using a constant value (+ declaring it as final) might allow for additional compiler optimizations that are unrealistic. We also see that some of the results are very low. For example, timeUnitRoundingUtcYearOfCenturyJava takes just 3 nanoseconds which is only around 10 CPU cycles on a 3GHz core. Given the amount of work that the respective method is doing (it's basic arithmetic) it does not seem completely unreasonable but if we just consider the microbenchmark results this would still need a bit further investigation. However, the macrobenchmark results that you posted look fine and give me confidence that performance has improved substantially indeed.

@@ -76,6 +108,15 @@ public static String format(long epochMillis) {
this.field = field;
}

/**
* This rounds down the supplied milliseconds since the epoch down to the next unit. In order to retain performance this method
* should be as fast as possiblee and not try to convert dates to java-time objects if possible

This comment has been minimized.

Copy link
@danielmitterdorfer

danielmitterdorfer Feb 4, 2019

Member

Nit: typo possiblee -> possible.

@@ -77,4 +78,224 @@ public static ZoneId of(String zoneId) {
}
return ZoneId.of(zoneId).normalized();
}

/*
* begin of code that is partially copied from the joda time implementation in order to make calculations about utc rounding much

This comment has been minimized.

Copy link
@danielmitterdorfer

danielmitterdorfer Feb 4, 2019

Member

Do we need to attribute this even more clearly?

@DaveCTurner
Copy link
Contributor

left a comment

I left mostly suggestions about the names of things. I think we should have more thorough tests for the copied code, particularly around the boundaries (e.g. month boundaries ±1ms) and with negative numbers.

* @param utcMillis the milliseconds since the epoch
* @return The milliseconds since the epoch rounded down to the beginning of the year
*/
public static long getFirstDayOfYearMillis(long utcMillis) {

This comment has been minimized.

Copy link
@DaveCTurner

DaveCTurner Feb 4, 2019

Contributor

Maybe roundYear() to align with the naming convention above?

: ((i < 304 * 84375) ? 10 : (i < 334 * 84375) ? 11 : 12)));
}

private static long getYearMillis(int year) {

This comment has been minimized.

Copy link
@DaveCTurner

DaveCTurner Feb 4, 2019

Contributor

I think this can be inlined?

private static final long APPROX_MILLIS_AT_EPOCH_DIVIDED_BY_TWO = (1970L * MILLIS_PER_YEAR) / 2;

// see org.joda.time.chrono.BasicChronology
private static int getYear(long instant) {

This comment has been minimized.

Copy link
@DaveCTurner

DaveCTurner Feb 4, 2019

Contributor

Can we rename instant to utcMillis to align with the arguments elsewhere and to clarify what it means?

}

// see org.joda.time.chrono.BasicGJChronology
private static int getMonthOfYear(long millis, int year) {

This comment has been minimized.

Copy link
@DaveCTurner

DaveCTurner Feb 4, 2019

Contributor

I think millis here is also a utcMillis.

public static long roundQuarterOfYear(long utcMillis) {
int year = DateUtils.getYear(utcMillis);
int month = DateUtils.getMonthOfYear(utcMillis, year);
return DateUtils.of(year, Month.of(month).firstMonthOfQuarter().getValue());

This comment has been minimized.

Copy link
@DaveCTurner

DaveCTurner Feb 4, 2019

Contributor

It looks like this constructs a Month object simply to round it down to a multiple of 3. I don't think it loses much clarity to do this by hand instead.

* @return the milliseconds since the epoch of the first of january at midnight of the specified year
*/
// see org.joda.time.chrono.GregorianChronology
private static long calculateFirstDayOfYearMillis(int year) {

This comment has been minimized.

Copy link
@DaveCTurner

DaveCTurner Feb 4, 2019

Contributor

I find the mention of Day in this method's name to be a little confusing. Maybe utcMillisAtStartOfYear?

@spinscale

This comment has been minimized.

Copy link
Member Author

commented Feb 5, 2019

@elasticmachine retest this please

@jasontedor jasontedor added v8.0.0 and removed v7.0.0 labels Feb 6, 2019

@spinscale spinscale merged commit b955220 into elastic:master Feb 11, 2019

7 checks passed

CLA Commit author is a member of Elasticsearch
Details
elasticsearch-ci/1 Build finished.
Details
elasticsearch-ci/2 Build finished.
Details
elasticsearch-ci/default-distro Build finished.
Details
elasticsearch-ci/docbldesx Build finished.
Details
elasticsearch-ci/oss-distro-docs Build finished.
Details
elasticsearch-ci/packaging-sample Build finished.
Details

spinscale added a commit to spinscale/elasticsearch that referenced this pull request Feb 11, 2019

Restore date aggregation performance in UTC case (elastic#38221)
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes elastic#37826

spinscale added a commit to spinscale/elasticsearch that referenced this pull request Feb 11, 2019

Restore date aggregation performance in UTC case (elastic#38221)
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes elastic#37826

spinscale added a commit to spinscale/elasticsearch that referenced this pull request Feb 11, 2019

Restore date aggregation performance in UTC case (elastic#38221)
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes elastic#37826

spinscale added a commit to spinscale/elasticsearch that referenced this pull request Feb 11, 2019

Restore date aggregation performance in UTC case (elastic#38221)
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes elastic#37826

spinscale added a commit that referenced this pull request Feb 11, 2019

Restore date aggregation performance in UTC case (#38221) (#38700)
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes #37826

spinscale added a commit that referenced this pull request Feb 11, 2019

Restore date aggregation performance in UTC case (#38221) (#38701)
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes #37826

@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 12, 2019

dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Feb 12, 2019

Restore date aggregation performance in UTC case (elastic#38221)
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes elastic#37826
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.