Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TimeUnitRounding for hour, minute and second units #18415

Merged
merged 1 commit into from May 18, 2016

Conversation

Projects
None yet
5 participants
@cbuescher
Copy link
Member

commented May 17, 2016

Currently, rounding intervals obtained by TimeUnitRounding#nextRoundingValue() for hour, minute and second time units can include an extra hour when
overlapping a DST transitions that adds an extra hour (eg CEST -> CET). This changes the rounding logic for time units smaller or equal to an hour to fix this.

Internally when computing nextRoundingValue() we convert to time local to the time zone unsed for rounding before incrementing by one unit. This is necessary for day units and
larger to make sure we again arive at a timestamp that is rounded according to the timezone/unit. When crossing DST changes, this might for example add 23h, or 25h to arive
at a start-of-day date, so in this case the differences between values obtained by repeatedly calling #nextRoundingValue() is not supposed to always we the same.

For time units equal or smaller to an hour we can ommit the conversion to local time and directly add the DurtionField, since we are sure to land again on a rounded date.

Closes #18326

@cbuescher

This comment has been minimized.

Copy link
Member Author

commented May 17, 2016

@jpountz could you take a look at this please?

@cbuescher

This comment has been minimized.

Copy link
Member Author

commented May 17, 2016

So with this fix in place, we get your buckets for the CEST -> CET transition that only resulted in three buckets before (see #18326) and the keys are all 1h apart:

DELETE /_all

PUT /test/type/1
{
  "dateField" : "2000-01-01"
}

GET /test/type/_search
{
  "query": {
    "match_none": {}
  },
  "aggs": {
    "events_by_date": {
      "date_histogram": {
        "field": "dateField",
        "interval": "1h",
        "time_zone": "Europe/Oslo",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2015-10-25T02:00:00.000+02:00",
          "max": "2015-10-25T04:00:00.000+01:00"
        }
      }
    }
  }
}
      "buckets": [
        {
          "key_as_string": "2015-10-25T02:00:00.000+02:00",
          "key": 1445731200000,
          "doc_count": 0
        },
        {
          "key_as_string": "2015-10-25T02:00:00.000+01:00",
          "key": 1445734800000,
          "doc_count": 0
        },
        {
          "key_as_string": "2015-10-25T03:00:00.000+01:00",
          "key": 1445738400000,
          "doc_count": 0
        },
        {
          "key_as_string": "2015-10-25T04:00:00.000+01:00",
          "key": 1445742000000,
          "doc_count": 0
        }
      ]
    }
@cbuescher

This comment has been minimized.

Copy link
Member Author

commented May 17, 2016

This should probably also be ported back to the 2.x branch.

@jpountz

View changes

core/src/main/java/org/elasticsearch/common/rounding/TimeZoneRounding.java Outdated

private boolean unitIsDayOrLonger() {
return (unit == DateTimeUnit.HOUR_OF_DAY || unit == DateTimeUnit.MINUTES_OF_HOUR
|| unit == DateTimeUnit.SECOND_OF_MINUTE) == false;

This comment has been minimized.

Copy link
@jpountz

jpountz May 18, 2016

Contributor

maybe we could do something like unit.field().add(0, 1) >= number_of_millis_in_a_day?

This comment has been minimized.

Copy link
@cbuescher

cbuescher May 18, 2016

Author Member

I'm not sure we should to do any arithmetic only to check what kind of enum we have. There are only eight options, checking if any of the three are set seems clearer to me, wdyt?

This comment has been minimized.

Copy link
@jpountz

jpountz May 18, 2016

Contributor

I was only worried that we might add options in the future and forget to add them here.

@jpountz

This comment has been minimized.

Copy link
Contributor

commented May 18, 2016

It looks good to me. I just left a suggestion about a potentially better way to check that the interval is greater than a day.

@cbuescher cbuescher force-pushed the cbuescher:fix-timezone-18326 branch 2 times, most recently May 18, 2016

@cbuescher

This comment has been minimized.

Copy link
Member Author

commented May 18, 2016

@jpountz thanks, I moved the duration check for the DateTimeUnit into the enum class and added tests for it. This way we shouldn't miss anything if we happen to add new constants. Mind to take another look to see if this is okay?

@jpountz

This comment has been minimized.

Copy link
Contributor

commented May 18, 2016

LGTM

@cbuescher cbuescher force-pushed the cbuescher:fix-timezone-18326 branch May 18, 2016

Fix TimeZoneRounding#nextRoundingValue for hour, minute and second units
Currently rounding intervals obtained by nextRoundingValue() for hour, minute and
second units can include an extra hour when happening at DST transitions that add
an extra hour (eg CEST -> CET). This changes the rounding logic for time units
smaller or equal to an hour to fix this.

Closes #18326

@cbuescher cbuescher force-pushed the cbuescher:fix-timezone-18326 branch to 7c665a0 May 18, 2016

@cbuescher cbuescher merged commit 7c665a0 into elastic:master May 18, 2016

@cbuescher

This comment has been minimized.

Copy link
Member Author

commented May 18, 2016

@jpountz thanks, I think this should be ported back to the 2.x branch, wdyt?

@jpountz

This comment has been minimized.

Copy link
Contributor

commented May 18, 2016

@cbuescher cbuescher assigned cbuescher and unassigned jpountz May 18, 2016

@cbuescher

This comment has been minimized.

Copy link
Member Author

commented May 18, 2016

Merged with 2.x via 8a1d765

@mattdawson

This comment has been minimized.

Copy link

commented May 22, 2016

I honestly don't think this is what you want. Not all timezones change by 1hr, some change by 30m (Lord Howe Island). If I want to guarantee 24 buckets in day I would use GMT for the query, and post convert the times. The only conistent answer is to generate a 2hr or 1.5hr interval bucket in the result.

@dkulichkin

This comment has been minimized.

Copy link

commented Nov 1, 2016

Guys, what is the last status with this DST issue again for date histograms? It seems like version 2.3.2 which is supposed to have all the fixes already on board still has the issue, because of the recent winter daylight shift I started to get recent day-based buckets like following (Amsterdam offset):

request:

 {
  "date_histogram": {
    "field": "ts",
    "interval": "day",
    "time_zone": "+01:00",
    "min_doc_count": 0,
    "extended_bounds": {
        "min": 1477346400000,
        "max": 1477995604417
    }
  }
}

Gives the following key_as_string converted to javascript Date objects as:
2016-10-30T00:00:00.000+01:00 => Sun Oct 30 2016 01:00:00 GMT+0200 (CEST)
2016-10-31T00:00:00.000+01:00 => Tue Oct 31 2016 00:00:00 GMT+0100 (CET)

This makes for instance a problem of showing bullets before Noc 1st on google chats - them all are shown shifted to 1 am values

@cbuescher

This comment has been minimized.

Copy link
Member Author

commented Nov 1, 2016

"time_zone": "+01:00"

You are using a fixed time zone offset here, no DST involved whatsoever. Try using "Europe/Amsterdam". Small example using ES 2.4.1:

PUT /test/type/1
{
  "dateField" : "2016-11-01"
}

GET /test/type/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "events_by_date": {
      "date_histogram": {
        "field": "dateField",
        "interval": "day",
        "time_zone": "Europe/Amsterdam",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2016-10-30T02:00:00.000+02:00",
          "max": "2016-11-03T04:00:00.000+01:00"
        }
      }
    }
  }
}

Response:

"buckets": [
        {
          "key_as_string": "2016-10-30T00:00:00.000+02:00",
          "key": 1477778400000,
          "doc_count": 0
        },
        {
          "key_as_string": "2016-10-31T00:00:00.000+01:00",
          "key": 1477868400000,
          "doc_count": 0
        },
        {
          "key_as_string": "2016-11-01T00:00:00.000+01:00",
          "key": 1477954800000,
          "doc_count": 1
        },
        {
          "key_as_string": "2016-11-02T00:00:00.000+01:00",
          "key": 1478041200000,
          "doc_count": 0
        },
        {
          "key_as_string": "2016-11-03T00:00:00.000+01:00",
          "key": 1478127600000,
          "doc_count": 0
        }
      ]

Note that Sunday 2016-10-30 has 25h, thats because of the added hour when going to CET.

@dkulichkin

This comment has been minimized.

Copy link

commented Nov 1, 2016

@cbuescher thanks a lot, I relied on moment-timezone's solution to get it determined in the browser

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.