Date Histogram with extended bounds misses daylight savings date when no records matched #18310

Closed
mattdawson opened this issue May 13, 2016 · 8 comments

@mattdawson

mattdawson commented May 13, 2016

Elasticsearch version: 2.3.2

JVM version: 1.8.0_40-b25

OS version: Windows 7 Home Premium

Description of the problem including expected versus actual behavior:
Running a date histogram query with extended bounds over a range with no matching data returns only one bucket, covering two hours, at the daylight saving crossover. Running the same query over matching data returns two buckets, one pre-DST and one post-DST, both with the same formatted time (0300hrs). It should consistently return either two buckets or one bucket at the crossover, whichever is correct.

Steps to reproduce:
Run a date histogram query with extended bounds on an index, over a date range that crosses a daylight saving boundary and contains no data. Then run one with data and compare the results around 3am on April 3rd, 2016.

e.g.

curl -XPOST 'localhost:9200/read_5555605bab95c2765d65c3cc_201601/_search' -d '
{
   "query":{
      "filtered":{
         "filter":{
            "bool":{
               "must":[
                  {
                     "term":{
                        "p":"doesntexist"
                     }
                  }
               ]
            }
         }
      }
   },
   "aggs":{
      "events_by_date":{
         "date_histogram":{
            "field":"tsr",
            "interval":"1h",
            "time_zone":"Pacific/Auckland",
            "min_doc_count":0,
            "extended_bounds":{
               "min":"2016-03-31T11:00:00.000Z",
               "max":"2016-05-02T12:00:00.000Z"
            }
         }
      }
   }
}'
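
To make the ambiguity at the crossover concrete, here is a minimal Python sketch (not Elasticsearch code) that lists four consecutive UTC hours and the Auckland wall-clock label each one formats to. The offsets are hard-coded as an assumption that holds only around the April 2016 end of DST (UTC+13 before 2016-04-02T14:00Z, UTC+12 afterwards); `to_auckland` and `hourly_labels` are hypothetical helper names. In this sketch the repeated label comes out as 02:00.

```python
from datetime import datetime, timedelta, timezone

# Assumption: hard-coded Pacific/Auckland offsets around the 2016 end of DST.
# NZDT (UTC+13) applies before 2016-04-02T14:00Z, NZST (UTC+12) from then on.
DST_END_UTC = datetime(2016, 4, 2, 14, 0, tzinfo=timezone.utc)
NZDT = timezone(timedelta(hours=13))
NZST = timezone(timedelta(hours=12))


def to_auckland(instant_utc):
    """Convert a UTC instant to Auckland local time (valid only near April 2016)."""
    tz = NZDT if instant_utc < DST_END_UTC else NZST
    return instant_utc.astimezone(tz)


def hourly_labels(start_utc, hours):
    """Return (utc_instant, formatted local time) for consecutive UTC hours."""
    out = []
    for i in range(hours):
        instant = start_utc + timedelta(hours=i)
        out.append((instant, to_auckland(instant).strftime("%d/%m/%Y %H:%M")))
    return out


labels = hourly_labels(datetime(2016, 4, 2, 12, 0, tzinfo=timezone.utc), 4)
for instant, label in labels:
    # Two distinct UTC hours print the same local label at the crossover.
    print(instant.isoformat(), "->", label)
```

Two different one-hour UTC intervals share one formatted local time, so a histogram has to either emit two buckets with the same label or fold both hours into a single bucket.
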
@clintongormley

@colings86 please could you take a look at this?

@mattdawson
Author

mattdawson commented May 15, 2016

Looking deeper into this, I realise that the intention is to have one bucket for the hour covering the DST changeover.
Essentially I have time series data with tsr = timestamp and tsrf = formatted timestamp, all stored against a channel denoted by 'c', with a sequence id 'k'; if the data becomes non-contiguous for some reason, the 'k' value is incremented.
I then run a terms aggregation with a nested date histogram over the data and merge multiple streams into one.
But at the changeover this happens: a duplicate 2am record appears for the second terms aggregation bucket, where k=1.
Look for **** in the log to see the duplicate.

...
rowk1.tsrf==rowk2.tsrf? true rowk1.tsrf= 03/04/2016 01:00 rowk2.tsrf= 03/04/2016 01:00 rowk1.k= undefined rowk2.k= 1
rowk1.tsrf==rowk2.tsrf? true rowk1.tsrf= 03/04/2016 02:00 rowk2.tsrf= 03/04/2016 **** 02:00 **** rowk1.k= undefined rowk2.k= 1
rowk1.tsrf==rowk2.tsrf? false rowk1.tsrf= 03/04/2016 03:00 rowk2.tsrf= 03/04/2016 **** 02:00 ****  rowk1.k= undefined rowk2.k= 1
rowk1.tsrf==rowk2.tsrf? false rowk1.tsrf= 03/04/2016 04:00 rowk2.tsrf= 03/04/2016 03:00 rowk1.k= undefined rowk2.k= 1
...
rowk1.tsrf==rowk2.tsrf? false rowk1.tsrf= 02/05/2016 13:00 rowk2.tsrf= 02/05/2016 12:00 rowk1.k= undefined rowk2.k= 1
rowk1.tsrf==rowk2.tsrf? false rowk1.tsrf= 02/05/2016 14:00 rowk2.tsrf= 02/05/2016 13:00 rowk1.k= 0 rowk2.k= 1
rowk1.tsrf==rowk2.tsrf? false rowk1.tsrf= 02/05/2016 15:00 rowk2.tsrf= 02/05/2016 14:00 rowk1.k= 0 rowk2.k= 1
rowk1.tsrf==rowk2.tsrf? false rowk1.tsrf= 02/05/2016 16:00 rowk2.tsrf= 02/05/2016 15:00 rowk1.k= 0 rowk2.k= undefined
...

What you see here is two streams in a channel, one from 31/3/2016 11:00 to 02/05/2016 14:00, and one from 02/05/2016 13:00 to 02/05/2016 14:00:00.
I think the bug only occurs when there are two streams, and thus two buckets for k values (and thus two date histograms).
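
The log above compares rows by their formatted time (tsrf). As a rough illustration of why that is fragile at the crossover, the sketch below indexes some hypothetical bucket rows first by formatted local time and then by a UTC key. The row layout, the "key" field, and `index_rows` are assumptions made for illustration, not the actual merge code or the Elasticsearch response format.

```python
# Hypothetical bucket rows: "key" is the UTC start of the bucket,
# "tsrf" is the formatted Pacific/Auckland wall-clock time.
rows = [
    {"key": "2016-04-02T12:00Z", "tsrf": "03/04/2016 01:00"},
    {"key": "2016-04-02T13:00Z", "tsrf": "03/04/2016 02:00"},  # still on DST
    {"key": "2016-04-02T14:00Z", "tsrf": "03/04/2016 02:00"},  # after DST ends, same label
    {"key": "2016-04-02T15:00Z", "tsrf": "03/04/2016 03:00"},
]


def index_rows(rows, field):
    """Group rows by the given field so duplicate values become visible."""
    index = {}
    for row in rows:
        index.setdefault(row[field], []).append(row)
    return index


by_label = index_rows(rows, "tsrf")
by_key = index_rows(rows, "key")

# Labels that map to more than one bucket cannot be used to line streams up.
collisions = {label: len(group) for label, group in by_label.items() if len(group) > 1}
print(collisions)
print(len(by_label), "distinct labels vs", len(by_key), "distinct UTC keys")
```

Lining streams up on a UTC-based key rather than the formatted string avoids the collision, whichever bucketing behaviour the histogram ends up using.
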

Here's the full query I execute.

{
        "index": "read_5555605bab95c2765d65c3cc_201603,read_5555605bab95c2765d65c3cc_201604,read_5555605bab95c2765d65c3cc_201605",
        "type": "reading",
        "ignoreUnavailable": true,
        "allowNoIndices": true,
        "size": 0
}
{
        "query": {
                "filtered": {
                        "filter": {
                                "bool": {
                                        "must": [
                                                {
                                                        "term": {
                                                                "c": "57314b3f361be1ce3c55110d"
                                                        }
                                                },
                                                {
                                                        "term": {
                                                                "p": "energy_active_sum"
                                                        }
                                                },
                                                {
                                                        "range": {
                                                                "tsr": {
                                                                        "gte": "2016-03-30T11:00:00.000Z",
                                                                        "lt": "2016-05-02T12:00:00.000Z"
                                                                }
                                                        }
                                                }
                                        ]
                                }
                        }
                }
        },
        "aggs": {
                "by_k": {
                        "terms": {
                                "field": "k",
                                "size": 0,
                                "order": {
                                        "_term": "asc"
                                }
                        },
                        "aggs": {
                                "events_by_date": {
                                        "date_histogram": {
                                                "field": "tsr",
                                                "interval": "1h",
                                                "time_zone": "Pacific/Auckland",
                                                "min_doc_count": 0,
                                                "extended_bounds": {
                                                        "min": "2016-03-30T11:00:00.000Z",
                                                        "max": "2016-05-02T12:00:00.000Z"
                                                },
                                                "order": {
                                                        "_key": "asc"
                                                }
                                        },
                                        "aggs": {
                                                "maxtsr": {
                                                        "max": {
                                                                "field": "tsr"
                                                        }
                                                },
                                                "mintsr": {
                                                        "min": {
                                                                "field": "tsr"
                                                        }
                                                },
                                                "maxvd": {
                                                        "max": {
                                                                "field": "vd"
                                                        }
                                                },
                                                "minvd": {
                                                        "min": {
                                                                "field": "vd"
                                                        }
                                                }
                                        }
                                }
                        }
                }
        }
}

@mattdawson
Author

On further thought, I think the k=0 stream is correct because its DST values were created by extended bounds, and the second failed because it actually had data.

@cbuescher
Member

@mattdawson can you take a look at #18326 and the fix provided in #18415 to see if this solves your problem? I have looked at the queries you are using and this looks very similar, however I'm not fully able to understand your use case or check this with the patch from #18415 without a better understanding of your data. If looking at the two issues I linked to doesn't help, can you provide a minimal example with a few datapoints?

@mattdawson
Author

mattdawson commented May 20, 2016

@cbuescher, the code change appears to add one non-DST-adjusted unit inside TimeUnitRounding.nextRoundingValue for intervals of less than 1 day. Although the outcome is a consistent dual bucket for both extended and non-extended result sets, I believe it's the wrong approach. What you want is a single bucket covering 2 hours for the DST changeover. The reason is that some DST changeovers are only 30 minutes, and this method would not behave consistently between 1-hour and 30-minute DST changes.
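
The difference between the two outcomes discussed here (two buckets versus a single two-hour bucket) can be sketched in Python. This is not the TimeUnitRounding implementation; it only contrasts flooring in UTC with flooring the local wall-clock time. The hard-coded Auckland offsets (valid only around the April 2016 end of DST) and the helper names `to_auckland`, `utc_hour_key`, and `local_hour_key` are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: Pacific/Auckland is UTC+13 before 2016-04-02T14:00Z and UTC+12 after.
DST_END_UTC = datetime(2016, 4, 2, 14, 0, tzinfo=timezone.utc)
NZDT = timezone(timedelta(hours=13))
NZST = timezone(timedelta(hours=12))


def to_auckland(instant_utc):
    """Convert a UTC instant to Auckland local time (valid only near April 2016)."""
    return instant_utc.astimezone(NZDT if instant_utc < DST_END_UTC else NZST)


def utc_hour_key(instant_utc):
    """Floor in UTC: every bucket covers exactly one elapsed hour."""
    return instant_utc.replace(minute=0, second=0, microsecond=0)


def local_hour_key(instant_utc):
    """Floor the wall-clock time and drop the offset: one bucket per local hour."""
    local = to_auckland(instant_utc)
    return local.replace(minute=0, second=0, microsecond=0, tzinfo=None)


# One sample every 30 minutes from 12:00Z to 15:30Z, spanning the crossover.
start = datetime(2016, 4, 2, 12, 0, tzinfo=timezone.utc)
samples = [start + timedelta(minutes=30 * i) for i in range(8)]

utc_buckets = Counter(utc_hour_key(s) for s in samples)
local_buckets = Counter(local_hour_key(s) for s in samples)

print(len(utc_buckets), "buckets when flooring in UTC")
print(len(local_buckets), "buckets when flooring wall-clock time")
```

Flooring wall-clock time folds both occurrences of the repeated local hour into one bucket that spans two elapsed hours, which is the single-bucket behaviour argued for above; flooring in UTC keeps them apart but produces two buckets with the same local label.
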

@matsondawson

matsondawson commented May 30, 2016

@cbuescher,
That change breaks 15 min intervals. The buckets seem to change one hour early. And I'm getting overlapping data results. In the data below the min of the next bucket should be more than the max of the previous bucket.
key_as_string is in Pacific/Auckland time. It should change to +12:00 at 3am, but it's changing at 2am.

{ key_as_string: '2016-04-03T01:45:00.000+13:00', mintsr: '2016-04-02T12:45:04.523Z', maxtsr: '2016-04-02T12:50:00.728Z', minvd: 6127.3, maxvd: 6127.3 }
{ key_as_string: '2016-04-03T02:00:00.000+12:00', mintsr: '2016-04-02T13:00:00.755Z', maxtsr: '2016-04-02T14:10:00.814Z', minvd: 6127.4, maxvd: 6127.9 }
{ key_as_string: '2016-04-03T02:15:00.000+12:00', mintsr: '2016-04-02T13:15:00.682Z', maxtsr: '2016-04-02T14:25:00.722Z', minvd: 6127.5, maxvd: 6128.0 }
{ key_as_string: '2016-04-03T02:30:00.000+12:00', mintsr: '2016-04-02T13:30:00.789Z', maxtsr: '2016-04-02T14:40:00.684Z', minvd: 6127.6, maxvd: 6128.1 }
{ key_as_string: '2016-04-03T02:45:00.000+12:00', mintsr: '2016-04-02T13:45:05.004Z', maxtsr: '2016-04-02T14:55:05.685Z', minvd: 6127.7, maxvd: 6128.2 }
{ key_as_string: '2016-04-03T03:00:00.000+12:00', mintsr: '2016-04-02T15:00:06.389Z', maxtsr: '2016-04-02T15:10:00.729Z', minvd: 6128.3, maxvd: 6128.3 }
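
The invariant described above (the minimum timestamp of each bucket should be later than the maximum of the previous bucket) can be checked mechanically. The following Python sketch uses the key_as_string, mintsr, and maxtsr values copied from the buckets listed above; the tuple layout and the `find_overlaps` helper are assumptions for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"  # matches the UTC timestamps in the output above

# (key_as_string, mintsr, maxtsr) copied from the listed buckets.
buckets = [
    ("2016-04-03T01:45:00.000+13:00", "2016-04-02T12:45:04.523Z", "2016-04-02T12:50:00.728Z"),
    ("2016-04-03T02:00:00.000+12:00", "2016-04-02T13:00:00.755Z", "2016-04-02T14:10:00.814Z"),
    ("2016-04-03T02:15:00.000+12:00", "2016-04-02T13:15:00.682Z", "2016-04-02T14:25:00.722Z"),
    ("2016-04-03T02:30:00.000+12:00", "2016-04-02T13:30:00.789Z", "2016-04-02T14:40:00.684Z"),
    ("2016-04-03T02:45:00.000+12:00", "2016-04-02T13:45:05.004Z", "2016-04-02T14:55:05.685Z"),
    ("2016-04-03T03:00:00.000+12:00", "2016-04-02T15:00:06.389Z", "2016-04-02T15:10:00.729Z"),
]


def find_overlaps(buckets):
    """Return pairs of adjacent bucket keys where the next min is not after the previous max."""
    overlaps = []
    for prev, nxt in zip(buckets, buckets[1:]):
        prev_max = datetime.strptime(prev[2], FMT)
        next_min = datetime.strptime(nxt[1], FMT)
        if next_min <= prev_max:
            overlaps.append((prev[0], nxt[0]))
    return overlaps


overlaps = find_overlaps(buckets)
for prev_key, next_key in overlaps:
    print("overlap between", prev_key, "and", next_key)
```

On this data the check flags the adjacent pairs among the four 02:xx buckets, consistent with each of those buckets holding documents from both occurrences of the repeated local hour.
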

@cbuescher
Member

@matsondawson This is a different problem, unrelated to #18415, although from a user perspective it appears related. If you use 15 min intervals, ES uses TimeIntervalRounding internally, which wasn't affected by that change. I'm pretty sure there are edge cases with arbitrary interval lengths around DST changes that have glitches in them, so can I ask you again for an example containing a few data points and a query that doesn't work for you? Then I will look into this again.

@jpountz
Contributor

jpountz commented Oct 24, 2016

Closing due to lack of feedback.

@jpountz jpountz closed this as completed Oct 24, 2016