Strange bucket for date_histogram with negative post_zone and zero min_doc_count #7673

Closed
glelouarn opened this issue Sep 10, 2014 · 5 comments
@glelouarn

I sent the following request to my Elasticsearch 1.3.2 server at http://localhost:9200/payment_prod/2002/_search?search_type=count, with negative post_zone and pre_zone:

{
    "query" : {
        "filtered" : {
            "query" : {
                "match_all" : {}
            }
        }
    },
    "aggs" : {
        "by_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month",
                "post_zone" : -2,
                "pre_zone" : -2,
                "min_doc_count" : 0,
                "format" : "yyyy-MM-dd--HH:mm:ss.SSSZ"
            }
        }
    }
}

It seems to me that the bucket 2013-07-30--22:00:00.000+0000 should not be in the response:

{
    "took" : 105,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits" : {
        "total" : 4018428,
        "max_score" : 0.0,
        "hits" : []
    },
    "aggregations" : {
        "by_time" : {
            "buckets" : [{
                    "key_as_string" : "2013-06-30--22:00:00.000+0000",
                    "key" : 1372629600000,
                    "doc_count" : 235258
                }, {
                    "key_as_string" : "2013-07-30--22:00:00.000+0000",
                    "key" : 1375221600000,
                    "doc_count" : 0
                }, {
                    "key_as_string" : "2013-07-31--22:00:00.000+0000",
                    "key" : 1375308000000,
                    "doc_count" : 341928
                }, {
                    "key_as_string" : "2013-08-31--22:00:00.000+0000",
                    "key" : 1377986400000,
                    "doc_count" : 330148
                }
            ]
        }
    }
}
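
For reference, the spacing between the bucket keys in this response can be checked with a small sketch like the one below. It only uses java.time from the JDK; the class name BucketGaps and the helper gapsInDays are made up for illustration, and the keys are copied from the response.

```java
import java.time.Duration;
import java.time.Instant;

public class BucketGaps {
    // Bucket keys (epoch milliseconds) copied from the response above.
    static final long[] KEYS = {
        1372629600000L, 1375221600000L, 1375308000000L, 1377986400000L
    };

    // Hypothetical helper: whole days between each pair of consecutive keys.
    static long[] gapsInDays(long[] keys) {
        long[] gaps = new long[keys.length - 1];
        for (int i = 1; i < keys.length; i++) {
            gaps[i - 1] = Duration.between(
                Instant.ofEpochMilli(keys[i - 1]),
                Instant.ofEpochMilli(keys[i])).toDays();
        }
        return gaps;
    }

    public static void main(String[] args) {
        long[] gaps = gapsInDays(KEYS);
        for (int i = 0; i < gaps.length; i++) {
            System.out.println(Instant.ofEpochMilli(KEYS[i]) + " -> "
                + Instant.ofEpochMilli(KEYS[i + 1]) + ": " + gaps[i] + " days");
        }
    }
}
```

The gaps are 30, 1 and 31 days, so the empty bucket sits exactly one day before the expected month boundary.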

After a small change to the request, using positive values for post_zone and pre_zone (same URL, http://localhost:9200/payment_prod/2002/_search?search_type=count):

{
    "query" : {
        "filtered" : {
            "query" : {
                "match_all" : {}
            }
        }
    },
    "aggs" : {
        "by_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month",
                "post_zone" : 2,
                "pre_zone" : 2,
                "min_doc_count" : 0,
                "format" : "yyyy-MM-dd--HH:mm:ss.SSSZ"
            }
        }
    }
}

In that case the response looks valid to me, without the strange bucket:

{
    "took" : 87,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits" : {
        "total" : 4018428,
        "max_score" : 0.0,
        "hits" : []
    },
    "aggregations" : {
        "by_time" : {
            "buckets" : [{
                    "key_as_string" : "2013-07-01--02:00:00.000+0000",
                    "key" : 1372644000000,
                    "doc_count" : 233384
                }, {
                    "key_as_string" : "2013-08-01--02:00:00.000+0000",
                    "key" : 1375322400000,
                    "doc_count" : 341918
                }
            ]
        }
    }
}

In fact, the problem occurs when post_zone < 0 is combined with min_doc_count = 0. If either condition is removed, the result seems more reliable.

Am I wrong, or is there a problem with how Elasticsearch handles post_zone?

@clintongormley

thanks for the report @glelouarn - we'll take a look

glelouarn pushed a commit to glelouarn/elasticsearch that referenced this issue Dec 22, 2014
… with

min_doc_count = 0 and negative post_zone.
@glelouarn
Author

Hi, I worked on it and opened pull request #9029.

@cbuescher
Member

I encountered similar problems while working on #9062. The problem is in the way Rounding.nextRoundingValue() is used when adding empty buckets in InternalHistogram.addEmptyBuckets(). The assumption there is that repeatedly adding a time duration (like 1M) to the key of an existing (non-empty) bucket, in order to fill the histogram with empty buckets, always ends up exactly on the key of the next non-empty bucket.

E.g. in the example above one would expect

nextRoundingValue("2013-06-30--22:00:00.000+0000") => "2013-07-31--22:00:00.000+0000"
(well, that is 2 hours before the 1st of the next month)

Internally we use DurationField.add() from Joda-Time, but that works in a slightly different way (at least for months). E.g. if you start with 2014-01-31T22:00:00.000Z and then add 1-month durations consecutively, you get:

2014-01-31T22:00:00.000Z + month -> 2014-02-28T22:00:00.000Z
2014-02-28T22:00:00.000Z + month -> 2014-03-28T22:00:00.000Z
2014-03-28T22:00:00.000Z + month -> 2014-04-28T22:00:00.000Z
etc...

but the following rounded non-empty buckets will have the keys 2014-03-31T22:00:00.000Z, 2014-04-30T22:00:00.000Z, etc.

The time-zone offset is just one way to run into this. I think that any offset (positive ones as well) that puts bucket keys on a day_of_month in the range 28-31 will likely result in the same glitch, at least for DateTimeUnit.MONTH_OF_YEAR and above.
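
The drift described above can be reproduced with a minimal sketch. It makes two assumptions: it uses java.time from the JDK as a stand-in for Joda-Time (LocalDateTime.plusMonths also clamps the day-of-month to the last valid day of the target month), and it treats all values as plain UTC date-times. The class MonthDrift and the helpers chainMonths and roundedKey are hypothetical names for illustration, not Elasticsearch code.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

public class MonthDrift {
    // Chain "+1 month" from a start key, feeding each result into the next step.
    static List<LocalDateTime> chainMonths(LocalDateTime start, int steps) {
        List<LocalDateTime> keys = new ArrayList<>();
        LocalDateTime current = start;
        for (int i = 0; i < steps; i++) {
            // plusMonths keeps the day-of-month and clamps it to the last
            // valid day of the target month (31 Jan -> 28 Feb in 2014).
            current = current.plusMonths(1);
            keys.add(current);
        }
        return keys;
    }

    // Key of a rounded month bucket: first of the month at midnight,
    // shifted by the given offset in hours (assumed model of the zone shift).
    static LocalDateTime roundedKey(int year, int month, int offsetHours) {
        return LocalDateTime.of(year, month, 1, 0, 0).plusHours(offsetHours);
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.of(2014, 1, 31, 22, 0);
        List<LocalDateTime> chained = chainMonths(start, 3);
        for (int i = 0; i < chained.size(); i++) {
            // Rounded keys for the buckets starting 1 Mar, 1 Apr, 1 May (offset -2h).
            LocalDateTime rounded = roundedKey(2014, 3 + i, -2);
            System.out.println("chained=" + chained.get(i) + " rounded=" + rounded);
        }
    }
}
```

Under the same assumptions, adding one month to 2013-06-30T22:00 gives 2013-07-30T22:00, which matches the unexpected empty bucket in the original report, while the rounded key for the next bucket is 2013-07-31T22:00.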

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Feb 20, 2015
This fix enhances the internal time zone conversion in the
TimeZoneRounding classes that were the cause of issues with
strange date bucket keys in elastic#9491 and elastic#7673.

Closes elastic#9491
Closes elastic#7673
cbuescher pushed a commit that referenced this issue Feb 23, 2015
This fix enhances the internal time zone conversion in the
TimeZoneRounding classes that were the cause of issues with
strange date bucket keys in #9491 and #7673.

Closes #9491
Closes #7673
@cbuescher
Member

I think this is fixed on 1.4 and 1.x with #9790.

@cbuescher
Member

The fix for this will be in the next release (either 1.4.5 or 1.5).

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
This fix enhances the internal time zone conversion in the
TimeZoneRounding classes that were the cause of issues with
strange date bucket keys in elastic#9491 and elastic#7673.

Closes elastic#9491
Closes elastic#7673