Strange (wrong?) behavior with date histogram aggregation in a field that is inside a nested object (Elasticsearch 7.10.0) #65624

Closed
tberne opened this issue Nov 30, 2020 · 6 comments · Fixed by #65707
Labels: :Analytics/Aggregations, >bug, Team:Analytics

tberne commented Nov 30, 2020

Elasticsearch version (bin/elasticsearch --version): 7.10.0 (Docker)

Plugins installed: [Analysis-ICU]

JVM version (java -version): (the one from Docker image)

OS version (uname -a if on a Unix-like system): (the one from Docker image)

Description of the problem including expected versus actual behavior:

The date histogram aggregation returns a strange (wrong) number of buckets.
In 7.9.3 it returned 10 buckets, but 7.10.0 returns 14 buckets.
The interval set in the aggregation is 1d, yet 7.10.0 produces some buckets that are only 3 hours apart.

I've created a Dockerfile to reproduce the problem:

tmp.tar.gz

To run, execute run-bug-container.sh and check the last request aggregation result.

To ease analysis, I've created two pastebins with the responses from 7.9.3 and 7.10.0:

7.9.3: https://pastebin.com/ms5dFefT

7.10.0: https://pastebin.com/3W9kQg4h

Notice that in 7.9.3 it returns 10 buckets with 1 day interval (as expected) but in 7.10.0 it produces 14 buckets (some with 3 hour interval).
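One way to confirm this symptom on a saved response is to check the spacing between consecutive bucket keys. The sketch below is a hypothetical helper, not part of the attached reproduction. It assumes the bucket keys have already been extracted from the response as epoch milliseconds, and the sample keys are made up for illustration.

```python
DAY_MS = 24 * 60 * 60 * 1000
HOUR_MS = 60 * 60 * 1000


def irregular_gaps(keys, interval_ms=DAY_MS):
    """Return (previous_key, key, gap_ms) for neighbours not exactly one interval apart."""
    ordered = sorted(keys)
    return [
        (prev, cur, cur - prev)
        for prev, cur in zip(ordered, ordered[1:])
        if cur - prev != interval_ms
    ]


# Made-up keys: three daily buckets, then the same list with one stray
# key that sits 3 hours after the first bucket.
start = 1522551600000
good = [start + i * DAY_MS for i in range(3)]
bad = good + [start + 3 * HOUR_MS]

print(irregular_gaps(good))
print(irregular_gaps(bad))
```

An empty list means every bucket is exactly one interval after the previous one. Any entry points at a pair of neighbouring keys worth inspecting.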

To change from 7.10.0 to 7.9.3 in my example, just change the "FROM" in Dockerfile.

All steps to populate/reproduce the problem are inside run.sh file (in curl format).

Steps to reproduce:

Run run-bug-container.sh in the given example.

@tberne added the >bug and needs:triage labels Nov 30, 2020

tberne commented Nov 30, 2020

I've also posted about this problem in the forums.

@polyfractal added the :Analytics/Aggregations label and removed the needs:triage label Nov 30, 2020
@elasticmachine added the Team:Analytics label Nov 30, 2020
@elasticmachine

Pinging @elastic/es-analytics-geo (Team:Analytics)

@nik9000 self-assigned this Nov 30, 2020

nik9000 commented Nov 30, 2020

That wasn't simple to reproduce, but something is certainly up here. I think it's related to #62983. I was able to boil it down to something like:

curl -XDELETE -uelastic:password -HContent-type:application/json http://localhost:9200/test
curl -XDELETE -uelastic:password -HContent-type:application/json http://localhost:9200/not_it
curl -XPUT -uelastic:password -HContent-type:application/json http://localhost:9200/not_it -d '{
  "mappings": {
    "properties": {
    }
  }
}'

curl -XPUT -uelastic:password -HContent-type:application/json http://localhost:9200/test -d '{
  "mappings": {
    "properties": {
      "nested": {
        "type": "nested",
        "properties": {
          "date": {
            "index": true,
            "type": "date"
          }
        }
      }
    }
  }
}'

curl -XPOST -uelastic:password -HContent-type:application/json 'http://localhost:9200/test/_bulk?pretty&refresh' -d '
{"index": {}}
{"nested": {"date": 1522781154000}}
{"index": {}}
{"nested": {"date": 1522763154000}}
{"index": {}}
{"nested": {"date": 1522867554000}}
{"index": {}}
{"nested": {"date": 1523126754000}}
{"index": {}}
{"nested": {"date": 1522694754000}}
{"index": {}}
{"nested": {"date": 1522766754000}}
'

curl -XPOST -uelastic:password -HContent-type:application/json 'http://localhost:9200/*/_search?pretty' -d '{
  "aggs": {
    "nested": {
      "nested": {
        "path": "nested"
      },
      "aggs": {
        "histo": {
          "date_histogram": {
            "field": "nested.date",
            "interval": "1d",
            "offset":10800000,
            "order":{"_key":"asc"},
            "keyed":false,
            "extended_bounds":{
              "min":1522551600000,
              "max":1523329200000
            }
          }
        }
      }
    }
  }
}'

It has something to do with the index that doesn't have the mapped nested field. I wonder if we're doubling up the offset somehow when we are in the "unmapped" case.
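For reference, the bucket keys this request should produce can be worked out by hand. The sketch below is plain Python arithmetic, not Elasticsearch code. It assumes a fixed 24-hour day in UTC and that a bucket key is computed as `floor((t - offset) / interval) * interval + offset`, which is a simplification of the real `Rounding` logic. The function names are made up for illustration; the constants are copied from the reproduction above.

```python
from collections import Counter

DAY_MS = 86_400_000              # "interval": "1d", assumed to be a fixed 24 hours
OFFSET_MS = 10_800_000           # "offset" from the request (3 hours)
BOUNDS_MIN = 1_522_551_600_000   # extended_bounds.min
BOUNDS_MAX = 1_523_329_200_000   # extended_bounds.max
DOC_DATES = [
    1522781154000, 1522763154000, 1522867554000,
    1523126754000, 1522694754000, 1522766754000,
]


def bucket_key(ts, interval=DAY_MS, offset=OFFSET_MS):
    """Round a timestamp down to the start of its offset-shifted bucket."""
    return (ts - offset) // interval * interval + offset


def expected_keys(lo=BOUNDS_MIN, hi=BOUNDS_MAX, interval=DAY_MS):
    """Every bucket key from the bucket holding lo to the bucket holding hi."""
    return list(range(bucket_key(lo), bucket_key(hi) + 1, interval))


keys = expected_keys()
doc_counts = Counter(bucket_key(ts) for ts in DOC_DATES)

print(len(keys))  # number of buckets when the offset is applied once
for key in keys:
    print(key, doc_counts.get(key, 0))
```

Under these assumptions the bounds alone span 10 daily buckets, each starting 3 hours after midnight UTC, and all six documents fall inside that range. Any key that does not appear in this list is a sign that the offset was applied differently somewhere.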


nik9000 commented Dec 1, 2020

The tar.gz reproduction had a bunch of red herrings - icu, a couple of extra indices with complex configuration and a bunch of unrelated documents.


tberne commented Dec 1, 2020

@nik9000, I generated the tar.gz from a JUnit test execution, which is why it has some strange naming. It was the cleanest reproducible scenario I could reach without spending much time.
Sorry for that.

nik9000 added a commit to nik9000/elasticsearch that referenced this issue Dec 1, 2020
`date_histogram` has a bug with `offset` and `extended_bounds` when it
needs to create an "empty" aggregation result: it includes the bounds
twice! Wooops!

I broke this a while back when I started trying to merge `offset` into
`Rounding`. I never finished that merge, sadly. Interestingly, we've
discovered that the merge is required to properly handle daylight
savings time (elastic#56305) but it isn't really something we're looking to
solve today. For now, this just stops counting the offset twice.

Closes elastic#65624
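The effect of counting the offset twice can be illustrated with plain arithmetic. The sketch below is not the Elasticsearch code and does not try to reproduce its exact output. It assumes, purely for illustration, that the "empty" result shifts the `extended_bounds` by the offset one extra time before generating keys, and that the two key sets are then simply merged. It also relies on the bounds from the reproduction already being aligned to the offset bucket starts. All names are hypothetical.

```python
DAY_MS = 86_400_000
OFFSET_MS = 10_800_000           # 3 hours, from the reproduction request
BOUNDS_MIN = 1_522_551_600_000   # extended_bounds.min from the reproduction
BOUNDS_MAX = 1_523_329_200_000   # extended_bounds.max from the reproduction


def keys_between(lo, hi, interval=DAY_MS):
    """Bucket keys from lo to hi inclusive, one interval apart."""
    return list(range(lo, hi + 1, interval))


# Keys when the offset is counted once (the bounds are already aligned).
correct = keys_between(BOUNDS_MIN, BOUNDS_MAX)

# Assumed faulty path: the bounds are shifted by the offset a second time.
shifted = keys_between(BOUNDS_MIN + OFFSET_MS, BOUNDS_MAX + OFFSET_MS)

# Naive merge of both results, standing in for the reduce step.
merged = sorted(set(correct) | set(shifted))
gaps = sorted({b - a for a, b in zip(merged, merged[1:])})

print(len(correct), len(merged))
print([g // 3_600_000 for g in gaps])  # distinct gaps between neighbours, in hours
```

With these assumptions every extra key lands 3 hours after a correct one, which matches the 3-hour spacing in the report. The real bucket count depends on how the reduce phase fills gaps, so the count here is not meant to match the 14 buckets seen in 7.10.0.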

nik9000 commented Dec 1, 2020

> @nik9000, I generated the tar.gz from a JUnit test execution, which is why it has some strange naming. It was the cleanest reproducible scenario I could reach without spending much time.
> Sorry for that.

No big deal! I got to it. It took longer to prune the bug down to a more minimal reproduction than it took to actually fix it.

nik9000 added a commit that referenced this issue Dec 7, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Dec 7, 2020
nik9000 added a commit that referenced this issue Dec 7, 2020

* Fixup
rjernst pushed a commit to mark-vieira/elasticsearch that referenced this issue Dec 11, 2020
alyokaz pushed a commit to alyokaz/elasticsearch that referenced this issue Mar 10, 2021

4 participants