date_histogram w/ extended_bounds fails on alias/index name #19009

Closed
wrobstory opened this issue Jun 21, 2016 · 4 comments · Fixed by #19085

@wrobstory

Elasticsearch version: 2.3.2

JVM version: 1.7.0_67

OS version: OSX 10.11.4

Description of the problem including expected versus actual behavior:

Start with two indices and an alias covering both; the second index introduces an additional field, when_recorded:

curl -XPUT 'http://localhost:9200/test_index_1/dates/1?pretty' -d '{"when_received": "2016-04-25T13:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/2?pretty' -d '{"when_received": "2016-05-28T14:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/3?pretty' -d '{"when_received": "2016-06-28T17:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/4?pretty' -d '{"when_received": "2016-06-29T17:21:24.000Z"}'

curl -XPUT 'http://localhost:9200/test_index_2/dates/1?pretty' -d '{"when_recorded": "2016-04-25T13:21:24.000Z", "when_received": "2015-04-25T13:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_2/dates/2?pretty' -d '{"when_recorded": "2016-05-28T14:21:24.000Z", "when_received": "2015-05-28T14:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_2/dates/3?pretty' -d '{"when_recorded": "2016-06-28T17:21:24.000Z", "when_received": "2015-06-28T17:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_2/dates/4?pretty' -d '{"when_recorded": "2016-06-29T17:21:24.000Z", "when_received": "2015-06-29T17:21:24.000Z"}'

curl -XPOST 'http://localhost:9200/test_index_1/_refresh'
curl -XPOST 'http://localhost:9200/test_index_2/_refresh'

curl -XPOST 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
        { "add" : { "index" : "test_index_1", "alias" : "all_indices" } },
        { "add" : { "index" : "test_index_2", "alias" : "all_indices" } }
    ]
}'
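
To double-check that the alias was created (a verification step that isn't part of the original recreation), listing the aliases should show all_indices attached to both indices; the exact response shape may vary by Elasticsearch version:

curl -XGET 'http://localhost:9200/_aliases?pretty'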

I want to do a date_histogram aggregation over the alias with extended_bounds. The results for each index individually are what I would expect:

curl -XGET 'http://localhost:9200/test_index_1/_search?pretty' -d '{
    "size": 0,
    "aggs": 
    {"monthly_date_histogram": 
      {"date_histogram": {"field": "when_recorded", 
                          "interval": "month",
                          "min_doc_count": 0,
                          "extended_bounds": {"max": "now", "min": "now-5M"}}}}
}
'

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "monthly_date_histogram" : {
      "buckets" : [ ]
    }
  }
}
curl -XGET 'http://localhost:9200/test_index_2/_search?pretty' -d '{
    "size": 0,
    "aggs": 
    {"monthly_date_histogram": 
      {"date_histogram": {"field": "when_recorded", 
                          "interval": "month",
                          "min_doc_count": 0,
                          "extended_bounds": {"max": "now", "min": "now-5M"}}}}
}
'

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "monthly_date_histogram" : {
      "buckets" : [ {
        "key_as_string" : "2016-01-01T00:00:00.000Z",
        "key" : 1451606400000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-02-01T00:00:00.000Z",
        "key" : 1454284800000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-03-01T00:00:00.000Z",
        "key" : 1456790400000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-04-01T00:00:00.000Z",
        "key" : 1459468800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-05-01T00:00:00.000Z",
        "key" : 1462060800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-06-01T00:00:00.000Z",
        "key" : 1464739200000,
        "doc_count" : 2
      } ]
    }
  }
}

However, when using the alias, the extended_bounds fail:

curl -XGET 'http://localhost:9200/all_indices/_search?pretty' -d '{
    "size": 0,
    "aggs": 
    {"monthly_date_histogram": 
      {"date_histogram": {"field": "when_recorded", 
                          "interval": "month",
                          "min_doc_count": 0,
                          "extended_bounds": {"max": "now", "min": "now-5M"}}}}
}
'

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "failed" : 0
  },
  "hits" : {
    "total" : 8,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "monthly_date_histogram" : {
      "buckets" : [ {
        "key_as_string" : "2016-04-01T00:00:00.000Z",
        "key" : 1459468800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-05-01T00:00:00.000Z",
        "key" : 1462060800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-06-01T00:00:00.000Z",
        "key" : 1464739200000,
        "doc_count" : 2
      } ]
    }
  }
}

Here's the tricky part: this behavior depends on the actual index name. Same steps, but a different name for the second index:

curl -XPUT 'http://localhost:9200/test_index_1/dates/1?pretty' -d '{"when_received": "2016-04-25T13:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/2?pretty' -d '{"when_received": "2016-05-28T14:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/3?pretty' -d '{"when_received": "2016-06-28T17:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/4?pretty' -d '{"when_received": "2016-06-29T17:21:24.000Z"}'

curl -XPUT 'http://localhost:9200/foobar/dates/1?pretty' -d '{"when_recorded": "2016-04-25T13:21:24.000Z", "when_received": "2015-04-25T13:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/foobar/dates/2?pretty' -d '{"when_recorded": "2016-05-28T14:21:24.000Z", "when_received": "2015-05-28T14:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/foobar/dates/3?pretty' -d '{"when_recorded": "2016-06-28T17:21:24.000Z", "when_received": "2015-06-28T17:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/foobar/dates/4?pretty' -d '{"when_recorded": "2016-06-29T17:21:24.000Z", "when_received": "2015-06-29T17:21:24.000Z"}'

curl -XPOST 'http://localhost:9200/test_index_1/_refresh'
curl -XPOST 'http://localhost:9200/foobar/_refresh'

curl -XPOST 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
        { "add" : { "index" : "test_index_1", "alias" : "all_indices" } },
        { "add" : { "index" : "foobar", "alias" : "all_indices" } }
    ]
}'

The results for each index:

curl -XGET 'http://localhost:9200/test_index_1/_search?pretty' -d '{
    "size": 0,
    "aggs": 
    {"monthly_date_histogram": 
      {"date_histogram": {"field": "when_recorded", 
                          "interval": "month",
                          "min_doc_count": 0,
                          "extended_bounds": {"max": "now", "min": "now-5M"}}}}
}
'

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "monthly_date_histogram" : {
      "buckets" : [ ]
    }
  }
}
curl -XGET 'http://localhost:9200/foobar/_search?pretty' -d '{
    "size": 0,
    "aggs": 
    {"monthly_date_histogram": 
      {"date_histogram": {"field": "when_recorded", 
                          "interval": "month",
                          "min_doc_count": 0,
                          "extended_bounds": {"max": "now", "min": "now-5M"}}}}
}
'

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "monthly_date_histogram" : {
      "buckets" : [ {
        "key_as_string" : "2016-01-01T00:00:00.000Z",
        "key" : 1451606400000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-02-01T00:00:00.000Z",
        "key" : 1454284800000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-03-01T00:00:00.000Z",
        "key" : 1456790400000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-04-01T00:00:00.000Z",
        "key" : 1459468800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-05-01T00:00:00.000Z",
        "key" : 1462060800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-06-01T00:00:00.000Z",
        "key" : 1464739200000,
        "doc_count" : 2
      } ]
    }
  }
}

Except this time, the aggregation over the alias works as expected:

curl -XGET 'http://localhost:9200/all_indices/_search?pretty' -d '{
    "size": 0,
    "aggs": 
    {"monthly_date_histogram": 
      {"date_histogram": {"field": "when_recorded", 
                          "interval": "month",
                          "min_doc_count": 0,
                          "extended_bounds": {"max": "now", "min": "now-5M"}}}}
}
'

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "failed" : 0
  },
  "hits" : {
    "total" : 8,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "monthly_date_histogram" : {
      "buckets" : [ {
        "key_as_string" : "2016-01-01T00:00:00.000Z",
        "key" : 1451606400000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-02-01T00:00:00.000Z",
        "key" : 1454284800000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-03-01T00:00:00.000Z",
        "key" : 1456790400000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-04-01T00:00:00.000Z",
        "key" : 1459468800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-05-01T00:00:00.000Z",
        "key" : 1462060800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-06-01T00:00:00.000Z",
        "key" : 1464739200000,
        "doc_count" : 2
      } ]
    }
  }
}

The steps to reproduce are above, and I would expect the query against the alias to respect the extended_bounds parameter no matter what the index names are.
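
One extra diagnostic step that is not part of the original report: comparing the two mappings should confirm that when_recorded is only mapped in the second index, which is why test_index_1 on its own returns no buckets for that field:

curl -XGET 'http://localhost:9200/test_index_1/_mapping?pretty'
curl -XGET 'http://localhost:9200/test_index_2/_mapping?pretty'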

@clintongormley

Thanks for the clear recreation. Actually, I'd disagree with the output from test_index_1 being correct. You've asked for extended bounds and yet you get no buckets back at all? I think all buckets should be returned instead.

@colings86 could you take a look please?

@colings86

Hmm, I haven't yet run your recreation @wrobstory (thanks for such a complete explanation/recreation, btw), but it looks like the problem here is that the extended bounds information is not sent back to the coordinating node in the shard response when the shard had no matching documents. We arbitrarily pick one of the shard responses to use as the guide for the reduce phase (it's actually the first in the list, and I wouldn't be surprised if that list is sorted by index name), so if we pick one which matched no documents, the final step of completing the extended bounds is never performed. That would explain the weird behaviour where the name of the index makes a difference to the result. In theory it should be an easy fix: send the extended bounds information back as part of the empty aggregation response. I'll look into making this change soon.
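
If you want to sanity-check the index-name ordering hypothesis, one possible variation of the recreation (zzz_index is a made-up name, not from the report; best run on a clean cluster or after deleting the earlier indices and alias) is to give the second index a name that also sorts after test_index_1. If the ordering guess is right, the alias query should again drop the extended bounds, while a name that sorts before test_index_1 (like foobar above) keeps them:

curl -XPUT 'http://localhost:9200/zzz_index/dates/1?pretty' -d '{"when_recorded": "2016-04-25T13:21:24.000Z", "when_received": "2015-04-25T13:21:24.000Z"}'
# ...index documents 2-4 as in the recreation above, then:
curl -XPOST 'http://localhost:9200/zzz_index/_refresh'
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
        { "add" : { "index" : "test_index_1", "alias" : "all_indices" } },
        { "add" : { "index" : "zzz_index", "alias" : "all_indices" } }
    ]
}'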

@colings86

@wrobstory I have raised #19085 to address this issue.

colings86 added a commit that referenced this issue Jun 27, 2016
Previous to this change, the unresolved extended bounds were passed into the histogram aggregator, which meant extendedbounds.min and extendedbounds.max were passed through as null. This had two effects on the histogram aggregator:

1. If the histogram aggregator was unmapped across all shards, the reduce phase would not add buckets for the extended bounds and the response would contain zero buckets.
2. If the histogram aggregator was unmapped on only some shards, the reduce phase might sometimes choose to reduce based on an unmapped shard's response, and the extended bounds would therefore be ignored.

This change resolves the extended bounds in the unmapped case and solves the above two issues.

Closes #19009
@wrobstory

Thanks for the great communication and quick turnaround! Much appreciated!
