date_histogram w/ extended_bounds fails on alias/index name #19009

Closed
wrobstory opened this issue Jun 21, 2016 · 4 comments · Fixed by #19085

@wrobstory

Elasticsearch version: 2.3.2

JVM version: 1.7.0_67

OS version: OSX 10.11.4

Description of the problem including expected versus actual behavior:

Start with two indices and an alias covering both; the second index introduces an additional field, when_recorded:

curl -XPUT 'http://localhost:9200/test_index_1/dates/1?pretty' -d '{"when_received": "2016-04-25T13:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/2?pretty' -d '{"when_received": "2016-05-28T14:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/3?pretty' -d '{"when_received": "2016-06-28T17:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/4?pretty' -d '{"when_received": "2016-06-29T17:21:24.000Z"}'

curl -XPUT 'http://localhost:9200/test_index_2/dates/1?pretty' -d '{"when_recorded": "2016-04-25T13:21:24.000Z", "when_received": "2015-04-25T13:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_2/dates/2?pretty' -d '{"when_recorded": "2016-05-28T14:21:24.000Z", "when_received": "2015-05-28T14:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_2/dates/3?pretty' -d '{"when_recorded": "2016-06-28T17:21:24.000Z", "when_received": "2015-06-28T17:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_2/dates/4?pretty' -d '{"when_recorded": "2016-06-29T17:21:24.000Z", "when_received": "2015-06-29T17:21:24.000Z"}'

curl -XPOST 'http://localhost:9200/test_index_1/_refresh'
curl -XPOST 'http://localhost:9200/test_index_2/_refresh'

curl -XPOST 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
        { "add" : { "index" : "test_index_1", "alias" : "all_indices" } },
        { "add" : { "index" : "test_index_2", "alias" : "all_indices" } }
    ]
}'
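
To double-check that the alias was created (a verification step that isn't part of the original recreation), listing the aliases should show all_indices attached to both indices; the exact response shape may vary by Elasticsearch version:

curl -XGET 'http://localhost:9200/_aliases?pretty'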

I want to do a date_histogram aggregation over the alias with extended_bounds. The results for each index individually are what I would expect:

curl -XGET 'http://localhost:9200/test_index_1/_search?pretty' -d '{
    "size": 0,
    "aggs": 
    {"monthly_date_histogram": 
      {"date_histogram": {"field": "when_recorded", 
                          "interval": "month",
                          "min_doc_count": 0,
                          "extended_bounds": {"max": "now", "min": "now-5M"}}}}
}
'

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "monthly_date_histogram" : {
      "buckets" : [ ]
    }
  }
}
curl -XGET 'http://localhost:9200/test_index_2/_search?pretty' -d '{
    "size": 0,
    "aggs": 
    {"monthly_date_histogram": 
      {"date_histogram": {"field": "when_recorded", 
                          "interval": "month",
                          "min_doc_count": 0,
                          "extended_bounds": {"max": "now", "min": "now-5M"}}}}
}
'

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "monthly_date_histogram" : {
      "buckets" : [ {
        "key_as_string" : "2016-01-01T00:00:00.000Z",
        "key" : 1451606400000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-02-01T00:00:00.000Z",
        "key" : 1454284800000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-03-01T00:00:00.000Z",
        "key" : 1456790400000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-04-01T00:00:00.000Z",
        "key" : 1459468800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-05-01T00:00:00.000Z",
        "key" : 1462060800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-06-01T00:00:00.000Z",
        "key" : 1464739200000,
        "doc_count" : 2
      } ]
    }
  }
}

However, when using the alias, the extended_bounds fail:

curl -XGET 'http://localhost:9200/all_indices/_search?pretty' -d '{
    "size": 0,
    "aggs": 
    {"monthly_date_histogram": 
      {"date_histogram": {"field": "when_recorded", 
                          "interval": "month",
                          "min_doc_count": 0,
                          "extended_bounds": {"max": "now", "min": "now-5M"}}}}
}
'

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "failed" : 0
  },
  "hits" : {
    "total" : 8,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "monthly_date_histogram" : {
      "buckets" : [ {
        "key_as_string" : "2016-04-01T00:00:00.000Z",
        "key" : 1459468800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-05-01T00:00:00.000Z",
        "key" : 1462060800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-06-01T00:00:00.000Z",
        "key" : 1464739200000,
        "doc_count" : 2
      } ]
    }
  }
}

Here's the tricky part: this behavior depends on the actual index name. Same steps, but a different name for the second index:

curl -XPUT 'http://localhost:9200/test_index_1/dates/1?pretty' -d '{"when_received": "2016-04-25T13:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/2?pretty' -d '{"when_received": "2016-05-28T14:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/3?pretty' -d '{"when_received": "2016-06-28T17:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/4?pretty' -d '{"when_received": "2016-06-29T17:21:24.000Z"}'

curl -XPUT 'http://localhost:9200/foobar/dates/1?pretty' -d '{"when_recorded": "2016-04-25T13:21:24.000Z", "when_received": "2015-04-25T13:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/foobar/dates/2?pretty' -d '{"when_recorded": "2016-05-28T14:21:24.000Z", "when_received": "2015-05-28T14:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/foobar/dates/3?pretty' -d '{"when_recorded": "2016-06-28T17:21:24.000Z", "when_received": "2015-06-28T17:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/foobar/dates/4?pretty' -d '{"when_recorded": "2016-06-29T17:21:24.000Z", "when_received": "2015-06-29T17:21:24.000Z"}'

curl -XPOST 'http://localhost:9200/test_index_1/_refresh'
curl -XPOST 'http://localhost:9200/foobar/_refresh'

curl -XPOST 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
        { "add" : { "index" : "test_index_1", "alias" : "all_indices" } },
        { "add" : { "index" : "foobar", "alias" : "all_indices" } }
    ]
}'

The results for each index:

curl -XGET 'http://localhost:9200/test_index_1/_search?pretty' -d '{
    "size": 0,
    "aggs": 
    {"monthly_date_histogram": 
      {"date_histogram": {"field": "when_recorded", 
                          "interval": "month",
                          "min_doc_count": 0,
                          "extended_bounds": {"max": "now", "min": "now-5M"}}}}
}
'

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "monthly_date_histogram" : {
      "buckets" : [ ]
    }
  }
}
curl -XGET 'http://localhost:9200/foobar/_search?pretty' -d '{
    "size": 0,
    "aggs": 
    {"monthly_date_histogram": 
      {"date_histogram": {"field": "when_recorded", 
                          "interval": "month",
                          "min_doc_count": 0,
                          "extended_bounds": {"max": "now", "min": "now-5M"}}}}
}
'

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "monthly_date_histogram" : {
      "buckets" : [ {
        "key_as_string" : "2016-01-01T00:00:00.000Z",
        "key" : 1451606400000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-02-01T00:00:00.000Z",
        "key" : 1454284800000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-03-01T00:00:00.000Z",
        "key" : 1456790400000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-04-01T00:00:00.000Z",
        "key" : 1459468800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-05-01T00:00:00.000Z",
        "key" : 1462060800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-06-01T00:00:00.000Z",
        "key" : 1464739200000,
        "doc_count" : 2
      } ]
    }
  }
}

Except this time, the aggregation over the alias works as expected:

curl -XGET 'http://localhost:9200/all_indices/_search?pretty' -d '{
    "size": 0,
    "aggs": 
    {"monthly_date_histogram": 
      {"date_histogram": {"field": "when_recorded", 
                          "interval": "month",
                          "min_doc_count": 0,
                          "extended_bounds": {"max": "now", "min": "now-5M"}}}}
}
'

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "failed" : 0
  },
  "hits" : {
    "total" : 8,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "monthly_date_histogram" : {
      "buckets" : [ {
        "key_as_string" : "2016-01-01T00:00:00.000Z",
        "key" : 1451606400000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-02-01T00:00:00.000Z",
        "key" : 1454284800000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-03-01T00:00:00.000Z",
        "key" : 1456790400000,
        "doc_count" : 0
      }, {
        "key_as_string" : "2016-04-01T00:00:00.000Z",
        "key" : 1459468800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-05-01T00:00:00.000Z",
        "key" : 1462060800000,
        "doc_count" : 1
      }, {
        "key_as_string" : "2016-06-01T00:00:00.000Z",
        "key" : 1464739200000,
        "doc_count" : 2
      } ]
    }
  }
}

The steps to reproduce are above, and I would expect the query against the alias to respect the extended_bounds parameter no matter what the index names are.
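
One extra diagnostic step that is not part of the original report: comparing the two mappings should confirm that when_recorded is only mapped in the second index, which is why test_index_1 on its own returns no buckets for that field:

curl -XGET 'http://localhost:9200/test_index_1/_mapping?pretty'
curl -XGET 'http://localhost:9200/test_index_2/_mapping?pretty'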

@clintongormley

Thanks for the clear recreation. Actually, I'd disagree with the output from test_index_1 being correct. You've asked for extended bounds and yet you get no buckets back at all? I think all buckets should be returned instead.

@colings86 could you take a look please?

@colings86

Hmm, I haven't yet run your recreation @wrobstory (thanks for such a complete explanation/recreation, btw), but it looks like the problem here is that the extended bounds information is not sent back to the coordinating node in the shard response when the shard had no matching documents. We arbitrarily pick one of the shard responses to use as the guide for the reduce phase (it's actually the first in the list, and I wouldn't be surprised if that list is sorted by index name), so if we pick one which matched no documents, the final step of completing the extended bounds is never performed. That would explain the weird behaviour where the name of the index makes a difference to the result. In theory it should be an easy fix: send the extended bounds information back as part of the empty aggregation response. I'll look into making this change soon.
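
If you want to sanity-check the index-name ordering hypothesis, one possible variation of the recreation (zzz_index is a made-up name, not from the report; best run on a clean cluster or after deleting the earlier indices and alias) is to give the second index a name that also sorts after test_index_1. If the ordering guess is right, the alias query should again drop the extended bounds, while a name that sorts before test_index_1 (like foobar above) keeps them:

curl -XPUT 'http://localhost:9200/zzz_index/dates/1?pretty' -d '{"when_recorded": "2016-04-25T13:21:24.000Z", "when_received": "2015-04-25T13:21:24.000Z"}'
# ...index documents 2-4 as in the recreation above, then:
curl -XPOST 'http://localhost:9200/zzz_index/_refresh'
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
        { "add" : { "index" : "test_index_1", "alias" : "all_indices" } },
        { "add" : { "index" : "zzz_index", "alias" : "all_indices" } }
    ]
}'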

@colings86

@wrobstory I have raised #19085 to address this issue.

colings86 added a commit that referenced this issue Jun 27, 2016
Previous to this change, the unresolved extended bounds were passed into the histogram aggregator, which meant extendedbounds.min and extendedbounds.max were passed through as null. This had two effects on the histogram aggregator:

1. If the histogram aggregator was unmapped across all shards, the reduce phase would not add buckets for the extended bounds and the response would contain zero buckets.
2. If the histogram aggregator was unmapped on only some shards, the reduce phase might sometimes choose to reduce based on an unmapped shard's response, and the extended bounds would therefore be ignored.

This change resolves the extended bounds in the unmapped case and solves the above two issues.

Closes #19009
@wrobstory

Thanks for the great communication and quick turnaround! Much appreciated!
