datehistogram sometimes ignores format when searching on multiple indexes #31760

plaarakkers · 2018-07-03T13:18:47Z

Elasticsearch version (Version: 6.3.0 (docker image based on docker.elastic.co/elasticsearch/elasticsearch:6.3.0)):

Plugins installed: []

JVM version (client build 1.8.0_171-b11):

OS version (Linux 4.15.0-23-generic):

Description of the problem including expected versus actual behavior:

The date histogram ignores the format of the aggregation or date field
if the search is performed on multiple indexes where the alphabetically first index does not contain the date field. This results in an unexpected value for the key_as_string in the bucket

Steps to reproduce:
I created a unit test to reproduce the issue. This uses the following mappings:
searchIndexWithDateField.json

{
  "doc": {
    "dynamic": "strict",
    "properties": {
      "title": {
        "type": "keyword",
        "index": false
      },
      "dateField": {
        "type": "date",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

searchIndexWithoutDateField.json

{
  "doc": {
    "dynamic": "strict",
    "properties": {
      "title": {
        "type": "keyword",
        "index": false
      }
    }
  }
}

The unit test itself uses version 6.3.0 for the java client maven dependencies:

package nl.gellygwin;

import org.apache.commons.io.IOUtils;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DateHistogramFormatTest {

    private static final String INDEX_TYPE = "doc";

    private static final String AGGREGATION_NAME = "dateTest";

    private static final String DATE_FIELD = "dateField";

    private TransportClient transportClient;

    @Before
    public void before() throws UnknownHostException {
        transportClient = new PreBuiltTransportClient(Settings.builder()
                .put("cluster.name", "docker-cluster")
                .build())
                .addTransportAddress(new TransportAddress(InetAddress.getByName("localhost"), 9300));

        transportClient.admin().indices().prepareDelete("_all")
                .get();
    }

    @Test
    public void testDateFieldFormatIncorrect() throws IOException {
        //setup indexes
        final String indexNameWithDateField = "b";
        final String indexNameWithoutDateField = "a";

        setup(indexNameWithDateField, indexNameWithoutDateField);

        final SearchResponse searchResponse = transportClient.prepareSearch("*")
                .addAggregation(AggregationBuilders.dateHistogram(AGGREGATION_NAME)
                        .dateHistogramInterval(DateHistogramInterval.MONTH)
                        .field(DATE_FIELD))
                .get();

        System.out.println("testDateFieldFormatIncorrect: " + searchResponse.getAggregations().get(AGGREGATION_NAME));
    }

    @Test
    public void testDateFieldFormatCorrect() throws IOException {
        //setup indexes
        final String indexNameWithDateField = "a";
        final String indexNameWithoutDateField = "b";

        setup(indexNameWithDateField, indexNameWithoutDateField);

        final SearchResponse searchResponse = transportClient.prepareSearch("*")
                .addAggregation(AggregationBuilders.dateHistogram(AGGREGATION_NAME)
                        .dateHistogramInterval(DateHistogramInterval.MONTH)
                        .field(DATE_FIELD))
                .get();

        System.out.println("testDateFieldFormatCorrect: " + searchResponse.getAggregations().get(AGGREGATION_NAME));
    }

    @Test
    public void testAggregationFormatIncorrect() throws IOException {
        //setup indexes
        final String indexNameWithDateField = "b";
        final String indexNameWithoutDateField = "a";

        setup(indexNameWithDateField, indexNameWithoutDateField);

        final SearchResponse searchResponse = transportClient.prepareSearch("*")
                .addAggregation(AggregationBuilders.dateHistogram(AGGREGATION_NAME)
                        .dateHistogramInterval(DateHistogramInterval.MONTH)
                        .format("yyyy-MM-dd")
                        .field(DATE_FIELD))
                .get();

        System.out.println("testAggregationFormatIncorrect: " + searchResponse.getAggregations().get(AGGREGATION_NAME));
    }

    @Test
    public void testAggregationFormatCorrect() throws IOException {
        //setup indexes
        final String indexNameWithDateField = "a";
        final String indexNameWithoutDateField = "b";

        setup(indexNameWithDateField, indexNameWithoutDateField);

        final SearchResponse searchResponse = transportClient.prepareSearch("*")
                .addAggregation(AggregationBuilders.dateHistogram(AGGREGATION_NAME)
                        .dateHistogramInterval(DateHistogramInterval.MONTH)
                        .format("yyyy-MM-dd")
                        .field(DATE_FIELD))
                .get();

        System.out.println("testAggregationFormatCorrect: " + searchResponse.getAggregations().get(AGGREGATION_NAME));
    }

    private void setup(String indexNameWithDateField, String indexNameWithoutDateField) throws IOException {
        transportClient.admin().indices().prepareCreate(indexNameWithDateField)
                .addMapping(INDEX_TYPE, readMappingWithDateField(), XContentType.JSON)
                .get();

        transportClient.admin().indices().prepareCreate(indexNameWithoutDateField)
                .addMapping(INDEX_TYPE, readMappingWithoutDateField(), XContentType.JSON)
                .get();

        //create data
        transportClient.prepareIndex(indexNameWithDateField, INDEX_TYPE, "1")
                .setSource("{\"title\": \"1\", \"dateField\": \"2018-01-01\"}", XContentType.JSON)
                .get();

        transportClient.prepareIndex(indexNameWithoutDateField, INDEX_TYPE, "1")
                .setSource("{\"title\": \"1\"}", XContentType.JSON)
                .get();

        transportClient.admin().indices().prepareRefresh(indexNameWithDateField)
                .get();

        transportClient.admin().indices().prepareRefresh(indexNameWithoutDateField)
                .get();
    }

    private String readMappingWithDateField() throws IOException {
        return IOUtils.toString(this.getClass().getResourceAsStream("/searchIndexWithDateField.json"), "UTF-8");
    }

    private String readMappingWithoutDateField() throws IOException {
        return IOUtils.toString(this.getClass().getResourceAsStream("/searchIndexWithoutDateField.json"), "UTF-8");
    }

}

The output of these tests are:

testAggregationFormatIncorrect: {"dateTest":{"buckets":[{"key_as_string":"2018-01-01T00:00:00.000Z","key":1514764800000,"doc_count":1}]}}
testDateFieldFormatIncorrect: {"dateTest":{"buckets":[{"key_as_string":"2018-01-01T00:00:00.000Z","key":1514764800000,"doc_count":1}]}}
testDateFieldFormatCorrect: {"dateTest":{"buckets":[{"key_as_string":"2018-01-01","key":1514764800000,"doc_count":1}]}}
testAggregationFormatCorrect: {"dateTest":{"buckets":[{"key_as_string":"2018-01-01","key":1514764800000,"doc_count":1}]}}

You can see that the output for testAggregationFormatIncorrect and testDateFieldFormatIncorrect uses the default date format while the only difference is that the names of the indexes are switched.
I would expect that the output would be the same for all test cases.

The text was updated successfully, but these errors were encountered:

elasticmachine · 2018-07-03T13:24:03Z

Pinging @elastic/es-search-aggs

cbuescher · 2018-10-23T14:53:58Z

@plaarakkers Sorry for the long wait on this, this issue somehow slipped our attention although the reproduction in the test is really great.
I was able to reproduce this on current master as well easily with parts of the test you provided (although the testAggregationFormatIncorrect() and testDateFieldFormatIncorrect() fluctuates between the expected "2018-01-01" and the default date format for me, its probably a matter of execution order on multi-node clusters as well).
Thanks a lot for providing such a good reproduction, will try to get to the bottom of this.

Currently when aggregating on an unmapped date field (e.g. using a date_histogram) we don't preserve the aggregations `format` setting but instead use the default format. This can lead to loosing the aggregations `format` when aggregating over several indices where some of them contain unmapped date fields and are encountered first in the reduce phase. Related to elastic#31760

cbuescher · 2018-11-05T14:00:30Z

I opened #35254 which should at least fix the cases where the "format" is explicitely specified in the aggregation itself. I'm not sure we can reliably fix the case where the format is only specified on one index but the other indexs date field is "unmapped", because I don't think we cannot reliably determine the "winning" format in the aggregation. Maybe we can add a special case for this in the reduce phase somewhere, though I'm not sure about this yet.

Currently when aggregating on an unmapped date field (e.g. using a date_histogram) we don't preserve the aggregations `format` setting but instead use the default format. This can lead to loosing the aggregations `format` when aggregating over several indices where some of them contain unmapped date fields and are encountered first in the reduce phase. Related to elastic#31760

…35254) Currently when aggregating on an unmapped date field (e.g. using a date_histogram) we don't preserve the aggregations `format` setting but instead use the default format. This can lead to loosing the aggregations `format` when aggregating over several indices where some of them contain unmapped date fields and are encountered first in the reduce phase. Related to #31760

cbuescher · 2018-11-09T14:43:41Z

@plaarakkers The fix mentioned above (#35254) should solve the cases you mentioned where the aggregation specifies a specific format. For the other case (two indices havinf different formats in the mappings, or one missing - which means the same as the default format) we haven't really got a good way of deciding which one should "win". I think the fact you observe that the order of index names matters is just coincidental. When merging aggregation results from different shards and indices we use the format of whatever results returns first to the coordinating node. The only way to make sure the format is deterministic is to either make sure the same format is specified in all mappings or explicitly set it on the aggregation.
Hope this solves your case and you agree that we can close this issue. Thanks again for the excellent reproduction!

cbuescher · 2018-11-13T09:36:43Z

Closing, please reopen if you see the need for further action.

jpountz added >bug :Analytics/Aggregations Aggregations :Search/Search Search-related issues that do not fall into other categories labels Jul 3, 2018

jpountz removed the :Search/Search Search-related issues that do not fall into other categories label Jul 3, 2018

cbuescher self-assigned this Oct 23, 2018

cbuescher mentioned this issue Nov 5, 2018

Preserve format when aggregation contains unmapped date fields #35254

Merged

cbuescher closed this as completed Nov 13, 2018

fixmebot bot referenced this issue in VectorXz/elasticsearch Apr 22, 2021

Create TestFixMe.md

a9fae03

fixmebot bot referenced this issue in VectorXz/elasticsearch May 28, 2021

Create Helloworld.md

1398a04

fixmebot bot referenced this issue in VectorXz/elasticsearch Aug 4, 2021

Update Helloworld.md

f68abab

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

datehistogram sometimes ignores format when searching on multiple indexes #31760

datehistogram sometimes ignores format when searching on multiple indexes #31760

plaarakkers commented Jul 3, 2018

elasticmachine commented Jul 3, 2018

cbuescher commented Oct 23, 2018

cbuescher commented Nov 5, 2018

cbuescher commented Nov 9, 2018

cbuescher commented Nov 13, 2018

datehistogram sometimes ignores format when searching on multiple indexes #31760

datehistogram sometimes ignores format when searching on multiple indexes #31760

Comments

plaarakkers commented Jul 3, 2018

elasticmachine commented Jul 3, 2018

cbuescher commented Oct 23, 2018

cbuescher commented Nov 5, 2018

cbuescher commented Nov 9, 2018

cbuescher commented Nov 13, 2018