Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datehistogram sometimes ignores format when searching on multiple indexes #31760

Closed
plaarakkers opened this issue Jul 3, 2018 · 5 comments
Closed
Assignees

Comments

@plaarakkers
Copy link

Elasticsearch version (Version: 6.3.0 (docker image based on docker.elastic.co/elasticsearch/elasticsearch:6.3.0)):

Plugins installed: []

JVM version (client build 1.8.0_171-b11):

OS version (Linux 4.15.0-23-generic):

Description of the problem including expected versus actual behavior:

The date histogram ignores the format of the aggregation or date field
if the search is performed on multiple indexes where the alphabetically first index does not contain the date field. This results in an unexpected value for the key_as_string in the bucket

Steps to reproduce:
I created a unit test to reproduce the issue. This uses the following mappings:
searchIndexWithDateField.json

{
  "doc": {
    "dynamic": "strict",
    "properties": {
      "title": {
        "type": "keyword",
        "index": false
      },
      "dateField": {
        "type": "date",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

searchIndexWithoutDateField.json

{
  "doc": {
    "dynamic": "strict",
    "properties": {
      "title": {
        "type": "keyword",
        "index": false
      }
    }
  }
}

The unit test itself uses version 6.3.0 for the java client maven dependencies:

package nl.gellygwin;

import org.apache.commons.io.IOUtils;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DateHistogramFormatTest {

    private static final String INDEX_TYPE = "doc";

    private static final String AGGREGATION_NAME = "dateTest";

    private static final String DATE_FIELD = "dateField";

    private TransportClient transportClient;

    @Before
    public void before() throws UnknownHostException {
        transportClient = new PreBuiltTransportClient(Settings.builder()
                .put("cluster.name", "docker-cluster")
                .build())
                .addTransportAddress(new TransportAddress(InetAddress.getByName("localhost"), 9300));

        transportClient.admin().indices().prepareDelete("_all")
                .get();
    }

    @Test
    public void testDateFieldFormatIncorrect() throws IOException {
        //setup indexes
        final String indexNameWithDateField = "b";
        final String indexNameWithoutDateField = "a";

        setup(indexNameWithDateField, indexNameWithoutDateField);

        final SearchResponse searchResponse = transportClient.prepareSearch("*")
                .addAggregation(AggregationBuilders.dateHistogram(AGGREGATION_NAME)
                        .dateHistogramInterval(DateHistogramInterval.MONTH)
                        .field(DATE_FIELD))
                .get();

        System.out.println("testDateFieldFormatIncorrect: " + searchResponse.getAggregations().get(AGGREGATION_NAME));
    }

    @Test
    public void testDateFieldFormatCorrect() throws IOException {
        //setup indexes
        final String indexNameWithDateField = "a";
        final String indexNameWithoutDateField = "b";

        setup(indexNameWithDateField, indexNameWithoutDateField);

        final SearchResponse searchResponse = transportClient.prepareSearch("*")
                .addAggregation(AggregationBuilders.dateHistogram(AGGREGATION_NAME)
                        .dateHistogramInterval(DateHistogramInterval.MONTH)
                        .field(DATE_FIELD))
                .get();

        System.out.println("testDateFieldFormatCorrect: " + searchResponse.getAggregations().get(AGGREGATION_NAME));
    }

    @Test
    public void testAggregationFormatIncorrect() throws IOException {
        //setup indexes
        final String indexNameWithDateField = "b";
        final String indexNameWithoutDateField = "a";

        setup(indexNameWithDateField, indexNameWithoutDateField);

        final SearchResponse searchResponse = transportClient.prepareSearch("*")
                .addAggregation(AggregationBuilders.dateHistogram(AGGREGATION_NAME)
                        .dateHistogramInterval(DateHistogramInterval.MONTH)
                        .format("yyyy-MM-dd")
                        .field(DATE_FIELD))
                .get();

        System.out.println("testAggregationFormatIncorrect: " + searchResponse.getAggregations().get(AGGREGATION_NAME));
    }

    @Test
    public void testAggregationFormatCorrect() throws IOException {
        //setup indexes
        final String indexNameWithDateField = "a";
        final String indexNameWithoutDateField = "b";

        setup(indexNameWithDateField, indexNameWithoutDateField);

        final SearchResponse searchResponse = transportClient.prepareSearch("*")
                .addAggregation(AggregationBuilders.dateHistogram(AGGREGATION_NAME)
                        .dateHistogramInterval(DateHistogramInterval.MONTH)
                        .format("yyyy-MM-dd")
                        .field(DATE_FIELD))
                .get();

        System.out.println("testAggregationFormatCorrect: " + searchResponse.getAggregations().get(AGGREGATION_NAME));
    }

    private void setup(String indexNameWithDateField, String indexNameWithoutDateField) throws IOException {
        transportClient.admin().indices().prepareCreate(indexNameWithDateField)
                .addMapping(INDEX_TYPE, readMappingWithDateField(), XContentType.JSON)
                .get();

        transportClient.admin().indices().prepareCreate(indexNameWithoutDateField)
                .addMapping(INDEX_TYPE, readMappingWithoutDateField(), XContentType.JSON)
                .get();

        //create data
        transportClient.prepareIndex(indexNameWithDateField, INDEX_TYPE, "1")
                .setSource("{\"title\": \"1\", \"dateField\": \"2018-01-01\"}", XContentType.JSON)
                .get();

        transportClient.prepareIndex(indexNameWithoutDateField, INDEX_TYPE, "1")
                .setSource("{\"title\": \"1\"}", XContentType.JSON)
                .get();

        transportClient.admin().indices().prepareRefresh(indexNameWithDateField)
                .get();

        transportClient.admin().indices().prepareRefresh(indexNameWithoutDateField)
                .get();
    }

    private String readMappingWithDateField() throws IOException {
        return IOUtils.toString(this.getClass().getResourceAsStream("/searchIndexWithDateField.json"), "UTF-8");
    }

    private String readMappingWithoutDateField() throws IOException {
        return IOUtils.toString(this.getClass().getResourceAsStream("/searchIndexWithoutDateField.json"), "UTF-8");
    }

}

The output of these tests are:

testAggregationFormatIncorrect: {"dateTest":{"buckets":[{"key_as_string":"2018-01-01T00:00:00.000Z","key":1514764800000,"doc_count":1}]}}
testDateFieldFormatIncorrect: {"dateTest":{"buckets":[{"key_as_string":"2018-01-01T00:00:00.000Z","key":1514764800000,"doc_count":1}]}}
testDateFieldFormatCorrect: {"dateTest":{"buckets":[{"key_as_string":"2018-01-01","key":1514764800000,"doc_count":1}]}}
testAggregationFormatCorrect: {"dateTest":{"buckets":[{"key_as_string":"2018-01-01","key":1514764800000,"doc_count":1}]}}

You can see that the output for testAggregationFormatIncorrect and testDateFieldFormatIncorrect uses the default date format while the only difference is that the names of the indexes are switched.
I would expect that the output would be the same for all test cases.

@jpountz jpountz added >bug :Analytics/Aggregations Aggregations :Search/Search Search-related issues that do not fall into other categories labels Jul 3, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@jpountz jpountz removed the :Search/Search Search-related issues that do not fall into other categories label Jul 3, 2018
@cbuescher
Copy link
Member

@plaarakkers Sorry for the long wait on this, this issue somehow slipped our attention although the reproduction in the test is really great.
I was able to reproduce this on current master as well easily with parts of the test you provided (although the testAggregationFormatIncorrect() and testDateFieldFormatIncorrect() fluctuates between the expected "2018-01-01" and the default date format for me, its probably a matter of execution order on multi-node clusters as well).
Thanks a lot for providing such a good reproduction, will try to get to the bottom of this.

@cbuescher cbuescher self-assigned this Oct 23, 2018
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Nov 5, 2018
Currently when aggregating on an unmapped date field (e.g. using a
date_histogram) we don't preserve the aggregations `format` setting but instead
use the default format. This can lead to loosing the aggregations `format` when
aggregating over several indices where some of them contain unmapped date fields
and are encountered first in the reduce phase.

Related to elastic#31760
@cbuescher
Copy link
Member

I opened #35254 which should at least fix the cases where the "format" is explicitely specified in the aggregation itself. I'm not sure we can reliably fix the case where the format is only specified on one index but the other indexs date field is "unmapped", because I don't think we cannot reliably determine the "winning" format in the aggregation. Maybe we can add a special case for this in the reduce phase somewhere, though I'm not sure about this yet.

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Nov 5, 2018
Currently when aggregating on an unmapped date field (e.g. using a
date_histogram) we don't preserve the aggregations `format` setting but instead
use the default format. This can lead to loosing the aggregations `format` when
aggregating over several indices where some of them contain unmapped date fields
and are encountered first in the reduce phase.

Related to elastic#31760
cbuescher pushed a commit that referenced this issue Nov 8, 2018
…35254)

Currently when aggregating on an unmapped date field (e.g. using a
date_histogram) we don't preserve the aggregations `format` setting but instead
use the default format. This can lead to loosing the aggregations `format` when
aggregating over several indices where some of them contain unmapped date fields
and are encountered first in the reduce phase.

Related to #31760
cbuescher pushed a commit that referenced this issue Nov 8, 2018
…35254)

Currently when aggregating on an unmapped date field (e.g. using a
date_histogram) we don't preserve the aggregations `format` setting but instead
use the default format. This can lead to loosing the aggregations `format` when
aggregating over several indices where some of them contain unmapped date fields
and are encountered first in the reduce phase.

Related to #31760
cbuescher pushed a commit that referenced this issue Nov 9, 2018
…35254)

Currently when aggregating on an unmapped date field (e.g. using a
date_histogram) we don't preserve the aggregations `format` setting but instead
use the default format. This can lead to loosing the aggregations `format` when
aggregating over several indices where some of them contain unmapped date fields
and are encountered first in the reduce phase.

Related to #31760
@cbuescher
Copy link
Member

@plaarakkers The fix mentioned above (#35254) should solve the cases you mentioned where the aggregation specifies a specific format. For the other case (two indices havinf different formats in the mappings, or one missing - which means the same as the default format) we haven't really got a good way of deciding which one should "win". I think the fact you observe that the order of index names matters is just coincidental. When merging aggregation results from different shards and indices we use the format of whatever results returns first to the coordinating node. The only way to make sure the format is deterministic is to either make sure the same format is specified in all mappings or explicitly set it on the aggregation.
Hope this solves your case and you agree that we can close this issue. Thanks again for the excellent reproduction!

@cbuescher
Copy link
Member

Closing, please reopen if you see the need for further action.

fixmebot bot referenced this issue in VectorXz/elasticsearch Apr 22, 2021
fixmebot bot referenced this issue in VectorXz/elasticsearch May 28, 2021
fixmebot bot referenced this issue in VectorXz/elasticsearch Aug 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants