Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to parse certain dates due to AM/PM issues #58986

Closed
talevy opened this issue Jul 3, 2020 · 6 comments · Fixed by #89693
Closed

Unable to parse certain dates due to AM/PM issues #58986

talevy opened this issue Jul 3, 2020 · 6 comments · Fixed by #89693
Assignees
Labels
>bug :Core/Infra/Core Core issues without another label high hanging fruit Team:Core/Infra Meta label for core/infra team

Comments

@talevy
Copy link
Contributor

talevy commented Jul 3, 2020

Elasticsearch version: 7.6+

Description of the problem including expected versus actual behavior:

I am not entirely sure what exactly the problem is, but there is some parsing logic with rounding dates up that causes parsing exceptions that do not look like they should be parsing exceptions.

I've written a unit test in JavaDateMathParserTests.java that reproduces the believed issue:

    public void testInvalidAMPM() {
        DateFormatter formatter = DateFormatter.forPattern("MM/dd/yyyy hh:mm a");
        DateMathParser parser = formatter.toDateMathParser();
        String date = "04/30/2020 05:48 PM";
        parser.parse(date, () -> 0, false, ZoneId.systemDefault());

        ElasticsearchParseException exception = expectThrows(ElasticsearchParseException.class,
            () -> parser.parse(date, () -> 0, true, ZoneId.systemDefault()));
    }

when roundUpProperty is set to true, parsing fails.

Steps to reproduce:

PUT my_index
{
  "mappings": {
      "properties": {
        "date": {
          "type": "date",
          "ignore_malformed" : false,
          "format" : "MM/dd/yyyy hh:mm a z"
        }
      }
    }
}

PUT my_index/_doc/1
{
  "date": "04/02/2020 07:48 AM CST"
}

GET my_index/_search
{
  "query": {
		"range": {
			"date": {
				"gte": "04/01/2020 07:48 AM CST",
				"lte": "04/20/2020 05:48 PM CST"
			}
		}
  }
}

might be related to #46654 cc @pgomulka

@talevy talevy added >bug :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team labels Jul 3, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Core)

@pgomulka pgomulka self-assigned this Jul 3, 2020
@pgomulka
Copy link
Contributor

pgomulka commented Jul 3, 2020

looks like indeed we have a bug. this affects all 7.x versions. I think #46654 is not related - the problem can be reproduced in 7.4 without that change.
It is caused because of how we default values in https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java#L52

@pgomulka
Copy link
Contributor

pgomulka commented Jul 3, 2020

I think the fix might involve appending custom parseDefaulting-style parser which will treat hour of day and hour_of_ampm same way.
will work on it

@pgomulka
Copy link
Contributor

pgomulka commented Aug 19, 2020

I don't think we can fix this easily. Overriding DefaultValueParser is not possible as these classes are package scope for java.time.

We should probably consider document what fields are possible to be rounded. Possibly some of our predefined formats are also incorrectly rounded.

After parsing when resolving fields hour related fields are resolved to HOUR_OF_DAY
however parseDefaulting method which we use for rounding is used before resolving. That leads to a date with two conflicting fields HOUR_OF_DAY to be 23 and HOUR_OF_AMPM being 05.

My expectation was that parseDefaulting is used after fields are resolved - HOUR_OF_AM_PM resolves to HOUR_OF_DAY after all.
I thought this is a JDK bug and raised a report, but I was advised that this is up to user to avoid conflict.
The problem is that we can't do this because we don't know what pattern will be provided by a user.
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8250514

@pgomulka
Copy link
Contributor

pgomulka commented Aug 31, 2020

AM/PM use case might not common, but there are more. Some of them we define as part of our build-in formats https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html
For instance week_date_time which expects YYYY-'W'ww-e'T'HH:mm:ss.SSSZZ
should be able to parse 2020-W02-1T01:01:01.000Z this should resolve to 2020-01-06T01:01:01.000Z but has a conflict because we default DayOfMonth to 01
https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java#L47
There is more weekbased related formats like week_date (format with week year, weak and day of week). I don't think these are commonly used either.

We discussed this in the past and we wanted to limit the number of build-in formats, but a user can always define his own format in a similar fashion.

The pattern to reproduce is to define a format that can resolve to a full date with year, month, day, hour, minute, second, nanoseconds but with the use of other fields (am/pm, week of year, day of week, nano of year, day of year etc) This forces parseDefaulting logic to apply our defaults and results in a conflicts.

good news is that weekyear_week - often used when rolling indices - works. For date 2020-W08 we try to apply default MonthOfYear = 1 but this does not result in a conflict. We then return correct date 2020-02-17 ( logic in DateFormatters.localDateFromWeekBasedDate)

cc @colings86

@rjernst rjernst added the needs:triage Requires assignment of a team area label label Dec 3, 2020
@pgomulka
Copy link
Contributor

this is still an issue

@pgomulka pgomulka added high hanging fruit and removed needs:triage Requires assignment of a team area label labels Dec 10, 2020
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Aug 29, 2022
Date rounding logic should take into account the fields that will be
parsed be a parser. If a parser has a DayOfYear field, the rounding logic
should not try to default DayOfMonth as it will conflict with DayOfYear

However the DateTimeFormatter does not have a public method to return
information of fields that will be parsed. The hacky workaround is
to rely on toString() implementation that will return a field info when
it was defined with textual pattern.

This commits introduced conditional logic for DayOfYear, ClockHourOfAMPM and HourOfAmPM

closes elastic#89096
closes elastic#58986
pgomulka added a commit that referenced this issue Sep 5, 2022
Date rounding logic should take into account the fields that will be
parsed be a parser. If a parser has a DayOfYear field, the rounding logic
should not try to default DayOfMonth as it will conflict with DayOfYear

However the DateTimeFormatter does not have a public method to return
information of fields that will be parsed. The hacky workaround is
to rely on toString() implementation that will return a field info when
it was defined with textual pattern.

This commits introduced conditional logic for DayOfYear, ClockHourOfAMPM and HourOfAmPM

closes #89096
closes #58986
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Sep 5, 2022
Date rounding logic should take into account the fields that will be
parsed be a parser. If a parser has a DayOfYear field, the rounding logic
should not try to default DayOfMonth as it will conflict with DayOfYear

However the DateTimeFormatter does not have a public method to return
information of fields that will be parsed. The hacky workaround is
to rely on toString() implementation that will return a field info when
it was defined with textual pattern.

This commits introduced conditional logic for DayOfYear, ClockHourOfAMPM and HourOfAmPM

closes elastic#89096
closes elastic#58986
pgomulka added a commit that referenced this issue Sep 5, 2022
Date rounding logic should take into account the fields that will be
parsed be a parser. If a parser has a DayOfYear field, the rounding logic
should not try to default DayOfMonth as it will conflict with DayOfYear

However the DateTimeFormatter does not have a public method to return
information of fields that will be parsed. The hacky workaround is
to rely on toString() implementation that will return a field info when
it was defined with textual pattern.

This commits introduced conditional logic for DayOfMonth, MonthOfYear, ClockHourOfAMPM and HourOfAmPM

closes #89096
closes #58986
backports #89693
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Sep 5, 2022
Date rounding logic should take into account the fields that will be
parsed be a parser. If a parser has a DayOfYear field, the rounding logic
should not try to default DayOfMonth as it will conflict with DayOfYear

However the DateTimeFormatter does not have a public method to return
information of fields that will be parsed. The hacky workaround is
to rely on toString() implementation that will return a field info when
it was defined with textual pattern.

This commits introduced conditional logic for DayOfYear, ClockHourOfAMPM and HourOfAmPM

closes elastic#89096
closes elastic#58986
pgomulka added a commit that referenced this issue Sep 5, 2022
Date rounding logic should take into account the fields that will be
parsed be a parser. If a parser has a DayOfYear field, the rounding logic
should not try to default DayOfMonth as it will conflict with DayOfYear

However the DateTimeFormatter does not have a public method to return
information of fields that will be parsed. The hacky workaround is
to rely on toString() implementation that will return a field info when
it was defined with textual pattern.

This commits introduced conditional logic for DayOfYear, ClockHourOfAMPM and HourOfAmPM

closes #89096
closes #58986
backports #89693
thecoop pushed a commit to thecoop/elasticsearch that referenced this issue Sep 9, 2022
Date rounding logic should take into account the fields that will be
parsed be a parser. If a parser has a DayOfYear field, the rounding logic
should not try to default DayOfMonth as it will conflict with DayOfYear

However the DateTimeFormatter does not have a public method to return
information of fields that will be parsed. The hacky workaround is
to rely on toString() implementation that will return a field info when
it was defined with textual pattern.

This commits introduced conditional logic for DayOfYear, ClockHourOfAMPM and HourOfAmPM

closes elastic#89096
closes elastic#58986
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Core/Infra/Core Core issues without another label high hanging fruit Team:Core/Infra Meta label for core/infra team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants