Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added epoch date formats to configure parsing of unix dates #11453

Merged

Conversation

@spinscale
Copy link
Member

commented Jun 2, 2015

This PR adds the support the new date formats, namely epoch_second and epoch_second_millis. By adding those, we can remove the internal logic to try and parse everything as a unix timestamp first and only then use our regular date format.

This PR tries to remain backwards compatible by using epoch_second_millis||dateOptionalTime where before only dateOptionalTime was used in combination with the parsing.

Some BWC occured, ie, a date like 10000 is now always a year without any other configuration instead of a unix timestamp.

Also, in the current implementation, the RangeQueryParser allows to configure a timezone for queries using from/to unix timestamps, even though the timezone is ignored in the query, as a timestamp is always UTC.

Closes #5328
Relates #10971


|`epoch_second`|A formatter for the number of seconds since the epoch.

|`epoch_second_millis`|A formatter for the number of milliseconds since

This comment has been minimized.

Copy link
@colings86

colings86 Jun 2, 2015

Member

should this not be epoch_millis?

This comment has been minimized.

Copy link
@clintongormley

This comment has been minimized.

Copy link
@spinscale

spinscale Jun 3, 2015

Author Member

changed

@colings86

This comment has been minimized.

Copy link
Member

commented Jun 2, 2015

Look good, left a small naming comment. Also, worth checking how this fits into @rjernst 's mapping changes (I think he has a PR outstanding on that)

}
return Long.toString(dateTimeFormatter.parser().parseMillis(timestampAsString));
} catch (RuntimeException e1) {
throw new TimestampParsingException(timestampAsString);

This comment has been minimized.

Copy link
@rjernst

rjernst Jun 3, 2015

Member

I realize this was there before, but can we not lose the original exception?

This comment has been minimized.

Copy link
@spinscale

spinscale Jun 3, 2015

Author Member

true, added

@@ -133,6 +134,10 @@ public static FormatDateTimeFormatter forPattern(String input, Locale locale) {
formatter = ISODateTimeFormat.yearMonth();
} else if ("yearMonthDay".equals(input) || "year_month_day".equals(input)) {
formatter = ISODateTimeFormat.yearMonthDay();
} else if ("epochSecond".equals(input) || "epoch_second".equals(input)) {

This comment has been minimized.

Copy link
@rjernst

rjernst Jun 3, 2015

Member

Can skip adding camelCase versions?

This comment has been minimized.

Copy link
@colings86

colings86 Jun 3, 2015

Member

we should convert this to use ParseField since that will deal with deprecated names etc. in the future

This comment has been minimized.

Copy link
@rjernst

rjernst Jun 3, 2015

Member

Why would we add a new setting that supports camelCase? The idea is to get rid of this weird dual behavior with settings. Why allow users to start using something they will just have to change later?

This comment has been minimized.

Copy link
@colings86

colings86 Jun 3, 2015

Member

In 2.0 ParseField no longer supports camelCase IIRC. The advantage with using ParseField is that it standardises the checks that we do to make it consistent across the codebase (or will once all the Parsers have been changed to use it). It also makes deprecating an old option name in favour of a new one, much easier as we don't end up with these messy if statements with multiple ||'s. And also means its much easier to correlate the builder classes with the Parser since they can both reference a ParseField constant and the builder will alway use the preferred name for the option.

This comment has been minimized.

Copy link
@spinscale

spinscale Jun 3, 2015

Author Member

skipped for now

public static class EpochTimeParser implements DateTimeParser {

private static final Pattern milliSecondPrecisionPattern = Pattern.compile("^\\d{1,13}$");
private static final Pattern secondPrecisionPattern = Pattern.compile("^\\d{1,10}$");

This comment has been minimized.

Copy link
@rjernst

rjernst Jun 3, 2015

Member

SECOND_PRECISION_PATTERN since this is a constant?

This comment has been minimized.

Copy link
@spinscale

spinscale Jun 3, 2015

Author Member

fixed


@Override
public int parseInto(DateTimeParserBucket bucket, String text, int position) {
if (text.length() > estimateParsedLength() || !pattern.matcher(text).matches()) {

This comment has been minimized.

Copy link
@rjernst

rjernst Jun 3, 2015

Member

Can we use == false?

This comment has been minimized.

Copy link
@spinscale

spinscale Jun 3, 2015

Author Member

fixed

@@ -123,11 +123,6 @@ public Query parse(QueryParseContext parseContext) throws IOException, QueryPars
FieldMapper mapper = parseContext.fieldMapper(fieldName);
if (mapper != null) {
if (mapper instanceof DateFieldMapper) {
if ((from instanceof Number || to instanceof Number) && timeZone != null) {

This comment has been minimized.

Copy link
@rjernst

rjernst Jun 3, 2015

Member

Shouldn't we still have this? It doesn't really make sense to set seconds or milliseconds since epoch in anything but UTC?

This comment has been minimized.

Copy link
@spinscale

spinscale Jun 3, 2015

Author Member

the problem here is, that a date like 2015121212 supplied as a number could be a date in the format yyyyDDMMHH which cannot be distinguished, so this check would trigger even though the number is valid and going to be parsed with a date that correctly owns a timezone. Not sure, when/where to postpone this check, need to think about it.

This comment has been minimized.

Copy link
@spinscale

spinscale Jun 3, 2015

Author Member

I actually found a solution to this in my latest commit and do check this in the parser now

@@ -177,4 +179,46 @@ public void localDependentDateTests() throws Exception {
assertHitCount(countResponse, 20l);
}
}

@Test
public void testThatNonEpochDatesCanBeSearch() throws Exception {

This comment has been minimized.

Copy link
@rjernst

rjernst Jun 3, 2015

Member

Does this really need to be an integration test? Could it be in SimpleDateMappingTests?

This comment has been minimized.

Copy link
@spinscale

spinscale Jun 3, 2015

Author Member

this test actually uncovered the above issue with the QueryRangeParser and numeric dates like 2015121212, so I would prefer to leave it as it is

@rjernst

This comment has been minimized.

Copy link
Member

commented Jun 3, 2015

This is great! I left a couple comments. I also would prefer to just use epoch_millis.

@spinscale spinscale force-pushed the spinscale:1506-add-epoch-to-named-date-formats branch from d04ff1d to 2d55d8c Jun 3, 2015
@spinscale

This comment has been minimized.

Copy link
Member Author

commented Jun 3, 2015

updated the PR/replied to all of your comments. thx for the hints!

@rjernst

This comment has been minimized.

Copy link
Member

commented Jun 3, 2015

LGTM

@@ -133,6 +134,10 @@ public static FormatDateTimeFormatter forPattern(String input, Locale locale) {
formatter = ISODateTimeFormat.yearMonth();
} else if ("yearMonthDay".equals(input) || "year_month_day".equals(input)) {
formatter = ISODateTimeFormat.yearMonthDay();
} else if ("epoch_second".equals(input)) {

This comment has been minimized.

Copy link
@colings86

colings86 Jun 3, 2015

Member

I think this is confusing if we support camelCase in some of the options in this parser and not others (even if they are new). We should either support camelCase for all options or for none to be consistent.

This comment has been minimized.

Copy link
@spinscale

spinscale Jun 3, 2015

Author Member

do we plan to resolve this before 2.0? If so, I am fine leaving it as it is...

This comment has been minimized.

Copy link
@colings86

colings86 Jun 3, 2015

Member

I don't think it will be resolved (rejecting camelCase) as its a huge change across 100s of files (because most parser use this style and not ParseField) and I don't see us doing that change quickly

This comment has been minimized.

Copy link
@rjernst

rjernst Jun 3, 2015

Member

I don't think it matters. We should not force making huge changes to the entire codebase in order to not add things which will just be deprecated and/or confusing to the user.

This comment has been minimized.

Copy link
@colings86

colings86 Jun 3, 2015

Member

But I think this is confusing. If I have a format specified as yearMonthDay that works then I would expect to be able to change it to epochSecond and it would work. Supporting some values in camelCase for date formats and not other values is very confusing to a user. I'm all for removing camelCase but we should be consistent with it, especially when its different values for the same setting (in the case different values of format for date fields).

This comment has been minimized.

Copy link
@colings86

colings86 Jun 3, 2015

Member

I would also be fine with removing all the camelCase options for all formats in this PR to make it consistent.

This comment has been minimized.

Copy link
@rjernst

rjernst Jun 3, 2015

Member

We already do not support camelCase for all settings, and I don't think there is any consistency even within the same query/field type/whatever.

This comment has been minimized.

Copy link
@rjernst

rjernst Jun 3, 2015

Member

I would also be fine with removing all the camelCase options for all formats in this PR to make it consistent.

This is the kind of statement that stalls progress. Requiring huge changes just to make a small improvement should not be necessary.

This comment has been minimized.

Copy link
@colings86

colings86 Jun 3, 2015

Member

I am only talking about the date formats here, not across the whole codebase (i can see the above statement might have been a bit ambiguous on that). All the multi-word date format values above support both a camelCase and an underscored version. That should be consistent, whether that means supporting both for now or only supporting the underscored version I don't have a strong opinion but its hardly a huge change to update the date format values to be consistent and its not a huge overhead to maintain an extra 2 camelCase options given that any change to that policy would require a change to all the other date formats too

This comment has been minimized.

Copy link
@rjernst

rjernst Jun 3, 2015

Member

I just realized we aren't even talking about setting names, but the valid values for the format setting. This argument to use ParseValue does not make sense. We don't support camelCase in eg the index option. We should not do it here, it will just add more work for users if we allow them to start using a new value that will just go away in the future (and will require them to change the value to what they would have found in the first place if they had tried using camelCase and seen an error).

@spinscale spinscale force-pushed the spinscale:1506-add-epoch-to-named-date-formats branch from 2d55d8c to 9ddf7d8 Jun 3, 2015
This commit changes the date handling. First and foremost Elasticsearch
does not try to convert every date to a unix timestamp first and then
uses the configured date. This now allows for dates like `2015121212` to
be parsed correctly.

Instead it is now explicit by adding a `epoch_second` and `epoch_millis`
date format. This also means, that the default date format now is
`epoch_millis||dateOptionalTime` to remain backwards compatible.

Closes #5328
Relates #10971
@spinscale spinscale force-pushed the spinscale:1506-add-epoch-to-named-date-formats branch from 9ddf7d8 to 01e8eaf Jun 3, 2015
@spinscale spinscale merged commit 01e8eaf into elastic:master Jun 3, 2015
1 check passed
1 check passed
CLA Commit author is a member of Elasticsearch
Details
@kevinkluge kevinkluge removed the review label Jun 3, 2015
@clintongormley clintongormley changed the title Dates: Added epoch date formats to configure parsing of unix dates Added epoch date formats to configure parsing of unix dates Jun 6, 2015

private static final Pattern MILLI_SECOND_PRECISION_PATTERN = Pattern.compile("^\\d{1,13}$");
private static final Pattern SECOND_PRECISION_PATTERN = Pattern.compile("^\\d{1,10}$");

This comment has been minimized.

Copy link
@tj

tj Aug 22, 2016

I think this might be busted by-design, JSON numerical values only have a float type, so "1.470417092e+09" (produced by Go's json encoder) will just result in IllegalArgumentException[Invalid format: "1.470417092e+09"];. Any recommended way around this?

This comment has been minimized.

Copy link
@rjernst

rjernst Aug 22, 2016

Member

Yes, pass the value as a json string.

This comment has been minimized.

Copy link
@tj

tj Aug 22, 2016

Ah yeah I suppose that might be ok, in this case it's user-defined input so that's pretty awkward but it beats breaking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.