New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added epoch date formats to configure parsing of unix dates #11453
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -27,6 +27,7 @@ | |
import org.joda.time.format.*; | ||
|
||
import java.util.Locale; | ||
import java.util.regex.Pattern; | ||
|
||
/** | ||
* | ||
|
@@ -133,6 +134,10 @@ public static FormatDateTimeFormatter forPattern(String input, Locale locale) { | |
formatter = ISODateTimeFormat.yearMonth(); | ||
} else if ("yearMonthDay".equals(input) || "year_month_day".equals(input)) { | ||
formatter = ISODateTimeFormat.yearMonthDay(); | ||
} else if ("epoch_second".equals(input)) { | ||
formatter = new DateTimeFormatterBuilder().append(new EpochTimeParser(false)).toFormatter(); | ||
} else if ("epoch_millis".equals(input)) { | ||
formatter = new DateTimeFormatterBuilder().append(new EpochTimeParser(true)).toFormatter(); | ||
} else if (Strings.hasLength(input) && input.contains("||")) { | ||
String[] formats = Strings.delimitedListToStringArray(input, "||"); | ||
DateTimeParser[] parsers = new DateTimeParser[formats.length]; | ||
|
@@ -192,4 +197,50 @@ public DateTimeField getField(Chronology chronology) { | |
return new OffsetDateTimeField(new DividedDateTimeField(new OffsetDateTimeField(chronology.monthOfYear(), -1), QuarterOfYear, 3), 1); | ||
} | ||
}; | ||
|
||
public static class EpochTimeParser implements DateTimeParser { | ||
|
||
private static final Pattern MILLI_SECOND_PRECISION_PATTERN = Pattern.compile("^\\d{1,13}$"); | ||
private static final Pattern SECOND_PRECISION_PATTERN = Pattern.compile("^\\d{1,10}$"); | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this might be busted by-design, JSON numerical values only have a float type, so "1.470417092e+09" (produced by Go's json encoder) will just result in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, pass the value as a json string. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah yeah I suppose that might be ok, in this case it's user-defined input so that's pretty awkward but it beats breaking. |
||
private final boolean hasMilliSecondPrecision; | ||
private final Pattern pattern; | ||
|
||
public EpochTimeParser(boolean hasMilliSecondPrecision) { | ||
this.hasMilliSecondPrecision = hasMilliSecondPrecision; | ||
this.pattern = hasMilliSecondPrecision ? MILLI_SECOND_PRECISION_PATTERN : SECOND_PRECISION_PATTERN; | ||
} | ||
|
||
@Override | ||
public int estimateParsedLength() { | ||
return hasMilliSecondPrecision ? 13 : 10; | ||
} | ||
|
||
@Override | ||
public int parseInto(DateTimeParserBucket bucket, String text, int position) { | ||
if (text.length() > estimateParsedLength() || | ||
// timestamps have to have UTC timezone | ||
bucket.getZone() != DateTimeZone.UTC || | ||
pattern.matcher(text).matches() == false) { | ||
return -1; | ||
} | ||
|
||
int factor = hasMilliSecondPrecision ? 1 : 1000; | ||
try { | ||
long millis = Long.valueOf(text) * factor; | ||
DateTime dt = new DateTime(millis, DateTimeZone.UTC); | ||
bucket.saveField(DateTimeFieldType.year(), dt.getYear()); | ||
bucket.saveField(DateTimeFieldType.monthOfYear(), dt.getMonthOfYear()); | ||
bucket.saveField(DateTimeFieldType.dayOfMonth(), dt.getDayOfMonth()); | ||
bucket.saveField(DateTimeFieldType.hourOfDay(), dt.getHourOfDay()); | ||
bucket.saveField(DateTimeFieldType.minuteOfHour(), dt.getMinuteOfHour()); | ||
bucket.saveField(DateTimeFieldType.secondOfMinute(), dt.getSecondOfMinute()); | ||
bucket.saveField(DateTimeFieldType.millisOfSecond(), dt.getMillisOfSecond()); | ||
bucket.setZone(DateTimeZone.UTC); | ||
} catch (Exception e) { | ||
return -1; | ||
} | ||
return text.length(); | ||
} | ||
}; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -102,7 +102,7 @@ public Query parse(QueryParseContext parseContext) throws IOException, QueryPars | |
} else if ("time_zone".equals(currentFieldName) || "timeZone".equals(currentFieldName)) { | ||
timeZone = DateTimeZone.forID(parser.text()); | ||
} else if ("format".equals(currentFieldName)) { | ||
forcedDateParser = new DateMathParser(Joda.forPattern(parser.text()), DateFieldMapper.Defaults.TIME_UNIT); | ||
forcedDateParser = new DateMathParser(Joda.forPattern(parser.text())); | ||
} else { | ||
throw new QueryParsingException(parseContext, "[range] query does not support [" + currentFieldName + "]"); | ||
} | ||
|
@@ -123,11 +123,6 @@ public Query parse(QueryParseContext parseContext) throws IOException, QueryPars | |
FieldMapper mapper = parseContext.fieldMapper(fieldName); | ||
if (mapper != null) { | ||
if (mapper instanceof DateFieldMapper) { | ||
if ((from instanceof Number || to instanceof Number) && timeZone != null) { | ||
throw new QueryParsingException(parseContext, | ||
"[range] time_zone when using ms since epoch format as it's UTC based can not be applied to [" + fieldName | ||
+ "]"); | ||
} | ||
query = ((DateFieldMapper) mapper).fieldType().rangeQuery(from, to, includeLower, includeUpper, timeZone, forcedDateParser, parseContext); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't we still have this? It doesn't really make sense to set seconds or milliseconds since epoch in anything but UTC? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the problem here is, that a date like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I actually found a solution to this in my latest commit and do check this in the parser now |
||
} else { | ||
if (timeZone != null) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is confusing if we support camelCase in some of the options in this parser and not others (even if they are new). We should either support camelCase for all options or for none to be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we plan to resolve this before 2.0? If so, I am fine leaving it as it is...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it will be resolved (rejecting camelCase) as its a huge change across 100s of files (because most parser use this style and not ParseField) and I don't see us doing that change quickly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it matters. We should not force making huge changes to the entire codebase in order to not add things which will just be deprecated and/or confusing to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think this is confusing. If I have a format specified as
yearMonthDay
that works then I would expect to be able to change it toepochSecond
and it would work. Supporting some values in camelCase for date formats and not other values is very confusing to a user. I'm all for removing camelCase but we should be consistent with it, especially when its different values for the same setting (in the case different values offormat
for date fields).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also be fine with removing all the camelCase options for all formats in this PR to make it consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already do not support camelCase for all settings, and I don't think there is any consistency even within the same query/field type/whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the kind of statement that stalls progress. Requiring huge changes just to make a small improvement should not be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am only talking about the date formats here, not across the whole codebase (i can see the above statement might have been a bit ambiguous on that). All the multi-word date format values above support both a camelCase and an underscored version. That should be consistent, whether that means supporting both for now or only supporting the underscored version I don't have a strong opinion but its hardly a huge change to update the date format values to be consistent and its not a huge overhead to maintain an extra 2 camelCase options given that any change to that policy would require a change to all the other date formats too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized we aren't even talking about setting names, but the valid values for the
format
setting. This argument to use ParseValue does not make sense. We don't support camelCase in eg theindex
option. We should not do it here, it will just add more work for users if we allow them to start using a new value that will just go away in the future (and will require them to change the value to what they would have found in the first place if they had tried using camelCase and seen an error).