Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Make parsing of date more lenient #52137

Merged
merged 6 commits into from Feb 10, 2020
Merged

Conversation

matriv
Copy link
Contributor

@matriv matriv commented Feb 10, 2020

Make the parsing of date more lenient

  • as an escaped literal: {d '2020-02-10[[T| ]10:20[:30][.123456789][tz]]'}
  • cast a string to a date: CAST(2020-02-10[[T| ]10:20[:30][.123456789][tz]]' AS DATE)

Closes: #49379

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/SQL)

Make the parsing of date more lenient
 - as an escaped literal: `{d '2020-02-10[[T| ]10:20[:30][.123456789][tz]]'}`
 - cast a string to a date: `CAST(2020-02-10[[T| ]10:20[:30][.123456789][tz]]' AS DATE)`

Closes: elastic#49379
@matriv
Copy link
Contributor Author

matriv commented Feb 10, 2020

I'm not sure about the escaped literal enhancement, since MS-SQL docs only mention the yyyy-mm-dd format: https://docs.microsoft.com/en-us/sql/connect/jdbc/using-sql-escape-sequences?view=sql-server-ver15#date-and-time-literals

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// double check back and forth conversion
assertEquals(date(1581292800000L), conversion.convert("2020-02-10 10:20"));
assertEquals(date(-125908819200000L), conversion.convert("-2020-02-10 10:20:30.123"));
assertEquals(date(1581292800000L), conversion.convert("2020-02-10 10:20:30.123456789"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sake of completeness, I'd suggest a test without the duration separator, but with a timezone (or a Z).

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left some further small comments.

@@ -42,6 +42,30 @@
.appendLiteral('T')
.append(ISO_LOCAL_TIME)
.toFormatter().withZone(UTC);
private static final DateTimeFormatter DATE_ESCAPED_LITERAL_FORMATTER_WHITESPACE = new DateTimeFormatterBuilder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name of the formatters could be more descriptive? The fact that time can be added, but it's optional, imo should be reflected in the formatter's name. Also, not to abuse the length of the names, how about DATE_OPTIONAL_TIME_FORMATTER_WHITESPACE? (ie get rid of ESCAPED_LITERAL keep the FORMATTER and mention the whitespace/'T') Same idea could be applied to the rest of the formatters: `DATE_OPTIONAL_OPTIONAL_TIME_FORMATTER_T, ISO_DATE_OPTIONAL_TIME_FORMATTER_WHITESPACE, ISO_DATE_OPTIONAL_TIME_FORMATTER_T.....

return LocalDate.parse(dateFormat, ISO_LOCAL_DATE).atStartOfDay(UTC);
int separatorIdx = dateFormat.indexOf('-');
if (separatorIdx == 0) { // negative year
separatorIdx = dateFormat.indexOf('-', separatorIdx + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since separatorIdx == 0 then the indexOf call should be indexOf('-', 1) right?

}
separatorIdx = dateFormat.indexOf('-', separatorIdx + 1) + 3;
// Avoid index out of bounds - it will lead to DateTimeParseException anyways
if (separatorIdx >= dateFormat.length() || dateFormat.charAt(separatorIdx) == 'T') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which situation is the > valid here? (I refer to separatorIdx >= dateFormat.length()) Isn't == enough to detect a date-only string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 lines above we do +3 to move directly after the day part in the string.
If the string passed is something like: 2020-12- then we have moved passed the string's length and dateFormat.charAt() will throw OutOfBounds exception.

I forgot to add a test for that, will do.

@matriv matriv merged commit 5863b27 into elastic:master Feb 10, 2020
@matriv matriv deleted the fix-49379 branch February 10, 2020 20:45
matriv added a commit that referenced this pull request Feb 10, 2020
Make the parsing of date more lenient

- as an escaped literal: `{d '2020-02-10[[T| ]10:20[:30][.123456789][tz]]'}`
- cast a string to a date: `CAST(2020-02-10[[T| ]10:20[:30][.123456789][tz]]' AS DATE)`

Closes: #49379
(cherry picked from commit 5863b27)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL: implicit CASTing between DATEs and DATETIMEs
6 participants