SQL: Respect custom time zone from client in CAST #76638

Closed
wants to merge 7 commits into from

@Luegg Luegg commented Aug 18, 2021

Resolves #40692

Makes sure that CASTs use the time zone specified in the corresponding REST/JDBC options when converting strings to datetime, if the string itself does not specify a timezone. Other conversions to datetime (e.g. from long) still use UTC.

This change breaks some queries that convert strings to datetime when the string does not include a timezone. For example, the following query will retrieve a varying number of records depending on the timezone specified in the connection: SELECT hire_date FROM test_emp WHERE hire_date >= '1997-05-19T01:00:00.000'::DATETIME

The following queries are not affected:

SELECT hire_date FROM test_emp WHERE hire_date >= '1997-05-19T01:00:00.000' // timezone is applied in the comparison
SELECT hire_date FROM test_emp WHERE hire_date >= '1997-05-19T01:00:00.000Z'::DATETIME // timezone specified in the string
SELECT hire_date FROM test_emp WHERE hire_date = '1997-05-19||/y' // timezone is applied in the comparison
SELECT hire_date FROM test_emp WHERE hire_date >= 123456789::DATETIME // still uses UTC
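The rule described above can be sketched in plain `java.time` terms. This is only an illustration, not the Elasticsearch implementation: the class `CastZoneSketch` and both helper names are hypothetical, and the ISO formatter is assumed to be a close enough stand-in for the SQL datetime parser.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.time.temporal.TemporalQueries;

public final class CastZoneSketch {

    // Hypothetical helper: string -> datetime, using the session zone only
    // when the string carries no zone or offset of its own.
    public static ZonedDateTime fromString(String text, ZoneId sessionZone) {
        TemporalAccessor parsed = DateTimeFormatter.ISO_DATE_TIME.parse(text);
        ZoneId explicit = parsed.query(TemporalQueries.zone()); // null if absent
        return LocalDateTime.from(parsed).atZone(explicit != null ? explicit : sessionZone);
    }

    // Hypothetical helper: long -> datetime, always interpreted in UTC.
    public static ZonedDateTime fromEpochMillis(long millis) {
        return ZonedDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        ZoneId session = ZoneId.of("-05:00");
        // No zone in the string: the session zone decides the instant.
        System.out.println(fromString("1997-05-19T01:00:00.000", session).toInstant());
        // Zone in the string: the session zone is ignored.
        System.out.println(fromString("1997-05-19T01:00:00.000Z", session).toInstant());
        // Numeric input: UTC regardless of the session zone.
        System.out.println(fromEpochMillis(123456789L));
    }
}
```

With a session zone of `-05:00`, the zone-less string resolves to 06:00 UTC, while the string ending in `Z` stays at 01:00 UTC. That difference is why the first query's result set can change with the connection's timezone.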

@Luegg Luegg force-pushed the fix/defaultTimezoneInCasts branch 4 times, most recently from da174cd to 1741835 Compare August 18, 2021 14:23
@Luegg Luegg marked this pull request as ready for review August 18, 2021 14:39
@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Aug 18, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-ql (Team:QL)

* Converts `value` to the target type considering `zoneId` if needed.
* `zoneId` can be null and it is ignored for all non-timezone aware conversions.
*/
Object convert(Object value, ZoneId zoneId);
Contributor Author

I've also investigated whether the zoneId could be stored in the converter, but this would be a much more invasive change.

Contributor

Indeed, adding a specific method for one specific data type doesn't feel ideal, but still better than instantiating the converter per session.
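The trade-off discussed here can be sketched as follows. This is a hedged illustration only: `ConverterSketch`, `Converter` and `Conversion` are hypothetical names, and the `default` method body is an assumption of the sketch (the snippet under review only shows the signature `Object convert(Object value, ZoneId zoneId)`). The idea is that shared, stateless converters receive the zone per call instead of being instantiated per session.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public final class ConverterSketch {

    // Hypothetical interface; the zone-aware overload mirrors the signature
    // quoted above, the default body is an assumption of this sketch.
    public interface Converter {
        Object convert(Object value);

        default Object convert(Object value, ZoneId zoneId) {
            return convert(value); // zone ignored by non-timezone-aware conversions
        }
    }

    // Stateless singletons: nothing session-specific is stored in them.
    public enum Conversion implements Converter {
        STRING_TO_INT {
            @Override
            public Object convert(Object value) {
                return Integer.valueOf((String) value);
            }
        },
        STRING_TO_DATETIME {
            @Override
            public Object convert(Object value) {
                return convert(value, null);
            }

            @Override
            public Object convert(Object value, ZoneId zoneId) {
                // A null zone falls back to UTC.
                ZoneId zone = zoneId != null ? zoneId : ZoneOffset.UTC;
                // Only zone-less strings are handled here, to keep the sketch short.
                return LocalDateTime.parse((String) value).atZone(zone);
            }
        }
    }
}
```

Only the datetime conversion overrides the zone-aware method; every other conversion inherits the behaviour of ignoring the zone.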

@Luegg Luegg force-pushed the fix/defaultTimezoneInCasts branch from 1741835 to 6f001ce Compare August 18, 2021 14:57
Contributor

@bpintea bpintea left a comment

Lg, only left minor observations.

Comment on lines 60 to 62
* Parses the given string into a DateTime using `defaultZone` as a default timezone.
*/
- public static ZonedDateTime asDateTime(String dateFormat) {
+ public static ZonedDateTime asDateTime(String dateFormat, ZoneId defaultZone) {
Contributor

Why keep the "default" in defaultZone? This would be the user-specified TZ (or null/default), right?

@@ -1173,52 +1170,6 @@ protected LogicalPlan rule(OrderBy ob) {
}
}

private static class ImplicitCasting extends AnalyzerRule<LogicalPlan> {
Contributor

There's a commented instantiation of this rule (L124). Not sure why this was kept, but I think commented/dead code isn't typically kept around, so maybe remove that line too.

Contributor Author

👍 I didn't think about searching for the class. Thanks!

@@ -52,6 +54,8 @@

public class SqlDataTypeConverterTests extends ESTestCase {

private static final ZoneId MINUS_5 = ZoneId.of("-05:00");
Contributor

nit:

Suggested change
- private static final ZoneId MINUS_5 = ZoneId.of("-05:00");
+ private static final ZoneId MINUS_5H = ZoneId.of("-05:00");

;

castStringWithoutTZToDatetimeInScript-TZ[Etc/GMT-1]
SELECT hire_date::string,
Contributor

Supernit: the rest of the tests use capitalised ::STRING, but I was actually wondering if the conversion is [always] necessary?

Contributor Author

👍 for the capitalization.

But yes, as soon as you get timestamps in a timezone other than UTC, the JDBC issue mentioned in #40779 surfaces.

@@ -43,9 +43,7 @@ public CsvSpecTestCase(String fileName, String groupName, String testName, Integ

@Override
protected final void doTest() throws Throwable {
// Run the time tests always in UTC
// TODO: https://github.com/elastic/elasticsearch/issues/40779
Contributor

If we don't (yet) randomise the timezones for the QA tests ref'd in this comment (which I think is the case), maybe we should keep the comment. (BTW, the file ref'd in the issue contains a similar comment.)

Contributor Author

Maybe I got this wrong, but when the comment was added, time.csv-spec was special-cased (see https://github.com/elastic/elasticsearch/pull/40776/files#diff-387c1ea94dd5d111032f384770a8244f7d4942ac1fd91a68b397614212d59f04R44). The special case has been removed in the meantime, so I think the comment is no longer accurate. Or the reality is now more complicated, as described in https://github.com/elastic/elasticsearch/pull/76638/files#diff-aceee053cb68e1bb219cfe4256635c7f77d86b46eb943e5ad42bc7bb360ca945R125

@@ -33,6 +35,8 @@

public class DataTypeConversionTests extends ESTestCase {

public static final ZoneId MINUS_5 = ZoneId.of("-05:00");
Contributor

suggestion: MINUS_5H.

Contributor Author

@Luegg Luegg Aug 19, 2021

Thx, that's nicer indeed 👍

return new ScriptTemplate(
formatTemplate(format("{sql}.", "cast({},{})", fieldAsScript.template())),

if (DataTypes.isDateTime(dataType)) {
Contributor Author

I'm not entirely sure what's better here: passing the timezone only for datetime conversions results in a smaller diff, and the resulting scripts are not unnecessarily large. But it is somewhat inconsistent and might become a problem when more conversions start depending on the timezone.

Not sure whether anyone has a stronger opinion on this?

Contributor

@astefan astefan left a comment

Looks good in general, but to me it's unclear what exactly the original bug report covers and what this PR is addressing.
For example, what happens with CASTs applied on fields? All examples and tests I see refer to date literals only. Also, more complex queries (e.g. a GROUP BY on a CASTed string) - SELECT CAST(CAST(birth_date AS DATETIME) AS STRING) t FROM test_emp WHERE YEAR(birth_date)=2021 GROUP BY t - are not covered.

@@ -51,10 +52,16 @@ public static ZonedDateTime asDateTime(long millis) {
return ZonedDateTime.ofInstant(Instant.ofEpochMilli(millis), UTC);
Contributor

This method can simply call the new asDateTime method below.
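A minimal sketch of the suggested delegation, assuming the new overload takes the epoch millis plus a `ZoneId` (the diff excerpt here does not show its actual signature). `DateUtilsSketch` and the two-argument overload are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public final class DateUtilsSketch {

    public static final ZoneId UTC = ZoneOffset.UTC;

    // The existing single-argument method keeps its UTC behaviour by delegating.
    public static ZonedDateTime asDateTime(long millis) {
        return asDateTime(millis, UTC);
    }

    // Assumed shape of the new overload: same conversion, caller-supplied zone.
    public static ZonedDateTime asDateTime(long millis, ZoneId zoneId) {
        return ZonedDateTime.ofInstant(Instant.ofEpochMilli(millis), zoneId);
    }
}
```

Delegating keeps the conversion logic in one place, so the UTC default and the zone-aware path cannot drift apart.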

}

Converter converter() {
return conversion;
}

ZoneId zoneId() { return zoneId; }
Contributor

Please follow the style of the other methods: a multi-line method rather than a single-liner.

@Luegg Luegg force-pushed the fix/defaultTimezoneInCasts branch from f348f35 to 1d03cb0 Compare August 25, 2021 08:26
@martijnvg martijnvg added v7.14.2 and removed v7.14.1 labels Aug 26, 2021
@Luegg Luegg marked this pull request as draft August 30, 2021 11:21
@Luegg
Contributor Author

Luegg commented Sep 8, 2021

Note to myself: Consider reusing DATETIME_PARSE (https://www.elastic.co/guide/en/elasticsearch/reference/master/sql-functions-datetime.html#sql-functions-datetime-datetimeparse) to avoid bwc issues.

Labels
:Analytics/SQL SQL querying >bug Team:QL (Deprecated) Meta label for query languages team v7.14.3 v8.1.0
Development

Successfully merging this pull request may close these issues.

SQL: Custom timezone is not used in CAST/CONVERT
10 participants