Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Fix issue with timezone when paginating #52101

Merged
merged 5 commits into from
Feb 11, 2020
Merged

Conversation

matriv
Copy link
Contributor

@matriv matriv commented Feb 8, 2020

Previously, when the specified (or default) fetchSize led to
subsequent HTTP requests and the usage of cursors, those subsequent
were no longer using the client timezone specified in the initial
SQL query. As a consequence, Even though the query is executed once
(with the correct timezone) the processing of the query results by
the HitExtractors in the next pages was done using the default
timezone Z. This could lead to incorrect results.

Fix the issue by correctly using the initially specified timezone,
which is found in the deserialisation of the cursor string.

Fixes: #51258

Previously, when the specified (or default) fetchSize led to
subsequent HTTP requests and the usage of cursors, those subsequent
were no longer using the client timezone specified in the initial
SQL query. As a consequence, Even though the query is executed once
(with the correct timezone) the processing of the query results by
the HitExtractors in the next pages was done using the default
timezone `Z`. This could lead to incorrect results.

Fix the issue by correctly using the initially specified timezone,
which is found in the deserialisation of the cursor string.

Fixes: elastic#51258
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/SQL)

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 203 to 207
Map<String, Object> expected = new HashMap<>();
if (i == 0) {
expected.put("columns", singletonList(
columnInfo(mode, "tz", "integer", JDBCType.INTEGER, 11)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you move this part in the if (i==0) {} else {} above?

Properties connectionProperties = connectionProperties();
connectionProperties.put(JDBC_TIMEZONE, zoneId.toString());
try (Connection c = esJdbc(connectionProperties);
Statement s = c.createStatement()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this line on the above one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A(n appropriate) formatter should help in these style-adjusting cases.

"{\"query\":\"SELECT DATE_PART('TZOFFSET', date) AS tz FROM test_date_timezone ORDER BY date\","
+ "\"time_zone\":\"" + zoneId.getId() + "\", "
+ "\"mode\":\"" + mode + "\", "
+ "\"fetch_size\":2}";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's perfectly fine as is, but was curious about the reason - if any - for choosing a fetch_size of 2, vs. 1, which would simplify the test just a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fetch size 2 and odd number of rows ('5') makes the last page to return 1 row.

Copy link
Contributor

@bpintea bpintea Feb 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, there'll be pages with 1 and 2 rows as the j-based for makes it obvious.
But I was only curious to understand why is that desired in respect to what the test checks (i.e. all rows have the same timezone offset). Sorry if missing anything obvious. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't have to do with the bug fixed, just another safety testing that the fetch size behavior is correct.

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -80,13 +83,14 @@ public static void operation(PlanExecutor planExecutor, SqlQueryRequest request,
planExecutor.sql(cfg, request.query(), request.params(),
wrap(p -> listener.onResponse(createResponseWithSchema(request, p)), listener::onFailure));
} else {
planExecutor.nextPage(cfg, Cursors.decodeFromString(request.cursor()),
wrap(p -> listener.onResponse(createResponse(request, null, p)),
Tuple<Cursor, ZoneId> decoded = Cursors.decodeFromStringWithZone(request.cursor());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Cursors.decodeFromString used anywhere else ? It might be that decoding a tuple of timezone or adding it as a property in the Cursor class should be the default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning to do that in an upcoming PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ad discussed, will not pass the zoneId into the Cursor, as zoneId is the responsibility of SqlInput/OutputStreams to handle. Instead, I remove the decodeFromString and only use the decodeFromStringWithZone, so it's obvious for consumers that the zoneId is always decoded as well and returned as part of the returned Tuple.

@@ -68,7 +71,7 @@ protected void doExecute(Task task, SqlQueryRequest request, ActionListener<SqlQ
/**
* Actual implementation of the action. Statically available to support embedded mode.
*/
public static void operation(PlanExecutor planExecutor, SqlQueryRequest request, ActionListener<SqlQueryResponse> listener,
private static void operation(PlanExecutor planExecutor, SqlQueryRequest request, ActionListener<SqlQueryResponse> listener,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: why make it private ? I think this will make things a bit harded in debug project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, didn't think of that, Thx, I'll revert

@matriv matriv merged commit 8f7afbd into elastic:master Feb 11, 2020
@matriv matriv deleted the fix-51258 branch February 11, 2020 13:59
matriv added a commit that referenced this pull request Feb 11, 2020
Previously, when the specified (or default) fetchSize led to
subsequent HTTP requests and the usage of cursors, those subsequent
were no longer using the client timezone specified in the initial
SQL query. As a consequence, Even though the query is executed once
(with the correct timezone) the processing of the query results by
the HitExtractors in the next pages was done using the default
timezone Z. This could lead to incorrect results.

Fix the issue by correctly using the initially specified timezone,
which is found in the deserialisation of the cursor string.

Fixes: #51258
(cherry picked from commit 8f7afbd)
matriv added a commit that referenced this pull request Feb 11, 2020
Previously, when the specified (or default) fetchSize led to
subsequent HTTP requests and the usage of cursors, those subsequent
were no longer using the client timezone specified in the initial
SQL query. As a consequence, Even though the query is executed once
(with the correct timezone) the processing of the query results by
the HitExtractors in the next pages was done using the default
timezone Z. This could lead to incorrect results.

Fix the issue by correctly using the initially specified timezone,
which is found in the deserialisation of the cursor string.

Fixes: #51258
(cherry picked from commit 8f7afbd)
matriv added a commit that referenced this pull request Feb 11, 2020
Cherry pick and adapt tests to validate correct behaviour
regarding processing of results that involve the use of the client
configured timezone by the HitExtractors when paginating over
the results of the query (use of cursors).

(cherry picked from commit 8f7afbd)
@jakelandis jakelandis removed the v8.0.0 label Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL: GROUP BY with timezone and small page size returns duplicate values
6 participants