Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix querying TIMESTAMP WITH LOCAL TIME ZONE for oracle #44456

Merged
merged 20 commits into from
Jun 25, 2024

Conversation

calherries
Copy link
Contributor

@calherries calherries commented Jun 20, 2024

Fixes #42325

The solution is actually in QP driver code, not sync. The clue is in this comment, which shows the values don't get converted by the driver code correctly as they come out of the database.

Previously, for a query that selects a TIMESTAMP WITH LOCAL TIME ZONE value from an Oracle DB, the value would have type oracle.sql.TIMESTAMPLTZ in the results of the query processor. Now it has a type of java.sql.OffsetDateTime, as it would with postgres and mysql.

Fingerprinting uses the query processor to select values from the table. The :type/DateTime fingerprinter expects that :type/DateTime values correspond to a java.time.temporal/Temporal object (code).

Copy link

replay-io bot commented Jun 20, 2024

Status Complete ↗︎
Commit 3200ceb
Results
⚠️ 1 Flaky
2687 Passed

@calherries calherries added the backport Automatically create PR on current release branch on merge label Jun 20, 2024
@calherries calherries changed the title Fix querying of oracle timestamps with local time zones Fix querying TIMESTAMP WITH LOCAL TIME ZONE for oracle Jun 20, 2024
@calherries calherries requested a review from a team June 20, 2024 02:41
Copy link
Contributor

@lbrdnk lbrdnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding implementation of TIMESTAMPLTZ 👍. I'm not sure however about the zone offset to 0 conversion.

Comment on lines 617 to 615
(.offsetDateTimeValue t conn)))))
(t/offset-date-time (.timestampValue t (rs->conn rs)) (t/zone-offset 0)))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the old implementation returned OffsetDateTime with the actual offset. Here is see conversion to 0. Why the change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are loosing information on actual zone offset/zone as stored in the database here. But I believe that may not be a problem as we do format results in format-rows middleware.

Copy link
Contributor Author

@calherries calherries Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the old implementation returned OffsetDateTime with the actual offset

Yes.

These are good questions, and I'm looking for your input to confirm. The offset was untested for Oracle. I decided that read-column-thunk should return an OffsetDateTime in UTC because this is what all the other drivers do in metabase.query-processor-test.timezones-test/set-timezone-drivers, which includes Postgres and MySQL. I don't know why the offset matters, I just did it for consistency.

@camsaul can you confirm the desired behaviour here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this behavior seems pretty questionable. .offsetDateTimeValue() seems way saner, what's the reason for changing this?

According to

https://docs.oracle.com/en/database/oracle/oracle-database/19/nlspg/datetime-data-types-and-time-zone-support.html#GUID-3F1C388E-C651-43D5-ADBC-1A49E5C2CA05

TIMESTAMP WITH LOCAL TIME ZONE is another variant of TIMESTAMP. It differs from TIMESTAMP WITH TIME ZONE as follows: data stored in the database is normalized to the database time zone, and the time zone offset is not stored as part of the column data. When users retrieve the data, Oracle Database returns it in the users' local session time zone. The time zone offset is the difference (in hours and minutes) between local time and UTC (Coordinated Universal Time, formerly Greenwich Mean Time).

We set the session timezone to whatever the report-timezone setting is, which means TIMESTAMPLTZ values should come back in the report-timezone.

According to

https://docs.oracle.com/en/database/oracle/oracle-database/21/jajdb/oracle/sql/TIMESTAMPLTZ.html#timestampValue(java.sql.Connection)

.toTimestamp()

Calls toTimestamp to convert internal Oracle TIMESTAMPLTZ to a Java Timestamp.

JavaTimestamps can have an attached java.util.Calendar specifying the time zone, in practice this doesn't seem to be done consistently so we can't really be sure whether the timestamp is coming back in UTC or the session timezone and has time zone information or not

Copy link
Contributor Author

@calherries calherries Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reason for changing this?

I mentioned above that I made the change so that Oracle was consistent with other drivers' behaviour. That's the only reason: consistency. I'm happy that you're questioning it, and I'm hoping we can write a test that enforces the desired behaviour across all drivers. Oracle was the only driver failing this test before the change:

(deftest sql-datetime-timezone-handling-test
(mt/test-drivers (filter #(isa? driver/hierarchy % :sql) (conj (set-timezone-drivers) :oracle)) ; Oracle doesn't have a time type, so it's excluded from this test
(mt/dataset dt-attempted-murders
(doseq [timezone ["UTC" "US/Pacific" "US/Eastern" "Asia/Hong_Kong"]]
(mt/with-temporary-setting-values [report-timezone timezone]
(let [expected {:datetime_ltz (t/offset-date-time #t "2019-11-01T00:23:18.331-07:00" "UTC")
:datetime_tz (t/offset-date-time #t "2019-11-01T00:23:18.331-07:00" "UTC")}
actual (select-keys (dt-attempts) (keys expected))]
(is (= expected actual))))))))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, seems like we can use timestampValue to get around this issue for now. wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok given that .offsetDateTimeValue() seems to be broken here I think maybe just calling stringValue and parsing it isn't the worst idea in the world.

This gives us the actual correct value and we're not at the whim of broken implementations of offsetDateTimeValue() or timestampValue() or whatever.

If the Oracle JDBC people can't get offsetDateTimeValue() right I don't think we should trust the other stuff either. String however is probably pretty hard to get wrong

(defn x []
  (mt/with-driver :oracle
    (sql-jdbc.execute/do-with-connection-with-options
     :oracle
     (mt/db)
     {:session-timezone "US/Pacific"}
     (fn [^com.mchange.v2.c3p0.C3P0ProxyConnection proxy-conn]
       (let [^oracle.jdbc.OracleConnection conn (.unwrap proxy-conn oracle.jdbc.OracleConnection)
             t (oracle.sql.TIMESTAMPLTZ. conn "2024-06-20 17:28:00 -07:00")]
         (u.date/parse (.stringValue t conn)))))))
;; => #t "2024-06-20T17:28-07:00[US/Pacific]"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning a ZonedDateTime here is pretty nice so maybe we should actually do this for TIMESTAMPTZ as well

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to keep information if we have it

Copy link
Contributor Author

@calherries calherries Jun 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool I'm on board with parsing strings. I chose to use .getNString instead of .stringValue because it means we can avoid unwrapping the connection, and I can't imagine why it would be less reliable.

(u.date/parse (.getNString rs i))))

(mt/test-drivers (filter #(isa? driver/hierarchy % :sql) (conj (set-timezone-drivers) :oracle)) ; Oracle doesn't have a time type, so it's excluded from this test
(mt/dataset dt-attempted-murders
(doseq [timezone ["UTC" "US/Pacific" "US/Eastern" "Asia/Hong_Kong"]]
(mt/with-temporary-setting-values [report-timezone timezone]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why to set report timezone here if :format-rows? is set to false? My understanding is that report timezone is set as session timezone, results for ltz are returned in that, but zone offset is converted to 0 in read-column-thunk. Hence actual query results are unaffected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning: :format-rows? is set to false in sql-time-timezone-handling-test. So sql-time-timezone-handling-test is testing a property of the query processor before row formatting is applied. So we should do the same thing here with :datetime_ltz.

It's potentially flimsy reasoning and worth questioning, and I'm openly admitting I don't understand all the guarantees the QP is meant to make. They're not all enforced by tests currently.

But I believe that because sql-time-timezone-handling-test tests that values before row formatting have offsets of 0 independent of report-timezone, we should do the same in this test for consistency at least.

@darksciencebase darksciencebase requested a review from a team June 20, 2024 13:10
.clj-kondo/config.edn Outdated Show resolved Hide resolved
.clj-kondo/config.edn Outdated Show resolved Hide resolved
@calherries
Copy link
Contributor Author

@camsaul can you take another look, I've removed the test across all drivers because it was proving a bit hairy and I'd like to get the fix in first. Now the PR just contains a couple of tests to cover oracle specifically.

I'll continue with that in another PR: #44581

Comment on lines +491 to +495
do-with-temp-tz-format (fn [thunk]
;; verify that the value is independent of NLS_TIMESTAMP_TZ_FORMAT
(jdbc/execute! (spec) "ALTER SESSION SET NLS_TIMESTAMP_TZ_FORMAT = 'YYYY-MM-DD HH:MI'")
(thunk)
(jdbc/execute! (spec) (format "ALTER SESSION SET NLS_TIMESTAMP_TZ_FORMAT = '%s'" old-format)))]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a little concerned that parsing the timestamp as a string would be dependent on this setting, but it doesn't seem to be the case. I've added this to the test in case that changes in the future. Here are the docs for the setting.

Copy link
Member

@camsaul camsaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool everything looks good now

@calherries calherries merged commit a57718b into master Jun 25, 2024
110 checks passed
@calherries calherries deleted the fix-oracle-timestamp-with-local-zone branch June 25, 2024 02:16
github-automation-metabase added a commit that referenced this pull request Jun 25, 2024
…4655)

Co-authored-by: Cal Herries <39073188+calherries@users.noreply.github.com>
Co-authored-by: adam-james <21064735+adam-james-v@users.noreply.github.com>
@calherries
Copy link
Contributor Author

@metabase-bot backport release-x.49.x

metabase-bot bot added a commit that referenced this pull request Jun 25, 2024
…4706)

Co-authored-by: Cal Herries <39073188+calherries@users.noreply.github.com>
@calherries calherries added this to the 0.49.19 milestone Jun 26, 2024
@piranha piranha removed this from the 0.49.19 milestone Jul 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Automatically create PR on current release branch on merge .Team/BackendComponents also known as BEC
Projects
None yet
4 participants