Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sink): switch remote sink chrono types to java.time.* #14413

Merged
merged 10 commits into from
Jan 15, 2024

Conversation

xuefengze
Copy link
Contributor

@xuefengze xuefengze commented Jan 8, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

resolve #13684 and fix #14198

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

cassandra-sink:
Before:
timestamp is allowed to sink into timestamp type in cassandra.

After:
timestamp cannot be sinked and has to be converted to timestamptz (via AT TIME ZONE) inside RisingWave before sinking out.

@xuefengze xuefengze marked this pull request as ready for review January 9, 2024 02:25
@wenym1 wenym1 requested review from StrikeW and xxhZs January 9, 2024 08:40
src/jni_core/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@xiangjinwu xiangjinwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

Also spotted some other references to the legacy classes:

  • connector-node/risingwave-connector-service/src/main/java/com/risingwave/connector/JsonDeserializer.java
  • connector-node/risingwave-sink-mock-flink/risingwave-sink-mock-flink-runtime/src/main/java/com/risingwave/mock/flink/runtime/RowDataImpl.java
  • connector-node/risingwave-sink-deltalake/src/main/java/com/risingwave/connector/DeltaLakeSink.java
  • connector-node/risingwave-source-cdc/src/main/java/com/risingwave/connector/cdc/debezium/converters/DatetimeTypeConverter.java

After cleaning these up, is it possible to add a static analysis rule to disallow usage of them? We have this in rust:

disallowed-methods = [
{ path = "std::iter::Iterator::zip", reason = "Please use `zip_eq_fast` if it's available. Otherwise use `zip_eq_debug`" },

src/jni_core/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines 145 to 146
case TIMESTAMP:
return ((LocalDateTime) value).toInstant(ZoneOffset.UTC);
Copy link
Contributor

@xiangjinwu xiangjinwu Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://cassandra.apache.org/doc/stable/cassandra/cql/types.html#timestamps

Similar to ClickHouse, there is no naive datetime type in Cassandra. And when interpreting one without timezone:

The time zone may be omitted if desired ('2011-02-03 04:05:00'), and if so, the date will be interpreted as being in the time zone under which the coordinating Cassandra node is configured. There are however difficulties inherent in relying on the time zone configuration being as expected, so it is recommended that the time zone always be specified for timestamps when feasible.

To be fully correct, we either need to use the Cassandra coordinator node time zone (rather than hardcoding to UTC here), or reject sinking and require user to AT TIME ZONE 'xxx' to get a timestamptz before sink.

Copy link
Contributor

@StrikeW StrikeW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding e2e test to cover the case of #14198 in e2e_test/sink/remote/jdbc.load.slt?

@xuefengze
Copy link
Contributor Author

xuefengze commented Jan 11, 2024

mysql-sink: When insert 2038-01-19 03:14:07.499999Z to mysql timestamp type, will get error:

message Data truncation: Incorrect datetime value: '2038-01-19 11:14:07.499999' (my local timezone +8)

It appears that mysql jdbc will convert it to the default timezone? Maybe we can cast offsetdatetime to localdatetime before write to solve this issue?
cc @xiangjinwu

https://github.com/mysql/mysql-connector-j/blob/release/8.x/src/main/protocol-impl/java/com/mysql/cj/protocol/a/OffsetDateTimeValueEncoder.java#L72-L83

@xiangjinwu
Copy link
Contributor

xiangjinwu commented Jan 11, 2024

my local timezone +8

It is better to set the session/connection timezone to UTC when interacting with MySQL (and other systems as well). Local times are problematic and shall be avoided whenever possible.

https://dev.mysql.com/doc/connector-j/en/connector-j-usagenotes-known-issues-limitations.html

When Connector/J retrieves timestamps for a daylight saving time (DST) switch day using the getTimeStamp() method on the result set, some of the returned values might be wrong. In order to avoid such errors, we recommend setting a connection time zone that uses a monotonic clock by, for example, setting connectionTimeZone=UTC, and configuring other date-time connection properties according to your needs; see Section 6.6, “Handling of Date-Time Values” for details.

Copy link
Contributor

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM! Thanks for the PR

@@ -3,17 +3,17 @@
query I
select * from t_remote_0 order by id;
----
1 Alex 28208 281620391 4986480304337356659 28162.0391 2.03 28162.0391 2023-03-20 10:18:30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to change from a normal datetime to a strange one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding e2e test to cover the case of #14198 in e2e_test/sink/remote/jdbc.load.slt?

cover the case of #14198.

Comment on lines +149 to +152
private static LocalDate castDate(Object value) {
try {
Long days = castLong(value) - 1;
return Date.valueOf(LocalDate.of(1, 1, 1).plusDays(days));
Long days = castLong(value);
return LocalDate.ofEpochDay(days);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • old: 1 -> 0001-01-01
  • new: 1 -> 1970-01-02

Look like related to #13486. Was the fix incomplete until this change also merged?

Copy link
Contributor Author

@xuefengze xuefengze Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the fix incomplete until this change also merged?

No.
It seems that there are no remote sinks using JsonDeserializer currently? So I change the value to the number of days since epoch.

Comment on lines -92 to +94
values = ((Timestamp) row.get(i)).getTime();
} else if (row.get(i) instanceof java.sql.Date) {
if (row.get(i) instanceof LocalDateTime) {
values =
ChronoUnit.DAYS.between(
LocalDate.ofEpochDay(0),
((java.sql.Date) row.get(i)).toLocalDate());
((LocalDateTime) row.get(i))
.toInstant(ZoneOffset.UTC)
.toEpochMilli();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @xxhZs regarding #13420 #13677 I am not sure whether we need to test for all 4 LocalDate/LocalTime/LocalDateTime/OffsetDateTime out of Object ...

https://github.com/delta-io/delta/blob/master/PROTOCOL.md#timestamp-without-timezone-timestampntz
Also note that:

  • RIsingWave timestamp is DeltaLake TimestampNtz (isAdjustedToUTC=false)
  • RisingWave timestamptz is DeltaLake Timestamp (isAdjustedToUTC=true)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Co-authored-by: xiangjinwu <17769960+xiangjinwu@users.noreply.github.com>
@xiangjinwu
Copy link
Contributor

xiangjinwu commented Jan 12, 2024

After cleaning these up, is it possible to add a static analysis rule to disallow usage of them?

I figured this out: (Still possible to use Date/Timestamp from java.sql.* until we stop suppressing AvoidStarImport.)

diff --git a/java/tools/maven/checkstyle.xml b/java/tools/maven/checkstyle.xml
index 4bd0d510e0..f681633272 100644
--- a/java/tools/maven/checkstyle.xml
+++ b/java/tools/maven/checkstyle.xml
@@ -191,10 +191,24 @@ This file is based on the checkstyle file of Apache Beam.
             <property name="illegalPkgs" value="io.netty"/>
             <message key="import.illegal" value="{0}; Use flink-shaded-netty instead."/>
         </module>
+        <!--
         <module name="IllegalImport">
             <property name="illegalPkgs" value="com.google.common"/>
             <message key="import.illegal" value="{0}; Use flink-shaded-guava instead."/>
         </module>
+        -->
+        <module name="IllegalImport">
+            <property name="illegalClasses" value="java.util.Date, java.sql.Date, java.sql.Timestamp, java.sql.Time"/>
+            <message key="import.illegal" value="{0}; Use java.time.* instead."/>
+        </module>
+        <module name="IllegalImport">
+            <property name="illegalClasses" value="java.text.DateFormat, java.text.SimpleDateFormat"/>
+            <message key="import.illegal" value="{0}; Use java.time.* instead."/>
+        </module>
+        <module name="IllegalImport">
+            <property name="illegalClasses" value="java.util.Calendar, java.util.GregorianCalendar"/>
+            <message key="import.illegal" value="{0}; Use java.time.* instead."/>
+        </module>
 
         <module name="RedundantModifier">
             <!-- Checks for redundant modifiers on various symbol definitions.
diff --git a/java/tools/maven/suppressions.xml b/java/tools/maven/suppressions.xml
index 5e19f59eae..11b4774359 100644
--- a/java/tools/maven/suppressions.xml
+++ b/java/tools/maven/suppressions.xml
@@ -30,5 +30,4 @@ under the License.
 
     <suppress files=".*" checks="javadoc"/>
     <suppress files=".*" checks="AvoidStarImport"/>
-    <suppress files=".*" checks="IllegalImport"/>
 </suppressions>
\ No newline at end of file

@xuefengze xuefengze added this pull request to the merge queue Jan 15, 2024
Merged via the queue into risingwavelabs:main with commit f6892e9 Jan 15, 2024
37 of 38 checks passed
@xuefengze xuefengze deleted the replace_java_time branch January 15, 2024 06:48
Little-Wallace pushed a commit that referenced this pull request Jan 20, 2024
Co-authored-by: xiangjinwu <17769960+xiangjinwu@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

postgres-sink data is incorrect for the date/time type Chrono types in remote sinks
4 participants