STRIDE-505 Fix handling of Oracle character fields #30
This gets worse the more I read. So if I understand correctly: Oracle interprets the n in varchar(n) as bytes or chars based on database/session configuration (https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Data-Types.html#GUID-0DC7FFAA-F03F-4448-8487-F2592496A510) and therefore recommends explicitly specifying which you want. Microsoft always treats it as bytes (https://docs.microsoft.com/en-us/sql/t-sql/data-types/char-and-varchar-transact-sql?view=sql-server-ver15#arguments). PostgreSQL always treats it as chars (https://www.postgresql.org/docs/9.6/datatype-character.html). So there are likely still problems if SQL Server is the destination, but at least we will have deterministic behavior on Oracle, and be closer to the standard.
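To make the byte-versus-character distinction above concrete, here is a minimal Java sketch. The class and method names are made up for illustration and are not part of the library; counting code points is only an approximation of what a database counts under character semantics, and the two encodings are stand-ins for whatever a given database is configured to use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the library under review.
class CharLengthDemo {
    // Roughly what "n CHAR" semantics count: Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // What "n BYTE" semantics count: bytes in the storage encoding.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String value = "Zo\u00eb"; // "Zoë", written with an escape to avoid source-encoding issues
        System.out.println(codePoints(value));                              // 3
        System.out.println(byteLength(value, StandardCharsets.ISO_8859_1)); // 3
        System.out.println(byteLength(value, StandardCharsets.UTF_8));      // 4
    }
}
```

The same three-character value needs three bytes in a single-byte encoding and four in UTF-8, so whether it fits in a column declared with length 3 depends on which unit the database counts.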
garricko
left a comment
I don't like the ambiguity of length, but I guess it reflects the reality (either byte or char depending on flavor).
@garricko Thanks for uncovering the MS SQL Server info. (Side note: I love the MS line "A common misconception is to think that CHAR(n) and VARCHAR(n), the n defines the number of characters." Maybe it's a common misconception because that's what the SQL standard specifies?) That actually puts a different light on what we are seeing with the ETLs. Previously I thought that the SQL Server VARCHAR(50) field was holding 50 characters. Now I realize that it is holding 50 bytes in some unknown encoding that turns into 50 characters of UTF-16 by the time JDBC delivers it to the client.
The way I read those docs it seems things should be ok going from mssql to oracle. However many characters fit in 50 bytes in mssql should come back as <= 50 characters of Java string (UTF-16), and should fit into oracle varchar2(50 char). It seems it should also fit into oracle varchar2(50 byte) unless there are characters in whatever codeset MS uses where the equivalent character in whatever codeset oracle uses has more bytes. Or unless the conversion into Java ends up non-normalized (in the Unicode sense). On second thought, re-reading what I just wrote, I guess I wouldn’t be shocked if some value didn’t fit.
That is precisely the situation that prompted this. The ETL copying LPCH Clarity on MS SQL to Oracle keeps hitting cases where the data returned from SQL Server doesn't fit in an Oracle field of the same length. |
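A rough Java sketch of how such a value can arise. It assumes, purely for illustration, that the SQL Server column stores a single-byte code page (Latin-1 stands in for it here) and that the Oracle database character set is UTF-8 based; neither is confirmed in this thread, and the class and helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; encodings are assumptions, not facts about the ETL.
class FitCheck {
    // A 50-byte source value: 49 ASCII letters plus one accented letter.
    static String sample() {
        byte[] source = new byte[50];
        Arrays.fill(source, (byte) 'a');
        source[49] = (byte) 0xE9; // 'é' in ISO-8859-1
        return new String(source, StandardCharsets.ISO_8859_1);
    }

    // Would the value fit a column limited to n bytes of UTF-8?
    static boolean fitsByteColumn(String s, int n) {
        return s.getBytes(StandardCharsets.UTF_8).length <= n;
    }

    // Would the value fit a column limited to n characters (approximated as code points)?
    static boolean fitsCharColumn(String s, int n) {
        return s.codePointCount(0, s.length()) <= n;
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println(s.length());            // 50
        System.out.println(fitsCharColumn(s, 50)); // true
        System.out.println(fitsByteColumn(s, 50)); // false: 49 + 2 = 51 bytes in UTF-8
    }
}
```

Under these assumptions a single accented letter is enough to push a full 50-byte source value past a 50-byte destination limit, while a 50-character limit still holds it.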
Oracle has a non-standard interpretation of character field lengths. The SQL standard and other implementations say that the length of a character field is in characters unless explicitly specified to be in bytes. Oracle does it the other way around (of course). This means that the library's Schema::addTableFromRow method, if the source row comes from a non-Oracle database and the destination is an Oracle database, can produce a table with columns that are not wide enough (when multi-byte characters are involved).
This change makes the Oracle flavor specify that character lengths are in characters.
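For readers unfamiliar with the Oracle syntax, the effect is roughly that generated column types carry an explicit `char` qualifier. The sketch below is illustrative only: `OracleTypeSketch`, `varcharType`, and `createTable` are hypothetical names and this is not the library's actual code.

```java
// Hypothetical sketch of emitting Oracle DDL with explicit character semantics.
class OracleTypeSketch {
    // Render a variable-length string column type whose length is counted in characters.
    static String varcharType(int length) {
        return "varchar2(" + length + " char)";
    }

    // Build a one-column CREATE TABLE statement using that type.
    static String createTable(String table, String column, int length) {
        return "create table " + table + " (" + column + " " + varcharType(length) + ")";
    }

    public static void main(String[] args) {
        // prints: create table demo (name varchar2(50 char))
        System.out.println(createTable("demo", "name", 50));
    }
}
```

With the qualifier spelled out, the column width no longer depends on the database or session length-semantics setting.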