Version: dabeb4b1 (master)
Description
Legacy Oracle databases using US7ASCII often store multi-byte characters (BIG5, GB2312) as raw bytes — a common practice known as "pass-through". OLR reads nls-character-set: US7ASCII from the schema and applies charset conversion, stripping the high bit from every byte >= 0x80, which destroys the original data.
A config option to skip charset conversion and emit raw bytes as-is would solve this.
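One possible shape for such an option, sketched in the same style as the schema key the report quotes (the key name charset-pass-through is invented here for illustration; no such option exists today):

```yaml
# Schema configuration (sketch; charset-pass-through is a hypothetical key)
nls-character-set: US7ASCII
charset-pass-through: true   # emit column bytes as-is, skipping conversion
```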
Steps to reproduce
- Oracle XE 21c with NLS_CHARACTERSET = US7ASCII
- Insert Big5-encoded Chinese characters as raw bytes:
CREATE TABLE TEST_MULTIBYTE (id NUMBER PRIMARY KEY, name VARCHAR2(200));
ALTER TABLE TEST_MULTIBYTE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
-- Big5: 台北 = A578 A55F
INSERT INTO TEST_MULTIBYTE VALUES (1, UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('A578A55F')));
COMMIT;
- Capture redo logs and run OLR in batch mode
Expected result
Raw bytes preserved in JSON output: "NAME" contains bytes A5 78 A5 5F.
Actual result
OLR strips the high bit (& 0x7F) from every byte >= 0x80:
Big5 input: A5 78 A5 5F (台北)
OLR output: 25 78 25 5F (%x%_)
{"after":{"ID":1,"NAME":"%x%_"}}
The original Big5 data is unrecoverable from OLR's output.
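The lossy step can be reproduced outside OLR. The sketch below (assumed behavior, not OLR's actual code) applies the same `& 0x7F` mask to the Big5 bytes from the repro and arrives at the exact output shown above:

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a US7ASCII 'conversion' would."""
    return bytes(b & 0x7F for b in data)

big5 = bytes.fromhex("A578A55F")   # Big5 encoding of 台北
mangled = strip_high_bit(big5)
print(mangled.hex().upper())       # 2578255F
print(mangled.decode("ascii"))     # %x%_
```

Since `0xA5 & 0x7F = 0x25` ('%'), two distinct input bytes can collapse to the same output byte, which is why the mapping is not invertible and the original data cannot be recovered.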