Thin mode should support DB_NCHARSET 'UTF8' #16

damarvin · 2022-06-11T15:43:18Z

Connecting with encoding='UTF8' (because DB_NCHARSET is set that way), and likewise with 'utf-8', I get 'DPY-3012: national character set id 871 is not supported by python-oracledb in thin mode'.
I do not want a national character set, just good standard 'utf-8'.
What is Thin mode for if it does not support the basics, or am I misinterpreting things? Is there a workaround or a missing extra parameter?

So I propose the enhancement: Thin mode should support DB_NCHARSET 'UTF8'.


anthony-tuininga · 2022-06-11T16:28:48Z

The national character set (871 - UTF8) is an older implementation of the current standard UTF-8. It is known today by the name CESU-8, and Python does not have built-in support for it. It is no longer recommended for use but (clearly!) some databases were built that way and are still in use. Adding support for CESU-8 would require writing our own encoder/decoder for that character set. One possibility, however, might be to simply defer raising the error until the first attempt to actually use the national character set. That sounds like it might resolve your situation, since you aren't using any NCHAR, NVARCHAR2 or NCLOB columns? For now, the only option you have is to use Thick mode.
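To make the difference concrete, here is a minimal sketch (not python-oracledb code) of a CESU-8 encoder in Python. It assumes the usual definition of CESU-8: characters in the Basic Multilingual Plane are encoded exactly as in UTF-8, while each supplementary character (above U+FFFF) is first split into a UTF-16 surrogate pair and each surrogate is then written as its own 3-byte sequence. The function name `cesu8_encode` is hypothetical; the per-surrogate encoding is done by the standard `utf-8` codec's `surrogatepass` error handler.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8 (a sketch, not an Oracle API)."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split a supplementary character into a UTF-16 surrogate pair.
            cp -= 0x10000
            units.append(chr(0xD800 + (cp >> 10)))    # high surrogate
            units.append(chr(0xDC00 + (cp & 0x3FF)))  # low surrogate
        else:
            units.append(ch)  # BMP characters are encoded as in UTF-8
    # "surrogatepass" lets the UTF-8 codec write each surrogate as 3 bytes.
    return "".join(units).encode("utf-8", "surrogatepass")


# U+1F600 takes 4 bytes in UTF-8 but 6 bytes in CESU-8.
print("\U0001F600".encode("utf-8"))  # b'\xf0\x9f\x98\x80'
print(cesu8_encode("\U0001F600"))    # b'\xed\xa0\xbd\xed\xb8\x80'
```

Because the two encodings differ only for supplementary characters, ASCII and other BMP text produces identical bytes either way.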

cjbj · 2022-06-11T22:37:56Z

The differences between Thin and Thick modes are described in the documentation section "Differences between python-oracledb Thin and Thick Modes".
Your solutions with the current 1.0.0 release are to use Thick mode (see the documentation on enabling it), or to connect to a modern database that has a national character set of AL16UTF16.
In this first release, Thin mode already supports a heap of functionality. Work is ongoing to add more.
Each DB has two character sets. The 'national character set' referenced in the error is used by NCHAR, NVARCHAR2 and NCLOB columns. Thin mode does support a national character set of AL16UTF16, but not the older UTF8. (Again, this is for the national character set, not the basic database character set.)
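To check which national character set a given database uses, you can query the `NLS_DATABASE_PARAMETERS` data dictionary view. The sketch below assumes a DB API style connection object such as the one returned by `oracledb.connect()`; the helper names and the `THIN_NCHARSETS` set (holding only AL16UTF16, per the comment above) are illustrative and not part of python-oracledb. With release 1.0.0, a Thin mode connection to a database whose national character set is UTF8 fails at connect time, so run this through a connection that works, for example one made in Thick mode.

```python
# Hypothetical helper names; only the SQL view and the cursor calls are standard.
THIN_NCHARSETS = {"AL16UTF16"}  # national character sets Thin mode 1.0.0 handles

NCHARSET_SQL = (
    "select value from nls_database_parameters "
    "where parameter = 'NLS_NCHAR_CHARACTERSET'"
)


def national_charset(connection) -> str:
    """Return the database national character set, e.g. 'AL16UTF16' or 'UTF8'."""
    with connection.cursor() as cursor:
        cursor.execute(NCHARSET_SQL)
        (value,) = cursor.fetchone()
    return value


def thin_mode_supports_nchar(connection) -> bool:
    """True if NCHAR/NVARCHAR2/NCLOB data should work in Thin mode 1.0.0."""
    return national_charset(connection) in THIN_NCHARSETS
```

For the database in this issue (national character set id 871), `national_charset(connection)` would be expected to return `'UTF8'`, and `thin_mode_supports_nchar(connection)` would return `False`.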
Anthony's suggestion of only throwing DPY-3012 when one of the N* types is used by the app, not at connect time, is a good way forward. To fully support the older UTF8 national character set is a non-trivial amount of work, and we want to look forward, not backwards (which is also why Thin mode connects to Oracle DB 12.1 or later, whereas Thick mode can connect to older databases).
The old connection encoding parameter is ignored by python-oracledb; see the documentation. This parameter used to relate to the basic database character set. It was the nencoding parameter that related to the national character set, and it is also ignored by python-oracledb.

damarvin · 2022-06-13T19:00:54Z

Thanks a lot for the fast, precise and comprehensive clarification.
I'm not sure I can convince the DB operator; still, I withdraw the request, since it "would require writing our own encoder/decoder".

cjbj · 2022-06-13T22:35:31Z

@damarvin I'll leave this closed, but I am tracking the general problem so we know how to prioritize our efforts. I believe @anthony-tuininga's suggested enhancement will be good for many people.

doerwalter · 2022-06-15T14:48:08Z

Since the difference between UTF-8 and CESU-8 is only in how surrogates are encoded, it might be possible to implement decoding of CESU-8 with Python's standard utf-8 codec and a codec error handler. See PEP 293 for details: https://peps.python.org/pep-0293/
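Following that suggestion, here is a minimal sketch of a CESU-8 decoder built from Python's standard `utf-8` codec plus a custom error handler registered with `codecs.register_error` (the PEP 293 mechanism). The strict UTF-8 decoder rejects the 3-byte surrogate sequences that CESU-8 uses, so the handler recognizes a high/low surrogate pair at the failing position, combines it into one code point, and tells the codec to resume 6 bytes later. The handler name `cesu8-surrogates` and the function names are hypothetical, and this is not how python-oracledb handles the national character set.

```python
import codecs


def _cesu8_surrogate_pair(exc):
    """Codec error handler: turn a CESU-8 surrogate pair into one code point.

    Registered below under the hypothetical name "cesu8-surrogates".
    """
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    data = exc.object   # the bytes being decoded
    i = exc.start       # position of the byte the UTF-8 decoder rejected
    pair = data[i:i + 6]
    # High surrogate: ED A0..AF 80..BF; low surrogate: ED B0..BF 80..BF.
    if (
        len(pair) == 6
        and pair[0] == 0xED and 0xA0 <= pair[1] <= 0xAF and 0x80 <= pair[2] <= 0xBF
        and pair[3] == 0xED and 0xB0 <= pair[4] <= 0xBF and 0x80 <= pair[5] <= 0xBF
    ):
        high = ((pair[0] & 0x0F) << 12) | ((pair[1] & 0x3F) << 6) | (pair[2] & 0x3F)
        low = ((pair[3] & 0x0F) << 12) | ((pair[4] & 0x3F) << 6) | (pair[5] & 0x3F)
        code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
        # Replacement text, plus the position where decoding should resume.
        return chr(code_point), i + 6
    raise exc  # anything else is a genuine decoding error


codecs.register_error("cesu8-surrogates", _cesu8_surrogate_pair)


def cesu8_decode(data: bytes) -> str:
    """Decode CESU-8 bytes with the standard UTF-8 codec plus the handler above."""
    return data.decode("utf-8", "cesu8-surrogates")


print(cesu8_decode(b"a\xed\xa0\xbd\xed\xb8\x80b") == "a\U0001F600b")  # True
```

Anything that is not a well-formed surrogate pair (for example a lone surrogate) is re-raised as the original `UnicodeDecodeError`, so ordinary invalid input still fails as it would with plain UTF-8.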


anthony-tuininga · 2022-06-22T21:41:26Z

I've just pushed code that allows you to connect to a database using national character set UTF8 and raises the exception only upon attempting to use NCHAR, NVARCHAR2 or NCLOB data.


damarvin added the enhancement New feature or request label Jun 11, 2022

damarvin closed this as completed Jun 13, 2022

anthony-tuininga added a commit that referenced this issue Jun 22, 2022

698cd7c: Connecting to a database with national character set UTF8 is now supported; an error is now raised only when the first attempt to use NCHAR, NVARCHAR2 or NCLOB data is made (#16).

anthony-tuininga mentioned this issue Jun 25, 2022

oracledb.exceptions.NotSupportedError: DPY-3012: national character set id 871 is not supported by python-oracledb in thin mode #27 (Closed)

anthony-tuininga added a commit that referenced this issue Jul 15, 2022

2f740ee: Connecting to a database with national character set UTF8 is now supported; an error is now raised only when the first attempt to use NCHAR, NVARCHAR2 or NCLOB data is made (#16).
