
(n)encoding options only used for mapping between client and database? #547

Open
simonvanderveldt opened this issue Mar 19, 2021 · 5 comments


@simonvanderveldt

I was wondering if the (n)encoding options are only used for mapping between the client and the database or also for decoding the incoming data into a Python string when reading from a database?

So let's say I have a database that is using latin-1 and I set my client's (n)encoding to latin-1 as well; we should then get a bunch of bytes over the line that need to be decoded using latin-1. Does cx_Oracle do that automatically?
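For concreteness, I mean something like this (connection details are placeholders; ISO-8859-1 is the IANA name for latin-1):

```python
import cx_Oracle

# Hypothetical credentials/DSN; the point is the explicit client encoding.
connection = cx_Oracle.connect(
    "user", "password", "dbhost/orclpdb1",
    encoding="ISO-8859-1",    # client character set for CHAR/VARCHAR2/CLOB
    nencoding="ISO-8859-1",   # client character set for NCHAR/NVARCHAR2/NCLOB
)
```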

@anthony-tuininga
Member

The nencoding is used for NVARCHAR2 and NCHAR columns as well as NCLOB values. The encoding is used for VARCHAR2 and CHAR columns as well as CLOB values -- and for all other strings used by the database unless otherwise stated. With cx_Oracle 8 and Python 3, however, these values are no longer important. Unless otherwise specified, cx_Oracle uses the UTF-8 encoding for all data (both encoding and nencoding are set to this) and the Oracle Client performs any necessary conversion from the database character set into UTF-8. When cx_Oracle receives this data, it is decoded into a Python string automatically.
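As a rough sketch of those defaults (credentials and table names are hypothetical):

```python
import cx_Oracle

# No encoding arguments: with cx_Oracle 8 both default to UTF-8.
connection = cx_Oracle.connect("user", "password", "dbhost/orclpdb1")
print(connection.encoding, connection.nencoding)   # UTF-8 UTF-8

cursor = connection.cursor()
cursor.execute("select name from employees")       # hypothetical table
name, = cursor.fetchone()
print(type(name))   # <class 'str'> -- decoded automatically, no manual decode needed
```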

The default character set for Oracle Database is also UTF-8 (called AL32UTF8), and if this is in use, no conversion needs to be done internally by the client.

There is one additional piece of information that is worth noting. NCHAR, NVARCHAR2 and NCLOB columns always require conversion, and they also require use of the DB_TYPE_NCHAR, DB_TYPE_NVARCHAR and DB_TYPE_NCLOB types when binding in order to avoid a potential double conversion (from the national character set to the database character set, and then from the database character set to the encoding used by the client). If your database character set is AL32UTF8 (UTF-8), there is no need for the NCHAR, NVARCHAR2 and NCLOB columns and their use should simply be avoided.
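For example, binding to an NVARCHAR2 column would look something like this (table and column names are hypothetical):

```python
import cx_Oracle

connection = cx_Oracle.connect("user", "password", "dbhost/orclpdb1")
cursor = connection.cursor()

# Mark the bind variable as a national character set string so the value
# goes straight to NVARCHAR2 without passing through the database
# character set first.
cursor.setinputsizes(cx_Oracle.DB_TYPE_NVARCHAR)
cursor.execute("insert into demo_nchar (nv_col) values (:1)", ["Grüße"])
connection.commit()
```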

You can check to see what character sets are in use by issuing the queries noted in the documentation.
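Run from Python, those queries against the standard data dictionary view look like this (connection details are placeholders):

```python
import cx_Oracle

connection = cx_Oracle.connect("user", "password", "dbhost/orclpdb1")
cursor = connection.cursor()

# Database character set (used for CHAR/VARCHAR2/CLOB).
cursor.execute("""
    select value from nls_database_parameters
    where parameter = 'NLS_CHARACTERSET'""")
print("database character set:", cursor.fetchone()[0])

# National character set (used for NCHAR/NVARCHAR2/NCLOB).
cursor.execute("""
    select value from nls_database_parameters
    where parameter = 'NLS_NCHAR_CHARACTERSET'""")
print("national character set:", cursor.fetchone()[0])
```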

Does that answer your question?

@simonvanderveldt
Author

simonvanderveldt commented Mar 22, 2021

@anthony-tuininga Thanks for the very quick response, appreciated!

The parts you explained are clear to me, thanks for walking me through that.

The part that I'm not sure about is what happens when we have a database that uses, for example, ISO-8859-1 and we set encoding="ISO-8859-1" when fetching data with cx_Oracle.
Does cx_Oracle then create a string from the received bytes using <received bytes>.decode("ISO-8859-1") as well? Or does it not do that? And if not, how does it guarantee that the resulting Python string is correct?

@cjbj
Member

cjbj commented Mar 22, 2021

The character set conversion between cx_Oracle and the database data is handled by Oracle's standard NLS layers in the Oracle Client or Server libraries.

@anthony-tuininga
Member

anthony-tuininga commented Mar 22, 2021

To answer your question directly: yes, cx_Oracle creates a string from the bytes received using .decode("ISO-8859-1").
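In other words, the last step is conceptually equivalent to the following (the actual conversion is done by Oracle's NLS layer, not Python's codec machinery):

```python
raw = b"caf\xe9"                 # bytes on the wire, encoded as ISO-8859-1
text = raw.decode("ISO-8859-1")  # effectively what cx_Oracle does
print(text)                      # café
```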

@simonvanderveldt
Author

@anthony-tuininga OK, thanks for the clarification.
Might this be useful to include in the character sets and globalization docs? They currently only mention the conversion between the client and the database, not what happens between the client and the Python strings you end up with.

@cjbj cjbj added enhancement and removed question labels Mar 30, 2021