
(n)encoding options only used for mapping between client and database? #547

Open
simonvanderveldt opened this issue Mar 19, 2021 · 5 comments


@simonvanderveldt

I was wondering if the (n)encoding options are only used for mapping between the client and the database or also for decoding the incoming data into a Python string when reading from a database?

So let's say I have a database that is using latin-1 and I set my client's (n)encoding to latin-1 as well; we should then get a bunch of bytes over the line that need to be decoded using latin-1. Does cx_Oracle do that automatically?
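For concreteness, I mean something like this (connection details are placeholders; ISO-8859-1 is the IANA name for latin-1):

```python
import cx_Oracle

# Hypothetical credentials/DSN; the point is the explicit client encoding.
connection = cx_Oracle.connect(
    "user", "password", "dbhost/orclpdb1",
    encoding="ISO-8859-1",    # client character set for CHAR/VARCHAR2/CLOB
    nencoding="ISO-8859-1",   # client character set for NCHAR/NVARCHAR2/NCLOB
)
```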

@anthony-tuininga
Member

The nencoding is used for NVARCHAR2 and NCHAR columns as well as NCLOB values. The encoding is used for VARCHAR2 and CHAR columns as well as CLOB values -- and for all other strings used by the database unless otherwise stated. With cx_Oracle 8 and Python 3, however, these values are no longer important. Unless otherwise specified, cx_Oracle uses the UTF-8 encoding for all data (both encoding and nencoding are set to this) and the Oracle Client performs any necessary conversion from the database character set into UTF-8. When cx_Oracle receives this data, it is decoded into a Python string automatically.
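As a rough sketch of those defaults (credentials and table names are hypothetical):

```python
import cx_Oracle

# No encoding arguments: with cx_Oracle 8 both default to UTF-8.
connection = cx_Oracle.connect("user", "password", "dbhost/orclpdb1")
print(connection.encoding, connection.nencoding)   # UTF-8 UTF-8

cursor = connection.cursor()
cursor.execute("select name from employees")       # hypothetical table
name, = cursor.fetchone()
print(type(name))   # <class 'str'> -- decoded automatically, no manual decode needed
```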

The default character set for Oracle Database is also UTF-8 (called AL32UTF8), and if this is in use, no conversion needs to be done internally by the client.

There is one additional piece of information that is worth noting. NCHAR, NVARCHAR2 and NCLOB columns always require conversion, and they also require use of the DB_TYPE_NCHAR, DB_TYPE_NVARCHAR and DB_TYPE_NCLOB types when binding in order to avoid a potential double conversion (from the national character set to the database character set, and then from the database character set to the encoding used by the client). If your database character set is AL32UTF8 (UTF-8), there is no need for the NCHAR, NVARCHAR2 and NCLOB columns and their use should simply be avoided.
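For example, binding to an NVARCHAR2 column would look something like this (table and column names are hypothetical):

```python
import cx_Oracle

connection = cx_Oracle.connect("user", "password", "dbhost/orclpdb1")
cursor = connection.cursor()

# Mark the bind variable as a national character set string so the value
# goes straight to NVARCHAR2 without passing through the database
# character set first.
cursor.setinputsizes(cx_Oracle.DB_TYPE_NVARCHAR)
cursor.execute("insert into demo_nchar (nv_col) values (:1)", ["Grüße"])
connection.commit()
```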

You can check to see what character sets are in use by issuing the queries noted in the documentation.
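Run from Python, those queries against the standard data dictionary view look like this (connection details are placeholders):

```python
import cx_Oracle

connection = cx_Oracle.connect("user", "password", "dbhost/orclpdb1")
cursor = connection.cursor()

# Database character set (used for CHAR/VARCHAR2/CLOB).
cursor.execute("""
    select value from nls_database_parameters
    where parameter = 'NLS_CHARACTERSET'""")
print("database character set:", cursor.fetchone()[0])

# National character set (used for NCHAR/NVARCHAR2/NCLOB).
cursor.execute("""
    select value from nls_database_parameters
    where parameter = 'NLS_NCHAR_CHARACTERSET'""")
print("national character set:", cursor.fetchone()[0])
```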

Does that answer your question?

@simonvanderveldt
Author

simonvanderveldt commented Mar 22, 2021

@anthony-tuininga Thanks for the very quick response, appreciated!

The parts you explained are clear to me, thanks for walking me through that.

The part that I'm not sure about is what happens when we have a database that uses, for example, ISO-8859-1 and we set encoding="ISO-8859-1" when fetching data with cx_Oracle.
Does cx_Oracle then create a string from the received bytes using <received bytes>.decode("ISO-8859-1") as well? Or does it not do that? And if not, how does it guarantee that the resulting Python string is correct?

@cjbj
Member

cjbj commented Mar 22, 2021

The character set conversion between cx_Oracle and the database data is handled by Oracle's standard NLS layers in the Oracle Client or Server libraries.

@anthony-tuininga
Member

anthony-tuininga commented Mar 22, 2021

To answer your question directly: yes, cx_Oracle creates a string from the bytes received using .decode("ISO-8859-1").
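In other words, the last step is conceptually equivalent to the following (the actual conversion is done by Oracle's NLS layer, not Python's codec machinery):

```python
raw = b"caf\xe9"                 # bytes on the wire, encoded as ISO-8859-1
text = raw.decode("ISO-8859-1")  # effectively what cx_Oracle does
print(text)                      # café
```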

@simonvanderveldt
Author

@anthony-tuininga OK, thanks for the clarification.
Might this be useful to include in the character sets and globalization docs? They currently only mention the conversion between the client and the database, not what happens between the client and the Python strings you end up with.

@cjbj cjbj added enhancement and removed question labels Mar 30, 2021