
cx_Oracle.connect() 'encoding' parameter - 'utf8' vs 'UTF-8' #182

@sobayed


While reading text data from an Oracle DB, I came across some strange UnicodeDecodeErrors. After debugging the issue for quite some time, I realized that 'utf8' and 'UTF-8', as values for the cx_Oracle.connect() 'encoding' parameter, have different effects and are therefore not synonyms in this context. Most importantly, they produce different values of the Connection.maxBytesPerCharacter attribute (see below):

```python
import cx_Oracle

# encoding='UTF-8'
with cx_Oracle.connect(dsn='MyDSN', user='MyUser', password='MyPassword',
                       encoding='UTF-8', nencoding='UTF-8') as con:
    print(con.maxBytesPerCharacter)  # prints 4

# encoding='utf8'
with cx_Oracle.connect(dsn='MyDSN', user='MyUser', password='MyPassword',
                       encoding='utf8', nencoding='utf8') as con:
    print(con.maxBytesPerCharacter)  # prints 3
```
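The values 4 and 3 are consistent with the difference between standard UTF-8 and CESU-8. Standard UTF-8 needs up to 4 bytes for characters outside the Basic Multilingual Plane (BMP), while CESU-8 splits such characters into a UTF-16 surrogate pair and encodes each half in 3 bytes, so no single encoded unit exceeds 3 bytes. The pure-Python sketch below (no database needed) compares byte lengths. `cesu8_encode` is a hypothetical helper written only for illustration, since Python ships no CESU-8 codec.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8 (not a built-in Python codec).

    Characters above U+FFFF are split into a UTF-16 surrogate pair and each
    surrogate is written separately as a 3-byte UTF-8-style sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for surrogate in (high, low):
                # 'surrogatepass' lets the UTF-8 codec emit surrogate code points
                out += chr(surrogate).encode("utf-8", "surrogatepass")
        else:
            # BMP characters are encoded identically in UTF-8 and CESU-8
            out += ch.encode("utf-8")
    return bytes(out)


# e-acute, euro sign, and an emoji outside the BMP
for ch in ("\u00e9", "\u20ac", "\U0001F600"):
    utf8_len = len(ch.encode("utf-8"))
    cesu8_len = len(cesu8_encode(ch))
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, CESU-8 {cesu8_len} bytes")
```

The emoji takes 4 bytes in UTF-8 but 6 bytes in CESU-8: two 3-byte units, one per surrogate. This is why a CESU-8 character set can report a maximum of 3 bytes per character.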

I assume that 'utf8' is mapped to Oracle's UTF8 character set, which implements CESU-8 rather than standard UTF-8. That seems counterintuitive. I also couldn't find this behaviour mentioned in the documentation, and it goes against Python's standard encoding names (https://docs.python.org/3/library/codecs.html#standard-encodings), where 'utf8' and 'UTF-8' are aliases for the same codec. Therefore I suggest either:

  1. Update the documentation to make users aware of this issue, including a full table of accepted encoding parameter values and their respective Oracle counterparts.

and/or:

  2. Change the parameter value mapping to treat 'utf8' and 'UTF-8' as synonyms.
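Until the mapping changes, one possible user-side workaround follows the second suggestion (treating 'utf8' and 'UTF-8' as synonyms): normalize the encoding name before passing it to cx_Oracle.connect(). The sketch below assumes that any name Python's `codecs` module resolves to its `utf-8` codec should be sent as 'UTF-8', the spelling that reported maxBytesPerCharacter == 4 above. `normalize_encoding` is a hypothetical helper, not part of cx_Oracle.

```python
import codecs


def normalize_encoding(name: str) -> str:
    """Hypothetical helper: map every Python alias of UTF-8 to 'UTF-8'.

    Names Python does not recognize (e.g. Oracle character set names such as
    'AL32UTF8') are returned unchanged so they can still be passed through.
    """
    try:
        canonical = codecs.lookup(name).name  # e.g. 'utf8' -> 'utf-8'
    except LookupError:
        return name
    return "UTF-8" if canonical == "utf-8" else name


for alias in ("utf8", "UTF-8", "utf_8", "U8", "latin-1", "AL32UTF8"):
    print(alias, "->", normalize_encoding(alias))

# Assumed usage (requires a reachable database):
# cx_Oracle.connect(dsn='MyDSN', user='MyUser', password='MyPassword',
#                   encoding=normalize_encoding('utf8'),
#                   nencoding=normalize_encoding('utf8'))
```

Unrecognized names are passed through unchanged rather than rejected, so callers can still supply Oracle-specific character set names deliberately.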

Answer the following questions:

  1. What is your version of Python? Is it 32-bit or 64-bit?
    Python 3.6.5 64-bit

  2. What is your version of cx_Oracle?
    6.2.1

  3. What is your version of the Oracle client (e.g. Instant Client)? How was it installed? Where is it installed?
    oracle-instantclient12.2-basic

  4. What is your version of the Oracle Database?
    11.2.0.4.0

  5. What is your OS and version?
    Debian 8, 64-bit

  6. What compiler version did you use? For example, with GCC, run gcc --version.
    GCC 7.2.0

  7. What environment variables did you set? How exactly did you set them?
    None

  8. What exact command caused the problem (e.g. what command did you try to install with)? Who were you logged in as?
    cursor.fetchone()

  9. What error(s) you are seeing?
    UnicodeDecodeError when using encoding='utf8'
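One plausible, unconfirmed way this mismatch could surface as a UnicodeDecodeError: if text arrives as CESU-8 but is decoded with Python's standard UTF-8 codec, every character outside the BMP appears as encoded surrogates. Python's strict UTF-8 decoder rejects those. The sketch below builds CESU-8 bytes by hand to show the failure and a CESU-8-aware decode. It illustrates the mechanism only and is not a trace of what cx_Oracle does internally. `to_cesu8` and `from_cesu8` are hypothetical helpers.

```python
def to_cesu8(text: str) -> bytes:
    """Hypothetical helper: encode each UTF-16 code unit UTF-8-style (CESU-8)."""
    utf16 = text.encode("utf-16-le")  # non-BMP characters become surrogate pairs
    units = (int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2))
    # 'surrogatepass' allows surrogate code points to be written as 3-byte sequences
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


def from_cesu8(raw: bytes) -> str:
    """Hypothetical helper: keep surrogates, then let UTF-16 pair them up again."""
    with_surrogates = raw.decode("utf-8", "surrogatepass")
    return with_surrogates.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


raw = to_cesu8("Hello \U0001F600")

decode_failed = False
try:
    raw.decode("utf-8")  # strict UTF-8 rejects encoded surrogates
except UnicodeDecodeError as exc:
    decode_failed = True
    print("strict UTF-8 decode failed:", exc)

# ascii() keeps the output printable on terminals without UTF-8 support
print("CESU-8-aware decode:", ascii(from_cesu8(raw)))
```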
