Description
While reading text data from an Oracle DB, I came across some strange UnicodeDecodeErrors. After debugging the issue for quite some time, I realized that 'utf8' and 'UTF-8' have different effects as values for the 'encoding' parameter of cx_Oracle.connect() and are therefore not synonyms in this context! Most importantly, they differ in terms of the Connection.maxBytesPerCharacter attribute (see below):
import cx_Oracle

# encoding='UTF-8': up to 4 bytes per character
with cx_Oracle.connect(dsn='MyDSN', user='MyUser', password='MyPassword', encoding='UTF-8', nencoding='UTF-8') as con:
    print(con.maxBytesPerCharacter)
    # 4

# encoding='utf8': only up to 3 bytes per character
with cx_Oracle.connect(dsn='MyDSN', user='MyUser', password='MyPassword', encoding='utf8', nencoding='utf8') as con:
    print(con.maxBytesPerCharacter)
    # 3
I assume that 'utf8' maps to Oracle's CESU-8-style character set, which seems counterintuitive. I also couldn't find this behaviour mentioned in the documentation, and it goes against the standard Python naming scheme (https://docs.python.org/3/library/codecs.html#standard-encodings), where 'utf8' is an alias for 'UTF-8'. Therefore I suggest either:
- Update documentation to make users aware of this issue (including full table of acceptable encoding parameter values and their respective Oracle counterparts).
and/or:
- Change parameter value mapping to treat 'utf8' and 'UTF-8' as synonyms
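To illustrate why a CESU-8 mapping could explain both the 3-byte maximum and the UnicodeDecodeErrors, here is a minimal pure-Python sketch. It does not touch cx_Oracle; the helper `cesu8_encode` is hypothetical and written only for this illustration. It assumes the data contains characters outside the Basic Multilingual Plane (for example emoji), which CESU-8 stores as two 3-byte surrogate sequences instead of one 4-byte UTF-8 sequence.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8.

    Characters above U+FFFF are split into a UTF-16 surrogate pair and
    each surrogate is written as its own 3-byte sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)    # high (lead) surrogate
            low = 0xDC00 + (cp & 0x3FF)   # low (trail) surrogate
            for half in (high, low):
                out += chr(half).encode("utf-8", errors="surrogatepass")
        else:
            out += ch.encode("utf-8", errors="surrogatepass")
    return bytes(out)


emoji = "\U0001F600"                 # a character outside the BMP
utf8_bytes = emoji.encode("utf-8")   # one 4-byte sequence
cesu8_bytes = cesu8_encode(emoji)    # two 3-byte sequences

print(len(utf8_bytes), len(cesu8_bytes))

# Python's strict UTF-8 decoder rejects encoded surrogates.
try:
    cesu8_bytes.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False
print(strict_decode_ok)

# The bytes can still be recovered by passing surrogates through
# and re-pairing them via UTF-16.
roundtrip = (
    cesu8_bytes.decode("utf-8", errors="surrogatepass")
    .encode("utf-16-le", errors="surrogatepass")
    .decode("utf-16-le")
)
print(roundtrip == emoji)
```

Whether this is what actually happens inside the driver for 'utf8' is an assumption on my part; the sketch only shows that CESU-8 bytes never need more than 3 bytes per encoded unit and are not valid input for a strict UTF-8 decoder.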
Answer the following questions:
- What is your version of Python? Is it 32-bit or 64-bit?
  Python 3.6.5 64-bit
- What is your version of cx_Oracle?
  6.2.1
- What is your version of the Oracle client (e.g. Instant Client)? How was it installed? Where is it installed?
  oracle-instantclient12.2-basic
- What is your version of the Oracle Database?
  11.2.0.4.0
- What is your OS and version?
  Debian 8 64bit
- What compiler version did you use? For example, with GCC, run gcc --version.
  GCC 7.2.0
- What environment variables did you set? How exactly did you set them?
  None
- What exact command caused the problem (e.g. what command did you try to install with)? Who were you logged in as?
  cursor.fetchone()
- What error(s) you are seeing?
  UnicodeDecodeError when using encoding='utf8'
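As a possible caller-side workaround until the mapping or the documentation changes, the encoding name could be normalized before it is passed to cx_Oracle.connect(). The sketch below is only an illustration: `normalize_encoding_name` is a hypothetical helper, not part of cx_Oracle, and it relies on nothing more than Python's own `codecs.lookup()` to recognize UTF-8 aliases. It assumes, based on the observation above, that the spelling 'UTF-8' is the one that yields maxBytesPerCharacter == 4.

```python
import codecs


def normalize_encoding_name(name: str) -> str:
    """Hypothetical helper: map any Python alias of UTF-8 to 'UTF-8'.

    Other encoding names, including ones Python does not know,
    are returned unchanged.
    """
    try:
        canonical = codecs.lookup(name).name
    except LookupError:
        return name
    return "UTF-8" if canonical == "utf-8" else name


print(normalize_encoding_name("utf8"))
print(normalize_encoding_name("latin-1"))

# Intended use (not executed here, requires a database):
# enc = normalize_encoding_name("utf8")
# cx_Oracle.connect(dsn='MyDSN', user='MyUser', password='MyPassword',
#                   encoding=enc, nencoding=enc)
```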