I am currently migrating from Python 2.7 to Python 3.6 and cx_Oracle 7.3.0.
Today, while testing, I stumbled upon a UnicodeDecodeError while fetching data:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 1: invalid start byte
To fetch data from the database I have to use the cursor.callfunc method, whose return type is cx_Oracle.CURSOR, and .fetchmany() to load the rows in batches.
While investigating, I found that the database contains some invalid UTF-8 strings.
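For illustration, the same error can be reproduced without a database. This is only a sketch: the byte string `raw` is a made-up sample, not data from my tables. It also shows how Python's standard `errors` modes behave on such input, which is the kind of behaviour I would like to get on the client side.

```python
# Hypothetical sample: 0x81 is a UTF-8 continuation byte, so it cannot start a character.
raw = b"a\x81b"

# Strict decoding (the default) fails just like the fetch does.
try:
    raw.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)

# "replace" substitutes U+FFFD for each undecodable byte (lossy).
replaced = raw.decode("utf-8", errors="replace")

# "surrogateescape" keeps the bad bytes recoverable, so the original
# byte string can be rebuilt before sending it on.
roundtrip = raw.decode("utf-8", errors="surrogateescape").encode(
    "utf-8", errors="surrogateescape"
)
```

Since my application only forwards the data, a lossless mode such as `surrogateescape` would be closest to what Python 2.7 did implicitly.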
The source of invalid strings is another application, which loads data from parsed files, which, in turn, may contain anything.
Since the main purpose of my application is to send the fetched data over the internet, decoding bytes was never a concern while the app was running on Python 2.7.
Now it has become a problem.
I wanted to try an outputtypehandler for the VARCHAR column, but the values returned through callfunc are not affected by it.
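A minimal sketch of the kind of handler I have in mind is below. It assumes that the `encodingErrors` parameter of `Cursor.var()` described in the cx_Oracle documentation is available in 7.3, and that the handler has to be set on the REF CURSOR returned by callfunc (or on the connection) rather than on the calling cursor; I have not confirmed either against my database. The helper name `make_lenient_handler` and the function name `my_pkg.my_func` are hypothetical.

```python
def make_lenient_handler(string_types, errors="replace"):
    """Build an output type handler that decodes string columns leniently.

    string_types: the column types to intercept, for example
    (cx_Oracle.STRING, cx_Oracle.FIXED_CHAR). Passed in so that this
    helper itself does not need to import cx_Oracle.
    """
    def handler(cursor, name, default_type, size, precision, scale):
        if default_type in string_types:
            # Assumption: Cursor.var() accepts encodingErrors and forwards it
            # to the bytes-to-str decoding step.
            return cursor.var(default_type, size, cursor.arraysize,
                              encodingErrors=errors)
        return None  # fall back to the default handling for other types
    return handler

# Usage sketch (requires a live connection, not run here):
# import cx_Oracle
# handler = make_lenient_handler((cx_Oracle.STRING, cx_Oracle.FIXED_CHAR))
# ref_cursor = cursor.callfunc("my_pkg.my_func", cx_Oracle.CURSOR)
# ref_cursor.outputtypehandler = handler
# rows = ref_cursor.fetchmany()
```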
I've also seen this section in the documentation:
https://cx-oracle.readthedocs.io/en/7.3/user_guide/globalization.html#character-set-example
which says:
Because the ‘€’ symbol is not supported by the WE8ISO8859P1 character set, all ‘€’ characters are replaced by ‘¿’ in the cx_Oracle output
But for some reason I get no replacements; my guess is that this is also not an option for cursor.callfunc?
Client and database encoding is AMERICAN_AMERICA.AL32UTF8
How can one solve such an issue?
Is there a way to solve it on the client side, without enforcing UTF-8 encoding for every string that enters the database (that is, without fixing every client that can write to it)?