How to deal with encoding issues while fetching rows returned by callfunc method? #590

@iamdbychkov

I am currently migrating from Python 2.7 to Python 3.6 and cx_Oracle 7.3.0.

Today, while testing, I stumbled upon a UnicodeDecodeError while fetching data:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 1: invalid start byte
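
For reference, the same error can be reproduced in plain Python without a database. This is only a minimal sketch: the byte string `raw` is a made-up sample, not data from the actual table. It also shows two standard `errors=` modes of `bytes.decode` that avoid the exception.

```python
# Hypothetical sample value: 0x81 is a UTF-8 continuation byte,
# so it cannot start a character and strict decoding fails on it.
raw = b"a\x81b"

try:
    raw.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = str(exc)

# "replace" substitutes U+FFFD for each undecodable byte (lossy).
replaced = raw.decode("utf-8", errors="replace")

# "surrogateescape" keeps the bad byte as a lone surrogate, so the
# original bytes can be restored when encoding with the same handler.
kept = raw.decode("utf-8", errors="surrogateescape")
round_trip = kept.encode("utf-8", errors="surrogateescape")
```

The `surrogateescape` variant may matter when the data only has to be passed along unchanged, because it round-trips to the original bytes.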

To fetch data from the database I have to use the cursor.callfunc method with a return type of cx_Oracle.CURSOR, and then .fetchmany() to load the rows in batches.
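
The fetching pattern looks roughly like the sketch below. The helper name `iter_batches`, the batch size, and the stored function name in the usage comment are hypothetical and only illustrate the shape of the code; the helper relies on nothing but a `fetchmany()` method.

```python
def iter_batches(ref_cursor, batch_size=1000):
    """Yield lists of rows from an open cursor until it is exhausted."""
    while True:
        rows = ref_cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Assumed usage with cx_Oracle (needs a live connection, so not run here):
#
#   import cx_Oracle
#   cursor = connection.cursor()
#   ref_cursor = cursor.callfunc("my_pkg.get_rows", cx_Oracle.CURSOR)  # hypothetical function name
#   for batch in iter_batches(ref_cursor, 500):
#       send(batch)  # hypothetical consumer
```

The UnicodeDecodeError is raised inside `fetchmany()`, when the driver converts the fetched bytes to `str`.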

In my research I found out that there are some invalid UTF-8 strings in the database.

The source of the invalid strings is another application, which loads data from parsed files that may, in turn, contain anything.

Since the main purpose of my application is to send the fetched data over the internet, I was never bothered by decoding bytes while the app was running on Python 2.7.

But now it has become a problem.

I wanted to try an outputtypehandler for the VARCHAR column, but the values returned by callfunc are not affected by it.
I've also seen this section in the documentation:
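
A sketch of the kind of handler in question follows. It rests on two assumptions that should be checked against the installed cx_Oracle version: that `Cursor.var()` accepts an `encodingErrors` keyword, and that a handler set on the connection (or directly on the REF CURSOR returned by callfunc) is applied to the fetched columns. The factory name `make_replacing_handler` is hypothetical.

```python
def make_replacing_handler(string_types):
    """Build an output type handler that decodes string columns leniently.

    string_types: the driver's string type objects, for example
    (cx_Oracle.STRING, cx_Oracle.FIXED_CHAR). They are passed in so that
    this sketch does not need to import the driver itself.
    """
    def output_type_handler(cursor, name, default_type, size, precision, scale):
        if default_type in string_types:
            # Assumption: encodingErrors is supported by Cursor.var() and is
            # forwarded to the bytes-to-str decoding step.
            return cursor.var(default_type, size,
                              arraysize=cursor.arraysize,
                              encodingErrors="replace")
        return None  # fall back to the default conversion

    return output_type_handler


# Assumed usage (needs a live connection, so not run here):
#
#   handler = make_replacing_handler((cx_Oracle.STRING, cx_Oracle.FIXED_CHAR))
#   connection.outputtypehandler = handler
#   ref_cursor = cursor.callfunc("my_pkg.get_rows", cx_Oracle.CURSOR)  # hypothetical function name
#   ref_cursor.outputtypehandler = handler  # set explicitly on the returned cursor as well
```
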
https://cx-oracle.readthedocs.io/en/7.3/user_guide/globalization.html#character-set-example
which says:

Because the ‘€’ symbol is not supported by the WE8ISO8859P1 character set, all ‘€’ characters are replaced by ‘¿’ in the cx_Oracle output

But for some reason I got no replacements. My guess is that this is also not an option for cursor.callfunc?

Client and database encoding is AMERICAN_AMERICA.AL32UTF8

How can one solve such an issue?
Is there a way to solve it on the client side, without enforcing UTF-8 encoding for every string that enters the database (that is, without fixing all clients that can write to the database)?
