UnicodeDecodeError: 'utf-8' codec can't decode byte (outputtypehandler encoding_errors="replace" not being honored) #279

Closed
anthony-tuininga opened this issue Jan 4, 2024 Discussed in #272 · 2 comments
Labels
bug Something isn't working patch available

Comments

@anthony-tuininga
Member

Discussed in #272

Originally posted by rh8056 December 20, 2023
Hello,

I posted this question on StackExchange yesterday, but figured it would make sense to ask here as well.

I'm struggling to deal with what I think is corrupt data stored in an Oracle database when reading it in a Python script. I have the following:

    import oracledb

    def output_type_handler(cursor, metadata):
      # Fetch VARCHAR2 columns through a variable that replaces undecodable bytes
      if metadata.type_code is oracledb.DB_TYPE_VARCHAR:
        return cursor.var(metadata.type_code,
                          arraysize=cursor.arraysize,
                          encoding_errors="replace")

    mydb, mycursor = connectOracle()
    mycursor.outputtypehandler = output_type_handler

    mycursor.execute('''select file_id, md5_value,
                            case when file_content is not null
                                then
                                    utl_raw.cast_to_varchar2(dbms_lob.substr(file_content))
                                else
                                    ''
                                end as FILE_CONTENT from archive_file where archive_id = 123 and file_name = 'file_name.txt' ''')
    row = mycursor.fetchone()
    print(row)

When I run this, I'm getting the following:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 647: invalid continuation byte

If I run this directly in SQL Developer, I get output with � in the file content. My SQL Developer settings are set to UTF-8, and my Oracle database NLS_NCHAR_CHARACTERSET is set to AL32UTF8. My understanding of the outputtypehandler change is that my Python script would also output a � in this instance.

I added some print() commands inside of the def output_type_handler() to verify that it is indeed being called, and I saw output, so it appears that it is, but it also seems like the encoding_errors="replace" is being ignored. What am I missing here? Thanks!
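For reference, the behavior expected from encoding_errors="replace" matches what Python's own bytes.decode() does with errors="replace". The sketch below uses only the standard library and does not touch python-oracledb; the sample bytes are made up for illustration and are not the data from the database in question.

```python
# Hypothetical sample: 0xC4 starts a two-byte UTF-8 sequence, but the next
# byte (0x20, a space) is not a valid continuation byte.
raw = b"caf\xc4 au lait"

# Strict decoding (the default) raises, as in the traceback above.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = str(exc)
else:
    strict_error = None

# With errors="replace", the bad byte becomes U+FFFD (the replacement character).
replaced = raw.decode("utf-8", errors="replace")

print(strict_error)
print(replaced)
```

If the driver honors encoding_errors="replace", a fetched value containing such a byte should come back as a string with U+FFFD in its place rather than raising.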

anthony-tuininga added the bug label Jan 4, 2024
anthony-tuininga added a commit that referenced this issue Jan 4, 2024
"encoding_errors" parameter when creating variables by calling the
method Cursor.var() (#279).
@anthony-tuininga
Member Author

I confirmed that this was indeed a regression from cx_Oracle and have added a test case to confirm that it works correctly now. If you are able to build from source you can verify that it works for you, too.

@anthony-tuininga
Member Author

The patch has been included in version 2.0.1 which was just released.
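A minimal sketch for checking whether a given version string is at or past 2.0.1. The helper name includes_fix is hypothetical, and the parsing assumes a plain dotted numeric version; pre-release suffixes are simply ignored. In practice the string would likely come from oracledb.__version__.

```python
def includes_fix(version: str, fixed_in: tuple = (2, 0, 1)) -> bool:
    """Return True if `version` is at least `fixed_in` (hypothetical helper)."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits of each component, e.g. "1b1" -> "1".
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts) >= fixed_in


print(includes_fix("2.0.0"))
print(includes_fix("2.0.1"))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically (for example, "2.10.0" sorting before "2.9.0").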
