'gbk' codec can't decode byte 0xa3 #74

Closed
zhangqqqf opened this issue Aug 30, 2017 · 14 comments

Comments

@zhangqqqf

zhangqqqf commented Aug 30, 2017

  1. What is your version of Python? Is it 32-bit or 64-bit?
    Python 3.6, 64-bit
  2. What is your version of cx_Oracle?
    6.0.1
  3. What is your version of the Oracle client (e.g. Instant Client)? How was it
    installed? Where is it installed?
    11gR2
  4. What is your version of the Oracle Database?
    11g
  5. What is your OS and version?
    Windows 7
  6. What error(s) you are seeing?

Database character set: "ZHS16GBK"
The data contains illegal characters, e.g. \xa3\xa0, \xa4\x57, �
When cursor.fetchmany() processes data containing illegal characters, it throws an exception:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa3 in position 18: illegal multibyte sequence

@cjbj cjbj added the bug label Aug 30, 2017
@anthony-tuininga
Member

Can you provide a test case that demonstrates the issue? How did you get the illegal characters? What code did you use? Can you show what the data should look like properly? What Unicode code points are being used? Without that information I can't help you much!

@zhangqqqf
Author

zhangqqqf commented Aug 31, 2017

Thank you for your reply!

CREATE TABLE T_EXCEPTION
(
ILLEGAL_CODE VARCHAR2(100 BYTE)
)

Insert into T_EXCEPTION
(ILLEGAL_CODE)
Values
('11(1-1�');
COMMIT;

Data like this already exists from the past, but I have no way to insert new data containing illegal characters.

Database character set "ZHS16GBK"; Python file encoding UTF-8.

I have to process data from a large, heavily used system. Its users rely on a variety of unusual input methods, so illegal characters like this end up stored in the database.
There are about 10 million rows to process and only a few of them contain illegal characters, but I cannot clean the data in advance; the problem only shows up when fetching with cx_Oracle.

With cursor.fetchmany() or the other fetch functions, I cannot specify an errors parameter such as 'ignore', as in decode('utf-8', 'ignore'), so I have no way to handle this problem.

It is difficult for me to describe the problem accurately in English; I hope you can understand.
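
The difference between the strict decoding that the fetch performs and the lenient decoding requested above can be shown in plain Python, without a database. This is only a sketch: `decode_column` is a hypothetical helper, and the sample bytes are made up for illustration (0xff is used because it is never a valid lead byte in GBK); they are not the actual bytes stored in T_EXCEPTION.

```python
def decode_column(raw: bytes, encoding: str = "gbk", errors: str = "strict") -> str:
    """Decode raw column bytes with a selectable error policy."""
    return raw.decode(encoding, errors=errors)


# Hypothetical sample: ASCII text followed by a byte that GBK cannot decode.
sample = b"11(1-1\xff"

try:
    decode_column(sample)  # strict policy, comparable to what the fetch does
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# 'ignore' silently drops the undecodable byte.
print(decode_column(sample, errors="ignore"))

# 'replace' substitutes U+FFFD so the damage stays visible.
print(decode_column(sample, errors="replace"))
```

Of the two lenient policies, 'replace' is usually the safer choice for auditing, because the affected rows can still be found afterwards by searching for U+FFFD.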

@anthony-tuininga
Member

I think I understand what you are saying. The data in the database is illegal but you'd still like to fetch it without getting an exception? If that's the case I'll label this an enhancement instead.

@cjbj
Member

cjbj commented Sep 1, 2017

@zhangqqqf did you have the same issue with cx_Oracle 5.3?

@zhangqqqf
Author

@anthony-tuininga
Yes, that is what I mean.

@zhangqqqf
Author

@cjbj
I haven't tried it with cx_Oracle 5.3 yet,
so I don't know whether the same issue occurs with 5.3.

@hvbtup

hvbtup commented Nov 28, 2017

If the data in the database is invalid, throwing a UnicodeDecodeError is the correct behaviour. You could probably work around this inside the DB by converting the data to a BLOB and back to a CLOB, or by using cast_to_raw or anything like that.
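
One way to read the cast_to_raw suggestion: select the column through Oracle's UTL_RAW.CAST_TO_RAW so the driver returns bytes instead of decoding them, then decode in Python with an explicit error policy. The sketch below is an illustration rather than a tested recipe; `fetch_lenient` is a hypothetical helper, and it assumes the raw bytes are in the database character set (ZHS16GBK, i.e. Python's "gbk" codec).

```python
def fetch_lenient(cursor, sql, encoding="gbk", errors="replace", batch_size=1000):
    """Run `sql` and yield rows, decoding any bytes values leniently.

    `cursor` is expected to be a DB-API cursor (for example from cx_Oracle).
    Non-bytes values are passed through unchanged.
    """
    cursor.execute(sql)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield tuple(
                value.decode(encoding, errors=errors)
                if isinstance(value, bytes)
                else value
                for value in row
            )


# Usage sketch (assumes an open cx_Oracle connection named `connection`):
#
#   sql = "SELECT UTL_RAW.CAST_TO_RAW(ILLEGAL_CODE) FROM T_EXCEPTION"
#   for (text,) in fetch_lenient(connection.cursor(), sql):
#       print(text)
```

Because the conversion happens in the query, only the columns known to hold suspect data need to be wrapped; all other columns keep the driver's normal strict decoding.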

@anthony-tuininga
Member

Does this suggestion work for you? Considering that the data is invalid, it probably makes more sense to adjust the data before retrieving it, rather than retrieve the data in an invalid state -- possibly even more garbled than it was in the database!

@TungMY

TungMY commented Dec 1, 2017

Same problem: Python 3.5, cx_Oracle 6.02

@cjbj
Member

cjbj commented Dec 1, 2017

@TungMY is your data also invalid?

@dasuanyao

I got the same problem when using Python to connect to Oracle.
The connection succeeds,
but when I try to fetch data, this error appears: 'gbk' codec can't decode byte 0xa1 in position 4

@cjbj
Member

cjbj commented Dec 5, 2017

@dasuanyao can you describe the data (is it valid?), what character sets are being used in the DB and for Python etc?

@dasuanyao

The character set in the DB is GBK. For Python, I set the environment to GBK with "os.environ['NLS_LANG'] = 'SIMPLIFIED CHINESE_CHINA.ZHS16GBK'".
I can read some of the data, but then the error occurs.

@anthony-tuininga
Member

This issue is being tracked in issue #162 and code changes have been committed that should allow you to fetch invalid data from the database. Take a look and let me know on that issue. Thanks!
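
This thread does not say what the committed change looks like. Assuming it corresponds to the `encodingErrors` parameter of `Cursor.var()` that later cx_Oracle releases document for fetching corrupt data, an output type handler along these lines might apply a lenient policy to all string columns. `make_lenient_handler` is a hypothetical helper name, and the whole block is a sketch under that assumption.

```python
def make_lenient_handler(string_types, errors="replace"):
    """Build an output type handler that decodes string columns leniently.

    `string_types` is the collection of driver types to treat as text,
    for example (cx_Oracle.STRING, cx_Oracle.FIXED_CHAR).
    """
    def output_type_handler(cursor, name, default_type, size, precision, scale):
        if default_type in string_types:
            # Assumption: the installed cx_Oracle supports encodingErrors here.
            return cursor.var(default_type, size, cursor.arraysize,
                              encodingErrors=errors)
        return None  # fall back to the driver's default handling

    return output_type_handler


# Usage sketch (assumes cx_Oracle is installed and `connection` is open):
#
#   import cx_Oracle
#   connection.outputtypehandler = make_lenient_handler(
#       (cx_Oracle.STRING, cx_Oracle.FIXED_CHAR))
#   cursor = connection.cursor()
#   cursor.execute("SELECT ILLEGAL_CODE FROM T_EXCEPTION")
#   print(cursor.fetchmany())
```

Compared with casting to RAW in the query, this keeps the SQL unchanged and applies the policy at the connection level, at the cost of depending on a newer driver version.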
