UnicodeDecodeError when reading NVARCHAR2 data: some special characters can't be decoded as UTF-8 #83

Closed
honorcao opened this issue Sep 21, 2017 · 5 comments

Comments

@honorcao

commented Sep 21, 2017

Python: Python 3.4.0 (default, Feb 16 2017, 10:21:31)
GCC:[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux
cx_Oracle: cx_Oracle-5.2.1-py3.4-linux-x86_64
Oracle client version: Instant Client 11.2(64 bit)
Oracle database version: 11.2
Red Hat Enterprise Linux Server release 6.5 (64 bit)

>>> conn = cx_Oracle.connect(strconn, encoding="UTF-8", nencoding="UTF-8")
>>> cursor = conn.cursor()
>>> sql = 'select contentid,contenttext from t_test order by tit'
>>> cursor.execute(sql)
<cx_Oracle.Cursor on <cx_Oracle.Connection to >>
>>> cursor.fetchone()
(1, 'test你好test')
>>> cursor.fetchone()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 4: invalid continuation byte

The contenttext column is NVARCHAR2, and the value in the failing row is:

2|一如既往的喜欢💕

The string ends with a special character (an emoji) that can't be decoded as UTF-8.

I saved the string to a text file and opened it like this:


>>> f = open('1.txt', 'r')
>>> f.readline()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python34/lib/python3.4/codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 23: invalid continuation byte
>>> f = open('1.txt', 'rb')
>>> bline = f.readline()
>>> bline
b'2|\xe4\xb8\x80\xe5\xa6\x82\xe6\x97\xa2\xe5\xbe\x80\xe7\x9a\x84\xe5\x96\x9c\xe6\xac\xa2\xed\xa0\xbd\xed\xb2\x95 \n'
>>> bline.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 23: invalid continuation byte
>>> bline.decode('utf8', 'ignore')
'2|一如既往的喜欢 \n'

How can I read this value with cx_Oracle?
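
A note on the bytes shown above: the trailing six bytes \xed\xa0\xbd\xed\xb2\x95 are the three-byte encodings of the UTF-16 surrogates U+D83D and U+DC95, which together form U+1F495 (💕). Strict UTF-8 encodes that character as four bytes (\xf0\x9f\x92\x95) and rejects encoded surrogates, which would explain the error. The following is only a sketch of how such bytes can be decoded in plain Python once they are available as a bytes object. The helper name decode_surrogate_pairs is made up for illustration, and whether the Oracle client or database character set is what produced this form is an assumption that the thread does not confirm.

```python
# Raw bytes copied from the readline() output in the session above.
raw = (b'2|\xe4\xb8\x80\xe5\xa6\x82\xe6\x97\xa2\xe5\xbe\x80\xe7\x9a\x84'
       b'\xe5\x96\x9c\xe6\xac\xa2\xed\xa0\xbd\xed\xb2\x95 \n')


def decode_surrogate_pairs(data):
    """Decode UTF-8-like bytes in which supplementary characters were
    written as two 3-byte surrogates (hypothetical helper)."""
    # 'surrogatepass' makes the UTF-8 codec accept encoded surrogates,
    # yielding a str that contains the two surrogate code points.
    with_surrogates = data.decode('utf-8', 'surrogatepass')
    # A round trip through UTF-16 merges each high/low surrogate pair
    # into the single code point it stands for.
    return with_surrogates.encode('utf-16', 'surrogatepass').decode('utf-16')


text = decode_surrogate_pairs(raw)
print(ascii(text))
```

Unlike decode('utf8', 'ignore'), this keeps the emoji instead of silently dropping it. It does not help with cursor.fetchone() directly, because there the decoding happens inside cx_Oracle before any Python code sees the bytes.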

@honorcao

Author

commented Sep 21, 2017

Issue #74 describes the same problem I am seeing.

The data in the database is correct, but I can't fetch it.

@anthony-tuininga

Member

commented Sep 21, 2017

Can you indicate what the database character set is? And also provide a SQL statement that will create the table and populate it with the offending row -- but only using unistr('\3042') syntax as is done in the test suite? The encoding UTF-8 is a universal character set and should be able to handle any characters. Have you tried using this syntax instead to connect?

conn = cx_Oracle.connect(strconn, encoding = "UTF-8", nencoding = "UTF-16")

That shouldn't be necessary but it might shed some light on why you're experiencing the issues you are experiencing.
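
For the reproduction requested above, a small helper can turn a Python string into a unistr('...') literal by writing every UTF-16 code unit as a backslash followed by four hex digits. This is a rough sketch: to_unistr is a hypothetical helper, the table and column names are taken from the query in the original report, and the row values come from the sample shown there. Any other columns of t_test (such as tit) are unknown and left out.

```python
import struct


def to_unistr(text):
    """Return an Oracle unistr('...') literal for text (hypothetical helper).

    Every UTF-16 code unit is written as \\XXXX, so characters outside
    the Basic Multilingual Plane appear as a surrogate pair.
    """
    data = text.encode('utf-16-be')
    # Split the big-endian byte string into 16-bit code units.
    units = struct.unpack('>%dH' % (len(data) // 2), data)
    return "unistr('" + ''.join('\\%04X' % unit for unit in units) + "')"


# Assumed shape of the insert; only the two columns named in the report.
statement = ('insert into t_test (contentid, contenttext) values (2, %s)'
             % to_unistr('一如既往的喜欢💕'))
print(statement)
```

The database character sets asked about can usually be read from the nls_database_parameters view (parameters NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET).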

@anthony-tuininga

Member

commented Sep 21, 2017

Have you tried using Python 3.6 as well? Just in case it is an encoding bug in the Python library?

@anthony-tuininga

Member

commented Nov 27, 2017

Do you have any answers to my questions? Or have you solved the problem on your own?

@anthony-tuininga

Member

commented Nov 30, 2017

Assuming the problem has been resolved. Re-open if that is not the case!
