New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding issues #36

Closed
Dronablo opened this Issue Jun 7, 2017 · 9 comments

Comments

Projects
None yet
3 participants
@Dronablo
Copy link

Dronablo commented Jun 7, 2017

I have the following simple code for python (version 3.6.1 x64 Windows):

# -*- coding: utf-8 -*-
import cx_Oracle
import os

os.environ["NLS_LANG"] = "RUSSIAN_RUSSIA.AL32UTF8"

conn = cx_Oracle.connect("user/pass@//host:1521/dbname")
cur = conn.cursor()

print(cx_Oracle.__version__)

print(conn.encoding)
print(conn.nencoding)

cur.execute("select 'Значение' from dual")
res = cur.fetchone()

cur.close()
conn.close()

print(res)

With cx_Oracle version 5.3 it works as expected and gives the following output:

5.3
UTF-8
UTF-8
('Значение',)

But with the 6.02b version it becomes broken.

  1. It throws exception about unicode character in query:
6.0b2
ASCII
ASCII
Traceback (most recent call last):
  File "C:/Users/SBT-Chernopyatov-AS/PycharmProjects/CUP_Fetcher/ptest_602.py", line 15, in <module>
    cur.execute("select 'Значение' from dual")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 8-15: ordinal not in range(128)
  1. If we fix it by using ".encode('utf8')" or by changing query to "select * from table_with_unicode_data", it gives '????????' for output.
  2. If we try to get back 'UTF-8' in connection.encoding by adding "encoding='UTF-8'" we get:
Traceback (most recent call last):
  File "C:/Users/SBT-Chernopyatov-AS/PycharmProjects/CUP_Fetcher/ptest_602.py", line 7, in <module>
    conn = cx_Oracle.connect("user/pass@//host:1521/dbname", encoding='UTF-8')
cx_Oracle.DatabaseError: DPI-1005: unable to acquire Oracle environment handle

So, seems there is no way to work with the unicode strings right now.

@cjbj cjbj added the bug label Jun 7, 2017

@cjbj

This comment has been minimized.

Copy link
Member

cjbj commented Jun 7, 2017

Can you post your Oracle client and DB versions?

@cjbj

This comment has been minimized.

Copy link
Member

cjbj commented Jun 7, 2017

Also, try setting NLS_LANG in the shell that invokes Python.

@cjbj

This comment has been minimized.

Copy link
Member

cjbj commented Jun 7, 2017

cjones@localhost:~/j$ python3 issue36.py 
6.0b1
ASCII
ASCII
Traceback (most recent call last):
  File "issue36.py", line 15, in <module>
    cur.execute("select 'Значение' from dual")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 8-15: ordinal not in range(128)
cjones@localhost:~/j$ NLS_LANG=.utf8  python3 issue36.py 
6.0b1
UTF-8
UTF-8
('Значение',)
cjones@localhost:~/j$ 

You can get the same 'correct' output by setting os.environ["NLS_LANG"] before importing cx_Oracle.

I've always hated seeing Oracle env vars set inside scripts since that can be 'too late' and cause seemingly random behaviors.

I'm not sure this will be fixable in cx_Oracle 6, unless @anthony-tuininga can work out how to defer loading and initializing OCI.

@Dronablo

This comment has been minimized.

Copy link
Author

Dronablo commented Jun 8, 2017

I see it using python 3.6.1 Windows x64.
Workaround with setting os.environ["NLS_LANG"] before import cx_Oracle works, thanks.

Due to the two facts:

  1. It's well-known technic, described in a lots of places over the internet.
  2. There is lot's of code which allows [probably wrong] cx_Oracle 5.* behaviour.

I believe that this situation should just be somehow described in some 'What's new' part of cx_Oracle ver 6 documentation.

@anthony-tuininga

This comment has been minimized.

Copy link
Member

anthony-tuininga commented Jun 8, 2017

Version 6 internally creates an OCI environment structure to manage a few things (errors, for example) and this happens when cx_Oracle is imported -- unlike earlier versions of cx_Oracle which do not create an OCI environment until a connection or session pool is created. It appears that the environment variables are read and cached, rather than examined for each creation of an OCI environment structure. It might be possible to delay that initialisation in version 6, but that has its own drawbacks -- namely that the calls using a dpiContext structure will then need to check and potentially initialise dpiContext at that point. Thankfully there aren't too many, but it is another potential source of issues. I'm not sure if you are aware, but this code works fine in both 5.3 and 6, and is preferable to setting the environment variable NLS_LANG.

import cx_Oracle

conn = cx_Oracle.connect(connectString, encoding = "UTF-8", nencoding = "UTF-8")

That bypasses NLS_LANG completely and sets the encoding to use for both CHAR and NCHAR data. If you have no NCHAR data you don't need to set that value, of course!

@Dronablo

This comment has been minimized.

Copy link
Author

Dronablo commented Jun 9, 2017

Both suggested workarounds (os.environ before import cx_Oracle and encoding/nencoding) is quite sufficient to overcome the problem. Just two last notes:

  1. I beleive it shoud be described in documentation.
  2. I have to use both encoding+nencoding. If I use only 'encoding' parameter it throws DPI-1005 as shown in the bottom of my first comment. Seems it should be fixed or described in documentation also.
@anthony-tuininga

This comment has been minimized.

Copy link
Member

anthony-tuininga commented Jun 9, 2017

  1. I agree.
  2. I didn't expect that. I'll have to see what is causing that!

anthony-tuininga added a commit to oracle/odpi that referenced this issue Jun 10, 2017

OCI requires that both encoding and nencoding have values or that bot…
…h encoding

and encoding do not have values. Look up missing value, if neeeded, in order to
avoid the error when creating the OCI environment
(oracle/python-cx_Oracle#36).

anthony-tuininga added a commit that referenced this issue Jun 10, 2017

OCI requires that both encoding and nencoding have values or that bot…
…h encoding

and encoding do not have values. Look up missing value, if neeeded, in order to
avoid the error when creating the OCI environment
(#36).
@anthony-tuininga

This comment has been minimized.

Copy link
Member

anthony-tuininga commented Jun 10, 2017

Commits added to eliminate requirement to specify both encoding and nencoding. I also added a note to the release notes. I believe that covers everything, now. If not, please re-open!

anthony-tuininga added a commit that referenced this issue Aug 1, 2017

Delay initialization of the ODPI-C library until the first standalone
connection or session pool is created so that manipulation of the environment
variable NLS_LANG can be performed after the module has been imported
(#36); this also has the added
benefit of reducing the number of errors that can take place when the module is
imported.
@cjbj

This comment has been minimized.

Copy link
Member

cjbj commented Aug 11, 2017

For the record, an upcoming change to cx_Oracle 6 will again allow this:

import cx_Oracle
import os
os.environ["NLS_LANG"] = "RUSSIAN_RUSSIA.AL32UTF8"

But I still don't recommend it, unless you absolutely know what order your code runs. It's better to set environment variables in the shell that invokes python

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment