diff --git a/doc/src/api_manual/cursor.rst b/doc/src/api_manual/cursor.rst index e8ae834..09e5ebb 100644 --- a/doc/src/api_manual/cursor.rst +++ b/doc/src/api_manual/cursor.rst @@ -52,7 +52,7 @@ Cursor Object The DB API definition does not define this attribute. -.. method:: Cursor.arrayvar(data_type, value, [size]) +.. method:: Cursor.arrayvar(typ, value, [size]) Create an array variable associated with the cursor of the given type and size and return a :ref:`variable object `. The value is either an @@ -587,19 +587,19 @@ Cursor Object The DB API definition does not define this attribute. -.. method:: Cursor.var(dataType, [size, arraysize, inconverter, outconverter, \ - typename, encodingErrors, bypassencoding]) +.. method:: Cursor.var(typ, [size, arraysize, inconverter, outconverter, \ + typename, encoding_errors, bypass_decode]) Create a variable with the specified characteristics. This method was designed for use with PL/SQL in/out variables where the length or type cannot be determined automatically from the Python object passed in or for use in input and output type handlers defined on cursors or connections. - The dataType parameter specifies the type of data that should be stored in - the variable. This should be one of the - :ref:`database type constants `, :ref:`DB API constants `, - an object type returned from the method :meth:`Connection.gettype()` or one - of the following Python types: + The typ parameter specifies the type of data that should be stored in the + variable. This should be one of the :ref:`database type constants + `, :ref:`DB API constants `, an object type returned from + the method :meth:`Connection.gettype()` or one of the following Python + types: .. list-table:: :header-rows: 1 @@ -642,17 +642,29 @@ Cursor Object specified when using type :data:`cx_Oracle.OBJECT` unless the type object was passed directly as the first parameter.
- The encodingErrors parameter specifies what should happen when decoding + The encoding_errors parameter specifies what should happen when decoding byte strings fetched from the database into strings. It should be one of the values noted in the builtin `decode `__ function. - The bypassencoding parameter, if specified, should be passed as - boolean. This feature allows results of database types CHAR, NCHAR, - LONG_STRING, NSTRING, STRING to be returned raw meaning cx_Oracle - won't do any decoding conversion. See - :ref:`Fetching raw data ` for more information. + The bypass_decode parameter, if specified, should be passed as a + boolean value. Passing a `True` value causes values of database types + :data:`~cx_Oracle.DB_TYPE_VARCHAR`, :data:`~cx_Oracle.DB_TYPE_CHAR`, + :data:`~cx_Oracle.DB_TYPE_NVARCHAR`, :data:`~cx_Oracle.DB_TYPE_NCHAR` and + :data:`~cx_Oracle.DB_TYPE_LONG` to be returned as `bytes` instead of `str`, + meaning that cx_Oracle doesn't do any decoding. See :ref:`Fetching raw + data ` for more information. + + .. versionadded:: 8.2 + + The parameter `bypass_decode` was added. + + .. versionchanged:: 8.2 + + For consistency and compliance with the PEP 8 naming style, the + parameter `encodingErrors` was renamed to `encoding_errors`. The old + name will continue to work as a keyword parameter for a period of time. .. note:: diff --git a/doc/src/api_manual/deprecations.rst b/doc/src/api_manual/deprecations.rst index c6ebf02..188bb7d 100644 --- a/doc/src/api_manual/deprecations.rst +++ b/doc/src/api_manual/deprecations.rst @@ -68,6 +68,8 @@ if applicable. The most recent deprecations are listed first.
- Replace with parameter name `keyword_parameters` * - `keywordParameters` parameter to :meth:`Cursor.callproc()` - Replace with parameter name `keyword_parameters` + * - `encodingErrors` parameter to :meth:`Cursor.var()` + - Replace with parameter name `encoding_errors` * - `Cursor.fetchraw()` - Replace with :meth:`Cursor.fetchmany()` * - `Queue.deqMany` diff --git a/doc/src/release_notes.rst b/doc/src/release_notes.rst index 3e44b9e..0e39119 100644 --- a/doc/src/release_notes.rst +++ b/doc/src/release_notes.rst @@ -26,6 +26,12 @@ Version 8.2 (TBD) :meth:`cx_Oracle.SessionPool()` in order to permit specifying the size of the statement cache during the creation of pools and standalone connections. +#) Added parameter `bypass_decode` to :meth:`Cursor.var()` in order to allow + the `decode` step to be bypassed when converting data from Oracle Database + into Python strings + (`issue 385 `__). + Initial work was done in `PR 549 + `__. #) Threaded mode is now always enabled when creating connection pools with :meth:`cx_Oracle.SessionPool()`. Any `threaded` parameter value is ignored. #) Eliminated a memory leak when calling :meth:`SodaOperation.filter()` with a diff --git a/doc/src/user_guide/sql_execution.rst b/doc/src/user_guide/sql_execution.rst index 0652849..b7c44d6 100644 --- a/doc/src/user_guide/sql_execution.rst +++ b/doc/src/user_guide/sql_execution.rst @@ -288,7 +288,7 @@ or the value ``None``. The value ``None`` indicates that the default type should be used. Examples of output handlers are shown in :ref:`numberprecision`, -:ref:`directlobs` and :ref:`fetching-raw-data`. Also see samples such as `samples/TypeHandlers.py +:ref:`directlobs` and :ref:`fetching-raw-data`. Also see samples such as `samples/type_handlers.py `__ .. _numberprecision: @@ -347,82 +347,73 @@ See `samples/return_numbers_as_decimals.py .. 
_fetching-raw-data: Fetching Raw Data ---------------------- - -Sometimes cx_Oracle may have problems converting data to unicode and you may -want to inspect the problem closer rather than auto-fix it using the -encodingerrors parameter. This may be useful when a database contains -records or fields that are in a wrong encoding altogether. +----------------- -It is not recommended to use mixed encodings in databases. -This functionality is aimed at troubleshooting databases -that have inconsistent encodings for external reasons. +Sometimes cx_Oracle may have problems converting data stored in the database to +Python strings. This can occur if the data stored in the database doesn't match +the character set defined by the database. The `encoding_errors` parameter to +:meth:`Cursor.var()` permits the data to be returned with some invalid data +replaced, but for additional control the parameter `bypass_decode` can be set +to `True` and cx_Oracle will bypass the decode step and return `bytes` instead +of `str` for data stored in the database as strings. The data can then be +examined and corrected as required. This approach should only be used for +troubleshooting and correcting invalid data, not for general use! -For these cases, you can pass in the in additional keyword argument -``bypassencoding = True`` into :meth:`Cursor.var()`. This needs -to be used in combination with :ref:`outputtypehandlers` +The following sample demonstrates how to use this feature: .. 
code-block:: python - #defining output type handlers method - def ConvertStringToBytes(cursor, name, defaultType, size, precision, scale): - if defaultType == cx_Oracle.STRING: - return cursor.var(str, arraysize=cursor.arraysize, bypassencoding = True) - - #set cursor outputtypehandler to the method above - cursor = connection.cursor() - ursor.outputtypehandler = ConvertStringToBytes - + # define output type handler + def return_strings_as_bytes(cursor, name, default_type, size, + precision, scale): + if default_type == cx_Oracle.DB_TYPE_VARCHAR: + return cursor.var(str, arraysize=cursor.arraysize, + bypass_decode=True) -This will allow you to receive data as raw bytes. + # set output type handler on cursor before fetching data + with connection.cursor() as cursor: + cursor.outputtypehandler = return_strings_as_bytes + cursor.execute("select content, charset from SomeTable") + data = cursor.fetchall() - .. code-block:: python +This will produce output as:: - statement = cursor.execute("select content, charset from SomeTable") - data = statement.fetchall() + [(b'Fianc\xc3\xa9', b'UTF-8')] -This will produce output as: +Note that the final two bytes, ``\xc3\xa9``, are the UTF-8 encoding of é. Since +this is valid UTF-8 you can then decode the data (the step that was bypassed): .. code-block:: python - [(b'Fianc\xc3\xa9', b'UTF-8')] - + value = data[0][0].decode("UTF-8") -Note that last \xc3\xa9 is é in UTF-8. Then in you can do following: +This will return the value "Fiancé". +If you want to save ``b'Fianc\xc3\xa9'`` into the database directly without +using a Python string, you will need to create a variable using +:meth:`Cursor.var()` that specifies the type as +:data:`~cx_Oracle.DB_TYPE_VARCHAR` (otherwise the value will be treated as +:data:`~cx_Oracle.DB_TYPE_RAW`). The following sample demonstrates this: .. 
code-block:: python - import codecs - # data = [(b'Fianc\xc3\xa9', b'UTF-8')] - unicodecontent = data[0][0].decode(data[0][1].decode()) # Assuming your charset encoding is UTF-8 - - -This will revert it back to "Fiancé". - -If you want to save ``b'Fianc\xc3\xa9'`` to database you will need to create -:meth:`Cursor.var()` that will tell cx_Oracle that the value is indeed -intended as a string: - - - .. code-block:: python - - connection = cx_Oracle.connect("hr", userpwd, "dbhost.example.com/orclpdb1") - cursor = connection.cursor() - cursorvariable = cursor.var(cx_Oracle.STRING) - cursorvariable.setvalue(0, "Fiancé".encode("UTF-8")) # b'Fianc\xc4\x9b' - cursor.execute("update SomeTable set SomeColumn = :param where id = 1", param=cursorvariable) - - -At that point, the bytes will be assumed to be in the correct encoding and should insert as you expect. + with cx_Oracle.connect(user="hr", password=userpwd, + dsn="dbhost.example.com/orclpdb1") as conn: + with conn.cursor() as cursor: + var = cursor.var(cx_Oracle.DB_TYPE_VARCHAR) + var.setvalue(0, b"Fianc\xc3\xa9") + cursor.execute(""" + update SomeTable set + SomeColumn = :param + where id = 1""", + param=var) .. warning:: - This functionality is "as-is": when saving strings like this, - the bytes will be assumed to be in the correct encoding and will - insert like that. Proper encoding is the responsibility of the user and - no correctness of any data in the database can be assumed - to exist by itself. + + The database will assume that the bytes provided are in the character set + expected by the database, so only use this for troubleshooting or as + directed. .. 
_outconverters: diff --git a/samples/QueringRawData.py b/samples/QueringRawData.py deleted file mode 100644 index c70b23c..0000000 --- a/samples/QueringRawData.py +++ /dev/null @@ -1,75 +0,0 @@ -# -*- coding: utf-8 -*- -import cx_Oracle -import sample_env - -"The test below verifies that the option to work around saving and reading of inconsistent encodings works" - -def ConvertStringToBytes(cursor, name, defaultType, size, precision, scale): - if defaultType == cx_Oracle.STRING: - return cursor.var(str, arraysize=cursor.arraysize, bypassencoding = True) - -connection = cx_Oracle.connect(sample_env.get_main_connect_string()) -cursor = connection.cursor() - -cursor.outputtypehandler = ConvertStringToBytes - -sql = 'create table EncodingExperiment (content varchar2(100), encoding varchar2(15))' - -print('Creating experiment table') -try: - cursor.execute(sql) - print('Success, will attempt to add records') -except Exception as err: - # table already exists - print('%s\n%s'%(err, 'EncodingExperiment table exists... 
Will attempt to add records')) - -# variable that we will test encodings against -unicode_string = 'I bought a cafetière on the Champs-Élysées' - -# First test -windows_1252_encoded = unicode_string.encode('windows-1252') -# Second test -utf8_encoded = unicode_string.encode('utf-8') - -sqlparameters = [(windows_1252_encoded, 'windows-1252'), (utf8_encoded, 'utf-8')] - -sql = 'insert into EncodingExperiment (content, encoding) values (:content, :encoding)' - -# cx_Oracle string variable in which we will store byte value and insert it as such -content_variable = cursor.var(cx_Oracle.STRING) - -print('Adding records to the table: "EncodingExperiment"') -for sqlparameter in sqlparameters: - content, encoding = sqlparameter - # setting content_variable value to a byte value and instert it as such - content_variable.setvalue(0, content) - cursor.execute(sql, content=content_variable, encoding=encoding) - -sql = 'select * from EncodingExperiment' - -print('Fetching records from table EncodingExperiment') -result = cursor.execute(sql).fetchall() - -for dataset in result: - content, encoding = dataset[0], dataset[1].decode() - decodedcontent = content.decode(encoding) - print('Is "%s" == "%s" ?\nResult: %s, (decoded from: %s)'%(decodedcontent, unicode_string, decodedcontent == unicode_string, encoding)) - -print('Finished testing, will attempt to drop the table "EncodingExperiment"') -# drop table after finished testing -sql = 'drop table EncodingExperiment' -try: - cursor.execute(sql) - print('Successfully droped table "EncodingExperiment" from database.') -except Exception as err: - print('Failed to drop table from the database, info: %s'%err) - - - - - - - - - - diff --git a/samples/query_strings_as_bytes.py b/samples/query_strings_as_bytes.py new file mode 100644 index 0000000..51b7e43 --- /dev/null +++ b/samples/query_strings_as_bytes.py @@ -0,0 +1,49 @@ +#------------------------------------------------------------------------------ +# Copyright (c) 2021, Oracle 
and/or its affiliates. All rights reserved. +#------------------------------------------------------------------------------ + +#------------------------------------------------------------------------------ +# query_strings_as_bytes.py +# +# Demonstrates how to query strings as bytes (bypassing decoding of the bytes +# into a Python string). This can be useful when attempting to fetch data that +# was stored in the database in the wrong encoding. +# +# This script requires cx_Oracle 8.2 and higher. +#------------------------------------------------------------------------------ + +import cx_Oracle as oracledb +import sample_env + +STRING_VAL = 'I bought a cafetière on the Champs-Élysées' + +def return_strings_as_bytes(cursor, name, default_type, size, precision, + scale): + if default_type == oracledb.DB_TYPE_VARCHAR: + return cursor.var(str, arraysize=cursor.arraysize, bypass_decode=True) + +with oracledb.connect(sample_env.get_main_connect_string()) as conn: + + # truncate table and populate with our data of choice + with conn.cursor() as cursor: + cursor.execute("truncate table TestTempTable") + cursor.execute("insert into TestTempTable values (1, :val)", + val=STRING_VAL) + conn.commit() + + # fetch the data normally and show that it is returned as a string + with conn.cursor() as cursor: + cursor.execute("select IntCol, StringCol from TestTempTable") + print("Data fetched using normal technique:") + for row in cursor: + print(row) + print() + + # fetch the data, bypassing the decode and show that it is returned as + # bytes + with conn.cursor() as cursor: + cursor.outputtypehandler = return_strings_as_bytes + cursor.execute("select IntCol, StringCol from TestTempTable") + print("Data fetched using bypass decode technique:") + for row in cursor: + print(row) diff --git a/src/cxoCursor.c b/src/cxoCursor.c index 3ed99b5..170b224 100644 --- a/src/cxoCursor.c +++ b/src/cxoCursor.c @@ -1809,27 +1809,39 @@ static PyObject *cxoCursor_setOutputSize(cxoCursor *cursor, 
PyObject *args) static PyObject *cxoCursor_var(cxoCursor *cursor, PyObject *args, PyObject *keywordArgs) { - static char *keywordList[] = { "type", "size", "arraysize", - "inconverter", "outconverter", "typename", "encodingErrors", "bypassencoding", - NULL }; + static char *keywordList[] = { "typ", "size", "arraysize", "inconverter", + "outconverter", "typename", "encoding_errors", "bypass_decode", + "encodingErrors", NULL }; + Py_ssize_t encodingErrorsLength, encodingErrorsDeprecatedLength; + const char *encodingErrors, *encodingErrorsDeprecated; PyObject *inConverter, *outConverter, *typeNameObj; - Py_ssize_t encodingErrorsLength; + int size, arraySize, bypassDecode; cxoTransformNum transformNum; - const char *encodingErrors; cxoObjectType *objType; - int size, arraySize, bypassEncoding; PyObject *type; cxoVar *var; // parse arguments - size = bypassEncoding = 0; - encodingErrors = NULL; + size = bypassDecode = 0; arraySize = cursor->bindArraySize; + encodingErrors = encodingErrorsDeprecated = NULL; inConverter = outConverter = typeNameObj = NULL; - if (!PyArg_ParseTupleAndKeywords(args, keywordArgs, "O|iiOOOz#p", + if (!PyArg_ParseTupleAndKeywords(args, keywordArgs, "O|iiOOOz#pz#", keywordList, &type, &size, &arraySize, &inConverter, &outConverter, - &typeNameObj, &encodingErrors, &encodingErrorsLength, &bypassEncoding)) + &typeNameObj, &encodingErrors, &encodingErrorsLength, + &bypassDecode, &encodingErrorsDeprecated, + &encodingErrorsDeprecatedLength)) return NULL; + if (encodingErrorsDeprecated) { + if (encodingErrors) { + cxoError_raiseFromString(cxoProgrammingErrorException, + "encoding_errors and encodingErrors cannot both be " + "specified"); + return NULL; + } + encodingErrors = encodingErrorsDeprecated; + encodingErrorsLength = encodingErrorsDeprecatedLength; + } // determine the type of variable if (cxoTransform_getNumFromType(type, &transformNum, &objType) < 0) @@ -1861,10 +1873,9 @@ static PyObject *cxoCursor_var(cxoCursor *cursor, PyObject *args, 
strcpy((char*) var->encodingErrors, encodingErrors); } - // Flag that manually changes transform type to bytes - if (bypassEncoding) { + // if the decode step is to be bypassed, use the binary transform instead + if (bypassDecode) var->transformNum = CXO_TRANSFORM_BINARY; - } return (PyObject*) var; }
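The user guide changes above decode a single fetched value by hand, while the removed `QueringRawData.py` sample decoded each row using a charset stored alongside it. A minimal sketch of that per-row pattern follows. It assumes rows shaped like the `(content, charset)` output shown in the user guide, with both columns returned as `bytes` because of `bypass_decode=True`, and a charset column holding a codec name that Python recognizes. The helper `decode_fetched_rows()` is hypothetical and is not part of cx_Oracle.

```python
# Hypothetical helper (not part of cx_Oracle): decode rows fetched with
# bypass_decode=True, where each row is assumed to be (content, charset)
# and both columns were returned as bytes.


def decode_fetched_rows(rows, errors="strict"):
    """Return the decoded content of each (content, charset) row."""
    decoded = []
    for content, charset in rows:
        # the charset column was also returned as bytes; it is assumed to
        # hold an ASCII codec name such as b"UTF-8" or b"windows-1252"
        codec = charset.decode("ascii")
        # errors accepts the same values as bytes.decode(), e.g. "replace"
        decoded.append(content.decode(codec, errors))
    return decoded


if __name__ == "__main__":
    sample = "I bought a cafetière on the Champs-Élysées"
    rows = [
        (b"Fianc\xc3\xa9", b"UTF-8"),
        (sample.encode("windows-1252"), b"windows-1252"),
    ]
    for value in decode_fetched_rows(rows):
        print(value)
```

Passing `errors="replace"` gives the same kind of substitution that the `encoding_errors` parameter applies during a normal fetch: invalid byte sequences become U+FFFD instead of raising `UnicodeDecodeError`.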