New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cx_Oracle >=6.0.3 doesn't allow to use NaNs #184

Closed
leonid-butenko opened this Issue May 23, 2018 · 12 comments

Comments

Projects
None yet
3 participants
@leonid-butenko
Copy link

leonid-butenko commented May 23, 2018

I found here #91 that the usage of NaN values corrupts the data. We've been using NaNs for a few years now and with a recent update of cx_Oracle have realized it's not supported any longer. What are the alternatives for NaNs? From scientific point of view, having NaNs is quite essential and important.

here is the code snippet I use to reproduce the issue:

import cx_Oracle

c = cx_Oracle.connect(
    dsn=cx_Oracle.makedsn("db", 1521, sid="TEST"), user="USER", password="PWD")

cursor = c.cursor()

try:
    cursor.execute("drop table foo")
except:
    pass

cursor.execute("""
    create table foo(
        id integer primary key,
        data BINARY_FLOAT
    )
""")

cursor.execute("INSERT INTO foo (id, data) VALUES (:id, :data)", {"id": 1, "data": float(100.0)})
cursor.execute("UPDATE foo SET data=:data", {"data": float('nan')})

c.commit()

I get the following error:

>python ./test.py 
Traceback (most recent call last):
  File "./test.py", line 21, in <module>
    cursor.execute("UPDATE foo SET data=:data", {"data": float('nan')})
cx_Oracle.DatabaseError: DPI-1055: value is not a number (NaN) and cannot be used in Oracle numbers

Answer the following questions:

  1. What is your version of Python? Is it 32-bit or 64-bit?

python 2.7.14, 64 bits

  1. What is your version of cx_Oracle?

6.3

  1. What is your version of the Oracle client (e.g. Instant Client)? How was it
    installed? Where is it installed?

I tried with 11.2 and 12.2

  1. What is your version of the Oracle Database?

11g enterprise

  1. What is your OS and version?

ubuntu 18.04 and Debian 9

  1. What compiler version did you use? For example, with GCC, run
    gcc --version.

gcc (GCC) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.

  1. What environment variables did you set? How exactly did you set them?

I use anaconda cloud running in docker.

  1. What exact command caused the problem (e.g. what command did you try to
    install with)? Who were you logged in as?
>docker run -it --rm continuumio/miniconda bash
>conda install -c conda-forge -c anaconda  cx_oracle=6.0.3 oracle-instantclient libaio
>python test.py
  1. What error(s) you are seeing?

Traceback (most recent call last):
File "./test.py", line 21, in
cursor.execute("UPDATE foo SET data=:data", {"data": float('nan')})
cx_Oracle.DatabaseError: DPI-1055: value is not a number (NaN) and cannot be used in Oracle numbers

@zzzeek

This comment has been minimized.

Copy link

zzzeek commented May 23, 2018

Oracle's documentation implies NaN is supported: https://docs.oracle.com/cd/E11882_01/server.112/e41084/sql_elements001.htm#SQLRF50979

this answer https://stackoverflow.com/a/28461506/34549 seems to indicate you need to use binary_float or binary_double datatypes. should that work ?

@leonid-butenko

This comment has been minimized.

Copy link
Author

leonid-butenko commented May 23, 2018

Thanks for finding links! Pls note, I use BINARY_FLOAT in the snippet above.

@leonid-butenko leonid-butenko changed the title cx_Oracle >=6.0.3 doesn't allow a usage of NaNs cx_Oracle >=6.0.3 doesn't allow to use NaNs May 23, 2018

@anthony-tuininga

This comment has been minimized.

Copy link
Member

anthony-tuininga commented May 23, 2018

Thanks, @zzzeek. Yes, NaN is supported, but only in BINARY_FLOAT and BINARY_DOUBLE. If you use it with NUMBER (which is the default type conversion) you may run into data corruption issues, which is why the change was made. So you simply need to use cursor.setinputsizes() with the type cx_Oracle.NATIVE_FLOAT to tell it you are inserting into a BINARY_FLOAT or BINARY_DOUBLE column.

@zzzeek

This comment has been minimized.

Copy link

zzzeek commented May 23, 2018

Well, good thing I added logic to allow more specific setinputsizes setup for specific SQLAlchemy sub-types.

What are the downsides to NATIVE_FLOAT? I need to know when I'm using this and when I'm not. Again, we go to #68 (comment), where I changed SQLAlchemy to not setinputsizes() for numeric types. Especially, if the datatype is BINARY_DOUBLE, and in an executemany() there are lots of wide-precision numeric values coming in, how do I set NATIVE_FLOAT there without introducing truncation for the non-NaN values ?

@anthony-tuininga

This comment has been minimized.

Copy link
Member

anthony-tuininga commented May 23, 2018

The downsides to NATIVE_FLOAT? Well, first of all it has a different set of values that it represents than does NUMBER. It handles NaN but won't handle values greater than 2^53 and the precision capabilities are different. Second, if the target type is different (NATIVE_FLOAT going to NUMBER, for example) there is a conversion that is required, which implies a performance penalty will be in place. So unfortunately the decision to use one type or the other is going to be based on the target -- which cx_Oracle can not know about. Most of the time this is NUMBER and the default works well but if you know otherwise, then you have to use setinputsizes() or accept the truncation/performance penalty.

You can use cursor.setinputsizes() for cursor.executemany() in exactly the same way that you use it for cursor.execute(). You are telling cx_Oracle that all of the data that follows will match the type you have specified. If this is not true on the first non-None value passed in and conversion is not possible then cx_Oracle silently picks a type based on that value instead. If this is not true on subsequent non-None values and conversion is not possible then you get an error. If you know the target is BINARY_FLOAT or BINARY_DOUBLE then you should use cursor.setinputsizes() to let cx_Oracle know to use the correct type.

What exactly do you mean by wide-precision numeric values coming in for calls to executemany()? If they are float values coming in they will be sent to the database unchanged when using NATIVE_FLOAT. Can you clarify your comment?

@zzzeek

This comment has been minimized.

Copy link

zzzeek commented May 23, 2018

ah you know I was confusing DOUBLE with NUMBER. per https://docs.oracle.com/cd/B28359_01/server.111/b28285/sqlqr06.htm#i76716, BINARY_FLOAT and BINARY_DOUBLE are both floating point types. but...what is the precision we expect with these two types? my goal is to avoid the bug in #68. I need numeric values to round-trip perfectly given the precision that the type defines.

so far it seems like I should setinputsizes() to NATIVE_FLOAT with the BINARY_FLOAT and BINARY_DOUBLE types. I don't care about a slight performance penalty. I care about accuracy being maintained and no surprise truncation.

@anthony-tuininga

This comment has been minimized.

Copy link
Member

anthony-tuininga commented May 23, 2018

BINARY_FLOAT is single precision floating point and BINARY_DOUBLE is double precision floating point (the same type that Python uses internally for its "float" type). So you will get the same precision as those values allow. See here for all the nitty gritty details, but the main point for double is you get 2^53 as the maximum integer that can be used and about 15 significant digits and a maximum value of 10^308. Oracle's NUMBER format has a far smaller scale but greater precision. You can typically get 38 significant digits and all of those can be used to represent an integer but the maximum value is only 10^126.

Given those constraints the input from Python and the target in the database are both important. If you have BINARY_FLOAT in the database you have to ensure that the input data is Python floats, integers or decimal values that will fit inside the constraints of BINARY_FLOAT. If you don't do so, you will get silent truncation. The same will be true for BINARY_DOUBLE. Clearly Python floats will always fit perfectly inside BINARY_DOUBLE but integers and decimals can easily exceed its constraints. For NUMBER data Python floats will always result in some truncation for some numbers. Most of the time that isn't an issue but its always there because NUMBER is a decimal format, not a binary format (unlike BINARY_FLOAT and BINARY_DOUBLE which are binary format). For that reason Python decimals and integers fit perfectly into Oracle NUMBER but again both integers and decimals can easily exceed the constraints of NUMBER. And for the same reason Python decimals will always result in conversion errors when going into binary format.

If it sounds complicated, that's because it is! Thankfully most of the time all of these "issues" can be ignored as the precision/scale that most people are dealing with are well within the constraints of all of these types. But if you deal with numbers of larger precision/scale and values like NaN on a regular basis then you have to know all of this stuff and know how to make the software behave the way it should. cx_Oracle has the ability to do this using cursor.setinputsizes() and tries its best to use reasonable defaults. There is no way that I am aware of to do it perfectly without programmer input -- but I'm interested in anything you might suggest to improve things!

@zzzeek

This comment has been minimized.

Copy link

zzzeek commented May 23, 2018

Hi Anthony -

OK, I have Python float("nan") round tripping. But now what about the same thing #78 did for INFINITY? Decimal("NaN") should round trip as well. I'm calling setinputsizes(NATIVE_FLOAT):

cx_Oracle.DatabaseError) DPI-1043: invalid number [SQL: 'INSERT INTO t1 (intcol, numericcol) VALUES (:intcol, :numericcol)'] [parameters: [{'intcol': 1, 'numericcol': Decimal('NaN')}, {'intcol': 2, 'numericcol': Decimal('NaN')}]]

@leonid-butenko

This comment has been minimized.

Copy link
Author

leonid-butenko commented May 24, 2018

After more experimenting today I confirm that setinputsizes with NATIVE_FLOAT fixes the original problem. But not everywhere. As i mentioned above I use anaconda cloud to set up my environment. I have realized that using most recent cx_oracle module version 6.2.1 available from official anaconda channels still crashes even with setinputsizes. Then I ran more tests and figured out that among the following versions

  • 6.0.3
  • 6.1
  • 6.2.1
  • 6.3
  • 6.3.1

only 6.2.1 behaves badly and produces the following error:

Traceback (most recent call last):
  File "test_cx.py", line 21, in <module>
    cursor.execute("INSERT INTO foo (id, data) VALUES (:id, :data)", {"id": 1, "data": float('nan')})
cx_Oracle.DatabaseError: DPI-1055: value is not a number (NaN) and cannot be used in Oracle numbers

So the question is, are you aware of this? Is it a kind of expected behavior of that particular version?

anthony-tuininga added a commit that referenced this issue May 24, 2018

@anthony-tuininga

This comment has been minimized.

Copy link
Member

anthony-tuininga commented May 24, 2018

@zzzeek, I have just pushed a commit that will permit binding decimal values to cx_Oracle.NATIVE_FLOAT. Truncation will occur, of course, if the decimal value exceeds the constraints of a double value, but since you are binding to cx_Oracle.NATIVE_FLOAT I think that makes sense. This is not permitted for cx_Oracle.NUMBER however, to avoid the problem noted in issue #68.

@leonid-butenko, apparently that was a regression in 6.2 which was corrected in 6.3 (and a test case added to the test suite to prevent it from happening again).

@anthony-tuininga

This comment has been minimized.

Copy link
Member

anthony-tuininga commented May 30, 2018

Assuming all is well due to lack of activity.

@zzzeek

This comment has been minimized.

Copy link

zzzeek commented May 30, 2018

hi Anthony -

havent' had time to test it. ill see if I can ping this issue when I get around to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment