cx_Oracle executemany silent failure when inserting big dataset #153
I am running the following versions:
I have a dataframe with 225 columns and more than 3M records:
When I try to use the cx_Oracle executemany command, I get no errors, but also no data inserted into the database!
It seems that the library has some memory leak and the exception is caught without raising any error.
Is this a dangerous behavior that should prompt us to use another library?
PS: In case you wonder, I tested my connection and steps using a small dataset (2 columns, 2 rows), and I was able to run and commit cx_Oracle executemany.
Does this happen with "bare" cx_Oracle? In other words, without pandas? Can you provide a script that demonstrates the problem? Since the amount of data is significant the only meaningful way would be by generating the data. You can certainly work around the situation by batching the rows yourself, but it would be nice to figure out what is causing the problem -- and internally batch the rows in cx_Oracle if needed.
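For illustration, a manual batching workaround could look roughly like the sketch below; the connection details, table name, generated rows, and batch size are placeholders, not taken from the report:

```python
import cx_Oracle

BATCH_SIZE = 50_000  # placeholder; pick something that keeps each call small

connection = cx_Oracle.connect("user", "password", "host/service_name")
cursor = connection.cursor()

insert_sql = "insert into big_table (id, value) values (:1, :2)"
rows = [(i, float(i)) for i in range(3_000_000)]  # stand-in for the real data

# Feed executemany() one slice at a time instead of all rows in a single call.
for start in range(0, len(rows), BATCH_SIZE):
    cursor.executemany(insert_sql, rows[start:start + BATCH_SIZE])

connection.commit()
```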
Which behaviour are you referring to? The use of NaN? You can use NaN with cx_Oracle but you do have to tell it you want to use BINARY_DOUBLE or BINARY_FLOAT, not NUMBER (the default).
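As a rough sketch (the table and column names here are made up), telling cx_Oracle to bind the float column as NATIVE_FLOAT lets NaN through, whereas the default NUMBER binding would not:

```python
import cx_Oracle

connection = cx_Oracle.connect("user", "password", "host/service_name")
cursor = connection.cursor()

# Bind the second column as NATIVE_FLOAT (BINARY_DOUBLE on the database side);
# None skips the first bind position and leaves its type at the default.
cursor.setinputsizes(None, cx_Oracle.NATIVE_FLOAT)

rows = [(1, 1.5), (2, float("nan")), (3, 2.25)]
cursor.executemany(
    "insert into float_test (id, measurement) values (:1, :2)", rows)
connection.commit()
```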
When I try to insert 3.5 million records x 225 columns with np.nan in some columns, the cx_Oracle library finishes the executemany + commit statements without errors. It gives the impression that it works, but it does not insert any data into the database. Very dangerous behavior to have in production!
Nevertheless, I did some further investigation:
Reading the documentation for Cursor.execute, it seems that I have a column with None values that ends up being treated as a number:
But I do have a cursor.setinputsizes call that says the expected type for that column is cx_Oracle.NATIVE_FLOAT.
Just a note for future readers interested in loading large volumes of data: Oracle's SQL*Loader and Data Pump were added to Instant Client 12.2.
I would agree that this is invalid behaviour. Can you provide a script that demonstrates the problem?
Please do provide those stats! It may make sense to split the rows to insert into multiple chunks internally.
The limit isn't the number of rows but the data size (2 GB): the size of each row multiplied by the number of rows. I'll look into places to document that limitation. Most people don't run into it, though!
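For example, a back-of-the-envelope way to pick a batch size from that limit might look like the following; the per-row byte estimate is an assumption for illustration, not a measured value:

```python
# Rough sketch: derive a safe rows-per-batch count from an estimated row size,
# so each executemany() call stays comfortably under the 2 GB ceiling.
LIMIT_BYTES = 2 * 1024 ** 3          # the 2 GB limit per call
ESTIMATED_ROW_BYTES = 225 * 30       # assumed ~30 bytes per column, 225 columns
SAFETY_FACTOR = 0.5                  # leave plenty of headroom

rows_per_batch = int(LIMIT_BYTES * SAFETY_FACTOR // ESTIMATED_ROW_BYTES)
print(rows_per_batch)                # roughly 159,000 rows per call
```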
cursor.executemany() does indeed take into account the call to cursor.setinputsizes(). I can adjust that documentation as well.