to_sql function takes forever to insert in oracle database #14315

Open
addresseerajat opened this Issue Sep 28, 2016 · 6 comments

@addresseerajat

addresseerajat commented Sep 28, 2016

I am using pandas to do some analysis on an Excel file, and once that analysis is complete, I want to insert the resulting dataframe into a database. The dataframe has around 300,000 rows and 27 columns.
I am using the pd.to_sql method to insert the dataframe into the database. With a MySQL database, the insertion takes around 60-90 seconds. However, when I insert the same dataframe using the same function into an Oracle database, the process takes around 2-3 hours to complete.

Relevant code can be found below:

data_frame.to_sql(name='RSA_DATA', con=get_engine(), if_exists='append',
                  index=False, chunksize=config.CHUNK_SIZE)

I tried different chunksize values (from 50 to 3000), but the difference in total time was only of the order of 10 minutes.
Any solution to the above problem?

@jorisvandenbossche

Member

jorisvandenbossche commented Sep 28, 2016

What database driver are you using?
Given that the timing differs so much between the two databases, the cause most likely lies in the driver, the speed of the connection, or the settings of the database, all of which influence insertion speed.

@addresseerajat

addresseerajat commented Sep 29, 2016

I used the cx_Oracle driver to connect the Oracle database to my code.
However, I don't know whether cx_Oracle is the cause of this problem. Using a different but hacky approach, I have been able to insert the data in around 120 seconds: I broke the dataframe into multiple dataframes with numpy.array_split() and used a SQLAlchemy bulk insert to load each piece into the database, roughly as sketched below. But I think this is more of a hack than a proper solution.
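A rough sketch of that workaround (sketch only: RsaData is a placeholder for an ORM-mapped class backing the RSA_DATA table, and the number of splits is arbitrary; the thread does not show the exact code):

import numpy as np
from sqlalchemy.orm import sessionmaker

session = sessionmaker(bind=get_engine())()
for chunk in np.array_split(data_frame, 100):
    # Bulk-insert one sub-dataframe at a time as a list of plain dicts.
    session.bulk_insert_mappings(RsaData, chunk.to_dict(orient='records'))
session.commit()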

Both databases are on the same machine (I used a Lubuntu virtual machine for this comparison), so connection speed shouldn't be an issue, right?

@jorisvandenbossche

Member

jorisvandenbossche commented Sep 29, 2016

@addresseerajat Can you have a look at the discussion in #8953?
The monkey patch suggested in #8953 (comment) in particular is probably worth trying out.
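The gist of that patch is to replace pandas' per-row executemany insert with a single multi-row INSERT ... VALUES statement. A sketch of the idea only (not the exact patch; pandas.io.sql.SQLTable internals may differ between versions):

from pandas.io.sql import SQLTable

def _execute_insert(self, conn, keys, data_iter):
    # Build one multi-row INSERT instead of one parameter set per row.
    data = [dict(zip(keys, row)) for row in data_iter]
    conn.execute(self.table.insert().values(data))

SQLTable._execute_insert = _execute_insert

As the next comment shows, the Oracle dialect rejects this multi-row form on Oracle 11g.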

@addresseerajat

addresseerajat commented Oct 4, 2016

@jorisvandenbossche: I looked at the solution and tried using a similar approach. The relevant code is as follows:

df_dict = data_frame.to_dict(orient='records')
connection = get_engine()
connection.execute(rsa_data.insert().values(df_dict))

rsa_data is the name of the table into which I am inserting data.

The above line gives me an error:

The 'oracle' dialect with current database version settings does not support in-place multirow inserts.

My database version is Oracle 11g.

However, when I execute the following command, I am able to insert into the database. The only problem is that it takes a lot of time to insert.

df_dict = data_frame.to_dict(orient='records')
connection = get_engine()
connection.execute(rsa_data.insert(), df_dict)
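For reference, the difference between the two forms: rsa_data.insert().values(df_dict) builds a single multi-row INSERT ... VALUES statement, which the Oracle dialect reports as unsupported for this server version, while connection.execute(rsa_data.insert(), df_dict) passes the records as an executemany parameter list, which the dialect does accept.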

@daefresh

daefresh commented Nov 30, 2016

Were there any other findings here? I've discovered that pushing data into Oracle using cx_Oracle is painfully slow: 10 rows can take 15 seconds to insert. The server we're using is decent (32 GB of RAM and 8 cores).

@wuhuanyan

wuhuanyan commented Jul 30, 2018

I recently ran into the same problem. In the end, I found a way to solve it:
pandas.dataframe.to_sql with oracle database
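A commonly cited fix for this symptom (hedged: possibly, but not necessarily, what the linked post describes) is to map the string columns to explicit VARCHAR types so that to_sql does not create CLOB columns, which are much slower to load through cx_Oracle:

import sqlalchemy.types as sa_types

# Give every object (string) column an explicit VARCHAR length so to_sql
# does not fall back to CLOB for it.
dtype = {col: sa_types.VARCHAR(int(data_frame[col].astype(str).str.len().max()))
         for col in data_frame.select_dtypes(include=['object']).columns}

data_frame.to_sql(name='RSA_DATA', con=get_engine(), if_exists='append',
                  index=False, chunksize=config.CHUNK_SIZE, dtype=dtype)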
