Poor performance when running a large statement. sqlalchemy + oracledb #172
Comments
Triple check you are connecting to the same DB in both scenarios! Can you also add a call to init_oracle_client() (to run in Thick mode) and see whether that makes a difference? I don't know what has changed in Pandas 2, but it may be worth checking that version too.
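A minimal sketch of the Thick-mode check suggested above, assuming python-oracledb is installed and the Oracle Client libraries are discoverable on the library search path (both are assumptions; pass `lib_dir` for your own installation):

```python
# Hedged sketch: try to enable Thick mode. python-oracledb stays in Thin
# mode (or is simply absent) when the Oracle Client libraries can't load.
try:
    import oracledb
    oracledb.init_oracle_client()  # e.g. lib_dir="/path/to/instantclient"
    mode = "thick"
except Exception:
    mode = "thin-or-unavailable"
print(mode)
```

If this prints `thick`, rerunning the slow statement compares Thick-mode against Thin-mode parsing.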
```python
import re
import time

sql = """
-- large sql script
"""
print(len(sql))  # 351449

start = time.time()
sql = re.sub(r"/\*[\S\n ]+?\*/", "", sql)
print(time.time() - start)  # 0.0004279613494873047

start = time.time()
sql = re.sub(r"\--.*(\n|$)", "", sql)
print(time.time() - start)  # 0.0004191398620605469

start = time.time()
sql = re.sub(r"""'[^']*'(?=(?:[^']*[^']*')*[^']*$)*""", "", sql,
             flags=re.MULTILINE)
print(time.time() - start)  # 3571.5018050670624

start = time.time()
sql = re.sub(r'(:\s*)?("([^"]*)")',
             lambda m: m.group(0) if sql[m.start(0)] == ":" else "",
             sql)
print(time.time() - start)  # 0.007905006408691406
```
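For what it's worth, the third substitution is the pathological one: the quantified lookahead `(?=(?:[^']*[^']*')*[^']*$)*` contains overlapping alternatives, so on a 350k-character string the engine backtracks catastrophically. A linear-time sketch for stripping single-quoted literals (my own helper for illustration, not the driver's actual code) could look like:

```python
import re

def strip_single_quoted_literals(sql: str) -> str:
    # '' inside a literal is SQL's escaped quote, so consume it as a unit;
    # the pattern never revisits input characters, keeping the scan linear.
    return re.sub(r"'(?:[^']|'')*'", "''", sql)

demo = "SELECT name FROM t WHERE name = 'O''Brien' AND note = 'has :tag inside'"
print(strip_single_quoted_literals(demo))
# SELECT name FROM t WHERE name = '' AND note = ''
```

Emptying the literals (rather than deleting them) keeps the statement's shape while hiding anything that looks like a bind variable inside a string.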
That's helpful. Are you able to share the large SQL script? You can e-mail it to me if you prefer (anthony.tuininga@gmail.com). Some adjustments to the regular expressions are planned to avoid these problems!
I have pushed a patch that should correct this issue. If you are able to build from source, you can verify that it corrects your issue as well.
I have tested the patch that you pushed. The code now runs much faster.
This has been included in version 1.3.1 which was just released! |
Hi,

I have the following code:

If I compile the SQL from the `df` variable and then run it directly in DBeaver, the statement completes in 17-20 seconds, which is fine. The compiled SQL contains 347k characters. But if I run the same statement through `pd.read_sql`, the code runs for about 16 minutes. I noticed that the delay happens in the `_prepare` method (see trace below). I have other, smaller statements using `pd.read_sql` and they execute normally. I have also tried the same `df` variable against PostgreSQL, and it runs in about 35 seconds.

**Issue**
Why is there such a delay in the `prepare` step?

`requirements.txt`

Thank you!
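For anyone hitting something similar: profiling is how a delay like this can be pinned to a single method such as `_prepare`. A minimal sketch with the standard-library profiler (the profiled function here is a stand-in for illustration, not the actual driver call):

```python
import cProfile
import io
import pstats
import re

def prepare_like_step(sql: str) -> str:
    # Stand-in for the driver's statement-parsing step: strip -- comments.
    return re.sub(r"--[^\n]*", "", sql)

sql = "SELECT 1 FROM dual -- trailing comment\n" * 1000

profiler = cProfile.Profile()
profiler.enable()
cleaned = prepare_like_step(sql)
profiler.disable()

# The top entries of the cumulative-time listing reveal the hot spot.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(3)
print("comment" in cleaned)  # False: comments were stripped
```

In the real case you would wrap the `pd.read_sql(...)` call instead of `prepare_like_step`, and the driver's parsing method would dominate the listing.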