Fixing multi method for to_sql for non-oracle databases #57311

kassett · 2024-02-09T02:44:08Z

closes BUG: method="multi" is not working with DataFrame.to_sql #57310
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/v2.2.1.rst file if fixing a bug or adding a new feature.

mroeschke

We don't have to have dialect specific logic in sql.py. All ops should be generically applicable to the db backend

kassett · 2024-02-10T01:19:06Z

Totally fair not to accept dialect specific changes, but with the previous commit, ALL queries using the method="multi" argument resolve to insertion via single row. Is it worth breaking this functionality just for the sake of the Oracle connector?

mroeschke · 2024-02-10T01:43:41Z

I think it's reasonable to revert #51648 that broke this for all method="multi" queries. Would you be interested in making a PR to do that?

kassett · 2024-02-10T01:49:11Z

Yes, I definitely can make that PR, should I keep it in this PR and just change it or make a new one?
Also, is it worth making a warning at least for Oracle?

mroeschke · 2024-02-10T01:53:37Z

> Yes, I definitely can make that PR, should I keep it in this PR and just change it or make a new one?

Sure you can revert the code change in this PR. It would be good to add a unit test as well so it doesn't break again (test_sql.py)

> Also, is it worth making a warning at least for Oracle?

Yeah in the docstring of to_sql that would be great

doc/source/whatsnew/v2.2.1.rst

pandas/io/sql.py

pandas/tests/io/test_sql.py

mroeschke · 2024-02-12T18:38:34Z

pandas/tests/io/test_sql.py

@@ -4331,3 +4330,35 @@ def test_xsqlite_if_exists(sqlite_buildin):
        (5, "E"),
    ]
    drop_table(table_name, sqlite_buildin)
+
+
+def test_execution_of_multi(mysql_pymysql_engine):


This test should test a public API like

    df = DataFrame(...)
    df.to_sql(..., method="multi")
    result = pd.read_sql(...)
    tm.assert_frame_equal(df, result)
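A minimal sketch of the round-trip test suggested above, filled in with assumptions: an in-memory SQLite connection from the standard library stands in for the database fixture, the table name `demo_multi` and the data are hypothetical, and the public `pd.testing.assert_frame_equal` replaces the internal `tm` helper. A check like this only confirms that the data survives the round trip. It cannot tell whether the rows were sent in one statement or in several.

```python
import sqlite3

import pandas as pd

# Hypothetical data and table name, chosen only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# Assumption: a plain sqlite3 connection replaces the test-suite fixture.
conn = sqlite3.connect(":memory:")
try:
    # Write through the public API with method="multi", then read back.
    df.to_sql("demo_multi", conn, index=False, method="multi")
    result = pd.read_sql("SELECT * FROM demo_multi", conn)
finally:
    conn.close()

# Passes when the written and read frames match.
pd.testing.assert_frame_equal(df, result)
```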

kassett · 2024-02-12

When you say this, do you mean that I should test that the data inserted matches the data read? The test I wrote shows that the executed statement is multi-value. Without a multi-value insert, all the data is still inserted; it is just executed as multiple SQL statements. Here I show that one SQL statement contains multiple rows.
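The distinction between one multi-value statement and several single-row statements can be illustrated outside pandas. The sketch below is not the test from this PR. It uses only the standard-library `sqlite3` module, and the helper name `count_insert_statements` and the table `demo` are hypothetical. It records each executed statement with `set_trace_callback` and counts the `INSERT` statements for both strategies.

```python
import sqlite3


def count_insert_statements(rows, multi):
    """Insert rows into an in-memory table.

    Returns (number of INSERT statements executed, number of rows stored).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo (a INTEGER, b TEXT)")

    seen = []
    conn.set_trace_callback(seen.append)  # record every executed statement
    if multi:
        # One INSERT carrying every row: VALUES (?, ?), (?, ?), ...
        placeholders = ", ".join(["(?, ?)"] * len(rows))
        flat = [value for row in rows for value in row]
        conn.execute(f"INSERT INTO demo (a, b) VALUES {placeholders}", flat)
    else:
        # One INSERT per row.
        for row in rows:
            conn.execute("INSERT INTO demo (a, b) VALUES (?, ?)", row)
    conn.set_trace_callback(None)

    stored = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
    conn.close()
    # Ignore implicit transaction statements such as BEGIN.
    inserts = [s for s in seen if s.lstrip().upper().startswith("INSERT")]
    return len(inserts), stored


rows = [(1, "a"), (2, "b"), (3, "c")]
multi_result = count_insert_statements(rows, multi=True)
single_result = count_insert_statements(rows, multi=False)
```

Both calls store the same three rows, so a data-equality check alone would not distinguish them. Only the statement count differs.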

mroeschke · 2024-02-12

OK, I see the difficulty. I would still say we don't want to add a unit test this complex, since it can become hard to debug and maintain in the future.

Let's just remove this unit test for now. I think it will take some reorganizing of the sql code to make testing this easier.

kassett · 2024-02-12

Is the complexity in the monkeypatching or in the regex of the SQL statement? Perhaps I can find some other way to analyze the statement that does not use regex.
I added a similar unit test in an internal repo that writes to a PrestoDB. There, however, I just added a timeout, because one took exponentially longer than the other.

kassett · 2024-02-13

I simplified the monkeypatch -- I think this is a pretty durable test now.

WillAyd · 2024-02-13

Any reason to not use the sqlalchemy_connectable fixture for this?

Also, I agree with @mroeschke - this test gets into the internals of both the pandas implementation and event listening in SQLAlchemy. I don't think there is value in forcing developers and maintainers to have that level of detailed knowledge of both.

kassett · 2024-02-13

@WillAyd The sqlalchemy_connectable would work well.
@mroeschke So is the move to not do a unit test here?

mroeschke · 2024-02-13

Yes, I would still recommend not having a unit test for now.

kassett · 2024-02-13

How does it look now?

mroeschke · 2024-02-14T17:13:40Z

Looking good but please run the pre-commit checks to fix style issues: https://pandas.pydata.org/docs/dev/development/contributing_codebase.html#pre-commit

mroeschke · 2024-02-17T01:10:05Z

Thanks @kassett

…r non-oracle databases) (#57466) Backport PR #57311: Fixing multi method for to_sql for non-oracle databases Co-authored-by: Samuel Chai <121340503+kassett@users.noreply.github.com>

…7311) * Fixing multi method for to_sql for non-oracle databases * Simplifying the if statement * adding a doc * Adding unit test * Update doc/source/whatsnew/v2.2.1.rst Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * Reverted formatting in test_sql * Simplifying unit test * Removing unit test * remove trailing whitespaces * Removing trailing whitespace * fixing alpahbetical sorting --------- Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

kassett added 3 commits February 8, 2024 21:28

Fixing multi method for to_sql for non-oracle databases

5035b11

Simplifying the if statement

7f4b5f8

adding a doc

641b144

simonjayhawkins added IO SQL to_sql, read_sql, read_sql_query Regression Functionality that used to work in a prior pandas version labels Feb 9, 2024

simonjayhawkins added this to the 2.2.1 milestone Feb 9, 2024

mroeschke requested changes Feb 9, 2024

Adding unit test

0f846d4

kassett requested a review from mroeschke February 12, 2024 14:57

mroeschke reviewed Feb 12, 2024

doc/source/whatsnew/v2.2.1.rst

mroeschke reviewed Feb 12, 2024

pandas/io/sql.py

mroeschke reviewed Feb 12, 2024

pandas/tests/io/test_sql.py

mroeschke reviewed Feb 12, 2024

kassett and others added 6 commits February 12, 2024 14:58

Update doc/source/whatsnew/v2.2.1.rst

51cadc9

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

Reverted formatting in test_sql

22c3b86

Simplifying unit test

8b6fd74

Removing unit test

cf8be43

remove trailing whitespaces

a7a1cc5

Removing trailing whitespace

eefe5c3

fixing alpahbetical sorting

5462c7c

kassett requested review from mroeschke and WillAyd February 16, 2024 23:57

Merge branch 'main' into to-sql-multi-fix

590e82a

mroeschke approved these changes Feb 17, 2024

mroeschke merged commit f8a7fe4 into pandas-dev:main Feb 17, 2024
47 checks passed

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Feb 17, 2024

Backport PR pandas-dev#57311: Fixing multi method for to_sql for non-…

7b57807

…oracle databases

meeseeksmachine mentioned this pull request Feb 17, 2024

Backport PR #57311 on branch 2.2.x (Fixing multi method for to_sql for non-oracle databases) #57466

Merged
