Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing multi method for to_sql for non-oracle databases #57311

Merged
merged 12 commits into from
Feb 17, 2024

Conversation

kassett
Copy link
Contributor

@kassett kassett commented Feb 9, 2024

@simonjayhawkins simonjayhawkins added IO SQL to_sql, read_sql, read_sql_query Regression Functionality that used to work in a prior pandas version labels Feb 9, 2024
@simonjayhawkins simonjayhawkins added this to the 2.2.1 milestone Feb 9, 2024
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have to have dialect specific logic in sql.py. All ops should be generically applicable to the db backend

@kassett
Copy link
Contributor Author

kassett commented Feb 10, 2024

Totally fair not to accept dialect specific changes, but with the previous commit, ALL queries using the method="multi" argument resolve to insertion via single row. Is it worth breaking this functionality just for the sake of the Oracle connector?

@mroeschke
Copy link
Member

I think it's reasonable to revert #51648 that broke this for all method="multi" queries. Would you be interested in making a PR to do that?

@kassett
Copy link
Contributor Author

kassett commented Feb 10, 2024

Yes, I definitely can make that PR, should I keep it in this PR and just change it or make a new one?
Also, is it worth making a warning at least for Oracle?

@mroeschke
Copy link
Member

Yes, I definitely can make that PR, should I keep it in this PR and just change it or make a new one?

Sure you can revert the code change in this PR. It would be good to add a unit test as well so it doesn't break again (test_sql.py)

Also, is it worth making a warning at least for Oracle?

Yeah in the docstring of to_sql that would be great

pandas/io/sql.py Outdated Show resolved Hide resolved
@@ -4331,3 +4330,35 @@ def test_xsqlite_if_exists(sqlite_buildin):
(5, "E"),
]
drop_table(table_name, sqlite_buildin)


def test_execution_of_multi(mysql_pymysql_engine):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should test a public API like

df = DataFrame(...)
df.to_sql(..., method="multi")
result = pd.read_sql(...)
tm.assert_frame_equal(df, result)

Copy link
Contributor Author

@kassett kassett Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say this, do you mean that I should test that the data inserted matches the data read? The test that I performed shows that the statement executed is multi-value. Without a multi-value insert, all the data is still inserted, it is just executed as multiple SQL statements. Here I show that 1 SQL statement contains multiple rows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I see the difficulty. I would still say we don't want to add this complex of a unit test since it can become hard to debug/maintain in the future.

Let's just remove this unit test for now. I think it will take some reorganizing of the sql code to make testing this easier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the complexity in the monkeypatching or in the regex of the SQL statement? Because perhaps I can find some other way to analyze the statement that is not using regex?
I added a similar unit test on an internal repo which writes to a PrestoDB. There, however, I justed added a timeout because one was exponentially longer than the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simplified the monkeypatch -- I think this is a pretty durable test now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not use the sqlalchemy_connectable fixture for this?

Also I agree with @mroeschke - this test gets into the internals of both pandas implementation and event listening in SQLAlchemy. I don't think there is value in forcing developers / maintainers to have that detailed of knowledge of both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd The sqlalchemy_connectable would work well.
@mroeschke So is the move to not do a unit test here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I would still recommend not having a unit test for now.

Copy link
Contributor Author

@kassett kassett Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it look now?

@mroeschke
Copy link
Member

Looking good but please run the pre-commit checks to fix style issues: https://pandas.pydata.org/docs/dev/development/contributing_codebase.html#pre-commit

@mroeschke mroeschke merged commit f8a7fe4 into pandas-dev:main Feb 17, 2024
47 checks passed
@mroeschke
Copy link
Member

Thanks @kassett

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Feb 17, 2024
phofl pushed a commit that referenced this pull request Feb 17, 2024
…r non-oracle databases) (#57466)

Backport PR #57311: Fixing multi method for to_sql for non-oracle databases

Co-authored-by: Samuel Chai <121340503+kassett@users.noreply.github.com>
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
…7311)

* Fixing multi method for to_sql for non-oracle databases

* Simplifying the if statement

* adding a doc

* Adding unit test

* Update doc/source/whatsnew/v2.2.1.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Reverted formatting in test_sql

* Simplifying unit test

* Removing unit test

* remove trailing whitespaces

* Removing trailing whitespace

* fixing alpahbetical sorting

---------

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: method="multi" is not working with DataFrame.to_sql
4 participants