Refactor sqlalchemy code in pandas.io.sql to help prepare for sqlalchemy 2.0. #49531

cdcadman · 2022-11-04T19:01:53Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

I am splitting this out of #48576 because it is a major refactor, with the goal of making SQLDatabase accept only a Connection and not an Engine. sqlalchemy 2.0 restricts the methods available on Connection and Engine, which makes it harder to write code that works with both. For example, Connection.connect() creates a branched connection in sqlalchemy 1.x but is removed in 2.0, yet SQLDatabase.check_case_sensitive() calls it.

I also added some clarification on how transactions work in DataFrame.to_sql, based on this example, run against pandas 1.5.1:

import sqlite3
from pandas import DataFrame
from sqlalchemy import create_engine

with sqlite3.connect(":memory:") as con:
    con.execute("create table test (A integer, B integer)")
    row_count = con.execute("insert into test values (2, 4), (5, 10)").rowcount
    if row_count > 1:
        con.rollback()
    print(con.execute("select count(*) from test").fetchall()[0][0]) # prints 0

with sqlite3.connect(":memory:") as con:
    con.execute("create table test (A integer, B integer)")
    row_count = DataFrame({'A': [2, 5], 'B': [4, 10]}).to_sql('test', con, if_exists='append', index=False)
    if row_count > 1:
        con.rollback() # does nothing, because pandas already committed the transaction.
    print(con.execute("select count(*) from test").fetchall()[0][0]) # prints 2
    
with create_engine("sqlite:///:memory:").connect() as con:
    with con.begin():
        con.exec_driver_sql("create table test (A integer, B integer)")
    try:
        with con.begin():
            row_count = DataFrame({'A': [2, 5], 'B': [4, 10]}).to_sql('test', con, if_exists='append', index=False)
            assert row_count < 2
    except AssertionError:
        pass
    print(con.exec_driver_sql("select count(*) from test").fetchall()[0][0]) # prints 0
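
The difference between the first two cases above comes down to who commits. A minimal standard-library sketch of that distinction follows; `insert_and_commit` and `insert_only` are hypothetical helpers written for illustration, not pandas functions, and the sketch only mimics the observed behavior rather than reproducing what `to_sql` does internally.

```python
import sqlite3

ROWS = [(2, 4), (5, 10)]


def insert_and_commit(con, rows):
    """Hypothetical helper that commits on the caller's behalf."""
    cur = con.executemany("insert into test values (?, ?)", rows)
    con.commit()
    return cur.rowcount


def insert_only(con, rows):
    """Hypothetical helper that leaves the transaction open for the caller."""
    return con.executemany("insert into test values (?, ?)", rows).rowcount


def rows_left_after_rollback(insert):
    """Insert with the given helper, roll back, and count the surviving rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table test (A integer, B integer)")
        insert(con, ROWS)
        # Only undoes work that has not been committed yet.
        con.rollback()
        return con.execute("select count(*) from test").fetchone()[0]
    finally:
        con.close()


print(rows_left_after_rollback(insert_and_commit))  # prints 2
print(rows_left_after_rollback(insert_only))  # prints 0
```

Once a helper commits internally, the caller's later rollback has nothing left to undo, which is why the caller can only control the outcome when the helper leaves the transaction open.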

mroeschke · 2022-11-04T22:30:51Z

Thanks for the followup, but could you pare down the changes even more (1 PR for 1 targeted change)? It will greatly help the review process. It appears this PR is tackling 3 things, which would be better as 3 individual PRs:

SQLDatabase accepting connections only
Disposing engines
test refactoring

cdcadman · 2022-11-04T22:58:18Z

Yes, I think I can do that. For the first part, where I make SQLDatabase accept Connections only, I might have to disable the tests which pass an Engine, but that will be a much smaller change than the test refactor.

cdcadman · 2022-11-05T01:18:43Z

I pulled the engine disposal and test refactoring out of this PR. I think the engine disposal can wait until that last PR. But part of my motivation for going with this approach was so that I could put the engine disposal into the _sqlalchemy_con function.

cdcadman · 2022-11-15T10:30:33Z

@mroeschke, I've addressed all the comments. After this PR, the next step would be to refactor the test classes, since the ones involving the sqlalchemy engines would no longer be necessary. In a later PR, I would dispose of the sqlalchemy engine, if it needed to be created, within the new function _sqlalchemy_con.
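
The idea of disposing only of a resource the helper itself created can be sketched with the standard library. This is an analogy using sqlite3, not the `_sqlalchemy_con` implementation from this PR: `connection_for` is a hypothetical name, and a path string stands in for the case where the helper has to create the connection itself.

```python
import sqlite3
from contextlib import ExitStack, closing, contextmanager


@contextmanager
def connection_for(connectable, need_transaction: bool):
    """Yield a sqlite3.Connection for a path string or an existing connection.

    Hypothetical helper for illustration only.
    """
    with ExitStack() as stack:
        if isinstance(connectable, str):
            # We created this connection, so we are responsible for closing it.
            con = stack.enter_context(closing(sqlite3.connect(connectable)))
        else:
            # Caller-owned connection: use it, but never close it here.
            con = connectable
        if need_transaction and not con.in_transaction:
            # As a context manager, a sqlite3 connection commits on success
            # and rolls back if an exception escapes the block.
            stack.enter_context(con)
        yield con


with connection_for(":memory:", need_transaction=True) as con:
    con.execute("create table test (A integer, B integer)")
    con.execute("insert into test values (2, 4)")
    print(con.execute("select count(*) from test").fetchone()[0])  # prints 1
```

If the caller's connection is already in a transaction, the sketch leaves committing or rolling back to the caller, which is the same division of responsibility the transaction example in the PR description illustrates.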

mroeschke · 2022-11-16T22:48:14Z

pandas/io/sql.py

+
+
+@contextmanager
+def _sqlalchemy_con(connectable, need_transaction: bool):


Will this be reused in your followup PR? I have a slight preference for just folding this into pandasSQL_builder for now, until there's a need to share this connection logic.

cdcadman · 2022-11-16

I only plan for _sqlalchemy_con to be called by pandasSQL_builder and not by anything else. I split it out to reduce the amount of indentation and keep the functions small. There is an additional try/finally block and some additional if/else blocks in lines 772-800 here: https://github.com/cdcadman/pandas/blob/sql_fixes/pandas/io/sql.py . I think I can fold _sqlalchemy_con into pandasSQL_builder if you still prefer that.

mroeschke · 2022-11-17

Okay gotcha. I think it'll be fine to leave as a separate function with more code incoming.

However, could you replace import sqlalchemy here with sqlalchemy = import_optional_dependency(..., errors="raise")? Just so that if someone else tries using this function in the future, it's known that sqlalchemy is required.

cdcadman · 2022-11-17

Ok, I made the change.
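
The pattern requested here, importing an optional dependency through a helper that fails with an explicit message, can be sketched roughly as follows. `require_module` is a hypothetical, simplified stand-in written for illustration; it is not pandas' `import_optional_dependency`, and its message text is an assumption.

```python
import importlib


def require_module(name: str, errors: str = "raise"):
    """Import an optional dependency by name (hypothetical, simplified helper)."""
    if errors not in ("raise", "ignore"):
        raise ValueError("errors must be 'raise' or 'ignore'")
    try:
        return importlib.import_module(name)
    except ImportError:
        if errors == "raise":
            # Fail early with a message that names the missing package.
            raise ImportError(
                f"Missing optional dependency '{name}'. "
                f"Use pip or conda to install {name}."
            ) from None
        return None


json_module = require_module("json")
print(json_module.dumps({"ok": True}))  # prints {"ok": true}
```

Routing the import through one helper keeps the requirement visible at the call site and gives every caller the same error message when the package is missing.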

mroeschke · 2022-11-17T18:29:52Z

Awesome, thanks for the progress here @cdcadman

…emy 2.0. (pandas-dev#49531) * DOC: Clarify behavior of DataFrame.to_sql * CLN: Make SQLDatabase only accept a sqlalchemy Connection. Co-authored-by: Chuck Cadman <charles.cadman@standard.com>

mroeschke reviewed Nov 7, 2022

pandas/io/sql.py (outdated)

pandas/tests/io/test_sql.py

pandas/io/sql.py (outdated)

pandas/io/sql.py (outdated)

mroeschke added the IO SQL to_sql, read_sql, read_sql_query label Nov 7, 2022

cdcadman mentioned this pull request Nov 8, 2022

TYP:Replace union of subclasses with base class. #49587

Merged

DOC: Clarify behavior of DataFrame.to_sql

56248c0

mroeschke reviewed Nov 16, 2022

CLN: Make SQLDatabase only accept a sqlalchemy Connection.

57a63c2

mroeschke approved these changes Nov 17, 2022

mroeschke added this to the 2.0 milestone Nov 17, 2022

mroeschke merged commit 08f070d into pandas-dev:main Nov 17, 2022

cdcadman deleted the refactor_sql branch November 17, 2022 18:54

cdcadman mentioned this pull request Nov 17, 2022

TST: Refactor sql test classes. #49757

Merged

This was referenced Nov 29, 2022

Make pandas/io/sql.py work with sqlalchemy 2.0 #48576

Merged

BUG: Allow read_sql to work with chunksize. #49967

Merged

This was referenced Dec 10, 2022

BUG: Make io.sql.execute raise TypeError on Engine or URI string. #50177

Merged

DEPR: pandas.io.sql.execute #50185

Closed

BUG: read_sql with chunksize fails due to connection already closed #50199

Closed
