ENH: support both SQLAlchemy engines and connections #7877

Closed
maxgrenderjones opened this Issue Jul 30, 2014 · 14 comments

Comments

Projects
None yet
6 participants
Contributor

maxgrenderjones commented Jul 30, 2014

Feature request for supporting also Connection objects instead of only Engines.

Also reported in #8533, #10104


My use case - I often add a week's worth of data (stored in a pandas dataframe) to an existing table. Before I add the data, I want to be able to delete any existing data for this week from the table, but I want the whole thing to be wrapped in a transaction so I'm not left in a mess halfway through.

Here's what I therefore want to be able to do

# runs a transaction
with engine.begin() as connection:
    r1 = connection.execute(table.delete().where(table.c[WEEK]==week))
    data.to_sql(table.name, connection, if_exists='append')

i.e. my change is that I want to be able to pass a SQLAlchemy Connection object into DataFrame.to_sql instead of an Engine. (The underlying Engine of the Connection object can be accessed as .engine, so we should still be able to pull metadata etc)

maxgrenderjones changed the title from [ENH] Support transactions by allowing a connection to be passed in instead of an Engine to [ENH] Support transactions by allowing a SQLAlchemy Connection to be passed to .to_sql as well as Engine Jul 30, 2014

Is it possible to provide this engine yourself by doing the following in the with context: data.to_sql(table.name, connection.engine, if_exists='append')?

Because then it is the question: do we ask the user to do this themselves, or do we provide the convenience (but also clutter in interface) to also accept a connection?

Can this issue be closed? I agree w/ jorisvandenbossche that a change to pandas isn't necessary--you can easily access the .engine property from a SQLAlchemy connection object:

from sqlalchemy import create_engine
engine = create_engine('sqlite:///file.db')
connection = engine.connect()
# 'connection.engine' now refers to a readily usable engine object

You can indeed easily access the engine, but I don't really know how then the atomicity is handled (the reason to use a transaction in this case). Accessing the engine and using that, within the with engine.begin() as con context, does this defeat the purpose of that, or does this keep it atomic?
Anybody who knows this?

yes, but that is for one to_sql call, while here @maxgrenderjones wants to do multiple such calls in one transaction

Contributor

artemyk commented Sep 19, 2014

I tested it and running a statement as

conn.engine.execute(sql)

does not run it within the transaction (in other words, the statement would not be rolledback as needed)

One possibility would be to a connection kwarg to to_sql. So the call would be:

data.to_sql(table.name, engine, connection=connection, if_exists='append')

connection could be saved in the appropriate object and, if not None, used in run_transaction instead of creating a new connection. Thoughts?

Contributor

maxgrenderjones commented Sep 19, 2014

That would work. Still feels like it might be easier to pass either a connection or an engine, but I guess the connection option perhaps should be a little bit awkward, as it requires you to understand that you need to do your own transaction wrapping.

@artemyk Are you sure that providing a connection does run it within a transaction? As connection.begin() instead of engine.begin() does also run a transaction on its own.
Or does it work if the to_sql(..., connection, ...) is nested in another connection begin? (http://docs.sqlalchemy.org/en/latest/core/connections.html#nesting-of-transaction-blocks)

Contributor

artemyk commented Sep 19, 2014

@jorisvandenbossche The connection contextmanager (i.e. with engine.begin() as connection:) does run things within a transaction --- that's what we use now in run_transaction, and it passes tests for properly commiting/rollingback.
You are right that connection.begin() also is a way to run a transaction, that might be the right way to do nested transactions. Is that something we'd want to support?
BTW, there already seems to be a way to pass in a cursor --- see cur and is_cursor parameters some functions take. I'm not sure if that is deprecated or not, but seems clearly related to the functionality being requested here. to_sql doesn't support it though, nor does run_transaction check to see if the connection is a cursor.

Contributor

tr11 commented Mar 30, 2015

@artemyk, what is the advantage of using a sqlalchemy engine instead of a connection? Going through the io/sql.py code, we really only use has_tables and table_names functions of the engine, which is not that much of an issue since we can recover the engine from the connection (using the engine attribute of the connection).

In fact, it seems to make more sense for the scope of pandas to operate at the connection level only. That way the user can have full control over sqlalchemy connection issues such as isolation levels, nested transactions/savepoints, etc (see sqlalchemy engines and connections).

Maybe we could default to use connections and create our own connection in case the user passes an engine? That way this would backwards compatible, but still allow for more advanced usage.

Contributor

artemyk commented Mar 30, 2015

@tr11 that sounds like a good idea to me. @jorisvandenbossche thoughts?

Contributor

tr11 commented Mar 30, 2015

OK, I really needed this right now, so I just hacked a quick version (https://github.com/tr11/pandas/commit/fe6fee394f78a908e74cf3675668a4475db3c616) that passes all the tests with engines or connections. I will add standard tests that use connection parameters and a couple other ones that show the new things that can be done with transaction rollbacks, using temporary tables, etc. If there is any interest in merging this, I can also update the documentation before submitting a pull request.

jorisvandenbossche changed the title from [ENH] Support transactions by allowing a SQLAlchemy Connection to be passed to .to_sql as well as Engine to ENH: support both SQLAlchemy engines and connections May 12, 2015

@tr11 Sorry for the slow reaction here!
Indeed, that looks like a good idea. If you would still want to make a PR for this (with tests and examples/docs), or give some comments on the newly opened PR #10105, that would be really great

@graingert graingert added a commit to graingert/pandas that referenced this issue May 12, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877 d25a262

@graingert graingert added a commit to graingert/pandas that referenced this issue May 12, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877 e85087f

@graingert graingert added a commit to graingert/pandas that referenced this issue May 12, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877 ef122d6
Contributor

graingert commented May 12, 2015

I've updated the PR with these changes, and added tests

@graingert graingert added a commit to graingert/pandas that referenced this issue May 13, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877 5c7d352

@graingert graingert added a commit to graingert/pandas that referenced this issue May 22, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877 6468871

@graingert graingert added a commit to graingert/pandas that referenced this issue May 26, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877 75c3676

@graingert graingert added a commit to graingert/pandas that referenced this issue May 27, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877 1281690

@graingert graingert added a commit to graingert/pandas that referenced this issue May 27, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877 5fd3d9a

@graingert graingert added a commit to graingert/pandas that referenced this issue May 27, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877 e164845

@graingert graingert added a commit to graingert/pandas that referenced this issue May 27, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877 71fd12e

@graingert graingert added a commit to graingert/pandas that referenced this issue May 29, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877 e3c37e2

@graingert graingert added a commit to graingert/pandas that referenced this issue May 31, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877 0a8e2a5

@graingert graingert added a commit to graingert/pandas that referenced this issue Jun 3, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877 2db0179

@graingert graingert added a commit to graingert/pandas that referenced this issue Jun 4, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877 30d46fc

jorisvandenbossche added this to the 0.17.0 milestone Jul 3, 2015

@graingert graingert added a commit to graingert/pandas that referenced this issue Jul 3, 2015

@graingert graingert support both sqlalchemy engines and connections Fixes #7877
update pymysql to 0.6.3 to avoid cursor bugs

Add documentation and tests for SQLAlchemy connectables

explicit reference to connection/engine in docs

Temporary table test

pass compile the connectable
7327f6b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment