Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QST: Support of SQLAlchemy 2.0 in Pandas #40460

Closed
mickaelandrieu opened this issue Mar 16, 2021 · 15 comments
Closed

QST: Support of SQLAlchemy 2.0 in Pandas #40460

mickaelandrieu opened this issue Mar 16, 2021 · 15 comments
Labels
Dependencies Required and optional dependencies

Comments

@mickaelandrieu
Copy link

mickaelandrieu commented Mar 16, 2021

After search in Stackoverflow and in your open and closed issues, I can't find any information about the support of SQL Alchemy 2.0 using pd.to_sql()


Question about pandas

An exemple is better than a talk.

from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer
import pandas as pd

# Table creation
engine = create_engine(connection_string, echo=True, future=True) # This uses SQL Alchemy 2.0 API
metadata = MetaData()
category = Table(
        'category',
        metadata,
        Column('id', Integer, primary_key = True),
        Column('name', String(128), nullable = False),
        Column('slug', String(128), nullable = False)
    )

category.create(engine, checkfirst=True)

## Insert using pandas => FAIL
categories = pd.DataFrame([{'id' : 1, 'slug': 'test', 'name': 'Test'}])
categories.to_sql('category', engine, index=False)

This is the error I get from the CLI output :

  File "e:\Projects\test\database.py", line 127, in populate_category_table
    categories.to_sql('category', engine, index=False)
  File "E:\Projects\test\env\lib\site-packages\pandas\core\generic.py", 
line 2779, in to_sql
    sql.to_sql(
  File "E:\Projects\test\env\lib\site-packages\pandas\io\sql.py", line 590, in to_sql
    pandas_sql.to_sql(
  File "E:\Projects\test\env\lib\site-packages\pandas\io\sql.py", line 1392, in to_sql
    table.create()
  File "E:\Projects\test\env\lib\site-packages\pandas\io\sql.py", line 722, in create
    if self.exists():
  File "E:\Projects\test\env\lib\site-packages\pandas\io\sql.py", line 709, in exists
    return self.pd_sql.has_table(self.name, self.schema)
  File "E:\Projects\test\env\lib\site-packages\pandas\io\sql.py", line 1429, in has_table
    return self.connectable.run_callable(
  File "E:\Projects\test\env\lib\site-packages\sqlalchemy\future\engine.py", line 335, in _not_implemented
    raise NotImplementedError(
NotImplementedError: This method is not implemented for SQLAlchemy 2.0.

For information about the version of every dependency :

$ ./env/Scripts/pip list
Package         Version
--------------- ---------
beautifulsoup4  4.9.3
certifi         2020.12.5
chardet         4.0.0
esprima         4.0.1
greenlet        1.0.0
idna            2.10
mysqlclient     2.0.3
numpy           1.20.1
pandas          1.2.3
patreon         0.5.0
pip             21.0.1
python-dateutil 2.8.1
pytz            2021.1
requests        2.25.1
setuptools      49.2.1
six             1.15.0
soupsieve       2.2
SQLAlchemy      1.4.0
urllib3         1.26.3

Is there a workaround right now or should I avoid using pandas to insert the data into my database ?

Regards,

@mickaelandrieu mickaelandrieu added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Mar 16, 2021
@fangchenli
Copy link
Member

My suggestion is to downgrade to 1.3.23 for now. We're looking into this issue.

@fangchenli fangchenli added Dependencies Required and optional dependencies and removed Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Mar 16, 2021
@jorisvandenbossche
Copy link
Member

@mickaelandrieu there was just a PR merged that deals with some deprecation warnings: #40471

This might already resolve the compatibility issues with the future sqlalchemy 2.0 as well?

@mickaelandrieu
Copy link
Author

Hello @jorisvandenbossche,

When this PR will be released ? I can try and come back here to tell you 👍

Duly noted @fangchenli, thanks for your answer !

Regards

@scnerd
Copy link

scnerd commented Mar 24, 2021

It's unclear from that MR if it tackled all 2.0 deprecation warnings; there are (at least) two categories of SQLA deprecations, SADeprecationWarning and RemovedIn20Warning, and I'm not sure if that MR only tackled the first.

A complete, official guide for the migration is here. I can confirm that there are numerous SQLA 2.0 warnings thrown in Pandas 1.2.3 (released 2021/03/02), I can install from master and check if they're still there after that MR.

@mickaelandrieu
Copy link
Author

Hi @scnerd,

what is the right process to install pandas on master branch ? I'm not really good at Python, but i will help if I can !

Regards,

@scnerd
Copy link

scnerd commented Mar 25, 2021

@mickaelandrieu pip can accept git repo's like pip install git+https://github.com/pandas-dev/pandas.git

@scnerd
Copy link

scnerd commented Mar 25, 2021

Having tested some things on master branch, I get at least the following couple of errors:

  /opt/conda/lib/python3.8/site-packages/pandas/io/sql.py:1313: RemovedIn20Warning: The MetaData.bind argument is deprecated and will be removed in SQLAlchemy 2.0. (Background on SQLAlchemy 2.0 at: http://sqlalche.me/e/b8d9)
    meta = MetaData(self.connectable, schema=schema)

  /opt/conda/lib/python3.8/site-packages/pandas/io/sql.py:1103: RemovedIn20Warning: The MetaData.bind argument is deprecated and will be removed in SQLAlchemy 2.0. (Background on SQLAlchemy 2.0 at: http://sqlalche.me/e/b8d9)
    meta = MetaData(self.pd_sql, schema=schema)

  /opt/conda/lib/python3.8/site-packages/sqlalchemy/sql/schema.py:939: RemovedIn20Warning: The ``bind`` argument for schema methods that invoke SQL against an engine or connection will be required in SQLAlchemy 2.0. (Background on SQLAlchemy 2.0 at: http://sqlalche
.me/e/b8d9)
    bind = _bind_or_error(self)

  /opt/conda/lib/python3.8/site-packages/pandas/io/sql.py:1612: RemovedIn20Warning: The Connection.connect() method is considered legacy as of the 1.x series of SQLAlchemy and will be removed in 2.0. (Background on SQLAlchemy 2.0 at: http://sqlalche.me/e/b8d9)
    with self.connectable.connect() as conn:

@nsepetys
Copy link

nsepetys commented Mar 27, 2021

Is there a workaround right now or should I avoid using pandas to insert the data into my database ?

Regards,

@mickaelandrieu - One other workaround- I created an engine for this particular pandas method and turned off future (future=False) for the engine creation. It's working now. Going to leave a ToDo to come back and replace once pandas is working with this. Let me know if you found a less hacky way that's working without building the package off of master.

@scnerd
Copy link

scnerd commented Mar 27, 2021

Just to clarify my remark about master, the RemovedIn20Warnings mean that, even on master, Pandas' SQL IO functions are not yet 2.0 compatible. Since SQLAlchemy 2.0 isn't even released yet (at least, as a stable release), I wouldn't consider that an issue (yet), it'll just require pinning sqlalchemy<=2 as a Pandas dependency until those incompatibilities are resolved. Using future=True in your SQLA session is therefore not yet compatible with Pandas.

Frankly, even though the SQLA 2.0 migration guide is great and the changes needed are pretty simple, there's also the concern that these changes are, in some cases, not backwards-compatible with <1.4. This is probably a concern for a library as ubiquitous as Pandas, and means that the upgrades should probably be made in a way that can detect and, when necessary, fall back on the <1.4 conventions.

@mickaelandrieu
Copy link
Author

mickaelandrieu commented Mar 28, 2021

@nsepetys yes, I've ended up doing that too for now.

So I have two configurations : legacy_engine (future = False) for pandas operations and engine (future = True) to start to learn the new API of SQLAlchemy.

@scnerd totally agree on your point of view : thanks for your help everyone !

@davidandreoletti
Copy link

davidandreoletti commented Nov 24, 2021

@mickaelandrieu

SQLAlchemy 2.0 is not out yet BUT SQLAlchemy 1.4 provides a path to prepare the migration to 2.0 (migration guide).

People running on 1.4 with SQLALchemy configured to be compatible with the future 2.0 (ie future=True) cannot execute pandas io/sql function without getting NotImplementedError: This method is not implemented for SQLAlchemy 2.0.

What's the plan going forward, except creating a legacy engine (future=False) ?

@jreback
Copy link
Contributor

jreback commented Nov 24, 2021

the plan is the community needs to figure what should be done and propose PR with full testing

none of the core maintains use the sql support much - but could review

@davidandreoletti
Copy link

@jreback Got it!

@mickaelandrieu
Copy link
Author

the plan is the community needs to figure what should be done and propose PR with full testing

Fully agree, but I'm not skilled enough to contribute pandas right now : not like I don't want to, I can help for tests/docs but not that much atm

@graingert
Copy link
Contributor

sqlalchemy 1.4.46 has started raising a sqlalchemy.exc.RemovedIn20Warning for deprecated functionality https://docs.sqlalchemy.org/en/14/changelog/changelog_14.html#change-e67bfa1efbe52ae40aa842124bc40c51

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dependencies Required and optional dependencies
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants