Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Pandas 2.2 breaks SQLAlchemy 1.4 compatibility #57049

Open
3 tasks done
miraculixx opened this issue Jan 24, 2024 · 18 comments
Open
3 tasks done

BUG: Pandas 2.2 breaks SQLAlchemy 1.4 compatibility #57049

miraculixx opened this issue Jan 24, 2024 · 18 comments
Labels
Compat pandas objects compatability with Numpy or Python functions Dependencies Required and optional dependencies IO SQL to_sql, read_sql, read_sql_query Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@miraculixx
Copy link

miraculixx commented Jan 24, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# with SQL Alchemy 1.4 installed the following example fails:
import pandas as pd
from sqlalchemy import create_engine
con = create_engine('sqlite:///:memory:')
df = pd.DataFrame({'a': [0, 1, 2, 3]})
df.to_sql('test', con)
=> AttributeError: 'Engine' object has no attribute 'cursor'

Issue Description

While Pandas 2.2 Release Notes declare sqlalchemy > 2.0.0 a minimum requirement, sqlalchemy 1.4 is still a widely used and maintained release. This change thus likely breaks a lot of existing code. Upgrading from sqlalchemy 1.4 to 2.0 is not trivial and therefore many projects using both pandas and sqlalchemy will not be able to upgrade to pandas > 2.2 easily.

Expected Behavior

SQLAlchemy 1.4 compatibility should be continued, and 1.4 support should be deprecated before removal.

Installed Versions

sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.20.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.4
qtpy : None
pyqt5 : None

@miraculixx miraculixx added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 24, 2024
@miraculixx
Copy link
Author

Further testing indicates the compatibility issue may only apply to SQLite. For example for MySQL + sqlalchemy 1.4 this still works:

import pandas as pd
from sqlalchemy import create_engine
con = create_engine('mysql+pymysql://root:password@localhost:3306/test')
df = pd.DataFrame({'a': [0, 1, 2, 3]})
df.to_sql('test', con, if_exists='replace')

Running mysql using docker docker run --name=mysql -e MYSQL_ROOT_PASSWORD=password -p 3306:3306 mysql

miraculixx added a commit to omegaml/omegaml that referenced this issue Jan 24, 2024
omegaml pushed a commit to omegaml/omegaml that referenced this issue Jan 24, 2024
@lithomas1 lithomas1 added Regression Functionality that used to work in a prior pandas version Compat pandas objects compatability with Numpy or Python functions Dependencies Required and optional dependencies and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 24, 2024
@lithomas1 lithomas1 added this to the 2.2.1 milestone Jan 24, 2024
@lithomas1
Copy link
Member

lithomas1 commented Jan 24, 2024

While Pandas 2.2 Release Notes declare sqlalchemy > 2.0.0 a minimum requirement, sqlalchemy 1.4 is still a widely used and maintained release. This change thus likely breaks a lot of existing code. Upgrading from sqlalchemy 1.4 to 2.0 is not trivial and therefore many projects using both pandas and sqlalchemy will not be able to upgrade to pandas > 2.2 easily.

This seems reasonable to me, since it looks like sqlalchemy 1.4 is still getting releases, so it'd probably make sense to support it at least until sqlalchemy 2.1 comes out.

Usually, we bump minimum versions of dependencies once they become around 2 years old (and sqlalchemy 1.4 is already 3 years old), but I think was it's probably fine to make an exception for sqlalchemy this time.

cc @mroeschke

@amotl
Copy link

amotl commented Jan 30, 2024

Hi there. FWIW, we are hitting the same error on our CI with SQLAlchemy 1.4 on a 3rd-party SQLAlchemy dialect, see run #7705314750.

@amotl
Copy link

amotl commented Jan 30, 2024

@mroeschke said at #57137 (comment):

The minimum sqlalchemy version for pandas 2.2 is 2.0.
-- https://pandas.pydata.org/docs/whatsnew/v2.2.0.html#increased-minimum-versions-for-dependencies

So, I think it is clear that SQLAlchemy 1.4 will no longer be supported.

@mroeschke
Copy link
Member

Since there have been several reports about Sqlalchemy 1.4, I would be open to supporting that again. Testing and prior compatibility code will needed to be added back in a PR.

@amotl
Copy link

amotl commented Jan 30, 2024

Thanks Matthew. For us it will be totally fine to have pandas >= 2.2 with sqlalchemy >= 2 only, just to clarify. When others from the community don't have an easy chance to update, for any reasons, I surely also see that the need for backward-compatibility is being expressed.

@miried
Copy link

miried commented Jan 30, 2024

snowflake-sqlalchemy pins sqlalchemy to <2.0.0 (see snowflakedb/snowflake-sqlalchemy#452) which leads to a version conflict with pandas 2.2.

For reference, #55524 introduced the version bump. @mroeschke does any pandas code require sqlalchemy features only available in 2.0? According to https://www.sqlalchemy.org/download.html, 1.4 still receives critical fixes, last 1.4 release is from Jan 3, 2024: https://pypi.org/project/SQLAlchemy/1.4.51/

Of course, I'd prefer other projects move their requirements up rather than pandas staying on old versions, but maybe in this case it's the simplest way to postpone the version upgrade?

By the way, I found the error message misleading with regards to the issue.

It looks like in https://github.com/pandas-dev/pandas/blob/2.2.x/pandas/io/sql.py#L900 the errors="ignore" suppresses the actual error message that indicates the version conflict, so one ends up with a cryptic error message because the code fails later (pip did not raise the version conflict either, I guess because it's an optional dependency)

@joshuataylor
Copy link

snowflake-sqlalchemy also seems to throw this error when you use the Snowflake recommended way to create a connection when using private keys.

I suspect it's the way URL works, as it's a snowflake.sqlalcheny import? from snowflake.sqlalchemy import URL

Not sure if this a Snowflake SQLAlchemy issue or here though, apologies for the noise.

@lgonzalez-silen
Copy link

lgonzalez-silen commented Feb 27, 2024

Just to add more context here, Airflow is pinning their newest 2.8 images to pandas < 2.2:

apache/airflow#37748

It does not seem that they are likely to drop their SQLAlchemy 1.4 support soon, since they have a few dependencies that are not SQLAlchemy 2.0 ready (as of December 2023):

apache/airflow#28723

Google Composer released earlier this month their 2.6 version (Airflow 2.6-based), and I found out today when I upgraded that their choice to mix pandas 2.2.0 and SQLAlchemy 1.4.51 created issues running dags for me:

https://issuetracker.google.com/issues/327158030

@potiuk
Copy link

potiuk commented Feb 27, 2024

FYI: We just updated constraints for 2.8.2 and updated images for Airflow 2.8.2 to keep pandas 2.1.4. That will not prevent the users hitting the same problem if they "just install" airfow from scratch with the new pandas, but if they follow the recommended way of installing airflow with constraints, they will be good: https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html

@g24swint
Copy link

Wanted to vote in favor of maintaining backward compatibility. We are also using a snowflake database behind sqlalchemy and are having this problem with the versioning. It would be nice if pandas could issue a deprecation error with potential to override first before fully abandoning support for SA 1.4

@rgoubet
Copy link

rgoubet commented Mar 24, 2024

Also having this issue with a postgresql+psycopg2 connection string

@mattholy
Copy link

mattholy commented Apr 16, 2024

And also for pyhive

import pandas as pd
from sqlalchemy import create_engine
DW = create_engine("hive://xxxxxxx")
pd.read_sql('select 1=1',con=DW)

I got AttributeError: 'Engine' object has no attribute 'cursor'
I'm using pandas 2.2.1 and sqlalchemy 1.4.52. I know that SQLAlchemy 1.4.x is very old, but sadly is, Apache-Airflow only supports SQLAlchemy 1.x

@miraculixx
Copy link
Author

miraculixx commented Apr 19, 2024

I know that SQLAlchemy 1.4.x is very old, but sadly is, Apache-Airflow only supports SQLAlchemy 1.x

SQLAlchemy 1.4 is an officially supported release of sqlalchemy, albeit in maintenance mode now

@ptallada
Copy link

Hi,

We are also using SQLAlchemy 1.4 and Pandas, and would greatly appreciate not having to migrate to SQLAlchemy v2.0 :)

@YarShev
Copy link
Contributor

YarShev commented May 22, 2024

Is there a planned resolution on this?

@lithomas1
Copy link
Member

PRs to fix this would be welcome.

@1Elaigwu
Copy link

1Elaigwu commented Jun 7, 2024

I know that SQLAlchemy 1.4.x is very old, but sadly is, Apache-Airflow only supports SQLAlchemy 1.x

SQLAlchemy 1.4 is still officially supported

it should support with the upgrade. could not load data into mysql database on airflow. i can load the data when i run the script on standalone using SQLAlchemy 2.0.30. but not SQLAlchemy version 1.4.52. also after installing the newer version i get "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
apache-airflow 2.9.1 requires sqlalchemy<2.0,>=1.4.36, but you have sqlalchemy 2.0.30 which is incompatible.
flask-appbuilder 4.4.1 requires SQLAlchemy<1.5, but you have sqlalchemy 2.0.30 which is incompatible.
marshmallow-sqlalchemy 0.28.2 requires SQLAlchemy<2.0,>=1.3.0, but you have sqlalchemy 2.0.30 which is incompatible." at this point i have to use another RDBMS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Dependencies Required and optional dependencies IO SQL to_sql, read_sql, read_sql_query Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

No branches or pull requests