Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mariadbconnector: Exception "cursor is closed" raised, when checking cursor.rowcount #10396

Closed
9EOR9 opened this issue Sep 28, 2023 · 11 comments
Closed
Labels
bug Something isn't working mariadb pandas/numpy tagging issues related to pandas / numpy integration
Milestone

Comments

@9EOR9
Copy link
Contributor

9EOR9 commented Sep 28, 2023

Describe the bug

The problem was initially reported on stackoverflow, similiar issue was filed as MariaDB Connector/Python bug. I'm not sure if this problem is caused by Pandas or SQLAlchemy.

MariaDB Connector/Python raises an exception "Cursor is closed" in cursor->rowcount, since the cursor was closed before.

PyMySQL for example doesn't raise an exception, which is in contradiction to PEP-249: ".close(): ... The cursor will be unusable from this point forward; an Error (or subclass) exception will be raised if any operation is attempted with the cursor".

Optional link from https://docs.sqlalchemy.org which documents the behavior that is expected

No response

SQLAlchemy Version in Use

2.0.21

DBAPI (i.e. the database driver)

mariadbconnector

Database Vendor and Major Version

MariaDB 11.0

Python Version

3.10

Operating system

Linux, Windows

To Reproduce

import pandas as pd
from sqlalchemy import create_engine
conn_str = 'mariadb+mariadbconnector://user:password@localhost:3306/database_name'
engine = create_engine(conn_str)
df = pd.DataFrame(data={'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']})
df.to_sql('pandas_test_table', index=False, con=engine, if_exists='replace')

Error

Stacktrace (when closing the cursor):

 df.to_sql('pandas_test_table', index=False, con=engine, if_exists='replace')
  c:\python\python-3.10\lib\site-packages\pandas\util\_decorators.py(333)wrapper()
-> return func(*args, **kwargs)
  c:\python\python-3.10\lib\site-packages\pandas\core\generic.py(3008)to_sql()
-> return sql.to_sql(
  c:\python\python-3.10\lib\site-packages\pandas\io\sql.py(788)to_sql()
-> return pandas_sql.to_sql(
  c:\python\python-3.10\lib\site-packages\pandas\io\sql.py(1958)to_sql()
-> total_inserted = sql_engine.insert_records(
  c:\python\python-3.10\lib\site-packages\pandas\io\sql.py(1498)insert_records()
-> return table.insert(chunksize=chunksize, method=method)
  c:\python\python-3.10\lib\site-packages\pandas\io\sql.py(1059)insert()
-> num_inserted = exec_insert(conn, keys, chunk_iter)
  c:\python\python-3.10\lib\site-packages\pandas\io\sql.py(951)_execute_insert()
-> result = conn.execute(self.table.insert(), data)
  c:\python\python-3.10\lib\site-packages\sqlalchemy\engine\base.py(1412)execute()
-> return meth(
  c:\python\python-3.10\lib\site-packages\sqlalchemy\sql\elements.py(516)_execute_on_connection()
-> return connection._execute_clauseelement(
  c:\python\python-3.10\lib\site-packages\sqlalchemy\engine\base.py(1635)_execute_clauseelement()
-> ret = self._execute_context(
  c:\python\python-3.10\lib\site-packages\sqlalchemy\engine\base.py(1844)_execute_context()
-> return self._exec_single_context(
  c:\python\python-3.10\lib\site-packages\sqlalchemy\engine\base.py(1981)_exec_single_context()
-> result = context._setup_result_proxy()
  c:\python\python-3.10\lib\site-packages\sqlalchemy\engine\default.py(1790)_setup_result_proxy()
-> result = self._setup_dml_or_text_result()
  c:\python\python-3.10\lib\site-packages\sqlalchemy\engine\default.py(1926)_setup_dml_or_text_result()
-> result._soft_close()
  c:\python\python-3.10\lib\site-packages\sqlalchemy\engine\cursor.py(1562)_soft_close()
-> self.connection._safe_close_cursor(cursor)
  c:\python\python-3.10\lib\site-packages\sqlalchemy\engine\base.py(2199)_safe_close_cursor()
-> cursor.close()
> c:\python\python-3.10\lib\site-packages\mariadb-1.1.7-py3.10-win-amd64.egg\mariadb\cursors.py(392)close()

Stacktrace (cursor->rowcount):

-> df.to_sql('pandas_test_table', index=False, con=engine, if_exists='replace')
  c:\python\python-3.10\lib\site-packages\pandas\util\_decorators.py(333)wrapper()
-> return func(*args, **kwargs)
  c:\python\python-3.10\lib\site-packages\pandas\core\generic.py(3008)to_sql()
-> return sql.to_sql(
  c:\python\python-3.10\lib\site-packages\pandas\io\sql.py(788)to_sql()
-> return pandas_sql.to_sql(
  c:\python\python-3.10\lib\site-packages\pandas\io\sql.py(1958)to_sql()
-> total_inserted = sql_engine.insert_records(
  c:\python\python-3.10\lib\site-packages\pandas\io\sql.py(1498)insert_records()
-> return table.insert(chunksize=chunksize, method=method)
  c:\python\python-3.10\lib\site-packages\pandas\io\sql.py(1059)insert()
-> num_inserted = exec_insert(conn, keys, chunk_iter)
  c:\python\python-3.10\lib\site-packages\pandas\io\sql.py(952)_execute_insert()
-> return result.rowcount
  c:\python\python-3.10\lib\site-packages\sqlalchemy\util\langhelpers.py(1138)__get__()
-> obj.__dict__[self.__name__] = result = self.fget(obj)
  c:\python\python-3.10\lib\site-packages\sqlalchemy\engine\cursor.py(2021)rowcount()
-> return self.context.rowcount
  c:\python\python-3.10\lib\site-packages\sqlalchemy\util\langhelpers.py(1127)__get__()
-> return self.fget(obj)
  c:\python\python-3.10\lib\site-packages\sqlalchemy\engine\default.py(1778)rowcount()
-> return self.cursor.rowcount
  c:\python\python-3.10\lib\site-packages\mariadb-1.1.7-py3.10-win-amd64.egg\mariadb\cursors.py(521)rowcount()
-> self.check_closed()
> c:\python\python-3.10\lib\site-packages\mariadb-1.1.7-py3.10-win-amd64.egg\mariadb\cursors.py(59)check_closed()
-> raise mariadb.ProgrammingError("Cursor is closed")

Additional context

No response

@9EOR9 9EOR9 added the requires triage New issue that requires categorization label Sep 28, 2023
@zzzeek zzzeek added bug Something isn't working quagmire really hard to make the issue work "correctly" without lots of complication, risk mariadb pandas/numpy tagging issues related to pandas / numpy integration and removed requires triage New issue that requires categorization labels Sep 28, 2023
@zzzeek
Copy link
Member

zzzeek commented Sep 28, 2023

well they are calling result.rowcount on an insert statement. Most DBAPIs don't produce any meaningful number for rowcount with INSERT, which implies we could hardcode this case to return -1, but then, some DBAPIs do produce a meaningful number, such as the sqlite driver, and we would assume mariadb-connector here as well.

the problem with rowcount is that it often produces an extra database query. So a naive fix here, "pre-load rowcount for all INSERT statements", adds this extra database query penalty to millions of SQLAlchemy installations in all cases, to suit what is IMO nearly a non-usecase. Alternatively, we could express that "rowcount is only for UPDATE /DELETE, you will get -1 for other operations", however that's the quagmire here, some drivers do return a meaningful number for INSERT statements so we can't just switch that.

PyMySQL for example doesn't raise an exception, which is in contradiction to PEP-249:

is calling a data accessor a "read only operation" ? Many DBAPIs have this value passively present at cursor.rowcount after a DML operation, hence no issue with exceptions. I certainly hope that mariadb-connector is not enforcing a needless exception raise here if it does not actually need to run a new query to get the rowcount. otherwise if the driver does need to run a new query, then this limitation is typical, though the majority of drivers don't have this limitation (otherwise this issue would be occurring for many drivers).

Since most drivers dont have this limitation, we can't really be sure how many users are calling .rowcount after non UPDATE/DELETE operations, though the fact that pandas seems to hardcode it for all SQL operations suggests it's very widespread, and then the number (which is for many drivers simply -1) is thrown away.

I think in this case what we should do is modify the mariadb-connector driver itself to hardcode a -1 for the INSERT case call cursor.rowcount for all statements within onto the execution context, and then add tests for this case so that the other drivers (there may not be any at the moment, Firebird was the last one that had this issue) can all follow this practice as well.

tl;dr we've never supported .rowcount for an INSERT statement so behavior is defined as pass-through to raw cursor for now.

@zzzeek
Copy link
Member

zzzeek commented Sep 28, 2023

additionally as you may or may not be aware we had to remove mariadb-connector from our CI in December of 2022 as it was producing releases on pypi that relied upon bleeding edge client libraries that were not yet available in Fedora (366a5e3) . I can attempt to re-add mariadb-connector to CI now assuming things have calmed down but I continue to not have a solution to this problem if mariadb-connector puts out releases using bleeding-edge client libraries.

@zzzeek
Copy link
Member

zzzeek commented Sep 28, 2023

the change I'm adding will add the cursor.rowcount call to all mariadb-connector queries. im not sure if this adds query overhead to all statements but given Pandas design I dont see much option. if mariadb-connector instead returned -1 for cursor.rowcount after cursor.close() rather than raising an error, that would be IMO a much better option, and IMO is closer to what pep-249 recommends (emphasis mine):

The attribute is -1 in case no .execute*() has been performed on the cursor or the rowcount of the last operation is cannot be determined by the interface. [7]

@sqla-tester
Copy link
Collaborator

Mike Bayer has proposed a fix for this issue in the main branch:

invoke mariadb-connector .rowcount after all statements https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/4891

@zzzeek zzzeek removed the quagmire really hard to make the issue work "correctly" without lots of complication, risk label Sep 28, 2023
@zzzeek zzzeek added this to the 2.0.x milestone Sep 28, 2023
@9EOR9
Copy link
Contributor Author

9EOR9 commented Sep 28, 2023

Hello Mike,

thank you for your comments - I think the simplest solution would be if I just remove the check for closed cursor in cursor properties (e.g. rowcount), as long the properties aren't directly mapped to some underlying data (like result sets or prepared statements) which were freed in close() method.

@zzzeek
Copy link
Member

zzzeek commented Sep 28, 2023

OK that's fine, my patch here at least adds test support for this case so has to go in to that extent anyway, does cursor.rowcount in mariadb-connector invoke a new query and/or does it have significant round-trip overhead?

@9EOR9
Copy link
Contributor Author

9EOR9 commented Sep 28, 2023

No, it's stored internally and works for upsert and select statements. I think your idea, to return -1 is even better than returning any value.

@zzzeek
Copy link
Member

zzzeek commented Sep 28, 2023

OK. if there's no overhead I can keep my fix in place in any case

@CaselIT
Copy link
Member

CaselIT commented Sep 28, 2023

OK. if there's no overhead I can keep my fix in place in any case

we can probably leave a comment on the fix mentioning that it's no longer needed on x.y.z version of the connector (since from Georg comment it seems the idea is to remove the expection in this case)

@marshallm94
Copy link

@zzzeek where did this land? I read through all the comments here ( I was the one that originally raised the issue via MariaDB ), but I'm not clear on whether this has been released yet?

@CaselIT
Copy link
Member

CaselIT commented Oct 11, 2023

It was meeged in the main branch 2 weeks ago. Since the last release was 3 weeks ago it's not yet released. It should be released this week or the next

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working mariadb pandas/numpy tagging issues related to pandas / numpy integration
Projects
None yet
Development

No branches or pull requests

5 participants