pandas.read_sql() fails when the select is not the sole part of the query #29974

djrscally opened this issue Dec 2, 2019 · 3 comments


djrscally commented Dec 2, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd
import pyodbc
import sqlalchemy as sa
import urllib

query = """
drop table if exists #MyTempTable;

select
	1 ID
	, 'A' Letter
into #MyTempTable
union
	select 2, 'B' union
	select 3, 'C' union
	select 4, 'D' union
	select 5, 'E'
;

select
	*
from #MyTempTable
"""

server, port = 'localhost', 1433  # placeholders; substitute your own SQL Server host and port

connection_string = urllib.parse.quote_plus(
    'DRIVER={{SQL Server Native Client 11.0}};'
    'SERVER={0},{1};'
    'trusted_Connection=yes;'
    'encrypt=yes;'
    'trustServerCertificate=yes;'.format(server, port)
)

eng = sa.create_engine('mssql+pyodbc:///?odbc_connect={0}'.format(connection_string))

df = pd.read_sql(query, con=eng)

Problem description

The problem is that SQLAlchemy throws an exception when running the snippet of code above:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\sqlalchemy\engine\result.py in _fetchall_impl(self)
   1162         try:
-> 1163             return self.cursor.fetchall()
   1164         except AttributeError:

AttributeError: 'NoneType' object has no attribute 'fetchall'

During handling of the above exception, another exception occurred:

ResourceClosedError                       Traceback (most recent call last)
<ipython-input-2-018f2f8f6b89> in <module>
----> 1 df = pd.read_sql(query, con=eng)

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\pandas\io\sql.py in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize)
    395             sql, index_col=index_col, params=params,
    396             coerce_float=coerce_float, parse_dates=parse_dates,
--> 397             chunksize=chunksize)
    398 
    399 

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\pandas\io\sql.py in read_query(self, sql, index_col, coerce_float, parse_dates, params, chunksize)
   1106                                         parse_dates=parse_dates)
   1107         else:
-> 1108             data = result.fetchall()
   1109             frame = _wrap_result(data, columns, index_col=index_col,
   1110                                  coerce_float=coerce_float,

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\sqlalchemy\engine\result.py in fetchall(self)
   1216         except BaseException as e:
   1217             self.connection._handle_dbapi_exception(
-> 1218                 e, None, None, self.cursor, self.context
   1219             )
   1220 

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\sqlalchemy\engine\base.py in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
   1458                 util.raise_from_cause(sqlalchemy_exception, exc_info)
   1459             else:
-> 1460                 util.reraise(*exc_info)
   1461 
   1462         finally:

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\sqlalchemy\util\compat.py in reraise(tp, value, tb, cause)
    275         if value.__traceback__ is not tb:
    276             raise value.with_traceback(tb)
--> 277         raise value
    278 
    279 

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\sqlalchemy\engine\result.py in fetchall(self)
   1211 
   1212         try:
-> 1213             l = self.process_rows(self._fetchall_impl())
   1214             self._soft_close()
   1215             return l

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\sqlalchemy\engine\result.py in _fetchall_impl(self)
   1163             return self.cursor.fetchall()
   1164         except AttributeError:
-> 1165             return self._non_result([])
   1166 
   1167     def _non_result(self, default):

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\sqlalchemy\engine\result.py in _non_result(self, default)
   1168         if self._metadata is None:
   1169             raise exc.ResourceClosedError(
-> 1170                 "This result object does not return rows. "
   1171                 "It has been closed automatically."
   1172             )

ResourceClosedError: This result object does not return rows. It has been closed automatically.

The exception suggests that the DROP and SELECT...INTO statements each cause the server to return a message along the lines of (5 rows affected), and pandas picks up the first of these and tries to treat it as the result set of the query. This is backed up by the fact that splitting out the final SELECT statement makes things work nicely:

prelim = """
drop table if exists #MyTempTable;

select
	1 ID
	, 'A' Letter
into #MyTempTable
union
	select 2, 'B' union
	select 3, 'C' union
	select 4, 'D' union
	select 5, 'E'
;
"""

query = """
select
    *
from #MyTempTable
"""

connection_string = urllib.parse.quote_plus(
    'DRIVER={{SQL Server Native Client 11.0}};'
    'SERVER={0},{1};'
    'trusted_Connection=yes;'
    'encrypt=yes;'
    'trustServerCertificate=yes;'.format(server, port)
)

eng = sa.create_engine('mssql+pyodbc:///?odbc_connect={0}'.format(connection_string))

eng.execute(prelim)

df = pd.read_sql(query, con=eng)

And that then returns everything all hunky-dory. I think the current behaviour is a little user-unfriendly and could be improved.

Expected Output

More like "desired" than "expected", but the simplest approach that occurs to me would be to iterate over the returned result sets using the cursor's nextset(), check each one with returns_rows to see whether a call to fetchall() would be legal, and return the first result set that meets that condition.
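
For illustration, here is a rough user-space sketch of that idea (not pandas internals), working against the raw DBAPI (pyodbc) cursor obtained from the SQLAlchemy engine. The helper name read_first_rowset is made up, and cursor.description is used as the "does this result set return rows?" test, since that is what the raw cursor exposes:

import pandas as pd

def read_first_rowset(query, engine):
    """Return a DataFrame built from the first row-returning result set."""
    raw = engine.raw_connection()
    try:
        cursor = raw.cursor()
        cursor.execute(query)
        # DROP and SELECT ... INTO produce row-count-only results, for which
        # cursor.description is None; skip ahead until a result set that
        # actually returns rows is reached.
        while cursor.description is None:
            if not cursor.nextset():
                raise ValueError("query produced no row-returning result set")
        columns = [col[0] for col in cursor.description]
        rows = [tuple(row) for row in cursor.fetchall()]
        return pd.DataFrame.from_records(rows, columns=columns)
    finally:
        raw.close()

df = read_first_rowset(query, eng)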

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.1
pytest: None
pip: 19.0.3
setuptools: 41.0.1
Cython: 0.29.6
numpy: 1.16.1
scipy: 1.2.1
pyarrow: 0.14.0
xarray: 0.14.0
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 1.2.1
lxml.etree: 4.3.4
bs4: None
html5lib: None
sqlalchemy: 1.2.18
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Not included in pd.show_versions() but relevant here:

>>> pyodbc.version
'4.0.26'
@djrscally (Author) commented:

If ever I knew how to add labels, I have forgotten. This is an enhancement rather than a bug in my opinion.

@jbrockmendel added the "IO SQL (to_sql, read_sql, read_sql_query)" label on Dec 3, 2019
@djrscally (Author) commented:

take

@djrscally (Author) commented:

This could possibly be extended to allow several result sets within a single .read_sql() call, perhaps via an additional parameter such as unpack=False. In its default state that would simply fix the issue I've been experiencing by returning the first result set where cursor.returns_rows evaluates to True. If the user sets unpack=True in .read_sql(), that would be taken to mean the query contains multiple result sets and the user wants each one returned as a separate dataframe, as in the example below:

query = """
select  * from dbo.Table1;
select * from dbo.Table2;
"""

table1_df, table2_df = pandas.read_sql(query, con, unpack=True)

If the user combined unpack=True with a table name, or with a query containing only a single result set, that would simply be handled in the present manner.
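
For what it's worth, here is a rough standalone sketch of what unpack=True might look like in user space, again working against the raw DBAPI (pyodbc) cursor rather than pandas internals; read_all_rowsets is only an illustrative name, not a proposed pandas API:

import pandas as pd

def read_all_rowsets(query, engine):
    """Return a list of DataFrames, one per row-returning result set."""
    raw = engine.raw_connection()
    try:
        cursor = raw.cursor()
        cursor.execute(query)
        frames = []
        while True:
            # Row-count-only results have cursor.description set to None;
            # only collect the result sets that actually return rows.
            if cursor.description is not None:
                columns = [col[0] for col in cursor.description]
                rows = [tuple(row) for row in cursor.fetchall()]
                frames.append(pd.DataFrame.from_records(rows, columns=columns))
            if not cursor.nextset():
                break
        return frames
    finally:
        raw.close()

table1_df, table2_df = read_all_rowsets(query, eng)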

Does that sound useful?
