Skip to content

pandas.read_sql() fails when the select is not the sole part of the query #29974

@djrscally

Description

@djrscally

Code Sample, a copy-pastable example if possible

import pandas as pd
import pyodbc
import sqlalchemy as sa
import urllib

query = """
drop table if exists #MyTempTable;

select
	1 ID
	, 'A' Letter
into #MyTempTable
union
	select 2, 'B' union
	select 3, 'C' union
	select 4, 'D' union
	select 5, 'E'
;

select
	*
from #MyTempTable
"""

connection_string = urllib.parse.quote_plus(
    'DRIVER={SQL Server Native Client 11.0};\
    SERVER={0},{1};\
    trusted_Connection=yes\
    ;encrypt=yes;\
    trustServerCertificate=yes;'.format(server, port)
)

eng = sa.create_engine('mssql+pyodbc:///?odbc_connect={0}'.format(connection_string))

df = pd.read_sql(query, con=eng)

Problem description

The problem is that SQLAlchemy throws an exception when running the snippet of code above:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\sqlalchemy\engine\result.py in _fetchall_impl(self)
   1162         try:
-> 1163             return self.cursor.fetchall()
   1164         except AttributeError:

AttributeError: 'NoneType' object has no attribute 'fetchall'

During handling of the above exception, another exception occurred:

ResourceClosedError                       Traceback (most recent call last)
<ipython-input-2-018f2f8f6b89> in <module>
----> 1 df = pd.read_sql(query, con=eng)

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\pandas\io\sql.py in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize)
    395             sql, index_col=index_col, params=params,
    396             coerce_float=coerce_float, parse_dates=parse_dates,
--> 397             chunksize=chunksize)
    398 
    399 

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\pandas\io\sql.py in read_query(self, sql, index_col, coerce_float, parse_dates, params, chunksize)
   1106                                         parse_dates=parse_dates)
   1107         else:
-> 1108             data = result.fetchall()
   1109             frame = _wrap_result(data, columns, index_col=index_col,
   1110                                  coerce_float=coerce_float,

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\sqlalchemy\engine\result.py in fetchall(self)
   1216         except BaseException as e:
   1217             self.connection._handle_dbapi_exception(
-> 1218                 e, None, None, self.cursor, self.context
   1219             )
   1220 

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\sqlalchemy\engine\base.py in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
   1458                 util.raise_from_cause(sqlalchemy_exception, exc_info)
   1459             else:
-> 1460                 util.reraise(*exc_info)
   1461 
   1462         finally:

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\sqlalchemy\util\compat.py in reraise(tp, value, tb, cause)
    275         if value.__traceback__ is not tb:
    276             raise value.with_traceback(tb)
--> 277         raise value
    278 
    279 

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\sqlalchemy\engine\result.py in fetchall(self)
   1211 
   1212         try:
-> 1213             l = self.process_rows(self._fetchall_impl())
   1214             self._soft_close()
   1215             return l

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\sqlalchemy\engine\result.py in _fetchall_impl(self)
   1163             return self.cursor.fetchall()
   1164         except AttributeError:
-> 1165             return self._non_result([])
   1166 
   1167     def _non_result(self, default):

c:\users\dscally\appdata\local\programs\python\python37\lib\site-packages\sqlalchemy\engine\result.py in _non_result(self, default)
   1168         if self._metadata is None:
   1169             raise exc.ResourceClosedError(
-> 1170                 "This result object does not return rows. "
   1171                 "It has been closed automatically."
   1172             )

ResourceClosedError: This result object does not return rows. It has been closed automatically.

The exception implies that the problem is caused by the fact that the DROP and SELECT...INTO statements are both causing the server to return messages along the lines of (5 rows affected), and Pandas is picking up the first of these and trying to treat it as the results of the query. This is backed up by the fact that you can split out the SELECT statement, and then things work nicely:

prelim = """
drop table if exists #MyTempTable;

select
	1 ID
	, 'A' Letter
into #MyTempTable
union
	select 2, 'B' union
	select 3, 'C' union
	select 4, 'D' union
	select 5, 'E'
;
"""

query = """
select
    *
from #MyTempTable
"""

connection_string = urllib.parse.quote_plus(
    'DRIVER={SQL Server Native Client 11.0};\
    SERVER={0},{1};\
    trusted_Connection=yes\
    ;encrypt=yes;\
    trustServerCertificate=yes;'.format(server, port)
)

eng = sa.create_engine('mssql+pyodbc:///?odbc_connect={0}'.format(connection_string))

eng.execute(prelim)

df = pd.read_sql(query, con=eng)

And that then returns everything all hunky-dory. I think this is a little user unfriendly, and could be improved.

Expected Output

More like "desired" than "expected", but the simplest method that occurs to me would be to iterate over the returned resultsets using SQLAlchemy's cursor.nextset(), and inspection of the results using cursor.returns_rows to see if the use of cursor.fetchall() would be legal, and returning the first available set where that condition is met.

Output of pd.show_versions()

[paste the output of pd.show_versions() here below this line]
INSTALLED VERSIONS

commit: None
python: 3.7.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.1
pytest: None
pip: 19.0.3
setuptools: 41.0.1
Cython: 0.29.6
numpy: 1.16.1
scipy: 1.2.1
pyarrow: 0.14.0
xarray: 0.14.0
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 1.2.1
lxml.etree: 4.3.4
bs4: None
html5lib: None
sqlalchemy: 1.2.18
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Not included in pd.show_versions() but relevant here:

>>> print(pyodbc.version)
'4.0.26'

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions