Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read.sql fails with adodbapi connection #27547

Closed
ParfaitG opened this issue Jul 23, 2019 · 14 comments
Closed

read.sql fails with adodbapi connection #27547

ParfaitG opened this issue Jul 23, 2019 · 14 comments
Labels
IO SQL to_sql, read_sql, read_sql_query

Comments

@ParfaitG
Copy link
Contributor

ParfaitG commented Jul 23, 2019

Code Sample

import pandas as pd
import adodbapi

conn = adodbapi.connect(r'Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Path\To\MS\Access\DB.accdb;')

df = pd.read_sql("SELECT TOP 10 * FROM myTable", conn)
print(df)
	
conn.close()

Problem description

Similar to this recent python-forum.io post, pandas.read_sql does not work with the older module, adodbapi, which adheres to Python's DB-API 2.0 (PEP-249) standard and allows interfacing with OLEDB providers, a recognized, industry API for data sources. Above makes a successful connection to an MS Access database in Windows environment. However, the pandas.read_sql results in following error:

Traceback (most recent call last):
  File "OleDBConnection.py", line 14, in <module>
    df = pd.read_sql("SELECT TOP 10 * FROM Random_Data", conn)
  File "C:\Users\ParfaitG\AppData\Roaming\Python\Python37\site-packages\pandas\io\sql.py", line 380, in read_sql
    chunksize=chunksize)
  File "C:\Users\ParfaitG\AppData\Roaming\Python\Python37\site-packages\pandas\io\sql.py", line 1482, in read_query
    parse_dates=parse_dates)
  File "C:\Users\ParfaitG\AppData\Roaming\Python\Python37\site-packages\pandas\io\sql.py", line 141, in _wrap_result
    coerce_float=coerce_float)
  File "C:\Users\ParfaitG\AppData\Roaming\Python\Python37\site-packages\pandas\core\frame.py", line 1512, in from_records
    coerce_float=coerce_float)
  File "C:\Users\ParfaitG\AppData\Roaming\Python\Python37\site-packages\pandas\core\internals\construction.py", line 408, in to_arrays
    elif isinstance(data[0], ABCSeries):
  File "C:\Users\ParfaitG\AppData\Roaming\Python\Python37\site-packages\pandas\core\dtypes\generic.py", line 9, in _check
    return getattr(inst, attr, '_typ') in comp
  File "C:\Users\ParfaitG\AppData\Roaming\Python\Python37\site-packages\adodbapi\apibase.py", line 474, in __getattr__
    return self._getValue(self.rows.columnNames[name.lower()])
KeyError: '_typ'

Do note: a regular cursor.execute and cursor.fetchall works successfully without issue using adodbapi with or without parameters:

conn = adodbapi.connect(r'Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Path\To\MS\Access\DB.accdb;')

cur = conn.cursor()
cur.execute("SELECT TOP 10 * FROM myTable WHERE [value] >= ? AND [value] <= ?", (10, 50))

for row in cur.fetchall():
    print(row)

Expected Output

Desired result would be exact data frame construction from read_sql as rendered successfully with pyodbc which makes an ODBC connection (another industry recognized API to data sources) to the same database:

import pandas as pd
import pyodbc

conn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\Path\To\MS\Access\DB.accdb;')

df = pd.read_sql("SELECT TOP 10 * FROM myTable", conn)
print(df)

conn.close()

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.1
pytest: None
pip: 19.1.1
setuptools: 40.6.2
Cython: None
numpy: 1.15.4
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: 2.6.2
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.3.3
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
None

@jbrockmendel jbrockmendel added the IO SQL to_sql, read_sql, read_sql_query label Jul 25, 2019
@TomAugspurger
Copy link
Contributor

Pandas only supports sqlite3 for in its DBAPI2 mode. Does a sqlalchemy engine for adodbapi exist?

@TomAugspurger TomAugspurger added the Needs Info Clarification about behavior needed to assess issue label Jul 29, 2019
@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jul 30, 2019

Pandas only supports sqlite3 for in its DBAPI2 mode. Does a sqlalchemy engine for adodbapi exist?

It seems this is supported in sqlalchemy: https://docs.sqlalchemy.org/en/13/dialects/mssql.html#module-sqlalchemy.dialects.mssql.adodbapi

In general this is true, but in practice, read_sql with a query has mostly worked with raw connections as well (as opposed to read_sql_table / read_sql with a table name / to_sql which for sure require sqlalchemy). So I am still curious what is going wrong here.

@jorisvandenbossche
Copy link
Member

It seems the isinstance(data[0], ABCSeries) check is failing where data[0] is some kind of adodbapi object

I think this is because the adodbapi SQLrow.__getattr__ is returning a KeyError instead of AttributeError, which is arguably a bug in adodbapi and not in pandas (as in general, isinstance checks should never result in an exception, but rather in False)

@ParfaitG
Copy link
Contributor Author

Correct @jorisvandenbossche. AFAIK, read_sql works for raw connections and does not need SQLAlchemy (unlike to_sql). While MS Access (also a file-level, non -server DBMS like SQLite) is not supported in SQLAlchemy, as mentioned above read_sql can successfully run Access SQL queries via pyodbc connection. The problem here is a raw connection for adodbapi does not work with read_sql.

@ParfaitG
Copy link
Contributor Author

ParfaitG commented Jul 30, 2019

Also, a surface-level read of __getattr__ in adodbapi source code (assuming this is latest version) shows no call to raise a KeyError only AttributeError. Though, possibly upstream this error is raised.

Additionally, as mentioned above basic cursor.execute and cursor.fetchall calls work successfully in adodbapi. And under the hood, pandas' read_sql source code appears to be doing the same cursor handling.

@jorisvandenbossche
Copy link
Member

As mentioned above, it is the isinstance(obj, ABCSeries) that fails (in the later conversion of the set of results from fetchall into a DataFrame). So after cursor.fetchall has succeeded. I suppose this returns a list of adodbapi SQLrow objects, and it is getattr on those that is raising an incorrect error.

@TomAubrunner
Copy link

I updated pandas and now I get the same KeyError.
Some tests showed that pandas 0.23.4 works well with adodbapi. The problem exists since pandas 0.24.0.
Maybe that helps to locate the cause of the issue.

@jorisvandenbossche
Copy link
Member

As mentioned above, this is a bug in adodbapi IMO. Someone feel free to report / fix it over there.

@TomAubrunner
Copy link

TomAubrunner commented Sep 10, 2019

ok.
I replaced line 466 in adodbapi/apibase.py from:

        return self._getValue(self.rows.columnNames[name.lower()])

to:

       try:
            return self._getValue(self.rows.columnNames[name.lower()])
        except:
            return False

This works for me.
I'll try to propose this change to adodbapi. Any clue where/how to?

@jorisvandenbossche
Copy link
Member

It seems the project is on sourceforge: https://sourceforge.net/projects/adodbapi/. The https://pypi.org/project/adodbapi/ (at the bottom) mentions that the pywin32 mailing list can be used for this project.
Or maybe you can directly use the pywin32 github repo which includes adodbapi (https://github.com/mhammond/pywin32), I don't know.

@ParfaitG
Copy link
Contributor Author

Awesome @TomAubrunner! The fix worked with pandas read_sql even with parameters. I have cited you on a StackOverflow question! Let me know if I should re-ask this ticket on the pywin32 repo, or you can add a link if you already attempted a PR. I see some recent tickets for adodbapi, so authors may still be actively monitoring. Possibly the authors may have a different suggestion, too.

Thanks @jorisvandenbossche for looking into this! I believe we can close this ticket on pandas side, migrate, or I can backlink here.

@TomAubrunner
Copy link

TomAubrunner commented Sep 18, 2019 via email

@jorisvandenbossche
Copy link
Member

OK, thanks for the follow-up, closing then on the pandas side.

@jorisvandenbossche jorisvandenbossche removed the Needs Info Clarification about behavior needed to assess issue label Sep 18, 2019
@jorisvandenbossche jorisvandenbossche added this to the No action milestone Sep 18, 2019
@vernondcole
Copy link

Thank you. Seen on pywin32 mailing list. I will look into the problem. The suggested fix looks appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

No branches or pull requests

6 participants