Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Lost precision of NUMBER in read_sql_table(). #34988

Open
3 tasks done
andrewkisel opened this issue Jun 25, 2020 · 3 comments
Open
3 tasks done

BUG: Lost precision of NUMBER in read_sql_table(). #34988

andrewkisel opened this issue Jun 25, 2020 · 3 comments
Assignees
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query

Comments

@andrewkisel
Copy link

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Problem description

When using pd.read_sql_table() with Oracle DB precision of data type sqlalchemy.dialects.oracle.NUMBER is lost.

Expected Output

Precision should not be lost as this datatype can be either integer or float.

The problem

Looks like the root cause for this is in _harmonize_columns(), particularly in _get_dtype() that is used there. The SQLAlchemy type sqlalchemy.dialects.oracle.NUMBER is based on both sqlalchemy.types.Numeric and sqlalchemy.types.Integer. As there is no support for SQLAlchemy Numeric type in _get_dtype() this Oracle type is being treated as integer and precision is dropped when using astype() in _harmonize_columns() itself.

@andrewkisel andrewkisel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 25, 2020
@kirthash
Copy link

Hello

This bug has been reproduced in:

INSTALLED VERSIONS

commit : None
python : 3.8.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : es_ES.cp1252

pandas : 1.0.4
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 41.2.0
jinja2 : 2.11.2
IPython : 7.15.0
sqlalchemy : 1.3.18

df3 have the correct dtype.

image

@jbrockmendel jbrockmendel added IO SQL to_sql, read_sql, read_sql_query Enhancement and removed Needs Triage Issue that has not been reviewed by a pandas team member Bug labels Sep 3, 2020
@jowage58
Copy link

jowage58 commented Jan 5, 2022

I also ran into this problem with an Oracle column defined as NUMBER(7,6). As mentioned since there's no handling of Numeric types in the method pandas.io.sql.SQLTable._get_dtype and since the sqlalchemy.dialects.oracle.base.NUMBER(sqltypes.Numeric, sqltypes.Integer) falls into the Integer test and it returns int64 for the type.

pandas/pandas/io/sql.py

Lines 1241 to 1255 in 66e3805

def _get_dtype(self, sqltype):
from sqlalchemy.types import (
TIMESTAMP,
Boolean,
Date,
DateTime,
Float,
Integer,
)
if isinstance(sqltype, Float):
return float
elif isinstance(sqltype, Integer):
# TODO: Refine integer size.
return np.dtype("int64")

As a work-around I patched the _get_dtype method and added a check before calling the original method.

if isinstance(sqltype, sqlalchemy.types.Numeric):
    return sqltype.python_type

@JamieAllen1
Copy link

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
5 participants