
read_sql should convert Numeric to decimals #20525

Open
postelrich opened this issue Mar 28, 2018 · 8 comments
Labels: Dtype Conversions (unexpected or buggy dtype conversions), Enhancement, IO SQL (to_sql, read_sql, read_sql_query)

Comments

@postelrich

pd.read_sql converts columns of type sqlalchemy.Numeric to floats, as the code linked below appears to show. This seems counterintuitive, since the NUMERIC and DECIMAL types are interchangeable in SQL. Also, the Numeric type has an asdecimal option that defaults to True (http://docs.sqlalchemy.org/en/latest/core/type_basics.html#sqlalchemy.types.Numeric). I would expect sqlalchemy.Numeric columns to be converted to decimal.Decimal.

https://github.com/pandas-dev/pandas/blob/master/pandas/io/sql.py#L1187-L1190
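
For illustration, a minimal sketch of the reported behavior. The connection string is hypothetical; the coercion is easiest to observe with a DBAPI that actually returns decimal.Decimal for NUMERIC columns, e.g. psycopg2:

```python
import pandas as pd
import sqlalchemy as sa

# Hypothetical connection string, for illustration only.
engine = sa.create_engine("postgresql://user:pass@localhost/demo")

df = pd.read_sql("SELECT CAST(1.10 AS NUMERIC(10, 2)) AS price", engine)
print(df["price"].dtype)  # float64, even though the driver returned Decimal
```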

@postelrich (Author)

coerce_float=False skips the conversion so maybe a non-issue.
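
A sketch of that workaround, reusing the hypothetical engine above (note that a later comment in this thread reports it did not work in their setup):

```python
import pandas as pd

# coerce_float=False leaves the driver's decimal.Decimal values untouched,
# so the column comes back as object dtype holding Decimal objects.
df = pd.read_sql("SELECT CAST(1.10 AS NUMERIC(10, 2)) AS price", engine,
                 coerce_float=False)
print(df["price"].dtype)          # object
print(type(df["price"].iloc[0]))  # <class 'decimal.Decimal'>
```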

@jreback (Contributor) commented Mar 29, 2018

decimal.Decimal is not a first-class type, so we generally don't do things like this. But I suppose it could be done if a SQL dtype is unambiguously a decimal type.

@jreback added the Dtype Conversions, IO SQL, and Difficulty Intermediate labels Mar 29, 2018
@jreback added this to the Next Major Release milestone Mar 29, 2018
@jorisvandenbossche (Member)

For the reason that @jreback mentions (decimals cannot be stored natively in a pandas dataframe, only as object dtype), I don't think we want to change this by default. But it could be a good idea to at least allow the user to specify explicitly that they want to keep the decimal objects. That would be a welcome contribution, I think.

@grofte commented Jan 30, 2019

There is a very old, closed issue that raises some good points about dtypes:
#6798

Giving read_sql a parameter for specifying the dtypes of the resulting table could help people a lot.

Alternatively, add an example to the documentation where:
- read_sql is called with a chunksize
- the first chunk is read into a dataframe
- the dtypes are specified
- the rest of the data is appended

It would be very useful for large tables. Casting to categories and smaller-width dtypes saves so much memory, which in turn significantly decreases data-processing run time.
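
A sketch of that pattern (table and column names are made up; the engine is assumed to exist). Numeric downcasts are applied per chunk, while the category cast is done once after concatenation, since per-chunk casts can produce mismatched category sets that pd.concat widens back to object:

```python
import pandas as pd

chunks = []
for chunk in pd.read_sql("SELECT * FROM big_table", engine, chunksize=50_000):
    # downcast numeric columns per chunk to keep peak memory low
    chunk["small_int_col"] = chunk["small_int_col"].astype("int8")
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)
df["category_col"] = df["category_col"].astype("category")
```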

@TheFou commented Mar 31, 2021

> coerce_float=False skips the conversion so maybe a non-issue.

I just did a test, and whether I use True or False, NUMERIC columns in SQL tables get typed to float64.


I don't understand the logic here.
In pd.read_csv(), it is possible to use converters= to get data loaded accurately.
When exporting with to_sql(), one can use the dtype= parameter to ensure the Decimal type gets converted to NUMERIC in the database, again preserving decimal accuracy.
But there is no way to ensure that data exported this way can be imported back while keeping decimal accuracy?
Honestly, this makes no sense to me.
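
For comparison, sketches of the two existing hooks mentioned above (the file name and column are illustrative, and the engine is assumed to exist):

```python
import decimal
import pandas as pd
import sqlalchemy as sa

# importing from CSV: converters= preserves exact decimal values
df = pd.read_csv("prices.csv", converters={"price": decimal.Decimal})

# exporting to SQL: dtype= stores the column as NUMERIC in the database
df.to_sql("prices", engine, dtype={"price": sa.Numeric(10, 2)},
          index=False, if_exists="replace")
```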

Failing workaround: if I initialize the target dataframe of the import from an emptied copy of an existing, correctly typed dataframe, the import resets the types of the target dataframe.
So it seems there is literally no option to keep accuracy when importing decimal data from a DB using pandas directly.

> For the reason that @jreback mentions (decimals cannot be stored natively in a pandas dataframe, only as object dtype), I don't think we want to change this by default. But it could be a good idea to at least allow the user to specify explicitly that they want to keep the decimal objects. That would be a welcome contribution, I think.

I'm currently learning Python, so I'm far from having a good enough level to do it myself (otherwise I would).
But I really think this is as important as the counterpart parameters mentioned above, which have already been implemented, and it should be bumped up priority-wise.
Not only is it "a good idea", it is needed for consistency in data accuracy.


Apart from that, thanks to all the developers contributing to this framework, which is a wonder even with these small quirks here and there ^^

@felipead

👍 Please upvote this, I am having a very hard time with Pandas right now because of this issue.

@mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel (Member)

@mroeschke possibly viable with pyarrow decimal?

@mroeschke (Member)

Yeah, this in theory should be better supported with pyarrow decimal. The sqlalchemy layer will need to be changed though, because I think it immediately converts decimals to floats.
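
For context, a minimal sketch of what pyarrow decimal support looks like on the pandas side (requires pandas >= 2.0 and pyarrow; this shows only the in-memory representation, not the read_sql plumbing this comment says would need changing):

```python
import decimal
import pandas as pd
import pyarrow as pa

# decimal values stored in an Arrow-backed column instead of object dtype
s = pd.Series([decimal.Decimal("1.10"), decimal.Decimal("2.25")],
              dtype=pd.ArrowDtype(pa.decimal128(10, 2)))
print(s.dtype)  # decimal128(10, 2)[pyarrow]
```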
