
read_sql should convert Numeric to decimals #20525

Open
postelrich opened this issue Mar 28, 2018 · 8 comments
Labels: Dtype Conversions (unexpected or buggy dtype conversions), Enhancement, IO SQL (to_sql, read_sql, read_sql_query)

Comments

@postelrich

pd.read_sql converts columns of type sqlalchemy.Numeric to floats, as the code linked below appears to show. This seems counterintuitive, since the NUMERIC and DECIMAL types are interchangeable in SQL. Also, the Numeric type has an asdecimal option that defaults to True (http://docs.sqlalchemy.org/en/latest/core/type_basics.html#sqlalchemy.types.Numeric). I would expect sqlalchemy.Numeric columns to be converted to decimal.Decimal.

https://github.com/pandas-dev/pandas/blob/master/pandas/io/sql.py#L1187-L1190
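
For illustration, a minimal sketch of the reported behavior. The connection string is hypothetical; the coercion is easiest to observe with a DBAPI that actually returns decimal.Decimal for NUMERIC columns, e.g. psycopg2:

```python
import pandas as pd
import sqlalchemy as sa

# Hypothetical connection string, for illustration only.
engine = sa.create_engine("postgresql://user:pass@localhost/demo")

df = pd.read_sql("SELECT CAST(1.10 AS NUMERIC(10, 2)) AS price", engine)
print(df["price"].dtype)  # float64, even though the driver returned Decimal
```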

@postelrich (Author)

coerce_float=False skips the conversion so maybe a non-issue.
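
A sketch of that workaround, reusing the hypothetical engine above (note that a later comment in this thread reports it did not work in their setup):

```python
import pandas as pd

# coerce_float=False leaves the driver's decimal.Decimal values untouched,
# so the column comes back as object dtype holding Decimal objects.
df = pd.read_sql("SELECT CAST(1.10 AS NUMERIC(10, 2)) AS price", engine,
                 coerce_float=False)
print(df["price"].dtype)          # object
print(type(df["price"].iloc[0]))  # <class 'decimal.Decimal'>
```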

@jreback (Contributor) commented Mar 29, 2018

decimal.Decimal is not a first-class type, so we generally don't do things like this. But I suppose it could be done if a SQL dtype is unambiguously a decimal type.

@jreback added the Dtype Conversions, IO SQL, and Difficulty Intermediate labels Mar 29, 2018
@jreback added this to the Next Major Release milestone Mar 29, 2018
@jorisvandenbossche (Member)

For the reason that @jreback mentions (decimals cannot be stored natively in a pandas dataframe, only as object dtype), I don't think we want to change this by default. But it could be a good idea to at least allow the user to specify explicitly that they want to keep the decimal objects. That would be a welcome contribution, I think.

@grofte commented Jan 30, 2019

There is a very old, closed issue that raises some good points about dtypes:
#6798

Giving read_sql a parameter for specifying the dtypes of the resulting table could help people a lot.

Alternatively, add an example to the documentation where:
- read_sql is called with a chunksize
- the first chunk is read into a dataframe
- the dtypes are specified
- the rest of the data is appended

It would be very useful for large tables. Casting to categories and smaller-width dtypes saves so much memory, which in turn significantly decreases data-processing run time.
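
A sketch of that pattern (table and column names are made up; the engine is assumed to exist). Numeric downcasts are applied per chunk, while the category cast is done once after concatenation, since per-chunk casts can produce mismatched category sets that pd.concat widens back to object:

```python
import pandas as pd

chunks = []
for chunk in pd.read_sql("SELECT * FROM big_table", engine, chunksize=50_000):
    # downcast numeric columns per chunk to keep peak memory low
    chunk["small_int_col"] = chunk["small_int_col"].astype("int8")
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)
df["category_col"] = df["category_col"].astype("category")
```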

@TheFou commented Mar 31, 2021

> coerce_float=False skips the conversion so maybe a non-issue.

I just did a test, and whether I use True or False, NUMERIC columns in SQL tables get typed to float64.


I don't understand the logic here.
In pd.read_csv(), it is possible to use converters= to get data loaded accurately.
When exporting with to_sql(), one can use the dtype= parameter to ensure the Decimal type gets converted to NUMERIC in the database, again preserving decimal accuracy.
But there is no way to ensure that data exported this way can be imported back while keeping decimal accuracy?
Honestly, this makes no sense to me.
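
For comparison, sketches of the two existing hooks mentioned above (the file name and column are illustrative, and the engine is assumed to exist):

```python
import decimal
import pandas as pd
import sqlalchemy as sa

# importing from CSV: converters= preserves exact decimal values
df = pd.read_csv("prices.csv", converters={"price": decimal.Decimal})

# exporting to SQL: dtype= stores the column as NUMERIC in the database
df.to_sql("prices", engine, dtype={"price": sa.Numeric(10, 2)},
          index=False, if_exists="replace")
```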

Failing workaround: if I initialize the target dataframe of the import from an emptied copy of an existing, correctly typed dataframe, the import resets the types of the target dataframe.
So it seems there is literally no option to keep accuracy when importing decimal data from a DB using pandas directly.

> For the reason that @jreback mentions (decimals cannot be stored natively in a pandas dataframe, only as object dtype), I don't think we want to change this by default. But it could be a good idea to at least allow the user to specify explicitly that they want to keep the decimal objects. That would be a welcome contribution, I think.

I'm currently learning Python, so I'm far from having a good enough level to do it myself (otherwise I would).
But I really think this is as important as the counterpart parameters mentioned above, which have already been implemented, and it should be bumped up priority-wise.
Not only is it "a good idea", it is needed for consistency in data accuracy.


Apart from that, thanks to all the developers contributing to this framework, which is a wonder even with these small quirks here and there ^^

@felipead

👍 Please upvote this, I am having a very hard time with Pandas right now because of this issue.

@mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel (Member)

@mroeschke possibly viable with pyarrow decimal?

@mroeschke (Member)

Yeah, this in theory should be better supported with pyarrow decimal. The sqlalchemy layer will need to be changed though, because I think it immediately converts decimals to floats.
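
For context, a minimal sketch of what pyarrow decimal support looks like on the pandas side (requires pandas >= 2.0 and pyarrow; this shows only the in-memory representation, not the read_sql plumbing this comment says would need changing):

```python
import decimal
import pandas as pd
import pyarrow as pa

# decimal values stored in an Arrow-backed column instead of object dtype
s = pd.Series([decimal.Decimal("1.10"), decimal.Decimal("2.25")],
              dtype=pd.ArrowDtype(pa.decimal128(10, 2)))
print(s.dtype)  # decimal128(10, 2)[pyarrow]
```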
