
Calculating the mean of a Decimal column returns nulls #15953

Open
knutmerket opened this issue Apr 29, 2024 · 0 comments
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
pl.Config.activate_decimals(True)

df = pl.DataFrame(
    {
        "a": [1, 8, 3],
        "b": [-16840000.8900, 16840000.0000, 30000.0000],
        "c": ["foo", "bar", "foo"],
    },
)

(df
.with_columns(
    d=pl.col("b").cast(pl.Decimal())
)
.group_by("c").agg(mean=pl.col("d").mean())
)

┌─────┬───────────────┐
│ c   ┆ mean          │
│ --- ┆ ---           │
│ str ┆ decimal[38,0] │
╞═════╪═══════════════╡
│ foo ┆ null          │
│ bar ┆ null          │
└─────┴───────────────┘

Log output

Out[9]: 
shape: (2, 2)
┌─────┬───────────────┐
│ c   ┆ mean          │
│ --- ┆ ---           │
│ str ┆ decimal[38,0] │
╞═════╪═══════════════╡
│ foo ┆ null          │
│ bar ┆ null          │
└─────┴───────────────┘

Issue description

My data source is a Denodo view where one of the fields is of the Decimal(19,4) type. I read this into Polars (v0.20.23) using read_database(), and sure enough it is read with that data type (despite the fact that I haven't activated Decimals in the config, and that none of the values my query returns come anywhere near the limits of Float64; I assume the Denodo view's dtype specification is simply respected).

I want to group by another column and calculate the Decimal column's mean, but this returns only nulls.

I am not sure if this is partially or fully expected with the Decimal type since it's considered "unstable".

For the actual "real world" example where I ran into this issue, I can get around it by converting the column to Float64, but that obviously wouldn't work if I had rows with values that would overflow that dtype.

Expected behavior

Actual means of the groups, not missing values.

Installed versions

--------Version info---------
Polars:               0.20.23
Index type:           UInt32
Platform:             Linux-4.14.336-256.559.amzn2.x86_64-x86_64-with-glibc2.35
Python:               3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              <not installed>
pydantic:             2.7.1
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@knutmerket knutmerket added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Apr 29, 2024