Entropy in groupby failed when no discrete probabilities provided #6011

Closed
s-b90 opened this issue Jan 3, 2023 · 2 comments
Labels: bug (Something isn't working), python (Related to Python Polars)

s-b90 commented Jan 3, 2023

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

When a column is filtered inside an agg expression and no values remain after filtering, the entropy calculation fails.
Note that the same computation on a plain Series, without groupby, works fine.

Reproducible example

import polars as pl

# col3 is never 0, so the filter leaves the only group empty.
pl.DataFrame({'col_to_groupby': [2], 'time': [1672740910.967138], 'col3': [1]}).groupby('col_to_groupby').agg(
    (
        (pl.col('time').filter(pl.col('col3') == 0).unique_counts() /
         pl.col('time').filter(pl.col('col3') == 0).len()).entropy()
    ).alias('calc')
)

This raises:

exceptions.ComputeError: returned aggregation is a different length: 0 than the group lengths: 1

Expected behavior

This computation should return:

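┌────────────────┬──────┐
│ col_to_groupby ┆ calc │
│ ---            ┆ ---  │
│ i64            ┆ f32  │
╞════════════════╪══════╡
│ 2              ┆ null │
└────────────────┴──────┘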
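To make the expected semantics concrete, here is a pure-Python sketch (no Polars involved) of the same per-group computation: filter, divide unique counts by length, then take the entropy. It assumes natural-log entropy and treats a group left empty by the filter as None, the plain-Python analogue of the null above. The helper name grouped_entropy and its signature are hypothetical.

```python
import math
from collections import Counter


def grouped_entropy(keys, values, mask):
    """Hypothetical helper: per-group entropy of `values` where `mask` is True.

    Mirrors (unique_counts / len).entropy() per group, but returns None
    for groups in which the filter kept no rows.
    """
    groups = {}
    for key, value, keep in zip(keys, values, mask):
        # Register every group, even if all of its rows get filtered out.
        bucket = groups.setdefault(key, [])
        if keep:
            bucket.append(value)

    result = {}
    for key, bucket in groups.items():
        if not bucket:
            # Nothing survived the filter: one missing value for this group.
            result[key] = None
            continue
        n = len(bucket)
        # pk = unique_counts / len
        probs = [count / n for count in Counter(bucket).values()]
        result[key] = -sum(p * math.log(p) for p in probs)
    return result


# Data from the reproducible example: the filter col3 == 0 matches no rows.
col_to_groupby = [2]
time = [1672740910.967138]
col3 = [1]
print(grouped_entropy(col_to_groupby, time, [c == 0 for c in col3]))  # {2: None}
```

The key design point is that the group still yields exactly one value (None) instead of zero values, which is what the group-length check in the error message requires.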

Installed versions

---Version info---
Polars: 0.15.11
Index type: UInt32
Platform: Linux-5.15.0-56-generic-x86_64-with-glibc2.35
Python: 3.8.16 (default, Dec  7 2022, 01:12:06) 
[GCC 11.3.0]
---Optional dependencies---
pyarrow: 10.0.1
pandas: <not installed>
numpy: 1.22.3
fsspec: 2022.8.2
connectorx: <not installed>
xlsx2csv: <not installed>
matplotlib: <not installed>
s-b90 added the bug and python labels on Jan 3, 2023.
ritchie46 (Member) commented:

I can run this snippet on master after 0.15.15. It returns:

shape: (1, 2)
┌────────────────┬──────┐
│ col_to_groupby ┆ calc │
│ ---            ┆ ---  │
│ i64            ┆ f64  │
╞════════════════╪══════╡
│ 2              ┆ -0.0 │
└────────────────┴──────┘

I only don't understand what you'd expect from entropy(len).
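Why the result is -0.0 rather than 0.0 or null: if entropy is computed as the negated sum of p * ln(p) (an assumption about the shape of the implementation, not a reading of the Polars source), an empty input gives an empty sum, 0.0, and negating it produces IEEE 754 negative zero. A plain-Python illustration, with a hypothetical entropy helper:

```python
import math


def entropy(probs):
    # Shannon entropy in nats, written as the negated sum of p * ln(p).
    # Assumption: this mirrors the formula only, not Polars' actual code.
    return -math.fsum(p * math.log(p) for p in probs)


empty = entropy([])               # the filter left no probabilities at all
print(empty)                      # -0.0
print(empty == 0.0)               # True: negative zero compares equal to zero
print(math.copysign(1.0, empty))  # -1.0: but its sign bit is set
```

Under that assumption, an empty group yields a well-defined number instead of a missing value.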

s-b90 (Author) commented Jan 23, 2023

It seems this was fixed by #6036, thank you. I will close this.

> I only don't understand what you'd expect from entropy(len).

It was entropy(unique_counts / len), where unique_counts / len gives pk, the discrete probabilities.
For some reason, entropy is only computed correctly when the values in the series are already discrete probabilities, so pk has to be calculated before calling entropy.
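The reason pk must be prepared first can be checked in plain Python: Shannon entropy H = -sum_k pk * ln(pk) expects probabilities pk = nk / N, where nk are the unique counts and N is the length. Feeding the raw counts into the same formula gives a meaningless value. The helper name shannon is illustrative, and natural log is assumed.

```python
import math
from collections import Counter


def shannon(values):
    # H = -sum(v * ln v); only meaningful when `values` are probabilities.
    return -sum(v * math.log(v) for v in values)


data = ["a", "a", "b", "b"]
counts = list(Counter(data).values())  # unique counts: [2, 2]
pk = [c / len(data) for c in counts]   # probabilities: [0.5, 0.5]

print(shannon(pk))      # ln 2, about 0.6931: entropy of a fair coin
print(shannon(counts))  # -4 ln 2, about -2.7726: negative, so not an entropy
```

Dividing by the length is what turns the counts into a distribution that sums to 1, which the formula relies on.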

s-b90 closed this as completed on Jan 23, 2023.