I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
This comes from a question I saw on StackOverflow. It doesn't look like it's already been reported here.
The issue is that the same command can give different outputs (I think depending on whether or not groupby swapped the rows?).
In [68]: df = pl.DataFrame(
    ...:     {
    ...:         "day": [2, 2, 2, 2, 2, 2, 1, 1],
    ...:         "y": [4, 5, 8, 7, 9, None, None, None],
    ...:         "x": [1, 2, 3, 4, 5, 6, 1, 2],
    ...:     }
    ...:
    ...: )
    ...:
    ...: xcol = "x"
    ...: ycol = "y"
    ...: f = pl.col(ycol).is_not_null() & pl.col(xcol).is_not_null()

In [69]: df.groupby('day').agg((pl.col(xcol) - pl.col(xcol).mean()).filter(f))
Out[69]:
shape: (2, 2)
┌─────┬────────────────────┐
│ day ┆ x                  │
│ --- ┆ ---                │
│ i64 ┆ list[f64]          │
╞═════╪════════════════════╡
│ 1   ┆ [-0.5, 0.5]        │
│ 2   ┆ [-2.5, -1.5, -0.5] │
└─────┴────────────────────┘

In [70]: df.groupby('day').agg((pl.col(xcol) - pl.col(xcol).mean()).filter(f))
Out[70]:
shape: (2, 2)
┌─────┬───────────────────────┐
│ day ┆ x                     │
│ --- ┆ ---                   │
│ i64 ┆ list[f64]             │
╞═════╪═══════════════════════╡
│ 2   ┆ [-2.5, -1.5, ... 1.5] │
│ 1   ┆ []                    │
└─────┴───────────────────────┘
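One way to read the two outputs above, sketched in plain Python: if the filter mask were evaluated in original row order but applied positionally to values laid out in group order, the result would depend on which group comes first. This is only a hypothesis that happens to reproduce the numbers in Out[69] and Out[70]; it is not a confirmed description of what Polars does internally, and `misaligned_result` is a hypothetical helper written for this illustration.

```python
from statistics import mean

day = [2, 2, 2, 2, 2, 2, 1, 1]
y = [4, 5, 8, 7, 9, None, None, None]
x = [1, 2, 3, 4, 5, 6, 1, 2]

# The predicate f, evaluated once in original row order.
row_mask = [xi is not None and yi is not None for xi, yi in zip(x, y)]


def misaligned_result(group_order):
    """Hypothetical model: per-group values are laid out in `group_order`,
    but the row-order mask is applied to them by position."""
    out = {}
    pos = 0
    for key in group_order:
        xs = [xi for d, xi in zip(day, x) if d == key]
        m = mean(xs)
        diffs = [xi - m for xi in xs]
        # Slice the row-order mask by position, ignoring which rows
        # actually belong to this group (the assumed misalignment).
        mask = row_mask[pos:pos + len(diffs)]
        out[key] = [v for v, keep in zip(diffs, mask) if keep]
        pos += len(diffs)
    return out


print(misaligned_result([1, 2]))  # group 1 emitted first
print(misaligned_result([2, 1]))  # group 2 emitted first (matches row order)
```

With group 1 first, this model yields the values in Out[69]; with group 2 first, the layout coincides with the original row order and the model yields the values in Out[70].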
import polars as pl

df = pl.DataFrame(
    {
        "day": [2, 2, 2, 2, 2, 2, 1, 1],
        "y": [4, 5, 8, 7, 9, None, None, None],
        "x": [1, 2, 3, 4, 5, 6, 1, 2],
    }
)

xcol = "x"
ycol = "y"
# Keep only rows where both x and y are non-null.
f = pl.col(ycol).is_not_null() & pl.col(xcol).is_not_null()

df.groupby('day').agg((pl.col(xcol) - pl.col(xcol).mean()).filter(f))
Both times, I would have expected
shape: (2, 2)
┌─────┬───────────────────────┐
│ day ┆ x                     │
│ --- ┆ ---                   │
│ i64 ┆ list[f64]             │
╞═════╪═══════════════════════╡
│ 2   ┆ [-2.5, -1.5, ... 1.5] │
│ 1   ┆ []                    │
└─────┴───────────────────────┘
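As a cross-check on the expected values, here is a minimal plain-Python sketch of the same computation, with no Polars involved. It assumes the intended semantics are: within each `day` group, subtract the group mean of `x` from every `x`, then keep only the rows where both `x` and `y` are non-null. The helper name `expected_by_group` is hypothetical, and the sketch assumes `x` has no nulls inside a group (true for this data).

```python
from statistics import mean

day = [2, 2, 2, 2, 2, 2, 1, 1]
y = [4, 5, 8, 7, 9, None, None, None]
x = [1, 2, 3, 4, 5, 6, 1, 2]


def expected_by_group(day, x, y):
    """Per group: (x - mean(x)), filtered to rows where x and y are non-null."""
    result = {}
    for key in dict.fromkeys(day):  # group keys in first-seen order
        rows = [(xi, yi) for d, xi, yi in zip(day, x, y) if d == key]
        # The mean is taken over the whole group, before filtering.
        m = mean([xi for xi, _ in rows])
        result[key] = [
            xi - m
            for xi, yi in rows
            if xi is not None and yi is not None
        ]
    return result


print(expected_by_group(day, x, y))
```

For this data the sketch gives five values for day 2 and an empty list for day 1, in line with the expected table above, regardless of the order in which the groups are visited.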
---Version info---
Polars: 0.16.6
Index type: UInt32
Platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python: 3.8.16 (default, Dec 7 2022, 01:12:06) [GCC 11.3.0]
---Optional dependencies---
pyarrow: 11.0.0
pandas: 1.5.3
numpy: 1.24.2
fsspec: <not installed>
connectorx: <not installed>
xlsx2csv: <not installed>
deltalake: <not installed>
matplotlib: 3.6.2
Thanks for the report. Will pick it up. Won't be this release anymore sadly.
ritchie46