Implement pl$arg_sort_by()
#929
Merge branch 'main' into arg_sort_by (conflicts: man/pl_pl.Rd)
I'm a bit stuck here: the following crashes the R session, and I don't know why:

df = pl$DataFrame(
a = c(0, 1, 1, 0),
b = c(3, 2, 3, 2)
)
df$with_columns(
arg_sort_ab = pl$arg_sort_by(c("a", "b"), descending = TRUE)
)
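For reference, here is the index vector this call should return once it works. This is a plain-Python sketch, not polars code. It assumes `arg_sort_by` returns 0-based row indices (as in the polars output later in this thread) that sort the rows lexicographically by `a`, then `b`, with `descending = TRUE` applied to both keys.

```python
# Plain-Python model (not polars) of arg_sort_by over several columns,
# assuming a lexicographic sort with `descending` applied to every key.
def arg_sort_by(columns, descending=False):
    """Return the 0-based row indices that sort the rows by the given columns,
    first column first, with ties broken by the later columns."""
    n = len(columns[0])
    # sorted() is stable, and reverse=True keeps that stability while
    # flipping the comparison for all keys at once.
    return sorted(range(n), key=lambda i: tuple(col[i] for col in columns),
                  reverse=descending)


a = [0, 1, 1, 0]
b = [3, 2, 3, 2]
print(arg_sort_by([a, b], descending=True))  # [2, 1, 0, 3]
```

All four `(a, b)` pairs are distinct, so the expected order here does not depend on how ties are handled.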
This also reproduces in Python. The panic is raised at the following location:

>>> import polars as pl
>>> df = pl.DataFrame(
... {
... "a": [0, 1, 1, 0],
... "b": [3, 2, 3, 2],
... }
... )
>>> df.select(pl.arg_sort_by(pl.col("a")))
shape: (4, 1)
┌─────┐
│ a │
│ --- │
│ u32 │
╞═════╡
│ 0 │
│ 3 │
│ 1 │
│ 2 │
└─────┘
>>> df.select(pl.arg_sort_by(pl.col("a", "b")))
thread '<unnamed>' panicked at /home/runner/work/polars/polars/crates/polars-plan/src/dsl/functions/index.rs:10:36:
called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("this expression may produce multiple output names"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/rstudio/.local/lib/python3.10/site-packages/polars/functions/lazy.py", line 1601, in arg_sort_by
return wrap_expr(plr.arg_sort_by(exprs, descending))
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("this expression may produce multiple output names"))
>>> df.select(pl.arg_sort_by(pl.col("*")))
thread '<unnamed>' panicked at /home/runner/work/polars/polars/crates/polars-plan/src/dsl/functions/index.rs:10:36:
called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("cannot determine output column without a context for this expression"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/rstudio/.local/lib/python3.10/site-packages/polars/functions/lazy.py", line 1601, in arg_sort_by
return wrap_expr(plr.arg_sort_by(exprs, descending))
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("cannot determine output column without a context for this expression"))
>>> df.select(pl.arg_sort_by(pl.col("^h")))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/rstudio/.local/lib/python3.10/site-packages/polars/dataframe/frame.py", line 8129, in select
return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
File "/home/rstudio/.local/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 1937, in collect
return wrap_df(ldf.collect())
polars.exceptions.ColumnNotFoundError: ^h
Looking at how this works, only the first column has special meaning: its name is reused as the name of the result column. There are functions with similar behavior, such as
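To make the naming constraint concrete, here is a toy model of that step. It is hypothetical and is not the polars implementation. It assumes, from the two panic messages in the session above, that the output name is resolved from the first key alone and that this fails when the key may expand to several columns or needs the frame's schema. A tuple stands in for a multi-column selector such as `pl.col("a", "b")`, and `"*"` for the wildcard. Raising an exception here plays the role of the `Err` that polars currently `unwrap()`s in `index.rs`, which is what turns it into a panic.

```python
# Toy model (hypothetical, not polars code) of how arg_sort_by names its
# result: the name comes from the first sort key only.
class NameResolutionError(ValueError):
    pass


def output_name(key):
    """Resolve one sort key to a single output column name.

    A plain string is a single column. A tuple models a multi-column selector
    such as pl.col("a", "b"); "*" models a wildcard. Neither can be named
    without knowing the frame's schema."""
    if isinstance(key, tuple):
        raise NameResolutionError("this expression may produce multiple output names")
    if key == "*":
        raise NameResolutionError(
            "cannot determine output column without a context for this expression")
    return key


def arg_sort_by_name(*keys):
    # Later keys only break ties; they never affect the result's name.
    return output_name(keys[0])


print(arg_sort_by_name("a", "b"))  # a
try:
    arg_sort_by_name(("a", "b"))
except NameResolutionError as err:
    print("error:", err)  # error: this expression may produce multiple output names
```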
I think it is; py-polars has an example like that:

import polars as pl
df = pl.DataFrame(
{
"a": [0, 1, 1, 0],
"b": [3, 2, 3, 2],
}
)
df.select(pl.arg_sort_by(["a", "b"]))
shape: (4, 1)
┌─────┐
│ a │
│ --- │
│ u32 │
╞═════╡
│ 3 │
│ 0 │
│ 1 │
│ 2 │
└─────┘
I pushed a change to allow vectors of length > 1 as the first argument, but in the end I don't think this is a fundamental solution, because passing something like
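One way to read "allow vectors of length > 1 as the first argument" is to expand a vector of column names into one key per name before the first key is used for naming. The sketch below is hypothetical and is not the r-polars change. The helper `normalize_keys` is an illustrative name, and a Python list stands in for an R character vector such as `c("a", "b")`. Expanding names on the caller side cannot help when a single expression already selects several columns, as `pl.col("a", "b")` does in the Python session above.

```python
# Hypothetical sketch (not r-polars code): flatten vectors of column names
# into individual keys so the first key always names exactly one column.
def normalize_keys(first, *rest):
    keys = []
    for arg in (first, *rest):
        if isinstance(arg, (list, tuple)):
            keys.extend(arg)   # e.g. c("a", "b") -> "a", "b"
        else:
            keys.append(arg)   # a single name passes through unchanged
    return keys


print(normalize_keys(["a", "b"]))  # ['a', 'b']
print(normalize_keys("a", "b"))    # ['a', 'b']
```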
Thanks for your help.
I think that also depends on your upstream PR. I'm fine with either leaving this one as a draft until it's sorted out upstream or merging it now, as you prefer.
I would prefer to merge this PR rather than leave it open.
Thanks!
Related to #204. To finish after #923 is merged.