Rename `rle()` struct fields to `len` and `value` #15230

cmdlineluser · 2024-03-22T13:02:11Z

Description

Remove the plural from Series.rle() and Expr.rle() field names.

(Similar to what was done for value_counts: #11462)

Current:

>>> pl.Series([1, 1, 2, 3]).rle().struct.unnest()
shape: (3, 2)
┌─────────┬────────┐
│ lengths ┆ values │
│ ---     ┆ ---    │
│ i32     ┆ i64    │
╞═════════╪════════╡
│ 2       ┆ 1      │
│ 1       ┆ 2      │
│ 1       ┆ 3      │
└─────────┴────────┘

Desired:

>>> pl.Series([1, 1, 2, 3]).rle().struct.unnest()
shape: (3, 2)
┌─────────┬────────┐
│ len     ┆ value  │
│ ---     ┆ ---    │
│ i32     ┆ i64    │
╞═════════╪════════╡
│ 2       ┆ 1      │
│ 1       ┆ 2      │
│ 1       ┆ 3      │
└─────────┴────────┘

(Choosing len to match up with list.len())

cmdlineluser · 2024-03-22T13:57:10Z

Side note: Would it make sense for rle() to also return the row index? {index, value, len}

The particular use case is wanting the original row index of each run after performing a .filter().

import polars as pl

df = pl.DataFrame({"foo": ["a", "a", "a", "b", "c", "c"]})

df.select(pl.col("foo").rle())
# shape: (3, 1)
# ┌───────────┐
# │ foo       │
# │ ---       │
# │ struct[2] │
# ╞═══════════╡
# │ {3,"a"}   │
# │ {1,"b"}   │
# │ {2,"c"}   │
# └───────────┘

We can calculate it from the length, but it's a little awkward:

(df.select(pl.col("foo").rle())
   .with_columns(
      index = pl.col("foo").struct["lengths"].cum_sum().shift().fill_null(0)
   )
   #.filter(...)
)
# shape: (3, 2)
# ┌───────────┬───────┐
# │ foo       ┆ index │
# │ ---       ┆ ---   │
# │ struct[2] ┆ i32   │
# ╞═══════════╪═══════╡
# │ {3,"a"}   ┆ 0     │
# │ {1,"b"}   ┆ 3     │
# │ {2,"c"}   ┆ 4     │
# └───────────┴───────┘
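
For comparison, the same start-index calculation can be sketched in plain Python, independent of Polars. This is only an illustration of the idea (each run starts at the running sum of the previous run lengths): `rle_with_index` is a hypothetical helper, not a Polars API, and it assumes the proposed `len`/`value` field names.

```python
from itertools import groupby


def rle_with_index(values):
    """Run-length encode `values`, also recording each run's start index.

    Hypothetical helper for illustration only; not part of Polars.
    """
    runs = []
    start = 0  # running sum of the lengths of all previous runs
    for value, group in groupby(values):
        length = sum(1 for _ in group)
        runs.append({"index": start, "len": length, "value": value})
        start += length
    return runs


runs = rle_with_index(["a", "a", "a", "b", "c", "c"])
for run in runs:
    print(run)
```

The start indices come out as 0, 3, 4, matching the `index` column computed with `cum_sum().shift().fill_null(0)` above.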

stinodego · 2024-03-22T14:01:30Z

Agreed on the rename.

I don't think the index should be part of the RLE method by default. It is not an essential part of the RLE definition. Though possibly an include_index parameter would make sense - but please open a separate issue for that.

stinodego · 2024-03-23T07:01:03Z

@cmdlineluser I am tempted to also change the field order of the struct to value/len. That way it matches value_counts. What do you think?

EDIT: Never mind, it's probably not a good idea, as the standard RLE notation places len before value, e.g. 12W1B12W3B24W1B14W.
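
To make the len-before-value convention concrete, here is a minimal plain-Python sketch of the textual notation from the example above. The helper names `rle_encode` and `rle_decode` are made up for illustration and are unrelated to the Polars implementation; the decoder assumes the encoded values are single non-digit characters.

```python
import re
from itertools import groupby


def rle_encode(text):
    """Encode each run as '<len><value>', with the length first."""
    return "".join(
        f"{sum(1 for _ in group)}{char}" for char, group in groupby(text)
    )


def rle_decode(encoded):
    """Invert rle_encode; assumes values are single non-digit characters."""
    return "".join(
        char * int(count) for count, char in re.findall(r"(\d+)(\D)", encoded)
    )


# Runs of 12 W, 1 B, 12 W, 3 B, 24 W, 1 B, 14 W
original = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14
encoded = rle_encode(original)
print(encoded)  # 12W1B12W3B24W1B14W
```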

cmdlineluser added the "enhancement" label (New feature or an improvement of an existing feature) Mar 22, 2024

stinodego added the "accepted" label (Ready for implementation) Mar 22, 2024

stinodego added this to the 1.0.0 milestone Mar 22, 2024

stinodego self-assigned this Mar 23, 2024

This was referenced Mar 23, 2024

fix: Fix lazy schema for rle expression #15248

Merged

feat!: Rename struct fields of rle output to len/value and update data type of len field #15249

Merged

stinodego added the "A-api" label (Area: changes to the public API) Mar 23, 2024

stinodego closed this as completed in #15249 Jun 4, 2024
