struct.rename_fields enhancements: correct name count & dict input #10777

Open
Julian-J-S opened this issue Aug 29, 2023 · 7 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@Julian-J-S
Contributor

Problem description

Adjusting struct field names with rename_fields currently behaves in surprising ways when the number of names does not match the number of fields.

Length of names parameter

import polars as pl

rating_Series = pl.Series(
    "ratings",
    [
        {"Movie": "Cars", "Theatre": "NE", "Avg_Rating": 4.5},
        {"Movie": "Toy Story", "Theatre": "ME", "Avg_Rating": 4.9},
    ],
)

# Start
rating_Series.struct.unnest()
┌───────────┬─────────┬────────────┐
│ Movie     ┆ Theatre ┆ Avg_Rating │
│ ---       ┆ ---     ┆ ---        │
│ str       ┆ str     ┆ f64        │
╞═══════════╪═════════╪════════════╡
│ Cars      ┆ NE      ┆ 4.5        │
│ Toy Story ┆ ME      ┆ 4.9        │
└───────────┴─────────┴────────────┘

# Too many names
(
    rating_Series
    .struct.rename_fields(names=['Film', 'State', 'Value', 'hello', 'world'])
    .struct.unnest()
)
┌───────────┬───────┬───────┐
│ Film      ┆ State ┆ Value │
│ ---       ┆ ---   ┆ ---   │
│ str       ┆ str   ┆ f64   │
╞═══════════╪═══════╪═══════╡
│ Cars      ┆ NE    ┆ 4.5   │
│ Toy Story ┆ ME    ┆ 4.9   │
└───────────┴───────┴───────┘

# Too few
(
    rating_Series
    .struct.rename_fields(names=['Film'])
    .struct.unnest()
)
┌───────────┐
│ Film      │
│ ---       │
│ str       │
╞═══════════╡
│ Cars      │
│ Toy Story │
└───────────┘

To discuss:

  • too many names provided
    • no effect, additional names will be ignored
    • should this be allowed?
  • too few names provided:
    • missing columns will be dropped
    • is this intended?

Comparison to DataFrame columns:

  • df.columns = [...]
  • raises if too many or too few names are provided: ShapeError: X column names provided for a dataframe of width Y
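A strict count check along the lines of the DataFrame behavior could be sketched in plain Python as below. This is only an illustration: `check_name_count` is a hypothetical helper, not part of Polars, and the error type and message are assumptions rather than what a real implementation would raise.

```python
def check_name_count(fields, names):
    """Return `names` as a list, or raise if its length differs from `fields`.

    Hypothetical helper: mirrors the strictness of `df.columns = [...]`,
    but uses a plain ValueError instead of Polars' ShapeError.
    """
    names = list(names)
    if len(names) != len(fields):
        raise ValueError(
            f"{len(names)} field names provided for a struct with {len(fields)} fields"
        )
    return names


# Field names of the example struct above
fields = ["Movie", "Theatre", "Avg_Rating"]

# Matching count: the names pass through unchanged
strict_names = check_name_count(fields, ["Film", "State", "Value"])
```

With a Series, the current field names should be available as `rating_Series.struct.fields`, so the validated list could be passed on to `rating_Series.struct.rename_fields(...)`.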

Add option to provide a mapping to adjust only selected names

Example: rename_fields({'Movie': 'Film', 'Theatre': 'State'})
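Until a dict is accepted directly, a mapping can be turned into the full positional list that rename_fields expects today. The sketch below is a workaround under assumptions: `names_from_mapping` is a hypothetical helper, and rejecting unknown keys is a design choice made here, not something the proposal specifies.

```python
def names_from_mapping(fields, mapping):
    """Build a full list of field names, renaming only the keys in `mapping`.

    Hypothetical helper: fields missing from `mapping` keep their name.
    Keys that match no existing field raise, to catch typos early.
    """
    unknown = [old for old in mapping if old not in fields]
    if unknown:
        raise KeyError(f"fields not found in struct: {unknown}")
    return [mapping.get(name, name) for name in fields]


# Field names of the example struct above
fields = ["Movie", "Theatre", "Avg_Rating"]

# Only the selected fields are renamed; Avg_Rating is kept as-is
new_names = names_from_mapping(fields, {"Movie": "Film", "Theatre": "State"})
```

With a Series this could be used as `s.struct.rename_fields(names_from_mapping(s.struct.fields, mapping))`, which always passes exactly one name per field.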

Julian-J-S added the enhancement label on Aug 29, 2023

@cmdlineluser
Contributor

cmdlineluser commented Aug 29, 2023

Too few should error: #9052 (comment)

Too few names dropping missing columns is not intended: #9052 (comment)

@ion-elgreco
Contributor

Too few should error: #9052 (comment)

Why, though? A normal rename can do partial renames. Shouldn't struct.rename_fields behave similarly and keep the other fields under their existing names when no mapping has been passed for them?

@deanm0000
Collaborator

Too few should error: #9052 (comment)

Why, though? A normal rename can do partial renames. Shouldn't struct.rename_fields behave similarly and keep the other fields under their existing names when no mapping has been passed for them?

It seems the balance is between there being a use case for wanting to rename the first n fields positionally vs simply accidentally feeding too few arguments to the rename.

I know I'm much more likely to be in the latter camp than the former. Additionally, if you are in the former camp and get an error here, you'll know how to address it.
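For the former camp, an explicit opt-in could look roughly like the sketch below: the first n fields are renamed positionally and the rest keep their names, so nothing is dropped by accident. `rename_leading` is a hypothetical helper, not a Polars function, and raising on too many names is an assumption.

```python
def rename_leading(fields, names):
    """Rename the first len(names) fields positionally and keep the rest.

    Hypothetical helper: makes "rename only the first n fields" an explicit
    choice instead of a side effect of passing too few names.
    """
    names = list(names)
    if len(names) > len(fields):
        raise ValueError(
            f"{len(names)} names provided for a struct with {len(fields)} fields"
        )
    # Pad with the existing names of the fields that were not renamed
    return names + list(fields[len(names):])


# Field names of the example struct from the issue
fields = ["Movie", "Theatre", "Avg_Rating"]

# Only the first field is renamed; the remaining two are preserved
padded_names = rename_leading(fields, ["Film"])
```

The padded list has one name per field, so passing it to rename_fields would work under either the current behavior or a strict length check.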

@DGolubets

Would be great to have rename_fields accept a dict.

@cmdlineluser
Contributor

@DGolubets .name.map_fields() has since been added which can help if you're using frames.

df = rating_Series.to_frame()

df.schema["ratings"]
# Struct({'Movie': String, 'Theatre': String, 'Avg_Rating': Float64})

df.with_columns(
   pl.col("ratings").name.map_fields(lambda f:
       {"Movie": "Film", "Theatre": "State"}.get(f, f)
   )
).schema["ratings"]
# Struct({'Film': String, 'State': String, 'Avg_Rating': Float64})

@DGolubets

@cmdlineluser Great!

@DeflateAwning
Contributor

+1 on .rename_fields() supporting a dict argument
