Add rename() for DataFrame and LazyFrame #239

Merged
merged 9 commits into from
Jun 11, 2023

etiennebacher
Collaborator

I'm not sure which arguments we should have here. I used existing and new, as in the Rust implementation, but the Python implementation takes a mapping, which in R would be a named list like list(old = "new"), and I also like that.

What do you think?

library(polars)

pl$
  DataFrame(mtcars)$
  rename(c("mpg", "hp"), c("miles_per_gallon", "horsepower"))
#> shape: (32, 11)
#> ┌──────────────────┬─────┬───────┬────────────┬───┬─────┬─────┬──────┬──────┐
#> │ miles_per_gallon ┆ cyl ┆ disp  ┆ horsepower ┆ … ┆ vs  ┆ am  ┆ gear ┆ carb │
#> │ ---              ┆ --- ┆ ---   ┆ ---        ┆   ┆ --- ┆ --- ┆ ---  ┆ ---  │
#> │ f64              ┆ f64 ┆ f64   ┆ f64        ┆   ┆ f64 ┆ f64 ┆ f64  ┆ f64  │
#> ╞══════════════════╪═════╪═══════╪════════════╪═══╪═════╪═════╪══════╪══════╡
#> │ 21.0             ┆ 6.0 ┆ 160.0 ┆ 110.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 4.0  ┆ 4.0  │
#> │ 21.0             ┆ 6.0 ┆ 160.0 ┆ 110.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 4.0  ┆ 4.0  │
#> │ 22.8             ┆ 4.0 ┆ 108.0 ┆ 93.0       ┆ … ┆ 1.0 ┆ 1.0 ┆ 4.0  ┆ 1.0  │
#> │ 21.4             ┆ 6.0 ┆ 258.0 ┆ 110.0      ┆ … ┆ 1.0 ┆ 0.0 ┆ 3.0  ┆ 1.0  │
#> │ …                ┆ …   ┆ …     ┆ …          ┆ … ┆ …   ┆ …   ┆ …    ┆ …    │
#> │ 15.8             ┆ 8.0 ┆ 351.0 ┆ 264.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0  ┆ 4.0  │
#> │ 19.7             ┆ 6.0 ┆ 145.0 ┆ 175.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0  ┆ 6.0  │
#> │ 15.0             ┆ 8.0 ┆ 301.0 ┆ 335.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0  ┆ 8.0  │
#> │ 21.4             ┆ 4.0 ┆ 121.0 ┆ 109.0      ┆ … ┆ 1.0 ┆ 1.0 ┆ 4.0  ┆ 2.0  │
#> └──────────────────┴─────┴───────┴────────────┴───┴─────┴─────┴──────┴──────┘

Created on 2023-06-10 with reprex v2.0.2

Collaborator

@eitsupi eitsupi left a comment

Awesome! Looks nice.

@etiennebacher
Collaborator Author

Actually I definitely prefer using a named list, it's much easier to see what is replaced by what when we want to rename a lot of variables:

library(polars)

pl$
  DataFrame(mtcars)$
  rename(
    list(
      mpg = "miles_per_gallon", 
      hp = "horsepower",
      gear = "Number of forward gears",
      carb = "Number of carburators"
    )
  )
#> shape: (32, 11)
#> ┌────────────────────┬─────┬───────┬────────────┬───┬─────┬─────┬────────────────────┬─────────────┐
#> │ miles_per_gallon   ┆ cyl ┆ disp  ┆ horsepower ┆ … ┆ vs  ┆ am  ┆ Number of forward  ┆ Number of   │
#> │ ---                ┆ --- ┆ ---   ┆ ---        ┆   ┆ --- ┆ --- ┆ gears              ┆ carburators │
#> │ f64                ┆ f64 ┆ f64   ┆ f64        ┆   ┆ f64 ┆ f64 ┆ ---                ┆ ---         │
#> │                    ┆     ┆       ┆            ┆   ┆     ┆     ┆ f64                ┆ f64         │
#> ╞════════════════════╪═════╪═══════╪════════════╪═══╪═════╪═════╪════════════════════╪═════════════╡
#> │ 21.0               ┆ 6.0 ┆ 160.0 ┆ 110.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 4.0                ┆ 4.0         │
#> │ 21.0               ┆ 6.0 ┆ 160.0 ┆ 110.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 4.0                ┆ 4.0         │
#> │ 22.8               ┆ 4.0 ┆ 108.0 ┆ 93.0       ┆ … ┆ 1.0 ┆ 1.0 ┆ 4.0                ┆ 1.0         │
#> │ 21.4               ┆ 6.0 ┆ 258.0 ┆ 110.0      ┆ … ┆ 1.0 ┆ 0.0 ┆ 3.0                ┆ 1.0         │
#> │ …                  ┆ …   ┆ …     ┆ …          ┆ … ┆ …   ┆ …   ┆ …                  ┆ …           │
#> │ 15.8               ┆ 8.0 ┆ 351.0 ┆ 264.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0                ┆ 4.0         │
#> │ 19.7               ┆ 6.0 ┆ 145.0 ┆ 175.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0                ┆ 6.0         │
#> │ 15.0               ┆ 8.0 ┆ 301.0 ┆ 335.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0                ┆ 8.0         │
#> │ 21.4               ┆ 4.0 ┆ 121.0 ┆ 109.0      ┆ … ┆ 1.0 ┆ 1.0 ┆ 4.0                ┆ 2.0         │
#> └────────────────────┴─────┴───────┴────────────┴───┴─────┴─────┴────────────────────┴─────────────┘

Created on 2023-06-10 with reprex v2.0.2

@eitsupi
Collaborator

eitsupi commented Jun 10, 2023

> Actually I definitely prefer using a named list, it's much easier to see what is replaced by what when we want to rename a lot of variables:

I agree with that, but I think the old = "new" style is confusing...
I think new = "old" is better.

@eitsupi eitsupi requested review from eitsupi and removed request for eitsupi June 10, 2023 14:30
@etiennebacher
Collaborator Author

etiennebacher commented Jun 10, 2023

I prefer old = "new", but other packages like dplyr use new = old, so I suppose it's better for consistency to also use new = old.

I'll change it

@etiennebacher
Collaborator Author

library(polars)

pl$
  DataFrame(mtcars)$
  rename(
    miles_per_gallon = "mpg", 
    horsepower = "hp",
    `Number of forward gears` = "gear",
    `Number of carburators` = "carb"
  )
#> shape: (32, 11)
#> ┌────────────────────┬─────┬───────┬────────────┬───┬─────┬─────┬────────────────────┬─────────────┐
#> │ miles_per_gallon   ┆ cyl ┆ disp  ┆ horsepower ┆ … ┆ vs  ┆ am  ┆ Number of forward  ┆ Number of   │
#> │ ---                ┆ --- ┆ ---   ┆ ---        ┆   ┆ --- ┆ --- ┆ gears              ┆ carburators │
#> │ f64                ┆ f64 ┆ f64   ┆ f64        ┆   ┆ f64 ┆ f64 ┆ ---                ┆ ---         │
#> │                    ┆     ┆       ┆            ┆   ┆     ┆     ┆ f64                ┆ f64         │
#> ╞════════════════════╪═════╪═══════╪════════════╪═══╪═════╪═════╪════════════════════╪═════════════╡
#> │ 21.0               ┆ 6.0 ┆ 160.0 ┆ 110.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 4.0                ┆ 4.0         │
#> │ 21.0               ┆ 6.0 ┆ 160.0 ┆ 110.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 4.0                ┆ 4.0         │
#> │ 22.8               ┆ 4.0 ┆ 108.0 ┆ 93.0       ┆ … ┆ 1.0 ┆ 1.0 ┆ 4.0                ┆ 1.0         │
#> │ 21.4               ┆ 6.0 ┆ 258.0 ┆ 110.0      ┆ … ┆ 1.0 ┆ 0.0 ┆ 3.0                ┆ 1.0         │
#> │ …                  ┆ …   ┆ …     ┆ …          ┆ … ┆ …   ┆ …   ┆ …                  ┆ …           │
#> │ 15.8               ┆ 8.0 ┆ 351.0 ┆ 264.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0                ┆ 4.0         │
#> │ 19.7               ┆ 6.0 ┆ 145.0 ┆ 175.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0                ┆ 6.0         │
#> │ 15.0               ┆ 8.0 ┆ 301.0 ┆ 335.0      ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0                ┆ 8.0         │
#> │ 21.4               ┆ 4.0 ┆ 121.0 ┆ 109.0      ┆ … ┆ 1.0 ┆ 1.0 ┆ 4.0                ┆ 2.0         │
#> └────────────────────┴─────┴───────┴────────────┴───┴─────┴─────┴────────────────────┴─────────────┘

Created on 2023-06-10 with reprex v2.0.2
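
The new = "old" calling convention shown above still has to be mapped back to the existing and new pair mentioned at the start of the thread. A minimal base-R sketch of that translation follows. It assumes a hypothetical helper named split_rename_args(), which is not part of polars and is not necessarily how this PR implements it.

```r
# Hypothetical helper, not part of polars: split `new = "old"` pairs into
# the `existing` / `new` vectors that a rename(existing, new) call expects.
split_rename_args <- function(...) {
  mapping <- list(...)
  new <- names(mapping)
  # Every argument must be named, as in `new_name = "old_name"`.
  if (is.null(new) || any(new == "")) {
    stop("all arguments must be named, as in `new_name = \"old_name\"`")
  }
  # Every value must be a single string holding the existing column name.
  is_string <- vapply(
    mapping,
    function(x) is.character(x) && length(x) == 1L,
    logical(1)
  )
  if (!all(is_string)) {
    stop("each value must be a single existing column name")
  }
  list(existing = unlist(mapping, use.names = FALSE), new = new)
}

args <- split_rename_args(miles_per_gallon = "mpg", horsepower = "hp")
args$existing  # the old names: c("mpg", "hp")
args$new       # the new names: c("miles_per_gallon", "horsepower")
```

Keeping the validation in R means a wrong call fails with a readable message before anything is passed on to the Rust side.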

@etiennebacher
Collaborator Author

@eitsupi I'm just re-requesting your review since there were changes since you approved

@sorhawell
Collaborator

r-polars has, in general, one syntax deviation from py-polars. Wherever py-polars takes a list or a dict as its first input, e.g. pl.DataFrame({"a":[1,2,3]}), both pl$DataFrame(a=1:3) and pl$DataFrame(list(a=1:3)) are valid. Likewise, df.select([pl.col("a")]) becomes df$select(pl$col("a")). If named_expr is enabled, it is also possible to write df$select(new_a=pl$col("a")) instead of using $alias("new_a"). But the latter has not stabilized in py-polars because of wildcards, last I checked.
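
The dual calling style described in this comment, where named arguments and a single list are treated the same way, can be sketched in base R. unpack_dots() is a hypothetical name used only for illustration; this is an assumption about the general pattern, not the r-polars implementation.

```r
# Hypothetical helper: accept either f(a = 1:3) or f(list(a = 1:3)).
unpack_dots <- function(...) {
  args <- list(...)
  # A single unnamed list argument is treated as the full set of arguments.
  if (length(args) == 1L && is.null(names(args)) && is.list(args[[1L]])) {
    args <- args[[1L]]
  }
  args
}

# Both call styles produce the same named list.
same <- identical(unpack_dots(a = 1:3), unpack_dots(list(a = 1:3)))
same
#> [1] TRUE
```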

@etiennebacher
Collaborator Author

so @sorhawell are you fine with the syntax in the last post?

@sorhawell
Collaborator

> so @sorhawell are you fine with the syntax in the last post?

Oh yes I missed that :) sry! Looks great :)

I will take a quick look at code also

@sorhawell
Collaborator

forgot to add something

Collaborator

@eitsupi eitsupi left a comment

Awesome! Thanks!

@eitsupi eitsupi merged commit 9fc2772 into main Jun 11, 2023
8 checks passed
@eitsupi eitsupi deleted the dataframe_rename branch June 11, 2023 01:17