feat: `<DataFrame>$partition_by()` #898

eitsupi · 2024-03-05T15:24:08Z

Close #891

Rather than adding a method to GroupBy, a partition_by method on DataFrame is perhaps sufficient.

library(polars)

df <- readr::read_csv(I("
id,timestamp,timezone
1,2019-01-01T00:00:00Z,UTC
2,2019-01-01T00:00:00Z,Asia/Tokyo
3,2019-01-01T20:00:00Z,UTC
4,2019-01-01T20:00:00Z,Asia/Tokyo
"), show_col_types = FALSE)

as_polars_df(df)$partition_by(
  "timezone",
  maintain_order = FALSE, as_nested_list = TRUE
) |>
  lapply(
    \(l) l$data$with_columns(
      naive_time = pl$col("timestamp")$dt$convert_time_zone(
        l$key$timezone
      )$dt$replace_time_zone(NULL)
    )
  ) |>
  pl$concat()
#> shape: (4, 4)
#> ┌─────┬─────────────────────────┬────────────┬─────────────────────┐
#> │ id  ┆ timestamp               ┆ timezone   ┆ naive_time          │
#> │ --- ┆ ---                     ┆ ---        ┆ ---                 │
#> │ f64 ┆ datetime[ms, UTC]       ┆ str        ┆ datetime[ms]        │
#> ╞═════╪═════════════════════════╪════════════╪═════════════════════╡
#> │ 2.0 ┆ 2019-01-01 00:00:00 UTC ┆ Asia/Tokyo ┆ 2019-01-01 09:00:00 │
#> │ 4.0 ┆ 2019-01-01 20:00:00 UTC ┆ Asia/Tokyo ┆ 2019-01-02 05:00:00 │
#> │ 1.0 ┆ 2019-01-01 00:00:00 UTC ┆ UTC        ┆ 2019-01-01 00:00:00 │
#> │ 3.0 ┆ 2019-01-01 20:00:00 UTC ┆ UTC        ┆ 2019-01-01 20:00:00 │
#> └─────┴─────────────────────────┴────────────┴─────────────────────┘

Created on 2024-03-09 with reprex v2.1.0
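For comparison, here is a minimal sketch of how the output shape changes with the `include_key` and `as_nested_list` arguments from this PR's signature. The toy data frame is made up for illustration, and the `key`/`data` element names are taken from the example above; treat this as an assumption-laden sketch rather than a reference for the final API.

```r
library(polars)

# Toy data, invented for this sketch
df <- pl$DataFrame(
  id = 1:4,
  timezone = c("UTC", "Asia/Tokyo", "UTC", "Asia/Tokyo")
)

# Default: a flat list of DataFrames, one per distinct key value
flat <- df$partition_by("timezone")
length(flat)

# include_key = FALSE drops the partitioning column from each piece
no_key <- df$partition_by("timezone", include_key = FALSE)
no_key[[1]]$columns

# as_nested_list = TRUE pairs each piece with its key,
# so each element has a `key` and a `data` child
nested <- df$partition_by("timezone", as_nested_list = TRUE)
nested[[1]]$key$timezone
nested[[1]]$data$height
```

The nested form is what lets the `lapply()` call above read `l$key$timezone` without re-deriving the key from the data.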

R/dataframe__frame.R

eitsupi · 2024-03-09T16:00:16Z

+    ...,
+    maintain_order = TRUE,
+    include_key = TRUE,
+    as_nested_list = FALSE) {


Is this name appropriate?

etiennebacher · 2024-03-10T20:45:06Z

Yes I think it is the best equivalent to Python's dict

etiennebacher

Thanks, some comments. Can you also bump NEWS?

etiennebacher · 2024-03-10T20:53:15Z

R/dotdotdot.R

+#' Convert dots to a character vector of column names
+#' @param .df [RPolarsDataFrame]
+#' @param ... Arguments to pass to [`pl$col()`][pl_col]
+#' @noRd
+dots_to_colnames = function(.df, ..., .call = sys.call(1L)) {
+  result(pl$DataFrame(schema = .df$schema)$select(pl$col(...))$columns) |>
+    unwrap(call = .call)
+}


I'm surprised this wasn't needed before. I think DataFrame$drop() should accept the same kind of input as DataFrame$partition_by(), based on the py-polars API.

eitsupi · 2024-03-11

@etiennebacher Could you update $drop()?

eitsupi · 2024-03-10T23:41:20Z

@etiennebacher Thanks for your review.
I think I have addressed your comments.
(I can't merge because the PR hasn't been approved. You can approve with minor comments instead of requesting changes, which reduces everyone's workload.)

eitsupi force-pushed the partition-by branch 2 times, most recently from fa79a0b to bb344c8 Compare March 9, 2024 15:57

eitsupi requested a review from etiennebacher March 9, 2024 15:58

eitsupi commented Mar 9, 2024

eitsupi marked this pull request as ready for review March 9, 2024 16:00

eitsupi changed the title ~~WIP feat: <DataFrame>$partition_by() [skip ci]~~ feat: <DataFrame>$partition_by() Mar 9, 2024

feat: <DataFrame>$partition_by()

c658187

eitsupi force-pushed the partition-by branch from bb344c8 to c658187 Compare March 9, 2024 16:48

eitsupi added 2 commits March 9, 2024 17:04

test: fix test cases

06430ff

chore: rename the argument for to match Python Polars

4fe6d65

etiennebacher requested changes Mar 10, 2024

eitsupi and others added 4 commits March 11, 2024 07:42

Apply suggestions from code review [skip ci]

b077eee

Co-authored-by: Etienne Bacher <52219252+etiennebacher@users.noreply.github.com>

docs(news): add the item

8d2ec2f

docs: tweak example

33f3485

docs: regen Rd file

b1693c2

eitsupi mentioned this pull request Mar 11, 2024

feat: bump polars to 0.38.2 #907

Merged

eitsupi added this to the 0.15 milestone Mar 11, 2024

eitsupi mentioned this pull request Mar 11, 2024

0.15.1 release #908

Closed

etiennebacher approved these changes Mar 11, 2024

etiennebacher merged commit a4c9a9f into main Mar 11, 2024
34 checks passed

etiennebacher deleted the partition-by branch March 11, 2024 11:42

etiennebacher mentioned this pull request Mar 11, 2024

Update $drop() input #913

Closed
