<GroupBy>$to_list() or something similar #891

Closed
eitsupi opened this issue Mar 3, 2024 · 3 comments · Fixed by #898
@eitsupi
Collaborator

eitsupi commented Mar 3, 2024

It might be useful to provide a function that splits a DataFrame by group, like polars.dataframe.group_by.GroupBy.__iter__ in Python.

My use case is adding a naive time column holding the local time of each row, for a table that has a UTC time column and a column of local time zone strings.
Currently the function that changes the time zone does not accept an Expr as input, so I need to split the DataFrame by time zone string, add the column to each piece, and then concatenate the pieces back into one DataFrame.

With the clock package:

df <- readr::read_csv(I("
id,timestamp,timezone
1,2019-01-01T00:00:00Z,UTC
2,2019-01-01T00:00:00Z,Asia/Tokyo
3,2019-01-01T20:00:00Z,UTC
4,2019-01-01T20:00:00Z,Asia/Tokyo
"), show_col_types = FALSE)

df |>
  dplyr::mutate(
    sys_time = clock::as_sys_time(timestamp),
    offset = clock::sys_time_info(sys_time, timezone)$offset,
    naive_time = clock::as_naive_time(sys_time + offset)
  ) |>
  dplyr::select(id, timestamp, timezone, naive_time)
#> # A tibble: 4 × 4
#>      id timestamp           timezone   naive_time
#>   <dbl> <dttm>              <chr>      <naive<second>>
#> 1     1 2019-01-01 00:00:00 UTC        2019-01-01T00:00:00
#> 2     2 2019-01-01 00:00:00 Asia/Tokyo 2019-01-01T09:00:00
#> 3     3 2019-01-01 20:00:00 UTC        2019-01-01T20:00:00
#> 4     4 2019-01-01 20:00:00 Asia/Tokyo 2019-01-02T05:00:00

Created on 2024-03-03 with reprex v2.0.2

With polars now...

library(polars)

df <- readr::read_csv(I("
id,timestamp,timezone
1,2019-01-01T00:00:00Z,UTC
2,2019-01-01T00:00:00Z,Asia/Tokyo
3,2019-01-01T20:00:00Z,UTC
4,2019-01-01T20:00:00Z,Asia/Tokyo
"), show_col_types = FALSE)

pldf <- as_polars_df(df)

out <- list()

for (tz in unique(df$timezone)) {
  out[[tz]] <- pldf$filter(pl$col("timezone") == tz)$with_columns(
    naive_time = pl$col("timestamp")$dt$convert_time_zone(tz)$dt$replace_time_zone(NULL)
  )
}

pl$concat(out)
#> shape: (4, 4)
#> ┌─────┬─────────────────────────┬────────────┬─────────────────────┐
#> │ id  ┆ timestamp               ┆ timezone   ┆ naive_time          │
#> │ --- ┆ ---                     ┆ ---        ┆ ---                 │
#> │ f64 ┆ datetime[ms, UTC]       ┆ str        ┆ datetime[ms]        │
#> ╞═════╪═════════════════════════╪════════════╪═════════════════════╡
#> │ 1.0 ┆ 2019-01-01 00:00:00 UTC ┆ UTC        ┆ 2019-01-01 00:00:00 │
#> │ 3.0 ┆ 2019-01-01 20:00:00 UTC ┆ UTC        ┆ 2019-01-01 20:00:00 │
#> │ 2.0 ┆ 2019-01-01 00:00:00 UTC ┆ Asia/Tokyo ┆ 2019-01-01 09:00:00 │
#> │ 4.0 ┆ 2019-01-01 20:00:00 UTC ┆ Asia/Tokyo ┆ 2019-01-02 05:00:00 │
#> └─────┴─────────────────────────┴────────────┴─────────────────────┘

Created on 2024-03-03 with reprex v2.0.2
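For reference, the split-and-concatenate loop above could be wrapped in a small helper until something like a GroupBy iterator is available. This is only a sketch: `add_naive_time` and its argument names are hypothetical, and it relies solely on the polars calls already used in the loop above (`$filter()`, `$with_columns()`, `$dt$convert_time_zone()`, `$dt$replace_time_zone()`, `pl$concat()`) plus `as.data.frame()` to read the distinct time zone strings.

```r
library(polars)

# Hypothetical helper (not part of polars): filters the DataFrame once per
# distinct time zone string, adds the naive local time to each piece, and
# concatenates the pieces back together.
add_naive_time <- function(pldf, time_col = "timestamp", tz_col = "timezone") {
  # Distinct time zone strings, read back into R
  tzs <- unique(as.data.frame(pldf)[[tz_col]])

  pieces <- lapply(tzs, function(tz) {
    pldf$filter(pl$col(tz_col) == tz)$with_columns(
      # Convert to the local zone, then drop the zone to get a naive datetime
      naive_time = pl$col(time_col)$dt$convert_time_zone(tz)$dt$replace_time_zone(NULL)
    )
  })

  pl$concat(pieces)
}
```

Calling `add_naive_time(pldf)` on the DataFrame above should give the same result as the loop. As in the output above, the rows come back grouped by time zone rather than in their original order, and the DataFrame is filtered once per time zone.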

@eitsupi added the enhancement (New feature or request) label on Mar 3, 2024
@etiennebacher
Collaborator

Yeah, not ideal indeed, but how do they do that in py-polars?

@eitsupi
Collaborator Author

eitsupi commented Mar 5, 2024

> Yeah, not ideal indeed, but how do they do that in py-polars?

There is no direct way to do this in py-polars either; I believe we would need to use the GroupBy iterator there.
(The clock package handles this very well, in a vectorized way.)

@eitsupi
Collaborator Author

eitsupi commented Mar 5, 2024

@eitsupi changed the title from "<GroupBy>$to_list()" to "<GroupBy>$partition_by() or something similar" on Mar 5, 2024
@eitsupi changed the title from "<GroupBy>$partition_by() or something similar" to "<GroupBy>$to_list() or something similar" on Mar 5, 2024
@eitsupi self-assigned this on Mar 5, 2024