Bump rust-polars to 0.41 #1147

Merged
merged 51 commits into from
Jul 1, 2024
Changes from 46 commits
408b82f
init
etiennebacher Jun 23, 2024
9949b79
fix read_json
etiennebacher Jun 23, 2024
071a5ba
fix feature names [skip ci]
etiennebacher Jun 23, 2024
b545444
remove offset arg from dt_truncate and dt_round [skip ci]
etiennebacher Jun 23, 2024
f983eb3
add arg normalize in value_counts [skip ci]
etiennebacher Jun 23, 2024
7fe3b26
fix [skip ci]
etiennebacher Jun 23, 2024
a46d2f7
remove most args of top_k and bottom_k
etiennebacher Jun 23, 2024
48dac1a
remove time related args from date_range functions [skip ci]
etiennebacher Jun 23, 2024
d5c8483
add normalize arg in value_counts for series too [skip ci]
etiennebacher Jun 23, 2024
03289f4
fix series arithmetic [skip ci]
etiennebacher Jun 23, 2024
fa4f382
split replace and replace_strict [skip ci]
etiennebacher Jun 23, 2024
ab2b68c
remove unnecessary unsafe tag [skip ci]
etiennebacher Jun 23, 2024
f45ecbb
add order_by args to over [skip ci]
etiennebacher Jun 23, 2024
b58b005
fix compil for map_batches variants
etiennebacher Jun 23, 2024
cc9e6ca
rename str_concat to str_join [skip ci]
etiennebacher Jun 23, 2024
e9ff153
robj_to_statistics_options [skip ci]
etiennebacher Jun 23, 2024
ea0b2f1
fix compil for parquet [skip ci]
etiennebacher Jun 23, 2024
51e612f
fix unpivot [skip ci]
etiennebacher Jun 23, 2024
934781a
bump to 0.41.1 [skip ci]
etiennebacher Jun 23, 2024
1870242
Merge branch 'main' into rust-polars-0.41
etiennebacher Jun 23, 2024
700afd6
remove args, rename funs [skip ci]
etiennebacher Jun 23, 2024
6ecab19
more fixes [skip ci]
etiennebacher Jun 23, 2024
a52f1ec
fix handling of stats for parquet [skip ci]
etiennebacher Jun 23, 2024
381a756
bunch of fixes [skip ci]
etiennebacher Jun 23, 2024
7b174f5
snapshot
etiennebacher Jun 23, 2024
bfb59c7
more fixes [skip ci]
etiennebacher Jun 23, 2024
6f619a7
fix slice, test arg "order_by" in $over()
etiennebacher Jun 23, 2024
7dcc8af
minor
etiennebacher Jun 23, 2024
b9f0972
bump rust-version and crate version
etiennebacher Jun 23, 2024
0771056
fix vignettes
etiennebacher Jun 23, 2024
0a21626
Merge branch 'main' into rust-polars-0.41
etiennebacher Jun 24, 2024
d3c8735
bump to 0.41.2
etiennebacher Jun 24, 2024
87e95f3
fix vignette
etiennebacher Jun 24, 2024
a97daa3
try to fix userguide
etiennebacher Jun 26, 2024
18216f5
news
etiennebacher Jun 26, 2024
9445c3d
Merge branch 'main' into rust-polars-0.41
etiennebacher Jun 26, 2024
44576fb
Merge branch 'rust-polars-0.41' of https://github.com/pola-rs/r-polar…
etiennebacher Jun 26, 2024
4f6752c
typo
etiennebacher Jun 26, 2024
2b03b35
whitespace
etiennebacher Jun 26, 2024
d5dff36
remove blank lines in news
etiennebacher Jun 26, 2024
2bf141c
tests for arg 'statistics'
etiennebacher Jun 30, 2024
b4befa8
uncomment map_batches tests
etiennebacher Jun 30, 2024
b810260
typo
etiennebacher Jun 30, 2024
6475353
fix incorrect use of date_range in docs
etiennebacher Jun 30, 2024
2006a5a
fix docs
etiennebacher Jun 30, 2024
f74f3c3
add utils::
etiennebacher Jun 30, 2024
403443b
check length of statistics
etiennebacher Jul 1, 2024
4ad8950
use ".r" instead of "comment" as chunk engine
etiennebacher Jul 1, 2024
6166d27
bump to unreleased version
etiennebacher Jul 1, 2024
74a40b5
fix for cross joins
etiennebacher Jul 1, 2024
99d330f
other fixes for joins
etiennebacher Jul 1, 2024
41 changes: 40 additions & 1 deletion NEWS.md
@@ -2,21 +2,60 @@

## Polars R Package (development version)

Updated rust-polars to 0.41.2 (#1147).

### Breaking changes

- In `$n_chunks()`, the default value of `strategy` now is `"first"` (#1137).
-`$sample()` for Expr and DataFrame (#1136):
- `$sample()` for Expr and DataFrame (#1136):
- the argument `frac` is renamed `fraction`;
- all the arguments except `n` must be named;
- for the Expr method only, the first argument is now `n` (it was already the
case for the DataFrame method);
- for the Expr method only, the default value for `with_replacement` is now
`FALSE` (it was already the case for the DataFrame method).
- `$melt()` had several changes (#1147):
- `$melt()` is renamed `$unpivot()`.
- Some arguments were renamed: `id_vars` is now `index`, `value_vars` is now
`on`.
- The order of arguments has changed: `on` is now first, then `index`. The
order of the other arguments hasn't changed. Note that `on` can be unnamed
but all the other arguments must be named.
- `$pivot()` had several changes (#1147):
- The argument `columns` is renamed `on`.
- The order of arguments has changed: `on` is now first, then `index` and
`values`. The order of the other arguments hasn't changed. Note that `on`
can be unnamed but all the other arguments must be named.
- In `$write_parquet()` and `$sink_parquet()`, the default value of argument
`statistics` is now `TRUE`, and it accepts values other than `TRUE`/`FALSE` (#1147).
- In `$dt$truncate()` and `$dt$round()`, the argument `offset` has been removed.
Use `$dt$offset_by()` after those functions instead (#1147).
- In `$top_k()` and `$bottom_k()` for `Expr`, the arguments `nulls_last`,
`maintain_order` and `multithreaded` have been removed. If any `null` values
are in the top/bottom `k` values, they will always be positioned last (#1147).
- `$replace()` has been split in two functions depending on the desired
behaviour (#1147):
- `$replace()` recodes some values in the column, leaving all other values
unchanged. Compared to the previous version, it doesn't use the arguments
`default` and `return_dtype` anymore.
- `$replace_strict()` replaces all values by different values. If a value
doesn't have a specific mapping, it is replaced by the `default` value.
- `$str$concat()` is deprecated, use `$str$join()` (with the same arguments)
instead (#1147).
- In `pl$date_range()` and `pl$date_ranges()`, the arguments `time_unit` and
`time_zone` have been removed. They were deprecated in previous versions
(#1147).
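
As a rough illustration of the `$unpivot()` and `$pivot()` changes listed above, the sketch below reshapes a small DataFrame with the new argument names. It assumes an R polars build that includes this PR; the data and column names are invented for the example, and the default output column names `variable` and `value` are assumed.

```r
library(polars)

# Hypothetical data, invented for this example
df = pl$DataFrame(
  a = c("x", "y", "z"),
  b = c(1, 3, 5),
  c = c(2, 4, 6)
)

# Previously: df$melt(id_vars = "a", value_vars = c("b", "c"))
# `on` comes first and may be unnamed; `index` must be named.
long = df$unpivot(on = c("b", "c"), index = "a")

# Previously: long$pivot(values = "value", index = "a", columns = "variable")
# `on` comes first and may be unnamed; `index` and `values` must be named.
wide = long$pivot("variable", index = "a", values = "value")

long
wide
```

Passing `on` positionally and naming everything else keeps calls valid under the new argument order.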


### New features

- New method `$has_nulls()` (#1133).
- New method `$list$explode()` (#1139).
- `$over()` gains a new argument `order_by` to specify the order of values
within each group. This is useful when the operation depends on the order of
values, such as `$shift()` (#1147).
- `$value_counts()` gains an argument `normalize` to give relative frequencies
of unique values instead of their count (#1147).
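
The two new arguments described above could be used roughly as follows. This is a minimal sketch assuming an R polars build that includes this PR; the data and column names are invented, and passing a column name as a string to `order_by` is an assumption about the accepted input.

```r
library(polars)

# Hypothetical data, invented for this example
df = pl$DataFrame(
  group = c("a", "a", "a", "b", "b"),
  day = c(3, 1, 2, 2, 1),
  value = c(30, 10, 20, 200, 100)
)

# Shift values within each group, using `day` to define the order
# instead of the row order of the DataFrame.
shifted = df$with_columns(
  pl$col("value")$shift()$over("group", order_by = "day")$alias("previous_value")
)

# Relative frequencies of each unique value instead of raw counts
freqs = df$select(pl$col("group")$value_counts(normalize = TRUE))

shifted
freqs
```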

## Polars R Package 0.17.0

39 changes: 21 additions & 18 deletions R/dataframe__frame.R
@@ -1490,7 +1490,7 @@ DataFrame_join_asof = function(



#' @inherit LazyFrame_melt
#' @inherit LazyFrame_unpivot
#' @keywords DataFrame
#'
#' @return A new `DataFrame`
@@ -1502,25 +1502,26 @@ DataFrame_join_asof = function(
#' c = c(2, 4, 6),
#' d = c(7, 8, 9)
#' )
#' df$melt(id_vars = "a", value_vars = c("b", "c", "d"))
DataFrame_melt = function(
id_vars = NULL,
value_vars = NULL,
#' df$unpivot(index = "a", on = c("b", "c", "d"))
DataFrame_unpivot = function(
on = NULL,
...,
index = NULL,
variable_name = NULL,
value_name = NULL) {
.pr$DataFrame$melt(
self, id_vars %||% character(), value_vars %||% character(),
.pr$DataFrame$unpivot(
self, on %||% character(), index %||% character(),
value_name, variable_name
) |> unwrap("in $melt( ): ")
) |> unwrap("in $unpivot( ): ")
}



#' Pivot data from long to wide
#' @param values Column values to aggregate. Can be multiple columns if the
#' `columns` arguments contains multiple columns as well.
#' `on` argument contains multiple columns as well.
#' @param index One or multiple keys to group by.
#' @param columns Name of the column(s) whose values will be used as the header
#' @param on Name of the column(s) whose values will be used as the header
#' of the output DataFrame.
#' @param ... Not used.
#' @param aggregate_function One of:
@@ -1544,7 +1545,7 @@ DataFrame_melt = function(
#' df
#'
#' df$pivot(
#' values = "baz", index = "foo", columns = "bar"
#' values = "baz", index = "foo", on = "bar"
#' )
#'
#' # Run an expression as aggregation function
@@ -1557,15 +1558,15 @@ DataFrame_melt = function(
#'
#' df$pivot(
#' index = "col1",
#' columns = "col2",
#' on = "col2",
#' values = "col3",
#' aggregate_function = pl$element()$tanh()$mean()
#' )
DataFrame_pivot = function(
values,
index,
columns,
on,
...,
index,
values,
aggregate_function = NULL,
maintain_order = TRUE,
sort_columns = FALSE,
@@ -1586,7 +1587,7 @@ DataFrame_pivot = function(
)) |>
# run pivot when valid aggregate_expr
and_then(\(aggregate_expr) .pr$DataFrame$pivot_expr(
self, index, columns, values, maintain_order, sort_columns, aggregate_expr, separator
self, on, index, values, maintain_order, sort_columns, aggregate_expr, separator
)) |>
# unwrap and add method context name
unwrap("in $pivot():")
@@ -1736,7 +1737,7 @@ DataFrame_describe = function(percentiles = c(.25, .75), interpolation = "neares
)$
unnest("fields")$
drop("column")$
pivot(index = "statistic", columns = "variable", values = "column_0")$
pivot(index = "statistic", on = "variable", values = "column_0")$
with_columns(statistic = pl$lit(metrics))
}) |>
uw()
@@ -2031,9 +2032,11 @@ DataFrame_write_parquet = function(
...,
compression = "zstd",
compression_level = 3,
statistics = FALSE,
statistics = TRUE,
row_group_size = NULL,
data_pagesize_limit = NULL) {
statistics = translate_statistics(statistics) |>
unwrap("in $write_parquet():")
.pr$DataFrame$write_parquet(
self,
file,
88 changes: 29 additions & 59 deletions R/expr__datetime.R
@@ -2,8 +2,8 @@
#' @description Divide the date/datetime range into buckets.
#' Each date/datetime is mapped to the start of its bucket.
#'
#' @param every string encoding duration see details.
#' @param offset optional string encoding duration see details.
#' @param every Either an Expr or a string indicating a column name or a
#' duration (see Details).
#'
#' @details The ``every`` argument is created with the
#' following string language:
@@ -20,22 +20,18 @@
#' These strings can be combined:
#' - 3d12h4m25s # 3 days, 12 hours, 4 minutes, and 25 seconds
#' @return Date/Datetime expr
#' @keywords ExprDT
#' @aliases (Expr)$dt$truncate
#' @examples
#' t1 = as.POSIXct("3040-01-01", tz = "GMT")
#' t2 = t1 + as.difftime(25, units = "secs")
#' s = pl$date_range(t1, t2, interval = "2s", time_unit = "ms")
#' s = pl$datetime_range(t1, t2, interval = "2s", time_unit = "ms")
#'
#' # use a dt namespace function
#' df = pl$DataFrame(datetime = s)$with_columns(
#' pl$col("datetime")$dt$truncate("4s")$alias("truncated_4s"),
#' pl$col("datetime")$dt$truncate("4s", offset("3s"))$alias("truncated_4s_offset_2s")
#' pl$col("datetime")$dt$truncate("4s")$alias("truncated_4s")
#' )
#' df
ExprDT_truncate = function(every, offset = NULL) {
offset = parse_as_polars_duration_string(offset, default = "0ns")
.pr$Expr$dt_truncate(self, every, offset) |>
ExprDT_truncate = function(every) {
every = parse_as_polars_duration_string(every, default = "0ns")
.pr$Expr$dt_truncate(self, every) |>
unwrap("in $dt$truncate()")
}
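
With `offset` removed from `$dt$truncate()`, the earlier two-argument call can be rewritten by chaining `$dt$offset_by()` after the truncation, as the NEWS entry suggests. The sketch below assumes an R polars build that includes this PR; the start date and column names are invented for the example.

```r
library(polars)

t1 = as.POSIXct("2024-01-01", tz = "GMT")
t2 = t1 + as.difftime(25, units = "secs")
s = pl$datetime_range(t1, t2, interval = "2s", time_unit = "ms")

df = pl$DataFrame(datetime = s)$with_columns(
  pl$col("datetime")$dt$truncate("4s")$alias("truncated_4s"),
  # Previously: $dt$truncate("4s", offset = "3s")
  pl$col("datetime")$dt$truncate("4s")$dt$offset_by("3s")$alias("truncated_4s_offset_3s")
)
df
```

The same pattern should apply to `$dt$round()`, whose `offset` argument is removed in the same way.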

@@ -46,46 +42,20 @@ ExprDT_truncate = function(every, offset = NULL) {
#' Each date/datetime in the second half of the interval
#' is mapped to the end of its bucket.
#'
#' @inherit ExprDT_truncate params details return
#'
#' @param every string encoding duration see details.
#' @param offset optional string encoding duration see details.
#'
#' @details The ``every`` and ``offset`` arguments are created with the
#' following string language:
#' - 1ns # 1 nanosecond
#' - 1us # 1 microsecond
#' - 1ms # 1 millisecond
#' - 1s # 1 second
#' - 1m # 1 minute
#' - 1h # 1 hour
#' - 1d # 1 day
#' - 1w # 1 calendar week
#' - 1mo # 1 calendar month
#' - 1y # 1 calendar year
#' These strings can be combined:
#' - 3d12h4m25s # 3 days, 12 hours, 4 minutes, and 25 seconds
#'
#' This functionality is currently experimental and may
#' change without it being considered a breaking change.
#'
#' @return Date/Datetime expr
#' @keywords ExprDT
#' @aliases (Expr)$dt$round
#' @examples
#' t1 = as.POSIXct("3040-01-01", tz = "GMT")
#' t2 = t1 + as.difftime(25, units = "secs")
#' s = pl$date_range(t1, t2, interval = "2s", time_unit = "ms")
#' s = pl$datetime_range(t1, t2, interval = "2s", time_unit = "ms")
#'
#' # use a dt namespace function
#' df = pl$DataFrame(datetime = s)$with_columns(
#' pl$col("datetime")$dt$truncate("4s")$alias("truncated_4s"),
#' pl$col("datetime")$dt$truncate("4s", offset("3s"))$alias("truncated_4s_offset_2s")
#' pl$col("datetime")$dt$round("4s")$alias("rounded_4s")
#' )
#' df
ExprDT_round = function(every, offset = NULL) {
ExprDT_round = function(every) {
every = parse_as_polars_duration_string(every, default = "0ns")
offset = parse_as_polars_duration_string(offset, default = "0ns")
.pr$Expr$dt_round(self, every, offset) |>
.pr$Expr$dt_round(self, every) |>
unwrap("in $dt$round()")
}

@@ -370,7 +340,7 @@ ExprDT_ordinal_day = function() {
#' @aliases (Expr)$dt$hour
#' @examples
#' df = pl$DataFrame(
#' date = pl$date_range(
#' date = pl$datetime_range(
#' as.Date("2020-12-25"),
#' as.Date("2021-1-05"),
#' interval = "1d2h",
@@ -395,7 +365,7 @@ ExprDT_hour = function() {
#' @aliases (Expr)$dt$minute
#' @examples
#' df = pl$DataFrame(
#' date = pl$date_range(
#' date = pl$datetime_range(
#' as.Date("2020-12-25"),
#' as.Date("2021-1-05"),
#' interval = "1d5s",
@@ -556,7 +526,7 @@ ExprDT_epoch = function(tu = c("us", "ns", "ms", "s", "d")) {
#' @aliases (Expr)$dt$timestamp
#' @examples
#' df = pl$DataFrame(
#' date = pl$date_range(
#' date = pl$datetime_range(
#' start = as.Date("2001-1-1"),
#' end = as.Date("2001-1-3"),
#' interval = "1d1s"
@@ -585,7 +555,7 @@ ExprDT_timestamp = function(tu = c("ns", "us", "ms")) {
#' @aliases (Expr)$dt$with_time_unit
#' @examples
#' df = pl$DataFrame(
#' date = pl$date_range(
#' date = pl$datetime_range(
#' start = as.Date("2001-1-1"),
#' end = as.Date("2001-1-3"),
#' interval = "1d1s"
@@ -615,7 +585,7 @@ ExprDT_with_time_unit = function(tu = c("ns", "us", "ms")) {
#' @aliases (Expr)$dt$cast_time_unit
#' @examples
#' df = pl$DataFrame(
#' date = pl$date_range(
#' date = pl$datetime_range(
#' start = as.Date("2001-1-1"),
#' end = as.Date("2001-1-3"),
#' interval = "1d1s"
@@ -641,10 +611,10 @@ ExprDT_cast_time_unit = function(tu = c("ns", "us", "ms")) {
#' @return Expr of i64
#' @examples
#' df = pl$DataFrame(
#' date = pl$date_range(
#' date = pl$datetime_range(
#' as.POSIXct("2020-03-01", tz = "UTC"),
#' as.POSIXct("2020-05-01", tz = "UTC"),
#' "1mo"
#' "1mo1s"
#' )
#' )
#'
@@ -681,10 +651,10 @@ ExprDT_convert_time_zone = function(time_zone) {
#' @aliases (Expr)$dt$replace_time_zone
#' @examples
#' df1 = pl$DataFrame(
#' london_timezone = pl$date_range(
#' london_timezone = pl$datetime_range(
#' as.POSIXct("2020-03-01", tz = "UTC"),
#' as.POSIXct("2020-07-01", tz = "UTC"),
#' "1mo"
#' "1mo1s"
#' )$dt$convert_time_zone("Europe/London")
#' )
#'
@@ -729,10 +699,10 @@ ExprDT_replace_time_zone = function(
#' @return Expr of i64
#' @examples
#' df = pl$DataFrame(
#' date = pl$date_range(
#' date = pl$datetime_range(
#' start = as.Date("2020-3-1"),
#' end = as.Date("2020-5-1"),
#' interval = "1mo"
#' interval = "1mo1s"
#' )
#' )
#' df$select(
@@ -791,7 +761,7 @@ ExprDT_total_minutes = function() {
#'
#' @return Expr of i64
#' @examples
#' df = pl$DataFrame(date = pl$date_range(
#' df = pl$DataFrame(date = pl$datetime_range(
#' start = as.POSIXct("2020-1-1", tz = "GMT"),
#' end = as.POSIXct("2020-1-1 00:04:00", tz = "GMT"),
#' interval = "1m"
@@ -810,7 +780,7 @@ ExprDT_total_seconds = function() {
#'
#' @return Expr of i64
#' @examples
#' df = pl$DataFrame(date = pl$date_range(
#' df = pl$DataFrame(date = pl$datetime_range(
#' start = as.POSIXct("2020-1-1", tz = "GMT"),
#' end = as.POSIXct("2020-1-1 00:00:01", tz = "GMT"),
#' interval = "1ms"
@@ -829,7 +799,7 @@ ExprDT_total_milliseconds = function() {
#'
#' @return Expr of i64
#' @examples
#' df = pl$DataFrame(date = pl$date_range(
#' df = pl$DataFrame(date = pl$datetime_range(
#' start = as.POSIXct("2020-1-1", tz = "GMT"),
#' end = as.POSIXct("2020-1-1 00:00:01", tz = "GMT"),
#' interval = "1ms"
@@ -848,7 +818,7 @@ ExprDT_total_microseconds = function() {
#'
#' @return Expr of i64
#' @examples
#' df = pl$DataFrame(date = pl$date_range(
#' df = pl$DataFrame(date = pl$datetime_range(
#' start = as.POSIXct("2020-1-1", tz = "GMT"),
#' end = as.POSIXct("2020-1-1 00:00:01", tz = "GMT"),
#' interval = "1ms"
@@ -907,7 +877,7 @@ ExprDT_total_nanoseconds = function() {
#'
#' # the "by" argument also accepts expressions
#' df = pl$DataFrame(
#' dates = pl$date_range(
#' dates = pl$datetime_range(
#' as.POSIXct("2022-01-01", tz = "GMT"),
#' as.POSIXct("2022-01-02", tz = "GMT"),
#' interval = "6h", time_unit = "ms", time_zone = "GMT"
@@ -932,7 +902,7 @@ ExprDT_offset_by = function(by) {
#'
#'
#' @examples
#' df = pl$DataFrame(dates = pl$date_range(
#' df = pl$DataFrame(dates = pl$datetime_range(
#' as.Date("2000-1-1"),
#' as.Date("2000-1-2"),
#' "1h"