Releases: pola-rs/r-polars
v0.17.0
Breaking changes
- Updated rust-polars to unreleased version (> 0.40.0) (#1104, #1110, #1117, #1124):
  - In `$join()`, there is a new argument `coalesce`, and the `how` options now accept `"full"` instead of `"outer"` and `"outer_coalesce"`.
  - `$top_k()` and `$bottom_k()` gain three arguments: `nulls_last`, `maintain_order` and `multithreaded`.
  - All `$rolling_*()` functions lose the arguments `by`, `closed` and `warn_if_unsorted`. Rolling computations based on `by` must be made via the corresponding `rolling_*_by()`, e.g. `rolling_mean_by()` instead of `rolling_mean(by =)` (#1115).
  - `pl$scan_parquet()` and `pl$read_parquet()` gain an argument `glob`, which defaults to `TRUE`. Set it to `FALSE` to avoid treating `*` as a globbing pattern.
  - `$is_not_nan()` on a `null` value (`NA` in R) now returns `null`. Previously, it returned `TRUE`.
  - In `$reshape()`, argument `dims` is renamed `dimensions` and there is a new argument `nested_type` specifying if the output should be of type List or Array.
  - In `$value_counts()`, all arguments must be named and there is a new argument `name` to specify the name of the output.
  - In all functions accepting optimization parameters (such as `projection_pushdown`), there is a new parameter `cluster_with_columns` to combine sequential independent calls to `$with_columns()`.
  - `$str$explode()` is removed.
  - The `check_sorted` argument is removed from `$rolling()` and `$group_by_dynamic()`. Sortedness is now verified quickly, so this argument is no longer needed (pola-rs/polars#16494).
  - `$name$map()` hangs on Linux, so this method is deprecated and its documentation is removed. Please use other methods like `<LazyFrame>$rename(<function>)` instead (#1123).
- As warned in v0.16.0, the order of arguments in `pl$Series` is changed (#1071). The first argument is now `name`, and the second argument is `values`.
- `$to_struct()` on an Expr is removed. This method is now only available for `Series`, `DataFrame`, and in the `$list` and `$arr` subnamespaces. For example, `pl$col("a", "b", "c")$to_struct()` should be replaced with `pl$struct(c("a", "b", "c"))` (#1092).
- `pl$Struct()` now only accepts named inputs and objects of class `RPolarsField`. For example, `pl$Struct(pl$Boolean)` doesn't work anymore and should be named like `pl$Struct(a = pl$Boolean)` (#1053).
- In `$all()` and `$any()`, the argument `drop_nulls` is renamed `ignore_nulls`, and this argument must be named (#1050).
- New method `$struct$with_fields()` (#1109) and new function `pl$field()` to be used in expressions in `$struct$with_fields()` (#1113).
- New methods for `RPolarsDataType`: `$is_enum()`, `$is_categorical()`, `$is_known()`, `$is_string()`, `$contains_views()`, `$contains_categorical()` (#1112).
- In `$dt$combine()`, the arguments `tm` and `tu` are renamed `time` and `time_unit` (#1116).
- The default value of the `rechunk` argument of `pl$concat()` is changed from `TRUE` to `FALSE` (#1125).
- In `$rename()` for LazyFrame and DataFrame, key-value pairs of names are changed to `old_name = "new_name"` instead of `new_name = "old_name"` (#1129).
- In `$rename()` for LazyFrame and DataFrame, calling the method with no arguments is not allowed (#1129).
- In all `$rolling_*()` functions, the arguments `center` and `ddof` must be named (#1115).
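The sketch below shows how a few of the breaking changes above might look in user code. It assumes r-polars 0.17.0 or later is installed. The data and the names `df`, `other`, `renamed`, `packed` and `joined` are made up for illustration and are not part of the release notes.

```r
library(polars)

# Illustrative data only; assumes r-polars >= 0.17.0.
df = pl$DataFrame(key = 1:3, l = c("a", "b", "c"))
other = pl$DataFrame(key = 2:4, r = c("x", "y", "z"))

# pl$Series(): the name now comes first, the values second.
s = pl$Series("foo", 1:3)

# $rename(): pairs are now written old_name = "new_name".
renamed = df$rename(l = "left_value")

# pl$struct() replaces pl$col(...)$to_struct() on expressions.
packed = df$select(pl$struct(c("key", "l"))$alias("packed"))

# $join(): "full" replaces "outer"; coalesce = TRUE merges the key columns.
joined = df$join(other, on = "key", how = "full", coalesce = TRUE)
```

Code written for 0.16.x that relied on `new_name = "old_name"` pairs in `$rename()` needs each pair reversed.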
New features
- Allow specifying a function in `$rename()` for LazyFrame and DataFrame. This is equivalent to `polars.LazyFrame.rename(mapping: Callable[[str], str])` or `polars.DataFrame.rename(mapping: Callable[[str], str])` in Python Polars (#1122, #1129).
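A minimal sketch of the function form of `$rename()`, assuming r-polars 0.17.0 or later. The example data and the `col_` prefix are hypothetical. Both functions used here work on single names as well as on character vectors, so the sketch does not depend on how the function is applied internally.

```r
library(polars)

# Illustrative data only.
df = pl$DataFrame(a = 1:3, b = c("x", "y", "z"))

# Pass an existing function that maps an old name to a new name.
upper = df$rename(toupper)

# An anonymous function works the same way on a LazyFrame.
prefixed = df$lazy()$rename(\(name) paste0("col_", name))$collect()
```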
Full Changelog: v0.16.4...v0.17.0
lib-v0.40.0
Add `$rolling_*_by()` expressions (#1115) Co-authored-by: eitsupi <ts1s1andn@gmail.com>
v0.16.4
New features
- `pl$read_ipc()` can read a raw vector in the Apache Arrow IPC file format (#1072).
- New method `<DataFrame>$to_raw_ipc()` to serialize a DataFrame to a raw vector in the Apache Arrow IPC file format (#1072).
- New method `<LazyFrame>$serialize()` to serialize a LazyFrame to a character vector of JSON representation (#1073).
- New function `pl$deserialize_lf()` to deserialize a LazyFrame from a character vector of JSON representation (#1073).
- New methods `$str$head()` and `$str$tail()` (#1074).
- New S3 methods `nanoarrow::as_nanoarrow_array_stream()` and `nanoarrow::infer_nanoarrow_schema()` for `RPolarsSeries` (#1076).
- New method `$dt$is_leap_year()` (#1077).
- `as_polars_df()` and `as_polars_series()` support `arrow::RecordBatchReader` (#1078).
- The new `experimental` argument for `as_polars_df(<ArrowTabular>)`, `as_polars_df(<RecordBatchReader>)`, `as_polars_series(<nanoarrow_array_stream>)`, and `as_polars_df(<nanoarrow_array_stream>)` (#1078).
  If `experimental = TRUE`, these functions switch to use the Arrow C stream interface internally.
  At this point, the performance is degraded under the expected use cases, so the default is set to `experimental = FALSE`.
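The round trips described above can be sketched as follows, assuming r-polars 0.16.4 or later. The data and variable names are illustrative only.

```r
library(polars)

# Illustrative data only.
df = pl$DataFrame(a = 1:3, b = c("x", "y", "z"))

# DataFrame -> raw vector (Arrow IPC file format) -> DataFrame
raw_ipc = df$to_raw_ipc()
restored = pl$read_ipc(raw_ipc)

# LazyFrame -> JSON string -> LazyFrame
lf = df$lazy()$filter(pl$col("a") > 1)
json = lf$serialize()
restored_lf = pl$deserialize_lf(json)
```

Calling `restored_lf$collect()` then runs the deserialized query plan.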
Full Changelog: v0.16.3...v0.16.4
lib-v0.39.3
feat: import_stream internal method for Series to support Arrow C stream interface (#1078)
v0.16.3
New features
- New method `<SQLContext>$register_globals()` (#1064).
- New experimental method `$sql()` for DataFrame and LazyFrame (#1065).
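A minimal sketch of the experimental `$sql()` method, assuming the installed r-polars build includes SQL support and that the frame is referred to as `self` inside the query (an assumption to check against the method's help page). The data is made up.

```r
library(polars)

# Illustrative data only.
df = pl$DataFrame(a = 1:3, b = c("zz", "yy", "xx"))

# Assumption: the frame itself is available under the table name "self".
out = df$sql("SELECT b FROM self WHERE a > 1")
```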
Miscellaneous
- Move the API document website to the new place (#1067, #1068).
  Access to the old website is set to redirect to the top page of the new website.
  - Old URL: https://rpolars.github.io/
  - New URL: https://pola-rs.github.io/r-polars/
Full Changelog: v0.16.2...v0.16.3
v0.16.2
New features
- `$cut()` and `$qcut()` to bin continuous values into discrete categories (#1057).
- `pl$scan_parquet()` and `pl$read_parquet()` can read data from the internet when a URL is passed as the first argument (#1056, @andyquinterom).
- `pl$scan_parquet()` and `pl$read_parquet()` gain an argument `storage_options` to scan/read data via cloud storage providers (GCP, AWS, Azure). Note that this support is experimental (#1056, @andyquinterom).
- Add support for the `Enum` datatype via `pl$Enum()` (#1061).
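As a small illustration of binning with `$cut()`, the sketch below splits three made-up values at two break points. It assumes r-polars 0.16.2 or later. The column name `bin` and the break points are arbitrary.

```r
library(polars)

# Illustrative data only.
df = pl$DataFrame(x = c(1, 5, 10))

# Two break points produce three bins; each value lands in a different one.
binned = df$with_columns(bin = pl$col("x")$cut(breaks = c(3, 7)))
```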
Bug fixes
- In some read/scan functions, downloading files could fail if the URL was too long. This is now fixed (#1049, @DyfanJones).
New Contributors
- @DyfanJones made their first contribution in #1049
- @andyquinterom made their first contribution in #1056
Full Changelog: v0.16.1...v0.16.2
lib-v0.39.2
ci: exclude R devel on windows from binary library check step (#1062)
v0.16.1
This is a small hot-fix release to update dependent Rust polars to 0.39.1 (#1042).
Also, there are some updates.
Bug fixes
- `$len()` now correctly includes `null` values in the count (#1044).
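To make the fix concrete, this sketch compares `$len()` with `$count()` on a column containing one missing value. It assumes r-polars 0.16.1 or later, where `$count()` counts only non-null values. The data is made up.

```r
library(polars)

# One of the three values is missing (NA in R becomes null in polars).
df = pl$DataFrame(x = c(1, NA, 3))

counts = as.data.frame(
  df$select(
    len = pl$col("x")$len(),     # counts every row, including the null
    count = pl$col("x")$count()  # counts non-null values only
  )
)
```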
Other improvements
- `$arr$max()` and `$arr$min()` work without the `nightly` feature (#1042).
Full Changelog: v0.16.0...v0.16.1
lib-v0.39.1
fix: `$len()` should also count `null` values (#1044)
v0.16.0
Breaking changes
- R objects inside an R list are now converted to Polars data types via
  `as_polars_series()` (#1021, #1022, #1023). For example, up to polars 0.15.1,
  a list containing a data.frame with a column of `{clock}` naive-time class
  was converted to a nested List type of Float64:

      data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
      pl$select(
        nested_data = pl$lit(list(data))
      )
      #> shape: (1, 1)
      #> ┌──────────────────────────┐
      #> │ nested_data              │
      #> │ ---                      │
      #> │ list[list[list[f64]]]    │
      #> ╞══════════════════════════╡
      #> │ [[[2.1475e9], [7305.0]]] │
      #> └──────────────────────────┘

  From 0.16.0, nested types are correctly converted, so that will be
  a List type of Struct type containing a Datetime type.

      data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
      pl$select(
        nested_data = pl$lit(list(data))
      )
      #> shape: (1, 1)
      #> ┌─────────────────────────┐
      #> │ nested_data             │
      #> │ ---                     │
      #> │ list[struct[1]]         │
      #> ╞═════════════════════════╡
      #> │ [{1990-01-01 00:00:00}] │
      #> └─────────────────────────┘
- Several functions have been rewritten to match the behavior of Python Polars.
  There are four types of changes: i) change in argument names, ii) change in
  the way arguments are passed (named or by position), iii) arguments are removed,
  and iv) change in the default and accepted values. Those are addressed separately
  below.
  - Change in argument names:
    - In `$reshape()`, the `dims` argument is renamed to `dimensions` (#1019).
    - In `pl$read_*` and `pl$scan_*` functions, the first argument is now `source` (#935).
    - In `pl$Series()`, the argument `x` is renamed `values` (#933).
    - In `<DataFrame>$write_*` functions, the first argument is now `file` (#935).
    - In `<LazyFrame>$sink_*` functions, the first argument is now `path` (#935).
    - In `<LazyFrame>$sink_ipc()`, the argument `memmap` is renamed to `memory_map` (#1032).
    - In `<DataFrame>$rolling()`, `<LazyFrame>$rolling()`, `<DataFrame>$group_by_dynamic()` and `<LazyFrame>$group_by_dynamic()`, the `by` argument is renamed to `group_by` (#983).
    - In `$dt$convert_time_zone()` and `$dt$replace_time_zone()`, the `tz` argument is renamed to `time_zone` (#944).
    - In `$str$strptime()`, the argument `datatype` is renamed to `dtype` (#939).
    - In `$str$to_integer()` (renamed from `$str$parse_int()`), argument `radix` is renamed to `base` (#1038).
  - Change in the way arguments are passed:
    - In all input/output functions, all arguments except the first argument must be named arguments (#935).
    - In `<DataFrame>$rolling()` and `<DataFrame>$group_by_dynamic()`, all arguments except `index_column` must be named arguments (#983).
    - In `$unique()` for `DataFrame` and `LazyFrame`, arguments `keep` and `maintain_order` must be named (#953).
    - In `$bin$decode()`, the `strict` argument must be a named argument (#980).
    - In `$dt$replace_time_zone()`, all arguments except `time_zone` must be named arguments (#944).
    - In `$str$contains()`, the arguments `literal` and `strict` must be named (#982).
    - In `$str$contains_any()`, the `ascii_case_insensitive` argument must be named (#986).
    - In `$str$count_matches()`, `$str$replace()` and `$str$replace_all()`, the `literal` argument must be named (#987).
    - In `$str$strptime()`, `$str$to_date()`, `$str$to_datetime()`, and `$str$to_time()`, all arguments (except the first one) must be named (#939).
    - In `$str$to_integer()` (renamed from `$str$parse_int()`), all arguments must be named (#1038).
    - In `pl$date_range()`, the arguments `closed`, `time_unit`, and `time_zone` must be named (#950).
    - In `$set_sorted()` and `$sort_by()`, argument `descending` must be named (#1034).
    - In `pl$Series()`, using positional arguments throws a warning, since the argument positions will be changed in the future (#966).

          # polars 0.15.1 or earlier
          # The first argument is `x`, the second argument is `name`.
          pl$Series(1:3, "foo")

          # The code above will warn in 0.16.0
          # Use named arguments to silence the warning.
          pl$Series(values = 1:3, name = "foo")
          pl$Series(name = "foo", values = 1:3)

          # polars 0.17.0 or later (future version)
          # The first argument is `name`, the second argument is `values`.
          pl$Series("foo", 1:3)

      This warning can also be silenced by replacing `pl$Series(<values>, <name>)` by `as_polars_series(<values>, <name>)`.
  - Arguments removed:
    - The argument `columns` in `$drop()` is removed. `$drop()` now accepts several character scalars, such as `$drop("a", "b", "c")` (#912).
    - In `pl$col()`, the `name` argument is removed, and the `...` argument no longer accepts a list of characters and `RPolarsSeries` class objects (#923).
    - In `pl$date_range()`, the unused argument (not working in recent versions) `explode` is removed (#950).
  - Change in arguments default and accepted values:
    - In `pl$Series()`, the argument `values` has a new default value `NULL` (#966).
    - In `$unique()` for `DataFrame` and `LazyFrame`, argument `keep` has a new default value `"any"` (#953).
    - In rolling aggregation functions (such as `$rolling_mean()`), the default value of argument `closed` now is `NULL`. Using `closed` with a fixed `window_size` now throws an error (#937).
    - In `pl$date_range()`, the argument `end` must be specified and the default value of `interval` is changed to `"1d"`. The arguments `start` and `end` no longer accept numeric values (#950).
    - In `pl$scan_parquet()`, the default value of the argument `rechunk` is changed from `TRUE` to `FALSE` (#1033).
    - In `pl$scan_parquet()` and `pl$read_parquet()`, the argument `parallel` only accepts `"auto"`, `"columns"`, `"row_groups"`, and `"none"`. Previously, it also accepted upper-case notation of `"auto"`, `"columns"`, `"none"`, and `"RowGroups"` instead of `"row_groups"` (#1033).
    - In `$str$to_integer()` (renamed from `$str$parse_int()`), the default value of `base` is changed from `2` to `10` (#1038).
- The usage of `pl$date_range()` to create a range of `Datetime` data type is deprecated. `pl$date_range()` will always create a range of `Date` data type in the future. Use `pl$datetime_range()` if you want to create a range of `Datetime` instead (#950).
- `<DataFrame>$get_columns()` now returns an unnamed list instead of a named list (#991).
- Removed `$argsort()` which was an old alias for `$arg_sort()` (#930).
- Removed `pl$expr_to_r()` which was an alias for `$to_r()` (#938).
- `<Series>$to_r_list()` is renamed `<Series>$to_list()` (#938).
- Removed `<Series>$to_r_vector()` which was an old alias for `<Series>$to_vector()` (#938).
- Removed `<Expr>$rep_extend()`, which was an experimental method created at the early stage of this package and does not exist in other language APIs (#1028).
- The following deprecated functions are now removed: `pl$threadpool_size()`, `<DataFrame>$with_row_count()`, `<LazyFrame>$with_row_count()` (#965).
- In `$group_by_dynamic()`, the first datapoint is always preserved (#1034).
- `$str$parse_int()` is renamed to `$str$to_integer()` (#1038).
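A few of the 0.16.0 changes above can be sketched as follows. This assumes a current r-polars (0.16.0 or later). The data and variable names are illustrative only.

```r
library(polars)

# Illustrative data only.
df = pl$DataFrame(x = c("110", "101"), a = 1:2, b = 3:4)

# $str$to_integer() (formerly $str$parse_int()) now defaults to base 10;
# the base must be passed as a named argument.
parsed = df$select(
  dec = pl$col("x")$str$to_integer(),
  bin = pl$col("x")$str$to_integer(base = 2)
)

# $drop() takes column names as separate character scalars.
trimmed = df$drop("a", "b")

# as_polars_series(<values>, <name>) avoids the pl$Series() position change.
s = as_polars_series(1:3, "foo")
```

Code that relied on the old default of `base = 2` must now pass `base = 2` explicitly to keep its previous results.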
New features
- New functions:
  - `pl$arg_sort_by()` (#929).
  - `pl$arg_where()` to get the indices that match a condition (#922).
  - `pl$datetime()`, `pl$date()`, and `pl$time()` to easily create Expr of class datetime, date, and time via columns and literals (#918).
  - `pl$datetime_range()`, `pl$date_ranges()` and `pl$datetime_ranges()` (#950, #962).
  - `pl$int_range()` and `pl$int_ranges()` (#968).
  - `pl$mean_horizontal()` (#959).
  - `pl$read_ipc()` (#1033).
  - `is_polars_dtype()` (#927).
- New methods:
  - `<LazyFrame>$to_dot()` to print the query plan of a LazyFrame with graphviz dot syntax (#928).
  - `$clear()` for `DataFrame`, `LazyFrame`, and `Series` (#1004).
  - `$item()` for `DataFrame` and `Series` (#992).
  - `$select_seq()` and `$with_columns_seq()` for `DataFrame` and `LazyFrame` (#1003).
  - `$arr$to_list()` (#1018).
  - `$str$extract_groups()` (#979).
  - `$str$find()` (#985).
  - `<DataFrame>$write_ipc()` (#1032).
  - `RPolarsDataType` gains several methods to check the datatype, such as `$is_integer()`, `$is_null()` or `$is_list()` (#1036).
- New arguments or argument values:
  - `ambiguous` can now take the value `"null"` to convert ambiguous datetimes to null values (#937).
  - `n` in `$str$replace()` (#987).
  - `non_existent` in `$dt$replace_time_zone()` to specify what should happen when a datetime doesn't exist.
  - `mapping_strategy` in `$over()` (#984, #988).
  - `raise_if_undetermined` in `$meta$output_name()` (#961).
  - `null_on_oob` in `$arr$get()` and `$list$get()` to determine what happens when the index is out of bounds (#1034).
  - `nulls_last`, `multithreaded`, and `maintain_order` in `$sort_by()` (#1034).
- Other:
Bug fixes
- The `join_nulls` and ...