v0.16.0

@eitsupi released this 15 Apr 14:08

Breaking changes

  • Rust polars is updated to 0.39.0 (#937, #1034).

  • R objects inside an R list are now converted to Polars data types via
    as_polars_series() (#1021, #1022, #1023). For example, up to polars 0.15.1,
    a list containing a data.frame with a column of {clock} naive-time class
    was converted to a nested List type of Float64:

    data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
    pl$select(
      nested_data = pl$lit(list(data))
    )
    #> shape: (1, 1)
    #> ┌──────────────────────────┐
    #> │ nested_data              │
    #> │ ---                      │
    #> │ list[list[list[f64]]]    │
    #> ╞══════════════════════════╡
    #> │ [[[2.1475e9], [7305.0]]] │
    #> └──────────────────────────┘

    From 0.16.0, nested types are converted correctly, so the result is
    a List type of Struct type containing a Datetime type.

    data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
    pl$select(
      nested_data = pl$lit(list(data))
    )
    #> shape: (1, 1)
    #> ┌─────────────────────────┐
    #> │ nested_data             │
    #> │ ---                     │
    #> │ list[struct[1]]         │
    #> ╞═════════════════════════╡
    #> │ [{1990-01-01 00:00:00}] │
    #> └─────────────────────────┘
  • Several functions have been rewritten to match the behavior of Python Polars.
    There are four types of changes: i) change in argument names, ii) change in
    the way arguments are passed (named or by position), iii) arguments are removed,
    and iv) change in the default and accepted values. Those are addressed separately
    below.

    1. Change in argument names:

      • In $reshape(), the dims argument is renamed to dimensions (#1019).
      • In pl$read_* and pl$scan_* functions, the first argument is now
        source (#935).
      • In pl$Series(), the argument x is renamed to values (#933).
      • In <DataFrame>$write_* functions, the first argument is now file (#935).
      • In <LazyFrame>$sink_* functions, the first argument is now path (#935).
      • In <LazyFrame>$sink_ipc(), the argument memmap is renamed to memory_map (#1032).
      • In <DataFrame>$rolling(), <LazyFrame>$rolling(), <DataFrame>$group_by_dynamic()
        and <LazyFrame>$group_by_dynamic(), the by argument is renamed to
        group_by (#983).
      • In $dt$convert_time_zone() and $dt$replace_time_zone(), the tz
        argument is renamed to time_zone (#944).
      • In $str$strptime(), the argument datatype is renamed to dtype (#939).
      • In $str$to_integer() (renamed from $str$parse_int()), argument radix is
        renamed to base (#1038).
    2. Change in the way arguments are passed:

      • In all input/output functions, all arguments except the first argument
        must be named arguments (#935).

      • In <DataFrame>$rolling() and <DataFrame>$group_by_dynamic(), all
        arguments except index_column must be named arguments (#983).

      • In $unique() for DataFrame and LazyFrame, arguments keep and
        maintain_order must be named (#953).

      • In $bin$decode(), the strict argument must be a named argument (#980).

      • In $dt$replace_time_zone(), all arguments except time_zone must be named
        arguments (#944).

      • In $str$contains(), the arguments literal and strict must be named
        (#982).

      • In $str$contains_any(), the ascii_case_insensitive argument must be
        named (#986).

      • In $str$count_matches(), $str$replace() and $str$replace_all(),
        the literal argument must be named (#987).

      • In $str$strptime(), $str$to_date(), $str$to_datetime(), and
        $str$to_time(), all arguments (except the first one) must be named (#939).

      • In $str$to_integer() (renamed from $str$parse_int()), all arguments
        must be named (#1038).

      • In pl$date_range(), the arguments closed, time_unit, and time_zone
        must be named (#950).

      • In $set_sorted() and $sort_by(), argument descending must be named
        (#1034).

      • In pl$Series(), using positional arguments throws a warning, since the
        argument positions will be changed in the future (#966).

        # polars 0.15.1 or earlier
        # The first argument is `x`, the second argument is `name`.
        pl$Series(1:3, "foo")
        
        # The code above will warn in 0.16.0
        # Use named arguments to silence the warning.
        pl$Series(values = 1:3, name = "foo")
        pl$Series(name = "foo", values = 1:3)
        
        # polars 0.17.0 or later (future version)
        # The first argument is `name`, the second argument is `values`.
        pl$Series("foo", 1:3)

        This warning can also be silenced by replacing pl$Series(<values>, <name>)
        by as_polars_series(<values>, <name>).

    3. Arguments removed:

      • The argument columns in $drop() is removed. $drop() now accepts
        several character scalars, such as $drop("a", "b", "c") (#912).
      • In pl$col(), the name argument is removed, and the ... argument no
        longer accepts a list of characters and RPolarsSeries class objects (#923).
      • In pl$date_range(), the unused explode argument (which did not work
        in recent versions) is removed (#950).
    4. Change in argument defaults and accepted values:

      • In pl$Series(), the argument values has a new default value NULL
        (#966).
      • In $unique() for DataFrame and LazyFrame, argument keep has a new
        default value "any" (#953).
      • In rolling aggregation functions (such as $rolling_mean()), the default
        value of argument closed is now NULL. Using closed with a fixed
        window_size now throws an error (#937).
      • In pl$date_range(), the argument end must be specified and the default
        value of interval is changed to "1d". The arguments start and end
        no longer accept numeric values (#950).
      • In pl$scan_parquet(), the default value of the argument rechunk is
        changed from TRUE to FALSE (#1033).
      • In pl$scan_parquet() and pl$read_parquet(), the argument parallel
        only accepts "auto", "columns", "row_groups", and "none".
        Previously, it also accepted the upper-case notation of "auto",
        "columns", and "none", as well as "RowGroups" instead of
        "row_groups" (#1033).
      • In $str$to_integer() (renamed from $str$parse_int()), the default
        value of base is changed from 2 to 10 (#1038).
  • The usage of pl$date_range() to create a range of Datetime data type is
    deprecated. pl$date_range() will always create a range of Date data type
    in the future. Use pl$datetime_range() if you want to create a range of
    Datetime instead (#950).

  • <DataFrame>$get_columns() now returns an unnamed list instead of a named
    list (#991).

  • Removed $argsort() which was an old alias for $arg_sort() (#930).

  • Removed pl$expr_to_r() which was an alias for $to_r() (#938).

  • <Series>$to_r_list() is renamed to <Series>$to_list() (#938).

  • Removed <Series>$to_r_vector() which was an old alias for
    <Series>$to_vector() (#938).

  • Removed <Expr>$rep_extend(), which was an experimental method created at the
    early stage of this package and does not exist in other language APIs (#1028).

  • The following deprecated functions are now removed: pl$threadpool_size(),
    <DataFrame>$with_row_count(), <LazyFrame>$with_row_count() (#965).

  • In $group_by_dynamic(), the first datapoint is always preserved (#1034).

  • $str$parse_int() is renamed to $str$to_integer() (#1038).
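
A few of the call-site changes listed above can be sketched as follows. This is only an illustrative migration example, assuming polars 0.16.0 is installed; the data frame and its column names (bits, a, b) are made up for the example.

```r
library(polars)

# pl$Series(): name the arguments to avoid the positional-argument warning.
s = pl$Series(values = c("110", "101", "010"), name = "bits")

# Hypothetical example data.
df = pl$DataFrame(
  bits = c("110", "101", "010"),
  a = c(1, 1, 2),
  b = c("x", "x", "y")
)

# $str$to_integer() replaces $str$parse_int(). `base` now defaults to 10,
# so binary input needs an explicit, named `base = 2`.
parsed = df$with_columns(
  value = pl$col("bits")$str$to_integer(base = 2)
)

# $drop() takes column names as separate arguments (no `columns` argument).
dropped = df$drop("a", "b")

# $unique(): `keep` and `maintain_order` must be passed by name.
deduped = df$unique(subset = c("a", "b"), keep = "first", maintain_order = TRUE)
```

Code written for 0.15.1 that relied on the old default of base, or on passing these arguments by position, needs the explicit named form shown here.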

New features

  • New functions:

    • pl$arg_sort_by() (#929).
    • pl$arg_where() to get the indices that match a condition (#922).
    • pl$datetime(), pl$date(), and pl$time() to easily create Expr of class
      datetime, date, and time via columns and literals (#918).
    • pl$datetime_range(), pl$date_ranges() and pl$datetime_ranges() (#950, #962).
    • pl$int_range() and pl$int_ranges() (#968).
    • pl$mean_horizontal() (#959).
    • pl$read_ipc() (#1033).
    • is_polars_dtype() (#927).
  • New methods:

    • <LazyFrame>$to_dot() to print the query plan of a LazyFrame with graphviz
      dot syntax (#928).
    • $clear() for DataFrame, LazyFrame, and Series (#1004).
    • $item() for DataFrame and Series (#992).
    • $select_seq() and $with_columns_seq() for DataFrame and LazyFrame
      (#1003).
    • $arr$to_list() (#1018).
    • $str$extract_groups() (#979).
    • $str$find() (#985).
    • <DataFrame>$write_ipc() (#1032).
    • RPolarsDataType gains several methods to check the datatype, such as
      $is_integer(), $is_null() or $is_list() (#1036).
  • New arguments or argument values:

    • ambiguous can now take the value "null" to convert ambiguous datetimes to
      null values (#937).
    • n in $str$replace() (#987).
    • non_existent in $dt$replace_time_zone() to specify what should happen
      when a datetime doesn't exist.
    • mapping_strategy in $over() (#984, #988).
    • raise_if_undetermined in $meta$output_name() (#961).
    • null_on_oob in $arr$get() and $list$get() to determine what happens
      when the index is out of bounds (#1034).
    • nulls_last, multithreaded, and maintain_order in $sort_by() (#1034).
  • Other:

    • pl$Series() now calls as_polars_series() internally, so it can convert
      more classes to Series properly (#1015).
    • Export the Duration datatype (#955).
    • New active binding <Series>$struct$fields (#1002).
    • All $write_*() and $sink_*() functions now invisibly return the input
      data (#1039).
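
As a rough illustration of some of the additions above, the sketch below uses pl$mean_horizontal(), pl$int_range(), $item(), and $clear(). It assumes polars 0.16.0; the data and column names are invented for the example, and the calls use only the simplest argument forms.

```r
library(polars)

# Hypothetical example data.
df = pl$DataFrame(a = c(1, 8, 3), b = c(4, 5, 6))

# pl$mean_horizontal(): row-wise mean across the given columns.
with_mean = df$with_columns(
  row_mean = pl$mean_horizontal(pl$col("a"), pl$col("b"))
)

# pl$int_range(): integer range expression; `end` is exclusive.
idx = pl$select(idx = pl$int_range(0, 3))

# $item(): extract the single value of a 1x1 DataFrame as an R scalar.
total = df$select(pl$col("a")$sum())$item()

# $clear(): same schema, zero rows.
empty = df$clear()
```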

Bug fixes

  • The join_nulls and validate arguments of <DataFrame>$join() now work
    correctly (#945).
  • The changelog of 0.14.0 stated that all row_count_* args in I/O functions
    were renamed to row_index_*, but this change was not made for CSV and IPC
    functions. The renaming is now applied to those functions as well (#964).
  • Evaluating Series methods from Expr inside functions now works correctly (#973).
    Thanks @Yunuuuu for the report.
  • The extendr-api crate dependency is updated to the unreleased version of
    2024-03-31 (#995). This resolves the issue where the R session crashed
    when a panic occurred on the Rust side.
    Thanks @CGMossa for the upstream fix.
  • The parallel argument of pl$scan_parquet() and pl$read_parquet() now works
    correctly (#1033). Previously, any correct value was treated as "auto".
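
For the parallel fix, a minimal round trip might look like the sketch below. It assumes polars 0.16.0 and writes a throwaway Parquet file to a temporary path; the data is invented for the example.

```r
library(polars)

# Write a small, hypothetical data set to a temporary Parquet file.
tmp = tempfile(fileext = ".parquet")
df = pl$DataFrame(id = 1:3, value = c("a", "b", "c"))
df$write_parquet(tmp)

# `parallel` must be named and one of "auto", "columns", "row_groups", "none".
eager = pl$read_parquet(tmp, parallel = "row_groups")
lazy = pl$scan_parquet(tmp, parallel = "columns")$collect()
```

The chosen strategy affects how the file is read, not the resulting data, so both calls should return the same three rows.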

Full Changelog: v0.15.1...v0.16.0