Python Polars 0.20.16

@github-actions github-actions released this 18 Mar 18:33
ec8f5e3

🚀 Performance improvements

  • add new when-then-otherwise kernels (#15089)
  • Coerce sorted flag of unit arrays during concat (#15104)
  • Use sorted flag for (first|last)_non_null (#15050)
  • OOC sort improvements (#14994)

✨ Enhancements

  • improved dtype inference/refinement for read_database results (#15126)
  • raise if both closed and by are passed to rolling_* aggregations (#15108)
  • raise informative error for rolling_* aggs with by of invalid dtype (#15088)
  • add non_existent arg to replace_time_zone (#15062)
  • Support single nested row encodings (#15105)
  • make ooc sort configurable (#15084)
  • Make register_plugin a standalone function and include shared lib discovery (#14804)
  • Expose infer_schema_length parameter on read_database (#15076)
  • Async parquet: Decode parquet on a blocking thread pool (#15083)
  • Let the "ambiguous" argument accept "null" as a value (#14961)
  • Raise informative error message when join would introduce duplicate column name (#15042)
  • Allow cast of decimal to boolean (#15015)
  • Add strict parameter to DataFrame constructor to allow non-strict construction (#15034)
  • Support Array statistics in parquet (#15031)
  • Support decimal groupby (#15000)
  • Add thread names to rayon thread pool (#15024)
  • Support decimal uniq (#15001)
  • expose timings in verbose state of OOC sort (#14979)

🐞 Bug fixes

  • Support BinaryView in row decoder to prevent a panic in streaming group by (#15117)
  • Binview chunked gather; don't modify inlined view (#15124)
  • Fix chunked_id gather for binview buffers (#15123)
  • Don't cache HTTP object stores as they maintain URL state (#15121)
  • Output u32 when sum_horizontal provided with single boolean column (#15114)
  • Propagate error instead of panicking when calling product on an invalid type (#15093)
  • Raise error when casting Array to different width (#14995)
  • Fix file scan bugs for ipc, csv and parquet that occur with combinations of glob paths, row indices and predicates (#15065)
  • Fix incorrectly preserved sorted flag when concatenating sorted series containing nulls (#15082)
  • Return largest non-NaN value for max() on sorted float arrays if it exists instead of NaN (#15060)
  • return NaN for all-NaN min/max (#15066)
  • Prevent "index out of range for slice" error in parquet reader (#15021)
  • Respect nulls_last in streaming sort (#15061)
  • Fix Series construction from nested list with mixed data types (#15046)
  • Don't count nulls in streaming count agg (#15051)
  • Fix agg_list on decimal losing its scale (#15054)
  • Block predicate pushdown on equalities that are used in a join (#15055)
  • Enum equality based on categories (#15053)
  • Don't panic in string_addition_to_linear_concat (#15006)
  • CSV: do UTF-8 validation after escaping fields (#15004)
  • Use primitive constructors to create a Series of lists when dtype is provided (#15002)
  • replace_time_zone with single-null-element "ambiguous" was panicking (#14971)
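
The two NaN-related fixes above (#15060 and #15066) can be summarised as: min/max should return the extreme non-NaN value when one exists, and NaN only when every value is NaN. The following is a minimal plain-Python sketch of that rule, not Polars' actual implementation; the helper names `nan_aware_max` and `nan_aware_min` are hypothetical, and the sketch assumes a non-empty list of floats with no nulls.

```python
import math


def nan_aware_max(values):
    """Hypothetical helper: largest non-NaN value, or NaN if all values are NaN."""
    # Assumption: `values` is a non-empty list of floats without nulls.
    non_nan = [v for v in values if not math.isnan(v)]
    return max(non_nan) if non_nan else math.nan


def nan_aware_min(values):
    """Hypothetical helper: smallest non-NaN value, or NaN if all values are NaN."""
    non_nan = [v for v in values if not math.isnan(v)]
    return min(non_nan) if non_nan else math.nan


# A sorted float array that ends in NaN: the max is the last non-NaN value.
sorted_with_nan = [1.0, 2.0, 3.0, math.nan]
largest = nan_aware_max(sorted_with_nan)
smallest = nan_aware_min(sorted_with_nan)

# An all-NaN array: both aggregations yield NaN.
all_nan_max = nan_aware_max([math.nan, math.nan])
all_nan_min = nan_aware_min([math.nan, math.nan])
```

Filtering NaN values before taking the extreme keeps the result independent of where NaN values sit in the input, which is the property the sorted-array fix restores.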

📖 Documentation

  • Update write_database code blocks in user guide (#15106)
  • Add missing docstring examples in the Struct namespace (#15071)
  • Improve API reference landing page (#14888)
  • improve join_asof example (#14993)
  • Fix inadvertent swap of new and old parameters in replace description (#15019)

🛠️ Other improvements

  • Extend and speed up scan tests (#15127)
  • Add parameterized-scan-tests (#15057)
  • Simplify streaming execution (#15039)
  • Ensure we hit the spilled source path in ooc sort test (#15010)
  • Refactor constructor code (#15009)
  • fix features (#14977)
  • Revert pinning PyPI publish action (#14975)

Thank you to all our contributors for making this release possible!
@JackRolfe, @MKisilyov, @MarcoGorelli, @alexander-beedie, @c-peters, @flisky, @jqnatividad, @mcrumiller, @mickvangelderen, @nameexhaustion, @orlp, @petrosbar, @ritchie46, @stinodego and @trueb2