Python Polars 0.20.4

Released by @github-actions on 12 Jan 18:49
cdef09b

⚠️ Deprecations

  • Fix group keys in partition_by(as_dict=True) / GroupBy.__iter__ in some cases (#13646)
  • Rename row_count_name/row_count_offset parameters in IO functions to row_index_* (#13563)
  • Deprecate dt.datetime in favor of dt.replace_time_zone(None) (#13520)
  • Rename with_row_count to with_row_index (#13494)
  • Deprecate Expr.where in favor of filter (#13440)
  • Allow drop with no inputs as a no-op (#13460)

🚀 Performance improvements

  • elide parallelism restriction on generic rolling expressions (#13662)
  • ensure time groups are parallelized (#13660)
  • do not eagerly compute bitcount (#13562)
  • optimise SQL engine string concat (#13499)
  • Refactor expression parsing logic of predicates/constraints (#13468)
  • Represent Enum categories as Series (#13434)
  • remove lifetime requirement from CategoricalChunkedBuilder (#13319)

✨ Enhancements

  • write parquet ColumnOrder (#13672)
  • Impl contains for ArrayNameSpace (#13638)
  • improve rolling() expression formatting (#13657)
  • Implement is_between in Rust (#11945)
  • Add base PolarsError and PolarsWarning class (#13615)
  • typing overloads for Series operator methods ge, gt, ... (#13167)
  • Expressify pattern of str.extract (#13607)
  • Impl join for ArrayNameSpace (#13586)
  • add SQL engine support for string cast to json (#13624)
  • add SQL engine support for EXTRACT and DATE_PART (#13603)
  • Allow drop with no inputs as a no-op (#13460)
  • add SQL engine support for POSITION and STRPOS (#13585)
  • additional multi-column support for pl.<function> entries (#13336)
  • is_in support for array dtype (#13559)
  • add new str.find expression, returning the index of a regex pattern or literal substring (#13561)
  • Impl and dispatch arr.first/last to get (#13536)
  • Implement from_dataframe natively (interchange protocol) (#10701)
  • add SQL engine support for LIKE and ILIKE pattern matching (#13522)
  • improve hive partition pruning (#13358) (#13426)
  • Add compact syntax for int_range starting from 0 (#13530)
  • don't rechunk by default in lazy scans (#13518)
  • Add cum_count expression function (#13478)
  • add SQL engine support for IF control flow function (#13491)
  • add SQL engine support for MOD function (#13502)
  • return datetime for datetime mean & median (#13417)
  • add SQL engine support for CONCAT_WS string function (#13483)
  • Allow map_batches to auto-convert output NumPy arrays to Series (#13277)
  • add SQL engine support for RIGHT and REVERSE string functions (#13461)
  • implement BinaryView and Utf8View in polars-arrow (#13243)
  • add SQL engine support for variadic string CONCAT function (#13428)
  • add support for AND in SQL join-clause context (#13242)
  • Impl ordering ops for array namespace (#13414)
  • add SQL engine support for REPLACE string function (#13431)
  • add SQL engine support for SIGN function (#13429)
  • add SQL engine support for IFNULL function (#13432)
  • additional SQL support for bytes, bit, and hex literals (#13389)

🐞 Bug fixes

  • gather.get schema (#13679)
  • Fix group keys in partition_by(as_dict=True) / GroupBy.__iter__ in some cases (#13646)
  • ensure we hit proper cache in nested rolling expressions (#13666)
  • Allow av_buffer to cast numeric records to temporal types (#13661)
  • streaming cross join if swapped is hit (#13656)
  • Make sure rolling key is projected when processing projections (#13622)
  • fix schema inference for json (#13637)
  • Improve parsing of inputs for Expr dunders (#13635)
  • Empty series of AggregatedList should also have list dtype (#13620)
  • Series.eq_missing should return an Expr when the input is an Expr (#13628)
  • fall back to cast kernel if inline_cast AnyValue raises (#13595)
  • Fix formatting in describe for precise quantiles (#13593)
  • fix reverse variable row decoding (#13587)
  • Fix scatter for null values (#13578)
  • Fix cum_count with regard to start value / null values (#13535)
  • Fix precision/scale handling and invalid numbers in string-to-decimal conversions (#13548)
  • Treat Python None as null value for Object dtype (#13564)
  • Fix scatter to allow single temporal inputs (#13577)
  • Fix interchange protocol data buffer dtype (#10787)
  • Expr.replace to single value did not replace NULLs (#13551)
  • improve hive partition pruning (#13358) (#13426)
  • fix projection pushdown for new outer join schema (#13527)
  • don't raise when partial function is passed to map_elements (#13524)
  • improve reading of mixed string/other dtype column data from spreadsheets with openpyxl and pyxlsb engines (#13495)
  • ensure size-hint of TrueIdxIter is correct (#13508)
  • correct 'outer_coalesce' logic in case of duplicate names (#13501)
  • raise for out-of-range datetimes in to_datetime/strptime (#13403)
  • Fix Series equality for List/Array types (#13477)
  • Keep logical type when getting values from list (#13456)
  • Handle duplicate/ambiguous inputs for replace (#13217)
  • Handle empty inputs to Enum constructor (#13446)
  • Fix group_by iteration when grouping by certain selectors (#13437)
  • Fix to_pandas for 0x0 dataframe (#13420)
  • Fix offsets for numeric types in from_buffer (#13398)

📖 Documentation

  • Clarify documentation for the agg_list argument in Expr.map_batches (#13625)
  • fix linking to feature flags in user guide (#13644)
  • bring sink_ndjson docstring in line with other sink docstrings (#13636)
  • Update then and otherwise docstrings with "strings are parsed as column names" (#13630)
  • Add sink_ndjson to API reference (#13627)
  • Improve documentation on broadcasting (#13394)
  • Add note about toolchain issue under native Windows (#13590)
  • Hint about ruff setting in VSCode (#13421)
  • Clarify examples for .transpose() (#13581)
  • Add additional Series docstring examples (#13558)
  • Doc example for read_csv (#13161) (#13545)
  • Add more doc examples on how to create an index column (#13532)
  • update SQL section of the README (#13529)
  • Add note to int_range docs for creating an index column (#13516)
  • add a note to the read_database_uri docstring about escaping special characters in the connection string (#13514)
  • update polars-business > polars-xdt link (#13509)
  • Fix various typos, grammar and formatting in docstrings and user guide (#13506)
  • Doc examples for threadpool_size and get_index_type (#13496)
  • Add missing datetime examples to docs (#13487)
  • add polars-distance to plugins page (#13454)
  • define file-like object in read_parquet docstring (#13463)
  • Move Series.struct.json_encode to methods in Sphinx autosummary (#13443)
  • Add missing examples of series/list.py (#13423)
  • show datetime.date import in code block (#13419)
  • clarify documentation for rle and rle_id (#13397)
  • use named series in Series.plot example (#13407)
  • fix alphabetical order of documentation entries (#13396)

🛠️ Other improvements

  • Auto-add 'needs triage' label to bugs (#13671)
  • make rolling index column visible to optimizer (#13658)
  • Enable new error message lint to improve stack trace display (#13596)
  • Add Documentation / Build system sections to the changelog (#13594)
  • Filter unhelpful messages in make build (#13579)
  • Remove extra line break between checkboxes in GitHub bug report issues (#13576)
  • Narrow type hint for get_index_type util (#13556)
  • Fix some test failures/slowdowns (#13504)
  • pandas 2.2 compat (#13467)
  • Increase timeout for gevent async test (#13448)
  • Do not end docstrings with a blank line (#13193)

Thank you to all our contributors for making this release possible!
@Bromeon, @MarcNuebel, @MarcoGorelli, @ShivMunagala, @Wainberg, @aaarrti, @alexander-beedie, @bchalk101, @c-peters, @cgevans, @cmdlineluser, @collinprince, @deanm0000, @hamishs, @henryharbeck, @ion-elgreco, @jcrozum, @mcrumiller, @nameexhaustion, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @s-banach, @shritesh, @stinodego, @tim-stephenson and @wjandrea