Rust Polars 0.33.0

github-actions released this 17 Sep 16:31
7f8cd7d

🏆 Highlights

  • implementing sink_csv for LazyFrame (#10682)

💥 Breaking changes

  • empty product returns identity (#10842)
  • return f64 for rank when method="average" (#10734)
  • Rename groupby to group_by (#10654)
  • Read/write support for IPC streams in DataFrames (#10606)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • remove fixed_seed and add pl.set_random_seed (#10388)
  • Make arange an alias for int_range (#9983)
  • date_range/time_range no longer return a List type (#10526)
  • Remove various functionalities deprecated before 0.18 (#10527)
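
Two of the breaking changes above, the empty product (#10842) and Kleene logic for all/any (#10564), change results at the edges. The sketch below shows the general semantics in plain std Rust. It is not the Polars API: the function names `product`, `kleene_all` and `kleene_any` are hypothetical, and these notes do not say whether Polars applies Kleene logic by default or behind a parameter. `None` stands in for a null value.

```rust
/// Product of a slice. The empty slice yields 1, the multiplicative identity.
fn product(values: &[i64]) -> i64 {
    values.iter().product()
}

/// Kleene `all`: false if any value is false,
/// otherwise unknown (None) if any value is null, otherwise true.
fn kleene_all(values: &[Option<bool>]) -> Option<bool> {
    if values.iter().any(|v| *v == Some(false)) {
        Some(false)
    } else if values.iter().any(|v| v.is_none()) {
        None
    } else {
        Some(true)
    }
}

/// Kleene `any`: true if any value is true,
/// otherwise unknown (None) if any value is null, otherwise false.
fn kleene_any(values: &[Option<bool>]) -> Option<bool> {
    if values.iter().any(|v| *v == Some(true)) {
        Some(true)
    } else if values.iter().any(|v| v.is_none()) {
        None
    } else {
        Some(false)
    }
}
```

Under this definition a null only matters when it could change the answer: `kleene_all(&[Some(false), None])` is `Some(false)`, while `kleene_all(&[Some(true), None])` is `None`.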

⚠️ Deprecations

  • Rename is_first/last to is_first/last_distinct (#11130)
  • Rename count_match to count_matches (#11028)
  • Rename strip to strip_chars (#10813)
  • Add datetime_range expression function (#10213)
  • Rename Series/Expr.rolling_apply to rolling_map (#10750)

🚀 Performance improvements

  • improve performance of fast projection (#10945)
  • parse time zones outside of downcast_iter() in replace_time_zone (#10713)
  • use binary abstraction for atan2 (#10588)
  • use binary abstraction in pow (#10562)

✨ Enhancements

  • Expressify str.split argument. (#11117)
  • Expressify argument of binary contains (#11091)
  • dt.offset_by supports broadcasting lhs (#11095)
  • Expressify argument of binary starts_with and ends_with (#11076)
  • json_extract supports extracting static and string values to list dtype (#11057)
  • add quote_style="never" option for write_csv (#11015)
  • add support for nextest (#11048)
  • Add literal for str count_match (#10996)
  • More dtypes support cast to list (#11025)
  • ParquetCloudSink to allow streaming pipelines into remote ObjectStores (#10060)
  • Add strip_prefix and strip_suffix to the string namespace (#10958)
  • Add datetime_range expression function (#10213)
  • add proper cache for Regex compilation (#10934)
  • implementation of array_to_string (#10839)
  • apply left side predicate pushdown also to right side if all predicate columns are also join columns (#10841)
  • accept expr in str.count_match (#10900)
  • accept expressions in .offset_by (#9967)
  • implement drop as special case of select (#10885)
  • Supports is_last operation (#10760)
  • activate cse for group_by (again) (#10749)
  • add pairwise float sum implementation (#10756)
  • implementing sink_csv for LazyFrame (#10682)
  • Supports series unique & arg_unique & n_unique for list (#10743)
  • repeat_by should also support broadcasting of LHS (#10735)
  • deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
  • is_first also supports numeric list type. (#10727)
  • improve slice pushdown in unions (#10723)
  • Support min and max strategy for binary & str columns fill null (#10673)
  • support broadcasting in list set operations (#10668)
  • add truncate_ragged_lines (#10660)
  • supports cast to list (#10623)
  • Rename groupby to group_by (#10654)
  • preserve whitespace in notebook output (#10644)
  • Read/write support for IPC streams in DataFrames (#10606)
  • improve binary (arity) generics (#10622)
  • propagate nulls in is_in and more generic array construction (#10614)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • frame-level cast support (#10504)
  • Add failed column to cast exception (#10507)
  • Make arange an alias for int_range (#9983)
  • date_range/time_range no longer return a List type (#10526)
  • Remove various functionalities deprecated before 0.18 (#10527)
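
The "pairwise float sum implementation" entry (#10756) refers to a summation strategy that generally accumulates less rounding error than a single left-to-right loop. A minimal sketch of the general technique in std Rust follows. It is not the code from the PR: the function name `pairwise_sum` and the cutoff `BLOCK` are assumptions chosen for illustration.

```rust
/// Pairwise (cascade) summation: split the slice in half, sum each half
/// recursively, then add the two partial sums. Short slices fall back to a
/// plain loop to keep recursion overhead low.
fn pairwise_sum(values: &[f64]) -> f64 {
    // Assumed cutoff; a real implementation would tune this value.
    const BLOCK: usize = 128;
    if values.len() <= BLOCK {
        values.iter().sum()
    } else {
        let (left, right) = values.split_at(values.len() / 2);
        pairwise_sum(left) + pairwise_sum(right)
    }
}
```

Because each partial sum combines operands of similar magnitude, the worst-case rounding error grows roughly with the logarithm of the input length rather than linearly.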

🐞 Bug fixes

  • Correct hash and fmt for struct expr (#11119)
  • enforce sortedness of by argument in rolling_* functions (#11002)
  • Filter on empty objectChunked should not throw error (#11073)
  • ensure null_count statistics accounts for null array (#11070)
  • toggle off cse if ext_context is used (#11051)
  • Correct field dtype of string concat (#11055)
  • pushed-down expr should be considered when evaluating ExternalContext (#11023)
  • fix rolling_* functions when "by" has nanosecond resolution (#11005)
  • Don't reuse member for Selector::Add (#11026)
  • fix the construction of List<Null> (#10969)
  • allow singular null in regex pattern (#10948)
  • compute length of null array in explode (#10946)
  • Allow exactly one value in start/end for int_range (#10914)
  • count was falsely tagged as cse in group by (#10917)
  • Retain original dtype when deserializing an empty list (#10893)
  • CSE don't accept opaque functions (#10905)
  • Make int_range(s) exclusive on the upper bound when step is negative (#10898)
  • fix conversion from decimal to float (#10776)
  • Add broadcasting for list comparisons (#10857)
  • don't overflow length before checking limit (#10883)
  • fix bug where datetimes were not parsed in read_csv when pattern had no hour or minute (#10877)
  • tag amortized iter unsafe and add safe alternatives (#10881)
  • use pool in dataframe arithmetic (#10864)
  • remove debug println! from datetime fn (#10862)
  • repair polars_err string interpolation (#10863)
  • make count_match docs and extract_all docs/impl consistent around zero matches (#10854)
  • empty product returns identity (#10842)
  • never panic if hash/equality doesn't hold in cse (#10836)
  • Improve bound checks on temporal ranges (#10837)
  • var/std behavior around few elements (#10828)
  • Fix divide-by-zero error when reading an empty csv in streaming mode (#10819)
  • fix equality of quantile aggregation node (#10816)
  • Reading an only-header csv file in streaming mode should not panic (#10810)
  • get_single_leaf can't handle Expr::Count (#10790)
  • string to decimal parsing (#10712)
  • support groupby literal in streaming (#10771)
  • ORDER BY on unselected columns (#10752)
  • Fix is_in cannot cast list type for float (#10769)
  • fix unicode truncation in json parsing (#10761)
  • Error message of list unique should not display inner type (#10748)
  • create chunks_mut entry in vtable (#10745)
  • Prevent panic on sample_n with replacement from empty df (#10731)
  • only preserve sortedness flag in replace_time_zone when safe (#10738)
  • Error on value_counts on column named "counts" (#10737)
  • Build Series from empty Series vector (#10558)
  • return f64 for rank when method="average" (#10734)
  • Keep min/max and arg_min/arg_max consistent. (#10716)
  • Fix bug when providing custom labels and opting for duplicates in qcut (#10686)
  • Cast small int type when scan csv in streaming mode. (#10679)
  • Reused input series in rolling_apply should not be orderly (#10694)
  • re-sort buffer when update window swap the whole buffer (#10696)
  • Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
  • Sorted Utf8Chunked max_str and min_str should consider null value (#10675)
  • AllHorizontal format string (#10658)
  • List<null> chunked builder should take care of series name (#10642)
  • respect 'ignore_errors=False' in csv parser (#10641)
  • fix rename + projection pushdown (#10624)
  • fix int/float downcast in is_in (#10620)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • Fix serialization for categorical chunked. (#10609)
  • join_asof missing tolerance implementation, address edge-cases (#10482)
  • Take input_schema to create physical expr for Selection (#10571)
  • fix serialization of empty lists (#10563)
  • Clear window cache after evaluating predicate expr (#10505)
  • Parsing regex col in Expr::Columns (#10551)
  • sanitize column naming in boolean ops (#10531)
  • fix build for wasm (#10536)
  • remove fixed_seed and add pl.set_random_seed (#10388)
  • fix build for wasm (#9502)
  • rollback cse in groupby: python 0.18.15 (#10491)

🛠️ Other improvements

  • Removed duplicated example (#11109)
  • Add CODEOWNERS for docs folder (#11107)
  • Refactor starts_with and ends_with for string (#11085)
  • Integrate user guide (#11089)
  • remove feature gate join/groupby in polars-core (#10965)
  • Add Documentation issue type (#11042)
  • complete intra-docs in api documentation (#11007)
  • genericize take implementation (#10976)
  • genericize PolarsDataType (#10952)
  • enhance internal crates readme with reference to main crate (#10928)
  • Add Duration method for checking full days (#10850)
  • apply with_name in more places (#10899)
  • never compare opaque functions (#10906)
  • eliminate repetition in utf8 datetime functions (#10860)
  • Fix issue templates for bug reports (#10896)
  • remove LocalProjection (#10886)
  • request verbose logging output of minimal reproducible examples (#10882)
  • Reorganize range expression module (#10871)
  • introduce with_name for Series/ChunkedArray (#10859)
  • Further refactor temporal range functions (#10844)
  • Refactor range related functions (#10830)
  • Fix the non-compiling black box function parts in polars lazy cookbook (#10809)
  • Fix some broken links / formatting (#10772)
  • Improve docs for polars-lazy (#10729)
  • update rustc nightly_2023-08-26 (#10467)
  • default to rust native flate2 lib (#10733)
  • Clear GitHub Actions caches weekly (#10715)
  • move 'is_in' to polars-ops (#10645)
  • Clean up schema calculation for date_range (#10653)
  • remove unused apply functions and add fallible generic apply functions (#10621)
  • Enforce up-to-date Cargo.lock (#10555)
  • make binary chunkedarray functions DRY (#10607)
  • bump MSRV to 1.65 (#10568)
  • genericize chunk implementation (#10506)
  • use ChunkArray::(try_)from_chunk_iter (#10497)
  • add VSCode rust-analyzer settings (#10498)
  • Update URLs for dev documentation (#10495)
  • Update features for latest flate2 release (#10492)

Thank you to all our contributors for making this release possible!
@Barsik-sus, @I8dNLo, @JulianCologne, @KacpiW, @MarcoGorelli, @Object905, @OndrejSlamecka, @Qqwy, @SeanTroyUWO, @TNieuwdorp, @VasanthakumarV, @alexander-beedie, @aminalaee, @antoniocali, @braaannigan, @bvanelli, @c-peters, @cjackal, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @henrikig, @ion-elgreco, @jakob-keller, @jeroenjanssens, @jonashaag, @lorepozo, @marki259, @mcrumiller, @messense, @mrogowski11, @nameexhaustion, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @s-banach, @sdamashek, @stinodego, @svaningelgem, @thomasjpfan, @titoeb, @trueb2, @washcycle, @wdoppenberg and @zundertj