v0.7.8
What's Changed 🚀
💥 Breaking Changes
✨ Features
- feat: Expose Subscribers through nicer APIs @srilman (#6631)
- feat(dashboard): heatmap coloring for nodes. @universalmind303 (#6628)
- feat(dataframe): add DataFrame.skew() global aggregation method @kerwin-zk (#6619)
- feat(dashboard): add arrows indicating pipeline direction @universalmind303 (#6625)
- feat: wire CheckpointId through Flotilla execution pipeline @rohitkulshreshtha (#6567)
- feat: add image_hash() for image deduplication @chenghuichen (#6485)
- feat: support get function from catalog @gavin9402 (#6524)
- feat(dataframe): add var() method to DataFrame and GroupedDataFrame @kerwin-zk (#6584)
- feat: checkpoint based on distributed key-existence filter @everySympathy (#5931)
- feat(distributed): make flotilla worker actor startup timeout configurable @desmondcheongzx (#6592)
🐛 Bug Fixes
- fix: Fix nightly Daft version resolution in Ray runtime env @jeevb (#6630)
- fix(dashboard): pass partition sets to repr_json so plan matches execution topology @samstokes (#6576)
- fix(scan): skip getting bytes when range start equals end in daft async reader @gweaverbiodev (#6602)
- fix(io): retry transient errors on initial GET request @desmondcheongzx (#6544)
- fix(dashboard): use smart per-node stats aggregation for distributed execution @samstokes (#6574)
- fix(io): handle schema-evolved Iceberg columns in Parquet predicate pushdown @sankarreddy-atlan (#6551)
- fix(dashboard): prevent flotilla workers from sending spurious lifecycle events @samstokes (#6573)
🚀 Performance
- perf: Optimize GroupBy Map Building & List-Agg @srilman (#6613)
- perf(flight): Read local shuffle data directly from disk @srilman (#6436)
- perf(inline-agg): add min/max accumulator types @BABTUNA (#6604)
- perf: inline vectorized aggregation for grouped count/sum @desmondcheongzx (#6345)
♻️ Refactor
- refactor: migrate PaimonScanOperator to DataSource API @chenghuichen (#6600)
- refactor(subscriber): collapse trait to single sync on_event method @cckellogg (#6593)
- refactor: consolidate BatchManager as single buffer abstraction @universalmind303 (#6566)
- refactor: Use DAFT_REF_NAME and DAFT_SHA env vars for benchmark run metadata @jeevb (#6610)
- refactor: Minor parameterization for benchmarking workflows @jeevb (#6609)
- refactor(distributed): unify repartition exchange write flow across ray and flight @ohbh (#6499)
📖 Documentation
👷 CI
- ci(test): reduce Ray resource requests in chained skip_existing tests @desmondcheongzx (#6633)
- ci(test): reduce Ray resource requests in skip_existing tests @desmondcheongzx (#6629)
- ci(deps): fix CI failures from dependabot bump #6570 @desmondcheongzx (#6596)
🔧 Maintenance
- chore(cargo): inherit edition/version from workspace in root package @yew1eb (#6329)
- chore!: Remove unused max_task_backlog parameter @srilman (#6591)
- chore(dashboard): add ?debug query param for SSE event console logging @samstokes (#6577)
Full Changelog: v0.7.7...v0.7.8