Open Data Fabric: Apache Arrow DataFusion Engine

This is the implementation of the Engine contract of Open Data Fabric using the Apache Arrow DataFusion data processing framework. It is currently used in the kamu-cli data management tool.

Features

This engine is experimental. Because it is batch-oriented, its functionality is limited, but it is extremely fast and has a low footprint. There are ongoing attempts to add stream processing functionality.

We recommend using this engine only for basic filter/map operations that do not require temporal processing. If you need temporal JOINs, aggregations, windowing, or watermark semantics, take a look at the Apache Flink ODF Engine.

Also note that this engine does not automatically handle retractions and corrections. If you perform map/filter operations on a stream that can contain retractions and corrections, make sure to manually propagate the `op` column. If the output does not contain an `op` column, all emitted records will be treated as appends.
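
To illustrate why propagating the `op` column matters, here is a minimal Python sketch that models the behavior described above. It is not the engine's code. The record layout, the `APPEND`/`RETRACT` labels, and the helper names `filter_map` and `interpret` are all hypothetical. The actual encoding of the `op` column is defined by the Open Data Fabric specification.

```python
# Hypothetical labels for the kind of change a record carries.
# The real op column encoding is defined by the ODF specification.
APPEND = "append"
RETRACT = "retract"


def filter_map(records, propagate_op):
    """Keep records with population over 1000 and project selected columns.

    When propagate_op is False, the op column is dropped from the output,
    which models a query that forgets to carry it through.
    """
    for rec in records:
        if rec["population"] <= 1000:
            continue
        out = {"city": rec["city"], "population": rec["population"]}
        if propagate_op:
            out["op"] = rec["op"]
        yield out


def interpret(output):
    """Model the documented fallback: a record without op counts as an append."""
    return [(rec.get("op", APPEND), rec["city"]) for rec in output]


# Assumed input: city A is appended and later retracted; city B is filtered out.
source = [
    {"op": APPEND, "city": "A", "population": 5000},
    {"op": RETRACT, "city": "A", "population": 5000},
    {"op": APPEND, "city": "B", "population": 500},
]

with_op = interpret(filter_map(source, propagate_op=True))
without_op = interpret(filter_map(source, propagate_op=False))

print(with_op)     # the retraction of city A is preserved
print(without_op)  # the retraction is seen as a second append of city A
```

In this model, dropping the column turns the retraction of city A into a duplicate append, so downstream consumers would count the record twice instead of removing it. In a query, the equivalent of `propagate_op=True` is to include `op` among the selected output columns.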

More information and engine comparisons are available here.

Developing

This is a Rust-based project. You can follow steps similar to those in the kamu-cli development guide.