Open Data Fabric: Apache Arrow DataFusion Engine

This is the implementation of the Engine contract of Open Data Fabric using the Apache Arrow DataFusion data processing framework. It is currently in use in the kamu-cli data management tool.

Features

This engine is experimental and has limited functionality due to being batch-oriented, but is extremely fast and low-footprint. There are ongoing attempts to add stream processing functionality.

We recommend using this engine only for basic filter/map operations that do not require temporal processing. If you need temporal JOINs, aggregations, windowing, or watermark semantics, take a look at the Apache Flink ODF Engine.

Also note that this engine does not automatically handle retractions and corrections. If you perform map/filter operations on a stream that can contain retractions and corrections, make sure to manually propagate the op column. If the output does not contain an op column, all emitted records will be treated as appends.
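The effect of propagating or dropping the op column can be illustrated with a minimal, self-contained Rust sketch. It does not use the engine or DataFusion APIs. The `Op`, `Reading`, and `Output` types, the `transform` and `effective_op` functions, and the sample fields are all hypothetical names chosen for illustration, and the three-variant `Op` enum is a simplification rather than the actual ODF encoding.

```rust
// Hypothetical, simplified model of a record stream.
// These are NOT the engine's real types or the real ODF `op` encoding.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Append,
    Retract,
    Correct,
}

#[derive(Debug, Clone, PartialEq)]
struct Reading {
    op: Op,
    city: String,
    temp_c: f64,
}

#[derive(Debug, Clone, PartialEq)]
struct Output {
    // `None` models an output that has no `op` column at all.
    op: Option<Op>,
    city: String,
    temp_f: f64,
}

// A basic filter + map: keep one city and convert Celsius to Fahrenheit.
// When `propagate_op` is true, the input `op` is carried into the output;
// when false, the output is produced without an `op` value.
fn transform(input: &[Reading], propagate_op: bool) -> Vec<Output> {
    input
        .iter()
        .filter(|r| r.city == "Vancouver")
        .map(|r| Output {
            op: if propagate_op { Some(r.op) } else { None },
            city: r.city.clone(),
            temp_f: r.temp_c * 9.0 / 5.0 + 32.0,
        })
        .collect()
}

// Models the rule stated above: a record without `op` counts as an append.
fn effective_op(record: &Output) -> Op {
    record.op.unwrap_or(Op::Append)
}
```

In this sketch, a retraction that passes through `transform(&input, false)` comes out looking like an ordinary append, which is the situation the note above warns about. In an actual query, the equivalent of `propagate_op = true` is including the op column in the output.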

More information and engine comparisons are available here.

Developing

This is a Rust-based project. You can follow steps similar to those in the kamu-cli development guide.
