Explore using DuckDB as computation engine #8

marsupialtail · 2022-09-26T17:38:13Z

DuckDB can be used in many places in Quokka, mostly replacing Polars. This can be approached in stages.

===== SQL predicates =====

Currently Quokka maintains its own interpreter that executes SQL predicates with Polars or Pandas (https://github.com/marsupialtail/quokka/blob/master/pyquokka/sql_utils.py#L19).

Perhaps we should just execute this predicate with DuckDB.

Pros of switching:

Cons of switching:

It might be better to maintain this interpreter if eventually we want Quokka to generate SIMD code in Gandiva fashion. Then the predicate can just be compiled down into a shared object library that can be loaded at runtime.

===== Aggregations and groupbys =====

Currently Quokka uses Apache Arrow to do aggregations and groupbys.

Perhaps we should also just use DuckDB.

===== Executor kernels =====

Quokka kernels today almost exclusively use Polars. Some can probably be switched to DuckDB.

Pros of switching:

Cons of switching:

It may be worth waiting for Arrow 10.0, which is expected to bring fast out-of-core hash join support, before deciding.

marsupialtail closed this as completed Jan 29, 2023
