Explore using DuckDB as computation engine #8

Closed
marsupialtail opened this issue Sep 26, 2022 · 0 comments

@marsupialtail
Owner

DuckDB could be used in many places in Quokka, mostly as a replacement for Polars. The switch can be approached in stages.

====== SQL predicates ======

Currently, Quokka maintains its own interpreter that executes SQL predicates with Polars or Pandas (https://github.com/marsupialtail/quokka/blob/master/pyquokka/sql_utils.py#L19).

Perhaps we should just execute these predicates with DuckDB instead.

Pros of switching:

  • No need to maintain this interpreter!
  • Possibly better performance (this needs to be validated)

Cons of switching:

  • Keeping the interpreter might be the better choice if we eventually want Quokka to generate SIMD code in the style of Gandiva. The predicate could then be compiled down into a shared object library and loaded at runtime.
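To make the trade-off concrete, here is a minimal, hypothetical sketch of what "maintaining an interpreter" means: the predicate is held as an explicit expression tree and walked once per batch. The node classes `Col`, `Lit`, `BinOp` and the function `evaluate` are invented for illustration and are not Quokka's `sql_utils.py`. A Gandiva-style backend would consume the same kind of tree but emit compiled code instead of calling Python operators:

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Hypothetical expression-tree nodes for illustration; not Quokka's sql_utils.py.


@dataclass(frozen=True)
class Col:
    name: str  # reference to an input column


@dataclass(frozen=True)
class Lit:
    value: object  # scalar literal, broadcast to the batch length


@dataclass(frozen=True)
class BinOp:
    op: str  # one of the keys in _OPS
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Lit, BinOp]

# Element-wise implementations; a code generator would map these to SIMD instead.
_OPS: Dict[str, Callable[[object, object], object]] = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "!=": operator.ne,
    "AND": lambda x, y: x and y,
    "OR": lambda x, y: x or y,
}


def evaluate(expr: Expr, cols: Dict[str, List[object]], n: int) -> List[object]:
    """Recursively evaluate `expr` over a batch of `n` rows stored column-wise."""
    if isinstance(expr, Col):
        return cols[expr.name]
    if isinstance(expr, Lit):
        return [expr.value] * n
    fn = _OPS[expr.op]
    left = evaluate(expr.left, cols, n)
    right = evaluate(expr.right, cols, n)
    return [fn(x, y) for x, y in zip(left, right)]


cols = {"a": [1, 2, 3, 4], "b": ["x", "y", "x", "y"]}
pred = BinOp("AND", BinOp(">", Col("a"), Lit(1)), BinOp("=", Col("b"), Lit("x")))
print(evaluate(pred, cols, 4))  # [False, False, True, False]
```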

====== Aggregations and groupbys ======

Currently Quokka uses Apache Arrow to do aggregations and groupbys.

Perhaps we should just use DuckDB for these as well.

====== Executor kernels ======

Quokka kernels today almost exclusively use Polars. Some can probably be switched to DuckDB.

Pros of switching:

  • Possibly better out-of-core support

Cons of switching:

  • We may want to wait for Arrow 10.0, which is expected to bring fast out-of-core hash join support.