@Sl1mb0 (contributor) commented Nov 4, 2025

Helps #141

This moves the udf_query submodule, currently in host, into its own query module. This organizes things better and lets us keep language-specific code out of host.

  • I've read the contributing section of the project CONTRIBUTING.md.
  • Signed CLA (if not already signed).

This commit does a few things. First, it splits off  into its own  module. It also shuffles some test utilities around as part of that. Most importantly, though, it defines and implements a UdfCodeFormatter trait for formatting code in various programming languages when parsing and registering UDFs.
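The PR description names the UdfCodeFormatter trait but does not show it. The following is a minimal sketch of what such a trait could look like. It assumes a single `format(&self, code: &str) -> String` method; that signature, `IdentityFormatter`, and `prepare_udf_body` are hypothetical names for illustration, not the PR's actual API.

```rust
/// Hypothetical formatter trait (signature assumed): receives the UDF body as
/// written in the query and returns the code that is passed on to the guest.
pub trait UdfCodeFormatter {
    fn format(&self, code: &str) -> String;
}

/// Hypothetical pass-through formatter for languages that need no cleanup.
pub struct IdentityFormatter;

impl UdfCodeFormatter for IdentityFormatter {
    fn format(&self, code: &str) -> String {
        code.to_owned()
    }
}

/// Hypothetical registration step: the body is formatted before it is
/// handed to the language's WASM component.
pub fn prepare_udf_body(formatter: &dyn UdfCodeFormatter, body: &str) -> String {
    formatter.format(body)
}

fn main() {
    let body = "def add_one(x):\n    return x + 1\n";
    let prepared = prepare_udf_body(&IdentityFormatter, body);
    assert_eq!(prepared, body);
    print!("{prepared}");
}
```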
@Sl1mb0 requested a review from @crepererum November 4, 2025 18:04
@Sl1mb0 force-pushed the tm/format-the-snake branch 3 times, most recently from eda4295 to 5ba1fe6, November 4, 2025 18:51
@crepererum (collaborator) left a comment

The code would be easier to review if the "move code around" part and the actual change were two separate PRs 😉

host/Cargo.toml (outdated)
datafusion-expr.workspace = true
datafusion-sql.workspace = true
datafusion-udf-wasm-arrow2bytes.workspace = true
datafusion-udf-wasm-bundle = { workspace = true, features = [
Collaborator:

The host shouldn't depend on the bundled examples outside of tests.

Collaborator:

This is not gonna fly as prod code. I don't think it's that much code, though, so you could just copy whatever you need into query.

@Sl1mb0 (author), Nov 5, 2025:

Why is this bad? Not arguing at all, just genuinely curious, since I don't see the problem ATM. Is it because we want host to be completely language-agnostic?

Collaborator:

  1. this code is language-agnostic
  2. this code is especially guest-agnostic
  3. compiling the host shouldn't require you to bundle/compile all kinds of guests. The bundle crate is merely a helper for tests or for users who want an easy way to pull in the guests we provide in this repo, but there's no hard requirement to use those guests

query/Cargo.toml (outdated)
datafusion-expr.workspace = true
datafusion-sql.workspace = true
datafusion-udf-wasm-host.workspace = true
insta = "1.43.2"
Collaborator:

  1. insta is a dev dependency, not a normal prod dep
  2. we should probably make this a workspace dependency

@Sl1mb0 (author):

I just realized that I've been struggling with dev-dependency shenanigans because I've been using [dev.dependencies] instead of [dev-dependencies] 🤦

query/src/lib.rs (outdated)
/// Pre-compiled WASM component.
/// Necessary to create UDFs.
components: HashMap<String, &'a WasmComponentPrecompiled>,
/// Code formatter for UDF code
Collaborator:

Your formatter is language-specific, so you should have one per language. I suggest you don't add this as a generic (in general, generics are to be avoided anyway due to the code bloat they produce) and instead design it a bit like this:

languages: HashMap<String, Lang<'a>>

and then

struct Lang<'a> {
    component: &'a WasmComponentPrecompiled,
    formatter: Box<dyn UdfCodeFormatter>,
}
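A minimal, self-contained sketch of that layout. Assumptions: `WasmComponentPrecompiled` here is a stub standing in for the host type, and the trait's `format(&self, &str) -> String` signature, `TrimNewlinesFormatter`, `UdfLanguages`, `register`, and `prepare` are hypothetical names, not code from the PR.

```rust
use std::collections::HashMap;

/// Stub standing in for the host's precompiled component (hypothetical).
pub struct WasmComponentPrecompiled {
    pub name: String,
}

/// Hypothetical formatter trait (signature assumed).
pub trait UdfCodeFormatter {
    fn format(&self, code: &str) -> String;
}

/// Illustrative formatter that drops leading and trailing newlines.
pub struct TrimNewlinesFormatter;

impl UdfCodeFormatter for TrimNewlinesFormatter {
    fn format(&self, code: &str) -> String {
        code.trim_matches('\n').to_owned()
    }
}

/// Per-language entry: component and formatter are stored together.
pub struct Lang<'a> {
    pub component: &'a WasmComponentPrecompiled,
    pub formatter: Box<dyn UdfCodeFormatter>,
}

/// Hypothetical owner of the `languages` map.
#[derive(Default)]
pub struct UdfLanguages<'a> {
    languages: HashMap<String, Lang<'a>>,
}

impl<'a> UdfLanguages<'a> {
    /// Registers a language under a case-insensitive name.
    pub fn register(&mut self, name: &str, lang: Lang<'a>) {
        self.languages.insert(name.to_lowercase(), lang);
    }

    /// Looks up the language and runs its formatter on the UDF body.
    /// Returns `None` if the language was never registered.
    pub fn prepare(&self, name: &str, code: &str) -> Option<(&'a WasmComponentPrecompiled, String)> {
        let lang = self.languages.get(&name.to_lowercase())?;
        Some((lang.component, lang.formatter.format(code)))
    }
}

fn main() {
    let python = WasmComponentPrecompiled { name: "python".to_owned() };
    let mut languages = UdfLanguages::default();
    languages.register(
        "Python",
        Lang { component: &python, formatter: Box::new(TrimNewlinesFormatter) },
    );

    let (component, code) = languages
        .prepare("python", "\nreturn 1\n")
        .expect("python is registered");
    assert_eq!(component.name, "python");
    assert_eq!(code, "return 1");
    assert!(languages.prepare("ruby", "x").is_none());
}
```

Because every entry holds a `Box<dyn UdfCodeFormatter>`, the map has one concrete type no matter which formatters are registered, so the code around it is compiled once instead of once per formatter type. Keeping the component and its formatter in the same entry also means they cannot drift apart.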

@@ -0,0 +1,25 @@
use crate::format::UdfCodeFormatter;

/// Python code formatter for UDF code
Collaborator:

I think the behavior here isn't really Python-specific: you're just stripping the indentation. So you could rename this to RemoveIndentionFormatter or something like that. My guess is that other languages might run into a similar issue too.
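A minimal sketch of such a language-agnostic formatter, using the RemoveIndentionFormatter name suggested in this comment. It assumes the same hypothetical `format(&self, &str) -> String` trait signature and removes the longest whitespace prefix shared by all non-blank lines (similar in spirit to Python's `textwrap.dedent`), so relative indentation, which Python depends on, is preserved. Turning whitespace-only lines into empty lines and normalizing `\r\n` to `\n` are choices of this sketch, not necessarily what the PR does.

```rust
/// Hypothetical formatter trait (signature assumed).
pub trait UdfCodeFormatter {
    fn format(&self, code: &str) -> String;
}

/// Removes the indentation shared by all non-blank lines.
pub struct RemoveIndentionFormatter;

/// Leading whitespace of `line`.
fn leading_ws(line: &str) -> &str {
    &line[..line.len() - line.trim_start().len()]
}

/// Longest common prefix of `a` and `b`, compared character by character.
fn common_prefix<'s>(a: &'s str, b: &str) -> &'s str {
    let end = a
        .char_indices()
        .zip(b.chars())
        .find(|((_, ca), cb)| ca != cb)
        .map(|((i, _), _)| i)
        .unwrap_or_else(|| a.len().min(b.len()));
    &a[..end]
}

impl UdfCodeFormatter for RemoveIndentionFormatter {
    fn format(&self, code: &str) -> String {
        // Shared prefix of all non-blank lines. Comparing prefixes rather
        // than widths means a tab never counts as equal to spaces.
        let mut prefix: Option<&str> = None;
        for line in code.lines().filter(|l| !l.trim().is_empty()) {
            let ws = leading_ws(line);
            prefix = Some(match prefix {
                None => ws,
                Some(p) => common_prefix(p, ws),
            });
        }
        let prefix = prefix.unwrap_or("");

        // Strip the prefix; whitespace-only lines become empty.
        // `lines()` also turns "\r\n" endings into "\n".
        let mut out = code
            .lines()
            .map(|line| {
                if line.trim().is_empty() {
                    ""
                } else {
                    line.strip_prefix(prefix).unwrap_or(line)
                }
            })
            .collect::<Vec<_>>()
            .join("\n");
        if code.ends_with('\n') {
            out.push('\n');
        }
        out
    }
}

fn main() {
    // Assumed scenario: a Python body indented to match surrounding SQL.
    let body = "\n    def add_one(x):\n        return x + 1\n";
    let dedented = RemoveIndentionFormatter.format(body);
    assert_eq!(dedented, "\ndef add_one(x):\n    return x + 1\n");
    print!("{dedented}");
}
```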

@Sl1mb0 force-pushed the tm/format-the-snake branch 2 times, most recently from f406305 to e80a738, November 5, 2025 14:37
@Sl1mb0 changed the title from "feat: strip indentation from python code" to "chore: move udf_query into its own query module" Nov 5, 2025
@Sl1mb0 requested a review from @crepererum November 5, 2025 14:44
@crepererum (collaborator) left a comment

small nitpick, otherwise fine 👍

@Sl1mb0 force-pushed the tm/format-the-snake branch from e80a738 to 4b7b7e6 November 5, 2025 16:18
@Sl1mb0 enabled auto-merge November 5, 2025 16:18
@Sl1mb0 added this pull request to the merge queue Nov 5, 2025
Merged via the queue into main with commit 22f8766 Nov 5, 2025
2 checks passed
@Sl1mb0 deleted the tm/format-the-snake branch November 5, 2025 17:22
Labels: none yet
Projects: none yet
3 participants