-
Notifications
You must be signed in to change notification settings - Fork 1
chore: move udf_query into its own query module
#146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit does a few things. First, it splits off into it's own module. It also shuffles around some test utils as part of that. Most importantly though, it defines and implements a UdfCodeFormatter trait for formatting various programming languages when parsing & registering UDFs.
eda4295 to
5ba1fe6
Compare
crepererum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code would be easier to review if the "move code around" and the actual change would be two PRs 😉
host/Cargo.toml
Outdated
| datafusion-expr.workspace = true | ||
| datafusion-sql.workspace = true | ||
| datafusion-udf-wasm-arrow2bytes.workspace = true | ||
| datafusion-udf-wasm-bundle = { workspace = true, features = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The host shouldn't depend on the bundled examples outside of tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not gonna fly as prod code. I don't think it's that much code though, so you could just copy whatever you need in query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this bad? Not arguing at all just genuinely curious as I don't see the problem ATM. Is it because we want host to be completely language-agnostic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- this code is language-agnostic
- this code is esp. guest-agnostic
- compiling the
hostshould require you to bundle/compile all kinds of guests. Thebundlecrate is merely a helper for tests or for users that want an easy way to pull in the guests that we provide in this repo, but there's no hard requirement to use the guests that we provide
query/Cargo.toml
Outdated
| datafusion-expr.workspace = true | ||
| datafusion-sql.workspace = true | ||
| datafusion-udf-wasm-host.workspace = true | ||
| insta = "1.43.2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instais a dev dependency, not a normal prod dep- we should probably make this a
workspacedependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that the reason I've been struggling with dev-related dependency shenanigans is because I've been using [dev.dependencies] instead of [dev-dependencies] 🤦
query/src/lib.rs
Outdated
| /// Pre-compiled WASM component. | ||
| /// Necessary to create UDFs. | ||
| components: HashMap<String, &'a WasmComponentPrecompiled>, | ||
| /// Code formatter for UDF code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your formatter is language-specific, so you should have one per language. I suggest you don't add this a generic (in general generics are to be avoided due to the code bloat they produce anyways) and design it a bit like this:
languages: HashMap<String, Lang<'a>>and then
struct Lang<'a> {
component: &'a WasmComponentPrecompiled,
formatter: Box<dyn UdfCodeFormatter>,
}
query/src/format/python.rs
Outdated
| @@ -0,0 +1,25 @@ | |||
| use crate::format::UdfCodeFormatter; | |||
|
|
|||
| /// Python code formatter for UDF code | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the behavior here isn't really Python-specific: you're really just stripping the indention. So you could rename this to RemoveIndentionFormatter for something like this. My guess is that other languages might run into a similar issue too.
f406305 to
e80a738
Compare
udf_query into its own query module
crepererum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small nitpick, otherwise fine 👍
e80a738 to
4b7b7e6
Compare
Helps #141
This moves the
udf_querysubmodule currently inhostinto its ownquerymodule which helps organize things better and allows us to keep lang-specific code out ofhost.