Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(udf): support inlined Rust UDF #14903

Merged
merged 4 commits into from Feb 4, 2024
Merged

Conversation

wangrunji0408
Copy link
Contributor

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR allows users to inline their Rust UDF code in the create-function statement.

create function gcd(int, int) returns int language rust as $$
    fn gcd(mut a: i32, mut b: i32) -> i32 {
        while b != 0 {
            let t = b;
            b = a % b;
            a = t;
        }
        a
    }
$$;

The function will be built into a wasm binary in the frontend. This requires the Rust toolchain (with the wasm32-wasi target) installed in frontend's environment. The build process will run for at least 10s, during which time the command is blocking from user's perspective. It may also require network access to download dependencies. Therefore, in the future we should consider how to:

  1. improve the user experience while waiting for a build to complete
  2. limit resources for build to prevent DoS attack

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

Add support for inline Rust UDFs.

Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
@BugenZhao
Copy link
Member

This is amazing!

This requires the Rust toolchain (with the wasm32-wasi target) installed in frontend's environment

Should we consider including it into the Docker image?

body = Some(function_body.clone());

let wasm_binary =
tokio::task::spawn_blocking(move || arrow_udf_wasm::build("", &script))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this means that we don't support specifying external crate dependencies now. This sounds good as the build.rs could run arbitrary code and is quite unsafe in a managed environment.

I'm considering whether it's possible to support a whitelist of 3rd-party crates in the future, just like how Rust's online playground does: https://arc.net/l/quote/qtopwevk

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if wasm can serve as a with-host-tools toolchain in the distant future... 🤣

https://doc.rust-lang.org/nightly/rustc/platform-support.html#tier-2-with-host-tools

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm considering whether it's possible to support a whitelist of 3rd-party crates in the future, just like how Rust's online playground does: https://arc.net/l/quote/qtopwevk

Good idea! We can follow their way: pre-download all the crates that Rust playground supports and add --offline to cargo build.

body = Some(function_body.clone());

let wasm_binary =
tokio::task::spawn_blocking(move || arrow_udf_wasm::build("", &script))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The experimental feature of cargo script would also be helpful: rust-lang/cargo#12207

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I have investigated it. But cargo script seems to only support executables, not libraries. 😕

let system_params = session.env().meta_client().get_system_params().await?;
let object_name = format!("{:?}.wasm", md5::compute(&wasm_binary));
upload_wasm_binary(
system_params.wasm_storage_url(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we still going to deprecate the external storage for WASM binaries? I'm not sure if a Rust WASM binary fits the metadata store. 😕

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "yes"? 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will soon move WASM binaries from object store to meta store. I think meta store is capable given that we will move to SQL backend soon and binaries usually won't be larger than 10MB. 😄

@stdrc
Copy link
Contributor

stdrc commented Feb 2, 2024

This requires the Rust toolchain (with the wasm32-wasi target) installed in frontend's environment.

Can we detect the existance of the toolchain before building? Seems in this PR the query will run for a while before reporting missing wasm32-wasi target.

The build process will run for at least 10s, during which time the command is blocking from user's perspective.

Is it possible to print some notice before building or make it run in background?

Copy link
Contributor

@stdrc stdrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks so successful to me!

Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
@wangrunji0408
Copy link
Contributor Author

wangrunji0408 commented Feb 4, 2024

Can we detect the existance of the toolchain before building? Seems in this PR the query will run for a while before reporting missing wasm32-wasi target.

Sure. I've added a feature to automatically install the target on build if --offline is not given. Besides, I also added this target to our docker image.

@wangrunji0408 wangrunji0408 added this pull request to the merge queue Feb 4, 2024
@wangrunji0408
Copy link
Contributor Author

Is it possible to print some notice before building or make it run in background?

I did a quick investigation and found NoticeResponse in pgwire protocol maybe useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants