feat(udf): support embedded python udf #15168
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Would this Python interpreter be able to execute any command like |
No. We only allow importing packages in a whitelist and have removed dangerous functions such as |
Our embedded UDF is getting more and more exciting. 🤩 BTW, may I ask about the status of the Python WASM ecosystem? I suppose it's not functional enough yet, which is why we chose to run it on a Python interpreter and handle the isolation and security ourselves.
We have been able to run Python UDFs on WASM using a prebuilt Python WASM image provided by vmware labs. The ecosystem looks fine and WASM does provide better isolation. However, my biggest concern is performance. According to our benchmark results, Python is already very slow (100x slower than native code), and running it on WASM makes it 10x slower still. I think the performance cost far outweighs the security benefits, making it less useful in practice. So I didn't choose WASM for Python UDFs. 😢
This PR LGTM! Will take a look at https://github.com/risingwavelabs/arrow-udf/tree/main/arrow-udf-python later
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR allows users to define Python UDFs that run in an embedded Python interpreter inside RisingWave.
This is very similar to #14513. We use pyo3 to embed a Python library into RisingWave. The minimum supported Python version is 3.12, because we require sub-interpreters and a per-interpreter GIL to run different UDFs in parallel.
For safety reasons, functions are limited to pure computational logic. We only allow importing packages in a whitelist and have removed some builtin functions such as `exit` and `open`. The goal is to create a sandbox for untrusted code. However, it is not absolutely safe, at least for now. It seems that the Python interpreter doesn't provide a way to limit CPU or memory usage, so the system can easily be blocked by an infinite loop in a UDF. Until this problem is resolved, it is better to open this feature only to admin users.
Checklist
./risedev check (or alias, ./risedev c)
Documentation
Release note
Support embedded Python UDFs. Functions will run inside RisingWave and don't require external UDF servers.
Documentation will be updated later.
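To make the sandboxing idea from the PR description more concrete (whitelisted imports, removed builtins such as `exit` and `open`), here is a minimal pure-Python sketch. It is not the implementation in this PR, which embeds the interpreter from Rust via pyo3. The names `ALLOWED_MODULES`, `REMOVED_BUILTINS`, `make_sandbox_globals`, and `load_udf` are hypothetical, and the whitelist contents are assumptions, since the PR does not list them. Restrictions of this kind are known to be bypassable through object introspection and do nothing against infinite loops, which is consistent with the caveat above that the sandbox is not absolutely safe.

```python
import builtins

# Hypothetical whitelist: the actual list used by RisingWave is not given in this PR.
ALLOWED_MODULES = {"math", "json", "decimal"}
# The PR names `exit` and `open` as removed; any further entries would be assumptions.
REMOVED_BUILTINS = {"exit", "open"}


def _restricted_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Allow absolute imports only when the top-level package is whitelisted."""
    if level != 0 or name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError(f"import of {name!r} is not allowed in a UDF")
    return builtins.__import__(name, globals, locals, fromlist, level)


def make_sandbox_globals():
    """Build a globals dict whose builtins lack the removed functions."""
    safe = {k: v for k, v in vars(builtins).items() if k not in REMOVED_BUILTINS}
    safe["__import__"] = _restricted_import  # consulted by every `import` statement
    return {"__builtins__": safe}


def load_udf(source, name):
    """Execute UDF source in the restricted namespace and return the function."""
    env = make_sandbox_globals()
    exec(source, env)
    return env[name]


# Usage: a pure computational UDF that imports a whitelisted module.
hypot = load_udf(
    "import math\n"
    "def hypot(a, b):\n"
    "    return math.sqrt(a * a + b * b)\n",
    "hypot",
)
```

In this sketch, UDF source that tries `import os` fails with `ImportError` at load time, and a UDF body that calls `open` fails with `NameError` when invoked. Limiting CPU or memory would still need a mechanism outside the interpreter, as the description notes.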