Allow registering and executing WebAssembly functions #45

Merged
merged 29 commits into from
Nov 21, 2022

Conversation

psarna
Collaborator

@psarna psarna commented Oct 13, 2022

This series implements a mechanism for registering and running Wasm functions. The current runtime of choice is wasmtime and its libwasmtime.so library with C bindings (but a switch to Rust should be considered, because that's the native language of wasmtime and the only interface which offers all of its features).

It operates on a very crude ABI (ref:#16), where ints and doubles are passed to/from WebAssembly as is,
and for strings/blobs/null it passes a pointer to a structure:

  • string: [1 byte for type specification][data]
  • blob: [1 byte for type specification][4 bytes of size][data]
  • null: [1 byte for type specification]
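To make the layout above concrete, here is a minimal Rust sketch of encoding and decoding the pointer-passed values. Several details are assumptions, not taken from this PR: the tag values (chosen to mirror SQLite's SQLITE_TEXT, SQLITE_BLOB and SQLITE_NULL codes), the NUL terminator after string data, and the big-endian byte order of the 4-byte size. The names `AbiValue`, `encode` and `decode` are hypothetical.

```rust
/// A value passed by pointer under the ABI sketched above.
#[derive(Debug, Clone, PartialEq)]
pub enum AbiValue {
    Text(String),
    Blob(Vec<u8>),
    Null,
}

// ASSUMPTION: tag values mirror SQLite's datatype codes; the PR text does not state them.
const TAG_TEXT: u8 = 3;
const TAG_BLOB: u8 = 4;
const TAG_NULL: u8 = 5;

/// Serialize a value into the bytes that the passed pointer would refer to.
pub fn encode(value: &AbiValue) -> Vec<u8> {
    match value {
        AbiValue::Text(s) => {
            let mut out = vec![TAG_TEXT];
            out.extend_from_slice(s.as_bytes());
            out.push(0); // ASSUMPTION: a NUL byte terminates the string data
            out
        }
        AbiValue::Blob(b) => {
            let mut out = vec![TAG_BLOB];
            // ASSUMPTION: the 4-byte size is a big-endian u32
            out.extend_from_slice(&(b.len() as u32).to_be_bytes());
            out.extend_from_slice(b);
            out
        }
        AbiValue::Null => vec![TAG_NULL],
    }
}

/// Parse bytes produced by `encode`; returns None on malformed input.
pub fn decode(bytes: &[u8]) -> Option<AbiValue> {
    let (&tag, rest) = bytes.split_first()?;
    match tag {
        TAG_TEXT => {
            let end = rest.iter().position(|&b| b == 0)?;
            String::from_utf8(rest[..end].to_vec()).ok().map(AbiValue::Text)
        }
        TAG_BLOB => {
            if rest.len() < 4 {
                return None;
            }
            let size = u32::from_be_bytes([rest[0], rest[1], rest[2], rest[3]]) as usize;
            rest[4..].get(..size).map(|d| AbiValue::Blob(d.to_vec()))
        }
        TAG_NULL => Some(AbiValue::Null),
        _ => None,
    }
}
```

Ints and doubles need no such wrapper, since they are passed as plain Wasm values.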

The way it's implemented now is twofold:

  1. There's an internal run_wasm function, capable of running WebAssembly and translating the parameter types from and to the Wasm module
  2. A dynamic lookup table, currently a regular SQL table: CREATE TABLE libsql_wasm_func_table(name text PRIMARY KEY, body text). The table can be initialized from C code by calling libsql_try_initialize_wasm_func_table() or from shell by using a .init_wasm_func_table command.

After creating and filling the new meta-table, when a function call is used in a statement, e.g. SELECT id, fib(id) FROM t, and function fib is neither built-in nor user-defined, it will be looked up in the table. If found, its body will be assumed to hold valid WebAssembly code, compiled and run.
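The fallback order described above can be sketched as follows. This is only an illustration of the lookup logic, not the code from this series: the `Functions` and `Resolution` types are hypothetical, and the hash map stands in for the libsql_wasm_func_table(name, body) table.

```rust
use std::collections::{HashMap, HashSet};

/// Where a function name used in a statement ends up being resolved.
#[derive(Debug, PartialEq)]
pub enum Resolution {
    BuiltIn,
    UserDefined,
    /// Found in the Wasm table; `body` would then be compiled and run.
    Wasm { body: String },
    NotFound,
}

/// Hypothetical stand-in for the engine's function registries.
#[derive(Default)]
pub struct Functions {
    pub built_in: HashSet<String>,
    pub user_defined: HashSet<String>,
    /// Stand-in for libsql_wasm_func_table(name, body).
    pub wasm_func_table: HashMap<String, String>,
}

impl Functions {
    /// The Wasm table is consulted only when the name is neither
    /// built-in nor user-defined.
    pub fn resolve(&self, name: &str) -> Resolution {
        if self.built_in.contains(name) {
            Resolution::BuiltIn
        } else if self.user_defined.contains(name) {
            Resolution::UserDefined
        } else if let Some(body) = self.wasm_func_table.get(name) {
            Resolution::Wasm { body: body.clone() }
        } else {
            Resolution::NotFound
        }
    }
}

/// Small fixture: one built-in and one Wasm function with a placeholder body.
pub fn example() -> Functions {
    let mut f = Functions::default();
    f.built_in.insert("abs".to_string());
    f.wasm_func_table
        .insert("fib".to_string(), "(module)".to_string());
    f
}
```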

In order to enable WebAssembly integration, run ./configure with the --enable-wasm-runtime parameter.

A few example WebAssembly-based user-defined functions written in Rust can be found here: https://github.com/psarna/libsql_bindgen

Here's an inline demo for testing purposes, with a WebAssembly Fibonacci function already compiled from Rust and copied in-place:

.init_wasm_func_table

CREATE FUNCTION fib LANGUAGE wasm AS '
(module 
 (type (;0;) (func (param i64) (result i64))) 
 (func $fib (type 0) (param i64) (result i64) 
 (local i64) 
 i64.const 0 
 local.set 1 
 block ;; label = @1 
 local.get 0 
 i64.const 2 
 i64.lt_u 
 br_if 0 (;@1;) 
 i64.const 0 
 local.set 1 
 loop ;; label = @2 
 local.get 0 
 i64.const -1 
 i64.add 
 call $fib 
 local.get 1 
 i64.add 
 local.set 1 
 local.get 0 
 i64.const -2 
 i64.add 
 local.tee 0 
 i64.const 1 
 i64.gt_u 
 br_if 0 (;@2;) 
 end 
 end 
 local.get 0 
 local.get 1 
 i64.add) 
 (memory (;0;) 16) 
 (global $__stack_pointer (mut i32) (i32.const 1048576)) 
 (global (;1;) i32 (i32.const 1048576)) 
 (global (;2;) i32 (i32.const 1048576)) 
 (export "memory" (memory 0)) 
 (export "fib" (func $fib)))
';

CREATE TABLE IF NOT EXISTS example(id int PRIMARY KEY);
INSERT OR REPLACE INTO example(id) VALUES (7);
INSERT OR REPLACE INTO example(id) VALUES (8);
INSERT OR REPLACE INTO example(id) VALUES (9);
SELECT id, fib(id) FROM example;
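For readers who don't want to trace the WebAssembly text by hand, the module above behaves like a plain recursive Fibonacci. The following Rust function is an equivalent written for illustration; it is not necessarily the original source, and it omits whatever export glue (for example from libsql_bindgen) was used to build the module. The `u64` type reflects the unsigned comparisons (`i64.lt_u`, `i64.gt_u`) in the module.

```rust
/// Recursive Fibonacci, equivalent in behaviour to the Wasm module above:
/// inputs below 2 are returned unchanged.
pub fn fib(n: u64) -> u64 {
    if n < 2 {
        n
    } else {
        fib(n - 1) + fib(n - 2)
    }
}

/// The (id, fib(id)) pairs the demo query would produce for ids 7, 8 and 9.
pub fn demo_rows() -> Vec<(u64, u64)> {
    [7u64, 8, 9].iter().map(|&id| (id, fib(id))).collect()
}
```

With the three rows inserted above, the final SELECT should therefore return 13, 21 and 34 for ids 7, 8 and 9.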

This series also comes with syntactic sugar for registering and deregistering Wasm functions dynamically via SQL, in the form of CREATE FUNCTION and DROP FUNCTION statements. Fixes #18

Fixes #17

@psarna
Collaborator Author

psarna commented Oct 13, 2022

This is only a draft for a multitude of reasons, the most important ones being:

  • lack of automated tests
  • currently, invoking Wasm-based user-defined functions causes an explicit memory leak during lookup - these functions need to be tracked and cached (also to avoid Wasm recompilation) - namely, once registered dynamically, the function should simply end up on the list of all the other user-defined functions

@juntao

juntao commented Oct 13, 2022

Great work! Perhaps you could consider CNCF's WasmEdge, which has a well maintained C SDK with LLVM-based AOT support for embedding. :)

https://github.com/wasmedge/wasmedge

https://wasmedge.org/book/en/sdk/c.html

Disclaimer: I am a maintainer at WasmEdge. We helped Nebula Graph and TiDB to support similar Wasm UDFs in their SQL DBs.

@psarna
Collaborator Author

psarna commented Oct 13, 2022

@juntao I actually looked it up earlier today, we're definitely interested in giving it a go! And, eventually, in making the implementation runtime-agnostic by relying on the Wasm C API (https://github.com/WebAssembly/wasm-c-api) that @losfair mentioned in another issue.

I remember from my morning research that the C dynamic library from the WasmEdge release page was ~50MB, which is quite heavy compared to libwasmtime's 17MB - are you aware of any thinner versions of it?

@juntao

juntao commented Oct 13, 2022

Yes. I believe WasmEdge supports the standard C API -- I will confirm.

The WasmEdge dynamic library really should not be that big. The distribution binary of WasmEdge is only 8MB. I think the large version contains LLVM so that it can do AOT compilation w/o external dependency. Let me double check and revert. Thank you!

@hydai

hydai commented Oct 14, 2022

> I remember from my morning research that the C dynamic library from WasmEdge release page was ~50MB, which is quite heavy compared to libwasmtime's 17 - are you aware of any thinner versions of it?

Hi,
The official release contains the ahead-of-time compiler (with LLVM inside), so it takes more space. However, if you are looking for a tiny version, we have wasmedge/slim-runtime, which is the runtime only, without the compiler inside.

$ file libwasmedge.so.0.0.0
libwasmedge.so.0.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=24d0e767d4f29f65dedc1c334e72e2320b6d391c, stripped
$ size libwasmedge.so.0.0.0
   text	   data	    bss	    dec	    hex	filename
1806540	  45848	    904	1853292	 1c476c	libwasmedge.so.0.0.0
~= 1.8M

@juntao

juntao commented Oct 14, 2022

Thank you @hydai

@psarna I think the 1.8MB WasmEdge runtime library is sufficient for your use case. Developers can compile their functions to regular Wasm in any tool they choose. They can further use the wasmedgec tool to do AOT compiling before submitting the Wasm file to the database engine. The 1.8MB runtime library can handle both cases.

Ref: https://wasmedge.org/book/en/quick_start/run_in_aot_mode.html

Also, we do not yet support the proposed "standard" C API. But it could be supported if there is user demand. :)

@psarna
Collaborator Author

psarna commented Oct 14, 2022

Splendid, thanks guys! 1.8MiB sounds way more aligned with edge use cases indeed, will give it a try

@psarna psarna force-pushed the wasm_runner_poc branch 2 times, most recently from 0bcb106 to cec8c90 Compare October 17, 2022 08:57
@psarna
Collaborator Author

psarna commented Oct 17, 2022

v2:

  • .wat source code is now precompiled to a wasm module during initialization, once
  • each valid wasm function is now dynamically registered as a user-defined function, so it's not recompiled on consecutive executions

Still to do: automated tests

@psarna psarna force-pushed the wasm_runner_poc branch 5 times, most recently from d2626a2 to 66d22d0 Compare October 19, 2022 14:02
@psarna
Collaborator Author

psarna commented Oct 19, 2022

v3:

  • multiple fixes
  • added CREATE FUNCTION statement
  • added DROP FUNCTION statement

TODO:

  • automated tests
  • docs

@psarna psarna force-pushed the wasm_runner_poc branch 4 times, most recently from 1f4e898 to 9b7fa01 Compare October 19, 2022 19:29
@Nelson-He

Great work.
We are also working on using wasm to supply UDFs in the openGauss database.
In our work, we supply an init function that loads the wasm module from a local .wasm or .wat file and parses the file to get information about the exported functions. Users can then look up the exported function signatures directly in the tables we provide.

We also made a demo that runs wasm code in the openGauss database, with a docker image to try it out.
You can find the project and docker image info below.
https://github.com/Nelson-He/openGauss-wasm

Hope we can keep in touch and exchange thoughts further.

@psarna psarna force-pushed the wasm_runner_poc branch 7 times, most recently from c3e0dde to 0745f02 Compare October 20, 2022 12:41
The routine creates the libsql_wasm_func_table table,
responsible for storing WebAssembly source code for dynamically
added Wasm functions.

It will be used to drop functions via the DROP FUNCTION statement.

The table will be created on startup in order to allow
registering Wasm functions dynamically.

The new experimental syntax loosely follows SQL's CREATE FUNCTION.
It still lacks the OR REPLACE keywords, which would allow overriding
an already existing function.

The suite is wrapped in a feature flag, because support for
user-defined functions is opt-in when compiling libSQL.

The new command runs Rust tests with the udf feature enabled,
which assumes that libSQL was compiled with --enable-wasm-runtime.

The previous dynamic lookup of Wasm functions was lazy and only performed
on first use. This is redundant, and the logic is much clearer
when the functions are initialized on startup and when they're registered
dynamically.

The newly covered cases also check operations on strings,
blobs and null.

This document will serve as an entrypoint for various extensions
added to libSQL and not necessarily compatible with SQLite.
Eventually it might grow to become a separate directory.

Before the fix, single quotes were not properly loaded
from the database.

If .init_wasm_func_table is the initial call to the shell,
call open_db() first to initialize the connection.

By accepting compiled wasm blobs as well as .wat files,
we allow skipping the wasm2wat translation and save
some storage, as the function source code is also stored
as a binary blob, which is way more concise.
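One simple way to tell a compiled module from .wat text is the magic number that every WebAssembly binary starts with (the four bytes `\0asm`). The sketch below illustrates that check; whether this series distinguishes the two forms this way is not stated here, and the names `ModuleSource` and `classify` are hypothetical.

```rust
/// How a stored function body should be interpreted.
#[derive(Debug, PartialEq)]
pub enum ModuleSource {
    /// Compiled WebAssembly binary.
    Binary,
    /// Anything else, assumed here to be WebAssembly text (.wat).
    Text,
}

/// Every WebAssembly binary module begins with these four bytes ("\0asm").
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6D];

/// Classify a function body by looking for the binary magic number.
pub fn classify(body: &[u8]) -> ModuleSource {
    if body.starts_with(&WASM_MAGIC) {
        ModuleSource::Binary
    } else {
        ModuleSource::Text
    }
}
```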

The previous ad-hoc solution of registering functions during parsing
was not in line with libSQL layers - execution should happen
in VDBE. Therefore, 2 new opcodes are added for registering
and dropping user-defined functions.

The test verifies that the EXPLAIN command can be successfully
run on CREATE FUNCTION and DROP FUNCTION statements.

The error can be either freely ignored or used in order
to print an error message.

We don't need rowid, as name is already the primary key.

This would allow easier integration with other runtimes
later. The interface only needs two functions right now:
 1. try_instantiate_wasm_function
    responsible for registering a new function dynamically
 2. run_wasm
    responsible for executing given function
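As an illustration of why a two-function interface makes runtimes swappable, here is a Rust sketch in which host code depends only on a trait. The two method names come from the commit message above; their signatures, the `WasmRuntime` trait, and the `StubRuntime` backend (which stores module bodies but does not execute them, and simply returns the sum of the arguments) are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical runtime-agnostic interface; signatures are assumed.
pub trait WasmRuntime {
    /// Register a new function dynamically from its module body.
    fn try_instantiate_wasm_function(&mut self, name: &str, body: &[u8]) -> Result<(), String>;
    /// Execute a previously registered function.
    fn run_wasm(&self, name: &str, args: &[i64]) -> Result<i64, String>;
}

/// Stub backend for illustration only: it does NOT run WebAssembly.
/// Every registered "function" returns the sum of its arguments.
#[derive(Default)]
pub struct StubRuntime {
    modules: HashMap<String, Vec<u8>>,
}

impl WasmRuntime for StubRuntime {
    fn try_instantiate_wasm_function(&mut self, name: &str, body: &[u8]) -> Result<(), String> {
        if body.is_empty() {
            return Err(format!("empty module body for function {}", name));
        }
        self.modules.insert(name.to_string(), body.to_vec());
        Ok(())
    }

    fn run_wasm(&self, name: &str, args: &[i64]) -> Result<i64, String> {
        if !self.modules.contains_key(name) {
            return Err(format!("no such function: {}", name));
        }
        Ok(args.iter().sum())
    }
}

/// Host-side code sees only the trait, so another backend can be dropped in.
pub fn register_and_call(
    rt: &mut dyn WasmRuntime,
    name: &str,
    body: &[u8],
    args: &[i64],
) -> Result<i64, String> {
    rt.try_instantiate_wasm_function(name, body)?;
    rt.run_wasm(name, args)
}
```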

Instead of binding to the Wasmtime C API library, the support
is now moved entirely to Rust, with only the minimal set
of C-compatible functions exported to be callable from libSQL
main code.

The new code produces a libwblibsql.so dynamic library
which contains the implementation of all functions required
by our ext/udf/wasm_bindings.h header.
The stripped library weighs 6.4MiB, which is quite heavy,
but already much better than Wasmtime's default C API library,
which weighed ~17MiB.

Why not - users may prefer to create a static
binary without having to worry about library paths.

Dynamic linking translates to smaller binaries, but static linking
makes it more ergonomic to quickly try the shell, so let's go with
static by default.

The Dockerfile can be used to build a container with a precompiled
sqlite3 shell inside, with WebAssembly user-defined function support.
@penberg penberg merged commit fd2aae3 into tursodatabase:main Nov 21, 2022
@marcobambini

Is there any plan to support SQLite aggregate or window functions?
If yes can you please share more details?

@psarna
Collaborator Author

psarna commented Jun 7, 2023

It's all already possible via the C API (https://www.sqlite.org/c3ref/create_function.html), but we don't have any support for the SQL syntax (e.g. CREATE AGGREGATE or CREATE WINDOW FUNCTION). It would be very nice to have, but it's not on our immediate roadmap.

@psarna
Collaborator Author

psarna commented Jun 7, 2023

That said, contributions are most welcome!!! I'd be glad to help/guide if need be

MarinPostma added a commit that referenced this pull request Oct 17, 2023
45: Full support for query parameters r=penberg a=MarinPostma

This PR introduces full support for query parameters. Both positional and named parameters are supported.

The supported syntax is the same as the one described in https://www.sqlite.org/c3ref/bind_blob.html.
Unbound parameters are interpreted as NULL.

## HTTP query parameters

Parameters can also be bound in HTTP requests. The syntax is quite flexible:

* Request without params:
```json
{
    "statements": ["select * from users where name = 'adhoc'"]
}
```
or (syntaxes can be mixed in the same array):
```json
{
    "statements": [{"q": "select * from users where name = 'adhoc'"}]
}
```

* Request with params:
- positional:
```json
{
    "statements": [
        {"q": "select * from users where name = ?", "params": ["adhoc"]},
        {"q": "select * from users where name = ?1", "params": ["adhoc"]},
        {"q": "select * from users where name = $1", "params": ["adhoc"]}
    ]
}
```

- named:
```json
{
    "statements": [
        {"q": "select * from users where name = $name", "params": {"name": "adhoc"}},
        {"q": "select * from users where name = :name", "params": {"name": "adhoc"}},
        {"q": "select * from users where name = @name", "params": {"name": "adhoc"}},
        {"q": "select * from users where name = $1", "params": {"name": "adhoc"}},
        {"q": "select * from users where name = ?", "params": {"name": "adhoc"}}
    ]
}
```
In the `$1` statement, where a named-parameter object is bound to a positional placeholder, the object is order sensitive.

### Handling of BLOB

BLOBs are handled as base64-encoded strings (standard alphabet, no padding) and are nested in an object for disambiguation. In the example below, the value under `"blob"` is the base64 payload:
```json
{
    "statements": [
        {"q": "select * from users where name = $name", "params": {"name": {"blob": "984HG3e"}}}
    ]
}
```
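For clients that need to produce such a value, here is a small dependency-free Rust sketch of base64 encoding with the standard alphabet and no padding, plus a helper that wraps the result in the `{"blob": ...}` object shown above. The function names `encode_blob` and `blob_param` are hypothetical; in practice a base64 library would usually be used instead.

```rust
/// Standard base64 alphabet (RFC 4648, section 4).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Base64-encode `data` with the standard alphabet and without '=' padding.
pub fn encode_blob(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() * 4 + 2) / 3);
    for chunk in data.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling missing bytes.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        // Emit the third and fourth characters only if the input bytes exist,
        // which is what "no padding" amounts to.
        if chunk.len() > 1 {
            out.push(ALPHABET[(n >> 6) as usize & 63] as char);
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[n as usize & 63] as char);
        }
    }
    out
}

/// Build the JSON object used to disambiguate a BLOB parameter.
pub fn blob_param(data: &[u8]) -> String {
    format!("{{\"blob\": \"{}\"}}", encode_blob(data))
}
```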


Co-authored-by: ad hoc <postma.marin@protonmail.com>
Successfully merging this pull request may close these issues.

  • Implement CREATE FUNCTION for dynamic user-defined function creation
  • Implement a WebAssembly runner function