wasm: reuse instances for wasm UDFs #10306

Merged
merged 3 commits into scylladb:master on Aug 2, 2022

Conversation

@wmitros (Contributor) commented Mar 31, 2022

Calling a WebAssembly UDF requires a wasmtime instance. Creating such an instance is expensive, but instances can be reused for subsequent calls of the same UDF on different inputs.

This patch introduces a way of reusing wasmtime instances: a wasm instance cache. The cache stores a wasmtime instance for each UDF and scheduling group. The instances are evicted using an LRU strategy, and their size is based on the size of their wasm memories.

The instances stored in the cache are also dropped when the UDF itself is dropped. For that reason, the first patch modifies the current implementation of UDF dropping so that instance dropping can be added later. That patch also removes the need to compile the UDF again when dropping it.

The second patch contains the implementation and use of the new cache. The cache is implemented in lang/wasm_instance_cache.hh, and the main entry points are the run_script methods in wasm.hh.

The third patch adds tests to test_wasm.py that check the correctness and performance of the new cache. The tests confirm instance reuse, size limits, and instance eviction after a timeout and after dropping the UDF.
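
A rough sketch of the cache shape this describes (type and member names are illustrative guesses, not the actual code in lang/wasm_instance_cache.hh):

#include <optional>
#include <seastar/core/scheduling.hh>
#include "cql3/functions/function_name.hh"
#include "lang/wasmtime.hh"

// Hypothetical sketch: one cached instance per (UDF, scheduling group),
// evicted by LRU, with the entry's size derived from its wasm memory.
struct instance_cache_key {
    cql3::functions::function_name udf;
    seastar::scheduling_group sg;
};

struct cached_wasm_instance {
    wasmtime::Store store;
    wasmtime::Instance instance;
    wasmtime::Func func;
    size_t memory_size_bytes; // drives LRU/size-based eviction
};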

@wmitros wmitros requested a review from nyh as a code owner March 31, 2022 11:14
@wmitros wmitros changed the title wasm: reuse instances wasm: reuse instances for wasm UDFs Mar 31, 2022
// auxiliary variables for reusing instances of wasm UDFs
std::optional<wasmtime::Store> _store;
std::optional<wasmtime::Instance> _instance;
std::optional<wasmtime::Func> _func;
Member

I now think this is the wrong place. First, it entangles wasm into a more generic context. Second, it doesn't allow reuse in single-row queries.

We should have a cache of instances, with eviction based on idle time or memory use. Execution should remove an entry from the cache, execute the function, and insert it into the cache again.
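
A minimal sketch of the check-out/execute/check-in flow being suggested (the cache API here is hypothetical):

// Hypothetical pattern: take the instance out of the cache, run the UDF
// with it, then put it back so later calls can reuse it.
auto instance = cache.remove(key);          // or create one on a cache miss
auto result = execute_udf(instance, args);  // run the function
cache.insert(key, std::move(instance));     // re-insert; evict by idle time or memory use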

// auxiliary variables for reusing instances of wasm UDFs for the scalar function
std::optional<wasmtime::Store> _store;
std::optional<wasmtime::Instance> _instance;
std::optional<wasmtime::Func> _func;
Member

Having a cache at the wasm level means we don't need to repeat the code for UDF/UDA.

@wmitros (Contributor Author) commented Apr 7, 2022

I've added a simple cache of wasm instances to the user_function class. However, I didn't add a time-based eviction strategy yet, and somewhat arbitrarily chose 16MB as the max size of reusable wasm memory and 8 as the max number of cached instances. If there are some arguments for certain values of these variables, I'll be happy to hear them

@psarna psarna self-requested a review April 7, 2022 13:05
@psarna (Contributor) left a comment

Looks good, but please collapse the queue trio into a single queue holding a struct

// wasm UDF instance cache, allowing maximum concurrency
std::queue<wasmtime::Store> _store;
std::queue<wasmtime::Instance> _instance;
std::queue<wasmtime::Func> _func;
Contributor

I think that instead of 3 queues we just need a single queue which keeps a struct. And this struct should wrap _store, _instance and _func. Am I correct that we always use all three? In this case it makes perfect sense to only use a single queue. That would make the call site look nicer as well.
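
Something like the following single-queue layout is what is being asked for (names are placeholders, not the final code):

#include <queue>
#include "lang/wasmtime.hh"

// One struct wrapping the three objects that are always used together,
// held in a single queue instead of three parallel queues.
struct wasm_instance {
    wasmtime::Store store;
    wasmtime::Instance instance;
    wasmtime::Func func;
};
std::queue<wasm_instance> _instances;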

Member

The cache has to be shard-global, otherwise many distinct functions can eat all shard memory.

auto memory_export = instance.get(store, "memory");
if (!memory_export) {
throw wasm::exception("memory export not found - please export `memory` in the wasm module");
}
Member

We shouldn't check this every time.
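
One way to avoid the per-call check, sketched under the assumption that the memory export can be resolved once when the instance is created and then kept with it:

// Hypothetical: look up the memory export once, at instance-creation time,
// and keep the handle alongside the cached instance for later size queries.
auto memory_export = instance.get(store, "memory");
if (!memory_export) {
    throw wasm::exception("memory export not found - please export `memory` in the wasm module");
}
wasmtime::Memory memory = std::get<wasmtime::Memory>(*memory_export);
// ... store `memory` in the cached entry instead of re-resolving it on every call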

throw wasm::exception(format("Exported object {} is not a function", ctx.function_name));
}
_func.push(std::move(*fnc));
}
Member

Better to move this to a separate function for clarity.

auto memory = std::get<wasmtime::Memory>(*memory_export);
if (memory.size(store) < 256 && _store.size() < 8) {
// reuse the instance if the memory used is less than 256 pages (16MB)
// TODO: also evict the instances if not used for a while
Member

See loading_cache.hh

@@ -948,3 +965,6 @@ def test_word_double(cql, test_keyspace, table1, scylla_with_wasm_only):
cql.execute(f"INSERT INTO {table} (p, txt) VALUES (1001, 'cat42')")
res = [row for row in cql.execute(f"SELECT {test_keyspace}.{dbl_name}(txt) AS result FROM {table} WHERE p = 1001")]
assert len(res) == 1 and res[0].result == 'cat42cat42'

Member

Need tests to verify reuse. You can add a metric for function creations.

Contributor

Metrics are a great idea, for the sake of future benchmarking too - e.g. we could verify how many times functions were compiled vs run, how many wasm-induced allocations were performed, and so on.

Member

It's the same idea behind prepared statements and their metrics.

Contributor

One frustrating problem with tests that use global metrics is that they cannot be run reliably in parallel against the same instance of Scylla. It's not a real problem currently (we don't run tests in parallel), so maybe there's no reason to be apprehensive about it. However, note that there is a reliable way to have metrics in parallel tests - it is to use per-table metrics. It turns out (!) we also have a way to extract per-table metrics via the rest api - see rest_api.py::get_column_family_metric. We don't use this trick enough in tests. We should.

@psarna (Contributor) commented Apr 22, 2022

What's the status on this one? Is another iteration expected?

@wmitros (Contributor Author) commented May 11, 2022

The new version depends on #10541; please comment on the loading_cache changes there.

@avikivity (Member) commented:

> This patch depends on #10234, so it concerns only the last 2 commits.

> These changes optimize the use of the wasmtime runtime, by reducing the number of times an instance is set up for a UDF call to 1. The instance is created when building a selector for the function, and later used for executing the function for each row.

Does it work across different queries? I expect so but it isn't clear from the message.

> In the case of an UDA, the optimization is similar, but we only pre-process the instance for its scalar function, which is done when constructing impl_user_aggregate. The final function is called only once, so it does not benefit from creating an instance for it separately.

That actually agrees with the comment before and disagrees with my expectation. But why not reuse across queries?

Note that if you use a UDA on three-row partitions, then you'll have to create an instance every three rows.

> A test for wasm UDF for selecting multiple rows and a test for wasm UDA are added to test_wasm.py

bool operator<(const function_name& x) const {
return keyspace < x.keyspace || ( keyspace == x.keyspace && name < x.name);
}

Member

How about: std::strong_ordering operator<=>(const function_name&) = default;

Contributor Author

Looks good, I'll add it
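
For reference, a sketch of the suggested replacement (it assumes function_name has only the keyspace and name members and that both support three-way comparison; otherwise the defaulted operator would be deleted):

#include <compare>

struct function_name {
    sstring keyspace;
    sstring name;
    // Replaces the hand-written operator<; also provides ==, <=, > etc.
    // Note the const qualifier, which defaulted comparisons require.
    std::strong_ordering operator<=>(const function_name&) const = default;
};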

@@ -7,6 +7,8 @@
*/

#include "user_function.hh"
#include "lang/udf_cache.hh"
#include "lang/wasmtime.hh"
Member

Let's work toward an interface to engines, so that we only include the abstract engine here, not every runtime. Not high priority, but keep it in mind.

@@ -58,7 +60,40 @@ bytes_opt user_function::execute(cql_serialization_format sf, const std::vector<
},
[&] (wasm::context& ctx) {
try {
return wasm::run_script(ctx, arg_types(), parameters, return_type(), _called_on_null_input).get0();
auto func_cache = ctx.cache;
auto [func_inst, return_promise] = func_cache->get(name(), [this, &ctx] () {
Member

Can we use the get() variant that doesn't accept a function (and uses the function from the constructor instead)? I want to get rid of the lambda variant.

Member

It will also be nice to extract this to a separate function regardless.

Contributor Author

I'll try doing that

@@ -0,0 +1,154 @@
/*
*/

Member

?

Contributor Author

Seems like I've copied a corrupted copyright clause (I see it appears in other files too), but I'll fix it here

#include "utils/overloaded_functor.hh"
#include "wasmtime.hh"
#include "seastar/core/shared_ptr.hh"
#include "seastar/core/scheduling.hh"
Member

<seastar/

Contributor Author

Ok


template <typename LoadFunc>
requires std::is_invocable_r_v<value_type, LoadFunc>
std::pair<value_type, seastar::promise<value_type>> get(const key_type& key, LoadFunc&& load) {
Member

I see you have a specialized wrapper here, so regardless of loading_cache issues, you can move LoadFunc here and absolve the caller from having to supply it. It can supply the extra parameters LoadFunc needs as regular parameters.

Member

Why does it return both a value_type and a promise<value_type>?

Member

Ah, is that for returning the instance? I think it should just call put(), future/promise isn't needed for synchronous communication.

Contributor Author

> I see you have a specialized wrapper here, so regardless of loading_cache issues, you can move LoadFunc here and absolve the caller from having to supply it. It can supply the extra parameters LoadFunc needs as regular parameters.

I'll do that

> Ah, is that for returning the instance? I think it should just call put(), future/promise isn't needed for synchronous communication.

Yes, it's for returning. We don't have to use the futures now, but when we add preemption to UDF calls, I think we will need them.
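
A sketch of the simpler get()/put() split discussed above (signatures hypothetical):

// Hypothetical simplification: get() returns a ready instance (loading one
// via the LoadFunc supplied at construction on a miss), and put() re-inserts
// it after the call, with no promise needed for the synchronous hand-back.
value_type get(const key_type& key);
void put(const key_type& key, value_type instance);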

size_t _max_size;

public:
udf_cache(size_t size)
Member

explicit

Contributor Author

Ok

@@ -336,6 +339,7 @@ def test_f64_param(cql, test_keyspace, table1, scylla_with_wasm_only):
(table (;0;) 1 1 funcref)
(table (;1;) 32 externref)
(memory (;0;) 17)
(export "memory" (memory 0))
Member

Why these changes?

Contributor Author

We need to export memory to be able to check the instance size (we're using the size of the memory as the size of the instance in the cache)

Member

What happens if the instance doesn't export memory? Does it have no memory, or is it just not reachable?

Contributor Author

As we've discussed on the sync, the instance may have no memory and store everything on the stack

Member

but why did this become important when we started to reuse instances?

Member

Ah, for sizing. Maybe it's better for the sizer to assume a fixed size in that case (for all the overhead).

Contributor Author

Do you mean a fixed size for instances that do not export memory? That could work, but it's hard to come up with a reasonable default size. Maybe it will be easier to decide after we do some testing

Member

ok

return _stats;
}

void setup_metrics() {
namespace sm = seastar::metrics;
_metrics.add_group("user functions", {
Member

No spaces in metric names.

Contributor Author

Ok

void setup_metrics() {
namespace sm = seastar::metrics;
_metrics.add_group("user functions", {
sm::make_derive("udf_hits", wasm::udf_cache::shard_stats().cache_hits,
Member

Repeats "udf" and forgets to mention "cache"

Contributor Author

Ok
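
A sketch of how the registration could look once both naming comments are addressed (the exact group and metric names here are illustrative, not the final ones):

#include <seastar/core/metrics.hh>

void setup_metrics() {
    namespace sm = seastar::metrics;
    _metrics.add_group("user_functions", {   // no spaces in the group name
        sm::make_derive("cache_hits", wasm::udf_cache::shard_stats().cache_hits,
                        sm::description("Number of wasm UDF instance cache hits")),
    });
}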

@wmitros (Contributor Author) commented May 11, 2022

> This patch depends on #10234, so it concerns only the last 2 commits.
> These changes optimize the use of the wasmtime runtime, by reducing the number of times an instance is set up for a UDF call to 1. The instance is created when building a selector for the function, and later used for executing the function for each row.

> Does it work across different queries? I expect so but it isn't clear from the message.

> In the case of an UDA, the optimization is similar, but we only pre-process the instance for its scalar function, which is done when constructing impl_user_aggregate. The final function is called only once, so it does not benefit from creating an instance for it separately.

> That actually agrees with the comment before and disagrees with my expectation. But why not reuse across queries?

> Note that if you use a UDA on three-row partitions, then you'll have to create an instance every three rows.

> A test for wasm UDF for selecting multiple rows and a test for wasm UDA are added to test_wasm.py

These comments referred to the first version, which is no longer here. The instances can be reused across queries; I've added a check to the tests to confirm that (although it's not optimal to test this using only metrics).

@wmitros (Contributor Author) commented Jul 18, 2022

The last rebase adds a new strategy for evicting entries from the cache. An entry that has been created is only completely removed when the corresponding UDF is dropped. Until then, it stores a wasm instance only when it is in use by some queries, but it stores the function name, the function signature and a seastar::shared_mutex the entire time.
A test is added for checking whether the entries are actually dropped, using a new metric.
This change also allowed the solution suggested in #10306 (comment).
It's worth noting that the wasm instances are now not removed from the cache while they're in use. Instead, to determine whether an instance is in use, we only rely on whether the mutex is locked.
I've also changed the lru iterator in cache_entry_type to _lru.end(), so that it's always valid.
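
A rough sketch of what such an entry could hold under this scheme (member names and types are guesses, not the actual cache_entry_type):

#include <optional>
#include <vector>
#include <seastar/core/shared_mutex.hh>

// Hypothetical entry layout: the identifying data and the mutex live as long
// as the entry; the instance is only present while it is cached, and the LRU
// iterator stays valid by pointing at _lru.end() when the entry is off the list.
struct cache_entry_type {
    cql3::functions::function_name name;
    std::vector<data_type> arg_types;          // the function signature
    seastar::shared_mutex mutex;               // locked while a query is using the instance
    std::optional<wasm_instance> instance;     // absent after eviction
    lru_list_type::iterator lru_it;            // _lru.end() when not linked
};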

#include "wasmtime.hh"
#include "lang/wasm.hh"
#include <exception>
#include <seastar/core/metrics.hh>
Member

It's enough to use metrics_registration.

Please see if you can trim the #include list here. Header (.hh) files should try to reduce their dependencies.

Contributor Author

It should be shorter now

#include <seastar/core/timer.hh>
#include <seastar/util/defer.hh>
#include "utils/hash.hh"
#include "utils/overloaded_functor.hh"
Member

and deduplicate it.

Contributor Author

ok


#include "cql3/functions/function_name.hh"
#include "utils/overloaded_functor.hh"
#include "cql3/prepared_statements_cache.hh"
Member

Do we need this here?

Contributor Author

Looks like we don't, I deleted this

// sizes in wasm pages (16KiB)
size_t _total_size = 0;
size_t _max_size;

Member

Better to count size in bytes to avoid confusion later on.

Contributor Author

Good idea, especially because the page size was incorrect in the comment.
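
For reference, a WebAssembly linear-memory page is 64KiB, so byte-based accounting would look roughly like this (sketch only):

// wasmtime reports memory size in 64KiB wasm pages; convert to bytes so the
// cache's _total_size/_max_size accounting is unambiguous.
constexpr size_t wasm_page_size_bytes = 64 * 1024;
size_t instance_size_bytes = memory.size(store) * wasm_page_size_bytes;
_total_size += instance_size_bytes;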

@avikivity (Member) commented:

Looks good, I'll go review that first commit.

@avikivity (Member) commented:

I see #10234 is merged, so I don't understand where the first commit belongs.

@wmitros (Contributor Author) commented Jul 19, 2022

The last rebase has only minor cleanup changes, but one of them is a rewrite of the instance_cache class description, so you can take a look if there's something wrong or missing.

> I see #10234 is merged, so I don't understand where the first commit belongs.

I don't really see a connection to #10234 in the first commit. That commit, while beneficial on its own, is not necessary for this patch (still, it is useful here). Perhaps it would be better as a separate PR, but I'd prefer not to create another patch that needs to be merged before this one.

@avikivity (Member) commented:

> The last rebase has only minor cleanup changes, but one of them is a rewrite of the instance_cache class description so you can take a look if there's something wrong or missing

> I see #10234 is merged, so I don't understand where the first commit belongs.

> I don't really see the correlation to #10234 in the first commit. This commit, while beneficial on its own, is not necessary for this patch (still, it is useful here). Perhaps it would be better if I made it into another PR, but I'd prefer not to make another patch that needs to be merged before this one

The cover letter says

> This patch depends on #10234, so it concerns only the last 2 commits.

Since there are three commits, I assumed the first one is related to #10234.

@wmitros wmitros force-pushed the wasm-instances branch 2 times, most recently from 696403a to b947853 Compare July 20, 2022 14:10
@wmitros (Contributor Author) commented Jul 20, 2022

The last rebases fix conflicts with master and update some commit messages.

> This patch depends on #10234, so it concerns only the last 2 commits.

> Since there are three commits, I assumed the first one is related to #10234.

I see, the cover letter definitely needed updating - it should be fixed now

row.get_nonnull<sstring>("keyspace_name"), row.get_nonnull<sstring>("function_name")};
auto arg_types = read_arg_types(db, row, name.keyspace);
return std::make_pair(std::move(name), std::move(arg_types));
}
Member

The function is called drop, but it doesn't drop anything.

Contributor Author

That drop function is removed in the rebase.

@@ -1697,6 +1697,13 @@ static std::vector<data_value> read_arg_values(const query::result_set_row& row)
}
#endif

static std::pair<cql3::functions::function_name, std::vector<data_type>> drop_func(replica::database& db, const query::result_set_row& row) {
Member

struct function_signature { ... }

Contributor Author

This should not be necessary anymore in the last rebase.

auto arg_types = read_arg_types(db, row, name.keyspace);
return std::make_pair(std::move(name), std::move(arg_types));
}

static shared_ptr<cql3::functions::user_function> create_func(replica::database& db, const query::result_set_row& row) {
Member

This one actually does create something. So better rename drop_func() to reflect what it does.

Member

Or change it to actually drop.

Contributor Author

I've changed it so the drop function is actually added in the same patch as the cache, and it does what it says.

@avikivity (Member) commented:

> The last rebases fix conflicts with master and update some commit messages.

> This patch depends on #10234, so it concerns only the last 2 commits.

> Since there are three commits, I assumed the first one is related to #10234.

> I see, the cover letter definitely needed updating - it should be fixed now

Ok, so I reviewed it too.

Currently, we have 2 merge_functions methods, where one is the only
caller of the other. We can replace them with a single one.

The merge_functions method compiles a UDF (using create_func) only to
read its signature. We can avoid that by reading it from the row ourselves.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
When executing a wasm UDF, most of the time is spent on
setting up the instance. To minimize this cost, we reuse
the instance using wasm::instance_cache.

This patch adds a wasm instance cache that stores
a wasmtime instance for each UDF and scheduling group.
The instances are evicted using an LRU strategy. The
cache may keep some entries for a UDF after evicting
its instance, but those are removed when the corresponding
UDF is dropped, which greatly limits their number.

The size of stored instances is estimated using the size
of their WASM memories. In order to be able to read the
size of the memory, we require that the memory is exported
by the client.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
Add a test for a wasm aggregate function
which uses the new metrics to check that the cache has
been hit at least once.

Also check that the cache gets reused across different
queries, by testing that the number of queries is
higher than the number of cache misses.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
@wmitros (Contributor Author) commented Jul 20, 2022

> Since there are three commits, I assumed the first one is related to #10234.

> I see, the cover letter definitely needed updating - it should be fixed now

> Ok, so I reviewed it too.

Thanks. The patch didn't look that good after review, so I tried a different approach, made possible by the fact that after 5a30f9b from yesterday the paths for creating/dropping UDFs and UDAs are now separate (we only wanted to change the UDF path). The same result is achieved, but now it's more straightforward.

@wmitros (Contributor Author) commented Aug 2, 2022

There weren't many issues with the version before the last rebase, so if the last changes look good maybe we can merge this @avikivity @psarna

@wmitros (Contributor Author) commented Aug 2, 2022

Actually, this does not compile if we don't have wasmtime (it probably hasn't since the start of this PR).
The entire contents of lang/wasm_instance_cache.cc and lang/wasm_instance_cache.hh need to be under #ifdef SCYLLA_ENABLE_WASMTIME.
Hopefully we can add this as a follow-up.
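
The follow-up mentioned here would amount to something like the following guard around the new files (sketch only):

// lang/wasm_instance_cache.hh (and the matching .cc): compile the cache only
// when wasmtime support is enabled.
#ifdef SCYLLA_ENABLE_WASMTIME

// ... instance cache declarations ...

#endif // SCYLLA_ENABLE_WASMTIME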

@scylladb-promoter scylladb-promoter merged commit 268e4ab into scylladb:master Aug 2, 2022