[emval] Move reference counting to C++ #20447

RReverser · 2023-10-12T23:55:01Z

C++ is super eager with copying values by default ~~and asking questions later~~ instead of moving, and each of those copies had to cross the JS boundary on creation/deletion just to increase/decrease the reference count.

Moving the reference count from JS to C++ makes all this copying substantially cheaper: it becomes a single integer increment/decrement on the Wasm side which, in most cases, LLVM can eliminate completely.

This does mean that we need to heap-allocate a small object with the value's metadata on the C++ side, and I know it can be a controversial change, but:

- We had to do that on the JS side as well, with the extra object wrapper around the value itself, and I'm putting my bet on the C++ allocation being somewhat cheaper.
- With some relatively minor tricks, we can convince LLVM to optimise most of those allocations away for the common case of local values, just like refcounts. I'm even going as far as adding a test to ensure we don't regress on those optimisations in the future.
- This prevents implicit copies from triggering the cross-thread access assertion.
- It prepares better ground for reference types and for doing all handle management on the Wasm side.
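As a rough illustration of the idea (not the actual emval code), the sketch below keeps the reference count in a small malloc'd metadata object on the C++ side, so copies only touch an integer. The names `Metadata`, `Value` and `g_boundary_calls` are hypothetical, and the counter merely simulates calls into JS.

```cpp
#include <cstdlib>
#include <utility>

// Hypothetical stand-in for the JS side: counts simulated boundary crossings.
static int g_boundary_calls = 0;

// Hypothetical metadata layout, not the real emval one.
struct Metadata {
  int handle;
  unsigned refcount;
};

class Value {
 public:
  explicit Value(int handle)
      : meta_(static_cast<Metadata*>(std::malloc(sizeof(Metadata)))) {
    if (!meta_) std::abort();  // out of memory
    meta_->handle = handle;
    meta_->refcount = 1;
    ++g_boundary_calls;  // creating the JS-side value crosses the boundary once
  }

  // Copying is a plain integer increment; no boundary crossing.
  Value(const Value& other) : meta_(other.meta_) { ++meta_->refcount; }

  // Moving steals the pointer and leaves the source empty.
  Value(Value&& other) noexcept : meta_(std::exchange(other.meta_, nullptr)) {}

  Value& operator=(const Value&) = delete;  // omitted to keep the sketch short

  ~Value() {
    if (meta_ && --meta_->refcount == 0) {
      ++g_boundary_calls;  // releasing the JS-side value crosses once
      std::free(meta_);
    }
  }

  unsigned use_count() const { return meta_ ? meta_->refcount : 0; }

 private:
  Metadata* meta_;
};

// Creates one value, copies it three times, destroys everything, and
// returns the number of simulated boundary crossings.
int crossings_for_three_copies() {
  g_boundary_calls = 0;
  {
    Value a(42);
    Value b(a);
    Value c(b);
    Value d(c);
    (void)d;
  }
  return g_boundary_calls;
}
```

In this sketch the crossings happen only when the first owner is created and when the last owner is destroyed, regardless of how many copies exist in between.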

Note that reference counting and allocation optimisation for local values only applies to speed modes (-O1 and higher) and not to -Os where LLVM avoids inlining functions, but microbenchmarks show that even in situations where copies can't be eliminated, this approach is still faster:

benchmarks.cpp

#include <emscripten.h>
#include <emscripten/val.h>
#include <stdio.h>

using namespace emscripten;

EM_JS(void, sideEffect, (EM_VAL v), {
  globalThis.sideEffect = Emval.toValue(v);
});

template<typename Func> void measure(const char* name, Func&& func) {
  // preheat
  for (int i = 0; i < 1'000'000; i++) {
    sideEffect(func().as_handle());
  }
  double time = emscripten_get_now();
  // measure
  int count = 100'000'000;
  for (int i = 0; i < count; i++) {
    sideEffect(func().as_handle());
  }
  time = emscripten_get_now() - time;
  printf("%s: %d ns\n", name, (int)(1e6 * time / count));
}

int main() {
  measure("Allocation cost", []() { return val(42); });

  measure("Copying cost", [v = val(42)]() { return val(v); });
}

Before (timings and JS+Wasm byte sizes for different modes):

-Os:

Allocation cost: 61 ns
Copying cost: 20 ns
43164

-O1:

Allocation cost: 60 ns
Copying cost: 21 ns
82809

-O2:

Allocation cost: 62 ns
Copying cost: 23 ns
44955

After:

-Os:

Allocation cost: 68 ns
Copying cost: 11 ns
43022

-O1:

Allocation cost: 49 ns
Copying cost: 8 ns
82431

-O2:

Allocation cost: 46 ns
Copying cost: 8 ns
44746

The only small slowdown is for initial allocation & freeing in -Os mode, but that is typically less common than copying, and it comes in exchange for a small code size win, which is, well, the point of that mode anyway.
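To put the numbers above side by side, here is a small sketch that computes before/after ratios from the timings quoted in this description. The `Row` struct and function names are made up for illustration; only the nanosecond figures come from the benchmark output.

```cpp
#include <cstdio>

// One row per optimisation mode, timings in nanoseconds as reported above.
struct Row {
  const char* mode;
  int alloc_before, alloc_after;
  int copy_before, copy_after;
};

constexpr Row kRows[] = {
    {"-Os", 61, 68, 20, 11},
    {"-O1", 60, 49, 21, 8},
    {"-O2", 62, 46, 23, 8},
};

// A ratio above 1 means the "after" build is faster.
constexpr double speedup(int before_ns, int after_ns) {
  return static_cast<double>(before_ns) / after_ns;
}

// Prints one line per mode with the allocation and copying ratios.
void print_speedups() {
  for (const Row& r : kRows) {
    std::printf("%s: allocation x%.2f, copying x%.2f\n", r.mode,
                speedup(r.alloc_before, r.alloc_after),
                speedup(r.copy_before, r.copy_after));
  }
}
```

By these ratios, copying gets roughly 1.8x to 2.9x faster across the three modes, while allocation in -Os is the only case with a ratio below 1.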

This makes it possible to completely optimise away the extra allocation / wrapper when values are local, and generally avoids crossing the JS<>Wasm boundary as often.

This reduces size of class val to a single pointer, and still gets optimised out as an allocation.

This makes it possible to completely eliminate statements like `val v;`, as the compiler can statically prove it doesn't need _emval_free.

C++ doesn't allow elimination of new/delete pairs, but in C malloc/free pairs can be and are eliminated by LLVM. In theory Clang's __builtin_operator_{new,delete} could also help here, but in my experiments it wasn't optimised out while malloc/free still were. I don't think we care too much about calling into the user's overrides in this class, and the optimisation is valuable enough, so I'm just using the standard C functions.
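The pattern in question looks roughly like the sketch below: metadata is allocated with `malloc`, used locally, and released with `free`, with an explicit check for allocation failure. `LocalMeta` and `local_value_roundtrip` are hypothetical names; whether the pair is actually removed depends on the compiler and optimisation level, so this only shows the shape of code the optimiser gets to work with.

```cpp
#include <cstdlib>

// Hypothetical metadata object for a local value.
struct LocalMeta {
  int handle;
  unsigned refcount;
};

// Allocates metadata, uses it, and frees it again without letting the
// pointer escape. Because malloc/free are well-known functions, an
// optimising compiler is permitted to drop the pair entirely here.
int local_value_roundtrip(int handle) {
  auto* meta = static_cast<LocalMeta*>(std::malloc(sizeof(LocalMeta)));
  if (meta == nullptr) {
    return -1;  // out-of-memory path
  }
  meta->handle = handle;
  meta->refcount = 1;

  int result = meta->handle;

  // Last reference goes away: release the metadata.
  if (--meta->refcount == 0) {
    std::free(meta);
  }
  return result;
}
```

Inspecting the generated code at -O2 (for example with a disassembler) is one way to check whether the allocation survived in a given toolchain.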

This reverts commit e15a9f7.

RReverser · 2023-10-13T03:12:10Z

Marking as draft for now as there's one unresolved issue.

RReverser · 2023-10-13T20:10:49Z

Hm the remaining issues might be easier to solve after #20383 and another planned subsequent PR. I'll leave this as a draft for now.

mrolig5267319 · 2024-02-07T20:22:10Z

Have you considered using the same array approach on the C++ side for refcount as the JS-side? That is, having a single vector<val_metadata> with refcounts indexed by handle, such that most new handles won't require an allocation?

Ah, because in multi-threaded mode you could have handle collisions between threads? Could have an array per thread.

RReverser · 2024-02-07T23:52:05Z

> Have you considered using the same array approach on the C++ side for refcount as the JS-side?

Yes, I played with it, but that prevents the compiler optimisations that completely eliminate allocation/free pairs, which are extremely valuable as most Embind values are temporary. Compilers are happy to optimise malloc/free pairs because those are simple, well-known APIs, but they can't see through all the mechanics of something as complex as a vector (it has its own backing storage that can move around memory on reallocs).
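For comparison, a simplified sketch of the table-based alternative discussed here: reference counts live in one growable vector indexed by handle. `RefcountTable` is a hypothetical name and stores only a count rather than full metadata. The point is that every access goes through shared storage that may be reallocated as the table grows, which is much harder for an optimiser to reason away than an isolated malloc/free pair.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical table of reference counts indexed by handle.
class RefcountTable {
 public:
  // Registers a new handle with an initial count of 1.
  std::size_t add() {
    counts_.push_back(1);
    return counts_.size() - 1;
  }
  void incref(std::size_t handle) { ++counts_[handle]; }
  // Returns true when the last reference was dropped.
  bool decref(std::size_t handle) { return --counts_[handle] == 0; }
  unsigned count(std::size_t handle) const { return counts_[handle]; }
  std::size_t capacity() const { return counts_.capacity(); }

 private:
  std::vector<unsigned> counts_;
};

// Fills the table past its current capacity and reports whether the
// backing storage had to grow, which implies a reallocation.
bool table_grows_when_full() {
  RefcountTable table;
  table.add();
  const std::size_t before = table.capacity();
  for (std::size_t i = 0; i < before; ++i) {
    table.add();  // size ends up at before + 1
  }
  return table.capacity() > before;
}
```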

mrolig5267319 · 2024-02-08T06:19:17Z

src/embind/embind.js

-      'fromWireType': (handle) => {
-        var rv = Emval.toValue(handle);
-        __emval_decref(handle);
-        return rv;
-      },
+      'fromWireType': (handle) => Emval.toValue(handle),


Sorry if I'm missing something obvious, but how do you get away with removing the incref/decref from the C++ toWireType and JS fromWireType? Won't the C++ destructor call free before the JS fromWireType reads Emval.toValue?

RReverser requested review from sbc100 and brendandahl October 12, 2023 23:55

RReverser changed the title ~~Move reference counting to C++~~ [emval] Move reference counting to C++ Oct 12, 2023

RReverser added 11 commits October 13, 2023 01:05

Move reference counting to C++

08535fb

This allows to completely optimise away the extra allocation / wrapper when values are local, and generally avoids crossing the JS<>Wasm boundary as often.

Use non-copied binding for emval

f9480ce

Fix _emval_run_destructors

032ea66

Extract val_metadata with all info

0c33c8f

This reduces size of class val to a single pointer, and still gets optimised out as an allocation.

We don't pass null handles anymore

21ed803

Move reserved handle check to C++ too

de26e59

This allows to completely eliminate statements like `val v;` as compiler can statically prove it doesn't need _emval_free.

Handle out of memory condition

e445e05

Add test

81fdb3c

Revert "Move reserved handle check to C++ too"

0510c71

This reverts commit e15a9f7.

Rebaseline sigs

333caed

RReverser force-pushed the val-cpp-refcount branch from 014081a to 333caed Compare October 13, 2023 00:05

RReverser marked this pull request as draft October 13, 2023 03:04

RReverser added 2 commits October 13, 2023 19:48

Bring back move constructor

04fd587

Unwrap reserved values

5d7bf61

RReverser mentioned this pull request Jan 29, 2024

Refactor Emval implementation to avoid JS heap allocations. #21205

Merged

mrolig5267319 reviewed Feb 8, 2024


mrolig5267319 mentioned this pull request Feb 8, 2024

[emval] Move lifecycle mangement from JS to C++. #21300

Draft

mrolig5267319 mentioned this pull request Feb 26, 2024

[emval] Add unique_val analog to be have like a std:unique_ptr version of val. #21433

Draft
