[emval] Move reference counting to C++ #20447

Draft
wants to merge 13 commits into base: main


RReverser
Collaborator

@RReverser RReverser commented Oct 12, 2023

C++ is super eager to copy values by default and ask questions later instead of moving, and each of those copies had to cross the JS boundary on creation/deletion just to increase/decrease the reference count correspondingly.

Moving the reference count from JS to C++ makes all this copying substantially cheaper, as it becomes a single integer increment/decrement on the Wasm side which, in most cases, can even be completely eliminated by LLVM.

This does mean that we need to heap-allocate a small object with the value's metadata on the C++ side, and I know it can be a controversial change, but:

  1. We had to do that on the JS side as well with the extra object wrapper around the value itself, and I'm putting my bet on C++ allocation being somewhat cheaper.
  2. With some relatively minor tricks, we can convince LLVM to optimise most of those allocations away for the common case of local values, just like refcounts. I'm even going as far as adding a test to ensure we don't regress on those optimisations in the future.
  3. This prevents implicit copies from triggering the cross-thread access assertion.
  4. It prepares better ground for reference types and doing all handle management on Wasm side.
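
To make the idea concrete, here is a minimal sketch of a handle whose reference count lives in a small malloc-allocated metadata block on the C++ side. It is not the PR's actual implementation: `Value`, `Metadata`, `fake_js_release` and `copies_then_release_count` are hypothetical names, the JS call is replaced by a counter so the sketch runs without Emscripten, and the count is non-atomic (single-threaded use is assumed).

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>

// Hypothetical stand-in for the one JS call that releases a handle.
// In real Embind this would cross the Wasm<->JS boundary; here it only counts calls.
static int js_release_calls = 0;
static void fake_js_release(int /*handle*/) { ++js_release_calls; }

// Small heap-allocated block shared by all copies of one value (assumed layout).
struct Metadata {
  int handle;    // hypothetical JS-side handle
  int refcount;  // number of live C++ copies
};

class Value {
 public:
  explicit Value(int handle)
      : meta_(static_cast<Metadata*>(std::malloc(sizeof(Metadata)))) {
    if (!meta_) std::abort();
    meta_->handle = handle;
    meta_->refcount = 1;
  }
  // Copying is a plain integer increment; no boundary crossing.
  Value(const Value& other) : meta_(other.meta_) { ++meta_->refcount; }
  // Moving steals the pointer and leaves the source empty.
  Value(Value&& other) noexcept : meta_(std::exchange(other.meta_, nullptr)) {}
  Value& operator=(Value other) noexcept {
    std::swap(meta_, other.meta_);
    return *this;
  }
  ~Value() {
    if (!meta_) return;  // moved-from value owns nothing
    assert(meta_->refcount > 0);
    if (--meta_->refcount == 0) {
      fake_js_release(meta_->handle);  // only the last copy talks to "JS"
      std::free(meta_);
    }
  }
  int use_count() const { return meta_ ? meta_->refcount : 0; }

 private:
  Metadata* meta_;
};

// Makes two copies and one move, then reports how many release calls happened.
int copies_then_release_count() {
  js_release_calls = 0;
  {
    Value a(42);
    Value b = a;             // refcount 2
    Value c = b;             // refcount 3
    Value d = std::move(c);  // still 3, c is now empty
    assert(a.use_count() == 3);
  }
  return js_release_calls;
}
```

In this sketch, only the destruction of the last copy reaches the simulated JS side, so `copies_then_release_count()` reports a single release however many copies were made.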

Note that the reference counting and allocation optimisations for local values only apply to speed modes (-O1 and higher) and not to -Os, where LLVM avoids inlining functions. However, microbenchmarks show that even in situations where copies can't be eliminated, this approach is still faster:

benchmarks.cpp
#include <emscripten.h>
#include <emscripten/val.h>
#include <stdio.h>

using namespace emscripten;

EM_JS(void, sideEffect, (EM_VAL v), {
  globalThis.sideEffect = Emval.toValue(v);
});

template<typename Func> void measure(const char* name, Func&& func) {
  // preheat
  for (int i = 0; i < 1'000'000; i++) {
    sideEffect(func().as_handle());
  }
  double time = emscripten_get_now();
  // measure
  int count = 100'000'000;
  for (int i = 0; i < count; i++) {
    sideEffect(func().as_handle());
  }
  time = emscripten_get_now() - time;
  printf("%s: %d ns\n", name, (int)(1e6 * time / count));
}

int main() {
  measure("Allocation cost", []() { return val(42); });

  measure("Copying cost", [v = val(42)]() { return val(v); });
}

Before (timings and JS+Wasm byte sizes for different modes):

-Os:

Allocation cost: 61 ns
Copying cost: 20 ns
JS+Wasm size: 43164 bytes

-O1:

Allocation cost: 60 ns
Copying cost: 21 ns
JS+Wasm size: 82809 bytes

-O2:

Allocation cost: 62 ns
Copying cost: 23 ns
JS+Wasm size: 44955 bytes

After:

-Os:

Allocation cost: 68 ns
Copying cost: 11 ns
JS+Wasm size: 43022 bytes

-O1:

Allocation cost: 49 ns
Copying cost: 8 ns
JS+Wasm size: 82431 bytes

-O2:

Allocation cost: 46 ns
Copying cost: 8 ns
JS+Wasm size: 44746 bytes

The only small slowdown is for initial allocation & freeing in -Os mode, but that is typically less common than copying, and it's in exchange for a small code size win, which is, well, the point of that mode anyway.

@RReverser RReverser changed the title from "Move reference counting to C++" to "[emval] Move reference counting to C++" on Oct 12, 2023
This makes it possible to completely optimise away the extra allocation / wrapper when values are local, and generally avoids crossing the JS<>Wasm boundary as often.
This reduces the size of class val to a single pointer, and the allocation still gets optimised out.
This makes it possible to completely eliminate statements like `val v;`, as the compiler can statically prove it doesn't need _emval_free.
C++ doesn't allow elimination of new/delete pairs, but in C malloc/free pairs can be and are eliminated by LLVM.

In theory Clang's __builtin_operator_{new,delete} could also help here, but in my experiments it wasn't optimised out while malloc/free still were.

I don't think we care about calling into user's overrides in this class too much, and the optimisation is valuable enough, so just using standard C funcs.
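
As a rough illustration of the kind of malloc/free pair these commit notes rely on, the sketch below allocates a block that never escapes the function. `local_roundtrip` is a hypothetical name, not code from this PR, and whether the pair is actually removed depends on the compiler version and optimisation level, so it is worth confirming by inspecting the generated code.

```cpp
#include <cstdlib>

// A malloc/free pair whose pointer never leaves the function.
// An optimiser that recognises malloc/free may reduce this to `return x + 1`
// (an assumption to verify for your toolchain, not a guarantee).
int local_roundtrip(int x) {
  int* box = static_cast<int*>(std::malloc(sizeof(int)));
  if (!box) return x + 1;  // same result if the allocation fails
  *box = x;
  int result = *box + 1;
  std::free(box);
  return result;
}
```

The function behaves the same whether or not the allocation is elided, which is what lets the optimiser drop it.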
@RReverser
Collaborator Author

Marking as draft for now as there's one unresolved issue.

@RReverser
Collaborator Author

Hm the remaining issues might be easier to solve after #20383 and another planned subsequent PR. I'll leave this as a draft for now.

@mrolig5267319
Contributor

mrolig5267319 commented Feb 7, 2024

Have you considered using the same array approach on the C++ side for refcount as the JS-side? That is, having a single vector<val_metadata> with refcounts indexed by handle, such that most new handles won't require an allocation?

Ah, because in multi-threaded mode you could have handle collisions between threads? Could have an array per thread.

@RReverser
Collaborator Author

> Have you considered using the same array approach on the C++ side for refcount as the JS-side?

Yes, I played with it, but that prevents the compiler optimisations that completely eliminate allocation/free pairs, which are extremely valuable as most Embind values are temporary. Compilers are happy to optimise malloc/free pairs because those are simple, well-known APIs, but they can't see through all the mechanics of something as complex as a vector (as it has its own storage that can move around memory on reallocs).
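
For comparison, the array-based alternative discussed here could look roughly like the sketch below. It is not what the PR implements: `RefTable` and its members are hypothetical, and it is single-threaded. Every operation reads or writes shared table state that outlives the call, which gives a sense of why an optimiser would find an acquire/release pair on it much harder to prove dead than a plain malloc/free pair.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical table of refcounts indexed by handle (single-threaded sketch).
class RefTable {
 public:
  // Hands out a handle with refcount 1, reusing a freed slot when one exists.
  std::size_t acquire() {
    if (!free_.empty()) {
      std::size_t h = free_.back();
      free_.pop_back();
      counts_[h] = 1;
      return h;
    }
    counts_.push_back(1);  // may reallocate and move the whole table
    return counts_.size() - 1;
  }
  void incref(std::size_t h) {
    assert(counts_[h] > 0);
    ++counts_[h];
  }
  // Returns true when the last reference was dropped and the slot was recycled.
  bool decref(std::size_t h) {
    assert(counts_[h] > 0);
    if (--counts_[h] == 0) {
      free_.push_back(h);
      return true;
    }
    return false;
  }
  int count(std::size_t h) const { return counts_[h]; }

 private:
  std::vector<int> counts_;        // refcount per handle
  std::vector<std::size_t> free_;  // recycled handle slots
};
```

New handles mostly avoid a per-value allocation in this design, as the question suggests, at the cost of routing every copy through the shared table.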

- 'fromWireType': (handle) => {
-   var rv = Emval.toValue(handle);
-   __emval_decref(handle);
-   return rv;
- },
+ 'fromWireType': (handle) => Emval.toValue(handle),
Contributor

Sorry if I'm missing something obvious, but how do you get away with removing the incref/decref from the C++ toWireType and JS fromWireType? Won't the C++ destructor call free before the JS fromWireType reads Emval.toValue?
