Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An efficient way to pass TypedArrays with Embind #5519

Closed
zandaqo opened this issue Aug 25, 2017 · 15 comments
Closed

An efficient way to pass TypedArrays with Embind #5519

zandaqo opened this issue Aug 25, 2017 · 15 comments

Comments

@zandaqo
Copy link

zandaqo commented Aug 25, 2017

I'm trying to find a way to efficiently convert TypedArrays into std::vector (and vice versa) using Embind. I'm currently using vecFromJSArray but that's inefficient when compared to passing a pointer and directly accessing Emscripten heap as described here. However, playing with the heap is a bit cumbersome on the JS side, so I wonder if there is a better way to convert typed arrays. Maybe we can get a pointer to the begging of the TypedArray's buffer and use the length to find the rest and build a vector, something like this:

using namespace emscripten;

std::vector<double> vecFromJSTypedArray(const val& a) {
  return std::vector<double>(a["buffer"].as<?>(), a["buffer"].as<?>() + a["length"].as<unsigned>());
}

Not sure how to do it though, I'd appreciate any pointers (no pun intended) here.

@kripken kripken added the embind label Aug 25, 2017
@arcanis
Copy link

arcanis commented Sep 19, 2017

Maybe writing from the C++ side is a red herring? The C++ code cannot access the JS memory without copying it first, but the JS memory can work on both at once, even if it's a bit hacky (implied by the documentation tho, so it should be fine). Maybe this implementation would be faster:

C++ code

std::vector<double> vecFromJSTypedArray(emscripten::val const & a)
{
  std::vector<double> vec;
  vec.resize(a["length"].as<unsigned>());

  emscripten::val::global("copyToVector")(a, vec.data());

  return vector;
}

JS side:

// embind will transform the `vec.data()` pointer above into a typed array
// that references the memory section of the reserved vector. Writing into
// it will effectively write into our vector!

function copyToVector(source, destination) {
    source.copyWithin(destination, 0);
}

Basically, the copy is deferred to the copyWithin implementation, which doesn't need to cross the C++/JS boundaries for each element of the typed array, and will probably benefits a lot from being heavily optimized by the engines.

@zandaqo
Copy link
Author

zandaqo commented Sep 19, 2017

@arcanis Now that's clever! I'm having hard time compiling it though:

> emcc --bind -std=c++14 src/wasm.cpp -s WASM=1 -O3 -o wasm.js

src/wasm.cpp:25:31: error: no matching member function for call to 'call'
  val::global("copyToVector").call<void>(a, vec.data());
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~

I'm guessing I should declare "copyToVector" somewhere on the C++ side?

@arcanis
Copy link

arcanis commented Sep 19, 2017

Hm I wrote this from memory, I might have used the wrong API - try replacing the call with a regular functor invocation:

auto copyToVector = emscripten::val::global("copyToVector");
copyToVector(a, vec.data());

@zandaqo
Copy link
Author

zandaqo commented Sep 19, 2017

Now it doesn't like raw pointers:

emscripten/wire.h:90:13: error: 
      static_assert failed "Implicitly binding raw pointers is illegal. Specify
      allow_raw_pointer<arg<?>>"
            static_assert(!std::is_pointer<T*>::value, "Implicitly binding r...
            ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
src/wasm.cpp:16:30: note: in instantiation of function template specialization
      'emscripten::val::operator()<const emscripten::val &, double *>' requested
      here
  val::global("copyToVector")(a, vec.data());
                             ^

@ron99
Copy link

ron99 commented Sep 30, 2017

Here is an efficient way of doing this with c++ only:

template<typename T>
void copyToVector(const val &typedArray, vector<T> &vec)
{
  unsigned int length = typedArray["length"].as<unsigned int>();
  val memory = val::module_property("buffer");
  
  vec.reserve(length);

  val memoryView = typedArray["constructor"].new_(memory, reinterpret_cast<uintptr_t>(vec.data()), length);
  
  memoryView.call<void>("set", typedArray);
}

You should consider resizing the vector (by calling resize) befor passing it to the function, because it's size won't be set automatically, which might harm some of the vector functionality (functions like size, begin, end, etc.).

@zandaqo
Copy link
Author

zandaqo commented Sep 30, 2017

@ron99 That works, thank you!

It is indeed more efficient according to my benchmark:

Simple Linear Regression:
   ...
   Web Assembly x 1,819 ops/sec ±1.22% (87 runs sampled)
   Web Assembly using TypedArrays x 34,110 ops/sec ±1.01% (90 runs sampled)

@ron99
Copy link

ron99 commented Sep 30, 2017

@zandaqo Great!

You can also use this on normal arrays (like in your first implementation) if you explicitly define which kind of TypedArray to use for coping, according to the type of the array's content (double in your case):

Simply replace typedArray["constructor"] with val::global("Float64Array") (and I suggest also renaming typedArray to arr or something :)

@zandaqo
Copy link
Author

zandaqo commented Sep 30, 2017

@ron99 Changed the code as you suggested, now both arrays and typed arrays perform similarly fast:

Simple Linear Regression:
   ...
   Web Assembly x 32,376 ops/sec ±1.03% (88 runs sampled)
   Web Assembly using TypedArrays x 34,593 ops/sec ±0.98% (87 runs sampled)
 Fastest is Native

This should be somehow added to embind's API, speeding up array conversions 16 times is no small feat.

@ron99
Copy link

ron99 commented Sep 30, 2017

@zandaqo I'll open a PR with changes to the vecFromJSArray function to use this implementation when dealing with native numeric types.

By the way, this is redundent since the memory is already reserved when you construct the vector with this size.

@stale
Copy link

stale bot commented Sep 19, 2019

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Sep 19, 2019
@stale stale bot closed this as completed Sep 26, 2019
@peteole
Copy link

peteole commented Feb 21, 2020

Here is an efficient way of doing this with c++ only:

template<typename T>
void copyToVector(const val &typedArray, vector<T> &vec)
{
  unsigned int length = typedArray["length"].as<unsigned int>();
  val memory = val::module_property("buffer");
  
  vec.reserve(length);

  val memoryView = typedArray["constructor"].new_(memory, reinterpret_cast<uintptr_t>(vec.data()), length);
  
  memoryView.call<void>("set", typedArray);
}

You should consider resizing the vector (by calling resize) befor passing it to the function, because it's size won't be set automatically, which might harm some of the vector functionality (functions like size, begin, end, etc.).

I needed to change "val memory = val::module_property("buffer");" to
"
val heap = val::module_property("HEAPU8");
val memory = heap["buffer"];"
in order to make the code work.

@Lectem
Copy link
Contributor

Lectem commented May 6, 2020

Would it be possible to reopen this issue ?

I think it's an important use-case in many domains to be able to handle a lot of data coming from .js with a wasm function (I'm thinking for example statistics, graphs...).
If this could be done without copy it would be even nicer.

@sbc100
Copy link
Collaborator

sbc100 commented May 6, 2020

Since C/C++ programs can only really address a single memroy / typedarray there is not really any way to avoid the copy in most cases. And its not just C/C++, today all webassembly modules are limited to working with just a single memory.

The only way I can think of to avoid the copy is when the JS API can accept a view of part of the webassembly memory.

@Lectem
Copy link
Contributor

Lectem commented May 6, 2020

I see, then we should make sure the copy is as efficient as possible, I can start by doing a new PR with at least the vector::reserve added to vecFromJSArray ?
Then we should probably try what was suggested below

Here is an efficient way of doing this with c++ only:

template<typename T>
void copyToVector(const val &typedArray, vector<T> &vec)
{
  unsigned int length = typedArray["length"].as<unsigned int>();
  val memory = val::module_property("buffer");
  
  vec.reserve(length);

  val memoryView = typedArray["constructor"].new_(memory, reinterpret_cast<uintptr_t>(vec.data()), length);
  
  memoryView.call<void>("set", typedArray);
}

You should consider resizing the vector (by calling resize) befor passing it to the function, because it's size won't be set automatically, which might harm some of the vector functionality (functions like size, begin, end, etc.).

I needed to change "val memory = val::module_property("buffer");" to
"
val heap = val::module_property("HEAPU8");
val memory = heap["buffer"];"
in order to make the code work.

@Lectem
Copy link
Contributor

Lectem commented May 6, 2020

Actually #5655 seems to suggest a better version:

template <typename T> std::vector<T> vecFromJSArray(const emscripten::val &v)
{
    std::vector<T> rv;

    const auto l = v["length"].as<unsigned>();
    rv.resize(l);

    emscripten::val memoryView{emscripten::typed_memory_view(l, rv.data())};
    memoryView.call<void>("set", v);

    return rv;
}

kripken added a commit that referenced this issue Oct 7, 2020
This is a followup of #5519 and #5655 since they were closed.

It is possible to implement emscripten::vecFromJSArray more efficiently for numeric
arrays by using the optimized TypedArray.prototype.set function.

The main issue with this method is that it will silently fail(or succeed) if elements of
the array or not numbers, as it does not do any type checking but instead works as
if it called the javascript Number() function for each element. (See ToNumber for more
details)

So instead of simply updating vecFromJSArray to use this new implementation and
break code (since there's no typechecking anymore) I added a new
convertJSArrayToNumberVector (name subject to change) and improved performance
a tiny bit for vecFromJSArray by:

*    Taking the val parameter by const reference instead of copy
*    Reserving the storage of the vector
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants