Very poor Rust/WASM performance vs JavaScript #1119
From a quick 15-second flyby of your setup (I haven't looked at the code), I noticed two things that may or may not improve the numbers a tad.
I haven't looked at the code yet, but the first place I'd look is making sure you aren't cloning a bunch of data.
Also, I would try looking at your browser's devtools to see what's going on.
Thanks for the report @psiphi75! (And for the source to poke around in!) I've done some poking around, and it definitely looks like nothing obvious is missing (like …). I wasn't able to dig much farther, though; I think the perf tools in Chrome/Firefox still have a ways to go with wasm! In any case, can you explain a bit more about what the "each unit of work" function is in the JS and wasm implementations? I couldn't quite follow what it was or how the JS version differed. FWIW, the profilers showed that very little time was spent in wasm itself, so at least that part is fast here!
Thanks @chinedufn and @alexcrichton, I tried the optimisation and removing the …

Thanks for the memory tip @alexcrichton! I replaced the following lines with a static Uint8ClampedArray buffer, and it shot up to 20 fps:

workUnit.message.buffer = new Uint8ClampedArray(
  wasm.memory.buffer,
  cellsPtr,
  constants.SQUARE_SIZE * constants.WIDTH * 4
);

I'll see how I can optimise this part and keep you posted. Yes, the Chrome dev tools are pretty limited for profiling, both for WASM and Web Workers.
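For anyone following along: the snippet above builds a Uint8ClampedArray *view* onto `wasm.memory.buffer`, not a separate buffer, so it aliases the module's linear memory. The sketch below illustrates that with a bare `WebAssembly.Memory` standing in for the wasm-bindgen module's memory; the offset and strip size are made-up values, not the project's real `cellsPtr` or `constants`.

```javascript
// Stand-in for the module's exported linear memory: one 64 KiB page.
const memory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical pointer and strip size; in the thread these are cellsPtr
// (returned by rt.render) and constants.SQUARE_SIZE * constants.WIDTH * 4.
const cellsPtr = 1024;
const byteLength = 16 * 32 * 4;

// A view: no bytes are copied; it reads linear memory directly.
const view = new Uint8ClampedArray(memory.buffer, cellsPtr, byteLength);

// Simulate the wasm code writing a pixel value into linear memory.
new Uint8Array(memory.buffer)[cellsPtr] = 200;
console.log(view[0]); // 200: the view sees the write

// slice() makes an independent copy of just the viewed bytes.
const copy = view.slice();
new Uint8Array(memory.buffer)[cellsPtr] = 7;
console.log(view[0], copy[0]); // 7 200

// Growing the memory detaches the old buffer, so cached views go stale.
memory.grow(1);
console.log(view.length); // 0
```

The last lines show one caveat worth knowing: if the wasm heap grows, any view created earlier stops working, so views over `wasm.memory.buffer` are best recreated after each call into wasm rather than cached.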
Oh nice! FWIW, I've found that Firefox's perf.html addon is excellent for profiling, but it shows a lot of information that isn't always easy to decipher. I was able to figure out that memmove/memcpy were taking up a lot of time in this example, but I couldn't figure out directly why they were being called or what else was slowing things down.
Once you've got that committed/deployed as well, I can help poke around some more!
@alexcrichton, thanks. I'm investigating two options. The first option is the … The other option is transferable message passing; I believe this could work well, but it would require a bit of a refactor.
Sounds reasonable to me! If you haven't seen it already, we've actually got an example of a parallel raytracer, although it's using …
This has been fixed, and it was never an issue due to wasm-bindgen. It's now running at more than 27 fps in Firefox and around 20 fps in Chrome! The demo has been updated. I have to say I don't understand the reason, but doing a copy from … In a nutshell, my JavaScript code changed from:

const cellsPtr = rt.render(workUnit.message.stripId);
workUnit.message.buffer = new Uint8ClampedArray(
  wasm.memory.buffer,
  cellsPtr,
  constants.SQUARE_SIZE * constants.WIDTH * 4
);
self.postMessage(workUnit.toObject());

to:

workUnit.message.buffer = new Uint8Array(constants.SQUARE_SIZE * constants.WIDTH * 4);
rt.render(workUnit.message.stripId, workUnit.message.buffer);
self.postMessage(workUnit.toObject(), [workUnit.message.buffer.buffer]);

There are two aspects here; the main one, I believe, was creating the … I believe a … Thanks for your help.
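The important change in the new version is the second argument to `postMessage`: listing `workUnit.message.buffer.buffer` as a transferable moves ownership of the underlying ArrayBuffer to the receiver instead of copying it. The sketch below shows the effect with `structuredClone`, which accepts the same `transfer` option and uses the same structured clone algorithm as `postMessage`, so it can be run outside a worker (assuming Node 17+ or a current browser). `strip` is a hypothetical stand-in for `workUnit.message.buffer`.

```javascript
// Hypothetical strip buffer, standing in for workUnit.message.buffer.
const strip = new Uint8Array(16 * 32 * 4);
strip.fill(255);

// Same transfer semantics as self.postMessage(msg, [strip.buffer]):
// the ArrayBuffer is moved to the receiver rather than copied.
const received = structuredClone({ buffer: strip }, { transfer: [strip.buffer] });

console.log(received.buffer.length, received.buffer[0]); // 2048 255
console.log(strip.length, strip.buffer.byteLength);      // 0 0
```

Because the sender's buffer is detached after each transfer, the worker has to allocate a fresh Uint8Array for every strip, which is what the new code does right before each `rt.render` call.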
I'd bet that a lot of people will be poking around the issues looking for performance tips. Some potential different ideas:
Glad to hear it, @psiphi75! FWIW, I still can't manage to get good wasm stacks in perf.html, but Chrome's developer tools report that the workers are spending 30% of their time in RayTracer::trace and another 30% in Object::intersect. That at least sounds like a plausible profile to me! It looks like a lot of events are happening in the workers rather than long contiguous blocks of work, so maybe a tweaked architecture with fewer messages between workers would help more? Sort of just shooting in the dark! @chinedufn I definitely agree! https://rustwasm.github.io/book/game-of-life/time-profiling.html and https://rustwasm.github.io/book/reference/time-profiling.html will hopefully at least be a start on documentation, but expanding them and/or adding an FAQ here sounds great!
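One way to act on the "fewer messages" idea would be to give each worker a contiguous range of strips per frame and have it post one message per range instead of one per strip. The partitioning step might look like the sketch below; `partitionStrips` is a hypothetical helper, not part of the project, and the strip and worker counts are illustrative.

```javascript
// Split strip indices [0, numStrips) into at most numWorkers contiguous
// ranges whose sizes differ by at most one. Ranges are [start, end).
function partitionStrips(numStrips, numWorkers) {
  const ranges = [];
  const base = Math.floor(numStrips / numWorkers);
  const extra = numStrips % numWorkers;
  let start = 0;
  for (let w = 0; w < numWorkers && start < numStrips; w++) {
    // The first `extra` workers take one additional strip each.
    const size = base + (w < extra ? 1 : 0);
    ranges.push({ start, end: start + size });
    start += size;
  }
  return ranges;
}

// 10 strips over 4 workers: ranges covering strips 0-2, 3-5, 6-7 and 8-9.
console.log(partitionStrips(10, 4));
```

Each worker would then render its whole range into one buffer and transfer that buffer in a single `postMessage`, so the message count per frame drops from the number of strips to the number of workers. Whether that helps in practice depends on how the main thread assembles the image, so it is worth profiling both ways.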
Last night I demonstrated this to a few people, and the performance issue is caused by the following line:

self.postMessage(workUnit.toObject());

Apparently this serialises/deserialises the object when it's sent from the worker to the main thread. Hence, it's not related to wasm-bindgen.
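One plausible contributor, not confirmed in this thread: in the original worker code, `workUnit.message.buffer` was a view over `wasm.memory.buffer`, and the structured clone algorithm serialises a typed array together with its *entire* backing ArrayBuffer, not just the viewed range. The sketch below uses `structuredClone` (same algorithm as `postMessage` without a transfer list) and a plain 1 MiB ArrayBuffer as a stand-in for wasm linear memory; the sizes are illustrative.

```javascript
// Stand-in for wasm linear memory: a 1 MiB ArrayBuffer.
const linearMemory = new ArrayBuffer(1 << 20);

// A 2 KiB view into it, like the original Uint8ClampedArray over wasm.memory.buffer.
const view = new Uint8ClampedArray(linearMemory, 1024, 2048);

// postMessage(msg) with no transfer list runs this same structured clone.
const cloned = structuredClone({ stripId: 3, buffer: view });

console.log(cloned.buffer.length);            // 2048: the view is the same size
console.log(cloned.buffer.byteOffset);        // 1024
console.log(cloned.buffer.buffer.byteLength); // 1048576: the whole backing buffer was copied
```

If that applied to the raytracer, every strip message would have carried a copy of the whole wasm heap, which would be consistent with the memmove/memcpy time seen in the profiler. Posting a standalone Uint8Array sized to the strip avoids it.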
Yes, …
I've implemented the same ray tracing algorithm in JavaScript and Rust/WASM. The results are below:
(The Rust/WASM version is built with the --release option.)
I'm using Web Workers, and the tests are done using 8 workers. I have released the demo and the code.
I've reviewed this issue, but there is nothing outstanding.
Running the native Rust version on the console, I get around 14.4 fps with only one thread. So, in theory, native performance should be able to reach around 57 fps using all four cores. Hence WASM is running around 200 times slower than native.
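The "in theory" figure is the single-thread rate multiplied by the four physical cores, assuming perfectly linear scaling, which real workloads rarely reach (and hyperthreads do not add a full core each). A small sketch of the arithmetic; both function names are hypothetical.

```javascript
// Ideal throughput if each thread renders independently with no overhead.
function idealFps(singleThreadFps, threads) {
  return singleThreadFps * threads;
}

// How many times slower a measured rate is than a reference rate.
function slowdown(referenceFps, measuredFps) {
  return referenceFps / measuredFps;
}

// 14.4 fps single-threaded, scaled to four cores.
const nativeIdeal = idealFps(14.4, 4);
console.log(nativeIdeal.toFixed(1)); // 57.6
```

The slowdown factor is then `slowdown(nativeIdeal, wasmFps)`, where `wasmFps` is the measured WASM frame rate from the results.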
UPDATE: The JavaScript version utilises around 85% of each core on my 4-core (with hyperthreading) CPU, while the WASM version uses around 70 to 100% of one core and around 10 to 30% of the other cores.
Any ideas?