New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recompile xzdec with Emscripten in WebAssembly (instead of asm.js) to improve performance #219

Closed
mossroy opened this Issue Apr 22, 2017 · 36 comments

Comments

Projects
None yet
4 participants
@mossroy
Member

mossroy commented Apr 22, 2017

Now that the main browsers support WebAssembly (see http://caniuse.com/#feat=wasm), it would certainly speed up the application to use it for xz decompression.
Emscripten also proposes a polyfill to allow running the code on non-WebAssembly-compliant browsers, using asm.js as a fallback.

The C source code is https://tukaani.org/xz/embedded.html , but I don't know how @peter-x compiled it with Emscripten

@mossroy

This comment has been minimized.

Member

mossroy commented Jan 14, 2018

Implementing this might be a first attempt to introduce the WebAssembly technology, before working on #116

@mossroy

This comment has been minimized.

Member

mossroy commented Jan 30, 2018

I investigated a bit on that topic.
I managed to compile xz-embedded with Emscripten, but it does not correspond to the APIs of our current xzdec.js. So it seems useless for now.

Here are the steps I followed to compile xz-embedded :

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Jan 30, 2018

Is xz-embedded superior to xzdec? A quick look through the site doesn't clearly highlight the differences, except that the former was developed for the Linux kernel, and the latter for POSIX-like systems.

If you recompiled xzdec do you think the API might still be the same?

@mossroy

This comment has been minimized.

Member

mossroy commented Jan 31, 2018

I suppose xz-embedded is designed for low-spec devices (slow CPU, not a lot of RAM), but it's only a guess.
I tried to compile xzdec with emscripten but it seems more complicated (and I could not spend a lot of time on it)

@mossroy

This comment has been minimized.

Member

mossroy commented Feb 14, 2018

@peter-x managed to find the directory from where he generated the file xzdec.js : many thanks to him.
It is based on xz-embedded, but with some added code (that's why I did not find the same APIs).
There is a compile.sh script inside the directory, that should let us recompile it with a current version of emscripten (I did not have the time to try that yet) :
xze-nobin.tar.gz

Afterwards, I think we should put that code in the github repo.

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Feb 15, 2018

That would be good. Once you've had a chance to try it out, it would be great if you could document any steps required to reproduce, so we don't get into the same situation in the future of not being able to control a vital technology at the core of the app.

@mossroy

This comment has been minimized.

Member

mossroy commented Feb 15, 2018

I fully agree

@mossroy

This comment has been minimized.

Member

mossroy commented Mar 13, 2018

I managed to recompile xzdec (still in asm.js for now) with the latest version of emscripten. See branch https://github.com/kiwix/kiwix-js/tree/emscripten-script-xzdec
It's working, even if there might be some work to do on memory management (emscripten complains that it can not apply all the optimizations as we allow the memory to grow, on the other hand it seems to need a lot of fixed-sized-memory to work on big ZIM files).
To compile, you simply need to install emsdk (see https://kripken.github.io/emscripten-site/docs/getting_started/downloads.html , the portable version is fine), then run the compile.sh script.

I'll test that on my Firefox OS devices, then try to compile into webassembly

@mossroy

This comment has been minimized.

Member

mossroy commented Mar 13, 2018

The new compiled version of xzdec (in asm.js) appears to be incompatible with IE11 (apparently because of some unsupported Math.* functions). I may also be incompatible with other old browsers, but is still compatible with the officially supported ones (see README.md). So I would say this is probably not a big issue. It does work on Firefox OS.

I managed to compile in webassembly format, and pushed some quick and dirty work on the branch.
It has a fallback to asm.js (handled by emscripten itself), which seems to work pretty well on Firefox ESR 52.6 (which does not support WebAssembly).

There's an issue on Firefox OS (because it tries to load the asm.js version dynamically, with a eval(), which is forbidden by CSP in a Firefox OS app). We probably don't have much Firefox OS users any more but it's still useful for me to test on an actual low-spec device. I'll try to find a workaround.

There is still an issue on Chromium : it uses the asm.js version, even if it is compatible with WebAssembly, with error message "Compile is disallowed on the main thread, if the buffer size is larger than 4KB. Use WebAssembly.compile, or compile on a worker thread.". I need to investigate.

This branch does not work inside a webextension for now, because of CSP too. Probably because the code is too dirty for now. I must try to overcome this issue to know if WebAssembly is supported in webextensions.

And unit tests are broken too : this is a test branch, not ready to be merged at all ;-)

@mossroy

This comment has been minimized.

Member

mossroy commented Mar 13, 2018

By running the same unit tests with or without webassembly, I could compare the backend performance with the 3 versions of xzdec. The timing is given by qunit, when running tests.html :

  • legacy asm.js version : around 1000ms to complete
  • newly compiled asm.js version (fallback of wasm) : around 1050ms to complete
  • newly compiled wasm version : 700ms to complete
    I did not compare the memory consumption, and did not try to fine-tune emscripten at all.

+30% with wasm : that's not bad!
Keep in mind that it's not a global speed gain, but only for the backend that reads the ZIM files. And wasm is not available everywhere.

Regarding the webextensions, wasm is available (at least in recent Firefox), and works well. But the fallback to asm.js fails because of CSP : as in Firefox OS, it relies on using eval() to run the asm.js javascript, which is forbidden by CSP.

So, instead of a hack for Firefox OS, we might need to find a generic and more compatible way to handle the fallback (by handling it ourself?)

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Mar 14, 2018

Very encouraging work @mossroy !
The only issue with making Kiwix JS incompatible with IE11 is that there are quite a lot of computers around the world that are still running Windows 7, which doesn't have Edge. Now of course such computers are likely to have an alternative browser available, or via simple download, and I guess anyone likely to install Kiwix on a PC is more likely to find and use the x86 version of Kiwix.
However, this Emscripten discussion thread has a discussion specifically about Wikipedia's need to support IE11 and Emscripten appears to have inserted a flag LEGACY_VM_SUPPORT for this use case, which supports older browsers and older node.js. It might be worth compiling with that flag in case it also helps with the FFOS issue, though I think it's specifically a Math.imul (etc.) polyfill. If it turns out that that flag has no negative effect on our target browsers, it might be worth keeping the support for now.

@mossroy

This comment has been minimized.

Member

mossroy commented Mar 14, 2018

Thanks a lot for this link : I'll try this flag when I find some time.
If it's that simple (and does not have a significant impact on performance, which is probable), we might keep compatibility with IE11 and other old browsers.
If handling the fallback ourself is not complicated, we might hopefully have an automatic switch between these 2 versions :

  • a wasm version for the browsers that support it
  • for the other ones, an asm.js version compiled with the necessary flags to keep compatibility with legacy browsers

@mossroy mossroy removed the help wanted label May 2, 2018

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Jun 3, 2018

I've managed to compile a version of xzdec.js that runs in IE11 (and Edge, FF, Chromium of course) transparently, without having to handle anything via an XHR load in a separate loading script. It uses these parameters:

emcc -O3 -s WASM=1 -s "BINARYEN_METHOD='native-wasm'" --memory-init-file 0 -s 
TOTAL_MEMORY=83886080 -s ALLOW_MEMORY_GROWTH=1 -s LEGACY_VM_SUPPORT=1 -s 
EXPORTED_FUNCTIONS="['_init', '_init_decompression', '_input_empty', '_get_in_buffer', '_set_new_input', 
'_decompress', '_get_out_pos', '_get_out_buffer', '_out_buffer_cleared', '_release']" -DXZ_USE_CRC64=1 
-DXZ_INTERNAL_CRC64=1 *.c -o xzdec.js

It manages the fallback itself inside xzdec.js, and doesn't need a separate memory initialization file. It's significantly faster in IE than than the old xzdec. It runs in webasm in Edge. I can't really tell what it's running in with Firefox Quantum and Chromium, because they're so fast anyway -- the reason I say that is because there doesn't seem to be a difference in speed between Firefox ESR (which doesn't support webasm) and Quantum (which does). They're both lightening fast with this build. I couldn't really expect faster.

There is just one downside. In FFOS simulator, Kiwix JS won't even load into the simulator. It bounces back with the error below. It can't be a manifest issue because in this version we're not loading the extra scripts. We can easily work around this by running a test on startup and attaching the old version or else a version compiled specifically for FFOS (if that's possible). We could do this as part of the existing FFOS detection. I'll push this version to a different branch for testing.

image

EDIT: see commit 7fb9990

Jaifroid added a commit that referenced this issue Jun 3, 2018

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Jun 3, 2018

@mossroy Good news -- the above error on FFOS was a transitory error with my system due to a file lock. The build in the commit New compile supports IE11 #219 also seems to support FFOS. Please try it out and confirm (or not).

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Jun 3, 2018

I've rebuilt with a slightly modified build line, not specifying TOTAL_MEMORY as it uses its defaults at start-upt time and grows as needed. I experimented with the -Os building strategy, but found it more sluggish than -O2 or -O3. I've settled on -O3 as it is supposed to be slightly more optimized than -O2 but take longer build time (but it's pretty quick with either setting, so I guess that only applies to large projects). Here's the line:

emcc -O3 -s WASM=1 --memory-init-file 0 -s ALLOW_MEMORY_GROWTH=1 -s 
LEGACY_VM_SUPPORT=1 -s EXPORTED_FUNCTIONS=
"['_init', '_init_decompression', '_input_empty', '_get_in_buffer', '_set_new_input', 
'_decompress', '_get_out_pos', '_get_out_buffer', '_out_buffer_cleared', '_release']" 
-DXZ_USE_CRC64=1 -DXZ_INTERNAL_CRC64=1 *.c -o xzdec.js

I'll push a commit to the "New compile" branch a little later. It's virtually identical.

EDIT: New version is in commit 81fba7b

@mossroy

This comment has been minimized.

Member

mossroy commented Jun 3, 2018

Cool, that's very promising!

But, in your branch https://github.com/kiwix/kiwix-js/tree/New-compile-of-xzdec, I see no .wasm file : is it because you forgot to commit it, or because you're testing an asm.js-only version?

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Jun 3, 2018

I didn't forget. It's because, from what I can tell, that configuration wraps everything into a single file which then gets read into memory. The XHR functions and so on are all inside xdec.js (you can see them near the top of the file if you inspect xzdec.js or search for XHR). Using that sequence of commands does not output a separate wasm file. This confused me, but seems consistent with the documentation. I'll see if I can find the relevant bit again. EDIT: It was this bit:

Emscripten will emit WebAssembly by default. You can switch that off with -s WASM=0 (and then emscripten emits asm.js).

Note that my main aim here was to test the flag LEGACY_VM_SUPPORT=1 , so that we can continue to support IE11.

The speed-up with this version of xzdec is very noticeable. However, testing on a low-spec test device I have, it is more demanding of memory than the previous compile. Decompressing one or two compressed files (i.e., text, not images, which do not need to be decompressed by xzdec) is OK on a low-spec device, but this configuration returns an exception (or it can be configured to return null) if we send six CSS all at once to the decompressor. Sending them serially is OK. On a PC, this isn't an issue.

To return null instead of an exception, we have to compile with ABORTING_MALLOC=0, but it is only effective if we set TOTAL_MEMORY and do not use the ALLOW_MEMORY_GROWTH option.

It's never easy. Sigh.

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Jun 3, 2018

Ah! Yes, it seems like that "Emscripten will emit WebAssembly by default" isn't quite right. The configuration I used is a build configuration for testing, but not a production configuration. For production, the BINARYEN options need to be specified. It'll still wrap all the JavaScript and asm code in a single file, but from there will either execute wasm or asm. I'll try to add that.

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Jun 3, 2018

OK, I've discovered the issue. Guess what -- the option LEGACY_VM_SUPPORT=1 prevents the compilation of the wasm binary, but there is no message provided at compile time to explain this! As soon as I remove that option, I get a wasm produced. Nevertheless, the speedup I experienced in what turns out to be a more modern/optimized version of asm is very real. Indistinguishable from the initial tests I made of your build, @mossroy , at least in my perception. I'll push a wasm-enabled commit in this configuration for testing the fallback, but clearly IE11 won't be supported in that case.

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Jun 4, 2018

I've left a comment on that bug with legacy vm support in kripken/emscripten#5837 . It's looking like the only way we can currently get what we want is to compile twice, once the version I did earlier (or similar) and once with the wasm binary and no fallback support. We then load the correct script as soon as we detect the browser's capability. This will be a lot cleaner for all our edge cases, including FFOS, even if it's a shame not to have a clean build pipe. We'll see if my report about this gets us a more generic solution.

I've got a very busy couple of weeks ahead and don't think I can return to this for a while. I've pushed commit 039ec33 which ONLY supports wasm-capable browsers. There are two commits on that branch, the first having legacy support with a fast asm, and the second wasm only. there are still memory issues to debug on legacy devices, and of course knitting two such builds together.

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Jun 4, 2018

Quick note: looks like we'll need to add -s NO_DYNAMIC_EXECUTION=1 in the compiler. See https://github.com/kripken/emscripten/blob/master/src/settings.js#L658.

@mossroy

This comment has been minimized.

Member

mossroy commented Jun 4, 2018

Thanks for all this research, I'll try to sum up :
The -s NO_DYNAMIC_EXECUTION=1 parameter should allow us to use the built-in fallback mechanism generated by emscripten (instead of implementing our own). BUT the LEGACY_VM_SUPPORT=1 parameter (needed for IE11) is not compatible with this hybrid generation. So, in the end, if we want to carry on supporting IE11, we would have no other choice to handle the fallback ourself, and compile twice : once for old browsers (using asm.js) and once for recent browsers (using wasm). Nevertheless, it might be worth looking at how emscripten handles the fallback : we might try to use the same approach.

Memory consumption is something we'll have to care about, at least for old browsers (because they often come with old hardware)

@kelson42 : do you have an opinion about the support of IE11 for kiwix-js? It currently works (even if it's slow), but I don't know if it's really used by anyone. In the README, we say that this browser is unofficially supported, and that its support might be removed in the future if it's becoming too complicated.

@kelson42

This comment has been minimized.

Contributor

kelson42 commented Jun 4, 2018

@mossroy The overall market share of IE11 seem really low, but we might have a few key devices using it? I can not judge this. Maybe worth a try to stop supporting it!

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Jun 4, 2018

AIUI the NO_DYNAMIC_EXECUTION enables a method of loading the asm (not clear if also the wasm) that respects the CSP. It exists specifically to allow use of Emscripten in browser extensions, apps, etc. The issue with IE is that Emscripten removed the polyfill that fills a couple of Math functions not available on IE11. I found the polyfill in their code -- it's very short (six lines of JavaScript or so) so we could probably include it.

However, I've had a couple of replies to my query about this issue here: kripken/emscripten#5837 . To extract the salient bit (from kripken himself):

Yeah, a dual build is best for this, see https://kripken.github.io/emscripten-site/docs/compiling/WebAssembly.html#codegen-effects - a single build with fallbacks has big compromises.
Perhaps it makes sense to allow legacy support in a dual build, yeah... In other words, in a dual build it would not force wasm off. That seems reasonable, although I worry about the extra complexity somewhat.

On IE support, Kiwix touts its potential in places like Cuba, where we're very likely to fiind old systems and hardware, so I'm not sure overall usage, skewed towards rich countries, is the best guage. Anyway, let's see whether the fallback method will work at all with our other systems / apps / extensions. We may yet have to dual build, and kripken implies that actually it's preferale to dual build. NB this does not mean we have to provide two different apps. We provide the one app, but load asm or wasm ourselves according to capabilities. Also "dual build" sounds like something we'd have to do every time we make a new release, but in reality, this is the first time in 4? years that xzdec has been rebuilt for Kiwix?

@mossroy

This comment has been minimized.

Member

mossroy commented Jun 4, 2018

I agree with you that it's worth trying to keep IE11 support.

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Jun 4, 2018

Regarding memory management, I've pushed an experimental commit 4f3c452 that allows us to batch decompression jobs. This is just one way of doing it, and is for experimentation, because on lower-memory devices the new compiles of xzdec crash the app when trying to view mobile-style Wikimedia ZIMs that have up to 6 stylesheets.

A consequence of this is that this batch control at last allows us to decode SVGs, fixing #278 and #308. So, I've fixed the display of SVGs in Service Worker and jQuery modes in commit a967882 (which should probably be submitted as a PR on master). In any case, with the New-compile-of-xzdec branch, you can now open "Sine" and it works through the 100s of equations very fast without crashing the decompressor! If you don't have the full English Wikipedia to try this on (in a WASM browser) you can use the article "Wilson-Cowan model" from WikiMed. Note that the mathematics ZIM from Wikipedia uses PNGs, so is not a good test.

These two commits can easily be applied to an ASM build as well. They work with our old xzdec too.

@mossroy

This comment has been minimized.

Member

mossroy commented Jun 5, 2018

Thanks a lot @Jaifroid!
I also noticed that recent wikipedia ZIM files crash kiwix-js 2.3.0 on my Firefox OS devices that only have 512MB (it works on devices with 1GB).
It would be better to separate the "batch control" of decompression in a separate PR because it could be tested independently, and merged before wasm support

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Jun 5, 2018

OK, I created issue #389 to separate out the batching issue.

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Jun 6, 2018

@mossroy , I'm wondering if proper asm might be "fast enough" for us, instead of doing a dual wasm/asm setup.

In the New-compile-of-xzdec branch I've uploaded an optimized asm-only xzdec with unnecessary things removed and a fixed memory size. It's "nearly" as fast as wasm in an asm-capable browser running without debugging. Our previous compiles have been "almost-asm" because of the memory growth. But if we only decompress one-at-a-time, we don't need memory growth, and it makes quite a difference.

This compile uses 5 x base memory (base memory is the 16MiB minimum, and TOTAL_MEMORY must be multiples of 16MiB). At 4 x 16MiB xzdec crashes with one decode, so 5 x is the smallest footprint we can have.

Have a go and see what you think. For reference, the compile string I used is below.

emcc --memory-init-file 0 -O3 -s WASM=0 -s MALLOC="emmalloc" 
-s TOTAL_MEMORY=83886080 -s NO_FILESYSTEM=1 
-s AGGRESSIVE_VARIABLE_ELIMINATION=1 -s DOUBLE_MODE=0 -s PRECISE_I64_MATH=0 
-s NO_DYNAMIC_EXECUTION=1 -s LEGACY_VM_SUPPORT=1 
-s EXPORTED_FUNCTIONS="['_init', '_init_decompression',  '_input_empty',  '_get_in_buffer',
 '_set_new_input',  '_decompress',  '_get_out_pos',  '_get_out_buffer', '_out_buffer_cleared',
 '_release']" -DXZ_USE_CRC64=1 -DXZ_INTERNAL_CRC64=1 *.c -o xzdec.js
@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Sep 16, 2018

@mossroy To get the best possible user experience with new compiles such as #410 , we may need to implement a basic queue to ensure that first paint occurs before heavy decompression of SVGs begins. Although not specific to new compiles of xzdec.js, the problem becomes more urgent because Kiwix JS now has the capability of displaying pages with lots of equations coded as SVGs. On all browsers, with at least some pages that have both a new stylesheet (not already cached) and lots of SVGs, first paint is occurring after all the SVGs have been extracted, in both jQuery and Service Worker modes. This is not ideal even with the very fast decompression that asm.js allows.

To reproduce, load wikipedia_en_maths_novid_2018-06.zim with the new compile in https://github.com/kiwix/kiwix-js/tree/xzdec-compiled-with-full-asm, then load a simple article indexed on the front page, e.g. "Equation", to cache the common stylesheets, then search for the article "Trigonometric Functions". This will (in jQuery mode) display the message "Caching styles to speed up future page display..." because this page has an extra stylesheet, and will take some time to load (~10s), but when first paint happens, all the equations will already be extracted. Now type "Trigonometric functions" again into the search box, and go to the same page again. This time, because the page's special stylesheet is now cached, the first paint happens very quickly, and SVGs are decompressed after first paint. The latter is the desired behaviour for pages with lots of SVGs, since they are processor-intensive and should not delay first paint.

I realize this is a separate issue, and we have #149 and #288 for this, but I wanted to flag it here, as the capability of decompressing SVG-heavy pages now in a reasonable timeframe without crashes introduces a new factor to consider, and #410 highlights the issue, even though it existed with previous builds.

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Sep 16, 2018

Just a reminder to myself that the reason we don't render until CSS is fulfilled is because of the huge Stackexchange ZIM and any other pages where CSS redraws severely impact page load times. See #386 . This makes it all the more important that we fulfill CSS before extracting images IMHO.

@Jaifroid

This comment has been minimized.

Collaborator

Jaifroid commented Sep 16, 2018

To address the above issue in jQuery mode (unfortunately not in Service Worker as yet), I have added some code for a simple, extensible queue for jQuery assets extraction functions here:

https://github.com/kiwix/kiwix-js/tree/Add-simple-jQuery-queue

This is built on top of #410 (Asm.js compile) for the purposes of comparison. In my view it makes a big difference for the end user if we do some queuing like this.

This code puts the jQuery extraction functions on a simple queue so that they only get called when, for example, cssCount >= cssExtracted (the variables to compare are defined by the calling function). What it doesn't do (yet) is prioritize extraction of PNG or JPEG images before SVG (could be done by defining a separate image extraction routine for SVGs).

But obviously we need to think about whether something similar could (or should) be done for Service Worker mode.

@mossroy Is this worth a PR? I know you've been keen to avoid complicating Service Worker mode by adding any queuing or prioritizing. Would you be happy for this to be jQuery only?

EDIT1: The commit that adds the queuing functions is f018b3f .

EDIT2: Just pushed a more efficient version of the queue: 4f3d73e . Does it still work with the huge Stackexchange ZIM? I think I may have deleted it...

@mossroy

This comment has been minimized.

Member

mossroy commented Sep 17, 2018

I quickly tested your queuing branch on askubuntu.com_en_all_2017-06.zim : it seems to work fine.

@mossroy

This comment has been minimized.

Member

mossroy commented Sep 17, 2018

A few tests with the LEGACY_VM_SUPPORT parameter :
I've modified the unit tests to make it read the Ray Charles HTML content 100 times
With current master (with LEGACY_VM_SUPPORT=1), it takes 20101ms
Without legacy support (LEGACY_VM_SUPPORT=0), it takes 19950ms

Conclusion : there's no significant performance difference, so it's no use dropping old browsers support.
The xzdec.js file is slightly smaller without legacy VM support : 55902 bytes instead of 57485. But it's nothing compared to the 201588 bytes we had before recompiling.

@mossroy

This comment has been minimized.

Member

mossroy commented Sep 17, 2018

I tested with WASM version (with ALLOW_MEMORY_GROWTH=1 instead of giving a TOTAL_MEMORY), in https://github.com/kiwix/kiwix-js/tree/wasm_version_of_xzdec branch : it takes 20367ms.
So wasm has no significant impact on performance in this use-case.
Maybe it can consume less memory, I did not measure. And the files are a bit smaller : xzdec.js + xzdec.wasm = 40670 bytes instead of 55902.

For now, I agree with @Jaifroid that it's not worth using the WASM version of xzdec, except if we find another good reason.
So I'll close this issue, while keeping the branch in case we want to go back to it.

Regarding the queuing optimization, could you please create a separate issue? And yes, it might be worth a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment