A WebAssembly version of HarfBuzz #10

Closed
photopea opened this issue Mar 28, 2019 · 69 comments · Fixed by harfbuzz/harfbuzz#1636

@photopea

Hi guys, I am developing a free web-based photo editor, www.Photopea.com, which is used by around 100,000 people a day. It lets people edit images, including inserting text into a picture.

As there is no sufficient OpenType parser and layout engine in JavaScript, I made my own, called Typr.js. It is quite advanced and can handle e.g. Arabic text. I also use this JS implementation of the BIDI algorithm.

As more and more people use Photopea, I have to extend Typr.js. Currently, I am adding support for Urdu and Khmer layout. I often stare at the OpenType specification for 5 - 10 hours without writing a single line of code, only trying to understand what it means. I would be more than happy to drop Typr.js and use an alternative, if there was one.

Would you be able to provide a WebAssembly version of your library to the public, while documenting and maintaining it? I am ready to pay 5k - 10k USD for it. It is also important that the library is not too large (e.g. 150-200 kB zipped), as every person has to download it when starting Photopea.

@ebraminio
Contributor

ebraminio commented Mar 28, 2019

Exactly what I think about every day! HarfBuzz, despite doing a complicated job, has a simple core API, and the only thing that really matters is hb_shape(). Here is one attempt at it, https://github.com/prezi/harfbuzz-js, and mine is here: harfbuzz/harfbuzz#743. Even https://github.com/emscripten-ports/harfbuzz is empty, as the support I've added to emscripten just uses our own repository. The only thing that remains is a clean-looking JS library port, something I'm very interested in doing, but the trick is to do it as cleanly as possible so that it can be merged upstream.

@ebraminio ebraminio self-assigned this Mar 28, 2019
@ebraminio
Contributor

ebraminio commented Mar 28, 2019

Assigning it to myself to see what happens. Maybe we can have the wasm distribution in a separate repo under github.com/harfbuzz, if not in the harfbuzz repo itself.

@ebraminio
Contributor

So let's define a goal here. As I've already put the support into Typr.js in photopea/Typr.js#28, what we can eventually do here is have a cleaned-up version of #1636 (just an HTML or JS demo of how to use HarfBuzz in the browser or Node.js, without build results). We can then decide later whether to put HarfBuzz in an npm package or provide automatically generated .d.ts TypeScript definitions as documentation, or simply refer users to Typr.js as a sample use.

@photopea
Author

I would be very happy, if we could make some progress in terms of WASM file size.

You are compiling it through a current version of Emscripten, right? The conversion goes through LLVM as an intermediate step. Is it possible to convert C to WASM directly using other tools that would produce a smaller WASM?

I think my use case would probably be the biggest use case of a WASM version of HarfBuzz, as there will be hundreds of thousands of people downloading it as a part of the webpage every day :)

@ebraminio
Contributor

ebraminio commented Mar 31, 2019

I tried building the library without emscripten before. That may work (though you have to provide a libc for the library somehow), but emscripten itself incorporates good practices from what I can see.

We can reduce the current binary size by compiling HarfBuzz without the builtin UCDN and Unicode functions, 710kb -> 599kb (zipped, 214kb to 164kb), but that costs correctness of shaping.

Other things may lead to further reductions, such as disabling multithreading and removing unused APIs, but considering the binary size of HarfBuzz on Debian, for example https://packages.debian.org/sid/libharfbuzz-bin (800kb, which is also compressed), I don't believe we can go below 100-150kb compressed :)

@ebraminio
Contributor

ebraminio commented Mar 31, 2019

Applying all the things mentioned, it has become 479.9kb (127.0kb compressed), but I'd say 214kb is also good, considering correctness and completeness.

@behdad
Member

behdad commented Apr 1, 2019

Leaving out UCDN is a nonstarter.

As it happens I'm going to work on minimizing HarfBuzz for other uses. So I'll be working on this. Would be great to 1. have a streamlined way to build .wasm, and 2. a major user.

My current plans are: 1. better compressor for UCDN and other tables (based on packtab.c in fribidi repo), and 2. easy way to disable periphery API / legacy features (like Arabic fallback shaping).

@ebraminio
Contributor

ebraminio commented Apr 1, 2019

Classic case of:

image

from https://hacks.mozilla.org/2018/01/making-webassembly-even-faster-firefoxs-new-streaming-and-tiering-compiler/

Leaving out UCDN is a nonstarter.

Yes, as said.

  1. have a streamlined way to build .wasm,

It uses cmake and emscripten (#1636), is super easy to use, and doesn't cause trouble for autotools development.

  2. a major user.

photopea/Typr.js#28

Recently there was also a huge hype around WASI: https://hacks.mozilla.org/2019/03/standardizing-wasi-a-webassembly-system-interface/

@photopea
Author

photopea commented Apr 7, 2019

I would like to thank you for this amazing library, and I am informing you, that it is currently used at www.Photopea.com by thousands of users every day :)

As you open Photopea, 1.8 MB of data is downloaded (out of this, 250 kB is Harfbuzz, 90 kB are all 104 icons - 160x160px, 60 kB is a font database, 130 kB are localizations in 36 languages). Of course, everything is compressed during the transfer.

I wish you could make Harfbuzz smaller, but I don't understand it well enough to be able to give you any advice. I already started a discussion about making the Emscripten JS file smaller: emscripten-core/emscripten#8409

@behdad
Member

behdad commented Apr 8, 2019

Oooh, you integrated it in Photopea already!? That's amazing!

I'm working on making a mini version of HarfBuzz in harfbuzz/harfbuzz#1652

@ebraminio
Contributor

ebraminio commented Apr 8, 2019

Reduced from 597222 to 558072 by removing CFF, and to 523772 by removing AAT. My changes are on #1636, which only exports this set of APIs: '_hb_version_string', '_malloc', '_hb_blob_create', '_hb_face_create', '_hb_font_create', '_hb_buffer_create', '_hb_buffer_add_utf8', '_hb_buffer_guess_segment_properties', '_hb_buffer_set_direction', '_hb_shape', '_hb_buffer_serialize_glyphs', '_hb_buffer_get_length', '_hb_buffer_destroy', '_hb_font_destroy', '_hb_face_destroy', '_free'. It is compressed using Closure and seems to work here!
harfbuzzjs-closure-no-cff-aat2.zip, 170kb zipped, from 247kb

@photopea
Author

photopea commented Apr 8, 2019

Yes, I did integrate it :)

I got very excited about minifying Photopea today. Maybe it could inspire you :D

There is a font database - a large JSON file, that Photopea loads every time. There are 4290 fonts. For each font, there are four strings: Family name, Subfamily name, Postscript name, Font URL. Also, a Font Category, and flags with supported scripts. This file was 451 kB and 57 kB ZIPped.

I made some hacks in my JSON representation (e.g. an empty PostScript name means, that the PostScript name is a concatenation of Family and Subfamily). I turned that JSON into 135 kB and 29 kB ZIPped - less than 7 bytes per font :D
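
The PostScript-name trick can be sketched roughly as follows. This is only an illustration: the record layout, the function names and the rule for joining Family and Subfamily (spaces stripped, joined with a hyphen) are assumptions, not Photopea's actual format.

```javascript
// Assumed derivation rule (hypothetical): strip spaces, join with a hyphen.
function derivePostScriptName(family, subfamily) {
  return family.replace(/ /g, "") + "-" + subfamily.replace(/ /g, "");
}

// Hypothetical compact record: [family, subfamily, postScriptName, url].
// An empty postScriptName means "derive it from family and subfamily".
function expandFont(record) {
  const [family, subfamily, psName, url] = record;
  const postScriptName =
    psName === "" ? derivePostScriptName(family, subfamily) : psName;
  return { family, subfamily, postScriptName, url };
}

// Inverse direction: drop the PostScript name whenever it can be derived.
function compactFont(font) {
  const derived = derivePostScriptName(font.family, font.subfamily);
  const psName = font.postScriptName === derived ? "" : font.postScriptName;
  return [font.family, font.subfamily, psName, font.url];
}
```

Only fonts whose PostScript name does not follow the assumed rule pay for storing it, which is where the saving would come from.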

@behdad
Member

behdad commented Apr 8, 2019

Nice!

Want to show us your HarfBuzz integration glue? I'm afraid you also need a Unicode Bidirectional Algorithm implementation for full correctness.

@photopea
Author

photopea commented Apr 8, 2019

I am using the Javascript implementation of BIDI algorithm, that I mentioned at the beginning. I added bidirectional support about two years ago. What glue do you mean?

@behdad
Member

behdad commented Apr 8, 2019

Oh right. Sounds good.

I meant the code calling into HarfBuzz. Okay, so you are probably just missing script-run itemization.

@behdad
Member

behdad commented Apr 8, 2019

I.e. mixed-script text will currently be broken.

@photopea
Author

photopea commented Apr 8, 2019

@ebraminio Could you give me an example of how to use your latest code? It seems like there is no _hb_blob_destroy.

@behdad What is a mixed-script? I call HarfBuzz separately on intervals of text, which share the same direction and font (in Photopea, each character can have a different font). I would like to encourage you to go to www.Photopea.com and try it out.

@brawer

brawer commented Apr 9, 2019

To render text, every browser already contains a shaping engine; if it was accessible from JavaScript, “download size” would be zero. At some point, there was talk about adding a text shaping API to the JavaScript core libraries, similar to the ICU wrapper in ECMA-402. Does anyone know what happened to that plan? Obviously it’d take a while to bring it through, but browsers have eventually adopted the Intl API. (As far as I can see, the main missing piece would be to find someone who can write a good API proposal for ECMA. That person would need to understand text rendering, have good JavaScript fu, and be patient enough to survive the standardization process.)

@brawer

brawer commented Apr 9, 2019

Re. mixed-script, there’s a proposal for adding an Intl.Segmenter to JavaScript. But currently, the proposal is only about breaking graphemes, words and sentences (exposing ICU break iterators), not script runs.

@brawer

brawer commented Apr 9, 2019

@photopea For correct rendering, you’ll need to split the input text into script runs before calling HarfBuzz, but it’s more complicated than it sounds. Perhaps you could follow the logic of Raqm; the script itemization code is in raqm_itemize.

@ebraminio
Contributor

it seems like there is no _hb_blob_destroy.

Ah, I missed adding that call. Here is the new version: harfbuzzjs-closure-no-cff-aat2.zip

@photopea
Author

photopea commented Apr 9, 2019

@ebraminio great, thank you! I just updated and it is much smaller indeed :) BTW, is that "harfbuzzjs.js" a direct output from Emscripten, or did you minify it somehow? Do you think there is room for minifying that JS even further? Could you write a comment on emscripten-core/emscripten#8409 ?

@ebraminio
Contributor

ebraminio commented Apr 9, 2019

Yes, I used Closure, with the flag mentioned in that file. You can also use Google Closure in the rest of your project and surprise yourself!

@photopea
Author

photopea commented Apr 9, 2019

@ebraminio I did use Closure Compiler several years ago, but it turned out to be very slow, so I made my own, which does the same thing, but it is about 50x faster.

But they can make modifications on Emscripten side, that would make the Closure Compiler result even smaller.

@ebraminio
Contributor

But they can make modifications on Emscripten side

Interesting, I never thought of that!

@behdad
Member

behdad commented Apr 9, 2019

@behdad What is a mixed-script?

Say, you have Hindi and English mixed in the same string.

I would like to encourage you to go to www.Photopea.com and try it out.

I did already. :)

@photopea
Author

photopea commented Apr 9, 2019

@behdad I understand, that to use GPOS and GSUB tables, you need to know a script, which will lead you to a set of features and lookups, that should be applied to the text. In my library Typr.js, I used to loop through all features and apply all referenced lookups :D (each lookup at most once).

I use PSD format for storing files, which does not store any information about the script. Also, users are not used to enter a script when entering a text. So my goal is to display it like the web browser would display it (you also don't enter the script in HTML, you can only enter the base direction of the paragraph).

@behdad
Member

behdad commented Apr 9, 2019

I use PSD format for storing files, which does not store any information about the script. Also, users are not used to enter a script when entering a text. So my goal is to display it like the web browser would display it (you also don't enter the script in HTML, you can only enter the base direction of the paragraph).

Exactly. Web browsers as well as any other complete text rendering system internally break the text down into "script runs" automatically and shape each one separately.

@ebraminio
Contributor

Some bug-reporting magic will be useful here, I guess:

Steps to reproduce:

  1. Create a text holder in Photopea
  2. Set FreeSerif font for it
  3. Put "ދިسسی" on it

Actual:
image

Expected:
The thing you see on the browser

image

What happened?
HarfBuzz has determined an incorrect script, so the rest went wrong.

Solution:
Segmenting the text by scripts before passing it to HarfBuzz
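
A very simplified sketch of such segmentation, for illustration only. It is not Raqm's algorithm: the script table is a small hypothetical subset, and characters whose script is not in the table (including Common and Inherited ones such as spaces and punctuation) are simply attached to the neighbouring run. The function names are made up for this example.

```javascript
// Small, illustrative subset of scripts; a real itemizer covers all of Unicode.
const SCRIPTS = [
  ["Latin", /\p{Script=Latin}/u],
  ["Arabic", /\p{Script=Arabic}/u],
  ["Thaana", /\p{Script=Thaana}/u],
  ["Devanagari", /\p{Script=Devanagari}/u],
  ["Hebrew", /\p{Script=Hebrew}/u],
];

// Returns the script name, or null for Common/Inherited/unlisted characters.
function scriptOf(ch) {
  for (const [name, re] of SCRIPTS) {
    if (re.test(ch)) return name;
  }
  return null;
}

// Splits text into runs of one script each. Neutral characters join the
// current run; a run that starts with neutrals adopts the first real script.
function itemizeByScript(text) {
  const runs = [];
  for (const ch of text) { // iterates by code point, not by UTF-16 unit
    const script = scriptOf(ch);
    const last = runs[runs.length - 1];
    if (last && (script === null || last.script === null || script === last.script)) {
      last.text += ch;
      if (last.script === null) last.script = script;
    } else {
      runs.push({ script, text: ch });
    }
  }
  return runs;
}
```

With the string from the steps above, this yields one Thaana run followed by one Arabic run, and each run would then be shaped by its own hb_shape call.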

@photopea
Author

How do I send money to Behdad? Does he have a bank account in the US or EU?

Thanks for creating all these npm / wasm package manager distribution channels. I do not use them, but I think many will use them. Personally, I would prefer if you invested the effort into Harfbuzz itself, to make it smaller / faster / more robust. Even though it is already quite small and fast for my needs :)

@ebraminio
Contributor

Personally, I would prefer if you invested the effort into Harfbuzz itself

It is already a squeeze from a 1.9Mb .wasm file (540kb zipped) to 536kb (159kb zipped, your original goal I think) using the different techniques we've incorporated, but there is of course room for more.

@behdad
Member

behdad commented Apr 24, 2019

How do I send money to Behdad? Does he have a bank account in the US or EU?

I have US accounts, yes. You can email me@behdad.org. Thanks for your generous offer!

Thanks for creating all these npm / wasm package manager distribution channels. I do not use them, but I think many will use them. Personally, I would prefer if you invested the effort into Harfbuzz itself, to make it smaller / faster / more robust. Even though it is already quite small and fast for my needs :)

I'm still working on that in harfbuzz/harfbuzz#1652

For example, I'm shrinking UCDN from over 100kb to about 30kb. My changes will make it to master soon.

@photopea
Author

photopea commented May 7, 2019

May I have one more question? Does Harfbuzz support TTC files (Font Collections) ? A Font collection is basically several TTF files concatenated, with a list of offsets to each file at the beginning. They can also share some tables with each other (by sharing offsets to those tables).

When I load a whole TTC file to HarfBuzz, where do I specify, which font should be used for shaping?

@ebraminio
Contributor

Oh, it does. You have to put the index you want instead of 0 in module._hb_face_create(blob, 0);. There is also hb_face_count, but it is not available in your build. You can go without it, but let me know if you want it.
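
Without hb_face_count, the number of fonts can also be read from the collection header on the JavaScript side. A minimal sketch, assuming the standard OpenType TTC header layout ('ttcf' tag, two 16-bit version fields, a 32-bit font count, then one 32-bit offset per font, all big-endian); the function name is hypothetical:

```javascript
// Returns the per-font offsets of a TTC file, or null if the buffer
// does not start with a TTC header (e.g. it is a plain TTF/OTF).
function readTtcOffsets(arrayBuffer) {
  if (arrayBuffer.byteLength < 12) return null;
  const view = new DataView(arrayBuffer);
  const tag = String.fromCharCode(
    view.getUint8(0), view.getUint8(1), view.getUint8(2), view.getUint8(3));
  if (tag !== "ttcf") return null;
  const numFonts = view.getUint32(8); // DataView reads big-endian by default
  const offsets = [];
  for (let i = 0; i < numFonts; i++) {
    offsets.push(view.getUint32(12 + 4 * i));
  }
  return offsets;
}
```

The length of the returned array is the number of valid indices (0 to length - 1) for the second argument of hb_face_create.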

@behdad
Member

behdad commented May 7, 2019

That's the integer index passed to hb_face_create().

@photopea
Author

photopea commented May 7, 2019

Wow, great, it works perfectly, thanks! :)

@ebraminio
Contributor

ebraminio commented May 12, 2019

New build using Behdad's HB_TINY, only 440kb of .wasm
harfbuzzjs.zip
I haven't tested it personally, I'm only reporting it, but feel free to use it if it works there.

@photopea
Author

I updated it, works perfectly, thanks! :)

@ebraminio
Contributor

ebraminio commented May 24, 2019

With new work on HarfBuzz it went from 440kb to 421kb, after Behdad's very recent work it went down to 371kb, and after enabling --llvm-lto 1 and removing unnecessary strings it went down to 278kb!

harfbuzzjs-lto.zip

But apparently I can't make it work, and even the previous version doesn't work here, so please make sure it works there before updating.

@photopea
Author

@ebraminio The new size is incredible! However, for me, it returns an empty array of glyphs :(

@ebraminio
Contributor

ebraminio commented May 24, 2019

Now a working version with the 2.5.0 release, which I can confirm also works here! https://harfbuzz.github.io/harfbuzzjs/ (feel free to pick the wasm from that page, or from below)

harfbuzzjs.zip

It is only 98kb of zipped wasm, exceeding your goal :)

@photopea
Author

I just put it online, works great, thanks! :)

@ebraminio
Contributor

ebraminio commented Jun 22, 2019

With another round of improvements by Behdad we've gone from 280kb to 246kb, and with the removal of hb_serialize, which is necessary for further work, it goes down to 236kb!

harfbuzzjs.zip

The change needed on your side is to use this instead of the current serializer. I tested it here https://harfbuzz.github.io/harfbuzzjs/ and it seems to work fine:

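    // Number of glyphs in the shaped buffer.
    var length = module._hb_buffer_get_length(buffer);
    var result = [];
    // Both getters return byte pointers into wasm memory; dividing by 4
    // turns them into indices into the 32-bit heap views.
    var infosPtr32 = module._hb_buffer_get_glyph_infos(buffer, 0) / 4;
    var positionsPtr32 = module._hb_buffer_get_glyph_positions(buffer, 0) / 4;
    // hb_glyph_info_t and hb_glyph_position_t are each five 32-bit fields wide.
    var infos = module.HEAPU32.slice(infosPtr32, infosPtr32 + 5 * length);
    var positions = module.HEAP32.slice(positionsPtr32, positionsPtr32 + 5 * length);
    for (var i = 0; i < length; ++i) {
      result.push({
        g: infos[i * 5 + 0],      // glyph id (codepoint field)
        cl: infos[i * 5 + 2],     // cluster
        ax: positions[i * 5 + 0], // x_advance
        ay: positions[i * 5 + 1], // y_advance
        dx: positions[i * 5 + 2], // x_offset
        dy: positions[i * 5 + 3]  // y_offset
      });
    }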

This is essential, as the next round of work I am doing is about removing emscripten and its glue code (that 10kb of JS code) and working with a trimmed-down libc instead. I now have a working demo of it here: https://harfbuzz.github.io/harfbuzzjs/ng/hb.html

@photopea
Author

Hi! Currently, when you open Photopea.com , 1.4 MB is loaded (the whole program). HarfBuzzjs.wasm, which is extra 111 kB (as it is GZIPped), is loaded only if the text tool is used (so we don't load it every time as in the past).

I am already quite happy with the progress you have made, and if you plan to keep going, I will wait for the next version :)

I wish all developers cared about the size of their programs at least half as much as you do :)

@ebraminio
Contributor

ebraminio commented Jun 24, 2019

@photopea, great :) I would say this version is worth integrating now, as the next one will have radical changes and we may not release it soon, or ever (it requires compiling our own malloc/calloc/realloc/free, which may take some time to get right).

An advantage of the next version is that you can compile the HarfBuzz .wasm yourself just by downloading the LLVM installer, and it works even on Windows: current LLVM releases (http://releases.llvm.org/download.html#8.0.0), which provide an installer for Windows, support compiling and linking wasm32 files. You may also like to port some of the other code you've written for the rest of your app to reduce its size; it is super easy (https://dassur.ma/things/c-to-webassembly/) and doesn't need the complicated setup that emscripten and Google Closure have. For now, though, only our emscripten builds are considered stable (even though we now have the ng builds working here: https://harfbuzz.github.io/harfbuzzjs/ng/hb.html).

So, all the new changes require is applying 9fc9e7a#diff-291994c3e8f610097e257cfe2a68e019L33 to your code and using the new module I've uploaded, but please check that it works before publishing it to production. The next build will mostly just need the underscores removed from the calls, but it needs this change too, which is why I encourage you to apply it now. Thanks

@ebraminio
Contributor

ebraminio commented Jul 2, 2019

Hey @photopea, I've just uploaded the emscripten-free version, a real WebAssembly distribution of the project that is only ~200kb in size (78kb gzipped; this includes a minimal libc and malloc, and there is no ~10kb .js wrapper). Here is the demo: https://harfbuzz.github.io/harfbuzzjs/ (make sure you are not seeing the cached version). Feel free to copy https://github.com/harfbuzz/harfbuzzjs/blob/master/examples/nohbjs.html, but pick the .wasm binary from the demo page!

Please note that there are differences between the previous emscripten-based release and this one. Our wasm builds have such a simple sbrk that it can't grow memory on demand, so this is added to create an initial amount of needed RAM:

    var exports = result.instance.exports;
    exports.memory.grow(400); // each page is 64kb in size

Finally, I should note that this is not meant to undermine all the great work done in the emscripten project. Their malloc is still used in our libc, and https://github.com/intel/zephyr/blob/master/lib/libc/minimal/source/string/string.c from the Intel Zephyr libc (Apache licensed) is also used. We will review these before the release, but I guess you will be fine using our pure .wasm now!
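
To make the effect of that memory.grow(400) call concrete, here is a small standalone sketch. It uses a bare WebAssembly.Memory as a stand-in for the module's exported memory (an assumption for illustration; the real object comes from result.instance.exports), and the variable names are made up:

```javascript
const PAGE_SIZE = 64 * 1024; // one WebAssembly page is 64 KiB

// Stand-in for `exports.memory` of an instantiated module.
const demoMemory = new WebAssembly.Memory({ initial: 1 });

// A view created before growing; growing detaches its underlying buffer.
const staleHeap = new Uint8Array(demoMemory.buffer);

// grow() takes a number of pages and returns the page count before growing.
const pagesBefore = demoMemory.grow(400);

// 400 pages * 64 KiB = 25 MiB of additional memory.
const addedBytes = 400 * PAGE_SIZE;
const totalBytes = demoMemory.buffer.byteLength;

// Views must be recreated from memory.buffer after every grow.
const freshHeap = new Uint8Array(demoMemory.buffer);
```

One practical consequence: any typed array created over the memory before the grow call is left with length 0 afterwards, so heap views should be created (or recreated) after growing.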

@kripken

kripken commented Jul 4, 2019

Is there a side by side comparison of the emscripten and non-emscripten versions (or build instructions for them both)? I'm curious to understand any size difference in the .wasm. How big is that difference?

@ebraminio
Contributor

ebraminio commented Jul 5, 2019

It is something like 30kb in the wasm binary and 10kb in JS glue code (both uncompressed). Here is how to test:

$ git clone https://github.com/harfbuzz/harfbuzzjs && cd harfbuzzjs
$ ./build.sh && ls -ltrha hb.wasm # our current pure wasm module
-rwxr-xr-x 1 ebrahim  201K Jul  5 13:06 hb.wasm
$ git checkout 9fc9e7aa8d83b8602639b590d81e8f8fc77ddc91 # last version built with emscripten
$ ./build.sh && ls -ltrha harfbuzzjs.*
-rw-r--r-- 1 ebrahim   11K Jul  5 13:10 harfbuzzjs.js
-rw-r--r-- 1 ebrahim  229K Jul  5 13:10 harfbuzzjs.wasm

Downsides:

  1. No dynamic memory grow in sbrk https://github.com/harfbuzz/harfbuzzjs/blob/master/libc/main.c#L11 I may need your help on this, I mean I don't know how to detect the memory grow is needed.
  2. Tightly tailored to our use: we don't have the STL, make very limited use of libc, and have tweaked the project to use even less of libc. I removed the use of printf and friends specifically for this reason and created a libc header collection specific to our use, https://github.com/harfbuzz/harfbuzzjs/tree/master/libc/include, which is not something every project can afford.
  3. The libc is not well tested, see https://github.com/harfbuzz/harfbuzzjs/blob/master/libc/main.c#L17-L25, and it is not written with performance in mind either: 22547e7#diff-b88182833a86ec9e739b42fec4f814a0

Upsides:

  1. Full control of module fetch and load. emscripten's is a bit complicated and has its own learning curve.
  2. This 40kb saving! It turned out we didn't need much of the glue code, so it may also become faster to run.
  3. It may work better with wapm and wasm runners outside the browser environment.
  4. Very easy build tool installation: it works more easily on Windows (only a clang installer download) and only needs clang 8. And it should be much faster to build.
  5. Correct symbols names in wasm, something I like very much!

And we still use emscripten malloc implementation and care about emscripten heritage :)

Update:
On another machine (a bot actually) with an updated emscripten, and with both builds compiling the amalgam, it is now:

$ ./build.sh && ls -ltrha harfbuzzjs.*
-rw-r--r--   1   9.0K Jul  5 08:06 harfbuzzjs.js
-rw-r--r--   1   226K Jul  5 08:06 harfbuzzjs.wasm

https://transfer.sh/nOUfD/harfbuzz.wasm
https://transfer.sh/qcN4B/harfbuzz.js

vs

-rwxr-xr-x   1   200K Jul  5 07:51 hb.wasm

https://transfer.sh/y121g/hb.wasm

@kripken

kripken commented Jul 5, 2019

Thanks @ebraminio!

Ok, I spent some time building the various options here - mostly I wanted to see if there were any bugs or forgotten flags or optimizations anywhere.

One issue is the emscripten version: using latest emscripten with the LLVM wasm backend, and using this modified build.sh (I took the new one you had and made it work with emscripten), I get

$ ls -alh a.*
-rw-rw-r-- 1 alon alon 6.6K Jul  5 14:02 a.out.js
-rw-rw-r-- 1 alon alon 183K Jul  5 14:02 a.out.wasm

At 183K that's better than both the emscripten and non-emscripten numbers from before.

But that brings me to the second issue: you can run Binaryen's wasm-opt tool on the non-emscripten wasm (emscripten runs it automatically), and it shrinks it to 178K,

$ wasm-opt hb.wasm -O -o hb_opt.wasm
$ ls -alh hb.wasm hb_opt.wasm
-rwxrwxr-x 1 alon alon 200K Jul  5 14:12 hb.wasm
-rw-rw-r-- 1 alon alon 178K Jul  5 14:13 hb_opt.wasm

And that 178K is the best number of all of them!

Old emscripten was using old LLVM (6). LLVM 8 and 9 (what emcc uses) can do better! Aside from that, running Binaryen typically shrinks wasm backend output by 10-15% (on HarfBuzz it's around 11%). With those two issues out of the way, the comparison is more apples to apples, and the difference is the 6K JS and a wasm difference of 5K, which I believe is because of the libc customization here.

Definitely on a project like HarfBuzz, that needs very little runtime support, it's nice to customize your own libc if you have time for that, and such custom runtimes can be smaller than emscripten's general-purpose runtime! The one thing though is you need to make sure to do all the things emcc would do for you, like running wasm-opt, otherwise the size could be worse, not better.

No dynamic memory grow in sbrk https://github.com/harfbuzz/harfbuzzjs/blob/master/libc/main.c#L11 I may need your help on this, I mean I don't know how to detect the memory grow is needed.

I think you need to track the current memory size and how close the sbrk limit gets to there. The wasm backend has intrinsics to help there (__builtin_wasm_memory_size, __builtin_wasm_memory_grow).
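
The bookkeeping described here can be sketched as follows. To keep the example runnable on its own it is written in JavaScript against a WebAssembly.Memory object; inside the C libc the same checks would presumably use the intrinsics named above. makeSbrk and its parameters are hypothetical names, not part of harfbuzzjs.

```javascript
const WASM_PAGE = 65536; // bytes per WebAssembly page

// Builds an sbrk-like bump allocator on top of a WebAssembly.Memory.
// heapStart is the first byte the allocator may hand out.
function makeSbrk(memory, heapStart) {
  let brk = heapStart; // current end of the heap, in bytes
  return function sbrk(increment) {
    const oldBrk = brk;
    const newBrk = brk + increment;
    const available = memory.buffer.byteLength; // current memory size
    if (newBrk > available) {
      // Grow by just enough whole pages to cover the request.
      const pagesNeeded = Math.ceil((newBrk - available) / WASM_PAGE);
      try {
        memory.grow(pagesNeeded);
      } catch (e) {
        return -1; // growth refused, e.g. the memory's maximum was reached
      }
    }
    brk = newBrk;
    return oldBrk; // like sbrk, return the previous break
  };
}
```

The point of the design is that memory only grows when the break would pass the current size, instead of reserving a fixed block up front.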

Correct symbols names in wasm, something I like very much!

I'm not sure what you mean by this? If you build with emcc --profiling-funcs for example it will keep symbol names in the wasm, emcc just doesn't emit them by default. (And when using the wasm backend the symbol names are also correct in that they don't have any extra _ prefix.)

@ebraminio
Contributor

ebraminio commented Jul 6, 2019

Wow, great information

One issue is the emscripten version: using latest emscripten with the LLVM wasm backend, and using this modified build.sh (I took the new one you had and made it work with emscripten), I get

Great, now I think it will be nice to provide both binaries in a release, considering how emscripten's is more tested.

But that brings me to the second issue: you can run Binaryen's wasm-opt tool on the non-emscripten wasm (emscripten runs it automatically), and it shrinks it to 178K,

Great! I wished I could pass 200kb limit, now I have it :)

Definitely on a project like HarfBuzz, that needs very little runtime support, it's nice to customize your own libc if you have time for that, and such custom runtimes can be smaller than emscripten's general-purpose runtime!

Our own libc is not tested like emscripten's. I will look if I can use rest of the emscripten libc from the source.

The one thing though is you need to make sure to do all the things emcc would do for you, like running wasm-opt, otherwise the size could be worse, not better.

I tried the one shipped with my distro's emscripten and it shrank it by only 10k. I will try the latest one from emsdk once I download it.

I think you need to track the current memory size and how close the sbrk limit gets to there. The wasm backend has intrinsics to help there (__builtin_wasm_memory_size, __builtin_wasm_memory_grow).

Great help, thanks :)

I'm not sure what you mean by this? If you build with emcc --profiling-funcs for example it will keep symbol names in the wasm, emcc just doesn't emit them by default. (And when using the wasm backend the symbol names are also correct in that they don't have any extra _ prefix.)

I will use the flag in our emscripten build revival if it doesn't impact the size much. I like to have this around:
image
I wish I could even write prototype code in the dynamic JS world and then rewrite it in native code if necessary!

Thanks :)

@ebraminio ebraminio transferred this issue from harfbuzz/harfbuzz Jul 6, 2019
@ebraminio
Contributor

ebraminio commented Jul 6, 2019

Transferred the issue to harfbuzz/harfbuzzjs to have access to all the information here easier.

@photopea here is our latest work using the wasm-opt optimization: hb.wasm.zip. Its demo is here: https://harfbuzz.github.io/harfbuzzjs/ and code similar to yours, which you can easily adapt, is here: https://github.com/harfbuzz/harfbuzzjs/blob/master/examples/nohbjs.html. The differences are mostly the removal of the initial underscores from function names, the removal of hb_direction_from_string (which you weren't using, IIRC) and the removal of hb_buffer_serialize.

Here is, of course, the emscripten result as well, which you may be able to adopt more easily, but I haven't tested it, don't recommend it, and it is bigger in size: harfbuzzjs.zip

@kripken: Ok, this is the result with: emcc --profiling-funcs (my distro's emscripten which apparently matches your numbers)

image

image

-rw-r--r-- 1 ebrahim  309K Jul  6 16:25 a.out.wasm
-rw-r--r-- 1 ebrahim  6.7K Jul  6 16:25 a.out.js

Which is better than: (but note the size impact --profiling-funcs had)

image

-rw-r--r-- 1 ebrahim  183K Jul  6 16:29 a.out.wasm
-rw-r--r-- 1 ebrahim  6.7K Jul  6 16:29 a.out.js

But what I like and meant for correct symbol names was this:

image

-rwxr-xr-x 1 ebrahim vdr 178K Jul 6 16:05 hb.wasm

@kripken

kripken commented Jul 6, 2019

@ebraminio Thanks, I think I see now, --profiling-funcs keeps full function name info, which is why it's so big (all internal non-exported function names are kept around too). Seems you want just the export names to not be minified? Is the reason you want the unminified export names that you want to call them directly instead of through emscripten's JS code?

Emscripten automatically minifies exports (and imports) in -O3 and above. We don't have an option to specifically disable that atm, as we assume that if we emit both js and wasm that we can do optimizations on that pair together (like minifying those imports and exports, and also metadce, etc.). However, we have an option to emit just wasm, in which case you must provide all the js runtime yourself. That's not recommended for most projects since the JS is non-trivial, but maybe it's worth trying here. To try it just do -o name.wasm. With that I get this:

$ ls -alh name.wasm
-rw-rw-r-- 1 alon alon 183K Jul  6 06:33 name.wasm
$ wasm-dis name.wasm | grep export
 (export "hb_blob_create" (func $418))
 (export "hb_blob_destroy" (func $28))
 (export "free" (func $10))
 (export "hb_blob_get_length" (func $985))
 (export "malloc" (func $288))
 (export "hb_buffer_create" (func $520))
 (export "hb_buffer_destroy" (func $1210))
 (export "hb_buffer_set_direction" (func $1205))
 (export "hb_buffer_get_length" (func $1193))
 (export "hb_buffer_get_glyph_infos" (func $1187))
 (export "hb_buffer_get_glyph_positions" (func $504))
 (export "hb_buffer_guess_segment_properties" (func $1179))
 (export "hb_buffer_add_utf8" (func $1170))
 (export "hb_face_create" (func $1027))
 (export "hb_face_destroy" (func $468))
 (export "hb_font_create" (func $948))
 (export "hb_font_destroy" (func $413))
 (export "hb_font_set_scale" (func $877))
 (export "hb_shape" (func $1068))
 (export "__errno_location" (func $1067))
 (export "dynCall_vi" (func $1060))

@yisibl
Contributor

yisibl commented Aug 27, 2020

@photopea

Accessing system fonts from a browser will probably never happen. Mainly because people are too scared of fingerprinting (a list of fonts in your OS lets the website know that it is you, no matter if you use Incognito mode, VPN etc.).

Chrome has experimental support for the Local Font Access API: https://bugs.chromium.org/p/chromium/issues/detail?id=535764#c67

@yisibl
Contributor

yisibl commented Aug 27, 2020

@ebraminio @kripken

#10 (comment)
our wasm builds have so simple sbrk that can't grow their memory based on need

Is there a way to dynamically increase the memory? For example, when subsetting fonts on the server side in Node.js, you may encounter very large font files and often run into insufficient-memory errors.

Do I just need to add this when building?

__builtin_wasm_memory_grow(0, 400);

1f9d05e#diff-9c75fca7d7c7f34fca64331a426b42baR3

@ebraminio
Contributor

@yisibl ideally the dynamic memory increase should happen in sbrk,

    extern "C" void *sbrk(unsigned int inc) {

using the line you mentioned. I guess I couldn't find a way to coordinate that with the current code, but it is definitely possible.

@yisibl
Contributor

yisibl commented Sep 21, 2020

@ebraminio It seems there is hope to support dynamic memory increase.
