A WebAssembly version of HarfBuzz #10

Closed
photopea opened this issue Mar 28, 2019 · 65 comments

@photopea

commented Mar 28, 2019

Hi guys, I am developing a free web-based photo editor www.Photopea.com , which is used by around 100 000 people a day. It lets people do image editing, including inserting text into a picture.

As there is no sufficient OpenType parser and layout engine in Javascript, I made my own called Typr.js. It is quite advanced and can handle e.g. Arabic text. I also use this JS implementation of BIDI algorithm.

As more and more people use Photopea, I have to extend Typr.js. Currently, I am adding support for Urdu and Khmer layout. I often stare at the OpenType specification for 5 - 10 hours without writing a single line of code, only trying to understand what it means. I would be more than happy to drop Typr.js and use an alternative, if there was one.

Would you be able to provide a WebAssembly version of your library to the public, while documenting and maintaining it? I am ready to pay 5k - 10k USD for it. It is also important that the library is not too large (e.g. 150-200kB zipped), as every person has to download it when starting Photopea.

@ebraminio

Member

commented Mar 28, 2019

Exactly what I think about every day! HarfBuzz, despite doing a complicated job, has a simple core API, and the only thing that really matters in it is hb_shape(). Here is one attempt at a port, https://github.com/prezi/harfbuzz-js, and mine is at harfbuzz/harfbuzz#743. Even https://github.com/emscripten-ports/harfbuzz is empty, because the support I added to emscripten just uses our own sources. The only thing that remains is a clean-looking JS library port, something I am very interested in doing, but the trick is to do it as cleanly as possible so it can be merged upstream.

@ebraminio ebraminio self-assigned this Mar 28, 2019

@ebraminio

Member

commented Mar 28, 2019

Assigning it to myself to see what happens. Maybe we can have the wasm distribution in a separate repo under github.com/harfbuzz, if not in the harfbuzz repo itself.

@ebraminio

Member

commented Mar 29, 2019

So let's define a goal here. Since I have already put the support into Typr.js (photopea/Typr.js#28), what we can eventually do here is have a cleaned-up version of #1636 (just an HTML or JS demo of how to use harfbuzz in the browser or Node.js, without build results). We can then decide later whether to put harfbuzz in an npm package or provide automatically generated .d.ts TypeScript definitions, or simply refer users to Typr.js as a sample use.

@photopea

Author

commented Mar 31, 2019

I would be very happy, if we could make some progress in terms of WASM file size.

You are compiling it through a current version of Emscripten, right? The conversion is done through LLVM commands as intermediate state. Is it possible to convert C to WASM directly using other tools, that would provide smaller WASM?

I think my use case would probably be the biggest use case of the WASM version of HarfBuzz, as there will be hundreds of thousands of people downloading it as a part of the webpage every day :)

@ebraminio

Member

commented Mar 31, 2019

I tried building the library without emscripten before. Even though that may work (you have to provide a libc for the library somehow), emscripten itself incorporates good practices from what I can see.

We can reduce the current binary size by compiling harfbuzz without the built-in UCDN and Unicode functions, 710kb -> 599kb (zipped, 214kb -> 164kb), but that costs correctness of shaping.

Other things may lead to further reductions, such as disabling multithreading and removing unused APIs, but considering the binary size of HarfBuzz on Debian, for example https://packages.debian.org/sid/libharfbuzz-bin (800kb, which is also compressed), I don't believe we can go below 100-150kb compressed :)

@ebraminio

Member

commented Mar 31, 2019

Applying all the things mentioned, it has become 479.9kb (compressed, 127.0kb), but I'd say 214kb is also good considering correctness and completeness.

@behdad

Member

commented Apr 1, 2019

Leaving out UCDN is a nonstarter.

As it happens I'm going to work on minimizing HarfBuzz for other uses. So I'll be working on this. Would be great to 1. have a streamlined way to build .wasm, and 2. a major user.

My current plans are: 1. better compressor for UCDN and other tables (based on packtab.c in fribidi repo), and 2. easy way to disable periphery API / legacy features (like Arabic fallback shaping).

@ebraminio

Member

commented Apr 1, 2019

Classic case of:

image

from https://hacks.mozilla.org/2018/01/making-webassembly-even-faster-firefoxs-new-streaming-and-tiering-compiler/

Leaving out UCDN is a nonstarter.

Yes, as said.

  1. have a streamlined way to build .wasm,

It uses cmake and emscripten (#1636); it is super easy to use and doesn't cause trouble for autotools development.

  2. a major user.

photopea/Typr.js#28

There was also a lot of hype around WASI recently: https://hacks.mozilla.org/2019/03/standardizing-wasi-a-webassembly-system-interface/

@photopea

Author

commented Apr 7, 2019

I would like to thank you for this amazing library, and I am informing you, that it is currently used at www.Photopea.com by thousands of users every day :)

As you open Photopea, 1.8 MB of data is downloaded (out of this, 250 kB is Harfbuzz, 90 kB are all 104 icons - 160x160px, 60 kB is a font database, 130 kB are localizations in 36 languages). Of course, everything is compressed during the transfer.

I wish you could make Harfbuzz smaller, but I don't understand it well enough to be able to give you any advice. I already started a discussion about making the Emscripten JS file smaller: emscripten-core/emscripten#8409

@behdad

Member

commented Apr 8, 2019

Oooh, you have integrated it in Photopea already!? That's amazing!

I'm working on making a mini version of HarfBuzz in harfbuzz/harfbuzz#1652

@ebraminio

Member

commented Apr 8, 2019

Reduced from 597222 to 558072 by removing CFF, and to 523772 by removing AAT. My changes are on #1636, which only exports this set of APIs: '_hb_version_string', '_malloc', '_hb_blob_create', '_hb_face_create', '_hb_font_create', '_hb_buffer_create', '_hb_buffer_add_utf8', '_hb_buffer_guess_segment_properties', '_hb_buffer_set_direction', '_hb_shape', '_hb_buffer_serialize_glyphs', '_hb_buffer_get_length', '_hb_buffer_destroy', '_hb_font_destroy', '_hb_face_destroy', '_free'. It is compressed using Closure and seems to work here!
harfbuzzjs-closure-no-cff-aat2.zip, 170kb zipped, from 247kb
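
For readers following along, the exported functions above are typically called in roughly the order sketched below. This is only a sketch under assumptions: `module` is assumed to be an Emscripten module object exposing these exports plus a `HEAPU8` view, the helper name `shapeToGlyphCount` is made up for illustration, and `_hb_blob_destroy` is the call that a later comment in this thread reports as added to the build.

```javascript
// Hypothetical helper, not part of harfbuzzjs: shapes `text` and returns the glyph count.
// `module` is assumed to be an Emscripten module with the exports listed in this thread.
function shapeToGlyphCount(module, fontBytes, text, faceIndex = 0) {
  const HB_MEMORY_MODE_WRITABLE = 2; // value of the hb_memory_mode_t enum member

  // Copy a byte array into the wasm heap and return its address.
  const copyToHeap = (bytes) => {
    const ptr = module._malloc(bytes.length);
    module.HEAPU8.set(bytes, ptr);
    return ptr;
  };

  const fontPtr = copyToHeap(fontBytes);
  const utf8 = new TextEncoder().encode(text);
  const textPtr = copyToHeap(utf8);

  // blob -> face -> font is the usual HarfBuzz object chain.
  const blob = module._hb_blob_create(fontPtr, fontBytes.length, HB_MEMORY_MODE_WRITABLE, 0, 0);
  const face = module._hb_face_create(blob, faceIndex);
  const font = module._hb_font_create(face);

  const buffer = module._hb_buffer_create();
  module._hb_buffer_add_utf8(buffer, textPtr, utf8.length, 0, utf8.length);
  module._hb_buffer_guess_segment_properties(buffer);
  module._hb_shape(font, buffer, 0, 0); // no explicit features
  const glyphCount = module._hb_buffer_get_length(buffer);

  // Release everything in reverse order of creation.
  module._hb_buffer_destroy(buffer);
  module._hb_font_destroy(font);
  module._hb_face_destroy(face);
  module._hb_blob_destroy(blob);
  module._free(textPtr);
  module._free(fontPtr);
  return glyphCount;
}
```

A real caller would read the glyph data out of the buffer before destroying it; the count is returned here only to keep the sketch short.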

@photopea

Author

commented Apr 8, 2019

Yes, I did integrate it :)

I got very excited about minifying Photopea today. Maybe it could inspire you :D

There is a font database - a large JSON file, that Photopea loads every time. There are 4290 fonts. For each font, there are four strings: Family name, Subfamily name, Postscript name, Font URL. Also, a Font Category, and flags with supported scripts. This file was 451 kB and 57 kB ZIPped.

I made some hacks in my JSON representation (e.g. an empty PostScript name means, that the PostScript name is a concatenation of Family and Subfamily). I turned that JSON into 135 kB and 29 kB ZIPped - less than 7 bytes per font :D
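
The empty-PostScript-name trick can be illustrated with a small decoder. This is a sketch only: the entry layout `[family, subfamily, postScriptName, url]` and the function name are assumptions, and since the exact concatenation rule is not given here, the sketch assumes spaces are stripped and the two parts are joined with a hyphen, as is common for PostScript names.

```javascript
// Hypothetical compact entry layout: [family, subfamily, postScriptName, url].
// An empty postScriptName means "derive it from family and subfamily".
function expandFontEntry([family, subfamily, postScriptName, url]) {
  // Assumed derivation rule (not confirmed by the thread): strip spaces, join with "-".
  const derived = `${family}-${subfamily}`.replace(/\s+/g, "");
  return { family, subfamily, postScriptName: postScriptName || derived, url };
}
```

Only the entries whose PostScript name does not follow the rule need to store it explicitly, which is where the saving comes from.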

@behdad

Member

commented Apr 8, 2019

Nice!

Want to show us your HarfBuzz integration glue? I'm afraid you also need a Unicode Bidirectional Algorithm implementation for full correctness.

@photopea

Author

commented Apr 8, 2019

I am using the Javascript implementation of BIDI algorithm, that I mentioned at the beginning. I added bidirectional support about two years ago. What glue do you mean?

@behdad

Member

commented Apr 8, 2019

Oh right. Sounds good.

I meant the code calling into HarfBuzz. Okay, so you are probably just missing script-run itemization.

@behdad

Member

commented Apr 8, 2019

I.e., mixed-script text will currently be broken.

@photopea

Author

commented Apr 8, 2019

@ebraminio Could you give me an example of how to use your latest code? it seems like there is no _hb_blob_destroy .

@behdad What is a mixed-script? I call HarfBuzz separately on intervals of text, which share the same direction and font (in Photopea, each character can have a different font). I would like to encourage you to go to www.Photopea.com and try it out.

@brawer

commented Apr 9, 2019

To render text, every browser already contains a shaping engine; if it was accessible from JavaScript, “download size” would be zero. At some point, there was talk about adding a text shaping API to the JavaScript core libraries, similar to the ICU wrapper in ECMA-402. Does anyone know what happened to that plan? Obviously it’d take a while to bring it through, but browsers have eventually adopted the Intl API. (As far as I can see, the main missing piece would be to find someone who can write a good API proposal for ECMA. That person would need to understand text rendering, have good JavaScript fu, and be patient enough to survive the standardization process.)

@brawer

commented Apr 9, 2019

Re. mixed-script, there’s a proposal for adding an Intl.Segmenter to JavaScript. But currently, the proposal is only about breaking graphemes, words and sentences (exposing ICU break iterators), not script runs.

@brawer

commented Apr 9, 2019

@photopea For correct rendering, you’ll need to split the input text into script runs before calling HarfBuzz, but it’s more complicated than it sounds. Perhaps you could follow the logic of Raqm; the script itemization code is in raqm_itemize.

@ebraminio

Member

commented Apr 9, 2019

it seems like there is no _hb_blob_destroy.

Ah, I missed adding that call. Here is the new version: harfbuzzjs-closure-no-cff-aat2.zip

@photopea

Author

commented Apr 9, 2019

@ebraminio great, thank you! I just updated and it is much smaller indeed :) BTW, is that "harfbuzzjs.js" a direct output from Emscripten, or did you minify it somehow? Do you think there is room for minifying that JS even further? Could you write a comment on emscripten-core/emscripten#8409 ?

@ebraminio

Member

commented Apr 9, 2019

Yes, I used Closure via the flag mentioned in that file. You can use Google Closure in the rest of your project too and surprise yourself!

@photopea

Author

commented Apr 9, 2019

@ebraminio I did use Closure Compiler several years ago, but it turned out to be very slow, so I made my own, which does the same thing, but it is about 50x faster.

But they can make modifications on Emscripten side, that would make the Closure Compiler result even smaller.

@ebraminio

Member

commented Apr 9, 2019

But they can make modifications on Emscripten side

Interesting, I never thought of that!

@behdad

Member

commented Apr 9, 2019

@behdad What is a mixed-script?

Say, you have Hindi and English mixed in the same string.

I would like to encourage you to go to www.Photopea.com and try it out.

I did already. :)

@photopea

Author

commented Apr 9, 2019

@behdad I understand, that to use GPOS and GSUB tables, you need to know a script, which will lead you to a set of features and lookups, that should be applied to the text. In my library Typr.js, I used to loop through all features and apply all referenced lookups :D (each lookup at most once).

I use PSD format for storing files, which does not store any information about the script. Also, users are not used to enter a script when entering a text. So my goal is to display it like the web browser would display it (you also don't enter the script in HTML, you can only enter the base direction of the paragraph).

@behdad

Member

commented Apr 9, 2019

I use PSD format for storing files, which does not store any information about the script. Also, users are not used to enter a script when entering a text. So my goal is to display it like the web browser would display it (you also don't enter the script in HTML, you can only enter the base direction of the paragraph).

Exactly. Web browsers as well as any other complete text rendering system internally break the text down into "script runs" automatically and shape each one separately.

@ebraminio

Member

commented Apr 9, 2019

Some bug-report magic will be useful here, I guess:

Steps to reproduce:

  1. Create a text holder in Photopea
  2. Set FreeSerif font for it
  3. Put "ދިسسی" on it

Actual:
image

Expected:
What you see in the browser

image

What happened?
HarfBuzz guessed an incorrect script for the whole string, so the rest of the shaping went wrong.

Solution:
Segmenting the text by scripts before passing it to HarfBuzz
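
To make "segmenting by script" concrete, here is a deliberately simplified sketch. It is not Raqm's or any browser's algorithm: the script list is a small hard-coded assumption, characters with no listed script (spaces, punctuation, anything else) are simply attached to the neighbouring run, and real itemizers also handle paired brackets and script extensions. The function names are made up for illustration.

```javascript
// Assumed, intentionally short list of scripts this sketch knows about.
const KNOWN_SCRIPTS = ["Latin", "Arabic", "Thaana", "Devanagari", "Hebrew", "Cyrillic", "Greek"];
const SCRIPT_TESTERS = KNOWN_SCRIPTS.map(
  (name) => [name, new RegExp(`\\p{Script=${name}}`, "u")]
);

// Returns the script name of one character, or null for neutral/unknown characters.
function scriptOf(ch) {
  for (const [name, re] of SCRIPT_TESTERS) {
    if (re.test(ch)) return name;
  }
  return null;
}

// Splits text into runs of a single script; neutral characters join the current run.
function itemizeByScript(text) {
  const runs = [];
  for (const ch of text) { // iterates by code point, not UTF-16 unit
    const script = scriptOf(ch);
    const last = runs[runs.length - 1];
    if (last && (script === null || last.script === null || last.script === script)) {
      last.text += ch;
      if (last.script === null) last.script = script; // leading neutrals adopt the first real script
    } else {
      runs.push({ script, text: ch });
    }
  }
  return runs;
}
```

Each resulting run would then be shaped with its own buffer, so that the script guessed or set for that buffer matches the text in it.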

@photopea

Author

commented Apr 19, 2019

Hey, thanks for creating a separate repository! :)

Note that I am a person who does not use native programs during programming, such as C compilers, bash / command line, or npm. I only edit text files with Notepad. I would be glad if there was a way I could use HarfBuzz from that repository, but I don't see any JS or WASM file that is ready to use.

@ebraminio

Member

commented Apr 19, 2019

They are available here: harfbuzzjs.zip, but they will soon also be available as part of the harfbuzzjs release on npm, as it is not that nice to put binaries in the root of the project. I've also just created a readme file for the project, https://github.com/harfbuzz/harfbuzzjs/ ; let me know if it is clear enough or if anything needs to be added.

@photopea

Author

commented Apr 23, 2019

Hi, I would like to donate 1000 USD to the HarfBuzz project. It works excellently at Photopea.com and has saved me the struggle of implementing everything myself (even though I already invested hundreds of hours into Typr.js).

Is there a Donation page? I would like Ebrahim to get the part of it, too, as he helped me a lot. Do you work together in a group?

Also, I am still hoping you manage to make HarfBuzz smaller, either by throwing out unneeded parts, or improving the representation of structures.

@twardoch

commented Apr 23, 2019

I'm not sure if there is a donation page, but I think the maintainers will suggest something. HarfBuzz has a long history — started as the FT_Layout submodule of FreeType with simple functionality, then a lot of work was done within Qt, and at some point, years ago, @behdad took over the development, first within RedHat, I think, and then within Google.

But it never was a “RedHat project” or a “Google project”. Behdad did a massive job and then was joined by others. Firefox, then Chrome, started using it to do the OpenType shaping, and the developers put an incredible amount of work in to make HB result-compatible with Uniscribe, Microsoft’s implementation of OpenType Layout (without having access to sources, so there was a lot of trial-and-error).

The project has now many man-years of developer work, and the developers have always shown the willingness to implement features (I mean wishes, new functionalities — not the OT features), of course as long as they remained within the scope of the lib. (I once asked for the hb-view tool and proposed its CLI spec, and Behdad did it in a week, which finally made it possible to produce simple text samples in all of the world scripts as PNG, SVG & PDF via Cairo).

There is a lot of implicit knowledge of Unicode & OpenType encoded in HarfBuzz, or — more broadly speaking — a lot of knowledge about the world typography. Thanks to HarfBuzz, both large and small languages have a chance for accurate and orthographically correct digital text exchange. Together with FreeType and the Google Noto project, HarfBuzz is an immense contribution to the centuries of the human written culture.

@ebraminio

Member

commented Apr 24, 2019

Just released the work on npm, https://www.npmjs.com/package/harfbuzzjs , the full version on https://wapm.io/package/ebraminio/harfbuzz (apparently a brand-new package manager for wasm files), and the whole thing, including the .wasm files, the .js interfaces and the lean wrapper hbjs.js, on https://github.com/harfbuzz/harfbuzzjs/releases

Is there a Donation page?

I'm not aware of any donation page. Feel free to send the amount you like to Behdad; otherwise he should set one up.

@photopea

Author

commented Apr 24, 2019

How do I send money to Behdad? Does he have a bank account in the US or EU?

Thanks for creating all these npm / wasm package manager distribution channels. I do not use them, but I think many will use them. Personally, I would prefer if you invested the effort into Harfbuzz itself, to make it smaller / faster / more robust. Even though it is already quite small and fast for my needs :)

@ebraminio

Member

commented Apr 24, 2019

Personally, I would prefer if you invested the effort into Harfbuzz itself

It is already a squeeze from a 1.9Mb .wasm file (540kb zipped) to 536kb (159kb zipped, your original goal I think) using the different techniques we've incorporated, but there is of course room for more.

@behdad

Member

commented Apr 24, 2019

How do I send money to Behdad? Does he have a bank account in the US or EU?

I have US accounts, yes. You can email me@behdad.org. Thanks for your generous offer!

Thanks for creating all these npm / wasm package manager distribution channels. I do not use them, but I think many will use them. Personally, I would prefer if you invested the effort into Harfbuzz itself, to make it smaller / faster / more robust. Even though it is already quite small and fast for my needs :)

I'm still working on that in harfbuzz/harfbuzz#1652

For example, I'm shrinking UCDN from over 100kb to about 30kb. My changes will make it to master soon.

@photopea

Author

commented May 7, 2019

May I have one more question? Does Harfbuzz support TTC files (Font Collections) ? A Font collection is basically several TTF files concatenated, with a list of offsets to each file at the beginning. They can also share some tables with each other (by sharing offsets to those tables).

When I load a whole TTC file to HarfBuzz, where do I specify, which font should be used for shaping?

@ebraminio

Member

commented May 7, 2019

Oh, it does. You have to pass the index you want instead of 0 in module._hb_face_create(blob, 0);. There is also hb_face_count, but it is not available in your build. You can go without it, but let me know if you want it.
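
Without hb_face_count in the build, the number of fonts in a collection can also be read straight from the file, since a TTC file starts with the tag 'ttcf', two 16-bit version fields, and a 32-bit font count followed by the table of offsets. The sketch below assumes the file is available as an ArrayBuffer; the function name is made up for illustration.

```javascript
// Returns how many fonts a font file contains: numFonts from the TTC header,
// or 1 when the file is not a collection (no 'ttcf' tag).
function countFontsInFile(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  if (view.byteLength < 12) return 1; // too short to hold a TTC header
  const tag = String.fromCharCode(
    view.getUint8(0), view.getUint8(1), view.getUint8(2), view.getUint8(3)
  );
  if (tag !== "ttcf") return 1;
  return view.getUint32(8); // numFonts, big-endian (DataView default)
}
```

Valid face indices for module._hb_face_create(blob, index) would then range from 0 to the returned count minus one.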

@behdad

Member

commented May 7, 2019

That's the integer index passed to hb_face_create().

@photopea

Author

commented May 7, 2019

Wow, great, it works perfectly, thanks! :)

@ebraminio

Member

commented May 12, 2019

New build using Behdad's HB_TINY, only 440kb of .wasm
harfbuzzjs.zip
I have not tested it personally and am only reporting it, but feel free to use it if it works there.

@photopea

Author

commented May 12, 2019

I updated it, works perfectly, thanks! :)

@ebraminio

Member

commented May 24, 2019

With the new work on HarfBuzz it went from 440kb to 421kb, after Behdad's very recent work it went down to 371kb, and after enabling --llvm-lto 1 and removing unnecessary strings it is down to 278kb!

harfbuzzjs-lto.zip

But apparently I can't make it work, and even the previous version doesn't work here, so please make sure it works there before updating.

@photopea

Author

commented May 24, 2019

@ebraminio The new size is incredible! However, for me, it returns an empty array of glyphs :(

@ebraminio

Member

commented May 24, 2019

Now a working version with the 2.5.0 release, which I can confirm also works here! https://harfbuzz.github.io/harfbuzzjs/ (feel free to pick the wasm from that page or from below)

harfbuzzjs.zip

It is only 98kb of zipped wasm, exceeding your goal :)

@photopea

Author

commented May 24, 2019

I just put it online, works great, thanks! :)

@ebraminio

Member

commented Jun 22, 2019

With another round of improvements by Behdad we've gone from 280kb to 246kb, and with the removal of hb_serialize, which is necessary for further work, it goes down to 236kb!

harfbuzzjs.zip

The change needed on your side is to use this instead of the current serializer. I tested it here, https://harfbuzz.github.io/harfbuzzjs/ , and it seems to work fine:

    var length = module._hb_buffer_get_length(buffer);
    var result = [];
    // Returned pointers are byte offsets; divide by 4 to index the 32-bit heap views.
    var infosPtr32 = module._hb_buffer_get_glyph_infos(buffer, 0) / 4;
    var positionsPtr32 = module._hb_buffer_get_glyph_positions(buffer, 0) / 4;
    // Each glyph info and each glyph position record is five 32-bit fields wide.
    var infos = module.HEAPU32.slice(infosPtr32, infosPtr32 + 5 * length);
    var positions = module.HEAP32.slice(positionsPtr32, positionsPtr32 + 5 * length);
    for (var i = 0; i < length; ++i) {
      result.push({
        g: infos[i * 5 + 0],      // glyph index
        cl: infos[i * 5 + 2],     // cluster
        ax: positions[i * 5 + 0], // x advance
        ay: positions[i * 5 + 1], // y advance
        dx: positions[i * 5 + 2], // x offset
        dy: positions[i * 5 + 3]  // y offset
      });
    }
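
For anyone who wants to check the indexing without loading the wasm module, the same decoding can be tried against a plain ArrayBuffer. This sketch relies on hb_glyph_info_t being laid out as codepoint, mask, cluster plus two internal fields, and hb_glyph_position_t as x_advance, y_advance, x_offset, y_offset plus one internal field, five 32-bit words each; the function name and the standalone-buffer setup are illustrative assumptions.

```javascript
const WORDS_PER_RECORD = 5; // each record is 20 bytes, i.e. five 32-bit words

// Decodes `length` glyph records from raw memory, given byte pointers to both arrays.
function readGlyphRecords(memoryBuffer, infosPtr, positionsPtr, length) {
  const u32 = new Uint32Array(memoryBuffer); // infos are unsigned
  const i32 = new Int32Array(memoryBuffer);  // positions are signed
  const result = [];
  for (let i = 0; i < length; ++i) {
    const info = infosPtr / 4 + i * WORDS_PER_RECORD;
    const pos = positionsPtr / 4 + i * WORDS_PER_RECORD;
    result.push({
      g: u32[info],       // codepoint field (glyph index after shaping)
      cl: u32[info + 2],  // cluster (index 1 is the mask, skipped)
      ax: i32[pos],       // x_advance
      ay: i32[pos + 1],   // y_advance
      dx: i32[pos + 2],   // x_offset
      dy: i32[pos + 3]    // y_offset
    });
  }
  return result;
}
```

Reading positions through a signed view matters because offsets can be negative, for example when a mark is moved back over its base glyph.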

This is essential, as the next round of work I am doing is about removing emscripten and its glue code (that 10kb of JS code) in favour of a trimmed-down libc. I have a working demo of it here: https://harfbuzz.github.io/harfbuzzjs/ng/hb.html

@photopea

Author

commented Jun 24, 2019

Hi! Currently, when you open Photopea.com , 1.4 MB is loaded (the whole program). HarfBuzzjs.wasm, which is extra 111 kB (as it is GZIPped), is loaded only if the text tool is used (so we don't load it every time as in the past).

I am already quite happy with the progress you have made, and if you plan to keep going, I will wait for the next version :)

I wish all developers cared about the size of their programs at least half as much as you do :)

@ebraminio

Member

commented Jun 24, 2019

@photopea, great :) I would say this version is worth integrating now, as the next one will have radical changes and we may not release it soon, or ever (it requires compiling our own malloc/calloc/realloc/free, which may take a little time to get right).

An advantage of the next version is that you can compile the harfbuzz .wasm yourself just by downloading the LLVM installer, and it even works on Windows: current LLVM releases ( http://releases.llvm.org/download.html#8.0.0 ), which provide an installer for Windows, support compiling and linking wasm32 files. You may like to port some of the other code you've written for the rest of your app to reduce its size. It is super easy ( https://dassur.ma/things/c-to-webassembly/ ) and doesn't need the complicated setup that emscripten and Google Closure have. For now, though, only our emscripten builds are considered stable (even though the ng builds now work here: https://harfbuzz.github.io/harfbuzzjs/ng/hb.html ).

So all the new version needs is for you to apply the change in 9fc9e7a#diff-291994c3e8f610097e257cfe2a68e019L33 to your code and use the new module I've uploaded, but please check its validity before publishing it to production. The next build will mostly just need the underscores removed from the calls, but it needs this change too, which is why I encourage you to apply it now. Thanks

@ebraminio

Member

commented Jul 2, 2019

Hey @photopea, I've just uploaded an emscripten-free, real WebAssembly distribution of the project, only ~200kb in size (78kb gzipped; it includes a minimal libc and malloc, and no longer needs that ~10kb .js wrapper). Here is the demo: https://harfbuzz.github.io/harfbuzzjs/ (make sure you are not seeing the cached version). Feel free to copy https://github.com/harfbuzz/harfbuzzjs/blob/master/examples/nohbjs.html but pick the .wasm binary from the demo page!

Please note that there are differences between the previous emscripten-based release and this one: our wasm builds have such a simple sbrk that they can't grow their memory on demand, so var exports = result.instance.exports; exports.memory.grow(400); // each page is 64kb in size is used to reserve an initial amount of RAM.

Finally, I should note that this is not meant to undermine all the great work done in the emscripten project: their malloc is still used in our libc, and https://github.com/intel/zephyr/blob/master/lib/libc/minimal/source/string/string.c from the Intel Zephyr libc (Apache licensed) is also used. We will review these before the release, but I guess you will be fine using our pure .wasm now!
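
The memory.grow call can be tried in isolation with a standalone WebAssembly.Memory, which also shows one thing callers need to watch for: growing replaces memory.buffer, so typed-array views created earlier must be recreated. This is a sketch with made-up names; in the real setup the memory object would be the module's exported memory rather than one created in JavaScript.

```javascript
const WASM_PAGE_BYTES = 65536; // one WebAssembly page is 64 KiB

// Grows `memory` until it holds at least `bytesNeeded` bytes and returns a fresh byte view.
function ensureMemoryBytes(memory, bytesNeeded) {
  const missing = bytesNeeded - memory.buffer.byteLength;
  if (missing > 0) {
    memory.grow(Math.ceil(missing / WASM_PAGE_BYTES)); // argument is a page count, not bytes
  }
  // The old ArrayBuffer is detached after grow(), so always build views from memory.buffer.
  return new Uint8Array(memory.buffer);
}
```

With this helper, exports.memory.grow(400) corresponds to asking for 400 extra pages, roughly 25 MB, up front.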

@kripken

commented Jul 4, 2019

Is there a side by side comparison of the emscripten and non-emscripten versions (or build instructions for them both)? I'm curious to understand any size difference in the .wasm. How big is that difference?

@ebraminio

Member

commented Jul 5, 2019

It is something like 30kb in the wasm binary and 10kb in the JS glue code (both uncompressed). Here is how to test:

$ git clone https://github.com/harfbuzz/harfbuzzjs && cd harfbuzzjs
$ ./build.sh && ls -ltrha hb.wasm # our current pure wasm module
-rwxr-xr-x 1 ebrahim  201K Jul  5 13:06 hb.wasm
$ git checkout 9fc9e7aa8d83b8602639b590d81e8f8fc77ddc91 # last version built with emscripten
$ ./build.sh && ls -ltrha harfbuzzjs.*
-rw-r--r-- 1 ebrahim   11K Jul  5 13:10 harfbuzzjs.js
-rw-r--r-- 1 ebrahim  229K Jul  5 13:10 harfbuzzjs.wasm

Downsides:

  1. No dynamic memory grow in sbrk https://github.com/harfbuzz/harfbuzzjs/blob/master/libc/main.c#L11 I may need your help on this, I mean I don't know how to detect the memory grow is needed.
  2. Tightly tailored to our use: we don't have the STL, make very limited use of libc, and tweaked the project to use even less of it. I removed the use of printf and friends specifically for this reason and created a libc header collection specific to our use, https://github.com/harfbuzz/harfbuzzjs/tree/master/libc/include , which is not something every project can afford.
  3. The libc is not well tested, see https://github.com/harfbuzz/harfbuzzjs/blob/master/libc/main.c#L17-L25 , and not tuned for performance either, see 22547e7#diff-b88182833a86ec9e739b42fec4f814a0

Upsides:

  1. Full control of module fetch and load. emscripten's is a bit complicated and has its own learning curve.
  2. The 40kb saving! It turned out we didn't need much of the glue code, so it may also become faster to run.
  3. It may work better with wapm and wasm runners outside the browser environment.
  4. Build tool installation is very easy: it only needs clang 8, and it is easier on Windows (just download the clang installer). It should also be much faster to build.
  5. Correct symbols names in wasm, something I like very much!

And we still use emscripten malloc implementation and care about emscripten heritage :)

Update:
On another machine (a bot, actually) with an updated emscripten, and with both builds compiling the amalgamated source, the numbers are now:

$ ./build.sh && ls -ltrha harfbuzzjs.*
-rw-r--r--   1   9.0K Jul  5 08:06 harfbuzzjs.js
-rw-r--r--   1   226K Jul  5 08:06 harfbuzzjs.wasm

https://transfer.sh/nOUfD/harfbuzz.wasm
https://transfer.sh/qcN4B/harfbuzz.js

vs

-rwxr-xr-x   1   200K Jul  5 07:51 hb.wasm

https://transfer.sh/y121g/hb.wasm

@kripken

commented Jul 5, 2019

Thanks @ebraminio!

Ok, I spent some time building the various options here - mostly I wanted to see if there were any bugs or forgotten flags or optimizations anywhere.

One issue is the emscripten version: using latest emscripten with the LLVM wasm backend, and using this modified build.sh (I took the new one you had and made it work with emscripten), I get

$ ls -alh a.*
-rw-rw-r-- 1 alon alon 6.6K Jul  5 14:02 a.out.js
-rw-rw-r-- 1 alon alon 183K Jul  5 14:02 a.out.wasm

At 183K that's better than both the emscripten and non-emscripten numbers from before.

But that brings me to the second issue: you can run Binaryen's wasm-opt tool on the non-emscripten wasm (emscripten runs it automatically), and it shrinks it to 178K,

$ wasm-opt hb.wasm -O -o hb_opt.wasm
$ ls -alh hb.wasm hb_opt.wasm
-rwxrwxr-x 1 alon alon 200K Jul  5 14:12 hb.wasm
-rw-rw-r-- 1 alon alon 178K Jul  5 14:13 hb_opt.wasm

And that 178K is the best number of all of them!

Old emscripten was using old LLVM (6). LLVM 8 and 9 (what emcc uses) can do better! Aside from that, running Binaryen typically shrinks wasm backend output by 10-15% (on HarfBuzz it's around 11%). With those two issues out of the way, the comparison is more apples to apples, and the difference is the 6K JS and a wasm difference of 5K, which I believe is because of the libc customization here.

Definitely on a project like HarfBuzz, that needs very little runtime support, it's nice to customize your own libc if you have time for that, and such custom runtimes can be smaller than emscripten's general-purpose runtime! The one thing though is you need to make sure to do all the things emcc would do for you, like running wasm-opt, otherwise the size could be worse, not better.

No dynamic memory grow in sbrk https://github.com/harfbuzz/harfbuzzjs/blob/master/libc/main.c#L11 I may need your help on this, I mean I don't know how to detect the memory grow is needed.

I think you need to track the current memory size and how close the sbrk limit gets to there. The wasm backend has intrinsics to help there (__builtin_wasm_memory_size, __builtin_wasm_memory_grow).
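
The bookkeeping described here can be modelled in JavaScript to see the logic, with the caveat that the real fix would live in the C sbrk and use the two intrinsics named above. In this sketch memory.buffer.byteLength stands in for the memory-size intrinsic and memory.grow for the grow intrinsic; the function name, the heapStart parameter and the -1 failure value are illustrative assumptions.

```javascript
const SBRK_PAGE_BYTES = 65536; // WebAssembly page size

// Returns an sbrk-like function that grows `memory` whenever the break passes its end.
function createGrowingSbrk(memory, heapStart) {
  let currentBreak = heapStart;
  return function sbrk(increment) {
    const oldBreak = currentBreak;
    const newBreak = currentBreak + increment;
    const memoryBytes = memory.buffer.byteLength; // current memory size
    if (newBreak > memoryBytes) {
      const pages = Math.ceil((newBreak - memoryBytes) / SBRK_PAGE_BYTES);
      try {
        memory.grow(pages); // throws RangeError when the maximum would be exceeded
      } catch (e) {
        return -1; // out of memory: leave the break unchanged
      }
    }
    currentBreak = newBreak;
    return oldBreak; // like C sbrk, hand back the previous break
  };
}
```

With growth handled inside sbrk, the large up-front memory.grow call on the JavaScript side would no longer be needed.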

Correct symbols names in wasm, something I like very much!

I'm not sure what you mean by this? If you build with emcc --profiling-funcs for example it will keep symbol names in the wasm, emcc just doesn't emit them by default. (And when using the wasm backend the symbol names are also correct in that they don't have any extra _ prefix.)

@ebraminio

Member

commented Jul 6, 2019

Wow, great information!

One issue is the emscripten version: using latest emscripten with the LLVM wasm backend, and using this modified build.sh (I took the new one you had and made it work with emscripten), I get

Great, now I think it would be nice to provide both binaries in a release, considering that emscripten's is better tested.

But that brings me to the second issue: you can run Binaryen's wasm-opt tool on the non-emscripten wasm (emscripten runs it automatically), and it shrinks it to 178K,

Great! I was hoping to get under the 200 KB limit, and now I have :)

Definitely on a project like HarfBuzz, that needs very little runtime support, it's nice to customize your own libc if you have time for that, and such custom runtimes can be smaller than emscripten's general-purpose runtime!

Our own libc is not as well tested as emscripten's. I will see whether I can use the rest of emscripten's libc from source.

The one thing though is you need to make sure to do all the things emcc would do for you, like running wasm-opt, otherwise the size could be worse, not better.

I tried the one shipped with my distro's emscripten and it only shrank the file by 10K; I will try the latest one from emsdk once I have downloaded it.

I think you need to track the current memory size and how close the sbrk limit gets to there. The wasm backend has intrinsics to help there (__builtin_wasm_memory_size, __builtin_wasm_memory_grow).

Great help, thanks :)

I'm not sure what you mean by this? If you build with emcc --profiling-funcs for example it will keep symbol names in the wasm, emcc just doesn't emit them by default. (And when using the wasm backend the symbol names are also correct in that they don't have any extra _ prefix.)

I will use the flag in our revived emscripten build if it doesn't affect the size much; I like to have this around:
image
I even wish I could prototype code in the dynamic JS world and then rewrite it in native code where necessary!

Thanks :)

@ebraminio ebraminio transferred this issue from harfbuzz/harfbuzz Jul 6, 2019

@ebraminio

Member

commented Jul 6, 2019

Transferred the issue to harfbuzz/harfbuzzjs so that all the information here is easier to access.

@photopea here is our latest work using the wasm-opt optimization: hb.wasm.zip. Its demo is at https://harfbuzz.github.io/harfbuzzjs/ and code that is similar to yours, which you can easily adapt, is at https://github.com/harfbuzz/harfbuzzjs/blob/master/examples/nohbjs.html for reference. The differences are mostly the removal of the initial underscores from function names, the removal of hb_direction_from_string (which you weren't using, IIRC) and the removal of hb_buffer_serialize.

Here is the emscripten result as well, which you may be able to adapt more easily, but I haven't tested it, I don't recommend it, and it is bigger: harfbuzzjs.zip

@kripken: Ok, this is the result with: emcc --profiling-funcs (my distro's emscripten which apparently matches your numbers)

image

image

-rw-r--r-- 1 ebrahim  309K Jul  6 16:25 a.out.wasm
-rw-r--r-- 1 ebrahim  6.7K Jul  6 16:25 a.out.js

Which is better than the following (but note the size impact --profiling-funcs had):

image

-rw-r--r-- 1 ebrahim  183K Jul  6 16:29 a.out.wasm
-rw-r--r-- 1 ebrahim  6.7K Jul  6 16:29 a.out.js

But what I like and meant for correct symbol names was this:

image

-rwxr-xr-x 1 ebrahim vdr 178K Jul 6 16:05 hb.wasm

@kripken

commented Jul 6, 2019

@ebraminio Thanks, I think I see now, --profiling-funcs keeps full function name info, which is why it's so big (all internal non-exported function names are kept around too). Seems you want just the export names to not be minified? Is the reason you want the unminified export names that you want to call them directly instead of through emscripten's JS code?

Emscripten automatically minifies exports (and imports) in -O3 and above. We don't have an option to specifically disable that atm, as we assume that if we emit both js and wasm that we can do optimizations on that pair together (like minifying those imports and exports, and also metadce, etc.). However, we have an option to emit just wasm, in which case you must provide all the js runtime yourself. That's not recommended for most projects since the JS is non-trivial, but maybe it's worth trying here. To try it just do -o name.wasm. With that I get this:

$ ls -alh name.wasm
-rw-rw-r-- 1 alon alon 183K Jul  6 06:33 name.wasm
$ wasm-dis name.wasm | grep export
 (export "hb_blob_create" (func $418))
 (export "hb_blob_destroy" (func $28))
 (export "free" (func $10))
 (export "hb_blob_get_length" (func $985))
 (export "malloc" (func $288))
 (export "hb_buffer_create" (func $520))
 (export "hb_buffer_destroy" (func $1210))
 (export "hb_buffer_set_direction" (func $1205))
 (export "hb_buffer_get_length" (func $1193))
 (export "hb_buffer_get_glyph_infos" (func $1187))
 (export "hb_buffer_get_glyph_positions" (func $504))
 (export "hb_buffer_guess_segment_properties" (func $1179))
 (export "hb_buffer_add_utf8" (func $1170))
 (export "hb_face_create" (func $1027))
 (export "hb_face_destroy" (func $468))
 (export "hb_font_create" (func $948))
 (export "hb_font_destroy" (func $413))
 (export "hb_font_set_scale" (func $877))
 (export "hb_shape" (func $1068))
 (export "__errno_location" (func $1067))
 (export "dynCall_vi" (func $1060))