A WebAssembly version of HarfBuzz #10
Exactly what I think about every day! HarfBuzz, despite doing a complicated job, has a simple core API, and the only function that really matters is hb_shape(). Here is one attempt at it, https://github.com/prezi/harfbuzz-js, and mine is at harfbuzz/harfbuzz#743 . Even https://github.com/emscripten-ports/harfbuzz is empty, because the support I added to emscripten uses our repository directly. The only thing that remains is a clean-looking JS library port, which I'm very interested in doing; the trick is to make it as clean as possible so it can be merged upstream. |
Assigning it to myself to see what happens. Maybe we can have the wasm distribution in a separate repo under github.com/harfbuzz, if not in the harfbuzz repo itself. |
So let's define a goal here. Since I've already put the support into Typr.js (photopea/Typr.js#28), what we can eventually do here is have a cleaned-up version of #1636 (just an HTML or JS demo of how to use harfbuzz in the browser or Node.js, without build results). We can then decide whether to put harfbuzz in an npm package or provide automatically generated .d.ts TypeScript definitions later, or refer users to Typr.js as a sample use anyway. |
I would be very happy if we could make some progress on the WASM file size. You are compiling it with a current version of Emscripten, right? The conversion goes through LLVM as an intermediate step. Is it possible to convert C to WASM directly using other tools that would produce a smaller WASM? I think mine would probably be the biggest use case of the WASM version of HarfBuzz, as hundreds of thousands of people will download it as part of the webpage every day :) |
I have tried building the library without emscripten before. Even though that may work (you have to provide a libc for the library somehow), emscripten itself incorporates good practices from what I can see. We can reduce the current binary size by compiling harfbuzz without the builtin UCDN and Unicode functions, 710kb -> 599kb (zipped, 214kb to 164kb), but that costs shaping correctness. Other things may give some further reduction, such as disabling multithreading and removing unused APIs, but considering the binary size of HarfBuzz on Debian, for example https://packages.debian.org/sid/libharfbuzz-bin (800kb, which is also compressed), I don't believe we can go below 100-150kb compressed :) |
Applying all the things mentioned, it comes to 479.9kb (compressed, 127.0kb), but I'd say 214kb is also good considering correctness and completeness. |
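The "zipped" figures quoted in this thread can be approximated with Node's built-in zlib. This is a minimal sketch, assuming gzip at its highest level is close to what a web server would apply during transfer; `reportSize` is a hypothetical helper name, not something from the thread.

```javascript
const zlib = require("zlib");

// Returns the raw and gzip-compressed size of a binary, in bytes.
// `bytes` is a Buffer or Uint8Array, e.g. the result of fs.readFileSync("hb.wasm").
function reportSize(bytes) {
  const gzipped = zlib.gzipSync(bytes, { level: 9 });
  return { raw: bytes.length, gzipped: gzipped.length };
}
```

Comparing the two numbers before and after removing a feature shows how much of a raw saving survives compression.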
Leaving out UCDN is a nonstarter. As it happens I'm going to work on minimizing HarfBuzz for other uses. So I'll be working on this. Would be great to 1. have a streamlined way to build .wasm, and 2. a major user. My current plans are: 1. better compressor for UCDN and other tables (based on packtab.c in fribidi repo), and 2. easy way to disable periphery API / legacy features (like Arabic fallback shaping). |
Classic case of:
Yes, as said.
It uses cmake and emscripten (#1636); it is super easy to use and doesn't cause trouble for autotools development.
There has also been a lot of hype around WASI recently: https://hacks.mozilla.org/2019/03/standardizing-wasi-a-webassembly-system-interface/ |
I would like to thank you for this amazing library, and I am informing you, that it is currently used at www.Photopea.com by thousands of users every day :) As you open Photopea, 1.8 MB of data is downloaded (out of this, 250 kB is Harfbuzz, 90 kB are all 104 icons - 160x160px, 60 kB is a font database, 130 kB are localizations in 36 languages). Of course, everything is compressed during the transfer. I wish you could make Harfbuzz smaller, but I don't understand it well enough to be able to give you any advice. I already started a discussion about making the Emscripten JS file smaller: emscripten-core/emscripten#8409 |
Oooh, you integrated it in Photopea already!? That's amazing! I'm working on making a mini version of HarfBuzz in harfbuzz/harfbuzz#1652 |
Reduced from 597222 to 558072 by removing CFF and to 523772 by removing AAT. My changes are on #1636 which only has this set of APIs |
Yes, I did integrate it :) I got very excited about minifying Photopea today. Maybe it could inspire you :D There is a font database - a large JSON file, that Photopea loads every time. There are 4290 fonts. For each font, there are four strings: Family name, Subfamily name, Postscript name, Font URL. Also, a Font Category, and flags with supported scripts. This file was 451 kB and 57 kB ZIPped. I made some hacks in my JSON representation (e.g. an empty PostScript name means, that the PostScript name is a concatenation of Family and Subfamily). I turned that JSON into 135 kB and 29 kB ZIPped - less than 7 bytes per font :D |
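The PostScript-name trick described above can be sketched as a pack/unpack pair. This is only an illustration: the record layout `[family, subfamily, postScriptName, url]` and all function names are hypothetical, and the derivation rule is assumed to be plain string concatenation, since the comment does not give Photopea's exact rule.

```javascript
// Assumption: the PostScript name is derived by plain concatenation.
function derivePostScriptName(family, subfamily) {
  return family + subfamily;
}

// Stores an empty string when the PostScript name can be derived.
function packFont(font) {
  const derived = derivePostScriptName(font.family, font.subfamily);
  const ps = font.postScriptName === derived ? "" : font.postScriptName;
  return [font.family, font.subfamily, ps, font.url];
}

// Restores the full record, re-deriving the name when it was omitted.
function unpackFont([family, subfamily, ps, url]) {
  return {
    family,
    subfamily,
    postScriptName: ps === "" ? derivePostScriptName(family, subfamily) : ps,
    url,
  };
}
```

Arrays instead of keyed objects also avoid repeating the four key names for each of the 4290 fonts.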
Nice! Want to show us your HarfBuzz integration glue? I'm afraid you also need a Unicode Bidirectional Algorithm implementation for full correctness. |
I am using the Javascript implementation of BIDI algorithm, that I mentioned at the beginning. I added bidirectional support about two years ago. What glue do you mean? |
Oh right. Sounds good. The code calling into HarfBuzz I meant. Okay, so you are probably just missing script-run itemization. |
Ie. mixed-script text will currently be broken. |
@ebraminio Could you give me an example of how to use your latest code? it seems like there is no _hb_blob_destroy . @behdad What is a mixed-script? I call HarfBuzz separately on intervals of text, which share the same direction and font (in Photopea, each character can have a different font). I would like to encourage you to go to www.Photopea.com and try it out. |
To render text, every browser already contains a shaping engine; if it was accessible from JavaScript, “download size” would be zero. At some point, there was talk about adding a text shaping API to the JavaScript core libraries, similar to the ICU wrapper in ECMA-402. Does anyone know what happened to that plan? Obviously it’d take a while to bring it through, but browsers have eventually adopted the Intl API. (As far as I can see, the main missing piece would be to find someone who can write a good API proposal for ECMA. That person would need to understand text rendering, have good JavaScript fu, and be patient enough to survive the standardization process.) |
Re. mixed-script, there’s a proposal for adding an Intl.Segmenter to JavaScript. But currently, the proposal is only about breaking graphemes, words and sentences (exposing ICU break iterators), not script runs. |
@photopea For correct rendering, you’ll need to split the input text into script runs before calling HarfBuzz, but it’s more complicated. Perhaps you could follow the logic of Raqm; the script itemization code is in raqm_itemize. |
Ah, I've missed adding that call, here is the new version, harfbuzzjs-closure-no-cff-aat2.zip |
@ebraminio great, thank you! I just updated and it is much smaller indeed :) BTW, is that "harfbuzzjs.js" a direct output from Emscripten, or did you minify it somehow? Do you think there is room for minifying that JS even further? Could you write a comment on emscripten-core/emscripten#8409 ? |
Yes, I used Closure, via the flag mentioned in that file. You can use Google Closure in the rest of your project too and surprise yourself! |
@ebraminio I did use Closure Compiler several years ago, but it turned out to be very slow, so I made my own, which does the same thing, but it is about 50x faster. But they can make modifications on Emscripten side, that would make the Closure Compiler result even smaller. |
Interesting, I never thought of that! |
Say, you have Hindi and English mixed in the same string.
I did already. :) |
@behdad I understand, that to use GPOS and GSUB tables, you need to know a script, which will lead you to a set of features and lookups, that should be applied to the text. In my library Typr.js, I used to loop through all features and apply all referenced lookups :D (each lookup at most once). I use PSD format for storing files, which does not store any information about the script. Also, users are not used to enter a script when entering a text. So my goal is to display it like the web browser would display it (you also don't enter the script in HTML, you can only enter the base direction of the paragraph). |
Exactly. Web browsers as well as any other complete text rendering system internally break the text down into "script runs" automatically and shape each one separately. |
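A very rough sketch of what "breaking text into script runs" means, for the Hindi-plus-English case mentioned above. This is not Raqm's algorithm and not a full UAX #24 implementation (no Script_Extensions, no paired-bracket handling); the short script list and the function names are assumptions made for illustration. Characters outside the list (spaces, punctuation, digits, inherited marks) simply join the run they follow.

```javascript
// Hypothetical short list; a real itemizer covers every Unicode script.
const SCRIPTS = ["Latin", "Devanagari", "Arabic", "Hebrew", "Cyrillic", "Greek", "Han"];
const SCRIPT_TESTS = SCRIPTS.map((name) => [name, new RegExp(`^\\p{Script=${name}}$`, "u")]);

// Returns the script name of one code point, or null for neutral/unknown.
function scriptOf(ch) {
  for (const [name, re] of SCRIPT_TESTS) {
    if (re.test(ch)) return name;
  }
  return null;
}

// Splits text into runs of { script, text }; neutral characters extend the
// current run, and leading neutrals adopt the first real script that follows.
function itemizeScripts(text) {
  const runs = [];
  let current = null;
  for (const ch of text) {
    const script = scriptOf(ch);
    if (current && (script === null || current.script === null || script === current.script)) {
      if (current.script === null) current.script = script;
      current.text += ch;
    } else {
      current = { script, text: ch };
      runs.push(current);
    }
  }
  return runs;
}
```

Each resulting run would then be shaped with its own HarfBuzz call, in addition to the existing split by direction and font.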
Some bug reporting magic will be useful for here I guess, Steps to reproduce:
Expected: What happened? Solution: |
How do I send money to Behdad? Does he have a bank account in the US or EU? Thanks for creating all these npm / wasm package manager distribution channels. I do not use them, but I think many will use them. Personally, I would prefer if you invested the effort into Harfbuzz itself, to make it smaller / faster / more robust. Even though it is already quite small and fast for my needs :) |
It is already a squeeze from a 1.9MB .wasm file (540kb zipped) down to 536kb (159kb zipped, your original goal I think), using the different techniques we've incorporated, but there is of course room for more. |
I have US accounts, yes. You can email me@behdad.org. Thanks for your generous offer!
I'm still working on that in harfbuzz/harfbuzz#1652 For example, I'm shrinking UCDN from over 100kb to about 30kb. My changes will make it to master soon. |
May I have one more question? Does Harfbuzz support TTC files (Font Collections) ? A Font collection is basically several TTF files concatenated, with a list of offsets to each file at the beginning. They can also share some tables with each other (by sharing offsets to those tables). When I load a whole TTC file to HarfBuzz, where do I specify, which font should be used for shaping? |
Oh it does; you have to put the index you want instead of 0 on |
That's the integer index passed to hb_face_create(). |
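To know which indices are valid for a given file, the collection header described in the question can be read directly. This is a sketch based on the OpenType TTC header layout (tag `ttcf`, a 4-byte version, a 32-bit font count, then one 32-bit offset per font, all big-endian); `listTtcFaces` is a hypothetical helper name.

```javascript
// Returns the table-directory offset of every font in a TrueType Collection.
// The position in the returned array is the face index to pass to
// hb_face_create(). Returns null when the file is not a collection.
function listTtcFaces(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const tag = String.fromCharCode(
    view.getUint8(0), view.getUint8(1), view.getUint8(2), view.getUint8(3));
  if (tag !== "ttcf") return null; // plain font file: only index 0 applies

  const numFonts = view.getUint32(8); // follows the tag and the 4-byte version
  const offsets = [];
  for (let i = 0; i < numFonts; i++) {
    offsets.push(view.getUint32(12 + 4 * i)); // DataView reads big-endian by default
  }
  return offsets;
}
```

The length of the result tells a font picker how many faces to offer for the file.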
Wow, great, it works perfectly, thanks! :) |
New build using Behdad's HB_TINY, only 440kb of .wasm |
I updated it, works perfectly, thanks! :) |
With the new work on HarfBuzz it went from 440kb to 421kb, after Behdad's very recent work it went down to 371kb, and after enabling But apparently I can't make it work, and even the previous version doesn't work here, so please make sure it works there before updating. |
@ebraminio The new size is incredible! However, for me, it returns an empty array of glyphs :( |
Now there is a working version with the 2.5.0 release, which I can confirm works here too! https://harfbuzz.github.io/harfbuzzjs/ (feel free to pick the wasm from that page or from below). It is only 98kb of zipped wasm, exceeding your goal :) |
I just put it online, works great, thanks! :) |
With another round of improvements by Behdad we've gone from 280kb down to 246kb, and with the removal of hb_serialize, which is necessary for further work, it goes down to 236kb! The change needed on your side is to use this instead of the current serializer. I tested it here https://harfbuzz.github.io/harfbuzzjs/ and it seems to work fine:

    var length = module._hb_buffer_get_length(buffer);
    var result = [];
    // Both getters return byte pointers; divide by 4 to index the 32-bit heap views.
    var infosPtr32 = module._hb_buffer_get_glyph_infos(buffer, 0) / 4;
    var positionsPtr32 = module._hb_buffer_get_glyph_positions(buffer, 0) / 4;
    // Each glyph info and each glyph position is five 32-bit fields wide.
    var infos = module.HEAPU32.slice(infosPtr32, infosPtr32 + 5 * length);
    var positions = module.HEAP32.slice(positionsPtr32, positionsPtr32 + 5 * length);
    for (var i = 0; i < length; ++i) {
      result.push({
        g: infos[i * 5 + 0],      // glyph id
        cl: infos[i * 5 + 2],     // cluster
        ax: positions[i * 5 + 0], // x advance
        ay: positions[i * 5 + 1], // y advance
        dx: positions[i * 5 + 2], // x offset
        dy: positions[i * 5 + 3]  // y offset
      });
    }

This is essential, as the next round of work I am doing is about removing emscripten and the glue code (that 10kb of JS), using a trimmed-down libc. I now have a working demo of it here https://harfbuzz.github.io/harfbuzzjs/ng/hb.html |
Hi! Currently, when you open Photopea.com , 1.4 MB is loaded (the whole program). HarfBuzzjs.wasm, which is an extra 111 kB (as it is GZIPped), is loaded only if the text tool is used (so we don't load it every time as in the past). I am already quite happy with the progress you have made, and if you plan to keep going, I will wait for the next version :) I wish all developers cared about the size of their programs at least half as much as you do :) |
@photopea, great :) I would say this version is worth integrating now, as the next one will have radical changes and we may not release it soon, or ever (it needs us to compile our own malloc/calloc/realloc/free, which may take a little time to get right). An advantage of the next version is that you can compile the harfbuzz .wasm yourself just by downloading the LLVM installer, and it works even on Windows: current LLVM releases (http://releases.llvm.org/download.html#8.0.0), which provide an installer for Windows, support compiling and linking wasm32 files. You may like to port some of the other code you've written for the rest of your app to reduce its size this way; it is super easy (https://dassur.ma/things/c-to-webassembly/) and doesn't need the complicated setup that emscripten and Google Closure have. For now, though, only our emscripten builds are considered stable (even though we have the ng builds now working here https://harfbuzz.github.io/harfbuzzjs/ng/hb.html). So all the new changes need is to apply this 9fc9e7a#diff-291994c3e8f610097e257cfe2a68e019L33 in your code and use the new module I've uploaded, but please check its validity before publishing it to production. The next build will mostly just need the underscores removed from the calls, but it needs this change too, and that's why I'd encourage you to apply it now. Thanks |
Hey @photopea I've just uploaded the emscripten-free version, a real WebAssembly distribution of the project, only ~200kb in size (78kb gzipped, including a minimal libc and malloc, and without that ~10kb .js wrapper). Here is the demo, https://harfbuzz.github.io/harfbuzzjs/ (make sure you are not seeing the cached version). Feel free to copy https://github.com/harfbuzz/harfbuzzjs/blob/master/examples/nohbjs.html but pick the .wasm binary from the demo page! Please note that there are differences between the previous emscripten-based release and this one, our wasm builds have so simple |
Is there a side by side comparison of the emscripten and non-emscripten versions (or build instructions for them both)? I'm curious to understand any size difference in the .wasm. How big is that difference? |
It is something like 30kb in wasm binary and 10kb in js glue code (both uncompressed), here is how to test,
    $ git clone https://github.com/harfbuzz/harfbuzzjs && cd harfbuzzjs
    $ ./build.sh && ls -ltrha hb.wasm # our current pure wasm module
    -rwxr-xr-x 1 ebrahim 201K Jul 5 13:06 hb.wasm
    $ git checkout 9fc9e7aa8d83b8602639b590d81e8f8fc77ddc91 # last version built with emscripten
    $ ./build.sh && ls -ltrha harfbuzzjs.*
    -rw-r--r-- 1 ebrahim 11K Jul 5 13:10 harfbuzzjs.js
    -rw-r--r-- 1 ebrahim 229K Jul 5 13:10 harfbuzzjs.wasm
Downsides:
Upsides:
And we still use emscripten malloc implementation and care about emscripten heritage :) Update:
https://transfer.sh/nOUfD/harfbuzz.wasm vs
|
Thanks @ebraminio! Ok, I spent some time building the various options here - mostly I wanted to see if there were any bugs or forgotten flags or optimizations anywhere. One issue is the emscripten version: using latest emscripten with the LLVM wasm backend, and using this modified build.sh (I took the new one you had and made it work with emscripten), I get
    $ ls -alh a.*
    -rw-rw-r-- 1 alon alon 6.6K Jul 5 14:02 a.out.js
    -rw-rw-r-- 1 alon alon 183K Jul 5 14:02 a.out.wasm
At 183K that's better than both the emscripten and non-emscripten numbers from before. But that brings me to the second issue: you can run Binaryen's wasm-opt tool on the non-emscripten wasm (emscripten runs it automatically), and it shrinks it to 178K,
    $ wasm-opt hb.wasm -O -o hb_opt.wasm
    $ ls -alh hb.wasm hb_opt.wasm
    -rwxrwxr-x 1 alon alon 200K Jul 5 14:12 hb.wasm
    -rw-rw-r-- 1 alon alon 178K Jul 5 14:13 hb_opt.wasm
And that 178K is the best number of all of them! Old emscripten was using old LLVM (6). LLVM 8 and 9 (what emcc uses) can do better! Aside from that, running Binaryen typically shrinks wasm backend output by 10-15% (on HarfBuzz it's around 11%). With those two issues out of the way, the comparison is more apples to apples, and the difference is the 6K JS and a wasm difference of 5K, which I believe is because of the libc customization here. Definitely on a project like HarfBuzz, that needs very little runtime support, it's nice to customize your own libc if you have time for that, and such custom runtimes can be smaller than emscripten's general-purpose runtime! The one thing though is you need to make sure to do all the things
I think you need to track the current memory size and how close the sbrk limit gets to there. The wasm backend has intrinsics to help there (
I'm not sure what you mean by this? If you build with |
Transferred the issue to harfbuzz/harfbuzzjs so that all the information here is easier to access. @photopea here is our latest work using Here is of course the emscripten result as well, which you may be able to adopt more easily, but I haven't tested it, I don't recommend it, and it is bigger in size: harfbuzzjs.zip @kripken: Ok, this is the result with:
Which is better than: (but note the size impact --profiling-funcs had)
But what I like and meant for correct symbol names was this:
|
@ebraminio Thanks, I think I see now, Emscripten automatically minifies exports (and imports) in
    $ ls -alh name.wasm
    -rw-rw-r-- 1 alon alon 183K Jul 6 06:33 name.wasm
    $ wasm-dis name.wasm | grep export
    (export "hb_blob_create" (func $418))
    (export "hb_blob_destroy" (func $28))
    (export "free" (func $10))
    (export "hb_blob_get_length" (func $985))
    (export "malloc" (func $288))
    (export "hb_buffer_create" (func $520))
    (export "hb_buffer_destroy" (func $1210))
    (export "hb_buffer_set_direction" (func $1205))
    (export "hb_buffer_get_length" (func $1193))
    (export "hb_buffer_get_glyph_infos" (func $1187))
    (export "hb_buffer_get_glyph_positions" (func $504))
    (export "hb_buffer_guess_segment_properties" (func $1179))
    (export "hb_buffer_add_utf8" (func $1170))
    (export "hb_face_create" (func $1027))
    (export "hb_face_destroy" (func $468))
    (export "hb_font_create" (func $948))
    (export "hb_font_destroy" (func $413))
    (export "hb_font_set_scale" (func $877))
    (export "hb_shape" (func $1068))
    (export "__errno_location" (func $1067))
    (export "dynCall_vi" (func $1060))
|
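The functions in this export list are enough for one complete shaping call. The following is a rough sketch of that sequence, not the project's official wrapper: `shapeText` is a hypothetical helper, `exports` is assumed to be the `instance.exports` of a build exposing these unprefixed names, and the module is assumed to also export its linear memory as `memory`, which the list above does not show.

```javascript
// Sketch only: shapes `text` with the font in `fontBytes` (a Uint8Array).
function shapeText(exports, fontBytes, text, faceIndex = 0) {
  // Re-create the view on each use; the buffer changes if memory grows.
  const bytes = () => new Uint8Array(exports.memory.buffer);

  // Copy the font file into wasm memory and wrap it in blob, face, font.
  const fontPtr = exports.malloc(fontBytes.length);
  bytes().set(fontBytes, fontPtr);
  // 2 is HB_MEMORY_MODE_WRITABLE; no user data and no destroy callback.
  const blob = exports.hb_blob_create(fontPtr, fontBytes.length, 2, 0, 0);
  const face = exports.hb_face_create(blob, faceIndex);
  const font = exports.hb_font_create(face);

  // Copy the text as UTF-8 and add it to a new buffer.
  const utf8 = new TextEncoder().encode(text);
  const textPtr = exports.malloc(utf8.length);
  bytes().set(utf8, textPtr);
  const buffer = exports.hb_buffer_create();
  exports.hb_buffer_add_utf8(buffer, textPtr, utf8.length, 0, utf8.length);
  exports.hb_buffer_guess_segment_properties(buffer);
  exports.hb_shape(font, buffer, 0, 0); // no user features

  // Read the results: five 32-bit fields per glyph info and per position.
  const length = exports.hb_buffer_get_length(buffer);
  const infosPtr = exports.hb_buffer_get_glyph_infos(buffer, 0);
  const positionsPtr = exports.hb_buffer_get_glyph_positions(buffer, 0);
  const infos = new Uint32Array(exports.memory.buffer, infosPtr, length * 5);
  const positions = new Int32Array(exports.memory.buffer, positionsPtr, length * 5);
  const glyphs = [];
  for (let i = 0; i < length; i++) {
    glyphs.push({
      g: infos[i * 5], cl: infos[i * 5 + 2],
      ax: positions[i * 5], ay: positions[i * 5 + 1],
      dx: positions[i * 5 + 2], dy: positions[i * 5 + 3],
    });
  }

  // Release everything in reverse order of creation.
  exports.hb_buffer_destroy(buffer);
  exports.hb_font_destroy(font);
  exports.hb_face_destroy(face);
  exports.hb_blob_destroy(blob);
  exports.free(textPtr);
  exports.free(fontPtr);
  return glyphs;
}
```

Without a call to `hb_font_set_scale`, the advances and offsets are in the font's own design units. A real integration would likely keep the blob, face and font alive across calls instead of rebuilding them for every string.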
Chrome has experimental support Local Font Access API: https://bugs.chromium.org/p/chromium/issues/detail?id=535764#c67 |
Is there a way to dynamically increase the memory? For example, when subsetting fonts on the server side in Node.js, you may encounter very large font files, and it often reports that there is insufficient memory. Do I just need to add this when building? __builtin_wasm_memory_grow(0, 400); |
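For reference, the JavaScript-side counterpart of that intrinsic is `WebAssembly.Memory.prototype.grow`, which also counts in 64 KiB pages. The sketch below only shows the growth mechanics on a memory object; `ensureCapacity` is a hypothetical helper, and whether the module's own malloc actually uses the extra pages depends on how its allocator tracks the memory size, which this thread does not settle.

```javascript
const PAGE_SIZE = 65536; // WebAssembly pages are 64 KiB

// Grows `memory` (a WebAssembly.Memory) until it holds at least `neededBytes`.
// Returns the number of pages added. memory.grow() throws a RangeError if
// the memory's declared maximum would be exceeded.
function ensureCapacity(memory, neededBytes) {
  const missing = neededBytes - memory.buffer.byteLength;
  if (missing <= 0) return 0;
  const pages = Math.ceil(missing / PAGE_SIZE);
  memory.grow(pages);
  return pages;
}
```

After a grow, any typed-array view created on the old `memory.buffer` is detached and has to be re-created from the new buffer.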
@ebraminio It seems there is hope to support dynamic memory increase. |
Hi guys, I am developing a free web-based photo editor www.Photopea.com , which is used by around 100 000 people a day. It lets people do image editing, including inserting text into a picture.
As there is no sufficient OpenType parser and layout engine in Javascript, I made my own called Typr.js. It is quite advanced and can handle e.g. Arabic text. I also use this JS implementation of BIDI algorithm.
As more and more people use Photopea, I have to extend Typr.js . Currently, I am adding the support for Urdu and Khmer layout. I am often staring at OpenType specification for 5 - 10 hours, without writing a single line of code, only trying to understand what they mean. I would be more than happy to drop Typr.js and use an alternative, if there was any.
Would you be able to provide a WebAssembly version of your library to the public, while documenting it and maintaining it? I am ready to pay 5k - 10k USD for it. It is also important, that the library is not too large (e.g. 150-200kB zipped), as every person has to download it when starting Photopea.