Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add harfbuzz shaping support #28

Closed
wants to merge 1 commit into from

Conversation

ebraminio
Copy link

Hopefully you will be unblocked with this ;)

@ebraminio ebraminio force-pushed the gh-pages branch 2 times, most recently from 69a0fcc to 4425f99 Compare March 28, 2019 22:41
@photopea
Copy link
Owner

Hey, I am not familiar with harfbuzz so much to be able to accept this PR just like that. It also makes Typr.js 15x larger :D

I somehow fixed the Urdu support in my local version of Typr.js . Now I am trying to solve typing. The user has to be able to click into a rendered text, and the cursor will appear, and the user can type.

My current problem is, that String characters and glyphs are not 1-to-1. The number of glyphs can be smaller or even larger, than the number of characters, depending on substitutions. I need some kind of map between characters and glyphs, so that I can find the output glyph for each character, and the original character for each glyph. Does harfbuzz provide such map?

Is there any public manual for harfbuzz (either C or WASM version), so that I could see what it is capable of?

@ebraminio
Copy link
Author

I need some kind of map between characters and glyphs

Cluster numbers are for that :)

@photopea
Copy link
Owner

photopea commented Mar 28, 2019

BTW. what about the Bidi algorithm? Is it included in Harfbuzz? If I give it a string with both right-to-left and left-to-right runs, will it position glyphs correctly?

Also, I need to separate long paragraphs into multiple runs. I use my own code for it right now. I just want to know, how much of my code will I be able to replace :)

@ebraminio
Copy link
Author

BTW. what about the Bidi algorithm? Is it included in Harfbuzz? If I give it a string with both right-to-left and left-to-right runs, will it position glyphs correctly?

It is not included as harfbuzz supposed to work on text with same direction and script so you have to separate your text by direction and script before feeding it into harfbuzz.

Cluster numbers are for that :)

They are provided with the json object the added function have, the "cl" field.

@ebraminio
Copy link
Author

ebraminio commented Mar 28, 2019

It also makes Typr.js 15x larger :D

Well, it is what you wanted! :) ~200kb if zipped

Is there any public manual for harfbuzz (either C or WASM version), so that I could see what it is capable of?

https://harfbuzz.github.io/

@photopea
Copy link
Owner

photopea commented Mar 28, 2019

So I just found another bug in Khmer rendered by Typr.js : https://imgur.com/OcUt5sS (top is Google Chrome, bottom is Typr.js)

I will try to invest more time into learning about HarfBuzz, as I think it may save me a lot of time in the future.

@ebraminio
Copy link
Author

ebraminio commented Mar 28, 2019

I will try to invest more time into learning about HarfBuzz, as I think it may save me a lot of time in the future.

You can never replicate all the considerations and sometime hacks harfbuzz have to support all the scripts :)

@photopea
Copy link
Owner

Hey, so I tried it and it seems to be working quite well :) Do I understand these parts correctly?

  • HarfBuzz can detect the writing direction from Unicode values - _hb_buffer_guess_segment_properties
  • in the output, glyphs are ordered left-to-right, no matter what the logical order was
  • the "cl" property denotes the cluster beginning. It is not the index of the initial character, but an index into the input byte array (so if "cl"s go like 0, 2, 4 ... there can be 2 characters per cluster, or 1 character per cluster, which uses two bytes in UTF8).

I have some questions:

  • in the output array, "cl" values are always in increasing, or in decreasing order (so that I could get a cluster size)?
  • is it possible to make HarfBuzz identify glyphs by the number - index (in "glyf","CFF","SVG") instead of a string (the "g" property in the output) ?

@ebraminio
Copy link
Author

ebraminio commented Mar 29, 2019 via email

@ebraminio
Copy link
Author

ebraminio commented Mar 29, 2019 via email

@photopea
Copy link
Owner

But when I am calling your function shapeStringToGlyphs for the second time with the same arguments, I am getting an error "OOM", which happens during var buffer = module._malloc(_font.byteLength);

@photopea
Copy link
Owner

I see you are allocating two "buffer"s , but freeing just one of them.

@ebraminio
Copy link
Author

Oh fixed. Also take a look at https://github.com/harfbuzz/harfbuzz/blob/master/src/hb-ot-color.h you may find things there interesting also. Except glyf/CFF parsing, you can drop all of the font parsing you have locally using HarfBuzz.

@ebraminio
Copy link
Author

It is important however hb_face_t instances be cached somehow as preparing it inside iharfbuzz is a bit costly.

@ebraminio
Copy link
Author

ebraminio commented Mar 29, 2019

hb_font_set_scale should be called for setting font size also sometime but it should be scaled by 16 to get subpixel information also

Yes, always is ordered.

Hmm. Maybe not in some corner cases, but most cases I am aware at least.

@photopea
Copy link
Owner

photopea commented Apr 1, 2019

Also, note, that malloc() returns a pointer (integer), and you can not do

text_ptr[text.byteLength] = 0;

Instead, you should use

module.HEAPU8[text_ptr+text.byteLength] = 0;

I am still wondering, how zero-ended text is popular in low-level programming :D To me, it always made more sense to represent it with a pointer + length (and pass as two integers instead of one) :) You save one Byte and can have zeros inside a string.

@photopea
Copy link
Owner

photopea commented Apr 1, 2019

BTW. I think it would be better to keep Typr.js and WASM HarfBuzz separated and independent.

Typr.js is a tiny library, that can handle 95% of use cases. Making it 10x bigger, just to cover the remaining 5% of use cases is not worth it.

I would like to change the interface of Typr.js to make it similar to HarfBuzz (add Typr.U.shape() function, that would return an array in the same format as HarfBuzz shape()). Programmers will be able to switch from Typr.js to HarfBuzz easily, as the interface will be identical, in case they need Urdu etc.

@photopea
Copy link
Owner

photopea commented Apr 1, 2019

**** I do plan to use HarfBuzz heavily in Photopea, and I will probably stop using Typr.js in Photopea completely (or use just parts of it).

@ebraminio
Copy link
Author

Oooh, fixed :)

The problem is there we can't commit to some JavaScript interface, but well, let's see :)

@ebraminio
Copy link
Author

And the idea was to you have something to play :) Let's close this so

@ebraminio ebraminio closed this Apr 1, 2019
@ebraminio
Copy link
Author

ebraminio commented Apr 1, 2019

stop using Typr.js in Photopea completely

Sadly we don't support some of things Typr.js support already, a glyf/CFF glyph points interface, but that is in the roadmap.

@photopea
Copy link
Owner

photopea commented Apr 1, 2019

What do you mean? All I need is the shape() function of HarfBuzz. Please, don't add any extra font processing, it would only make HarfBuzz bigger.

@khaledhosny
Copy link

khaledhosny commented Apr 1, 2019

Also, note, that malloc() returns a pointer (integer), and you can not do

text_ptr[text.byteLength] = 0;

Instead, you should use

module.HEAPU8[text_ptr+text.byteLength] = 0;

I am still wondering, how zero-ended text is popular in low-level programming :D To me, it always made more sense to represent it with a pointer + length (and pass as two integers instead of one) :) You save one Byte and can have zeros inside a string.

NULL-terminated strings is not a requirement here, the length of the string can be passed (and should be, in this case since the string is not already NULL-terminated) instead of -1 to hb_buffer_add_utf8().

@ebraminio
Copy link
Author

What do you mean? All I need is the shape() function of HarfBuzz. Please, don't add any extra font processing, it would only make HarfBuzz bigger.

That is just in the plan for next year or so and may never happen and it is possible to reduce the size by subsetting the library by needed APIs.

instead of -1 to hb_buffer_add_utf8().

Correct, I'll apply that on harfbuzz PR.

@photopea
Copy link
Owner

photopea commented Apr 2, 2019

@ebraminio BTW. how do I set the text direction in your WASM module? My Characters are typed in a following order:

... arabic1 ... 2018 .- 2019 .... arabic2 ....

And they need to be printed as

... arabic2 ... 2019 -. 2018 .... arabic1 ....

So the four-character sequence " .- " (between 2018 and 2019, which are in LTR) needs to be shaped in RTL order. I tried

module._hb_buffer_set_direction(hbuf, 2); // or 1

But it returns glyphs with ax=0.

@ebraminio
Copy link
Author

You need to segment your input before feeding it into harfbuzz, setting direction won't be useful in that IIUC.

@photopea
Copy link
Owner

photopea commented Apr 2, 2019

I did "segment" it. I need to call shape() only on a string " .- " and need the output in RTL.

Currently, module._hb_buffer_guess_segment_properties(); guesses the direction to be LTR.

@ebraminio
Copy link
Author

Aha, yes that call should be removed but you need to detect and set the script also, hmm, I should look if is possible to have the guess but set the direction also without script, most of clients do both at the same time so may is not supported, I am not sure.

@photopea
Copy link
Owner

photopea commented Apr 2, 2019

I solved it. Guess Properties only guesses the unset properties (if you set direction in advance, it will not change it).

module._hb_buffer_set_direction(hbuf, ltr?4:5);
module._hb_buffer_guess_segment_properties(hbuf);

But I had to find the constants 4 and 5 by trying :D

@ebraminio
Copy link
Author

But I had to find the constants 4 and 5 by trying :D

Nice. We should have some way for that also, you can read the headers or use this https://36-24718300-gh.circle-artifacts.com/0/root/project/ABI.json "name" : "HB_DIRECTION_LTR", "value" : "4"; "name" : "HB_DIRECTION_RTL", "value" : "5" for now.

@photopea
Copy link
Owner

photopea commented Apr 5, 2019

@ebraminio I have a strange problem with this WASM version of HarfBuzz.

I have two fonts: A - 700 kB and B - 10 MB. When I malloc() a space for A at the start and use it, everything works well. When I malloc() a space for B at the start, everything works well too.

But when I allocate A, use it, then free() it and try to malloc() space B, I get OOM error (out of memory probably). My single shaping looks like this:

    var buffer = module["_malloc"](fdata.byteLength);  // <=  throws OOM
    module["HEAPU8"].set(fdata, buffer);

    var blob = module["_hb_blob_create"](buffer, fdata.byteLength, 2, 0, 0);
    var face = module["_hb_face_create"](blob, 0);
    var font = module["_hb_font_create"](face);
    var hbuf = module["_hb_buffer_create"]();
					
    var text = utf8Encoder["encode"](str), tl = text.byteLength;
    var text_ptr = module["_malloc"](tl + 1);
    module["HEAPU8"].set(text, text_ptr); 
    module["HEAPU8"][text_ptr+tl]=0;
    module["_hb_buffer_add_utf8"](hbuf, text_ptr, -1, 0, -1);
					
    module["_hb_buffer_set_direction"](hbuf, ltr?4:5);
    module["_hb_buffer_guess_segment_properties"](hbuf);

    module["_hb_shape"](font, hbuf, 0, 0);

    var olen = str.length*100;
    var out = module["_malloc"](olen);
    module["_hb_buffer_serialize_glyphs"](hbuf, 0,
    module["_hb_buffer_get_length"](hbuf), out, olen, 0, font, 1246973774, 4);  //JSON 
    var r = module["HEAPU8"].slice(out, out+olen);
    var json = JSON.parse("[" + utf8Decoder["decode"](r.slice(0, r.indexOf(0))) + "]");

    module["_hb_blob_destroy"](blob);
    module["_hb_font_destroy"](font);
    module["_hb_face_destroy"](face);
    module["_hb_buffer_destroy"](hbuf);
    module["_free"](out);
    module["_free"](buffer);
    module["_free"](text_ptr);

@ebraminio
Copy link
Author

Maybe something is not freed, I don't see it in your code but created with a wasm version with a larger heap anyway, harfbuzzjs.zip

@photopea
Copy link
Owner

photopea commented Apr 5, 2019

Could you please take a closer look at my code? Maybe I am freeing something that should not be freed. Or can you test it yourself while using several large font files multiple times?

@ebraminio
Copy link
Author

Hmm. I don't see the issue here using your code, I guess having more of your code will be more helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants