Kazuraki breaks assumptions about hanmen layout #138

simoncozens · 2015-07-23T15:21:03Z

The Kazuraki font is a calligraphic Japanese font, which contains vertical ligatures. Currently we shape Japanese characters independently, so the ligatures are never triggered. Also some of the vertical spacing is totally wrong.

alerque · 2015-07-23T15:25:19Z

I was just about to mention that.

simoncozens · 2015-07-23T15:29:19Z

...and it looks like Harfbuzz won't help me here.

> text = "として"
> SILE.shaper:shapeToken(text, SILE.font.loadDefaults({ font = "Kazuraki SPN", direction = "TTB", language = "ja" }))
{
  {
    codepoint = 1699,
    depth = 0,
    height = 8.81,
    name = "",
    width = 10,
  },
  {
    codepoint = 1682,
    depth = 0,
    height = 13.31,
    name = "",
    width = 10,
  },
  {
    codepoint = 1697,
    depth = 0,
    height = 7.97,
    name = "",
    width = 10,
  },
}

That's meant to be a single ligature, I think.

simoncozens · 2015-07-23T15:33:14Z

Ligged and non-ligged variants:

simoncozens · 2015-07-24T13:15:24Z

OK, I just had lunch with Behdad and he explained the problem. To use the Kazuraki font, you also have to add \font[..,features="+liga"], because the font doesn't turn ligatures on by default.

However, because ja.lua shapes each character individually, Harfbuzz will never see enough text to form a ligature. If you don't tell SILE that the text is Japanese, you get ligatured output, but of course this has its own problems.
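To make the per-character problem concrete, here is a rough sketch in C of what splitting a UTF-8 string into individual characters looks like. It is not the actual ja.lua code (which is Lua); the function name `utf8_char_starts` is made up for illustration. If each character found this way is handed to the shaper on its own, no call ever contains a multi-character sequence that a ligature lookup could match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: record the byte offset at which each UTF-8
   character starts. Returns the total number of characters found;
   only the first `max` offsets are stored. */
size_t utf8_char_starts(const char *s, size_t *starts, size_t max) {
    assert(s != NULL);
    size_t count = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Continuation bytes look like 10xxxxxx; any other byte
           begins a new character. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (count < max) starts[count] = i;
            count++;
        }
    }
    return count;
}
```

For the UTF-8 encoding of "として" this reports three characters, so a per-character pipeline would make three separate shaper calls, each seeing only one character.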

behdad · 2015-07-25T15:16:11Z

> However, because ja.lua shapes each character individually, Harfbuzz will never see enough text to form a ligature. If you don't tell SILE that the text is Japanese, you get ligatured output, but of course this has its own problems.

Not sure what you mean...

Are you telling HarfBuzz that this is vertical layout or not? What "other" problems do you mean?

simoncozens · 2015-07-26T20:14:29Z

Hey Behdad! Yes, we're telling Harfbuzz it's vertical. There are two problems:

1. The Japanese language support module languages/ja.lua applies the JIS X 4051 rules for inter-character spacing. To do this, the string is broken up into individual characters, each one is shaped separately, and glue and penalties are (optionally) added between each pair of glyphs according to the rules. So the "として" ligature will never be formed, because ja.lua passes the three characters "と", "し" and "て" to the shaper independently. If Kazuraki didn't exist, that would be fine, because you want to allow line break opportunities between those glyphs. To make Kazuraki work perfectly, you would essentially need to implement Japanese hyphenation, i.e. send "として" to the shaper in one go but allow the line breaking algorithm to split it. I am not sure this is an intelligent thing to do for the sake of one font, even one as pretty as Kazuraki.
2. If you turn off the special ja.lua inter-character penalty/glue handling and use the default SILE approach of shipping text to the shaper in chunks based on UCD line breaking data, the ligatures work OK. But something is also wrong with our glyph metric code. In particular, I think the Y-advance of some characters is not big enough.

In this example, there is not enough space between 日 and 本 (本 should be positioned lower, i.e. the advance of 日 is not big enough), nor between か and ら.

To be honest I think the metric code is suspect. (See this thread on the HB mailing list.) Here is the code from justenoughharfbuzz.c:

void calculate_extents(box* b, hb_glyph_info_t glyph_info, hb_glyph_position_t glyph_pos, FT_Face ft_face, double point_size, hb_direction_t direction) {
  FT_Error error = FT_Load_Glyph(ft_face, glyph_info.codepoint, FT_LOAD_NO_SCALE);
  if (error) return;
  FT_Glyph glyph;
  error = FT_Get_Glyph(ft_face->glyph, &glyph);
  if (error) return;
  FT_BBox ft_bbox;
  FT_Glyph_Get_CBox(glyph, FT_GLYPH_BBOX_UNSCALED, &ft_bbox);
  FT_Fixed advance;
  FT_Get_Advance(ft_face, glyph_info.codepoint, FT_LOAD_NO_SCALE, &advance);
  const FT_Glyph_Metrics *ftmetrics = &ft_face->glyph->metrics;
  b->width = advance * point_size / ft_face->units_per_EM;
  if (direction == HB_DIRECTION_TTB) {
    FT_Get_Advance(ft_face, glyph_info.codepoint, FT_LOAD_NO_SCALE | FT_LOAD_VERTICAL_LAYOUT, &advance);
    b->height = advance * point_size / ft_face->units_per_EM;
    b->depth = 0;
  } else {
    b->height = ft_bbox.yMax * point_size / ft_face->units_per_EM;
    b->depth = -ft_bbox.yMin * point_size / ft_face->units_per_EM;
  }
  FT_Done_Glyph(glyph);
}
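Note that the function above receives glyph_pos but never reads it; every metric comes from FreeType. One alternative, sketched below purely as an illustration and not as the fix that was eventually applied, would be to derive the TTB box height from the shaper's own y_advance. The helper names are hypothetical, and the sketch assumes that y_advance is expressed in unscaled font units (which depends on how the HarfBuzz font scale is configured) and that it is negative for top-to-bottom text.

```c
#include <assert.h>

/* Convert a value in unscaled font units to points. */
double font_units_to_points(long units, double point_size, unsigned units_per_em) {
    assert(units_per_em > 0);
    return (double)units * point_size / (double)units_per_em;
}

/* Hypothetical helper: box height for a glyph set top-to-bottom,
   taken from the shaper's y_advance.
   Assumption: y_advance is in unscaled font units and negative for
   TTB text, so it is negated to obtain a positive height. */
double ttb_height_from_y_advance(long y_advance, double point_size, unsigned units_per_em) {
    return font_units_to_points(-y_advance, point_size, units_per_em);
}
```

With 1000 units per em at 10pt, a y_advance of -1000 font units would give a height of 10pt under these assumptions.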

khaledhosny · 2015-08-09T14:21:01Z

I think one way to handle this is to apply the JIS X 4051 spacing rules after shaping, not before it. You will need a mapping from output glyphs to input characters (which you will need for #110 anyway), as the JIS X 4051 rules are character based, but this can be done using the cluster values from hb_glyph_info_t.
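A minimal sketch of that idea in C, under stated assumptions: `shaped_glyph` is a hypothetical stand-in for the two hb_glyph_info_t fields involved (codepoint and cluster), cluster values are assumed to be non-decreasing, and they are assumed to index input characters (with HarfBuzz the unit depends on how the buffer was filled). The function names are invented for illustration and are not SILE or HarfBuzz API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the hb_glyph_info_t fields used here. */
typedef struct {
    unsigned codepoint; /* glyph ID after shaping */
    unsigned cluster;   /* index of the first input character in this cluster */
} shaped_glyph;

/* Number of input characters covered by the cluster that glyph i
   belongs to, counted from glyph i's cluster value up to the next
   distinct cluster (or the end of the text). */
size_t cluster_char_span(const shaped_glyph *g, size_t n, size_t i, size_t text_len) {
    assert(i < n);
    size_t j = i + 1;
    while (j < n && g[j].cluster == g[i].cluster) j++;
    size_t end = (j < n) ? g[j].cluster : text_len;
    return end - g[i].cluster;
}

/* Spacing (glue or a penalty) may be inserted between glyph i and
   glyph i + 1 only when they belong to different clusters. */
int is_spacing_opportunity(const shaped_glyph *g, size_t n, size_t i) {
    return i + 1 < n && g[i].cluster != g[i + 1].cluster;
}
```

For a four-character input whose first three characters shape to one ligature glyph (cluster 0) followed by one ordinary glyph (cluster 3), the ligature spans three characters and the only spacing opportunity lies between the two glyphs. The character-based rules can then be evaluated on the characters at either side of that boundary.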

behdad · 2015-08-12T13:27:45Z

What @khaledhosny said.

simoncozens · 2015-10-11T13:02:01Z

Implementing #179 has made this now work!

This was referenced Sep 28, 2015

SILE can be very slow #177

Closed

Clean up the text handling pipeline #179

Closed

simoncozens closed this as completed Oct 11, 2015
