Indic2 final-Reph reordering algorithm doesn't seem to match spec #2298

n8willis · 2020-04-02T09:44:19Z

Over in the opentype-shaping-documents repo, on issue 48 it was pointed out that the HarfBuzz final-reordering algorithm for Rephs in Indic2 seems pretty different from the MS script-shaping spec.

Most notably is adrianwong's comment that HB repeats step b at step e (that's not my analysis; the comment inline says the code was copied), when step e is very different in the Microsoft spec page.

Since this difference does not seem to have generated test errors for HarfBuzz, we were wondering what the rationale is for the step-b-to-e repeat. Or what else might be going on.

behdad · 2020-04-15T21:12:34Z

We started with the spec but then adjusted to match our understanding of what Uniscribe was doing. If you're suggesting a change we need to see tests and how Uniscribe handles them. Reading the code and comment it looks like the copying was very intentional.

n8willis · 2020-04-16T08:18:11Z

No, not suggesting a change. Really just wanting to understand the reason — for example, whether (a) it was discovered that step e in the spec produces incorrect results, or possibly (b) step e was a no-op somehow, or (c) it was discovered that Uniscribe actually didn't do step e, etc.

It sounds like you're saying it was (c)?

For the shaping doc repo, it's just important to record if there's a mistake in what the MS pages say. The usual caveats apply as to whether MS would change their page content if we reported an bug against it, but we can record the issue for future reference.

behdad · 2020-04-16T18:15:34Z

I used git blame and you are right that the copy was blatant:

commit d0e68dbd0b9fc9a42c4280d01c8ffd9c5015d550
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Fri Jul 20 11:25:41 2012 -0400

    [Indic] Implement reph positioning step 5
    
    Not tuned, just copied from step 2.  Fixes another 0.5% of Kannada
    failures.  1% to go.

So yes possible that can be improved. I'm so far removed from that code right now, but maybe @jfkthame can suggest something. I don't have my extensive testing infrastructure set up. But maybe we do some testing setup in the cloud for realz this time.

n8willis · 2020-04-17T15:04:01Z

Cool. @adrianwong brought it to my attention in the docs repo, so it's possible he'd have testworthy material, too. Knowing it might be Kannada-related is a good start.

(Super intriguing to me that nobody (AFAICT) has noticed any incorrect results from it over the years, so I wonder if it's only appearing in rare corner cases or if the spec doesn't actually need that step in it at all.)

behdad · 2020-04-17T15:44:38Z

I read it some now. Looks like some of the step 5 we do in step 6. So it's just mispositioned with the comments:

    }

    /*       5. If no consonant is found in steps 3 or 4, move reph to a position
     *          immediately before the first post-base matra, syllable modifier
     *          sign or vedic sign that has a reordering class after the intended
     *          reph position. For example, if the reordering position for reph
     *          is post-main, it will skip above-base matras that also have a
     *          post-main position.
     */
    reph_step_5:
    {
      /* Copied from step 2. */
      new_reph_pos = start + 1;
      while (new_reph_pos < base && !is_halant (info[new_reph_pos]))
        new_reph_pos++;

      if (new_reph_pos < base && is_halant (info[new_reph_pos]))
      {
        /* ->If ZWJ or ZWNJ are following this halant, position is moved after it. */
        if (new_reph_pos + 1 < base && is_joiner (info[new_reph_pos + 1]))
          new_reph_pos++;
        goto reph_move;
      }
    }

followed by:

    /*       6. Otherwise, reorder reph to the end of the syllable.
     */
    {
      new_reph_pos = end - 1;
      while (new_reph_pos > start && info[new_reph_pos].indic_position() == POS_SMVD)
        new_reph_pos--;

      /*
       * If the Reph is to be ending up after a Matra,Halant sequence,
       * position it before that Halant so it can interact with the Matra.
       * However, if it's a plain Consonant,Halant we shouldn't do that.
       * Uniscribe doesn't do this.
       * TEST: U+0930,U+094D,U+0915,U+094B,U+094D
       */
      if (!indic_plan->uniscribe_bug_compatible &&
          unlikely (is_halant (info[new_reph_pos])))
      {
        for (unsigned int i = base + 1; i < new_reph_pos; i++)
          if (info[i].indic_category() == OT_M) {
            /* Ok, got it. */
            new_reph_pos--;
          }
      }

      goto reph_move;
    }

So step 6 says just move it to the end, but we move it to before SMVD (syllable-modifier & vedic signs), which is what step 5 was about.

A gentle reshuffling of code/comments without behavior change is what I suggest.

n8willis · 2020-04-27T12:31:30Z

Okay. Makes sense. Thanks!

n8willis · 2020-04-27T12:32:04Z

I will take a crack at finessing the comments if you like.

#2298 (comment)

n8willis added the Indic label Apr 2, 2020

behdad added a commit that referenced this issue Jun 17, 2020

[indic] Add comment

e5de391

#2298 (comment)

adrianwong mentioned this issue Feb 22, 2022

[Indic] Final Reph reordering n8willis/opentype-shaping-documents#48

Open

behdad closed this as completed Jun 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Indic2 final-Reph reordering algorithm doesn't seem to match spec #2298

Indic2 final-Reph reordering algorithm doesn't seem to match spec #2298

n8willis commented Apr 2, 2020

behdad commented Apr 15, 2020

n8willis commented Apr 16, 2020

behdad commented Apr 16, 2020

n8willis commented Apr 17, 2020

behdad commented Apr 17, 2020 •

edited

Loading

n8willis commented Apr 27, 2020

n8willis commented Apr 27, 2020

Indic2 final-Reph reordering algorithm doesn't seem to match spec #2298

Indic2 final-Reph reordering algorithm doesn't seem to match spec #2298

Comments

n8willis commented Apr 2, 2020

behdad commented Apr 15, 2020

n8willis commented Apr 16, 2020

behdad commented Apr 16, 2020

n8willis commented Apr 17, 2020

behdad commented Apr 17, 2020 • edited Loading

n8willis commented Apr 27, 2020

n8willis commented Apr 27, 2020

behdad commented Apr 17, 2020 •

edited

Loading