Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indic2 final-Reph reordering algorithm doesn't seem to match spec #2298

Closed
n8willis opened this issue Apr 2, 2020 · 7 comments
Closed

Indic2 final-Reph reordering algorithm doesn't seem to match spec #2298

n8willis opened this issue Apr 2, 2020 · 7 comments
Labels

Comments

@n8willis
Copy link
Collaborator

n8willis commented Apr 2, 2020

Over in the opentype-shaping-documents repo, on issue 48 it was pointed out that the HarfBuzz final-reordering algorithm for Rephs in Indic2 seems pretty different from the MS script-shaping spec.

Most notably is adrianwong's comment that HB repeats step b at step e (that's not my analysis; the comment inline says the code was copied), when step e is very different in the Microsoft spec page.

Since this difference does not seem to have generated test errors for HarfBuzz, we were wondering what the rationale is for the step-b-to-e repeat. Or what else might be going on.

@n8willis n8willis added the Indic label Apr 2, 2020
@behdad
Copy link
Member

behdad commented Apr 15, 2020

We started with the spec but then adjusted to match our understanding of what Uniscribe was doing. If you're suggesting a change we need to see tests and how Uniscribe handles them. Reading the code and comment it looks like the copying was very intentional.

@n8willis
Copy link
Collaborator Author

No, not suggesting a change. Really just wanting to understand the reason — for example, whether (a) it was discovered that step e in the spec produces incorrect results, or possibly (b) step e was a no-op somehow, or (c) it was discovered that Uniscribe actually didn't do step e, etc.

It sounds like you're saying it was (c)?

For the shaping doc repo, it's just important to record if there's a mistake in what the MS pages say. The usual caveats apply as to whether MS would change their page content if we reported an bug against it, but we can record the issue for future reference.

@behdad
Copy link
Member

behdad commented Apr 16, 2020

I used git blame and you are right that the copy was blatant:

commit d0e68dbd0b9fc9a42c4280d01c8ffd9c5015d550
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Fri Jul 20 11:25:41 2012 -0400

    [Indic] Implement reph positioning step 5
    
    Not tuned, just copied from step 2.  Fixes another 0.5% of Kannada
    failures.  1% to go.

So yes possible that can be improved. I'm so far removed from that code right now, but maybe @jfkthame can suggest something. I don't have my extensive testing infrastructure set up. But maybe we do some testing setup in the cloud for realz this time.

@n8willis
Copy link
Collaborator Author

Cool. @adrianwong brought it to my attention in the docs repo, so it's possible he'd have testworthy material, too. Knowing it might be Kannada-related is a good start.

(Super intriguing to me that nobody (AFAICT) has noticed any incorrect results from it over the years, so I wonder if it's only appearing in rare corner cases or if the spec doesn't actually need that step in it at all.)

@behdad
Copy link
Member

behdad commented Apr 17, 2020

I read it some now. Looks like some of the step 5 we do in step 6. So it's just mispositioned with the comments:

    }

    /*       5. If no consonant is found in steps 3 or 4, move reph to a position
     *          immediately before the first post-base matra, syllable modifier
     *          sign or vedic sign that has a reordering class after the intended
     *          reph position. For example, if the reordering position for reph
     *          is post-main, it will skip above-base matras that also have a
     *          post-main position.
     */
    reph_step_5:
    {
      /* Copied from step 2. */
      new_reph_pos = start + 1;
      while (new_reph_pos < base && !is_halant (info[new_reph_pos]))
        new_reph_pos++;

      if (new_reph_pos < base && is_halant (info[new_reph_pos]))
      {
        /* ->If ZWJ or ZWNJ are following this halant, position is moved after it. */
        if (new_reph_pos + 1 < base && is_joiner (info[new_reph_pos + 1]))
          new_reph_pos++;
        goto reph_move;
      }
    }

followed by:

    /*       6. Otherwise, reorder reph to the end of the syllable.
     */
    {
      new_reph_pos = end - 1;
      while (new_reph_pos > start && info[new_reph_pos].indic_position() == POS_SMVD)
        new_reph_pos--;

      /*
       * If the Reph is to be ending up after a Matra,Halant sequence,
       * position it before that Halant so it can interact with the Matra.
       * However, if it's a plain Consonant,Halant we shouldn't do that.
       * Uniscribe doesn't do this.
       * TEST: U+0930,U+094D,U+0915,U+094B,U+094D
       */
      if (!indic_plan->uniscribe_bug_compatible &&
          unlikely (is_halant (info[new_reph_pos])))
      {
        for (unsigned int i = base + 1; i < new_reph_pos; i++)
          if (info[i].indic_category() == OT_M) {
            /* Ok, got it. */
            new_reph_pos--;
          }
      }

      goto reph_move;
    }

So step 6 says just move it to the end, but we move it to before SMVD (syllable-modifier & vedic signs), which is what step 5 was about.

A gentle reshuffling of code/comments without behavior change is what I suggest.

@n8willis
Copy link
Collaborator Author

Okay. Makes sense. Thanks!

@n8willis
Copy link
Collaborator Author

I will take a crack at finessing the comments if you like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants