Incorrect glyph for 'u' #7

Open
bennylin opened this issue Jul 10, 2023 · 4 comments

Comments

@bennylin

bennylin commented Jul 10, 2023

Title

Incorrect glyph for 'u'.
Writing this bug report on behalf of https://incubator.wikimedia.org/wiki/User_talk:Surung_Simanullang

Font

Full file name, for example 'NotoSansBatak-Regular.ttf'.
You can upload the problem font here unless it is a Chinese, Japanese or Korean font (these are large).
NotoSansBatak-Regular.zip

Where the font came from, and when

For example:
Site: Probably https://fonts.google.com/noto/specimen/Noto+Sans+Batak, which is where I usually download Noto fonts, but I'm not 100% sure.
Date: 2021-04-01 (according to the file date in my folder)

Font Version

  • Win -- 3.1, August 2, 2020

OS name and version

This is especially important if the font came pre-installed.

Application name and version

If the issue is observed using a specific app.

Issue

Summarize the issue briefly -- one paragraph preferred

  1. Write ᯖᯪᯀᯮᯰ ᯉᯪ ᯔᯉᯮᯂ᯲ in Noto Sans Batak (image 1), translit 'tiung ni manuk'
  2. The glyph for ᯀᯮ is incorrect: it is not displayed as ᯀ (1BC0) followed by ᯮ (1BEE)
  3. Observed results (see image 1)
  4. Expected results: should look like ᯮ at the bottom right (see image 2)
  5. Additional information

Example from Pustaha Laklak (image 3) Add MS 15678, f. 10r https://www.bl.uk/manuscripts/Viewer.aspx?ref=add_ms_15678_f001r

Character data

Please include real character data to illustrate your issue-- Unicode codepoints are helpful. This makes it possible for developers who don't know the language or script to copy/paste the text to reproduce the issue.

  • ᯀ (1BC0) + ᯮ (1BEE) = ᯀᯮ , translit 'u'
  • which, according to Batak speaker Surung_Simanullang, only occurs as 'ung'
  • ᯀ (1BC0) + ᯮ (1BEE) + ᯰ (1BF0) = ᯀᯮᯰ, translit 'ung'
  • ᯖ (1BD6) + ᯪ (1BEA) + ᯀ (1BC0) + ᯮ (1BEE) + ᯰ (1BF0) = ᯖᯪᯀᯮᯰ in ᯖᯪᯀᯮᯰ ᯉᯪ ᯔᯉᯮᯂ᯲ translit 'tiung ni manuk'
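
For anyone who cannot type Batak, the sequences above can be rebuilt from the code points alone. The following is a minimal Python sketch, not part of the original report; the helper names `build` and `describe` are made up for illustration, and only the standard-library `unicodedata` module is used.

```python
import unicodedata

# Code points copied from the "Character data" list in this report.
A, SIGN_U, SIGN_NG = 0x1BC0, 0x1BEE, 0x1BF0
TA, SIGN_I = 0x1BD6, 0x1BEA


def build(*codepoints):
    """Join code points into a string (hypothetical helper)."""
    return "".join(chr(cp) for cp in codepoints)


def describe(text):
    """Return the code points of `text` as 'U+XXXX' tokens (hypothetical helper)."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


u = build(A, SIGN_U)                            # translit 'u'
ung = build(A, SIGN_U, SIGN_NG)                 # translit 'ung'
tiung = build(TA, SIGN_I, A, SIGN_U, SIGN_NG)   # translit 'tiung'

for label, text in (("u", u), ("ung", ung), ("tiung", tiung)):
    print(label, text, describe(text))
    for ch in text:
        # Look up the character name in Python's bundled Unicode database.
        print("   ", f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
```

Pasting the printed strings into an application set to Noto Sans Batak should reproduce the rendering shown in image 1.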

Screenshot

If possible, include a screenshot or an image illustrating the issue.
Annotations are also helpful.

image
Image 1

image
Image 2

image
Image 3

@simoncozens
Contributor

I see what has happened. The Unicode proposal has this chart:
Screenshot 2023-07-13 at 20 47 12
Notice how "hu" and "bu" keep the zig-zag shape of the ᯮ - but a+u does not. This may be a mistake; I am not sure, but I imagine that Noto Batak may have been implemented according to the Unicode proposal, not according to the manuscript evidence. We would probably need a little more research to see whether the current form is ever used, or if it is a mistake.

@r12a

r12a commented Jul 14, 2023

Fwiw, other -u ligatures also invert the direction of the -u vowel strokes. Besides the a+u and Mandailing hu above, they include gu, and wu (see https://r12a.github.io/scripts/batk/btk.html#u_ligatures).

> We would probably need a little more research to see whether the current form is ever used, or if it is a mistake.

I'm no expert, but given the propensity for -u to ligate in various ways in Batak, but also in certain other scripts (eg. Tamil), it's not surprising to me that these ligatures may look slightly different.
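
To compare how a font renders every possible -u combination, not only the ones named in this thread, a proofing list can be generated mechanically. The sketch below is an illustration rather than an established workflow: it assumes the Batak block is U+1BC0 to U+1BFF and picks out base letters by their Unicode category and name, and the function names are made up. It also pairs the independent vowel letters with -u, which may not be meaningful in real text.

```python
import unicodedata

VOWEL_SIGN_U = "\u1BEE"  # the -u sign discussed in this issue


def batak_letters():
    """Collect base letters from the Batak block (assumed range U+1BC0..U+1BFF)."""
    letters = []
    for cp in range(0x1BC0, 0x1C00):
        ch = chr(cp)
        name = unicodedata.name(ch, "")
        # Base letters have category Lo; vowel and consonant signs are marks.
        if unicodedata.category(ch) == "Lo" and name.startswith("BATAK LETTER"):
            letters.append(ch)
    return letters


def u_proof_lines():
    """One line per letter: code point, character name, then letter + -u."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: {ch}{VOWEL_SIGN_U}"
        for ch in batak_letters()
    ]


print("\n".join(u_proof_lines()))
```

Rendering the output in Noto Sans Batak next to the manuscript images would show which combinations keep the direction of the -u strokes and which invert it.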

@bennylin
Author

bennylin commented Jul 14, 2023

> Fwiw, other -u ligatures also invert the direction of the -u vowel strokes. Besides the a+u and Mandailing hu above, they include gu, and wu (see https://r12a.github.io/scripts/batk/btk.html#u_ligatures).
>
> > We would probably need a little more research to see whether the current form is ever used, or if it is a mistake.
>
> I'm no expert, but given the propensity for -u to ligate in various ways in Batak, but also in certain other scripts (eg. Tamil), it's not surprising to me that these ligatures may look slightly different.

The same applies to 'lu', where the -u should sit at the bottom-right corner. I was going to submit a new bug report, but since the matter has been brought up here, I will add the examples here as well.

image

All of these were brought to light by Surung Simanullang, who is the expert on this. Hopefully he will be able to join our conversation here some day (you just need to create an account on GitHub).

Exhibit 1: ADD MS 15678 p 8
Text 1: ᯑᯮᯰᯎᯮᯒᯬᯉ᯲
Translit 1: dungguron
Text 2: ᯑᯬᯂᯬᯖ᯲ ᯇᯬᯎᯮ ᯉᯪ ᯀᯞᯔᯉ᯲
Translit 2: dohot pogu ni alaman

image
image

Exhibit 2: ADD MS 15678 pp 13-14
Text 1: ᯘᯎᯮ
Translit 1: sagu
(continued on the next page; the complete word reads 'sagusagu' or 'ᯘᯎᯮᯘᯎᯮ')
Text 2:

image
image
image

@simoncozens
Contributor

OK, I'm unclear about the resolution here. I can see that the manuscript forms tend to keep the direction of -u, whereas the Unicode proposal and code charts have a bit more flexibility in how -u gets ligated; we use the forms from Unicode. Could this be a unification issue? Do we need stylistic sets?


3 participants