Tagging Bengali post-base consonants #38

Closed
adrianwong opened this issue Oct 31, 2018 · 5 comments

@adrianwong
Contributor

adrianwong commented Oct 31, 2018

Hi @n8willis,

I've got a couple of questions about post-base consonant tagging:

  1. Section 2.7 in the Bengali spec mentions that any non-base consonants that occur after a matra should be tagged with POS_POSTBASE_CONSONANT. HarfBuzz appears to tag them with (their version of) POS_FINAL_CONSONANT instead, plus there is a comment mentioning that this only occurs in Sinhala. Highlighted HarfBuzz code here. Are we taking a different approach here? (The syllables we scraped from Wikipedia contain a fair number of "Ya", "Ba" and "Ra" consonants that occur after the base consonant but do not occur after a matra, thus leaving them untagged).

  2. The same section mentions that Bengali "includes one post-base consonant" ("Ya"), but Section 1 contradicts that by saying "three consonants in Bengali are allowed to occur in post-base position: "Ya", "Ba", and "Ra"." Is the statement in Section 1 the correct one? These same scraped syllables imply that it is.
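To make the gap in question 1 concrete, here is a minimal sketch of the rule as the question paraphrases it: only non-base consonants that follow a matra receive `POS_POSTBASE_CONSONANT`. The function name, the `"BASE"` label, and the reduced matra set are hypothetical, chosen for illustration; this is not code from the spec or from HarfBuzz.

```python
# Hypothetical sketch of the rule as paraphrased in question 1.
# Names and the simplified character classes are assumptions, not spec or HarfBuzz code.

KA, BA, YA, RA = 0x0995, 0x09AC, 0x09AF, 0x09B0
HALANT = 0x09CD
MATRAS = {0x09BE, 0x09BF}  # AA and I signs only; a small subset for illustration


def is_consonant(cp):
    # Rough range check (Ka..Ha); it ignores the unassigned gaps in the block.
    return 0x0995 <= cp <= 0x09B9


def tag_after_matra(codepoints, base_index):
    """Tag only those non-base consonants that occur after a matra."""
    tags = [None] * len(codepoints)
    tags[base_index] = "BASE"  # hypothetical label for the base consonant
    seen_matra = False
    for i in range(base_index + 1, len(codepoints)):
        cp = codepoints[i]
        if cp in MATRAS:
            seen_matra = True
        elif is_consonant(cp) and seen_matra:
            tags[i] = "POS_POSTBASE_CONSONANT"
    return tags


# Ka + Halant + Ya: Ya follows the base but no matra, so it stays untagged.
no_matra = tag_after_matra([KA, HALANT, YA], base_index=0)

# A contrived sequence with a matra before the consonant, only to exercise the branch.
with_matra = tag_after_matra([KA, 0x09BE, HALANT, YA], base_index=0)
```

Under this reading, the "Ya", "Ba" and "Ra" consonants that follow the base without an intervening matra all come back as `None`, which is the untagged case described above.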

@adrianwong adrianwong changed the title Tagging post-base consonants Tagging Bengali post-base consonants Dec 6, 2018
@n8willis
Owner

n8willis commented Jan 7, 2019

(in reverse order....)

  1. Section one is correct; that's just a bug for me to fix.

  2. For POS_FINAL_ vs POS_POSTBASE_, are you looking at the output of hb_shape, or just at the source code?

@n8willis n8willis self-assigned this Jan 7, 2019
@adrianwong
Contributor Author

> For POS_FINAL_ vs POS_POSTBASE_, are you looking at the output of hb_shape, or just at the source code?

Just at the source code!

@n8willis
Owner

n8willis commented Feb 24, 2019

(2). The fix here is for me to update the discussion text, saying that "three consonants can occur after the base consonant: the post-base Ya and the two below-base consonants, Ra and Ba," etc.
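The revised wording could be sketched as a small lookup, assuming each of the three consonants is introduced by a Halant after the base. The tag name `POS_BELOWBASE_CONSONANT`, the function name, and the Halant-pair scan are assumptions made for illustration; they are not taken from the spec text or from HarfBuzz.

```python
# Hypothetical sketch of the revised classification: Ya is post-base,
# Ra and Ba are below-base. Tag and function names are assumptions.

KA, BA, YA, RA = 0x0995, 0x09AC, 0x09AF, 0x09B0
HALANT = 0x09CD

POSITION_AFTER_BASE = {
    YA: "POS_POSTBASE_CONSONANT",
    BA: "POS_BELOWBASE_CONSONANT",
    RA: "POS_BELOWBASE_CONSONANT",
}


def tag_consonants_after_base(codepoints, base_index):
    """Tag Halant + {Ya, Ba, Ra} pairs that follow the base consonant."""
    tags = [None] * len(codepoints)
    i = base_index + 1
    while i + 1 < len(codepoints):
        nxt = codepoints[i + 1]
        if codepoints[i] == HALANT and nxt in POSITION_AFTER_BASE:
            tags[i + 1] = POSITION_AFTER_BASE[nxt]
            i += 2  # skip past the Halant and the consonant just tagged
        else:
            i += 1
    return tags


ka_ya = tag_consonants_after_base([KA, HALANT, YA], base_index=0)
ka_ra_ya = tag_consonants_after_base([KA, HALANT, RA, HALANT, YA], base_index=0)
```

Unlike a matra-dependent rule, this tags the consonant whether or not a matra precedes it, which matches the cases raised in question 1.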

Item (1) is, I think, subsumed by the corrections on #41.

@n8willis
Owner

n8willis commented Apr 1, 2019

Pushed 9ef09bb, propagating the revised language from Bengali to all non-Sinhala Indic scripts. I think this is covered in the update, but as on the other two issues, have a look and let me know how it reads before I close it.

@adrianwong
Contributor Author

Looks good!

@n8willis n8willis closed this as completed Apr 3, 2019