Tagging Bengali post-base consonants #38
Comments
(in reverse order....)
Just at the source code!
(2). The fix here is for me to update the discussion text, saying that "three consonants can occur after the base consonant: the post-base Ya and the two below-base, Ra and Ba." Etc. Item (1) is, I think, subsumed by the corrections on #41.
Pushed 9ef09bb, propagating the revised language from Bengali to all non-Sinhala Indic scripts. I think this is covered in the update, but as on the other two issues, have a look and let me know how it reads before I close it.
Looks good!
Hi @n8willis,
I've got a couple of questions about post-base consonant tagging:
1. Section 2.7 in the Bengali spec mentions that any non-base consonants that occur after a matra should be tagged with `POS_POSTBASE_CONSONANT`. HarfBuzz appears to tag them with (their version of) `POS_FINAL_CONSONANT` instead, plus there is a comment mentioning that this only occurs in Sinhala. Highlighted HarfBuzz code here. Are we taking a different approach here? (The syllables we scraped from Wikipedia contain a fair number of "Ya", "Ba" and "Ra" consonants that occur after the base consonant but do not occur after a matra, thus leaving them untagged.)

2. The same section mentions that Bengali "includes one post-base consonant" ("Ya"), but Section 1 contradicts that by saying "three consonants in Bengali are allowed to occur in post-base position: "Ya", "Ba", and "Ra"." Is the statement in Section 1 the correct one? These same scraped syllables imply that it is.
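To make the untagged case concrete, here is a minimal Python sketch of how such consonants could be located in a scraped syllable. It is only an illustration, not the spec's or HarfBuzz's algorithm: the function name `post_base_candidates` and the `POST_BASE_FORMS` table are hypothetical, the base consonant index is assumed to be supplied by the caller, and the sketch assumes the Section 1 reading (Ya, Ba and Ra may all follow the base, each introduced by a halant).

```python
# Hypothetical sketch: names and structure are assumptions, not taken from
# the spec or from HarfBuzz.
HALANT = "\u09CD"  # BENGALI SIGN VIRAMA

# Consonants assumed (per Section 1) to be allowed after the base consonant.
POST_BASE_FORMS = {
    "\u09AF": "Ya",  # BENGALI LETTER YA
    "\u09AC": "Ba",  # BENGALI LETTER BA
    "\u09B0": "Ra",  # BENGALI LETTER RA
}


def post_base_candidates(syllable, base_index):
    """Return (index, name) for each Halant + Ya/Ba/Ra found after the base.

    `base_index` is the position of the base consonant, which the caller
    is assumed to have determined already.
    """
    found = []
    i = base_index + 1
    while i + 1 < len(syllable):
        if syllable[i] == HALANT and syllable[i + 1] in POST_BASE_FORMS:
            found.append((i + 1, POST_BASE_FORMS[syllable[i + 1]]))
            i += 2  # skip past the Halant + consonant pair
        else:
            i += 1
    return found


# Ka + Halant + Ya + Aa-matra: the Ya follows the base but precedes the
# matra, so a rule keyed only on "after a matra" would leave it untagged.
print(post_base_candidates("\u0995\u09CD\u09AF\u09BE", 0))
```

A rule that only inspects consonants following a matra would never visit the Ya in this example, which is the gap described in question 1.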