
Korean: breve normalization #28

Closed
scossu opened this issue Jul 20, 2023 · 3 comments

Collaborator

scossu commented Jul 20, 2023

In https://github.com/lcnetdev/scriptshifter/blob/korean/scriptshifter/hooks/korean/Functions_KoreanRomanizer.au3#L295C1-L301C1 :

	  If $OCLC="No" Then
		 Local $MARC8[4][2] = [["ŏ","ŏ"],["ŭ","ŭ"],["Ŏ","Ŏ"],["Ŭ","Ŭ"]]
		 For $i = 0 To Ubound($MARC8, 1) - 1
			$KorNameRom = StringRegExpReplace($KorNameRom, "\Q" & $MARC8[$i][0] & "\E",$MARC8[$i][1])
		 Next
	  EndIf

This is replacing the vowel + combining breve pair with a single vowel with breve. I remember discussing this as a general normalization step, and I verified this is running in my code.
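
For illustration, the same replacement can be sketched in Python. This is only a minimal sketch, not the actual ScriptShifter code: the names `BREVE_MAP` and `compose_breves` are hypothetical, and the characters are written as escapes so that the two-character and one-character forms can be told apart.

```python
import unicodedata

# Hypothetical table mirroring $MARC8:
# decomposed form (vowel + U+0306 COMBINING BREVE) -> precomposed character.
BREVE_MAP = {
    "o\u0306": "\u014f",  # LATIN SMALL LETTER O WITH BREVE
    "u\u0306": "\u016d",  # LATIN SMALL LETTER U WITH BREVE
    "O\u0306": "\u014e",  # LATIN CAPITAL LETTER O WITH BREVE
    "U\u0306": "\u016c",  # LATIN CAPITAL LETTER U WITH BREVE
}


def compose_breves(text: str) -> str:
    """Replace each vowel + combining breve pair with its one-character form."""
    for decomposed, precomposed in BREVE_MAP.items():
        text = text.replace(decomposed, precomposed)
    return text


sample = "Ho\u0306sang kwa silsang"  # two-character breve after the H
composed = compose_breves(sample)

# For these four pairs, Unicode NFC normalization produces the same result.
same_as_nfc = unicodedata.normalize("NFC", sample) == composed
```

Note that NFC also composes other sequences, so the explicit table is the narrower of the two options.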

However, the expected results in the test strings use the combining (two-character) version, e.g. https://github.com/lcnetdev/scriptshifter/blob/korean/tests/data/sample_strings.csv#L791:

허상 과 실상 : 한국 정치 의 성숙 을 갈망 하며,Hŏsang kwa silsang: Han'guk chŏngch'i ŭi sŏngsuk ŭl kalmang hamyŏ

@hyoungl Were the test strings written with $OCLC="yes" in mind? If so, shall I replace all the letters that use a combining breve with their one-character versions?

Collaborator

hyoungl commented Jul 20, 2023

$OCLC="yes" is mainly for situations where you have to use OCLC Connexion and Voyager.
K-Romanizer chooses a different version only for OCLC Connexion, hence the check for $OCLC="yes" or "no".
In Scriptshifter, do you need to deal with both cases, namely the 1-character and 2-character versions?

Collaborator Author

scossu commented Jul 20, 2023

No, we decided that we want to normalize all breves to the 1-character version.

Shall I go ahead and change the test strings to the 1-character breve then?
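
A rough sketch of how that change to the test strings could be scripted, assuming a two-column CSV in which the second column holds the expected romanization. The helper names `to_one_char_breves` and `normalize_expected` are hypothetical and not part of the repository.

```python
import csv
import io
import re
import unicodedata

# Matches o, O, u or U followed by U+0306 COMBINING BREVE.
BREVE_PAIR = re.compile("[oOuU]\u0306")


def to_one_char_breves(text):
    """Compose only the breve pairs, leaving all other characters untouched."""
    return BREVE_PAIR.sub(
        lambda m: unicodedata.normalize("NFC", m.group(0)), text
    )


def normalize_expected(csv_text, column=1):
    """Rewrite one column of CSV text (assumed here: column 1 = expected result)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) > column:
            row[column] = to_one_char_breves(row[column])
        writer.writerow(row)
    return out.getvalue()


before = "허상 과 실상,Ho\u0306sang kwa silsang\n"
after = normalize_expected(before)
```

Only the expected-result column is touched, so the Korean source column stays byte-for-byte as it was.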

Collaborator

hyoungl commented Jul 20, 2023

Sure

scossu closed this as completed Jul 21, 2023
scossu added this to the Phase 2 milestone Feb 26, 2024