Korean: Name test failures #49

Closed
scossu opened this issue Sep 4, 2023 · 6 comments

@scossu
Collaborator

scossu commented Sep 4, 2023

Current list of failing names (from Y. Lee's list):
korean_names_tests.log

@scossu
Collaborator Author

scossu commented Sep 5, 2023

Looking at this error list, I can't tell whether there are several logic errors or whether it's just a matter of adding some exception rules.

This one stands out as a possible error in the test string:

- Ŏ Yun-jŏk
?  ^
+ Ŏ,Yun-jŏk
?  ^
 : S2R transliteration error for korean_names!
Original: 어윤적

I haven't seen the comma notation elsewhere. Was that intended?

@hyoungl
Collaborator

hyoungl commented Sep 5, 2023

The logic is actually very simple.
Func NameRomanizer() checks whether the input contains a comma or a center dot.
If yes, it splits the string at each comma or center dot and romanizes each name separately: Func BatchRom()
If no, it romanizes the single name: Func KorNameRom20()
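
The dispatch described above could be sketched roughly as follows. This is only an illustration, not the actual NameRomanizer() code: the function name `romanize_names`, the `romanize_one` callback (standing in for the single-name romanizer), the choice of U+00B7 as the center dot, and returning a list are all assumptions.

```python
import re

# Separators mentioned in the comment above: a comma or a center dot.
# Treating U+00B7 (MIDDLE DOT) as "the center dot" is an assumption.
SEPARATORS = re.compile(r"[,\u00b7]")


def romanize_names(src, romanize_one):
    """Romanize one name, or several names split on a comma / center dot.

    `romanize_one` is a hypothetical callable that romanizes a single
    Hangul name; it stands in for the real single-name routine.
    """
    if SEPARATORS.search(src):
        # Batch case: split, drop empty pieces, romanize each name separately.
        parts = [p.strip() for p in SEPARATORS.split(src) if p.strip()]
        return [romanize_one(p) for p in parts]
    # Single-name case.
    return [romanize_one(src.strip())]
```

Passing the single-name romanizer in as a parameter keeps the sketch independent of any particular romanization table.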

@hyoungl
Collaborator

hyoungl commented Sep 5, 2023

The notation you mentioned is for a different situation, where cataloging rules (not romanization rules) require a comma between the last name and the first name. This function should not be turned on globally, because it applies only to certain fields like 100, 600, 700, etc.

@scossu
Collaborator Author

scossu commented Sep 6, 2023

Right, I was referring to the comma in the result rather than in the source.

So, the resolution for Ŏ,Yun-jŏk would be to replace the comma in the expected result with a space?

If you think that an extra option may be necessary for adding a comma in specific fields, I can add it. I tried to avoid per-script options so far, but if it's necessary, I'll add it.

@hyoungl
Collaborator

hyoungl commented Sep 7, 2023

Basically, there should not be a comma between the last name and the first name.
The logic is: romanize with no comma (a comma never occurs in Hangul script anyway).
If the Hangul text is in field 100, 600, 700, or 800, then add a comma, following cataloging practice.
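
As a minimal sketch of that rule, assuming the last name and first name have already been romanized separately: the names `NAME_FIELDS`, `format_name`, and `marc_field` are hypothetical, and the space after the comma is an assumption (the expected string in the test log has none).

```python
# MARC fields listed in the comment above as requiring the comma.
NAME_FIELDS = {"100", "600", "700", "800"}


def format_name(last, first, marc_field=None):
    """Join an already-romanized last and first name.

    A comma follows the last name only when `marc_field` is one of the
    listed fields; otherwise the two parts are separated by a space.
    """
    # Assumption: the comma is followed by a space.
    sep = ", " if marc_field in NAME_FIELDS else " "
    return f"{last}{sep}{first}"
```

With no field given, `format_name("Ŏ", "Yun-jŏk")` yields the plain romanized form; with `marc_field="100"` the comma is inserted after the last name.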

@scossu
Collaborator Author

scossu commented Sep 19, 2023

I added an option to specify the MARC field for Korean names. If the value is one of those you indicated, a comma is added after the last name in the Romanized text.

@scossu scossu closed this as completed Sep 19, 2023
@scossu scossu added this to the Phase 2 milestone Feb 26, 2024