Number of Chinese character components #2

Closed
lancejpollard opened this issue Jan 12, 2019 · 2 comments

Comments

@lancejpollard

Hi there, this looks like an interesting project. I'm wondering if it involves coming up with a set of primitives/components/radicals/etc. that are the building blocks of all Chinese characters. I'm interested to know if you found a system that handles every character, or if you still encounter new characters which throw the framework for a loop -- that is, characters the framework can't account for because of some new stroke or character component.

I'd also be interested in a list of such components, if one is available. I'm not sure I see them included in the data directory.

Thank you.

@lancejpollard
Author

I'm also wondering if you have anything that describes the math behind it, such as how you know that subdividing the grid will cover all the possible ways a stroke might be placed in a Chinese character. I'd be really interested to know how that works.

@LingDong-
Owner

Hi there! Thanks for your interest in the project.

Yes, most characters can be broken down into a limited set of primitive components. By primitive component I mean a character whose definition does not contain that of any other character.

If a new character cannot be composed of existing primitive components, you can think of that character as a primitive component itself.

Most of these components are defined at the beginning of the rrpl.json file, but they're not deliberately organized yet. You can catch them with a regex such as "[123456780\-\|\(\)]+".
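As a rough illustration of the regex filter described above, here is a minimal Python sketch. It assumes the data can be loaded as a mapping from a name to a definition string, and that a primitive is an entry whose whole definition matches the pattern; neither assumption is confirmed by the file itself. The `find_primitives` helper and the toy entries are hypothetical and only there to show the filtering.

```python
import re

# The regex quoted above, anchored with fullmatch so that the whole
# definition must consist of the listed digits and layout symbols.
PRIMITIVE_PATTERN = re.compile(r"[123456780\-\|\(\)]+")


def find_primitives(definitions):
    """Return the names whose definition references no other character.

    `definitions` is assumed to be a dict of name -> definition string;
    the real layout of rrpl.json may differ.
    """
    return sorted(
        name
        for name, definition in definitions.items()
        if PRIMITIVE_PATTERN.fullmatch(definition)
    )


# Hypothetical toy data, not taken from rrpl.json.
toy_definitions = {
    "p1": "1-2",          # only digits and '-': treated as primitive
    "p2": "(3|4)",        # digits, '|' and parentheses: treated as primitive
    "c1": "(p1)-(p2)",    # refers to other entries: not primitive
}

primitives = find_primitives(toy_definitions)
print(primitives)
```

With real data you would replace `toy_definitions` with the result of `json.load` on the file, adjusting for however the entries are actually keyed.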

If you would just like to see a list of common radicals in the Chinese language, check out https://zh.wikipedia.org/wiki/部首#康熙部首讀音表. They're in the 2nd column of the table.

The code subdivides the grid by counting the number of "parallel" components at each nesting level. For example, in (A)-(B), components A and B each get 50% of the horizontal space; in (A)-(B)-(C), each component gets 1/3 of the horizontal space; and in ((A)-(B))-(C), A and B each get 25% while C gets 50%.
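The subdivision rule above can be sketched in a few lines of Python. This is not the project's actual code, just a minimal illustration that assumes only the horizontal `-` operator, parenthesized components, and distinct component names. The function names (`split_top_level`, `strip_outer_parens`, `horizontal_shares`) are made up for this example.

```python
from fractions import Fraction


def split_top_level(expr, sep="-"):
    """Split expr on sep, ignoring separators nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(expr):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(expr[start:i])
            start = i + 1
    parts.append(expr[start:])
    return parts


def strip_outer_parens(expr):
    """Remove outer parentheses only when they wrap the whole expression."""
    while expr.startswith("(") and expr.endswith(")"):
        depth = 0
        for i, ch in enumerate(expr):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0 and i < len(expr) - 1:
                    return expr  # first '(' closes early, e.g. "(A)-(B)"
        expr = expr[1:-1]
    return expr


def horizontal_shares(expr, share=Fraction(1)):
    """Give each parallel component an equal part of its parent's width."""
    expr = strip_outer_parens(expr)
    parts = split_top_level(expr)
    if len(parts) == 1:
        return {expr: share}  # a leaf component
    shares = {}
    for part in parts:
        shares.update(horizontal_shares(part, share / len(parts)))
    return shares


print(horizontal_shares("(A)-(B)"))
print(horizontal_shares("(A)-(B)-(C)"))
print(horizontal_shares("((A)-(B))-(C)"))
```

Using `Fraction` keeps shares such as 1/3 exact. The three calls reproduce the examples from the explanation: halves, thirds, and the nested case where A and B get a quarter each and C gets a half.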

I hope my explanation is helpful!
