
Add/use @lemma to <w> tokens in corpus #114

Open
iljackb opened this issue Oct 14, 2021 · 0 comments
Labels: enhancement, final output goals (Goals for tasks to do to achieve best possible output of project and contribution to community), to-do

Comments

iljackb (Owner) commented Oct 14, 2021

This would greatly enhance the content of the corpus; however, major decisions have to be made about which form to use as the lemma. Because tone is not represented in the adopted orthography, some words are homographs. Tone diacritics would therefore probably be needed as minimal distinguishing markers, so that every @lemma value is a unique form.

More study and planning needed.
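A minimal sketch of how @lemma could be added while flagging the tonal homographs described above, using Python's standard-library `xml.etree.ElementTree`. Everything here is an assumption: the corpus is treated as plain XML with un-namespaced `<w>` elements (a TEI-namespaced corpus would need the namespace in the tag name), `LEXICON` and `add_lemmas` are hypothetical names, and the forms `ba`, `bá`, `bà`, `ku` and `kū` are placeholders, not data from this project's language.

```python
import xml.etree.ElementTree as ET

# Hypothetical lexicon: untoned surface spelling -> tone-marked lemma candidates.
# These forms are placeholders, not entries from the actual corpus.
LEXICON = {
    "ba": ["bá", "bà"],  # tonal homograph: two lemmas share one spelling
    "ku": ["kū"],        # unambiguous: one lemma for this spelling
}


def add_lemmas(root, lexicon):
    """Set @lemma on each <w> whose surface form maps to exactly one lemma.

    Returns the <w> elements left without @lemma, either because the form is
    a tonal homograph (several candidates) or because it is not in the
    lexicon, so they can be disambiguated by hand.
    """
    unresolved = []
    for w in root.iter("w"):
        form = (w.text or "").strip().lower()
        candidates = lexicon.get(form, [])
        if len(candidates) == 1:
            w.set("lemma", candidates[0])
        else:
            unresolved.append(w)
    return unresolved


doc = ET.fromstring("<s><w>ku</w> <w>ba</w> <w>xo</w></s>")
todo = add_lemmas(doc, LEXICON)
print(ET.tostring(doc, encoding="unicode"))
print("needs manual lemma:", [w.text for w in todo])
```

Assigning @lemma only when a spelling has a single candidate applies tone-marked lemmas automatically where they are unambiguous, and leaves the homographs for the manual decisions this issue calls for.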

iljackb added the enhancement, to-do, and final output goals labels on Oct 14, 2021