Add support for macrons (vowel length-marks) #5

Open
Fuco1 opened this issue Jun 16, 2015 · 4 comments
@Fuco1
Contributor

Fuco1 commented Jun 16, 2015

This would probably take a lot of work on the dictionary, but once the code supports macrons (vowel lengths), the database/dictionary can be updated gradually, "on the fly".

@mk270
Owner

mk270 commented Jun 17, 2015

That's a great idea. There are probably good ways of attacking the lack of macrons in the dictionary, too.

@Fuco1
Contributor Author

Fuco1 commented Jun 17, 2015

We've been talking about this in #emacs IRC for years, and there are some programmers and other people there who would put in some work. I might send some of them here.

Good that you took up the initiative, thanks for that :)

@mk270 mk270 changed the title Longterm goal: add support for macrons Add support for macron (vowel length-marks) Jun 21, 2015
@mk270 mk270 changed the title Add support for macron (vowel length-marks) Add support for macrons (vowel length-marks) Jun 28, 2015
@mk270
Owner

mk270 commented Jun 30, 2015

@Fuco1
Contributor Author

Fuco1 commented Jul 11, 2015

I was thinking about how to do this and here is my plan:

  1. We should make it possible to input text with macrons, but then strip them (a simple search and replace) and pass this "fixed" text on to later processing.
  2. We should make the code that loads the database/inflections/stems able to work with macrons, but strip them before parsing happens. This means we don't have to touch the parsing logic at all (and doing so isn't even necessary, since Latin is unambiguous enough without macrons).
  3. Make the output respect macrons, so that the user sees them when reviewing the output.
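
The stripping in steps 1 and 2 could look roughly like the sketch below. It is written in Python purely for illustration and is not the project's code; the name `strip_macrons` is hypothetical. It assumes the input is Unicode text in which a macron is either part of a precomposed letter (such as `ū`) or a separate combining macron (U+0304).

```python
import unicodedata

# U+0304 COMBINING MACRON: what a long-vowel mark becomes after NFD decomposition.
COMBINING_MACRON = "\u0304"


def strip_macrons(text: str) -> str:
    """Return `text` with every macron removed and all other characters kept.

    Hypothetical helper for illustration only.
    """
    # Decompose so that a precomposed "ū" becomes "u" followed by U+0304.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop only the combining macron; other diacritics are left alone.
    stripped = decomposed.replace(COMBINING_MACRON, "")
    # Recompose so any remaining accented letters return to their usual form.
    return unicodedata.normalize("NFC", stripped)


print(strip_macrons("exercitūs"))  # exercitus
```

Going through Unicode normalization rather than a hand-written replacement table means upper-case letters and already-decomposed input are covered without listing each vowel.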

This has a minor drawback: if the user inputs exercitūs, for example, we would also parse it as nominative singular (which it isn't). I think this is fine. Later, forms that don't match the input can be filtered out in a post-processing step.
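
A post-processing filter of that kind might be sketched as follows, again in Python for illustration only. The function names and the `(form, label)` candidate shape are assumptions, not anything in the codebase. The sketch also assumes every candidate form already carries its macrons; with a partially updated dictionary a looser comparison would be needed.

```python
import unicodedata

COMBINING_MACRON = "\u0304"  # U+0304, produced by NFD decomposition of e.g. "ū"


def has_macrons(text: str) -> bool:
    """True if `text` contains at least one macron (hypothetical helper)."""
    return COMBINING_MACRON in unicodedata.normalize("NFD", text)


def filter_by_macrons(user_input, candidates):
    """Keep only the candidate parses compatible with the macrons the user typed.

    `candidates` is assumed to be a list of (macronized_form, label) pairs.
    If the input has no macrons at all, nothing is filtered out.
    """
    if not has_macrons(user_input):
        return list(candidates)
    wanted = unicodedata.normalize("NFC", user_input)
    return [
        (form, label)
        for form, label in candidates
        if unicodedata.normalize("NFC", form) == wanted
    ]


# Illustrative candidates for the macron-stripped input "exercitus".
candidates = [
    ("exercitus", "nominative singular"),
    ("exercitūs", "genitive singular"),
]
print(filter_by_macrons("exercitūs", candidates))
```

Input typed without macrons passes through unfiltered, which keeps the behaviour backward-compatible for users who never type them.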

With these changes in place, the database update can be gradual and everything will work in a backward-compatible way.

Steps 1 and 3 should be relatively simple. Step 2 will also require some changes to the utilities that generate the binary files; I don't yet understand the purpose of all of them.

@mk270 mk270 modified the milestone: v2 Jul 19, 2015