Hyphenation should allow initial and final one-letter syllables #1276

Open
dliessi opened this issue Mar 15, 2020 · 7 comments
Comments

Collaborator

dliessi commented Mar 15, 2020

As an example, "Ave Maria" is currently hyphenated (with the Italian dictionary) as "Ave Ma -- ria", while the correct hyphenation is "A -- ve Ma -- ri -- a".
One-letter initial and final syllables are quite frequent in both Italian and Latin.

Collaborator Author

dliessi commented Mar 16, 2020

I haven't read the code, and I don't know whether this should be addressed by the dictionary or by our code.

In Italian (and probably Latin, when hyphenated using the Italian dictionary) the initial one-letter syllable case could be worked around with the following logic:

  • hyphenate the words: "ave" -> "ave", "amore" -> "amo -- re", "astro" -> "astro"
  • check the first resulting syllable: if it starts with a vowel followed by a consonant and contains at least one more vowel, then it is actually two syllables; in that case split after the first vowel: "ave" -> "a -- ve", "amo -- re" -> "a -- mo -- re", "astro" -> "a -- stro"

I can't imagine exceptions to this logic, according to the Italian rules.
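
The check described above can be sketched in a few lines of Python. This is only an illustration of the proposed workaround, not Frescobaldi's code: the function name `split_initial_vowel`, the `VOWELS` set, and the representation of a hyphenated word as a list of syllables are all assumptions made for the example.

```python
# Assumed vowel set for Italian, including accented vowels (illustrative).
VOWELS = set("aeiouàèéìíòóùú")


def split_initial_vowel(syllables):
    """Split a one-letter initial syllable off the first syllable if the rule applies.

    `syllables` is the output of the normal hyphenation, e.g. ["amo", "re"].
    The rule: the first syllable starts with a vowel, followed by a consonant,
    and contains at least one more vowel after that consonant.
    """
    if not syllables:
        return []
    first = syllables[0].lower()
    if (
        len(first) >= 3
        and first[0] in VOWELS
        and first[1].isalpha()
        and first[1] not in VOWELS
        and any(c in VOWELS for c in first[2:])
    ):
        # Split after the first vowel, keeping the original letter case.
        return [syllables[0][:1], syllables[0][1:]] + list(syllables[1:])
    return list(syllables)


print(" -- ".join(split_initial_vowel(["ave"])))
print(" -- ".join(split_initial_vowel(["amo", "re"])))
print(" -- ".join(split_initial_vowel(["astro"])))
```

A first syllable such as "al" in "al -- to" is left alone, because it contains no further vowel after the consonant.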

Final one-letter syllables are a bit trickier, because whether vowels should or should not be divided depends on the pair of vowels and on the word.
I think that the possibilities are few, so even enumerating them should be feasible: I'll check a grammar book.

Of course if the problem depends on the dictionary I would prefer fixing it rather than working around it with special cases.

Collaborator Author

dliessi commented Apr 9, 2020

I marked this issue as solved, even though final one-letter syllables are still not hyphenated in Italian: that happens because the relevant hyphenation patterns are missing, so the cause lies outside Frescobaldi.

I'll track this remaining part of the issue and work on it outside Frescobaldi.

dliessi reopened this Apr 10, 2020

Collaborator Author

dliessi commented Apr 10, 2020

I reverted 77048db in 5615430.
Some dictionaries assume that initial and final one-letter syllables are forbidden, so the change, although technically correct, introduced some hyphenation mistakes.

Possible approaches:

  • whitelists of words with initial and final one-letter syllables,
  • blacklists correcting the mistakes,
  • improvements to the dictionaries outside Frescobaldi.

The whitelist and blacklist approaches should be developed in a way that takes advantage of regular patterns and not simply as lists of words.
I would still prefer to improve the dictionaries: I suspect that for many languages it is only a matter of adding or fixing very few patterns.
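
To make the idea of pattern-based whitelists and blacklists more concrete, here is a rough sketch. Everything in it is hypothetical: the names `ALLOW`, `FORBID` and `apply_rules`, the representation of hyphenation points as character offsets, and above all the two example rules, which are placeholders and not real Italian or Latin hyphenation rules.

```python
import re

# Each rule is (compiled pattern, offset of the break point from the match start).
# Both rules below are purely illustrative placeholders.
ALLOW = [
    (re.compile(r"ria$"), 2),  # add a break before the final "a" of "...ria"
]
FORBID = [
    (re.compile(r"^qu"), 1),   # never break between an initial "q" and "u"
]


def apply_rules(word, points):
    """Return the hyphenation points after applying whitelist and blacklist rules.

    `points` are the offsets proposed by the dictionary; a point p means
    "a break is allowed between word[:p] and word[p:]".
    """
    result = set(points)
    lowered = word.lower()
    for pattern, offset in ALLOW:
        for match in pattern.finditer(lowered):
            result.add(match.start() + offset)
    for pattern, offset in FORBID:
        for match in pattern.finditer(lowered):
            result.discard(match.start() + offset)
    return sorted(result)


print(apply_rules("maria", [2]))
```

Rules of this kind cover a whole family of words at once, which is the advantage over plain word lists; a real rule set would have to be checked against a grammar book for each language.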

Collaborator Author

dliessi commented Apr 10, 2020

In addition, we could set left and right (the minimum lengths of the initial and final syllables) on a per-language basis.
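
A minimal sketch of what per-language settings could look like, assuming that left and right are the minimum number of letters allowed before the first and after the last hyphenation point. The table `MIN_LENGTHS`, the default `DEFAULT_MIN` and the function `filter_points` are hypothetical names, and the numbers are placeholders rather than values taken from any dictionary.

```python
# Hypothetical per-language (left, right) minimums; the values are placeholders.
MIN_LENGTHS = {
    "it": (1, 1),
}
DEFAULT_MIN = (2, 2)


def filter_points(word, points, lang):
    """Keep only the hyphenation points allowed by the language's left/right minimums.

    A point p splits the word into word[:p] and word[p:], so it is kept when
    at least `left` letters precede it and at least `right` letters follow it.
    """
    left, right = MIN_LENGTHS.get(lang, DEFAULT_MIN)
    return [p for p in points if left <= p <= len(word) - right]


# With the placeholder Italian setting both breaks in "maria" survive;
# with the default setting only the first one does.
print(filter_points("maria", [2, 4], "it"))
print(filter_points("maria", [2, 4], "xx"))
```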

Collaborator

wbsoft commented Apr 20, 2020

I think improving the dictionaries outside of Frescobaldi is a bit hairy, as those dictionaries are (should be) quite tried-and-trusted, and also probably auto-generated from really large real life word lists.

Interesting discussion about how patterns are prepared.
https://tex.stackexchange.com/questions/262588/how-are-hyphenation-patterns-written

Maybe we should add an interface to add hyphenation patterns (effectively building a whitelist, via a user interface), and indeed, to make the left and right length per-language configurable.

Collaborator Author

dliessi commented Apr 21, 2020

> Maybe we should add an interface to add hyphenation patterns (effectively building a whitelist, via a user interface), and indeed, to make the left and right length per-language configurable.

I agree.

See https://tug.org/tex-hyphen/: in the table they list the minimum length of initial and final syllables assumed by the dictionaries.
Maybe the default values should be taken from there (most hyphenation dictionaries originally come from TeX).

> I think improving the dictionaries outside of Frescobaldi is a bit hairy, as those dictionaries are (should be) quite tried-and-trusted, and also probably auto-generated from really large real life word lists.

At least in the case of Italian, regardless of what the table says, the dictionary already fully supports one-letter initial syllables. However, some hyphenation points (namely those between two vowels) are also missed in the middle of words, which in my opinion counts as a bug; fixing that bug would also add support for one-letter final syllables.
I'll get in touch with the maintainer of the Italian (and Latin) dictionary anyway, hoping to solve that bug.

Especially if the patterns are auto-generated (as we suspect), fixing the bug may be tedious but is probably not difficult, as it should only be a matter of adding the missing hyphenation points to the word list.

Collaborator

wbsoft commented Apr 21, 2020

I am currently leaning towards making a specialized music hyphenation package that includes the dictionaries, clearly marking what we change. That way we need no config/whitelist UI, but get the best possible behaviour out of the box.
