Hyphenation on words joined with hyphen in Polish #1960

Closed
Omikhleia opened this issue Jan 20, 2024 · 6 comments · Fixed by #1962

@Omikhleia
Member

Omikhleia commented Jan 20, 2024

Rumour has it that in Polish, when a word containing a hyphen is cut at that point, the hyphen must be repeated on the next line -- and that several typesetting systems have unaddressed enhancement requests on that topic:

Typst (issue in 2024): typst/typst#3235
OpenOffice (issue in 2006): https://bz.apache.org/ooo/show_bug.cgi?id=71679

If this typography convention for Polish is correct, then SILE too is currently wrong:

\begin[papersize=a5]{document}
\language[main=pl]
\lua{
    for i = 1,100 do
       SILE.typesetter:typeset("biało-czerwony ")
    end
}
\end{document}

image

Except that it would be a fairly trivial thing to fix... Just a few lines of code, possibly:
(EDIT: Well, a bit more, see PR)

-- Put this in languages/pl.lua
SILE.nodeMakers.pl = pl.class(SILE.nodeMakers.unicode)

function SILE.nodeMakers.pl:handleWordBreak (item)
  if item.text == "-" then
    self:addToken(item.text, item)
    self:makeToken()
    coroutine.yield(SILE.nodefactory.discretionary({
      postbreak = SILE.shaper:createNnodes("-", self.options)
    }))
  else
    self._base.handleWordBreak(self, item)
  end
end

image

Yay! 😄


I could make a PR, but the devil is in the details. Here I stack the "-" onto the previous word and insert a postbreak discretionary.
But we could also stack the "-" onto the next word, or even ignore it entirely (each time, of course, with a suitable postbreak/prebreak/replacement discretionary).
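
To make the three options concrete, here is a toy sketch in plain Lua. It is not SILE code: the `render` helper and the `variants` table are hypothetical, and a discretionary is modelled as a table of plain strings rather than nodes. The assumption is the usual one for discretionaries: when the break is taken, the prebreak ends the first line and the postbreak starts the next one; otherwise only the replacement is typeset.

```lua
-- Toy model only: plain strings stand in for SILE nodes.
-- render() and variants are illustrative names, not part of SILE.
local function render(before, disc, after, broken)
  if broken then
    -- Break taken: prebreak ends the first line, postbreak starts the next.
    return before .. (disc.prebreak or ""), (disc.postbreak or "") .. after
  end
  -- Break not taken: only the replacement is typeset.
  return before .. (disc.replacement or "") .. after
end

local variants = {
  -- The "-" is stacked onto the previous word (the approach shown above).
  { name = "hyphen on previous word", before = "biało-", after = "czerwony",
    disc = { postbreak = "-" } },
  -- The "-" is stacked onto the next word.
  { name = "hyphen on next word", before = "biało", after = "-czerwony",
    disc = { prebreak = "-" } },
  -- The "-" lives only inside the discretionary.
  { name = "hyphen only in discretionary", before = "biało", after = "czerwony",
    disc = { prebreak = "-", postbreak = "-", replacement = "-" } },
}

for _, v in ipairs(variants) do
  local unbroken = render(v.before, v.disc, v.after, false)
  local line1, line2 = render(v.before, v.disc, v.after, true)
  print(v.name, unbroken, line1 .. " / " .. line2)
end
```

In this model all three variants typeset the same thing, broken or not. What differs is the string handed to the hyphenator for each part ("biało-" versus "biało", "czerwony" versus "-czerwony"), which is where the choice matters.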

The question at stake: how is the first word supposed to be hyphenated?

With the above fix, we get bia•ło-•czer•wony (I am marking the hyphenation points with • to distinguish them from the word's hyphen)
... because SILE.typesetter:typeset(SILE.showHyphenationPoints("biało-", "pl")) ➡️ bia•ło-

But note that SILE.showHyphenationPoints("biało", "pl") ➡️ biało (EDIT: no hyphenation point currently)
... so depending on how we do it, we can get different hyphenations...

It seems to me that bia•ło-•czer•wony might be correct, but we'd need a Polish friend to confirm the expectations 🇵🇱

EDIT I corrected SILE.showHyphenationPoints("czerwony", "pl") ➡️ czer•wony (not czer•wo•ny) with default settings.

@Omikhleia
Member Author

Omikhleia commented Jan 20, 2024

Perhaps it's not too impolite to ask @jakubkaczor (who opened the above-mentioned Typst issue) on that matter?

@Omikhleia Omikhleia added the bug Software bug issue label Jan 20, 2024
@Omikhleia
Member Author

> If this typography convention for Polish is correct,

A 2022 (LaTeX) Babel manual for Polish also mentions it: "According to Polish rules, when a break occurs at an explicit hyphen, the hyphen gets repeated at the beginning of the new line."

(Of course they "fix" it by requiring the user to typeset some specific markup, with active "catcodes"... Sigh.)

@jakubkaczor

It is not impolite at all to ask me. If I understood the question correctly, you wonder whether there should be any hyphenation points in the parts of a word on either side of a hyphen. I am not knowledgeable enough on the topic, but I can link some sources.

I believe the most common package for correcting hyphenation in LaTeX, and the one I used, is the polski package. The author provides commands for the hyphen (dywiz), en-dash (ppauza), and em-dash (pauza). These are explained in the (English!) documentation for the package. As far as I understand it, the author uses \kern to allow hyphenation in the words before and after the hyphen, so the answer would be: yes, the first word should also be considered for hyphenation. Please correct me if I am misunderstanding. As far as I know, the active author is a professional typographer, a member of GUST, and one of the authors of the TeX Gyre fonts.

@Omikhleia
Member Author

@jakubkaczor Many thanks! Indeed, on p. 11 of the document you mentioned: "... allow both parts of the word to be considered for hyphenation"

That answers my question. (I can't understand the 0-valued kerns in TeX, but whatever, the conclusion is the key.)
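
For illustration, the conclusion can be sketched in plain Lua, independent of SILE. The `breaks` helper is hypothetical and takes a string already marked with • as in the first comment; it lists each possible line break. The assumption is that a break after the explicit hyphen repeats it on the next line, while any other break adds an ordinary hyphen at the end of the first line.

```lua
-- Illustrative only: breaks() is a hypothetical helper, not a SILE function.
local MARK = "•" -- marks a hyphenation point, as in the first comment

local function breaks(marked)
  local results = {}
  local pos = 1
  while true do
    local s, e = marked:find(MARK, pos, true) -- plain-text search
    if not s then break end
    -- Text on each side of this break point, with the other marks removed.
    local head = marked:sub(1, s - 1):gsub(MARK, "")
    local tail = marked:sub(e + 1):gsub(MARK, "")
    if head:sub(-1) == "-" then
      -- Break at the explicit hyphen: repeat it on the next line.
      tail = "-" .. tail
    else
      -- Break inside one of the parts: add an ordinary hyphen.
      head = head .. "-"
    end
    results[#results + 1] = head .. " / " .. tail
    pos = e + 1
  end
  return results
end

for _, b in ipairs(breaks("bia•ło-•czer•wony")) do
  print(b)
end
-- bia- / ło-czerwony
-- biało- / -czerwony
-- biało-czer- / wony
```

Both parts of the compound contribute break points, and only the break at the explicit hyphen gets the repeated hyphen.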

@Omikhleia Omikhleia added this to the v0.14.15 milestone Jan 20, 2024
@Omikhleia Omikhleia self-assigned this Jan 20, 2024
@Omikhleia Omikhleia changed the title from "Hyphenation on words joined with hyphen in Polish" to "Hyphenation on words joined with hyphen in Polish (or Czech)" Jan 20, 2024
@Omikhleia
Member Author

Omikhleia commented Jan 20, 2024

Apparently also in Czech

@Omikhleia Omikhleia changed the title from "Hyphenation on words joined with hyphen in Polish (or Czech)" to "Hyphenation on words joined with hyphen in Polish" Jan 21, 2024
@alerque
Member

alerque commented Jan 21, 2024

Thanks for looking into this @Omikhleia, and thanks for the feedback @jakubkaczor. This should be working properly in the next patch release. It might even be worth adding an example to the website to showcase this. I could then add a Turkish example too, so we have some samples of how atypical hyphenation rules are or can be handled.
