By removing duplicates, luaotfload messes up codepoints #185

Closed
callegar opened this issue May 11, 2021 · 12 comments
Labels
wontfix: This will not be worked on
works with mode=harf: This problem will disappear when a harfbuzz enhanced luatex is available

Comments

@callegar

callegar commented May 11, 2021

Hi,

It looks like luaotfload removes duplicates from fonts, thereby breaking the expected correspondence between code point and actual character (the result is that it introduces an offset). Not only does this make it quite hard to pick the correct character when looking at font glyph tables, it also breaks "cut and paste" (pasting a character into the LaTeX source does not produce that character in the PDF) and breaks compatibility with xelatex.

See the following example using the Material Design Icon Font with lualatex.

The Material Design icon set is made available as a webfont, including a ttf version at https://materialdesignicons.com/ (use the download button to get version 5.4.55). The downloaded file materialdesignicons-webfont.ttf declares the font name "Material Design Icons" and when saved at a system accessible place seems to work just fine with xelatex.

However, if I try to use that with luatex as:

\documentclass{article}
\usepackage{fontspec}
\newfontface\MDI{Material Design Icons}[]

\begin{document}
Look at this character {\MDI \char"F1372}!
\end{document}

then I cannot seem to get the correct characters. For instance, U+F1372 should be the mdi-account-details-outline character according to the table at https://pictogrammers.github.io/@mdi/font/5.4.55/ but it turns out as a different character.

The matter is discussed at https://tex.stackexchange.com/questions/596610/how-to-use-luatex-with-large-unicode-codepoint/596626#596626

@zauguin zauguin added wontfix This will not be worked on works with mode=harf This problem will disappear when a harfbuzz enhanced luatex is available labels May 11, 2021
@zauguin
Member

zauguin commented May 11, 2021

For the node shaper this is by design: All mappings in the font for the Supplementary Private Use Area A and B (aka everything starting with U+F0000) are ignored and these codepoints are instead used to map all glyphs which don't have another mapping. The codepoints are assigned in GID order and therefore might be completely unrelated to any potential assignments to Supplementary Private Use codepoints in the font.
Current versions ensure that regular Private Use codepoints (U+E000 - U+F8FF) work normally, but since the node shaper ensures that every glyph is mapped to some valid Unicode codepoint, it has to sacrifice the other, much less used, area.

The recommended way to use such fonts is using the HarfBuzz shaper (which specifically ensures that all codepoint assignments from the font are preserved) or by accessing the glyphs through glyphnames (for fonts which have useful glyphnames that is).
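
As a minimal sketch of the first recommendation, the example from the original report can be switched to the HarfBuzz shaper through fontspec's Renderer option. This assumes a lualatex built on the HarfBuzz-enabled engine (luahbtex) and that the font is installed under the name "Material Design Icons", as in the report; it has not been checked against this particular font version.

```latex
% Compile with lualatex (luahbtex engine).
% Assumption: materialdesignicons-webfont.ttf is installed where luaotfload
% can find it under the font name "Material Design Icons".
\documentclass{article}
\usepackage{fontspec}
% Renderer=HarfBuzz selects the HarfBuzz shaper instead of the node shaper,
% so the font's own Supplementary Private Use Area mappings are kept.
\newfontface\MDI{Material Design Icons}[Renderer=HarfBuzz]

\begin{document}
Look at this character {\MDI \char"F1372}!
\end{document}
```

With this setting, U+F1372 should resolve to whatever glyph the font itself assigns to that codepoint, which is the behavior the upstream glyph table assumes.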

@callegar
Author

callegar commented May 11, 2021

I understand and would like to apologize in advance for the long post below. Unfortunately, this behavior makes it quite difficult to:

  • use certain icon/symbol fonts for which upstream offers visual-glyph to codepoint tables such as https://pictogrammers.github.io/@mdi/font/5.4.55/ or search services where a keyword gets associated to a symbol and its codepoint.

  • select the desired glyph from a demo document or web page and paste into the latex source with some expectation that the glyph will stay. Incidentally, https://pictogrammers.github.io/@mdi/font/5.4.55/ does exactly that: it copies for you the desired glyph in the clipboard so that it can be pasted in the (unicode-based) document.

Furthermore, in addition to the incompatibility with xelatex, from your explanation it looks like this design decision causes "internal" incompatibility within lualatex itself, so that the same document source produces results that change with the shaper, not just with respect to the quality of the font rendering (which is to be expected), but in the very semantics of the document (because you end up silently using completely different characters).

This may lead to interesting results. For instance, I imagine that it would not be hard to design a font and a document so that the latter in its source form or compiled via xelatex or lualatex+harfbuzz looks quite innocuous, to then contain insulting pictograms (or even text) once compiled by another user with lualatex and the node shaper because of the codepoint replacement.

An additional issue is that it may not be possible to use the HarfBuzz shaper as you suggest, because the feature set is different. For instance, the node shaper supports "variable otf" fonts (which one can easily expect to rapidly become popular even for printed docs and not just on the web) while I read that the HarfBuzz shaper will likely not get that ability.

With respect to your second suggestion, can you please expand on how to access glyphs via glyphnames from lualatex? Is there anything similar to the \char, \uchar or \symbol command for that?

Finally, do you think that it could be possible for the luaotfload developers to introduce some change to the current behavior i.e. rethink the wontfix label? I have a couple of questions about this:

  • In my ignorance of the internal workings of luaotfload, I don't understand why there are glyphs that do not have another mapping so that the PUA-A and PUA-B need to be sacrificed. Can you clarify? Has this to do with luatex having to also support legacy fonts or encodings? Otherwise, how can xelatex and luatex+harfbuzz do without this sacrifice? Could this sacrifice be at least controlled by an option at the fontface definition?

  • From the observed behavior, I am getting the impression that nothing is really sacrificed for the tested font in PUA-A and PUA-B. I do not see there any stuff in addition to what I am expected to get. The issue does not seem to be that area being used for introducing something else, rather that area being used removing gaps (duplicates?), so that the codepoint->glyph association is changed. Why is this "gap removal" necessary at all when there is nothing that needs to be pushed there?

I really would like to advocate that the behavior be made more controllable. The intended purpose of PUA-A and PUA-B seems to be that they are intentionally left undefined so that third parties may define their own characters without conflicting with Unicode Consortium assignments, not that they serve as a fully available working area for the renderer. Consequently, if luatex really cannot do without making some reuse of these areas, IMHO it should really try to minimize their disruption. Not doing so may break any "agreement" on the usage of such areas, be they private to some organization or (more broadly) private as published by some font designers for a specific font or even by some entities such as CSUR (which uses PUA-A and PUA-B) for wider purposes.

@u-fischer
Member

For instance, the node shaper supports "variable otf" fonts

But this here is not a variable font.

In my ignorance of the internal workings of luaotfload, I don't understand why there are glyphs that do not have another mapping so that the PUA-A and PUA-B need to be sacrificed. Can you clarify?

Variants of glyphs need a number too.

With respect to your second suggestion, can you please expand on how to access glyphs via glyphnames from lualatex?

Currently the glyph names are not in the lua and so not accessible. From a remark on the context list I guess that the names are dropped because the font hasn't named all glyphs.

I really would like to advocate that the behavior be made more controllable.

The code for the node mode is imported from context. I would suggest that you discuss this on the context mailing list, there is already a thread about this font.

@callegar
Author

@u-fischer thanks for the details, that was very helpful!

But this here is not a variable font.

I am aware that the Material Design Icon font is not variable. I mentioned it with future-proofing in mind, as it is not unlikely that we will get variable icon fonts in the future.

Variants of glyphs need a number too.

Out of curiosity, and if I am not stealing too much time, how does the harf mode handle this?

I would suggest that you discuss this on the context mailing list

I will definitely take a look at the context ml. Thanks for the pointer.

@zauguin
Member

zauguin commented May 12, 2021

An additional issue is that it may not be possible to use the HarfBuzz shaper as you suggest, because the feature set is different. For instance, the node shaper supports "variable otf" fonts (which one can easily expect to rapidly become popular even for printed docs and not just on the web) while I read that the HarfBuzz shaper will likely not get that ability.

Given the history of multiple master fonts (and Metafonts) I'm not convinced that one can expect that, but we will see. In any case the HarfBuzz shaper probably will get that ability, it's just a bit unclear when. (Basically we need to instantiate fonts. This has to be done either through Lua code or in HarfBuzz, and using HarfBuzz functionality would be more in line with the general approach of the HarfBuzz shaper. Therefore this is waiting for upstream support.)

With respect to your second suggestion, can you please expand on how to access glyphs via glyphnames from lualatex? Is there anything similar to the \char, \uchar or \symbol command for that?

In fontawesome5 I use

\char\directlua{tex.sprint(font.getfont(font.current()).resources.unicodes[token.scan_string()]~or~0)}{some_nice_glyphname}

which works as long as the node shaper is used. But the webfont in question does not have glyphnames.
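
For use in an ordinary document, the one-liner above can be wrapped in a macro. The following is only a sketch: the macro name \glyphbyname is made up for illustration, the `~` characters are replaced by plain spaces (outside expl3 code `~` is an active character and would not expand to a space inside \directlua), and it relies on the undocumented resources.unicodes table exactly as the snippet above does. It also assumes that the default fontspec font has a glyph named "ampersand".

```latex
% Compile with lualatex; works only with the node shaper (fontspec's default).
\documentclass{article}
\usepackage{fontspec}
% \glyphbyname{<glyphname>}: hypothetical helper name.
% \char expands \directlua while looking for a number; the Lua code reads the
% following braced glyph name with token.scan_string() and prints the slot
% number, or 0 if the name is unknown.
\newcommand\glyphbyname{\char\directlua{
  tex.sprint(font.getfont(font.current()).resources.unicodes[token.scan_string()] or 0)}}

\begin{document}
An ampersand selected by glyph name: \glyphbyname{ampersand} (assuming that name exists in the font).
\end{document}
```

Because the result is fed to \char, a digit that directly follows the closing brace would be read as part of the number, so it is safer to follow the macro with a space or punctuation.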

Finally, do you think that it could be possible for the luaotfload developers to introduce some change to the current behavior i.e. rethink the wontfix label?

As Ulrike already wrote, this is ConTeXt code so it will be fixed if it gets fixed in ConTeXt.

In my ignorance of the internal workings of luaotfload, I don't understand why there are glyphs that do not have another mapping so that the PUA-A and PUA-B need to be sacrificed. Can you clarify? Has this to do with luatex having to also support legacy fonts or encodings? Otherwise, how can xelatex and luatex+harfbuzz do without this sacrifice?

Basically LuaTeX stores the text internally in a bunch of glyph nodes which contain a char field to indicate which character they represent. After shaping, the node shaper still uses valid Unicode values in the char field, so in order to allow using glyphs which do not have a direct Unicode mapping (e.g. glyphs which can only be accessed through alternates or as ligatures) they have to be mapped to some existing Unicode values.

XeLaTeX works completely differently on that level, so it isn't really comparable. The HarfBuzz shaper gives such glyphs "codepoints" outside the range of valid Unicode values (0x110000 and higher). This leads to different issues (e.g. it's not so easy to insert these glyphs directly, as with the glyph name code I gave above, since \char does not accept such invalid Unicode values) but avoids overwriting anything.
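
One way to observe the difference described above is to ask each shaper which glyph it has stored for the same slot. The sketch below writes the result to the log file. The macro name \showslot is made up for illustration, and it assumes that the per-character index field of the font table holds the glyph index for fonts loaded by luaotfload; treat the output as a diagnostic hint rather than a documented interface.

```latex
% Compile with lualatex (luahbtex engine), then inspect the .log file.
% Assumption: the font is installed as "Material Design Icons".
\documentclass{article}
\usepackage{fontspec}
\newfontface\MDINode{Material Design Icons}[Renderer=Node]
\newfontface\MDIHarf{Material Design Icons}[Renderer=HarfBuzz]
% \showslot{<Lua number>}: hypothetical helper that logs the glyph index
% recorded for the given slot in the currently selected font.
% No Lua comments inside \directlua: line ends become spaces there.
\newcommand\showslot[1]{\directlua{
  local f = font.getfont(font.current())
  local c = f and f.characters and f.characters[#1]
  texio.write_nl("log", "slot #1 -> glyph index " .. tostring(c and c.index))
}}

\begin{document}
Node shaper: {\MDINode \showslot{0xF1372}\char"F1372}

HarfBuzz shaper: {\MDIHarf \showslot{0xF1372}\char"F1372}
\end{document}
```

If the two log lines report different glyph indices for the same slot, that is the remapping discussed in this thread.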

@callegar
Author

For those who might be interested, the relevant ConTeXt thread is "\char not working with private unicodes" on the ConTeXt User mailing list (aka ntg-context). See https://mailman.ntg.nl/pipermail/ntg-context/2021/102074.html

@callegar
Author

A quick update:

@zauguin
Member

zauguin commented May 18, 2021

* Would be extra nice if luaotfload could import this change.

It is already merged into dev.

Incidentally, it would be really great if a bugfix release of luaotfload could be made with this and the fix for #186, since both offer at least some workarounds to the current issue.

There are two smaller issues which will be fixed with the next ConTeXt upload, therefore we won't do a release before that. But I'm hoping to make a new release soon after that.

@callegar
Author

@zauguin Now that a new release of luaotfload is out (I have just received it via tlmgr), I would like to start experimenting with it. I wonder if you could be so kind as to help me (and maybe other readers of this issue) with a pointer to where some documentation about the font tables can be found. I have been unable to find anything about the resources table you mention in your sample code above. Furthermore, I believe that there should be a glyphs table letting one go from the codepoint to the glyph name among other things, but I am unable to find it. Even looking at the ConTeXt docs would help, but I am not really accustomed to them. I have found an "all about fonts" doc, but it does not seem to mention any glyphs table.

Thanks for any help, and sorry if this question is not really 100% consistent with the issue it is attached to.

@zauguin
Member

zauguin commented May 23, 2021

some documentation about the font tables

Anything not documented in the LuaTeX manual, the ConTeXt manual or the luaotfload documentation is considered an internal implementation detail. As far as I am aware, the resources table is not documented. (If you do find documentation for it, please let me know)

By the way, there is a documented alternative way to get the codepoint for a glyphname: luaotfload.aux.glyph_of_name. See section 11.2.1 of the luaotfload documentation for documentation of that function and related functionality. There you will also find a function luaotfload.aux.name_of_slot to get the name for a particular codepoint. These functions also work for HarfBuzz based fonts.

Furthermore, I believe that there should be a glyphs table letting one go from the codepoint to the glyph name among other things, but I am unable to find it.

There is no glyphs table in standard fontloader loaded fonts.

@callegar
Author

@zauguin Thanks a lot! I thought there ought to be a glyphs table because I found this somewhere:

-- Open the font file with LuaTeX's fontloader library.
local f = fontloader.open('PunkNova.kern.otf')
print(f.fontname)
if f.glyphcnt > 0 then
   -- Walk every glyph slot and print the name of each glyph that exists.
   for i = f.glyphmin, f.glyphmax do
       local g = f.glyphs[i]
       if g then
          print(g.name)
       end
   end
end
fontloader.close(f)

But it is not completely clear to me if the object returned by the fontloader.open should be the same as that returned by a font.getfont. Guess not since I do not seem to find the same members.
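
To see which members the table returned by font.getfont actually has, one can simply list its top-level keys in the log. This is a minimal exploratory sketch; which keys appear depends on the luaotfload version and the shaper, and, as noted earlier in the thread, most of them are undocumented implementation details.

```latex
% Compile with lualatex, then look for "font table key" lines in the .log file.
\documentclass{article}
\usepackage{fontspec}
\begin{document}
% font.current() returns the id of the currently selected font;
% font.getfont(id) returns the Lua table registered for that font, if any.
% No Lua comments inside \directlua: line ends become spaces there.
\directlua{
  local f = font.getfont(font.current())
  for key, value in pairs(f or {}) do
    texio.write_nl("log", "font table key: " .. tostring(key) .. " (" .. type(value) .. ")")
  end
}
Some text in the default font.
\end{document}
```

The same loop can be pointed at a sub-table such as f.resources to explore one level deeper.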

@callegar
Author

... and I do not seem to be very successful with the name_of_slot(id, slot) call either. Is id something that can be obtained from font.current()?
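
For what it is worth, font.current() does return the numeric id of the currently selected font, so it can be passed wherever a font id is expected. The sketch below uses the two-argument form name_of_slot(id, slot) as quoted in this thread; that signature is an assumption here and should be checked against the luaotfload documentation of the installed version. It uses the default fontspec font and slot 0x26, since the webfont discussed above reportedly has no glyph names to return.

```latex
% Compile with lualatex, then look for "name of slot" in the .log file.
\documentclass{article}
\usepackage{fontspec}
\begin{document}
% No Lua comments inside \directlua: line ends become spaces there.
\directlua{
  local id = font.current()
  local name = luaotfload.aux.name_of_slot(id, 0x26)
  texio.write_nl("log", "name of slot 0x26 in font " .. id .. ": " .. tostring(name))
}
Text \& more text.
\end{document}
```

Note that the call must run while the font of interest is the current one, for instance inside the group that selects it.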
