Garbage displayed after 6-byte emoji. #2186

Open
jcoffland opened this issue Sep 25, 2020 · 13 comments

Comments

@jcoffland

jcoffland commented Sep 25, 2020

Some emojis display normally in the terminal but are followed by garbage in polybar.

For example, the following command will produce a sun emoji ☀️:

echo -e "\xe2\x98\x80\xef\xb8\x8f"

This works great in the terminal but displays with UTF garbage in polybar.

[screenshot: polybar-sun-emoji]

Oddly it works correctly with this code:

echo -e "\xe2\x98\x80"

But in the terminal this only displays a small non-color sun emoji.

I'm trying to get wttr.in to display correctly on polybar.

curl -s 'https://wttr.in?format=%c+%t+%m'

Edit: Apparently the problematic code point is U+FE0F VARIATION SELECTOR-16, which is encoded in UTF-8 as the bytes EF B8 8F.
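The six bytes from the title break down into two 3-byte UTF-8 sequences: the base character and the variation selector. A small bash sketch that dumps the bytes and decodes each sequence by hand; `decode_utf8_3` is a hypothetical helper written for this illustration and only handles the 3-byte form, which both characters here use.

```shell
#!/usr/bin/env bash
# Decode one 3-byte UTF-8 sequence, given as three hex bytes, to its code point.
# Only the 3-byte form (U+0800..U+FFFF) is handled; both characters here use it.
decode_utf8_3() {
  local b0=$((16#$1)) b1=$((16#$2)) b2=$((16#$3))
  # 1110xxxx 10yyyyyy 10zzzzzz -> xxxx yyyyyy zzzzzz
  printf 'U+%04X\n' $(( ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F) ))
}

# Raw bytes of the "sun" string from the report (six bytes: e2 98 80 ef b8 8f).
printf '\xe2\x98\x80\xef\xb8\x8f' | od -An -tx1

decode_utf8_3 e2 98 80   # U+2600 BLACK SUN WITH RAYS
decode_utf8_3 ef b8 8f   # U+FE0F VARIATION SELECTOR-16
```

So the "6-byte emoji" is really a 3-byte character plus a 3-byte selector that is meant to modify it.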

@jcoffland
Author

I found a workaround:

curl -s 'https://wttr.in?format=%c+%t+%m' | sed 's/\xef\xb8\x8f//g'
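The sed workaround relies on sed understanding `\xHH` escapes, which GNU sed does but not every sed implementation does. A minimal pure-bash alternative, assuming bash is available; `strip_vs16` is a hypothetical helper name, not part of polybar or wttr.in.

```shell
#!/usr/bin/env bash
# Remove every U+FE0F (VARIATION SELECTOR-16) from stdin, line by line.
# Uses only bash string replacement, so it does not depend on sed escapes.
strip_vs16() {
  local vs16=$'\xef\xb8\x8f' line
  # "|| [[ -n $line ]]" also handles a final line without a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    printf '%s\n' "${line//"$vs16"/}"
  done
}

# Example: curl -s 'https://wttr.in?format=%c+%t+%m' | strip_vs16
```

Like the sed version, this only hides the symptom: the base character is then drawn without the presentation the selector asked for.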

@parmort
Contributor

parmort commented Oct 4, 2020

I can't reproduce, could you please send a minimal config?

@jcoffland
Author

[bar/top]
modules-center = weather
font-0 = NotoSans-Regular:size=10;0
font-1 = "Noto Color Emoji:scale=10:style=Regular;2"
font-2 = Unifont:size=10;0

[module/weather]
type = custom/script
exec = bash -c 'echo -e "\xe2\x98\x80\xef\xb8\x8f"'

Of course you have to have Noto Color Emoji installed. On Debian/Ubuntu:

sudo apt-get install -y fonts-noto-color-emoji

The real weather command is something like this:

exec = curl -s https://wttr.in?format=%c+%t

But I cannot guarantee it will be sunny in your locale.

I just found that if I drop font-2 the problem goes away and I see the following message from polybar:

warn: Dropping unmatched character ️ (U+fe0f)

I believe the whole issue is that polybar is not handling variation selectors.

@parmort
Copy link
Contributor

parmort commented Oct 6, 2020

I see. The issue is that Unifont, whatever that may be on your system, has a glyph for the variation selector, and that glyph is what Polybar displays. Because Polybar can find a glyph for that code point, it draws it. Only when none of the configured fonts has a glyph does Polybar log the warning.

For now, I would recommend either changing or removing Unifont, especially if that font is unused elsewhere in your configuration.

Edit: @patrick96, what do you think? Should Polybar handle the variation selection better? If so, how?

@jcoffland
Copy link
Author

My understanding, and maybe it's wrong, is that variation selectors modify the previous code point. So polybar should not treat a selector as just another character. It should either ignore it or take it into consideration when selecting the correct variant of the emoji. Polybar could simply filter out variation selectors. I'm not quite sure how they are supposed to be handled; you would think the font would need this info.

@parmort
Copy link
Contributor

parmort commented Oct 6, 2020

You would think. My uneducated guess is that Polybar is displaying each character at a time, so character sequences aren't picked up by fonts. If this were the case, the variation selectors should be filtered out or something.
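Treating the selector as part of the preceding character, as suggested above, means grouping code points into clusters before choosing a font. A minimal sketch, assuming the code points are already decoded into hex strings; `group_clusters` is hypothetical, and real text shaping follows the grapheme cluster rules of Unicode UAX #29, which also cover ZWJ sequences, skin-tone modifiers and more.

```shell
#!/usr/bin/env bash
# Group code points (hex, without "U+") into clusters, attaching each
# variation selector (FE0E/FE0F) to the code point before it, so that a
# renderer could pick one font for "2600 FE0F" instead of two.
group_clusters() {
  local cp clusters=() n
  for cp in "$@"; do
    cp=${cp^^}                               # normalise to upper case
    n=${#clusters[@]}
    if [[ ($cp == FE0E || $cp == FE0F) && $n -gt 0 ]]; then
      clusters[n-1]+=" $cp"                  # selector joins the previous base
    else
      clusters+=("$cp")                      # start a new cluster
    fi
  done
  printf '[%s]\n' "${clusters[@]}"
}

group_clusters 2600 fe0f 0020 0041   # prints [2600 FE0F], [0020] and [0041], one per line
```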

@OJFord
Copy link

OJFord commented Oct 12, 2020

I don't have Unifont, and don't have a character to display for U+fe0f, so the character is correctly displayed from Noto Color Emoji, but still the 'dropping unmatched character' is continually logged.

@tminhvu
Copy link

tminhvu commented Jan 24, 2021

I have this issue too, can I just ignore it since the emoji is still displayed?

@ripytide
Copy link

I found an emoji on emojipedia that was followed by the VS16 code, in polybar the VS16 code would throw warnings as an unmatched character, as a workaround I was able to copy-paste just the emoji code without the VS16 using the corresponding Wikipedia Unicode block page which lists the emoji's with and without the VS's.

@parmort
Copy link
Contributor

parmort commented May 6, 2021

The "dropping unmatched character" warnings are annoying, but if it is displayed fine, then it's no worries.

@oldmansutton
Copy link

oldmansutton commented Dec 29, 2021

The problem, though, is that it's NOT displayed fine. For example, U+2197 (↗) shows a north-east arrow, but it is tiny and unimpressive. U+2197,U+FE0F (↗️), on the other hand, shows more what you'd expect for the emoji. The variation selector changes what is displayed, so without support for the U+FE0F variation selector, things are NOT displaying as they should. I DO have a workaround that strips the FE0F character from what is being passed to polybar, but it doesn't look right without the appropriate variation. Getting the variation selector working properly is the main concern.

EDIT: Funnily enough, while editing this message, I see two distinct glyphs for the north-east arrow. However, upon saving, GitHub displays both as the emoji variant.

EDIT2: Alternately, defaulting to automatically display the variation glyphs like github is doing would also be acceptable (to me)
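The forms discussed above, plus the explicit text-presentation form, can be generated from a script to check how a given font list renders each one. U+FE0E VARIATION SELECTOR-15 requests text presentation and U+FE0F VARIATION SELECTOR-16 requests emoji presentation, so an explicit selector is one way to ask for either variant rather than relying on a default. A minimal sketch, assuming bash for the `$'...'` quoting:

```shell
#!/usr/bin/env bash
# U+2197 NORTH EAST ARROW in UTF-8, plus the two presentation selectors.
arrow=$'\xe2\x86\x97'
vs15=$'\xef\xb8\x8e'   # U+FE0E VARIATION SELECTOR-15: request text presentation
vs16=$'\xef\xb8\x8f'   # U+FE0F VARIATION SELECTOR-16: request emoji presentation

printf 'default: %s\n' "$arrow"
printf 'text:    %s\n' "$arrow$vs15"
printf 'emoji:   %s\n' "$arrow$vs16"
```

Pointing a custom/script module's exec at such a script shows which of the three forms the configured fonts actually distinguish.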

@mainrs
Copy link

mainrs commented Jan 21, 2022

EDIT2: Alternately, defaulting to automatically display the variation glyphs like github is doing would also be acceptable (to me)

This wouldn’t solve the problem but just shift it. Now people that want the normal variation can’t properly display the emoji. :)
Any guidance on where or what has to be changed to make this work?

@patrick96
Copy link
Member

patrick96 commented Jan 21, 2022

Well, commenting on this issue has been on my todo list for a while.

The issue here hints at the underlying problems polybar has with font rendering: we do it ourselves using the cairo low-level text API.

This is an excerpt from the cairo text API documentation:

The functions with glyphs in their name form cairo's low-level text API. The low-level API relies on the user to convert text to a set of glyph indexes and positions. This is a very hard problem and is best handled by external libraries, like the pangocairo that is part of the Pango text layout and rendering library. Pango is available from http://www.pango.org/.

As I understand it, this means any algorithm for correctly displaying Unicode control characters would have to be implemented by us, and, by the way, we don't currently implement one.

This creates a number of issues:


Now, to address some comments in this thread:

From @parmort

what do you think? Should Polybar handle the variation selection better? If so, how?

Yes! We should definitely have proper font handling. Filtering out control characters is at best a temporary workaround to not mess up people's bars.

My uneducated guess is that Polybar is displaying each character at a time, so character sequences aren't picked up by fonts. If this were the case, the variation selectors should be filtered out or something.

This is part of the problem, but I am not sure that passing the character together with its variation selector would make a difference. It's likely that the character and its variant are two different glyphs inside the font, but I don't know enough about Unicode and fonts to be sure.

Polybar doesn't always render one character at a time, but due to the way we do font fallback, the variation selector may be displayed using a different font. That's because we treat each character as a printable character and pick the first font in the list that can display it. So what happens in the original post is, as @parmort correctly identified: polybar sees that the emoji font can display the sun emoji, but it doesn't find a glyph for the variation selector in the emoji font. Why would it? It's not a printable character. It then goes further down the list, and Unifont happens to have a glyph for it because it tries to provide printable glyphs for almost every Unicode code point (even control characters).
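The per-character lookup described above can be modelled in a few lines. This is a toy model, not polybar's code: the font names mirror the configuration earlier in this thread, but the coverage tables are invented for illustration and `pick_font` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Toy model of per-code-point font fallback. The coverage tables below are
# invented for illustration and do not reflect the real fonts' contents.
declare -A coverage=(
  [NotoSans]=" 0020 0041 "
  [NotoColorEmoji]=" 2600 "
  [Unifont]=" 0020 0041 2600 FE0F "
)

# Print the first font (in list order) that has a glyph for CODEPOINT.
# usage: pick_font CODEPOINT FONT...
pick_font() {
  local cp=$1 font
  shift
  for font in "$@"; do
    if [[ ${coverage[$font]} == *" $cp "* ]]; then
      printf '%s\n' "$font"
      return
    fi
  done
  printf 'unmatched\n'   # the case where polybar warns "Dropping unmatched character"
}

pick_font 2600 NotoSans NotoColorEmoji Unifont   # NotoColorEmoji
pick_font FE0F NotoSans NotoColorEmoji Unifont   # Unifont: its visible glyph is the "garbage"
pick_font FE0F NotoSans NotoColorEmoji           # unmatched: only the warning is logged
```

Because each code point is looked up on its own, nothing ties U+FE0F to the font that drew the sun, which matches both symptoms reported in this thread.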

From @oldmansutton and @mainrs

EDIT2: Alternately, defaulting to automatically display the variation glyphs like github is doing would also be acceptable (to me)

This wouldn’t solve the problem but just shift it. Now people that want the normal variation can’t properly display the emoji. :)

I agree.


Any guidance on where or what has to be changed to make this work? - @mainrs

None of this has been particularly actionable information.
On a high-level, here is what I think needs to happen: Polybar needs to be able to do font rendering using pango.

How exactly this will/should happen, I am not sure. I have barely any experience with pango, cairo, or font rendering in general (the font renderer was here before I started contributing to polybar).

From everything I have read, it may not be possible to just replace the font rendering with pango without breaking existing functionality.
For one, automatic font fallback may not be possible in polybar, though explicit font selection using %{T} could still work.

If anyone has experience with pango, I would appreciate your insight here:

  1. What options do we have to define fallback fonts for pango? The only thing I have found was on the side of fontconfig, but nothing on the application side.
  2. I have some concerns around our formatting tags and translating that to pango. If we have a string like %{F#ff0}some %{B#0ff}text, each text fragment that doesn't contain any tags is dispatched to the renderer and the renderer applies the currently set colors. From what I can tell, pango can also do that. What concerns me is how control characters like the left-to-right override (and other stateful format controls) behave. If the first text contains an LTR override character, is the second text also rendered LTR if I call pango_layout_set_text with it?

The answers to this will inform what exactly our solution will look like (maybe we need to translate all formatting tags to pango markup?).
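One possible reading of "translate all formatting tags to pango markup" is to wrap each tag-free fragment in a `<span>` carrying the colors active at that point, mirroring how the current renderer applies the currently set colors to each fragment. A rough sketch only: `to_pango_markup` is hypothetical, it handles just `%{F...}` and `%{B...}` (plus their `-` resets) and ignores every other tag, and it does not answer the open question about stateful controls such as the left-to-right override.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: flatten polybar %{F...}/%{B...} tags into Pango markup.
# Each tag-free fragment becomes a <span> with the colors active at that point.
# All other tags (%{T2}, %{u...}, actions, ...) are silently ignored.
to_pango_markup() {
  local s=$1 fg="" bg="" out="" tag text attrs
  while [[ -n $s ]]; do
    if [[ $s == '%{'*'}'* ]]; then
      tag=${s%%\}*}              # "%{F#ff0}rest" -> "%{F#ff0"
      tag=${tag#'%{'}            # -> "F#ff0"
      s=${s#*\}}                 # drop everything up to the first "}"
      case $tag in
        F-) fg="" ;;
        F*) fg=${tag#F} ;;
        B-) bg="" ;;
        B*) bg=${tag#B} ;;
      esac                       # unknown tags change nothing
    else
      text=${s%%'%{'*}           # text up to the next "%{"
      [[ -z $text ]] && text=$s  # unterminated "%{": treat the rest as text
      s=${s#"$text"}
      # Pango markup is XML-like, so escape &, < and > in the text itself.
      text=$(printf '%s' "$text" | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g')
      attrs=""
      [[ -n $fg ]] && attrs+=" foreground=\"$fg\""
      [[ -n $bg ]] && attrs+=" background=\"$bg\""
      if [[ -n $attrs ]]; then
        out+="<span$attrs>$text</span>"
      else
        out+=$text
      fi
    fi
  done
  printf '%s\n' "$out"
}

to_pango_markup '%{F#ff0}some %{B#0ff}text'
```

For the example string this yields `<span foreground="#ff0">some </span><span foreground="#ff0" background="#0ff">text</span>`. Flat spans keep the stateful tag semantics simple, at the cost of repeating attributes.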

In any case, I think pango rendering should be a separate code path, leaving the current functionality intact. There should maybe be a setting in the bar section to switch to the new system. If so, it should be as "simple" (the dispatch is simple, the rendering itself probably not) as distinguishing between the two cases in renderer::render_text and calling the right font rendering code.

I will open a separate issue for pango rendering so as not to clog up specific discussions. I will also add it to the 3.7.0 milestone so that we can focus on this after the next release.

EDIT: I have now opened the new issue #2576
