Feature request: emoji support #505

atoktoto opened this issue Aug 22, 2020 · 19 comments

@atoktoto

The code textGraphics.putString(0, 3, "🍕") results in two question mark characters being displayed in the terminal. At the same time, System.out.println("🍕") works as intended (at least in terminal emulators that support emoji, e.g. the new Windows Terminal).

Is it possible to support this use-case?
I guess the emoji code points do not fit into the char type that is used in Terminal.putCharacter, so this would require major changes.
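To make the guess above concrete, here is a minimal sketch of why a single char cannot hold 🍕. It uses only standard JDK calls; the class and method names are made up for this illustration and are not part of Lanterna.

```java
// Illustrative only: SurrogatePairDemo is a hypothetical name, not Lanterna code.
final class SurrogatePairDemo {
    /** Number of UTF-16 code units (Java chars) in the text. */
    static int charCount(String text) {
        return text.length();
    }

    /** Number of Unicode code points in the text. */
    static int codePointCount(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        // U+1F355 (slice of pizza) lies outside the BMP, so UTF-16 needs a surrogate pair.
        String pizza = "\uD83C\uDF55";
        System.out.println("chars:       " + charCount(pizza));
        System.out.println("code points: " + codePointCount(pizza));
        System.out.println("first char is a high surrogate: "
                + Character.isHighSurrogate(pizza.charAt(0)));
        System.out.printf("code point:  U+%X%n", pizza.codePointAt(0));
    }
}
```

Any API that passes characters one char at a time therefore sees two halves that mean nothing on their own, which would explain the two question marks.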

@rednoah
Contributor

rednoah commented Aug 23, 2020

Assuming that you're talking about Windows, printing Emoji to the terminal already works via the WriteConsole native call in WindowsConsoleOutputStream. But only the new Windows Terminal is capable of actually displaying Emoji by default. CMD and PowerShell will just display a box.

That being said, full support for grapheme clusters (i.e. user-perceived characters) would be appreciated not just for Emoji, but for all the non-Latin languages where a series of code points coalesce into a single-width user-perceived character.

@rednoah
Contributor

rednoah commented Aug 26, 2020

com.ibm.icu.text.BreakIterator can be used to iterate grapheme clusters (i.e. single-width user-perceived characters) though this will require icu4j as an additional dependency.

import com.ibm.icu.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Splits a string into grapheme clusters (user-perceived characters).
public static List<String> getGraphemeClusters(String self) {
	List<String> characters = new ArrayList<String>(self.length());
	BreakIterator i = BreakIterator.getCharacterInstance();
	i.setText(self);
	// next() returns the end offset of each cluster, or DONE at the end of the text
	for (int begin = 0, end = 0; (end = i.next()) != BreakIterator.DONE; begin = i.current()) {
		characters.add(self.substring(begin, end));
	}
	return characters;
}

The JDK built-in java.text.BreakIterator may or may not work well depending on the specific use case. It'll work for Asian languages (e.g. บุฟเฟต์) but won't work for Emoji sequences (e.g. 👩‍👩‍👦‍👦).

@atoktoto
Author

atoktoto commented Aug 26, 2020

Seems certainly doable but would require a significant change (or addition) to the Lanterna interfaces. Currently, char and TextCharacter seem to be the center of the whole operation. Replacing them with a String representing a single grapheme cluster seems wasteful (in terms of memory) for the general case and can make the API less legible: void putCharacter(String s) looks wrong :D

Also, Windows Terminal only displays emoji correctly if running a WSL session.

@mabe02
Owner

mabe02 commented Aug 30, 2020

I don't see why emoji wouldn't work with the current system, given that we can do CJK characters just fine. I'll investigate, maybe it's the terminal encoding that needs to be updated.

@rednoah
Contributor

rednoah commented Aug 30, 2020

Here's what I get with lanterna 3.0.3 for a file name such as THAIบุฟเฟต์EMOJI👩‍👩‍👦‍👦.txt:

  • Works for Thai
  • Does not work for complex Emoji

[Screenshot: Screen Shot 2020-08-30 at 10 02 12]

That being said, neither iTerm nor Terminal render this particular Emoji correctly either:

$ ls
THAIบุฟเฟต์EMOJI👩?👩?👦?👦.txt

@rednoah
Contributor

rednoah commented Aug 30, 2020

EDIT: บุฟเฟต์ does render correctly, but the layout does not account for the compound characters บุ and ต์ each taking up only 1 column (even though each is 2 code points), and so the layout is off by 2 here:
[Screenshot: Screen Shot 2020-08-30 at 10 10 32]

$ ls
TEST.mp4
บุฟเฟต์.mp4

CJK works because those are 1 code point per character, whereas บุ is 2 code points which are composed into a single logical character by the text renderer.

@mabe02
Owner

mabe02 commented Aug 30, 2020

Interesting, so the CJK detector incorrectly flags บุ as two text characters wide?

@mabe02
Owner

mabe02 commented Aug 30, 2020

Ok, I see the problem now. A single Java char isn't able to store an emoji:
https://developers.redhat.com/blog/2019/08/16/manipulating-emojis-in-java-or-what-is-%F0%9F%90%BB-1/
Slightly unexpected. Will see what we can do about this.

@rednoah
Contributor

rednoah commented Aug 30, 2020

Yes, บุ is 2 code points. It even requires hitting DELETE twice to delete the entire character. Hitting DELETE once only changes บุ to บ. Kinda like NFD except there is no NFC for บุ.
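The NFD/NFC remark can be checked with java.text.Normalizer. The sketch below compares a Latin sequence that has a precomposed form (e + combining acute accent) with the Thai pair, which has none; the class and method names are hypothetical.

```java
import java.text.Normalizer;

// Illustrative only: NormalizationDemo is a hypothetical name, not Lanterna code.
final class NormalizationDemo {
    /** Returns the NFC (canonically composed) form of the text. */
    static String nfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String eAcute = "e\u0301";      // 'e' followed by COMBINING ACUTE ACCENT
        String thai = "\u0E1A\u0E38";   // THAI CHARACTER BO BAIMAI + THAI CHARACTER SARA U

        // NFC folds e + accent into the single code point U+00E9 ...
        System.out.println("e + accent after NFC: " + nfc(eAcute).length() + " char(s)");
        // ... but Unicode defines no precomposed form for the Thai pair.
        System.out.println("Thai pair after NFC:  " + nfc(thai).length() + " char(s)");
    }
}
```

So normalizing the input up front would not reduce this Thai character to a single char; some form of grouping is still needed.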

@mabe02
Owner

mabe02 commented Sep 12, 2020

The problem I'm finding is that even "บุ".length() returns 2...
I'm trying to change the internal representation of TextCharacter to String, but it's tricky to know if the character should be considered single- or double-width, given that Java provides little guidance. I'd like to avoid hard-coding Unicode page references if possible...

@mabe02
Owner

mabe02 commented Sep 12, 2020

Have been browsing articles and it really seems like while we can get the number of code points, there's no way to know if these code points are combined into a single character, or if that character is double or single width!
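For what it's worth, the JDK does expose enough metadata for a rough guess. The following is only a crude sketch, not how Lanterna decides widths: it treats combining marks and format characters as zero columns and a few East Asian scripts as two columns. It knows nothing about emoji, halfwidth forms, or what a given terminal actually renders, and all names in it are hypothetical.

```java
// Illustrative heuristic only: ColumnWidthSketch is hypothetical and is not Lanterna's logic.
final class ColumnWidthSketch {
    /** Crude guess at how many terminal columns one code point occupies. */
    static int codePointWidth(int codePoint) {
        int type = Character.getType(codePoint);
        // Combining marks and format characters (e.g. zero width joiner) add no column.
        if (type == Character.NON_SPACING_MARK
                || type == Character.ENCLOSING_MARK
                || type == Character.FORMAT) {
            return 0;
        }
        // Assumption: these scripts are usually rendered double-width.
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        if (script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA
                || script == Character.UnicodeScript.HANGUL) {
            return 2;
        }
        return 1;
    }

    /** Sums the per-code-point guesses for a whole string. */
    static int stringWidth(String text) {
        return text.codePoints().map(cp -> codePointWidth(cp)).sum();
    }

    public static void main(String[] args) {
        // Thai file name from earlier in the thread: 11 chars, two of them combining marks.
        String name = "\u0E1A\u0E38\u0E1F\u0E40\u0E1F\u0E15\u0E4C.mp4";
        System.out.println("chars:   " + name.length());
        System.out.println("columns: " + stringWidth(name));
    }
}
```

For the Thai file name this gives a width two columns smaller than String.length(), which matches the off-by-2 layout reported earlier. A real implementation would more likely rely on the East Asian Width data from Unicode.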

@rednoah
Contributor

rednoah commented Sep 12, 2020

Yep, pretty much. lanterna effectively can't predict how a terminal window is going to render the text, because it depends on the version of unicode used by the text renderer. Though we can generally assume that long-established unicode sequences like บุ will work just fine, while recent additions like 👩‍👩‍👦‍👦 are likely to not work.

You can use the java.text.BreakIterator to split a String into "display characters" like so:

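import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Splits a string into "display characters" using the JDK's character BreakIterator.
public static List<String> getGraphemeClusters(String self) {
	List<String> characters = new ArrayList<String>(self.length());
	BreakIterator i = BreakIterator.getCharacterInstance();
	i.setText(self);
	// next() returns the end offset of each cluster, or DONE at the end of the text
	for (int begin = 0, end = 0; (end = i.next()) != BreakIterator.DONE; begin = i.current()) {
		characters.add(self.substring(begin, end));
	}
	return characters;
}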

java.text.BreakIterator and com.ibm.icu.text.BreakIterator can be used interchangeably. java.text.BreakIterator has the advantage of being a JDK built-in class. com.ibm.icu.text.BreakIterator has the advantage of working better for recent unicode additions (i.e. complex compound emoji sequences; notably probably something your terminal window won't display correctly anyway).

It might make sense to make the BreakIterator configurable:

  • NullBreakIterator (same behaviour as now, 1 code point = 1 character)
  • java.text.BreakIterator (e.g. for users that target Windows CMD)
  • com.ibm.icu.text.BreakIterator (e.g. for users that target the new Windows Terminal)
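One way the configurable options above could look is a small strategy interface. This is a sketch under assumptions: CharacterSplitter and both constants are hypothetical names, nothing here is existing Lanterna API, and the icu4j variant is left out so the example needs no extra dependency (it would follow the same shape as the JDK one).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: all names in this class are hypothetical, not Lanterna API.
final class SplitterSketch {
    /** Strategy for cutting a string into the units that each occupy one text cell. */
    interface CharacterSplitter {
        List<String> split(String text);
    }

    /** "Null" strategy from the list above: 1 code point = 1 character. */
    static final CharacterSplitter PER_CODE_POINT = text -> {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(new String(Character.toChars(cp))));
        return out;
    };

    /** Strategy backed by the JDK's built-in java.text.BreakIterator. */
    static final CharacterSplitter JDK_BREAK_ITERATOR = text -> {
        List<String> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(text);
        int begin = it.first();
        // Each call to next() yields the end offset of the next cluster.
        for (int end = it.next(); end != BreakIterator.DONE; begin = end, end = it.next()) {
            out.add(text.substring(begin, end));
        }
        return out;
    };

    public static void main(String[] args) {
        String thai = "\u0E1A\u0E38"; // two code points, one user-perceived character
        System.out.println("per code point: " + PER_CODE_POINT.split(thai).size());
        System.out.println("JDK iterator:   " + JDK_BREAK_ITERATOR.split(thai).size());
    }
}
```

Callers would pick one splitter when constructing the terminal or screen, so applications that target older consoles could keep the current behaviour.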

@mabe02
Owner

mabe02 commented Sep 13, 2020

Ok, so here's what we'll do. In 3.0 we'll restrict TextCharacter to BMP only, with an override if you really know what you're doing. In 3.1, also restrict but change to use String internally and let you supply your own "String" character for complicated emoji. Will try this out.

@mabe02
Owner

mabe02 commented Sep 13, 2020

Ok, I misunderstood the BMP again. For now I've at least blocked 3.0 from creating TextCharacters from surrogate chars. Next, I'll use the BreakIterator above in 3.1 to try to group characters.
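A guard of the kind described above can be sketched with Character.isSurrogate. This is only an illustration of the idea; the class and method names are hypothetical and the actual check in Lanterna 3.0 may look different.

```java
// Illustrative only: SurrogateGuardSketch is hypothetical, not the actual Lanterna check.
final class SurrogateGuardSketch {
    /** Returns c unchanged, or throws if c is one half of a UTF-16 surrogate pair. */
    static char requireNonSurrogate(char c) {
        if (Character.isSurrogate(c)) {
            throw new IllegalArgumentException(String.format(
                    "char 0x%04X is half of a surrogate pair and cannot stand alone",
                    (int) c));
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(requireNonSurrogate('A'));
        try {
            requireNonSurrogate("\uD83D\uDCA3".charAt(0)); // first half of U+1F4A3
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```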

@mabe02
Owner

mabe02 commented Sep 19, 2020

Okay, I've re-worked TextCharacter to support this:
PR for review: #508

@mabe02
Owner

mabe02 commented Sep 27, 2020

Ok, code is merged. If you clone and build release/3.1 (I'll do another release in a week or so) you should be able to print emoji as double-width and your magic บุ character only occupying one column. Please try it out and report back before I close this.

@MVoloshin

MVoloshin commented Feb 25, 2022

@mabe02, I can't print the BOMB character "\uD83D\uDCA3" (💣) using Lanterna 3.2.0-master on Windows 7 x64 (SwingTerminalWindow). I just get two rectangles :(

@avl42
Contributor

avl42 commented Oct 17, 2024 via email

@mabe02
Owner

mabe02 commented Nov 14, 2024

Works for me on Ubuntu:
[Screenshot]
Maybe it's a font problem? I just did this:
terminal.putString("\uD83D\uDCA3");

(modified SwingTerminalTest.java)
