Some emojis ending with \ufe0f are not completely matched #28

Closed
merih opened this issue Aug 9, 2017 · 12 comments

@merih

merih commented Aug 9, 2017

When the male detective emoji, 🕵️ ("\u{1f575}\ufe0f"), is matched with emoji-regex, not all of its code points are consumed: the trailing \ufe0f is left behind. The emoji was typed using the Control+Cmd+Space shortcut on a Mac.

"\u{1f575}\ufe0f".replace(emojiRegex(), "").length
//> 1
@mathiasbynens
Owner

mathiasbynens commented Aug 9, 2017

That’s not a standard emoji sequence AFAICT: U+1F575 U+FE0F is not listed in emoji-zwj-sequences.txt. The U+FE0F is not necessary.

@gilmoreorless
Contributor

Apple appears to have a very loose idea of conformance to the standard set of codepoints. While working on other fixes for emoji-regex I created a list of all the emoji available on my Mac (macOS 10.12.6) via the emoji picker. (Handy hint: Don't do that if you value your time and patience.)

There were 49 Emoji_Presentation or Emoji_Modifier_Base characters that have U+FE0F appended to them by the macOS picker, with no real consistency about which ones do or don't get the variation selector added (e.g. 🤞 doesn't but ✌️ does). Plus there were another 100 or so textual representation characters that are displayed by macOS in presentation mode without appending U+FE0F.

@anoff

anoff commented Aug 15, 2017

Are there any real downsides to adding this control character to the regex, besides bloating it just to work around a possible macOS bug?

@artyom

artyom commented Aug 16, 2017

Excerpt from http://unicode.org/Public/emoji/5.0/emoji-test.txt:

1F575 FE0F                                 ; fully-qualified     # 🕵️ detective
1F575                                      ; non-fully-qualified # 🕵 detective

So the sequence in question is rather conformant.

@mathiasbynens
Owner

Thanks for the pointer, @artyom!

Per http://unicode.org/reports/tr51/#Emoji_Implementation_Notes, emoji ZWJ sequences “may have an emoji presentation selector”.

@mathiasbynens
Owner

Hacky solution: just add \uFE0F? to the regex (or [\uFE0E\uFE0F]? for the text regex). However, some of the sequences already end with presentation or variation selectors and are therefore already qualified; those shouldn’t be matched along with the U+FE0F. A proper fix will take some more time.
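
A minimal sketch of the hacky approach described above. It does not use the real emoji-regex source: `basePattern` is a hypothetical placeholder that matches only U+1F575, and the variable names are illustrative assumptions. It only shows how an optional trailing `\uFE0F?` changes what gets consumed.

```javascript
// Placeholder pattern (an assumption, NOT the real emoji-regex source):
// it matches just the detective code point U+1F575.
const basePattern = '\\u{1F575}';

// Strict variant: matches the base character only.
const strict = new RegExp(basePattern, 'gu');

// Hacky variant: also consumes one optional trailing U+FE0F.
const lenient = new RegExp(`(?:${basePattern})\\uFE0F?`, 'gu');

const input = '\u{1F575}\uFE0F';

// Remove every match and look at what is left behind.
const leftoverStrict = input.replace(strict, '');
const leftoverLenient = input.replace(lenient, '');

console.log(leftoverStrict.length);  // 1 (the stray U+FE0F remains)
console.log(leftoverLenient.length); // 0 (the selector was consumed)
```

The caveat from the comment still applies: a blanket `\uFE0F?` suffix would also swallow an extra selector after a sequence that is already qualified, which is why this is only a stopgap.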

@gilmoreorless
Contributor

For my own project's use I ended up going with that same hacky solution. I figured it wasn't right to submit a PR back to this project for it, so I just left it on a custom branch of my fork.

@fredvollmer

Are there any plans to integrate this into the project? It seems the consensus is that this is a legitimate use case... sorry if I'm off base here.

@mathiasbynens
Owner

@fredvollmer #28 (comment) answers your question. I’d welcome a patch :)

@gdutwyg

gdutwyg commented Mar 30, 2018

@mathiasbynens how can this problem be solved? I ran into it, too.

@jerry153fish

Hi @mathiasbynens, is it possible to add rules for emoji that do not fall under those sequences?

E.g.:

๐Ÿฟ ๐Ÿ•Š ๐Ÿ‘ ๐Ÿ•ท ๐Ÿ•ธ ๐Ÿ‘“ โ›‘ ๐Ÿ—ฃ ๐Ÿ•ถ โœŒ๏ธ โ˜๏ธ โœ๏ธ โœŒ๐Ÿผ โšก๏ธ โญ๏ธ ๐ŸŒช ๐ŸŒค ๐ŸŒฅ ๐ŸŒฆ ๐ŸŒง โ›ˆ ๐ŸŒฉ ๐ŸŒจ ๐ŸŒฌ ๐Ÿ’จ ๐ŸŒถ ๐Ÿฝ โ›ธ โ›ท ๐ŸŽ– ๐Ÿต ๐ŸŽ— ๐ŸŽŸ ๐ŸŽ ๐Ÿ ๐Ÿ›ฉ ๐Ÿ›ฐ ๐Ÿ›ฅ ๐Ÿ›ณ ๐Ÿ—บ ๐ŸŸ โ›ฑ ๐Ÿ– ๐Ÿ ๐Ÿœ โ›ฐ ๐Ÿ” ๐Ÿ• ๐Ÿš ๐Ÿ˜ ๐Ÿ— ๐Ÿ› โ›ฉ ๐Ÿ›ค ๐Ÿ›ฃ ๐Ÿž ๐Ÿ™ ๐Ÿ–ฅ ๐Ÿ–จ ๐Ÿ–ฑ ๐Ÿ–ฒ ๐Ÿ•น ๐Ÿ—œ ๐Ÿ“ฝ ๐ŸŽž ๐ŸŽ™ ๐ŸŽš ๐ŸŽ› โฑ โฒ ๐Ÿ•ฏ ๐Ÿ—‘ ๐Ÿ›ข // โš’ ๐Ÿ›  โ› โ›“ ๐Ÿ—ก ๐Ÿ›ก ๐Ÿ•ณ ๐ŸŒก ๐Ÿ›Ž ๐Ÿ— ๐Ÿ›‹ ๐Ÿ› ๐Ÿ–ผ ๐Ÿ› ๐Ÿท ๐Ÿ—’ ๐Ÿ—“ ๐Ÿ—ƒ ๐Ÿ—ณ ๐Ÿ—„ ๐Ÿ—‚ ๐Ÿ—ž ๐Ÿ–‡ ๐Ÿ–Š ๐Ÿ–‹ ๐Ÿ–Œ ๐Ÿ– ๐Ÿ•‰ โธ โฏ โน โบ โญ โฎ ๐Ÿ‘โ€๐Ÿ—จ ๐Ÿ—ฏ ๐Ÿ•ฐ โ›ด ๐ŸŒซ ๐Ÿ€„ โ›„๏ธ โ›…๏ธ โ˜”๏ธ โ˜•๏ธ โšฝ๏ธ โšพ โ›ณ๏ธ โ›ต๏ธ โ›ฝ๏ธ โš“๏ธ โ›ฒ๏ธ โ›บ๏ธ โ›ช๏ธ โŒš๏ธ โŒ›๏ธ โ™ˆ๏ธ โ™‰๏ธ โ™Š๏ธ โ™‹๏ธ โ™Œ๏ธ โ™๏ธ โ™Ž๏ธ โ™๏ธ โ™๏ธ โ™‘๏ธ โ™’๏ธ โ™“๏ธ ๐Ÿˆš๏ธ โญ•๏ธ โ›”๏ธ โ—๏ธ ๐Ÿˆฏ๏ธ โ™ฟ๏ธ โšช๏ธ โšซ๏ธ โฌ›๏ธ โฌœ๏ธ โ—พ๏ธ โ—ฝ๏ธ

@mathiasbynens
Owner

Try again using the latest release!

const emojiRegex = require('emoji-regex');

const string = '\u{1F575}\uFE0F'; // '🕵️'
console.log(
	string.match(emojiRegex())
);
// → [ '🕵️' ]

Closing as fixed. Feel free to reopen or file a new bug in case I missed anything.

Ashoat added a commit to CommE2E/comm that referenced this issue Jun 7, 2023
Summary:
Turns out that [macOS](mathiasbynens/emoji-regex#28 (comment)) [appends](mathiasbynens/emoji-regex#68) the `U+FE0F` character to some Unicode emojis when you select them from the native OS emoji selector. It's not clear why Apple does this, or why it only happens for a certain set of emoji.

This still counts as [valid emoji Unicode](mathiasbynens/emoji-regex#28 (comment)). However, our `onlyOneEmojiRegex` thinks it's two emojis.

Our implementation of `onlyOneEmojiRegex` involves introspecting into the RegExp string that `emoji-regex` uses, and is not an officially supported approach by that package. `emoji-regex` supports matching emojis in text, and checking if the text includes only emoji. But checking for precisely one emoji is more complicated, and our approach (which is basically just extracting the raw RegExp and putting it inside of `/^()$/`) doesn't work in some scenarios where `U+FE0F` is suffixed.
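
A rough sketch of the anchoring approach described above, and of why a trailing `U+FE0F` breaks it. This is not the actual `onlyOneEmojiRegex` code: `emojiPattern` is a hypothetical two-emoji stand-in for the pattern exported by `emoji-regex`, and `onlyOneEmoji` is an illustrative name.

```javascript
// Hypothetical stand-in for the global pattern exported by emoji-regex.
// It matches only the detective (U+1F575) and the anchor (U+2693).
const emojiPattern = /\u{1F575}|\u{2693}/gu;

// Wrap the raw source in ^(...)$ so the whole string must be one match.
const onlyOneEmoji = new RegExp(`^(${emojiPattern.source})$`, 'u');

// A bare anchor emoji passes.
const plainAnchor = onlyOneEmoji.test('\u{2693}');

// The same emoji with the selector appended by the macOS picker fails,
// because U+FE0F sits between the match and the end-of-string anchor.
const pickerAnchor = onlyOneEmoji.test('\u{2693}\uFE0F');

// Two emoji in a row are rejected, as intended.
const twoEmoji = onlyOneEmoji.test('\u{2693}\u{1F575}');

console.log(plainAnchor, pickerAnchor, twoEmoji); // true false false
```

With a pattern that does not account for the selector, the anchored check rejects picker-produced strings even though they render as a single emoji.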

Luckily we don't use the native macOS emoji selector in any of our UIs, but it does look like @Ginsu used it to select some of the emojis. The diff adds a unit test to make sure all of the default emojis pass `onlyOneEmojiRegex`, and fixes all failing emojis.

Test Plan: I noticed that a test username of `at4` would match up with an anchor emoji as the default, and the anchor emoji was failing to be set. After this diff everything worked

Reviewers: ginsu, atul

Reviewed By: atul

Subscribers: tomek, ginsu

Differential Revision: https://phab.comm.dev/D8145