Regex does not match many objects from Apple emoji palette, but does match the same emojis from Android #68

Closed
kenpowers-signal opened this issue May 11, 2020 · 6 comments

@kenpowers-signal

kenpowers-signal commented May 11, 2020

There are several emojis which can be inserted from the iOS / macOS emoji pickers which are not recognized by the regex provided by this library, but the same emojis inserted from the Android emoji picker are recognized.

const emojiRegex = require("emoji-regex");
const regex = emojiRegex();

const ios = ['⏱', '⏲', '🕰', '⌛️', '⏳', '🎛'];
const android = ['⏱️', '⏲️', '🕰️', '⌛', '⏳', '🎛️'];

console.log({ ios: ios.map(e => regex.test(e)), android: android.map(e => regex.test(e)) });

Runkit output:

Object {android: [true, false, true, false, true, false], ios: [false, false, false, true, false, false]}
  • Android emojis recognized: 3/6
  • iOS emojis recognized: 1/6
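
One caveat when reading the output above: the regex returned by `emojiRegex()` carries the global (`g`) flag, and `RegExp.prototype.test` on a global regex resumes from `lastIndex` after each successful match. Reusing one instance across separate strings can therefore produce false negatives, which may account for the alternating pattern in the Android results. Below is a minimal sketch of the effect. It uses a simplified stand-in pattern (an assumption for illustration, not the library's actual regex) so that it runs without dependencies:

```javascript
// Simplified stand-in for the emoji-regex pattern (assumption for illustration):
// matches U+23F1 or U+23F2 followed by U+FE0F, or a bare U+231B / U+23F3.
const makeRegex = () => /[\u23F1\u23F2]\uFE0F|[\u231B\u23F3]/g;

// Every one of these strings matches the stand-in pattern on its own.
const samples = ["\u23F1\uFE0F", "\u23F2\uFE0F", "\u231B", "\u23F3"];

// Reusing one global regex: test() resumes from lastIndex after a match,
// so the string tested right after a successful match can fail.
const shared = makeRegex();
const sharedResults = samples.map((s) => shared.test(s));

// Resetting lastIndex before each call removes the carried-over state.
const resetResults = samples.map((s) => {
  shared.lastIndex = 0;
  return shared.test(s);
});

console.log({ sharedResults, resetResults });
// sharedResults: [true, false, true, false]
// resetResults:  [true, true, true, true]
```

Rerunning the original test with `lastIndex` reset (or with a fresh regex per string) would show which of the reported failures are real mismatches.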

I haven't dug into the unicode to see what's happening just yet.

@gilmoreorless
Contributor

I've noticed previously that Apple has a loose conformance with the Unicode spec around emoji characters — specifically which ones do and don't need the special U+FE0F presentation selector appended. See #28 for a discussion about this problem. Perhaps Android has the same non-conformance problem, but in a different way.

Regarding your specific example, there's definitely a difference in the output between the platforms, even though they have the same visual appearance on my machine.

const listCodePoints = (arr) =>
  arr.map(
    (e) => [...e].map(
      (cp) => `U+${cp.codePointAt(0).toString(16).toUpperCase()}`
    ).join(' ')
  );

console.log(listCodePoints(ios));
// [ "U+23F1", "U+23F2", "U+1F570", "U+231B U+FE0F", "U+23F3", "U+1F39B" ]

console.log(listCodePoints(android));
// [ "U+23F1 U+FE0F", "U+23F2 U+FE0F", "U+1F570 U+FE0F", "U+231B", "U+23F3", "U+1F39B U+FE0F" ]
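
The two listings above differ only in where U+FE0F appears. As a rough sketch of how that could be confirmed, the snippet below strips U+FE0F from both lists and compares the results. The `stripVS16` helper is a hypothetical name introduced here, and the arrays are rebuilt from the code points printed above:

```javascript
// Arrays rebuilt from the code point listings in this comment.
const ios = ["\u23F1", "\u23F2", "\u{1F570}", "\u231B\uFE0F", "\u23F3", "\u{1F39B}"];
const android = [
  "\u23F1\uFE0F", "\u23F2\uFE0F", "\u{1F570}\uFE0F",
  "\u231B", "\u23F3", "\u{1F39B}\uFE0F",
];

// Hypothetical helper: remove every U+FE0F (variation selector-16).
const stripVS16 = (s) => s.replace(/\uFE0F/g, "");

const normalizedIos = ios.map(stripVS16);
const normalizedAndroid = android.map(stripVS16);

// True when both platforms produced the same base characters.
const sameBase = normalizedIos.every((s, i) => s === normalizedAndroid[i]);
console.log(sameBase); // true
```

Stripping the selector is only suitable for comparison like this. It can change how a character renders, so it is not a general-purpose normalization.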

@scottnonnenberg-signal

@gilmoreorless Does it make sense to make emoji-regex a little looser to allow for this?

@gilmoreorless
Contributor

@scottnonnenberg-signal It does make sense as a potential variation. I pondered about a "loose" variant in #33 (comment), but that was about a slightly different problem. The simple answer is no-one has yet done the work to add one — @mathiasbynens pointed out it's not a straightforward task in #28 (comment).
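
To make the idea of a "loose" variant concrete, here is a minimal sketch that accepts each base character with or without a trailing U+FE0F. The pattern is a hypothetical stand-in covering only the six emoji from this thread. It is not the pattern emoji-regex ships, and a real variant would also have to cope with ZWJ sequences, keycaps, flags, and skin-tone modifiers:

```javascript
// Hypothetical stand-in covering only the six emoji discussed in this issue.
// The trailing \uFE0F? makes the variation selector optional.
const looseRegex = /^[\u23F1\u23F2\u231B\u23F3\u{1F570}\u{1F39B}]\uFE0F?$/u;

const ios = ["\u23F1", "\u23F2", "\u{1F570}", "\u231B\uFE0F", "\u23F3", "\u{1F39B}"];
const android = [
  "\u23F1\uFE0F", "\u23F2\uFE0F", "\u{1F570}\uFE0F",
  "\u231B", "\u23F3", "\u{1F39B}\uFE0F",
];

// No g flag here, so test() keeps no state between calls.
const allMatch = [...ios, ...android].every((s) => looseRegex.test(s));
console.log(allMatch); // true

// Two emoji in a row are still rejected by the anchored pattern.
const twoEmoji = looseRegex.test("\u23F1\u23F2");
console.log(twoEmoji); // false
```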

@mathiasbynens
Owner

The best long-term solution is for Apple to respect the Unicode Standard and not deviate from it. In recent macOS updates it seems like emoji input has improved in terms of spec compliance, so I'm hopeful.

@kenpowers-signal Do you have an up-to-date iOS device handy? Could you try inputting those emoji again on the latest iOS? I wonder if the variation selectors are still missing.

@jayna37

jayna37 commented Dec 2, 2020

iOS 14.2.1
⏱⏲🕰⌛️⏳🎛
macOS 11.0.1 emoji picker
⏱⏲🕰⌛️⏳🎛
macOS 11.0.1 Japanese IME (without control knobs cause I don't know how to input)
⏱⏲🕰⌛︎⏳

Apparently all the same as earlier iOS.

@mathiasbynens
Owner

v10.0.0 now leverages emoji-test-regex-pattern, which has a dedicated list of emojis that Apple's iOS emoji picker enters in overqualified form: https://github.com/mathiasbynens/emoji-test-regex-pattern/blob/89818e015d94a8d31c7fe30444f9ac7030908f14/script/get-sequences.js#L1-L48 Please try v10.0.0 and see if it helps.

Ashoat added a commit to CommE2E/comm that referenced this issue Jun 7, 2023
Summary:
Turns out that [macOS](mathiasbynens/emoji-regex#28 (comment)) [appends](mathiasbynens/emoji-regex#68) the `U+FE0F` character to some Unicode emojis when you select them from the native OS emoji selector. It's not clear why Apple does this, or why it only happens for a certain set of emoji.

This still counts as [valid emoji Unicode](mathiasbynens/emoji-regex#28 (comment)). However, our `onlyOneEmojiRegex` thinks it's two emojis.

Our implementation of `onlyOneEmojiRegex` involves introspecting into the RegExp string that `emoji-regex` uses, and is not an officially supported approach by that package. `emoji-regex` supports matching emojis in text, and checking if the text includes only emoji. But checking for precisely one emoji is more complicated, and our approach (which is basically just extracting the raw RegExp and putting it inside of `/^()$/`) doesn't work in some scenarios where `U+FE0F` is suffixed.
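
The wrapping approach described above can be sketched roughly as follows. The `emojiSource` string is a hypothetical stand-in for the pattern source extracted from `emoji-regex`, and `onlyOneStrict` / `onlyOneLoose` are illustrative names, not the actual implementation in this diff:

```javascript
// Hypothetical stand-in for the pattern source extracted from emoji-regex:
// bare U+231B / U+23F3, or U+23F1 / U+23F2 followed by U+FE0F.
const emojiSource = "[\\u231B\\u23F3]|[\\u23F1\\u23F2]\\uFE0F";

// Anchoring the source so the whole string must be exactly one emoji.
const onlyOneStrict = new RegExp(`^(?:${emojiSource})$`);

// Assumed workaround: also tolerate a single trailing U+FE0F.
const onlyOneLoose = new RegExp(`^(?:${emojiSource})\\uFE0F?$`);

// U+231B with an extra U+FE0F, as the macOS picker enters it.
const overqualified = "\u231B\uFE0F";

const strictResult = onlyOneStrict.test(overqualified);
const looseResult = onlyOneLoose.test(overqualified);
const looseTwoEmoji = onlyOneLoose.test("\u231B\u23F3");

console.log({ strictResult, looseResult, looseTwoEmoji });
// { strictResult: false, looseResult: true, looseTwoEmoji: false }
```

The strict form rejects the overqualified string because the anchored pattern has no way to consume the trailing `U+FE0F`, which matches the failure described here.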

Luckily we don't use the native macOS emoji selector in any of our UIs, but it does look like @Ginsu used it to select some of the emojis. The diff adds a unit test to make sure all of the default emojis pass `onlyOneEmojiRegex`, and fixes all failing emojis.

Test Plan: I noticed that a test username of `at4` would match up with an anchor emoji as the default, and the anchor emoji was failing to be set. After this diff everything worked

Reviewers: ginsu, atul

Reviewed By: atul

Subscribers: tomek, ginsu

Differential Revision: https://phab.comm.dev/D8145