What constitutes an acceptable keyword? #194

Open
jacobwhall opened this issue Jul 2, 2021 · 6 comments

Comments

@jacobwhall

First of all, thank you for maintaining this repository!

I wrote a rudimentary emoji search program using your data, and noticed that, for example, "poop" does not match any of the keywords for 💩:

"💩": [
"pile_of_poo",
"hankey",
"shitface",
"fail",
"turd",
"shit"
],

There are a lot of other poop synonyms listed here, so I feel that "poop" would be an uncontroversial addition. But there are many synonyms for poop, and we might not want to include them all?

Another example I ran into was for 📱:

emojilib/dist/emoji-en-US.json

Lines 7259 to 7265 in f3169dc

"📱": [
"mobile_phone",
"technology",
"apple",
"gadgets",
"dial"
],

The first phrase I'd say if you asked me to identify this emoji is "cell phone." However, none of the keywords for this emoji would match "cell." Would it be appropriate to add "cell," "cell_phone," or "cellular_phone?" Are non-official keywords that use underscores OK, or should substrings like "phone" be added as well as "mobile_phone?"
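To make the matching question concrete, here is a minimal sketch of three ways a search program might compare a query against the keyword lists quoted in this issue. The helper names (`findEmojis`, `matchExact`, `matchSubstring`, `matchToken`) are hypothetical and the data is only the three entries quoted here, not the full emojilib file.

```javascript
// Sample data copied from the entries quoted in this issue (not the full file).
const sampleKeywords = {
  "💩": ["pile_of_poo", "hankey", "shitface", "fail", "turd", "shit"],
  "📱": ["mobile_phone", "technology", "apple", "gadgets", "dial"],
  "🍆": ["eggplant", "vegetable", "nature", "food", "aubergine"],
};

// Hypothetical helper: return every emoji with at least one keyword
// that satisfies the given predicate.
function findEmojis(data, predicate) {
  return Object.entries(data)
    .filter(([, keywords]) => keywords.some(predicate))
    .map(([char]) => char);
}

// Strategy 1: the query must equal a whole keyword.
const matchExact = (data, q) => findEmojis(data, (k) => k === q);

// Strategy 2: the query may appear anywhere inside a keyword.
const matchSubstring = (data, q) => findEmojis(data, (k) => k.includes(q));

// Strategy 3: the query must equal one underscore-separated token.
const matchToken = (data, q) =>
  findEmojis(data, (k) => k.split("_").includes(q));

console.log(matchExact(sampleKeywords, "phone"));     // no match
console.log(matchSubstring(sampleKeywords, "phone")); // matches 📱
console.log(matchToken(sampleKeywords, "phone"));     // matches 📱
console.log(matchSubstring(sampleKeywords, "cell"));  // no match
console.log(matchSubstring(sampleKeywords, "poop"));  // no match
```

Under substring or token matching, "phone" is already covered by "mobile_phone", so a separate "phone" keyword only matters to consumers that match whole keywords. "cell" and "poop" fail under all three strategies, so they would need new keywords either way.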

Finally, and I write this sincerely, I'd like to discuss 🍆:

emojilib/dist/emoji-en-US.json

Lines 4269 to 4275 in f3169dc

"🍆": [
"eggplant",
"vegetable",
"nature",
"food",
"aubergine"
],

This emoji is often used to signify a penis. Would it be acceptable to add "dick" or "penis" to the list of associated keywords for this emoji? I think that doing so would better reflect common usage, but might stray too far from Unicode's "intended use" for the emoji (if that's a thing).

I suggest that a section be added to CONTRIBUTING.md or README.md that gives guidance to future contributors about questions like these.

鈥nd that's how I posted a GitHub issue about poop, cell phones, and penises 馃お

@muan
Owner

muan commented Feb 3, 2022

Hey, sorry for the lack of response. I was largely away last year.

TBH I have not thought about this at length, but I agree with what you've written here. If pull requests were sent for these keywords, I'd accept them all.

I suggest that a section be added to CONTRIBUTING.md or README.md that gives guidance to future contributors about questions like these.

I agree. I'd be happy to accept a PR for this if anyone's willing to send one.

@thdoan

thdoan commented Jul 5, 2022

@jacobwhall I'm planning to fork this and start an emoji autocomplete project also. Have you settled on a fast way to search through the aliases? I was thinking about doing something like a filter, but not sure if there are faster options out there.

UPDATE: I did some performance tests, and I think for best performance I'm going to flatten the arrays into strings -- finding partial text matches in strings is faster than doing the same operation on arrays.

https://jsbench.me/zql58n0oew/1

When doing a partial match on every keystroke, every bit of performance counts ^^.
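The flattening idea could look roughly like the sketch below. This is not the benchmark code from the jsbench link. The helper names (`flattenKeywords`, `searchFlattened`) and the space-joined format are assumptions made for illustration, and the data is just two entries quoted earlier in this issue.

```javascript
// Sample data copied from entries quoted in this issue (not the full file).
const keywordArrays = {
  "💩": ["pile_of_poo", "hankey", "shitface", "fail", "turd", "shit"],
  "📱": ["mobile_phone", "technology", "apple", "gadgets", "dial"],
};

// Done once, up front: join each keyword array into one space-separated
// string. Assumes keywords never contain spaces (true for the entries above).
function flattenKeywords(data) {
  return Object.entries(data).map(([char, keywords]) => ({
    char,
    haystack: keywords.join(" "),
  }));
}

// Done on every keystroke: one String.prototype.includes call per emoji
// instead of one comparison per keyword.
function searchFlattened(index, query) {
  const q = query.trim().toLowerCase();
  if (q === "") return [];
  return index
    .filter((entry) => entry.haystack.includes(q))
    .map((entry) => entry.char);
}

const flatIndex = flattenKeywords(keywordArrays);
console.log(searchFlattened(flatIndex, "gadg")); // matches 📱
console.log(searchFlattened(flatIndex, "tur"));  // matches 💩
```

One caveat with this layout: a query that itself contains a space (for example "fail turd") can match across the boundary between two keywords, so the query may need to be split or rejected if that matters. Whether the flattened form is actually faster will depend on the engine and data size, so it is worth re-running a benchmark on real data.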

@jacobwhall
Author

@thdoan sounds like you've done as much as I have. I wrote an emoji picker in Python that you're welcome to check out. The search works surprisingly well!

@thdoan

thdoan commented Jul 8, 2022

@jacobwhall cool, I'm experimenting with an emoji autocomplete by leveraging the browser's native datalist functionality. However, I've decided to start my emojis map from scratch based on https://emojipedia.org/ (all tedious manual work since they closed their API). We'll see how it goes.

@JoshuaKGoldberg
Collaborator

+1, having docs on this would be great. I'm working on omnidan/node-emoji#132 to bring node-emoji to emojilib@3. The test cases in that draft PR are showing a lot of places where emojilib@3 removed conveniences the library relied on. For example, "heart" shows up in a few emojis, but not ❤️ itself:

"❤️": [
"red_heart",
"love",
"like",
"valentines"
],

I wrote a quick script to find discrepancies:

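// npm i emojilib-2@npm:emojilib@2 emojilib-3@npm:emojilib@3
// emojilib@2 is keyed by name; each entry carries its emoji in `char`.
const { lib: emojisV2 } = await import("emojilib-2");
// emojilib@3 is a JSON map keyed by emoji character.
// Newer Node releases spell the import attribute `with`; older ones used `assert`.
const { default: emojisV3 } = await import("emojilib-3", {
  with: { type: "json" },
});

const missing = [];
const missingIgnoringAliases = [];

for (const [nameV2, detailsV2] of Object.entries(emojisV2)) {
  // Skip names that still appear in the v3 keyword list for the same emoji.
  const detailsV3 = emojisV3[detailsV2.char];
  if (detailsV3?.includes(nameV2)) {
    continue;
  }

  const complaint = { nameV2, detailsV2, detailsV3 };
  missing.push(complaint);

  // Second count: ignore emojis whose v3 primary name starts with
  // flag_, two_, or smiling_face_with_, or ends with _face.
  const primaryAlias = detailsV3?.[0];
  if (
    primaryAlias &&
    !/^(?:flag|two|smiling_face_with)_|_face$/.test(primaryAlias)
  ) {
    missingIgnoringAliases.push(complaint);
  }
}

console.table({
  "Missing in general": missing.length,
  "Missing ignoring a few quick aliases": missingIgnoringAliases.length,
});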
┌──────────────────────────────────────┬────────┐
│               (index)                │ Values │
├──────────────────────────────────────┼────────┤
│          Missing in general          │  678   │
│ Missing ignoring a few quick aliases │  456   │
└──────────────────────────────────────┴────────┘

@muan is there a description anywhere of how #178's lists were generated? Or, if not, could you speak to how you generated it?

@muan
Owner

muan commented Sep 22, 2023

@muan is there a description anywhere of how #178's lists were generated? Or, if not, could you speak to how you generated it?

I believe I had some hacked-together local scripts, so I don't recall the exact differences. But here's what might have happened:

Previously this project was built exclusively for GitHub shortcodes at our internal hackathon, and with v3 I decided to move away from that. So the primary key became the official Unicode names, which would explain why tada was replaced with party popper and poop was replaced by pile of poo.

IIRC, the official name of the emoji changes with each version sometimes too (gun -> water gun), which was why I made the character be the key now.

I feel like I would/should have done the work to compare and keep the GitHub shortcodes but I guess I did not.

So to add them all back, a name/alias comparison between GitHub's set and the unicode set could potentially do the trick.
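A rough sketch of what such a comparison might look like is below. Everything here is illustrative: `githubShortcodes` is a hypothetical hand-written sample (the real GitHub list is not reproduced), the 🎉 keyword list is made up for the example rather than copied from the real file, and the function names are assumptions, not part of emojilib.

```javascript
// Hypothetical sample of GitHub shortcodes, keyed by shortcode name.
const githubShortcodes = {
  tada: "🎉",
  poop: "💩",
  hankey: "💩",
};

// emojilib@3-style entries keyed by character. The 💩 list is copied from
// this issue; the 🎉 list is illustrative only.
const unicodeKeywords = {
  "🎉": ["party_popper", "party"],
  "💩": ["pile_of_poo", "hankey", "shitface", "fail", "turd", "shit"],
};

// Collect, per emoji, the shortcodes that are absent from its keyword list.
function findMissingShortcodes(shortcodes, keywordsByChar) {
  const missing = {};
  for (const [name, char] of Object.entries(shortcodes)) {
    const keywords = keywordsByChar[char] ?? [];
    if (!keywords.includes(name)) {
      if (!missing[char]) missing[char] = [];
      missing[char].push(name);
    }
  }
  return missing;
}

// Build a new map with the missing shortcodes appended to each list.
// The inputs are left unmodified.
function addMissingShortcodes(shortcodes, keywordsByChar) {
  const missing = findMissingShortcodes(shortcodes, keywordsByChar);
  const merged = {};
  for (const [char, keywords] of Object.entries(keywordsByChar)) {
    merged[char] = [...keywords, ...(missing[char] ?? [])];
  }
  // Emojis that only exist in the shortcode set get a fresh entry.
  for (const [char, names] of Object.entries(missing)) {
    if (!(char in merged)) merged[char] = [...names];
  }
  return merged;
}

console.log(findMissingShortcodes(githubShortcodes, unicodeKeywords));
console.log(addMissingShortcodes(githubShortcodes, unicodeKeywords)["💩"]);
```

With this sample, "tada" and "poop" are reported as missing while "hankey" is not. One thing to watch for on real data: the same emoji can be encoded with or without the U+FE0F variation selector, so the character keys of the two sets may need normalizing before lookup.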
