Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Get emoji list & aliases from data file instead of symlinks #47

Merged
merged 10 commits into from Jun 27, 2014

Conversation

Projects
None yet
6 participants
Member

mislav commented Jun 3, 2014

Alternative to #44

Previously, emoji name & unicode aliases were determined by following symlinks among images/emoji/*.png. This led to nontrivial code for resolving these aliases, made it tricky for contributors to add new aliases and inspect existing ones, and didn't leave room for adding metadata to emojis such as tags or descriptions from the Unicode spec.

Moreover, the aliases as symlinks led to duplication of image assets in users' applications, with hocho.png and knife.png representing the same emoji but being two separate images. Users were also unsure what to do with unicode/{HEX-NAME}.png files, which would end up among their images after running the :emoji task.

This change removes the symlinks support and creates the list of emojis and their aliases in emoji.txt. A single emoji is now represented with an Emoji::Character instance, which has the image_filename method to determine the path to the corresponding image instead of having to construct it manually.

This is kind of a breaking change, however. Since neither hocho.png nor knife.png images won't exist anymore in people's image assets (only the unicode/{HEX-CODE}.png version of each emoji), simply gsubbing :knife: with <img src="/images/emoji/knife.png"> won't work anymore. Instead, users need to look up the Emoji::Character and use its image_filename method. I believe this is a worthwhile breaking change, since it gives us leverage to shift things around internally in future revisions (such as changing the images naming scheme) and not have to break anything for users again.

/cc @josh @muan

Member

mislav commented Jun 3, 2014

Pros:

  • Ability to easily add new aliases, tags
  • Ability for contributors to be able to edit aliases/tags in the GitHub.com web interface
  • Handling individual emoji as actual Ruby objects
  • No symlinks means no cross-platform or gem problems
  • Fewer image files in users' image assets

Cons:

  • Can't easily replace syntax like :knife: with <img src="/images/emoji/knife.png">. Emoji filters like that in html-pipeline will have to be revisited.

Open questions:

  • Is the custom format of the emoji.txt file OK? I didn't want to go with YAML because I wanted to display raw emoji in the file, and YAML insists on escaping them.
  • I've preserved the old APIs such as Emoji.names, Emoji.unicodes, Emoji.unicode_for(...), but if this is going to lead to a major release of gemoji library, I would like to nuke those APIs as they are obsoleted by the new APIs that deal with Emoji::Character instances.
  • Could we maybe revisit functionality proposed in #45?

/cc @javan

Owner

josh commented Jun 3, 2014

General 👍

I think the directly alias linking is fine to 🔥. Maybe we just want to release this as a new major version.

Is the custom format of the emoji.txt file OK?

Yeah. Can we document what encoding that file is in. Assuming UTF-8, just want to be clear somewhere.

, I would like to nuke those APIs as they are obsoleted by the new APIs that deal with Emoji::Character instances.

🔥 them and lets do a major release.

Could we maybe revisit functionality proposed in 45

Yeah, the api should encompass image name and aliases now.

@josh josh and 1 other commented on an outdated diff Jun 3, 2014

+
+ if emoji.custom?
+ aliases = aliases[1..-1]
+ spec_aliases = []
+ puts "#{emoji.name} (custom)"
+ else
+ variation_codepoint = Emoji::Character::VARIATION_SELECTOR_16.codepoints[0]
+ chars = emoji.raw.codepoints.map { |code| UnicodeCharacter.fetch(code) unless code == variation_codepoint }.compact
+ spec_aliases = chars.each.flat_map(&:aliases)
+ puts "#{emoji.raw} (#{emoji.hex_inspect}) #{chars.map(&:description).join(' + ')}"
+ end
+
+ puts " = #{aliases.join(', ')}" if aliases.any?
+ puts " ~ #{tags.join(', ')}" if tags.any?
+ puts " + #{unicodes.map{|u| Emoji::Character.hex_inspect(u) }.join(', ')}" if unicodes.any?
+ # puts " * #{spec_aliases.join(', ')}" if spec_aliases.any?
@josh

josh Jun 3, 2014

Owner

Is dump supposed to write back out to the .txt file somewhere?

@mislav

mislav Jun 4, 2014

Member

I used it to create the .txt file via shell redirection, but yeah, it could write back to the .txt file maybe. Or I can just 💣 the dump script? Honestly, I don't think that the .txt file will need being generated programatically anymore because it's supposed to be edited by hand from now on.

Owner

josh commented Jun 3, 2014

/cc @aroben for emoji too

Collaborator

javan commented Jun 3, 2014

Is the custom format of the emoji.txt file OK?

What do you think about using JSON instead since we already store emoji in db/Category-Emoji.json?

Member

mislav commented Jun 4, 2014

On Wed, Jun 4, 2014 at 3:22 AM, Javan Makhmali notifications@github.com
wrote:

What do you think about using JSON instead since we already store emoji in
db/Category-Emoji.json?

I don't like JSON files as a source of data that's supposed to be manually
edited. I'd much rather that users append new aliases to a list such as = foo, bar, baz than to edit arrays like `["foo", "bar", "baz"]. Also, I
find the current text file much more readable and easier to navigate than a
huge JSON array with nested hashes.

We don't actually store emojis in Category-Emoji.json. That's simply a dump
of Apple's own configuration file which we're using in tests to check does
our list cover all known emojis. This dump is not used as source of data at
runtime and is not intended to be edited.

Owner

josh commented Jun 4, 2014

Question about aliases, do they have to be globally unique? How are we validating that if so?

Member

mislav commented Jun 4, 2014

On Wed, Jun 4, 2014 at 10:42 AM, Joshua Peek notifications@github.com
wrote:

Question about aliases, do they have to be globally unique? How are we
validating that if so?

They are supposed to. We're not enforcing that constraint, though. I'll
think about how to do that technically.

Collaborator

javan commented Jun 5, 2014

I'd much rather that users append new aliases to a list such as = foo, bar, baz

True, that is easier, but IMO we don't need to make it easy to add aliases. In fact, I'd rather freeze our current aliases (and possibly allow people to extend them in their app).

I'm more concerned with how it easy it us for us to add new emojis if / when Apple releases them.

Owner

josh commented Jun 5, 2014

and possibly allow people to extend them in their app

Good point.

I'm more concerned with how it easy it us for us to add new emojis if / when Apple releases them.

Yeah, a set of new emojis are currently being discussed by the unicode community

Member

mislav commented Jun 6, 2014

On Thu, Jun 5, 2014 at 7:43 PM, Javan Makhmali notifications@github.com
wrote:

IMO we don't need to make it easy to add aliases.

Yeah, I agree we shouldn't be going wild with new aliases. I intend to use
tags for completion suggestions, and to add better tags instead of better
aliases in the future.

In fact, I'd rather freeze our current aliases (and possibly allow people
to extend them in their app).

👍 for in-app extensions. Also, an API for adding custom emoji might be
useful as well. I'll think about how that might look like.

So do both of you think we should have a JSON format instead of plaintext
I've invented? It means we would have to add the JSON gem dependency since
JSON is not available by default in Ruby 1.8. Personally, I don't think we
gain anything with JSON format, but I'm open to being convinced.

As for new emojis, that's why we have "Category-Emoji.json" extracted from
OS X and the integrity test. When a new OS X gets released with new emojis,
we'll re-run that task, the build will start failing, and we'll simply
inject a list of new emojis into the old list. I might repurpose the dump
script to do just that, actually. The "tough" part will be getting the new
PNGs (I have no idea how the current ones were obtained)?

Collaborator

javan commented Jun 6, 2014

I'm not opposed to the plain text file and I like how readable it is. Just pushing back a little since it's a made up format, which makes it less useful universally. If, for example, http://www.emoji-cheat-sheet.com/ was built with a language other than Ruby, they'd have to write their own parser to consume our list of emojis.

The emoji PNGs were extracted with https://github.com/tmm1/emoji-extractor. It's awesome, but it saves the files like 1.png, 2.png, etc. so they have to be manually renamed to match their code points. Figuring out a way to automate that would be great. Here are a couple leads: tmm1/emoji-extractor#1

Member

mislav commented Jun 6, 2014

On Fri, Jun 6, 2014 at 8:57 PM, Javan Makhmali notifications@github.com
wrote:

If, for example, http://www.emoji-cheat-sheet.com/ was built with a
language other than Ruby, they'd have to write their own parser to consume
our list of emojis.

That's a good point.

Owner

josh commented Jun 6, 2014

JSON is not available by default in Ruby 1.8.

Don't worry about 1.8. Consider it 1.9+. And if you wanted 1.8, you could just install the json gem as a polyfill.

@mislav mislav referenced this pull request Jun 18, 2014

Closed

Adds support for aliases #44

Member

mislav commented Jun 20, 2014

@josh @javan OK the list of emoji is now in JSON. Comma-first style to allow adding/removing specific aliases or tags without affecting surrounding lines.

However, now that I think about it, I should probably configure CI or something to fail if someone tried to remove an existing alias via a PR, since that would constitute a breaking change. Will probably make such and other sanity checks in another PR.

I would merge this PR in an attempt to start moving master branch towards the next major release. Other tasks that I had in mind for separate PRs:

  • Create a dump script that will help extract new Unicode 7 emojis when Apple releases them
  • 🔥 the old API such as name_for() and unicode_for()
  • Create new APIs for users to customize the list of emojis in their apps: add new (custom) emoji, add new aliases/tags to existing emoji
Contributor

aroben commented Jun 20, 2014

Seems like we don't need emoji.txt anymore, right?

@aroben aroben commented on an outdated diff Jun 20, 2014

+ = closed_umbrella
+ ~ weather, rain
+💄 (1f484) lipstick
+ = lipstick
+ ~ makeup
+💛 (1f49b) yellow heart
+ = yellow_heart
+💙 (1f499) blue heart
+ = blue_heart
+💜 (1f49c) purple heart
+ = purple_heart
+💚 (1f49a) green heart
+ = green_heart
+❤️ (2764-fe0f) heavy black heart
+ = heart
+ ~ love
@aroben

aroben Jun 20, 2014

Contributor

The current release of gemoji also maps a plain U+2764 character to this image. Is it intentional that we're no longer doing that? (Windows shows an emoji for plain U+2764, OS X does not.)

@aroben aroben and 1 other commented on an outdated diff Jun 20, 2014

+🀄️ (1f004-fe0f) mahjong tile red dragon
+ = mahjong
+🎲 (1f3b2) game die
+ = game_die
+ ~ dice, gambling
+🎯 (1f3af) direct hit
+ = dart
+ ~ target
+🏈 (1f3c8) american football
+ = football
+ ~ sports
+🏀 (1f3c0) basketball and hoop
+ = basketball
+ ~ sports
+⚽️ (26bd-fe0f) soccer ball
+ = soccer
@aroben

aroben Jun 20, 2014

Contributor

Similarly, gemoji used to map U+26BD to this image. Both Windows and OS X do that. Is it intentional that we're no longer doing so? (There are more examples of this kind of thing, I just wanted to point out a couple.)

@aroben

aroben Jun 20, 2014

Contributor

Oh, I see now that this is handled in code. 💖

@mislav

mislav Jun 23, 2014

Member

Yep, this is handled in code. Your confusion about this, however, might be an indicator that this implicit handling is too magic and alternate codepoints should instead be listed explicitly in the file?

I kept this behavior implicit because there are many characters that use the fe0f alternate selector.

@aroben

aroben Jun 23, 2014

Contributor

I think your current solution is fine.

Contributor

aroben commented Jun 20, 2014

Did you generate emoji.json manually? Should we have a script that will generate it?

Contributor

aroben commented Jun 20, 2014

It would be great to sort emoji.txt and emoji.json in ascending code point order.

Contributor

aroben commented Jun 20, 2014

So many great things in this PR! I love the test that the README docs actually work.

Contributor

aroben commented Jun 20, 2014

Looks like emoji.json needs to be added to the gemspec.

Member

mislav commented Jun 23, 2014

Thanks @aroben for the review. I'll fix these items and work towards wrapping up a 2.0 release soon

mislav added some commits Mar 12, 2014

Add task to download Unicode 6.3 names list
This list contains textual descriptions for emoji characters.
Get emoji list & aliases from data file instead of symlinks
Previously, emoji name & unicode aliases were determined by following
symlinks among `images/emoji/*.png`. This led to nontrivial code for
resolving these aliases, made it tricky for contributors to add new
aliases and inspect existing ones, and didn't leave room for adding
metadata to emojis such as tags or descriptions from the Unicode spec.

Moreover, the aliases as symlinks led to duplication of image assets in
users' applications, with `hocho.png` and `knife.png` representing the
same emoji but being two separate images. Users were also unsure what to
do with `unicode/{HEX-NAME}.png` files, which would end up among their
images after running the `:emoji` task.

This change removes the symlinks support and creates the list of emojis
and their aliases in `emoji.json`. A single emoji is now represented with
an Emoji::Character instance, which has the `image_filename` method to
determine the path to the corresponding image instead of having to
construct it manually.
Add dump script to inspect Unicode descriptions & aliases
This script was used to construct the initial `emoji.json` file.

Example:

    rake db:dump | less -r
Improve emoji lookup fallback behavior
Instead of an unfriendly LocalJumpError, raise the Emoji::NotFound
exception with a helpful message if the block was not given.

If a fallback block was given, yield the value for which the lookup failed.
Have `unicode_aliases` include emoji's raw representation as well
This brings it up to par with `aliases`, which includes `name` as well.
Tweak tags for some emoji
Ensure that tags accurately describe the original meaning of the emoji
(with respect to its definition from Unicode spec).

mislav added a commit that referenced this pull request Jun 27, 2014

Merge pull request #47 from github/no-symlinks
Get emoji list & aliases from data file instead of symlinks

@mislav mislav merged commit d7b020f into master Jun 27, 2014

@mislav mislav deleted the no-symlinks branch Jun 27, 2014

Member

mislav commented Jun 27, 2014

@aroben I've nuked emoji.txt, fixed the gemspec, and adjusted the dump script to write out the JSON format.

@jwheare jwheare referenced this pull request in iamcal/emoji-data Jun 27, 2014

Closed

Use emoji.json from gemoji #9

Contributor

aroben commented Jun 27, 2014

⚡️⚡️⚡️

On Fri, Jun 27, 2014 at 6:33 AM, Mislav Marohnić notifications@github.com
wrote:

@aroben https://github.com/aroben I've nuked emoji.txt, fixed the
gemspec, and adjusted the dump script to write out the JSON format.


Reply to this email directly or view it on GitHub
#47 (comment).

Collaborator

jeremy commented on 5d24012 Jul 17, 2014

I found this API style pretty surprising, since I expect a missing lookup to return nil for nice operation with truthy conditionals, like

if emoji = Emoji.find_by_alias($1)
  ...
else
  ...
end

Raising an exception when the alias is missing does not feel actually exceptional—it's an expected common case. By raising, developers will often end up rescuing the exception for control flow rather than looking deeper to see that find_by_* has Hash#fetch-like semantics.

Owner

josh replied Jul 17, 2014

@jeremy yeah, I agree, this is kinda a smell.

Member

mislav replied Jul 17, 2014

I wanted a nil-free API. Agreed that it makes the truthy conditional a little more awkward by forcing you to append a { nil } block. I don't expect developers to start using these exceptions as control flow, because if they do, that's their own fault. I think in some cases an exception is in order if the user think they're sure that an emoji by that name exists, but for some reason it doesn't (or gets removed in the future).

Suggestions for another API are welcome, but we'd have to do another major or minor release with this. v2.0 is already out.

Collaborator

jeremy commented on lib/emoji.rb in 5d24012 Jul 17, 2014

IndexError is used for out of bounds indexes on Arrays. Probably want to subclass KeyError for a map lookup, or just use StandardError.

Member

mislav replied Jul 17, 2014

I didn't want to subclass KeyError because it doesn't exist on 1.8, but later I realized gemoji isn't 1.8 compatible anyway.

Collaborator

jeremy commented on lib/emoji.rb in 5d24012 Jul 17, 2014

Example of a more familiar API style, including a second arg for default value:

def find_by_alias(name)
  fetch_by_alias name, nil
end

ALIAS_MISSING = lambda { |name| raise NotFound, "Emoji not found by name: %s" % name.inspect }
def fetch_by_alias(name, *default_value, &default_block)
  default_block ||= ALIAS_MISSING
  names_index.fetch(name, *default_value, &default_block)
end
Member

mislav replied Jul 17, 2014

Example of a more familiar API style

I can be on board with this, sure. Please send a PR and we could make it v2.1 or something

Collaborator

jeremy replied Jul 18, 2014

Done: #53

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment