Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get emoji list & aliases from data file instead of symlinks #47

Merged
merged 10 commits into from
Jun 27, 2014

Conversation

mislav
Copy link
Contributor

@mislav mislav commented Jun 3, 2014

Alternative to #44

Previously, emoji name & unicode aliases were determined by following symlinks among images/emoji/*.png. This led to nontrivial code for resolving these aliases, made it tricky for contributors to add new aliases and inspect existing ones, and didn't leave room for adding metadata to emojis such as tags or descriptions from the Unicode spec.

Moreover, the aliases as symlinks led to duplication of image assets in users' applications, with hocho.png and knife.png representing the same emoji but being two separate images. Users were also unsure what to do with unicode/{HEX-NAME}.png files, which would end up among their images after running the :emoji task.

This change removes the symlinks support and creates the list of emojis and their aliases in emoji.txt. A single emoji is now represented with an Emoji::Character instance, which has the image_filename method to determine the path to the corresponding image instead of having to construct it manually.

This is kind of a breaking change, however. Since neither hocho.png nor knife.png images won't exist anymore in people's image assets (only the unicode/{HEX-CODE}.png version of each emoji), simply gsubbing :knife: with <img src="/images/emoji/knife.png"> won't work anymore. Instead, users need to look up the Emoji::Character and use its image_filename method. I believe this is a worthwhile breaking change, since it gives us leverage to shift things around internally in future revisions (such as changing the images naming scheme) and not have to break anything for users again.

/cc @josh @muan

@mislav
Copy link
Contributor Author

mislav commented Jun 3, 2014

Pros:

  • Ability to easily add new aliases, tags
  • Ability for contributors to be able to edit aliases/tags in the GitHub.com web interface
  • Handling individual emoji as actual Ruby objects
  • No symlinks means no cross-platform or gem problems
  • Fewer image files in users' image assets

Cons:

  • Can't easily replace syntax like :knife: with <img src="/images/emoji/knife.png">. Emoji filters like that in html-pipeline will have to be revisited.

Open questions:

  • Is the custom format of the emoji.txt file OK? I didn't want to go with YAML because I wanted to display raw emoji in the file, and YAML insists on escaping them.
  • I've preserved the old APIs such as Emoji.names, Emoji.unicodes, Emoji.unicode_for(...), but if this is going to lead to a major release of gemoji library, I would like to nuke those APIs as they are obsoleted by the new APIs that deal with Emoji::Character instances.
  • Could we maybe revisit functionality proposed in Add method to add your own custom emoji after loading the library #45?

/cc @javan

@josh
Copy link
Contributor

josh commented Jun 3, 2014

General 👍

I think the directly alias linking is fine to 🔥. Maybe we just want to release this as a new major version.

Is the custom format of the emoji.txt file OK?

Yeah. Can we document what encoding that file is in. Assuming UTF-8, just want to be clear somewhere.

, I would like to nuke those APIs as they are obsoleted by the new APIs that deal with Emoji::Character instances.

🔥 them and lets do a major release.

Could we maybe revisit functionality proposed in 45

Yeah, the api should encompass image name and aliases now.

puts " = #{aliases.join(', ')}" if aliases.any?
puts " ~ #{tags.join(', ')}" if tags.any?
puts " + #{unicodes.map{|u| Emoji::Character.hex_inspect(u) }.join(', ')}" if unicodes.any?
# puts " * #{spec_aliases.join(', ')}" if spec_aliases.any?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is dump supposed to write back out to the .txt file somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used it to create the .txt file via shell redirection, but yeah, it could write back to the .txt file maybe. Or I can just 💣 the dump script? Honestly, I don't think that the .txt file will need being generated programatically anymore because it's supposed to be edited by hand from now on.

@josh
Copy link
Contributor

josh commented Jun 3, 2014

/cc @aroben for emoji too

@javan
Copy link
Contributor

javan commented Jun 3, 2014

Is the custom format of the emoji.txt file OK?

What do you think about using JSON instead since we already store emoji in db/Category-Emoji.json?

@mislav
Copy link
Contributor Author

mislav commented Jun 4, 2014

On Wed, Jun 4, 2014 at 3:22 AM, Javan Makhmali notifications@github.com
wrote:

What do you think about using JSON instead since we already store emoji in
db/Category-Emoji.json?

I don't like JSON files as a source of data that's supposed to be manually
edited. I'd much rather that users append new aliases to a list such as = foo, bar, baz than to edit arrays like `["foo", "bar", "baz"]. Also, I
find the current text file much more readable and easier to navigate than a
huge JSON array with nested hashes.

We don't actually store emojis in Category-Emoji.json. That's simply a dump
of Apple's own configuration file which we're using in tests to check does
our list cover all known emojis. This dump is not used as source of data at
runtime and is not intended to be edited.

@josh
Copy link
Contributor

josh commented Jun 4, 2014

Question about aliases, do they have to be globally unique? How are we validating that if so?

@mislav
Copy link
Contributor Author

mislav commented Jun 4, 2014

On Wed, Jun 4, 2014 at 10:42 AM, Joshua Peek notifications@github.com
wrote:

Question about aliases, do they have to be globally unique? How are we
validating that if so?

They are supposed to. We're not enforcing that constraint, though. I'll
think about how to do that technically.

@javan
Copy link
Contributor

javan commented Jun 5, 2014

I'd much rather that users append new aliases to a list such as = foo, bar, baz

True, that is easier, but IMO we don't need to make it easy to add aliases. In fact, I'd rather freeze our current aliases (and possibly allow people to extend them in their app).

I'm more concerned with how it easy it us for us to add new emojis if / when Apple releases them.

@josh
Copy link
Contributor

josh commented Jun 5, 2014

and possibly allow people to extend them in their app

Good point.

I'm more concerned with how it easy it us for us to add new emojis if / when Apple releases them.

Yeah, a set of new emojis are currently being discussed by the unicode community

@mislav
Copy link
Contributor Author

mislav commented Jun 6, 2014

On Thu, Jun 5, 2014 at 7:43 PM, Javan Makhmali notifications@github.com
wrote:

IMO we don't need to make it easy to add aliases.

Yeah, I agree we shouldn't be going wild with new aliases. I intend to use
tags for completion suggestions, and to add better tags instead of better
aliases in the future.

In fact, I'd rather freeze our current aliases (and possibly allow people
to extend them in their app).

👍 for in-app extensions. Also, an API for adding custom emoji might be
useful as well. I'll think about how that might look like.

So do both of you think we should have a JSON format instead of plaintext
I've invented? It means we would have to add the JSON gem dependency since
JSON is not available by default in Ruby 1.8. Personally, I don't think we
gain anything with JSON format, but I'm open to being convinced.

As for new emojis, that's why we have "Category-Emoji.json" extracted from
OS X and the integrity test. When a new OS X gets released with new emojis,
we'll re-run that task, the build will start failing, and we'll simply
inject a list of new emojis into the old list. I might repurpose the dump
script to do just that, actually. The "tough" part will be getting the new
PNGs (I have no idea how the current ones were obtained)?

@javan
Copy link
Contributor

javan commented Jun 6, 2014

I'm not opposed to the plain text file and I like how readable it is. Just pushing back a little since it's a made up format, which makes it less useful universally. If, for example, http://www.emoji-cheat-sheet.com/ was built with a language other than Ruby, they'd have to write their own parser to consume our list of emojis.

The emoji PNGs were extracted with https://github.com/tmm1/emoji-extractor. It's awesome, but it saves the files like 1.png, 2.png, etc. so they have to be manually renamed to match their code points. Figuring out a way to automate that would be great. Here are a couple leads: tmm1/emoji-extractor#1

@mislav
Copy link
Contributor Author

mislav commented Jun 6, 2014

On Fri, Jun 6, 2014 at 8:57 PM, Javan Makhmali notifications@github.com
wrote:

If, for example, http://www.emoji-cheat-sheet.com/ was built with a
language other than Ruby, they'd have to write their own parser to consume
our list of emojis.

That's a good point.

@josh
Copy link
Contributor

josh commented Jun 6, 2014

JSON is not available by default in Ruby 1.8.

Don't worry about 1.8. Consider it 1.9+. And if you wanted 1.8, you could just install the json gem as a polyfill.

@mislav mislav mentioned this pull request Jun 18, 2014
@mislav
Copy link
Contributor Author

mislav commented Jun 20, 2014

@josh @javan OK the list of emoji is now in JSON. Comma-first style to allow adding/removing specific aliases or tags without affecting surrounding lines.

However, now that I think about it, I should probably configure CI or something to fail if someone tried to remove an existing alias via a PR, since that would constitute a breaking change. Will probably make such and other sanity checks in another PR.

I would merge this PR in an attempt to start moving master branch towards the next major release. Other tasks that I had in mind for separate PRs:

  • Create a dump script that will help extract new Unicode 7 emojis when Apple releases them
  • 🔥 the old API such as name_for() and unicode_for()
  • Create new APIs for users to customize the list of emojis in their apps: add new (custom) emoji, add new aliases/tags to existing emoji

@aroben
Copy link
Contributor

aroben commented Jun 20, 2014

Seems like we don't need emoji.txt anymore, right?

= green_heart
❤️ (2764-fe0f) heavy black heart
= heart
~ love
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current release of gemoji also maps a plain U+2764 character to this image. Is it intentional that we're no longer doing that? (Windows shows an emoji for plain U+2764, OS X does not.)

@aroben
Copy link
Contributor

aroben commented Jun 20, 2014

Did you generate emoji.json manually? Should we have a script that will generate it?

@aroben
Copy link
Contributor

aroben commented Jun 20, 2014

It would be great to sort emoji.txt and emoji.json in ascending code point order.

@aroben
Copy link
Contributor

aroben commented Jun 20, 2014

So many great things in this PR! I love the test that the README docs actually work.

@aroben
Copy link
Contributor

aroben commented Jun 20, 2014

Looks like emoji.json needs to be added to the gemspec.

@mislav
Copy link
Contributor Author

mislav commented Jun 23, 2014

Thanks @aroben for the review. I'll fix these items and work towards wrapping up a 2.0 release soon

Previously, emoji name & unicode aliases were determined by following
symlinks among `images/emoji/*.png`. This led to nontrivial code for
resolving these aliases, made it tricky for contributors to add new
aliases and inspect existing ones, and didn't leave room for adding
metadata to emojis such as tags or descriptions from the Unicode spec.

Moreover, the aliases as symlinks led to duplication of image assets in
users' applications, with `hocho.png` and `knife.png` representing the
same emoji but being two separate images. Users were also unsure what to
do with `unicode/{HEX-NAME}.png` files, which would end up among their
images after running the `:emoji` task.

This change removes the symlinks support and creates the list of emojis
and their aliases in `emoji.json`. A single emoji is now represented with
an Emoji::Character instance, which has the `image_filename` method to
determine the path to the corresponding image instead of having to
construct it manually.
This list contains textual descriptions for emoji characters.
This script was used to construct the initial `emoji.json` file.

Example:

    rake db:dump | less -r
Instead of an unfriendly LocalJumpError, raise the Emoji::NotFound
exception with a helpful message if the block was not given.

If a fallback block was given, yield the value for which the lookup failed.
This brings it up to par with `aliases`, which includes `name` as well.
Ensure that tags accurately describe the original meaning of the emoji
(with respect to its definition from Unicode spec).
mislav added a commit that referenced this pull request Jun 27, 2014
Get emoji list & aliases from data file instead of symlinks
@mislav mislav merged commit d7b020f into master Jun 27, 2014
@mislav mislav deleted the no-symlinks branch June 27, 2014 10:33
@mislav
Copy link
Contributor Author

mislav commented Jun 27, 2014

@aroben I've nuked emoji.txt, fixed the gemspec, and adjusted the dump script to write out the JSON format.

@aroben
Copy link
Contributor

aroben commented Jun 27, 2014

⚡⚡⚡

On Fri, Jun 27, 2014 at 6:33 AM, Mislav Marohnić notifications@github.com
wrote:

@aroben https://github.com/aroben I've nuked emoji.txt, fixed the
gemspec, and adjusted the dump script to write out the JSON format.


Reply to this email directly or view it on GitHub
#47 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants