😄 Emoji synonyms to build your own emoji-capable search engine (Elasticsearch, Solr)

Emoji, flags and emoticons support for Elasticsearch

Add support for emoji and flags in any Lucene-compatible search engine!

If you wish to search 🍩 to find donuts in your documents, you have come to the right place.

The analysis-emoji Plugin

To index emoji, you need a custom tokenizer that does not treat them as punctuation. You can either build an analyzer with the whitespace tokenizer as described here, or use this plugin.
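
To see why the tokenizer matters, here is a rough Python sketch. It is not Lucene's actual tokenization: it only contrasts a tokenizer that keeps word characters (and so silently drops 🍩) with a plain whitespace split (which keeps it). The sample sentence and both function names are made up for this illustration.

```python
import re

def word_char_tokenize(text):
    # Keeps runs of letters, digits and underscores only; emoji are
    # symbols, so they are discarded much like punctuation.
    return re.findall(r"\w+", text)

def whitespace_tokenize(text):
    # Splits on whitespace only, so a standalone emoji survives as a token.
    return text.split()

sample = "I love 🍩 !"
print(word_char_tokenize(sample))   # ['I', 'love']
print(whitespace_tokenize(sample))  # ['I', 'love', '🍩', '!']
```

The trade-off shows in the second result: a whitespace split also keeps punctuation such as "!" as a token, and it would leave "donut🍩" glued together as a single token.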

The plugin exposes a new emoji_tokenizer, based on icu_tokenizer but with custom BreakIterator rules to keep emoji!

Head over to the /esplugin directory for installation instructions.

The Synonyms, flags and emoticons

Once you have a 🍩 token, you need to expand it to the token "donut", in your language. That's the goal of the synonym dictionaries.

We build Solr / Lucene-compatible synonym files in all languages supported by Unicode CLDR so you can set them up in an analyzer. They look like this:

👩‍🚒 => 👩‍🚒, firefighter, firetruck, woman
👩‍✈ => 👩‍✈, pilot, plane, woman
🥓 => 🥓, bacon, meat, food
🥔 => 🥔, potato, vegetable, food
😅 => 😅, cold, face, open, smile, sweat
😆 => 😆, face, laugh, mouth, open, satisfied, smile
🚎 => 🚎, bus, tram, trolley
🇫🇷 => 🇫🇷, france
🇬🇧 => 🇬🇧, united kingdom
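
As a rough picture of what these explicit "=>" rules do at analysis time, here is a small Python sketch. It is not how Lucene's synonym filter is implemented: the real filter emits synonyms at the same token position and analyzes multi-word entries such as "united kingdom". The helper names parse_synonyms and expand are hypothetical, and the flat output list is a simplification.

```python
def parse_synonyms(lines):
    """Parse explicit rules of the form 'source => target1, target2'."""
    rules = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, arrow, targets = line.partition("=>")
        if not arrow:
            continue  # lines without '=>' are ignored in this sketch
        rules[source.strip()] = [t.strip() for t in targets.split(",") if t.strip()]
    return rules


def expand(tokens, rules):
    """Replace each token that has a rule by its targets; keep the others."""
    expanded = []
    for token in tokens:
        expanded.extend(rules.get(token, [token]))
    return expanded


rules = parse_synonyms([
    "🥓 => 🥓, bacon, meat, food",
    "🇫🇷 => 🇫🇷, france",
])
print(expand(["🥓", "sandwich"], rules))
# ['🥓', 'bacon', 'meat', 'food', 'sandwich']
```

Because each emoji appears on both sides of its rule, the original token is kept next to its keywords, so a query for 🥓 and a query for "bacon" can both match.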

For emoticons, use this mapping with a char_filter to replace emoticons with emoji.
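
The effect of such a char_filter can be sketched in Python as a plain text replacement that runs before tokenization. The entries below are illustrative guesses, not copied from emoticons.txt, and apply_emoticons is a hypothetical helper. Trying longer emoticons first mirrors the greedy, longest-match behaviour documented for Elasticsearch's mapping char filter.

```python
import re

# Illustrative entries only; the real emoticons.txt may differ.
EMOTICONS = {
    ":)": "🙂",
    ":-)": "🙂",
    ":D": "😀",
    "<3": "❤",
}

def apply_emoticons(text, mappings=EMOTICONS):
    # Try longer emoticons first so that a longer key wins over a shorter
    # one starting at the same position.
    keys = sorted(mappings, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: mappings[match.group(0)], text)

print(apply_emoticons("nice :-) <3"))  # nice 🙂 ❤
```

Once emoticons are rewritten to emoji this way, the emoji synonym rules apply to them as well, so no separate emoticon synonym file is needed.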

Learn more about this in our blog post describing how to search with emoji in Elasticsearch (2016).

Getting started

Download the emoji and emoticon files you want from this repository and store them in PATH_ES/config/analysis.

config
├── analysis
│   ├── cldr-emoji-annotation-synonyms-en.txt
│   └── emoticons.txt
├── elasticsearch.yml
...

Use them like this:

PUT /en-emoji
{
  "settings": {
    "analysis": {
      "char_filter": {
        "emoticons_char_filter": {
          "type": "mapping",
          "mappings_path": "analysis/emoticons.txt"
        }
      },
      "filter": {
        "english_emoji": {
          "type": "synonym",
          "synonyms_path": "analysis/cldr-emoji-annotation-synonyms-en.txt" 
        }
      }
    }
  }
}
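
The request above only declares the char_filter and the synonym filter; they take effect once an analyzer references them. As a sketch under assumptions, the Python snippet below builds a request body with such an analyzer and prints it as JSON. The analyzer name english_with_emoji, the whitespace tokenizer and the lowercase filter are choices made for this illustration, not necessarily the setup shipped in /esplugin. Sending the body to Elasticsearch is left out.

```python
import json

settings_body = {
    "settings": {
        "analysis": {
            "char_filter": {
                "emoticons_char_filter": {
                    "type": "mapping",
                    "mappings_path": "analysis/emoticons.txt",
                }
            },
            "filter": {
                "english_emoji": {
                    "type": "synonym",
                    "synonyms_path": "analysis/cldr-emoji-annotation-synonyms-en.txt",
                }
            },
            "analyzer": {
                # Hypothetical analyzer name; it chains the pieces declared above.
                "english_with_emoji": {
                    "type": "custom",
                    "char_filter": ["emoticons_char_filter"],
                    "tokenizer": "whitespace",  # assumption: keeps emoji as tokens
                    "filter": ["lowercase", "english_emoji"],
                }
            },
        }
    }
}

# ensure_ascii=False keeps any non-ASCII characters readable in the output.
print(json.dumps(settings_body, indent=2, ensure_ascii=False))
```

With the plugin installed, the tokenizer value could presumably be swapped for emoji_tokenizer.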

Head over to the /esplugin directory for a fully functional mapping.

How to contribute

Build from CLDR SVN

You will need:

  • PHP CLI
  • PHP zip and curl extensions

Edit the tag in tools/build-released.php and run php tools/build-released.php.

Update emoticons

Run php tools/build-emoticon.php.

Licenses

Emoji data courtesy of CLDR. See unicode-license.txt for details. Some modifications are made to the data; see here. Emoticon data based on https://github.com/wooorm/emoticon/ (MIT).

This repository is distributed under the MIT License. Feel free to use and contribute as you please!