Replace non-emojis with actual emojis #27

robindiddams · 2021-02-17T16:01:46Z

I have noticed in my use of ecoji that there are many characters in its 1024-emoji table that are not emojis:
🅰, 🅰🅱, 🅱🅾, 🅾,🇦, 🇦🇧, 🇧🇨, 🇪🇫, 🇫🇬, 🇭🇮, 🇮🇯, 🇯🇰, 🇰🇱, 🇱🇲, 🇴🇵, 🇵🇶, 🇶🇷, 🇹🇺, 🇺🇻, 🇻🇼, 🇼🇽, 🇽🇾, 🇾🇿, 🇿

I believe this is due to the script that was used to generate the mapping.go file. It appears to have split multi-codepoint emojis into their individual codepoints and mapped each one separately. Many of these are regional indicator symbols that, paired with one other, may form a flag emoji (🇺🇸, for example, is 🇺 followed by 🇸), but by themselves they are not emojis and will not render colorfully.
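To make the regional indicator point concrete, here is a minimal sketch of how such entries could be found in a table. The range U+1F1E6 through U+1F1FF is the Unicode regional indicator block; the function names and the four-entry table are hypothetical and are not taken from mapping.go.

```go
package main

import "fmt"

// isRegionalIndicator reports whether r is one of the 26 regional indicator
// symbols (U+1F1E6 through U+1F1FF). A lone one is not a flag emoji.
func isRegionalIndicator(r rune) bool {
	return r >= 0x1F1E6 && r <= 0x1F1FF
}

// loneRegionalIndicators returns the indexes of table entries that are
// regional indicator symbols. The name is hypothetical.
func loneRegionalIndicators(table []rune) []int {
	var idx []int
	for i, r := range table {
		if isRegionalIndicator(r) {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	// Hypothetical excerpt, NOT the real ecoji table:
	// grinning face, regional indicator U, pizza, regional indicator S.
	table := []rune{0x1F600, 0x1F1FA, 0x1F355, 0x1F1F8}
	fmt.Println(loneRegionalIndicators(table)) // [1 3]
}
```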

I suggest that new single-codepoint emojis be selected and used to replace these. I understand that this would be a breaking change and would probably require a version 2 of ecoji. But I love emojis, I love the idea of encoding data as them, and I think there are a lot of good uses for ecoji. Some of those uses seem less appealing if you might get some weird Unicode box in the middle of a string.

I am happy to provide new emoji list suggestions and do the work of replacing them, but want most to start a discussion.

Please let me know if I am being unclear or misunderstanding. Thanks for your consideration. 🙏

tremblay · 2021-03-01T20:12:57Z

I would like to add the 5 "skin tone modifiers" to the list of emoji that should be replaced. These are 🏻, 🏼, 🏽, 🏾, 🏿 (also known as 0x1F3FB, 0x1F3FC, 0x1F3FD, 0x1F3FE, 0x1F3FF).

Since these characters are modifiers, they aren't considered standalone emoji by some libraries, which can cause numerous problems, including:

- they can be unintentionally merged with the preceding character
- they cannot be typed on most emoji keyboards
- since they are not considered emojis by some implementations, they may alter the string length or be filtered out entirely
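The last problem in the list above can be illustrated with a short sketch. It assumes a hypothetical sanitizer that drops skin tone modifiers (U+1F3FB through U+1F3FF); `stripSkinToneModifiers` and the sample string are made up for illustration and do not come from ecoji.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isSkinToneModifier reports whether r is one of the five emoji modifiers
// U+1F3FB through U+1F3FF.
func isSkinToneModifier(r rune) bool {
	return r >= 0x1F3FB && r <= 0x1F3FF
}

// stripSkinToneModifiers mimics a hypothetical sanitizer that removes
// modifiers because it does not treat them as standalone emoji.
func stripSkinToneModifiers(s string) string {
	out := make([]rune, 0, len(s))
	for _, r := range s {
		if !isSkinToneModifier(r) {
			out = append(out, r)
		}
	}
	return string(out)
}

func main() {
	// Made-up "encoded" text: thumbs up, medium skin tone, pizza, rocket.
	encoded := string([]rune{0x1F44D, 0x1F3FD, 0x1F355, 0x1F680})
	filtered := stripSkinToneModifiers(encoded)
	// The filtered text has one fewer character, so it no longer carries
	// the same data.
	fmt.Println(utf8.RuneCountInString(encoded), utf8.RuneCountInString(filtered)) // 4 3
}
```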

robindiddams · 2021-03-14T03:13:57Z

I've started work on the emoji replacements. I'll be doing the (replacing) work in this gist: https://gist.github.com/Robindiddams/943202dbc129f16b64f2113ea91ce180 Only 2 so far, but if you have any ideas on possible replacements, feel free to leave a comment.

keith-turner · 2021-04-06T01:53:05Z

> I'll be doing the (replacing) work in this gist https://gist.github.com/Robindiddams/943202dbc129f16b64f2113ea91ce180

@robindiddams that is a very nice write up, I really like it. I agree it would be nice to replace the following with something more exciting.

> I have noticed in my use of ecoji that there are many characters in its 1024-emoji table that are not emojis:
> 🅰, 🅰🅱, 🅱🅾, 🅾,🇦, 🇦🇧, 🇧🇨, 🇪🇫, 🇫🇬, 🇭🇮, 🇮🇯, 🇯🇰, 🇰🇱, 🇱🇲, 🇴🇵, 🇵🇶, 🇶🇷, 🇹🇺, 🇺🇻, 🇻🇼, 🇼🇽, 🇽🇾, 🇾🇿, 🇿

Thinking about ecoji 2, it would be nice if it could support the following properties.

- Anything encoded w/ ecoji 1 can be decoded by ecoji 2
- Anything encoded w/ ecoji 2 can either be decoded by ecoji 1 or fail. A properly written ecoji 1 impl would never decode something differently though.

These two properties avoid chaos and confusion. It seems one way to achieve this is to ensure that when the same character is used in ecoji 1 and ecoji 2, it has the same index in both (for index, I am using the terminology from the table in your gist). I think your proposal may achieve this, but I am not sure.
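The same-index rule could be checked mechanically. The sketch below assumes each version's table is a slice of runes with no duplicates; `sameIndexForSharedRunes` and the tiny tables are hypothetical. Passing this check is only the condition described above, not a full proof of either property.

```go
package main

import "fmt"

// sameIndexForSharedRunes reports whether every rune present in both tables
// sits at the same index in each. On failure it also returns the first
// offending rune found in v2. Assumes neither table contains duplicates.
func sameIndexForSharedRunes(v1, v2 []rune) (bool, rune) {
	pos := make(map[rune]int, len(v1))
	for i, r := range v1 {
		pos[r] = i
	}
	for i, r := range v2 {
		if j, ok := pos[r]; ok && j != i {
			return false, r
		}
	}
	return true, 0
}

func main() {
	// Hypothetical three-entry tables, NOT the real ecoji tables.
	v1 := []rune{0x1F600, 0x1F1FA, 0x1F355}
	good := []rune{0x1F600, 0x1F680, 0x1F355} // index 1 replaced in place
	bad := []rune{0x1F355, 0x1F680, 0x1F600}  // shared runes moved

	ok, _ := sameIndexForSharedRunes(v1, good)
	fmt.Println(ok) // true
	ok, r := sameIndexForSharedRunes(v1, bad)
	fmt.Printf("%v U+%X\n", ok, r) // false U+1F355
}
```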

robindiddams · 2021-04-07T21:40:05Z

Thanks @keith-turner

> Anything encoded w/ ecoji 1 can be decoded by ecoji 2
>
> Anything encoded w/ ecoji 2 can either be decoded by ecoji 1 or fail. A properly written ecoji 1 impl would never decode something differently though.

I 100% agree and think this is probably doable in the decoder, and if not, I could ~~write~~ generate a regex to discern if it's ecoji 1 and hot-swap. With these two goals in mind, once I finish choosing replacements, I'll see if I can hack something up 👍

keith-turner · 2021-04-07T23:01:25Z

> I could ~~write~~ generate a regex to discern if it's ecoji 1 and hot-swap

@robindiddams I was thinking of a simple state machine for decoding. Regexes are state machines, but I'm thinking one implicit in the code could be faster. A decoder could conceptually start off in ecoji1or2 mode and stay there as long as it sees chars that are present in both. Once it sees a char that is only in ecoji 1 or only in ecoji 2, it could switch to either ecoji1 or ecoji2 mode, where it only expects to see the chars that go with that mode and errors otherwise.
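A minimal sketch of that state machine, under stated assumptions: membership in each version's table is given as a `map[rune]bool`, and the names `mode`, `detect`, `errUnknown`, and `errMixed` are hypothetical. It only tracks which version the input belongs to; it does not decode any bytes and is not the library's decoder.

```go
package main

import (
	"errors"
	"fmt"
)

type mode int

const (
	either mode = iota // only chars shared by both versions seen so far
	v1Only             // a v1-only char was seen
	v2Only             // a v2-only char was seen
)

var (
	errUnknown = errors.New("char is in neither table")
	errMixed   = errors.New("input mixes v1-only and v2-only chars")
)

// detect walks the input once, switching out of "either" mode on the first
// version-specific char and failing if a later char contradicts that mode.
func detect(input []rune, inV1, inV2 map[rune]bool) (mode, error) {
	m := either
	for _, r := range input {
		a, b := inV1[r], inV2[r]
		switch {
		case !a && !b:
			return m, errUnknown
		case a && b:
			// Shared char: valid in every mode, no transition.
		case a:
			if m == v2Only {
				return m, errMixed
			}
			m = v1Only
		default:
			if m == v1Only {
				return m, errMixed
			}
			m = v2Only
		}
	}
	return m, nil
}

func main() {
	// Hypothetical membership sets, NOT the real ecoji tables.
	inV1 := map[rune]bool{0x1F600: true, 0x1F355: true, 0x1F1FA: true}
	inV2 := map[rune]bool{0x1F600: true, 0x1F355: true, 0x1F680: true}

	fmt.Println(detect([]rune{0x1F600, 0x1F355}, inV1, inV2)) // stays in either mode
	fmt.Println(detect([]rune{0x1F600, 0x1F1FA}, inV1, inV2)) // switches to v1Only
	fmt.Println(detect([]rune{0x1F1FA, 0x1F680}, inV1, inV2)) // mixed input, error
}
```

Input made up entirely of shared chars ends in `either` mode, which is harmless if shared chars have the same index in both versions.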

robindiddams · 2021-04-08T16:07:55Z

yeah that sounds pretty doable, I think we're on the same page 👍

keith-turner · 2021-11-04T01:02:50Z

@robindiddams on #29 you asked what else needs to be done for Ecoji 2. I would like to do the following:

- Update the README to mention ecoji 2 and use ecoji 2 for the encoding examples, if that is not done. Want to explain that v2 can decode v1 and v2 w/o going into how this works too much in the readme. Want to explain that the APIs in the Go library are designed such that you have to change your code to use ecoji 2. Any code previously written against the library will continue to encode w/ v1. So explain that one has to opt in to using ecoji 2, which is important for cross-language compat.
- Update the documentation that goes over how to encode data using the Ecoji spec. Want to update these docs to go over the diffs between v1 and v2, how one can detect the diffs between v1 and v2, and how to encode and decode data. Basically enough information for someone to implement ecoji v1 and v2 in another programming language. Also, while working on ecoji 2 I figured out a simpler and more efficient way to encode using 64-bit ints; want to update the docs with that.
- Want to look into adding more tests, making sure the tests really cover all of the edge cases well. Not sure if the tests do this or not. In Java I have used code coverage tools, not sure that exists for Go.
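The 64-bit approach mentioned in the list above is not spelled out in this thread. One plausible reading, offered only as a sketch: with a 1024-entry table each emoji carries 10 bits, so 5 input bytes (40 bits) fit in a single `uint64` and can be split into four table indexes with shifts and masks. The function name, the big-endian bit order, and the omission of padding for a final partial group are all assumptions.

```go
package main

import "fmt"

// indexesFor5Bytes packs 5 bytes into the low 40 bits of a uint64
// (first byte most significant) and splits them into four 10-bit values,
// each usable as an index into a 1024-entry table.
// Hypothetical helper; padding of a short final group is not handled.
func indexesFor5Bytes(b [5]byte) [4]int {
	var v uint64
	for _, x := range b {
		v = v<<8 | uint64(x)
	}
	var out [4]int
	for i := 0; i < 4; i++ {
		shift := uint(30 - 10*i) // 30, 20, 10, 0
		out[i] = int((v >> shift) & 0x3FF)
	}
	return out
}

func main() {
	fmt.Println(indexesFor5Bytes([5]byte{0x80, 0, 0, 0, 0})) // [512 0 0 0]
	fmt.Println(indexesFor5Bytes([5]byte{0, 0, 0, 0, 1}))    // [0 0 0 1]
}
```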

After doing the above I want to post a PR to merge the ecoji v2 branch and request that anyone who implemented ecoji v1 in another language review the PR if they are interested.

@robindiddams I may work on some of this Fri or Sat, but not sure. If you are interested in working on anything or have any ideas about what else should be done before v2 release let me know. You can always make additional PRs to the ecoji2 branch.

keith-turner · 2021-11-04T01:07:54Z

One other task I want to do is update the ecoji.io website to use v2 for encoding. The source for that site is in the gh-pages branch in this repo. Thought about adding some sort of v1/v2 toggle on the site, but I think that may clutter it too much. Thinking maybe just go w/ v2.

keith-turner · 2021-11-04T01:14:16Z

One other task I would like to do is attempt to simplify this function. Not sure if it's possible. While working on ecoji v2 recently I made a lot of the code simpler, but that function continues to be complex.

keith-turner · 2021-11-04T01:17:04Z

On #29 there were some replacements of a few emojis that I was thinking of doing for v2; I have not done that yet.

dcow mentioned this issue Mar 2, 2021: Address ambiguous emoji #28 (Closed)

robindiddams mentioned this issue Jun 13, 2021: Replace Non emojis with real ones #29 (Merged)

keith-turner mentioned this issue Sep 4, 2022: Ecoji version 2 #30 (Merged)

keith-turner closed this as completed Nov 28, 2022
