Replace non-emojis with actual emojis #27

Closed
robindiddams opened this issue Feb 17, 2021 · 10 comments · Fixed by #29

Comments

@robindiddams
Contributor

I have noticed in my use of ecoji that there are many characters in its 1024-emoji table that are not emojis:
🅰, 🅰🅱, 🅱🅾, 🅾,🇦, 🇦🇧, 🇧🇨, 🇪🇫, 🇫🇬, 🇭🇮, 🇮🇯, 🇯🇰, 🇰🇱, 🇱🇲, 🇴🇵, 🇵🇶, 🇶🇷, 🇹🇺, 🇺🇻, 🇻🇼, 🇼🇽, 🇽🇾, 🇾🇿, 🇿

I believe this is due to the script that was used to generate the mapping.go file: it appears to have split multi-codepoint emojis into their individual code points and mapped each one separately. Many of these are regional indicator symbols, which form a flag emoji when paired with another one (🇺🇸, for example, is 🇺 followed by 🇸), but by themselves they are not emojis and will not render colorfully.
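To illustrate the pairing described above, here is a minimal Go sketch. The regional indicator symbols occupy U+1F1E6 through U+1F1FF, one per letter A to Z; the helper names `isRegionalIndicator` and `flagFor` are hypothetical and are not part of ecoji.

```go
package main

import "fmt"

// isRegionalIndicator reports whether r is one of the 26 regional
// indicator symbols (U+1F1E6..U+1F1FF).
func isRegionalIndicator(r rune) bool {
	return r >= 0x1F1E6 && r <= 0x1F1FF
}

// flagFor maps a two-letter uppercase ASCII code such as "US" to the
// pair of regional indicator symbols that renderers combine into a flag.
// It returns "" for any other input.
func flagFor(code string) string {
	if len(code) != 2 {
		return ""
	}
	out := make([]rune, 0, 2)
	for _, c := range code {
		if c < 'A' || c > 'Z' {
			return ""
		}
		out = append(out, 0x1F1E6+(c-'A'))
	}
	return string(out)
}

func main() {
	us := flagFor("US")
	// One visible flag, but two code points underneath.
	fmt.Println(us, len([]rune(us)))
	for _, r := range us {
		fmt.Printf("%U regional indicator: %v\n", r, isRegionalIndicator(r))
	}
}
```

A check like `isRegionalIndicator` could be run over a candidate table to flag entries that only render properly in pairs.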

I suggest that new single-codepoint emojis be selected and used to replace these. I understand that this would be a breaking change and would probably require a version 2 of ecoji. But I love emojis and I love the idea of encoding data as them and I think there are a lot of good uses for ecoji, but some of them seem less appealing if you might get some weird unicode box in the middle of a string.

I am happy to provide new emoji list suggestions and do the work of replacing them, but mostly I want to start a discussion.

Please let me know if I am being unclear or misunderstanding. Thanks for your consideration. 🙏

@tremblay

tremblay commented Mar 1, 2021

I would like to add the 5 "skin tone modifiers" to the list of emoji that should be replaced. These are 🏻, 🏼, 🏽, 🏾, 🏿 (also known as 0x1F3FB, 0x1F3FC, 0x1F3FD, 0x1F3FE, 0x1F3FF).

Since these characters are modifiers, they aren't considered standalone emoji by some libraries, which can cause numerous problems, including:

  • they can be unintentionally merged with the preceding character
  • they cannot be typed on most emoji keyboards
  • since they are not considered emojis by some implementations, they may alter the string length or be filtered out entirely
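The problems listed above can be seen by counting code points. The sketch below assumes only the five modifier code points named in this comment; `isSkinToneModifier` and `modifierPositions` are hypothetical helper names, not ecoji functions.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isSkinToneModifier reports whether r is one of the five modifiers
// U+1F3FB..U+1F3FF.
func isSkinToneModifier(r rune) bool {
	return r >= 0x1F3FB && r <= 0x1F3FF
}

// modifierPositions returns the code point indexes in s at which a
// skin tone modifier appears.
func modifierPositions(s string) []int {
	var pos []int
	i := 0
	for _, r := range s {
		if isSkinToneModifier(r) {
			pos = append(pos, i)
		}
		i++
	}
	return pos
}

func main() {
	// Thumbs up, medium skin tone modifier, grinning face.
	// Many renderers merge the first two into a single glyph.
	s := "\U0001F44D\U0001F3FD\U0001F600"
	fmt.Println(utf8.RuneCountInString(s)) // 3 code points
	fmt.Println(modifierPositions(s))      // [1]
}
```

If an encoder emits a modifier right after an emoji that accepts one, the reader sees two glyphs where the decoder expects three code points.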

@robindiddams
Contributor Author

I've started work on the emoji replacements. I'll be doing the (replacing) work in this gist: https://gist.github.com/Robindiddams/943202dbc129f16b64f2113ea91ce180 Only 2 so far, but if you have any ideas for possible replacements, feel free to leave a comment.

@keith-turner
Owner

keith-turner commented Apr 6, 2021

"I'll be doing the (replacing) work in this gist https://gist.github.com/Robindiddams/943202dbc129f16b64f2113ea91ce180"

@robindiddams that is a very nice write up, I really like it. I agree it would be nice to replace the following with something more exciting.

I have noticed in my use of ecoji that there are many characters in its 1024-emoji table that are not emojis:
🅰, 🅰🅱, 🅱🅾, 🅾,🇦, 🇦🇧, 🇧🇨, 🇪🇫, 🇫🇬, 🇭🇮, 🇮🇯, 🇯🇰, 🇰🇱, 🇱🇲, 🇴🇵, 🇵🇶, 🇶🇷, 🇹🇺, 🇺🇻, 🇻🇼, 🇼🇽, 🇽🇾, 🇾🇿, 🇿

Thinking about ecoji 2, it would be nice if it could support the following properties.

  • Anything encoded w/ ecoji 1 can be decoded by ecoji 2
  • Anything encoded w/ ecoji 2 can either be decoded by ecoji 1 or fail. A properly written ecoji 1 impl would never decode something differently though.

These two properties avoid chaos and confusion. It seems one way to achieve this is to ensure that when the same character is used in both ecoji 1 and ecoji 2, it has the same index in each (by index I mean the term used in the table in your gist). I think your proposal may achieve this, but I am not sure.
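The same-index condition can be checked mechanically. The following is a rough sketch that assumes each version's table is available as a `[]rune` ordered by index; `conflictingRunes` is a hypothetical helper and the tables in `main` are toy stand-ins, not the real ecoji mappings.

```go
package main

import "fmt"

// conflictingRunes returns every rune that appears in both tables but
// at a different index. An empty result means shared runes decode to
// the same value in either version.
func conflictingRunes(v1, v2 []rune) []rune {
	idx := make(map[rune]int, len(v1))
	for i, r := range v1 {
		idx[r] = i
	}
	var bad []rune
	for j, r := range v2 {
		if i, ok := idx[r]; ok && i != j {
			bad = append(bad, r)
		}
	}
	return bad
}

func main() {
	v1 := []rune("abcd")
	ok := []rune("abxd")    // 'c' replaced in place: shared runes keep their index
	moved := []rune("abxc") // 'c' moved from index 2 to index 3
	fmt.Println(len(conflictingRunes(v1, ok)))
	fmt.Println(string(conflictingRunes(v1, moved)))
}
```

Replacing characters in place, rather than reordering the table, keeps the result empty.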

@robindiddams
Contributor Author

Thanks @keith-turner

  • Anything encoded w/ ecoji 1 can be decoded by ecoji 2
  • Anything encoded w/ ecoji 2 can either be decoded by ecoji 1 or fail. A properly written ecoji 1 impl would never decode something differently though.

I 100% agree and think this is probably doable in the decoder; if not, I could generate a regex to discern if it's ecoji 1 and hot-swap. With these two goals in mind, once I finish choosing replacements, I'll see if I can hack something up 👍

@keith-turner
Owner

keith-turner commented Apr 7, 2021

"I could generate a regex to discern if it's ecoji 1 and hot-swap"

@robindiddams I was thinking of a simple state machine for decoding. Regexes are state machines, but I think one implicit in the code could be faster. A decoder could conceptually start off in ecoji1or2 mode and stay there as long as it sees chars that are present in both. Once it sees a char that is only in ecoji 1 or only in ecoji 2, it could switch to ecoji1 or ecoji2 mode, where it only expects to see the chars that go with that mode and errors otherwise.
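A minimal sketch of that three-mode idea follows. It is an illustration under assumptions, not the decoder that ended up in the library: the type `versionDetector`, its `accept` method, and the toy alphabets are all hypothetical, and a real decoder would also map each rune to its 10-bit index.

```go
package main

import (
	"errors"
	"fmt"
)

type mode int

const (
	modeEither mode = iota // only shared chars seen so far
	modeV1                 // a v1-only char has been seen
	modeV2                 // a v2-only char has been seen
)

// versionDetector tracks which alphabet the input must belong to.
type versionDetector struct {
	mode mode
	inV1 map[rune]bool
	inV2 map[rune]bool
}

// newToyDetector builds a detector over toy alphabets:
// 'a' and 'b' are shared, 'x' is v1-only, 'y' is v2-only.
func newToyDetector() *versionDetector {
	return &versionDetector{
		inV1: map[rune]bool{'a': true, 'b': true, 'x': true},
		inV2: map[rune]bool{'a': true, 'b': true, 'y': true},
	}
}

// accept consumes one rune, switching mode on the first
// version-specific rune and failing on any later mismatch.
func (d *versionDetector) accept(r rune) error {
	in1, in2 := d.inV1[r], d.inV2[r]
	switch {
	case !in1 && !in2:
		return fmt.Errorf("rune %U is in neither alphabet", r)
	case in1 && in2:
		return nil // shared rune: mode is unchanged
	case in1:
		if d.mode == modeV2 {
			return errors.New("v1-only rune in v2 input")
		}
		d.mode = modeV1
	default:
		if d.mode == modeV1 {
			return errors.New("v2-only rune in v1 input")
		}
		d.mode = modeV2
	}
	return nil
}

func main() {
	d := newToyDetector()
	for _, r := range "abyx" {
		if err := d.accept(r); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println("mode:", d.mode)
}
```

Because the mode only ever moves from "either" to one specific version, the check costs two map lookups per rune and never needs to rescan the input.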

@robindiddams
Contributor Author

yeah that sounds pretty doable, I think we're on the same page 👍

@keith-turner
Owner

keith-turner commented Nov 4, 2021

@robindiddams on #29 you asked what else needs to be done for Ecoji 2. I would like to do the following

  • Update the README to mention ecoji 2 and use ecoji 2 for the encoding examples, if that is not done already. I want to explain that v2 can decode both v1 and v2 without going too much into how this works in the README. I also want to explain that the APIs in the Go library are designed such that you have to change your code to use ecoji 2: any code previously written against the library will continue to encode w/ v1. So one has to opt in to using ecoji 2, which is important for cross-language compatibility.
  • Update the documentation that goes over how to encode data using the Ecoji spec. I want these docs to go over the differences between v1 and v2, how one can detect the differences between v1 and v2, and how to encode and decode data. Basically, enough information for someone to implement ecoji v1 and v2 in another programming language. Also, while working on ecoji 2 I figured out a simpler and more efficient way to encode using 64-bit ints, and I want to update the docs with that.
  • Look into adding more tests and making sure the tests cover all of the edge cases really well. Not sure if the tests do this or not. In Java I have used code coverage tools; not sure whether that exists for Go.
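The thread does not spell out the 64-bit approach mentioned in the second item, so the following is only a guess at what such an encoding step might look like. It assumes the 1024-entry table means 10 bits per emoji, so a full group of 5 input bytes (40 bits) yields 4 table indexes; padding for shorter final groups is deliberately left out, and `indexesFor` is a hypothetical name.

```go
package main

import "fmt"

// indexesFor packs 5 bytes into the low 40 bits of a uint64 and
// splits them into four 10-bit table indexes, most significant first.
func indexesFor(b [5]byte) [4]int {
	var v uint64
	for _, x := range b {
		v = v<<8 | uint64(x)
	}
	var out [4]int
	for i := 0; i < 4; i++ {
		shift := uint(30 - 10*i)
		out[i] = int((v >> shift) & 0x3FF)
	}
	return out
}

func main() {
	fmt.Println(indexesFor([5]byte{0x80, 0, 0, 0, 0}))
	fmt.Println(indexesFor([5]byte{0, 0, 0, 0, 1}))
	fmt.Println(indexesFor([5]byte{0xFF, 0xFF, 0xFF, 0xFF, 0xFF}))
}
```

Working on one integer avoids the per-byte shifting and masking across byte boundaries that a 10-bit split otherwise requires.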

After doing the above I want to post a PR to merge the ecoji v2 branch and request that anyone who implemented ecoji v1 in another language review the PR if they are interested.

@robindiddams I may work on some of this Fri or Sat, but not sure. If you are interested in working on anything or have any ideas about what else should be done before v2 release let me know. You can always make additional PRs to the ecoji2 branch.

@keith-turner
Owner

One other task I want to do is update the ecoji.io website to use v2 for encoding. The source for that site is in the gh-pages branch in this repo. Thought about adding some sort of v1/v2 toggle on the site, but I think that may clutter it too much. Thinking maybe just go w/ v2.

@keith-turner
Owner

One other task I would like to do is attempt to simplify this function. Not sure if it's possible. While working on ecoji v2 recently I made a lot of the code simpler, but that function continues to be complex.

@keith-turner
Owner

On #29 there were some replacements of a few emojis that I was thinking of doing for v2, I have not done that yet.
