Replace Non emojis with real ones #29

Merged
merged 5 commits into keith-turner:ecojiv2 on Oct 8, 2021

robindiddams
Contributor

Fixes #27. Removes and replaces all emojis that are either not emojis (🅾) or are partial emojis (🏛). While trying to figure out which ones to replace, I ended up writing a program to determine which ones needed to go and which ones I could choose from. It turns out a lot of them were multi-codepoint emojis (which ecoji doesn't support), so I made it select replacements for me from a valid, sorted list of single-codepoint emojis.

NOTE: I know it looks like I'm replacing a lot of emojis that appear to render fine, but look up any of them in the Unicode emoji spec or on Emojipedia and you'll find that they're all 2+ codepoints.
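A minimal sketch of the kind of check described above, assuming the test is simply "does the fully qualified form consist of exactly one code point". The helper name isSingleCodePoint is hypothetical and is not taken from this PR. Whether a given code point counts as fully qualified on its own is defined by the Unicode emoji data files, which this sketch does not read.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isSingleCodePoint reports whether s is valid UTF-8 made of exactly one rune.
// Hypothetical helper for illustration; not part of ecoji.
func isSingleCodePoint(s string) bool {
	return utf8.ValidString(s) && utf8.RuneCountInString(s) == 1
}

func main() {
	samples := []string{
		"\U0001F9E0",           // brain: one code point
		"\U0001F3DB\uFE0F",     // classical building followed by variation selector-16
		"\U0001F1E6\U0001F1FA", // a flag built from two regional indicators
	}
	for _, s := range samples {
		fmt.Printf("%s runes=%d single=%v\n", s, utf8.RuneCountInString(s), isSingleCodePoint(s))
	}
}
```

The second sample is why emojis such as 🏛 look fine when rendered but still fail the check: the fully qualified sequence carries an extra code point.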

Other things:

  • Since I was generating new emoji sets over and over again, I added a code generator for mapping.go, modeled on the one from the Rust implementation. You can run go run gen.go to try it out.
  • I added 2 unit tests, although I think because I'm on Go 1.16 I couldn't actually run them without making this a Go module. I did that locally, but I didn't check in the go.mod file since I had already changed enough stuff. (In case you want to make this a Go module, happy to help there too 👀)

Replacement table

Since it appears that the mapping needs to be sorted, these replacements aren't exactly 1-1 (i.e., they won't necessarily fill the same spot that the previous one left, because their Unicode code point isn't the same). Unless we no longer care about the set being in numerical order, there's really no point in manually selecting replacements, since we don't control where they'd land in the set.

Invalid Emoji (hex) Replacement (hex)
🅰 (1f170) 🛕 (1f6d5)
🅱 (1f171) 🛖 (1f6d6)
🅾 (1f17e) 🛗 (1f6d7)
🅿 (1f17f) 🛺 (1f6fa)
🇦 (1f1e6) 🛻 (1f6fb)
🇧 (1f1e7) 🛼 (1f6fc)
🇨 (1f1e8) 🟠 (1f7e0)
🇩 (1f1e9) 🟡 (1f7e1)
🇪 (1f1ea) 🟢 (1f7e2)
🇫 (1f1eb) 🟣 (1f7e3)
🇬 (1f1ec) 🟤 (1f7e4)
🇭 (1f1ed) 🟥 (1f7e5)
🇮 (1f1ee) 🟦 (1f7e6)
🇯 (1f1ef) 🟧 (1f7e7)
🇰 (1f1f0) 🟨 (1f7e8)
🇱 (1f1f1) 🟩 (1f7e9)
🇲 (1f1f2) 🟪 (1f7ea)
🇳 (1f1f3) 🟫 (1f7eb)
🇴 (1f1f4) 🤌 (1f90c)
🇵 (1f1f5) 🤍 (1f90d)
🇶 (1f1f6) 🤎 (1f90e)
🇷 (1f1f7) 🤏 (1f90f)
🇸 (1f1f8) 🤿 (1f93f)
🇹 (1f1f9) 🥱 (1f971)
🇺 (1f1fa) 🥲 (1f972)
🇻 (1f1fb) 🥷 (1f977)
🇼 (1f1fc) 🥸 (1f978)
🇽 (1f1fd) 🥻 (1f97b)
🇾 (1f1fe) 🦣 (1f9a3)
🇿 (1f1ff) 🦤 (1f9a4)
🈂 (1f202) 🦥 (1f9a5)
🈷 (1f237) 🦦 (1f9a6)
🌡 (1f321) 🦧 (1f9a7)
🌤 (1f324) 🦨 (1f9a8)
🌥 (1f325) 🦩 (1f9a9)
🌦 (1f326) 🦪 (1f9aa)
🌧 (1f327) 🦫 (1f9ab)
🌨 (1f328) 🦬 (1f9ac)
🌩 (1f329) 🦭 (1f9ad)
🌪 (1f32a) 🦮 (1f9ae)
🌫 (1f32b) 🦯 (1f9af)
🌬 (1f32c) 🦺 (1f9ba)
🌶 (1f336) 🦻 (1f9bb)
🍽 (1f37d) 🦼 (1f9bc)
🎖 (1f396) 🦽 (1f9bd)
🎗 (1f397) 🦾 (1f9be)
🎙 (1f399) 🦿 (1f9bf)
🎚 (1f39a) 🧃 (1f9c3)
🎛 (1f39b) 🧄 (1f9c4)
🎞 (1f39e) 🧅 (1f9c5)
🎟 (1f39f) 🧆 (1f9c6)
🏋 (1f3cb) 🧇 (1f9c7)
🏌 (1f3cc) 🧈 (1f9c8)
🏎 (1f3ce) 🧉 (1f9c9)
🏔 (1f3d4) 🧊 (1f9ca)
🏕 (1f3d5) 🧋 (1f9cb)
🏖 (1f3d6) 🧍 (1f9cd)
🏗 (1f3d7) 🧎 (1f9ce)
🏘 (1f3d8) 🧏 (1f9cf)
🏙 (1f3d9) 🧖 (1f9d6)
🏚 (1f3da) 🧗 (1f9d7)
🏛 (1f3db) 🧘 (1f9d8)
🏜 (1f3dc) 🧙 (1f9d9)
🏝 (1f3dd) 🧚 (1f9da)
🏞 (1f3de) 🧛 (1f9db)
🏟 (1f3df) 🧜 (1f9dc)
🏳 (1f3f3) 🧝 (1f9dd)
🏵 (1f3f5) 🧞 (1f9de)
🏷 (1f3f7) 🧟 (1f9df)
🏻 (1f3fb) 🧠 (1f9e0)
🏼 (1f3fc) 🧡 (1f9e1)
🏽 (1f3fd) 🧢 (1f9e2)
🏾 (1f3fe) 🧣 (1f9e3)
🏿 (1f3ff) 🧤 (1f9e4)
🐿 (1f43f) 🧥 (1f9e5)
👁 (1f441) 🧦 (1f9e6)
📽 (1f4fd) 🧧 (1f9e7)
🕉 (1f549) 🧨 (1f9e8)
🕊 (1f54a) 🧩 (1f9e9)
🕯 (1f56f) 🧪 (1f9ea)
🕰 (1f570) 🧫 (1f9eb)
🕳 (1f573) 🧬 (1f9ec)
🕴 (1f574) 🧭 (1f9ed)
🕵 (1f575) 🧮 (1f9ee)
🕶 (1f576) 🧯 (1f9ef)
🕷 (1f577) 🧰 (1f9f0)
🕸 (1f578) 🧱 (1f9f1)
🕹 (1f579) 🧲 (1f9f2)
🖇 (1f587) 🧳 (1f9f3)
🖊 (1f58a) 🧴 (1f9f4)
🖋 (1f58b) 🧵 (1f9f5)
🖌 (1f58c) 🧶 (1f9f6)
🖍 (1f58d) 🧷 (1f9f7)
🖐 (1f590) 🧸 (1f9f8)
🖥 (1f5a5) 🧹 (1f9f9)
🖨 (1f5a8) 🧺 (1f9fa)
🖱 (1f5b1) 🧻 (1f9fb)
🖲 (1f5b2) 🧼 (1f9fc)
🖼 (1f5bc) 🧽 (1f9fd)
🗂 (1f5c2) 🧾 (1f9fe)
🗃 (1f5c3) 🧿 (1f9ff)
🗄 (1f5c4) 🩰 (1fa70)
🗑 (1f5d1) 🩱 (1fa71)
🗒 (1f5d2) 🩲 (1fa72)
🗓 (1f5d3) 🩳 (1fa73)
🗜 (1f5dc) 🩴 (1fa74)
🗝 (1f5dd) 🩸 (1fa78)
🗞 (1f5de) 🩹 (1fa79)
🗡 (1f5e1) 🩺 (1fa7a)
🗣 (1f5e3) 🪀 (1fa80)
🗨 (1f5e8) 🪁 (1fa81)
🗯 (1f5ef) 🪂 (1fa82)
🗳 (1f5f3) 🪃 (1fa83)
🗺 (1f5fa) 🪄 (1fa84)
🛋 (1f6cb) 🪅 (1fa85)
🛍 (1f6cd) 🪆 (1fa86)
🛎 (1f6ce) 🪐 (1fa90)
🛏 (1f6cf) 🪑 (1fa91)
🛠 (1f6e0) 🪒 (1fa92)
🛡 (1f6e1) 🪓 (1fa93)
🛢 (1f6e2) 🪔 (1fa94)
🛣 (1f6e3) 🪕 (1fa95)
🛤 (1f6e4) 🪖 (1fa96)
🛥 (1f6e5) 🪗 (1fa97)
🛩 (1f6e9) 🪘 (1fa98)
🛰 (1f6f0) 🪙 (1fa99)
🛳 (1f6f3) 🪚 (1fa9a)
🦰 (1f9b0) 🪛 (1fa9b)
🦱 (1f9b1) 🪜 (1fa9c)
🦲 (1f9b2) 🪝 (1fa9d)
🦳 (1f9b3) 🪞 (1fa9e)

Sorry it took me so long to get around to this, been a busy few months 😅, lmk if anything is unclear or if you think I did anything egregious.

@robindiddams
Contributor Author

Keeping it as a draft for now since it's been a while and I took a few liberties. @keith-turner, if this is going in the direction you like then lmk and I'll mark it ready for review.

@keith-turner
Owner

@robindiddams I took a quick look over this and it looks neat. I want to take a more in depth look which will take some time, I will do that soon.

@keith-turner keith-turner changed the base branch from master to ecojiv2 June 19, 2021 16:31
@keith-turner
Owner

@robindiddams I looked over the code changes and have not seen any problems so far. I am going to look over the emoji mapping changes next.

I created an ecojiv2 branch and pointed this PR to it. I am thinking there are other changes I would like to make RE ecoji v2 (readme changes, possibly API changes, experiment w/ changes to ecoji.io, and possibly command line tool changes) before merging ecojiv2 into the master branch. May also want to request review of the branch from others who have written ecoji impls in other languages before merging it into master. Once all of this is done, I will merge the branch into master.

@keith-turner
Owner

@robindiddams now that the ecojiv2 branch exists, are you ok with taking this PR out of draft status?

@robindiddams robindiddams marked this pull request as ready for review June 19, 2021 19:54
@robindiddams
Contributor Author

@keith-turner thanks for taking the time, yeah that makes sense. Hearing from the other ecoji implementers will be fun too 🎉. Give me a shout if there's any way I can help 👍

@keith-turner
Owner

keith-turner commented Jun 21, 2021

@robindiddams I have been analyzing this and I do not think it has the property that something encoded w/ ecoji v1 will decode to the same thing in ecoji v2. Below are some experiments I did. The hex 0180 corresponds to the binary 0000000110000000. The first 10 bits of that binary are 0000000110, which is 6. Looking at this table, the ordinal 6 maps to 🆎 in ecoji v1 and 🆔 in ecoji v2. When the hex 0180 is converted to binary data, encoded with ecoji v1, and then decoded w/ ecoji v2, it decodes to 0080 instead of 0180.

$ alias ecoji1=<path to ecoji v1>/ecoji
$ alias ecoji2=<path to ecoji v2>/ecoji
$ echo 0180 | xxd -r -ps | ecoji1
🆎🀄☕☕
$ echo 0180 | xxd -r -ps | ecoji2
🆔🀄☕☕
$ echo 0180 | xxd -r -ps | ecoji2 | ecoji2 -d | xxd -ps
0180
$ echo 0180 | xxd -r -ps | ecoji1 | ecoji2 -d | xxd -ps
0080
$ echo 0180 | xxd -r -ps | xxd -b
00000000: 00000001 10000000
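The ordinal arithmetic in this experiment can be sketched in Go. Both helpers are hypothetical and only cover a two-byte input. The second call assumes that 🆎 sat at ordinal 2 in the sorted v2 draft, which is consistent with the 0080 result but is not stated in the thread.

```go
package main

import "fmt"

// firstOrdinal returns the leading 10 bits of a two-byte input:
// all 8 bits of b0 followed by the top 2 bits of b1.
func firstOrdinal(b0, b1 byte) int {
	return int(b0)<<2 | int(b1)>>6
}

// withFirstOrdinal rebuilds two bytes from a 10-bit ordinal, keeping the
// low 6 bits of b1 untouched.
func withFirstOrdinal(ord int, b1 byte) (byte, byte) {
	return byte(ord >> 2), byte(ord&0x3)<<6 | b1&0x3f
}

func main() {
	// 0x0180 encodes with ordinal 6 in the first position.
	fmt.Println(firstOrdinal(0x01, 0x80)) // prints 6

	// Assumption: the v2 draft decodes that same emoji as ordinal 2.
	hi, lo := withFirstOrdinal(2, 0x80)
	fmt.Printf("%02x%02x\n", hi, lo) // prints 0080
}
```

An emoji that moves from ordinal 6 to ordinal 2 changes the decoded bytes, which is why shared emojis need identical ordinals in both versions.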

For ecoji v2 to be able to decode ecoji v1, I think that any emojis the two standards share must have the same 10-bit ordinal. Looking at what you have done and the table I created, I am thinking that we cannot have backwards compatibility and maintain the sort order. Personally I think it's better to drop the sort order constraint and have backwards compatibility.

Below I took some data from here and quickly modified it so that the first few have the same ordinals in both versions. I randomly selected a few code points not used in ecoji v1 to fill in for the code points that are not candidates for ecoji v2. The following encoding example for ecoji v2 would maintain compatibility and break sorting, which again I think is fine.

Code point Emoji Candidate v1 ord v2 ord
U+1F004 🀄 true 0 0
U+1F0CF 🃏 true 1 1
U+1F170 🅰 false 2 -1
U+1F171 🅱 false 3 -1
U+1F17E 🅾 false 4 -1
U+1F17F 🅿 false 5 -1
U+1FAA1 🪡 true -1 2
U+1FAA2 🪢 true -1 3
U+1FAA3 🪣 true -1 4
U+1FAA4 🪤 true -1 5
U+1F18E 🆎 true 6 6
U+1F191 🆑 true 7 7
U+1F192 🆒 true 8 8
U+1F193 🆓 true 9 9

I am thinking a next step is to write a little program to list all of the candidates for ecoji2 that are not used by ecoji1. Then we just need to select the candidates that we want to fill in for the ordinals that ecoji 1 needs replaced.
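A rough sketch of the fill-in approach, assuming the v1 table, the set of invalid code points, and the list of spare candidates are already known. buildV2 is a hypothetical name. The sample data is the first few rows of the table above.

```go
package main

import "fmt"

// buildV2 keeps every valid v1 rune at its original ordinal and fills each
// invalid slot with the next unused spare, so shared emojis decode identically.
func buildV2(v1 []rune, invalid map[rune]bool, spares []rune) ([]rune, error) {
	v2 := make([]rune, len(v1))
	next := 0
	for i, r := range v1 {
		if !invalid[r] {
			v2[i] = r
			continue
		}
		if next >= len(spares) {
			return nil, fmt.Errorf("ran out of spare candidates at ordinal %d", i)
		}
		v2[i] = spares[next]
		next++
	}
	return v2, nil
}

func main() {
	v1 := []rune{0x1F004, 0x1F0CF, 0x1F170, 0x1F171, 0x1F17E, 0x1F17F, 0x1F18E, 0x1F191}
	invalid := map[rune]bool{0x1F170: true, 0x1F171: true, 0x1F17E: true, 0x1F17F: true}
	spares := []rune{0x1FAA1, 0x1FAA2, 0x1FAA3, 0x1FAA4}

	v2, err := buildV2(v1, invalid, spares)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i := range v1 {
		fmt.Printf("%d  v1=%c  v2=%c\n", i, v1[i], v2[i])
	}
}
```

The result is no longer sorted by code point, which is the trade-off proposed in this comment.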

@keith-turner
Owner

I modified my little program here to print which emojis are available and not used by ecoji v1. I also made it determine how many need to be replaced. There are 131 that need to be replaced and 220 available to choose from. Below are all that are available to choose from:

⌚ ⌛ ⏩ ⏪ ⏫ ⏬ ⏰ ⏳ ◽ ◾ ☔ ☕ ♈ ♉ ♊
♋ ♌ ♍ ♎ ♏ ♐ ♑ ♒ ♓ ♿ ⚓ ⚡ ⚪ ⚫ ⚽
⚾ ⛄ ⛅ ⛎ ⛔ ⛪ ⛲ ⛳ ⛵ ⛺ ⛽ ✅ ✊ ✋ ✨
❌ ❎ ❓ ❔ ❕ ❗ ➕ ➖ ➗ ➰ ➿ ⬛ ⬜ ⭐ ⭕
📑 🙋 🛕 🛖 🛗 🛺 🛻 🛼 🟠 🟡 🟢 🟣 🟤 🟥 🟦
🟧 🟨 🟩 🟪 🟫 🤌 🤍 🤎 🤏 🤿 🥱 🥲 🥷 🥸 🥻
🦣 🦤 🦥 🦦 🦧 🦨 🦩 🦪 🦫 🦬 🦭 🦮 🦯 🦺 🦻
🦼 🦽 🦾 🦿 🧃 🧄 🧅 🧆 🧇 🧈 🧉 🧊 🧋 🧍 🧎
🧏 🧖 🧗 🧘 🧙 🧚 🧛 🧜 🧝 🧞 🧟 🧠 🧡 🧢 🧣
🧤 🧥 🧦 🧧 🧨 🧩 🧪 🧫 🧬 🧭 🧮 🧯 🧰 🧱 🧲
🧳 🧴 🧵 🧶 🧷 🧸 🧹 🧺 🧻 🧼 🧽 🧾 🧿 🩰 🩱
🩲 🩳 🩴 🩸 🩹 🩺 🪀 🪁 🪂 🪃 🪄 🪅 🪆 🪐 🪑
🪒 🪓 🪔 🪕 🪖 🪗 🪘 🪙 🪚 🪛 🪜 🪝 🪞 🪟 🪠
🪡 🪢 🪣 🪤 🪥 🪦 🪧 🪨 🪰 🪱 🪲 🪳 🪴 🪵 🪶
🫀 🫁 🫂 🫐 🫑 🫒 🫓 🫔 🫕 🫖
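The "available and not used by v1" list is a set difference. A small sketch, assuming both inputs are slices of code points; the function name unused and the sample values are illustrative only and are not the real tables.

```go
package main

import "fmt"

// unused returns the candidates that do not appear in v1, preserving order.
func unused(candidates, v1 []rune) []rune {
	used := make(map[rune]bool, len(v1))
	for _, r := range v1 {
		used[r] = true
	}
	var out []rune
	for _, r := range candidates {
		if !used[r] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// Tiny sample, not the full candidate list.
	candidates := []rune{0x1F004, 0x1F0CF, 0x1FAA1, 0x1FAA2}
	v1 := []rune{0x1F004, 0x1F0CF, 0x1F170}
	for _, r := range unused(candidates, v1) {
		fmt.Printf("U+%X %c\n", r, r)
	}
}
```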

@keith-turner
Owner

keith-turner commented Jun 21, 2021

I was looking through the list I posted trying to find the ones I liked, and that seemed hard. I found it easier to look for ones I did not like, and found the following.

⚪ ⚫ ❌ ❎ ❓ ❔ ❕ ❗ ➕ ➖ ➗ 🟠 🟡 🟢 🟣 🟤 🟥 🟦 🟧 🟨 🟩 🟪 🟫 ⬛ ⬜ ◽ ◾ ✅

Thought the following were redundant.

⌚ ⌛ ⏰ ⏳ 🤍 🤎 🧡

The following are padding

☕ 🙋📑

I also noticed the padding emoji 🏍 is listed as having two code points.

@robindiddams
Contributor Author

Ah right, because I'm assuming v2 until I hit v1, and since the v2 order is different, they're already incompatible. Duh 🥴. My bad 😅, so we're back to the original plan then! At least now we can choose fun emojis like 🪤! Cool, a little later this week I'll sub in those, skipping the ones you don't like and picking distinctive ones.

@robindiddams
Contributor Author

robindiddams commented Sep 11, 2021

Okay @keith-turner, sorry that took longer than I said 😅. Here's what I've come up with:

Padding

index V1 Emoji (hex) Replacement (hex) (name)
40 ⚜ (269c) 🪴 (1fab4) (Potted Plant)
41 🏍 (1f3cd) 🛼 (1f6fc) (Roller Skate)

Emojis

index V1 Emoji (hex) Replacement (hex) (name)
2 🅰 (1f170) 🦾 (1f9be) (Mechanical Arm)
3 🅱 (1f171) 🦿 (1f9bf) (Mechanical Leg)
4 🅾 (1f17e) 🦻 (1f9bb) (Ear with Hearing Aid)
5 🅿 (1f17f) 🧠 (1f9e0) (Brain)
17 🇦 (1f1e6) 🫀 (1fac0) (Anatomical Heart)
18 🇧 (1f1e7) 🫁 (1fac1) (Lungs)
19 🇨 (1f1e8) 🦧 (1f9a7) (Orangutan)
20 🇩 (1f1e9) 🦮 (1f9ae) (Guide Dog)
21 🇪 (1f1ea) 🦬 (1f9ac) (Bison)
22 🇫 (1f1eb) 🦣 (1f9a3) (Mammoth)
23 🇬 (1f1ec) 🦫 (1f9ab) (Beaver)
24 🇭 (1f1ed) 🦥 (1f9a5) (Sloth)
25 🇮 (1f1ee) 🦦 (1f9a6) (Otter)
26 🇯 (1f1ef) 🦨 (1f9a8) (Skunk)
27 🇰 (1f1f0) 🦤 (1f9a4) (Dodo)
28 🇱 (1f1f1) 🪶 (1fab6) (Feather)
29 🇲 (1f1f2) 🦩 (1f9a9) (Flamingo)
30 🇳 (1f1f3) 🦭 (1f9ad) (Seal)
31 🇴 (1f1f4) 🪲 (1fab2) (Beetle)
32 🇵 (1f1f5) 🪳 (1fab3) (Cockroach)
33 🇶 (1f1f6) 🪰 (1fab0) (Fly)
34 🇷 (1f1f7) 🪱 (1fab1) (Worm)
35 🇸 (1f1f8) 🫐 (1fad0) (Blueberries)
36 🇹 (1f1f9) 🫒 (1fad2) (Olive)
37 🇺 (1f1fa) 🫑 (1fad1) (Bell Pepper)
38 🇻 (1f1fb) 🧄 (1f9c4) (Garlic)
39 🇼 (1f1fc) 🧅 (1f9c5) (Onion)
40 🇽 (1f1fd) 🫓 (1fad3) (Flatbread)
41 🇾 (1f1fe) 🧇 (1f9c7) (Waffle)
42 🇿 (1f1ff) 🫔 (1fad4) (Tamale)
44 🈂 (1f202) 🧆 (1f9c6) (Falafel)
52 🈷 (1f237) 🫕 (1fad5) (Fondue)
91 🌡 (1f321) 🧈 (1f9c8) (Butter)
92 🌤 (1f324) 🦪 (1f9aa) (Oyster)
93 🌥 (1f325) 🫖 (1fad6) (Teapot)
94 🌦 (1f326) 🧋 (1f9cb) (Bubble Tea)
95 🌧 (1f327) 🧃 (1f9c3) (Beverage Box)
96 🌨 (1f328) 🧉 (1f9c9) (Mate Drink)
97 🌩 (1f329) 🧊 (1f9ca) (Ice Cube)
98 🌪 (1f32a) 🧭 (1f9ed) (Compass)
99 🌫 (1f32b) 🧱 (1f9f1) (Brick)
100 🌬 (1f32c) 🪨 (1faa8) (Rock)
110 🌶 (1f336) 🪵 (1fab5) (Wood)
181 🍽 (1f37d) 🛖 (1f6d6) (Hut)
204 🎖 (1f396) ⛪ (26ea) (Church)
205 🎗 (1f397) 🛕 (1f6d5) (Hindu Temple)
206 🎙 (1f399) ⛲ (26f2) (Fountain)
207 🎚 (1f39a) ⛺ (26fa) (Tent)
208 🎛 (1f39b) 🛻 (1f6fb) (Pickup Truck)
209 🎞 (1f39e) 🦽 (1f9bd) (Manual Wheelchair)
210 🎟 (1f39f) 🦼 (1f9bc) (Motorized Wheelchair)
254 🏋 (1f3cb) 🛺 (1f6fa) (Auto Rickshaw)
255 🏌 (1f3cc) ⛽ (26fd) (Fuel Pump)
256 🏎 (1f3ce) ⚓ (2693) (Anchor)
262 🏔 (1f3d4) ⛵ (26f5) (Sailboat)
263 🏕 (1f3d5) 🪂 (1fa82) (Parachute)
264 🏖 (1f3d6) 🧳 (1f9f3) (Luggage)
265 🏗 (1f3d7) 🪐 (1fa90) (Ringed Planet)
266 🏘 (1f3d8) ⭐ (2b50) (White Medium Star)
267 🏙 (1f3d9) ⛅ (26c5) (Sun Behind Cloud)
268 🏚 (1f3da) ☔ (2614) (Umbrella with Rain Drops)
269 🏛 (1f3db) ⚡ (26a1) (High Voltage Sign)
270 🏜 (1f3dc) ⛄ (26c4) (Snowman Without Snow)
271 🏝 (1f3dd) 🧨 (1f9e8) (Firecracker)
272 🏞 (1f3de) ✨ (2728) (Sparkles)
273 🏟 (1f3df) 🧧 (1f9e7) (Red Envelope)
291 🏳 (1f3f3) ⚽ (26bd) (Soccer Ball)
293 🏵 (1f3f5) ⚾ (26be) (Baseball)
294 🏷 (1f3f7) ⛳ (26f3) (Flag in Hole)
298 🏻 (1f3fb) 🤿 (1f93f) (Diving Mask)
299 🏼 (1f3fc) 🪀 (1fa80) (Yo-Yo)
300 🏽 (1f3fd) 🪁 (1fa81) (Kite)
301 🏾 (1f3fe) 🪄 (1fa84) (Magic Wand)
302 🏿 (1f3ff) 🧿 (1f9ff) (Nazar Amulet)
366 🐿 (1f43f) 🧩 (1f9e9) (Jigsaw Puzzle Piece)
368 👁 (1f441) 🧸 (1f9f8) (Teddy Bear)
555 📽 (1f4fd) 🪅 (1fa85) (Pinata)
619 🕉 (1f549) 🪆 (1fa86) (Nesting Dolls)
620 🕊 (1f54a) 🧵 (1f9f5) (Spool of Thread)
649 🕯 (1f56f) 🪡 (1faa1) (Sewing Needle)
650 🕰 (1f570) 🧶 (1f9f6) (Ball of Yarn)
651 🕳 (1f573) 🪢 (1faa2) (Knot)
652 🕴 (1f574) 🦺 (1f9ba) (Safety Vest)
653 🕵 (1f575) 🧣 (1f9e3) (Scarf)
654 🕶 (1f576) 🧤 (1f9e4) (Gloves)
655 🕷 (1f577) 🧥 (1f9e5) (Coat)
656 🕸 (1f578) 🧦 (1f9e6) (Socks)
657 🕹 (1f579) 🥻 (1f97b) (Sari)
659 🖇 (1f587) 🩱 (1fa71) (One-Piece Swimsuit)
660 🖊 (1f58a) 🩲 (1fa72) (Briefs)
661 🖋 (1f58b) 🩳 (1fa73) (Shorts)
662 🖌 (1f58c) 🩴 (1fa74) (Thong Sandal)
663 🖍 (1f58d) 🩰 (1fa70) (Ballet Shoes)
664 🖐 (1f590) 🥱 (1f971) (Yawning Face)
668 🖥 (1f5a5) 🧢 (1f9e2) (Billed Cap)
669 🖨 (1f5a8) 🪖 (1fa96) (Military Helmet)
670 🖱 (1f5b1) 🪗 (1fa97) (Accordion)
671 🖲 (1f5b2) 🪕 (1fa95) (Banjo)
672 🖼 (1f5bc) 🪘 (1fa98) (Long Drum)
673 🗂 (1f5c2) 🧮 (1f9ee) (Abacus)
674 🗃 (1f5c3) 🪔 (1fa94) (Diya Lamp)
675 🗄 (1f5c4) 🪙 (1fa99) (Coin)
676 🗑 (1f5d1) 🧾 (1f9fe) (Receipt)
677 🗒 (1f5d2) 🪓 (1fa93) (Axe)
678 🗓 (1f5d3) 🪃 (1fa83) (Boomerang)
679 🗜 (1f5dc) 🪚 (1fa9a) (Carpentry Saw)
680 🗝 (1f5dd) 🪛 (1fa9b) (Screwdriver)
681 🗞 (1f5de) 🦯 (1f9af) (Probing Cane)
682 🗡 (1f5e1) 🪝 (1fa9d) (Hook)
683 🗣 (1f5e3) 🧰 (1f9f0) (Toolbox)
684 🗨 (1f5e8) 🧲 (1f9f2) (Magnet)
685 🗯 (1f5ef) 🪜 (1fa9c) (Ladder)
686 🗳 (1f5f3) 🧪 (1f9ea) (Test Tube)
687 🗺 (1f5fa) 🧫 (1f9eb) (Petri Dish)
842 🛋 (1f6cb) 🧬 (1f9ec) (DNA Double Helix)
844 🛍 (1f6cd) 🩸 (1fa78) (Drop of Blood)
845 🛎 (1f6ce) 🩹 (1fa79) (Adhesive Bandage)
846 🛏 (1f6cf) 🩺 (1fa7a) (Stethoscope)
850 🛠 (1f6e0) 🪞 (1fa9e) (Mirror)
851 🛡 (1f6e1) 🪟 (1fa9f) (Window)
852 🛢 (1f6e2) 🪑 (1fa91) (Chair)
853 🛣 (1f6e3) 🪠 (1faa0) (Plunger)
854 🛤 (1f6e4) 🪤 (1faa4) (Mouse Trap)
855 🛥 (1f6e5) 🪒 (1fa92) (Razor)
856 🛩 (1f6e9) 🧴 (1f9f4) (Lotion Bottle)
859 🛰 (1f6f0) 🥲 (1f972) (Smiling Face with Tear)
860 🛳 (1f6f3) 🥸 (1f978) (Disguised Face)
1005 🦰 (1f9b0) 🧷 (1f9f7) (Safety Pin)
1006 🦱 (1f9b1) 🧹 (1f9f9) (Broom)
1007 🦲 (1f9b2) 🧺 (1f9fa) (Basket)
1008 🦳 (1f9b3) 🧻 (1f9fb) (Roll of Paper)

You can view the full list here: https://gist.github.com/robindiddams/943202dbc129f16b64f2113ea91ce180

I conducted your same test (but with different numbers):

❯ echo 0450 | xxd -r -ps | ecoji1
🇦🏎☕☕

❯ echo 0450 | xxd -r -ps | ecoji2
🫀⚓☕☕

❯ echo 0450 | xxd -r -ps | ecoji2 | ecoji2 -d | xxd -ps
0450

❯ echo 0450 | xxd -r -ps | ecoji1 | ecoji2 -d | xxd -ps
0450

Since 2 of the padding chars changed, the code had to change a bit more too. I want to write some unit tests that would cover decoding something with each one, but I'm also not that great at computer math, so I might need some help generating an encoded string for each 😅.
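One possible way to generate such test inputs is to pack four 10-bit ordinals into a 5-byte block and feed that block to the encoder. packOrdinals is a hypothetical helper, not something from this PR. The ordinals 17 and 256 are the index values listed above for 🇦 and 🏎, so the result starts with the same 0450 used in the shell test.

```go
package main

import "fmt"

// packOrdinals packs four 10-bit ordinals (0..1023) into 5 bytes, most
// significant bits first. Values above 1023 are masked to 10 bits.
func packOrdinals(o [4]uint16) [5]byte {
	var bits uint64
	for _, v := range o {
		bits = bits<<10 | uint64(v&0x3ff)
	}
	var out [5]byte
	for i := 4; i >= 0; i-- {
		out[i] = byte(bits)
		bits >>= 8
	}
	return out
}

func main() {
	block := packOrdinals([4]uint16{17, 256, 0, 0})
	fmt.Printf("%x\n", block[:]) // prints 0450000000
}
```

Looping the first ordinal over every replaced index would give one input per replaced emoji. Inputs shorter than 5 bytes involve the padding emojis, which this sketch does not model.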

@keith-turner
Owner

> Okay @keith-turner sorry that took longer than I said

That happens. I should be able to look at this again this weekend.

Owner

@keith-turner keith-turner left a comment

@robindiddams this is looking really nice, I partially reviewed the updates. I analyzed the new emojis.txt and emojisv1.txt and the changes look really good. I also looked at the code, but did not complete reviewing it. I have not had a chance to think through the padding changes, I will do that when I circle back to look at this again in a few days.

} else {
	if ok := checkRune(c); !ok {
		// try to fallback to ecoji v1
		if isV1 := checkRuneV1(c); isV1 {
Owner

This may need to be more strict for the following case.

  1. See a rune that is only used by ecoji v2
  2. See a rune that is only used by ecoji v1

For the above sequence of events, probably want to return an error. I don't think it currently does, but not completely sure.

In other programming languages I might use an enum type for this instead of a bool, but I don't think Go has enums. Could have two bools instead of one, like ecojiV1 *bool, ecojiV2 *bool, and whenever a rune is seen that is only used by v1 or v2, set the corresponding one to true. Once one is set to true, do not expect to see runes only used by the other.
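A sketch of the stricter check, using a small integer type with iota constants as a stand-in for an enum. Everything here (the version type, observe, errMixed) is hypothetical and is not the code that ended up in commit 7c0dda1. The two boolean arguments stand for "this rune is in the v1 alphabet" and "this rune is in the v2 alphabet".

```go
package main

import (
	"errors"
	"fmt"
)

// version records which alphabet the input has committed to so far.
type version int

const (
	unknown version = iota // only shared runes seen so far
	v1Only                 // a rune exclusive to v1 was seen
	v2Only                 // a rune exclusive to v2 was seen
)

var errMixed = errors.New("input mixes runes exclusive to ecoji v1 and v2")

// observe updates the state for one rune and rejects a v1-only rune after a
// v2-only rune, or the reverse.
func (v *version) observe(inV1, inV2 bool) error {
	switch {
	case inV1 && inV2:
		return nil // shared rune, tells us nothing
	case !inV1 && !inV2:
		return errors.New("rune is in neither alphabet")
	case inV1 && *v == v2Only, inV2 && *v == v1Only:
		return errMixed
	case inV1:
		*v = v1Only
	default:
		*v = v2Only
	}
	return nil
}

func main() {
	var v version
	fmt.Println(v.observe(false, true)) // v2-only rune
	fmt.Println(v.observe(true, true))  // shared rune
	fmt.Println(v.observe(true, false)) // v1-only rune after v2-only
}
```

The third call returns errMixed, which is the sequence from the numbered list above.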

Owner

@robindiddams I pushed a new commit 7c0dda1 to the ecojiv2 branch to attempt to handle this case.

Contributor Author

okay awesome!

@keith-turner keith-turner merged commit e0f67ea into keith-turner:ecojiv2 Oct 8, 2021
@keith-turner
Owner

@robindiddams I merged this into the ecojiv2 branch so I can make follow-on changes. Please feel free to submit more pull requests against the ecojiv2 branch. I accidentally typed the preceding message in the commit message 😆 I may try to fix that real quick.

keith-turner pushed a commit that referenced this pull request Oct 8, 2021
@keith-turner
Owner

I fixed the commit message by force pushing a new commit (85f4673) to the ecojiv2 branch

@keith-turner
Owner

@robindiddams I have been doing a lot of work in the ecojiv2 branch over the past few days. I was just looking over the new emojis that you selected for ecoji v2. I was looking at some that you did not select that I thought were interesting, which I listed in the following table.

Code Emoji Description
U+1F977 🥷 Ninja
U+1F9D9 🧙 Mage
U+1F9DA 🧚 Fairy
U+1F9DB 🧛 Vampire
U+1F9DD 🧝 Elf
U+1F9DC 🧜 Merperson
U+1F9DE 🧞 Genie
U+1F9DF 🧟 Zombie

Then I looked for ones in ecoji v2 that I thought were the least interesting that could possibly be replaced with the above. I listed those below.

Code Emoji Description
U+1FAB4 🪴 potted plant
U+2728 ✨ sparkles
U+1F9F1 🧱 brick
U+1F9F4 🧴 lotion bottle
U+1F9FF 🧿 nazar amulet
U+1FA78 🩸 drop of blood
U+1FAB3 🪳 cockroach

Do you have any thoughts on making these replacements?

I feel like I am getting close to wrapping up the things I want to do on the ecoji v2 branch. The remaining things I want to do are review the new emojis in detail (I did a first pass at the start of this post and still want to do a 2nd pass) and update the documentation (readme and docs dir).

@robindiddams
Contributor Author

Yeah I actually avoided all the people emojis since they are almost always unioned to a skin tone and/or gender modifier so I preferred emojis that were more standalone recognizable. It certainly wouldn't be an issue to include them though.

Also I really like the sparkles and brick emojis, but I know brick looks weird on Windows, so I get it if you wanna drop 'em 🤷‍♀️.

@keith-turner
Owner

Thinking about the following smaller set of replacements, with the ninja and zombie being my two favorites. I dropped some others after reading your comment, since they seem pretty similar to other existing emojis.

Code Emoji Description Replacement code Replacement emoji Description
U+1FAB4 🪴 potted plant U+1F977 🥷 Ninja
U+1F9F4 🧴 lotion bottle U+1F9D9 🧙 Mage
U+1F9FF 🧿 nazar amulet U+1F9DB 🧛 Vampire
U+1FA78 🩸 drop of blood U+1F9DE 🧞 genie
U+1FAB3 🪳 cockroach U+1F9DF 🧟 Zombie

> Yeah I actually avoided all the people emojis since they are almost always unioned to a skin tone and/or gender modifier so I preferred emojis that were more standalone recognizable.

There are a lot of existing people emojis in use, I looked over the existing set of emojis and AFAICT the above people emojis are distinguishable from the existing people except for the vampire. Mulling over dropping the vampire as a replacement.

If the vampire is dropped could possibly do the following

Code Emoji Description Replacement code Replacement emoji Description
U+1FAB4 🪴 potted plant U+1F977 🥷 Ninja
U+1F9F4 🧴 lotion bottle U+1F9D9 🧙 Mage
U+1F9FF 🧿 nazar amulet U+1F9DE 🧞 genie
U+1FA78 🩸 drop of blood U+1F9DF 🧟 Zombie

> Also I really like the sparkles and brick emojis

Well lets keep those then.

@robindiddams
Contributor Author

okay sounds good! I like the replacements!

> Well let's keep those then.

woo hoo! thanks man! 👌

Since this PR is done, do you want to go back to tracking things in #27? If there's more work you wanna delegate, I'm happy to take it on.

@dcow

dcow commented Nov 4, 2021

It seems like the scope of this issue has expanded beyond simply replacing "non emoji" (if this issue is not the right place I can open a new issue for v2). I have a few suggestions.

I would consider also replacing:

6 🆎 (1f18e) -
7 🆑 (1f191) -
8 🆒 (1f192) -
9 🆓 (1f193) -
10 🆔 (1f194) -
11 🆕 (1f195) -
12 🆖 (1f196) -
13 🆗 (1f197) -
14 🆘 (1f198) -
15 🆙 (1f199) -
16 🆚 (1f19a) -

and

45 🈚 (1f21a) -
46 🈯 (1f22f) -
47 🈲 (1f232) -
48 🈳 (1f233) -
49 🈴 (1f234) -
50 🈵 (1f235) -
51 🈶 (1f236) -

and

53 🈸 (1f238) -
54 🈹 (1f239) -
55 🈺 (1f23a) -
56 🉐 (1f250) -
57 🉑 (1f251) -
281 🏧 (1f3e7) -
488 💹 (1f4b9) -
546 📴 (1f4f4) -
582 🔙 (1f519) -
583 🔚 (1f51a) -
584 🔛 (1f51b) -
585 🔜 (1f51c) -
586 🔝 (1f51d) -
587 🔞 (1f51e) -
588 🔟 (1f51f) -
589 🔠 (1f520) -
590 🔡 (1f521) -
591 🔢 (1f522) -
592 🔣 (1f523) -
593 🔤 (1f524) -

The rationale is that these cannot be described in a language agnostic way. You have to know the language they're representing to accurately describe them. How do you tell someone to enter 🈹 or 🆓 over the phone if you're not a native speaker of either language, for instance?

Similarly I think it is hard to describe the difference between the following sets:

64 🌆 (1f306) -
65 🌇 (1f307) -
76 🌒 (1f312) -
77 🌓 (1f313) -
78 🌔 (1f314) -
80 🌖 (1f316) -
-- -- --
81 🌗 (1f317) -
82 🌘 (1f318) -
216 🎥 (1f3a5) -
217 🎦 (1f3a6) -
190 🎆 (1f386) -
191 🎇 (1f387) -

(^possible color blindness issues)

798 🚚 (1f69a) -
799 🚛 (1f69b) -
781 🚉 (1f689) -
782 🚊 (1f68a) -
769 🙍 (1f64d) -
770 🙎 (1f64e) -
607 🔲 (1f532) -
608 🔳 (1f533) -
609 🔴 (1f534) -
610 🔵 (1f535) -
611 🔶 (1f536) -
612 🔷 (1f537) -
613 🔸 (1f538) -
614 🔹 (1f539) -
615 🔺 (1f53a) -
616 🔻 (1f53b) -
617 🔼 (1f53c) -
618 🔽 (1f53d) -
562 🔅 (1f505) -
563 🔆 (1f506) -
564 🔇 (1f507) -
565 🔈 (1f508) -
566 🔉 (1f509) -
567 🔊 (1f50a) -
557 🔀 (1f500) -
-- -- --
558 🔁 (1f501) -
559 🔂 (1f502) -
560 🔃 (1f503) -
561 🔄 (1f504) -
691 🗾 (1f5fe) -
452 💕 (1f495) -
461 💞 (1f49e) -
-- -- --
446 💏 (1f48f) -
448 💑 (1f491) -
-- -- --

Should note that not all of these

There are also a few that I don't understand why were replaced, like:

91 🌡 (1f321) 🧈 (1f9c8) (Butter)
98 🌪 (1f32a) 🧭 (1f9ed) (Compass)
181 🍽 (1f37d) 🛖 (1f6d6) (Hut)
204 🎖 (1f396) ⛪ (26ea) (Church)
205 🎗 (1f397) 🛕 (1f6d5) (Hindu Temple)
206 🎙 (1f399) ⛲ (26f2) (Fountain)
207 🎚 (1f39a) ⛺ (26fa) (Tent)
208 🎛 (1f39b) 🛻 (1f6fb) (Pickup Truck)
209 🎞 (1f39e) 🦽 (1f9bd) (Manual Wheelchair)
210 🎟 (1f39f) 🦼 (1f9bc) (Motorized Wheelchair)
366 🐿 (1f43f) 🧩 (1f9e9) (Jigsaw Puzzle Piece)
620 🕊 (1f54a) 🧵 (1f9f5) (Spool of Thread)
675 🗄 (1f5c4) 🪙 (1fa99) (Coin)
676 🗑 (1f5d1) 🧾 (1f9fe) (Receipt)
655 🕷 (1f577) 🧥 (1f9e5) (Coat)
656 🕸 (1f578) 🧦 (1f9e6) (Socks)
659 🖇 (1f587) 🩱 (1fa71) (One-Piece Swimsuit)
682 🗡 (1f5e1) 🪝 (1fa9d) (Hook)
681 🗞 (1f5de) 🦯 (1f9af) (Probing Cane)
294 🏷 (1f3f7) ⛳ (26f3) (Flag in Hole)
273 🏟 (1f3df) 🧧 (1f9e7) (Red Envelope)

These are all easy to describe and visually distinct. What was the rationale for why they needed replacement?

Finally here's an interesting one

93 🌥 (1f325) 🫖 (1fad6) (Teapot)

Sun behind cloud gets replaced with teapot, but then

267 🏙 (1f3d9) ⛅ (26c5) (Sun Behind Cloud)

night skyline gets replaced with a different sun behind cloud.

There are probably some more to pick through but figure I should get thoughts on this much first.

@keith-turner
Owner

@dcow after reading your comment I created docs/emojis.md which is automatically generated by markdown_test.go.

I think @robindiddams removed some of the emojis that were used in ecoji V1 because they were not single code point fully qualified emojis. At the end of docs/emojis.md there is a section about candidates that were not used. I need to reconcile all of this w/ your comment but will have to do that later. In general we have a few options for changing the set of emojis for ecoji v2.

  • Replace emojis from the set of candidates
  • Expand the set of candidates and then replace emojis from the expanded set.

I need to go back and look over your suggestions and see where they fall. Also maybe your comment deserves its own issue.

@keith-turner
Owner

@dcow I added a section to the end of emojis.md that shows the emojis in ecoji V1 that were not in the set of candidates.

@keith-turner
Owner

> I would consider also replacing:

@dcow I looked through what you posted in that section and I agree those are not interesting at all and would be nice to replace. I looked through the available unused candidates for ones I thought were interesting and came up with the following. I think you identified ~38 emojis that would be nice to replace in your first section. I think there are ~42 unused candidates below, so maybe we could swap those.

Codepoint Emoji
U+270B ✋
U+1F90C 🤌
U+1F90F 🤏
U+270A ✊
U+1F9CF 🧏
U+1F9DA 🧚
U+1F9DB 🧛
U+1F9DC 🧜
U+1F9DD 🧝
U+1F9CD 🧍
U+1F9CE 🧎
U+1F9D6 🧖
U+1F9D7 🧗
U+1F9D8 🧘
U+1FAC2 🫂
U+1FAB4 🪴
U+1F9FF 🧿
U+1FA78 🩸
U+1F6D7 🛗
U+1F9F4 🧴
U+1FAA3 🪣
U+1F9FC 🧼
U+1FAA5 🪥
U+1F9FD 🧽
U+1F9EF 🧯
U+1FAA6 🪦
U+1FAA7 🪧
U+267F ♿

The following unused candidates are less interesting than those above IMO but more interesting than the ones you identified.

Codepoint Emoji
U+26D4 ⛔
U+2648 ♈
U+2649 ♉
U+264A ♊
U+264B ♋
U+264C ♌
U+264D ♍
U+264E ♎
U+264F ♏
U+2650 ♐
U+2651 ♑
U+2652 ♒
U+2653 ♓
U+26CE ⛎

keith-turner added a commit that referenced this pull request Nov 19, 2021
Based on the comment @dcow made on #29, modified the emojis used for Ecoji V2.

$ sed --file replace.sed -i emojisV2.txt

$ cat replace.sed
s/1f18e/270b/
s/1f191/1f90c/
s/1f192/1f90f/
s/1f193/270a/
s/1f194/1f9cf/
s/1f195/1f9da/
s/1f196/1f9db/
s/1f197/1f9dc/
s/1f198/1f9dd/
s/1f199/1f9cd/
s/1f19a/1f9ce/
s/1f21a/1f9d6/
s/1f22f/1f9d7/
s/1f232/1f9d8/
s/1f233/1fac2/
s/1f234/1fab4/
s/1f235/1f9ff/
s/1f236/1fa78/
s/1f238/1f6d7/
s/1f239/1f9f4/
s/1f23a/1faa3/
s/1f250/1f9fc/
s/1f251/1faa5/
s/1f3e7/1f9fd/
s/1f4b9/1f9ef/
s/1f4f4/1faa6/
s/1f519/1faa7/
s/1f51a/267f/
s/1f51b/26d4/
s/1f51c/2648/
s/1f51d/2649/
s/1f51e/264a/
s/1f51f/264b/
s/1f520/264c/
s/1f521/264d/
s/1f522/264e/
s/1f523/264f/
s/1f524/2650/
s/1f538/2651/
s/1f539/2652/
s/1f53a/2653/
s/1f53b/26ce/
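A bulk substitution like this can introduce a duplicate if a replacement code point is already present elsewhere in the list, and a duplicate would make decoding ambiguous. A hedged sketch of a sanity check follows, assuming the entries are lowercase hex strings that are compared whole. The actual layout of emojisV2.txt is not shown in this thread, and applyReplacements is a hypothetical helper.

```go
package main

import "fmt"

// applyReplacements swaps whole entries according to repl and returns an
// error if the resulting list contains the same code point twice.
func applyReplacements(codes []string, repl map[string]string) ([]string, error) {
	out := make([]string, len(codes))
	seen := make(map[string]int, len(codes))
	for i, c := range codes {
		if r, ok := repl[c]; ok {
			c = r
		}
		if j, dup := seen[c]; dup {
			return nil, fmt.Errorf("code point %s appears at ordinals %d and %d", c, j, i)
		}
		seen[c] = i
		out[i] = c
	}
	return out, nil
}

func main() {
	// Illustrative data: 270b is deliberately listed twice after replacement.
	codes := []string{"1f004", "1f0cf", "1f18e", "1f191", "270b"}
	repl := map[string]string{"1f18e": "270b", "1f191": "1f90c"}
	if _, err := applyReplacements(codes, repl); err != nil {
		fmt.Println("rejected:", err)
	}
}
```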
@keith-turner keith-turner mentioned this pull request Sep 4, 2022

Successfully merging this pull request may close these issues.

Replace non-emojis with actual emojis
3 participants