Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace emoji short codes with unicode representation #3799

Closed
JamieMagee opened this issue May 22, 2019 · 32 comments · Fixed by #4000 or #4409
Closed

Replace emoji short codes with unicode representation #3799

JamieMagee opened this issue May 22, 2019 · 32 comments · Fixed by #4000 or #4409
Labels
priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others

Comments

@JamieMagee
Copy link
Contributor

Emoji shortcodes don't render in all instances, specifically in emails. Additionally, they're mostly to help humans in typing emoji, without having to find the exact Unicode character.

I would like to replace the emoji shortcodes, with the actual Unicode code point. so :vertical_traffic_light: becomes 🚦, :recycle: becomes ♻️.

Thoughts?

@rarkins
Copy link
Collaborator

rarkins commented May 22, 2019

Sounds ok to me if it renders everywhere

@rarkins rarkins added priority-4-low Low priority, unlikely to be done unless it becomes important to more people ready type:refactor Refactoring or improving of existing code labels Jun 4, 2019
@rarkins
Copy link
Collaborator

rarkins commented Jun 22, 2019

@JamieMagee I noticed on GitHub that emoji shortcodes look best in commit messages (because they're text) but unicode would be better for PR titles, because otherwise GitHub doesn't render them. I was considering an option like "Replace emoji with unicode in PR titles" to solve this, but would this be compatible with Azure DevOps too?

@JamieMagee
Copy link
Contributor Author

@rarkins I think it would be simplest to just use unicode everywhere. Git is designed with UTF-8 support everywhere. From https://git-scm.com/docs/git-commit

Commit log messages are typically encoded in UTF-8, but other extended ASCII encodings are also supported.

Additionally most terminals support unicode, so displaying emoji as a developer should be supported.

The main issues I had with using emoji shortcodes is that they weren't supported everywhere. The 2 cases that affected me were in emails, and in Microsoft Teams notifications.

@renovate-bot
Copy link
Collaborator

🎉 This issue has been resolved in version 18.21.8 🎉

The release is available on:

Your semantic-release bot 📦🚀

@nvdbleek
Copy link

nvdbleek commented Jul 4, 2019

@rarkins would you be open for a PR that makes replacing emoji shortcodes with their unicode equivalent configurable? In older versions of bitbucket server (together with older version of MysQL) there is a problem with 4 byte Unicode Characters, which results in a 500 error when creating the PR.

@JamieMagee
Copy link
Contributor Author

@nvdbleek This seems like a reasonable request.

Out of interest, do you know which versions of Bitbucket Server/MySQL this affects? I can't find any detailed information for Bitbucket Server specifically, but I found this for Fisheye, which suggests MySQL 5.7.7, and this page suggesting that uft8mb4 was supported in MySQL 5.5

@nvdbleek
Copy link

nvdbleek commented Jul 4, 2019

@JamieMagee uft8mb4 is supported in MySQL 5.5, but you get into trouble because of the max key length of 767 bytes. Bitbucket has keys of 255 characters and 1020 bytes (4x255) is greater than the maximum of 767... So you can't use uft8mb4, or at least I couldn't get it to work ...

@JamieMagee
Copy link
Contributor Author

@nvdbleek Thanks for the explanation. I see that the limit was raised to 3072 bytes in MySQL 5.7, which I assume is why Fisheye requires 5.7.7 or higher.

@rarkins
Copy link
Collaborator

rarkins commented Jul 4, 2019

@JamieMagee should we revert in the meantime?

@JamieMagee
Copy link
Contributor Author

@rarkins Yeah, a revert is the quickest fix. I'm heading on vacation today, so I definitely won't have time to implement the configuration.

@rarkins
Copy link
Collaborator

rarkins commented Jul 4, 2019

I’m on vacation too so I think I’ll go with a revert today and then we reimplement with some kind of mapping later.

@nvdbleek
Copy link

nvdbleek commented Jul 4, 2019

@rarkins and @JamieMagee I would be happy to help and create a PR, but I will need some guidance on a config option name. Then I could make the change, create a test-case and create the PR.

@rarkins
Copy link
Collaborator

rarkins commented Jul 4, 2019

I think it might be named something like “emojiType”, default to value “shortcode” and have a second option “utf8”.

Starting with a revert seems best for now though.

@nvdbleek
Copy link

nvdbleek commented Jul 4, 2019

@rarkins are you going to re-open this issue? Then I can make the PR to rollback and attach it to this issue. And this evening I will start working on the "emojiType " config option (don't have a lot of time this evening, I might finish it this weekend)

@rarkins rarkins reopened this Jul 4, 2019
@rarkins rarkins removed priority-4-low Low priority, unlikely to be done unless it becomes important to more people ready type:refactor Refactoring or improving of existing code labels Jul 4, 2019
@nvdbleek
Copy link

nvdbleek commented Jul 7, 2019

sorry, I have to do more work for my day-job this weekend then I was anticipating... Will try to work on this next week, or otherwise the week after next week when I'm on vacation then I will have for sure more time available...

@rarkins
Copy link
Collaborator

rarkins commented Jul 8, 2019

We probably don't need to make this configurable, at least until we're presented with a compelling reason. For now it should be OK to use unicode for all except Bitbucket Server. So the question is - what's the best way to do that?

A nice way to be to use Bitbucket Server's getPrBody to massage unicode chars to emojis, including possibly with this library: https://github.com/iamcal/js-emoji

However I think for that to work then we need to ourselves determine and extract the emoji characters, e.g. something like this:

let output = '';
prBody.split('').forEach(char => {
  if (isEmoji(char)) {
    output += emoji.replace_unified(char);
  } else {
    output += char;
  }
});

@viceice
Copy link
Member

viceice commented Jul 11, 2019

Bitbucket server is working fine with unicode here. so i would prefer a config switch.

BBS: 6.4.0
SQL: MS SQL Server 2014 SP3 CU2 (SQL_Latin1_General_CP437_CI_AI)

@viceice
Copy link
Member

viceice commented Jul 11, 2019

as the bitbucket-server docu says, there is no unicode support with mysql for all versions:

image

https://confluence.atlassian.com/bitbucketserver/connecting-bitbucket-server-to-mysql-776640382.html

@viceice
Copy link
Member

viceice commented Jul 11, 2019

@nvdbleek any chance to switch the database to postgres or msssql or anything else with unicode support? This is a really bad blocker because we need a unicode replacement for the | char in markdown table

@JamieMagee
Copy link
Contributor Author

JamieMagee commented Jul 11, 2019

@viceice I don't think that's an issue. utf8 data type supports up to 3 bytes per character, and | is a single byte character in utf8 (0x7C). The problem is all emoji, like 🚦 are 4 bytes (In this case 0xF0 0x9F 0x9A 0xA6) and that requires use of utf8mb4, which is apparently not supported by Bitbucket Server.

@viceice
Copy link
Member

viceice commented Jul 12, 2019

@JamieMagee They are supported by Bitbucket Server, but not with MySQL.

I'm using emojis without problems on MS SQL Server.
I think the problem is, that Bitbucket supports the ancient MySQL 5.5 which has the index limit.
Also they only support the Connector/J 5.1, which is very old too, but supports utf8mb4

So it should be possible to use emojis, if you use Connector/J 5.1.47 with MySQL 5.7 / MariaDB 10.2 or later. (This versions comes with InnoDB 5.7 with expaned index length)

I know that the pipe is a single byte char. I've found another bug with markdown tables.

| head | test | e |
| --- | --- | --- |
| unescaped | `^7.0.0 | ^8.0.0` | test it |
| escaped | `^7.0.0 \| ^8.0.0` | test it |
head test e
unescaped `^7.0.0 ^8.0.0`
escaped ^7.0.0 | ^8.0.0 test it

So we have to escape the pipe.

As simple solution @rarkins and i thougt that we can simply replace the pipe with an unicode char, which looks like a pipe. But all alternatives require 4 bytes.

I will build some docker images to test emojis with newer mysql/mariadb support.

/edit: added issue link

@rarkins
Copy link
Collaborator

rarkins commented Jul 12, 2019

Our idea is:

  • Massage the markdown to make sure we don't break table columns in each platform
  • For platforms that display the escaped pipe (incorrectly), replace the \| with a unicode pipe lookalike

@nvdbleek
Copy link

@viceice I tried changing the collation of an old MySQL DB to mb4 UTF-8 and I can confirm that there is a problem with the max key length of 767 bytes. Bitbucket has keys of 255 characters and 1020 bytes (4x255) is greater than the maximum of 767. This limit is indeed bumped in MySQL 5.7. Atlassian even documents how you can change the collation to a 4-byte UTF-8 Unicode encoding on https://confluence.atlassian.com/kb/how-to-fix-the-collation-and-character-set-of-a-mysql-database-744326173.html

An upgrade to MySQL 5.7 for my database is on our roadmap. But I suspect that I'm not the only person that is running Bitbucket on an older MySQL. So the question is does Renovate wants to support those installations? If the answer is yes, we have to decide how we want to handle this. I like the proposal of @rarkins to massage Unicode chars to emoji Bitbucket Server's getPrBody. I don't see any substantial downsides of not using Unicode chars for Prs in Bitbucket (it is not that much extra code and it is also only a couple of bytes extra in the DB).

If we can agree on an approach I'm volunteering to create a GitHub PR.

@rarkins
Copy link
Collaborator

rarkins commented Jul 12, 2019

Given that Bitbucket Server (now) recommends against MySQL, and MySQL 5.7 was apparently released in 2013, I definitely have some doubts about how widespread this combination is amongst current or future Renovate users.

The main downsides for supporting unicode massaging are:

  • Having to maintain an extra config option for this (assuming we can't autodetect the DB in use)
  • Always having to think twice before using unicode anywhere else, e.g. the pipe substitution issue, which is not related to emoji

@viceice
Copy link
Member

viceice commented Jul 12, 2019

The escape bug in bitbucket server will be fixed in near future: commonmark/commonmark-java#149
hopefully not so long future 😄

@viceice
Copy link
Member

viceice commented Jul 12, 2019

Using initial db with utf8mb4_bin not supported, trying to convert later from utf8_bin
image

@viceice
Copy link
Member

viceice commented Jul 12, 2019

I've tried to create a sample with mysql 5.7: https://github.com/ViceIce/renovate-docker

I can't create a repository 😱 used utf8_bin as db collation

image

@nvdbleek
Copy link

@viceice and what if you change it to CHARACTER SET utf8mb4 COLLATE utf8mb4_bin ? The 4 byte encoding version?

@nvdbleek
Copy link

@viceice sorry, you first tried utf8mb4_bin, I missed that comment...

@viceice
Copy link
Member

viceice commented Jul 12, 2019

@nvdbleek install not possible, after install i can't convert column collation ecause of a key contraint

the error could be from docker for windows local path mapping, i will try a docker-volume later.

@viceice
Copy link
Member

viceice commented Jul 12, 2019

I'll also try Automated setup for Bitbucket Server

@rarkins rarkins added in progress priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others labels Jul 22, 2019
@renovate-bot
Copy link
Collaborator

🎉 This issue has been resolved in version 19.51.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others
Projects
None yet
5 participants