Backwards compatibility with non-emojified character sets #1

pento opened this issue Feb 8, 2015 · 3 comments

commented Feb 8, 2015

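Some MySQL character sets don't support emoji. MySQL's utf8 character set stores at most 3 bytes per character, so 4-byte characters such as most emoji can only be saved in utf8mb4 columns. It's a fairly sad state of affairs. If we convert the emoji to their HTML entity equivalents, we can make everyone happy!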
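To make the byte-count problem concrete: emoji outside the Basic Multilingual Plane (U+10000 and above) take 4 bytes in UTF-8, one more than MySQL's utf8 allows per character. Below is a minimal sketch, assuming PHP 7+ and PCRE's u modifier. needs_utf8mb4() is a hypothetical helper written for illustration only, not part of this plugin.

```php
<?php
/**
 * Hypothetical helper: returns true if $text contains any character
 * outside the Basic Multilingual Plane (U+10000 and above). Those
 * characters take 4 bytes in UTF-8, which MySQL's 3-byte utf8 charset
 * cannot store.
 */
function needs_utf8mb4( string $text ): bool {
    return 1 === preg_match( '/[\x{10000}-\x{10FFFF}]/u', $text );
}

$smiley = "\u{1F600}"; // GRINNING FACE, a 4-byte emoji
$heart  = "\u{2764}";  // HEAVY BLACK HEART, 3 bytes in UTF-8

echo strlen( $smiley ), "\n"; // 4
echo strlen( $heart ), "\n";  // 3
var_dump( needs_utf8mb4( $smiley ) ); // bool(true)
var_dump( needs_utf8mb4( $heart ) );  // bool(false)
```

Characters like U+2764 already fit in utf8, so only the 4-byte range needs encoding.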

Here's a function I wrote to do it:

/**
 * Convert any 4-byte emoji in a string to the equivalent HTML entity.
 * This allows us to store emoji in a DB using the utf8 character set.
 * @since 4.2.0
 * @param  string $content The content to encode
 * @return string The encoded content
 */
function wp_encode_emoji( $content ) {
    // mb_convert_encoding() comes from the optional mbstring extension.
    if ( function_exists( 'mb_convert_encoding' ) ) {
        $regex = '/(
              \x23\xE2\x83\xA3               # Keycap sequence (# + U+20E3)
            | \xF0\x9F[\x85-\x88][\xB0-\xBF] # Enclosed characters
            | \xF0\x9F[\x8C-\x97][\x80-\xBF] # Misc
            | \xF0\x9F\x98[\x80-\xBF]        # Smilies
            | \xF0\x9F\x99[\x80-\x8F]
            | \xF0\x9F\x9A[\x80-\xBF]        # Transport and map symbols
        )/x';

        $matches = array();
        if ( preg_match_all( $regex, $content, $matches ) ) {
            if ( ! empty( $matches[1] ) ) {
                foreach ( $matches[1] as $emoji ) {
                    // UTF-32 (big-endian, no BOM) holds each code point as 8 hex
                    // digits, which is the value an HTML hex entity needs.
                    $unpacked = unpack( 'H*', mb_convert_encoding( $emoji, 'UTF-32', 'UTF-8' ) );
                    if ( isset( $unpacked[1] ) ) {
                        $entity = '';
                        // One entity per code point; the keycap match spans two.
                        foreach ( str_split( $unpacked[1], 8 ) as $codepoint ) {
                            // ltrim(), not trim(): trailing zeros are significant.
                            $entity .= '&#x' . ltrim( $codepoint, '0' ) . ';';
                        }
                        $content = str_replace( $emoji, $entity, $content );
                    }
                }
            }
        }
    }

    return $content;
}

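wp_encode_emoji() relies on a convenient property: big-endian UTF-32 stores each code point as a plain 32-bit number, so the hex dump of the converted bytes is exactly the value a hex entity needs, once the leading zero padding is stripped. Below is a small standalone check of that trick, assuming the mbstring extension. emoji_to_entity() is a hypothetical single-character helper used only here. mb_ord() (PHP 7.2+) is shown as an equivalent way to get the same value.

```php
<?php
// Hypothetical single-character version of the conversion in wp_encode_emoji().
function emoji_to_entity( string $emoji ): string {
    // mbstring's 'UTF-32' output is big-endian without a BOM, so the hex
    // dump of U+1F600 is "0001f600".
    $unpacked = unpack( 'H*', mb_convert_encoding( $emoji, 'UTF-32', 'UTF-8' ) );
    // Strip only the leading zero padding; trim() would also eat trailing zeros.
    return '&#x' . ltrim( $unpacked[1], '0' ) . ';';
}

$smiley = "\u{1F600}";

echo bin2hex( mb_convert_encoding( $smiley, 'UTF-32', 'UTF-8' ) ), "\n"; // 0001f600
echo emoji_to_entity( $smiley ), "\n"; // &#x1f600;
// Same value computed directly from the code point (PHP 7.2+).
echo '&#x' . dechex( mb_ord( $smiley, 'UTF-8' ) ) . ';', "\n"; // &#x1f600;
```

With trim() instead of ltrim(), "0001f600" would collapse to "1f6", which names a different (and unassigned) code point.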
pento self-assigned this Feb 8, 2015

pento added a commit that referenced this issue Feb 8, 2015

Add wp_encode_emoji(), and apply it to posts.
This can be used to HTML encode emoji characters, so that they can be
stored in non-utf8mb4 fields.

See #1

pento (Owner, Author) commented Feb 8, 2015

There's currently no hook to apply this function to options. We may add an appropriate hook, pending discussion on this ticket.

In the meantime, this ticket is a good place to discuss any other fields that should support HTML-encoded emoji. The only rule is that the field must be output as HTML: I have no desire to decode the emoji at some later point.
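The "must be output as HTML" rule exists because the database ends up holding the entity text, not the character. Only an HTML parser (or an explicit decode) turns it back into the emoji, and escaping it for display first would make the entity itself visible. A short illustration, assuming PHP 7+; the sample string is made up for the example.

```php
<?php
// Example value as it would be stored in a utf8 column after encoding.
$stored = 'Hello &#x1f600;';

// Printed as HTML, the browser renders the entity as the emoji. Decoding it
// back to the raw character is what html_entity_decode() does:
$decoded = html_entity_decode( $stored, ENT_QUOTES | ENT_HTML5, 'UTF-8' );
var_dump( $decoded === "Hello \u{1F600}" ); // bool(true)

// Outside HTML the entity is just text. Escaping it for display (default
// double_encode = true) makes the browser show the literal "&#x1f600;".
echo htmlspecialchars( $stored, ENT_QUOTES, 'UTF-8' ), "\n"; // Hello &amp;#x1f600;
```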


pento (Owner, Author) commented Feb 10, 2015

I'm inclined to add support only for the blog name and blog title. We can add other fields in the future if there's demand.


pento (Owner, Author) commented Mar 4, 2015

I'm fine with this. Let's do it.

pento closed this Mar 4, 2015
