Backwards compatibility with non-emojified character sets #1

Closed
pento opened this issue Feb 8, 2015 · 3 comments

pento commented Feb 8, 2015

Some character sets in MySQL don't support emoji: utf8, for instance, stores at most three bytes per character, while most emoji need four. It's a fairly sad state of affairs. If we convert the emoji to their HTML-encoded equivalents, we can make everyone happy!

Here's a function I wrote to do it:

/**
 * Convert any 4-byte emoji in a string to their equivalent HTML entity.
 *
 * This allows us to store emoji in a DB using the utf8 character set.
 *
 * @since 4.2.0
 * @param  string $content The content to encode
 * @return string The encoded content
 */
function wp_encode_emoji( $content ) {
    if ( function_exists( 'mb_convert_encoding' ) ) {
        $regex = '/(
              \x23\xE2\x83\xA3               # Keycap number sign
            | [\x30-\x39]\xE2\x83\xA3        # Keycap digits
            | \xF0\x9F[\x85-\x88][\xB0-\xBF] # Enclosed characters
            | \xF0\x9F[\x8C-\x97][\x80-\xBF] # Misc
            | \xF0\x9F\x98[\x80-\xBF]        # Smilies
            | \xF0\x9F\x99[\x80-\x8F]
            | \xF0\x9F\x9A[\x80-\xBF]        # Transport and map symbols
            | \xF0\x9F\x9B[\x80-\x85]
        )/x';
        $matches = array();
        if ( preg_match_all( $regex, $content, $matches ) ) {
            if ( ! empty( $matches[1] ) ) {
                foreach( $matches[1] as $emoji ) {
                    $unpacked = unpack( 'H*', mb_convert_encoding( $emoji, 'UTF-32', 'UTF-8' ) );
                    if ( isset( $unpacked[1] ) ) {
                        // Strip leading zeros only; trailing zeros are part of the code point.
                        $entity = '&#x' . ltrim( $unpacked[1], '0' ) . ';';
                        $content = str_replace( $emoji, $entity, $content );
                    }
                }
            }
        }
    }

    return $content;
}
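
As a quick sanity check of the conversion step, here is a minimal standalone sketch of how a single 4-byte character maps to its hexadecimal entity. The helper name emoji_to_entity_sketch() is hypothetical and not part of the function above. It assumes the mbstring extension is loaded, and it names 'UTF-32BE' explicitly so the byte order is not left to a default.

```php
<?php
/**
 * Hypothetical helper: convert one UTF-8 character to a hex HTML entity.
 *
 * @param  string $char A single UTF-8 encoded character
 * @return string The numeric HTML entity for that character
 */
function emoji_to_entity_sketch( $char ) {
    // UTF-32BE yields exactly four bytes per code point, most significant first.
    $unpacked = unpack( 'H*', mb_convert_encoding( $char, 'UTF-32BE', 'UTF-8' ) );

    // Strip only the leading zeros; trailing zeros belong to the code point.
    return '&#x' . ltrim( $unpacked[1], '0' ) . ';';
}

// U+1F600 is encoded in UTF-8 as the four bytes F0 9F 98 80.
echo emoji_to_entity_sketch( "\xF0\x9F\x98\x80" ), "\n"; // prints &#x1f600;
```

Code points that end in a zero, like this one, are the case where stripping zeros from both ends would produce the wrong entity.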

pento self-assigned this Feb 8, 2015

pento added a commit that referenced this issue Feb 8, 2015

Add wp_encode_emoji(), and apply it to posts.
This can be used to HTML encode emoji characters, so that they can be
stored in non-utf8mb4 fields.

See #1

pento commented Feb 8, 2015

There's currently no hook to apply this function to options. We may add an appropriate hook, pending discussion on this ticket.

In the meantime, this ticket is a good place to discuss any other fields that should support HTML-encoded emoji. The only rule is that the field must be used in HTML: I have no desire to decode the emoji at some later point.
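
To illustrate why the field has to end up in HTML: what gets stored is only the entity text, and it turns back into an emoji when something HTML-aware reads it. In the rough sketch below, PHP's html_entity_decode() stands in for the browser, and the sample string is made up.

```php
<?php
// Made-up stored value: the entity text as it would sit in a utf8 column.
$stored = 'Hello &#x1f600;';

// An HTML-aware consumer resolves the entity to the original 4-byte character.
// html_entity_decode() is used here only as a stand-in for a browser.
$rendered = html_entity_decode( $stored, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

// A plain-text consumer (an email subject, say) would show $stored as-is,
// entity text and all, which is why such fields are out of scope.
var_dump( $rendered === "Hello \xF0\x9F\x98\x80" );
```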

pento commented Feb 10, 2015

I'm inclined to only add support to blog name and blog title. We can add others in the future, if there's demand.

pento commented Mar 4, 2015

I'm fine with this. Let's do it.

pento closed this Mar 4, 2015
