Backwards compatibility with non-emojified character sets #1

Closed
pento opened this issue Feb 8, 2015 · 3 comments
pento commented Feb 8, 2015

Some character sets in MySQL, such as utf8, don't support 4-byte emoji. It's a fairly sad state of affairs. If we convert the emoji to their HTML-encoded equivalents, we can make everyone happy!

Here's a function I wrote to do it:

/**
 * Convert any 4 byte emoji in a string to their equivalent HTML entity.
 *
 * This allows us to store emoji in a DB using the utf8 character set.
 *
 * @since 4.2.0
 * @param  string $content The content to encode
 * @return string The encoded content
 */
function wp_encode_emoji( $content ) {
    // Without mbstring we can't convert to UTF-32, so return the content untouched.
    if ( ! function_exists( 'mb_convert_encoding' ) ) {
        return $content;
    }

    $regex = '/(
          \x23\xE2\x83\xA3               # Keycap number sign
        | [\x30-\x39]\xE2\x83\xA3        # Keycap digits
        | \xF0\x9F[\x85-\x88][\xB0-\xBF] # Enclosed characters
        | \xF0\x9F[\x8C-\x97][\x80-\xBF] # Misc
        | \xF0\x9F\x98[\x80-\xBF]        # Smilies
        | \xF0\x9F\x99[\x80-\x8F]
        | \xF0\x9F\x9A[\x80-\xBF]        # Transport and map symbols
    )/x';

    $matches = array();
    if ( preg_match_all( $regex, $content, $matches ) ) {
        // Encode each distinct match once.
        foreach ( array_unique( $matches[1] ) as $emoji ) {
            // UTF-32BE uses exactly 4 bytes (8 hex digits) per code point.
            $unpacked = unpack( 'H*', mb_convert_encoding( $emoji, 'UTF-32BE', 'UTF-8' ) );
            if ( isset( $unpacked[1] ) ) {
                $entity = '';
                foreach ( str_split( $unpacked[1], 8 ) as $code_point ) {
                    // Strip leading zeros only; trailing zeros are part of the code point.
                    $entity .= '&#x' . ltrim( $code_point, '0' ) . ';';
                }
                $content = str_replace( $emoji, $entity, $content );
            }
        }
    }

    return $content;
}
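
To make the conversion step concrete, here is a minimal standalone sketch of the same idea. It is not the plugin's code: the helper name `sketch_encode_4byte_chars` is made up for illustration, and instead of the explicit byte ranges above it simply matches every code point above U+FFFF, on the assumption that those are exactly the characters that need 4 bytes in UTF-8. It needs the mbstring extension.

```php
<?php
// Hypothetical helper, for illustration only: replace every code point above
// U+FFFF (4 bytes in UTF-8) with its hexadecimal HTML entity.
function sketch_encode_4byte_chars( $content ) {
    return preg_replace_callback(
        '/[\x{10000}-\x{10FFFF}]/u',
        function ( $match ) {
            // UTF-32BE stores one code point in exactly 4 bytes, so the hex
            // dump of those bytes is the zero-padded code point.
            $hex = bin2hex( mb_convert_encoding( $match[0], 'UTF-32BE', 'UTF-8' ) );

            // Only leading zeros are padding: "0001f600" must become "1f600".
            return '&#x' . ltrim( $hex, '0' ) . ';';
        },
        $content
    );
}

// "\xF0\x9F\x98\x80" is U+1F600 (grinning face) written as raw UTF-8 bytes.
$encoded = sketch_encode_4byte_chars( "Hello \xF0\x9F\x98\x80" );
echo $encoded, "\n"; // Hello &#x1f600;
```

Stripping zeros from both ends would turn `0001f600` into `1f6`, which is why the sketch uses `ltrim()`. Note that `preg_replace_callback()` returns `null` when the input is not valid UTF-8, so real code would need to handle that case.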
pento self-assigned this Feb 8, 2015
pento added a commit that referenced this issue Feb 8, 2015
This can be used to HTML encode emoji characters, so that they can be
stored in non-utf8mb4 fields.

See #1

pento commented Feb 8, 2015

There's currently no hook to apply this function to options. We may add an appropriate hook, pending discussion on this ticket.

In the meantime, this ticket is a good place to discuss any other fields that should support HTML-encoded emoji. The only rule is that the field must be used in HTML, because I have no desire to decode the emoji at some later point.
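
The HTML-only rule can be illustrated with a small sketch. The stored value below is an assumed example, and `html_entity_decode()` is used only to stand in for what a browser does when it renders the entity; nothing here suggests the plugin decodes anything itself.

```php
<?php
// Assumed example of an encoded value as it would sit in a utf8 column.
$stored = 'Hello &#x1f600;';

// The stored string is plain ASCII, so it contains no 4-byte characters.
$has_4byte = preg_match( '/[\x{10000}-\x{10FFFF}]/u', $stored );

// An HTML consumer resolves the entity back to the emoji on its own.
$rendered = html_entity_decode( $stored, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

var_dump( $has_4byte );                            // int(0)
var_dump( $rendered === "Hello \xF0\x9F\x98\x80" ); // bool(true)
```

Anything that reads the field as plain text (an email subject, say) would show the literal `&#x1f600;` instead, which is why only fields that end up in HTML are candidates.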

pento commented Feb 10, 2015

I'm inclined to only add support to blog name and blog title. We can add others in the future, if there's demand.

pento commented Mar 4, 2015

I'm fine with this. Let's do it.

pento closed this as completed Mar 4, 2015