In normalize_string() replace 4-byte unicode characters with '?' character.

These are not supported by the default utf8 charset in MySQL (which stores
at most 3 bytes per character), and the chance we'd need them in searching
is very low.
alecpl committed Dec 12, 2013
1 parent 7eecf87 commit d19c0f9
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions program/lib/Roundcube/rcube_utils.php
@@ -912,10 +912,20 @@ public static function tokenize_string($str)
     *
     * @param string Input string (UTF-8)
     * @param boolean True to return list of words as array
     *
     * @return mixed Normalized string or a list of normalized tokens
     */
    public static function normalize_string($str, $as_array = false)
    {
        // replace 4-byte unicode characters with '?' character,
        // these are not supported in default utf-8 charset on mysql,
        // the chance we'd need them in searching is very low
        $str = preg_replace('/('
            . '\xF0[\x90-\xBF][\x80-\xBF]{2}'
            . '|[\xF1-\xF3][\x80-\xBF]{3}'
            . '|\xF4[\x80-\x8F][\x80-\xBF]{2}'
            . ')/', '?', $str);

        // split by words
        $arr = self::tokenize_string($str);
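
To illustrate what the added pattern does, here is a minimal standalone sketch. The function name `replace_4byte_utf8()` and the sample strings are hypothetical and not part of Roundcube; only the regular expression is taken from the hunk above. The three alternatives cover the valid 4-byte UTF-8 lead bytes (0xF0, 0xF1 to 0xF3, and 0xF4), so characters encoded in 1 to 3 bytes should pass through unchanged.

```php
<?php
// Hypothetical helper for illustration only; it wraps the regex from this commit.
function replace_4byte_utf8(string $str, string $replacement = '?'): string
{
    // No /u modifier: the pattern matches raw bytes of well-formed
    // 4-byte UTF-8 sequences (code points U+10000 through U+10FFFF).
    $result = preg_replace('/('
        . '\xF0[\x90-\xBF][\x80-\xBF]{2}'
        . '|[\xF1-\xF3][\x80-\xBF]{3}'
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}'
        . ')/', $replacement, $str);

    // preg_replace() returns null on error; fall back to the input in that case.
    return $result ?? $str;
}

// U+00E9 takes 2 bytes, U+20AC takes 3 bytes, U+1F600 takes 4 bytes in UTF-8.
$input = "caf\u{E9} \u{20AC} \u{1F600} ok";
echo replace_4byte_utf8($input), "\n";
```

Only the 4-byte character in `$input` is replaced; the 2-byte and 3-byte characters are left intact, which is why ordinary accented text remains searchable after normalization.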
