Fix GH-13815: mb_trim() inaccurate $characters default value (#13820)
Because the default characters are defined in the stub file, and the
stub file is UTF-8 (typically), the characters are encoded in the string
as UTF-8. When using a different character encoding, there is a mismatch
between what mb_trim expects and the UTF-8 encoded string it gets.

One way of solving this is to make the characters argument nullable.
When it is null, mb_trim uses the internal code path in which the
default Unicode codepoints are stored as codepoint numbers instead of
in a string.

Co-authored-by: @ranvis
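
The mismatch described in the commit message can be illustrated with a minimal C sketch. It is not the mbstring implementation: the table `trim_codepoints` is an illustrative subset of the default set, and the helper names `is_trim_codepoint` and `ends_with_bytes` are hypothetical. The sketch assumes that an encoding-aware layer has already decoded the input into codepoints before `is_trim_codepoint` is called.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative subset of the default trim set, kept as codepoint numbers.
 * This is NOT the real mbstring table, only a sketch of the idea. */
static const uint32_t trim_codepoints[] = {
    0x0020, 0x000C, 0x000A, 0x000D, 0x0009, 0x000B, 0x0000, 0x00A0, 0x3000
};

/* U+3000 as raw bytes in two encodings. */
static const unsigned char U3000_UTF8[] = { 0xE3, 0x80, 0x80 };
static const unsigned char U3000_SJIS[] = { 0x81, 0x40 };

/* Returns 1 if cp is in the illustrative default set, 0 otherwise.
 * Works for any source encoding, because it compares codepoints. */
static int is_trim_codepoint(uint32_t cp)
{
    size_t n = sizeof(trim_codepoints) / sizeof(trim_codepoints[0]);
    for (size_t i = 0; i < n; i++) {
        if (trim_codepoints[i] == cp) {
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if the byte sequence `needle` occurs at the end of `hay`.
 * This models comparing a default stored as an encoded string. */
static int ends_with_bytes(const unsigned char *hay, size_t hay_len,
                           const unsigned char *needle, size_t needle_len)
{
    return hay_len >= needle_len
        && memcmp(hay + hay_len - needle_len, needle, needle_len) == 0;
}
```

For the Shift_JIS input bytes `82 A0 81 40` (U+3042 followed by U+3000), `ends_with_bytes` finds no match against `U3000_UTF8`, which is the situation a UTF-8 encoded default string runs into. Once the trailing character has been decoded to the codepoint 0x3000, `is_trim_codepoint` matches it regardless of the encoding the input arrived in.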
nielsdos committed Apr 24, 2024
1 parent 13a5a81 commit f813708
Showing 4 changed files with 26 additions and 6 deletions.
2 changes: 1 addition & 1 deletion ext/mbstring/mbstring.c
@@ -3129,7 +3129,7 @@ static void php_do_mb_trim(INTERNAL_FUNCTION_PARAMETERS, mb_trim_mode mode)
 	ZEND_PARSE_PARAMETERS_START(1, 3)
 		Z_PARAM_STR(str)
 		Z_PARAM_OPTIONAL
-		Z_PARAM_STR(what)
+		Z_PARAM_STR_OR_NULL(what)
 		Z_PARAM_STR_OR_NULL(encoding)
 	ZEND_PARSE_PARAMETERS_END();

6 changes: 3 additions & 3 deletions ext/mbstring/mbstring.stub.php
@@ -139,11 +139,11 @@ function mb_ucfirst(string $string, ?string $encoding = null): string {}

 function mb_lcfirst(string $string, ?string $encoding = null): string {}

-function mb_trim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}", ?string $encoding = null): string {}
+function mb_trim(string $string, ?string $characters = null, ?string $encoding = null): string {}

-function mb_ltrim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}", ?string $encoding = null): string {}
+function mb_ltrim(string $string, ?string $characters = null, ?string $encoding = null): string {}

-function mb_rtrim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}", ?string $encoding = null): string {}
+function mb_rtrim(string $string, ?string $characters = null, ?string $encoding = null): string {}

 /** @refcount 1 */
 function mb_detect_encoding(string $string, array|string|null $encodings = null, bool $strict = false): string|false {}
4 changes: 2 additions & 2 deletions ext/mbstring/mbstring_arginfo.h

Some generated files are not rendered by default.

20 changes: 20 additions & 0 deletions ext/mbstring/tests/gh13815.phpt
@@ -0,0 +1,20 @@
--TEST--
GH-13815 (mb_trim() inaccurate $characters default value)
--EXTENSIONS--
mbstring
--FILE--
<?php
$strUtf8 = "\u{3042}\u{3000}"; // U+3000: fullwidth space
var_dump(mb_strlen(mb_trim($strUtf8)));
var_dump(mb_strlen(mb_trim($strUtf8, encoding: 'UTF-8')));

mb_internal_encoding('Shift_JIS');
$strSjis = mb_convert_encoding($strUtf8, 'Shift_JIS', 'UTF-8');
var_dump(mb_strlen(mb_trim($strSjis)));
var_dump(mb_strlen(mb_trim($strSjis, encoding: 'Shift_JIS')));
?>
--EXPECT--
int(1)
int(1)
int(1)
int(1)
