Fix #62545: wrong unicode mapping in some charsets
Undefined characters are best mapped to Unicode REPLACEMENT characters.
cmb69 committed Mar 11, 2018
1 parent 76fc73c commit 01ea314
Showing 4 changed files with 26 additions and 5 deletions.
3 changes: 3 additions & 0 deletions NEWS
@@ -100,6 +100,9 @@ PHP NEWS
- IMAP:
. Fixed bug #75774 (imap_append HeapCorruction). (Anatol)

+- Mbstring:
+. Fixed bug #62545 (wrong unicode mapping in some charsets). (cmb)
+
- Opcache:
. Fixed bug #75720 (File cache not populated after SHM runs full). (Dmitry)
. Fixed bug #75579 (Interned strings buffer overflow may cause crash).
2 changes: 1 addition & 1 deletion ext/mbstring/libmbfl/filters/unicode_table_cp1251.h
@@ -30,7 +30,7 @@ static const unsigned short cp1251_ucs_table[] = {
0x0402, 0x0403, 0x201a, 0x0453, 0x201e, 0x2026, 0x2020, 0x2021,
0x20ac, 0x2030, 0x0409, 0x2039, 0x040a, 0x040c, 0x040b, 0x040f,
0x0452, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
-0x003f, 0x2122, 0x0459, 0x203a, 0x045a, 0x045c, 0x045b, 0x045f,
+0xfffd, 0x2122, 0x0459, 0x203a, 0x045a, 0x045c, 0x045b, 0x045f,
0x00a0, 0x040e, 0x045e, 0x0408, 0x00a4, 0x0490, 0x00a6, 0x00a7,
0x0401, 0x00a9, 0x0404, 0x00ab, 0x00ac, 0x00ad, 0x00ae, 0x0407,
0x00b0, 0x00b1, 0x0406, 0x0456, 0x0491, 0x00b5, 0x00b6, 0x00b7,
8 changes: 4 additions & 4 deletions ext/mbstring/libmbfl/filters/unicode_table_cp1252.h
@@ -32,9 +32,9 @@
* as it only covers this range, while the rest cover 0xa0 onwards */

static const unsigned short cp1252_ucs_table[] = {
-0x20ac,0xfffe,0x201a,0x0192,0x201e,0x2026,0x2020,0x2021,
-0x02c6,0x2030,0x0160,0x2039,0x0152,0xfffe,0x017d,0xfffe,
-0xfffe,0x2018,0x2019,0x201c,0x201d,0x2022,0x2013,0x2014,
-0x02dc,0x2122,0x0161,0x203a,0x0153,0xfffe,0x017e,0x0178
+0x20ac,0xfffd,0x201a,0x0192,0x201e,0x2026,0x2020,0x2021,
+0x02c6,0x2030,0x0160,0x2039,0x0152,0xfffd,0x017d,0xfffd,
+0xfffd,0x2018,0x2019,0x201c,0x201d,0x2022,0x2013,0x2014,
+0x02dc,0x2122,0x0161,0x203a,0x0153,0xfffd,0x017e,0x0178
};
#endif /* UNICODE_TABLE_CP1252_H */
18 changes: 18 additions & 0 deletions ext/mbstring/tests/bug62545.phpt
@@ -0,0 +1,18 @@
--TEST--
Bug #62545 (wrong unicode mapping in some charsets)
--SKIPIF--
<?php
if (!extension_loaded('mbstring')) die('skip mbstring extension not available');
?>
--FILE--
<?php
var_dump(
bin2hex(mb_convert_encoding("\x98", 'UTF-8', 'Windows-1251')),
bin2hex(mb_convert_encoding("\x81\x8d\x8f\x90\x9d", 'UTF-8', 'Windows-1252'))
);
?>
===DONE===
--EXPECT--
string(6) "efbfbd"
string(30) "efbfbdefbfbdefbfbdefbfbdefbfbd"
===DONE===
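
For illustration, the expected output of the test can be reproduced outside PHP with a small C sketch built around the patched `cp1252_ucs_table`. The helpers `cp1252_to_ucs`, `ucs_to_utf8` and `cp1252_to_utf8_hex` are hypothetical names made up for this example and are not part of libmbfl, whose conversion filters are structured differently. The sketch only shows why each undefined byte (0x81, 0x8d, 0x8f, 0x90, 0x9d) ends up as the UTF-8 sequence `ef bf bd`, i.e. U+FFFD, once the table no longer holds the noncharacter 0xfffe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Patched table copied from the diff: Windows-1252 bytes 0x80..0x9f. */
static const unsigned short cp1252_ucs_table[] = {
    0x20ac,0xfffd,0x201a,0x0192,0x201e,0x2026,0x2020,0x2021,
    0x02c6,0x2030,0x0160,0x2039,0x0152,0xfffd,0x017d,0xfffd,
    0xfffd,0x2018,0x2019,0x201c,0x201d,0x2022,0x2013,0x2014,
    0x02dc,0x2122,0x0161,0x203a,0x0153,0xfffd,0x017e,0x0178
};

/* Hypothetical helper (not libmbfl API): map one byte to a code point. */
static unsigned int cp1252_to_ucs(unsigned char byte)
{
    if (byte >= 0x80 && byte <= 0x9f) {
        return cp1252_ucs_table[byte - 0x80];
    }
    return byte; /* outside 0x80..0x9f, Windows-1252 matches ISO-8859-1 */
}

/* Hypothetical helper: encode a BMP code point as UTF-8, return byte count. */
static size_t ucs_to_utf8(unsigned int cp, unsigned char out[3])
{
    assert(cp <= 0xffff); /* the table only holds 16-bit values */
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xc0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3f));
        return 2;
    }
    out[0] = (unsigned char)(0xe0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
    out[2] = (unsigned char)(0x80 | (cp & 0x3f));
    return 3;
}

/* Hypothetical helper: convert `len` bytes and write lowercase hex of the
 * UTF-8 result into `hex`, which must hold at least 6 * len + 1 chars. */
static void cp1252_to_utf8_hex(const unsigned char *in, size_t len, char *hex)
{
    unsigned char buf[3];
    size_t i, j, n;

    hex[0] = '\0';
    for (i = 0; i < len; i++) {
        n = ucs_to_utf8(cp1252_to_ucs(in[i]), buf);
        for (j = 0; j < n; j++) {
            hex += sprintf(hex, "%02x", buf[j]);
        }
    }
}
```

Calling `cp1252_to_utf8_hex` on the five bytes used in the test fills the buffer with `efbfbdefbfbdefbfbdefbfbdefbfbd`, matching the second `--EXPECT--` line. A defined byte such as 0x80 (U+20AC) yields `e282ac`.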
