Fix incorrect check in cs_8559_5 in map_from_unicode()
The condition `code == 0x0450 || code == 0x045D` is always false because
of an incorrect range check on code.
According to the BMP coverage in the encoding spec for ISO-8859-5
(https://encoding.spec.whatwg.org/iso-8859-5-bmp.html) the range of
valid characters is 0x0401 - 0x045F (except for 0x040D, 0x0450, 0x045D).
The current check has an upper bound of 0x044F instead of 0x045F.
Fix this by changing the upper bound.

Closes GH-10399

Signed-off-by: George Peter Banyard <girgias@php.net>
nielsdos authored and Girgias committed Jan 25, 2023
1 parent b7a158a commit a8c8fb2
Showing 3 changed files with 16 additions and 15 deletions.
1 change: 1 addition & 0 deletions NEWS
@@ -15,6 +15,7 @@ PHP NEWS
 - Standard:
 . Fixed bug GH-10292 (Made the default value of the first param of srand() and
 mt_srand() unknown). (kocsismate)
+. Fix incorrect check in cs_8559_5 in map_from_unicode(). (nielsdos)

02 Feb 2023, PHP 8.1.15

2 changes: 1 addition & 1 deletion ext/standard/html.c
@@ -477,7 +477,7 @@ static inline int map_from_unicode(unsigned code, enum entity_charset charset, u
         *res = 0xF0; /* numero sign */
     } else if (code == 0xA7) {
         *res = 0xFD; /* section sign */
-    } else if (code >= 0x0401 && code <= 0x044F) {
+    } else if (code >= 0x0401 && code <= 0x045F) {
         if (code == 0x040D || code == 0x0450 || code == 0x045D)
             return FAILURE;
         *res = code - 0x360;
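The effect of the changed bound can be sketched in isolation. The helper below is hypothetical: `map_cyrillic()` and its `upper_bound` parameter are not part of `html.c`, and `SUCCESS`/`FAILURE` are local stand-ins for PHP's constants. It models only the Cyrillic branch shown in the hunk above, so the old bound (0x044F) and the new one (0x045F) can be compared side by side.

```c
/* Local stand-ins for PHP's SUCCESS/FAILURE constants (assumed values). */
#define SUCCESS 0
#define FAILURE (-1)

/*
 * Hypothetical model of the Cyrillic branch of map_from_unicode() for
 * ISO-8859-5. upper_bound is an illustration-only parameter: pass 0x044F
 * for the old check and 0x045F for the fixed one.
 */
int map_cyrillic(unsigned code, unsigned upper_bound, unsigned *res)
{
    if (code >= 0x0401 && code <= upper_bound) {
        /* Code points in the range that have no ISO-8859-5 mapping. */
        if (code == 0x040D || code == 0x0450 || code == 0x045D)
            return FAILURE;
        /* Fixed offset between the Unicode block and the 8-bit encoding. */
        *res = code - 0x360;
        return SUCCESS;
    }
    /* Outside the range: this sketch treats the code point as unmappable. */
    return FAILURE;
}
```

With the old bound, `map_cyrillic(0x0451, 0x044F, &r)` returns `FAILURE`, because 0x0451 never enters the branch. With the new bound, `map_cyrillic(0x0451, 0x045F, &r)` returns `SUCCESS` and stores 0xF1, while 0x0450 and 0x045D are now rejected by the inner exclusion check rather than by the range test.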
28 changes: 14 additions & 14 deletions ext/standard/tests/strings/html_entity_decode_iso8859-5.phpt
@@ -358,47 +358,47 @@ CYRILLIC SMALL LETTER YA: &#x44F; => ef
 NUMERO SIGN: &#x2116; => f0
 &#xF0; => &#xF0;

-CYRILLIC SMALL LETTER IO: &#x451; => 2623783435313b
+CYRILLIC SMALL LETTER IO: &#x451; => f1
 &#xF1; => &#xF1;

-CYRILLIC SMALL LETTER DJE: &#x452; => 2623783435323b
+CYRILLIC SMALL LETTER DJE: &#x452; => f2
 &#xF2; => &#xF2;

-CYRILLIC SMALL LETTER GJE: &#x453; => 2623783435333b
+CYRILLIC SMALL LETTER GJE: &#x453; => f3
 &#xF3; => &#xF3;

-CYRILLIC SMALL LETTER UKRAINIAN IE: &#x454; => 2623783435343b
+CYRILLIC SMALL LETTER UKRAINIAN IE: &#x454; => f4
 &#xF4; => &#xF4;

-CYRILLIC SMALL LETTER DZE: &#x455; => 2623783435353b
+CYRILLIC SMALL LETTER DZE: &#x455; => f5
 &#xF5; => &#xF5;

-CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I: &#x456; => 2623783435363b
+CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I: &#x456; => f6
 &#xF6; => &#xF6;

-CYRILLIC SMALL LETTER YI: &#x457; => 2623783435373b
+CYRILLIC SMALL LETTER YI: &#x457; => f7
 &#xF7; => &#xF7;

-CYRILLIC SMALL LETTER JE: &#x458; => 2623783435383b
+CYRILLIC SMALL LETTER JE: &#x458; => f8
 &#xF8; => &#xF8;

-CYRILLIC SMALL LETTER LJE: &#x459; => 2623783435393b
+CYRILLIC SMALL LETTER LJE: &#x459; => f9
 &#xF9; => &#xF9;

-CYRILLIC SMALL LETTER NJE: &#x45A; => 2623783435413b
+CYRILLIC SMALL LETTER NJE: &#x45A; => fa
 &#xFA; => &#xFA;

-CYRILLIC SMALL LETTER TSHE: &#x45B; => 2623783435423b
+CYRILLIC SMALL LETTER TSHE: &#x45B; => fb
 &#xFB; => &#xFB;

-CYRILLIC SMALL LETTER KJE: &#x45C; => 2623783435433b
+CYRILLIC SMALL LETTER KJE: &#x45C; => fc
 &#xFC; => &#xFC;

 SECTION SIGN: &#xA7; => fd
 &#xFD; => &#xFD;

-CYRILLIC SMALL LETTER SHORT U: &#x45E; => 2623783435453b
+CYRILLIC SMALL LETTER SHORT U: &#x45E; => fe
 &#xFE; => &#xFE;

-CYRILLIC SMALL LETTER DZHE: &#x45F; => 2623783435463b
+CYRILLIC SMALL LETTER DZHE: &#x45F; => ff
 &#xFF; => &#xFF;
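A note on reading the old expected values: a string such as `2623783435313b` is the byte-wise hex dump of the entity text `&#x451;` itself (0x26 is `&`, 0x23 is `#`, 0x78 is `x`, and so on). This suggests the entity was previously left undecoded, whereas the new values (`f1` and so on) are single ISO-8859-5 bytes. A minimal sketch of that hex dump follows; `hex_dump()` is a hypothetical helper written for illustration, not a function from the PHP source or the test file.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: writes the lowercase hex representation of every
 * byte in src to dst. dst must have room for 2 * strlen(src) + 1 bytes.
 */
void hex_dump(const char *src, char *dst)
{
    static const char digits[] = "0123456789abcdef";
    size_t n = strlen(src);

    for (size_t i = 0; i < n; i++) {
        unsigned char b = (unsigned char)src[i];
        dst[2 * i]     = digits[b >> 4];   /* high nibble */
        dst[2 * i + 1] = digits[b & 0x0F]; /* low nibble */
    }
    dst[2 * n] = '\0';
}
```

Dumping the undecoded text `"&#x451;"` with this helper yields `2623783435313b`, the old expected value, while dumping the single byte `"\xF1"` yields `f1`, the new one.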
