mb_decode_mimeheader obeys RFC 2047 regarding underscores and QPrint encoding
alexdowad committed Feb 22, 2023
1 parent 157ca65 commit 8995f60
Showing 4 changed files with 37 additions and 2 deletions.
7 changes: 6 additions & 1 deletion NEWS
@@ -52,7 +52,8 @@ PHP NEWS
. Added json_validate(). (Juan Morales)

- MBString:
-  . mb_detect_encoding is better able to identify the correct encoding for Turkish text. (Alex Dowad)
+  . mb_detect_encoding is better able to identify the correct encoding for
+    Turkish text. (Alex Dowad)
. mb_detect_encoding's "non-strict" mode now behaves as described in the
documentation. Previously, it would return false if the very first byte
of the input string was invalid in all candidate encodings. (Alex Dowad)
@@ -62,6 +63,10 @@ PHP NEWS
MB_CASE_LOWER_SIMPLE and MB_CASE_TITLE_SIMPLE. (Alex Dowad)
. mb_detect_encoding is better able to identify UTF-8 and UTF-16 strings
with a byte-order mark. (Alex Dowad)
+  . mb_decode_mimeheader interprets underscores in QPrint-encoded MIME
+    encoded words as required by RFC 2047; they are converted to spaces.
+    Underscores must be encoded as "=5F" in such MIME encoded words.
+    (Alex Dowad)

- Opcache:
. Added start, restart and force restart time to opcache's
4 changes: 4 additions & 0 deletions UPGRADING
@@ -78,6 +78,10 @@ PHP 8.3 UPGRADE NOTES
casing rules for the Greek letter sigma. For mb_convert_case, conditional
casing only applies to MB_CASE_LOWER and MB_CASE_TITLE modes, not to
MB_CASE_LOWER_SIMPLE and MB_CASE_TITLE_SIMPLE. (Alex Dowad)
+  . mb_decode_mimeheader interprets underscores in QPrint-encoded MIME
+    encoded words as required by RFC 2047; they are converted to spaces.
+    Underscores must be encoded as "=5F" in such MIME encoded words.
+    (Alex Dowad)

- Standard:
. E_NOTICEs emitted by unserialized() have been promoted to E_WARNING.
5 changes: 4 additions & 1 deletion ext/mbstring/mbstring.c
@@ -5705,7 +5705,10 @@ static unsigned char* mime_header_decode_encoded_word(unsigned char *p, unsigned
/* Fill `buf` with bytes from decoding QPrint */
while (p < e) {
unsigned char c = *p++;
-if (c == '=' && (e - p) >= 2) {
+if (c == '_') {
+*bufp++ = ' ';
+continue;
+} else if (c == '=' && (e - p) >= 2) {
unsigned char c2 = *p++;
unsigned char c3 = *p++;
if (qprint_map[c2] >= 0 && qprint_map[c3] >= 0) {
23 changes: 23 additions & 0 deletions ext/mbstring/tests/mb_decode_mimeheader_variation5.phpt
@@ -0,0 +1,23 @@
--TEST--
Test mb_decode_mimeheader() function: use of underscores in QPrint-encoded data
--EXTENSIONS--
mbstring
--FILE--
<?php

// RFC 2047 says that in a QPrint-encoded MIME encoded word, underscores should be converted to spaces
var_dump(mb_decode_mimeheader("=?UTF-8?Q?abc?="));
var_dump(mb_decode_mimeheader("=?UTF-8?Q?abc_def?="));
var_dump(mb_decode_mimeheader("_=?UTF-8?Q?abc_def?=_"));
var_dump(mb_decode_mimeheader("=?UTF-8?Q?__=E6=B1=89=E5=AD=97__?="));

// This is how underscores should be encoded in MIME encoded words with QPrint
var_dump(mb_decode_mimeheader("=?UTF-8?Q?=5F?="));

?>
--EXPECT--
string(3) "abc"
string(7) "abc def"
string(9) "_abc def_"
string(10) "  汉字  "
string(1) "_"
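
For readers who want to try the rule outside of PHP, the following is a minimal standalone sketch in C of the decoding behaviour this commit describes. It is not the mbstring implementation: `hex_value` and `q_decode_payload` are hypothetical names introduced here, the function decodes only the payload between "?Q?" and "?=", and copying a malformed "=" through unchanged is an assumption of this sketch rather than something the commit specifies.

```c
#include <stddef.h>

/* Value of a hexadecimal digit, or -1 if `c` is not one. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Hypothetical helper (not part of mbstring): decode the payload of a
 * Q-encoded MIME encoded word. `out` must have room for at least `len`
 * bytes, since every input byte produces at most one output byte.
 * Returns the number of bytes written to `out`.
 */
static size_t q_decode_payload(const unsigned char *in, size_t len, unsigned char *out)
{
    const unsigned char *p = in, *e = in + len;
    unsigned char *o = out;

    while (p < e) {
        unsigned char c = *p++;
        if (c == '_') {
            /* RFC 2047: an underscore in Q-encoded text stands for a space */
            *o++ = ' ';
        } else if (c == '=' && (e - p) >= 2
                   && hex_value(p[0]) >= 0 && hex_value(p[1]) >= 0) {
            /* "=XX" escape: two hex digits give one byte, so "=5F" is '_' */
            *o++ = (unsigned char)((hex_value(p[0]) << 4) | hex_value(p[1]));
            p += 2;
        } else {
            /* Assumption of this sketch: any other byte, including a
             * malformed '=', is copied through unchanged */
            *o++ = c;
        }
    }
    return (size_t)(o - out);
}
```

Decoding the payload "abc_def" with this sketch gives the 7 bytes "abc def", and decoding "=5F" gives a single underscore, matching the expectations in the test above.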
