Commit 18e526c

Fix legacy text conversion filter for SJIS-2004
EUC-JP-2004 includes special byte sequences starting with 0x8E for kana. The legacy output routine for EUC-JP-2004 emits these sequences if the value of the output variable `s` is between 0x80 and 0xFF.

Since the same routine was also used for SJIS-2004 and ISO-2022-JP-2004, before 8a915ed, the same 0x8E sequences would be emitted when converting to those text encodings as well. But that is completely wrong. 0x8E 0x__ does not mean the same in SJIS-2004 or ISO-2022-JP-2004 as it does in EUC-JP-2004.

Therefore, in 8a915ed, I fixed the legacy conversion routine by checking whether the output encoding is EUC-JP-2004 or not. If it's not, and `s` is 0x80-0xFF, I made it emit an error.

Well, it turns out that single bytes with values from 0xA1 to 0xDF are meaningful in SJIS-2004. To emit these bytes when appropriate, I had to amend the legacy conversion routine again.

(For clarity, this does NOT mean reverting to the behavior prior to 8a915ed. We were right not to emit sequences starting with 0x8E in SJIS-2004. But in SJIS-2004, we *do* sometimes need to emit single bytes from 0xA1-0xDF.)
1 parent 3517a70 commit 18e526c

1 file changed: +2 -0 lines changed

ext/mbstring/libmbfl/filters/mbfilter_sjis_2004.c

Lines changed: 2 additions & 0 deletions
@@ -633,6 +633,8 @@ int mbfl_filt_conv_wchar_jis2004(int c, mbfl_convert_filter *filter)
 	if (filter->to->no_encoding == mbfl_no_encoding_eucjp2004) {
 		CK((*filter->output_function)(0x8e, filter->data));
 		CK((*filter->output_function)(s1, filter->data));
+	} else if (filter->to->no_encoding == mbfl_no_encoding_sjis2004 && (s1 >= 0xA1 && s1 <= 0xDF)) {
+		CK((*filter->output_function)(s1, filter->data));
 	} else {
 		CK(mbfl_filt_conv_illegal_output(c, filter));
 	}
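
The branch order above can be sketched in isolation. The following is a minimal standalone illustration, not the libmbfl code: the `target_encoding` enum and the `emit_high_single_byte` function are hypothetical stand-ins for `filter->to->no_encoding` and the output calls, and the sketch assumes, as the commit message describes, that this path is only reached when the code value lies in 0x80-0xFF.

```c
#include <stddef.h>

/* Hypothetical stand-ins for libmbfl's encoding identifiers. */
enum target_encoding {
    TARGET_EUCJP2004,
    TARGET_SJIS2004,
    TARGET_ISO2022JP2004
};

/*
 * Decide which bytes to emit for a code value s1 in 0x80-0xFF.
 * Writes the bytes into out (capacity 2) and returns how many were
 * written, or -1 where the real routine would report illegal output.
 */
int emit_high_single_byte(enum target_encoding to, int s1, unsigned char out[2])
{
    if (s1 < 0x80 || s1 > 0xFF) {
        return -1; /* outside the range this sketch covers */
    }
    if (to == TARGET_EUCJP2004) {
        /* EUC-JP-2004: kana is written as 0x8E followed by the byte. */
        out[0] = 0x8E;
        out[1] = (unsigned char)s1;
        return 2;
    }
    if (to == TARGET_SJIS2004 && s1 >= 0xA1 && s1 <= 0xDF) {
        /* SJIS-2004: bytes 0xA1-0xDF are meaningful on their own. */
        out[0] = (unsigned char)s1;
        return 1;
    }
    /* Anything else (including ISO-2022-JP-2004) is an error here. */
    return -1;
}
```

With this sketch, `emit_high_single_byte(TARGET_SJIS2004, 0xB1, buf)` writes the single byte 0xB1, the same value for `TARGET_EUCJP2004` writes 0x8E 0xB1, and for `TARGET_ISO2022JP2004` it returns -1, mirroring the three branches of the patched routine.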
