
Commit 4bd1343

hsbt and claude committed
Transcode UTF-16 IO input to UTF-8 on the libfyaml backend
libfyaml only consumes UTF-8, so a UTF-16 IO fed through the chunked reader reached it as raw bytes and was rejected as invalid UTF-8. When the IO's external encoding is UTF-16LE/BE, slurp the whole stream and transcode it first; a 2-byte unit could otherwise straddle a read boundary. Other non-UTF-8 encodings stay raw and libfyaml rejects them, matching psych's UTF-8/UTF-16-only IO contract (Shift_JIS still raises).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
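To make the read-boundary problem concrete, here is a minimal standalone sketch of a UTF-16LE to UTF-8 decoder that only accepts complete input. The name `utf16le_to_utf8` is hypothetical and is not part of psych or libfyaml; the commit itself calls psych's own `transcode_string` helper rather than a hand-written decoder like this. The point it illustrates is that a buffer cut at an arbitrary byte offset is not decodable on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not psych/libfyaml API): transcode a complete
 * UTF-16LE buffer to UTF-8. Returns the number of UTF-8 bytes written,
 * or -1 if the input is malformed (odd length, unpaired surrogate) or
 * the output buffer is too small. */
static long utf16le_to_utf8(const unsigned char *in, size_t in_len,
                            unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    assert(in != NULL || in_len == 0);
    assert(out != NULL || out_cap == 0);

    if (in_len % 2 != 0)
        return -1; /* trailing half of a 2-byte unit: the "straddle" case */

    while (i < in_len) {
        uint32_t cp = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
        i += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: the next unit must be a low surrogate. */
            if (i + 2 > in_len)
                return -1; /* pair split across the end of the buffer */
            uint32_t lo = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1;
            i += 2;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1; /* lone low surrogate */
        }

        /* Encode the code point as 1 to 4 UTF-8 bytes. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need > out_cap)
            return -1;
        switch (need) {
        case 1:
            out[o++] = (unsigned char)cp;
            break;
        case 2:
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        default:
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        }
    }
    return (long)o;
}
```

Called on the whole 8-byte buffer `{ 'a', 0, ':', 0, ' ', 0, 0xE9, 0 }` (the UTF-16LE encoding of `a: é`), it returns 5 and writes `a: ` followed by `0xC3 0xA9`. Called on only the first 5 bytes, as a chunked read might deliver them, it returns -1 because the last code unit is cut in half. A chunk that ends between the two halves of a surrogate pair fails the same way even at an even offset, which is why slurping the whole stream before transcoding sidesteps the problem.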
1 parent ff15ddf commit 4bd1343

1 file changed

Lines changed: 19 additions & 1 deletion


ext/psych/psych_parser_fy.c

```diff
@@ -289,7 +289,25 @@ static VALUE parse(VALUE self, VALUE handler, VALUE yaml, VALUE path)
     }
 
     if (rb_respond_to(yaml, id_read)) {
-        if (fy_parser_set_input_callback(parser->fyp, (void *)yaml, io_reader) != 0) {
+        VALUE ext_enc = rb_funcall(yaml, rb_intern("external_encoding"), 0);
+        int ext_idx = NIL_P(ext_enc) ? -1 : rb_to_encoding_index(ext_enc);
+
+        if (ext_idx == rb_enc_find_index("UTF-16LE") ||
+            ext_idx == rb_enc_find_index("UTF-16BE")) {
+            /* libfyaml only consumes UTF-8. A UTF-16 stream cannot be fed
+             * through the chunked reader because a 2-byte unit may straddle a
+             * read boundary, so slurp the whole stream and transcode it. Any
+             * other non-UTF-8 external encoding is left raw and libfyaml will
+             * reject it, matching psych's "UTF-8/UTF-16 only" IO contract. */
+            VALUE content = rb_funcall(yaml, id_read, 0);
+            if (NIL_P(content)) content = rb_str_new("", 0);
+            StringValue(content);
+            yaml = transcode_string(content);
+            if (fy_parser_set_string(parser->fyp,
+                    RSTRING_PTR(yaml), (size_t)RSTRING_LEN(yaml)) != 0) {
+                rb_raise(rb_eRuntimeError, "could not set libfyaml input");
+            }
+        } else if (fy_parser_set_input_callback(parser->fyp, (void *)yaml, io_reader) != 0) {
             rb_raise(rb_eRuntimeError, "could not set libfyaml input");
         }
     } else {
```
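The branch added in the diff above can be restated without the Ruby C API as a small decision function. This is a sketch that assumes the external encoding is available as a name string; `enum input_strategy` and `choose_input_strategy` are hypothetical names, and the real code compares encoding indexes from `rb_to_encoding_index` and `rb_enc_find_index` rather than strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names that only mirror the branch structure of the diff. */
enum input_strategy {
    INPUT_SLURP_AND_TRANSCODE, /* read everything, convert to UTF-8, hand over as one string */
    INPUT_CHUNKED_RAW          /* stream bytes unchanged through the chunked reader callback */
};

/* ext_enc_name is NULL when the IO reports no external encoding
 * (the NIL_P(ext_enc) case, which the diff maps to index -1). */
static enum input_strategy choose_input_strategy(const char *ext_enc_name)
{
    if (ext_enc_name != NULL &&
        (strcmp(ext_enc_name, "UTF-16LE") == 0 ||
         strcmp(ext_enc_name, "UTF-16BE") == 0))
        return INPUT_SLURP_AND_TRANSCODE;

    /* UTF-8, no external encoding, and every other encoding (for example
     * Shift_JIS) go through the chunked reader untouched; libfyaml then
     * rejects any bytes that are not valid UTF-8. */
    return INPUT_CHUNKED_RAW;
}
```

Only the two UTF-16 variants pay the cost of buffering the whole stream; everything else keeps the streaming path, so memory use for ordinary UTF-8 input is unchanged by this commit.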