Commit c5e58bc

Zlib.gunzip should not fail with utf-8 strings (#55)
zstream_discard_input was encoding- and character-aware when the input was user-provided, so it discarded `len` characters instead of `len` bytes.

Also, the Zlib.gunzip rdoc says it is equivalent to the following code, yet that code does not fail for a UTF-8 String while Zlib.gunzip did:

```ruby
require 'zlib'
require 'stringio'

string = %w[1f8b0800c28000000003cb48cdc9c9070086a6103605000000].pack("H*").force_encoding('UTF-8')
sio = StringIO.new(string)
# Reader construction as in the Zlib.gunzip rdoc; this line was missing from the snippet.
gz = Zlib::GzipReader.new(sio, encoding: Encoding::ASCII_8BIT)
p gz.read #=> "hello"
gz&.close
p Zlib.gunzip(string) #=> Zlib::DataError
```

Reported and discovered by eagletmt at https://twitter.com/eagletmt/status/1689692467929694209
1 parent a68a1f7 commit c5e58bc

2 files changed: +8 -1 lines changed

ext/zlib/zlib.c

Lines changed: 1 addition & 1 deletion

@@ -923,7 +923,7 @@ zstream_discard_input(struct zstream *z, long len)
         z->input = Qnil;
     }
     else {
-        z->input = rb_str_substr(z->input, len,
+        z->input = rb_str_subseq(z->input, len,
                                  RSTRING_LEN(z->input) - len);
     }
 }
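
To illustrate the difference between the two calls, here is a minimal Ruby sketch. `discard_chars` and `discard_bytes` are hypothetical helpers, not part of zlib; they assume that `String#[]` (character offsets) behaves roughly like `rb_str_substr` and `String#byteslice` (byte offsets) roughly like `rb_str_subseq`.

```ruby
# Hypothetical stand-in for the old behavior: offsets are counted in characters.
def discard_chars(input, len)
  input[len, input.bytesize - len]
end

# Hypothetical stand-in for the new behavior: offsets are counted in bytes.
def discard_bytes(input, len)
  input.byteslice(len, input.bytesize - len)
end

input = "\u0080abc" # 5 bytes (C2 80 61 62 63) but only 4 characters

after_chars = discard_chars(input, 1) # drops the whole 2-byte character
after_bytes = discard_bytes(input, 1) # drops exactly one byte

p after_chars.bytesize # 3 bytes remain, one more than requested was consumed
p after_bytes.bytesize # 4 bytes remain, as a byte-oriented stream expects
```

For a compressed stream, consuming one byte too many shifts everything that follows, which is consistent with the Zlib::DataError described in the commit message.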

test/zlib/test_zlib.rb

Lines changed: 7 additions & 0 deletions

@@ -1457,6 +1457,13 @@ def test_gunzip
     assert_raise(Zlib::GzipFile::Error){ Zlib.gunzip(src) }
   end
 
+  # Zlib.gunzip input is always considered a binary string, regardless of its String#encoding.
+  def test_gunzip_encoding
+    #               vvvvvvvv = mtime, but valid UTF-8 string of U+0080
+    src = %w[1f8b0800c28000000003cb48cdc9c9070086a6103605000000].pack("H*").force_encoding('UTF-8')
+    assert_equal 'hello', Zlib.gunzip(src.freeze)
+  end
+
   def test_gunzip_no_memory_leak
     assert_no_memory_leak(%[-rzlib], "#{<<~"{#"}", "#{<<~'};'}")
       d = Zlib.gzip("data")
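
As a rough sketch of why this fixture exercises the bug, the snippet below inspects the same byte string without calling zlib at all. It assumes the standard gzip header layout, in which the 4-byte mtime field occupies byte offsets 4 to 7; the variable names are illustrative only.

```ruby
# Same fixture as test_gunzip_encoding, tagged as UTF-8.
src = %w[1f8b0800c28000000003cb48cdc9c9070086a6103605000000].pack("H*").force_encoding('UTF-8')

# The first two bytes of the mtime field are C2 80, the UTF-8 encoding of U+0080.
mtime_prefix = src.byteslice(4, 2)

total_bytes = src.bytesize # length in bytes
total_chars = src.length   # length in characters, as a UTF-8 string

p mtime_prefix.valid_encoding? # the two bytes form one valid character
p total_chars < total_bytes    # so character counts lag behind byte counts
```

Because the header contains a multi-byte character, any character-based offset past that point lands on the wrong byte, which is the case the new test pins down.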
