Thread#wakeup leads to Zlib::BufError #57

Open
rabotyaga opened this issue Aug 22, 2023 · 4 comments · May be fixed by #74

Comments

@rabotyaga

rabotyaga commented Aug 22, 2023

Minimal reproducible script:

require "securerandom"
require "stringio"
require "zlib"

content = SecureRandom.base64(5000)
gzipped = Zlib.gzip(content)

thr = Thread.new do
  loop do
    Zlib::GzipReader.new(StringIO.new(gzipped)).read
  end
end

loop do
  thr.wakeup
end

leads to

#<Thread:0x000000010511d090 gunzip.rb:8 run> terminated with exception (report_on_exception is true):
gunzip.rb:10:in `initialize': buffer error (Zlib::BufError)
	from gunzip.rb:10:in `new'
	from gunzip.rb:10:in `block (2 levels) in <main>'
	from gunzip.rb:9:in `loop'
	from gunzip.rb:9:in `block in <main>'
gunzip.rb:15:in `wakeup': killed thread (ThreadError)
	from gunzip.rb:15:in `block in <main>'
	from gunzip.rb:14:in `loop'
	from gunzip.rb:14:in `<main>'
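Note that the trace above contains two separate failures. The second one, the ThreadError raised from wakeup, appears to be only a consequence of the first: once the worker thread has died from Zlib::BufError, the main loop keeps calling Thread#wakeup on a dead thread. A minimal sketch of that secondary effect in isolation (no zlib involved; the variable names are only for illustration):

```ruby
# A thread that has already finished (or died) cannot be woken up.
dead = Thread.new { }   # block is empty, so the thread finishes right away
dead.join               # make sure it is really dead before continuing

wakeup_error = nil
begin
  dead.wakeup
rescue ThreadError => e
  # Same exception class as the second error in the trace.
  wakeup_error = e
end

puts wakeup_error.class
puts wakeup_error.message
```

So the ThreadError is a side effect of the worker dying; the Zlib::BufError is the actual problem.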

The Zlib::BufError doesn't happen, however, if we change Zlib::GzipReader.new(StringIO.new(gzipped)).read to Zlib.gunzip(gzipped), but it still happens with Zlib.gzip(content).
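For data that is already fully in memory, the swap described above looks roughly like this. It is only a sketch of the workaround as observed in this report, not a guarantee that the one-shot helper is immune to interrupts on every Ruby or zlib version. No threads are involved here, so both variants succeed:

```ruby
require "securerandom"
require "stringio"
require "zlib"

content = SecureRandom.base64(5000)
gzipped = Zlib.gzip(content)

# Streaming reader: the variant that raised Zlib::BufError in the report
# while another thread was calling Thread#wakeup in a loop.
via_reader = Zlib::GzipReader.new(StringIO.new(gzipped)).read

# One-shot helper: the variant reported above as not raising.
via_gunzip = Zlib.gunzip(gzipped)

puts via_reader == content
puts via_gunzip == content
```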

Probably related to #49

@jeremyevans
Contributor

I can reproduce the issue. My Ruby backtrace was different, showing the problem in read, not initialize. Here's the Ruby backtrace:

#<Thread:0x00000eb3af6a3a80 t.rb:8 run> terminated with exception (report_on_exception is true):
t.rb:10:in `read': buffer error (Zlib::BufError)
        from t.rb:10:in `block (2 levels) in <main>'
        from t.rb:9:in `loop'
        from t.rb:9:in `block in <main>'
t.rb:15:in `wakeup': killed thread (ThreadError)
        from t.rb:15:in `block in <main>'
        from t.rb:14:in `loop'
        from t.rb:14:in `<main>'

This was run under gdb with a breakpoint on raise_zlib_error. Here's the backtrace for that:

#0  raise_zlib_error (err=-5, msg=0x0) at ../../../../ext/zlib/zlib.c:323
#1  0x00000eb353370144 in zstream_run_try (value_arg=16165695284096) at ../../../../ext/zlib/zlib.c:1148
#2  0x00000eb2f4c9fc5b in rb_ensure () from /usr/local/lib/libruby32.so
#3  0x00000eb35336fc9c in zstream_run_synchronized (value_arg=16164904909080) at ../../../../ext/zlib/zlib.c:1186
#4  0x00000eb2f4c9fc5b in rb_ensure () from /usr/local/lib/libruby32.so
#5  0x00000eb353371785 in zstream_run (z=0xeb381e59500, src=0x0, len=<optimized out>, flush=2)
    at ../../../../ext/zlib/zlib.c:1203
#6  gzfile_read_more (gz=0xeb381e59500, outbuf=4) at ../../../../ext/zlib/zlib.c:2838
#7  0x00000eb3533714bd in gzfile_read_all (gz=0xeb381e59500) at ../../../../ext/zlib/zlib.c:2953
#8  0x00000eb35336e106 in rb_gzreader_read (argc=0, argv=0xeb3de866f10, obj=<optimized out>)
    at ../../../../ext/zlib/zlib.c:4017
#9  0x00000eb2f4ea2548 in vm_call_cfunc_with_frame () from /usr/local/lib/libruby32.so
#10 0x00000eb2f4ea4f73 in vm_sendish () from /usr/local/lib/libruby32.so
#11 0x00000eb2f4e829d1 in vm_exec_core () from /usr/local/lib/libruby32.so
#12 0x00000eb2f4e96b09 in rb_vm_exec () from /usr/local/lib/libruby32.so
#13 0x00000eb2f4eaa68c in invoke_block_from_c_bh () from /usr/local/lib/libruby32.so
#14 0x00000eb2f4ea98ef in loop_i () from /usr/local/lib/libruby32.so
#15 0x00000eb2f4c9f82a in rb_vrescue2 () from /usr/local/lib/libruby32.so
#16 0x00000eb2f4c9f69e in rb_rescue2 () from /usr/local/lib/libruby32.so
#17 0x00000eb2f4ea2548 in vm_call_cfunc_with_frame () from /usr/local/lib/libruby32.so
#18 0x00000eb2f4ea4f73 in vm_sendish () from /usr/local/lib/libruby32.so
#19 0x00000eb2f4e829d1 in vm_exec_core () from /usr/local/lib/libruby32.so
#20 0x00000eb2f4e96b09 in rb_vm_exec () from /usr/local/lib/libruby32.so
#21 0x00000eb2f4e94364 in rb_vm_invoke_proc () from /usr/local/lib/libruby32.so
#22 0x00000eb2f4e439c4 in thread_do_start_proc () from /usr/local/lib/libruby32.so
#23 0x00000eb2f4e43109 in thread_start_func_2 () from /usr/local/lib/libruby32.so
#24 0x00000eb2f4e427fc in thread_start_func_1 () from /usr/local/lib/libruby32.so
#25 0x00000eb2f3208755 in _rthread_start (v=<optimized out>) at /usr/src/lib/librthread/rthread.c:96
#26 0x00000eb3323d781a in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:86

The failure in zstream_run_try is after this code:

err = (int)(VALUE)rb_thread_call_without_gvl(zstream_run_func, (void *)args,
                                                 zstream_unblock_func, (void *)args);

Looking at zstream_unblock_func, it has the comment:

* There is no safe way to interrupt z->run->func().

Based on the comment, my guess is that there is no safe way to interrupt zstream inflation/deflation, and Thread#wakeup causes an interrupt, so it is not possible to support what you want. At best, we could document that it is not supported. However, I'm not a zlib expert, so it's possible I'm misunderstanding things. Hopefully someone with more experience in this area could confirm or correct my understanding.
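For readers less familiar with Thread#wakeup: it marks a sleeping thread as eligible to run again, which is why it shows up as an interrupt to code that is blocked or running without the GVL. The sketch below shows the effect on a plain `sleep`, with no zlib involved; the `sleeper` name is only for illustration, and how exactly the interrupt reaches zstream_unblock_func is taken from this discussion rather than demonstrated here.

```ruby
sleeper = Thread.new do
  sleep          # no argument: sleeps until something wakes the thread
  :woken
end

# Wait until the thread is actually asleep before waking it.
Thread.pass until sleeper.status == "sleep"

sleeper.wakeup           # interrupt the sleep
result = sleeper.value   # joins the thread and returns the block's value

puts result.inspect
```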

@rabotyaga
Author

rabotyaga commented Aug 22, 2023

Thank you very much for looking into this!

My Ruby backtrace was different, showing the problem in read, not initialize

Sometimes it's read, sometimes initialize.

Based on the comment, my guess is that there is no safe way to interrupt zstream inflation/deflation

An interesting thing, then, would be

The error doesn't happen, however, if we change Zlib::GzipReader.new(StringIO.new(gzipped)).read to Zlib.gunzip(gzipped), but still happens with Zlib.gzip(content).

@ko1
Contributor

ko1 commented Oct 6, 2023

I got this error.

    /* retry if no exception is thrown */
    if (err == Z_OK && args->interrupt) {
        args->interrupt = 0;
        goto loop;
    }

MRI calls zstream_unblock_func, which sets args->interrupt. This means that zstream_run_func is interrupted (and will be canceled). But sometimes it is okay to ignore the interrupt, and the code above retries zstream_run_func in that case.

However, the task (e.g. deflate) may already have completed before the cancellation takes effect, so on retry there is no data left to deflate (for example). This is why BufError was raised (no data).

So the above retrying code should be:

    /* retry if no exception is thrown */
    if (err == Z_OK && args->interrupt && not_completed(z)) {
        args->interrupt = 0;
        goto loop;
    }

I'm not sure how to implement not_completed, so I'm only leaving this note.
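To make the race easier to see, here is a toy model in Ruby. It is not the real zstream code and does not answer how not_completed should be implemented in C: ToyStream, run_with_retry, and the :ok / :buf_error symbols are all hypothetical stand-ins, and "completed" is assumed to simply mean "no input left".

```ruby
# Toy stand-in for a zlib stream: NOT the real zstream, just enough state
# to show why an unconditional retry can fail.
class ToyStream
  def initialize(input)
    @input = input.dup
    @output = +""
  end

  # Hypothetical completion check: assumes "not completed" means input remains.
  def not_completed?
    !@input.empty?
  end

  # Consumes all pending input. Returns :ok on progress, or :buf_error when
  # there is nothing left to do (zlib reports Z_BUF_ERROR when no progress
  # is possible).
  def run
    return :buf_error if @input.empty?
    @output << @input
    @input.clear
    :ok
  end
end

# Runs the stream, then retries if an interrupt was flagged.
# guarded: true adds the extra completion check proposed above.
def run_with_retry(stream, interrupted:, guarded:)
  loop do
    err = stream.run
    retry_needed = err == :ok && interrupted
    retry_needed &&= stream.not_completed? if guarded
    return err unless retry_needed
    interrupted = false   # mirrors args->interrupt = 0
  end
end

# In both cases the interrupt arrives after the work has already finished.
unguarded = run_with_retry(ToyStream.new("data"), interrupted: true, guarded: false)
guarded   = run_with_retry(ToyStream.new("data"), interrupted: true, guarded: true)

puts unguarded.inspect   # :buf_error
puts guarded.inspect     # :ok
```

In this model, the unguarded retry runs the stream a second time with nothing left to process and gets the buffer error, while the guarded version notices the work is done and returns the first result.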

@ianks

ianks commented Jan 13, 2024

#74 fixes this for certain cases. I was hoping it would resolve the BufError I sporadically get in prod, but no such luck. The patch is still worthwhile since it does seem to fix certain interrupt failures.
