
performance regression with Redis protocol parser #488

Closed
tdyas opened this issue May 14, 2021 · 41 comments

@tdyas
Contributor

tdyas commented May 14, 2021

I have encountered what I believe to be a performance bug in the Redis protocol parser: thread dumps of my service, which uses the redis crate, show it stuck in the parser. I would like advice on good ways to continue debugging this issue and narrow down a root cause.

Versions

  • Rust v1.51.0
  • redis 0.20.0
  • tokio 1.4.0

Background

My company is running a service that uses the redis crate to access an AWS Elasticache Redis instance. The service serves requests from its own clients via a Tokio event loop. The service is a cache service and typically uses less than 0.1 vCPU.

About 1-2 times per week, the service suddenly starts consuming its entire CPU limit and then gets throttled by the Linux kernel. This has occurred both with the limit set to 1 vCPU and with it set to 0.5 vCPU. When this occurs, the service stops responding to requests entirely.

Diagnosis

I took two thread dumps of the same process about 10 minutes apart using gdb, and it appears that execution is stuck in the Redis parser.
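For anyone reproducing this: the dumps below came from gdb, and a non-interactive equivalent looks roughly like the following (the PID and output path are placeholders). Taking two dumps a few minutes apart shows which frames are not moving.

```shell
# Attach to the running process, dump every thread's backtrace, and detach.
PID=12345   # placeholder: pid of the stuck service
gdb -p "$PID" -batch \
    -ex 'set pagination off' \
    -ex 'thread apply all bt' > "threads-$(date +%s).txt"
```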

Backtrace #1:

#0  0x00007f4f25afb141 in ?? () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x000055e27139c8d6 in combine::stream::decode ()
#2  0x000055e2713a145f in <redis::parser::aio_support::ValueCodec as tokio_util::codec::decoder::Decoder>::decode ()
#3  0x000055e27139e7eb in tokio_util::codec::decoder::Decoder::decode_eof ()
#4  0x000055e2713afdd0 in <tokio_util::codec::framed_impl::FramedImpl<T,U,R> as futures_core::stream::Stream>::poll_next ()
#5  0x000055e2713ade59 in <redis::aio::PipelineSink<T,I,E> as futures_sink::Sink<redis::aio::PipelineMessage<SinkItem,I,E>>>::poll_flush ()
#6  0x000055e2713c3b62 in <futures_util::stream::stream::forward::Forward<St,Si,Item> as core::future::future::Future>::poll ()
#7  0x000055e2713a1ef9 in <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll ()
#8  0x000055e27139d3e9 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
#9  0x000055e2713c1f4e in tokio::runtime::task::harness::poll_future ()
#10 0x000055e2713c29c2 in tokio::runtime::task::harness::Harness<T,S>::poll ()
#11 0x000055e27183d644 in std::thread::local::LocalKey<T>::with ()
#12 0x000055e271864ef2 in tokio::runtime::thread_pool::worker::Context::run_task ()
#13 0x000055e27186416e in tokio::runtime::thread_pool::worker::Context::run ()
#14 0x000055e27183f5b5 in tokio::macros::scoped_tls::ScopedKey<T>::set ()
#15 0x000055e271863fc6 in tokio::runtime::thread_pool::worker::run ()
#16 0x000055e27184a5b0 in tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut ()
#17 0x000055e27186866f in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
#18 0x000055e27185ebae in tokio::runtime::task::harness::Harness<T,S>::poll ()
#19 0x000055e271842f2c in tokio::runtime::blocking::pool::Inner::run ()
#20 0x000055e27185478e in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#21 0x000055e271848e9a in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
#22 0x000055e2718fdf0a in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/alloc/src/boxed.rs:1521
#23 <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/alloc/src/boxed.rs:1521
#24 std::sys::unix::thread::Thread::new::thread_start ()
    at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0//library/std/src/sys/unix/thread.rs:71
#25 0x00007f4f25dcafa3 in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#26 0x00007f4f25b734cf in clone () from target:/lib/x86_64-linux-gnu/libc.so.6

Note: The first trace was taken before I installed glibc debug symbols onto the node. However, the service runs in a container using a different Linux distribution than the node, so I'm not sure the malloc frame in the second trace is decoded correctly.

Backtrace #2:

#0  0x00007f4f25afe3a9 in malloc () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x000055e2713b8c16 in <combine::parser::PartialMode as combine::parser::ParseMode>::parse ()
#2  0x000055e2713c6e18 in <redis::parser::value::{{closure}}::{{closure}}::Dispatch<A,B,C,D,E,F> as combine::parser::Parser<Input>>::parse_mode ()
#3  0x000055e2713b8145 in <combine::parser::PartialMode as combine::parser::ParseMode>::parse ()
#4  0x000055e2713a3fa0 in <combine::parser::combinator::AnySendSyncPartialStateParser<P> as combine::parser::Parser<Input>>::parse_mode ()
#5  0x000055e2713a4ce6 in combine::parser::Parser::parse_with_state ()
#6  0x000055e27139c760 in combine::stream::decode ()
#7  0x000055e2713a145f in <redis::parser::aio_support::ValueCodec as tokio_util::codec::decoder::Decoder>::decode ()
#8  0x000055e27139e7eb in tokio_util::codec::decoder::Decoder::decode_eof ()
#9  0x000055e2713afdd0 in <tokio_util::codec::framed_impl::FramedImpl<T,U,R> as futures_core::stream::Stream>::poll_next ()
#10 0x000055e2713ade59 in <redis::aio::PipelineSink<T,I,E> as futures_sink::Sink<redis::aio::PipelineMessage<SinkItem,I,E>>>::poll_flush ()
#11 0x000055e2713c3b62 in <futures_util::stream::stream::forward::Forward<St,Si,Item> as core::future::future::Future>::poll ()
#12 0x000055e2713a1ef9 in <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll ()
#13 0x000055e27139d3e9 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
#14 0x000055e2713c1f4e in tokio::runtime::task::harness::poll_future ()
#15 0x000055e2713c29c2 in tokio::runtime::task::harness::Harness<T,S>::poll ()
#16 0x000055e27183d644 in std::thread::local::LocalKey<T>::with ()
#17 0x000055e271864ef2 in tokio::runtime::thread_pool::worker::Context::run_task ()
#18 0x000055e27186416e in tokio::runtime::thread_pool::worker::Context::run ()
#19 0x000055e27183f5b5 in tokio::macros::scoped_tls::ScopedKey<T>::set ()
#20 0x000055e271863fc6 in tokio::runtime::thread_pool::worker::run ()
#21 0x000055e27184a5b0 in tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut ()
#22 0x000055e27186866f in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
#23 0x000055e27185ebae in tokio::runtime::task::harness::Harness<T,S>::poll ()
#24 0x000055e271842f2c in tokio::runtime::blocking::pool::Inner::run ()
#25 0x000055e27185478e in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#26 0x000055e271848e9a in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
#27 0x000055e2718fdf0a in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/alloc/src/boxed.rs:1521
#28 <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/alloc/src/boxed.rs:1521
#29 std::sys::unix::thread::Thread::new::thread_start ()
    at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0//library/std/src/sys/unix/thread.rs:71
#30 0x00007f4f25dcafa3 in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#31 0x00007f4f25b734cf in clone () from target:/lib/x86_64-linux-gnu/libc.so.6

I would like advice on what data would be useful to proceed to narrow this down.

@tdyas tdyas changed the title parser performance regression Redis parser performance regression May 14, 2021
@tdyas tdyas changed the title Redis parser performance regression performance regression with Redis protocol parser May 14, 2021
@enzious

enzious commented May 14, 2021

This is probably this issue: #487
Which I addressed with this PR: #486
(The test failure seems unrelated)

For my investigation see: #310 (comment)

@tdyas
Contributor Author

tdyas commented May 14, 2021

My service does not use any pub/sub features, just GET/SET/EXISTS commands. #486 looks like it is only concerned with pub/sub commands. Am I missing some more fundamental change made by that PR?

@enzious

enzious commented May 14, 2021

@tdyas What are you using that is using the aio stream?

Are you hitting this line: https://github.com/mitsuhiko/redis-rs/blob/master/src/parser.rs#L148

Read this: #310 (comment)
I suspect you're hitting something similar: a misformatted stream breaking the parser and looping infinitely. It may not be the same cause, but it's the same result.

@tdyas
Contributor Author

tdyas commented May 14, 2021

I'm using the Tokio ConnectionManager via get_tokio_connection_manager on RedisClient. The service uses the tokio and tonic crates to expose a gRPC service with Redis as a backend, so it is fully async.

@tdyas
Contributor Author

tdyas commented May 14, 2021

Are you hitting this line: https://github.com/mitsuhiko/redis-rs/blob/master/src/parser.rs#L148

Hard to tell from the backtraces I posted above. Unfortunately, they don't have line-number info, and the one call that is in the parser code is just listed as <redis::parser::value::{{closure}}::{{closure}}::Dispatch<A,B,C,D,E,F> as combine::parser::Parser<Input>>::parse_mode ().

@enzious

enzious commented May 14, 2021

I see Framed<...> and ValueCodec in your stacktrace. Are you processing a Stream from redis somehow?

@tdyas
Contributor Author

tdyas commented May 14, 2021

I see Framed<...> and ValueCodec in your stacktrace. Are you processing a Stream from redis somehow?

Yes, this is a storage service using a gRPC streaming RPC. The service implements the Google ByteStream gRPC service where both reads and writes are gRPC streaming RPCs.

The tonic crate uses a Stream to implement streaming RPCs. Our Redis storage driver uses futures::stream::unfold to expand block numbers and then, for each block number, retrieves that block from Redis via an async call into the Redis client.
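As a rough synchronous sketch of that shape (the block-key format here is invented for illustration; the real driver does this with futures::stream::unfold driving async Redis GETs):

```rust
// Each state yields one block key plus the successor state -- the same
// contract futures::stream::unfold has, minus the async machinery.
fn block_keys(first_block: u64, num_blocks: u64) -> impl Iterator<Item = String> {
    (first_block..first_block + num_blocks).map(|n| format!("digest:block:{}", n))
}

fn main() {
    // In the real service, each key would become an async GET against Redis,
    // driven lazily as the gRPC response stream is polled.
    let keys: Vec<String> = block_keys(0, 3).collect();
    assert_eq!(keys, vec!["digest:block:0", "digest:block:1", "digest:block:2"]);
}
```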

@tdyas
Contributor Author

tdyas commented May 14, 2021

Does ConnectionManager properly handle multiple async tasks cloning it and concurrently submitting requests, or is that not a supported use case? I wonder if the out-of-sync state somehow results from that.

@enzious

enzious commented May 14, 2021

That's certainly a possibility to look at. The assumption I've seen in the code is that the response always comes after the request, which is not true in at least one case: when the server sends a message that isn't a response to a request.

@tdyas
Contributor Author

tdyas commented May 14, 2021

That's certainly a possibility to look at. The assumption I've seen in the code is that the response always comes after the request, which is not true in at least one case: when the server sends a message that isn't a response to a request.

ConnectionManager uses MultiplexedConnection in its implementation, so I would hope that MultiplexedConnection can handle use by concurrent async tasks.

Looking at the Redis protocol documentation, the protocol is described as "simple" request-response, with two exceptions (pub/sub and pipelining):

Excluding the above two exceptions, the Redis protocol is a simple request-response protocol.

My service does use Redis pipelining in part of its implementation. Maybe relevant?
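For context, RESP (the Redis protocol) frames every command as an array of bulk strings, and pipelining simply concatenates several frames before the replies are read back in order. A minimal sketch in plain Rust (no redis crate; the key names are made up):

```rust
// Encode one Redis command as a RESP2 array of bulk strings.
fn encode_command(args: &[&str]) -> String {
    let mut out = format!("*{}\r\n", args.len());
    for arg in args {
        out.push_str(&format!("${}\r\n{}\r\n", arg.len(), arg));
    }
    out
}

fn main() {
    // A pipeline is just frames back to back; the server replies in order,
    // which is the invariant the client-side parser relies on.
    let pipeline = [
        encode_command(&["GET", "block:1"]),
        encode_command(&["GET", "block:2"]),
    ]
    .concat();
    assert!(pipeline.starts_with("*2\r\n$3\r\nGET\r\n$7\r\nblock:1\r\n"));
}
```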

@tdyas
Contributor Author

tdyas commented May 14, 2021

Also, the redis crate is probably the source of the Framed, given it is used in the implementation of MultiplexedConnection: https://github.com/mitsuhiko/redis-rs/blob/d9fdcedf3af6ed4257d950498450d0a2333acf48/src/aio.rs#L852

@enzious

enzious commented May 14, 2021

I think you're definitely on the right track with pipelining.

@enzious

enzious commented May 15, 2021

I would create an MVCE that does what you're doing now (minus the gRPC, etc.) but much faster, so it runs into the issue. Then we can throw it into a debugger and find out what the protocol stream is doing.

@enzious

enzious commented May 15, 2021

I wrote this branch to throw away unparseable lines. It consumes them so that the parser can continue:
https://github.com/mitsuhiko/redis-rs/compare/master...enzious:fix/parser-infinite-loop?expand=1

It would need to be combined with the #486 PR to not break sub/unsub.

See this for why it gets stuck: https://docs.rs/combine/4.5.2/combine/fn.unexpected_any.html
Basically, if it can't read the first character at all, it gets stuck hitting the unexpected_any(...) call, which never consumes input, as documented.
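The failure mode can be shown in isolation with a hand-rolled toy decoder (this is not the actual combine/redis code): if a parser reports "need more input" without consuming the byte it choked on, a decode loop that keeps re-feeding the same buffer makes no progress, and only an explicit progress check breaks the cycle:

```rust
// Toy decoder: recognizes lines starting with '+', and rejects anything
// else WITHOUT consuming it -- mimicking a parser stuck on a bad byte.
fn decode(buf: &mut Vec<u8>) -> Option<String> {
    if buf.first() == Some(&b'+') {
        let end = buf.iter().position(|&b| b == b'\n')?;
        let line: Vec<u8> = buf.drain(..=end).collect();
        Some(String::from_utf8_lossy(&line).into_owned())
    } else {
        None // "need more input" -- but the bad byte is still in buf!
    }
}

fn main() {
    let mut buf = b"?garbage\n+OK\n".to_vec();
    let mut decoded = Vec::new();
    loop {
        let len_before = buf.len();
        match decode(&mut buf) {
            Some(line) => decoded.push(line),
            // Without this progress check, the caller would poll decode()
            // forever on the same bytes -- an infinite, CPU-burning loop.
            None if buf.len() == len_before => break,
            None => {}
        }
        if buf.is_empty() {
            break;
        }
    }
    assert!(decoded.is_empty()); // the bad prefix blocked everything behind it
}
```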

@Marwes
Collaborator

Marwes commented May 17, 2021

The parser shouldn't hang regardless, but why is the server sending something that is not expected by the protocol in the first place?

@tdyas
Contributor Author

tdyas commented May 17, 2021

The parser shouldn't hang regardless, but why is the server sending something that is not expected by the protocol in the first place?

For context, the server in this case is AWS Elasticache for Redis with the redis6 profile.

@tdyas
Contributor Author

tdyas commented May 17, 2021

Is there any recommended way to see what the server is sending in these cases? Probably just running my server with a modified version of the parser that logs the data causing the problem?

@enzious

enzious commented May 17, 2021

The change I posted (https://github.com/mitsuhiko/redis-rs/compare/master...enzious:fix/parser-infinite-loop?expand=1) will have the offending line returned in the error. You could also change it to print to stdout.
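A cheap complement to that is logging the front of the unconsumed buffer with control bytes escaped, so a malformed frame is readable in logs. A sketch (the helper name here is ours, not the crate's):

```rust
// Render the first `max` bytes of a buffer with CR/LF and other control
// bytes escaped, so a malformed RESP frame is readable in log output.
fn escape_prefix(buf: &[u8], max: usize) -> String {
    buf.iter()
        .take(max)
        .flat_map(|&b| std::ascii::escape_default(b))
        .map(|b| b as char)
        .collect()
}

fn main() {
    let frame = b"*2\r\n$3\r\nGET\r\n";
    assert_eq!(escape_prefix(frame, 8), "*2\\r\\n$3\\r\\n");
    println!("offending bytes: {}", escape_prefix(frame, 64));
}
```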

@Marwes
Collaborator

Marwes commented May 17, 2021

The code as it is doesn't seem to print an error, though, so changing it to try to print a line probably won't help.

@enzious

enzious commented May 17, 2021

@Marwes I've gotten it to print out a partial message with the subscription issue I fixed in my other PR.
[screenshots of the logged partial message omitted]

@Marwes
Collaborator

Marwes commented May 18, 2021

I think this may be fixed by 306797c, which hasn't been released. The stacktraces point into decode_eof, so there is likely something getting stuck during shutdown (no idea why, though). I will try to get a release out later today.

@tdyas
Contributor Author

tdyas commented May 20, 2021

FYI, I'm going to try to get a deploy of my service out today with v0.20.1.

@tdyas
Contributor Author

tdyas commented May 26, 2021

0.20.1 didn't fix the issue. The infinite loop occurred again this morning. Stack trace taken via gdb from outside the container:

Thread 6 (LWP 4149):
#0  0x00007fc152a6b7ef in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x0000562ce0a5e930 in mio::poll::Poll::poll ()
#2  0x0000562ce0a352e5 in tokio::io::driver::Driver::turn ()
#3  0x0000562ce0a2d6a3 in <tokio::park::either::Either<A,B> as tokio::park::Park>::park ()
#4  0x0000562ce0a3c6b3 in tokio::time::driver::Driver<P>::park_internal ()
#5  0x0000562ce0a2d57e in <tokio::park::either::Either<A,B> as tokio::park::Park>::park ()
#6  0x0000562ce0a58149 in <tokio::runtime::driver::Driver as tokio::park::Park>::park ()
#7  0x0000562ce063deb0 in tokio::macros::scoped_tls::ScopedKey<T>::set ()
#8  0x0000562ce0603637 in tokio::runtime::basic_scheduler::BasicScheduler<P>::block_on ()
#9  0x0000562ce063e1dd in tokio::runtime::Runtime::block_on ()
#10 0x0000562ce05f2c76 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#11 0x0000562ce0607658 in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
#12 0x0000562ce0ae327a in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546
#13 <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546
#14 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:71
#15 0x00007fc152cc2fa3 in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007fc152a6b4cf in clone () from target:/lib/x86_64-linux-gnu/libc.so.6

Thread 5 (LWP 4132):
#0  0x00007fc152a6b7ef in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x0000562ce0a5e930 in mio::poll::Poll::poll () at library/std/src/panicking.rs:519
#2  0x0000562ce0a352e5 in tokio::io::driver::Driver::turn () at library/std/src/panicking.rs:519
#3  0x0000562ce0a2d6a3 in <tokio::park::either::Either<A,B> as tokio::park::Park>::park ()
    at library/std/src/panicking.rs:519
#4  0x0000562ce0a3c6b3 in tokio::time::driver::Driver<P>::park_internal ()
    at library/std/src/panicking.rs:519
#5  0x0000562ce0a2d57e in <tokio::park::either::Either<A,B> as tokio::park::Park>::park ()
    at library/std/src/panicking.rs:519
#6  0x0000562ce0a58149 in <tokio::runtime::driver::Driver as tokio::park::Park>::park ()
    at library/std/src/panicking.rs:519
#7  0x0000562ce07e1160 in tokio::macros::scoped_tls::ScopedKey<T>::set ()
    at library/std/src/panicking.rs:519
#8  0x0000562ce079fda1 in tokio::runtime::basic_scheduler::BasicScheduler<P>::block_on ()
    at library/std/src/panicking.rs:519
#9  0x0000562ce07cec94 in tokio::runtime::Runtime::block_on () at library/std/src/panicking.rs:519
#10 0x0000562ce07a81f3 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
    at library/std/src/panicking.rs:519
#11 0x0000562ce0807eb1 in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
    at library/std/src/panicking.rs:519
#12 0x0000562ce0ae327a in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546
#13 <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546
#14 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:71
#15 0x00007fc152cc2fa3 in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007fc152a6b4cf in clone () from target:/lib/x86_64-linux-gnu/libc.so.6

Thread 4 (LWP 4130):
#0  0x00007fc152cc93f9 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
---Type <return> to continue, or q <return> to quit---
   from target:/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000562ce0ae1ae5 in std::sys::unix::condvar::Condvar::wait_timeout ()
    at library/std/src/sys/unix/condvar.rs:98
#2  0x0000562ce093fbe6 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
    at library/std/src/panicking.rs:519
#3  0x0000562ce093edc3 in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
    at library/std/src/panicking.rs:519
#4  0x0000562ce0ae327a in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546
#5  <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546
#6  std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:71
#7  0x00007fc152cc2fa3 in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#8  0x00007fc152a6b4cf in clone () from target:/lib/x86_64-linux-gnu/libc.so.6

Thread 3 (LWP 4129):
#0  0x00007fc152a65f59 in syscall () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x0000562ce0adc3aa in std::sys::unix::futex::futex_wait () at library/std/src/sys/unix/futex.rs:25
#2  std::sys_common::thread_parker::futex::Parker::park ()
    at library/std/src/sys_common/thread_parker/futex.rs:50
#3  std::thread::park () at library/std/src/thread/mod.rs:894
#4  std::sync::mpsc::blocking::WaitToken::wait () at library/std/src/sync/mpsc/blocking.rs:68
#5  0x0000562ce06ed9fa in std::sync::mpsc::sync::Packet<T>::recv () at library/std/src/panicking.rs:519
#6  0x0000562ce06cd442 in std::sync::mpsc::Receiver<T>::recv () at library/std/src/panicking.rs:519
#7  0x0000562ce06d2af0 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
    at library/std/src/panicking.rs:519
#8  0x0000562ce06eccc1 in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
    at library/std/src/panicking.rs:519
#9  0x0000562ce0ae327a in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546
#10 <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546
#11 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:71
#12 0x00007fc152cc2fa3 in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007fc152a6b4cf in clone () from target:/lib/x86_64-linux-gnu/libc.so.6

Thread 2 (LWP 4127):
#0  0x00007fc152ace6b3 in ?? () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x0000562ce05e4f7d in <&mut W as core::fmt::Write>::write_str () at library/std/src/panicking.rs:519
#2  0x0000562ce0afc314 in core::fmt::write () at library/core/src/fmt/mod.rs:1091
#3  0x0000562ce05cc77e in redis::parser::aio_support::ValueCodec::decode_stream ()
    at library/std/src/panicking.rs:519
#4  0x0000562ce05cca7e in <redis::parser::aio_support::ValueCodec as tokio_util::codec::decoder::Decoder>::decode_eof () at library/std/src/panicking.rs:519
#5  0x0000562ce04e0d04 in <tokio_util::codec::framed_impl::FramedImpl<T,U,R> as futures_core::stream::Stream>::poll_next () at library/std/src/panicking.rs:519
#6  0x0000562ce042d364 in <redis::aio::PipelineSink<T,I,E> as futures_sink::Sink<redis::aio::PipelineMessage<SinkItem,I,E>>>::poll_flush () at library/std/src/panicking.rs:519
#7  0x0000562ce04f9da7 in <futures_util::stream::stream::forward::Forward<St,Si,Item> as core::future::future::Future>::poll () at library/std/src/panicking.rs:519
#8  0x0000562ce05491f0 in <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll () at library/std/src/panicking.rs:519
#9  0x0000562ce043d2d5 in tokio::runtime::task::core::CoreStage<T>::poll ()
    at library/std/src/panicking.rs:519
#10 0x0000562ce0447d57 in tokio::runtime::task::harness::poll_future () at library/std/src/panicking.rs:519
#11 0x0000562ce0449ba2 in tokio::runtime::task::harness::Harness<T,S>::poll ()
    at library/std/src/panicking.rs:519
#12 0x0000562ce0a2a252 in std::thread::local::LocalKey<T>::with () at library/std/src/panicking.rs:519
#13 0x0000562ce0a4b2e2 in tokio::runtime::thread_pool::worker::Context::run_task ()
    at library/std/src/panicking.rs:519
#14 0x0000562ce0a4a5fc in tokio::runtime::thread_pool::worker::Context::run ()
    at library/std/src/panicking.rs:519
#15 0x0000562ce0a2c585 in tokio::macros::scoped_tls::ScopedKey<T>::set ()
    at library/std/src/panicking.rs:519
#16 0x0000562ce0a4a3d2 in tokio::runtime::thread_pool::worker::run () at library/std/src/panicking.rs:519
#17 0x0000562ce0a599fe in tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut ()
    at library/std/src/panicking.rs:519
#18 0x0000562ce0a51e1f in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
    at library/std/src/panicking.rs:519
#19 0x0000562ce0a4c11e in tokio::runtime::task::harness::Harness<T,S>::poll ()
    at library/std/src/panicking.rs:519
#20 0x0000562ce0a4240a in tokio::runtime::blocking::pool::Inner::run () at library/std/src/panicking.rs:519
#21 0x0000562ce0a2915b in std::sys_common::backtrace::__rust_begin_short_backtrace ()
    at library/std/src/panicking.rs:519
#22 0x0000562ce0a331d8 in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
    at library/std/src/panicking.rs:519
#23 0x0000562ce0ae327a in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546
#24 <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546
#25 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:71
#26 0x00007fc152cc2fa3 in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#27 0x00007fc152a6b4cf in clone () from target:/lib/x86_64-linux-gnu/libc.so.6

Thread 1 (LWP 4080):
#0  0x00007fc152cc900c in pthread_cond_wait@@GLIBC_2.3.2 ()
   from target:/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000562ce0a31078 in tokio::park::thread::Inner::park () at library/std/src/panicking.rs:519
#2  0x0000562ce0a31739 in <tokio::park::thread::CachedParkThread as tokio::park::Park>::park ()
    at library/std/src/panicking.rs:519
#3  0x0000562ce048ff2e in tokio::park::thread::CachedParkThread::block_on ()
    at library/std/src/panicking.rs:519
#4  0x0000562ce04edf34 in tokio::runtime::thread_pool::ThreadPool::block_on ()
    at library/std/src/panicking.rs:519
#5  0x0000562ce0532408 in tokio::runtime::Runtime::block_on () at library/std/src/panicking.rs:519
#6  0x0000562ce04f9846 in storage_server::main () at library/std/src/panicking.rs:519
#7  0x0000562ce04e5623 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
    at library/std/src/panicking.rs:519
#8  0x0000562ce04e5f7d in std::rt::lang_start::{{closure}} () at library/std/src/panicking.rs:519
#9  0x0000562ce0ae059a in core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once () at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/ops/function.rs:259
#10 std::panicking::try::do_call () at library/std/src/panicking.rs:379
#11 std::panicking::try () at library/std/src/panicking.rs:343
#12 std::panic::catch_unwind () at library/std/src/panic.rs:431
#13 std::rt::lang_start_internal () at library/std/src/rt.rs:51
#14 0x0000562ce04f9972 in main () at library/std/src/panicking.rs:519

@zenria

zenria commented May 31, 2021

Hi, I think I hit this bug too, but for me it seems to have been introduced when I upgraded from 0.20.0 to 0.20.1.

2362 Thread_227141: tokio-runtime-worker
    + 2362 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2034947b]
    +   2362 _pthread_start  (in libsystem_pthread.dylib) + 224  [0x7fff2034d950]
    +     2362 std::sys::unix::thread::Thread::new::thread_start::h4185c11ff3c2c216  (in image-server-rs) + 45  [0x102cf4a8d]
    +       2362 core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h15be92413e879d2a  (in image-server-rs) + 177  [0x102ca2441]
    +         2362 std::sys_common::backtrace::__rust_begin_short_backtrace::h2bcbcab97ec94de4  (in image-server-rs) + 273  [0x102c88671]
    +           2362 tokio::runtime::blocking::pool::Inner::run::h54358afa69133ead  (in image-server-rs) + 250  [0x102c7bdba]
    +             2362 tokio::runtime::task::harness::Harness$LT$T$C$S$GT$::poll::h2d0ff4b85d5deba2  (in image-server-rs) + 219  [0x10256da9b]
    +               2362 std::panicking::try::h6f4325a97d2d651e  (in image-server-rs) + 29  [0x10259638d]
    +                 2362 tokio::runtime::task::core::CoreStage$LT$T$GT$::poll::hd0050cc3569ae976  (in image-server-rs) + 63  [0x1025a3d5f]
    +                   2362 tokio::runtime::thread_pool::worker::run::h76568683f5f9c0a7  (in image-server-rs) + 104  [0x102c954f8]
    +                     2362 tokio::macros::scoped_tls::ScopedKey$LT$T$GT$::set::hcb93829e72d8222d  (in image-server-rs) + 67  [0x102c8a713]
    +                       2362 tokio::runtime::thread_pool::worker::Context::run::h45b37643e5377ba0  (in image-server-rs) + 251  [0x102c9572b]
    +                         2362 tokio::runtime::thread_pool::worker::Context::run_task::h9cd3024e63c6739f  (in image-server-rs) + 214  [0x102c96366]
    +                           2362 std::thread::local::LocalKey$LT$T$GT$::with::hd2ef3c91eb61ce9e  (in image-server-rs) + 100  [0x102c769b4]
    +                             2362 tokio::runtime::task::harness::Harness$LT$T$C$S$GT$::poll::hd2f5b2ee001c8e31  (in image-server-rs) + 191  [0x102382e0f]
    +                               2362 tokio::runtime::task::harness::poll_future::hce0f2f3cd41f9026  (in image-server-rs) + 86  [0x102380a46]
    +                                 2362 tokio::runtime::task::core::CoreStage$LT$T$GT$::poll::h59380b10262444aa  (in image-server-rs) + 43  [0x10243633b]
    +                                   2362 _$LT$futures_util..future..future..Map$LT$Fut$C$F$GT$$u20$as$u20$core..future..future..Future$GT$::poll::he7fe5b84383c575c  (in image-server-rs) + 31  [0x10239ba4f]
    +                                     2362 _$LT$futures_util..stream..stream..forward..Forward$LT$St$C$Si$C$Item$GT$$u20$as$u20$core..future..future..Future$GT$::poll::he43abdef4845a08e  (in image-server-rs) + 825  [0x102388e49]
    +                                       2201 _$LT$redis..aio..PipelineSink$LT$T$C$I$C$E$GT$$u20$as$u20$futures_sink..Sink$LT$redis..aio..PipelineMessage$LT$SinkItem$C$I$C$E$GT$$GT$$GT$::poll_flush::h30d4bef1caedfc70  (in image-server-rs) + 248  [0x1022dff68]
    +                                       ! 2028 _$LT$tokio_util..codec..framed_impl..FramedImpl$LT$T$C$U$C$R$GT$$u20$as$u20$futures_core..stream..Stream$GT$::poll_next::h1207f396589b0b8b  (in image-server-rs) + 470  [0x102388346]
    +                                       ! : 2004 _$LT$redis..parser..aio_support..ValueCodec$u20$as$u20$tokio_util..codec..decoder..Decoder$GT$::decode_eof::h8c0760aa8e506104  (in image-server-rs) + 19  [0x1026f4253]
    +                                       ! : | 1839 redis::parser::aio_support::ValueCodec::decode_stream::hd97f672a44276252  (in image-server-rs) + 79  [0x1026f3def]
    +                                       ! : | + 1742 combine::stream::decode::hb822b5c05a9c8bd4  (in image-server-rs) + 52  [0x102704c04]
    +                                       ! : | + ! 1402 combine::parser::Parser::parse_with_state::h6c0d33685aac6ad6  (in image-server-rs) + 57  [0x1026ebec9]
    +                                       ! : | + ! : 1164 _$LT$combine..parser..combinator..AnySendSyncPartialStateParser$LT$P$GT$$u20$as$u20$combine..parser..Parser$LT$Input$GT$$GT$::parse_mode::h39d9a3eac1ebb935  (in image-server-rs) + 143  [0x1026e9a5f]
    +                                       ! : | + ! : | 411 _$LT$$LP$Y$C$Z$RP$$u20$as$u20$combine..parser..choice..ChoiceParser$LT$Input$GT$$GT$::parse_mode_choice::h2ed5f79b02d37d99  (in image-server-rs) + 1800,1890,...  [0x1026fa728,0x1026fa782,...]
    +                                       ! : | + ! : | 349 _$LT$$LP$Y$C$Z$RP$$u20$as$u20$combine..parser..choice..ChoiceParser$LT$Input$GT$$GT$::parse_mode_choice::h2ed5f79b02d37d99  (in image-server-rs) + 214  [0x1026fa0f6]
    +                                       ! : | + ! : | + 269 _$LT$combine..parser..sequence..ThenPartial$LT$P$C$F$GT$$u20$as$u20$combine..parser..Parser$LT$Input$GT$$GT$::parse_mode_impl::h957c7918125ccb6a  (in image-server-rs) + 146  [0x1026f4852]
    +                                       ! : | + ! : | + ! 200 _rjem_mallocx  (in image-server-rs) + 4,2549,...  [0x1024f4d54,0x1024f5745,...]
    +                                       ! : | + ! : | + ! 31 __rg_alloc  (in image-server-rs) + 23,30,...  [0x102425d87,0x102425d8e,...]
    +                                       ! : | + ! : | + ! 20 __rg_alloc  (in image-server-rs) + 20  [0x102425d84]
    +                                       ! : | + ! : | + ! : 20 jemallocator::layout_to_flags::hec8a536de5f64c30  (in image-server-rs) + 4,24,...  [0x1024f2d74,0x1024f2d88,...]
    +                                       ! : | + ! : | + ! 10 _rjem_mallocx  (in image-server-rs) + 78  [0x1024f4d9e]
    +                                       ! : | + ! : | + ! : 10 pthread_getspecific  (in libsystem_pthread.dylib) + 0,9  [0x7fff203494e5,0x7fff203494ee]
    +                                       ! : | + ! : | + ! 5 __rust_alloc  (in image-server-rs) + 0  [0x10243c340]
    +                                       ! : | + ! : | + ! 3 _rjem_mallocx  (in image-server-rs) + 2729  [0x1024f57f9]
    +                                       ! : | + ! : | + !   3 _rjem_je_tcache_event_hard  (in image-server-rs) + 51,54,...  [0x1025229d3,0x1025229d6,...]
    +                                       ! : | + ! : | + 80 _$LT$combine..parser..sequence..ThenPartial$LT$P$C$F$GT$$u20$as$u20$combine..parser..Parser$LT$Input$GT$$GT$::parse_mode_impl::h957c7918125ccb6a  (in image-server-rs) + 228,216,...  [0x1026f48a4,0x1026f4898,...]
    +                                       ! : | + ! : | 190 _$LT$$LP$Y$C$Z$RP$$u20$as$u20$combine..parser..choice..ChoiceParser$LT$Input$GT$$GT$::parse_mode_choice::h2ed5f79b02d37d99  (in image-server-rs) + 2013  [0x1026fa7fd]
    +                                       ! : | + ! : | + 117 _rjem_sdallocx  (in image-server-rs) + 160,79,...  [0x1024f6e80,0x1024f6e2f,...]
    +                                       ! : | + ! : | + 29 __rg_dealloc  (in image-server-rs) + 0,16,...  [0x102425da0,0x102425db0,...]
    +                                       ! : | + ! : | + 24 __rg_dealloc  (in image-server-rs) + 21  [0x102425db5]
    +                                       ! : | + ! : | + ! 24 jemallocator::layout_to_flags::hec8a536de5f64c30  (in image-server-rs) + 4,14,...  [0x1024f2d74,0x1024f2d7e,...]
    +                                       ! : | + ! : | + 11 _rjem_sdallocx  (in image-server-rs) + 42  [0x1024f6e0a]
    +                                       ! : | + ! : | + ! 11 pthread_getspecific  (in libsystem_pthread.dylib) + 0,9  [0x7fff203494e5,0x7fff203494ee]
    +                                       ! : | + ! : | + 5 __rust_dealloc  (in image-server-rs) + 0  [0x10243c350]
    +                                       ! : | + ! : | + 3 _rjem_sdallocx  (in image-server-rs) + 497  [0x1024f6fd1]
    +                                       ! : | + ! : | + ! 3 _rjem_je_tcache_event_hard  (in image-server-rs) + 0,54,...  [0x1025229a0,0x1025229d6,...]
    +                                       ! : | + ! : | + 1 DYLD-STUB$$pthread_getspecific  (in image-server-rs) + 0  [0x102d47238]
    +                                       ! : | + ! : | 147 _$LT$$LP$Y$C$Z$RP$$u20$as$u20$combine..parser..choice..ChoiceParser$LT$Input$GT$$GT$::parse_mode_choice::h2ed5f79b02d37d99  (in image-server-rs) + 1466  [0x1026fa5da]
    +                                       ! : | + ! : | + 111 _$LT$combine..stream..easy..Error$LT$T$C$R$GT$$u20$as$u20$core..cmp..PartialEq$GT$::eq::hce8a0ef958c14ec7  (in image-server-rs) + 173  [0x10270f57d]
    +                                       ! : | + ! : | + ! 111 _platform_memcmp  (in libsystem_platform.dylib) + 293,288,...  [0x7fff2038fc45,0x7fff2038fc40,...]
    +                                       ! : | + ! : | + 30 _$LT$combine..stream..easy..Error$LT$T$C$R$GT$$u20$as$u20$core..cmp..PartialEq$GT$::eq::hce8a0ef958c14ec7  (in image-server-rs) + 4,18,...  [0x10270f4d4,0x10270f4e2,...]
    +                                       ! : | + ! : | + 6 DYLD-STUB$$memcmp  (in image-server-rs) + 0  [0x102d4713c]
    +                                       ! : | + ! : | 28 _$LT$$LP$Y$C$Z$RP$$u20$as$u20$combine..parser..choice..ChoiceParser$LT$Input$GT$$GT$::parse_mode_choice::h2ed5f79b02d37d99  (in image-server-rs) + 1481  [0x1026fa5e9]
    +                                       ! : | + ! : | + 28 core::ptr::drop_in_place$LT$combine..stream..easy..Error$LT$u8$C$$RF$$u5b$u8$u5d$$GT$$GT$::h259916db9b02961f  (in image-server-rs) + 1,7,...  [0x10270db41,0x10270db47,...]
    +                                       ! : | + ! : | 25 _$LT$$LP$Y$C$Z$RP$$u20$as$u20$combine..parser..choice..ChoiceParser$LT$Input$GT$$GT$::parse_mode_choice::h2ed5f79b02d37d99  (in image-server-rs) + 1970  [0x1026fa7d2]
    +                                       ! : | + ! : | + 25 core::ptr::drop_in_place$LT$combine..stream..easy..Error$LT$u8$C$$RF$$u5b$u8$u5d$$GT$$GT$::h259916db9b02961f  (in image-server-rs) + 1,32,...  [0x1026e5381,0x1026e53a0,...]
    +                                       ! : | + ! : | 14 _$LT$$LP$Y$C$Z$RP$$u20$as$u20$combine..parser..choice..ChoiceParser$LT$Input$GT$$GT$::parse_mode_choice::h2ed5f79b02d37d99  (in image-server-rs) + 1533  [0x1026fa61d]
    +                                       ! : | + ! : |   14 core::ptr::drop_in_place$LT$combine..stream..easy..Error$LT$u8$C$$RF$$u5b$u8$u5d$$GT$$GT$::h259916db9b02961f  (in image-server-rs) + 7,1,...  [0x1026f5ff7,0x1026f5ff1,...]
    +                                       ! : | + ! : 224 _$LT$combine..parser..combinator..AnySendSyncPartialStateParser$LT$P$GT$$u20$as$u20$combine..parser..Parser$LT$Input$GT$$GT$::parse_mode::h39d9a3eac1ebb935  (in image-server-rs) + 277,280,...  [0x1026e9ae5,0x1026e9ae8,...]
    +                                       ! : | + ! : 14 _$LT$combine..parser..combinator..AnySendSyncPartialStateParser$LT$P$GT$$u20$as$u20$combine..parser..Parser$LT$Input$GT$$GT$::parse_mode::h39d9a3eac1ebb935  (in image-server-rs) + 61  [0x1026e9a0d]
    +                                       ! : | + ! :   14 _$LT$T$u20$as$u20$core..any..Any$GT$::type_id::h20b766b7aa757eb4  (in image-server-rs) + 0,4,...  [0x1026f53e0,0x1026f53e4,...]
    +                                       ! : | + ! 340 combine::parser::Parser::parse_with_state::h6c0d33685aac6ad6  (in image-server-rs) + 73,76,...  [0x1026ebed9,0x1026ebedc,...]
    +                                       ! : | + 97 combine::stream::decode::hb822b5c05a9c8bd4  (in image-server-rs) + 1,8,...  [0x102704bd1,0x102704bd8,...]
    +                                       ! : | 129 redis::parser::aio_support::ValueCodec::decode_stream::hd97f672a44276252  (in image-server-rs) + 671,675,...  [0x1026f403f,0x1026f4043,...]
    +                                       ! : | 36 redis::parser::aio_support::ValueCodec::decode_stream::hd97f672a44276252  (in image-server-rs) + 530  [0x1026f3fb2]
    +                                       ! : |   36 bytes::bytes_mut::BytesMut::set_start::ha0fe78e5b5faaec0  (in image-server-rs) + 8,194,...  [0x102cb6698,0x102cb6752,...]
    +                                       ! : 24 _$LT$redis..parser..aio_support..ValueCodec$u20$as$u20$tokio_util..codec..decoder..Decoder$GT$::decode_eof::h8c0760aa8e506104  (in image-server-rs) + 1,6,...  [0x1026f4241,0x1026f4246,...]
    +                                       ! 158 _$LT$tokio_util..codec..framed_impl..FramedImpl$LT$T$C$U$C$R$GT$$u20$as$u20$futures_core..stream..Stream$GT$::poll_next::h1207f396589b0b8b  (in image-server-rs) + 462,563,...  [0x10238833e,0x1023883a3,...]
    +                                       ! 15 _$LT$tokio_util..codec..framed_impl..FramedImpl$LT$T$C$U$C$R$GT$$u20$as$u20$futures_core..stream..Stream$GT$::poll_next::h1207f396589b0b8b  (in image-server-rs) + 50  [0x1023881a2]
    +                                       !   15 _$LT$tokio_util..codec..framed_impl..RWFrames$u20$as$u20$core..borrow..BorrowMut$LT$tokio_util..codec..framed_impl..ReadFrame$GT$$GT$::borrow_mut::hd315c048eaae78db  (in image-server-rs) + 0,8,...  [0x102c74d40,0x102c74d48,...]
    +                                       85 _$LT$redis..aio..PipelineSink$LT$T$C$I$C$E$GT$$u20$as$u20$futures_sink..Sink$LT$redis..aio..PipelineMessage$LT$SinkItem$C$I$C$E$GT$$GT$$GT$::poll_flush::h30d4bef1caedfc70  (in image-server-rs) + 234  [0x1022dff5a]
    +                                       ! 58 redis::aio::PipelineSink$LT$T$C$I$C$E$GT$::send_result::hb61c1e8dfea601ef  (in image-server-rs) + 6,8,...  [0x1022f0ed6,0x1022f0ed8,...]
    +                                       ! 27 core::ptr::drop_in_place$LT$redis..types..RedisError$GT$::hd9e22f32ac13612f  (in image-server-rs) + 21,100,...  [0x1022e78a5,0x1022e78f4,...]
    +                                       76 _$LT$redis..aio..PipelineSink$LT$T$C$I$C$E$GT$$u20$as$u20$futures_sink..Sink$LT$redis..aio..PipelineMessage$LT$SinkItem$C$I$C$E$GT$$GT$$GT$::poll_flush::h30d4bef1caedfc70  (in image-server-rs) + 255,161,...  [0x1022dff6f,0x1022dff11,...]

Downgrading to 0.20.0 seems to solve the issue for my usage.

@Marwes
Collaborator

Marwes commented May 31, 2021

Without a reproduction it is going to be hard to figure out :/

@tdyas
Contributor Author

tdyas commented May 31, 2021

What additional observability would help debug this issue?

@Marwes
Collaborator

Marwes commented May 31, 2021

Some input that triggers this behavior. There should be some specific input causing this.

@tdyas
Contributor Author

tdyas commented May 31, 2021

From the latest backtrace, it looks like decode_eof (or a caller earlier in the call stack) is being reentered. Is that expected?

If not, maybe it would help to add a counter to detect the recursion and then dump the input that caused it?

@Marwes
Collaborator

Marwes commented May 31, 2021

What makes you think it is reentered? That in and of itself is not a problem, as the multiplexed connection may need to tell multiple requests that the connection is at EOF (has shut down).

@tdyas
Contributor Author

tdyas commented May 31, 2021

What makes you think it is reentered?

There are two calls to decode_eof in the latest backtrace: #488 (comment). That would suggest the earlier decode_eof made calls that resulted in decode_eof being invoked again. Maybe I am misreading the backtrace?

@Marwes
Collaborator

Marwes commented May 31, 2021

If not, maybe it would help to add a counter to detect the recursion and then dump the input that caused it?

If you want to try anything it is easy to setup a git dependency with any patches that you may want. We do not need to do an entire release for things like temporary logging.

@Marwes
Collaborator

Marwes commented May 31, 2021

Yeah, I am not really sure how to read that trace :/

@tdyas
Contributor Author

tdyas commented Jun 1, 2021

If you want to try anything it is easy to setup a git dependency with any patches that you may want. We do not need to do an entire release for things like temporary logging.

Right. What would be useful would be any suggestions you may have for where to log (assuming you are more familiar with this code than me).

@Marwes
Collaborator

Marwes commented Jun 1, 2021

I guess logging when decode_eof is entered? And any other function in the reported call stack which you suspect we could end up looping in.

@Marwes
Collaborator

Marwes commented Jun 1, 2021

Are both of you using the ConnectionManager?
Is there nothing logged about this error? (That is, are you sure there are no errors reported for the requests that you are trying to make).

@zenria

zenria commented Jun 2, 2021

Are both of you using the ConnectionManager?

Yes, I'm using ConnectionManager

Is there nothing logged about this error? (That is, are you sure there are no errors reported for the requests that you are trying to make).

No requests return errors prior to this issue. When the CPU consumption occurs, the connection to Redis is no longer usable: any request hangs forever without returning an error.

@enzious

enzious commented Jun 13, 2021

I had two clients with redis-rs go into an infinite parser loop at the same time, so it's being triggered externally somehow. I put some debug statements in, so I'll see where that goes.

@Marwes Marwes added the bug label Jun 15, 2021
@ghost

ghost commented Jun 16, 2021

I believe I have found the cause for this; I'm still trying to figure out where the correct place to apply the fix is, however.

Marwes pushed a commit to Marwes/redis-rs that referenced this issue Jun 16, 2021
The async decoder would return "unexpected end of input" errors forever instead of `None` to indicate that the stream was done. Since the `MultiplexedConnection` would just drop these errors when there were no in-flight requests, it would end up in an infinite loop inside `poll_read` on shutdown.

cc redis-rs#488
Marwes pushed a commit to Marwes/redis-rs that referenced this issue Jun 17, 2021
@Marwes
Collaborator

Marwes commented Jun 17, 2021

Released 0.20.2 with a hopeful fix

@tdyas
Contributor Author

tdyas commented Jun 17, 2021

Released 0.20.2 with a hopeful fix

Awesome! Thank you!!

@zenria

zenria commented Jun 21, 2021

Thank you! I can confirm that 0.20.2 seems to fix the issue (at least for my usage).
