Infinite loop due to thread safety of MpscUnboundedArrayQueue.poll #947

Closed
xiazuojie opened this issue Oct 23, 2020 · 7 comments · Fixed by #990
Labels
bug, superseded (Issue is superseded by another)

Comments

@xiazuojie
Contributor

xiazuojie commented Oct 23, 2020

As its name (Multi Producer Single Consumer) indicates, MpscUnboundedArrayQueue is not thread-safe when polling. If multiple threads poll from it concurrently, they may end up in an infinite loop here:

                do
                {
                    e = lvElement(buffer, offset);
                }
                while (e == null);
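
To make the failure mode concrete, below is a deliberately simplified model of how I read the race (an illustrative sketch, not the real JCTools code): both consumers can read the same consumer index, the winner takes the element and clears the slot, and the loser keeps re-reading a slot that will never become non-null again, because the producer index still claims an element is there. The outcome is non-deterministic per run.

import java.util.concurrent.atomic.AtomicReferenceArray;

// Simplified single-slot model of the race; NOT the JCTools implementation.
public class TwoConsumerSpinDemo {
  static final AtomicReferenceArray<String> buffer = new AtomicReferenceArray<>(8);
  static volatile long producerIndex = 1; // one element has been published at slot 0
  static long consumerIndex = 0;          // plain field: the queue assumes ONE consumer

  public static void main(String[] args) throws InterruptedException {
    buffer.set(0, "element");

    Runnable consumer = () -> {
      long idx = consumerIndex;          // both threads can observe 0 here
      String e;
      do {
        e = buffer.get((int) idx);       // the loser re-reads a slot the winner nulled
      } while (e == null && idx != producerIndex);
      buffer.set((int) idx, null);       // the winner clears the slot...
      consumerIndex = idx + 1;           // ...and advances the index
      System.out.println(Thread.currentThread().getName() + " polled " + e);
    };

    Thread a = new Thread(consumer, "consumer-a");
    Thread b = new Thread(consumer, "consumer-b");
    a.setDaemon(true);
    b.setDaemon(true);
    a.start();
    b.start();
    a.join(1000);
    b.join(1000);
    // Depending on the interleaving, one thread can stay RUNNABLE inside the spin
    // loop (the infinite loop above), or both can "poll" the same element.
    System.out.println("still spinning: a=" + a.isAlive() + ", b=" + b.isAlive());
  }
}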

UnboundedProcessor internally uses MpscUnboundedArrayQueue. When a connection needs to be terminated (because of missing keep-alive acks or for other reasons), more than one thread will try to poll from its MpscUnboundedArrayQueue.

As shown below:

  • "reactor-tcp-nio-7" is the "normal" working thread polling from the queue.
  • "parallel-7" is the thread responsible for terminating a connection due to "no keep-alive acks". It tries to poll all elements from the queue in order to release their memory.
  public void clear() {
    while (!queue.isEmpty()) {
      T t = queue.poll();
      if (t != null) {
        release(t);
      }
    }
  }

"reactor-tcp-nio-7@16412" daemon prio=5 tid=0x250 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at io.rsocket.internal.jctools.queues.BaseMpscLinkedArrayQueue.poll(BaseMpscLinkedArrayQueue.java:256)
      at io.rsocket.internal.UnboundedProcessor.poll(UnboundedProcessor.java:330)
      at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.poll(FluxMapFuseable.java:174)
      at reactor.netty.channel.MonoSendMany$SendManyInner.run(MonoSendMany.java:264)
      at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
      at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
      at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
      at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      at java.lang.Thread.run(Thread.java:745)

"parallel-7@16290" daemon prio=5 tid=0x258 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at io.rsocket.internal.jctools.queues.BaseMpscLinkedArrayQueue.poll(BaseMpscLinkedArrayQueue.java:256)
      at io.rsocket.internal.UnboundedProcessor.clear(UnboundedProcessor.java:346)
      at io.rsocket.internal.UnboundedProcessor.cancel(UnboundedProcessor.java:317)
      at io.rsocket.internal.UnboundedProcessor.dispose(UnboundedProcessor.java:364)
      at io.rsocket.RSocketRequester.terminate(RSocketRequester.java:594)
      at io.rsocket.RSocketRequester.tryTerminate(RSocketRequester.java:559)
      at io.rsocket.RSocketRequester.tryTerminateOnKeepAlive(RSocketRequester.java:541)
      at io.rsocket.RSocketRequester$$Lambda$904.1238285436.accept(Unknown Source:-1)
      at io.rsocket.keepalive.KeepAliveSupport.tryTimeout(KeepAliveSupport.java:110)
      at io.rsocket.keepalive.KeepAliveSupport$ClientKeepAliveSupport.onIntervalTick(KeepAliveSupport.java:146)
      at io.rsocket.keepalive.KeepAliveSupport.lambda$start$0(KeepAliveSupport.java:54)
      at io.rsocket.keepalive.KeepAliveSupport$$Lambda$906.2130835366.accept(Unknown Source:-1)
      at reactor.core.publisher.LambdaSubscriber.onNext(LambdaSubscriber.java:160)
      at reactor.core.publisher.FluxInterval$IntervalRunnable.run(FluxInterval.java:123)
      at reactor.core.scheduler.PeriodicWorkerTask.call(PeriodicWorkerTask.java:59)
      at reactor.core.scheduler.PeriodicWorkerTask.run(PeriodicWorkerTask.java:73)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

Expected Behavior

Thread-safe. No infinite loop.

Actual Behavior

Threads sometimes end up in an infinite loop when terminating a connection. This has happened three times in production for me.

Steps to Reproduce

A demo of the infinite loop when multiple threads try to poll from MpscUnboundedArrayQueue concurrently:

import io.rsocket.internal.jctools.queues.MpscUnboundedArrayQueue
import org.junit.Test
import reactor.util.concurrent.Queues

class MpscUnboundedArrayQueueTest {
    @Test
    fun test() {
        val a = MpscUnboundedArrayQueue<String>(Queues.SMALL_BUFFER_SIZE)

        for (x in 0 until 4) {
            val t = Thread(
                Runnable {
                    while (true) {
                        a.offer(java.time.LocalTime.now().toString())
                        Thread.sleep(10)
                    }
                }, "p$x"
            )
            t.isDaemon = true
            t.start()
        }
        for (x in 0 until 4) {
            Thread(
                Runnable {
                    while (true) {
                        println(java.time.LocalTime.now().toString() + ": " + Thread.currentThread().toString() + ": " + a.poll())
                    }
                }, "c$x"
            ).start()
        }
        while (true) {
            Thread.sleep(10000)
        }
    }
}

Possible Solution

Replace MpscUnboundedArrayQueue with an MPMC (Multi Producer Multi Consumer) queue.
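
An alternative sketch (one possible approach, not necessarily how the maintainers will fix it; the class and method names below are made up for illustration): keep the MPSC queue, but never poll it from the disposing thread. dispose() only sets a flag, and the draining/releasing is left to the single consumer thread, so the single-consumer contract is preserved.

import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;
import io.rsocket.internal.jctools.queues.MpscUnboundedArrayQueue;

// Hypothetical sketch: dispose() hands the drain off to the one legitimate consumer
// instead of polling the queue from a second thread.
final class SingleConsumerBuffer<T> {
  private final Queue<T> queue = new MpscUnboundedArrayQueue<>(256);
  private final AtomicBoolean cancelled = new AtomicBoolean();

  boolean offer(T t) {
    return queue.offer(t);
  }

  // Called only from the consumer (event-loop) thread.
  T poll() {
    if (cancelled.get()) {
      drainAndRelease();
      return null;
    }
    return queue.poll();
  }

  // Safe to call from any thread (e.g. the keep-alive scheduler):
  // it never touches the queue itself.
  void dispose() {
    cancelled.set(true);
  }

  // Runs on the consumer thread only, so the single-consumer contract holds.
  private void drainAndRelease() {
    T t;
    while ((t = queue.poll()) != null) {
      release(t);
    }
  }

  private void release(T t) {
    // release buffers / reference-counted resources here
  }
}

The obvious trade-off is that queued elements are only released the next time the consumer actually runs, so a real implementation would still need a hand-off for the case where the consumer never polls again.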

Your Environment

  • RSocket version(s) used: 1.0.0-RC6
  • Other relevant library versions (e.g. netty, ...):
  • Platform (e.g. JVM version (java -version) or Node version (node --version)): Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
  • OS and version (e.g. uname -a): Darwin MacBook-Pro.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64
@OlegDokuka OlegDokuka added the bug label Oct 23, 2020
@OlegDokuka OlegDokuka modified the milestone: 1.1.0 Oct 23, 2020
@OlegDokuka OlegDokuka removed the bug label Oct 23, 2020
@OlegDokuka
Member

OlegDokuka commented Oct 23, 2020

@xiazuojie I just noticed that you reported this issue against 1.0.0-RC6. Can you please check whether it is still an issue on 1.0.2 and 1.1.0-SNAPSHOT? At least 1.1.0 should not have this problem anymore.

@xiazuojie
Contributor Author

xiazuojie commented Oct 23, 2020

@xiazuojie I just noticed that you reported this issue against 1.0.0-RC6. Can you please check whether it is still an issue on 1.0.2 and 1.1.0-SNAPSHOT?

Judging from the code, it's still an issue on the latest master branch. I cannot reproduce it reliably due to the complexity of multi-threading problems.

@OlegDokuka
Member

OlegDokuka commented Oct 23, 2020

I mean we still use a single-consumer queue, but in the latest implementation (the successor to the MpscUnboundedQueue-based one), it should not be possible to run into that case where you dispose an UnboundedProcessor and poll elements at the same time.

@xiazuojie
Contributor Author

I mean we still use a single-consumer queue, but in the latest implementation (the successor to the MpscUnboundedQueue-based one), it should not be possible to run into that case where you dispose an UnboundedProcessor and poll elements at the same time.

Let me give it a check. I'll report back later

@OlegDokuka
Member

Although the mentioned test reproduces the issue, it does not represent how we use the MpscQueue inside RSocket. Indeed, we had an issue where dispose() (which causes queue.clear()) raced with queue.poll(), but that should be addressed in 1.1 for sure.

@OlegDokuka
Member

@xiazuojie do you have any updates on whether that issue reproduces with rsocket 1.1.0?

@OlegDokuka
Member

@xiazuojie should be fixed in 1.0.4! Let me know if it works for you as well

@OlegDokuka OlegDokuka added the superseded label and removed the needs information label Feb 26, 2021
@OlegDokuka OlegDokuka removed this from the 1.0.4 milestone Feb 26, 2021