Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread.interrupt() cause message loss and hang for ChannelRpcTimeout(default 10mins) or forever #718

Open
MrLiuzy opened this issue Dec 10, 2021 · 5 comments

Comments

@MrLiuzy
Copy link

MrLiuzy commented Dec 10, 2021

To avoid thread execution for too long, we interrupt the thread after the time limit is reached. Recently, We find that occasionally messages get lost. We use TX mode , and txCommit() got an ChannelContinuationTimeoutException. I found the following problem according to the warning log.

if thread interrupted before rpc(m,k) then rpc will not be sent to the server. And k.getReply(..) will hang for _rpcTimeout or forever.

client version is 5.8.0

com.rabbitmq.client.impl.AMQChannel

private AMQCommand privateRpc(Method m)
        throws IOException, ShutdownSignalException
    {
        SimpleBlockingRpcContinuation k = new SimpleBlockingRpcContinuation(m);
        rpc(m, k);
        // At this point, the request method has been sent, and we
        // should wait for the reply to arrive.
        //
        // Calling getReply() on the continuation puts us to sleep
        // until the connection's reader-thread throws the reply over
        // the fence or the RPC times out (if enabled)
        if(_rpcTimeout == NO_RPC_TIMEOUT) {
            return k.getReply();
        } else {
            try {
                return k.getReply(_rpcTimeout);
            } catch (TimeoutException e) {
                throw wrapTimeoutException(m, e);
            }
        }
    }
@michaelklishin
Copy link
Member

So what is the suggested change to make this method more interrupt-safe?

@acogoluegnes
Copy link
Contributor

I don't remember off the top of my head why the getReply method ignores interruption. We could check before calling getReply if the thread has been interrupted, but this is actually a significant change in the contract, I don't really measure the impacts since this behavior has been there forever and people can do/expect pretty much anything when calling this code.

Since the current contact is currently to ignore interruption, I don't really see why breaking it, especially if the code has a configurable timeout and so a foreseeable behavior.

@MrLiuzy
Copy link
Author

MrLiuzy commented Dec 11, 2021

Hi, michaelklishin and acogoluegnes. Thanks for your replies.

Excuse me, my English is poor. May not be well @@described.

In my case, what I want say is that if we discard a message we can tell k.getReply(..) there is no respone to wait,because we don't send the rpc request to the rabbitmq server. So if a message was be discarded, I think that wake up k.getReply(..) immediately and let k.getReply(..) throw a special exception may be better for users. Or,notify the user that the message was discarded, throught a Listener

I think ChannelContinuationTimeoutException has at least tow kinds of meaning.
One is rpc request was send successfully, and server received, but the reply was not received by the client because of the network. In this case, rpc success actually.
Two is rpc request was be discarded, we had to wait until it timed out.Because there is no response from server. In this case, rpc failed.
@michaelklishin @acogoluegnes

@acogoluegnes
Copy link
Contributor

The semantics here are debatable, but we could try to do an effort if we know the response won't come. The problem is to know that the request has been dropped, and this is more complicated than it sounds.

Could you please provide a test case that reproduces the problem you face?

@michaelklishin Do you know why BlockingRpcContinuation#getReply() is "uninterruptible" (ignores thread interruption)?

@MrLiuzy
Copy link
Author

MrLiuzy commented Dec 15, 2021

@acogoluegnes
Hi acogoluegnes, you can reproduce the problem in the following steps:
① Set a breakpoint at withTx--channel.txCommit();
② Debug publish junit test
③ Wait until the thread suspends at the breakpoint, then input any char except 'q' through the console to interrupt the thread
④ Continue the program from the breakpoint, you will get the problem. And a warn log will print, like 'WARN com.rabbitmq.client.impl.nio.SocketChannelFrameHandlerState - Thread interrupted during enqueuing frame in write queue'

        @Test
	public void publish() throws IOException, TimeoutException, InterruptedException {
		ConnectionFactory factory = createFactory();
		
		final Connection connection = factory.newConnection();
		boolean isTx = true;
		int msgCount = 1;
		BasicProperties properties = new BasicProperties();
		AtomicInteger succCount = new AtomicInteger();
		AtomicInteger failCount = new AtomicInteger();
		int threadCount = 1;
		Thread thread = null;
		for(int n=0;n<threadCount;n++) {
			thread = new Thread(()->{
				try {
					if(isTx) {
						withTx(connection, msgCount, properties, succCount, failCount);
					}else {
						withNonTx(connection, msgCount, properties, succCount, failCount);
					}
				} catch (IOException e) {
				}
			});
			thread.start();
		}
		try(Scanner scanner = new Scanner(System.in);){
			while(!"q".equals(scanner.nextLine())) {
				thread.interrupt();
			}
		}
		
		System.out.println(String.format("success count %s ; fail count%s", succCount.get(), failCount.get()));
	}

	private void withTx(final Connection connection, int msgCount, BasicProperties properties, AtomicInteger succCount, AtomicInteger failCount)
			throws IOException {
		Channel channel = connection.createChannel();
		channel.txSelect();
		byte[] bytes = "".getBytes();
		long beginTime = System.currentTimeMillis();
		for(int i=0;i<msgCount;i++) {
			try {
				channel.basicPublish("", "LzyTest", properties, bytes);
				channel.txCommit();//① set a breakpoint at here
				succCount.incrementAndGet();
			} catch (IOException e) {
				failCount.incrementAndGet();
				e.printStackTrace();
			}
		}
		long times = System.currentTimeMillis() - beginTime;
		System.out.println(String.format("cost: %s ms avg: %s ms", times, times*1.0 / msgCount));
	}
	
	private void withNonTx(final Connection connection, int msgCount, BasicProperties properties, AtomicInteger succCount, AtomicInteger failCount)
			throws IOException {
		Channel channel = connection.openChannel().get();
		byte[] bytes = "".getBytes();
		long beginTime = System.currentTimeMillis();
		for(int i=0;i<msgCount;i++) {
			try {
				channel.basicPublish("", "LzyTest", properties, bytes);
				succCount.incrementAndGet();
			} catch (IOException e) {
				failCount.incrementAndGet();
			}
		}
		long times = System.currentTimeMillis() - beginTime;
		System.out.println(String.format("cost: %s ms avg: %s ms", times, times*1.0 / msgCount));
	}

	private ConnectionFactory createFactory() {
		ConnectionFactory factory = new ConnectionFactory();
		factory.setHost("ip");
		factory.setPort(5672);
		factory.setUsername("username");
		factory.setPassword("password");
		factory.setConnectionTimeout(100000);
		factory.setAutomaticRecoveryEnabled(true);
		factory.setNetworkRecoveryInterval(10 * 60 * 1000);
		factory.setChannelRpcTimeout(30_000);
		factory.useNio();
		factory.setNioParams(new NioParams().setNbIoThreads(4));
		return factory;
	}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants