-
Notifications
You must be signed in to change notification settings - Fork 910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix another NullPointerException
in FixedStreamMessage
#4667
Conversation
a `NullPointerException` caused due to a race condition between `collect()` and `abort() was fixed in line#4652. Howerver, we got another reoprt from Slack community. https://line-armeria.slack.com/archives/C1NGPBUH2/p1675994120153789 ``` 2023-02-09T02:08:55,526 [armeria-common-worker-epoll-3-3] WARN com.linecorp.armeria.internal.common.stream.FixedStreamMessage - Subscriber.onError() should not raise an exception. subscriber: null com.linecorp.armeria.common.util.CompositeException: 2 exceptions occurred. at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.onError0(FixedStreamMessage.java:247) ~[armeria-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.onError(FixedStreamMessage.java:237) ~[armeria-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.abort1(FixedStreamMessage.java:342) ~[armeria-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.abort0(FixedStreamMessage.java:328) ~[armeria-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.abort(FixedStreamMessage.java:308) ~[armeria-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.internal.common.stream.OneElementFixedStreamMessage.abort(OneElementFixedStreamMessage.java:112) ~[armeria-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.server.grpc.AbstractServerCall.closeListener(AbstractServerCall.java:287) ~[armeria-grpc-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.server.grpc.AbstractServerCall.closeListener(AbstractServerCall.java:264) ~[armeria-grpc-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.server.grpc.AbstractServerCall.doClose(AbstractServerCall.java:239) ~[armeria-grpc-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.server.grpc.AbstractServerCall.close(AbstractServerCall.java:222) ~[armeria-grpc-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.server.grpc.AbstractServerCall.close(AbstractServerCall.java:217) ~[armeria-grpc-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.server.grpc.FramedGrpcService.lambda$startCall$4(FramedGrpcService.java:318) ~[armeria-grpc-1.21.1-SNAPSHOT.jar:?] at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) ~[?:?] at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) ~[?:?] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?] at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?] at com.linecorp.armeria.common.util.UnmodifiableFuture.doComplete(UnmodifiableFuture.java:164) ~[armeria-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.internal.common.CancellationScheduler$CancellationFuture.doComplete(CancellationScheduler.java:521) ~[armeria-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.internal.common.CancellationScheduler.invokeTask(CancellationScheduler.java:477) ~[armeria-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.internal.common.CancellationScheduler.finishNow0(CancellationScheduler.java:322) ~[armeria-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.internal.common.CancellationScheduler.finishNow(CancellationScheduler.java:306) ~[armeria-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.internal.server.DefaultServiceRequestContext.cancel(DefaultServiceRequestContext.java:327) ~[armeria-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.server.StreamingDecodedHttpRequest.abortResponse(StreamingDecodedHttpRequest.java:181) ~[armeria-1.21.1-SNAPSHOT.jar:?] at com.linecorp.armeria.server.Http2RequestDecoder.onRstStreamRead(Http2RequestDecoder.java:356) ~[armeria-1.21.1-SNAPSHOT.jar:?] at io.netty.handler.codec.http2.Http2FrameListenerDecorator.onRstStreamRead(Http2FrameListenerDecorator.java:59) ~[netty-codec-http2-4.1.86.Final.jar:4.1.86.Final] at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:442) ~[netty-codec-http2-4.1.86.Final.jar:4.1.86.Final] at io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80) ~[netty-codec-http2-4.1.86.Final.jar:4.1.86.Final] at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:509) ~[netty-codec-http2-4.1.86.Final.jar:4.1.86.Final] ... Caused by: com.linecorp.armeria.common.util.CompositeException$ExceptionOverview: Multiple exceptions (2) |-- java.lang.NullPointerException: Cannot invoke "org.reactivestreams.Subscriber.onError(java.lang.Throwable)" because "this.subscriber" is null at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.onError0(FixedStreamMessage.java:242) at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.onError(FixedStreamMessage.java:237) at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.abort1(FixedStreamMessage.java:342) at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.abort0(FixedStreamMessage.java:328) at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.abort(FixedStreamMessage.java:308) at com.linecorp.armeria.internal.common.stream.OneElementFixedStreamMessage.abort(OneElementFixedStreamMessage.java:112) at com.linecorp.armeria.server.grpc.AbstractServerCall.closeListener(AbstractServerCall.java:287) at com.linecorp.armeria.server.grpc.AbstractServerCall.closeListener(AbstractServerCall.java:264) at com.linecorp.armeria.server.grpc.AbstractServerCall.doClose(AbstractServerCall.java:239) at com.linecorp.armeria.server.grpc.AbstractServerCall.close(AbstractServerCall.java:222) at com.linecorp.armeria.server.grpc.AbstractServerCall.close(AbstractServerCall.java:217) at com.linecorp.armeria.server.grpc.FramedGrpcService.lambda$startCall$4(FramedGrpcService.java:318) ... ``` Modifications: - Check if a stream is aborted while `subscribe0()` or `collect()` is the pending queue of an event executor. - If it is aborted, abort the subscriber or the collection future. - Check if a stream is subscribed while `abort1()` is in the pending queue of an event executor. - If it is subscribed, delegate the subscribe0() to signal abortCause via onError(). - Test possible race conditions by switching the execution order of in an event executor. Result: You no longer see a `NullPointerException` when a stream is aborted.
NullPointerException
on FixedStreamMessage
NullPointerException
in FixedStreamMessage
Thank you @ikhoon . I have run some tests from my side using this branch and have not been able to reproduce this issue anymore. It looks like it is working for me. |
It's a relief. Thanks for testing this fix. 🙇♂️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a super nit comment. Thanks for the fix @ikhoon 👍 🙇 🚀
@@ -336,10 +359,19 @@ private void abort1(Throwable cause, boolean subscribed) { | |||
if (completed) { | |||
return; | |||
} | |||
completed = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't have to be handled in this PR:
nit; I think it would be easier to reason about if this path was only called from the event loop.
I understand that functionally there is no difference when subscribed = false
since completed
isn't used and the targets to clean up is null anyways.
Having said this, I'm wondering if it's just easier to set completionFuture
when the executor is set during abortion:
if (executorUpdater.compareAndSet(this, null, ImmediateEventExecutor.INSTANCE)) {
// No subscription was made.
completionFuture.completeExceptionally(finalCause);
} else {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having said this, I'm wondering if it's just easier to set completionFuture when the executor is set during abortion:
Did you think of inlining the code? For example:
if (executorUpdater.compareAndSet(this, null, ImmediateEventExecutor.INSTANCE)) {
cleanupObjects(cause);
completionFuture.completeExceptionally(cause);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you think of inlining the code? For example:
I missed that we need to still call cleanupObjects
, but yes I think this is what I thought
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setting completed
to true could be useful to prevent double abortion although the result without completed
is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prevent double abortion
I see, I missed this intention. I think it's fine to leave as-is then 👍
if (executor.inEventLoop()) { | ||
if (executor == ImmediateEventExecutor.INSTANCE) { | ||
// Double abortion | ||
abort1(finalCause, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we also consider the case that a user subscribes with ImmediateEventExecutor.INSTANCE
?
How about preventing double abortion by setting the abortCause
only once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for fixing this! 😄
A
NullPointerException
caused due to a race condition betweencollect()
andabort()
was fixed in #4652. Howerver, we got another reoprt from Slack community. https://line-armeria.slack.com/archives/C1NGPBUH2/p1675994120153789Modifications:
subscribe0()
orcollect()
is the pending queue of an event executor.abort1()
is in the pending queue of an event executor.Result:
You no longer see a
NullPointerException
when a stream is aborted.