Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BUSY_TRY to EmitFailureHandler? #2883

Closed
He-Pin opened this issue Jan 13, 2022 · 8 comments · Fixed by #2943
Closed

Add BUSY_TRY to EmitFailureHandler? #2883

He-Pin opened this issue Jan 13, 2022 · 8 comments · Fixed by #2943
Labels
for/user-attention This issue needs user attention (feedback, rework, etc...) good first issue Ideal for a new contributor, we'll help help wanted We need contributions on this type/enhancement A general enhancement
Milestone

Comments

@He-Pin
Copy link
Contributor

He-Pin commented Jan 13, 2022

aside with FAIL_FAST.

@simonbasle simonbasle added the ❓need-triage This issue needs triage, hasn't been looked at by a team member yet label Jan 13, 2022
@simonbasle
Copy link
Member

Hi @hepin1989, would you like to either specify the behavior a bit more (I'm assuming you've already coded an adhoc EmitFailureHandler) or even submit a PR ?

I would make that a static factory method rather than a singleton, in order to have some state to avoid infinite looping (either by having a loop counter and a max number of iterations, or a nanoseconds deadline that is checked against the current System.nanoTime() on every x iteration - possibly every single iteration).

@simonbasle simonbasle added for/user-attention This issue needs user attention (feedback, rework, etc...) good first issue Ideal for a new contributor, we'll help help wanted We need contributions on this type/enhancement A general enhancement and removed ❓need-triage This issue needs triage, hasn't been looked at by a team member yet labels Jan 24, 2022
@simonbasle simonbasle added this to the 3.4.x Backlog milestone Jan 24, 2022
@Animesh27
Copy link
Contributor

@simonbasle. I would like to take this issue.

@simonbasle
Copy link
Member

@simonbasle. I would like to take this issue.

nice ! Please start from the 3.4.x branch, as main is currently focused on the upcoming 3.5 version

@Animesh27
Copy link
Contributor

@simonbasle Can you please elaborate the issue ?.

@simonbasle
Copy link
Member

Background and EmitResult to consider

The EmitFailureHandler interface implements a check for the benefit of Sinks emit* methods (as opposed to tryEmit* style of API). These methods internally delegate to their tryEmit* API counterparts, and they have an opinionated way of reacting to the various EmitResult codes it can return to represent a failure.

But first, they invoke the EmitFailureHandler to see if the operation should be retried. The only off-the-shelves implementation simply says "no" to that question (hence failing fast).

Another common approach is to optimistically retry in case of failure, in a loop, assuming the type of failure is transient. This issue is about representing that strategy as a pre-implemented EmitFailureHandler.

Note that some EmitResult should not be considered (return false):

  • any EmitResult that returns true to isSuccess (should never be passed to the handler in the handler anyway)
  • FAIL_CANCELLED: the sink has been cancelled, there is no point in retrying the operation (terminal state)
  • FAIL_COMPLETED: the sink has been closed by a terminal signal (terminal state)

Typically, two states are of particular interest:

  • FAIL_OVERFLOW: not enough downstream request. this is least likely to be transient, so most probably we don't want to consider that one either
  • FAIL_NON_SERIALIZED: this is I think the focus of this issue, denoting competing use of the sink. most likely to be lifted once the other competing thread has finished its operation (optimistically).

Implementing some "busy looping"

In effect, returning true in the EmitFailureHandler has the effect of looping. If we don't implement any thread sleeping or parking, it is busy looping. Returning false exits the loop.

So what we want is to have at least FAIL_NON_SERIALIZED immediately loop again, up to a deadline.

For the deadline, providing a Duration at construction time of the EmitFailureHandler should be good to give it some state dedicated to the given operation. So the handler instance would compute and store a deadline = System.nanoTime() + duration.toNanos() at construction. It would then check if that deadline is passed when given an EmitResult.FAIL_NON_SERIALIZED.

Note I'm using nanoseconds above, but at a minimum there will be construction of the handler, passing to emit method + actual emit attempt before the first check, which will already take tens or hundreds of milliseconds anyway. Not much sense to provide a Duration under 100ms in that sense.
That said, using wall-clock time (System.currentTimeMillis()) is not great in terms of reliability/precision, so we still prefer System.nanoTime().

Here is what a typical usage could look like:

import reactor.core.publisher.Sinks.EmitFailureHandler
//racing operation 1
sink.emitNext("example", EmitFailureHandler.busyLoop(Duration.ofSeconds(2)));
//racing operation 2
sink.emitComplete(EmitFailureHandler.busyLoop(Duration.ofMillis(1900)));

Caveats to document

The caveat with this approach is that busyLoop(...) EmitFailureHandler instances cannot be reused (1 emit* operation, 1 handler).

The time resolution note from above could also probably be hinted at in the javadoc.

simonbasle added a commit that referenced this issue Mar 4, 2022
This commit adds OptimisticEmitFailureHandler, a flavor of handler that
retries on FAIL_NON_SERIALIZED codes up to a deadline (configured via a
timeout Duration provided at construction of each instance).

This is publicly exposed as EmitFailureHandler.busyLoop factory method.

Fixes #2883.

Co-authored-by: Simon Baslé <sbasle@vmware.com>
@ivyazmitinov
Copy link
Contributor

Hello, everyone!
Could you, please confirm that the EmitFailureHandler#busyLooping is a public stable API and can be used in production? I had to google pretty thoroughly to discover this functionality and it is not documented in the reference docs

@simonbasle
Copy link
Member

@ivyazmitinov yes, this is a public method of a public class, which means it is supported. You need to understand the caveats, and if they don't match your need it is not terribly hard to write your own. See also our deprecation policy

@ivyazmitinov
Copy link
Contributor

@simonbasle, great, thanks!
I'll create a PR with documentation for it, if you don't mind

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
for/user-attention This issue needs user attention (feedback, rework, etc...) good first issue Ideal for a new contributor, we'll help help wanted We need contributions on this type/enhancement A general enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants