Orchestration ID reuse policies #42

cgillum · 2023-10-28T22:43:37Z

This idea comes from dapr/dapr#7101, as it relates to reusing existing workflow IDs.

When scheduling the creation of a new orchestration, the following options should be available:

TerminateIfExists - Terminates any existing orchestrations with the same instance ID and then schedules a new instance as one atomic action, similar to on-demand ContinueAsNew.
SkipIfExists - If there is an existing orchestration already scheduled, then the scheduler does nothing.
ThrowIfExists - If there is an existing orchestration then the scheduler throws an exception (this is the current behavior).

As an optional extension of this, we can consider exposing an "Execution ID" or "Generation ID" property to the orchestration which can help distinguish different instantiations of the given instance ID.

The text was updated successfully, but these errors were encountered:

clintsinger · 2023-11-08T17:50:19Z

My vote for having the extra ID to be able to identify different runs of the workflow in logs.

olitomlinson · 2023-11-23T22:00:01Z

@cgillum

Would it be reasonable to add a hint of : SKIP_IF_RUNNING_OR_COMPLETE_OR_PAUSED

An example, scenarios where workflows are being started via queue with at-least-once guarantees (which is quite common) it would be useful to have this hint because :

It would prevent a workflow being prematurely terminated & rescheduled if it was currently in a running state.
- Restarting a running workflow may be wasteful and incur unwanted / unanticipated side-effects when repeated from the start.
It would prevent a workflow being rescheduled if it was in a complete state
- Supporting idempotent design - Some business processes should never be repeated once they've ran to completion
It would prevent a workflow being rescheduled if it was in a paused state
- Pausing a workflow would be a deliberate human action. Therefor automatically restarting it may cause undesirable side-effects.
However, it would allow a workflow in a failed state to be rescheduled, which could be desirable.

Unless I'm mistaken, It's my understanding that none of the 3 new reuse policies would accommodate the above scenario?

clintsinger · 2023-11-23T22:33:28Z

The running state is exactly the one I want to interrupt for the scenario that I originally proposed. I agree that once it is complete then rescheduling is just a matter of starting a new instance.

The original definition of TerminateIfExists didn't care if it was just scheduled or already running. The workflow would be terminated. SkipIfExists wouldn't care either way either.

olitomlinson · 2023-11-23T22:46:12Z

Yes this is why I'm proposing an additional policy, so that it doesn't interfere with the needs of your use-case (which I think is TERMINATE_IF_EXISTS)

clintsinger · 2023-11-23T22:47:34Z

So you are suggesting that one combines, for example, TERMINATE_IF_EXISTS and SKIP_IF_RUNNING_OR_COMPLETE?

As a way to differentiate what to do if it is just scheduled vs already running?

clintsinger · 2023-11-23T22:52:58Z

I am assuming this doesn't have much meaning for SKIP_IF_EXISTS and SKIP_IF_RUNNING_OR_COMPLETE since it would already be skipped from the first policy.

olitomlinson · 2023-11-24T14:37:06Z

This is how I interpret the current proposal :

Start = start/restart the orchestration
No-op = leave the orchestration in its current state (do not start/restart)

	Doesn't Exist	Running	Completed	Failed	Paused
`TERMINATE_IF_EXISTS`	Start	Start	Start	Start	Start
`THROW_IF_EXISTS`	Start	No-op	No-op	No-op	No-op
`SKIP_IF_EXISTS`	Start	No-op	No-op	No-op	No-op

Assuming my interpretation above is correct, I would like to see the following policy added SKIP_IF_RUNNING_OR_COMPLETE_OR_PAUSED

	Doesn't Exist	Running	Completed	Failed	Paused
`SKIP_IF_RUNNING_OR_COMPLETE_OR_PAUSED`	Start	No-op	No-op	Start	No-op

kaibocai · 2023-11-28T02:30:56Z

Thanks @olitomlinson for the explanation on SKIP_IF_RUNNING_OR_COMPLETE_OR_PAUSED I think it makes sense to support idempotent design. @cgillum what do you think about this OPTION?

cgillum · 2023-11-28T19:33:41Z

Catching up on this discussion. If I understand the proposal from @olitomlinson, the problem you're trying to solve is that you want to only reuse the orchestration ID if an existing instance is in a failed state. I think that makes sense. Would it just be "failed", or could it also include "Terminated" (and perhaps some future "Canceled" state)?

clintsinger · 2023-11-28T20:43:36Z

Wouldn't it be easier just to say RESTART_IF_FAILED (or RESCHEDULE_IF_FAILED)?

It appears to only be a rule for when the workflow is in a failed state.

olitomlinson · 2023-11-28T20:50:25Z

@cgillum I agree, it should be SKIP_IF_RUNNING_OR_COMPLETE_OR_PAUSED_OR_TERMINATED - I would bucket paused and terminate together as they are both deliberate human actions, which shouldn't be automatically restarted.

Although I think going down this path is going to lead to a trying to support a huge set of highly specific, Policies which isn't sustainable.

Suggestion

Divorce the ACTION (Terminate, Skip, Throw) from the applicable STATE/S, such that a user can dial in the exact requirement.

TERMINATE_IF_EXISTS would be expressed as...

WorkflowOptions { 
    ReusePolicy = new ReusePolicy(action: Action.Terminate, states: new[]{ State.All })
}

SKIP_IF_RUNNING_OR_COMPLETE_OR_PAUSED_OR_TERMINATED would be expressed as...

WorkflowOptions { 
    ReusePolicy  = new ReusePolicy(action: Action.Skip, states: new[]{ State.Running, State.Complete, State.Paused, State.Terminated})
}

clintsinger · 2023-11-28T21:37:51Z

I'd recommend using enum flags for the states field.

cgillum · 2023-11-28T23:42:04Z

I'm in agreement with the more fine-grained policy. It introduces more conceptual complexity, IMO, but it's more flexible, precise, and better aligned with existing APIs in the .NET version of the Durable Task Framework.

olitomlinson · 2023-11-29T10:13:24Z

I'd recommend using enum flags for the states field.

I've never used Enum Flags so no first hand experience, but they look great for this use-case!

kaibocai · 2023-11-30T15:14:40Z

When rethinking this proposal, I found there are some cases that are not quite clear. For ex:

ReusePolicy  = new ReusePolicy(action: Action.Skip, states: new[]{ State.Pending})

For this option, we are creating a new instance with action SKIP and target status Pending, which means it will try to create a new instance, but if it finds there is an instance in Pending status then it will skip creating the new instance. What if we have an instance in Running status, should we skip creating the new instance or should we terminate the running instance and create the new one?

Summarize the confusion in below pic

@clintsinger @cgillum @olitomlinson

olitomlinson · 2023-11-30T17:15:14Z

First thoughts, the first two seem straight forward :

Instance exists -> skip -> status not in target status set = terminate existing instance and create new one
Instance exists -> throw -> status not in target status set = terminate existing instance and create new one

However I'm stuck on this one...

Instance exists -> terminate -> status not in target status set = ?????

clintsinger · 2023-11-30T17:25:18Z

If we think of the second condition as a "when/where" clause then perhaps we can simplify things a bit and just eliminate the "skip" condition since that could be the default for the other two if the predicate isn't matched.

Or some variation on that.

clintsinger · 2023-11-30T17:36:15Z

The "variation on that" could be that we keep the "throw" as the default path as it is today.

cgillum · 2023-11-30T18:19:27Z

Does this pseudocode seem like the right behavior?

ok = not exists

if exists:
  if skip and status_matches:
    # log a warning
    ok = True
  if terminate and status_matches:
    # terminate the instance
    ok = True

if not ok:
   # throw

if not skip:
  # create the instance

kaibocai · 2023-11-30T20:41:28Z

I think it should be

if not exists:
  # create instance
else:
  ok = false
  if skip and status_matches:
    # log a warning
    ok = True
  if terminate and status_matches:
    # terminate the instance
    ok = True
  
  if not ok:
    # throw
  if not skip:
    # create instance

The reason is that when customer create a instance using
ctx.createOrhcestration(instanceID, action.SKIP, [status.Pending, status.Runnning])
I interpret as they still want to create an instance if there is no instance exists.
So that is to say, if no instance exists, the behavior of createOrhcestration should always be creating a new instance no matter what options used here.

Another thing to point out for the above pseudocode (Chris's as well) is that if the customer chooses the action THROW then no matter what statuses they set, it will always Throw if an instance already exists. I just want to make sure this is what we want.

@cgillum

cgillum · 2023-11-30T22:03:53Z

Yeah, it looks like my skip logic wasn't quite right, but I suppose that's what unit tests are for. :)

And I see your point about throw. Since it's the default behavior anyways, using it with status filters is meaningless. Does that suggest we could get by without an explicit throw action, since that's the default behavior anyways? It seems we only need to opt-into skip and terminate.

kaibocai self-assigned this Nov 17, 2023

This was referenced Nov 21, 2023

Support orchestration id reuse policies microsoft/durabletask-protobuf#19

Merged

Support Orchestration ID Reuse Policies #45

Closed

kaibocai mentioned this issue Nov 29, 2023

Support reusing orchestration id #46

Merged

This was referenced Dec 15, 2023

Support orchestration id reuse policy dapr/dapr#7308

Merged

Orchestration ID reuse policies #51

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Orchestration ID reuse policies #42

Orchestration ID reuse policies #42

cgillum commented Oct 28, 2023

clintsinger commented Nov 8, 2023

olitomlinson commented Nov 23, 2023 •

edited

clintsinger commented Nov 23, 2023 •

edited

olitomlinson commented Nov 23, 2023 •

edited

clintsinger commented Nov 23, 2023 •

edited

clintsinger commented Nov 23, 2023

olitomlinson commented Nov 24, 2023 •

edited

kaibocai commented Nov 28, 2023

cgillum commented Nov 28, 2023

clintsinger commented Nov 28, 2023

olitomlinson commented Nov 28, 2023

clintsinger commented Nov 28, 2023

cgillum commented Nov 28, 2023

olitomlinson commented Nov 29, 2023

kaibocai commented Nov 30, 2023 •

edited

olitomlinson commented Nov 30, 2023

clintsinger commented Nov 30, 2023

clintsinger commented Nov 30, 2023

cgillum commented Nov 30, 2023

kaibocai commented Nov 30, 2023

cgillum commented Nov 30, 2023

Orchestration ID reuse policies #42

Orchestration ID reuse policies #42

Comments

cgillum commented Oct 28, 2023

clintsinger commented Nov 8, 2023

olitomlinson commented Nov 23, 2023 • edited

clintsinger commented Nov 23, 2023 • edited

olitomlinson commented Nov 23, 2023 • edited

clintsinger commented Nov 23, 2023 • edited

clintsinger commented Nov 23, 2023

olitomlinson commented Nov 24, 2023 • edited

kaibocai commented Nov 28, 2023

cgillum commented Nov 28, 2023

clintsinger commented Nov 28, 2023

olitomlinson commented Nov 28, 2023

clintsinger commented Nov 28, 2023

cgillum commented Nov 28, 2023

olitomlinson commented Nov 29, 2023

kaibocai commented Nov 30, 2023 • edited

olitomlinson commented Nov 30, 2023

clintsinger commented Nov 30, 2023

clintsinger commented Nov 30, 2023

cgillum commented Nov 30, 2023

kaibocai commented Nov 30, 2023

cgillum commented Nov 30, 2023

olitomlinson commented Nov 23, 2023 •

edited

clintsinger commented Nov 23, 2023 •

edited

olitomlinson commented Nov 23, 2023 •

edited

clintsinger commented Nov 23, 2023 •

edited

olitomlinson commented Nov 24, 2023 •

edited

kaibocai commented Nov 30, 2023 •

edited