
sub orchestrator retries #84

Merged

cgillum merged 4 commits into microsoft:main from famarting:sub-orchestrator-retries on Nov 27, 2024

Conversation

@famarting
Contributor

Second part for dapr/go-sdk#541.

Adds support for calling a sub-orchestrator with retries, based on the existing implementation for activities.

@famarting
Contributor Author

ping @cgillum

Signed-off-by: Fabian Martinez <46371672+famarting@users.noreply.github.com>
@famarting
Contributor Author

ping

@cgillum
Member

cgillum commented Nov 20, 2024

@famarting looks like your new tests are not passing. The CI is reporting timeouts.

@famarting
Contributor Author

@cgillum the tests are failing because of this error:

ERROR: 2024/11/20 19:09:57 orchestration-processor: failed to complete work item: orchestration instance already exists

It makes me think that the strategy I followed to implement sub-orchestrator retries is not valid. Do you have any idea or suggestion for how to address this?

First I implemented activity retries using a recursive algorithm that schedules the activity with timers, introducing delays between tries. However, this doesn't seem to work out of the box for sub-orchestrations due to the "already exists" error. Additionally, looking at the Python SDK, it seems retries there are not implemented with a recursive algorithm; instead, extra logic has been added to the event processing to account for delays and retries. Am I forced to follow the same strategy as Python, or what other options do we have?

@cgillum
Member

cgillum commented Nov 22, 2024

My first guess is that the sqlite backend implementation doesn't allow reusing the existing sub-orchestration ID, so it's returning ErrDuplicateInstance from the call to CompleteOrchestrationWorkItem.

Your test might pass if you forgo providing a specific instance ID and allow the sub-orchestration to use a random instance ID. However, if you want the scenario of reusing a user-specific instance ID to work, then we'll need to change sqlite to handle the case where an existing instance already exists such that we delete the old (failed) instance and create the new instance. Note that the Dapr Actors backend might require similar changes.

…tor on retry

Signed-off-by: Fabian Martinez <46371672+famarting@users.noreply.github.com>
Signed-off-by: Fabian Martinez <46371672+famarting@users.noreply.github.com>
@famarting famarting force-pushed the sub-orchestrator-retries branch from 404799e to 7572160 Compare November 26, 2024 09:42
@famarting
Contributor Author

My first guess is that the sqlite backend implementation doesn't allow reusing the existing sub-orchestration ID, so it's returning ErrDuplicateInstance from the call to CompleteOrchestrationWorkItem.

Your test might pass if you forgo providing a specific instance ID and allow the sub-orchestration to use a random instance ID. However, if you want the scenario of reusing a user-specific instance ID to work, then we'll need to change sqlite to handle the case where an existing instance already exists such that we delete the old (failed) instance and create the new instance. Note that the Dapr Actors backend might require similar changes.

ok, it was much simpler than I thought, thank you

@cgillum
Member

cgillum commented Nov 27, 2024

@famarting looks like the new test is still failing due to a mismatch in expected OTel span outputs:

 Error Trace:	/home/runner/work/durabletask-go/durabletask-go/tests/tracing_test.go:172
        	        /home/runner/work/durabletask-go/durabletask-go/tests/tracing_test.go:144
        	        /home/runner/work/durabletask-go/durabletask-go/tests/tracing_test.go:125
        	        /home/runner/work/durabletask-go/durabletask-go/tests/tracing_test.go:30
        	        /home/runner/work/durabletask-go/durabletask-go/tests/orchestrations_test.go:466
 Error:      	[]attribute.KeyValue{attribute.KeyValue{Key:"durabletask.type", Value:attribute.Value{vtype:4, numeric:0x0, stringly:"orchestration", slice:interface {}(nil)}}, attribute.KeyValue{Key:"durabletask.task.name", Value:attribute.Value{vtype:4, numeric:0x0, stringly:"Child", slice:interface {}(nil)}}, attribute.KeyValue{Key:"durabletask.task.instance_id", Value:attribute.Value{vtype:4, numeric:0x0, stringly:"8e5d728a-cce1-4b74-8928-d5df2a648de3_child", slice:interface {}(nil)}}, attribute.KeyValue{Key:"durabletask.runtime_status", Value:attribute.Value{vtype:4, numeric:0x0, stringly:"FAILED", slice:interface {}(nil)}}} does not contain attribute.KeyValue{Key:"durabletask.task.task_id", Value:attribute.Value{vtype:2, numeric:0x0, stringly:"", slice:interface {}(nil)}}
 Test:       	Test_SingleSubOrchestrator_Failed_Retries

It looks like you're attempting to assert a particular task ID for the sub-orchestrations, but there aren't any such task ID attributes getting generated for sub-orchestrations. I don't see similar asserts for task IDs in the other sub-orchestration test (suggesting that they're not expected to exist), so I wonder if you should just remove the task ID asserts you added for the sub-orchestrations.

Signed-off-by: Fabian Martinez <46371672+famarting@users.noreply.github.com>
@famarting
Contributor Author

Followed your feedback, thank you.

Member

@cgillum cgillum left a comment


Looks good! Everything's passing now.

@cgillum cgillum merged commit 6ab94e1 into microsoft:main Nov 27, 2024
JoshVanL referenced this pull request in JoshVanL/durabletask-go Apr 4, 2025
Signed-off-by: Fabian Martinez <46371672+famarting@users.noreply.github.com>