Retry policies implementation #11

DeepanshuA · 2023-06-16T03:45:35Z

This PR introduces retry policies.

A retry policy can have the following attributes:

First Retry Interval
Max Number Of Attempts
Backoff Coefficient
Max Retry Interval
Retry Timeout
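The attributes above map naturally onto a small value object. Here is a minimal sketch that assumes a frozen dataclass with snake_case versions of the listed attributes as field names. The class name, the defaults, and the validation rules are illustrative assumptions, not the PR's actual code:

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical container for the retry attributes listed above."""
    first_retry_interval: timedelta
    max_number_of_attempts: int
    backoff_coefficient: float = 1.0
    max_retry_interval: Optional[timedelta] = None  # None: no cap on a single delay
    retry_timeout: Optional[timedelta] = None       # None: no overall deadline

    def __post_init__(self) -> None:
        # Reject settings that could never produce a sensible retry schedule.
        if self.first_retry_interval <= timedelta(0):
            raise ValueError("first_retry_interval must be positive")
        if self.max_number_of_attempts < 1:
            raise ValueError("max_number_of_attempts must be at least 1")
        if self.backoff_coefficient < 1.0:
            raise ValueError("backoff_coefficient must be >= 1.0")
        if (self.max_retry_interval is not None
                and self.max_retry_interval < self.first_retry_interval):
            raise ValueError("max_retry_interval must be >= first_retry_interval")
```

For example, `RetryPolicy(first_retry_interval=timedelta(seconds=1), max_number_of_attempts=3, retry_timeout=timedelta(seconds=50))` describes up to three attempts within a 50-second window.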

Once this PR is merged, docs for retry policies will also need to be added, along with a write-up of the design approach taken.

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>


cgillum · 2023-06-21T23:20:50Z

durabletask/worker.py

+        self.call_activity_util(id, activity, input, retry_policy)
+        return self._pending_tasks.get(id, task.CompletableTask())
+
+    def call_activity_util(self, id: Optional[int],


Let's call this call_activity_helper. "Util" seems a little unnatural in this context.

Done. I merged both helper methods, and the call now goes through call_activity_function_helper.

cgillum · 2023-06-21T23:22:07Z

durabletask/worker.py

+            if act_task is None:
+                encoded_input = shared.to_json(input) if input else None
+            else:
+                encoded_input = input.__str__()


Why do we do __str__() here? Can you add a comment explaining this?

Added comment:

# Here, we don't need to convert the input to JSON because it is already converted.
# We just need to take the string representation of it.

But isn't it already a str in these cases?

input here is of type TInput@call_activity_function_helper, and new_schedule_task_action below expects encoded_input to be a str.
I have removed __str__() and now cast the value to str instead.
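The exchange above turns on one detail. A retried call reuses a payload that was already JSON-encoded on the first attempt, so encoding it again would wrap it in a second layer of quotes. Here is a minimal sketch of that distinction. `encode_activity_input` and its `is_retry` flag are hypothetical names, not the PR's API:

```python
import json
from typing import Any, Optional


def encode_activity_input(value: Any, is_retry: bool) -> Optional[str]:
    """Hypothetical helper: return the string payload to schedule.

    First attempt: serialize the caller's input to JSON.
    Retry: the value is the payload captured from the original scheduled
    action, which is already a JSON string, so only ensure it is a str.
    """
    if value is None:
        return None
    if not is_retry:
        return json.dumps(value)
    # Already encoded; re-encoding would turn '"Tokyo"' into '"\\"Tokyo\\""'.
    return str(value)
```

This sketch tests `value is None` rather than truthiness, so falsy inputs such as `0` or `""` are still encoded on the first attempt. That is a design choice of the sketch, not necessarily the PR's behavior.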


cgillum · 2023-06-21T23:25:02Z

durabletask/worker.py

+                                        retry_policy=retry_policy)
+        return self._pending_tasks.get(id, task.CompletableTask())
+
+    def call_sub_orchestrator_util(self, id: Optional[int],


Prefer call_sub_orchestrator_helper.

Done. I merged both helper methods, and this now calls call_activity_function_helper.

cgillum · 2023-06-21T23:28:45Z

durabletask/worker.py

+                if timer_task._retryable_parent is not None and timer_task._retryable_type is not None:
+                    activity_action = timer_task._retryable_parent._action
+
+                    if timer_task._retryable_type == "activity":


I'm wondering if there's a way to simplify the design so that we don't need the _retryable_type field. It feels a bit fragile. I don't have any specific suggestions right now, but I'd like us to consider it.

I have reworked the logic to be more standardized, using an enum that is set as part of RetryableTask.

Modified to use a boolean instead of an enum.
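With a boolean in place of the `_retryable_type` string, the retry-timer handler only has to choose between two re-scheduling paths. Here is a minimal sketch. It assumes the retrying task keeps a reference to the action it originally scheduled, as `timer_task._retryable_parent._action` suggests. `ScheduledAction`, `FakeContext`, and `on_retry_timer_fired` are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScheduledAction:
    """Hypothetical stand-in for the action recorded on the first attempt."""
    name: str
    encoded_input: Optional[str]


@dataclass
class RetryableTask:
    action: ScheduledAction
    is_sub_orch: bool  # True for sub-orchestrations, False for activities


@dataclass
class FakeContext:
    """Collects re-scheduled work so the dispatch can be inspected."""
    scheduled: List[str] = field(default_factory=list)

    def schedule_activity(self, action: ScheduledAction) -> None:
        self.scheduled.append(f"activity:{action.name}")

    def schedule_sub_orchestrator(self, action: ScheduledAction) -> None:
        self.scheduled.append(f"sub_orch:{action.name}")


def on_retry_timer_fired(ctx: FakeContext, parent: Optional[RetryableTask]) -> None:
    """Re-issue the parent's original action when its retry timer fires.

    A single boolean replaces a string tag such as "activity", so there is
    no set of magic strings to keep in sync.
    """
    if parent is None:
        return  # an ordinary timer, not a retry delay
    if parent.is_sub_orch:
        ctx.schedule_sub_orchestrator(parent.action)
    else:
        ctx.schedule_activity(parent.action)
```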


cgillum

One thing I'm concerned about with this PR is whether it correctly handles cases where retry policies are mixed with when_all and when_any use cases. Can we add some test cases that cover these scenarios?


cgillum · 2023-07-17T20:44:02Z

durabletask/worker.py

+            if act_task is None:
+                encoded_input = shared.to_json(input) if input else None
+            else:
+                encoded_input = input.__str__()


But isn't it already a str in these cases?


DeepanshuA · 2023-08-16T16:03:50Z

One thing I'm concerned about with this PR is whether it correctly handles cases where retry policies are mixed with when_all and when_any use cases. Can we add some test cases that cover these scenarios?

I have now added tests for when_any and when_all with retryable tasks.
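The when_all/when_any tests need to pin down one property. A retrying child must not be reported to its composite until it finally succeeds or exhausts its attempts. The toy model below shows that behavior. Every class in it is hypothetical, and when_all is assumed to fail as soon as any child fails for good:

```python
from typing import List, Optional


class SimpleTask:
    """Hypothetical minimal task that notifies its parent composite when done."""

    def __init__(self) -> None:
        self.is_complete = False
        self.failed = False
        self.result: object = None
        self.parent: Optional["CompositeTask"] = None

    def complete(self, result: object) -> None:
        self.is_complete, self.result = True, result
        if self.parent is not None:
            self.parent.on_child_completed(self)

    def fail(self, message: str) -> None:
        self.is_complete, self.failed, self.result = True, True, message
        if self.parent is not None:
            self.parent.on_child_completed(self)


class RetryableTask(SimpleTask):
    """Stays incomplete while retries remain; fails only after the last attempt."""

    def __init__(self, max_number_of_attempts: int) -> None:
        super().__init__()
        self.max_number_of_attempts = max_number_of_attempts
        self.attempt_count = 1

    def on_attempt_failed(self, message: str) -> bool:
        """Return True if another attempt should be scheduled."""
        if self.attempt_count < self.max_number_of_attempts:
            self.attempt_count += 1
            return True  # the composite is not notified; it keeps waiting
        self.fail(message)
        return False


class CompositeTask(SimpleTask):
    """Toy when_all (is_any=False) or when_any (is_any=True)."""

    def __init__(self, children: List[SimpleTask], is_any: bool) -> None:
        super().__init__()
        self.children, self.is_any = children, is_any
        for child in children:
            child.parent = self

    def on_child_completed(self, child: SimpleTask) -> None:
        if self.is_complete:
            return
        if self.is_any:
            self.complete(child)  # when_any: the first finished child wins
        elif child.failed:
            self.fail(str(child.result))  # when_all: a final failure fails the whole
        elif all(c.is_complete for c in self.children):
            self.complete([c.result for c in self.children])
```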


cgillum · 2023-09-22T04:14:33Z

durabletask/task.py

+        self._is_sub_orch = is_sub_orch
+
+    def try_completion(self) -> bool:
+        if self._retry_count >= self._retry_policy.max_number_of_attempts - 1:


Instead of tracking the number of retries, would it be more intuitive to track the number of attempts since we're comparing to max_number_of_attempts?

How is the number of attempts different from the number of retries? The number of retries is just the number of attempts minus 1.
Or is this a naming suggestion?

I have updated it to be attempt_count and updated the logic accordingly.
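Counting attempts rather than retries lets the limit check compare like with like. The first execution is attempt 1, so retries = attempts - 1. That makes `retry_count >= max_number_of_attempts - 1` equivalent to `attempt_count >= max_number_of_attempts`. Here is a minimal sketch with a hypothetical `AttemptCounter` class:

```python
class AttemptCounter:
    """Hypothetical sketch: track attempts, not retries, against the attempt limit."""

    def __init__(self, max_number_of_attempts: int) -> None:
        self.max_number_of_attempts = max_number_of_attempts
        self.attempt_count = 1  # the initial execution is attempt #1

    @property
    def retry_count(self) -> int:
        # Every attempt after the first is a retry.
        return self.attempt_count - 1

    def can_retry(self) -> bool:
        # Same condition as: not (retry_count >= max_number_of_attempts - 1)
        return self.attempt_count < self.max_number_of_attempts

    def record_failed_attempt(self) -> bool:
        """Return True if another attempt is allowed (and count it)."""
        if not self.can_retry():
            return False
        self.attempt_count += 1
        return True
```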


DeepanshuA · 2023-10-19T14:46:49Z

@cgillum Please review this PR; it has been open for quite a long time.
I have incorporated the review comments.

cgillum

Changes are looking good. Just a few fairly minor things.

cgillum · 2023-10-19T16:19:08Z

durabletask/worker.py

+                        next_delay = sub_orch_task.compute_next_delay()
+                        if next_delay == None:
+                            sub_orch_task.fail(
+                                f"Sub-orchestration task #{task_id} failed: {failedEvent.failureDetails.errorMessage}",


One potential improvement we could add here is including the number of retries in this error message.

This message is emitted only when the task finally fails, after either (1) all retries have been exhausted or (2) the retry timeout has elapsed.
It is not emitted on each retry.

I could add logging for each attempt, but that would require new state to track which attempt is in progress, whether the retry has timed out, and so on.
Do you think that is required for each retry/attempt?
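Both terminal cases described above can be folded into a single `None` return from the delay computation, which matches how the `compute_next_delay()` call in the diff is used. Here is a minimal sketch under two assumptions. First, it uses conventional exponential backoff (`first_retry_interval * backoff_coefficient ** (attempt_count - 1)`, capped at `max_retry_interval`). Second, the timeout is measured from the first attempt. The function and its parameters are illustrative, not the PR's implementation:

```python
from datetime import datetime, timedelta
from typing import Optional


def compute_next_delay(
    attempt_count: int,
    first_attempt_time: datetime,
    now: datetime,
    first_retry_interval: timedelta,
    max_number_of_attempts: int,
    backoff_coefficient: float = 1.0,
    max_retry_interval: Optional[timedelta] = None,
    retry_timeout: Optional[timedelta] = None,
) -> Optional[timedelta]:
    """Hypothetical sketch: delay before the next attempt, or None to stop."""
    if attempt_count >= max_number_of_attempts:
        return None  # case 1: all attempts exhausted
    # Exponential backoff based on how many attempts have already run.
    delay = first_retry_interval * (backoff_coefficient ** (attempt_count - 1))
    if max_retry_interval is not None and delay > max_retry_interval:
        delay = max_retry_interval
    if retry_timeout is not None and now + delay > first_attempt_time + retry_timeout:
        return None  # case 2: the next attempt would start after the timeout
    return delay
```

A caller would then check `if next_delay is None:` and fail the task. That check is also the natural place to include the attempt count in the error message.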

I'm not suggesting we log on each retry (though we could consider this too). I'm simply suggesting that the error message could include the number of retries that were attempted. For example,

Suggested change:

- f"Sub-orchestration task #{task_id} failed: {failedEvent.failureDetails.errorMessage}",
+ f"Sub-orchestration task #{task_id} failed after {attempt_count} attempt(s): {failedEvent.failureDetails.errorMessage}",


cgillum · 2023-10-19T16:26:06Z

tests/test_orchestration_executor.py

+                                            retry_timeout=timedelta(seconds=50)),
+                               input="Tokyo")
+        t2 = ctx.call_activity(dummy_activity, input="Seattle")
+        winner = yield task.when_all([t1, t2])


This orchestration logic doesn't make sense. The result of when_all is not a "winner". Rather, it's a list containing the results of both t1 and t2. This means that your if/else check below will always execute the else block no matter what. Please rewrite this orchestration so that it correctly uses the task.when_all API.

Yes, I updated it to use the list of results returned by when_all.
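The review point here is that a generator-based orchestrator receives a list of results from `yield task.when_all([t1, t2])`, one per task, not a single winner. The toy version below is self-contained. `say_hello`, `WhenAll`, and the `run` driver are hypothetical stand-ins for the real activity, `task.when_all`, and the worker:

```python
from typing import Any, Callable, Generator, List, Tuple

Call = Tuple[Callable[[str], str], str]  # (activity function, input)


def say_hello(city: str) -> str:
    return f"Hello, {city}!"


class WhenAll:
    """Hypothetical stand-in for task.when_all: a batch of pending calls."""

    def __init__(self, calls: List[Call]) -> None:
        self.calls = calls


def orchestrator() -> Generator[WhenAll, List[str], List[str]]:
    t1: Call = (say_hello, "Tokyo")
    t2: Call = (say_hello, "Seattle")
    # The value sent back for a WhenAll is a list with one result per task.
    results = yield WhenAll([t1, t2])
    return results


def run(gen: Generator[WhenAll, List[str], Any]) -> Any:
    """Toy driver: runs each yielded batch and sends its results back in order."""
    try:
        batch = next(gen)
        while True:
            batch = gen.send([fn(arg) for fn, arg in batch.calls])
    except StopIteration as stop:
        return stop.value
```

Any per-task branching should index into or iterate over `results` instead of comparing it to a single expected value.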


cgillum and others added 5 commits May 26, 2023 05:17

Starter code for retry policies

f336612

Add retry policy impl

43c8cc1

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

Add retryable type to timer

65e833e

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

Remove first_attempts global variable

13bde88

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

Fix test

d11b6a5

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

cgillum reviewed Jun 21, 2023


DeepanshuA added 5 commits July 11, 2023 17:58

Incorporate review comments AND enum type for task type

d3cbeb5

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

Remove unused

cb6064d

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

Use boolean instead of enum

a697ffa

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

UT, E2E

84f2907

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

Update comment

84afd30

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

DeepanshuA requested a review from cgillum July 14, 2023 14:21

Count sub orch run only when it is not replayed - E2E

cad6e6d

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

cgillum reviewed Jul 17, 2023


Address review comments and add UTs for when_any, when_all with retry…

ee38eb7

…able activity Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

DeepanshuA requested a review from cgillum August 16, 2023 16:03

cgillum reviewed Sep 22, 2023


DeepanshuA added 3 commits October 17, 2023 23:10

Merge branch 'main' of github.com:microsoft/durabletask-python into r…

7c24672

…etry-policies-impl

Incorporate review comments

41569d9

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

remove unrequired def

8e78369

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

DeepanshuA requested a review from cgillum October 17, 2023 20:51

cgillum reviewed Oct 19, 2023


Incorporate review comments in test

192aa29

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

DeepanshuA requested a review from cgillum October 20, 2023 14:46

cgillum approved these changes Oct 23, 2023


Update CHANGELOG.md

8bf61cb

cgillum merged commit 80de6c2 into microsoft:main Oct 26, 2023

shubham1172 mentioned this pull request Oct 30, 2023

[Workflow] Adding metrics for Dapr Workflow dapr/dapr#7109

Closed


DeepanshuA edited a comment · Oct 12, 2023
