feat: expose retry count #524

Merged · 8 commits · Jun 7, 2024

Changes from all commits
3 changes: 3 additions & 0 deletions api-contracts/dispatcher/dispatcher.proto
@@ -100,6 +100,9 @@ message AssignedAction {

// the step name
string stepName = 12;

// the count of the retry attempt
int32 retryCount = 13;
}

message WorkerListenRequest {
44 changes: 44 additions & 0 deletions frontend/docs/pages/home/features/retries/simple.mdx
@@ -1,3 +1,5 @@
import { Callout, Card, Cards, Steps, Tabs } from "nextra/components";

# Retry Strategies in Hatchet: Simple Step Retry

Hatchet provides a simple and effective way to handle failures in your workflow steps using the step-level retry configuration. This feature allows you to specify the number of times a step should be retried if it fails, helping to improve the reliability and resilience of your workflows.
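For orientation, declaring a retry budget is a one-line change on the step definition. A minimal Python sketch: the `@hatchet.step(retries=...)` decorator is the same one used in the examples below, while the surrounding workflow class and the `order:created` event name are illustrative:

```python
from hatchet_sdk import Hatchet, Context

hatchet = Hatchet()

@hatchet.workflow(on_events=["order:created"])
class OrderWorkflow:
    # If this step raises, Hatchet re-runs it up to 3 times
    # before marking the step (and the workflow run) as failed.
    @hatchet.step(retries=3)
    def process(self, context: Context):
        return {"status": "processed"}
```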
@@ -39,6 +41,48 @@ It's important to note that step-level retries are not suitable for all types of

Additionally, if a step interacts with external services or databases, you should ensure that the operation is idempotent (i.e., can be safely repeated without changing the result) before enabling retries. Otherwise, retrying the step could lead to unintended side effects or inconsistencies in your data.
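For example, passing a stable idempotency key makes a retried external call safe to repeat. A hedged Python sketch: the payments endpoint is hypothetical, and `context.workflow_input()` is assumed to expose the step's input payload:

```python
import requests

@hatchet.step(retries=3)
def charge_customer(self, context: Context):
    order = context.workflow_input()
    # Reuse the same key on every attempt so the provider
    # deduplicates repeated charges for this order.
    resp = requests.post(
        "https://payments.example.com/charges",  # hypothetical endpoint
        json={"order_id": order["order_id"], "amount": order["amount"]},
        headers={"Idempotency-Key": f"charge-{order['order_id']}"},
    )
    resp.raise_for_status()  # a 5xx here raises, which triggers a retry
    return {"charge_id": resp.json()["id"]}
```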

## Accessing the Retry Count in a Step

If you need to access the current retry count within a step, you can use the retry count accessor on the step context (`retry_count` in Python, `retryCount` in TypeScript, `RetryCount` in Go):

<Tabs items={['Python', 'Typescript', 'Go']}>
<Tabs.Tab>

```python
@hatchet.step(timeout='2s', retries=3)
def step1(self, context: Context):
retry_count = context.retry_count()
print(f"Retry count: {retry_count}")
raise Exception("Step failed")
```

</Tabs.Tab>
<Tabs.Tab>

```typescript
async function step(context: Context) {
const retryCount = context.retryCount();
console.log(`Retry count: ${retryCount}`);
throw new Error("Step failed");
}
```

</Tabs.Tab>
<Tabs.Tab>

```go
func(ctx worker.HatchetContext) (result *stepOneOutput, err error) {
count := ctx.RetryCount()

return &stepOneOutput{
Message: "Count is: " + strconv.Itoa(count),
}, nil
}
```

</Tabs.Tab>
</Tabs>
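Beyond logging, the count can gate attempt-specific behavior, for example falling back to a secondary provider once the first attempt has failed. A minimal Python sketch, assuming the count is 0 on the initial attempt; `send_primary` and `send_fallback` are hypothetical helpers:

```python
@hatchet.step(retries=3)
def notify(self, context: Context):
    if context.retry_count() == 0:
        # First attempt: use the primary channel.
        send_primary(context.workflow_input())   # hypothetical helper
    else:
        # Retries: fall back to a secondary channel.
        send_fallback(context.workflow_input())  # hypothetical helper
    return {"notified": True}
```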

## Conclusion

Hatchet's step-level retry feature is a simple and effective way to handle transient failures in your workflow steps, improving the reliability and resilience of your workflows. By specifying the number of retries for each step, you can ensure that your workflows can recover from temporary issues without requiring complex error handling logic.
15 changes: 13 additions & 2 deletions internal/services/dispatcher/contracts/dispatcher.pb.go

Some generated files are not rendered by default.

2 changes: 2 additions & 0 deletions internal/services/dispatcher/server.go
@@ -62,6 +62,7 @@ func (worker *subscribedWorker) StartStepRun(
ActionPayload: string(inputBytes),
StepName: stepName,
WorkflowRunId: sqlchelpers.UUIDToStr(stepRun.WorkflowRunId),
RetryCount: stepRun.StepRun.RetryCount,
})
}

@@ -105,6 +106,7 @@ func (worker *subscribedWorker) CancelStepRun(
ActionType: contracts.ActionType_CANCEL_STEP_RUN,
StepName: stepRun.StepReadableId.String,
WorkflowRunId: sqlchelpers.UUIDToStr(stepRun.WorkflowRunId),
RetryCount: stepRun.StepRun.RetryCount,
})
}

4 changes: 4 additions & 0 deletions pkg/client/dispatcher.go
@@ -91,6 +91,9 @@ type Action struct {

// the action type
ActionType ActionType

// the count of the retry attempt
RetryCount int32
}

type WorkerActionListener interface {
@@ -332,6 +335,7 @@ func (a *actionListenerImpl) Actions(ctx context.Context) (<-chan *Action, error)
ActionId: assignedAction.ActionId,
ActionType: actionType,
ActionPayload: []byte(unquoted),
RetryCount: assignedAction.RetryCount,
}
}
}()
6 changes: 6 additions & 0 deletions pkg/worker/context.go
@@ -41,6 +41,8 @@ type HatchetContext interface {

RefreshTimeout(incrementTimeoutBy string) error

RetryCount() int

client() client.Client

action() *client.Action
@@ -195,6 +197,10 @@ func (h *hatchetContext) StreamEvent(message []byte) {
}
}

func (h *hatchetContext) RetryCount() int {
return int(h.a.RetryCount)
}

func (h *hatchetContext) index() int {
return h.i
}
4 changes: 4 additions & 0 deletions pkg/worker/middleware_test.go
@@ -64,6 +64,10 @@ func (c *testHatchetContext) StreamEvent(message []byte) {
panic("not implemented")
}

func (c *testHatchetContext) RetryCount() int {
panic("not implemented")
}

func (c *testHatchetContext) action() *client.Action {
panic("not implemented")
}