feat: expose retry count (#524)
* feat: expose retry count

* feat: expose retry count go

* docs: accessing retry count

* fix: import

* fix: tests

* fix: docs formatting

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
Co-authored-by: abelanger5 <belanger@sas.upenn.edu>
3 people committed Jun 7, 2024
1 parent f4be542 commit e09ee98
Showing 7 changed files with 76 additions and 2 deletions.
3 changes: 3 additions & 0 deletions api-contracts/dispatcher/dispatcher.proto
@@ -100,6 +100,9 @@ message AssignedAction {

  // the step name
  string stepName = 12;

  // the count number of the retry attempt
  int32 retryCount = 13;
}

message WorkerListenRequest {
44 changes: 44 additions & 0 deletions frontend/docs/pages/home/features/retries/simple.mdx
@@ -1,3 +1,5 @@
import { Callout, Card, Cards, Steps, Tabs } from "nextra/components";

# Retry Strategies in Hatchet: Simple Step Retry

Hatchet provides a simple and effective way to handle failures in your workflow steps using the step-level retry configuration. This feature allows you to specify the number of times a step should be retried if it fails, helping to improve the reliability and resilience of your workflows.
@@ -39,6 +41,48 @@ It's important to note that step-level retries are not suitable for all types of

Additionally, if a step interacts with external services or databases, you should ensure that the operation is idempotent (i.e., can be safely repeated without changing the result) before enabling retries. Otherwise, retrying the step could lead to unintended side effects or inconsistencies in your data.

## Accessing the Retry Count in a Step

If you need to access the current retry count within a step, call the retry count method on the step context (`retry_count()` in Python, `retryCount()` in Typescript, `RetryCount()` in Go):

<Tabs items={['Python', 'Typescript', 'Go']}>
<Tabs.Tab>

```python
@hatchet.step(timeout='2s', retries=3)
def step1(self, context: Context):
    retry_count = context.retry_count()
    print(f"Retry count: {retry_count}")
    raise Exception("Step failed")
```

</Tabs.Tab>
<Tabs.Tab>

```typescript
async function step(context: Context) {
  const retryCount = context.retryCount();
  console.log(`Retry count: ${retryCount}`);
  throw new Error("Step failed");
}
```

</Tabs.Tab>
<Tabs.Tab>

```go
func(ctx worker.HatchetContext) (result *stepOneOutput, err error) {
	count := ctx.RetryCount()

	return &stepOneOutput{
		Message: "Count is: " + strconv.Itoa(count),
	}, nil
}
```

</Tabs.Tab>
</Tabs>
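
One use for the retry count is to detect the last allowed attempt and take a fallback path instead of failing again. The following is a minimal sketch, not part of this commit. `retryCounter`, `fixedCount`, and `isFinalAttempt` are hypothetical names introduced for illustration, and the sketch assumes the count is 0 on the first attempt and increases by one on each retry, which the text above does not state.

```go
package main

import "fmt"

// retryCounter is a hypothetical one-method interface. Any context that
// exposes RetryCount() int (as worker.HatchetContext does) would satisfy it.
type retryCounter interface {
	RetryCount() int
}

// fixedCount is a stand-in context used only for this illustration.
type fixedCount int

func (f fixedCount) RetryCount() int { return int(f) }

// isFinalAttempt reports whether the current run is the last one allowed.
// ASSUMPTION: the count is 0 on the first attempt and grows by 1 per retry.
func isFinalAttempt(ctx retryCounter, maxRetries int) bool {
	return ctx.RetryCount() >= maxRetries
}

func main() {
	// Walk through the first attempt plus three retries.
	for attempt := 0; attempt <= 3; attempt++ {
		final := isFinalAttempt(fixedCount(attempt), 3)
		fmt.Printf("retry count %d, final attempt: %v\n", attempt, final)
	}
}
```

Accepting a small interface rather than the full context keeps the helper easy to exercise without a running worker.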

## Conclusion

Hatchet's step-level retry feature is a simple and effective way to handle transient failures in your workflow steps, improving the reliability and resilience of your workflows. By specifying the number of retries for each step, you can ensure that your workflows can recover from temporary issues without requiring complex error handling logic.
15 changes: 13 additions & 2 deletions internal/services/dispatcher/contracts/dispatcher.pb.go

Generated file; diff not rendered.

2 changes: 2 additions & 0 deletions internal/services/dispatcher/server.go
@@ -62,6 +62,7 @@ func (worker *subscribedWorker) StartStepRun(
		ActionPayload: string(inputBytes),
		StepName:      stepName,
		WorkflowRunId: sqlchelpers.UUIDToStr(stepRun.WorkflowRunId),
		RetryCount:    stepRun.StepRun.RetryCount,
	})
}

@@ -105,6 +106,7 @@ func (worker *subscribedWorker) CancelStepRun(
		ActionType:    contracts.ActionType_CANCEL_STEP_RUN,
		StepName:      stepRun.StepReadableId.String,
		WorkflowRunId: sqlchelpers.UUIDToStr(stepRun.WorkflowRunId),
		RetryCount:    stepRun.StepRun.RetryCount,
	})
}

4 changes: 4 additions & 0 deletions pkg/client/dispatcher.go
@@ -91,6 +91,9 @@ type Action struct {

	// the action type
	ActionType ActionType

	// the count of the retry attempt
	RetryCount int32
}

type WorkerActionListener interface {
@@ -332,6 +335,7 @@ func (a *actionListenerImpl) Actions(ctx context.Context) (<-chan *Action, error
				ActionId:      assignedAction.ActionId,
				ActionType:    actionType,
				ActionPayload: []byte(unquoted),
				RetryCount:    assignedAction.RetryCount,
			}
		}
	}()
6 changes: 6 additions & 0 deletions pkg/worker/context.go
@@ -41,6 +41,8 @@ type HatchetContext interface {

	RefreshTimeout(incrementTimeoutBy string) error

	RetryCount() int

	client() client.Client

	action() *client.Action
@@ -195,6 +197,10 @@ func (h *hatchetContext) StreamEvent(message []byte) {
	}
}

func (h *hatchetContext) RetryCount() int {
	return int(h.a.RetryCount)
}

func (h *hatchetContext) index() int {
	return h.i
}
4 changes: 4 additions & 0 deletions pkg/worker/middleware_test.go
@@ -64,6 +64,10 @@ func (c *testHatchetContext) StreamEvent(message []byte) {
	panic("not implemented")
}

func (c *testHatchetContext) RetryCount() int {
	panic("not implemented")
}

func (c *testHatchetContext) action() *client.Action {
	panic("not implemented")
}
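
The test context above panics if `RetryCount()` is ever called, which is fine as long as no test exercises retry-aware code. As a minimal sketch of an alternative (not part of this commit), a test double can return a configurable count. `stubContext` and `describeAttempt` are hypothetical names, and treating a count of 0 as the first attempt is an assumption.

```go
package main

import "fmt"

// stubContext is a hypothetical test double. It is not the testHatchetContext
// from middleware_test.go, whose methods panic with "not implemented".
type stubContext struct {
	retries int
}

// RetryCount mirrors the method signature added to HatchetContext.
func (s *stubContext) RetryCount() int { return s.retries }

// describeAttempt is an illustrative function under test.
// ASSUMPTION: a retry count of 0 means the first attempt.
func describeAttempt(ctx interface{ RetryCount() int }) string {
	if ctx.RetryCount() == 0 {
		return "first attempt"
	}
	return fmt.Sprintf("retry %d", ctx.RetryCount())
}

func main() {
	fmt.Println(describeAttempt(&stubContext{retries: 0}))
	fmt.Println(describeAttempt(&stubContext{retries: 2}))
}
```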