fix: harden retry logic - typed error checking and injectable RetryConfig#66
fix: harden retry logic - typed error checking and injectable RetryConfig#66
Conversation
|
No actionable comments were generated in the recent review. 🎉 📝 WalkthroughWalkthroughRefactors Gemini integration to add configurable retry fields to Embedder and LLMClient, initializes them with DefaultRetryConfig, and replaces string-based retry detection with typed error inspection for REST (googleapi.Error) and gRPC (status codes). Tests updated to use structured errors. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Gemini as LLMClient/Embedder
participant Retry as withRetry
participant External as Google API / gRPC
Client->>Gemini: Request (embed/generate)
Gemini->>Retry: invoke with retryConfig
Retry->>External: call API
alt success
External-->>Retry: response
Retry-->>Gemini: return result
Gemini-->>Client: final response
else retryable error (429/5xx or gRPC codes)
External-->>Retry: retryable error
Retry-->>Retry: backoff + jitter (respect context)
Retry->>External: retry call
else non-retryable error
External-->>Retry: non-retryable error
Retry-->>Gemini: propagate error
Gemini-->>Client: error
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
Replace fragile strings.Contains matching with errors.As(*googleapi.Error) for HTTP status codes and gRPC status.FromError for RPC codes. This prevents false positives (e.g. "Internal" matching unrelated messages) and makes retry logic correct by construction. Update retry_test.go to use real *googleapi.Error and gRPC status instances instead of plain errors.New strings, so tests exercise the actual code path. Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
Store retryConfig as a field on both structs, set to DefaultRetryConfig() in their constructors. generateWithRetry and Embed now use the struct field instead of calling DefaultRetryConfig() inline, allowing tests to inject fast configs (e.g. 1ms delays) without waiting real wall-clock time. Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
9a23f0f to
b4a71a6
Compare
Simili Triage ReportNote Quality Score: 9.2/10 (Excellent) Classification
Quality Improvements
Similar Threads
Generated by Simili Bot |
Simili Triage ReportNote Quality Score: 9.5/10 (Excellent) Classification
Quality Improvements
Similar Threads
Generated by Simili Bot |
There was a problem hiding this comment.
Pull request overview
This PR hardens Gemini retry behavior by switching transient-error detection from string matching to typed error inspection and by moving retry configuration onto client structs so it can be overridden.
Changes:
- Updated
isRetryableErrorto useerrors.As(*googleapi.Error)for HTTP codes and gRPC status codes for RPC errors. - Added
retryConfigfields toLLMClientandEmbedder, initialized fromDefaultRetryConfig(), and used bygenerateWithRetry/Embed. - Updated retry unit tests to use typed
*googleapi.Errorandstatus.New(...).Err()errors.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| internal/integrations/gemini/retry.go | Replaces string matching with typed REST/gRPC error classification for retryability. |
| internal/integrations/gemini/retry_test.go | Updates tests to exercise typed error paths (googleapi + gRPC status). |
| internal/integrations/gemini/llm.go | Stores retry config on LLMClient and uses it in generateWithRetry. |
| internal/integrations/gemini/embedder.go | Stores retry config on Embedder and uses it in Embed. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // gRPC transport: generative-ai-go can return gRPC status errors. | ||
| if st, ok := status.FromError(err); ok { | ||
| switch st.Code() { | ||
| case codes.ResourceExhausted, codes.Unavailable, codes.Internal: | ||
| return true | ||
| } | ||
| } |
There was a problem hiding this comment.
status.FromError(err) does not unwrap %w-wrapped errors, so retryable gRPC status errors can be missed once they’ve been wrapped upstream (e.g., fmt.Errorf("...: %w", err) in Embed). Consider using errors.As with an interface like interface{ GRPCStatus() *status.Status } (or iteratively unwrapping) so gRPC codes are detected through wrappers.
| // NewLLMClient creates a new Gemini LLM client. | ||
| func NewLLMClient(apiKey, model string) (*LLMClient, error) { | ||
| ctx := context.Background() | ||
| client, err := genai.NewClient(ctx, option.WithAPIKey(apiKey)) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to create Gemini client: %w", err) | ||
| } | ||
|
|
||
| if model == "" { | ||
| model = "gemini-2.5-flash" // Fast and cost-effective | ||
| } | ||
|
|
||
| return &LLMClient{ | ||
| client: client, | ||
| model: model, | ||
| client: client, | ||
| model: model, | ||
| retryConfig: DefaultRetryConfig(), | ||
| }, nil | ||
| } |
There was a problem hiding this comment.
The PR description says RetryConfig is injectable, but retryConfig is unexported and NewLLMClient doesn’t provide any way for callers outside the gemini package to set it. To make this truly injectable, consider adding a constructor overload/option (e.g., NewLLMClientWithRetryConfig or functional options) or an exported setter.
| @@ -33,8 +34,9 @@ func NewEmbedder(apiKey, model string) (*Embedder, error) { | |||
| } | |||
|
|
|||
| return &Embedder{ | |||
| client: client, | |||
| model: model, | |||
| client: client, | |||
| model: model, | |||
| retryConfig: DefaultRetryConfig(), | |||
| }, nil | |||
| } | |||
There was a problem hiding this comment.
The PR description says RetryConfig is injectable, but retryConfig is unexported and NewEmbedder doesn’t provide any way for callers outside the gemini package to set it. Consider adding a constructor overload/option or an exported setter so integration tests (or other packages) can inject a fast retry config without relying on package-internal access.
| {"nil error", nil, false}, | ||
| {"rate limit 429", errors.New("googleapi: Error 429: Resource exhausted"), true}, | ||
| {"ResourceExhausted gRPC", errors.New("rpc error: code = ResourceExhausted"), true}, | ||
| {"server error 500", errors.New("googleapi: Error 500: Internal Server Error"), true}, | ||
| {"bad gateway 502", errors.New("HTTP 502: Bad Gateway"), true}, | ||
| {"unavailable 503", errors.New("googleapi: Error 503: Service Unavailable"), true}, | ||
| {"gateway timeout 504", errors.New("HTTP 504: Gateway Timeout"), true}, | ||
| {"Unavailable keyword", errors.New("rpc error: code = Unavailable"), true}, | ||
| {"Internal keyword", errors.New("rpc error: code = Internal"), true}, | ||
| {"client error 400", errors.New("googleapi: Error 400: Bad Request"), false}, | ||
| {"forbidden 403", errors.New("googleapi: Error 403: Forbidden"), false}, | ||
| {"not found 404", errors.New("googleapi: Error 404: Not Found"), false}, | ||
| {"rate limit 429", &googleapi.Error{Code: 429, Message: "Resource exhausted"}, true}, | ||
| {"ResourceExhausted gRPC", status.New(codes.ResourceExhausted, "resource exhausted").Err(), true}, | ||
| {"server error 500", &googleapi.Error{Code: 500, Message: "Internal Server Error"}, true}, | ||
| {"bad gateway 502", &googleapi.Error{Code: 502, Message: "Bad Gateway"}, true}, | ||
| {"unavailable 503", &googleapi.Error{Code: 503, Message: "Service Unavailable"}, true}, | ||
| {"gateway timeout 504", &googleapi.Error{Code: 504, Message: "Gateway Timeout"}, true}, | ||
| {"Unavailable gRPC", status.New(codes.Unavailable, "service unavailable").Err(), true}, | ||
| {"Internal gRPC", status.New(codes.Internal, "internal error").Err(), true}, | ||
| {"client error 400", &googleapi.Error{Code: 400, Message: "Bad Request"}, false}, | ||
| {"forbidden 403", &googleapi.Error{Code: 403, Message: "Forbidden"}, false}, | ||
| {"not found 404", &googleapi.Error{Code: 404, Message: "Not Found"}, false}, |
There was a problem hiding this comment.
There’s no test case that covers a retryable gRPC status error being wrapped (e.g., fmt.Errorf("wrap: %w", status.New(...).Err())). Since production code may wrap errors, adding such a case would ensure isRetryableError continues to work end-to-end with wrapped gRPC errors.
There was a problem hiding this comment.
🧹 Nitpick comments (3)
internal/integrations/gemini/llm.go (1)
105-122: Consider exposing a way to customize RetryConfig for external callers.The
retryConfigfield is unexported, which allows same-package tests to modify it directly. However, if external integrations need custom retry behavior (e.g., shorter delays for integration tests), they currently have no API to do so.Consider adding either:
- A functional option pattern:
NewLLMClient(apiKey, model, WithRetryConfig(cfg))- A setter method:
SetRetryConfig(cfg RetryConfig)This is optional if external customization isn't a current requirement.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/integrations/gemini/llm.go` around lines 105 - 122, The NewLLMClient currently hardcodes retryConfig via DefaultRetryConfig and does not expose a way for callers to customize it; add a public API to set custom retry behavior by either implementing a functional option (e.g., add a WithRetryConfig(retryConfig RetryConfig) option consumed by NewLLMClient(...opts ...LLMOption)) or by adding a setter on LLMClient (e.g., SetRetryConfig(cfg RetryConfig)); ensure the implementation updates the LLMClient.retryConfig field and that NewLLMClient still defaults to DefaultRetryConfig when no option or setter is used so existing behavior remains unchanged.internal/integrations/gemini/retry.go (1)
53-67: Consider addingcodes.DeadlineExceededto retryable gRPC codes.The current implementation handles
ResourceExhausted,Unavailable, andInternal. However,DeadlineExceededis also a transient error that can occur during temporary network slowdowns and is commonly retried.💡 Optional: Add DeadlineExceeded to retryable codes
if st, ok := status.FromError(err); ok { switch st.Code() { - case codes.ResourceExhausted, codes.Unavailable, codes.Internal: + case codes.ResourceExhausted, codes.Unavailable, codes.Internal, codes.DeadlineExceeded: return true } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/integrations/gemini/retry.go` around lines 53 - 67, The retry logic currently treats gRPC status codes ResourceExhausted, Unavailable, and Internal as retryable; add codes.DeadlineExceeded to the switch in the status.FromError(err) block so DeadlineExceeded is also considered transient and returns true (update the switch that checks st.Code() alongside cases codes.ResourceExhausted, codes.Unavailable, codes.Internal to include codes.DeadlineExceeded).internal/integrations/gemini/retry_test.go (1)
19-49: Consider adding test cases for wrapped errors.The current tests cover direct
googleapi.Errorand gRPC status errors, buterrors.Asis designed to unwrap nested errors. Adding a test case for wrapped errors would confirm thatisRetryableErrorcorrectly handles errors wrapped viafmt.Errorf("...: %w", err).💡 Optional: Add wrapped error test case
{"wrapped 429", fmt.Errorf("API call failed: %w", &googleapi.Error{Code: 429, Message: "Rate limited"}), true}, {"wrapped gRPC", fmt.Errorf("RPC failed: %w", status.New(codes.Unavailable, "unavailable").Err()), true},🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/integrations/gemini/retry_test.go` around lines 19 - 49, Test suite TestIsRetryableError lacks cases where retryable errors are wrapped; update the table of tests in TestIsRetryableError to include wrapped variants using fmt.Errorf with %w that wrap a googleapi.Error (e.g. wrapped 429) and a gRPC status error (e.g. wrapped Unavailable) so that isRetryableError's use of errors.As is exercised for nested errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/integrations/gemini/llm.go`:
- Around line 105-122: The NewLLMClient currently hardcodes retryConfig via
DefaultRetryConfig and does not expose a way for callers to customize it; add a
public API to set custom retry behavior by either implementing a functional
option (e.g., add a WithRetryConfig(retryConfig RetryConfig) option consumed by
NewLLMClient(...opts ...LLMOption)) or by adding a setter on LLMClient (e.g.,
SetRetryConfig(cfg RetryConfig)); ensure the implementation updates the
LLMClient.retryConfig field and that NewLLMClient still defaults to
DefaultRetryConfig when no option or setter is used so existing behavior remains
unchanged.
In `@internal/integrations/gemini/retry_test.go`:
- Around line 19-49: Test suite TestIsRetryableError lacks cases where retryable
errors are wrapped; update the table of tests in TestIsRetryableError to include
wrapped variants using fmt.Errorf with %w that wrap a googleapi.Error (e.g.
wrapped 429) and a gRPC status error (e.g. wrapped Unavailable) so that
isRetryableError's use of errors.As is exercised for nested errors.
In `@internal/integrations/gemini/retry.go`:
- Around line 53-67: The retry logic currently treats gRPC status codes
ResourceExhausted, Unavailable, and Internal as retryable; add
codes.DeadlineExceeded to the switch in the status.FromError(err) block so
DeadlineExceeded is also considered transient and returns true (update the
switch that checks st.Code() alongside cases codes.ResourceExhausted,
codes.Unavailable, codes.Internal to include codes.DeadlineExceeded).
status.FromError does not unwrap %w-wrapped errors, so retryable gRPC status errors (e.g. ResourceExhausted) could be missed when wrapped by an upstream fmt.Errorf call. Switch to errors.As with the GRPCStatus() interface, which correctly traverses the error chain. Add two new test cases: wrapped retryable (ResourceExhausted) and wrapped non-retryable (NotFound) to cover this code path. Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
Simili Triage ReportNote Quality Score: 9.5/10 (Excellent) Classification
Quality Improvements
Similar Threads
Generated by Simili Bot |
Summary
Follow-up to #63 addressing two open items flagged during review.
isRetryableError: replacesstrings.Containsmatching witherrors.As(*googleapi.Error)for HTTP status codes andstatus.FromError+ gRPC codes for RPC errors. Eliminates false positives (e.g. the word"Internal"accidentally matching unrelated error messages).RetryConfigonLLMClientandEmbedder: both structs now hold aretryConfigfield initialised toDefaultRetryConfig()in their constructors.generateWithRetryandEmbeduse the struct field instead of callingDefaultRetryConfig()inline, allowing integration-level tests to inject fast configs without waiting on real wall-clock delays.retry_test.go: all test cases now construct real*googleapi.Errorandstatus.New(codes.X).Err()typed errors so the tests exercise the actual code path end-to-end.Test plan
go test ./internal/integrations/gemini/...)go build ./...)isRetryableErrorcorrectly retries HTTP 429/5xx and gRPC ResourceExhausted/Unavailable/InternalisRetryableErrordoes not retry HTTP 400/403/404 or generic errorsLLMClientandEmbeddercan be constructed with a customRetryConfigfor fast test executionSummary by CodeRabbit