fix: harden retry logic - typed error checking and injectable RetryConfig by Kavirubc · Pull Request #66 · similigh/simili-bot

Kavirubc · 2026-02-17T04:30:06Z

Summary

Follow-up to #63 addressing two open items flagged during review.

- Typed error checking in isRetryableError: replaces strings.Contains matching with errors.As(*googleapi.Error) for HTTP status codes and status.FromError + gRPC codes for RPC errors. Eliminates false positives (e.g. the word "Internal" accidentally matching unrelated error messages).
- Injectable RetryConfig on LLMClient and Embedder: both structs now hold a retryConfig field initialised to DefaultRetryConfig() in their constructors. generateWithRetry and Embed use the struct field instead of calling DefaultRetryConfig() inline, allowing integration-level tests to inject fast configs without waiting on real wall-clock delays.
- Updated retry_test.go: all test cases now construct real *googleapi.Error and status.New(codes.X).Err() typed errors so the tests exercise the actual code path end-to-end.
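
To make the false-positive problem concrete, here is a minimal, stdlib-only sketch contrasting the two approaches. It is not the code from this PR: `apiError` is a hypothetical stand-in for `*googleapi.Error`, and `retryableByString` and `retryableByType` are illustrative names chosen for this example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// apiError is a hypothetical stand-in for a typed HTTP error such as
// *googleapi.Error; only the status code matters for this sketch.
type apiError struct {
	Code    int
	Message string
}

func (e *apiError) Error() string {
	return fmt.Sprintf("api: Error %d: %s", e.Code, e.Message)
}

// retryableByString mimics the old approach: substring matching on the text.
func retryableByString(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, "429") || strings.Contains(msg, "Internal")
}

// retryableByType inspects the typed error instead of its message.
// errors.As walks the %w chain, so wrapped errors are still classified.
func retryableByType(err error) bool {
	var apiErr *apiError
	if errors.As(err, &apiErr) {
		return apiErr.Code == 429 || (apiErr.Code >= 500 && apiErr.Code < 600)
	}
	return false
}

// badRequest is a non-retryable 400 whose message happens to contain "Internal".
func badRequest() error {
	return &apiError{Code: 400, Message: "Internal field name is invalid"}
}

// wrappedRateLimit is a retryable 429 hidden behind a %w wrapper.
func wrappedRateLimit() error {
	return fmt.Errorf("embed failed: %w", &apiError{Code: 429, Message: "Resource exhausted"})
}

func main() {
	fmt.Println(retryableByString(badRequest()), retryableByType(badRequest()))
	fmt.Println(retryableByString(wrappedRateLimit()), retryableByType(wrappedRateLimit()))
}
```

In this sketch the string matcher classifies the 400 as retryable because its message contains "Internal", while the typed check looks only at the status code and rejects it.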

Test plan

- All 19 existing retry unit tests pass (go test ./internal/integrations/gemini/...)
- Build passes with no new warnings (go build ./...)
- isRetryableError correctly retries HTTP 429/5xx and gRPC ResourceExhausted/Unavailable/Internal
- isRetryableError does not retry HTTP 400/403/404 or generic errors
- LLMClient and Embedder can be constructed with a custom RetryConfig for fast test execution

Summary by CodeRabbit

Refactor
- Improved error handling in Gemini integration with typed error inspection for more reliable retry detection
- Enhanced retry logic to properly recognize rate-limit and transient errors across REST and gRPC protocols
New Features
- Made retry settings configurable within Gemini embed and LLM clients
Tests
- Updated tests to use structured REST/gRPC error types and expanded retry-path coverage

coderabbitai · 2026-02-17T04:30:26Z

No actionable comments were generated in the recent review. 🎉

Walkthrough

Refactors Gemini integration to add configurable retry fields to Embedder and LLMClient, initializes them with DefaultRetryConfig, and replaces string-based retry detection with typed error inspection for REST (googleapi.Error) and gRPC (status codes). Tests updated to use structured errors.

Changes

Cohort / File(s)	Summary
Retry Configuration Integration `internal/integrations/gemini/embedder.go`, `internal/integrations/gemini/llm.go`	Added internal `retryConfig` fields to `Embedder` and `LLMClient`, initialized via `DefaultRetryConfig()` in constructors; runtime retry logic now uses these fields instead of local defaults.
Retry Error Handling Refactor `internal/integrations/gemini/retry.go`	Rewrote `isRetryableError()` and related retry flow to use typed inspections: handle REST errors via `*googleapi.Error` (429 or 5xx) and gRPC via `status.Code` (ResourceExhausted, Unavailable, Internal); removed message-fragment matching and clarified comments.
Tests Updated for Typed Errors `internal/integrations/gemini/retry_test.go`	Replaced string-based error test cases with `*googleapi.Error` and `grpc/status` errors, added necessary imports, and updated wrapped/embedded error scenarios to validate typed retry behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Gemini as LLMClient/Embedder
    participant Retry as withRetry
    participant External as Google API / gRPC

    Client->>Gemini: Request (embed/generate)
    Gemini->>Retry: invoke with retryConfig
    Retry->>External: call API
    alt success
        External-->>Retry: response
        Retry-->>Gemini: return result
        Gemini-->>Client: final response
    else retryable error (429/5xx or gRPC codes)
        External-->>Retry: retryable error
        Retry-->>Retry: backoff + jitter (respect context)
        Retry->>External: retry call
    else non-retryable error
        External-->>Retry: non-retryable error
        Retry-->>Gemini: propagate error
        Gemini-->>Client: error
    end
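
The flow in the diagram can be sketched as a small retry loop. This is an illustration under stated assumptions, not the repository's `withRetry`: the `RetryConfig` fields (`MaxRetries`, `BaseDelay`, `MaxDelay`), the function signature, and the `runDemo` helper are all hypothetical, since the thread does not show the real definitions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// RetryConfig field names are hypothetical; the PR thread does not list them.
type RetryConfig struct {
	MaxRetries int
	BaseDelay  time.Duration
	MaxDelay   time.Duration
}

var errTransient = errors.New("transient")

// withRetry calls fn until it succeeds, returns a non-retryable error,
// exhausts MaxRetries, or ctx is cancelled during a backoff wait.
func withRetry(ctx context.Context, cfg RetryConfig, retryable func(error) bool, fn func() error) error {
	var err error
	for attempt := 0; ; attempt++ {
		if err = fn(); err == nil || !retryable(err) {
			return err
		}
		if attempt >= cfg.MaxRetries {
			return fmt.Errorf("giving up after %d attempts: %w", attempt+1, err)
		}
		// Exponential backoff capped at MaxDelay, plus up to 50% jitter.
		delay := cfg.BaseDelay << attempt
		if delay > cfg.MaxDelay || delay <= 0 {
			delay = cfg.MaxDelay
		}
		delay += time.Duration(rand.Int63n(int64(delay)/2 + 1))
		select {
		case <-time.After(delay):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// runDemo fails the first `failures` calls with a transient error and
// reports how many calls were made in total.
func runDemo(failures, maxRetries int) (int, error) {
	cfg := RetryConfig{MaxRetries: maxRetries, BaseDelay: time.Millisecond, MaxDelay: 4 * time.Millisecond}
	calls := 0
	err := withRetry(context.Background(), cfg,
		func(err error) bool { return errors.Is(err, errTransient) },
		func() error {
			calls++
			if calls <= failures {
				return errTransient
			}
			return nil
		})
	return calls, err
}

func main() {
	calls, err := runDemo(2, 3)
	fmt.Println(calls, err)
}
```

Passing the config in as a value is what lets a test use millisecond delays, as `runDemo` does here, instead of waiting on production-sized backoff.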

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat: implement exponential backoff for Gemini API #63: Modifies / initializes RetryConfig and integrates retry behavior into Gemini embedder/LLM paths; closely related to these retry-config and typed-error changes.

Poem

🐰 In burrows of code I hop and write,
Typed errors gleam in morning light,
Retries now follow tuned command,
Backoff, jitter — steady stand,
Gemini hums, robust and bright. 🌿

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the two main changes: typed error checking and injectable RetryConfig, which are the core objectives of this PR.
Merge Conflict Detection	✅ Passed	✅ No merge conflicts detected when merging into `main`

Replace fragile strings.Contains matching with errors.As(*googleapi.Error) for HTTP status codes and gRPC status.FromError for RPC codes. This prevents false positives (e.g. "Internal" matching unrelated messages) and makes retry logic correct by construction. Update retry_test.go to use real *googleapi.Error and gRPC status instances instead of plain errors.New strings, so tests exercise the actual code path.
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>

Store retryConfig as a field on both structs, set to DefaultRetryConfig() in their constructors. generateWithRetry and Embed now use the struct field instead of calling DefaultRetryConfig() inline, allowing tests to inject fast configs (e.g. 1ms delays) without waiting real wall-clock time.
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>

gh-simili-bot · 2026-02-17T04:31:14Z

Simili Triage Report

Note

Quality Score: 9.2/10 (Excellent)
The issue could be improved. See suggestions below.

Classification

Category	Value
Labels

Quality Improvements

Test plan could be more detailed
Expand the test plan to explicitly mention if new tests were added for the RetryConfig injection mechanism and the new error type handling, or if existing tests were modified to cover these aspects.

Similar Threads

Similarity	Thread	Status
79%	#50 [Reliability]: Implement Exponential Backoff fo...	—
77%	#50 [Reliability]: Implement Exponential Backoff fo...	Open
76%	#33 Similar Issue	—

_{Generated by Simili Bot}

gh-simili-bot · 2026-02-17T04:34:55Z

Simili Triage Report

Note

Quality Score: 9.5/10 (Excellent)
The issue is well-described.

Classification

Category	Value
Labels

Quality Improvements

Could optionally include small 'before' and 'after' code snippets for key changes to further illustrate implementation details, although the textual description is already very clear.

Similar Threads

Similarity	Thread	Status
80%	#50 [Reliability]: Implement Exponential Backoff fo...	—
79%	#50 [Reliability]: Implement Exponential Backoff fo...	Open

_{Generated by Simili Bot}

Copilot

Pull request overview

This PR hardens Gemini retry behavior by switching transient-error detection from string matching to typed error inspection and by moving retry configuration onto client structs so it can be overridden.

Changes:

Updated isRetryableError to use errors.As(*googleapi.Error) for HTTP codes and gRPC status codes for RPC errors.
Added retryConfig fields to LLMClient and Embedder, initialized from DefaultRetryConfig(), and used by generateWithRetry/Embed.
Updated retry unit tests to use typed *googleapi.Error and status.New(...).Err() errors.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
internal/integrations/gemini/retry.go	Replaces string matching with typed REST/gRPC error classification for retryability.
internal/integrations/gemini/retry_test.go	Updates tests to exercise typed error paths (googleapi + gRPC status).
internal/integrations/gemini/llm.go	Stores retry config on `LLMClient` and uses it in `generateWithRetry`.
internal/integrations/gemini/embedder.go	Stores retry config on `Embedder` and uses it in `Embed`.

Copilot · 2026-02-17T04:35:27Z

+	// gRPC transport: generative-ai-go can return gRPC status errors.
+	if st, ok := status.FromError(err); ok {
+		switch st.Code() {
+		case codes.ResourceExhausted, codes.Unavailable, codes.Internal:
 			return true
 		}
 	}


status.FromError(err) does not unwrap %w-wrapped errors, so retryable gRPC status errors can be missed once they’ve been wrapped upstream (e.g., fmt.Errorf("...: %w", err) in Embed). Consider using errors.As with an interface like interface{ GRPCStatus() *status.Status } (or iteratively unwrapping) so gRPC codes are detected through wrappers.
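
The difference between inspecting only the outermost error and searching the whole chain can be shown with the standard library alone. In this sketch `statusError` and the `coder` interface are hypothetical stand-ins: real gRPC status errors expose `GRPCStatus() *status.Status` rather than the `StatusCode() int` method used here, and the constants simply mirror the numeric values of the corresponding gRPC codes.

```go
package main

import (
	"errors"
	"fmt"
)

// Numeric values mirror the gRPC codes of the same names.
const (
	codeNotFound          = 5
	codeResourceExhausted = 8
	codeInternal          = 13
	codeUnavailable       = 14
)

// statusError is a hypothetical stand-in for a gRPC status error.
type statusError struct{ code int }

func (e *statusError) Error() string   { return fmt.Sprintf("rpc error: code = %d", e.code) }
func (e *statusError) StatusCode() int { return e.code }

// coder plays the role of interface{ GRPCStatus() *status.Status }.
type coder interface{ StatusCode() int }

// codeDirect only inspects the outermost error, so a %w wrapper hides the code.
func codeDirect(err error) (int, bool) {
	c, ok := err.(coder)
	if !ok {
		return 0, false
	}
	return c.StatusCode(), true
}

// codeUnwrapped uses errors.As with an interface target to search the chain.
func codeUnwrapped(err error) (int, bool) {
	var c coder
	if errors.As(err, &c) {
		return c.StatusCode(), true
	}
	return 0, false
}

// isRetryable classifies an error by its status code, even when wrapped.
func isRetryable(err error) bool {
	code, ok := codeUnwrapped(err)
	return ok && (code == codeResourceExhausted || code == codeUnavailable || code == codeInternal)
}

// wrap simulates an upstream fmt.Errorf("...: %w", err) call.
func wrap(code int) error {
	return fmt.Errorf("embed failed: %w", &statusError{code: code})
}

func main() {
	_, direct := codeDirect(wrap(codeResourceExhausted))
	fmt.Println(direct, isRetryable(wrap(codeResourceExhausted)), isRetryable(wrap(codeNotFound)))
}
```

The two wrapped cases at the end correspond to the kind of test rows this comment asks for: one wrapped retryable code and one wrapped non-retryable code.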

Copilot · 2026-02-17T04:35:27Z

 // NewLLMClient creates a new Gemini LLM client.
 func NewLLMClient(apiKey, model string) (*LLMClient, error) {
 	ctx := context.Background()
 	client, err := genai.NewClient(ctx, option.WithAPIKey(apiKey))
 	if err != nil {
 		return nil, fmt.Errorf("failed to create Gemini client: %w", err)
 	}

 	if model == "" {
 		model = "gemini-2.5-flash" // Fast and cost-effective
 	}

 	return &LLMClient{
-		client: client,
-		model:  model,
+		client:      client,
+		model:       model,
+		retryConfig: DefaultRetryConfig(),
 	}, nil
 }


The PR description says RetryConfig is injectable, but retryConfig is unexported and NewLLMClient doesn’t provide any way for callers outside the gemini package to set it. To make this truly injectable, consider adding a constructor overload/option (e.g., NewLLMClientWithRetryConfig or functional options) or an exported setter.

Copilot · 2026-02-17T04:35:27Z

@@ -33,8 +34,9 @@ func NewEmbedder(apiKey, model string) (*Embedder, error) {
 	}

 	return &Embedder{
-		client: client,
-		model:  model,
+		client:      client,
+		model:       model,
+		retryConfig: DefaultRetryConfig(),
 	}, nil
 }


The PR description says RetryConfig is injectable, but retryConfig is unexported and NewEmbedder doesn’t provide any way for callers outside the gemini package to set it. Consider adding a constructor overload/option or an exported setter so integration tests (or other packages) can inject a fast retry config without relying on package-internal access.

Copilot · 2026-02-17T04:35:28Z

 		{"nil error", nil, false},
-		{"rate limit 429", errors.New("googleapi: Error 429: Resource exhausted"), true},
-		{"ResourceExhausted gRPC", errors.New("rpc error: code = ResourceExhausted"), true},
-		{"server error 500", errors.New("googleapi: Error 500: Internal Server Error"), true},
-		{"bad gateway 502", errors.New("HTTP 502: Bad Gateway"), true},
-		{"unavailable 503", errors.New("googleapi: Error 503: Service Unavailable"), true},
-		{"gateway timeout 504", errors.New("HTTP 504: Gateway Timeout"), true},
-		{"Unavailable keyword", errors.New("rpc error: code = Unavailable"), true},
-		{"Internal keyword", errors.New("rpc error: code = Internal"), true},
-		{"client error 400", errors.New("googleapi: Error 400: Bad Request"), false},
-		{"forbidden 403", errors.New("googleapi: Error 403: Forbidden"), false},
-		{"not found 404", errors.New("googleapi: Error 404: Not Found"), false},
+		{"rate limit 429", &googleapi.Error{Code: 429, Message: "Resource exhausted"}, true},
+		{"ResourceExhausted gRPC", status.New(codes.ResourceExhausted, "resource exhausted").Err(), true},
+		{"server error 500", &googleapi.Error{Code: 500, Message: "Internal Server Error"}, true},
+		{"bad gateway 502", &googleapi.Error{Code: 502, Message: "Bad Gateway"}, true},
+		{"unavailable 503", &googleapi.Error{Code: 503, Message: "Service Unavailable"}, true},
+		{"gateway timeout 504", &googleapi.Error{Code: 504, Message: "Gateway Timeout"}, true},
+		{"Unavailable gRPC", status.New(codes.Unavailable, "service unavailable").Err(), true},
+		{"Internal gRPC", status.New(codes.Internal, "internal error").Err(), true},
+		{"client error 400", &googleapi.Error{Code: 400, Message: "Bad Request"}, false},
+		{"forbidden 403", &googleapi.Error{Code: 403, Message: "Forbidden"}, false},
+		{"not found 404", &googleapi.Error{Code: 404, Message: "Not Found"}, false},


There’s no test case that covers a retryable gRPC status error being wrapped (e.g., fmt.Errorf("wrap: %w", status.New(...).Err())). Since production code may wrap errors, adding such a case would ensure isRetryableError continues to work end-to-end with wrapped gRPC errors.

coderabbitai

🧹 Nitpick comments (3)

internal/integrations/gemini/llm.go (1)
105-122: Consider exposing a way to customize RetryConfig for external callers.

The retryConfig field is unexported, which allows same-package tests to modify it directly. However, if external integrations need custom retry behavior (e.g., shorter delays for integration tests), they currently have no API to do so.

Consider adding either:

A functional option pattern: NewLLMClient(apiKey, model, WithRetryConfig(cfg))

A setter method: SetRetryConfig(cfg RetryConfig)

This is optional if external customization isn't a current requirement.
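
A minimal sketch of the functional-option variant, assuming hypothetical `RetryConfig` fields and default values. The Gemini client is omitted so the example stays self-contained, and `LLMOption`, `WithRetryConfig`, and `fastRetryConfig` are illustrative names rather than existing APIs in the repository.

```go
package main

import (
	"fmt"
	"time"
)

// RetryConfig fields and the default values below are hypothetical.
type RetryConfig struct {
	MaxRetries int
	BaseDelay  time.Duration
}

func DefaultRetryConfig() RetryConfig {
	return RetryConfig{MaxRetries: 3, BaseDelay: time.Second}
}

// LLMClient is reduced to the fields this sketch needs; the real struct
// also holds the underlying Gemini client.
type LLMClient struct {
	model       string
	retryConfig RetryConfig
}

// LLMOption mutates a client during construction.
type LLMOption func(*LLMClient)

// WithRetryConfig overrides the default retry settings.
func WithRetryConfig(cfg RetryConfig) LLMOption {
	return func(c *LLMClient) { c.retryConfig = cfg }
}

// NewLLMClient keeps the existing defaults when no options are passed, so
// current callers are unaffected. The error return mirrors the real
// constructor; it is always nil here because no network client is created.
func NewLLMClient(apiKey, model string, opts ...LLMOption) (*LLMClient, error) {
	if model == "" {
		model = "gemini-2.5-flash"
	}
	c := &LLMClient{model: model, retryConfig: DefaultRetryConfig()}
	for _, opt := range opts {
		opt(c)
	}
	return c, nil
}

// fastRetryConfig is the kind of config an integration test might inject.
func fastRetryConfig() RetryConfig {
	return RetryConfig{MaxRetries: 2, BaseDelay: time.Millisecond}
}

func main() {
	c, _ := NewLLMClient("key", "", WithRetryConfig(fastRetryConfig()))
	fmt.Println(c.model, c.retryConfig.BaseDelay)
}
```

Compared with a setter, options keep the client immutable after construction, which avoids questions about changing the retry config while requests are in flight.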
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/integrations/gemini/llm.go` around lines 105 - 122, The NewLLMClient
currently hardcodes retryConfig via DefaultRetryConfig and does not expose a way
for callers to customize it; add a public API to set custom retry behavior by
either implementing a functional option (e.g., add a WithRetryConfig(retryConfig
RetryConfig) option consumed by NewLLMClient(...opts ...LLMOption)) or by adding
a setter on LLMClient (e.g., SetRetryConfig(cfg RetryConfig)); ensure the
implementation updates the LLMClient.retryConfig field and that NewLLMClient
still defaults to DefaultRetryConfig when no option or setter is used so
existing behavior remains unchanged.
internal/integrations/gemini/retry.go (1)
53-67: Consider adding codes.DeadlineExceeded to retryable gRPC codes.

The current implementation handles ResourceExhausted, Unavailable, and Internal. However, DeadlineExceeded is also a transient error that can occur during temporary network slowdowns and is commonly retried.
💡 Optional: Add DeadlineExceeded to retryable codes
 	if st, ok := status.FromError(err); ok {
 		switch st.Code() {
-		case codes.ResourceExhausted, codes.Unavailable, codes.Internal:
+		case codes.ResourceExhausted, codes.Unavailable, codes.Internal, codes.DeadlineExceeded:
 			return true
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/integrations/gemini/retry.go` around lines 53 - 67, The retry logic
currently treats gRPC status codes ResourceExhausted, Unavailable, and Internal
as retryable; add codes.DeadlineExceeded to the switch in the
status.FromError(err) block so DeadlineExceeded is also considered transient and
returns true (update the switch that checks st.Code() alongside cases
codes.ResourceExhausted, codes.Unavailable, codes.Internal to include
codes.DeadlineExceeded).
internal/integrations/gemini/retry_test.go (1)
19-49: Consider adding test cases for wrapped errors.

The current tests cover direct googleapi.Error and gRPC status errors, but errors.As is designed to unwrap nested errors. Adding a test case for wrapped errors would confirm that isRetryableError correctly handles errors wrapped via fmt.Errorf("...: %w", err).
💡 Optional: Add wrapped error test case
{"wrapped 429", fmt.Errorf("API call failed: %w", &googleapi.Error{Code: 429, Message: "Rate limited"}), true},
{"wrapped gRPC", fmt.Errorf("RPC failed: %w", status.New(codes.Unavailable, "unavailable").Err()), true},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/integrations/gemini/retry_test.go` around lines 19 - 49, Test suite
TestIsRetryableError lacks cases where retryable errors are wrapped; update the
table of tests in TestIsRetryableError to include wrapped variants using
fmt.Errorf with %w that wrap a googleapi.Error (e.g. wrapped 429) and a gRPC
status error (e.g. wrapped Unavailable) so that isRetryableError's use of
errors.As is exercised for nested errors.

status.FromError does not unwrap %w-wrapped errors, so retryable gRPC status errors (e.g. ResourceExhausted) could be missed when wrapped by an upstream fmt.Errorf call. Switch to errors.As with the GRPCStatus() interface, which correctly traverses the error chain. Add two new test cases: wrapped retryable (ResourceExhausted) and wrapped non-retryable (NotFound) to cover this code path.
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>

gh-simili-bot · 2026-02-17T04:41:23Z

Simili Triage Report

Note

Quality Score: 9.5/10 (Excellent)
The issue could be improved. See suggestions below.

Classification

Category	Value
Labels

Quality Improvements

Specific examples of the 'false positives' in isRetryableError are not provided.
Include a brief, concrete example of an error message that would have been incorrectly identified as retryable before this fix.

Similar Threads

Similarity	Thread	Status
80%	#50 [Reliability]: Implement Exponential Backoff fo...	—
80%	#50 [Reliability]: Implement Exponential Backoff fo...	Open

_{Generated by Simili Bot}

Kavirubc added 2 commits February 17, 2026 10:00

Kavirubc force-pushed the fix/retry-improvements branch from 9a23f0f to b4a71a6 February 17, 2026 04:31

gh-simili-bot added the bug, enhancement, and refactor labels Feb 17, 2026

github-project-automation Bot added this to simili-bot-v1-release Feb 17, 2026

github-project-automation Bot moved this to Todo in simili-bot-v1-release Feb 17, 2026

Kavirubc self-assigned this Feb 17, 2026

Kavirubc requested a review from Copilot February 17, 2026 04:32

Kavirubc moved this from Todo to Done in simili-bot-v1-release Feb 17, 2026

Kavirubc linked an issue Feb 17, 2026 that may be closed by this pull request

[Reliability]: Implement Exponential Backoff for Gemini API #50

Closed

Copilot started reviewing on behalf of Kavirubc February 17, 2026 04:33

gh-simili-bot added the fix and testing labels Feb 17, 2026

Copilot AI reviewed Feb 17, 2026

coderabbitai Bot reviewed Feb 17, 2026

Kavirubc merged commit e40a68c into main Feb 17, 2026
6 checks passed

Kavirubc deleted the fix/retry-improvements branch February 17, 2026 04:41

Kavirubc mentioned this pull request Feb 17, 2026

feat: add retry logic for OpenAI API calls (parity with Gemini) #69

Closed

4 tasks

gh-simili-bot mentioned this pull request Mar 5, 2026

feat(ai): OpenAI retry parity — typed error + tests (closes #69) #99

Merged

5 tasks
