feat(bigquery): Implement exponential backoff for errors in BigQuery #578
Conversation
📝 Walkthrough
Switches the workspace BigQuery client to a git-pinned source and adds tonic; changes BigQuery batching to Arc-wrapped TableBatch items; maps gRPC status codes to granular error kinds; adds exponential backoff with jitter and retry for throttled streaming; introduces ErrorKind::DestinationThrottled.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ✅ 3 of 3 passed
Actionable comments posted: 2
In `@etl-destinations/src/bigquery/core.rs`:
- Around lines 515-523: The doc comment and implementation of calculate_backoff are inconsistent: the doc says to cap initial_backoff * 2^attempt and then add jitter, but the code caps after adding jitter, so the jitter can be clipped away at the cap. Fix by applying min(MAX_BACKOFF_MS) to the exponential component (computed from INITIAL_BACKOFF_MS and attempt with a saturating shift/multiply) and then adding the random jitter (from rand) before building the Duration, so the jitter is preserved at the cap. Alternatively, if the current behavior is intended, update the doc comment to state that the cap is applied after adding jitter (reference calculate_backoff, INITIAL_BACKOFF_MS, MAX_BACKOFF_MS).
In `@etl/src/error.rs`:
- Line 108: Add a standard documentation comment for the DestinationThrottled variant of the Error enum describing when it is returned (e.g., when the destination is temporarily throttling requests or rate limits are exceeded) and any relevant behavior (transient vs. permanent, whether retries are safe). Match the tone and precision of the other variants' docs and attach the comment immediately above the DestinationThrottled variant so tools like rustdoc surface the explanation.
Pull Request Test Coverage Report for Build 21442167836 — Coveralls
Cargo.toml (outdated)

```diff
  fail = { version = "0.5.1", default-features = false }
  futures = { version = "0.3.31", default-features = false }
- gcp-bigquery-client = { version = "0.27.0", default-features = false }
+ gcp-bigquery-client = { git = "https://github.com/iambriccardo/gcp-bigquery-client", default-features = false, rev = "a1cc7895afce36c0c86cd71bab94253fef04f05c" }
```
We plan on switching to the main fork once we've upstreamed our changes.
Actionable comments posted: 3
In `@etl-destinations/src/bigquery/core.rs`:
- Around lines 32-35: The constant MAX_THROTTLE_RETRY_ATTEMPTS is ambiguous about whether it counts the initial attempt. Fix with one of two concise options: (A) rename the constant to MAX_THROTTLE_ATTEMPTS and update its docstring to "maximum total attempts (initial + retries)" wherever it is used (e.g., in the retry loop); or (B) keep the name MAX_THROTTLE_RETRY_ATTEMPTS and change the retry loop that uses it (the block using INITIAL_BACKOFF_MS and exponential backoff) so the constant strictly counts retries (run the initial attempt, then loop while retries < MAX_THROTTLE_RETRY_ATTEMPTS), and update the docstring to "maximum number of retry attempts (excluding the initial attempt)". Apply the chosen option consistently to all uses of MAX_THROTTLE_RETRY_ATTEMPTS in this file.
- Around lines 563-584: The inline comments inside the retry loop (around the block using MAX_THROTTLE_RETRY_ATTEMPTS, calculate_backoff, warn!, and last_error) are missing trailing periods; update them to end with a period (e.g., change "Don't retry on last attempt" to "Don't retry on last attempt.", and any similar comment in that block) so they conform to the Rust comment punctuation guideline.
- Around lines 535-537: Update the doc comment to use rustdoc intra-doc links instead of plain inline code for types: change occurrences like `Arc<TableBatch>` and `TableBatch` to the linked forms [`Arc`] and [`TableBatch`] (or [`std::sync::Arc`] if fully-qualified is preferred), and use the [`Type::method`] style for method references, so the comment on the slice of Arc<TableBatch> becomes linked and navigable in generated docs.
Actionable comments posted: 1
In `@Cargo.toml`:
- Line 49: Add documentation above the gcp-bigquery-client git dependency describing why the fork is used: list the specific features/fixes it provides (e.g., built-in retry/backoff behavior), reference any open upstream issues or PRs for lquerel/gcp-bigquery-client, and state the tracking issue ID plus a short migration plan and timeline for switching back to the official crate or an approved replacement. Mention the exact dependency name "gcp-bigquery-client", whether upstream acceptance is expected, and the fallback options if the fork becomes unavailable.
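The requested dependency comment might look like the following sketch. The placeholders in angle brackets stand in for the real issue links and tracking ID, which are not known from this PR; only the dependency line itself is taken from the diff.

```toml
# Fork of lquerel/gcp-bigquery-client, pinned by rev. Used for retry/backoff
# support not yet available upstream; upstreaming is in progress
# (see <upstream PR link>). Tracking issue: <tracking issue ID>.
# Plan: switch back to the official gcp-bigquery-client crate once the
# changes land and a release is cut. Fallback if the fork disappears:
# vendor the pinned rev.
gcp-bigquery-client = { git = "https://github.com/iambriccardo/gcp-bigquery-client", default-features = false, rev = "a1cc7895afce36c0c86cd71bab94253fef04f05c" }
```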