
Conversation

@iambriccardo (Contributor) commented Jan 28, 2026

Summary by CodeRabbit

  • New Features

    • Automatic retry for BigQuery throttling with exponential backoff and jitter.
  • Bug Fixes

    • More granular classification of BigQuery errors (throttling, missing tables, table conflicts, preconditions).
    • Improved resilience under high load via backoff-managed retries.
  • Breaking Changes

    • Public APIs now use shared, reference-counted batch types and emit more specific error kinds; callers may need minor updates.


@coderabbitai bot commented Jan 28, 2026

📝 Walkthrough

Switches the workspace BigQuery client dependency to a git-pinned source and adds tonic; changes BigQuery batching to Arc-wrapped TableBatch items; maps gRPC status codes to granular error kinds; adds exponential backoff with jitter and retries for throttled streaming; and introduces ErrorKind::DestinationThrottled.

Changes

Cohort / File(s) and change summary:

  • Workspace manifest (Cargo.toml)
    Replaced gcp-bigquery-client = { version = "0.27.0", ... } with a git-pinned dependency on a specific rev and added tonic = { version = "0.14.2", default-features = false }.
  • etl-destinations feature & deps (etl-destinations/Cargo.toml)
    Added rand and tonic to the bigquery feature and declared the rand (with thread_rng) and tonic dependencies.
  • BigQuery client API (etl-destinations/src/bigquery/client.rs)
    Reworked the streaming API to accept IntoIterator<Item = Arc<TableBatch<...>>> (requiring ExactSizeIterator), renamed the method to an append-style name, changed create_table_batch to return Arc<TableBatch<...>>, and mapped errors by matching on tonic::Code to emit granular ErrorKind variants (sketched after this list).
  • Backoff & retry logic (etl-destinations/src/bigquery/core.rs)
    Added retry constants, a full-jitter exponential backoff calculation, an is_retryable_error helper, and an append_table_batches_with_retry wrapper used by write_table_rows and write_events to retry throttled streaming with sleeps.
  • Error definitions (etl/src/error.rs)
    Added the new ErrorKind::DestinationThrottled variant to represent throttling responses.
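
The granular error mapping described for client.rs might look roughly like the sketch below. Only DestinationThrottled is named in this PR; the other ErrorKind variants and the exact tonic::Code choices are assumptions used for illustration.

```rust
use tonic::Code;

/// Hypothetical error kinds: only `DestinationThrottled` is confirmed by this
/// PR; the other variants stand in for the "missing table", "table conflict",
/// and "precondition" classifications mentioned in the summary.
#[derive(Debug)]
enum ErrorKind {
    DestinationThrottled,
    MissingTable,
    TableAlreadyExists,
    PreconditionFailed,
    Other,
}

/// Classifies a gRPC status code into a granular error kind.
fn classify_status(code: Code) -> ErrorKind {
    match code {
        // Throttling / overload responses that are safe to retry.
        Code::ResourceExhausted | Code::Unavailable => ErrorKind::DestinationThrottled,
        // The target table does not exist.
        Code::NotFound => ErrorKind::MissingTable,
        // The table (or write stream) already exists.
        Code::AlreadyExists => ErrorKind::TableAlreadyExists,
        // A precondition on the request was not met.
        Code::FailedPrecondition => ErrorKind::PreconditionFailed,
        // Everything else is surfaced as a generic error.
        _ => ErrorKind::Other,
    }
}
```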

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Egress as Egress/Caller
    participant Writer as Writer (write_table_rows / write_events)
    participant Retry as Retry wrapper (append_table_batches_with_retry)
    participant Client as BigQueryClient
    participant BQ as BigQuery API
    Egress->>Writer: submit Arc<TableBatch> collection
    Writer->>Retry: call append_table_batches_with_retry(batches)
    loop attempts (<= MAX_RETRY_ATTEMPTS)
        Retry->>Client: append_table_batches (concurrent)
        Client->>BQ: gRPC streaming insert
        alt success
            BQ-->>Client: success
            Client-->>Retry: (bytes_sent, bytes_received)
            Retry-->>Writer: return results
            %% loop exits on success
        else throttled/unavailable
            BQ-->>Client: status (RESOURCE_EXHAUSTED / UNAVAILABLE / etc.)
            Client-->>Retry: map status -> DestinationThrottled
            Retry->>Retry: calculate_backoff(attempt) and sleep (jitter)
        end
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 3 of 3 passed

  • Description Check ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately summarizes the main change: implementing exponential backoff for BigQuery errors, which is the primary focus of the changeset.
  • Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@etl-destinations/src/bigquery/core.rs`:
- Around lines 515-523: The doc comment and the implementation of calculate_backoff are inconsistent. The doc says the function caps initial_backoff * 2^attempt and then adds jitter, but the code applies the cap after adding the jitter, so the jitter can be clipped away at the cap. Either change calculate_backoff to apply min(MAX_BACKOFF_MS) to the exponential component (computed from INITIAL_BACKOFF_MS and attempt with saturating shift/multiply) and then add the random jitter (from rand) before building the Duration, so the jitter survives at the cap (see the sketch below); or, if the current behavior is intended, update the doc comment to state that the cap is applied after adding the jitter (referencing calculate_backoff, INITIAL_BACKOFF_MS, and MAX_BACKOFF_MS).
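
A minimal sketch of the first option, assuming illustrative constant values and a rand 0.8-style API (thread_rng/gen_range); the real function lives in etl-destinations/src/bigquery/core.rs and may differ:

```rust
use rand::Rng;
use std::time::Duration;

// Assumed values; the real constants live in etl-destinations/src/bigquery/core.rs.
const INITIAL_BACKOFF_MS: u64 = 100;
const MAX_BACKOFF_MS: u64 = 10_000;

/// Caps the exponential component first, then adds the jitter, so the jitter
/// survives even when the cap is reached.
fn calculate_backoff(attempt: u32) -> Duration {
    // 2^attempt, saturating for very large attempt values.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    let exponential = INITIAL_BACKOFF_MS.saturating_mul(factor);
    let capped = exponential.min(MAX_BACKOFF_MS);
    // Jitter range is illustrative: drawn uniformly up to the capped value.
    let jitter = rand::thread_rng().gen_range(0..=capped);
    Duration::from_millis(capped.saturating_add(jitter))
}
```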

In `@etl/src/error.rs`:
- Line 108: Add a standard documentation comment for the DestinationThrottled variant of the ErrorKind enum, describing when it is returned (e.g., when the destination is temporarily throttling requests or rate limits are exceeded) and any relevant behavior (transient vs. permanent, whether retries are safe). Match the tone and precision of the other variants' docs and attach the comment immediately above the variant so tools like rustdoc surface it; a hedged example follows below.
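
For example, the variant's doc comment might read like this trimmed-down sketch (the surrounding enum here is illustrative, not the crate's full definition):

```rust
/// Illustrative, trimmed-down enum showing only the documented variant.
pub enum ErrorKind {
    /// The destination is temporarily throttling requests, for example because
    /// a rate limit or quota has been exceeded.
    ///
    /// This error is transient: retrying the operation after a backoff is safe
    /// and usually succeeds once the destination recovers.
    DestinationThrottled,
    // Other variants elided.
}
```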

@coveralls commented Jan 28, 2026

Pull Request Test Coverage Report for Build 21442167836

Details

  • 32 of 87 (36.78%) changed or added relevant lines in 2 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.03%) to 81.02%

Files missing coverage (covered lines / changed or added lines, %):
  • etl-destinations/src/bigquery/core.rs: 19 / 45 (42.22%)
  • etl-destinations/src/bigquery/client.rs: 13 / 42 (30.95%)

Files with coverage reduction (new missed lines, %):
  • etl-destinations/src/bigquery/client.rs: 1 (79.63%)

Totals:
  • Change from base Build 21434389108: -0.03%
  • Covered Lines: 17395
  • Relevant Lines: 21470

💛 - Coveralls

@iambriccardo iambriccardo changed the title Improve feat(bigquery): Implement exponential backoff for connection errors in BigQuery Jan 28, 2026
@iambriccardo iambriccardo changed the title feat(bigquery): Implement exponential backoff for connection errors in BigQuery feat(bigquery): Implement exponential backoff for errors in BigQuery Jan 28, 2026
Cargo.toml (Outdated)

```diff
 fail = { version = "0.5.1", default-features = false }
 futures = { version = "0.3.31", default-features = false }
-gcp-bigquery-client = { version = "0.27.0", default-features = false }
+gcp-bigquery-client = { git = "https://github.com/iambriccardo/gcp-bigquery-client", default-features = false, rev = "a1cc7895afce36c0c86cd71bab94253fef04f05c" }
```
@iambriccardo (Contributor, Author) commented Jan 28, 2026

We plan to switch back to the upstream repository once we have upstreamed our changes.

@coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@etl-destinations/src/bigquery/core.rs`:
- Around lines 32-35: The constant MAX_THROTTLE_RETRY_ATTEMPTS is ambiguous about whether it counts the initial attempt; either make the behavior explicit or adjust the loop. Two concise options: (A) rename the constant to MAX_THROTTLE_ATTEMPTS and document it as the "maximum total attempts (initial + retries)" everywhere it is used (e.g., the references in the retry loop), or (B) keep the current name and change the retry loop that uses it (the block with INITIAL_BACKOFF_MS and the exponential backoff) to treat the constant strictly as retry attempts, i.e. run the initial attempt and then loop while retries < MAX_THROTTLE_RETRY_ATTEMPTS, documenting it as the "maximum number of retry attempts (excluding the initial attempt)". Apply the chosen option consistently to every use of MAX_THROTTLE_RETRY_ATTEMPTS in this file; a sketch of option (A) follows after this list.
- Around lines 563-584: The inline comments inside the retry loop (the block using MAX_THROTTLE_RETRY_ATTEMPTS, calculate_backoff, warn!, and last_error) are missing trailing periods. End each of those comments with a period (e.g., change "Don't retry on last attempt" to "Don't retry on last attempt.") so they conform to the Rust comment punctuation guideline.
- Around lines 535-537: Use rustdoc intra-doc links instead of plain inline code in the doc comment: change occurrences like `Arc<TableBatch>` and `TableBatch` to linked forms such as [`Arc`] (or [`std::sync::Arc`]) and [`TableBatch`], and use the [`Type::method`] style for method references, so the comment on the slice of Arc<TableBatch> becomes navigable in the generated docs; a small example follows after this list.
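
A minimal sketch of option (A), where the constant counts total attempts; the generic wrapper, the constant value, and the is_retryable closure are illustrative stand-ins for the crate's append_table_batches_with_retry and is_retryable_error, and tokio is assumed for sleeping:

```rust
use std::time::Duration;

// Option (A): the constant counts total attempts (the initial try plus retries).
const MAX_THROTTLE_ATTEMPTS: u32 = 5;

// Stand-in for the crate's backoff calculation (see the earlier sketch).
fn calculate_backoff(attempt: u32) -> Duration {
    Duration::from_millis(100u64.saturating_mul(1 << attempt.min(6)))
}

/// Retries a fallible async operation on retryable errors, sleeping between
/// attempts; `is_retryable` stands in for the crate's `is_retryable_error`.
async fn retry_with_backoff<T, E, F, Fut>(
    mut operation: F,
    is_retryable: impl Fn(&E) -> bool,
) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
{
    let mut last_error = None;
    for attempt in 0..MAX_THROTTLE_ATTEMPTS {
        match operation().await {
            Ok(value) => return Ok(value),
            Err(err) if is_retryable(&err) => {
                last_error = Some(err);
                // Don't sleep after the last attempt.
                if attempt + 1 < MAX_THROTTLE_ATTEMPTS {
                    tokio::time::sleep(calculate_backoff(attempt)).await;
                }
            }
            Err(err) => return Err(err),
        }
    }
    Err(last_error.expect("at least one attempt was made"))
}
```

And for the intra-doc link item, a doc comment in the linked style might look like this (TableBatch here is a placeholder type so the links resolve):

```rust
use std::sync::Arc;

/// Placeholder type standing in for the crate's `TableBatch`.
pub struct TableBatch;

/// Streams the given [`Arc`]-wrapped [`TableBatch`]es; written in the
/// intra-doc link style the review asks for.
pub fn append_table_batches(_batches: &[Arc<TableBatch>]) {
    // Body elided; this sketch only demonstrates the doc-comment style.
}
```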

@iambriccardo iambriccardo marked this pull request as ready for review January 28, 2026 14:27
@iambriccardo iambriccardo requested a review from a team as a code owner January 28, 2026 14:28
@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@Cargo.toml`:
- Line 49: Add a comment above the gcp-bigquery-client git dependency in Cargo.toml explaining why the fork is used: list the specific features or fixes it provides (e.g., built-in retry/backoff behavior), link any open upstream issues or PRs against lquerel/gcp-bigquery-client, and state the tracking issue plus a short migration plan and timeline for switching back to the official crate or an approved replacement. Also note whether upstream acceptance is expected and what the fallback is if the fork becomes unavailable; a hedged example comment follows below.
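
One way such a note could read, as a sketch only; the stated rationale and the placeholder links are assumptions based on this review comment and the author's reply above about upstreaming changes:

```toml
# Temporary fork of lquerel/gcp-bigquery-client carrying changes we intend to
# upstream (e.g. the hooks needed for retry/backoff of streaming appends).
# Tracking issue / upstream PR: <links to be added>.
# Plan: switch back to the crates.io release once the changes land upstream;
# if the fork becomes unavailable, pin the last known-good rev or vendor it.
gcp-bigquery-client = { git = "https://github.com/iambriccardo/gcp-bigquery-client", default-features = false, rev = "a1cc7895afce36c0c86cd71bab94253fef04f05c" }
```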

@iambriccardo iambriccardo merged commit 0540979 into main Jan 28, 2026
13 of 14 checks passed
@iambriccardo iambriccardo deleted the backoff-bq branch January 28, 2026 14:53
