sync: add --concurrency and --retry-timeout flags #1

Merged
luketchang merged 1 commit into main from feat/sync-concurrency-flag
May 7, 2026
Conversation

@luketchang

The sync engine hardcoded its in-flight request limit at 20 and relied on the SDK's retry layer, which retries only 429 and 504 (databricks-sdk-go httpclient/errors.go DefaultErrorRetriable). As a result, 502 and 503 surface as fatal errors that fail the entire sync.

  • Add SyncOptions.Concurrency and SyncOptions.RetryTimeout; default to MaxRequestsInFlight and DefaultRetryTimeout respectively when zero so existing in-process callers (bundle deploy/files) need no changes.
  • Wrap each filer call (PUT, delete, mkdir, rmdir) in retries.Poll, retrying on 502/503/504 within the deadline and logging each retry at warn level. Uses the same retries.Poll helper used elsewhere in the CLI for state polling.
  • Surface --concurrency (default 5) and --retry-timeout (default 30s) on 'databricks sync' and 'databricks bundle sync'.

429 stays out of the new retry layer because the SDK already handles it with rate-limit-aware backoff. 500 stays out because it is not always transient for Databricks APIs.

Slack thread:
https://replit.slack.com/archives/C0A2Z9042FR/p1778082184841149

@luketchang
Author

@replit/micromanager approve tested locally, new repo

@luketchang luketchang merged commit d7dc393 into main May 7, 2026
@luketchang luketchang deleted the feat/sync-concurrency-flag branch May 7, 2026 02:49