[controller] Submit roll forward calls in parallel#2573
[controller] Submit roll forward calls in parallel#2573misyel wants to merge 5 commits intolinkedin:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR parallelizes the parent controller’s roll-forward fan-out to child regions so a slow region doesn’t block submission to other regions.
Changes:
- Submits
rollForwardToFutureVersionrequests to all regions concurrently viaasyncSetupExecutor. - Waits for all region requests to complete, then aggregates failures and throws a single exception if any region failed.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...s/venice-controller/src/main/java/com/linkedin/venice/controller/VeniceParentHelixAdmin.java
Outdated
Show resolved
Hide resolved
...s/venice-controller/src/main/java/com/linkedin/venice/controller/VeniceParentHelixAdmin.java
Outdated
Show resolved
Hide resolved
...s/venice-controller/src/main/java/com/linkedin/venice/controller/VeniceParentHelixAdmin.java
Outdated
Show resolved
Hide resolved
| for (Map.Entry<String, Future<?>> entry: regionFutures.entrySet()) { | ||
| String region = entry.getKey(); | ||
| try { | ||
| entry.getValue().get(ROLL_FORWARD_REQUEST_TIMEOUT, TimeUnit.MILLISECONDS); |
There was a problem hiding this comment.
We are still waiting in the future.get() for each region, which means the max timeout will be 3 * 60s in total. I wonder why don't we use something like this instead CompletableFuture.allOf(TIMEOUT, TimeUnit.MILLISECONDS)?
There was a problem hiding this comment.
Good point - updated to make it a shared timeout
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...s/venice-controller/src/main/java/com/linkedin/venice/controller/VeniceParentHelixAdmin.java
Show resolved
Hide resolved
...s/venice-controller/src/main/java/com/linkedin/venice/controller/VeniceParentHelixAdmin.java
Outdated
Show resolved
Hide resolved
...nice-controller/src/test/java/com/linkedin/venice/controller/TestVeniceParentHelixAdmin.java
Outdated
Show resolved
Hide resolved
...nice-controller/src/test/java/com/linkedin/venice/controller/TestVeniceParentHelixAdmin.java
Show resolved
Hide resolved
...s/venice-controller/src/main/java/com/linkedin/venice/controller/VeniceParentHelixAdmin.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Problem Statement
When a roll forward request hits the parent, it will call roll forward to each region sequentially. If one region is slow, the next region is also stalled and has to wait for the previous region to complete before starting
Solution
Submit the roll forward request to all regions in parallel and wait for each request to complete afterwards
Code changes
Concurrency-Specific Checks
Both reviewer and PR author to verify
synchronized,RWLock) are used where needed.ConcurrentHashMap,CopyOnWriteArrayList).How was this PR tested?
Existing CI
Does this PR introduce any user-facing or breaking changes?