[protocol] Stage degradedDatacenters on AdminOperation v100#2816
Open
mynameborat wants to merge 1 commit into
Open
[protocol] Stage degradedDatacenters on AdminOperation v100#2816mynameborat wants to merge 1 commit into
mynameborat wants to merge 1 commit into
Conversation
Add an array<string> field on the AddVersion admin operation carrying the set of datacenters marked as degraded at version creation time. Pinned via build.gradle versionOverrides until the Java wiring (parent populates the field, AdminExecutionTask reads it to enforce skipConsumption for degraded DCs even when versionSwapDeferred=true) lands in a follow-up PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
d99db19 to
7bd7420
Compare
Contributor
Author
|
Updated the field shape from Why: The avroutil vanilla code-gen does not initialize non-union defaults in the generated SpecificRecord constructor; the field comes out null. Avro then throws an NPE when serializing because a non-nullable Validated locally with |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem Statement
The degraded-mode batch push feature needs to propagate the set of currently-degraded datacenters from the parent controller to child controllers so that
AdminExecutionTaskon a child controller can decide whether to skip ingestion for its own region. Today,markDatacenterDegradedonly writes to the parent's local state and produces no admin-topic message, so child controllers never learn about degraded DCs. TheisDegradedDCcheck inAdminExecutionTaskalways evaluates to false on a child, which means a degraded DC silently ingests the version anyway whenversionSwapDeferred=trueis set (the auto-conversion path used by degraded-mode pushes).The selected fix is to embed the degraded-DC set in the AddVersion admin message itself — atomic with version creation and consistent with how other version-creation parameters (e.g.,
targetedRegions,versionSwapDeferred) flow from parent to child.Solution
Stage a new
degradedDatacentersfield on theAddVersionrecord in a newAdminOperationschema version (v100). Following the established pattern (see #2806, #2814), this PR introduces the schema only; Java wiring (parent populates the field,AdminExecutionTaskreads it) will land in a follow-up PR.services/venice-controller/src/main/resources/avro/AdminOperation/v100/AdminOperation.avsc— verbatim copy of v99 with one added field onAddVersion:build.gradleversionOverridescomment extended to note v100 is also staged. The actual pin stays at v98 — the generated JavaAdminOperationclass is unchanged, so no producers or consumers can see the new field yet.AvroProtocolDefinition.ADMIN_OPERATIONstays at 98.AdminOperationSerializer.initProtocolMap()continues to load v1..v98 only; v99 and v100 remain inert until activated by a future PR.This is a forward-compatible schema-only change with no behavioral impact.
Code changes
Concurrency-Specific Checks
Both reviewer and PR author to verify
synchronized,RWLock) are used where needed.ConcurrentHashMap,CopyOnWriteArrayList).How was this PR tested?
Schema-only change; no behavior change to test. Verified the existing protocol tests pass:
AdminOperationProtocolCompatibilityTest.testAdminOperationProtocolCompatibility— PASSEDAdminOperationSerializerTest(testAdminOperationSerializer, testDownloadAndSchemaIfNecessary, testGetSchema, testSerializeDeserializeWithDocChange, testValidateAdminOperation) — all PASSEDNew unit tests added.
New integration tests added.
Modified or extended existing tests.
Verified backward compatibility (if applicable).
Does this PR introduce any user-facing or breaking changes?