[protocol] Sanitize em dashes and add store-level storageMode in v44 schema #2822
Merged
ymuppala merged 2 commits intoMay 23, 2026
The StoreMetaValue v44 / AdminOperation v99 doc strings use Unicode em dashes (U+2014). The schemas survive a parse->toString round-trip, but not the HTTP path used by ControllerClientBackedSystemSchemaInitializer: the bytes arrive at the controller as '?', so the in-process and HTTP registrations of v44 disagree on doc text. The controller's duplicate check treats this as a new schema, returns "next available id is 45", and the client retries the POST for ~40s -- enough to push cluster bring-up past the 60s setUp timeout in VeniceClientCompatibilityTest and consistently fail v44 / v99 activation. Restricting the new doc strings to ASCII unblocks activation without touching the underlying charset bug in SchemaEntry.getSchemaBytes() and the HTTP form-param encoding, which can be fixed independently.
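The '?' substitution described above is what Java does when a string is encoded with a charset that cannot represent a character. A minimal sketch, assuming the lossy hop behaves like an encode/decode through a single-byte charset (US-ASCII here); the class name and the doc text are hypothetical, and this is not the Venice code path itself:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DocCharsetDemo {
    // Encode and decode with the same charset, as a transport pinned to that charset would.
    // String.getBytes(Charset) replaces unmappable characters with the charset's
    // replacement byte, which is '?' for US-ASCII.
    public static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Hypothetical doc text containing an em dash (U+2014).
        String doc = "0 = INTERNAL \u2014 default";

        // UTF-8 can represent U+2014, so the text survives unchanged.
        System.out.println(roundTrip(doc, StandardCharsets.UTF_8).equals(doc));

        // US-ASCII cannot, so the em dash comes back as '?'.
        System.out.println(roundTrip(doc, StandardCharsets.US_ASCII));

        // With the em dash replaced by "--" the text is pure ASCII and survives any of these charsets.
        String ascii = doc.replace("\u2014", "--");
        System.out.println(roundTrip(ascii, StandardCharsets.US_ASCII).equals(ascii));
    }
}
```

Because the two registrations then differ only in doc text, any comparison that looks at doc strings sees them as different schemas.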
Pull request overview
This PR removes Unicode em dashes (U+2014) from Avro schema doc strings in the staged protocol versions StoreMetaValue v44 and AdminOperation v99, replacing them with ASCII -- to avoid schema-registration mismatches caused by a lossy HTTP encoding path during controller/server bootstrap.
Changes:
- Replaced em dashes with ASCII -- in AdminOperation v99 doc strings (2 occurrences).
- Replaced em dashes with ASCII -- in StoreMetaValue v44 doc strings (2 occurrences).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| services/venice-controller/src/main/resources/avro/AdminOperation/v99/AdminOperation.avsc | Replaces non-ASCII em dashes in doc strings with ASCII to keep schema text identical across registration paths. |
| internal/venice-common/src/main/resources/avro/StoreMetaValue/v44/StoreMetaValue.avsc | Replaces non-ASCII em dashes in doc strings with ASCII to prevent duplicate-detection mismatches when activating v44. |
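Both file changes come down to keeping the schema text ASCII-only. A check along the following lines could flag a non-ASCII character before a schema is staged. This is a hedged sketch: the class and method names are hypothetical, it scans the whole schema text rather than only the doc attributes, and nothing like it is part of this PR:

```java
import java.util.ArrayList;
import java.util.List;

public class AsciiDocCheck {
    /** Returns the char offset of every non-ASCII code point in the given schema text. */
    public static List<Integer> nonAsciiOffsets(String schemaText) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < schemaText.length(); ) {
            int codePoint = schemaText.codePointAt(i);
            if (codePoint > 0x7F) {
                offsets.add(i);
            }
            // Advance by one or two chars so surrogate pairs are counted once.
            i += Character.charCount(codePoint);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Hypothetical field definitions, before and after sanitizing.
        String before = "{\"name\":\"storageMode\",\"doc\":\"store-level default \u2014 see UpdateStore\"}";
        String after = before.replace("\u2014", "--");

        System.out.println("before: " + nonAsciiOffsets(before).size() + " non-ASCII character(s)");
        System.out.println("after:  " + nonAsciiOffsets(after).size() + " non-ASCII character(s)");
    }
}
```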
Mirrors the existing UpdateStore.storageMode admin-op field with a persisted counterpart on the store record. UpdateStore.storageMode previously had nowhere to land at the store level; the controller could only forward the value into each newly-created StoreVersion.storageMode. StoreProperties.storageMode now lets the controller remember the store-level default so future versions inherit it without the operator having to repeat the admin op. Same int encoding as UpdateStore / StoreVersion (0=INTERNAL, 1=DUAL_WRITE, 2=EXTERNAL) with default 0 for forward compatibility with v43 readers. Doc on UpdateStore.storageMode updated to call out the new persistence path.
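The intended flow can be sketched as follows. Only the int encoding (0=INTERNAL, 1=DUAL_WRITE, 2=EXTERNAL) and the default of 0 come from the schema; the class names, the record types, and the two methods are hypothetical stand-ins for the controller logic, which is not part of this change:

```java
public class StorageModeSketch {
    /** Int encoding shared by UpdateStore, StoreProperties and StoreVersion. */
    public enum StorageMode {
        INTERNAL(0), DUAL_WRITE(1), EXTERNAL(2);

        private final int value;

        StorageMode(int value) {
            this.value = value;
        }

        public int getValue() {
            return value;
        }

        public static StorageMode fromInt(int value) {
            for (StorageMode mode : values()) {
                if (mode.value == value) {
                    return mode;
                }
            }
            throw new IllegalArgumentException("Unknown storageMode: " + value);
        }
    }

    /** Hypothetical stand-in for the store record; 0 mirrors the schema default. */
    public static class StoreRecord {
        public int storageMode = 0;
    }

    /** Hypothetical stand-in for a version record. */
    public static class VersionRecord {
        public final int storageMode;

        VersionRecord(int storageMode) {
            this.storageMode = storageMode;
        }
    }

    // Assumed handling of the admin op: validate the value, then remember it on the store.
    public static void applyUpdateStore(StoreRecord store, int requestedMode) {
        StorageMode.fromInt(requestedMode); // rejects values outside 0..2
        store.storageMode = requestedMode;
    }

    // A new version inherits the store-level default without a repeated admin op.
    public static VersionRecord createVersion(StoreRecord store) {
        return new VersionRecord(store.storageMode);
    }

    public static void main(String[] args) {
        StoreRecord store = new StoreRecord();
        System.out.println(StorageMode.fromInt(createVersion(store).storageMode));
        applyUpdateStore(store, StorageMode.DUAL_WRITE.getValue());
        System.out.println(StorageMode.fromInt(createVersion(store).storageMode));
    }
}
```

A store that never receives the admin op keeps the default 0, so its versions stay INTERNAL.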
Problem Statement
Two issues with the StoreMetaValue v44 / AdminOperation v99 schemas added by #2806 / #2814:
Issue 1: em-dash characters break v44 / v99 activation
The doc fields use Unicode em dashes (—, U+2014). The schemas survive an in-process parse->toString round-trip but not the HTTP path used by ControllerClientBackedSystemSchemaInitializer: em dashes arrive at the controller as '?'. When the controller's duplicate-detection logic in HelixReadWriteSchemaRepository.getNextAvailableSchemaId then compares the just-registered v44 (with em dashes) against the new POST body (with '?'), Schema.equals returns true but AvroSchemaUtils.hasDocFieldChange also returns true. The POST is therefore treated as a new schema: the controller returns "next available id is 45" instead of DUPLICATE_VALUE_SCHEMA_CODE and throws an exception.
The client retries this for ~40s before it eventually re-reads the ZK snapshot. This pushed VeniceClientCompatibilityTest past its 60s cluster-ready budget, manifesting as "Connection refused" to the router on 5 consecutive CI runs of #2817.
v1..v43 use only ASCII in their doc strings, so the HTTP round-trip is lossless. The em dashes appear for the first time in v44.
Issue 2: store-level storageMode was missing from StoreProperties
The v99 UpdateStore admin op already carries storageMode as an operator override, but its doc said the controller only copies the value into StoreVersion.storageMode for new versions. There was no persistent store-level slot for storageMode on StoreProperties, so the store-level default could not be remembered between version creations; the operator would need to repeat the admin op on every new version. StoreProperties did have an externalStorageReadMode field but not a storageMode field, which is inconsistent with how the two external-storage knobs are intended to work together at the store level.
Solution
- Replace the em dashes in the doc strings with ASCII (--). Two files, four lines.
- Add storageMode (int, default 0) to StoreProperties in v44, alongside the existing externalStorageReadMode. Same encoding as StoreVersion.storageMode and UpdateStore.storageMode (0 = INTERNAL, 1 = DUAL_WRITE, 2 = EXTERNAL). Update the UpdateStore.storageMode doc in v99 to call out that the controller now also persists the value on the store record.
The underlying charset bug (SchemaEntry.getSchemaBytes() using the platform default charset, HTTP form-param encoding not pinned to UTF-8) is intentionally left for a follow-up PR; restricting doc strings to ASCII is the minimal change that unblocks v44 / v99 activation.
Code changes
Concurrency-Specific Checks
How was this PR tested?
- versionOverrides): schema_id=44 POST attempts: 505, "Inconsistent value schema id" errors: 405, setUp FAILED (Connection refused to router after 60s) for every Avro client version (1.4 .. 1.10). Total: build fails after ~5 min.
- :internal:venice-avro-compatibility-test:testAvro1_10: schema_id=44 POSTs: 4 (normal), Inconsistent errors: 0, all 6 tests PASSED, build completed in 40 s.
- doc text (no impact on serialization). The new storageMode field on StoreProperties has default: 0, so v43 readers ignore it and writers that never set it produce records that v43 readers decode unchanged.
Does this PR introduce any user-facing or breaking changes?
docis metadata-only. The newstorageModefield onStorePropertiesdefaults to 0 (INTERNAL, i.e. current Venice-only behavior), so stores that do not opt in see no change.