Skip to content

[fast-client] Version-switch listener on StoreMetadata#2818

Open
ymuppala wants to merge 3 commits into
linkedin:mainfrom
ymuppala:ymuppala/phase-1-1-version-switch-callback
Open

[fast-client] Version-switch listener on StoreMetadata#2818
ymuppala wants to merge 3 commits into
linkedin:mainfrom
ymuppala:ymuppala/phase-1-1-version-switch-callback

Conversation

@ymuppala
Copy link
Copy Markdown
Collaborator

Problem Statement

Fast Client consumers that wrap the Venice client (e.g. shadow-clients maintaining a parallel cache, version-aware routing layers) currently have no way to be notified when the metadata refresh observes a new current serving version. They have to poll getCurrentStoreVersion() from outside the refresh loop, which is racy with respect to the rest of the cache (replicas, partition count, schemas) being updated atomically inside RequestBasedMetadata.updateCache.

Solution

Add a registration-style callback that fires from inside the refresh loop after the new current version has been committed to the local cache, so listeners may safely read the rest of the cache and observe the new value.

  • New StoreVersionSwitchListener functional interface: onVersionSwitch(int previousVersion, int newVersion). First refresh after client start delivers previousVersion = -1 as a sentinel.

  • StoreMetadata gains register/unregisterVersionSwitchListener with default {} no-op bodies so existing test fakes / mocks aren't forced to implement them.

  • AbstractStoreMetadata owns the listener list (CopyOnWriteArrayList for rare-writer / frequent-reader access) and exposes a protected fireVersionSwitch(prev, next) that catches per-listener Throwable and logs at ERROR, so one buggy listener cannot break the refresh thread or starve sibling listeners.

  • RequestBasedMetadata.updateCache captures currentVersion.get() immediately before .set(fetchedCurrentVersion), then calls fireVersionSwitch(previousVersion, fetchedCurrentVersion) after the stat update. The fire happens only on the branch that actually accepts a new current version (i.e. when whetherToSwitchToFetchedCurrentVersion is true), so transitions that get deferred for partition-resource readiness do not generate spurious callbacks.

Code changes

  • Added new code behind a config. (no)
  • Introduced new log lines. One ERROR-level log on listener exception in AbstractStoreMetadata.fireVersionSwitch. Bounded by listener count, fires only on bad listener — no rate limiting needed.

Concurrency-Specific Checks

  • No race conditions: listeners are stored in a CopyOnWriteArrayList; the iterator in fireVersionSwitch is snapshot-stable across concurrent register/unregister calls.
  • Synchronization: registration is intrinsically thread-safe via CopyOnWriteArrayList.addIfAbsent / remove; fire walks the snapshot iterator without holding any lock.
  • No blocking calls inside critical sectionsfireVersionSwitch is called from RequestBasedMetadata.updateCache which is synchronized, but the listener iteration itself acquires no further locks. Listener contracts (Javadoc) require short, non-blocking work; exceptions are isolated per listener.
  • Thread-safe collection used (CopyOnWriteArrayList).
  • Validated proper exception handling — per-listener try/catch (Throwable) ensures one bad listener cannot kill the refresh thread.

How was this PR tested?

  • New unit tests added: AbstractStoreMetadataTest (8 cases: register+fire, no-listeners-safe, multi-listener fan-out, exception isolation, unregister, dedup on double-register, null-reject on register, null/unknown tolerance on unregister).
  • Modified or extended existing tests: yes — RequestBasedMetadataTest gains two wiring tests (listener fires on initial start() with sentinel -1; no fire when a subsequent updateCache returns the same version).
  • Verified backward compatibility: yes — the two new StoreMetadata methods are default no-ops, so existing implementations of StoreMetadata compile and run unchanged.

Does this PR introduce any user-facing or breaking changes?

  • No.

Middle of a 3-PR stack:

  1. Previously: [venice-common][protocol] Java accessors for storageMode + externalStorageReadMode + external-table identity ([venice-common][controller][compact] Activate StoreMetaValue v44 / AdminOperation v99 + Java accessors #2817) — independent of this PR
  2. This PR — version-switch listener (independent; can land in parallel with [venice-common][controller][compact] Activate StoreMetaValue v44 / AdminOperation v99 + Java accessors #2817)
  3. Next: [fast-client] Store-config-change listener on StoreMetadata — depends on [venice-common][controller][compact] Activate StoreMetaValue v44 / AdminOperation v99 + Java accessors #2817 for ExternalStorageReadMode

Copilot AI review requested due to automatic review settings May 22, 2026 15:45
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a version-switch notification mechanism for the Fast Client metadata layer so consumers can react when the observed “current serving version” changes (without racy polling). In addition, the PR also includes venice-common API additions for external-storage-related fields (enums + Store/Version accessors) plus associated tests.

Changes:

  • Add StoreVersionSwitchListener and register/unregisterVersionSwitchListener to Fast Client StoreMetadata, implemented in AbstractStoreMetadata, and invoked from RequestBasedMetadata during metadata refresh.
  • Add external-storage enums (StorageMode, ExternalStorageReadMode, ExternalTableStatus) and wire new accessors into Store/Version implementations (ZKStore, VersionImpl, ReadOnlyStore, SystemStore, StoreInfo).
  • Add/extend unit tests for both the new listener plumbing and the external-storage accessors/enums.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
internal/venice-common/src/test/java/com/linkedin/venice/meta/TestZKStore.java Adds tests for externalStorageReadMode default/round-trip/clone behavior.
internal/venice-common/src/test/java/com/linkedin/venice/meta/TestVersion.java Adds tests for new Version external-storage fields (defaults/round-trip/clone).
internal/venice-common/src/test/java/com/linkedin/venice/meta/TestStoreInfo.java Adds tests for StoreInfo.externalStorageReadMode default/null-coercion/round-trip.
internal/venice-common/src/test/java/com/linkedin/venice/meta/StorageModeTest.java Adds enum mapping test for StorageMode.
internal/venice-common/src/test/java/com/linkedin/venice/meta/ExternalTableStatusTest.java Adds enum mapping test for ExternalTableStatus.
internal/venice-common/src/test/java/com/linkedin/venice/meta/ExternalStorageReadModeTest.java Adds enum mapping test for ExternalStorageReadMode.
internal/venice-common/src/main/java/com/linkedin/venice/meta/ZKStore.java Adds transient externalStorageReadMode field/accessors and copy-constructor propagation.
internal/venice-common/src/main/java/com/linkedin/venice/meta/VersionImpl.java Adds transient external-storage fields/accessors and clone propagation.
internal/venice-common/src/main/java/com/linkedin/venice/meta/Version.java Extends Version API with external-storage accessors.
internal/venice-common/src/main/java/com/linkedin/venice/meta/SystemStore.java Delegates getExternalStorageReadMode; rejects setter.
internal/venice-common/src/main/java/com/linkedin/venice/meta/StoreInfo.java Adds externalStorageReadMode field + JSON/POJO accessors and fromStore plumbing.
internal/venice-common/src/main/java/com/linkedin/venice/meta/Store.java Extends Store API with externalStorageReadMode accessors.
internal/venice-common/src/main/java/com/linkedin/venice/meta/StorageMode.java Introduces StorageMode enum with int mapping.
internal/venice-common/src/main/java/com/linkedin/venice/meta/ReadOnlyStore.java Delegates new Version getters and new Store getter; setters throw.
internal/venice-common/src/main/java/com/linkedin/venice/meta/ExternalTableStatus.java Introduces ExternalTableStatus enum with int mapping.
internal/venice-common/src/main/java/com/linkedin/venice/meta/ExternalStorageReadMode.java Introduces ExternalStorageReadMode enum with int mapping.
clients/venice-client/src/test/java/com/linkedin/venice/fastclient/meta/RequestBasedMetadataTest.java Adds tests validating listener firing on initial start and not firing when version unchanged.
clients/venice-client/src/test/java/com/linkedin/venice/fastclient/meta/AbstractStoreMetadataTest.java Adds unit tests for listener registration/dedup/unregister/exception isolation.
clients/venice-client/src/main/java/com/linkedin/venice/fastclient/meta/StoreVersionSwitchListener.java Adds new listener functional interface + contract docs.
clients/venice-client/src/main/java/com/linkedin/venice/fastclient/meta/StoreMetadata.java Adds default no-op register/unregister methods to the metadata interface.
clients/venice-client/src/main/java/com/linkedin/venice/fastclient/meta/RequestBasedMetadata.java Fires the version-switch callback when the refresh updates currentVersion.
clients/venice-client/src/main/java/com/linkedin/venice/fastclient/meta/AbstractStoreMetadata.java Implements listener registration and protected fireVersionSwitch(...).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

currentVersion.set(fetchedCurrentVersion);
clusterStats.updateCurrentVersion(fetchedCurrentVersion);
fetchedCurrentVersionPartitionResourceInCompletionRetries.set(0);
fireVersionSwitch(previousVersion, fetchedCurrentVersion);
/**
* @param previousVersion the version that was current before this refresh, or {@code -1} for the first refresh
* after client start
* @param newVersion the new current version observed by the metadata refresh; always {@code > 0}
Comment on lines +201 to +210
/**
* Store-level read routing for clients backed by both Venice local storage and a configured external storage
* system. Defaults to {@link ExternalStorageReadMode#VENICE_ONLY} (no external-storage involvement). Mirrors the
* {@code externalStorageReadMode} field staged on {@code StoreProperties} (StoreMetaValue v44) by PR #2814.
*
* <p>Field plumbing only at the OSS Venice layer; routing semantics live in proprietary client code.
*/
ExternalStorageReadMode getExternalStorageReadMode();

void setExternalStorageReadMode(ExternalStorageReadMode externalStorageReadMode);
Comment on lines +119 to +121
* Concrete subclass of {@link AbstractStoreMetadata} used purely to drive the listener-plumbing tests. All
* methods unrelated to version-switch notification are stubbed to throw, which guarantees that any future
* accidental dependency on them in a listener test would fail loudly rather than silently.
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from 6895d34 to 6898af6 Compare May 22, 2026 16:15
Copilot AI review requested due to automatic review settings May 22, 2026 16:31
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from 6898af6 to f7f8153 Compare May 22, 2026 16:31
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Comment on lines +201 to +210
/**
* Store-level read routing for clients backed by both Venice local storage and a configured external storage
* system. Defaults to {@link ExternalStorageReadMode#VENICE_ONLY} (no external-storage involvement). Mirrors the
* {@code externalStorageReadMode} field staged on {@code StoreProperties} (StoreMetaValue v44) by PR #2814.
*
* <p>Field plumbing only at the OSS Venice layer; routing semantics live in proprietary client code.
*/
ExternalStorageReadMode getExternalStorageReadMode();

void setExternalStorageReadMode(ExternalStorageReadMode externalStorageReadMode);
Comment on lines +758 to +776
/*
* Store-level externalStorageReadMode is staged on StoreMetaValue v44 (PR #2814) but the build.gradle pin is still
* v43, so the generated StoreProperties Avro record does not yet carry the field. Held as a transient Java member
* here until the pinning bump lands; at that point this accessor should read/write from
* {@code this.storeProperties.externalStorageReadMode} instead.
* Field plumbing only at the OSS Venice layer; routing semantics live in proprietary client code.
*/
private ExternalStorageReadMode externalStorageReadMode = ExternalStorageReadMode.VENICE_ONLY;

@Override
public ExternalStorageReadMode getExternalStorageReadMode() {
return externalStorageReadMode;
}

@Override
public void setExternalStorageReadMode(ExternalStorageReadMode externalStorageReadMode) {
this.externalStorageReadMode =
externalStorageReadMode == null ? ExternalStorageReadMode.VENICE_ONLY : externalStorageReadMode;
}
Comment on lines +503 to +519
/*
* Per-version storageMode is staged on StoreMetaValue v44 (PR #2814) but build.gradle still pins to v43, so the
* Avro-generated StoreVersion record does not yet carry the field. Held as a transient Java member here until the
* pin advances; at that point this accessor should read/write from {@code this.storeVersion.storageMode} instead.
* Field plumbing only at the OSS Venice layer; routing semantics live in proprietary client code.
*/
private StorageMode storageMode = StorageMode.INTERNAL;

@Override
public StorageMode getStorageMode() {
return storageMode;
}

@Override
public void setStorageMode(StorageMode storageMode) {
this.storageMode = storageMode == null ? StorageMode.INTERNAL : storageMode;
}
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch 2 times, most recently from 832ab2c to 0e68859 Compare May 22, 2026 17:46
Copilot AI review requested due to automatic review settings May 22, 2026 19:00
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from 0e68859 to cc0ec57 Compare May 22, 2026 19:00
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 2 comments.

{"name": "persistenceType", "type": "int", "default": 2, "doc": "Type of persistence storage engine, and default is 'ROCKS_DB'"},
{"name": "routingStrategy", "type": "int", "default": 0, "doc": "How to route the key to partition, and default is 'CONSISTENT_HASH'"},
{"name": "readStrategy", "type": "int", "default": 0, "doc": "How to read data from multiple replications, and default is 'ANY_OF_ONLINE'"},
{"name": "offlinePushStrategy", "type": "int", "default": 1, "doc": "When doing off-line push, how to decide the data is ready to serve, and default is 'WAIT_N_MINUS_ONE_REPLCIA_PER_PARTITION'"},
},
{"name": "accessControlled", "type": "boolean", "default": true, "doc": "Store-level ACL switch. When disabled, Venice Router should accept every request."},
{"name": "compressionStrategy", "type": "int", "default": 0, "doc": "Strategy used to compress/decompress Record's value, and default is 'NO_OP'"},
{"name": "clientDecompressionEnabled", "type": "boolean", "default": true, "doc": "le/Disable client-side record decompression (default: true)"},
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from cc0ec57 to bfa41fd Compare May 22, 2026 20:03
Copilot AI review requested due to automatic review settings May 22, 2026 20:52
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from bfa41fd to f5a3f5c Compare May 22, 2026 20:52
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 1 comment.

/**
* @param previousVersion the version that was current before this refresh, or {@code -1} for the first refresh
* after client start
* @param newVersion the new current version observed by the metadata refresh; always {@code > 0}
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from f5a3f5c to 5e5739a Compare May 22, 2026 21:03
Copilot AI review requested due to automatic review settings May 22, 2026 21:23
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from 5e5739a to 6825171 Compare May 22, 2026 21:23
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 2 comments.

currentVersion.set(fetchedCurrentVersion);
clusterStats.updateCurrentVersion(fetchedCurrentVersion);
fetchedCurrentVersionPartitionResourceInCompletionRetries.set(0);
fireVersionSwitch(previousVersion, fetchedCurrentVersion);
Comment on lines +4 to +6
* Callback notified whenever the Fast Client observes a change in a store's current serving version. Fires from
* within the metadata refresh loop after the new version has been committed to the local cache, so listeners may
* safely call {@link StoreMetadata#getCurrentStoreVersion()} and observe the new value.
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from 6825171 to 8d85170 Compare May 22, 2026 22:07
Copilot AI review requested due to automatic review settings May 22, 2026 22:17
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from 8d85170 to f793aa1 Compare May 22, 2026 22:17
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 25 changed files in this pull request and generated no new comments.

@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from f793aa1 to 920b311 Compare May 22, 2026 22:26
Copilot AI review requested due to automatic review settings May 22, 2026 22:30
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from 920b311 to e20a1a9 Compare May 22, 2026 22:30
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 25 changed files in this pull request and generated 1 comment.

Comment on lines 504 to +507
currentVersion.set(fetchedCurrentVersion);
clusterStats.updateCurrentVersion(fetchedCurrentVersion);
fetchedCurrentVersionPartitionResourceInCompletionRetries.set(0);
fireVersionSwitch(previousVersion, fetchedCurrentVersion);
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from e20a1a9 to 3a7bfba Compare May 22, 2026 22:49
Copilot AI review requested due to automatic review settings May 22, 2026 23:01
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from 3a7bfba to 684f621 Compare May 22, 2026 23:01
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 1 comment.

Comment on lines +4 to +8
* Callback notified whenever the Fast Client observes a change in a store's current serving version. Fires from
* within the metadata refresh loop after the new version has been committed to the local cache, so listeners may
* safely call {@link StoreMetadata#getCurrentStoreVersion()} and observe the new value.
*
* <p>Implementations must be thread-safe and short-running; the metadata refresh thread is shared across all reads
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from 684f621 to 318fff9 Compare May 23, 2026 00:38
Copilot AI review requested due to automatic review settings May 23, 2026 00:59
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from 318fff9 to 493947b Compare May 23, 2026 00:59
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.

/**
* Store-level read routing for clients backed by both Venice local storage and a configured external storage
* system. Defaults to {@link ExternalStorageReadMode#VENICE_ONLY} (no external-storage involvement). Mirrors the
* {@code externalStorageReadMode} field staged on {@code StoreProperties} (StoreMetaValue v44) by PR #2814.
Comment on lines +334 to +335
* Defaults to {@link StorageMode#INTERNAL}. Mirrors the {@code storageMode} field staged on {@code StoreVersion}
* (StoreMetaValue v44) by PR #2814.
/**
* Store-level read routing for clients backed by both Venice local storage and a configured external storage system.
* Applies to the store as a whole, not per-version. Mirrors the {@code externalStorageReadMode} field staged on
* {@code StoreProperties} (StoreMetaValue v44) by PR #2814.
Comment on lines +10 to +11
* external storage system. Mirrors the {@code storageMode} field staged on {@code StoreVersion} (StoreMetaValue v44)
* by PR #2814.
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from 493947b to e8d4541 Compare May 23, 2026 22:49
ymuppala and others added 3 commits May 23, 2026 15:54
…dminOperation v99 + Java accessors

Predecessor PR (the schema PR in this stack) stages StoreMetaValue v44 and
AdminOperation v99 with these external-storage-related fields:

  * StoreVersion: storageMode, externalDbName, externalTableName (per-version,
    all newly-added in v44).
  * StoreProperties: externalStorageReadMode (store-level, per-store).
  * UpdateStore (admin op): storageMode, externalStorageReadMode,
    externalDbName, externalTableName (mirrors for operator-override).

build.gradle's `versionOverrides` list still forces the Avro compiler to use
v43 / v98 instead of letting it pick the highest-numeric directory, so v44 /
v99 are not yet wire-real. Downstream consumers (Fast Client metadata
refresh, controller, server, VPJ producers) need typed Java accessors to
start integrating against.

Drop the `versionOverrides` entries so the Avro compiler picks the latest
schema directories (v44 / v99) naturally — that's exactly the
remove-the-override step the comment above `versionOverrides` describes
("remove the override in a follow-up PR when actually using the new
protocol"). Bump `AvroProtocolDefinition` constants in lockstep so the
in-process protocol version matches the compiled schema. Add Java accessors
that read/write directly from the now-Avro-backed `StoreProperties` /
`StoreVersion` records (no transient POJO mirroring), so updates round-trip
through ZK / metadata-system-store serialization.

- `build.gradle`: remove the `versionOverrides` entries for StoreMetaValue
  and AdminOperation. The Avro generator now selects v44 / v99 by the
  default max-numeric rule.
- `AvroProtocolDefinition`: bump constants
  (`METADATA_SYSTEM_SCHEMA_STORE` 43 -> 44; `ADMIN_OPERATION` 98 -> 99)
  so the in-process protocol version matches the compiled schema.

- New enums (`VeniceEnumValue`, `EnumUtils`-backed `valueOf`):
  * `StorageMode { INTERNAL, DUAL_WRITE, EXTERNAL }` — mirrors
    `StoreVersion.storageMode`.
  * `ExternalStorageReadMode { VENICE_ONLY, DUAL_MODE_CONSISTENCY_CHECK,
    DUAL_MODE_EARLY_RETURN, EXTERNAL_ONLY }` — mirrors
    `StoreProperties.externalStorageReadMode`.

- Store-level Java accessor backed by `StoreProperties` Avro record:
  * `Store.get/setExternalStorageReadMode` → `storeProperties.externalStorageReadMode`.

  Implemented on `ZKStore` (int <-> enum). `ReadOnlyStore` delegates the
  getter and throws UOE on the setter; `SystemStore` does the same via its
  existing `throwUnsupportedOperationException` helper. `StoreInfo.fromStore`
  mirrors the field for JSON.

- Version-level Java accessors backed by `StoreVersion` Avro record:
  * `Version.get/setStorageMode` → `storeVersion.storageMode` (int <-> enum).
  * `Version.get/setExternalDbName` → `storeVersion.externalDbName`
    (null-coerces to "NOT_SPECIFIED", the schema default).
  * `Version.get/setExternalTableName` → `storeVersion.externalTableName`
    (null-coerces to "NOT_SPECIFIED", the schema default).

  Implemented on `VersionImpl`; `cloneVersion` propagates all three through
  the Avro-record copy. `ReadOnlyVersion` delegates getters and throws UOE
  on setters.

- `AdminMessageType.UPDATE_STORE.getNewInstance()` initializes the two new
  `UpdateStore` string fields (externalDbName, externalTableName) to "" at
  construction so the Avro generic writer does not NPE on null values. The
  Avro schema default for these fields on UpdateStore is "" (the "no-op for
  this admin op" sentinel), which is only applied at deserialization time;
  explicit initialization is required so existing controller code paths
  that call `getNewInstance()` and then conditionally set fields can
  serialize the result without populating every new field.

- [ ] Added new code behind a config. (no.)
- [ ] Introduced new log lines. (no.)

- [x] No race conditions: new fields are plain Avro-record fields on the
      existing storeProperties / storeVersion containers, accessed under
      the same synchronization invariants as the fields they sit alongside.
- [x] No new synchronization primitives required.
- [x] No blocking calls introduced.
- [x] No new collections; only enum singletons and a primitive-or-enum-or-String
      field per type.
- [x] No new threading.

- [x] New unit tests added: `ExternalStorageReadModeTest`, `StorageModeTest`
      (`VeniceEnumValueTest`-based int-mapping coverage); new test methods
      in `TestStoreInfo`, `TestZKStore`, `TestVersion` for defaults /
      round-trip / clone preservation / null coercion for storageMode,
      externalStorageReadMode, externalDbName, externalTableName; plus
      "persists through Avro data model" tests on `TestZKStore` and
      `TestVersion` that read the value back through `dataModel()` to
      guard against a regression to transient POJO mirroring.
- [x] Modified or extended existing tests: yes (`TestStoreInfo` /
      `TestZKStore` / `TestVersion`).
- [x] Verified the controller-side createStore admin-op flow no longer
      NPEs on the new UpdateStore fields (TestVeniceParentHelixAdmin
      passes locally; previously failed with NPE on
      `externalDbName.toString()` during AdminOperationSerializer write).
- [x] Verified backward compatibility: the v44 schema is forward-compatible
      with v43 because the new fields have defaults; downgrading a process
      that writes v44 records to one that reads v43 records works as long
      as both sides treat unknown fields as ignorable, which Avro does for
      record fields with defaults.

Activating the v44 / v99 schemas changes what the controller writes to ZK
and what controllers exchange via the admin topic. All Venice services
must be redeployed at this version before any caller starts setting
`storageMode` / `externalStorageReadMode` / `externalDbName` /
`externalTableName` to non-default values, otherwise a v43-only service
performing a read-modify-write could silently drop the new fields.

- [x] No (defaults preserve existing behavior; new fields are inert until
      callers opt in).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Problem Statement
Fast Client consumers that wrap the Venice client (e.g. shadow-clients
maintaining a parallel cache, version-aware routing layers) currently have
no way to be notified when the metadata refresh observes a new current
serving version. They have to poll `getCurrentStoreVersion()` from outside
the refresh loop, which is racy with respect to the rest of the cache
(replicas, partition count, schemas) being updated atomically inside
`RequestBasedMetadata.updateCache`.

## Solution
Add a registration-style callback that fires from inside the refresh loop
after the new current version has been committed to the local cache, so
listeners may safely read the rest of the cache and observe the new value.

- New `StoreVersionSwitchListener` functional interface:
  `onVersionSwitch(int previousVersion, int newVersion)`. First refresh
  after client start delivers `previousVersion = -1` as a sentinel.

- `StoreMetadata` gains `register/unregisterVersionSwitchListener` with
  `default {}` no-op bodies so existing test fakes / mocks aren't forced
  to implement them.

- `AbstractStoreMetadata` owns the listener list (`CopyOnWriteArrayList`
  for rare-writer / frequent-reader access) and exposes a protected
  `fireVersionSwitch(prev, next)` that catches per-listener `Throwable`
  and logs at ERROR, so one buggy listener cannot break the refresh
  thread or starve sibling listeners.

- `RequestBasedMetadata.updateCache` captures `currentVersion.get()`
  immediately before `.set(fetchedCurrentVersion)`, then calls
  `fireVersionSwitch(previousVersion, fetchedCurrentVersion)` after the
  stat update. The fire happens only on the branch that actually accepts
  a new current version (i.e. when `whetherToSwitchToFetchedCurrentVersion`
  is true), so transitions that get deferred for partition-resource
  readiness do not generate spurious callbacks.

### Code changes
- [ ] Added new code behind a config. (no)
- [ ] Introduced new log lines. (one ERROR-level log on listener exception in
      `AbstractStoreMetadata.fireVersionSwitch`. Bounded by listener count, fires
      only on bad listener — no rate limiting needed.)

### Concurrency-Specific Checks
- [x] No race conditions: listeners are stored in a `CopyOnWriteArrayList`; the
      iterator in `fireVersionSwitch` is snapshot-stable across concurrent
      register/unregister calls.
- [x] Synchronization: registration is intrinsically thread-safe via
      `CopyOnWriteArrayList.addIfAbsent` / `remove`; fire walks the snapshot
      iterator without holding any lock.
- [x] No blocking calls inside critical sections — `fireVersionSwitch` is
      called from `RequestBasedMetadata.updateCache` which is `synchronized`,
      but the listener iteration itself acquires no further locks. Listener
      contracts (Javadoc) require short, non-blocking work; exceptions are
      isolated per listener.
- [x] Thread-safe collection used (`CopyOnWriteArrayList`).
- [x] Validated proper exception handling — per-listener `try/catch (Throwable)`
      ensures one bad listener cannot kill the refresh thread.

## How was this PR tested?
- [x] New unit tests added: `AbstractStoreMetadataTest` (8 cases: register+fire,
      no-listeners-safe, multi-listener fan-out, exception isolation, unregister,
      dedup on double-register, null-reject on register, null/unknown tolerance
      on unregister).
- [x] Modified or extended existing tests: yes — `RequestBasedMetadataTest`
      gains two wiring tests (listener fires on initial `start()` with sentinel
      `-1`; no fire when a subsequent `updateCache` returns the same version).
- [x] Verified backward compatibility: yes — the two new `StoreMetadata` methods
      are `default` no-ops, so existing implementations of `StoreMetadata`
      compile and run unchanged.

## Does this PR introduce any user-facing or breaking changes?
- [x] No.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 24, 2026 03:25
@ymuppala ymuppala force-pushed the ymuppala/phase-1-1-version-switch-callback branch from e8d4541 to 17f1a2e Compare May 24, 2026 03:25
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.

Comment on lines +64 to +72
* <p>Default implementations of {@link StoreMetadata} (e.g. test fakes) treat this as a no-op.
*/
default void registerVersionSwitchListener(StoreVersionSwitchListener listener) {
}

/**
* Remove a previously registered version-switch listener. No-op if the listener was never registered.
*/
default void unregisterVersionSwitchListener(StoreVersionSwitchListener listener) {
Comment on lines +119 to +121
* Concrete subclass of {@link AbstractStoreMetadata} used purely to drive the listener-plumbing tests. All
* methods unrelated to version-switch notification are stubbed to throw, which guarantees that any future
* accidental dependency on them in a listener test would fail loudly rather than silently.
/**
* Store-level read routing for clients backed by both Venice local storage and a configured external storage
* system. Defaults to {@link ExternalStorageReadMode#VENICE_ONLY} (no external-storage involvement). Mirrors the
* {@code externalStorageReadMode} field staged on {@code StoreProperties} (StoreMetaValue v44) by PR #2814.
Comment on lines +334 to +335
* Defaults to {@link StorageMode#INTERNAL}. Mirrors the {@code storageMode} field staged on {@code StoreVersion}
* (StoreMetaValue v44) by PR #2814.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants