Skip to content

perf(ebean-dao): move SharedSchemaCache pre-warm to constructor#618

Merged
ybz1013 merged 1 commit into
linkedin:masterfrom
mridul111998:mrsharma/slo-fix
May 6, 2026
Merged

perf(ebean-dao): move SharedSchemaCache pre-warm to constructor#618
ybz1013 merged 1 commit into
linkedin:masterfrom
mridul111998:mrsharma/slo-fix

Conversation

@mridul111998
Copy link
Copy Markdown
Contributor

@mridul111998 mridul111998 commented May 6, 2026

Summary

PR #615 introduced SharedSchemaCache and added pre-warm calls inside
EbeanLocalAccess.ensureSchemaUpToDate(). In deploys where schema
migrations are run as a separate job (LinkedIn's prod pattern, and
likely others), ensureSchemaUpToDate() is gated off — so pre-warm
never runs and the first request after JVM startup pays the inline
information_schema query cost.

In production we observed:

  • p99 latency spikes of 700–950 ms on getWithContext correlated with
    Inline schema cache miss: loading columns for table 'metadata_entity_<entity>'
    log lines firing inside the request path.
  • The background refresh thread was started but registeredTables
    was empty, so the 9-minute refresh logged
    Background schema cache refresh: 0 tables and accomplished nothing.

This PR moves validator.registerAndPreWarm(...) from
ensureSchemaUpToDate() to the EbeanLocalAccess constructor, where
it always runs. EbeanLocalAccess is only instantiated for
NEW_SCHEMA_ONLY and DUAL_SCHEMA DAOs (the ones that actually use
SharedSchemaCache), so the change is naturally scoped.
refreshTable() already has its own try/catch — a transient DB
hiccup during construction logs a warning rather than failing boot,
so this is safe to run unconditionally at construction time.

Also drops the unused tableColumns field (dead code, replaced by
the shared cache in #615).

Two supporting changes

  1. SharedSchemaCache.clearRegistry() now also calls clearCaches()
    on each held instance
    before wiping the REGISTRY map.
    Previously, clearing only the static map left stale entries
    readable through any reference (e.g. EbeanLocalAccess.validator)
    that callers were already holding — a footgun the
    @VisibleForTesting name didn't suggest.

  2. EbeanLocalDAOTest.addIndex() helper now calls
    clearRegistry() after its dynamic ALTER TABLE ADD COLUMN.

    With pre-warm running in the constructor, the cache snapshots the
    schema before addIndex modifies it; the explicit invalidation
    keeps subsequent columnExists()/indexExists() lookups in sync
    with the actual table state. (The previous behavior worked by
    accident, because lazy initialization populated the cache only on
    first read — which happened to be after addIndex.)

Testing Done

Tested on metadata-graph-assets (LinkedIn's MGA service) in
corp-lva1.

Setup

  1. Built this branch as datahub-gma:0.6.185 and published to local
    maven:
    ./gradlew -Dmaven.repo.local=$HOME/local-repo -Dorg.gradle.parallel=false \
      -Pversion=0.6.185 publishToMavenLocal -x test
    
  2. Bumped metadata-models's product-spec.json to consume
    datahub-gma:0.6.185 and ran mint build && mint snapshot && mint release,
    producing metadata-models:277.0.15-SNAPSHOT.
  3. Bumped MGA's product-spec.json to consume
    metadata-models:277.0.15-SNAPSHOT and deployed to a local QEI
    instance via mint debug.

Result with the fix (snapshot deploy)

Background schema cache refresh: <N> tables fires for every
SharedSchemaCache singleton (one per EbeanServer/dbUrl). Each
non-zero count is direct evidence that registerAndPreWarm()
populated registeredTables at construction time:

2026/05/05 21:19:14.570 INFO [SharedSchemaCache] Background schema cache refresh: 14 tables
2026/05/05 21:19:27.472 INFO [SharedSchemaCache] Background schema cache refresh: 10 tables
2026/05/05 21:19:45.273 INFO [SharedSchemaCache] Background schema cache refresh: 14 tables
2026/05/05 21:19:45.273 INFO [SharedSchemaCache] Background schema cache refresh: 10 tables
2026/05/05 21:19:58.078 INFO [SharedSchemaCache] Background schema cache refresh: 52 tables
2026/05/05 21:20:29.713 INFO [SharedSchemaCache] Background schema cache refresh: 2 tables

Total: 78 entity tables registered across 4 databases (14 + 10 + 52 + 2).

After redeploy with the fix, Inline schema cache miss no longer
appears post-boot
for the request path.

Result on master (without the fix)

After re-deploying master MGA (using metadata-models:277.0.12
datahub-gma:0.6.183, which has SharedSchemaCache but pre-warm
gated behind ensureSchemaUpToDate()):

$ grep 'Background schema cache refresh' metadata-graph-assets.log
$ # no output — registeredTables is empty

The A/B difference confirms the constructor pre-warm is the change
that actually populates the cache in production deploys where
ensureSchemaUpToDate() is disabled.

Unit test verification

Locally ran the previously-failing EbeanLocalDAOTest cases that
exposed the pre-warm interaction with dynamic ALTER TABLE: all pass
with the test helper change.

Checklist

cc @yanbZhao (original author of #615)

PR linkedin#615 introduced SharedSchemaCache with pre-warm calls inside
EbeanLocalAccess.ensureSchemaUpToDate(). In deploys where schema
migrations run as a separate job (LinkedIn's prod pattern, and likely
others), ensureSchemaUpToDate() is gated off, so pre-warm never runs
and the first request after JVM startup pays the inline
information_schema query cost (the "Inline schema cache miss" log
line).

Moves the pre-warm calls to the EbeanLocalAccess constructor so they
always run, regardless of whether schema evolution is enabled.
refreshTable() already has its own try/catch, so a transient DB
hiccup during construction logs a warning rather than failing boot.

Also drops the unused tableColumns field (dead code, replaced by the
shared cache in linkedin#615).

Two supporting changes:

1. SharedSchemaCache.clearRegistry() now also calls clearCaches() on
   each held instance before wiping the REGISTRY map. Previously
   clearing only the static map left stale entries readable through
   any reference (e.g. EbeanLocalAccess.validator) that callers were
   already holding -- a footgun that the @VisibleForTesting name did
   not suggest.

2. EbeanLocalDAOTest.addIndex() helper now calls clearRegistry()
   after its dynamic ALTER TABLE ADD COLUMN. With pre-warm now
   running in the constructor, the cache snapshots the schema before
   addIndex modifies it; the explicit invalidation keeps subsequent
   columnExists()/indexExists() lookups in sync with the actual
   table state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@ybz1013 ybz1013 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the fix!

@ybz1013 ybz1013 merged commit 4a190b9 into linkedin:master May 6, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants