docs: add ADR 0009 for static-context cache-first local persistence#75
docs: add ADR 0009 for static-context cache-first local persistence#75jonathannorris wants to merge 39 commits intomainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces ADR 0009, which proposes that OFREP static-context providers persist their last successful bulk evaluation in local storage by default to enable cache-first initialization. This change aims to eliminate the 'flash-of-defaults' problem and improve offline resilience for web and mobile applications. The review feedback suggests including a version field in the storage schema for better maintainability, recommending the use of platform-specific secure storage to mitigate security risks, and establishing a default TTL for persisted entries to prevent the use of excessively stale data.
There was a problem hiding this comment.
Pull request overview
Adds a new Architecture Decision Record (ADR-0009) describing cache-first startup with local persistence for OFREP static-context providers, to preserve last-known flag evaluations across restarts/offline startup and align provider lifecycle events with OpenFeature expectations.
Changes:
- Introduces ADR 0009 defining persisted cache contents (payload, ETag, cache-key hash, write time).
- Documents cache-hit vs cache-miss initialization flows, including background refresh and provider lifecycle/event mapping.
- Adds implementation notes and open questions (multi-context caching, TTL).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Jonathan Norris <jonathannorris@users.noreply.github.com> Signed-off-by: Norris <jonathan.norris@dynatrace.com> Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Co-authored-by: Jonathan Norris <jonathannorris@users.noreply.github.com> Signed-off-by: Norris <jonathan.norris@dynatrace.com> Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Clarify ADR 0009 with provider behavior, persistence examples, and implementation guidance for local cached bulk evaluations. Signed-off-by: Norris <jonathan.norris@dynatrace.com> Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Clarify initialization flow, explain the persisted timestamp, and define temporary server failures as eligible for persisted fallback. Signed-off-by: Norris <jonathan.norris@dynatrace.com> Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Specify the cacheKeyHash formula and restore explicit open questions for reviewer feedback. Signed-off-by: Norris <jonathan.norris@dynatrace.com> Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Document an explicit provider option for turning off persisted local storage. Signed-off-by: Norris <jonathan.norris@dynatrace.com> Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Drop authToken from cache key derivation and replace sha256 with generic hash(), per reviewer feedback. Signed-off-by: Norris <jonathan.norris@dynatrace.com> Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
…tion Specify CACHED as the evaluation reason when serving from persisted storage. Remove fallback scope open question since the decision section already addresses it. Signed-off-by: Norris <jonathan.norris@dynatrace.com> Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Use must not for auth/config error fallback to prevent masking real problems. Signed-off-by: Norris <jonathan.norris@dynatrace.com> Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Local storage availability is a platform constraint, not a consequence of the proposal. Signed-off-by: Norris <jonathan.norris@dynatrace.com> Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Specify that flag values are stored in plaintext and accessible to same-origin code or compromised devices. Signed-off-by: Norris <jonathan.norris@dynatrace.com> Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
The specific storage key and record model are implementation details. Signed-off-by: Norris <jonathan.norris@dynatrace.com> Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Remove redundant implementation notes that overlap with the decision section. Simplify mermaid diagram initialize call to use context. Signed-off-by: Norris <jonathan.norris@dynatrace.com> Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Replace fallback-on-failure with cache-first initialization pattern aligned with vendor SDKs (LaunchDarkly, Statsig, DevCycle, Eppo). Provider loads from persisted cache immediately on startup, refreshes from network in background, and emits PROVIDER_CONFIGURATION_CHANGED when fresh values arrive. Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Fix PROVIDER_FATAL to PROVIDER_ERROR with fatal error code per spec. Add rationale for READY vs STALE on cache-hit startup. Clarify cache key tradeoff (targetingKey vs full context). Note existing provider implementations will need lifecycle refactors. Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
On the cache-hit path, if the background refresh fails with 401/403/400, the provider continues serving cached values for the current session but clears the persisted entry. This ensures the next cold start uses the cache-miss path, making auth errors immediately visible. Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
…rt behavior Providers should cancel in-flight background refresh when onContextChanged() is called. Document that cache-first only applies after the first successful evaluation is persisted. Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
…e URL" This reverts commit b69eafd. Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
… for prefixing the cache key (#74) Signed-off-by: Jason Salaber <jason.salaber@dynatrace.com> Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Don't clear the persisted cache on 401/403/400 errors. The cache TTL is responsible for eventual expiry. This avoids degrading subsequent cold starts to defaults while auth errors are investigated. Aligns with vendor SDK behavior (DevCycle keeps cache through auth errors with a 30-day TTL). Moved TTL from open question to implementation recommendation. Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Move cacheKeyPrefix from open question to decision. Providers support an optional cacheKeyPrefix config option; when set, the cache key becomes hash(cacheKeyPrefix + targetingKey). Standardize terminology across the ADR. Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Add version field to persisted entry example JSON for schema versioning. Rename ADR file from camelCase to kebab-case to match convention. Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
8b7ee7b to
adac228
Compare
Use hash(cacheKeyPrefix + ":" + targetingKey) to avoid ambiguous concatenation where different prefix/key pairs produce the same hash. Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
If persisting fails, continue with fresh in-memory values. The old persisted entry remains for the next cold start. Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
toddbaert
left a comment
There was a problem hiding this comment.
I approved #64 previously. Reviewed the diff between the two; the key changes I see:
cacheKeyPrefixis now folded into the hash itself 👍- Auth/config errors (401/403/400) no longer clear the persisted cache; TTL handles eventual expiry instead
- TTL is promoted from an open question to a recommendation
- storage write failure stuff (log and continue)
All of these seem like improvements to me. LGTM.
The flickering concern is still a notable tradeoff. cc @beeme1mr
beeme1mr
left a comment
There was a problem hiding this comment.
Overall, I like the proposal. I just kept thinking about how I'd prefer it if the Provider evaluated the flags before falling back to the cached values. It would be nice if that were at least a configuration option. The reason I think this may be important is that some apps may not respond to flag changes and therefore continue using the old values for the duration of the session. Even ones that do react run the risk of the screen jitting when flag values change shortly after page load.
Replace the disableLocalCache boolean with a three-value cacheMode enum: cache-first (default), network-first, and disabled. network-first awaits the initial network request and falls back to cached values only on network errors or 5xx responses, which fits SPA use cases that block rendering on flag evaluation. Document timeoutMs tuning guidance for network-first, add alternatives considered, and update initialization flow and mermaid diagram notes. Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
|
Pushed some updates after a discussion with @beeme1mr on the network-first use case. Summary of changes For mobile and most web apps, local-cache-first is clearly the right default (eliminates the flash-of-defaults problem, matches vendor SDK behavior). But for some web use cases, particularly SPAs that already block rendering behind a splash screen, it makes more sense to block initialization on a fresh evaluation rather than flash cached values that might shift under the user. To support both patterns, the ADR now defines a single
The earlier Other tweaks
|
…ehavior Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
…llback Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
…agram Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
…iagrams Replace the single nested-alt diagram with three focused ones: cache hit with background refresh, cache miss, and the normal polling cycle. Addresses review feedback that nested alts were hard to read. Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
…TTL invariant Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
…g errors Per review feedback, the cache-hit path should both log the error and emit a PROVIDER_ERROR event when a background refresh fails with auth or config errors. Updates the decision text, cache matching section, and the sequence diagram. Signed-off-by: Jonathan Norris <jonathan.norris@dynatrace.com>
Summary
PROVIDER_READY,PROVIDER_CONFIGURATION_CHANGED)cacheModeoption withlocal-cache-first(default),network-first, anddisabledvalues so applications can opt into blocking-on-fresh-evaluation when appropriate (e.g., SPAs)hash(targetingKey), orhash(cacheKeyPrefix + ":" + targetingKey)when acacheKeyPrefixis configured for multi-provider deploymentsMotivation
Every major vendor SDK (LaunchDarkly, Statsig, DevCycle, Eppo) uses local-cache-first initialization by default. Current OFREP static-context providers keep their cache in memory only, losing all state on restart. See vendor mobile SDK caching research for a detailed comparison.
Some applications (most notably SPAs that already block rendering behind a splash screen) prefer to wait for a fresh evaluation rather than risk cached values shifting under the user. The
cacheMode: "network-first"option supports that pattern while still using cached values as a fallback when the network is unavailable.Notes
This PR replaces #64 which was closed due to a branch rename. All review feedback from #64 has been addressed.
Related
Test plan