
feat: disk-based-persistence [android] #17

Merged
choudlet merged 22 commits into main from chrish/sc-36793/story-disk-backed-event-persistence
Apr 7, 2026

@choudlet
Collaborator

Summary

Implements disk-backed event persistence for the Android SDK. Events are kept in memory as the primary queue; disk acts as a safety net for app backgrounding, termination, and buffer overflow.

  • PersistableEventQueue extends EventQueue (inheritance), adding disk persistence with a write-behind model
  • EventDiskStore handles atomic file I/O with versioned JSON snapshots to noBackupFilesDir
  • QueueSnapshot is a @Serializable envelope with version + events fields
  • Resilient per-event deserialization: individual corrupt events are skipped, valid events preserved
  • Dual capacity cap: 2000 events OR 5MB (whichever is hit first), drop-oldest overflow
  • Flush threshold: 500 events OR 2MB triggers disk write
  • Rehydration: once per process via AtomicBoolean gate, disk file deleted after load
  • Flush triggers: app background (ProcessLifecycleOwner ON_STOP), best-effort terminate (onTrimMemory), threshold, explicit
  • Encryption at rest: deferred to v2 (documented conscious choice)
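
The once-per-process rehydration gate from the list above could be sketched roughly as follows. This is a minimal illustration, not the SDK's code: `RehydrationGate`, `loadFromDisk`, and `deleteSnapshot` are hypothetical names, and events are modeled as plain strings.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for the rehydration path; all names are illustrative.
class RehydrationGate(
    private val loadFromDisk: () -> List<String>,
    private val deleteSnapshot: () -> Unit,
) {
    private val rehydrated = AtomicBoolean(false)

    /** Returns the persisted events on the first call and an empty list on every later call. */
    fun rehydrateOnce(): List<String> {
        // compareAndSet flips false -> true for exactly one caller, even under contention.
        if (!rehydrated.compareAndSet(false, true)) return emptyList()
        val events = loadFromDisk()
        deleteSnapshot() // the snapshot has been consumed, so it must not be replayed later
        return events
    }
}
```

Using `compareAndSet` rather than a separate read and write means two threads racing into initialization cannot both load the same snapshot.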

Ticket

https://app.shortcut.com/metarouter/story/36793

New Files

  • storage/QueueSnapshot.kt — Serializable snapshot envelope
  • storage/EventDiskStore.kt — File I/O for queue snapshots
  • queue/PersistableEventQueue.kt — Persistence-aware queue

Modified Files

  • queue/EventQueue.kt — Made class and methods open for subclassing
  • MetaRouterAnalyticsClient.kt — Wired PersistableEventQueue with lifecycle hooks

Cross-Platform Contract

Aligns with iOS and React Native implementations per the shared contract. Addresses the Android-specific gap: resilient per-event deserialization using manual JsonArray iteration instead of atomic decodeFromString.
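
As a rough illustration of per-event resilience, the sketch below decodes elements one at a time and skips the ones that fail. It is generic on purpose: `decodeResilient` and `DecodeResult` are hypothetical names, and the `decode` lambda stands in for whatever per-element decoder is applied to each JsonArray entry.

```kotlin
// Generic sketch: `decode` stands in for a per-element decoder (for example, one JsonArray entry).
data class DecodeResult<T>(val events: List<T>, val skipped: Int)

fun <T> decodeResilient(rawEvents: List<String>, decode: (String) -> T): DecodeResult<T> {
    val decoded = ArrayList<T>(rawEvents.size)
    var skipped = 0
    for (raw in rawEvents) {
        try {
            decoded.add(decode(raw))
        } catch (e: Exception) {
            skipped++ // one bad element must not discard the rest of the snapshot
        }
    }
    return DecodeResult(decoded, skipped)
}
```

Decoding the whole list in one call would fail as a unit, so a single corrupt element would cost every event in the snapshot. Iterating keeps the failure local to that element.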

Test plan

  • All 792 existing + new tests pass (0 failures)
  • QueueSnapshotTest — serialization roundtrip, version handling (5 tests)
  • EventDiskStoreTest — read/write/delete, corrupt file handling, per-event resilience (11 tests)
  • PersistableEventQueueTest — enqueue, drain, flush, rehydrate, capacity, threading, e2e lifecycle (29 tests)
  • PersistableEventQueueBenchmarkTest — 10K enqueue + flushToDisk performance (2 tests)
  • All existing EventQueueTest tests pass unchanged (regression check)
  • All existing MetaRouterAnalyticsClientTest tests pass unchanged (wiring regression check)

🤖 Generated with Claude Code

choudlet and others added 7 commits March 26, 2026 15:07
Prepare EventQueue for PersistableEventQueue subclass by making the class,
its backing ArrayDeque, and all public methods open.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Versioned JSON envelope wrapping List<EnrichedEventPayload> for disk
queue snapshots. Version field enables forward-compatible schema migration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reads/writes versioned JSON queue snapshots to noBackupFilesDir.
Atomic write-then-rename prevents partial writes. Unrecognized schema
versions and corrupted files are deleted with warning logs.

Implements resilient per-event deserialization: individual events that
fail to deserialize are skipped while valid events are preserved. This
addresses the cross-platform contract requirement for graceful handling
of partially corrupt snapshots.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
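
A write-then-rename step of the kind described in this commit might look like the sketch below. It assumes `java.nio.file` is available (API 26+ on Android) and uses a hypothetical `.tmp` sibling file. The commit message does not say which rename API EventDiskStore uses, so `Files.move` is an assumption here.

```kotlin
import java.io.File
import java.nio.file.Files
import java.nio.file.StandardCopyOption

// Sketch only: writes to a temporary sibling file, then renames it over the target.
fun writeAtomically(target: File, contents: String) {
    val tmp = File(target.parentFile, target.name + ".tmp")
    tmp.writeText(contents, Charsets.UTF_8)
    // On POSIX file systems this maps to rename(2), which replaces the target in one step,
    // so a reader sees either the old snapshot or the new one, never a partial file.
    Files.move(
        tmp.toPath(),
        target.toPath(),
        StandardCopyOption.REPLACE_EXISTING,
        StandardCopyOption.ATOMIC_MOVE,
    )
}
```

If the process dies during `writeText`, only the temporary file is damaged and the previous snapshot stays readable.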
Write-behind queue that keeps all enqueue/drain operations in memory.
Disk flushes are full-overwrite snapshots triggered by lifecycle events
or flush threshold. Rehydrates once per process. Enforces combined
2000-event/5MB capacity cap with drop-oldest overflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
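
The combined cap with drop-oldest overflow can be sketched as below. `SizedEvent` and `BoundedEventBuffer` are hypothetical names, the event size is passed in rather than measured, and reading 5MB as 5 * 1024 * 1024 bytes is an assumption.

```kotlin
// Hypothetical event type: only the serialized size matters for this sketch.
data class SizedEvent(val id: Int, val sizeBytes: Long)

class BoundedEventBuffer(
    private val maxEvents: Int = 2000,
    private val maxBytes: Long = 5L * 1024 * 1024,
) {
    private val events = ArrayDeque<SizedEvent>()
    private var totalBytes = 0L

    /** Appends the event, then evicts from the head until both caps hold. Returns the number dropped. */
    @Synchronized
    fun enqueue(event: SizedEvent): Int {
        events.addLast(event)
        totalBytes += event.sizeBytes
        var dropped = 0
        while (events.size > maxEvents || totalBytes > maxBytes) {
            val oldest = events.removeFirst()
            totalBytes -= oldest.sizeBytes
            dropped++
        }
        return dropped
    }

    /** Returns a copy of the current contents, oldest first. */
    @Synchronized
    fun snapshot(): List<SizedEvent> = events.toList()
}
```

Tracking a running byte total keeps the cap check constant-time per enqueue instead of re-measuring the whole queue. One consequence of this particular sketch is that a single event larger than the byte cap evicts everything, including itself.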
SDK now creates PersistableEventQueue by default, rehydrates from disk
on first init, flushes to disk on background and app terminate (best-effort
via onTrimMemory), checks flush threshold after each enqueue.
EventDiskStore is injectable for testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
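
The threshold check that runs after each enqueue could be as small as the sketch below. `shouldFlushToDisk` is a hypothetical helper. The 500-event and 2MB defaults come from the PR summary, while the inclusive comparison and the 1024-based byte count are assumptions.

```kotlin
// Sketch of the "500 events OR 2MB" trigger; either condition alone is enough.
fun shouldFlushToDisk(
    eventCount: Int,
    totalBytes: Long,
    maxEvents: Int = 500,
    maxBytes: Long = 2L * 1024 * 1024,
): Boolean = eventCount >= maxEvents || totalBytes >= maxBytes
```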
Covers full session lifecycle, drain-before-flush correctness,
no-disk-touch in normal sessions, multi-cycle persistence, and
flush-overwrite deduplication.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Validates that enqueue remains memory-only and performant with 10,000
1KB events. Also benchmarks flushToDisk for 2,000 events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Mar 26, 2026

Test Results

 56 files  +  8   56 suites  +8   55s ⏱️ +4s
412 tests + 60  412 ✅ + 60  0 💤 ±0  0 ❌ ±0 
824 runs  +120  824 ✅ +120  0 💤 ±0  0 ❌ ±0 

Results for commit c77cb6a. ± Comparison against base commit f0b05a4.

♻️ This comment has been updated with latest results.

choudlet and others added 15 commits April 5, 2026 17:16
iOS parity: empty queues should not write to disk. Deletes any existing
snapshot to avoid stale data on next rehydration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
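
The empty-queue rule could be sketched as follows. `flushSnapshot` is a hypothetical, simplified stand-in for the real flush path, and it writes newline-separated strings rather than the versioned JSON envelope.

```kotlin
import java.io.File

/** Returns true if a snapshot was written, false if the queue was empty and the file was cleared. */
fun flushSnapshot(pending: List<String>, snapshotFile: File): Boolean {
    if (pending.isEmpty()) {
        // Nothing to persist: remove any older snapshot so it cannot be rehydrated as stale data.
        snapshotFile.delete() // returns false, without throwing, if the file is already absent
        return false
    }
    snapshotFile.writeText(pending.joinToString("\n"))
    return true
}
```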
iOS parity: events older than eventTTLMs (default 7 days) are dropped
during rehydrate() before capacity trimming. Fail-open on unparseable
timestamps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
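
A fail-open TTL filter might look like the sketch below. It assumes ISO-8601 timestamps and `java.time` (API 26+ or desugaring on Android). `StoredEvent` and `dropExpired` are hypothetical names, and the real payload's timestamp format is not stated in this PR.

```kotlin
import java.time.Duration
import java.time.Instant
import java.time.format.DateTimeParseException

// Hypothetical event shape with a string timestamp, as it might appear in a snapshot.
data class StoredEvent(val id: String, val timestamp: String)

fun dropExpired(
    events: List<StoredEvent>,
    now: Instant,
    ttl: Duration = Duration.ofDays(7),
): List<StoredEvent> {
    val cutoff = now.minus(ttl)
    return events.filter { event ->
        val parsed = try {
            Instant.parse(event.timestamp)
        } catch (e: DateTimeParseException) {
            null // fail open: an unreadable timestamp is not evidence that the event is old
        }
        parsed == null || !parsed.isBefore(cutoff)
    }
}
```

Passing `now` in as a parameter keeps the filter deterministic in tests.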
…hreshold

iOS parity: events now route through Dispatcher.offer() which checks
autoFlushThreshold (20 events) and triggers immediate flush. Previously
events were enqueued directly to the queue, bypassing the threshold.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
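
The effect of routing events through an `offer()` method can be sketched as below. `ThresholdDispatcher` is a hypothetical class, not the SDK's Dispatcher. The default of 20 comes from the commit message, and the inclusive comparison is an assumption.

```kotlin
// Sketch: every event passes through offer(), so the threshold is always checked.
class ThresholdDispatcher(
    private val autoFlushThreshold: Int = 20,
    private val flush: (List<String>) -> Unit,
) {
    private val queue = ArrayList<String>()

    @Synchronized
    fun offer(event: String) {
        queue.add(event)
        if (queue.size >= autoFlushThreshold) {
            val batch = queue.toList() // copy before clearing so the callback owns its batch
            queue.clear()
            flush(batch) // invoked while holding the lock; real code may hand off to another thread
        }
    }

    @Synchronized
    fun pending(): Int = queue.size
}
```

Code that appends to the queue directly never reaches the size check, which is the bypass this commit describes.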
Prevents immediate retry loop when both circuit breaker delay and
Retry-After header are 0 on first failure. Aligns with iOS SDK behavior
which uses max(100, delay).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
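
The delay floor amounts to a `max` over the computed delay. In the sketch below, `nextRetryDelayMs` is a hypothetical helper, and combining the circuit breaker delay with Retry-After by taking the larger of the two is an assumption. Only the `max(100, delay)` floor is taken from the commit message.

```kotlin
// Sketch: never return less than floorMs, even when both inputs are zero or absent.
fun nextRetryDelayMs(
    circuitBreakerDelayMs: Long,
    retryAfterMs: Long?,
    floorMs: Long = 100L,
): Long {
    val requested = maxOf(circuitBreakerDelayMs, retryAfterMs ?: 0L)
    return maxOf(floorMs, requested)
}
```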
@choudlet choudlet merged commit 9877ef0 into main Apr 7, 2026
3 checks passed
choudlet added a commit that referenced this pull request Apr 10, 2026
* refactor: make EventQueue open for subclassing

Prepare EventQueue for PersistableEventQueue subclass by making the class,
its backing ArrayDeque, and all public methods open.

* feat: add QueueSnapshot serializable envelope for disk persistence

Versioned JSON envelope wrapping List<EnrichedEventPayload> for disk
queue snapshots. Version field enables forward-compatible schema migration.


* feat: add EventDiskStore for queue snapshot file I/O

Reads/writes versioned JSON queue snapshots to noBackupFilesDir.
Atomic write-then-rename prevents partial writes. Unrecognized schema
versions and corrupted files are deleted with warning logs.

Implements resilient per-event deserialization: individual events that
fail to deserialize are skipped while valid events are preserved. This
addresses the cross-platform contract requirement for graceful handling
of partially corrupt snapshots.

* feat: add PersistableEventQueue with disk-backed persistence

Write-behind queue that keeps all enqueue/drain operations in memory.
Disk flushes are full-overwrite snapshots triggered by lifecycle events
or flush threshold. Rehydrates once per process. Enforces combined
2000-event/5MB capacity cap with drop-oldest overflow.


* feat: wire PersistableEventQueue into MetaRouterAnalyticsClient

SDK now creates PersistableEventQueue by default, rehydrates from disk
on first init, flushes to disk on background and app terminate (best-effort
via onTrimMemory), checks flush threshold after each enqueue.
EventDiskStore is injectable for testing.


* test: add end-to-end persistence lifecycle tests

Covers full session lifecycle, drain-before-flush correctness,
no-disk-touch in normal sessions, multi-cycle persistence, and
flush-overwrite deduplication.


* test: add enqueue performance benchmark for PersistableEventQueue

Validates that enqueue remains memory-only and performant with 10,000
1KB events. Also benchmarks flushToDisk for 2,000 events.


* fix: flushToDisk skips write and deletes snapshot when queue is empty

iOS parity: empty queues should not write to disk. Deletes any existing
snapshot to avoid stale data on next rehydration.


* feat: add 7-day event TTL filter during rehydration

iOS parity: events older than eventTTLMs (default 7 days) are dropped
during rehydrate() before capacity trimming. Fail-open on unparseable
timestamps.


* fix: wire event processor through dispatcher.offer() for auto-flush threshold

iOS parity: events now route through Dispatcher.offer() which checks
autoFlushThreshold (20 events) and triggers immediate flush. Previously
events were enqueued directly to the queue, bypassing the threshold.


* fix: improving logging, removing verbosity

* fix: dispatcher bugfix

* docs: updating readme

* fix: add 100ms minimum retry delay floor for 5xx/408 server errors

Prevents immediate retry loop when both circuit breaker delay and
Retry-After header are 0 on first failure. Aligns with iOS SDK behavior
which uses max(100, delay).


* fix: path errors with testing files

* fix: proxy bounding issue

* fix: dispatcher error handling

* fix: matching iOS data contracts

* fix: cleaning up circuit state bug

* fix: fixing error with API version mismatch

* fix: inheritance of EventQueue

* fix: docs and queue byte enforcement