feat: offline disk overflow + network-aware flush #22

Merged: choudlet merged 6 commits into main from chrish/sc-36909/network-monitoring-dispatcher on Apr 16, 2026

choudlet (Collaborator) commented Apr 10, 2026

Summary

Adds offline-resilient event delivery to the Android SDK. When the device is offline or the queue is under backpressure, events flush to disk instead of being dropped. On connectivity recovery or after successful online flushes, disk events drain directly to the network.

Core changes

  • Flush-to-disk at capacity: When the memory queue hits maxQueueEvents OR maxCapacityBytes, the entire memory queue flushes to the disk store and memory resets to 0. This happens regardless of online/offline state, so hitting memory capacity no longer drops events.
  • Offline flush transport swap: When the dispatcher's periodic flush triggers while offline, events flush to disk instead of the network. Same triggers, swapped destination via EventQueueInterface.flushToOfflineStorage().
  • requeueToFront preserves events at capacity: On failed-send retry, if adding requeued events would exceed capacity, the current memory queue flushes to disk first so older (requeued) events take priority and newer events follow via drain.
  • Consolidated disk store: Single queue.v1.json file holds all persisted unsent events (capacity overflow, offline flush, background snapshot). Replaces the original two-file design.
  • Overflow drain pipeline: On offline→online transition, after successful online flushes (via onFlushComplete callback), and at app launch if online, disk events drain directly to the network via Dispatcher.sendBatchDirect() — never loaded back into the memory queue.
  • O(n) drain with crash-safety checkpoints: Drain reads the disk file once, deletes it, then iterates over the events in memory by index. It writes a checkpoint every 10 successful batches so progress survives an app kill. On failure, it writes the remaining events back to disk.
  • Shared ResponseCategory enum: Single source of truth for HTTP status classification, used by both the main dispatcher and the overflow drain. Ensures consistent handling of 413 (batch halving), 401/403/404 (fatal config), 429 (rate limit), 5xx and 408 (retry), and other 4xx (drop).
  • Drain coordinates shutdown on FATAL_CONFIG: 401/403/404 invokes dispatcher.onFatalConfigError to stop the main pipeline, matching the main flush path.
  • Concurrency safety: isDraining AtomicBoolean prevents duplicate drain coroutines. diskLock mutex synchronizes disk access between the drain pipeline (suspend) and the synchronized queue methods. Drain short-circuits if dispatcher.isNetworkPaused().
  • hasOverflowData volatile flag: Lightweight gate for drain attempts, initialized via EventDiskStore.exists() (no file parse needed).
  • InitOptions.maxOfflineDiskEvents: Configurable cap (default 10,000) on disk storage.
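
As a rough illustration of the flush-to-disk-at-capacity bullet above, here is a minimal Kotlin sketch. `SketchEvent`, `CapacitySketchQueue`, and the in-memory `disk` list are hypothetical stand-ins rather than the SDK's classes; only `maxQueueEvents`, `maxCapacityBytes`, and `flushToOfflineStorage` are names taken from this PR. Whether the real queue checks capacity before or after adding the new event is an assumption, and the `maxOfflineDiskEvents` cap is not modeled.

```kotlin
// Illustrative sketch only. An in-memory list stands in for queue.v1.json.
data class SketchEvent(val id: Int, val sizeBytes: Int)

class CapacitySketchQueue(
    private val maxQueueEvents: Int,
    private val maxCapacityBytes: Int,
) {
    private val memory = mutableListOf<SketchEvent>()
    private var memoryBytes = 0

    // Hypothetical stand-in for the consolidated disk store.
    val disk = mutableListOf<SketchEvent>()

    @Synchronized
    fun memorySize(): Int = memory.size

    @Synchronized
    fun enqueue(event: SketchEvent) {
        memory.add(event)
        memoryBytes += event.sizeBytes
        // Assumption: capacity is checked after the add, so the event that
        // "hits" the limit is flushed together with the rest of the queue.
        if (memory.size >= maxQueueEvents || memoryBytes >= maxCapacityBytes) {
            flushToOfflineStorage()
        }
    }

    // Moves the whole memory queue to the disk stand-in and resets memory to 0.
    @Synchronized
    fun flushToOfflineStorage() {
        disk.addAll(memory)
        memory.clear()
        memoryBytes = 0
    }
}
```

With `maxQueueEvents = 3`, the third `enqueue` call moves all three events to `disk` and leaves the memory queue empty; a single event at or above `maxCapacityBytes` triggers the same flush on its own.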

Drain response handling (mirrors dispatcher)

| Status | Behavior |
|---|---|
| 2xx | Success, continue. Restore batch size if previously halved |
| 413 | Halve batch size, retry. Drop event at batchSize=1 |
| 5xx / 408 | Stop, retry next online transition |
| 429 | Stop, retry next online transition |
| 401 / 403 / 404 | Fatal — invoke onFatalConfigError, stop |
| Other 4xx | Drop batch, continue with remaining |
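
The status handling above can be sketched as a classifier plus a drain loop. This is a hedged illustration, not the PR's code: apart from `ResponseCategory` and `FATAL_CONFIG`, every name here (`classify`, `drain`, `DrainResult`, the other enum constants, the `send` and `checkpoint` callbacks) is hypothetical. It also assumes that a `null` response (network error) stops the drain, that "restore batch size" means resetting to the configured maximum, and that statuses outside the table are treated as retryable.

```kotlin
// Illustrative sketch only; only ResponseCategory and FATAL_CONFIG come from the PR.
enum class ResponseCategory { SUCCESS, PAYLOAD_TOO_LARGE, RATE_LIMITED, FATAL_CONFIG, RETRYABLE, CLIENT_ERROR }

fun classify(status: Int): ResponseCategory = when (status) {
    in 200..299 -> ResponseCategory.SUCCESS
    413 -> ResponseCategory.PAYLOAD_TOO_LARGE
    429 -> ResponseCategory.RATE_LIMITED
    401, 403, 404 -> ResponseCategory.FATAL_CONFIG
    408, in 500..599 -> ResponseCategory.RETRYABLE
    in 400..499 -> ResponseCategory.CLIENT_ERROR
    else -> ResponseCategory.RETRYABLE // assumption: not covered by the table
}

data class DrainResult(val sent: Int, val dropped: Int, val remaining: List<String>, val fatal: Boolean)

fun drain(
    events: List<String>,
    maxBatchSize: Int,
    checkpointEvery: Int = 10,
    checkpoint: (List<String>) -> Unit = {},
    send: (List<String>) -> Int?, // returns an HTTP status, or null on network error
): DrainResult {
    var index = 0
    var batchSize = maxBatchSize
    var sent = 0
    var dropped = 0
    var successfulBatches = 0
    var fatal = false
    loop@ while (index < events.size) {
        val end = minOf(index + batchSize, events.size)
        val batch = events.subList(index, end)
        val status = send(batch) ?: break // assumption: network error stops the drain
        when (classify(status)) {
            ResponseCategory.SUCCESS -> {
                sent += batch.size
                index = end
                batchSize = maxBatchSize // restore after any earlier halving
                successfulBatches++
                if (successfulBatches % checkpointEvery == 0) {
                    checkpoint(events.subList(index, events.size).toList())
                }
            }
            ResponseCategory.PAYLOAD_TOO_LARGE -> {
                if (batch.size == 1) {
                    dropped++ // a single oversized event cannot be split further
                    index = end
                } else {
                    batchSize = batch.size / 2
                }
            }
            ResponseCategory.CLIENT_ERROR -> {
                dropped += batch.size
                index = end
            }
            ResponseCategory.FATAL_CONFIG -> {
                fatal = true // the caller would invoke onFatalConfigError here
                break@loop
            }
            ResponseCategory.RETRYABLE, ResponseCategory.RATE_LIMITED -> break@loop
        }
    }
    return DrainResult(sent, dropped, events.subList(index, events.size).toList(), fatal)
}
```

Whatever is left in `remaining` is what a caller would write back to disk for the next online transition.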

Ticket: https://app.shortcut.com/metarouter/story/36909

Test plan

  • Capacity overflow (event count AND byte size) flushes entire queue to disk
  • flushToOfflineStorage drains memory queue to disk when offline
  • requeueToFront flushes memory queue to disk at capacity (older events preserved at front)
  • maxOfflineDiskEvents cap enforced across multiple flushes
  • Drain sends directly to network, cleans up disk on completion
  • Drain stops on server error (5xx), retains events on disk via checkpoint
  • Drain pauses on 429 rate limit, retains events on disk
  • Drain halves batch on 413, drops oversized event at batchSize=1
  • Drain invokes onFatalConfigError on 401/403/404
  • Drain drops batch on client error (400) and continues with remaining
  • Drain filters expired events by TTL before sending
  • Drain bails early when dispatcher.isNetworkPaused()
  • Concurrent drain invocations do not duplicate sends (isDraining guard)
  • App-kill-relaunch: previous session events drain on next online launch
  • sendBatchDirect returns NetworkResponse? for category-aware handling
  • All tests pass
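
The isDraining guard exercised by the concurrency item above might look roughly like the following. This is a sketch under assumptions: `DrainGuard` and `runExclusive` are hypothetical names, and the real drain is described as a coroutine, whereas this version uses a plain blocking lambda so that it needs only the JVM standard library.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Illustrative sketch only; DrainGuard and runExclusive are hypothetical names.
class DrainGuard {
    private val isDraining = AtomicBoolean(false)

    /** Runs [block] unless a drain is already in flight. Returns true if it ran. */
    fun runExclusive(block: () -> Unit): Boolean {
        // compareAndSet lets exactly one caller flip false -> true.
        if (!isDraining.compareAndSet(false, true)) return false
        try {
            block()
        } finally {
            // Always release the flag, even if the drain throws.
            isDraining.set(false)
        }
        return true
    }
}
```

A second call made while the first is still running returns `false` without executing its block, which is the property the "do not duplicate sends" test depends on.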

Add offline disk overflow capability to PersistableEventQueue:
- Events that overflow from memory while offline are buffered and
  batched to a separate disk file (offline-overflow.v1.json)
- On reconnect, overflow drains directly to network via
  Dispatcher.sendBatchDirect() without loading into memory queue
- Respects maxOfflineDiskEvents cap (default 10000)
- EventDiskStore now accepts configurable filename for separate stores
- MetaRouterAnalyticsClient wires overflow enable/disable on
  network transitions and triggers drain on reconnect

Tests cover: overflow buffering, batch writes, disk cap enforcement,
drain-to-network success/failure, app-kill-relaunch scenario,
toggle behavior, sendBatchDirect, and configurable filename.

github-actions Bot commented Apr 10, 2026

Test Results

58 files (±0), 58 suites (±0), duration 1m 7s ⏱️ (+12s)

| | Total | ✅ | 💤 | ❌ |
|---|---|---|---|---|
| Tests | 456 (+14) | 456 (+14) | 0 (±0) | 0 (±0) |
| Runs | 912 (+28) | 912 (+28) | 0 (±0) | 0 (±0) |

Results for commit 934af10. ± Comparison against base commit a96ed6f.

This pull request removes 18 and adds 32 tests. Note that renamed tests count towards both.

Removed tests:
com.metarouter.analytics.MetaRouterAnalyticsClientTest ‑ queue overflow drops oldest events
com.metarouter.analytics.queue.PersistableEventQueueTest ‑ enqueue drops oldest when byte capacity exceeded
com.metarouter.analytics.queue.PersistableEventQueueTest ‑ enqueue drops oldest when event count capacity exceeded
com.metarouter.analytics.queue.PersistableEventQueueTest ‑ events drained before background are not in snapshot
com.metarouter.analytics.queue.PersistableEventQueueTest ‑ flush overwrites previous state - no duplicates
com.metarouter.analytics.queue.PersistableEventQueueTest ‑ flushToDisk fully overwrites previous snapshot
com.metarouter.analytics.queue.PersistableEventQueueTest ‑ flushToDisk with empty queue skips write and deletes existing snapshot
com.metarouter.analytics.queue.PersistableEventQueueTest ‑ flushToDisk writes current memory state to disk
com.metarouter.analytics.queue.PersistableEventQueueTest ‑ full lifecycle - enqueue, background flush, process restart, rehydrate
com.metarouter.analytics.queue.PersistableEventQueueTest ‑ multiple flush-rehydrate cycles preserve data correctly
…

Added tests:
com.metarouter.analytics.MetaRouterAnalyticsClientTest ‑ queue overflow flushes to disk instead of dropping
com.metarouter.analytics.dispatcher.DispatcherTest ‑ sendBatchDirect does not affect memory queue
com.metarouter.analytics.dispatcher.DispatcherTest ‑ sendBatchDirect returns 200 for empty list
com.metarouter.analytics.dispatcher.DispatcherTest ‑ sendBatchDirect returns 200 response on success
com.metarouter.analytics.dispatcher.DispatcherTest ‑ sendBatchDirect returns 500 response on server error
com.metarouter.analytics.dispatcher.DispatcherTest ‑ sendBatchDirect returns null on network error
com.metarouter.analytics.queue.PersistableEventQueueTest ‑ app killed while offline, relaunch online, disk overflow drains to network
com.metarouter.analytics.queue.PersistableEventQueueTest ‑ capacity overflow flushes entire queue to disk
com.metarouter.analytics.queue.PersistableEventQueueTest ‑ clear deletes overflow disk file
com.metarouter.analytics.queue.PersistableEventQueueTest ‑ concurrent drain calls do not duplicate sends
…

♻️ This comment has been updated with latest results.

choudlet changed the title from "feat: network-awarness [android]" to "feat: network-awarness" on Apr 15, 2026
choudlet changed the title from "feat: network-awarness" to "feat: offline disk overflow + network-aware flush" on Apr 16, 2026
choudlet merged commit 884a268 into main on Apr 16, 2026
3 checks passed