
Add server-side aggregation endpoints for dashboard charts #20

@Boanerges1996

Description

Problem

The PeerMetrics dashboard (/apps/<id>) downloads all raw data into the browser and aggregates it with JavaScript. On production:

  • GET /v1/conferences?appId=...&created_at_gte=... returns 42,177 conferences in one response
  • Response size: 21.7 MB of JSON
  • Response time: ~13 seconds
  • Gunicorn worker timeout is 30s — one more doubling of the data volume means every dashboard load returns a 504

Even after the PRs we merged (pagination #19, expand_fields removal, annotations #22), the graphs tab still pulls the full dataset because charts aggregate client-side.

Context — Scale

Sessions Health has ~3 apps total (dev / staging / prod). This changes the optimal strategy significantly:

  • Cache hit rate will be near 100% once warm (all users hit the same small key set)
  • Pre-warming every app is trivially cheap (~180 KB Redis, ~1 qps on DB)
  • Cold-start latency can be fully eliminated

Goal

Three compounding layers that together make the dashboard load in ~50ms regardless of data volume, with zero 504s.

┌────────────────────────────────────────────────────────────────┐
│  LAYER 3: Vue Query (browser-side cache + dedup + SWR)         │
│  → Repeat navigations hit in-memory cache instantly            │
│  → Multiple charts requesting same data = one HTTP call        │
└──────────────────────┬─────────────────────────────────────────┘
                       ▼
┌────────────────────────────────────────────────────────────────┐
│  LAYER 2: Redis cache (server-side, 30-60s TTL)                │
│  → With ~3 apps, a pre-warming job keeps cache always hot      │
│  → 2 KB responses cache cheaply (vs 21 MB today)               │
└──────────────────────┬─────────────────────────────────────────┘
                       ▼
┌────────────────────────────────────────────────────────────────┐
│  LAYER 1: Server-side aggregation (PostgreSQL GROUP BY)        │
│  → 21 MB → 2 KB responses                                      │
│  → Query runs in <100ms with proper indexes                    │
│  → The foundational fix; nothing above works without it        │
└────────────────────────────────────────────────────────────────┘

Expected results after all three layers ship (with pre-warming):

| Scenario | Response time | Response size |
| --- | --- | --- |
| Any dashboard load, any user | ~50ms | 2 KB |
| Same user revisiting (Vue Query cache) | <10ms (no network) | 0 bytes |

With only ~3 apps total, pre-warming keeps all Redis caches always hot. No user ever pays the cold-start Postgres cost.


Layer 1 — Server-Side Aggregation (foundational)

Charts that need aggregation (use the big conferences array — priority)

| Chart | Current behavior | Summary endpoint needed |
| --- | --- | --- |
| Conferences (stacked bar) | .reduce() groups by day, counts success/warning/error/ongoing | GET /v1/conferences/summary?group_by=day |
| Most common issues | Groups issues by code, counts per conference | GET /v1/issues/summary?group_by=code |
| Conference duration | Buckets conferences into duration ranges | GET /v1/conferences/duration-summary |
| Call setup time | Computes from connections, buckets in JS | GET /v1/connections/setup-time-summary |
| Number of participants | Counts participants per conference | GET /v1/conferences/participant-count-summary |
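For illustration, a possible response shape for the conferences summary endpoint. The field names here are assumptions, not a settled schema — the real contract gets decided in Phase 1:

```python
# Hypothetical response body for GET /v1/conferences/summary?group_by=day.
# Field names are illustrative; the actual schema is defined in Phase 1.
summary_response = {
    "group_by": "day",
    "results": [
        {"date": "2024-05-01", "success": 1240, "warning": 85, "error": 12, "ongoing": 3},
        {"date": "2024-05-02", "success": 1312, "warning": 74, "error": 9, "ongoing": 5},
    ],
}

# One row per day with four counts: a few KB of pre-aggregated data
# replaces the 21.7 MB raw conference list.
total = sum(
    sum(v for k, v in row.items() if k != "date")
    for row in summary_response["results"]
)
```

The stacked-bar chart can render directly from `results` without any client-side `.reduce()`.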

Charts using sessions/connections (smaller, lower priority)

| Chart | Data | Current size | Priority |
| --- | --- | --- | --- |
| Browsers | sessions | ~3 MB | Medium |
| OS | sessions | ~3 MB | Medium |
| Map | sessions (geo_ip) | ~3 MB | Medium |
| GUM errors | issues | <1 KB | Low |
| Relayed connections | connections | <1 KB | Low |

Click-to-Detail Pattern

When a user clicks a bar/segment, a modal shows matching conferences. Today this filters the in-memory array. With aggregation, we fetch just that slice via the paginated list endpoint:

```js
async onChartClick(e) {
  // Clicked bar = one day; fetch only that day's conferences, paginated.
  const dayStart = new Date(e.xValue);
  const dayEnd = new Date(dayStart);
  dayEnd.setDate(dayEnd.getDate() + 1);

  const resp = await peermetrics.get('/v1/conferences', {
    appId: peermetrics.app.id,
    created_at_gte: dayStart.toISOString(),
    created_at_lte: dayEnd.toISOString(),
    limit: 50,
  });
  this.modalConferences = resp.results;
}
```

API filters needed for modals

| Chart | Filter | Status |
| --- | --- | --- |
| Conferences by day | created_at_gte, created_at_lte | ✓ supported |
| Most common issues | issue_code=xxx | add |
| Conference duration | duration_gte, duration_lt | add |
| Call setup time | setup_time_gte, setup_time_lt | complex (setup time is per-connection) |
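Bucket boundaries matter for the duration modal: gte should be inclusive and lt exclusive, so a conference sitting exactly on a boundary lands in exactly one bucket. A plain-Python sketch of the intended filter semantics (function and parameter names mirror the table above but are illustrative, not the server implementation):

```python
def filter_by_duration(conferences, duration_gte=None, duration_lt=None):
    """Apply the proposed duration filters: gte is inclusive, lt is exclusive,
    so adjacent buckets (e.g. 0-60s and 60-300s) never double-count."""
    out = []
    for conf in conferences:
        if duration_gte is not None and conf["duration"] < duration_gte:
            continue
        if duration_lt is not None and conf["duration"] >= duration_lt:
            continue
        out.append(conf)
    return out

confs = [
    {"id": 1, "duration": 59},
    {"id": 2, "duration": 60},   # exactly on the boundary
    {"id": 3, "duration": 299},
]
first_bucket = filter_by_duration(confs, duration_gte=0, duration_lt=60)
second_bucket = filter_by_duration(confs, duration_gte=60, duration_lt=300)
```

The conference with duration 60 appears only in the second bucket, never both.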

Example aggregation query

```sql
SELECT DATE(created_at) AS date,
  COUNT(*) FILTER (WHERE ongoing) AS ongoing,
  COUNT(*) FILTER (WHERE has_errors AND NOT ongoing) AS error,
  COUNT(*) FILTER (WHERE has_warnings AND NOT has_errors AND NOT ongoing) AS warning,
  COUNT(*) FILTER (WHERE NOT has_errors AND NOT has_warnings AND NOT ongoing) AS success
FROM conference
WHERE app_id = ? AND created_at >= ?
GROUP BY DATE(created_at);
```
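The same mutually exclusive bucketing, sketched in plain Python as a reference for the precedence order (helpers and field names are illustrative; they mirror the SQL FILTER clauses, where ongoing wins over error, which wins over warning):

```python
from collections import Counter
from datetime import date

def classify(conf):
    """Mirror the SQL FILTER clauses: exactly one bucket per conference."""
    if conf["ongoing"]:
        return "ongoing"
    if conf["has_errors"]:
        return "error"
    if conf["has_warnings"]:
        return "warning"
    return "success"

def summarize_by_day(conferences):
    """Group by created_at date, counting each status bucket."""
    out = {}
    for conf in conferences:
        out.setdefault(conf["created_at"], Counter())[classify(conf)] += 1
    return out

rows = [
    # has both errors and warnings -> counts as "error" only
    {"created_at": date(2024, 5, 1), "ongoing": False, "has_errors": True,  "has_warnings": True},
    {"created_at": date(2024, 5, 1), "ongoing": True,  "has_errors": False, "has_warnings": False},
    {"created_at": date(2024, 5, 1), "ongoing": False, "has_errors": False, "has_warnings": False},
]
summary = summarize_by_day(rows)
```

Every conference contributes to exactly one count, so the stacked bars sum to the day's total.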

Layer 2 — Redis Caching (server-side)

Redis is already deployed (rate limiter uses it). Add a thin cache wrapper around summary endpoints.

Memory footprint

  • 3 apps × ~10 summary endpoints × 3 date ranges = ~180 KB total
  • Completely negligible

TTL strategy

| Endpoint | TTL |
| --- | --- |
| Conference daily summary | 60s |
| Duration/participant distributions | 120s |
| Session browser/OS aggregates | 300s |
| Click-to-detail paginated lists | no cache |

Pattern

```python
import json

def cached_summary(cache_key, ttl, compute_fn):
    """Serve from Redis when possible; otherwise compute, cache, and return."""
    cached = redis.get(cache_key)
    if cached is not None:
        return json.loads(cached)
    result = compute_fn()
    redis.setex(cache_key, ttl, json.dumps(result))
    return result
```

Invalidation

TTL-based. No active invalidation. Add ?no_cache=true param for admin tools that need fresh data.

Cache stampede mitigation

Add randomized TTL jitter (60s ± 10s) so cache keys don't all expire simultaneously.
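A minimal jitter sketch, using the 60s ± 10s figures above (function name is illustrative):

```python
import random

def jittered_ttl(base_ttl=60, jitter=10):
    """Randomize the TTL so cache keys for different apps/endpoints
    don't all expire in the same second and stampede Postgres together."""
    return base_ttl + random.randint(-jitter, jitter)

ttl = jittered_ttl()  # somewhere in [50, 70]
```

Pass the result as the `ttl` argument when setting the key, instead of a fixed constant.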


Layer 3 — Vue Query (browser-side)

Add vue-query@1.x (Vue 2 compatible) to the dashboard.

Benefits

  • Instant cache hits on repeat navigation — dashboard → conference detail → back: renders in <10ms from memory
  • Automatic deduplication — 3 charts requesting the same summary = 1 HTTP call
  • Stale-while-revalidate — show cached data instantly, refetch in background
  • Automatic retries on network errors

Pattern

```js
const { data, isLoading } = useQuery({
  queryKey: ['conferences-summary', appId, dateRange],
  queryFn: () => peermetrics.get('/v1/conferences/summary', { appId, created_at_gte: dateRange }),
  staleTime: 60_000,
  cacheTime: 300_000,
});
```

Implementation Phases

Phase 0 — Prerequisites

Phase 1 — Conferences chart (MVP)

  1. API: GET /v1/conferences/summary?appId=...&created_at_gte=...
  2. API: Add created_at_lte support to paginated /v1/conferences
  3. Web: conferencesChart.vue uses summary + paginated click handler
  4. Web: Remove big /v1/conferences call from app.vue graphs tab

Success metric: Conferences chart renders in <200ms.

Phase 2 — Remaining conference/issue/connection charts

  1. API: /conferences/duration-summary, /conferences/participant-count-summary
  2. API: /issues/summary?group_by=code
  3. API: /connections/setup-time-summary, /connections/summary?group_by=type
  4. API: Add issue_code, duration_gte/lt filters on /conferences
  5. Web: Migrate conferenceDurationChart, noParticipantsChart, mostCommongIssuesChart, callSetupTimeChart, connectionTypeChart

Phase 3 — Session charts

  1. API: /sessions/summary?group_by=browser|os|country
  2. API: Geo data endpoint for mapChart
  3. Web: Migrate browsersChart, osChart, mapChart
  4. Web: Remove the full /v1/sessions call (currently 3 MB)

Phase 4 — Redis caching

  1. Add cached_summary() helper
  2. Wrap each summary endpoint with appropriate TTL
  3. Cache keys include app_id, date range, filters
  4. Monitor cache hit rate in CloudWatch

Phase 5 — Vue Query

  1. Install vue-query@1.x (Vue 2 compat)
  2. Wrap dashboard with query client provider
  3. Refactor app.vue data fetching to use useQuery
  4. Migrate chart components to consume queries instead of props
  5. Remove ad-hoc state (this.data.conferences, this.paginatedConferences)

Phase 6 (recommended) — Cache pre-warming

Sessions Health has ~3 apps total. Pre-warming every app is cheap and worthwhile.

Implementation:

```python
# management/commands/warm_dashboard_cache.py
import json

class Command(BaseCommand):
    def handle(self, *args, **options):
        for app in App.objects.filter(is_active=True):
            for endpoint in SUMMARY_ENDPOINTS:
                key, ttl = cache_key_for(app, endpoint)
                result = endpoint.compute(app)
                redis.setex(key, ttl, json.dumps(result))
```

  • Run every 30 seconds (TTL is 60s, so the cache is always fresh)
  • Total DB load: ~30 queries per 30s cycle = 1 qps
  • Effect: no user ever hits a cold cache, not even the first load of the day

Edge Cases & Decisions

Dashboard filters (browser/OS/country/app version)

Filters run client-side today. With aggregation, they must be passed as query params to summary endpoints. Decision: include filter params from Phase 1 onward. Cache keys include filters so filtered/unfiltered queries are cached separately.

Redis key schema

conf_summary:{app_id}:{date_range_hash}:{filter_hash}
issues_summary:{app_id}:{date_range_hash}:{filter_hash}
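A possible key builder for that schema. The hashing choices (SHA-1, truncated hex) are assumptions — any stable digest of the normalized params works:

```python
import hashlib
import json

def summary_cache_key(prefix, app_id, date_range, filters=None):
    """Build a key like conf_summary:{app_id}:{date_range_hash}:{filter_hash}.
    Params are JSON-serialized with sorted keys before hashing, so query
    param order never produces a different cache key."""
    def digest(obj):
        return hashlib.sha1(
            json.dumps(obj, sort_keys=True).encode()
        ).hexdigest()[:12]
    return f"{prefix}:{app_id}:{digest(date_range)}:{digest(filters or {})}"

key = summary_cache_key(
    "conf_summary", 42,
    {"created_at_gte": "2024-05-01", "created_at_lte": "2024-05-08"},
    {"browser": "Chrome"},
)
```

This also covers the filters decision above: filtered and unfiltered requests hash to different keys and are cached separately.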

Vue Query + Vue 2 compatibility

Use vue-query@1.x (Vue 2 compat). @tanstack/vue-query@5.x is Vue 3 only.

Timezone

Aggregate in UTC for Phase 1. Frontend shows in browser timezone. Add timezone_offset param later if users complain.

Ongoing + mutually exclusive status

Conferences chart shows success | warning | error | ongoing as mutually exclusive. SQL uses FILTER with explicit AND NOT clauses (see query example above).

Rate limiting

Summary endpoints hit GenericView (2000/min per IP). Vue Query's dedup helps — multiple charts asking for the same data fire one request, not several.

Backward compatibility

Keep the old /v1/conferences list working. Summary endpoints are additive. Migrate chart-by-chart.

Cache stampede

Randomized TTL jitter (60s ± 10s) so keys don't expire together.

Pre-warming scope

With only ~3 apps, pre-warm every app every 30 seconds. No complex "which apps to warm" logic needed.


What to Measure

| Metric | Before | Target |
| --- | --- | --- |
| Dashboard load time (warm Redis) | 13s | ~50ms |
| Dashboard load time (Vue Query hit) | N/A | <10ms |
| Largest API response size | 21 MB | <50 KB |
| 504 error rate | growing | 0 |
| Postgres dashboard query time | 3-5s | <100ms |
| Redis cache hit rate | N/A | ~100% with pre-warming |

Dependencies

