
Add server-side aggregation endpoints for dashboard charts #20

@Boanerges1996

Description

Problem

The PeerMetrics dashboard (/apps/<id>) downloads all raw data into the browser and aggregates it with JavaScript. On production:

  • GET /v1/conferences?appId=...&created_at_gte=... returns 42,177 conferences in one response
  • Response size: 21.7 MB of JSON
  • Response time: ~13 seconds
  • Gunicorn worker timeout is 30s — one more doubling of the data volume means every dashboard load returns a 504

Even after the PRs we merged (pagination #19, expand_fields removal, annotations #22), the graphs tab still pulls the full dataset because charts aggregate client-side.

Context — Scale

Sessions Health has ~3 apps total (dev / staging / prod). This changes the optimal strategy significantly:

  • Cache hit rate will be near 100% once warm (all users hit the same small key set)
  • Pre-warming every app is trivially cheap (~180 KB Redis, ~1 qps on DB)
  • Cold-start latency can be fully eliminated

Goal

Three compounding layers that together make the dashboard load in ~50ms regardless of data volume, with zero 504s.

┌────────────────────────────────────────────────────────────────┐
│  LAYER 3: Vue Query (browser-side cache + dedup + SWR)         │
│  → Repeat navigations hit in-memory cache instantly            │
│  → Multiple charts requesting same data = one HTTP call        │
└──────────────────────┬─────────────────────────────────────────┘
                       ▼
┌────────────────────────────────────────────────────────────────┐
│  LAYER 2: Redis cache (server-side, 30-60s TTL)                │
│  → With ~3 apps, a pre-warming job keeps cache always hot      │
│  → 2 KB responses cache cheaply (vs 21 MB today)               │
└──────────────────────┬─────────────────────────────────────────┘
                       ▼
┌────────────────────────────────────────────────────────────────┐
│  LAYER 1: Server-side aggregation (PostgreSQL GROUP BY)        │
│  → 21 MB → 2 KB responses                                      │
│  → Query runs in <100ms with proper indexes                    │
│  → The foundational fix; nothing above works without it        │
└────────────────────────────────────────────────────────────────┘

Expected results after all three layers ship (with pre-warming):

| Scenario | Response time | Response size |
| --- | --- | --- |
| Any dashboard load, any user | ~50ms | 2 KB |
| Same user revisiting (Vue Query cache) | <10ms (no network) | 0 bytes |

With only ~3 apps total, pre-warming keeps all Redis caches always hot. No user ever pays the cold-start Postgres cost.


Layer 1 — Server-Side Aggregation (foundational)

Charts that need aggregation (use the big conferences array — priority)

| Chart | Current behavior | Summary endpoint needed |
| --- | --- | --- |
| Conferences (stacked bar) | .reduce() groups by day, counts success/warning/error/ongoing | GET /v1/conferences/summary?group_by=day |
| Most common issues | Groups issues by code, counts per conference | GET /v1/issues/summary?group_by=code |
| Conference duration | Buckets conferences into duration ranges | GET /v1/conferences/duration-summary |
| Call setup time | Computes from connections, buckets in JS | GET /v1/connections/setup-time-summary |
| Number of participants | Counts participants per conference | GET /v1/conferences/participant-count-summary |
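For illustration, a possible response shape for the conferences summary endpoint. The field names here are assumptions, not a settled schema — the real contract gets decided in Phase 1:

```python
# Hypothetical response body for GET /v1/conferences/summary?group_by=day.
# Field names are illustrative; the actual schema is defined in Phase 1.
summary_response = {
    "group_by": "day",
    "results": [
        {"date": "2024-05-01", "success": 1240, "warning": 85, "error": 12, "ongoing": 3},
        {"date": "2024-05-02", "success": 1312, "warning": 74, "error": 9, "ongoing": 5},
    ],
}

# One row per day with four counts: a few KB of pre-aggregated data
# replaces the 21.7 MB raw conference list.
total = sum(
    sum(v for k, v in row.items() if k != "date")
    for row in summary_response["results"]
)
```

The stacked-bar chart can render directly from `results` without any client-side `.reduce()`.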

Charts using sessions/connections (smaller, lower priority)

| Chart | Data | Current size | Priority |
| --- | --- | --- | --- |
| Browsers | sessions | ~3 MB | Medium |
| OS | sessions | ~3 MB | Medium |
| Map | sessions (geo_ip) | ~3 MB | Medium |
| GUM errors | issues | <1 KB | Low |
| Relayed connections | connections | <1 KB | Low |

Click-to-Detail Pattern

When a user clicks a bar/segment, a modal shows matching conferences. Today this filters the in-memory array. With aggregation, we fetch just that slice via the paginated list endpoint:

```js
async onChartClick(e) {
  // Clicked bar = one day; fetch only that day's conferences, paginated.
  const dayStart = new Date(e.xValue);
  const dayEnd = new Date(dayStart);
  dayEnd.setDate(dayEnd.getDate() + 1);

  const resp = await peermetrics.get('/v1/conferences', {
    appId: peermetrics.app.id,
    created_at_gte: dayStart.toISOString(),
    created_at_lte: dayEnd.toISOString(),
    limit: 50,
  });
  this.modalConferences = resp.results;
}
```

API filters needed for modals

| Chart | Filter | Status |
| --- | --- | --- |
| Conferences by day | created_at_gte, created_at_lte | ✓ supported |
| Most common issues | issue_code=xxx | add |
| Conference duration | duration_gte, duration_lt | add |
| Call setup time | setup_time_gte, setup_time_lt | complex (setup time is per-connection) |
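Bucket boundaries matter for the duration modal: gte should be inclusive and lt exclusive, so a conference sitting exactly on a boundary lands in exactly one bucket. A plain-Python sketch of the intended filter semantics (function and parameter names mirror the table above but are illustrative, not the server implementation):

```python
def filter_by_duration(conferences, duration_gte=None, duration_lt=None):
    """Apply the proposed duration filters: gte is inclusive, lt is exclusive,
    so adjacent buckets (e.g. 0-60s and 60-300s) never double-count."""
    out = []
    for conf in conferences:
        if duration_gte is not None and conf["duration"] < duration_gte:
            continue
        if duration_lt is not None and conf["duration"] >= duration_lt:
            continue
        out.append(conf)
    return out

confs = [
    {"id": 1, "duration": 59},
    {"id": 2, "duration": 60},   # exactly on the boundary
    {"id": 3, "duration": 299},
]
first_bucket = filter_by_duration(confs, duration_gte=0, duration_lt=60)
second_bucket = filter_by_duration(confs, duration_gte=60, duration_lt=300)
```

The conference with duration 60 appears only in the second bucket, never both.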

Example aggregation query

```sql
SELECT DATE(created_at) AS date,
  COUNT(*) FILTER (WHERE ongoing) AS ongoing,
  COUNT(*) FILTER (WHERE has_errors AND NOT ongoing) AS error,
  COUNT(*) FILTER (WHERE has_warnings AND NOT has_errors AND NOT ongoing) AS warning,
  COUNT(*) FILTER (WHERE NOT has_errors AND NOT has_warnings AND NOT ongoing) AS success
FROM conference
WHERE app_id = ? AND created_at >= ?
GROUP BY DATE(created_at);
```
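The same mutually exclusive bucketing, sketched in plain Python as a reference for the precedence order (helpers and field names are illustrative; they mirror the SQL FILTER clauses, where ongoing wins over error, which wins over warning):

```python
from collections import Counter
from datetime import date

def classify(conf):
    """Mirror the SQL FILTER clauses: exactly one bucket per conference."""
    if conf["ongoing"]:
        return "ongoing"
    if conf["has_errors"]:
        return "error"
    if conf["has_warnings"]:
        return "warning"
    return "success"

def summarize_by_day(conferences):
    """Group by created_at date, counting each status bucket."""
    out = {}
    for conf in conferences:
        out.setdefault(conf["created_at"], Counter())[classify(conf)] += 1
    return out

rows = [
    # has both errors and warnings -> counts as "error" only
    {"created_at": date(2024, 5, 1), "ongoing": False, "has_errors": True,  "has_warnings": True},
    {"created_at": date(2024, 5, 1), "ongoing": True,  "has_errors": False, "has_warnings": False},
    {"created_at": date(2024, 5, 1), "ongoing": False, "has_errors": False, "has_warnings": False},
]
summary = summarize_by_day(rows)
```

Every conference contributes to exactly one count, so the stacked bars sum to the day's total.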

Layer 2 — Redis Caching (server-side)

Redis is already deployed (rate limiter uses it). Add a thin cache wrapper around summary endpoints.

Memory footprint

  • 3 apps × ~10 summary endpoints × 3 date ranges = ~180 KB total
  • Completely negligible

TTL strategy

| Endpoint | TTL |
| --- | --- |
| Conference daily summary | 60s |
| Duration/participant distributions | 120s |
| Session browser/OS aggregates | 300s |
| Click-to-detail paginated lists | no cache |

Pattern

```python
import json

def cached_summary(cache_key, ttl, compute_fn):
    """Serve from Redis when possible; otherwise compute, cache, and return."""
    cached = redis.get(cache_key)
    if cached is not None:
        return json.loads(cached)
    result = compute_fn()
    redis.setex(cache_key, ttl, json.dumps(result))
    return result
```

Invalidation

TTL-based. No active invalidation. Add ?no_cache=true param for admin tools that need fresh data.

Cache stampede mitigation

Add randomized TTL jitter (60s ± 10s) so cache keys don't all expire simultaneously.
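A minimal jitter sketch, using the 60s ± 10s figures above (function name is illustrative):

```python
import random

def jittered_ttl(base_ttl=60, jitter=10):
    """Randomize the TTL so cache keys for different apps/endpoints
    don't all expire in the same second and stampede Postgres together."""
    return base_ttl + random.randint(-jitter, jitter)

ttl = jittered_ttl()  # somewhere in [50, 70]
```

Pass the result as the `ttl` argument when setting the key, instead of a fixed constant.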


Layer 3 — Vue Query (browser-side)

Add vue-query@1.x (Vue 2 compatible) to the dashboard.

Benefits

  • Instant cache hits on repeat navigation — dashboard → conference detail → back: renders in <10ms from memory
  • Automatic deduplication — 3 charts requesting the same summary = 1 HTTP call
  • Stale-while-revalidate — show cached data instantly, refetch in background
  • Automatic retries on network errors

Pattern

```js
const { data, isLoading } = useQuery({
  queryKey: ['conferences-summary', appId, dateRange],
  queryFn: () => peermetrics.get('/v1/conferences/summary', { appId, created_at_gte: dateRange }),
  staleTime: 60_000,
  cacheTime: 300_000,
});
```

Implementation Phases

Phase 0 — Prerequisites

Phase 1 — Conferences chart (MVP)

  1. API: GET /v1/conferences/summary?appId=...&created_at_gte=...
  2. API: Add created_at_lte support to paginated /v1/conferences
  3. Web: conferencesChart.vue uses summary + paginated click handler
  4. Web: Remove big /v1/conferences call from app.vue graphs tab

Success metric: Conferences chart renders in <200ms.

Phase 2 — Remaining conference/issue/connection charts

  1. API: /conferences/duration-summary, /conferences/participant-count-summary
  2. API: /issues/summary?group_by=code
  3. API: /connections/setup-time-summary, /connections/summary?group_by=type
  4. API: Add issue_code, duration_gte/lt filters on /conferences
  5. Web: Migrate conferenceDurationChart, noParticipantsChart, mostCommongIssuesChart, callSetupTimeChart, connectionTypeChart

Phase 3 — Session charts

  1. API: /sessions/summary?group_by=browser|os|country
  2. API: Geo data endpoint for mapChart
  3. Web: Migrate browsersChart, osChart, mapChart
  4. Web: Remove the full /v1/sessions call (currently 3 MB)

Phase 4 — Redis caching

  1. Add cached_summary() helper
  2. Wrap each summary endpoint with appropriate TTL
  3. Cache keys include app_id, date range, filters
  4. Monitor cache hit rate in CloudWatch

Phase 5 — Vue Query

  1. Install vue-query@1.x (Vue 2 compat)
  2. Wrap dashboard with query client provider
  3. Refactor app.vue data fetching to use useQuery
  4. Migrate chart components to consume queries instead of props
  5. Remove ad-hoc state (this.data.conferences, this.paginatedConferences)

Phase 6 (recommended) — Cache pre-warming

Sessions Health has ~3 apps total. Pre-warming every app is cheap and worthwhile.

Implementation:

```python
# management/commands/warm_dashboard_cache.py
import json

class Command(BaseCommand):
    def handle(self, *args, **options):
        for app in App.objects.filter(is_active=True):
            for endpoint in SUMMARY_ENDPOINTS:
                key, ttl = cache_key_for(app, endpoint)
                result = endpoint.compute(app)
                redis.setex(key, ttl, json.dumps(result))
```

  • Run every 30 seconds (TTL is 60s, so the cache is always fresh)
  • Total DB load: ~30 queries per 30s cycle = 1 qps
  • Effect: no user ever hits a cold cache, not even the first load of the day

Edge Cases & Decisions

Dashboard filters (browser/OS/country/app version)

Filters run client-side today. With aggregation, they must be passed as query params to summary endpoints. Decision: include filter params from Phase 1 onward. Cache keys include filters so filtered/unfiltered queries are cached separately.

Redis key schema

conf_summary:{app_id}:{date_range_hash}:{filter_hash}
issues_summary:{app_id}:{date_range_hash}:{filter_hash}
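A possible key builder for that schema. The hashing choices (SHA-1, truncated hex) are assumptions — any stable digest of the normalized params works:

```python
import hashlib
import json

def summary_cache_key(prefix, app_id, date_range, filters=None):
    """Build a key like conf_summary:{app_id}:{date_range_hash}:{filter_hash}.
    Params are JSON-serialized with sorted keys before hashing, so query
    param order never produces a different cache key."""
    def digest(obj):
        return hashlib.sha1(
            json.dumps(obj, sort_keys=True).encode()
        ).hexdigest()[:12]
    return f"{prefix}:{app_id}:{digest(date_range)}:{digest(filters or {})}"

key = summary_cache_key(
    "conf_summary", 42,
    {"created_at_gte": "2024-05-01", "created_at_lte": "2024-05-08"},
    {"browser": "Chrome"},
)
```

This also covers the filters decision above: filtered and unfiltered requests hash to different keys and are cached separately.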

Vue Query + Vue 2 compatibility

Use vue-query@1.x (Vue 2 compat). @tanstack/vue-query@5.x is Vue 3 only.

Timezone

Aggregate in UTC for Phase 1. Frontend shows in browser timezone. Add timezone_offset param later if users complain.

Ongoing + mutually exclusive status

Conferences chart shows success | warning | error | ongoing as mutually exclusive. SQL uses FILTER with explicit AND NOT clauses (see query example above).

Rate limiting

Summary endpoints hit GenericView (2000/min per IP). Vue Query's dedup helps — multiple charts asking for the same data fire one request, not several.

Backward compatibility

Keep the old /v1/conferences list working. Summary endpoints are additive. Migrate chart-by-chart.

Cache stampede

Randomized TTL jitter (60s ± 10s) so keys don't expire together.

Pre-warming scope

With only ~3 apps, pre-warm every app every 30 seconds. No complex "which apps to warm" logic needed.


What to Measure

| Metric | Before | Target |
| --- | --- | --- |
| Dashboard load time (warm Redis) | 13s | ~50ms |
| Dashboard load time (Vue Query hit) | N/A | <10ms |
| Largest API response size | 21 MB | <50 KB |
| 504 error rate | growing | 0 |
| Postgres dashboard query time | 3-5s | <100ms |
| Redis cache hit rate | N/A | ~100% with pre-warming |

Dependencies

