Cherry-pick #1344 to main: Fix ClickHouse OOM in /internal/metrics#1402

Merged
BilalG1 merged 1 commit into main from cherry-pick-clickhouse-metrics-fix on May 1, 2026
Conversation

@BilalG1 BilalG1 commented May 1, 2026

Summary

Cherry-pick of #1344 (85ae4b1c9) from dev onto main to address user-reported ClickHouse errors on /internal/metrics without merging the rest of dev.

The original PR fixes the MEMORY_LIMIT_EXCEEDED (7.2 GiB cap) hit by loadMonthlyActiveUsers and applies related optimizations to other queries on the same route. Endpoint-level: −27% peak memory, −82% max query duration, −99.95% bytes shipped to Node. See #1344 for full root-cause analysis, benchmark methodology, and results.

Why cherry-pick rather than merge dev

Users are hitting the OOM now and we don't want to ship the other 42 commits on dev to main yet. The patch applied cleanly — no other dev commits between the main/dev divergence point and #1344 touched apps/backend/src/app/api/latest/internal/metrics/route.tsx or the snapshot file.

Files

  • apps/backend/src/app/api/latest/internal/metrics/route.tsx — query rewrites
  • apps/e2e/tests/backend/endpoints/api/v1/__snapshots__/internal-metrics.test.ts.snap — one documented snapshot reclassification (deleted-user moves from "reactivated" to "new"; totals match either way — see #1344)
  • apps/backend/scripts/benchmark-internal-metrics.ts — local-only benchmark harness (new file)

Test plan

  • CI green on this branch
  • Sanity-check the dashboard renders correctly post-deploy (split charts, MAU counter, analytics overview)
  • Monitor Sentry for `Failed to load monthly active users for internal metrics` — should drop to zero

Fix ClickHouse OOM in MAU query + optimize /internal/metrics route (#1344)

## Summary

Fixes the Sentry `StackAssertionError: Failed to load monthly active
users for internal metrics` crash (ClickHouse OOM at the 7.2 GiB
per-query cap) and applies two related optimizations to other queries in
the same route while here. Adds a local benchmark harness that validates
correctness and measures peak memory / duration before & after.

## Root cause (the original Sentry error)

`loadMonthlyActiveUsers` was written as `SELECT user_id … GROUP BY
user_id` and then counting in Node via a `Set`. On a large project that
ships back millions of user_ids. Two failure modes stacked:

1. **Result materialization** — every distinct user_id had to be
buffered in the server before streaming to Node (~20 MiB of result for
450k users; much more at real scale).
2. **`JSONExtract(toJSONString(data), 'is_anonymous', 'UInt8')`** — the
`toJSONString(data)` per-row re-serialization of the entire nested JSON
column, billions of times, just to pull one boolean. Dominates
bytes-read.

Combined, on a single partition read from S3-backed MergeTree, this can
exceed ClickHouse's 7.2 GiB per-query memory cap. That's exactly what
the Sentry trace showed.

## Changes

### 1. Fix MAU query (`loadMonthlyActiveUsers`)

Moved counting to the server with
`uniqExact(sipHash64(normalized_user_id))` and pulled the JS-side
normalization (`lower`, `trim`, `isUuid`) into SQL. Picked `sipHash64`
after benchmarking 7 variants — it's exact (at <<2³² users) and halves
the uniqExact hash-state vs. raw string keys.
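To make the change concrete, here is a minimal sketch (illustrative code, not the actual `route.tsx`) of the normalization and Set-based counting the old pipeline did in Node — exactly the work that now happens server-side via `lower`/`trim`/`match` and `uniqExact(sipHash64(…))`:

```typescript
// Sketch of the OLD Node-side MAU pipeline. The new query performs the
// equivalent lower/trim/UUID-filter in SQL and returns a single integer,
// so none of these rows ever leave ClickHouse.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/;

function normalizeUserId(raw: string | null): string | null {
  if (raw === null) return null;
  const id = raw.trim().toLowerCase();
  // drop non-UUID ids, mirroring the SQL match(normalized_user_id, {uuidRe})
  return UUID_RE.test(id) ? id : null;
}

function countMauInNode(userIds: (string | null)[]): number {
  const seen = new Set<string>();
  for (const raw of userIds) {
    const id = normalizeUserId(raw);
    if (id !== null) seen.add(id);
  }
  return seen.size;
}
```

Shipping only the final integer instead of every element of `seen` is what removes the result-materialization failure mode described in the root cause.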

### 2. Replace `JSONExtract(toJSONString(data), …)` with direct `CAST(data.is_anonymous, …)`

Applied everywhere the pattern appeared in the metrics route:
- `loadDailyActiveUsers`
- the `analyticsUserJoin` subquery
- the `nonAnonymousAnalyticsUserFilter`
- `analyticsOverview:topRegion`
- `analyticsOverview:online`

Semantics preserved (`coalesce(CAST(data.is_anonymous,
'Nullable(UInt8)'), 0)` matches `JSONExtract(…, 'UInt8')` behavior when
the field is missing).
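A toy model in plain TypeScript (not ClickHouse itself) of why the two expressions agree when the field is absent — `JSONExtract(…, 'UInt8')` falls back to the type's default `0`, while the direct cast yields `NULL`, which `coalesce` also maps to `0`:

```typescript
// Toy model only: both paths must yield 0 for a missing is_anonymous field.
function viaJsonExtract(data: Record<string, unknown>): number {
  // JSONExtract(..., 'UInt8') returns the type default (0) when the key is absent
  const v = data["is_anonymous"];
  return v ? 1 : 0;
}

function viaDirectCast(data: Record<string, unknown>): number {
  // CAST(data.is_anonymous, 'Nullable(UInt8)') yields NULL for a missing key;
  // coalesce(..., 0) then maps NULL to 0
  const v = data["is_anonymous"] ?? null;
  return v === null ? 0 : (v ? 1 : 0);
}
```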

### 3. Server-aggregate the split queries

`loadDailyActiveUsersSplit` and `loadDailyActiveTeamsSplit` used to ship
1.2M+ `(day, user_id)` rows back to Node just so the JS could bucket
them into new / retained / reactivated. Rewrote both as one CTE-style
query that returns 31 rows (one per day in the 30-day window) with the
counts precomputed.

**Minor semantic shift** (documented inline in `route.tsx`): "new" is now based on the user's first-ever `$token-refresh` event rather than their Postgres `signedUpAt`. The two agree for users who log in immediately after sign-up (the common case). They disagree only in the rare edge case of an account that existed pre-window but never generated a `$token-refresh` until now — the old code classified it as "reactivated," the new code classifies it as "new." Judged acceptable; can be revisited.

Postgres round-trips for `ProjectUser.signedUpAt` / `Team.createdAt` are
no longer needed for the split, and the 76 MiB-ish wire ship is gone.
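As a rough model of what the server-side split now returns per day — assuming "new" means the first-ever event falls on that day, "retained" means also active the previous day, and "reactivated" means previously active but not the previous day (the authoritative rules are the ones documented inline in `route.tsx`):

```typescript
// Illustrative bucketing only; the real computation is a single ClickHouse
// query using lagInFrame + a first_date LEFT JOIN, returning 31 rows.
type Buckets = { new: number, retained: number, reactivated: number };

function classifyDay(
  day: string,                         // YYYY-MM-DD
  activeToday: Set<string>,            // entities with an event on `day`
  firstEventDay: Map<string, string>,  // entity -> first-ever event day
  activeYesterday: Set<string>,        // entities with an event on day - 1
): Buckets {
  const buckets: Buckets = { new: 0, retained: 0, reactivated: 0 };
  for (const user of activeToday) {
    if (firstEventDay.get(user) === day) buckets.new++;
    else if (activeYesterday.has(user)) buckets.retained++;
    else buckets.reactivated++;
  }
  return buckets;
}
```

The point of the rewrite is that this loop no longer runs in Node over 1.2M+ `(day, user_id)` rows — only the three counts per day cross the wire.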

### 4. Benchmark harness (`apps/backend/scripts/benchmark-internal-metrics.ts`)

Local-only tool. Three modes:
- **MAU equivalence matrix** — 13 edge cases (empty, dedup, anonymous
filter, window boundary, null user_id, non-UUID user_id, case variation,
project isolation, missing/null `is_anonymous`, wrong event_type).
Asserts OLD pipeline and NEW query return the **same set** of users, not
just the same count.
- **MAU perf** — OLD vs NEW plus 6 other candidate variants (inline
regex, UUID keys, sipHash64, HLL sketches), reads `memory_usage` /
`read_rows` / `result_bytes` from `system.query_log` for each, prints a
ranked table.
- **Full-route benchmark** (`BENCH_ROUTE_QUERIES=1`) — runs every
ClickHouse query in `/internal/metrics` in three stages (BEFORE, AFTER,
candidate OPTIMIZED) against the same seed and prints per-query deltas
plus endpoint-level totals.

Seeds under a synthetic `project_id` so real data is never touched;
cleans up on exit via `ALTER TABLE … DELETE`.

## Benchmark results

### MAU query alone

Ran at two scales; set-equality verified (new query identifies the same
individual users, not just the same count).

| seed | MAU | peak memory (old → new) | bytes read | duration |
|---|---|---|---|---|
| 500k events | 89,939 | 158.7 MiB → 46.7 MiB (**3.4×**, −70%) | 175.7 MiB → 63.0 MiB (2.8×) | 483 ms → 76 ms (**6.4×**) |
| 2.5M events | 449,990 | 439.2 MiB → 281.4 MiB (1.56×, −36%) | 865.0 MiB → 310.9 MiB (2.8×) | 783 ms → 126 ms (**6.2×**) |

MAU variant bake-off at 2.5M events (all exact, all set-equal to OLD):

| variant | memory | duration | notes |
|---|---|---|---|
| v0_old (baseline) | 440 MiB | 567 ms | — |
| v1_uniqExact_string | 284 MiB | 110 ms | naive fix |
| v3_uniqExact_toUUID | 244 MiB | 153 ms | UUID keys, slower per-row |
| **v4_uniqExact_sipHash64** | **125 MiB** | **95 ms** | **shipped** |
| v5_uniq (HLL) ~approx | 30 MiB | 86 ms | −0.25% error |
| v6_uniqCombined ~approx | 31 MiB | 67 ms | −0.15% error |

### Full `/internal/metrics` route (2.7M events, 300k users + page-views + clicks + teams)

Ranked by BEFORE peak memory:

| query | mem BEFORE | mem AFTER | Δ mem | dur BEFORE | dur AFTER | Δ dur |
|---|---|---|---|---|---|---|
| analyticsOverview:topReferrers | 588.1 MiB | 411.1 MiB | 1.43× | 1833 ms | 110 ms | **16.66×** |
| analyticsOverview:totalVisitors | 584.3 MiB | 403.5 MiB | 1.45× | 1829 ms | 121 ms | 15.12× |
| analyticsOverview:dailyEvents | 584.1 MiB | 403.7 MiB | 1.45× | 1897 ms | 140 ms | 13.55× |
| loadUsersByCountry | 393.1 MiB | 385.4 MiB | ≈same | 74 ms | 80 ms | ≈same |
| loadDailyActiveUsersSplit | 363.4 MiB | 396.8 MiB | *+9%* | 1966 ms | 356 ms | 5.52× |
| analyticsOverview:topRegion | 269.9 MiB | 106.4 MiB | 2.54× | 1602 ms | 65 ms | 24.65× |
| loadDailyActiveUsers | 268.3 MiB | 84.0 MiB | 3.19× | 1111 ms | 44 ms | 25.25× |
| loadDailyActiveTeamsSplit | 59.6 MiB | 78.1 MiB | *+31%* | 70 ms | 123 ms | *+76%* |
| loadMonthlyActiveUsers | 54.9 MiB | 54.9 MiB | ≈same | 68 ms | 56 ms | ≈same |
| analyticsOverview:online | 18.4 MiB | 5.8 MiB | 3.17× | 58 ms | 4 ms | 14.50× |

**Endpoint-level totals**

| metric | BEFORE | AFTER | Δ |
|---|---|---|---|
| Sum peak ClickHouse memory | 3.11 GiB | 2.28 GiB | **−27%** |
| **Max query duration** (endpoint wall-clock floor) | **1966 ms** | **356 ms** | **−82%** (5.5×) |
| Sum query duration (total CPU) | 10508 ms | 1099 ms | **−90%** (9.6×) |
| Bytes read | 10.70 GiB | 4.55 GiB | −57% |
| Bytes shipped to Node | 94.8 MiB | 44.2 KiB | **−99.95%** |

Both split queries show a small memory *regression* at this seed size (the new server-side window-function + self-join has its own state cost that's near break-even with "materialize + ship" at 300k users); at prod scale the 76 MiB ship saving dominates. Duration is unambiguously better.

## Why we don't need to drop the `analyticsUserJoin` in this PR

The benchmark includes an OPTIMIZED stage that drops the LEFT JOIN and
trusts `e.data.is_anonymous` directly, which would shave another **1.2
GiB / 1.9× duration** off the endpoint. **But we can't ship that here**
— an audit of the client tracker
(`packages/js/src/lib/stack-app/apps/implementations/event-tracker.ts`)
confirmed `is_anonymous` is never set on client-emitted `$page-view` /
`$click` events. The JOIN is currently load-bearing. A follow-up PR will
enrich `is_anonymous` at the batch ingest endpoint using
`auth.user.is_anonymous`; after one metrics-window cycle (~30 days) the
JOIN can be dropped.
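A hypothetical sketch of what that ingest-side enrichment could look like — the names and shapes here are illustrative, not the real batch endpoint:

```typescript
// Hypothetical enrichment step at batch ingest: stamp is_anonymous from the
// authenticated user onto client-emitted events that lack it, so the
// analytics-overview LEFT JOIN can eventually be dropped.
type IngestEvent = { event_type: string, data: Record<string, unknown> };

function enrichEvents(events: IngestEvent[], userIsAnonymous: boolean): IngestEvent[] {
  return events.map((e) =>
    e.data["is_anonymous"] === undefined
      ? { ...e, data: { ...e.data, is_anonymous: userIsAnonymous ? 1 : 0 } }
      : e, // never overwrite a value the client already set
  );
}
```

After one metrics-window cycle (~30 days) of enriched events, every row the overview queries touch would carry the flag directly.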

## Follow-up work (out of scope for this PR)

- **Batch-endpoint enrichment** + drop the analytics-overview LEFT JOIN
(est. further −53% endpoint memory, −46% duration per the benchmark).
- **Teams-split hash-variant count mismatch** — `sipHash64(team_id)`
variant of the teams split shows a count discrepancy vs. the
string-keyed version in the benchmark. Not blocking since teams-split is
only #8 by memory; needs a root-cause pass before shipping that
particular optimization.
- **`loadUsersByCountry` window bound** — currently scans every `$token-refresh` event ever for the tenancy (no time filter). Bounding to 30 days would keep memory from growing with project age, but changes semantics ("country of latest login ever" → "country of latest login in the last 30 days"). Deferred because it's product-facing.

## Snapshot changes in `internal-metrics.test.ts.snap`

The `should return metrics data with users` test signs in 10 users
today, then deletes one of them mid-test. Two small snapshot values
change on today's date; both are just a reclassification of that single
deleted user — the total (10 active users) is unchanged.

- **`daily_active_users_split.new[today]`: 9 → 10**
All 10 users really did sign in for the first time today. The old code
only counted 9 because the deleted user's Postgres row was gone by the
time the metrics query ran, so the old classifier couldn't see they were
created today. The new query looks at ClickHouse events directly, sees
the deleted user's first event was today, and counts them as new like
everyone else.

- **`daily_active_users_split.reactivated[today]`: 1 → 0**
No user was "reactivated" today — nobody was active on an earlier day
and came back. The old "1" was the deleted user falling into this bucket
by default (the old classifier had no other rule that fit them). The new
code correctly reports zero.

Totals match either way (9 + 1 = 10 + 0). We're moving one deleted user
out of the "returning visitor" bucket and into the "brand-new user"
bucket, which is what they actually were.

## Test plan

- [x] `pnpm typecheck` and `pnpm lint` pass on the backend package
- [x] MAU equivalence matrix: 13/13 cases return the same set of users
(not just the same count) between OLD and NEW pipelines
- [x] Set-equality verified at 500k-MAU perf scale
- [x] Full-route benchmark confirms the expected memory / duration
improvements
- [ ] Sanity-check the dashboard rendering after deploy (split charts,
MAU counter, analytics overview)
- [ ] Monitor Sentry for the assertion error — should drop to zero

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Performance Improvements**
  * Monthly and daily active metrics are now computed entirely server-side for faster queries and reduced client-side processing.

* **Bug Fixes**
  * More consistent handling of anonymous/missing IDs and stricter ID filtering to improve accuracy across edge cases.

* **Tests**
  * Added a comprehensive benchmark and validation harness to measure query performance and verify result equivalence across variants.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
vercel Bot commented May 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Updated (UTC) |
|---|---|---|
| stack-auth-hosted-components | Ready | May 1, 2026 7:43pm |
| stack-backend | Building | May 1, 2026 7:43pm |
| stack-dashboard | Building | May 1, 2026 7:43pm |
| stack-demo | Building | May 1, 2026 7:43pm |
| stack-docs | Building | May 1, 2026 7:43pm |
| stack-preview-backend | Building | May 1, 2026 7:43pm |
| stack-preview-dashboard | Building | May 1, 2026 7:43pm |


coderabbitai Bot commented May 1, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 242504ac-757f-4741-b5b1-a416e14bef59


@BilalG1 BilalG1 merged commit a04bcfa into main May 1, 2026
24 of 33 checks passed
@BilalG1 BilalG1 deleted the cherry-pick-clickhouse-metrics-fix branch May 1, 2026 19:43

greptile-apps Bot commented May 1, 2026

Greptile Summary

This cherry-pick fixes the ClickHouse MEMORY_LIMIT_EXCEEDED OOM in /internal/metrics by moving MAU counting server-side (uniqExact(sipHash64(...)) instead of shipping all user IDs to Node), rewriting the daily-activity-split queries to classify new/retained/reactivated entirely in ClickHouse via lagInFrame window functions (eliminating the Prisma fan-out), and replacing JSONExtract(toJSONString(data), ...) with direct Map path access (CAST(data.is_anonymous, 'Nullable(UInt8)')) across all is_anonymous filters.

The one snapshot change (deleted user shifts from "reactivated" → "new") is intentional and documented; a related semantic shift — where the new first_date subquery uses the entity's first non-anonymous event rather than Postgres signedUpAt — could similarly re-classify long-standing anonymous→non-anonymous users as "new" and is worth confirming as intended. The benchmark harness's equivalence matrix validates the uniqExact(string) variant (v1), not the sipHash64 variant (v4) actually deployed in production.

Confidence Score: 4/5

Safe to merge; fixes a live OOM with no correctness regressions beyond one documented and two minor semantic edge cases.

All findings are P2: one benchmark incongruence (matrix validates v1 not the shipped v4 sipHash64) and one undocumented semantic shift in new user classification for anonymous to non-anonymous transitions. No data loss, no broken contracts, and the totals are preserved in both cases. P2s alone cap confidence at 4/5.

apps/backend/scripts/benchmark-internal-metrics.ts — NEW_QUERY in runNew/runNewSet should mirror the production sipHash64 query; apps/backend/src/app/api/latest/internal/metrics/route.tsx — confirm intended behaviour of first_date classification when anonFilter is applied.

Important Files Changed

| Filename | Overview |
|---|---|
| `apps/backend/src/app/api/latest/internal/metrics/route.tsx` | Core fix: replaces JS-side MAU counting (the OOM root cause) with server-side ClickHouse aggregation; rewrites the daily-split queries to push new/retained/reactivated classification into ClickHouse via window functions, eliminating the Prisma fan-out; replaces `JSONExtract(toJSONString(data), ...)` with direct Map path access across all `is_anonymous` filters. |
| `apps/backend/scripts/benchmark-internal-metrics.ts` | New 1843-line local-only benchmark harness covering the equivalence matrix, perf run, and full-route comparison; the `runNew`/`runNewSet` helpers test `v1_uniqExact_string`, which differs from the `v4_sipHash64` variant actually shipped in `loadMonthlyActiveUsers`. |
| `apps/e2e/tests/backend/endpoints/api/v1/__snapshots__/internal-metrics.test.ts.snap` | One documented snapshot reclassification: a deleted user moves from the "reactivated" bucket to "new" (`first_date` now comes from ClickHouse events rather than Postgres `signedUpAt`); totals preserved. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Route as /internal/metrics route
    participant CH as ClickHouse
    participant PG as Postgres (replica)

    Note over Route,PG: BEFORE (OOM path)
    Client->>Route: GET /internal/metrics
    Route->>CH: SELECT assumeNotNull(user_id) GROUP BY user_id
    CH-->>Route: N x 36-byte user_id rows (7.2 GiB cap exceeded)
    Route->>PG: SELECT projectUserId, signedUpAt IN (activeUserIds)
    PG-->>Route: user creation dates
    Route-->>Client: split response

    Note over Route,PG: AFTER (this PR)
    Client->>Route: GET /internal/metrics
    Route->>CH: SELECT uniqExact(sipHash64(normalized_user_id)) AS mau
    CH-->>Route: single integer row
    Route->>CH: loadDailyActiveSplitFromClickhouse (lagInFrame + LEFT JOIN first_date)
    CH-->>Route: pre-aggregated new/retained/reactivated counts per day
    Note over PG: No Prisma fan-out for split classification
    Route-->>Client: split response
```
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
apps/backend/scripts/benchmark-internal-metrics.ts:127-141
**Benchmark validates v1, not the shipped v4**

`NEW_QUERY` (used by `runNew` and `runNewSet`) uses `uniqExact(normalized_user_id)` — string-based exact count. The production `loadMonthlyActiveUsers` ships `uniqExact(sipHash64(normalized_user_id))` (v4). The 13-case equivalence matrix in `runMatrix` therefore validates v1 against the old query, not the variant actually running in production. The v4 hash variant is tested in `runPerf` via the `VARIANTS` array, but that is a statistical perf comparison, not a logical equivalence check. If a dataset ever produced a `sipHash64` collision (64-bit, birthday-paradox probability ~N²/2⁶⁵), the matrix would not catch the discrepancy. Consider making `NEW_QUERY` match `loadMonthlyActiveUsers` exactly, or adding a v4-specific equivalence assertion in `runPerf`.
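For scale, the birthday bound cited above is easy to evaluate (a quick sketch using P ≈ N²/2⁶⁵; at the 450k-user benchmark scale it comes out around 5×10⁻⁹):

```typescript
// Approximate birthday-paradox upper bound on the probability of any
// sipHash64 (64-bit) collision among n distinct user ids: n^2 / 2^65.
function collisionUpperBound(n: number): number {
  return (n * n) / 2 ** 65;
}
```

Small, but non-zero — which is why the review suggests the equivalence matrix exercise the shipped v4 query rather than v1.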

### Issue 2 of 2
apps/backend/src/app/api/latest/internal/metrics/route.tsx:244-256
**`first_date` bounded by `anonFilter` changes "new" semantics for anonymous→non-anonymous users**

The `f` subquery that computes `first_date` applies the same `${anonFilter}` as `w`, so when `includeAnonymous = false`, `first_date` is the date of the entity's first *non-anonymous* `$token-refresh` event (not their actual sign-up date). A user who spent months as anonymous and then converted will appear as "new" on the day of their first non-anonymous event inside the window, even though their Postgres `signedUpAt` is much older. The old code looked up `signedUpAt` from Postgres for this classification. The PR documents the deleted-user reclassification in the snapshot, but this analogous case (anonymous→non-anonymous transition) could silently inflate "new" counts. Worth verifying this is the intended behaviour and documenting it alongside the existing snapshot note.

Reviews (1). Last reviewed commit: "Fix ClickHouse OOM in MAU query + optimi…"

Comment on lines +127 to +141
```ts
      AND project_id = {projectId:String}
      AND branch_id = {branchId:String}
      AND user_id IS NOT NULL
      AND event_at >= {since:DateTime}
      AND event_at < {untilExclusive:DateTime}
      AND ({includeAnonymous:UInt8} = 1 OR coalesce(CAST(data.is_anonymous, 'Nullable(UInt8)'), 0) = 0)
    )
    WHERE match(normalized_user_id, {uuidRe:String})
`;

type QueryParams = {
  projectId: string,
  branchId: string,
  since: Date,
  untilExclusive: Date,
```

P2 Benchmark validates v1, not the shipped v4

NEW_QUERY (used by runNew and runNewSet) uses uniqExact(normalized_user_id) — string-based exact count. The production loadMonthlyActiveUsers ships uniqExact(sipHash64(normalized_user_id)) (v4). The 13-case equivalence matrix in runMatrix therefore validates v1 against the old query, not the variant actually running in production. The v4 hash variant is tested in runPerf via the VARIANTS array, but that is a statistical perf comparison, not a logical equivalence check. If a dataset ever produced a sipHash64 collision (64-bit, birthday-paradox probability ~N²/2⁶⁵), the matrix would not catch the discrepancy. Consider making NEW_QUERY match loadMonthlyActiveUsers exactly, or adding a v4-specific equivalence assertion in runPerf.


Comment on lines +244 to +256
```ts
LEFT JOIN (
  SELECT
    assumeNotNull(${idCol}) AS entity_id,
    toDate(min(event_at)) AS first_date
  FROM analytics_internal.events
  WHERE event_type = '$token-refresh'
    AND project_id = {projectId:String}
    AND branch_id = {branchId:String}
    AND ${idCol} IS NOT NULL
    AND event_at < {untilExclusive:DateTime}
    ${anonFilter}
  GROUP BY entity_id
) AS f USING (entity_id)
```

P2 first_date bounded by anonFilter changes "new" semantics for anonymous→non-anonymous users

The f subquery that computes first_date applies the same ${anonFilter} as w, so when includeAnonymous = false, first_date is the date of the entity's first non-anonymous $token-refresh event (not their actual sign-up date). A user who spent months as anonymous and then converted will appear as "new" on the day of their first non-anonymous event inside the window, even though their Postgres signedUpAt is much older. The old code looked up signedUpAt from Postgres for this classification. The PR documents the deleted-user reclassification in the snapshot, but this analogous case (anonymous→non-anonymous transition) could silently inflate "new" counts. Worth verifying this is the intended behaviour and documenting it alongside the existing snapshot note.

