Move internal metrics queries to ClickHouse replica#1463
Conversation
Switches loadTotalUsers, loadAuthOverview, and loadRecentlyActiveUsers to read from the ClickHouse analytics tables instead of hitting Postgres directly, and routes the remaining Postgres reads through $replica().
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe internal metrics route now reads selected Postgres lookups from Prisma's read replica and computes daily totals and auth overview aggregates from ClickHouse within {since, untilExclusive}, mapping daily results into a fixed 31-day series with zero-filled missing days. ChangesInternal metrics aggregation data source migration
Sequence Diagram(s)sequenceDiagram
participant Client as MetricsRoute (API)
participant CH as ClickHouse
participant PR as Prisma.$replica()
Client->>CH: windowed aggregations (users, teams, contacts) / per-day signed_up_at counts
CH-->>Client: aggregated results
Client->>PR: projectUser.findMany enrichment (user IDs) for country/recent lists
PR-->>Client: enriched ProjectUser records
Client->>Client: assemble response (zero-fill 31-day series, compute auth splits)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR migrates
Confidence Score: 4/5Safe to merge; the ClickHouse migrations are logically correct and the parallel Promise.all structure is sound, but whether NULL signed_up_at values in the ETL are coalesced before landing in ClickHouse determines whether the daily-users series undercounts. The ClickHouse queries are well-structured, zero-filling is handled correctly in JS, and the replica routing additions are straightforward. The open question is whether the ETL pipeline populates signed_up_at via COALESCE(signedUpAt, createdAt) — if not, users whose signedUpAt was NULL in Postgres will be silently dropped from the loadTotalUsers chart, reproducing an undercount that the old Postgres query did not have. apps/backend/src/app/api/latest/internal/metrics/route.tsx — specifically the signed_up_at filter in loadTotalUsers and whether the ETL guarantees that column is always non-NULL. Important Files Changed
Sequence DiagramsequenceDiagram
participant Handler as GET /internal/metrics
participant CH as ClickHouse (analytics_internal)
participant PG as Postgres Replica
Handler->>CH: loadTotalUsers (users FINAL, 30-day window)
CH-->>Handler: daily signup counts
Handler->>CH: loadAuthOverview usersRow (users FINAL)
Handler->>CH: loadAuthOverview teamsRow (teams FINAL)
Handler->>CH: loadDailyActiveUsersSplit / loadDailyActiveTeamsSplit / loadMAU
CH-->>Handler: aggregate counts + daily splits
Handler->>PG: loadActiveUsersByCountry ($replica)
PG-->>Handler: user rows for GeoIP enrichment
Handler->>PG: loadRecentlyActiveUsers ($replica, top-5 by lastActiveAt)
PG-->>Handler: 5 most recently active users
Reviews (3): Last reviewed commit: "Fix remaining getClickhouseAdminClient r..." | Re-trigger Greptile |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/backend/src/app/api/latest/internal/metrics/route.tsx`:
- Around line 544-545: The loadRecentlyActiveUsers function re-computes its own
timestamp causing inconsistent 30-day windows; change its signature to accept a
request-scoped Date (e.g., now: Date) and replace new Date() with that parameter
when calling getMetricsWindowBounds, then update all callers (including the
other occurrence noted around line 1585) to pass the same request-scoped now so
all loaders share the identical metrics window; reference
loadRecentlyActiveUsers(tenancy: Tenancy, includeAnonymous: boolean = false) and
Tenancy to locate and update the function and its call sites.
- Around line 550-573: The ClickHouse query in route.tsx that builds
`recently_active` currently filters events by `event_at >= {since}` and
`event_at < {untilExclusive}`, changing semantics by excluding users whose last
`$token-refresh` is older than 30 days; update the query used to compute
recently_active in the clickhouseClient.query call (the template string with
SELECT assumeNotNull(user_id) AS user_id, max(event_at) AS last_active ...) to
remove the time window filters (`AND event_at >= {since}` and `AND event_at <
{untilExclusive}`) or otherwise ensure it uses an unbounded time span so it
returns the latest RECENTLY_ACTIVE_USERS_LIMIT users regardless of age (you can
keep the other filters like projectId, branchId, includeAnonymous and keep using
formatClickhouseDateTimeParam elsewhere but do not restrict the token-refresh
selection by since/untilExclusive for this recently_active computation).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7618506d-37a7-495d-9388-512a4f11872d
📒 Files selected for processing (1)
apps/backend/src/app/api/latest/internal/metrics/route.tsx
There was a problem hiding this comment.
Pull request overview
This PR migrates internal admin metrics endpoints away from direct Postgres aggregation queries by reading user/team/auth aggregates from the ClickHouse analytics_internal tables, while also routing remaining Postgres reads through the Prisma replica client. It also makes the “recently active users” portion resilient to ClickHouse query failures by returning an empty list instead of failing the entire endpoint.
Changes:
- Moved
loadTotalUsersandloadAuthOverviewaggregates from Postgres SQL to ClickHouse queries overanalytics_internal.*. - Updated remaining Postgres lookups in metrics to use
prisma.$replica()(e.g., user joins for country/live/recent lists). - Reworked “recently active users” to be driven by ClickHouse
$token-refreshactivity, with a ClickHouseError-only fallback to[].
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Removed ClickHouse query fallback and directly utilized Prisma for fetching recently active users. The function now orders results by last active date and limits the output to the top 5 users, improving performance and simplifying error handling.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/backend/src/app/api/latest/internal/metrics/route.tsx (1)
324-324:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winFix missing/incorrect ClickHouse client usage in metrics route.
loadTotalUserscallsgetClickhouseAdminClient()(~line 324), butapps/backend/src/app/api/latest/internal/metrics/route.tsxonly importsgetClickhouseAdminClientForMetrics(no import forgetClickhouseAdminClient), sogetClickhouseAdminClientis undefined here.- Same issue in
loadAuthOverview(~line 1389).- Update both call sites to
getClickhouseAdminClientForMetrics()or explicitly importgetClickhouseAdminClientand document why the non-metrics client is required.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/backend/src/app/api/latest/internal/metrics/route.tsx` at line 324, The code calls getClickhouseAdminClient() in loadTotalUsers and loadAuthOverview but only imports getClickhouseAdminClientForMetrics, so replace the undefined calls with getClickhouseAdminClientForMetrics() at both call sites (or alternatively add an explicit import for getClickhouseAdminClient if the non-metrics admin client is actually required and add a short comment explaining why), and update any related type/usages to match the chosen client to ensure the client variable (e.g., clickhouseClient) is correctly instantiated and used.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@apps/backend/src/app/api/latest/internal/metrics/route.tsx`:
- Line 324: The code calls getClickhouseAdminClient() in loadTotalUsers and
loadAuthOverview but only imports getClickhouseAdminClientForMetrics, so replace
the undefined calls with getClickhouseAdminClientForMetrics() at both call sites
(or alternatively add an explicit import for getClickhouseAdminClient if the
non-metrics admin client is actually required and add a short comment explaining
why), and update any related type/usages to match the chosen client to ensure
the client variable (e.g., clickhouseClient) is correctly instantiated and used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c560751d-5994-4334-9030-a7643a97c047
📒 Files selected for processing (1)
apps/backend/src/app/api/latest/internal/metrics/route.tsx
Summary
loadTotalUsers,loadAuthOverview, andloadRecentlyActiveUsersoff direct Postgres queries to read from the ClickHouseanalytics_internaltables.projectUser.findManyreads inloadActiveUsersByCountryandloadRecentlyActiveUsersthrough$replica().loadRecentlyActiveUsersfalls back to an empty list on ClickHouse query failure (captured viacaptureError) rather than failing the whole metrics endpoint.Test plan
Summary by CodeRabbit