Optimize large dataset queries and retention #1
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5e3ed79c00
  }
  return db.prepare(`
-   SELECT * FROM wide_events
+   SELECT id, drop_id, trace_id, service_name, operation_type, field_name, outcome, duration_ms, user_id, error_count, rpc_call_count, created_at
Preserve attributes in dashboard samples
getRecentWideEvents now omits the attributes column, but /api/dashboards/generate still passes these rows into profileWideEvents, which parses attributes to discover candidate fields. As a result, generated dashboards lose all attribute-derived dimensions/metrics (and AI generation context) even when that data exists in wide_events, which materially reduces dashboard quality on real datasets.
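One way to keep the narrower query for list views while still feeding the profiling path is to make the extra column opt-in. The sketch below is an assumption, not the PR's code: `buildWideEventSelect` and its `includeAttributes` flag are hypothetical names, and only the column list and the `wide_events` table name are taken from the diff.

```typescript
// Columns copied from the narrowed SELECT in the diff.
const BASE_COLUMNS: readonly string[] = [
  "id", "drop_id", "trace_id", "service_name", "operation_type", "field_name",
  "outcome", "duration_ms", "user_id", "error_count", "rpc_call_count", "created_at",
];

// Hypothetical option: callers that profile rows ask for attributes explicitly.
interface WideEventSelectOptions {
  includeAttributes?: boolean;
}

// Builds the SELECT clause; the attributes column is added only on request,
// so ordinary paginated reads stay narrow.
function buildWideEventSelect(options: WideEventSelectOptions = {}): string {
  const columns = [...BASE_COLUMNS];
  if (options.includeAttributes) {
    columns.push("attributes");
  }
  return `SELECT ${columns.join(", ")} FROM wide_events`;
}
```

Under this sketch the dashboard generation route would call `buildWideEventSelect({ includeAttributes: true })`, and every other caller would keep the default.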
- export function getRecentWideEvents(dropId: number, limit = 100, offset = 0) {
+ export function getRecentWideEvents(dropId: number, limit = 100, offset = 0, beforeId?: number | null) {
    const lim = clampLimit(limit);
Honor requested dashboard sample limit
getRecentWideEvents now passes every requested limit through clampLimit (max 2,000), but the dashboard generation route still accepts limits up to 20,000 for profiling. Requests above 2,000 are therefore silently truncated, so generated dashboards can be based on a much smaller and less representative sample than the caller requested.
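A possible fix is to let the caller state its own ceiling instead of sharing one global cap. This is a sketch under assumptions: the real `clampLimit` signature is not shown in the diff, so `clampLimitWithMax` and both constants are hypothetical names; the 2,000 and 20,000 values come from the comment above, and the fallback of 100 mirrors the default `limit` in the function signature.

```typescript
// Cap applied to ordinary paginated reads (value quoted in the review).
const DEFAULT_MAX_LIMIT = 2000;
// Cap the dashboard generation route is said to accept (value quoted in the review).
const PROFILING_MAX_LIMIT = 20000;
// Fallback when the requested limit is not a usable number.
const FALLBACK_LIMIT = 100;

// Hypothetical variant of clampLimit that takes the ceiling as a parameter.
function clampLimitWithMax(limit: number, max: number = DEFAULT_MAX_LIMIT): number {
  if (!Number.isFinite(limit)) {
    return Math.min(FALLBACK_LIMIT, max);
  }
  // Round down, then keep the value inside [1, max].
  return Math.min(Math.max(1, Math.floor(limit)), max);
}
```

With this shape, the profiling path would pass `PROFILING_MAX_LIMIT` explicitly, while API pagination keeps the tighter default.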
Summary
Retention behavior
/api/admin/retention scheduler diagnostics and README env docs.
Verification
- npm run test:pagination passed.
- npm run test:retention-paths passed.
- npm run test:retention passed on 200k traces / 50k events: indexed predicates used, deleted 5k traces + 5k events in 10 batches (~25ms), fresh rows preserved, WAL checkpoint busy=0.
- npm run test:retention -- --traces=2100000 --events=200000 --batch=1000 --max-ms=100 deleted 16k traces + 15k events in 31 batches (~102.7ms), indexed predicates used, busy=false.
- npm run build passed.
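The batch counts and the --batch / --max-ms flags in the verification notes suggest a delete loop bounded by both batch size and a time budget. The following is a minimal in-memory sketch of that pattern, not the PR's implementation: the real code presumably issues SQL DELETE statements, and `deleteExpiredInBatches`, `RetentionRow`, and `RetentionResult` are hypothetical names.

```typescript
interface RetentionRow {
  id: number;
  createdAt: number; // epoch milliseconds
}

interface RetentionResult {
  deleted: number;
  batches: number;
  budgetExhausted: boolean; // true when the loop stopped because maxMs was reached
}

// Removes rows older than cutoff from the array, batchSize rows at a time,
// and stops once maxMs has elapsed so one run cannot hold the store for long.
function deleteExpiredInBatches(
  rows: RetentionRow[],
  cutoff: number,
  batchSize: number,
  maxMs: number,
  now: () => number = Date.now,
): RetentionResult {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const start = now();
  let deleted = 0;
  let batches = 0;
  for (;;) {
    // Collect the ids of up to batchSize expired rows.
    const batch = new Set<number>();
    for (let i = 0; i < rows.length && batch.size < batchSize; i++) {
      if (rows[i].createdAt < cutoff) {
        batch.add(rows[i].id);
      }
    }
    if (batch.size === 0) {
      return { deleted, batches, budgetExhausted: false };
    }
    // Compact the array in place, dropping the rows chosen for this batch.
    let write = 0;
    for (let read = 0; read < rows.length; read++) {
      if (!batch.has(rows[read].id)) {
        rows[write++] = rows[read];
      }
    }
    rows.length = write;
    deleted += batch.size;
    batches += 1;
    // Check the budget after each batch; leftover rows wait for the next run.
    if (now() - start >= maxMs) {
      return { deleted, batches, budgetExhausted: true };
    }
  }
}
```

Checking the budget after each batch rather than before means every run makes some progress even with a very small `maxMs`, at the cost of overshooting the budget by up to one batch.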