Skip to content

feat: adds direct_read optimization by default for logs and traces#2341

Merged
kodiakhq[bot] merged 5 commits into
mainfrom
aaron/fts-for-26-4
May 27, 2026
Merged

feat: adds direct_read optimization by default for logs and traces#2341
kodiakhq[bot] merged 5 commits into
mainfrom
aaron/fts-for-26-4

Conversation

@knudtty

@knudtty knudtty commented May 26, 2026

Copy link
Copy Markdown
Contributor

Summary

Enables the direct_read FTS optimization to CH seed by default. This optimization makes map exact queries extremely fast, simply relying on index evaluation and never requiring loading the map. A bug prevented this previously, but now works with the latest versions of 26.2, 26.3, 26.4, and all versions of 26.5 or greater.

References

  • Linear Issue: HDX-4352

@vercel

vercel Bot commented May 26, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
hyperdx-oss Ignored Ignored Preview May 26, 2026 11:55pm

Request Review

@changeset-bot

changeset-bot Bot commented May 26, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: 0e7ab98

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages
Name Type
@hyperdx/common-utils Minor
@hyperdx/app Patch
@hyperdx/api Patch
@hyperdx/otel-collector Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-actions github-actions Bot added the review/tier-4 Critical — deep review + domain expert sign-off label May 26, 2026
@github-actions

github-actions Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

  • Critical-path files (5):
    • docker/clickhouse/local/init-db-e2e.sh
    • docker/otel-collector/schema/seed/00002_otel_logs.sql
    • docker/otel-collector/schema/seed/00005_otel_traces.sql
    • docker/otel-collector/schema/seed/00005_otel_traces_compat.sql
    • packages/otel-collector/cmd/migrate/main.go

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats
  • Production files changed: 8
  • Production lines changed: 283 (+ 411 in test files, excluded from tier calculation)
  • Branch: aaron/fts-for-26-4
  • Author: knudtty

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions

github-actions Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

E2E Test Results

All tests passed • 188 passed • 3 skipped • 1263s

Status Count
✅ Passed 188
❌ Failed 0
⚠️ Flaky 4
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@github-actions

github-actions Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Deep Review

✅ No critical issues found. The diff is well-tested (new unit tests for version parsing, gate, and ALIAS/MATERIALIZED rewrite paths) and the version gate fails closed for unknown branches. The findings below are all P2 or below.

🟡 P2 -- recommended

  • .changeset/direct-read-map-default.md:13 -- The phrase "freshly migrated tables get the optimization automatically" is misleading because the seeds use CREATE TABLE IF NOT EXISTS, so on existing deployments the new ALIAS items columns and idx_*_items skip indexes are silently no-op'd and operators get no error or warning when their tables don't pick up the optimization on upgrade.

    • Fix: Rephrase the changeset to scope the automatic-optimization claim to fresh installs, and add an ALTER TABLE … ADD COLUMN IF NOT EXISTS / ADD INDEX IF NOT EXISTS / MATERIALIZE INDEX runbook snippet for operators upgrading existing logs and traces tables.
    • correctness, data-migrations, reliability
  • docker/otel-collector/schema/seed/00002_otel_logs.sql:34 -- Fresh tables lose idx_res_attr_value / idx_scope_attr_value / idx_log_attr_value (text(tokenizer='array') on mapValues), and fresh traces lose the analogous idx_*_value bloom_filter indexes in 00005_otel_traces.sql:33; the new items index only covers 'key=value' tokens, so ad-hoc SQL or external dashboards that filter on a bare attribute value without a key (e.g. has(mapValues(LogAttributes), 'someUUID')) now do a full scan on freshly created tables.

    • Fix: Either keep the mapValues indexes alongside the new items indexes, or document the index-coverage tradeoff in the changeset so operators relying on bare-value lookups can decide whether to re-add them via ALTER TABLE.
    • correctness, data-migrations
🔵 P3 nitpicks (7)
  • packages/otel-collector/cmd/migrate/main.go:418 -- swapTracesSchemaForCompat does os.Remove then os.Rename, but on POSIX os.Rename overwrites the destination atomically; the two-step form leaves the temp dir without a 00005_otel_traces.sql between the calls if a SIGKILL lands there, mirroring the same pattern in the existing logs swap.

    • Fix: Drop the os.Remove and let os.Rename(compatPath, fullTextPath) do the atomic overwrite (and apply the same simplification to swapLogsSchemaForCompat).
    • correctness, data-migrations
  • docker/clickhouse/local/init-db-e2e.sh:89 -- e2e_otel_traces keeps the pre-existing idx_res_attr_value / idx_span_attr_value bloom_filter indexes while also adding the new items indexes, so the e2e environment exercises a different index inventory than the production traces seed (which removes the value indexes).

    • Fix: Remove idx_res_attr_value and idx_span_attr_value from e2e_otel_traces so the e2e plan matches the production fresh-install plan.
    • correctness, data-migrations
  • docker/clickhouse/local/init-db-e2e.sh:1 -- e2e_otel_logs is not updated to add ResourceAttributeItems / ScopeAttributeItems / LogAttributeItems ALIAS columns or replace the mapValues text indexes, so the e2e logs schema diverges from production and the items-index rewrite path is not exercised end-to-end for logs.

    • Fix: Mirror the 00002_otel_logs.sql schema change in e2e_otel_logs so e2e tests cover the new items-column rewrite path for logs.
    • data-migrations
  • packages/common-utils/src/core/metadata.ts:946 -- getServerVersion's broad catch returns undefined for every error class, but every other metadata fetcher in this file (getSetting, getSettings, getSkipIndices, getOtelTables) only swallows Not enough privileges and rethrows; the inconsistency hides parse failures and transient transport errors as silent "unknown version" downgrades.

    • Fix: Narrow the catch to recognise only Not enough privileges (and explicit parse-returns-undefined), and let other errors propagate to match the sibling methods.
    • reliability, kieran-typescript
  • packages/otel-collector/cmd/migrate/main.go:524 -- The compat-fallback path emits a single default-level log.Printf and the run still ends with the usual success message, so operators have no elevated signal that the cluster is on a downgraded schema that won't auto-upgrade when ClickHouse is later patched past 26.2.

    • Fix: Prefix the fallback line with WARNING: (matching the existing WARNING: Failed to list SQL files style on main.go:533) and repeat it at the end of a successful compat run.
  • packages/common-utils/src/core/clickhouseVersion.ts:88 -- The per-branch minima [26, 2, 19, 43], [26, 3, 12, 3], [26, 4, 3, 37] are hand-curated with no link to the upstream ClickHouse PR / changelog, so a future maintainer can't audit whether the cutoffs are correct without re-deriving them from scratch.

    • Fix: Add an inline comment pointing to the upstream ClickHouse PR or release-notes entry that landed the backport for each branch.
    • data-migrations
  • packages/common-utils/src/core/clickhouseVersion.ts:27 -- raw.match(/^\d+/) silently strips any non-digit suffix from the patch/tweak components, so an input like 26.4.3rc1.37 parses to [26, 4, 3, 37] and would be treated as a supported stable release; not a concern for canonical CH version strings but the silent fallback isn't pinned by tests.

    • Fix: Either tighten the per-component regex to /^\d+$/ (and document that pre-release suffixes are unsupported) or add a test that pins the current liberal behaviour so future tightening doesn't regress silently.
    • correctness, kieran-typescript

Reviewers (6): correctness, data-migrations, performance, kieran-typescript, reliability, deployment-verification.

Testing gaps:

  • No integration test runs the migrator against a pre-seeded existing otel_logs / otel_traces table to verify behaviour on the upgrade path (the changeset's "freshly migrated tables" claim is unverified by CI).
  • No test exercises MetadataCache.getOrFetch when the fetcher resolves to undefined; the current cache semantics (next call re-fetches) are load-bearing for getServerVersion retry behaviour but only implicit in the != null check.
  • parseClickHouseVersion has no test for embedded non-numeric patch/tweak components (e.g. 26.4.3rc1.37); the silent-coerce-to-leading-digits behaviour is unpinned.
  • The new idxExpr === candidate.default_expression branch in buildKvItemsLookup is unit-tested with a hand-crafted ALIAS expression, but no integration test verifies it matches ClickHouse's actual rendering of system.data_skipping_indices.expr for an ALIAS column.

@kodiakhq kodiakhq Bot merged commit dcab1cb into main May 27, 2026
18 checks passed
@kodiakhq kodiakhq Bot deleted the aaron/fts-for-26-4 branch May 27, 2026 00:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge review/tier-4 Critical — deep review + domain expert sign-off

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants