
*: Phase 2.A — drop dynparquet/pqarrow from write/ingest path#6354

Open
thorfour wants to merge 1 commit into remove-frostdb from remove-frostdb-phase2

Conversation

Contributor

@thorfour thorfour commented May 8, 2026

Summary

Phase 2.A of the FrostDB removal. Stacks on #6353 (Phase 1).

Replaces the dynparquet-driven Arrow schema construction and parquet-buffer ingest path on the WriteRaw / OTLP / V2 ingest sides with direct Arrow record construction.

Specifics

  • pkg/profile/schema.go — adds BuildArrowSchema(labelNames []string) *arrow.Schema. Constructs the parca write Arrow schema directly from the proto column definitions, expanding the dynamic labels column into one labels.<name> field per name. No dynparquet roundtrip.
  • pkg/normalizer/normalizer.go::WriteRawRequestToArrowRecord — drops the *dynparquet.Schema parameter and the schema.GetDynamicParquetSchema → pqarrow.ParquetSchemaToArrowSchema detour. Drops the trailing arrowutils.SortRecord/Take. SampleToParquetRow is deleted (it was only used by the OTel parquet path).
  • pkg/normalizer/otel.go::OtlpRequestToArrowRecord — full rewrite. Previous flow: pprof → parquet rows → dynparquet.Buffer → Sort → pqarrow.NewParquetConverter → Arrow. New flow: pprof → array.RecordBuilder → Arrow. The buffer-sort step is dropped because ClickHouse applies ORDER BY at the table level.
  • pkg/normalizer/arrow.go::arrowToInternalConverter.NewRecord — drops the trailing arrowutils.SortRecord/Take on the V2 ingest path. The constructor no longer takes a *dynparquet.Schema.
  • pkg/profilestore/profilecolumnstore.go — drops the *dynparquet.Schema field + constructor argument. Updates the three normalizer call sites.
  • pkg/parca/parca.go — drops the dynparquet.SchemaFromDefinition call.
  • Tests (parca_test.go, profilestore_test.go, ingester_test.go, columnquery_test.go, query_test.go, arrow_v2_test.go) — drop the schema arguments threaded into the changed signatures. No behaviour changes.

Behavioural notes

  • The Arrow records produced now come out unsorted. ClickHouse handles ordering at insert time, and the V2/symbolizer downstream doesn't rely on input order. All existing tests pass without modification.
  • Arrow column order changes from "whatever pqarrow produced" to the proto definition order. Consumers (pkg/clickhouse/ingester.go, the V2 path) look up columns by name, so order is irrelevant.

What's still on FrostDB after this

  • pkg/query/{flamegraph_arrow,table}.go — pqarrow/builder Opt* / RecordBuilder / ListBuilder. → Phase 2.C.
  • pkg/query/columnquery.go — arrowutils.SortRecord / Take / MergeRecords on the query side. → Phase 2.C.
  • pkg/parcacol/querier.go — FrostDB-only querier. → Phase 3 (test rework).
  • Test scaffolding that spins up a real frostdb columnstore. → Phase 3.
  • dynparquet.SchemaFromDefinition still wired up via the legacy profile.Schema() for tests. → Phase 3.

Test plan

  • go build ./...
  • go vet ./...
  • go test -short ./...
  • CI green
  • Manual smoke: ingest a pprof + an OTLP profile through a Parca pointed at ClickHouse, verify rows land

🤖 Generated with Claude Code

Phase 2.A of the FrostDB removal. Replaces the dynparquet-driven Arrow
schema construction and parquet-buffer ingest path on the
WriteRaw/OTLP/V2 ingest sides with direct Arrow record construction.

What changed:

* pkg/profile/schema.go: adds BuildArrowSchema(labelNames) which
  constructs the parca write Arrow schema from the proto column
  definitions, expanding the dynamic ColumnLabels into one
  "labels.<name>" field per labelName. No dynparquet roundtrip.
* pkg/normalizer/normalizer.go::WriteRawRequestToArrowRecord: drops the
  *dynparquet.Schema parameter, drops the
  schema.GetDynamicParquetSchema -> pqarrow.ParquetSchemaToArrowSchema
  detour, drops the trailing arrowutils.SortRecord/Take. Builds Arrow
  directly. SampleToParquetRow is deleted (was only used by the OTel
  parquet path).
* pkg/normalizer/otel.go: rewrites OtlpRequestToArrowRecord to skip the
  parquet detour entirely. The previous flow built parquet rows, wrote
  them into a dynparquet.Buffer, sorted, then converted to Arrow. The
  new flow builds Arrow records directly via array.RecordBuilder. The
  buffer.Sort() is dropped — ClickHouse applies ORDER BY at the table
  level, so we don't need sorted Arrow input.
* pkg/normalizer/arrow.go::arrowToInternalConverter.NewRecord: drops
  the trailing arrowutils.SortRecord/Take on the V2 ingest path; same
  reasoning. The constructor no longer takes a *dynparquet.Schema.
* pkg/profilestore/profilecolumnstore.go: drops the *dynparquet.Schema
  field and constructor argument; updates the three normalizer call
  sites.
* pkg/parca/parca.go: drops the dynparquet.SchemaFromDefinition call
  and the now-unused profile / dynparquet imports.
* Tests (parca_test.go, profilestore_test.go, ingester_test.go,
  columnquery_test.go, query_test.go, arrow_v2_test.go): drop the
  schema arguments threaded into the changed signatures. No behaviour
  changes.

dynparquet/pqarrow are still imported by the FrostDB-only test
scaffolding (frostdb.New, frostdb.NewTableConfig) and by the query side
(pkg/query, pkg/parcacol/querier.go); those go in Phase 2.B/2.C.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@thorfour thorfour requested a review from a team as a code owner May 8, 2026 18:47
@alwaysmeticulous

alwaysmeticulous Bot commented May 8, 2026

✅ Meticulous spotted 0 visual differences across 288 screens tested: view results.

Meticulous evaluated ~4 hours of user flows against your PR.

Last updated for commit 9f1aae5 *: Drop dynparquet/pqarrow from the parca write/ingest path. This comment will update as new commits are pushed.
