*: Phase 2.A — drop dynparquet/pqarrow from write/ingest path #6354
Open
thorfour wants to merge 1 commit into
Phase 2.A of the FrostDB removal. Replaces the dynparquet-driven Arrow schema construction and parquet-buffer ingest path on the WriteRaw/OTLP/V2 ingest sides with direct Arrow record construction.

What changed:

* pkg/profile/schema.go: adds BuildArrowSchema(labelNames), which constructs the parca write Arrow schema from the proto column definitions, expanding the dynamic ColumnLabels into one "labels.<name>" field per labelName. No dynparquet roundtrip.
* pkg/normalizer/normalizer.go::WriteRawRequestToArrowRecord: drops the *dynparquet.Schema parameter, drops the schema.GetDynamicParquetSchema -> pqarrow.ParquetSchemaToArrowSchema detour, and drops the trailing arrowutils.SortRecord/Take. Builds Arrow directly. SampleToParquetRow is deleted (it was only used by the OTel parquet path).
* pkg/normalizer/otel.go: rewrites OtlpRequestToArrowRecord to skip the parquet detour entirely. The previous flow built parquet rows, wrote them into a dynparquet.Buffer, sorted, then converted to Arrow. The new flow builds Arrow records directly via array.RecordBuilder. The buffer.Sort() is dropped: ClickHouse ORDERs BY at table level, so we don't need sorted Arrow input.
* pkg/normalizer/arrow.go::arrowToInternalConverter.NewRecord: drops the trailing arrowutils.SortRecord/Take on the V2 ingest path, for the same reason. The constructor no longer takes a *dynparquet.Schema.
* pkg/profilestore/profilecolumnstore.go: drops the *dynparquet.Schema field and constructor argument; updates the three normalizer call sites.
* pkg/parca/parca.go: drops the dynparquet.SchemaFromDefinition call and the now-unused profile / dynparquet imports.
* Tests (parca_test.go, profilestore_test.go, ingester_test.go, columnquery_test.go, query_test.go, arrow_v2_test.go): drop the schema arguments threaded into the changed signatures. No behaviour changes.
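The label expansion described for BuildArrowSchema can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code from the PR: the Field type, the expandLabelFields helper, the static column names, and the "dictionary<utf8>" type string are all hypothetical stand-ins, and the real function returns an Arrow schema built from the proto column definitions. Placing the label fields after the static fields is also an assumption.

```go
package main

import "fmt"

// Field is a hypothetical stand-in for an Arrow schema field.
// The real code would use the Arrow library's field type instead.
type Field struct {
	Name     string
	Type     string
	Nullable bool
}

// expandLabelFields (hypothetical name) copies the static fields and
// appends one "labels.<name>" field per label name, in input order.
// The field type and nullability chosen here are assumptions.
func expandLabelFields(static []Field, labelNames []string) []Field {
	fields := make([]Field, 0, len(static)+len(labelNames))
	fields = append(fields, static...)
	for _, name := range labelNames {
		fields = append(fields, Field{
			Name:     "labels." + name,
			Type:     "dictionary<utf8>",
			Nullable: true,
		})
	}
	return fields
}

func main() {
	// Illustrative static columns only; not the full parca write schema.
	static := []Field{
		{Name: "timestamp", Type: "int64"},
		{Name: "value", Type: "int64"},
	}
	for _, f := range expandLabelFields(static, []string{"job", "node"}) {
		fmt.Println(f.Name, f.Type)
	}
}
```

The point of the sketch is only that the set of label columns is derived from the label names seen in the request, with no intermediate parquet schema involved.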
dynparquet/pqarrow are still imported by the FrostDB-only test scaffolding (frostdb.New, frostdb.NewTableConfig) and by the query side (pkg/query, pkg/parcacol/querier.go); those go in Phase 2.B/2.C.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
✅ Meticulous spotted 0 visual differences across 288 screens tested. Meticulous evaluated ~4 hours of user flows against your PR.
Summary

Phase 2.A of the FrostDB removal. Stacks on #6353 (Phase 1).

Replaces the `dynparquet`-driven Arrow schema construction and parquet-buffer ingest path on the WriteRaw / OTLP / V2 ingest sides with direct Arrow record construction.

Specifics
* `pkg/profile/schema.go`: adds `BuildArrowSchema(labelNames []string) *arrow.Schema`. Constructs the parca write Arrow schema directly from the proto column definitions, expanding the dynamic `labels` column into one `labels.<name>` field per name. No dynparquet roundtrip.
* `pkg/normalizer/normalizer.go::WriteRawRequestToArrowRecord`: drops the `*dynparquet.Schema` parameter and the `schema.GetDynamicParquetSchema` → `pqarrow.ParquetSchemaToArrowSchema` detour. Drops the trailing `arrowutils.SortRecord`/`Take`. `SampleToParquetRow` is deleted (it was only used by the OTel parquet path).
* `pkg/normalizer/otel.go::OtlpRequestToArrowRecord`: full rewrite. Previous flow: pprof → parquet rows → `dynparquet.Buffer` → `Sort` → `pqarrow.NewParquetConverter` → Arrow. New flow: pprof → `array.RecordBuilder` → Arrow. The buffer-sort step is dropped because ClickHouse `ORDER BY`s at table level.
* `pkg/normalizer/arrow.go::arrowToInternalConverter.NewRecord`: drops the trailing `arrowutils.SortRecord`/`Take` on the V2 ingest path. The constructor no longer takes a `*dynparquet.Schema`.
* `pkg/profilestore/profilecolumnstore.go`: drops the `*dynparquet.Schema` field and constructor argument. Updates the three normalizer call sites.
* `pkg/parca/parca.go`: drops the `dynparquet.SchemaFromDefinition` call.
* Tests (`parca_test.go`, `profilestore_test.go`, `ingester_test.go`, `columnquery_test.go`, `query_test.go`, `arrow_v2_test.go`): drop the schema arguments threaded into the changed signatures. No behaviour changes.

Behavioural notes
* `pkg/clickhouse/ingester.go`, the V2 path) look up columns by name, so order is irrelevant.

What's still on FrostDB after this
* `pkg/query/{flamegraph_arrow,table}.go`: `pqarrow/builder` `Opt*` / `RecordBuilder` / `ListBuilder`. → Phase 2.C.
* `pkg/query/columnquery.go`: `arrowutils.SortRecord`/`Take`/`MergeRecords` on the query side. → Phase 2.C.
* `pkg/parcacol/querier.go`: FrostDB-only querier. → Phase 3 (test rework).
* `dynparquet.SchemaFromDefinition` still wired up via the legacy `profile.Schema()` for tests. → Phase 3.

Test plan
* `go build ./...`
* `go vet ./...`
* `go test -short ./...`

🤖 Generated with Claude Code
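The behavioural note that consumers look up columns by name can be illustrated with a small sketch. It assumes a hypothetical Record type and ColumnByName helper built on the standard library only; the actual ingester works on Arrow records, and none of these names come from the PR.

```go
package main

import "fmt"

// Record is a hypothetical minimal columnar record: Names[i] labels Columns[i].
type Record struct {
	Names   []string
	Columns [][]int64
}

// ColumnByName (hypothetical helper) finds a column by its name rather than
// by its position, so the physical column order of the record does not matter.
func (r Record) ColumnByName(name string) ([]int64, bool) {
	for i, n := range r.Names {
		if n == name {
			return r.Columns[i], true
		}
	}
	return nil, false
}

func main() {
	// Two records with the same data but a different column order.
	a := Record{
		Names:   []string{"timestamp", "value"},
		Columns: [][]int64{{1, 2}, {10, 20}},
	}
	b := Record{
		Names:   []string{"value", "timestamp"},
		Columns: [][]int64{{10, 20}, {1, 2}},
	}

	va, _ := a.ColumnByName("value")
	vb, _ := b.ColumnByName("value")
	fmt.Println(va, vb) // prints "[10 20] [10 20]"
}
```

A reader that resolves columns this way sees identical data whichever order the writer emitted them in, which is the property the note relies on.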