Skip to content
This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

Introduce symdb #767

Merged
merged 48 commits into from Jun 26, 2023
Merged

Introduce symdb #767

merged 48 commits into from Jun 26, 2023

Conversation

kolesnikovae
Copy link
Contributor

@kolesnikovae kolesnikovae commented Jun 14, 2023

For context, please refer to the proposal.

TODO:

  • (@simonswine) Add mapping_name field to the Profile schema and ensure it is present at read time (merge step):
    • The field identifies the main binary of the profiled program and all its versions. It is supposed to be used as the partitioning key of the stack trace table:
      • Use Mapping.filename (mapping of the main binary) if it is available and valid (e.g is not "[vdso]");
      • Use service_name label if it is available (I'm not too sure if it should take precedence over Mapping.filename or not);
      • Use "unknown" otherwise.
    • The field is an index into the strings table.
    • The row order of the profiles.parquet table does not change (ordered by series ID);
    • Ensure backward compatibility.
  • (@kolesnikovae) Implement the wire format for the new symbols db;
  • (@kolesnikovae) Implement in-memory representation of the db;
  • (@cyriltovena) Integrate into phlaredb:
    • Write path
    • Read path
      • In-memory
      • File
      • Ensure backward compatibility
    • Block flush

@cyriltovena cyriltovena marked this pull request as ready for review June 26, 2023 08:17
@cyriltovena cyriltovena changed the title WIP: Introduce symdb Introduce symdb Jun 26, 2023
Copy link
Collaborator

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cyriltovena
Copy link
Collaborator

I'll merge this in few hours !

@cyriltovena cyriltovena merged commit ebc3e04 into main Jun 26, 2023
17 checks passed
@cyriltovena cyriltovena deleted the feat/symdb branch June 26, 2023 12:20
@luisgerhorst
Copy link

Unfortunately, this introduced a dependency on https://github.com/dgryski/go-groupvarint for which no license has been specified. I have created an issue there: dgryski/go-groupvarint#3

Could you update the dependency once the issue has been resolved? This will ensure automatic license checkers also recognize this is fine.

@dgryski
Copy link

dgryski commented Jun 30, 2023

License updated for groupvarint.

simonswine added a commit to simonswine/pyroscope that referenced this pull request Jun 30, 2023
* Increase parquet writer PageBufferSize

* reduce by 2 page buffer size

* Introduce symdb

* Add chunk format description

* Add chunk format description

* Improve naming

* Implement stack trace appender

* Limit chunk by number of nodes

* Stacktrace ID is uint32

* Add in-memory stacktrace resolver

* Add writer

* Add writer

* Fix stacktrace resolver

* Single pass write

* Index file refactoring

* Fixes, improvements, notes

* Ignore empty stacktraces

* Fix chunk boundary check

* Fix tests

* Store chunk headers sorted

* Make chunk index explicit

* Add file reader

* Use group varint encoding

* Refine stacktrace tree

* Stacktrace tree race condition elimination

* Remove unused stacktracesResolve.do

* Better nil coalescence in stack trace appender

* Format imports

* Use the new symDB package  (grafana/phlare#770)

* Ingest stacktraces in the new symdb

* Setup read in memory read path

* Fix up a comment placement

* Start setting up the read path

* Update to uint32

* Introduce stacktrace partition (grafana/phlare#775)

* Introduce stacktrace partition

This determines the partition of a particular profile, by looking first
at its metadata:

* If there is a `Filename` on the main mapping use its
  filepath.Base(Filename)
* Failing that take the externally supplied `service_name`
* Fallback to `unknown`

Take the underlying string value and hash.

* After a chat with cyril we decided to not longer mod and use the hash
straight away.

We don't wanted to risk the collisions of two very big stacktrace
applications.

* Remove reconstructMeta from singleBlockQuerier

* support multiple versions of stacktraces resolver

* Integrate v2 reader for stacktraces in block reader

* Fixes tests

* Rewrite locations Ids

* Rewrite test for counting uniq stacktraces

* lint and fmt

* Fixes more tests

* Fixes leftover from todo

---------

Co-authored-by: Christian Simon <simon@swine.de>

* Use prefixed bucket for symbols

* Initialize locationsIdsByStacktraceID

* Initialize locationsIdsByStacktraceID for pprof as well

* Fix chunk headers sort

* Inline node alloc

* Mapping filename extraction

* Tidy go.mod

* Fix TestHeadIngestStacktraces

* Use symdb.DefaultDirName

* Sort mappings on write

* Make column iterator to respect the context

* Fix unexpected EOF on stacktrace chunk unmarshal

* Fix symbols upload

* Fix symbols upload

* Release fetched data

* 3MB Page Buffer Size

* Sort stacktraces IDs as expected by the resolver

---------

Co-authored-by: Cyril Tovena <cyril.tovena@gmail.com>
Co-authored-by: Christian Simon <simon@swine.de>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants