This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

Use the new symDB package #770

Merged on Jun 20, 2023 (21 commits)

cyriltovena
Collaborator

@cyriltovena cyriltovena commented Jun 15, 2023

Todos

  • Create a v2 block
  • Flush the new file
  • Read from the file.

cyriltovena and others added 21 commits June 15, 2023 09:52
* Introduce stacktrace partition

This determines the partition of a particular profile, by looking first
at its metadata:

* If there is a `Filename` on the main mapping, use
  `filepath.Base(Filename)`
* Failing that, take the externally supplied `service_name`
* Fall back to `unknown`

Take the resulting string value and hash it.

* After a chat with cyril we decided to no longer apply a modulo and to
use the hash directly.

We didn't want to risk collisions between two very large stacktrace
applications.
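
The partition selection described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `partitionKey` and `stacktracePartition` are hypothetical, and FNV-1a from the Go standard library is used only as a stand-in, since the commit message does not say which hash function is used.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"path/filepath"
)

// partitionKey picks the string that identifies a profile's partition.
// Hypothetical helper: the parameters stand in for the main mapping's
// Filename and the externally supplied service_name.
func partitionKey(mainMappingFilename, serviceName string) string {
	if mainMappingFilename != "" {
		// Prefer the base name of the main mapping's file.
		return filepath.Base(mainMappingFilename)
	}
	if serviceName != "" {
		// Failing that, use the service name.
		return serviceName
	}
	// Last resort.
	return "unknown"
}

// stacktracePartition hashes the key and returns the full 64-bit value.
// No modulo is applied, so two large applications only share a partition
// if their keys hash to the same value.
// Assumption: FNV-1a is an illustrative choice of hash function.
func stacktracePartition(mainMappingFilename, serviceName string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(partitionKey(mainMappingFilename, serviceName)))
	return h.Sum64()
}

func main() {
	fmt.Println(partitionKey("/usr/bin/myapp", "checkout"))
	fmt.Println(partitionKey("", "checkout"))
	fmt.Println(partitionKey("", ""))
	fmt.Println(stacktracePartition("/usr/bin/myapp", "checkout"))
}
```

Because only the base name of the file is hashed, the same binary installed under different paths maps to the same partition in this sketch.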
@cyriltovena cyriltovena merged commit e783311 into feat/symdb Jun 20, 2023
16 of 17 checks passed
@cyriltovena cyriltovena deleted the feat/symdb-write-path branch June 20, 2023 15:36
cyriltovena added a commit that referenced this pull request Jun 26, 2023
* Increase parquet writer PageBufferSize

* reduce by 2 page buffer size

* Introduce symdb

* Add chunk format description

* Add chunk format description

* Improve naming

* Implement stack trace appender

* Limit chunk by number of nodes

* Stacktrace ID is uint32

* Add in-memory stacktrace resolver

* Add writer

* Add writer

* Fix stacktrace resolver

* Single pass write

* Index file refactoring

* Fixes, improvements, notes

* Ignore empty stacktraces

* Fix chunk boundary check

* Fix tests

* Store chunk headers sorted

* Make chunk index explicit

* Add file reader

* Use group varint encoding

* Refine stacktrace tree

* Stacktrace tree race condition elimination

* Remove unused stacktracesResolve.do

* Better nil coalescence in stack trace appender

* Format imports

* Use the new symDB package  (#770)

* Ingest stacktraces in the new symdb

* Set up in-memory read path

* Fix up a comment placement

* Start setting up the read path

* Update to uint32

* Introduce stacktrace partition (#775)

* Introduce stacktrace partition

This determines the partition of a particular profile, by looking first
at its metadata:

* If there is a `Filename` on the main mapping, use
  `filepath.Base(Filename)`
* Failing that, take the externally supplied `service_name`
* Fall back to `unknown`

Take the resulting string value and hash it.

* After a chat with cyril we decided to no longer apply a modulo and to
use the hash directly.

We didn't want to risk collisions between two very large stacktrace
applications.

* Remove reconstructMeta from singleBlockQuerier

* support multiple versions of stacktraces resolver

* Integrate v2 reader for stacktraces in block reader

* Fixes tests

* Rewrite locations Ids

* Rewrite test for counting uniq stacktraces

* lint and fmt

* Fixes more tests

* Fixes leftover from todo

---------

Co-authored-by: Christian Simon <simon@swine.de>

* Use prefixed bucket for symbols

* Initialize locationsIdsByStacktraceID

* Initialize locationsIdsByStacktraceID for pprof as well

* Fix chunk headers sort

* Inline node alloc

* Mapping filename extraction

* Tidy go.mod

* Fix TestHeadIngestStacktraces

* Use symdb.DefaultDirName

* Sort mappings on write

* Make column iterator respect the context

* Fix unexpected EOF on stacktrace chunk unmarshal

* Fix symbols upload

* Fix symbols upload

* Release fetched data

* 3MB Page Buffer Size

* Sort stacktraces IDs as expected by the resolver

---------

Co-authored-by: Cyril Tovena <cyril.tovena@gmail.com>
Co-authored-by: Christian Simon <simon@swine.de>
simonswine added a commit to simonswine/pyroscope that referenced this pull request Jun 30, 2023
* Increase parquet writer PageBufferSize

* reduce by 2 page buffer size

* Introduce symdb

* Add chunk format description

* Add chunk format description

* Improve naming

* Implement stack trace appender

* Limit chunk by number of nodes

* Stacktrace ID is uint32

* Add in-memory stacktrace resolver

* Add writer

* Add writer

* Fix stacktrace resolver

* Single pass write

* Index file refactoring

* Fixes, improvements, notes

* Ignore empty stacktraces

* Fix chunk boundary check

* Fix tests

* Store chunk headers sorted

* Make chunk index explicit

* Add file reader

* Use group varint encoding

* Refine stacktrace tree

* Stacktrace tree race condition elimination

* Remove unused stacktracesResolve.do

* Better nil coalescence in stack trace appender

* Format imports

* Use the new symDB package  (grafana/phlare#770)

* Ingest stacktraces in the new symdb

* Set up in-memory read path

* Fix up a comment placement

* Start setting up the read path

* Update to uint32

* Introduce stacktrace partition (grafana/phlare#775)

* Introduce stacktrace partition

This determines the partition of a particular profile, by looking first
at its metadata:

* If there is a `Filename` on the main mapping, use
  `filepath.Base(Filename)`
* Failing that, take the externally supplied `service_name`
* Fall back to `unknown`

Take the resulting string value and hash it.

* After a chat with cyril we decided to no longer apply a modulo and to
use the hash directly.

We didn't want to risk collisions between two very large stacktrace
applications.

* Remove reconstructMeta from singleBlockQuerier

* support multiple versions of stacktraces resolver

* Integrate v2 reader for stacktraces in block reader

* Fixes tests

* Rewrite locations Ids

* Rewrite test for counting uniq stacktraces

* lint and fmt

* Fixes more tests

* Fixes leftover from todo

---------

Co-authored-by: Christian Simon <simon@swine.de>

* Use prefixed bucket for symbols

* Initialize locationsIdsByStacktraceID

* Initialize locationsIdsByStacktraceID for pprof as well

* Fix chunk headers sort

* Inline node alloc

* Mapping filename extraction

* Tidy go.mod

* Fix TestHeadIngestStacktraces

* Use symdb.DefaultDirName

* Sort mappings on write

* Make column iterator respect the context

* Fix unexpected EOF on stacktrace chunk unmarshal

* Fix symbols upload

* Fix symbols upload

* Release fetched data

* 3MB Page Buffer Size

* Sort stacktraces IDs as expected by the resolver

---------

Co-authored-by: Cyril Tovena <cyril.tovena@gmail.com>
Co-authored-by: Christian Simon <simon@swine.de>