This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

Improve Stacktraces Samples Memory Layout #796

Closed
wants to merge 57 commits into from

Conversation

cyriltovena
Collaborator

@cyriltovena cyriltovena commented Jun 26, 2023

This introduces a new abstraction over the in-memory representation and the on-disk file format (Parquet). The implementation relies heavily on parquet.RowReader:

// RowReader reads a sequence of parquet rows.
type RowReader interface {
	// ReadRows reads rows from the reader, returning the number of rows read
	// into the buffer, and any error that occurred. Note that the rows read
	// into the buffer are not safe for reuse after a subsequent call to
	// ReadRows. Callers that want to reuse rows must copy the rows using Clone.
	//
	// When all rows have been read, the reader returns io.EOF to indicate the
	// end of the sequence. It is valid for the reader to return both a non-zero
	// number of rows and a non-nil error (including io.EOF).
	//
	// The buffer of rows passed as argument will be used to store values of
	// each row read from the reader. If the rows are not nil, the backing array
	// of the slices will be used as an optimization to avoid re-allocating new
	// arrays.
	//
	// The application is expected to handle the case where ReadRows returns
	// less rows than requested and no error, by looking at the first returned
	// value from ReadRows, which is the number of rows that were read.
	ReadRows([]Row) (int, error)
}
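
As a rough illustration of the contract described in the comments above, here is a minimal sketch of a read loop. `Row` and `RowReader` are simplified stand-ins for the parquet-go types, and `sliceReader` and `readAll` are hypothetical names invented for this example; none of this is code from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Row and RowReader are simplified stand-ins for the parquet-go types quoted
// above (assumption: the real Row is a slice of parquet values, not int64s).
type Row []int64

type RowReader interface {
	ReadRows([]Row) (int, error)
}

// sliceReader is a hypothetical in-memory reader used only for this sketch.
type sliceReader struct {
	rows []Row
}

func (r *sliceReader) ReadRows(buf []Row) (int, error) {
	n := copy(buf, r.rows)
	r.rows = r.rows[n:]
	if len(r.rows) == 0 {
		// Rows and io.EOF may be returned together.
		return n, io.EOF
	}
	return n, nil
}

// readAll drains a RowReader. Every row is copied because the buffer may be
// overwritten by the next ReadRows call.
func readAll(r RowReader, batch int) ([]Row, error) {
	if batch <= 0 {
		return nil, errors.New("batch size must be positive")
	}
	buf := make([]Row, batch)
	var out []Row
	for {
		n, err := r.ReadRows(buf)
		// Consume the n returned rows before looking at the error.
		for _, row := range buf[:n] {
			out = append(out, append(Row(nil), row...))
		}
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return out, err
		}
	}
}

func main() {
	r := &sliceReader{rows: []Row{{1}, {2}, {3}}}
	rows, err := readAll(r, 2)
	fmt.Println(len(rows), err)
}
```

The loop handles the returned rows first and the error second, because a reader is allowed to return the final batch together with io.EOF.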

It uses this new abstraction to represent stacktrace samples differently.

Instead of using a slice of pointers to structs, such as:

type Profile struct {
	Samples []*Sample
}
type Sample struct {
	StacktraceID uint64             `parquet:",delta"`
	Value        int64              `parquet:",delta"`
	Labels       []*profilev1.Label `parquet:",list"`
}

It uses a pair of parallel slices:

type Profile struct {
	Samples Samples
}
type Samples struct {
	StacktraceIDs []uint32
	Values        []uint64
}

This greatly reduces memory usage while ingesting profiles, since we use less address space. On top of that, we no longer use reflection when flushing profiles, because we use a custom Parquet serialisation.
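
To make the layout change concrete, here is a minimal sketch that converts the old per-sample structs into the parallel-slice form. The struct fields follow the description above (labels omitted); `toColumns` is a hypothetical helper written for illustration, not the PR's implementation, and it assumes stacktrace IDs fit in 32 bits and values are non-negative.

```go
package main

import "fmt"

// Sample mirrors the per-sample struct from the description (labels omitted).
type Sample struct {
	StacktraceID uint64
	Value        int64
}

// Samples mirrors the columnar layout from the description.
type Samples struct {
	StacktraceIDs []uint32
	Values        []uint64
}

// toColumns is a hypothetical helper that copies a slice of sample pointers
// into two parallel slices.
func toColumns(in []*Sample) Samples {
	out := Samples{
		StacktraceIDs: make([]uint32, 0, len(in)),
		Values:        make([]uint64, 0, len(in)),
	}
	for _, s := range in {
		// Narrowing conversions: assumes the ID fits in 32 bits and the
		// value is non-negative.
		out.StacktraceIDs = append(out.StacktraceIDs, uint32(s.StacktraceID))
		out.Values = append(out.Values, uint64(s.Value))
	}
	return out
}

func main() {
	cols := toColumns([]*Sample{
		{StacktraceID: 7, Value: 10},
		{StacktraceID: 9, Value: 20},
	})
	fmt.Println(cols.StacktraceIDs, cols.Values)
}
```

On a 64-bit platform, the original layout costs an 8-byte pointer plus a 40-byte struct per sample (two 8-byte fields and a 24-byte slice header for the labels), before allocator overhead. The parallel slices store 12 bytes per sample in two contiguous allocations.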

This is running in dev and has reduced memory usage by 50%.

cyriltovena and others added 30 commits June 2, 2023 10:29
* Ingest stacktraces in the new symdb

* Setup read in memory read path

* Fix up a comment placement

* Start setting up the read path

* Update to uint32

* Introduce stacktrace partition (#775)

* Introduce stacktrace partition

This determines the partition of a particular profile, by looking first
at its metadata:

* If there is a `Filename` on the main mapping use its
  filepath.Base(Filename)
* Failing that take the externally supplied `service_name`
* Fallback to `unknown`

Take the underlying string value and hash.

* After a chat with cyril we decided to no longer apply a modulo and to use
the hash directly.

We didn't want to risk collisions between two very big stacktrace
applications.

* Remove reconstructMeta from singleBlockQuerier

* support multiple versions of stacktraces resolver

* Integrate v2 reader for stacktraces in block reader

* Fixes tests

* Rewrite locations Ids

* Rewrite test for counting uniq stacktraces

* lint and fmt

* Fixes more tests

* Fixes leftover from todo

---------

Co-authored-by: Christian Simon <simon@swine.de>
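
The partition rule described in the "Introduce stacktrace partition" commit message above can be sketched roughly as follows. The commit message does not name the hash function, so FNV-1a from the standard library is used here purely as an assumption; `partitionName` and `partitionKey` are hypothetical names, not identifiers from the PR.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"path/filepath"
)

// partitionName picks the string to hash, following the order in the commit
// message: main mapping filename first, then service_name, then "unknown".
func partitionName(mainMappingFilename, serviceName string) string {
	if mainMappingFilename != "" {
		return filepath.Base(mainMappingFilename)
	}
	if serviceName != "" {
		return serviceName
	}
	return "unknown"
}

// partitionKey hashes the name and uses the full 64-bit value directly,
// with no modulo. FNV-1a is an assumption; the PR may use another hash.
func partitionKey(name string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(name)) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

func main() {
	name := partitionName("/usr/bin/app", "my-service")
	fmt.Println(name, partitionKey(name))
}
```

Using the full hash instead of reducing it modulo a partition count keeps the chance of two large applications landing in the same partition as low as the hash allows.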