This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

Improve Stacktraces Samples Memory Layout #801

Merged 2 commits on Jun 27, 2023


cyriltovena
Collaborator

This introduces a new abstraction over the in-memory representation and the on-disk file format (Parquet). The implementation relies heavily on parquet.RowReader:

// RowReader reads a sequence of parquet rows.
type RowReader interface {
	// ReadRows reads rows from the reader, returning the number of rows read
	// into the buffer, and any error that occurred. Note that the rows read
	// into the buffer are not safe for reuse after a subsequent call to
	// ReadRows. Callers that want to reuse rows must copy the rows using Clone.
	//
	// When all rows have been read, the reader returns io.EOF to indicate the
	// end of the sequence. It is valid for the reader to return both a non-zero
	// number of rows and a non-nil error (including io.EOF).
	//
	// The buffer of rows passed as argument will be used to store values of
	// each row read from the reader. If the rows are not nil, the backing array
	// of the slices will be used as an optimization to avoid re-allocating new
	// arrays.
	//
	// The application is expected to handle the case where ReadRows returns
	// less rows than requested and no error, by looking at the first returned
	// value from ReadRows, which is the number of rows that were read.
	ReadRows([]Row) (int, error)
}
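As a minimal sketch of the contract described in the comment above, the loop below drains a reader while handling the case where rows and io.EOF are returned together. `Row` and `RowReader` here are simplified local stand-ins for the parquet types, and `sliceReader` and `readAll` are hypothetical names used only for illustration, not code from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Row and RowReader are simplified stand-ins for the parquet types quoted
// above, so that this sketch compiles without the parquet dependency.
type Row []int64

type RowReader interface {
	ReadRows([]Row) (int, error)
}

// sliceReader is a hypothetical in-memory reader used only for this example.
type sliceReader struct {
	rows []Row
}

func (r *sliceReader) ReadRows(buf []Row) (int, error) {
	n := copy(buf, r.rows)
	r.rows = r.rows[n:]
	if len(r.rows) == 0 {
		// Rows and io.EOF may be returned in the same call.
		return n, io.EOF
	}
	return n, nil
}

// readAll drains r in batches. It always consumes the n returned rows
// before looking at the error, and copies each row because the buffer is
// reused by the next ReadRows call.
func readAll(r RowReader, batch int) ([]Row, error) {
	var out []Row
	buf := make([]Row, batch)
	for {
		n, err := r.ReadRows(buf)
		for _, row := range buf[:n] {
			out = append(out, append(Row(nil), row...))
		}
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return out, err
		}
	}
}

func main() {
	r := &sliceReader{rows: []Row{{1, 10}, {2, 20}, {3, 30}}}
	rows, err := readAll(r, 2)
	fmt.Println(len(rows), err) // prints "3 <nil>"
}
```

With the real parquet package, the copy step would use the Clone method mentioned in the interface comment rather than a manual append.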

It uses this new abstraction to represent stacktrace samples differently.

Instead of using a slice of structs such as:

type Profile struct {
   Samples []*Sample
}
type Sample struct {
	StacktraceID uint64             `parquet:",delta"`
	Value        int64              `parquet:",delta"`
	Labels       []*profilev1.Label `parquet:",list"`
}

It uses a single struct holding two parallel slices:

type Profile struct {
   Samples Samples
}
type Samples struct {
	StacktraceIDs []uint32
	Values        []uint64
}

This greatly reduces memory usage while ingesting profiles, since we use less address space. On top of that, we no longer use reflection when flushing profiles, thanks to a custom Parquet serialisation.
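To make the layout difference concrete, here is a rough sketch that compares the fixed per-sample footprint of the two layouts using unsafe.Sizeof. `Label` is a stand-in for profilev1.Label, and the `Append` method and the two `perSampleBytes*` helpers are hypothetical names for illustration. The figures ignore label contents and allocator overhead, so they are an approximation of the idea rather than a measurement of this PR.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Label is a stand-in for profilev1.Label (assumption for this sketch).
type Label struct{ Name, Value string }

// Sample is the previous per-sample struct, stored as []*Sample.
type Sample struct {
	StacktraceID uint64
	Value        int64
	Labels       []*Label
}

// Samples is the columnar layout: one slice per field.
type Samples struct {
	StacktraceIDs []uint32
	Values        []uint64
}

// Append adds one sample, keeping both columns the same length.
func (s *Samples) Append(stacktraceID uint32, value uint64) {
	s.StacktraceIDs = append(s.StacktraceIDs, stacktraceID)
	s.Values = append(s.Values, value)
}

// perSampleBytesPointers is the fixed cost of one element of []*Sample:
// the pointer in the slice plus the struct it points to.
func perSampleBytesPointers() uintptr {
	return unsafe.Sizeof((*Sample)(nil)) + unsafe.Sizeof(Sample{})
}

// perSampleBytesColumnar is the fixed cost of one sample in Samples:
// one uint32 and one uint64, with no pointer and no slice header.
func perSampleBytesColumnar() uintptr {
	return unsafe.Sizeof(uint32(0)) + unsafe.Sizeof(uint64(0))
}

func main() {
	var s Samples
	s.Append(1, 100)
	s.Append(2, 250)
	fmt.Println("samples:", len(s.StacktraceIDs))
	fmt.Println("pointer layout bytes/sample:", perSampleBytesPointers())
	fmt.Println("columnar layout bytes/sample:", perSampleBytesColumnar())
}
```

On a 64-bit platform such as amd64 the pointer layout costs 48 bytes per sample before labels, against 12 bytes for the columnar one. The columnar layout also avoids one heap object per sample, which leaves fewer pointers for the garbage collector to trace.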

This is running in dev and has reduced memory usage by 50%.

Collaborator

@simonswine left a comment


LGTM. Some great improvements.

Two review threads on pkg/phlaredb/schemas/v1/profiles.go (outdated, resolved)
@cyriltovena enabled auto-merge (squash) June 27, 2023 20:17
@cyriltovena merged commit a92e007 into main Jun 27, 2023
17 checks passed
@cyriltovena deleted the improve-memory-layout-2 branch June 27, 2023 20:28
simonswine pushed a commit to simonswine/pyroscope that referenced this pull request Jun 30, 2023
* Improve Stacktraces Samples Memory Layout

* Add support for optional empty fields