Improve Stacktraces Samples Memory Layout #801

cyriltovena · 2023-06-27T07:47:49Z

This introduces a new way to abstract away the memory representation from the on-disk file format (parquet). The implementation relies heavily on parquet.RowReader:

// RowReader reads a sequence of parquet rows.
type RowReader interface {
	// ReadRows reads rows from the reader, returning the number of rows read
	// into the buffer, and any error that occurred. Note that the rows read
	// into the buffer are not safe for reuse after a subsequent call to
	// ReadRows. Callers that want to reuse rows must copy the rows using Clone.
	//
	// When all rows have been read, the reader returns io.EOF to indicate the
	// end of the sequence. It is valid for the reader to return both a non-zero
	// number of rows and a non-nil error (including io.EOF).
	//
	// The buffer of rows passed as argument will be used to store values of
	// each row read from the reader. If the rows are not nil, the backing array
	// of the slices will be used as an optimization to avoid re-allocating new
	// arrays.
	//
	// The application is expected to handle the case where ReadRows returns
	// less rows than requested and no error, by looking at the first returned
	// value from ReadRows, which is the number of rows that were read.
	ReadRows([]Row) (int, error)
}
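The contract above has two details that are easy to get wrong: a single call may return both rows and io.EOF, and the rows in the buffer may be overwritten by the next call. The sketch below shows a caller that handles both. `Row` is a local stand-in for `parquet.Row` (here just a slice of int64), and `sliceReader` and `readAll` are hypothetical names, not part of parquet-go or this PR.

```go
package main

import (
	"fmt"
	"io"
)

// Row is a stand-in for parquet.Row in this sketch.
type Row []int64

// RowReader mirrors the shape of parquet.RowReader using the local Row type.
type RowReader interface {
	ReadRows([]Row) (int, error)
}

// sliceReader is a hypothetical RowReader over an in-memory slice.
// It returns io.EOF together with the last batch of rows.
type sliceReader struct {
	rows []Row
}

func (r *sliceReader) ReadRows(buf []Row) (int, error) {
	n := copy(buf, r.rows)
	r.rows = r.rows[n:]
	if len(r.rows) == 0 {
		return n, io.EOF
	}
	return n, nil
}

// readAll drains a RowReader. It processes the n returned rows before
// inspecting the error, because n > 0 and io.EOF can arrive together.
func readAll(r RowReader, batchSize int) ([]Row, error) {
	if batchSize < 1 {
		batchSize = 1
	}
	buf := make([]Row, batchSize)
	var out []Row
	for {
		n, err := r.ReadRows(buf)
		for _, row := range buf[:n] {
			// Rows in buf are not safe after the next call, so copy them.
			out = append(out, append(Row(nil), row...))
		}
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
	}
}

func main() {
	r := &sliceReader{rows: []Row{{1}, {2}, {3}}}
	rows, err := readAll(r, 2)
	fmt.Println(len(rows), err) // 3 <nil>
}
```

With a batch size of 2, the second call returns the last row together with io.EOF, which is exactly the case the loop must not drop.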

It then uses this new abstraction to represent stacktrace samples differently.

Instead of using a slice of per-sample structs, such as:

type Profile struct {
	Samples []*Sample
}
type Sample struct {
	StacktraceID uint64             `parquet:",delta"`
	Value        int64              `parquet:",delta"`
	Labels       []*profilev1.Label `parquet:",list"`
}

It now uses two parallel slices, where index i of each slice describes the same sample:

type Profile struct {
	Samples Samples
}
type Samples struct {
	StacktraceIDs []uint32
	Values        []uint64
}
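To make the layout change concrete, here is a minimal sketch converting the old slice of per-sample structs into the parallel-slice form. `OldSample` and `fromOld` are hypothetical names used only for illustration; labels are left out because the new `Samples` type shown above has no labels field. The narrowing casts assume stacktrace IDs fit in 32 bits and values are non-negative, which is an assumption of this sketch, not something the PR states.

```go
package main

import "fmt"

// OldSample mirrors the old Sample struct, without Labels.
type OldSample struct {
	StacktraceID uint64
	Value        int64
}

// Samples is the new layout: parallel slices indexed by sample.
type Samples struct {
	StacktraceIDs []uint32
	Values        []uint64
}

// fromOld is a hypothetical helper converting old samples to the new layout.
// Assumes IDs fit in uint32 and values are non-negative.
func fromOld(old []*OldSample) Samples {
	s := Samples{
		StacktraceIDs: make([]uint32, 0, len(old)),
		Values:        make([]uint64, 0, len(old)),
	}
	for _, o := range old {
		s.StacktraceIDs = append(s.StacktraceIDs, uint32(o.StacktraceID))
		s.Values = append(s.Values, uint64(o.Value))
	}
	return s
}

func main() {
	old := []*OldSample{{StacktraceID: 7, Value: 100}, {StacktraceID: 9, Value: 25}}
	s := fromOld(old)
	fmt.Println(s.StacktraceIDs, s.Values) // [7 9] [100 25]
}
```

The result holds two allocations (one per slice) regardless of the sample count, instead of one pointer plus one struct per sample.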

This greatly reduces memory usage while ingesting profiles, since samples are stored in two flat slices instead of one pointer and one struct per sample, which uses less address space. On top of that, flushing profiles no longer uses reflection, thanks to a custom parquet serialisation.
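A rough per-sample estimate of the address-space difference can be computed with `unsafe.Sizeof`. This sketch compares only the stacktrace ID and value (labels are excluded on both sides) and ignores allocator overhead and slice growth, so it illustrates the direction of the change rather than reproducing the measured result. `OldSample`, `oldBytesPerSample`, and `newBytesPerSample` are hypothetical names.

```go
package main

import (
	"fmt"
	"unsafe"
)

// OldSample mirrors the old Sample struct without its Labels field,
// so both layouts are compared on the same two values.
type OldSample struct {
	StacktraceID uint64
	Value        int64
}

// oldBytesPerSample counts the *OldSample pointer held in the []*OldSample
// slice plus the struct it points to.
func oldBytesPerSample() uintptr {
	var p *OldSample
	return unsafe.Sizeof(p) + unsafe.Sizeof(OldSample{})
}

// newBytesPerSample counts one uint32 element and one uint64 element,
// the per-sample cost of the StacktraceIDs and Values slices.
func newBytesPerSample() uintptr {
	return unsafe.Sizeof(uint32(0)) + unsafe.Sizeof(uint64(0))
}

func main() {
	fmt.Println(oldBytesPerSample(), newBytesPerSample()) // 24 12 on 64-bit platforms
}
```

Halving the per-sample footprint comes from two sources here: dropping the per-sample pointer, and narrowing the stacktrace ID from 64 to 32 bits.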

This is running in dev and has reduced memory usage by 50%.

simonswine

LGTM. Some great improvements

pkg/phlaredb/schemas/v1/profiles.go

* Improve Stacktraces Samples Memory Layout
* Add support for optional empty fields

Improve Stacktraces Samples Memory Layout

7aee1d9

simonswine approved these changes Jun 27, 2023


pkg/phlaredb/schemas/v1/profiles.go (outdated, resolved)

Add support for optional empty fields

3b4ffaa

cyriltovena enabled auto-merge (squash) June 27, 2023 20:17

cyriltovena merged commit a92e007 into main Jun 27, 2023
17 checks passed

cyriltovena deleted the improve-memory-layout-2 branch June 27, 2023 20:28

simonswine pushed a commit to simonswine/pyroscope that referenced this pull request Jun 30, 2023

Improve Stacktraces Samples Memory Layout (grafana/phlare#801)

c5178df

* Improve Stacktraces Samples Memory Layout
* Add support for optional empty fields
