Split write request at field boundary #8167

pstibrany · 2024-05-22T10:02:53Z

What this PR does

This PR splits a WriteRequest by parsing the marshalled WriteRequest and cutting it at field boundaries.

This is an alternative to #8077, implementing the idea from comment #8077 (review).
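
The idea of splitting at field boundaries can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `fieldLen` and `splitAtFieldBoundaries` are hypothetical names, only protobuf wire types 0 (varint) and 2 (length-delimited) are handled, and fields that the PR copies into every subrequest (source, skipLabelNameValidation) are not treated specially here. It relies on the fact that a protobuf message is a plain sequence of encoded fields, so any run of whole fields is itself a decodable message.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// fieldLen returns the encoded size (tag + optional length prefix + payload)
// of the first top-level field in buf. Hypothetical helper for this sketch.
func fieldLen(buf []byte) (int, error) {
	tag, n := binary.Uvarint(buf)
	if n <= 0 {
		return 0, errors.New("invalid tag")
	}
	switch tag & 0x7 { // the low 3 bits of the tag hold the wire type
	case 0: // varint value
		_, m := binary.Uvarint(buf[n:])
		if m <= 0 {
			return 0, errors.New("invalid varint value")
		}
		return n + m, nil
	case 2: // length-delimited value
		l, m := binary.Uvarint(buf[n:])
		if m <= 0 {
			return 0, errors.New("invalid length")
		}
		// Compare against the bytes remaining after the tag and length prefix.
		if l > uint64(len(buf)-n-m) {
			return 0, errors.New("field exceeds buffer")
		}
		return n + m + int(l), nil
	default:
		return 0, fmt.Errorf("unsupported wire type %d", tag&0x7)
	}
}

// splitAtFieldBoundaries cuts msg into subslices of at most maxSize bytes
// without ever cutting inside a field. A single field larger than maxSize
// ends up alone in its own chunk.
func splitAtFieldBoundaries(msg []byte, maxSize int) ([][]byte, error) {
	var chunks [][]byte
	start, pos := 0, 0
	for pos < len(msg) {
		n, err := fieldLen(msg[pos:])
		if err != nil {
			return nil, err
		}
		// Close the current chunk if adding this field would exceed maxSize.
		if pos > start && pos+n-start > maxSize {
			chunks = append(chunks, msg[start:pos])
			start = pos
		}
		pos += n
	}
	if pos > start {
		chunks = append(chunks, msg[start:pos])
	}
	return chunks, nil
}

func main() {
	// Two length-delimited fields (field 1) followed by one varint field (field 2).
	msg := []byte{0x0a, 2, 'a', 'b', 0x0a, 2, 'c', 'd', 0x10, 1}
	chunks, err := splitAtFieldBoundaries(msg, 6)
	if err != nil {
		panic(err)
	}
	for i, c := range chunks {
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```

The returned chunks are subslices of the input, so no payload bytes are copied unless something has to be appended to a chunk afterwards.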

Checklist

Tests updated.
Documentation added.
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
about-versioning.md updated with experimental features.

dimitarvdimitrov

Mostly nitpicks. The only thing I'm not sure about is copying the buffers: I feel like that would eat up most of the benefit we get from the custom binary decoding.

dimitarvdimitrov · 2024-05-22T14:26:18Z

pkg/storage/ingest/writer.go

+	return marshalWriteRequestsToRecords(tenantID, subrequests)
+}
+
+func marshalWriteRequestsToRecords(tenantID string, reqs [][]byte) ([]*kgo.Record, int, error) {


nitpick on naming: this doesn't marshal the requests, it just creates records from slices of bytes

dimitarvdimitrov · 2024-05-22T14:35:41Z

pkg/storage/ingest/writer.go

+	var (
+		remaining  = atomic.NewInt64(int64(len(records)))
+		done       = make(chan struct{})
+		firstErrMx sync.Mutex


nit: you may be able to use atomic.Error here instead. Or a channel of errors
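
The "channel of errors" variant mentioned above could look something like the sketch below. It uses only the standard library; `firstError` and `errSend` are hypothetical names for illustration, and the real code produces Kafka records via callbacks rather than running plain functions.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSend is a hypothetical sentinel error used only by this example.
var errSend = errors.New("send failed")

// firstError runs all tasks concurrently and returns the first error that
// was recorded, or nil if every task succeeded.
func firstError(tasks []func() error) error {
	errCh := make(chan error, 1) // room for exactly one error
	var wg sync.WaitGroup

	for _, task := range tasks {
		wg.Add(1)
		go func(task func() error) {
			defer wg.Done()
			if err := task(); err != nil {
				select {
				case errCh <- err: // the first error to arrive is kept
				default: // an error is already recorded; drop this one
				}
			}
		}(task)
	}
	wg.Wait()

	select {
	case err := <-errCh:
		return err
	default:
		return nil
	}
}

func main() {
	tasks := []func() error{
		func() error { return nil },
		func() error { return errSend },
	}
	fmt.Println(firstError(tasks))
}
```

The buffered channel with a non-blocking send replaces the mutex around the first-error variable; whether that reads better than a mutex is a matter of taste.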

dimitarvdimitrov · 2024-05-22T14:38:58Z

pkg/mimirpb/split.go

+// into subrequests with given max size.
+//
+// This function partially parses WriteRequest and splits the request at field boundaries.
+// Some fields (source, skipLabelNameValidation) are copied into each returned subrequests.


I wonder if we need some protection against new fields being added to WriteRequest that won't be copied over. Something like fuzzing the WriteRequest and checking that the split requests contain everything except Timeseries and Metadata. Or maybe just a test which fails if there are new fields in WriteRequest; that would force whoever adds a field to also look at this function. WDYT?
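
One way to sketch the "test which fails if there are new fields" idea is a reflection check against an allow-list. Everything below is an assumption for illustration: `writeRequest` is a stand-in struct, not the generated mimirpb type (which may carry additional generated fields), and `knownSplitFields` and `unknownFields` are hypothetical names.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// writeRequest is a hypothetical stand-in for the real message type.
type writeRequest struct {
	Timeseries              []string
	Source                  int
	Metadata                []string
	SkipLabelNameValidation bool
}

// knownSplitFields lists the fields the splitting code is known to handle.
var knownSplitFields = map[string]bool{
	"Timeseries":              true,
	"Source":                  true,
	"Metadata":                true,
	"SkipLabelNameValidation": true,
}

// unknownFields returns the sorted names of the fields of struct value v
// that are not present in known. v must be a struct, not a pointer.
func unknownFields(v interface{}, known map[string]bool) []string {
	t := reflect.TypeOf(v)
	var out []string
	for i := 0; i < t.NumField(); i++ {
		if name := t.Field(i).Name; !known[name] {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// In a real test this would be an assertion that the result is empty.
	if extra := unknownFields(writeRequest{}, knownSplitFields); len(extra) > 0 {
		fmt.Println("fields not handled by the splitter:", extra)
		return
	}
	fmt.Println("all fields are accounted for")
}
```

Adding a field to the struct without updating the allow-list makes the check fail, which points the author at the splitting function.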

dimitarvdimitrov · 2024-05-22T14:50:37Z

pkg/mimirpb/split.go

+func putUvarintWithExpectedLength(buf []byte, val uint64, expLength int) int {
+	n := binary.PutUvarint(buf, val)
+	if n != expLength {
+		panic(fmt.Sprintf("expected to write %d bytes, got %d", expLength, n))


the Sprintf makes the function just complex enough so it's not inlined. I'm not sure if it's worth simplifying at this point

> the Sprintf makes the function just complex enough so it's not inlined.

curious how did you find that?

i have this external tool in GoLand

the source of go-escape-analysis.sh is this

PKG_PATH=$1
FILE_NAME=$2
FILEPATH_WIDTH="$( echo -n "$PKG_PATH/$FILE_NAME" | wc -c )"
KEY_START=$(( $FILEPATH_WIDTH + 2 ))

go build -gcflags='-m=2' "./$PKG_PATH" 2>&1 \
  | grep "$PKG_PATH/$FILE_NAME" \
  | sort -n -k1.$KEY_START -s

then the output looks like this

credit to @colega for sharing this

nice :) Thanks for sharing!
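
For reference, a common way to address the inlining remark at the top of this thread is to move the formatting into a separate function, so the hot path only contains a comparison and a call. The sketch below assumes the function returns `n` after the check (the diff hunk does not show the rest of the body), and `panicUnexpectedLength` is a hypothetical helper. Whether the result actually fits the compiler's inlining budget should be verified with `-gcflags='-m=2'`.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// putUvarintWithExpectedLength writes val into buf as a uvarint and panics
// if the number of bytes written differs from expLength.
func putUvarintWithExpectedLength(buf []byte, val uint64, expLength int) int {
	n := binary.PutUvarint(buf, val)
	if n != expLength {
		panicUnexpectedLength(expLength, n)
	}
	return n
}

// panicUnexpectedLength keeps the fmt.Sprintf call out of the caller's body.
//
//go:noinline
func panicUnexpectedLength(expected, got int) {
	panic(fmt.Sprintf("expected to write %d bytes, got %d", expected, got))
}

func main() {
	buf := make([]byte, binary.MaxVarintLen64)
	n := putUvarintWithExpectedLength(buf, 300, 2)
	fmt.Println("bytes written:", n)
}
```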

dimitarvdimitrov · 2024-05-22T14:58:32Z

pkg/mimirpb/split.go

+			if decodedLength <= 0 || decodedLength > math.MaxInt32 {
+				return nil, fmt.Errorf("invalid decoded length: %d", decodedLength)
+			}
+			if len(writeRequest) < int(decodedLength) {


Here, do we want to check whether the rest of the buffer is smaller than decodedLength? The existing if compares against a length that still includes the tag size and the length size.

thanks for catching this, added test.
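
The difference between the two checks can be illustrated with a small sketch. `readLengthDelimited` is a hypothetical helper, not the function from the PR; it mirrors the diff in rejecting a zero or oversized length, and it compares the decoded length against the bytes that remain after the tag and length prefix.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// readLengthDelimited decodes a tag and a length prefix from buf and returns
// the payload plus the total number of bytes consumed.
func readLengthDelimited(buf []byte) (payload []byte, consumed int, err error) {
	_, tagLen := binary.Uvarint(buf)
	if tagLen <= 0 {
		return nil, 0, errors.New("invalid tag")
	}
	decodedLength, lenLen := binary.Uvarint(buf[tagLen:])
	if lenLen <= 0 {
		return nil, 0, errors.New("invalid length prefix")
	}
	if decodedLength == 0 || decodedLength > math.MaxInt32 {
		return nil, 0, fmt.Errorf("invalid decoded length: %d", decodedLength)
	}
	// Check against what is left after the tag and the length prefix,
	// not against len(buf), which still includes both.
	rest := buf[tagLen+lenLen:]
	if uint64(len(rest)) < decodedLength {
		return nil, 0, fmt.Errorf("truncated field: need %d bytes, have %d", decodedLength, len(rest))
	}
	return rest[:decodedLength], tagLen + lenLen + int(decodedLength), nil
}

func main() {
	// The length prefix claims 3 bytes but only 2 follow. len(buf) is 4,
	// so a check against the whole buffer would not notice the truncation.
	_, _, err := readLengthDelimited([]byte{0x0a, 3, 'a', 'b'})
	fmt.Println(err)
}
```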

dimitarvdimitrov · 2024-05-22T15:18:46Z

pkg/mimirpb/split.go

+
+	if extraBytes > 0 {
+		for ix := range subrequests {
+			// Clone subrequest before appending bytes to it. Here we could use a buffer pool.


i'm not super sure about this. It will double allocations right when we're receiving large requests in the first place.

Instead when we are marshalling the original request can we provide a larger slice: req.MarshalToSizedBuffer(). And then at every new subrequest we'd leave enough capacity in the slice to fit in one source and one skipLabelName.... When we're done splitting the requests we go back over the slice of subrequests and append the extra bytes if they are necessary. This way we still do the 2x copying, but at least save on the allocations.

Am I overcomplicating it?

To put it another way: how much cheaper is it to do these binary hacks + double allocation + copying vs just creating multiple WriteRequests and calling Marshal() on each of them?

Copied from Slack: Note that remote_write clients only ever write field=1 or 3. source is our internal field, and we only set it to non-default values from rulers (the default value, API=0, is NOT serialized). Similarly, skipLabelNameValidation is our internal field, set by our enterprise proxies. That means that in the vast majority of cases, this extra copying will never happen.
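
To illustrate why a subrequest has to be cloned (or given spare capacity) before bytes are appended to it: when subrequests are subslices of one marshalled buffer, a plain `append` can write into the bytes of the next subrequest. The sketch below is not the PR's code; `appendExtra` is a hypothetical helper and the byte strings are placeholders.

```go
package main

import "fmt"

// appendExtra returns a new slice holding sub followed by extra. It copies
// sub first, so bytes that follow sub in a shared backing array are untouched.
func appendExtra(sub, extra []byte) []byte {
	out := make([]byte, 0, len(sub)+len(extra))
	out = append(out, sub...)
	return append(out, extra...)
}

func main() {
	// Naive approach: first shares its backing array with whole and has spare
	// capacity, so append overwrites the first byte of the second half.
	whole := []byte("AAAABBBB")
	first := whole[:4]
	_ = append(first, 'x')
	fmt.Println(string(whole))

	// Cloning approach: the shared buffer stays intact.
	whole = []byte("AAAABBBB")
	cloned := appendExtra(whole[:4], []byte("x"))
	fmt.Println(string(cloned), string(whole))
}
```

The clone costs one allocation and one copy per subrequest, which is the overhead discussed in this thread; it only applies when there are extra bytes to append.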

pracucci · 2024-05-31T10:20:10Z

Thanks a lot Peter for offering this alternative version. I've considered it and benchmarked it vs #8077. In the end we picked #8077 (rationale in the PR description), but we can always get back to this binary version if #8077 runs into any unexpected issues.

pracucci and others added 6 commits May 7, 2024 15:43

Split a per-partition WriteRequest into multiple Kafka records if bigger than max allowed size

72cb323

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Fix partialReqSize reset

0379d8b

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Split write request.

ae48b38

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>

No need to split tags into field/type.

d552084

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>

Add metadata to the test.

e5b04c5

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>

Comments.

614782d

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>

dimitarvdimitrov reviewed May 22, 2024

pstibrany and others added 2 commits May 24, 2024 15:23

Address review feedback.

bb7be8c

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>

Added BenchmarkWriteRequest_SplitByMaxMarshalSize

ab63a82

Signed-off-by: Marco Pracucci <marco@pracucci.com>

pracucci mentioned this pull request May 30, 2024

Split a per-partition WriteRequest into multiple Kafka records if bigger than max allowed size #8077

Merged

4 tasks

pracucci closed this May 31, 2024
