Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split write request at field boundary #8167

Closed
wants to merge 8 commits into from

Conversation

pstibrany
Copy link
Member

@pstibrany pstibrany commented May 22, 2024

What this PR does

This PR implements splitting of WriteRequest by parsing marshalled WriteRequest and splitting it at field boundary.

This is alternative to #8077, implementing idea from comment #8077 (review).

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

pracucci and others added 6 commits May 7, 2024 15:43
…ger than max allowed size

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly nitpick. The only thing I'm not sure about is copying the buffers - I feel like that would eat up most of the benefit we get from the custom binary decoding

return marshalWriteRequestsToRecords(tenantID, subrequests)
}

func marshalWriteRequestsToRecords(tenantID string, reqs [][]byte) ([]*kgo.Record, int, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick on naming: this doesn't marshal the requests it just creates records from slices of bytes

var (
remaining = atomic.NewInt64(int64(len(records)))
done = make(chan struct{})
firstErrMx sync.Mutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you may be able to use atomic.Error here instead. Or a channel of errors

// into subrequests with given max size.
//
// This function partially parses WriteRequest and splits the request at field boundaries.
// Some fields (source, skipLabelNameValidation) are copied into each returned subrequests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if we don't need some protection against adding new fields in the WriteRequest which won't be copied over. Something like fuzzing the WriteRequest and checking if the split request contains everything but the Timeseries and Metadata. Or maybe just a test which fails if there are new fields in WriteRequest; that will force whoever adds the fields to also look at this function. WDYT?

pkg/mimirpb/split.go Outdated Show resolved Hide resolved
func putUvarintWithExpectedLength(buf []byte, val uint64, expLength int) int {
n := binary.PutUvarint(buf, val)
if n != expLength {
panic(fmt.Sprintf("expected to write %d bytes, got %d", expLength, n))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Sprintf makes the function just complex enough so it's not inlined. I'm not sure if it's worth simplifying at this point

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Sprintf makes the function just complex enough so it's not inlined.

curious how did you find that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have this external tool in GoLand

Screenshot 2024-05-29 at 17 15 50

the source of go-escape-analysis.sh is this

PKG_PATH=$1
FILE_NAME=$2

FILEPATH_WIDTH="$( echo -n "$PKG_PATH/$FILE_NAME" | wc -c )"
KEY_START=$(( $FILEPATH_WIDTH + 2 ))

go build -gcflags='-m=2' "./$PKG_PATH" 2>&1 \
	| grep "$PKG_PATH/$FILE_NAME" \
	| sort -n -k1.$KEY_START -s

then the output looks like this
Screenshot 2024-05-29 at 17 20 14

credit to @colega for sharing this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice :) Thanks for sharing!

pkg/mimirpb/split.go Outdated Show resolved Hide resolved
if decodedLength <= 0 || decodedLength > math.MaxInt32 {
return nil, fmt.Errorf("invalid decoded length: %d", decodedLength)
}
if len(writeRequest) < int(decodedLength) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here do we want to check if the rest of the buffer is smaller than decodedLength? The existing if includes the tag size and the length size

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching this, added test.

pkg/mimirpb/split.go Outdated Show resolved Hide resolved

if extraBytes > 0 {
for ix := range subrequests {
// Clone subrequest before appending bytes to it. Here we could use a buffer pool.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not super sure about this. It will double allocations right when we're receiving large requests in the first place.

Instead when we are marshalling the original request can we provide a larger slice: req.MarshalToSizedBuffer(). And then at every new subrequest we'd leave enough capacity in the slice to fit in one source and one skipLabelName.... When we're done splitting the requests we go back over the slice of subrequests and append the extra bytes if they are necessary. This way we still do the 2x copying, but at least save on the allocations.

Am I overcomplicating it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To put it another way - how much cheaper is to do this binary hacks+double allocation+copying vs just creating multiple WriteRequests and calling Marshall() on each of them?

Copy link
Member Author

@pstibrany pstibrany May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from Slack: Note that remote_write clients only ever writes field=1 or 3. source is our internal field, and we only set it to non-default values from rulers (default value, API=0, is NOT serialized). Similarly, skipLabelNameValidation is our internal field, set by our enterprise proxies. That means that in vast majority of cases, this extra copying will never happen.

pstibrany and others added 2 commits May 24, 2024 15:23
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci
Copy link
Collaborator

Thanks a lot Peter for offering this alternative version. I've considered it and benchmarked it vs #8077. At the end we picked #8077 (rationale in the PR description), but we can always get back to this binary version if #8077 will have any unexpected issue.

@pracucci pracucci closed this May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants