Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote storage queue's concurrent sends cause out-of-order samples #1843

Closed
juliusv opened this Issue Jul 25, 2016 · 4 comments

Comments

Projects
None yet
3 participants
@juliusv
Copy link
Member

juliusv commented Jul 25, 2016

The remote storage queue manager fires off concurrent requests to send sample batches to the remote storage:

go func() {
defer func() {
<-t.sendSemaphore
}()
// Samples are sent to the remote storage on a best-effort basis. If a
// sample isn't sent correctly the first time, it's simply dropped on the
// floor.
begin := time.Now()
err := t.tsdb.Store(s)
duration := time.Since(begin).Seconds()
labelValue := success
if err != nil {
log.Warnf("error sending %d samples to remote storage: %s", len(s), err)
labelValue = failure
t.failedBatches.Inc()
t.failedSamples.Add(float64(len(s)))
}
t.samplesCount.WithLabelValues(labelValue).Add(float64(len(s)))
t.sendLatency.Observe(duration)
}()

This causes samples to be sent out-of-order, which the receiving side might not be equipped to handle. Not sure what's going to happen with this interface in general, but if it's generally going to stay, we should consider removing the parallelism (or make it configurable in case people have a backend that doesn't care?).

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Jul 25, 2016

I think the other end will need to be able to deal with this. I'm expecting parallelism to be required to have the required throughput.

@tomwilkie

This comment has been minimized.

Copy link
Member

tomwilkie commented Jul 25, 2016

We could have both parallelism and in order delivery. For instance, we could shard by metric name (or fingerprint...) and then deliver in parallel.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Jul 25, 2016

To get to the point where you could consider not having parallelism would require major design first (as in we'd need to completely flesh out this interface), so I think that's very much out of scope for the medium term anyway.

@lock

This comment has been minimized.

Copy link

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.