
Configurable limit to scrape concurrency #4408

Open
hawkw opened this Issue Jul 23, 2018 · 4 comments

hawkw commented Jul 23, 2018

Proposal

When Prometheus is configured with more than one scrape target, it sends requests to all those targets concurrently. In some cases, a very high number of concurrent requests can cause issues.

I propose adding a configuration option that limits the number of concurrent requests when scraping. When Prometheus issues a request to a scrape target, it should track the number of requests currently in flight and, if a concurrency limit has been configured, check whether that limit has been reached. If it has, Prometheus should wait until some in-flight requests complete before making more requests.
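As a rough sketch of the behaviour being proposed, assuming a buffered channel used as a counting semaphore (the scrapeAll helper and maxConcurrent parameter are illustrative names, not existing Prometheus code or configuration):

package main

import (
	"fmt"
	"net/http"
	"sync"
)

// scrapeAll scrapes every target URL, allowing at most maxConcurrent
// requests to be in flight at any one time (0 means unlimited).
func scrapeAll(targets []string, maxConcurrent int) {
	var sem chan struct{}
	if maxConcurrent > 0 {
		sem = make(chan struct{}, maxConcurrent)
	}

	var wg sync.WaitGroup
	for _, target := range targets {
		wg.Add(1)
		go func(url string) {
			defer wg.Done()
			if sem != nil {
				sem <- struct{}{}        // block until a slot is free
				defer func() { <-sem }() // release the slot when done
			}
			resp, err := http.Get(url)
			if err != nil {
				fmt.Println("scrape failed:", err)
				return
			}
			resp.Body.Close()
		}(target)
	}
	wg.Wait()
}

func main() {
	scrapeAll([]string{"http://localhost:9100/metrics"}, 2)
}

With a limit of 2, at most two scrape requests are outstanding at once; the rest block on the semaphore until a slot frees up.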

Use case. Why is this important?

Linkerd 2 deploys a Prometheus instance behind a proxy. This proxy limits the number of routes (in this case, HTTP authorities) that can concurrently have requests in flight; when the limit is reached, the proxy returns errors. When the Prometheus instance concurrently scrapes more targets than the proxy's route limit allows, the route cache bound is reached and subsequent scrape requests fail, so the scrape is incomplete. If scrape concurrency could be limited, then routes whose requests have finished would be considered "inactive" by the proxy and could be evicted from the route cache, allowing new requests to succeed.

See linkerd/linkerd2#1234 and linkerd/linkerd2#1322 for more information on this specific use-case.

brian-brazil commented Jul 23, 2018

I'm not sure that this would actually help you: if your proxy already can't handle the load, then moving the load around a bit isn't going to help, since scrapes are already splayed over time to avoid spikes. In addition, this would lead to scrapes not happening at regular intervals, which could cause artifacts. Failing as it currently does is probably the best behaviour in this scenario.

I'd suggest either having fewer targets, or removing the concurrency limit in your proxy. Allowing Prometheus to scrape the targets directly might also be beneficial, and removes a failure mode.

briansmith commented Jul 23, 2018

Let's not fixate on the specific issue of a proxy imposing some limit on Prometheus's scraper. It sounds like Prometheus is probably doing something close to what we're asking for anyway.

scrapes are already splayed over time to avoid spikes.

That sounds like what we ultimately want anyway; however, we're not seeing the splaying done in the way we expected. Could you please point us to where we can understand how this splaying is done? Are there any tuning parameters to control it?

brian-brazil commented Jul 23, 2018

simonpasquier commented Jul 24, 2018

Could you please point us to where we can understand how this splaying is done?

prometheus/scrape/target.go

Lines 125 to 139 in 6a464ae

// offset returns the time until the next scrape cycle for the target.
func (t *Target) offset(interval time.Duration) time.Duration {
	now := time.Now().UnixNano()

	var (
		base   = now % int64(interval)
		offset = t.hash() % uint64(interval)
		next   = base + int64(offset)
	)

	if next > int64(interval) {
		next -= int64(interval)
	}
	return time.Duration(next)
}
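As a rough illustration of how this hash-based splaying spreads targets across the scrape interval, here is a small standalone sketch using made-up hash values (the offsetFor helper is hypothetical and only mirrors the logic above; it is not part of Prometheus):

package main

import (
	"fmt"
	"time"
)

// offsetFor mirrors the offset logic above for an arbitrary hash value,
// so we can see where different target hashes land within the same
// scrape interval.
func offsetFor(hash uint64, interval time.Duration) time.Duration {
	now := time.Now().UnixNano()
	base := now % int64(interval)
	offset := hash % uint64(interval)
	next := base + int64(offset)
	if next > int64(interval) {
		next -= int64(interval)
	}
	return time.Duration(next)
}

func main() {
	interval := 15 * time.Second
	for _, h := range []uint64{0x1111, 0x2222_0000_0000, 0xffff_ffff_ffff} {
		fmt.Printf("hash %#x -> next scrape in %v\n", h, offsetFor(h, interval))
	}
}

Because the offset is derived from each target's hash, two targets sharing the same interval generally start at different points inside that interval rather than all firing at once.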
