
scrape: timeout WriteRaw, drop sp.mtx before waiting on loops #6361

Merged: brancz merged 1 commit into main from scrape-timeout on May 19, 2026

@brancz brancz commented May 19, 2026

The per-target scrape loop applied scrape_timeout only to the HTTP scrape; the subsequent store.WriteRaw call ran under the bare scrapeCtx, which has no deadline. When write latency spiked, each iteration blocked inside WriteRaw with no upper bound and the target appeared to stop scraping: time.Ticker does not accumulate missed ticks, so only one make-up scrape fires after the call returns. WriteRaw now runs under context.WithTimeout(scrapeCtx, scrape_timeout), so a slow store fails the scrape and the next interval fires on time.
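As a rough illustration of the timeout change, here is a minimal, self-contained sketch. Only the WriteRaw name and the use of context.WithTimeout come from this PR; the rawWriter interface, its signature, slowStore, and writeWithTimeout are hypothetical stand-ins and not the project's actual code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// rawWriter is a hypothetical stand-in for the store. Only the WriteRaw
// name comes from the PR; this signature is an assumption.
type rawWriter interface {
	WriteRaw(ctx context.Context, payload []byte) error
}

// slowStore simulates a store whose write takes `delay` to complete,
// unless the context ends first.
type slowStore struct{ delay time.Duration }

func (s slowStore) WriteRaw(ctx context.Context, _ []byte) error {
	select {
	case <-time.After(s.delay):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// writeWithTimeout derives a per-write deadline from the long-lived scrape
// context, so a slow store fails this iteration instead of blocking it.
func writeWithTimeout(scrapeCtx context.Context, w rawWriter, timeout time.Duration, payload []byte) error {
	ctx, cancel := context.WithTimeout(scrapeCtx, timeout)
	defer cancel() // always release the timer
	return w.WriteRaw(ctx, payload)
}

// slowWriteTimesOut reports whether a write slower than the timeout
// fails with context.DeadlineExceeded.
func slowWriteTimesOut() bool {
	err := writeWithTimeout(context.Background(), slowStore{delay: 2 * time.Second}, 50*time.Millisecond, []byte("sample"))
	return errors.Is(err, context.DeadlineExceeded)
}

// fastWriteSucceeds reports whether a write faster than the timeout
// completes without error.
func fastWriteSucceeds() bool {
	err := writeWithTimeout(context.Background(), slowStore{delay: time.Millisecond}, time.Second, nil)
	return err == nil
}

func main() {
	fmt.Println("slow write timed out:", slowWriteTimesOut())
	fmt.Println("fast write succeeded:", fastWriteSucceeds())
}
```

Deriving the deadline from scrapeCtx rather than from a fresh background context means cancelling the scrape loop still aborts an in-flight write.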

scrapePool.stop() and reload() also held sp.mtx while calling wg.Wait() for old loops to terminate. A loop wedged inside WriteRaw would therefore keep sp.mtx held and, with Manager.ApplyConfig or Stop holding m.mtxScrape in the meantime, block the manager's Run loop from draining new target sets, freezing every pool. Both methods now snapshot the loops under sp.mtx, release the lock, and wait on the goroutines outside it, matching the pattern already used by sync().
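The snapshot-then-wait pattern described above can be sketched roughly as follows. The pool and loop types, the release test hook, and lockAvailableWhileStopping are hypothetical simplifications for illustration; they are not the actual scrapePool code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// loop is a hypothetical, simplified scrape loop.
type loop struct {
	stopCh  chan struct{} // closed to ask the loop to exit
	release chan struct{} // demo hook: the loop stays "wedged" until this closes
	done    chan struct{} // closed once the loop goroutine has exited
}

func newLoop(release chan struct{}) *loop {
	l := &loop{stopCh: make(chan struct{}), release: release, done: make(chan struct{})}
	go func() {
		defer close(l.done)
		<-l.stopCh
		<-l.release // simulates being stuck inside a slow write
	}()
	return l
}

func (l *loop) stop() {
	close(l.stopCh)
	<-l.done
}

type pool struct {
	mtx   sync.Mutex
	loops map[string]*loop
}

// stop snapshots the loops under the mutex, releases it, and only then
// waits for the loops to terminate.
func (p *pool) stop() {
	p.mtx.Lock()
	old := make([]*loop, 0, len(p.loops))
	for _, l := range p.loops {
		old = append(old, l)
	}
	p.loops = map[string]*loop{}
	p.mtx.Unlock()

	var wg sync.WaitGroup
	for _, l := range old {
		wg.Add(1)
		go func(l *loop) {
			defer wg.Done()
			l.stop()
		}(l)
	}
	wg.Wait() // may block for a long time, but the mutex is no longer held
}

// lockAvailableWhileStopping reports whether another goroutine can take the
// pool mutex after the snapshot while a wedged loop is still running.
func lockAvailableWhileStopping() bool {
	release := make(chan struct{})
	p := &pool{loops: map[string]*loop{"t1": newLoop(release)}}

	stopped := make(chan struct{})
	go func() { p.stop(); close(stopped) }()

	acquired := make(chan struct{})
	go func() {
		for {
			p.mtx.Lock()
			n := len(p.loops)
			p.mtx.Unlock()
			if n == 0 { // snapshot has happened; the lock was still obtainable
				close(acquired)
				return
			}
			time.Sleep(time.Millisecond)
		}
	}()

	ok := false
	select {
	case <-acquired:
		ok = true
	case <-time.After(time.Second):
	}
	close(release) // un-wedge the loop so stop() can return
	<-stopped
	return ok
}

func main() {
	fmt.Println("lock available while stopping:", lockAvailableWhileStopping())
}
```

If stop() instead held the mutex across wg.Wait(), the second goroutine in this sketch would stay blocked on Lock for as long as the loop remained wedged.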

alwaysmeticulous Bot commented May 19, 2026

✅ Meticulous spotted 0 visual differences across 288 screens tested: view results.

Meticulous evaluated ~4 hours of user flows against your PR.

Last updated for commit a25a590 (scrape: timeout WriteRaw, drop sp.mtx before waiting on loops). This comment will update as new commits are pushed.

@brancz brancz merged commit 6c73af5 into main May 19, 2026
35 checks passed
@brancz brancz deleted the scrape-timeout branch May 19, 2026 12:24