
Fingerprint collision handling is O(n^2) in the number of mappings #1513

Closed
brian-brazil opened this Issue Mar 30, 2016 · 4 comments

brian-brazil commented Mar 30, 2016

The fingerprint mappings write out a checkpoint at each mapping, syncing it to disk (`func (p *persistence) checkpointFPMappings(fpm fpMappings) (err error)`). This is O(n^2).
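To make the quadratic cost concrete, here is a hedged back-of-the-envelope sketch (not the Prometheus code itself): if every new collision mapping triggers a full checkpoint of all mappings accumulated so far, the total number of records written for n new mappings is 1 + 2 + … + n, i.e. O(n^2).

```go
package main

import "fmt"

// recordsWritten models the cost of checkpointing the entire mappings
// file once per new mapping: the i-th checkpoint rewrites all i
// mappings seen so far, so the total is the triangular number n(n+1)/2.
func recordsWritten(n int) int {
	total := 0
	for i := 1; i <= n; i++ {
		total += i
	}
	return total
}

func main() {
	fmt.Println(recordsWritten(10))    // 55
	fmt.Println(recordsWritten(10000)) // 50005000 records for 10k collisions
}
```

At the tens of thousands of collisions mentioned below, that is tens of millions of record writes, each batch followed by a sync, which is consistent with the hour-long stall observed.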

This is done while the raw fingerprint is locked (`s.fpLocker.Lock(rawFP)`), so a scrape of an existing time series that is now colliding is blocked waiting for the checkpoint to be written. Each sample from a scrape is processed sequentially, so this can block many scrapes.

If you have many new collisions, this can block scraping for quite a while. The example I'm looking at seems to have tens of thousands of collisions, which took an hour to work through before scraping worked again.

I recommend switching to appending new mappings rather than writing out a whole new file. We should also look at coalescing the write()s and sync()s together, otherwise we'll be bottlenecked there too.

@brian-brazil brian-brazil added the bug label Mar 30, 2016

beorn7 commented Mar 30, 2016

Oops. The code is definitely designed assuming collisions are rare.

brian-brazil commented Mar 30, 2016

This case was with synthetic data for a load test, so there were more collisions, and at a higher rate, than usual.

We might get away with just the coalescing, which would also avoid a data format change.

brian-brazil commented Mar 30, 2016

Ah, I'd missed how the fingerprint locker is implemented. This will hold up all scraping, as it works with locks for 1024 buckets rather than one lock per time series.
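A sketch of a striped fingerprint locker of the kind described above (the 1024 bucket count is from this comment; the rest is assumed, not the exact Prometheus implementation): all fingerprints share a fixed pool of mutexes, so two unrelated fingerprints that hash to the same bucket contend on the same lock. Holding one bucket's lock across a slow checkpoint therefore stalls every series in that bucket, not just the colliding one.

```go
package main

import (
	"fmt"
	"sync"
)

// fpLocker is a hypothetical striped locker: 1024 mutexes shared by
// all fingerprints, selected by fp modulo the bucket count.
type fpLocker struct {
	mtxs [1024]sync.Mutex
}

func (l *fpLocker) Lock(fp uint64)   { l.mtxs[fp%1024].Lock() }
func (l *fpLocker) Unlock(fp uint64) { l.mtxs[fp%1024].Unlock() }

func main() {
	// Fingerprints 1 and 1025 land in the same bucket and thus
	// serialize on the same mutex even though they are unrelated.
	fmt.Println(1%1024 == 1025%1024) // true
}
```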

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019
