
Staleness batch #6355

Merged
merged 10 commits into main from staleness_batch on Feb 16, 2024

Conversation

mattdurham
Collaborator

PR Description

This moves the staleness tracker to use batch operations. Using the two fanout options should ensure that tracking is always called.
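The batching idea described here can be sketched as follows. This is a minimal, self-contained illustration with simplified stand-in types (`StalenessTracker`, `fanoutAppender`, and `countingStore` are toys, not the agent's real types): the appender accumulates trackers on each `Append` and hands the whole slice to the labelstore once, on `Commit` (and likewise on `Rollback`), so tracking runs on every code path.

```go
package main

import "fmt"

// StalenessTracker is a simplified stand-in for labelstore.StalenessTracker.
type StalenessTracker struct {
	GlobalRefID uint64
	Value       float64
}

// LabelStore is the batch-tracking interface the appender fans out to.
type LabelStore interface {
	TrackStaleness(ids []StalenessTracker)
}

type fanoutAppender struct {
	ls                LabelStore
	stalenessTrackers []StalenessTracker
}

// Append records a tracker instead of calling the labelstore per sample.
func (a *fanoutAppender) Append(ref uint64, v float64) {
	a.stalenessTrackers = append(a.stalenessTrackers, StalenessTracker{
		GlobalRefID: ref,
		Value:       v,
	})
}

// Commit flushes the whole batch in a single call; a real Rollback would
// do the same, so tracking always happens.
func (a *fanoutAppender) Commit() {
	a.ls.TrackStaleness(a.stalenessTrackers)
	a.stalenessTrackers = a.stalenessTrackers[:0]
}

// countingStore just counts tracked entries for the demo.
type countingStore struct{ n int }

func (c *countingStore) TrackStaleness(ids []StalenessTracker) { c.n += len(ids) }

func main() {
	cs := &countingStore{}
	app := &fanoutAppender{ls: cs}
	app.Append(1, 1.5)
	app.Append(2, 2.5)
	app.Commit()
	fmt.Println(cs.n) // prints 2: both entries tracked in one batch call
}
```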

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated
  • Tests updated

@mattdurham mattdurham marked this pull request as ready for review February 13, 2024 13:55
Contributor

Can we write some benchmarks and share the results?

Collaborator Author

Sure, added a benchmark. TrackStaleness with concurrent calls averages around 2 ms to handle 100k entries.

Backporting the benchmark to main takes 14 ms per 100k entries, so roughly 7x faster.
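For readers who want to reproduce a comparison like this, a benchmark along these lines can be run with Go's `testing.Benchmark` helper. The `store` and `tracker` types below are simplified stand-ins, not the real labelstore; the point of the batched path is that one lock acquisition covers the whole 100k-entry batch.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// tracker is a simplified stand-in for labelstore.StalenessTracker.
type tracker struct {
	id    uint64
	value float64
}

// store takes its lock once per batch, which is what makes batching cheap.
type store struct {
	mut   sync.Mutex
	stale map[uint64]struct{}
}

func (s *store) TrackStaleness(ids []tracker) {
	s.mut.Lock()
	defer s.mut.Unlock()
	for _, id := range ids {
		s.stale[id.id] = struct{}{}
	}
}

func main() {
	// Build a 100k-entry batch, matching the size quoted in the thread.
	batch := make([]tracker, 100_000)
	for i := range batch {
		batch[i] = tracker{id: uint64(i)}
	}
	res := testing.Benchmark(func(b *testing.B) {
		s := &store{stale: make(map[uint64]struct{}, len(batch))}
		for i := 0; i < b.N; i++ {
			s.TrackStaleness(batch)
		}
	})
	fmt.Println(res.String())
}
```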

Contributor

@thampiotr left a comment

Neat idea to batch it! Would really like to see some benchmarks so we know for sure how much better this is :)

// Tested this to ensure it had no cpu impact, since it is called so often.
a.ls.RemoveStaleMarker(uint64(ref))
}
a.stalenessTrackers = append(a.stalenessTrackers, labelstore.StalenessTracker{
Contributor

I may be missing some context here, but don't we need similar staleness tracking for AppendExemplar and AppendHistogram?

Collaborator Author

We do, but it never had it, so I'm keeping this PR small. I will have 2-3 more PRs incoming.

service/labelstore/service.go (outdated, resolved)
// Tested this to ensure it had no cpu impact, since it is called so often.
a.ls.RemoveStaleMarker(uint64(ref))
}
a.stalenessTrackers = append(a.stalenessTrackers, labelstore.StalenessTracker{
Contributor

We create a labelstore.StalenessTracker here, but later, in func (s *service) TrackStaleness(ids []StalenessTracker), we convert these to &staleMarker{} and calculate the labels hash. Could we instead create the &staleMarker{} right away here and calculate the hash here?

That way there will be less work and fewer structs to create, and the code will get simpler too, I think.

Collaborator Author

I feel those are owned by two different things. Mainly, the last-marked state should really only be set by the labelstore itself, and exposing that field feels off. This gets cleaned up slightly in the next PR.
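The ownership split described in this reply can be sketched like this (simplified stand-in types; the toy `Hash` and the omission of the real `value.IsStaleNaN` check are illustration-only liberties): the appender passes plain `StalenessTracker` values, and only the service constructs `staleMarker` and sets `lastMarkedStale`.

```go
package main

import (
	"fmt"
	"time"
)

// Labels is a stand-in for labels.Labels with a toy FNV-1a hash.
type Labels []string

func (l Labels) Hash() uint64 {
	var h uint64 = 1469598103934665603
	for _, s := range l {
		for i := 0; i < len(s); i++ {
			h ^= uint64(s[i])
			h *= 1099511628211
		}
	}
	return h
}

// StalenessTracker is what callers hand over: plain data, no tracking state.
type StalenessTracker struct {
	GlobalRefID uint64
	Value       float64
	Labels      Labels
}

// staleMarker stays internal to the service; lastMarkedStale is only ever
// set here, which is the ownership argument above.
type staleMarker struct {
	globalID        uint64
	labelHash       uint64
	lastMarkedStale time.Time
}

type service struct {
	staleGlobals map[uint64]*staleMarker
}

func (s *service) TrackStaleness(ids []StalenessTracker) {
	now := time.Now()
	for _, id := range ids {
		s.staleGlobals[id.GlobalRefID] = &staleMarker{
			globalID:        id.GlobalRefID,
			labelHash:       id.Labels.Hash(),
			lastMarkedStale: now,
		}
	}
}

func main() {
	s := &service{staleGlobals: map[uint64]*staleMarker{}}
	s.TrackStaleness([]StalenessTracker{{GlobalRefID: 7, Labels: Labels{"job", "x"}}})
	fmt.Println(len(s.staleGlobals)) // prints 1
}
```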

for _, id := range ids {
	if value.IsStaleNaN(id.Value) {
		s.staleGlobals[id.GlobalRefID] = &staleMarker{
			globalID: globalRefID,
Contributor

Here we have a map of pointers, but in the fanout we use a slice of structs (stalenessTrackers []labelstore.StalenessTracker), and we also use a slice of structs for labels. So it's a bit inconsistent, and I'm not sure what the thinking behind it is. Do we have benchmarks for which performs better? Did this come up in allocation profiles somewhere?

Collaborator Author

With a map you don't have to traverse it to check whether you need to add or update the value, like you would with a slice. Now, it's unlikely you would need to add staleness markers multiple times, but I would rather not assume that.
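The map-vs-slice point can be illustrated with a tiny upsert sketch (the `marker` type and `upsert` helper are hypothetical, for illustration only): with a map, adding or updating the entry for a ref is a single O(1) lookup, with no scan for an existing element.

```go
package main

import "fmt"

// marker is a hypothetical per-ref record; seen counts updates for the demo.
type marker struct{ seen int }

// upsert adds a new entry or updates the existing one in O(1) —
// no traversal, unlike searching a slice for a matching ref.
func upsert(m map[uint64]*marker, ref uint64) {
	if mk, ok := m[ref]; ok {
		mk.seen++
		return
	}
	m[ref] = &marker{seen: 1}
}

func main() {
	m := map[uint64]*marker{}
	upsert(m, 42)
	upsert(m, 42) // same ref: updates in place, no duplicate entry
	fmt.Println(len(m), m[42].seen) // prints 1 2
}
```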

Comment on lines 202 to 203
s.mut.Lock()
defer s.mut.Unlock()
Contributor

Just a thought: could we use a different mutex for s.staleGlobals and thus avoid contention with other methods like GetLocalRefID, GetGlobalRefID, GetOrAddGlobalRefID, etc.? It seems possible at a glance.

Collaborator Author

I fiddled with that a bit, but it felt more error-prone during development. I would prefer to split that out into another PR if we wanted to go that route.
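For reference, the deferred idea would look roughly like this (field names are hypothetical; this is not what the PR does): a dedicated `staleMut` guarding only `staleGlobals`, so `TrackStaleness` no longer contends with the ID-lookup methods that take the main `mut`.

```go
package main

import (
	"fmt"
	"sync"
)

// service splits locking: mut guards the ref-ID mappings, while a separate
// staleMut guards only staleGlobals (the hypothetical alternative discussed).
type service struct {
	mut sync.Mutex // guards ids
	ids map[string]uint64

	staleMut     sync.Mutex // guards staleGlobals only
	staleGlobals map[uint64]struct{}
}

// GetGlobalRefID takes only the main mutex.
func (s *service) GetGlobalRefID(name string) uint64 {
	s.mut.Lock()
	defer s.mut.Unlock()
	return s.ids[name]
}

// TrackStaleness takes only staleMut, so it never blocks ID lookups.
func (s *service) TrackStaleness(ids []uint64) {
	s.staleMut.Lock()
	defer s.staleMut.Unlock()
	for _, id := range ids {
		s.staleGlobals[id] = struct{}{}
	}
}

func main() {
	s := &service{
		ids:          map[string]uint64{"a": 1},
		staleGlobals: map[uint64]struct{}{},
	}
	s.TrackStaleness([]uint64{1, 2})
	fmt.Println(s.GetGlobalRefID("a"), len(s.staleGlobals)) // prints 1 2
}
```

The trade-off the author alludes to is real: two mutexes mean every future change must reason about which lock covers which field, and about lock ordering if a method ever needs both.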

@mattdurham mattdurham merged commit c57cb77 into main Feb 16, 2024
10 checks passed
@mattdurham mattdurham deleted the staleness_batch branch February 16, 2024 14:31
BarunKGP pushed a commit to BarunKGP/grafana-agent that referenced this pull request Feb 20, 2024
* Move the staleness tracking to commit and rollback in a batch.

* Move the staleness tracking to commit and rollback in a batch.

* add more specific comment

* fix linting

* PR feedback
@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Mar 18, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 18, 2024
3 participants