Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make "List Prometheus rules" API more responsive while synching rule groups #2289

Merged
merged 5 commits into from
Jul 8, 2022

Conversation

zenador
Copy link
Contributor

@zenador zenador commented Jun 30, 2022

What this PR does

Minimise the locking in DefaultMultiTenantManager to make "List Prometheus rules" API more responsive while synching rule groups.

Which issue(s) this PR fixes or relates to

Fixes #2283

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@zenador zenador marked this pull request as draft June 30, 2022 11:03
@pracucci pracucci self-requested a review July 4, 2022 07:58
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zenador for working on this! I left few comments.

pkg/ruler/manager.go Outdated Show resolved Hide resolved
pkg/ruler/manager.go Show resolved Hide resolved
pkg/ruler/manager.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zenador for keep working on this. The new design has been improved, but there are still few things I would like to fix / cleanup. Thanks!

pkg/ruler/manager.go Show resolved Hide resolved
pkg/ruler/manager.go Outdated Show resolved Hide resolved
@@ -141,33 +137,50 @@ func (r *DefaultMultiTenantManager) syncRulesToManager(ctx context.Context, user
return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above we call r.mapper.MapRules(). It's not thread safe. Currently syncRulesToManager() should be never called concurrently, but we haven't documented it, so at least we should update function comments to clarify it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment about this not being thread safe for the same user. I'm not sure if it's completely not safe due to the underlying MkdirAll call.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the race would be in the creation of new files + deletion of old ones.

pkg/ruler/manager.go Outdated Show resolved Hide resolved
@56quarters 56quarters marked this pull request as ready for review July 7, 2022 17:17
@56quarters 56quarters requested a review from pracucci July 7, 2022 17:20
pkg/ruler/manager.go Outdated Show resolved Hide resolved
@@ -141,33 +137,50 @@ func (r *DefaultMultiTenantManager) syncRulesToManager(ctx context.Context, user
return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the race would be in the creation of new files + deletion of old ones.

pkg/ruler/manager.go Outdated Show resolved Hide resolved
pkg/ruler/manager.go Show resolved Hide resolved
@pracucci
Copy link
Collaborator

pracucci commented Jul 8, 2022

I'm going to address my own latest comments.

Comment on lines +39 to +40
userManagerMtx sync.RWMutex
userManagers map[string]RulesManager
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewrs: I split it, to better signal userManagerMtx only protects userManagers. userManagerMetrics is already thread safe.

Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zenador and @56quarters for the conjunt work! I pushed some last fixes and double checked the logic, which I think is the exact same of main (before this refactoring) with the exception of keeping the lock for a short period.

I will ask for another maintainer review, since I also mangled this PR.

zenador and others added 4 commits July 8, 2022 10:37
…t Prometheus rules" API more responsive while synching rule groups
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci force-pushed the zenador/fix-list-prom-rules-unresponsive branch from fb0dada to 037ff7c Compare July 8, 2022 08:37
@pracucci
Copy link
Collaborator

pracucci commented Jul 8, 2022

I pushed a CHANGELOG entry and rebased.

@stevesg could you review this PR, please?

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Copy link
Contributor

@stevesg stevesg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

@pracucci pracucci merged commit a8709f3 into main Jul 8, 2022
@pracucci pracucci deleted the zenador/fix-list-prom-rules-unresponsive branch July 8, 2022 15:34
masonmei pushed a commit to udmire/mimir that referenced this pull request Jul 11, 2022
…groups (grafana#2289)

* Try to minimise the locking in DefaultMultiTenantManager to make "List Prometheus rules" API more responsive while synching rule groups

* Code review changes

* Address review feedback

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* Addressed self-review comments

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added CHANGELOG entry

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Co-authored-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
@pstibrany
Copy link
Member

Is there a way to test this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"List Prometheus rules" API unresponsive while synching rule groups
6 participants