Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ruler: Support writeNotify on Loki ruler #10906

Merged
merged 10 commits into from Oct 16, 2023

Conversation

kavirajk
Copy link
Collaborator

@kavirajk kavirajk commented Oct 15, 2023

What this PR does / why we need it:
prometheus ruler added a feature of notifying the reader when a sample is appended, instead of waiting in a loop burning the CPU cycles. prometheus/prometheus#11949

This changes a default behaviour a bit. Now if notify is not enabled, next read is done only when next readTicker is triggered.

Which issue(s) this PR fixes:
Also should fix #10859

Special notes for your reviewer:
Adding few more details for the sake of completeness.

We found this via more frequent failures of rule-evaluation integration tests linked on the issue above. After some investigation, we tracked down to prometheus changes.

Prometheus introduced new type wlog.WriteNotified interface with Notify() method with a goal to notify any waiting readers, that some write is done.

Two types implements this type wlog.Watcher and remote.Storage. remote.Storage implements Notify() by just calling it's queues wlog.Watcher's Notify() under the hood.

How are these types impacts Loki ruler?

Loki ruler also uses remote.Storage. So when any samples got committed via appender, we have to notify the remote storage.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR

prometheus ruler added a feature of notifying the reader when a sample is appended instead of waiting loop burning the CPU cycles.
prometheus/prometheus#11949

Also should fix: #10859

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@kavirajk kavirajk force-pushed the kavirajk/integrate-rule-write-notify branch from 78accf9 to 4111e1c Compare October 15, 2023 20:58
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Comment on lines 693 to 695
if a.writeNotified() != nil {
a.writeNotified().Notify()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little clunky; how about just passing a Notify() function to the appender? The appender doesn't need any knowledge of the wlog.WriteNotified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't have thought about this fix. This is cool. That said I'm a little confused. Shouldn't we implement Notify() and pass it on?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on the clunkiness :)

It comes from

type appender struct {
	w             *Storage
	writeNotified func() wlog.WriteNotified
	series        []record.RefSeries
	samples       []record.RefSample
	exemplars     []record.RefExemplar
}

writeNotified being func() type.

It got bit tricky because we initialize few things during NewStorage() and few other things during initialize() call.

wal is created during wal.NewStorage but the remoteStorage (that has Notify()) only created during initialize. We have to it as func type because we can only know if wal.Storage has write notify set that way. Also note, wal.Storage uses appender as a pool.

We can make it more clean by refactoring few things on NewStorage and initialize. But AFAIK not worth it now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In SetWriteNotified (nit: I think it should be SetWriteNotifier, by the way, it's a pity the prom type is named weirdly), you could just reference the Notify function there, no?

You could then have something like this:

type appender struct {
	w             *Storage
	notify        func()
	series        []record.RefSeries
	samples       []record.RefSample
	exemplars     []record.RefExemplar
}

func (w *Storage) SetWriteNotifier(notifier wlog.WriteNotified) {
        if notifier == nil { return }

	w.notifier = notifier.Notify
}

	// Notify so that reader waiting for it can read without needing to wait for next read ticker.
	if a.notify == nil {
           // log unexpected nil notifier
	}

        a.notify()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make notify as just func(). Agree. But cannot be done quite like you mentioned here. Given relationship between wal.Storage and appender is via sync.Pool.

I made it work, by moving some things to appenderPool.New func. Seems better now and solves your original concern.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also about the nit: SetWriteNotified -> SetWriteNotifier.

I think they calling it WriteNotified and not WriteNotifier because, WriteNotifier is confusing in a sense, is it notifying the "write" that is done? or notifying the "writer".

We want to notify the "write" that is actually done (potentially to readers waiting for it).

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there, thanks!

pkg/ruler/storage/wal/wal.go Show resolved Hide resolved
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@kavirajk kavirajk marked this pull request as ready for review October 16, 2023 09:45
@kavirajk kavirajk requested a review from a team as a code owner October 16, 2023 09:45
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@pull-request-size pull-request-size bot added size/S and removed size/M labels Oct 16, 2023
Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one small grammar nit

pkg/ruler/storage/wal/wal.go Outdated Show resolved Hide resolved
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@kavirajk kavirajk merged commit 52a7c0f into main Oct 16, 2023
4 checks passed
@kavirajk kavirajk deleted the kavirajk/integrate-rule-write-notify branch October 16, 2023 12:07
rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024
**What this PR does / why we need it**:
prometheus ruler added a feature of notifying the reader when a sample
is appended, instead of waiting in a loop burning the CPU cycles.
prometheus/prometheus#11949

This changes a default behaviour a bit. Now if `notify` is not enabled,
next read is done only when next readTicker is triggered.

**Which issue(s) this PR fixes**:
Also should fix grafana#10859

**Special notes for your reviewer**:
Adding few more details for the sake of completeness.

We found this via more frequent failures of rule-evaluation integration
tests linked on the issue above. After some investigation, we tracked
down to prometheus changes.

Prometheus introduced new type `wlog.WriteNotified` interface with
`Notify()` method with a goal to notify any waiting readers, that some
write is done.

Two types implements this type `wlog.Watcher` and `remote.Storage`.
`remote.Storage` implements `Notify()` by just calling it's queues
`wlog.Watcher`'s `Notify()` under the hood.

How are these types impacts Loki ruler?

Loki ruler also uses `remote.Storage`. So when any samples got committed
via `appender`, we have to notify the remote storage.

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)

---------

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make Test(Local|Remote)RuleEval integration test more reliable
3 participants