
Helm: Fix Prometheus rule LokiTooManyCompactorsRunning #7049

Closed · calvinbui wants to merge 12 commits

Conversation

@calvinbui (Contributor)

Signed-off-by: Calvin Bui <3604363+calvinbui@users.noreply.github.com>

What this PR does / why we need it:

The Helm chart deploys two services, loki and loki-headless.

The loki_boltdb_shipper_compactor_running metric will show all loki pods attached to both services.

The Prometheus rule in the chart, LokiTooManyCompactorsRunning, does an aggregation over an entire namespace, which will always double the amount of compactors it detects.
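For reference, the rule is shaped roughly like the sketch below (a reconstruction; the exact expression, duration, and annotations in the chart may differ). Because each compactor pod is a member of both services, Prometheus discovers and scrapes it twice, so the per-namespace sum counts it twice:

```yaml
# Sketch only: the exact expression/threshold in the chart may differ.
- alert: LokiTooManyCompactorsRunning
  # Each pod is scraped once per Service it backs, so this sum is doubled.
  expr: sum(loki_boltdb_shipper_compactor_running) by (namespace) > 1
  for: 5m
  labels:
    severity: warning
  annotations:
    message: More than one compactor is running in {{ $labels.namespace }}.
```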


Which issue(s) this PR fixes:
Fixes N/A

Special notes for your reviewer:

Checklist

  • Documentation added
  • Tests updated
  • Is this an important fix or new feature? Add an entry in the CHANGELOG.md.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

Signed-off-by: Calvin Bui <3604363+calvinbui@users.noreply.github.com>
@calvinbui requested a review from a team as a code owner September 5, 2022 06:54
@CLAassistant commented Sep 5, 2022

CLA assistant check
All committers have signed the CLA.

@calvinbui marked this pull request as draft September 5, 2022 06:55
Signed-off-by: Calvin Bui <3604363+calvinbui@users.noreply.github.com>
@pull-request-size bot added size/S and removed size/XS labels Sep 5, 2022
@calvinbui marked this pull request as ready for review September 5, 2022 06:56
@calvinbui (Contributor, Author)

CLA signed and chart version bumped

@grafanabot (Collaborator)

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@kavirajk (Collaborator) left a comment


LGTM. Thanks @calvinbui for the PR. Sounds reasonable to me. I'll leave it to @trevorwhitney for final approval.

Added one minor suggestion to the changelog entry.

Review suggestion on production/helm/loki/CHANGELOG.md (outdated, resolved)
Co-authored-by: Kaviraj Kanagaraj <kavirajkanagaraj@gmail.com>
@calvinbui (Contributor, Author) commented Sep 6, 2022

On second thought, I'm not sure if this is the best approach as there'll be two alerts if it does go >1

@trevorwhitney (Collaborator) left a comment


> On second thought, I'm not sure if this is the best approach as there'll be two alerts if it does go >1

Agreed. I think instead we should only scrape 1 of those services. My inclination would be to drop loki-headless, WDYT?

@calvinbui (Contributor, Author)

> On second thought, I'm not sure if this is the best approach as there'll be two alerts if it does go >1

> Agreed. I think instead we should only scrape 1 of those services. My inclination would be to drop loki-headless, WDYT?

what about the simple-scalable services? they use -read and -write suffixed services

@varac commented Sep 20, 2022

Hi, looking forward to this fix. What's the current state?

@trevorwhitney (Collaborator)

> what about the simple-scalable services? they use -read and -write suffixed services

yeah, good point, looks like maybe we want to always deploy the loki service (for both single binary and scalable topologies), and then we can just scrape this service and drop metrics from headless. WDYT?
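For illustration, one way the "scrape loki, drop headless" idea could be expressed in a Prometheus Operator ServiceMonitor endpoint; the port name and regex below are assumptions, not the chart's actual configuration:

```yaml
# Illustrative only: the port name and regex are assumptions.
endpoints:
  - port: http-metrics
    relabelings:
      # Drop any target discovered via a *-headless Service before scraping.
      - sourceLabels: [__meta_kubernetes_service_name]
        regex: .*-headless
        action: drop
```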

@zarikx (Contributor) commented Sep 23, 2022

If I'm not mistaken, loki-headless just needs the prometheus.io/service-monitor: "false" label (similar to the service-read-headless and service-write-headless services) to prevent discovering the single-binary StatefulSet twice.
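A sketch of that approach; the label key and value come from this comment, while the Service fields and the ServiceMonitor selector mentioned afterwards are illustrative:

```yaml
# Label the headless Service so the chart's ServiceMonitor skips it.
# Only the prometheus.io/service-monitor label comes from this thread;
# the name, selector, and port are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: loki-headless
  labels:
    prometheus.io/service-monitor: "false"
spec:
  clusterIP: None
  selector:
    app.kubernetes.io/name: loki
  ports:
    - name: http-metrics
      port: 3100
      targetPort: http-metrics
```

The chart's ServiceMonitor would then exclude such Services, presumably with a selector along the lines of `matchExpressions: [{key: prometheus.io/service-monitor, operator: NotIn, values: ["false"]}]`, mirroring how the -read and -write headless services are skipped.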

@calvinbui (Contributor, Author)

> If I'm not mistaken, loki-headless just needs the prometheus.io/service-monitor: "false" label (similar to the service-read-headless and service-write-headless services) to prevent discovering the single-binary StatefulSet twice.

Agreed, that's the best solution.

PR updated

@calvinbui requested review from trevorwhitney and kavirajk and removed request for trevorwhitney September 25, 2022 23:27
@grafanabot (Collaborator)

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@trevorwhitney (Collaborator) left a comment


lgtm! thanks!

@calvinbui requested review from trevorwhitney and removed request for kavirajk October 10, 2022 00:53
@grafanabot (Collaborator)

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
- querier/queryrange	-0.1%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@trevorwhitney (Collaborator)

@calvinbui looks like this just needs a helm-docs update and it's good to go.

@grafanabot (Collaborator)

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@trevorwhitney (Collaborator)

@calvinbui looks like a markdown linting error?

@grafanabot (Collaborator)

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@calvinbui (Contributor, Author)

> @calvinbui looks like a markdown linting error?

I've updated my branch. I didn't make any changes to the markdown before, nor this time around.

@trevorwhitney (Collaborator)

@calvinbui looks like all we need is one more rebase; a lot has been moving on the helm chart these past few days 😄 Thanks!

@grafanabot (Collaborator)

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

-           ingester	-0.1%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@calvinbui (Contributor, Author)

😦😩

@calvinbui closed this Oct 31, 2022
MichelHollands pushed a commit that referenced this pull request Nov 10, 2022

**What this PR does / why we need it**:

Ignore the headless service from the Prometheus service-monitor as it
was resulting in duplicate metrics. Replaces
#7049
Abuelodelanada pushed a commit to canonical/loki that referenced this pull request Dec 1, 2022
(grafana#7638)

**What this PR does / why we need it**:

Ignore the headless service from the Prometheus service-monitor as it
was resulting in duplicate metrics. Replaces
grafana#7049