Don't add the memberlist label as a selector to services. #6626

Merged Jul 8, 2022 (1 commit)

Contributor

@cstyan cstyan commented Jul 8, 2022

Adding this selector can cause outages during an initial migration to memberlist. All write traffic is routed to only the new distributors, so the few new distributors that exist at the start of a rollout are OOM killed.
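
A minimal sketch of the failure mode, written in Python rather than the jsonnet this PR changes. The label key `memberlist`, the pod names, and the pod counts are hypothetical. The only real behavior modeled is that a Service's equality-based selector matches a pod only when every selector key/value pair is present in that pod's labels.

```python
def selects(selector, pod_labels):
    """Return True when every selector key/value pair is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Hypothetical mid-rollout state: one new pod carries the memberlist label,
# nine old pods do not have it yet.
pods = {"distributor-new-0": {"name": "distributor", "memberlist": "true"}}
pods.update({f"distributor-old-{i}": {"name": "distributor"} for i in range(9)})

# Selector with the memberlist label (the problem) and without it (the fix).
with_label = {"name": "distributor", "memberlist": "true"}
without_label = {"name": "distributor"}


def endpoints(selector):
    """Names of the pods that a Service with this selector would route to."""
    return sorted(name for name, labels in pods.items() if selects(selector, labels))


print(endpoints(with_label))          # only the single new pod
print(len(endpoints(without_label)))  # all 10 pods
```

With the extra label in the selector, the one new pod receives traffic that was previously spread across ten, which is the overload described above.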

cc @JordanRushing @vlad-diachenko @slim-bean

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan cstyan requested a review from a team as a code owner July 8, 2022 05:40
@grafanabot
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Contributor

@chaudum chaudum left a comment

Great find 🎉 !

Contributor

@kavirajk kavirajk left a comment

Wow, great find. 🚀 Thanks @cstyan.

Curious, how did you end up finding it?

@chaudum chaudum merged commit 920a2d1 into main Jul 8, 2022
@chaudum chaudum deleted the callum-memberlist-selector-fix branch July 8, 2022 07:53
Contributor Author

cstyan commented Jul 8, 2022

@kavirajk First I looked at the distributor services via kubectl describe service and noticed that our Loki deployments had the selector but the Mimir deployments did not. Next I started looking for what could be different. I knew that we create each service from a deployment or statefulset via kausal's serviceFor, and I noticed that by default it takes all the labels from the deployment/statefulset and adds them as selectors. However, serviceFor does take the optional ignoreLabels, so I then looked in Mimir's jsonnet for where they were adding those labels to be ignored.

I missed it earlier when writing our memberlist.libsonnet because the definition of which labels to ignore was in a different file.
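
The default-versus-ignore behavior described above can be sketched roughly as follows. This is an illustrative Python model, not kausal's actual code: `service_selector`, its `ignored_labels` parameter, and the `memberlist` label key are hypothetical names chosen for the example.

```python
def service_selector(workload_labels, ignored_labels=()):
    """Copy every workload label into the selector, except the ignored ones.

    Hypothetical stand-in for a helper that derives a Service selector
    from the labels of a deployment or statefulset.
    """
    ignored = set(ignored_labels)
    return {k: v for k, v in workload_labels.items() if k not in ignored}


# Hypothetical labels on a distributor that has been migrated to memberlist.
labels = {"name": "distributor", "memberlist": "true"}

# Default: every label becomes a selector, including the memberlist one.
default_selector = service_selector(labels)

# With the label ignored, the selector matches old and new pods alike.
fixed_selector = service_selector(labels, ignored_labels=["memberlist"])

print(default_selector)
print(fixed_selector)
```

Under these assumptions the fix amounts to passing the memberlist label in the ignore list, so the selector is derived only from labels that every distributor pod shares.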

Contributor

kavirajk commented Jul 8, 2022

Nice thinking. Thanks for the explanation @cstyan :)
