Fixed: 22422 use singleflight to alleviate simultaneous calls to #112696
Conversation
@aimuz: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @aimuz. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/sig scalability
// throttling - see #22422 for details.
liveList, err := l.client.CoreV1().LimitRanges(a.GetNamespace()).List(context.TODO(), metav1.ListOptions{})
// Fixed: #22422
// use singleflight to alleviate simultaneous calls to
there should be a regression test verifying the fix
I have added a test, please take a look and see if this matches. @aojea
go func(wg *sync.WaitGroup) {
	_, err := handler.GetLimitRanges(attributes)
	if err != nil {
		t.Errorf("unexpected error: %v", err)
	}
	wg.Done()
}(&wg)
do you need to pass the wg var as parameter?
Suggested change:

go func(wg *sync.WaitGroup) {
	_, err := handler.GetLimitRanges(attributes)
	if err != nil {
		t.Errorf("unexpected error: %v", err)
	}
	wg.Done()
}(&wg)

go func() {
	defer wg.Done()
	_, err := handler.GetLimitRanges(attributes)
	if err != nil {
		t.Errorf("unexpected error: %v", err)
	}
}()
nit, defer wg.Done() tends to be more idiomatic
	limitRangeList.Items = append(limitRangeList.Items, value)
}
// Set the latency to simulate the real environment
time.Sleep(200 * time.Millisecond)
this is a low value and can flake, just set 1*time.Second
hmm, I think we can do this more reliable if we add a select and block on a channel
that you signal before the wg.Wait(), so you guarantee that all the requests were sent
count := 0
mockClient := &fake.Clientset{}
mockClient.AddReactor("list", "limitranges", func(action core.Action) (bool, runtime.Object, error) {
	count++
without the singleflight this will race, you can use an atomic counter
var count int64
atomic.AddInt64(&count,1)
if count != 1 {
	t.Errorf("Expected 1 limit range, got %d", count)
}
}
I would add another
_, err := handler.GetLimitRanges(attributes)
if err != nil {
t.Errorf("unexpected error: %v", err)
}
and check that count == 2
/retest
liveList, err := l.client.CoreV1().LimitRanges(a.GetNamespace()).List(context.TODO(), metav1.ListOptions{})
// Fixed: #22422
// use singleflight to alleviate simultaneous calls to
lruItemObj, err, _ = l.group.Do(a.GetNamespace(), func() (interface{}, error) {
if I switch this to partition on a constant string "x" instead of the namespace, the added unit test still passes, so the test is not effectively verifying this is correct
If the constant string "x" were used, it would partition all calls together: two concurrent calls for different namespaces would be collapsed into a single call until it returns, which would be the wrong behavior.
Using the namespace as the partition key groups only the calls for the same namespace, which matches the previous behavior of using the namespace as the LRU cache key.
@liggitt
using a constant string "x" here would be a bug, since it would make calls from different namespaces share a single lookup and use the same retrieved result. The unit test should fail under those circumstances, and it does not.
If you use the constant string x, the test will block
that's not what I'm seeing:
git diff
diff --git a/plugin/pkg/admission/limitranger/admission.go b/plugin/pkg/admission/limitranger/admission.go
index 4525463d2e6..907a28bda32 100644
--- a/plugin/pkg/admission/limitranger/admission.go
+++ b/plugin/pkg/admission/limitranger/admission.go
@@ -165,7 +165,7 @@ func (l *LimitRanger) GetLimitRanges(a admission.Attributes) ([]*corev1.LimitRan
if !ok || lruItemObj.(liveLookupEntry).expiry.Before(time.Now()) {
// Fixed: #22422
// use singleflight to alleviate simultaneous calls to
- lruItemObj, err, _ = l.group.Do(a.GetNamespace(), func() (interface{}, error) {
+ lruItemObj, err, _ = l.group.Do("x", func() (interface{}, error) {
liveList, err := l.client.CoreV1().LimitRanges(a.GetNamespace()).List(context.TODO(), metav1.ListOptions{})
if err != nil {
return nil, admission.NewForbidden(a, err)
go test ./plugin/pkg/admission/limitranger -count=1 -v -run TestLimitRanger_GetLimitRangesFixed22422
=== RUN TestLimitRanger_GetLimitRangesFixed22422
--- PASS: TestLimitRanger_GetLimitRangesFixed22422 (0.00s)
PASS
ok k8s.io/kubernetes/plugin/pkg/admission/limitranger 0.800s
I just updated it and the problem should be fixed, please take a look.
go test ./plugin/pkg/admission/limitranger -count=1 -v -run TestLimitRanger_GetLimitRangesFixed22422
=== RUN TestLimitRanger_GetLimitRangesFixed22422
admission_test.go:910: Expected test namespace, got test1
admission_test.go:910: Expected test namespace, got test1
admission_test.go:910: Expected test namespace, got test1
admission_test.go:910: Expected test namespace, got test1
admission_test.go:910: Expected test namespace, got test1
admission_test.go:910: Expected test namespace, got test1
admission_test.go:910: Expected test namespace, got test1
admission_test.go:910: Expected test namespace, got test1
admission_test.go:910: Expected test namespace, got test1
--- FAIL: TestLimitRanger_GetLimitRangesFixed22422 (0.00s)
FAIL
FAIL k8s.io/kubernetes/plugin/pkg/admission/limitranger 0.792s
FAIL
// since all the calls with the same namespace will be held, they must be caught by the singleflight group;
// there are two different sets of namespace calls,
// hence only 2
if count != 2 {
It is actually exercised in the test: the loop issues two different sets of namespace calls, and the count correctly increases to 2, so I don't think it's a problem. @liggitt
// simulating concurrent calls after a cache failure
go func() {
	defer wg.Done()
	_, err := handler.GetLimitRanges(attributes)
	if err != nil {
		t.Errorf("unexpected error: %v", err)
	}
}()

// simulating a call from a different namespace, which must not share the lookup
go func() {
	defer wg.Done()
	_, err := handler.GetLimitRanges(attributesTest1)
	if err != nil {
		t.Errorf("unexpected error: %v", err)
	}
}()
Two different sets of namespace calls are initiated here
Signed-off-by: aimuz <mr.imuz@gmail.com>
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aimuz, liggitt The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment
/retest
Signed-off-by: aimuz mr.imuz@gmail.com
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #22422
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: