fix(daemon): create more expections when skipping pods #74856
Conversation
Hi @draveness. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
k8s-ci-robot added needs-ok-to-test and cncf-cla: yes labels on Mar 3, 2019
draveness changed the title from "fix(daemon): create more expections when skippingy pods" to "fix(daemon): create more expections when skipping pods" on Mar 3, 2019
k8s-ci-robot requested review from lukaszo and mikedanese on Mar 3, 2019
k8s-ci-robot added sig/apps and release-note labels and removed needs-sig and do-not-merge/release-note-label-needed labels on Mar 3, 2019
/assign @k82cn
k8s-ci-robot assigned k82cn on Mar 3, 2019
Can you help add an e2e test for this case?
draveness force-pushed the draveness:fix/daemonset-controller-slow-batch-creation branch from 46f21a1 to 1dad9c4 on Mar 3, 2019
k8s-ci-robot added size/M and removed size/XS labels on Mar 3, 2019
draveness force-pushed the draveness:fix/daemonset-controller-slow-batch-creation branch 2 times, most recently from 97ec9d0 to c45fd23 on Mar 3, 2019
Hi @k82cn, I used a unit test to verify the pod creation expectations; it works the same way but is simpler. Is that OK?
draveness force-pushed the draveness:fix/daemonset-controller-slow-batch-creation branch 3 times, most recently from e9e2cd4 to 7e1d314 on Mar 3, 2019
Hi @k82cn, it's been a while. Could you give me some advice on this? Many thanks.
/ok-to-test
k8s-ci-robot added the ok-to-test label on Mar 26, 2019
draveness force-pushed the draveness:fix/daemonset-controller-slow-batch-creation branch from 80607e2 to 8e76760 on Apr 6, 2019
/test pull-kubernetes-bazel-test
draveness force-pushed the draveness:fix/daemonset-controller-slow-batch-creation branch 2 times, most recently from ea35b7a to d8e45d3 on Apr 7, 2019
k8s-ci-robot added size/L, area/test, and sig/testing labels and removed the size/M label on Apr 7, 2019
The expectations for the controller are just like the `remainingCreations` that @lavalamp mentioned above. When pod creation succeeds without errors, the daemonset controller lowers its creation expectations at the moment the informer delivers the create-pod notification:

```go
func (dsc *DaemonSetsController) addPod(obj interface{}) {
	pod := obj.(*v1.Pod)
	// ...
	if controllerRef := metav1.GetControllerOf(pod); controllerRef != nil {
		ds := dsc.resolveControllerRef(pod.Namespace, controllerRef)
		dsKey, err := controller.KeyFunc(ds)
		if err != nil {
			return
		}
		dsc.expectations.CreationObserved(dsKey)
		dsc.enqueueDaemonSet(ds)
		return
	}
	// ...
}
```

But `FakePodControl` does not actually create pods, so in the tests we send the pod creation expectations through it, which means the daemonset satisfies its expectations inside `syncDaemonSet`:

```go
func (dsc *DaemonSetsController) syncDaemonSet(key string) error {
	if !dsc.expectations.SatisfiedExpectations(dsKey) {
		return dsc.updateDaemonSetStatus(ds, hash, false) // originally returned from here.
	}
	err = dsc.manage(ds, hash)
	if err != nil {
		return err
	}
	if dsc.expectations.SatisfiedExpectations(dsKey) {
		switch ds.Spec.UpdateStrategy.Type {
		case apps.OnDeleteDaemonSetStrategyType:
		case apps.RollingUpdateDaemonSetStrategyType:
			err = dsc.rollingUpdate(ds, hash) // !! with the rolling update strategy, update the daemonset status and send an event.
		}
		if err != nil {
			return err
		}
	}
	err = dsc.cleanupHistory(ds, old)
	if err != nil {
		return fmt.Errorf("failed to clean up revisions of DaemonSet: %v", err)
	}
	return dsc.updateDaemonSetStatus(ds, hash, true) // send an event.
}
```

So I changed the two test cases:

```go
func TestSufficientCapacityWithTerminatedPodsDaemonLaunchesPod(t *testing.T) {
	defer utilfeaturetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.ScheduleDaemonSetPods, false)()
	{
		strategy := newOnDeleteStrategy()
		podSpec := resourcePodSpec("too-much-mem", "75M", "75m")
		ds := newDaemonSet("foo")
		ds.Spec.UpdateStrategy = *strategy
		ds.Spec.Template.Spec = podSpec
		manager, podControl, _, err := newTestController(ds)
		if err != nil {
			t.Fatalf("error creating DaemonSets controller: %v", err)
		}
		node := newNode("too-much-mem", nil)
		node.Status.Allocatable = allocatableResources("100M", "200m")
		manager.nodeStore.Add(node)
		manager.podStore.Add(&v1.Pod{
			Spec:   podSpec,
			Status: v1.PodStatus{Phase: v1.PodSucceeded},
		})
		manager.dsStore.Add(ds)
		syncAndValidateDaemonSets(t, manager, ds, podControl, 1, 0, 1)
	}
	{
		strategy := newRollbackStrategy()
		podSpec := resourcePodSpec("too-much-mem", "75M", "75m")
		ds := newDaemonSet("foo")
		ds.Spec.UpdateStrategy = *strategy
		ds.Spec.Template.Spec = podSpec
		manager, podControl, _, err := newTestController(ds)
		if err != nil {
			t.Fatalf("error creating DaemonSets controller: %v", err)
		}
		node := newNode("too-much-mem", nil)
		node.Status.Allocatable = allocatableResources("100M", "200m")
		manager.nodeStore.Add(node)
		manager.podStore.Add(&v1.Pod{
			Spec:   podSpec,
			Status: v1.PodStatus{Phase: v1.PodSucceeded},
		})
		manager.dsStore.Add(ds)
		syncAndValidateDaemonSets(t, manager, ds, podControl, 1, 0, 2)
	}
}
```

I worked on this for quite a while to get it done, and I have to say it's really hard to test and verify. I'm very happy to change it if someone can give me more context and advice.
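To make the expectation bookkeeping concrete, here is a minimal, self-contained Go sketch. The `expectations` type and its methods are simplified stand-ins for the real `ControllerExpectations` (which is keyed per controller and also has a timeout); the sketch only shows why skipped pods must be observed too, otherwise a failed batch leaves the controller stuck waiting for the expectation timeout before it can re-sync:

```go
package main

import "fmt"

// expectations is a simplified stand-in (not the real Kubernetes
// ControllerExpectations) that tracks how many pod creations the
// controller still expects to observe before it allows a full re-sync.
type expectations struct{ pending int }

func (e *expectations) ExpectCreations(n int) { e.pending = n }

func (e *expectations) CreationObserved() {
	if e.pending > 0 {
		e.pending--
	}
}

func (e *expectations) Satisfied() bool { return e.pending <= 0 }

func main() {
	exp := &expectations{}
	createDiff, created := 5, 2 // an error aborted the batch after 2 creations

	exp.ExpectCreations(createDiff)
	for i := 0; i < created; i++ {
		exp.CreationObserved() // normally fired by the informer's addPod handler
	}

	// Without observing the skipped pods, 3 expectations remain, so the
	// controller would wait for the expectation timeout before re-syncing.
	fmt.Println(exp.Satisfied()) // prints "false"

	// The fix: observe one creation per skipped pod so the next sync can
	// proceed immediately.
	for i := 0; i < createDiff-created; i++ {
		exp.CreationObserved()
	}
	fmt.Println(exp.Satisfied()) // prints "true"
}
```

In the actual controller the analogous call is `dsc.expectations.CreationObserved(dsKey)`, the same call the `addPod` handler quoted above makes for real creations.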
/hold cancel
k8s-ci-robot removed the do-not-merge/hold label on Apr 7, 2019
draveness force-pushed the draveness:fix/daemonset-controller-slow-batch-creation branch 2 times, most recently from e339259 to 7a02eb3 on Apr 7, 2019
/test pull-kubernetes-kubemark-e2e-gce-big
Kindly ping @k82cn @janetkuo @mikedanese for review and approval.
mikedanese reviewed twice on May 3, 2019
Agree that this is way too hard to understand, but the fix looks correct. I have one question about why we are now seeing two events in those tests when we weren't before.
I think I explained the two events in the previous comment: when the daemonset update strategy is rolling update and the expectations are satisfied, the logic goes into `rollingUpdate`, which updates the daemonset status and sends an additional event.
Can you address my comments?
draveness force-pushed the draveness:fix/daemonset-controller-slow-batch-creation branch from 7a02eb3 to 5f8dfdc on May 4, 2019
Hi @mikedanese, just resolved the comments, PTAL.
/lgtm

I see SlowStart got copied and pasted a few times. I filed #77436 to clean this up.
k8s-ci-robot added the lgtm label on May 4, 2019
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: draveness, mikedanese

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
draveness commented on Mar 3, 2019 (edited)

What type of PR is this?

/kind bug

What this PR does / why we need it:

DaemonSetsController creates more expectations than it should when errors happen in the second batch of pod creations:

https://github.com/draveness/kubernetes/blame/master/pkg/controller/daemon/daemon_controller.go#L1027-L1084

The `skippedPods := createDiff - batchSize` expression doesn't account for pods created in previous batches. By contrast, job_controller decreases `diff` after each batch's creation:

https://github.com/draveness/kubernetes/blame/master/pkg/controller/job/job_controller.go#L765-L809
Does this PR introduce a user-facing change?: