
Updated Sig-windows Memory Limits tests to not assume all nodes are the same #107477

Merged: 1 commit into kubernetes:master on Mar 11, 2022

Conversation

@NikhilSharmaWe (Member) commented Jan 11, 2022

What type of PR is this?

Bug fix

/kind bug

What this PR does / why we need it:

Updated Sig-windows Memory Limits tests to not assume all nodes are the same.

Which issue(s) this PR fixes:

Fixes #106608

Special notes for your reviewer:

Does this PR introduce a user-facing change?

none

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added labels release-note-none, kind/bug, size/S, cncf-cla: yes, do-not-merge/needs-sig, needs-triage on Jan 11, 2022
@k8s-ci-robot (Contributor) commented:

@NikhilSharmaWe: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-priority label on Jan 11, 2022
@k8s-ci-robot added area/test, sig/testing, sig/windows and removed do-not-merge/needs-sig labels on Jan 11, 2022
@k8s-ci-robot added size/M and removed size/S labels on Jan 13, 2022
},
}
f.PodClient().Create(&pod)
}
@NikhilSharmaWe (Member, Author):

@jsturtevant Is this the right method for adding the image name and node selector to the Pod spec? If yes, could you please explain how to find the correct amount of memory to assign to each pod?

@jsturtevant (Contributor):

Yes, this is how you would create a pod. You will want to capture the result and verify the pod started:

testPod, err = f.ClientSet.CoreV1().Pods(f.Namespace.Name).Create(context.TODO(), testPod, metav1.CreateOptions{})
framework.ExpectNoError(err)

Add memory limit like this:

Resources: v1.ResourceRequirements{
    Limits: v1.ResourceList{
        v1.ResourceMemory: memLimitQuantity,
        v1.ResourceCPU:    cpuLimitQuantity,
    },
},
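(For context, a minimal sketch of how that limit might sit inside a complete pod spec; podName and memLimitQuantity are placeholders, not code from this PR:)

// assumed imports: v1 "k8s.io/api/core/v1",
// metav1 "k8s.io/apimachinery/pkg/apis/meta/v1",
// imageutils "k8s.io/kubernetes/test/utils/image"
testPod := &v1.Pod{
    ObjectMeta: metav1.ObjectMeta{Name: podName},
    Spec: v1.PodSpec{
        Containers: []v1.Container{
            {
                Name:  podName,
                Image: imageutils.GetPauseImageName(),
                Resources: v1.ResourceRequirements{
                    Limits: v1.ResourceList{
                        v1.ResourceMemory: memLimitQuantity,
                    },
                },
            },
        },
    },
}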

@jsturtevant (Contributor):

You can get that by querying the node for allocatable memory; see the example in the getTotalAllocatableMemory function.
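(A rough sketch of that idea, assuming a nodeList of Windows nodes has already been fetched; this mirrors, rather than quotes, getTotalAllocatableMemory:)

// Sum allocatable memory (in bytes) across all nodes in the list.
var totalAllocatable int64
for _, node := range nodeList.Items {
    mem := node.Status.Allocatable[v1.ResourceMemory] // a resource.Quantity
    totalAllocatable += mem.Value()
}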

@NikhilSharmaWe (Member, Author):

@jsturtevant

  • I think we only have to add v1.ResourceMemory: status.Allocatable[v1.ResourceMemory] in Limits: v1.ResourceList.

Am I thinking correctly here, or do we also have to add a value for v1.ResourceCPU?

@jsturtevant (Contributor):

Yes, we only need memory in this case.

@NikhilSharmaWe (Member, Author):

@jsturtevant For the per-node pods we can add v1.ResourceMemory: status.Allocatable[v1.ResourceMemory] from each node's status, but what should be set as the memory limit for failurePod?

@jsturtevant (Contributor):

It can be anything greater than what is left over in the total allocatable memory for Windows in the cluster, since we have used it all up. Something like 500MB or 1GB should be good.
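(For instance, a sketch using resource.MustParse from k8s.io/apimachinery/pkg/api/resource; the variable name is illustrative, not from this PR:)

// Any limit larger than the memory left on every node will do.
failureMemLimit := resource.MustParse("500Mi")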

},
},
}
pod, err = f.ClientSet.CoreV1().Pods(f.Namespace.Name).Create(context.TODO(), &pod, metav1.CreateOptions{})
@NikhilSharmaWe (Member, Author):

@jsturtevant I think &pod should be the correct argument for this case and &failurePod for the next one, but it is showing an error saying that a pointer type should not be used. Yet in this example a pointer to v1.Pod is used as the argument.

What is the cause of the error here?

@NikhilSharmaWe (Member, Author):

@jsturtevant Could you please share your thoughts on the above comment?

@jsturtevant (Contributor):

Suggested change:

- pod, err = f.ClientSet.CoreV1().Pods(f.Namespace.Name).Create(context.TODO(), &pod, metav1.CreateOptions{})
+ pod, err = f.ClientSet.CoreV1().Pods(f.Namespace.Name).Create(context.TODO(), pod, metav1.CreateOptions{})
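(The likely cause of the earlier error: Create takes a *v1.Pod, and pod here is already declared as a pointer via pod := &v1.Pod{...}, so &pod would be a **v1.Pod, hence the complaint about the pointer type.)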


@marosset added this to In Progress (v1.24) in SIG-Windows on Jan 27, 2022
ginkgo.By(fmt.Sprintf("Deploying %d pods with mem limit %v, then one additional pod", allocatablePods, memPerPod))

// these should all work
pods := newMemLimitTestPods(allocatablePods, imageutils.GetPauseImageName(), podType, strconv.FormatInt(memPerPod, 10))


@NikhilSharmaWe (Member, Author):

@jsturtevant Yes, true.

Could you please help me with this error?

  • test/e2e/windows/memory_limits.go:132:3: ineffectual assignment to pod (ineffassign)

@jsturtevant (Contributor):

From https://github.com/gordonklaus/ineffassign:

"An assignment is ineffectual if the variable assigned is not thereafter used."

You are assigning pod, but it's not used again; you can use _ instead, for example.
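(Concretely, a sketch of that fix: discard the returned pod with the blank identifier and keep only the error check:)

_, err = f.ClientSet.CoreV1().Pods(f.Namespace.Name).Create(context.TODO(), pod, metav1.CreateOptions{})
framework.ExpectNoError(err)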

@k8s-ci-robot added size/L and removed size/M labels on Feb 5, 2022
@NikhilSharmaWe (Member, Author) commented Feb 8, 2022

@jsturtevant I think the PR is ready to merge; please let me know if it needs further improvements.

@marosset moved this from In Progress (v1.24) to In Review (v1.24) in SIG-Windows on Feb 24, 2022
@jsturtevant (Contributor):

/cc

@jsturtevant (Contributor):

/assign

framework.ExpectNoError(err)
}
failurePod := &v1.Pod{
Spec: v1.PodSpec{
Contributor:

Both this and the other pod spec are missing the pod name and the container name (see the original pod spec definition for reference).

I would suggest this one be called something like "mem-failure-pod" and the other pods "mem-test-"; a sketch follows below.
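(A sketch of the suggested naming; the image is assumed to be the pause image used elsewhere in this test, and the memory limit is omitted here since it is covered in the discussion above:)

failurePod := &v1.Pod{
    ObjectMeta: metav1.ObjectMeta{Name: "mem-failure-pod"},
    Spec: v1.PodSpec{
        Containers: []v1.Container{
            {
                Name:  "mem-failure-pod",
                Image: imageutils.GetPauseImageName(),
            },
        },
    },
}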

},
for _, node := range nodeList.Items {
status := node.Status
pod := &v1.Pod{
Spec: v1.PodSpec{
Contributor:

This pod spec should also include a NodeName: node.Name field to make it deploy to that node.
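(In the v1.PodSpec API the field is NodeName, which assigns the pod directly to the node. A sketch combining it with the per-node memory limit discussed above; the names are illustrative, not from this PR:)

pod := &v1.Pod{
    ObjectMeta: metav1.ObjectMeta{Name: "mem-test-" + node.Name},
    Spec: v1.PodSpec{
        NodeName: node.Name, // pin this pod to the node being iterated over
        Containers: []v1.Container{
            {
                Name:  "mem-test-" + node.Name,
                Image: imageutils.GetPauseImageName(),
                Resources: v1.ResourceRequirements{
                    Limits: v1.ResourceList{
                        v1.ResourceMemory: status.Allocatable[v1.ResourceMemory],
                    },
                },
            },
        },
    },
}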

@NikhilSharmaWe changed the title from "Updated Sig-windows Memory Limits tests to not assume all nodes are t…" to "Updated Sig-windows Memory Limits tests to not assume all nodes are the same" on Mar 11, 2022
@jsturtevant (Contributor):

Thanks for sticking with this one!

The tests passed on a cluster with different size VMs:

k get nodes -l "kubernetes.io/os=windows" -o json | jq '.items[].status.allocatable.memory'

"8285748Ki"
"8285748Ki"
"16674356Ki"
"16674356Ki"
Mar 11 08:52:31.809: INFO: Found FailedScheduling event with message 0/5 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 Insufficient memory. preemption: 0/5 nodes are available: 1 No victims found on node capz-conf-4pkcs for preemptor pod mem-failure-pod, 1 No victims found on node capz-conf-6q4qt for preemptor pod mem-failure-pod, 1 No victims found on node capz-conf-cmqhw for preemptor pod mem-failure-pod, 1 No victims found on node capz-conf-l5nsx for preemptor pod mem-failure-pod, 1 Preemption is not helpful for scheduling.
[AfterEach] [sig-windows] [Feature:Windows] Memory Limits [Serial] [Slow]
  test/e2e/framework/framework.go:186
Mar 11 08:52:31.809: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "memory-limit-test-windows-8971" for this suite.
•{"msg":"PASSED [sig-windows] [Feature:Windows] Memory Limits [Serial] [Slow] attempt to deploy past allocatable memory limits should fail deployments of pods once there isn't enough memory","total":1,"completed":1,"skipped":4182,"failed":0}

@jsturtevant (Contributor):

/lgtm
/approve

@k8s-ci-robot added the lgtm label on Mar 11, 2022
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsturtevant, NikhilSharmaWe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Mar 11, 2022
@jsturtevant (Contributor):

/test pull-kubernetes-node-e2e-containerd

@jsturtevant (Contributor):

Unrelated flakes.
/retest

@k8s-ci-robot merged commit 86ad8fc into kubernetes:master on Mar 11, 2022
SIG-Windows automation moved this from In Review (v1.24) to Done (v1.24) on Mar 11, 2022
@k8s-ci-robot added this to the v1.24 milestone on Mar 11, 2022