Don't reuse the device of a restartable init container #120461
/triage accepted priority subject to review later
test/e2e_node/device_plugin_test.go
Outdated
// Note that the kubelet does not report resources for init
// containers for now.
// See (*v1PodResourcesServer).List in pkg/kubelet/apis/podresources/server_v1.go
// TBD: Do we have to fix this?
kubernetes/pkg/kubelet/apis/podresources/server_v1.go
Lines 51 to 86 in 56cc5e7
// List returns information about the resources assigned to pods on the node
func (p *v1PodResourcesServer) List(ctx context.Context, req *v1.ListPodResourcesRequest) (*v1.ListPodResourcesResponse, error) {
	metrics.PodResourcesEndpointRequestsTotalCount.WithLabelValues("v1").Inc()
	metrics.PodResourcesEndpointRequestsListCount.WithLabelValues("v1").Inc()

	pods := p.podsProvider.GetPods()
	podResources := make([]*v1.PodResources, len(pods))
	p.devicesProvider.UpdateAllocatedDevices()

	for i, pod := range pods {
		pRes := v1.PodResources{
			Name:       pod.Name,
			Namespace:  pod.Namespace,
			Containers: make([]*v1.ContainerResources, len(pod.Spec.Containers)),
		}

		for j, container := range pod.Spec.Containers {
			pRes.Containers[j] = &v1.ContainerResources{
				Name:    container.Name,
				Devices: p.devicesProvider.GetDevices(string(pod.UID), container.Name),
				CpuIds:  p.cpusProvider.GetCPUs(string(pod.UID), container.Name),
				Memory:  p.memoryProvider.GetMemory(string(pod.UID), container.Name),
			}
			if utilfeature.DefaultFeatureGate.Enabled(kubefeatures.KubeletPodResourcesDynamicResources) {
				pRes.Containers[j].DynamicResources = p.dynamicResourcesProvider.GetDynamicResources(pod, &container)
			}
		}
		podResources[i] = &pRes
	}

	response := &v1.ListPodResourcesResponse{
		PodResources: podResources,
	}
	return response, nil
}
Currently, the kubelet does not report resources for init containers. Do we need to fix this?
@SergeyKanzhelev
Yes, this is an oversight. The podresources API doesn't report resources allocated to sidecars because the code predates sidecars. The API is meant to report resources allocated to long-running containers, and at the time all init containers were transient.
Feel free to change the test code in a different way that works around this bug.
I think fixing the podresources API can be a beta item: we should fix this, but it is not very urgent. If we want to fix it early so the tests can use the API, even better.
EDIT: a possible complication (possibly the only one?) is whether we need to distinguish sidecars from app containers in the podresources output.
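To make the gap concrete, here is a minimal sketch of what a listing that includes restartable init containers might look like. It is not the kubelet code: `podSpec`, `containerSpec`, `containerResources`, the `sidecar` flag, and `listContainerResources` are all hypothetical stand-ins, and the `sidecar` field is just one conceivable way to distinguish sidecars from app containers in the output.

```go
package main

import "fmt"

// containerSpec is a hypothetical stand-in for a container entry in a pod spec.
type containerSpec struct {
	name        string
	restartable bool // only meaningful for init containers
}

// podSpec is a hypothetical stand-in for the parts of a pod spec used here.
type podSpec struct {
	initContainers []containerSpec
	containers     []containerSpec
}

// containerResources is a hypothetical output record. The sidecar flag is an
// assumption of this sketch, not a field of the real podresources API.
type containerResources struct {
	name    string
	sidecar bool
	devices []string
}

// listContainerResources reports devices for every long-running container:
// restartable init containers first, then regular containers. Transient
// (non-restartable) init containers are skipped.
func listContainerResources(pod podSpec, devicesOf func(name string) []string) []containerResources {
	out := make([]containerResources, 0, len(pod.initContainers)+len(pod.containers))
	for _, c := range pod.initContainers {
		if !c.restartable {
			continue
		}
		out = append(out, containerResources{name: c.name, sidecar: true, devices: devicesOf(c.name)})
	}
	for _, c := range pod.containers {
		out = append(out, containerResources{name: c.name, devices: devicesOf(c.name)})
	}
	return out
}

func main() {
	assigned := map[string][]string{"proxy": {"dev-0"}, "app": {"dev-1"}}
	pod := podSpec{
		initContainers: []containerSpec{{name: "setup"}, {name: "proxy", restartable: true}},
		containers:     []containerSpec{{name: "app"}},
	}
	lookup := func(name string) []string { return assigned[name] }
	for _, r := range listContainerResources(pod, lookup) {
		fmt.Printf("%s sidecar=%t devices=%v\n", r.name, r.sidecar, r.devices)
	}
}
```

In this sketch the transient `setup` init container is omitted, while `proxy` is reported with `sidecar` set. Whether the real API should carry such a marker is exactly the open question raised above.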
I'll separate it from this PR and create an issue.
perfect. Happy to help here.
One more question.
Who are the primary consumers of the kubelet podresources API?
(I want to understand the impact and urgency of the issue.)
The consumers I'm aware of are the SRIOV stack, GPU monitoring agents, and consumers of topology-aware scheduling. While the sidecar feature is alpha and disabled by default, I don't think this is very urgent. OTOH, it should be fixed before the move to beta (enabled by default).
/test pull-kubernetes-node-e2e-containerd-sidecar-containers
Now that #120911 has merged, can you please rebase this PR?
/test pull-kubernetes-node-e2e-containerd-sidecar-containers
/lgtm
LGTM label has been added. Git tree hash: 8702c585583d858bee94d574c2dae3212823e21e
/assign @derekwaynecarr @klueska
/assign @SergeyKanzhelev
/lgtm
for tests. Logic looks OK too, but I don't know if there are any implications.
/assign @mrunalp
Could you take a look at this? This fixes a bug where regular containers reuse the sidecar container's devices.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: gjkim42, mrunalp, SergeyKanzhelev, swatisehgal
What type of PR is this?
/kind bug
What this PR does / why we need it:
This makes sure that the device resources allocated to a restartable init container are not reused by regular containers.
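The intended behavior can be illustrated with a simplified sketch. It is not the kubelet device manager code: `allocator`, `containerKind`, and `allocatePod` are hypothetical names, and the model assumes containers are allocated in order from a flat pool of device IDs. Devices given to an ordinary init container go back into a reusable set once it is allocated, because the container finishes before later containers start; devices given to a restartable init container do not, because it keeps running alongside the regular containers.

```go
package main

import "fmt"

// containerKind is a hypothetical classification used only in this sketch.
type containerKind int

const (
	initContainer    containerKind = iota // runs to completion before the app starts
	restartableInit                       // sidecar: keeps running alongside the app
	regularContainer                      // ordinary app container
)

type container struct {
	name    string
	kind    containerKind
	devices int // number of devices requested
}

// allocator hands out device IDs and tracks which IDs a later container
// of the same pod is allowed to reuse.
type allocator struct {
	free     []string
	reusable []string
}

// take returns n device IDs, preferring reusable IDs over free ones.
func (a *allocator) take(n int) ([]string, error) {
	if avail := len(a.reusable) + len(a.free); n > avail {
		return nil, fmt.Errorf("requested %d devices, only %d available", n, avail)
	}
	var got []string
	for n > 0 && len(a.reusable) > 0 {
		got = append(got, a.reusable[0])
		a.reusable = a.reusable[1:]
		n--
	}
	got = append(got, a.free[:n]...)
	a.free = a.free[n:]
	return got, nil
}

// allocate assigns devices to c. Only devices of non-restartable init
// containers are offered for reuse by later containers.
func (a *allocator) allocate(c container) ([]string, error) {
	ids, err := a.take(c.devices)
	if err != nil {
		return nil, err
	}
	if c.kind == initContainer {
		a.reusable = append(a.reusable, ids...)
	}
	return ids, nil
}

// allocatePod allocates devices for the containers of one pod, in order.
func allocatePod(free []string, containers []container) (map[string][]string, error) {
	a := &allocator{free: append([]string(nil), free...)}
	out := make(map[string][]string, len(containers))
	for _, c := range containers {
		ids, err := a.allocate(c)
		if err != nil {
			return nil, err
		}
		out[c.name] = ids
	}
	return out, nil
}

func main() {
	containers := []container{
		{name: "init", kind: initContainer, devices: 1},
		{name: "sidecar", kind: restartableInit, devices: 1},
		{name: "app", kind: regularContainer, devices: 1},
	}
	out, err := allocatePod([]string{"dev-0", "dev-1"}, containers)
	if err != nil {
		fmt.Println("allocation failed:", err)
		return
	}
	for _, c := range containers {
		fmt.Printf("%s: %v\n", c.name, out[c.name])
	}
}
```

With two devices in the pool, the sidecar may take over the device the finished init container used, but the app container receives a different one. With only one device, allocation for the app container fails in this model rather than silently sharing the sidecar's device.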
Which issue(s) this PR fixes:
Fixes #
xref: #119442
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/cc @ffromani