
RuntimeProxy: Support hook server deployed by k8s pod #718

Merged: 1 commit, Oct 27, 2022

honpey (Contributor) commented Oct 19, 2022

If the hook server is deployed as a k8s pod, a cyclic dependency can arise. When the hook server is updated, the CRI request that creates the new hook server pod travels from kubelet to runtime-proxy and then to the hook server. By that time, however, the previous hook server pod may already have exited, so the call to the hook server times out. If RuntimeHookConfig.FailurePolicy is set to a FailurePolicyType value that fails the request on hook errors, RuntimeProxy keeps rejecting the request, and the hook server pod can never be re-created.
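A minimal Go sketch of this failure mode, under stated assumptions: the FailurePolicy type, its PolicyFail/PolicyIgnore values, and the callHookServer and admitCreate functions are hypothetical names for illustration, not runtime-proxy's actual API. The missing hook server is modeled as a call that blocks until its context deadline expires. With a fail-closed policy every retry is rejected the same way, while an ignore policy lets the request through.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// FailurePolicy models RuntimeHookConfig.FailurePolicy. The type and value
// names below are hypothetical and chosen only for this sketch.
type FailurePolicy string

const (
	PolicyFail   FailurePolicy = "Fail"   // reject the request if the hook call fails
	PolicyIgnore FailurePolicy = "Ignore" // forward the request even if the hook call fails
)

// callHookServer simulates runtime-proxy calling the hook server. When no
// hook server is running (the old pod exited during an update), nobody
// answers and the call blocks until the context deadline expires.
func callHookServer(ctx context.Context, hookServerUp bool) error {
	if !hookServerUp {
		<-ctx.Done()
		return fmt.Errorf("call hook server: %w", ctx.Err())
	}
	return nil
}

// admitCreate returns nil if the pod-creation request may be forwarded to
// the backend runtime, or an error if runtime-proxy rejects it.
func admitCreate(policy FailurePolicy, hookServerUp bool) error {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	if err := callHookServer(ctx, hookServerUp); err != nil && policy == PolicyFail {
		return fmt.Errorf("reject pod creation: %w", err)
	}
	// Either the hook call succeeded or the policy is Ignore.
	return nil
}

func main() {
	// The request being admitted is the new hook server pod itself, so the
	// hook server stays down and every retry under PolicyFail is rejected.
	for attempt := 1; attempt <= 3; attempt++ {
		fmt.Printf("attempt %d (Fail): %v\n", attempt, admitCreate(PolicyFail, false))
	}
	fmt.Printf("Ignore: %v\n", admitCreate(PolicyIgnore, false))
}
```

The loop shows why retrying alone cannot help: nothing changes between attempts, so the fail-closed rejection repeats indefinitely.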

To fix this, we introduce an annotation that tags a pod as a hook server. For pods carrying this annotation, runtime-proxy forwards CRI requests transparently to the backend runtime engine without calling the hook server, which breaks the cyclic dependency.
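The annotation-based bypass can be sketched roughly as follows. This is an illustration under assumptions: the annotation key example.koordinator.sh/runtime-hook-server, the value "true", and the PodRequest, isHookServerPod, and route names are all hypothetical, since the actual annotation key is not named in this discussion. Tagged pods skip the hook server and go straight to the backend runtime; all other pods still pass through the hooks.

```go
package main

import "fmt"

// hookServerAnnotation is a HYPOTHETICAL annotation key used only for this
// sketch; the real key introduced by the PR is not given here.
const hookServerAnnotation = "example.koordinator.sh/runtime-hook-server"

// PodRequest is a simplified, hypothetical stand-in for a CRI pod-creation request.
type PodRequest struct {
	Name        string
	Annotations map[string]string
}

// isHookServerPod reports whether the pod is tagged as a hook server.
// Reading from a nil map is safe in Go and yields "".
func isHookServerPod(req PodRequest) bool {
	return req.Annotations[hookServerAnnotation] == "true"
}

// route decides how the proxy handles a request: tagged hook server pods
// bypass the hook server, which breaks the cycle; other pods use the hooks.
func route(req PodRequest) string {
	if isHookServerPod(req) {
		return "backend" // transparent pass-through to the runtime engine
	}
	return "hook-then-backend"
}

func main() {
	reqs := []PodRequest{
		{Name: "hook-server", Annotations: map[string]string{hookServerAnnotation: "true"}},
		{Name: "app", Annotations: nil},
	}
	for _, r := range reqs {
		fmt.Printf("%s -> %s\n", r.Name, route(r))
	}
}
```

In this sketch, "backend" stands for forwarding the CRI call unchanged to the runtime engine, so the new hook server pod can start even while no hook server is running.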

Signed-off-by: honpey <honpey@gmail.com>

Ⅰ. Describe what this PR does

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

Ⅴ. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

codecov bot commented Oct 19, 2022

Codecov Report

Base: 68.85% // Head: 68.71% // Decreases project coverage by -0.13% ⚠️

Coverage data is based on head (2fde2d4) compared to base (50a8072).
Patch coverage: 37.83% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #718      +/-   ##
==========================================
- Coverage   68.85%   68.71%   -0.14%     
==========================================
  Files         205      205              
  Lines       23319    23378      +59     
==========================================
+ Hits        16057    16065       +8     
- Misses       6152     6207      +55     
+ Partials     1110     1106       -4     
Flag Coverage Δ
unittests 68.71% <37.83%> (-0.14%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pkg/runtimeproxy/resexecutor/resource_executor.go 9.52% <0.00%> (ø)
pkg/runtimeproxy/utils/utils.go 100.00% <ø> (ø)
pkg/runtimeproxy/resexecutor/cri/container.go 70.00% <26.66%> (-2.60%) ⬇️
pkg/runtimeproxy/resexecutor/cri/pod.go 72.54% <36.36%> (-4.05%) ⬇️
pkg/runtimeproxy/resexecutor/cri/utils.go 87.90% <66.66%> (-1.67%) ⬇️
...r/plugins/elasticquota/core/group_quota_manager.go 71.12% <0.00%> (-10.85%) ⬇️
pkg/scheduler/plugins/elasticquota/pod_handler.go 64.28% <0.00%> (-7.72%) ⬇️
.../scheduler/plugins/elasticquota/core/quota_info.go 82.27% <0.00%> (-2.90%) ⬇️
pkg/koordlet/pleg/pleg.go 66.89% <0.00%> (-1.86%) ⬇️
pkg/scheduler/plugins/nodenumaresource/plugin.go 63.04% <0.00%> (-1.15%) ⬇️
... and 25 more


hormes requested review from zwzhang0107 and removed request for FillZpp, October 24, 2022 02:46
honpey force-pushed the runtime-proxy-dev branch 2 times, most recently from faac717 to b93c736, October 27, 2022 07:07
FillZpp (Member) commented Oct 27, 2022

How about exposing the hook server in koordlet instead of deploying a new DaemonSet on all nodes?

hormes (Member) commented Oct 27, 2022

> How about exposing the hook server in koordlet instead of deploying a new DaemonSet on all nodes?

In fact, that is already the case: the hooks are implemented in koordlet, and koordlet is the process that acts as the hook server.

Review thread on the following diff lines:

        klog.Errorf("fail to call hook server %v", err)
    } else if response == nil {
        // when hook is not registered, the response will become nil
        klog.Warningf("runtime hook path %s does not register any PreHooks", string(runtimeHookPath))
Contributor commented:

Maybe klog.V(4).Info is a sufficient log level when no hook is registered.

Contributor Author replied:

> Maybe klog.V(4).Info is a sufficient log level when no hook is registered.

This log will be removed in the next patch.

zwzhang0107 (Contributor) commented:

/lgtm

koordinator-bot commented:

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:


Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

koordinator-bot merged commit 3959c0c into koordinator-sh:main on Oct 27, 2022

4 participants