An OVS process gets killed when oom-killer is invoked #80
Conversation
@squeed PTAL
@pecameron did the updated resource limits seem OK to the OVS team? 400M seems pretty large. Are we also sure that whoever gets this error is running with the revalidator thread limits we added a while back?
As I understand it, the […] As for requests.memory: the amount of memory the pod uses is going to depend on the size of the cluster, and there doesn't seem to be any way to represent that... We could see if there was some install-time "cluster size" hint, or maybe CNO could sporadically update the daemonset with a current estimate (except that would be terrible, because any change to the daemonset causes all the pods to be redeployed).
@danwinship So it sounds like we should pick a number that works in "most" clusters and let the admin adjust it as needed, based on guidelines provided by Red Hat. If that is the case, are the proposed numbers OK?
There is no way for the admin to adjust it; the values are hardcoded into the CNO. The only current option is to specify a value that will inevitably be incorrect for most users...
@danwinship Your observation applies to the cluster generally: admins will have little opportunity to tune anything. Unless these numbers are very sensitive and must be carefully tuned, we have to pick a number that will generally work. 4.0 is new turf.
    livenessProbe:
      exec:
        command:
        - cat
This should just be /usr/share/openvswitch/scripts/ovs-ctl status. Then you can replace the loop in the pod's command with just sleep 10000.
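A minimal sketch of what that suggested probe might look like (the script path and command are from the comment above; the timing fields are illustrative assumptions, not values from this PR):

    livenessProbe:
      exec:
        command:
        - /usr/share/openvswitch/scripts/ovs-ctl
        - status
      # Illustrative timings, not specified in the review comment.
      initialDelaySeconds: 15
      periodSeconds: 10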
Yeah, we're in a tricky spot. We can't get guaranteed resources unless we also set a limit, which is reasonable (we would also have to change the QoSClass). We also don't really know what a safe limit is, since it depends on cluster size. And we don't want to waste resources: a too-high limit would tie up memory unnecessarily. For the time being, I think the right thing to do is stay in BestEffort, set a reasonable request, and have a good liveness probe. @sjenning explicitly removed the limits from OVS two weeks ago in #62; I don't recall the exact reason why. However, I don't think we should be re-adding limits.
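For readers less familiar with the QoS classes being weighed here, a quick sketch of how the three map onto a container's resources stanza (this is standard Kubernetes behavior; the three stanzas are alternatives, and the values are illustrative, not from this PR):

    # Alternative 1 -- BestEffort: no requests or limits; first to be OOM-killed.
    resources: {}

    # Alternative 2 -- Burstable: requests set, limits unset (or higher than requests).
    resources:
      requests:
        memory: 300Mi   # illustrative value

    # Alternative 3 -- Guaranteed: requests == limits for every resource;
    # killed after everything else.
    resources:
      requests:
        cpu: 100m
        memory: 300Mi
      limits:
        cpu: 100m
        memory: 300Mi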
Hm... we should probably ask someone who knows this stuff better, but my understanding is that the […]. Maybe we should just drop […].
Alternatively, if we could run ovsdb-server and ovs-vswitchd in separate containers with appropriate […]. Um... wait a minute... the […] and we are not passing […].
@danwinship oom likely killed the ovs monitor as well. It kills until it gets "enough" memory. The OOM killer kills what it likes; you can give it hints, and sometimes it takes them. OOM will keep the kernel alive. Nothing else matters.
Aaron C sent email on this topic: "I think it's crazy to put a limit on the memory, anyway. Someday, when […]"
That seems improbable; the monitor is tiny, so killing it does not help the OOM killer free up memory.
@danwinship because the monitor code only restarts on program-type errors. SIGKILL is not in that list; I guess that's not considered a program crash (and I think I'd agree).
Under no circumstances should we let an admin control this value. Period. If we want this value to change, it must be because the network operator is smart enough to manage it.
@pecameron here's what I think we should do. We set some pretty high requests, but no limit. That means we are Burstable, but OVS will be killed later than other burstable things. We have to balance the request amount, though, because higher requests have scheduling implications. There's no way to make ourselves Guaranteed (i.e. killed after everything else) unless we set requests == limits, and since our memory usage is likely quite variable based on cluster size, we can't find a single value that fits all clusters. So maybe bump it to a 500M request and call it a day?
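A rough sketch of that proposal as a container resources stanza (assuming the "500M" above means roughly 500Mi of memory; the cpu value is purely illustrative and not discussed in this thread):

    # Requests without limits puts the pod in the Burstable QoS class; the
    # kubelet derives a burstable pod's oom_score_adj from its memory request,
    # so a larger request means OVS is OOM-killed later than other burstable pods.
    resources:
      requests:
        cpu: 100m      # illustrative; not from this thread
        memory: 500Mi  # the "500M" suggested above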
@dcbw This is the 4.0 fix for https://bugzilla.redhat.com/show_bug.cgi?id=1669311 which is on 3.10. We need to reach consensus there as well. |
/retest |
At this point, the additional "ovs-ctl" check on line 77 is redundant. Can you just replace it with a sleep 10000? And check that it all works when you kill openvswitch? |
/retest |
Changes from 3.9 to 3.10 now have OVS running in a pod. There must be
sufficient memory or the OOM killer will be invoked. This change adds a
liveness probe that checks that the process is running. Also, resource
limits are removed.

bug 1671822
https://bugzilla.redhat.com/show_bug.cgi?id=1671822
clone of bug 1669311
https://bugzilla.redhat.com/show_bug.cgi?id=1669311

Signed-off-by: Phil Cameron <pcameron@redhat.com>
@squeed made the change, PTAL |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pecameron, squeed

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Related: kubernetes/kubernetes#73758. This would allow pods with system-critical priority to get a low […]. Looking to backport this.
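For reference, a minimal sketch of how a pod opts into critical priority today (system-node-critical is the built-in Kubernetes priority class; whether that priority should translate into a lower OOM score is what the linked PR is about):

    # Hypothetical pod-spec fragment. kubernetes/kubernetes#73758 would tie
    # such critical priority to a low value for -- presumably -- oom_score_adj,
    # per the comment above.
    spec:
      priorityClassName: system-node-critical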
/retest |
Changes from 3.9 to 3.10 now have OVS running in a pod. There must be
sufficient memory or the OOM killer will be invoked. This change adds a
liveness probe that checks that the process is running. Also, resource
limits are relaxed a little.
bug 1671822
https://bugzilla.redhat.com/show_bug.cgi?id=1671822
clone of bug 1669311
https://bugzilla.redhat.com/show_bug.cgi?id=1669311
Signed-off-by: Phil Cameron <pcameron@redhat.com>