Bug 1821788:libvirt: Bump bootstrap memory to 5G for ppc64le #3396
Conversation
Force-pushed from a541903 to 3d834a1
Do you have examples or error logs for these?
journalctl on the bootstrap:
Thanks @Prashanth684!
Thanks for that! Next: do we know exactly which process(es) are involved in reaching the OOMKilled state?
This happens after etcd is up and the cluster-bootstrap starts. Around the time the hyperkube process gets killed, this is the ps output:
Talking to @zeenix, the defaults are 7G for the master and 5G for the worker node, and the suggestion was to change the bootstrap memory to match the worker so as to keep it on the lower side. I completely understand the concern, and given that this issue only happens on ppc64le, I have asked the IBM team to see if there are any parameters they can tune to improve the performance. In the interim, is there at least an option to configure the bootstrap memory through an environment variable or similar, or should this PR address that rather than making this change?
@smarterclayton @deads2k the kube-apiserver on the bootstrap host is consuming a lot of memory. Any suggestions as to how to make sure we can fit everything in 2 gigs? Maybe modify the cache sizes on the bootstrap kube-apiserver?
@abhinavdahiya Instead of hardcoding this increase across the board, would it be better to have a "terraform overrides" asset which would read terraform override files and apply them when the cluster is created? I played around with something like that here: https://gist.github.com/Prashanth684/c52737c522b379edb5cf154d859315e4 We could even make this specific to libvirt if there are concerns that users would muck around with terraform. This would allow us to just drop a file into a tf folder inside the installation directory which contains something like:

Thoughts?
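The inline snippet did not survive the page scrape. As a rough illustration only (the file name and variable names are assumptions, not taken from the PR or the gist), such a drop-in override file might look like:

```hcl
# Hypothetical override file, e.g. <install-dir>/tf/overrides.tfvars
# (names and values are illustrative, not the installer's actual variables)
libvirt_bootstrap_memory = 5120 # MiB; 5G for ppc64le
libvirt_master_memory    = 7168 # MiB; the existing 7G master default
```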
Force-pushed from 3d834a1 to f94d653
Updated PR: added a bootstrap memory variable and bumped the memory specifically for ppc64le, based on the control-plane architecture, per @crawford's suggestion.
Force-pushed from f94d653 to 7494900
On ppc64le, OOM kills were observed during the bootstrap process because of insufficient memory, and bumping the memory solved the problem. The libvirt defaults for master and worker memory are 7G and 5G respectively, so the bootstrap default is set to 5G for ppc64le. ppc64le uses 64K pages rather than the default 4K page size and thus requires more memory.
Force-pushed from 7494900 to c57c680
High-level approach looks good to me.
/retitle Bug 1820219:libvirt: Bump bootstrap memory to 5G for ppc64le
@Prashanth684: This pull request references Bugzilla bug 1820219, which is invalid:
Comment /bugzilla refresh to re-evaluate validity if any of the above changes were made.

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retitle Bug 1821788:libvirt: Bump bootstrap memory to 5G for ppc64le
@Prashanth684: This pull request references Bugzilla bug 1821788, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug.
In response to this:
Tested on x86 and ppc64le. On ppc64le this is how the terraform libvirt variables file looks:
And this is a snippet of the tfstate file which has the profiles for the machines:
And on x86:
and the tfstate file:
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/retest Please review the full test history for this PR and help us cut down flakes.
@Prashanth684: The following tests failed, say /retest to rerun them:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
/retest Please review the full test history for this PR and help us cut down flakes.
@Prashanth684: All pull requests linked via external trackers have merged: openshift/installer#3396. Bugzilla bug 1821788 has been moved to the MODIFIED state. In response to this:
/cherry-pick release-4.4
@Prashanth684: new pull request created: #3426 In response to this:
On ppc64le there were OOM kills observed during the bootstrap process
because of insufficient memory, and bumping the memory solved the problem.
The libvirt defaults for the master and worker memory are 7G and 5G respectively,
so setting the bootstrap default to 5G.