Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preflight checks shall check hardware requirements. #1052

Closed
deknos opened this issue Aug 13, 2018 · 32 comments · Fixed by kubernetes/kubernetes#93275
Closed

Preflight checks shall check hardware requirements. #1052

deknos opened this issue Aug 13, 2018 · 32 comments · Fixed by kubernetes/kubernetes#93275
Assignees
Labels
area/UX lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/backlog Higher priority than priority/awaiting-more-evidence.
Milestone

Comments

@deknos
Copy link

deknos commented Aug 13, 2018

What keywords did you search in kubeadm issues before filing this one?

core, cpu

Is this a BUG REPORT or FEATURE REQUEST?

depends.. a FeatureRequest.

Preflight Checks should include whether at least 2 cores and 2Gb RAM are availabe.

I had a machine where i tried to install it and it had only 1 core. and it initially worked often, but containers went down after a time and it did not work properly.

I wish for a comment like: "This machine has only 1 core and NGb Ram, but usually kubeadm needs at least 2 cores and 2 GB Ram to work properly."

Versions

kubeadm version (use kubeadm version) :
kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:53:20Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:08:19Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:
    Debian 9, OpenTelekom Cloud, Elastic Computing Instance.

  • OS (e.g. from /etc/os-release):
    root@debian9kub01:/home/linux# cat /etc/os-release
    PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
    NAME="Debian GNU/Linux"
    VERSION_ID="9"
    VERSION="9 (stretch)"
    ID=debian
    HOME_URL="https://www.debian.org/"
    SUPPORT_URL="https://www.debian.org/support"
    BUG_REPORT_URL="https://bugs.debian.org/"

  • Kernel (e.g. uname -a):
    Linux debian9kub01 4.9.0-7-amd64 kubeadm join on slave node fails preflight checks #1 SMP Debian 4.9.110-3+deb9u1 (2018-08-03) x86_64 GNU/Linux

  • Others:

What happened?

kubeadm installed often successfully but operations never really worked, but containers often shut down.

What you expected to happen?

a warning that the host configuration is not sufficient

How to reproduce it (as minimally and precisely as possible)?

try to run a kubernetes cluster installed with kubeadm on a machine with 1 core and 2gb ram

@neolit123
Copy link
Member

neolit123 commented Aug 13, 2018

currently we do outline such requirements here:
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

while we can use the runtime package to obtain the number of CPUs there is no way to obtain the available memory in a portable manner from golang without introducing platform specific code.

what we can do is print a message about the system requirements when kubeadm starts.

/priority backlog
/area UX

@k8s-ci-robot k8s-ci-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. area/UX labels Aug 13, 2018
@deknos
Copy link
Author

deknos commented Aug 14, 2018

Hi,
yes i know that the requirements are listed in the documentation, but since you even warn about much other stuff and that is something you could fall over as well, testing memory and cores (or at least cores) should be reported/warned. i would be very glad if you do that. It costed days until i remembered that my machine was too small...

@xlgao-zju
Copy link

what we can do is print a message about the system requirements when kubeadm starts.

Print the requirements in the Preflight?

@deknos
Copy link
Author

deknos commented Aug 17, 2018

that only helps if there is a difference to the used system.
if that is not possible to do within kubeadm, perhaps from a solution standpoint, whould i make sense to fetch a preflight docker check container beforehand which tests all this stuff? or does that not work?

because this of course would be not only helpful for kubeadm but master/nodes as well.

But there should definitely be a warning, if there are too few cores and ram allocated for the kubeadm/kubernetes master/nodes system

@neolit123
Copy link
Member

neolit123 commented Aug 17, 2018

if that is not possible to do within kubeadm, perhaps from a solution standpoint, whould i make sense to fetch a preflight docker check container beforehand which tests all this stuff? or does that not work?

this is doable but it has a complicated maintenance process.

But there should definitely be a warning, if there are too few cores and ram allocated for the kubeadm/kubernetes master/nodes system

as mentioned earlier, we cannot have portable multi-platform detection for that using golang yet and we wouldn't want to have to homecook different solutions for different OSes.
we can only show a static message with system requirements similar to the one outlined in the docs.

@neolit123 neolit123 added this to the GA milestone Aug 17, 2018
@deknos
Copy link
Author

deknos commented Aug 20, 2018

this is doable but it has a complicated maintenance process.

Would be even easier to do preflight checks within a docker container than in kubeadm itself. since containers have to adhere to linux environments, /proc/{cpuinfo,meminfo} could be parsed and you wouldnt have to worry about other environments? :-)

we can only show a static message with system requirements similar to the one outlined in the docs.

Better than nothing, i'll take it. Though i do not understand why you couldnt do the other thing (docker preflight check) in the long term :-)

@neolit123 neolit123 added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Aug 20, 2018
@neolit123
Copy link
Member

Would be even easier to do preflight checks within a docker container than in kubeadm itself. since containers have to adhere to linux environments, /proc/{cpuinfo,meminfo} could be parsed and you wouldnt have to worry about other environments? :-)

contributions to kubeadm are always encouraged, of course. 👍

@timothysc
Copy link
Member

Perhaps we just update the docs to run the node e2e test suite prior to setup.

@neolit123
Copy link
Member

@timothysc

Perhaps we just update the docs to run the node e2e test suite prior to setup.

we can do that.

however, @deknos noted that the users are going to miss the system requirements notes and start a cluster anyway, which we don't handle correctly and it results in unexpected behavior:

I had a machine where i tried to install it and it had only 1 core. and it initially worked often, but containers went down after a time and it did not work properly.

given this is node specific it seems to me that this should be on the kubelet side.

what we can do on the kubeadm side is show a warning / system req. notification.

@bart0sh
Copy link

bart0sh commented Oct 20, 2018

@neolit123, @timothysc what do you think about using something like this: https://github.com/pbnjay/memory to get total memory?

According to this issue golang/go#21816 this kind of API doesn't have a lot of chances to go to standard library, so we probably should come up with something similar to above package.

After all we need just a several lines of code for windows and linux to implement this. I think it makes sense to do to enforce requirements stated in the documentation. Even if we do it only for linux it will cover most of kubeadm user base in my opinion.

@neolit123
Copy link
Member

yep, i saw the memory package when writing my previous responses in this thread.
but not sure what the general policy of vendoring such multi-platform packages in kubeadm is.

@bart0sh
Copy link

bart0sh commented Oct 20, 2018

Implemented check for number of CPUs on master, please review: kubernetes/kubernetes#70048

@bart0sh
Copy link

bart0sh commented Oct 20, 2018

@neolit123 as vendoring can be a problem can we just write our own similar code?

@neolit123
Copy link
Member

i'm personally fine with having https://github.com/pbnjay/memory vendored in kubeadm.
will defer to @timothysc on this one.

@bart0sh
Copy link

bart0sh commented Oct 25, 2018

/assign @timothysc

@timothysc timothysc removed this from the GA milestone Oct 26, 2018
@deknos
Copy link
Author

deknos commented Nov 5, 2018

The check for number of CPUs on master looks good! but i do not really know to formally review it other than testing it in a VM? Shall i do that?

The go memory library looks good, though i am a newbie to Golang, i could at least look at it.

@neolit123
Copy link
Member

we have not decided yet if the the memory check library is going to be used.
the CPU check is all we are going to have for k8s 1.13.

@deknos
Copy link
Author

deknos commented Nov 5, 2018

Yeah, i wanted to know if i could do anything to help. reviewing code or testing with a too small installation for example? :)

@bart0sh
Copy link

bart0sh commented Nov 6, 2018

i'm personally fine with having https://github.com/pbnjay/memory vendored in kubeadm.
will defer to @timothysc on this one.

@timothysc wdyt?

@pbnjay
Copy link

pbnjay commented Jan 4, 2019

(just stumbled across this!) happy to contribute directly if you all don't want to vendor the package. I also just added a BSD license to the repo if the missing license was a concern

@timothysc timothysc removed their assignment Jan 7, 2019
@timothysc timothysc added this to the Next milestone Jan 7, 2019
@timothysc
Copy link
Member

I would channel what is done by the kubelet/cadvisor where possible b/c it's already vendored.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 7, 2019
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 7, 2019
@neolit123
Copy link
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels May 7, 2019
@astrieanna
Copy link

I'm interested in picking this up. To make sure I've understood the issue correctly, I'll start by summarizing the current state:

It looks like this issue had two parts: checking the number of CPUs/cores and checking the amount of RAM. These checks need to work across platforms, including both Linux and Windows. The first part has already been merged, but the second part will require a cross-platform implementation that is not included in the Go standard library. kubelet/cadvisor is the top option for this because it is already vendored in k/k and supports Linux and Windows.

kubelet/cadvisor has a MachineInfo type that includes both CPUs and RAM. Based on this, it makes sense to move the CPU check to also be based on cadvisor.

@astrieanna
Copy link

@dpb587-pivotal and I have implemented this check using the pbnjay/memory library suggested earlier in this issue. You can see the changes here. We have not made a PR yet to avoid splitting the discussion, since there is still discussion to be had on the approach to take (vendoring this new library vs using kubectl/cadvisor).

Before writing the above code, we looked at what would be needed to implement this using kubectl/cadvisor. That appears to be an uncomfortable fit for this task for a couple reasons:

(1) The New function requires a bunch of arguments that don't seem relevant to this use-case. We'd either need to pass fake inputs or use not-yet-validated settings, neither of which seem clean.

(2) The Windows implementation claims to start a metrics server (as opposed to just returning a couple of current numbers). Windows support is a reason to use kubelet/cadvisor rather than the plain vendored cadvisor. The winstat code that kubelet/cadvisor uses does not look reusable.

@timothysc given that additional technical context, do you have concerns about pulling in this new metrics package vs trying to use kubelet/cadvisor?

@neolit123
Copy link
Member

this fell of my radar completely, but to answer the above query:

...using the pbnjay/memory library suggested earlier in this issue. You can see the changes here. We have not made a PR yet to avoid splitting the discussion, since there is still discussion to be had on the approach to take (vendoring this new library vs using kubectl/cadvisor).

this seems like the right approach, so if someone wants to send a PR we might get acceptance for it.

given that additional technical context, do you have concerns about pulling in this new metrics package vs trying to use kubelet/cadvisor?

we should definitely not use kubelet/cadvisor given kubeadm is moving away from k/k and the kubelet is trying to decouple from cadvisor.

PRs welcome.

@xlgao-zju
Copy link

@neolit123 I'd like to pick this up. And will use https://github.com/pbnjay/memory to check the memory.

/assign

@neolit123
Copy link
Member

@xlgao-zju thanks. we might want to make this change for 1.20, because it's too close to feature-freeze (9th of July).

@BenTheElder
Copy link
Member

Dropping a note that 2GB is definitely not a hard requirement and that necessary resources depend on the size and scope of the rest of the cluster.

kind runs kubeadm + a light workload in less.

@neolit123
Copy link
Member

on the PR i recommended to have this is a warning.
it will not be targeting 1.19, also unclear if the feature will be accepted in general.

@neolit123
Copy link
Member

/lifecycle active
this should close with kubernetes/kubernetes#93275

@k8s-ci-robot k8s-ci-robot added lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. and removed lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. labels Sep 3, 2020
@neolit123 neolit123 removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Sep 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/UX lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/backlog Higher priority than priority/awaiting-more-evidence.
Projects
None yet
10 participants