
[Kubemark] Failures in master kubelet trying to start pods #68190

Closed
shyamjvs opened this issue Sep 3, 2018 · 9 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.

Comments

@shyamjvs (Member) commented Sep 3, 2018

We recently started observing flaky failures in a couple of kubemark jobs.

E.g. a failed run: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-kubemark-500-gce/16634

The reason seems to be that the kubelet was continuously failing to start the kube-apiserver pod with errors like:

E0902 22:43:47.915221    2169 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-kubemark-500-kubemark-master": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:245: running exec setns process for init caused \"exit status 16\""
E0902 22:43:47.915292    2169 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kube-apiserver-kubemark-500-kubemark-master_kube-system(bd6f956583b54bdf5a90b93b7e3a7e3d)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-kubemark-500-kubemark-master": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:245: running exec setns process for init caused \"exit status 16\""

I'll try digging into it a bit, but @yujuhong @mtaufen - do you have any leads on why this might be happening?

cc @kubernetes/sig-scalability-bugs @wojtek-t
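
A minimal sketch of node-side checks that might narrow this down, assuming SSH access to the kubemark master VM and systemd-managed docker and kubelet there (unit names and log retention may differ):

# Runtime-side failures around the repeated sandbox starts
sudo journalctl -u docker --since "2 hours ago" | grep -i "exit status 16"
# Kubelet's view of the failing pod sandboxes
sudo journalctl -u kubelet --since "2 hours ago" | grep -i "RunPodSandbox"
# Record runtime and runc versions in case the failure is version-specific
sudo docker version
sudo docker info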

@k8s-ci-robot k8s-ci-robot added the sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. label Sep 3, 2018
@shyamjvs shyamjvs changed the title [Kubemark] Failures in kubelet trying to start kube-apiserver [Kubemark] Failures in master kubelet trying to start pods Sep 3, 2018
@shyamjvs (Member, Author) commented Sep 3, 2018

Changed the title, as it seems like the kubelet is failing to start all master pods (not just the apiserver).

@dims (Member) commented Sep 4, 2018

@shyamjvs possibly related to moby/moby#31614 ?

@bclouser commented Dec 3, 2018

For what it's worth, I think I am seeing this as well, or at least something similar:

Error

Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "jenkins-slave-j69pq": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:301: running exec setns process for init caused \"signal: killed\"": unknown

I'm using jenkins/jnlp-slave configured manually from the Jenkins UI. The error appears when Kubernetes attempts to create the pod.

Versions

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-27T16:55:41Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-27T16:55:41Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

$ docker --version
Docker version 18.06.1-ce, build e68fc7a

Jenkins ver. 2.138.3

Using the latest jenkins/jnlp-slave:alpine container (eb079fd09f8e)

Bonus

Interestingly, if I add the container to the cluster manually, configured for a static Jenkins node, it comes up all smiles:
kubectl create -f ./jnlp-slave.yaml

jnlp-slave.yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: jenkins-slave
  name: jenkins-slave
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: jenkins-slave
    spec:
      containers:
      - env:
        - name: JENKINS_URL
          value: http://10.12.1.72
        - name: JENKINS_SECRET
          # I am using a static node configuration 'benTheBuilder' as a test
          value: 8bdc272187ec3f9fc1fa89df540a10d3e1b77178e5cb9d3bda0db8b8acb9e7a7
        - name: JENKINS_AGENT_NAME
          value: benTheBuilder
        image: jenkins/jnlp-slave:alpine
        name: jenkins-slave
        ports:
        resources: {}

I am happy to provide more info if this is indeed related.
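
A small diagnostic sketch for this "signal: killed" variant, assuming shell access to the node that scheduled the pod (the pod name is taken from the error above; one plausible cause, consistent with the later comments in this thread, is the sandbox init process being OOM-killed):

# Pod events should show the repeated sandbox-creation failures
kubectl describe pod jenkins-slave-j69pq
# On the node, check whether the kernel OOM killer terminated the
# runtime's exec/setns init process
dmesg | grep -iE 'oom|killed process'
# Docker's own log of the failed OCI runtime create
sudo journalctl -u docker | grep -i 'OCI runtime create failed'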

@fejta-bot
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 3, 2019
@fejta-bot
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 19, 2019
@wojtek-t (Member)
Seems obsolete now.

@sulunemre (Contributor)
@bclouser I have a similar issue; did you find any solution? Thanks!

@Resonance1584 commented Aug 27, 2019

I encountered this error when I misconfigured a deployment to have

          requests:
            memory: "200m"
            cpu: "500m"
          limits:
            memory: "300m"
            cpu: "500m"

Note the incorrect m instead of Mi on the memory quantities.
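
For reference, a hedged sketch of the fix: the m suffix on a memory quantity parses as milli-bytes (fractions of a byte), so the effective limit is essentially zero, which is why the container's init process gets killed; Mi (mebibytes) is almost certainly what was intended. The deployment name below is a placeholder:

# Reapply the same numbers as mebibytes instead of milli-bytes
kubectl set resources deployment my-deployment \
  --requests=cpu=500m,memory=200Mi \
  --limits=cpu=500m,memory=300Mi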

@betinro commented Mar 2, 2020 (quoting @Resonance1584's comment above):

I encountered this error when I mis-configured a deployment to have

          requests:
            memory: "200m"
            cpu: "500m"
          limits:
            memory: "300m"
            cpu: "500m"

Note the incorrect m instead of Mi

The same happens if you also specify memory limits that are too low. I tried starting with a 10Mi memory limit and got the same errors.
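
If in doubt, it can help to read back the resources the API server actually stored before raising the limit; the deployment and pod names below are placeholders:

# Show the parsed requests/limits on the pod template
kubectl get deployment my-deployment \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'
# Events on the failing pod usually point at the sandbox failure directly
kubectl get events --field-selector involvedObject.name=<pod-name>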
