The jenkins-x-gc-activities pods start to fail and bring down Jenkins (still present) #2218

Closed
g-foster opened this Issue Nov 9, 2018 · 28 comments

g-foster commented Nov 9, 2018

I am still seeing the problem described in #1705. My description matches that issue, and I think this issue should be closed and #1705 reopened, but for context:

Steps to reproduce the behavior:

  1. Create a cluster in EKS with default environments.
  2. Let it run for a few hours and the failing pods start to build up.

For example, on a default install of JX into EKS (no imported project), I have just deleted over 400 of these after two days:

jenkins-x-gcactivities-1541615400-8s7sd 0/1 Error 0 7m
jenkins-x-gcactivities-1541615400-fj2tb 0/1 Error 0 3m
jenkins-x-gcactivities-1541615400-lpkxt 0/1 Error 0 5m
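
To count or clean up the failed pods, the listing above can be filtered by name prefix and status. The following is a minimal sketch, assuming the usual `kubectl get pods` column order (NAME, READY, STATUS, RESTARTS, AGE). The function name `failed_gc_pods` and the `jenkins-6f7d9c5b8-abcde` row are hypothetical and only there for illustration.

```python
from typing import List


def failed_gc_pods(kubectl_output: str,
                   prefix: str = "jenkins-x-gcactivities-") -> List[str]:
    """Return names of pods with the given prefix whose STATUS column is Error."""
    names = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Assumed columns: NAME READY STATUS RESTARTS AGE
        if len(fields) < 5:
            continue
        name, status = fields[0], fields[2]
        if name.startswith(prefix) and status == "Error":
            names.append(name)
    return names


# Sample input: the three rows from the report plus one hypothetical healthy pod.
sample = """\
NAME READY STATUS RESTARTS AGE
jenkins-x-gcactivities-1541615400-8s7sd 0/1 Error 0 7m
jenkins-x-gcactivities-1541615400-fj2tb 0/1 Error 0 3m
jenkins-x-gcactivities-1541615400-lpkxt 0/1 Error 0 5m
jenkins-6f7d9c5b8-abcde 1/1 Running 0 2d
"""

for pod in failed_gc_pods(sample):
    print(pod)
```

The resulting names could then be passed to `kubectl delete pod`. This only clears the symptom; the pods will keep reappearing until the job itself stops failing.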

My versions:

Running in namespace: jx
Jenkins X Version:
Using helmBinary helm with feature flag: none
NAME VERSION
jx 1.3.535
jenkins x platform 0.0.2859
Kubernetes cluster v1.10.3-eks
kubectl v1.12.2
helm client v2.11.0+g2e55dbe
helm server v2.11.0+g2e55dbe
git git version 2.17.2 (Apple Git-113)

but I assumed I would have the fix/workaround from #1705.

g-foster changed the title from "The jenkins-x-gc-activities pods start to fail and bring down Jenkins STILL PRESENT" to "The jenkins-x-gc-activities pods start to fail and bring down Jenkins" Nov 9, 2018

g-foster changed the title from "The jenkins-x-gc-activities pods start to fail and bring down Jenkins" to "The jenkins-x-gc-activities pods start to fail and bring down Jenkins (still present)" Nov 9, 2018

ranginuitrot commented Nov 14, 2018

Can confirm it's happening to me too.
NAME VERSION
jx 1.3.532
jenkins x platform 0.0.2859
Kubernetes cluster v1.10.3-eks
kubectl v1.12.2
helm client v2.11.0+g2e55dbe
helm server v2.11.0+g2e55dbe
git git version 2.17.2 (Apple Git-113)

Member

rawlingsj commented Nov 14, 2018

Can you paste the logs from one of the failed pods please?

ranginuitrot commented Nov 14, 2018

Will do next time it happens. I reinstalled Jenkins X right after posting.

ranginuitrot commented Dec 11, 2018

Here's one of the pod's logs:

Error loading team settings. environments.jenkins.io is forbidden: User "system:serviceaccount:jx:jenkins-x-gcactivities" cannot list environments.jenkins.io in the namespace "jx"
Error loading team settings. environments.jenkins.io is forbidden: User "system:serviceaccount:jx:jenkins-x-gcactivities" cannot list environments.jenkins.io in the namespace "jx"
plugins.jenkins.io is forbidden: User "system:serviceaccount:jx:jenkins-x-gcactivities" cannot list plugins.jenkins.io in the namespace "jx"
 To be able to connect to the Jenkins server we need a username and API Token
 error: EOF
? Jenkins user name: (admin)

kc describe pod jenkins-x-gcactivities-1544562000-29d6v:

Name:           jenkins-x-gcactivities-1544562000-29d6v
Namespace:      jx
Node:           ip-10-1-9-10.ec2.internal/10.1.9.10
Start Time:     Tue, 11 Dec 2018 15:02:45 -0600
Labels:         app=gcactivities
                controller-uid=badd0f1d-fd87-11e8-b690-0a0697b453e4
                job-name=jenkins-x-gcactivities-1544562000
                release=jenkins-x
Annotations:    <none>
Status:         Failed
IP:             10.1.8.255
Controlled By:  Job/jenkins-x-gcactivities-1544562000
Containers:
  gcactivities:
    Container ID:  docker://3802db8b4c0428316e1462b5a4b769725b71a8ba9a91a00e764fb0157bbf2874
    Image:         jenkinsxio/jx:1.3.639
    Image ID:      docker-pullable://jenkinsxio/jx@sha256:b6ad86b6b3b54c45f358417cfcc8895e1710d3759105c0b92228db362e07510e
    Port:          <none>
    Host Port:     <none>
    Command:
      jx
    Args:
      gc
      activities
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 11 Dec 2018 15:02:45 -0600
      Finished:     Tue, 11 Dec 2018 15:02:45 -0600
    Ready:          False
    Restart Count:  0
    Environment:
      cheese:  wine
      foo:     bar
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from jenkins-x-gcactivities-token-xlqhs (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  jenkins-x-gcactivities-token-xlqhs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  jenkins-x-gcactivities-token-xlqhs
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason                 Age   From                                Message
  ----    ------                 ----  ----                                -------
  Normal  Scheduled              10m   default-scheduler                   Successfully assigned jenkins-x-gcactivities-1544562000-29d6v to ip-10-1-9-10.ec2.internal
  Normal  SuccessfulMountVolume  10m   kubelet, ip-10-1-9-10.ec2.internal  MountVolume.SetUp succeeded for volume "jenkins-x-gcactivities-token-xlqhs"
  Normal  Pulled                 10m   kubelet, ip-10-1-9-10.ec2.internal  Container image "jenkinsxio/jx:1.3.639" already present on machine
  Normal  Created                10m   kubelet, ip-10-1-9-10.ec2.internal  Created container
  Normal  Started                10m   kubelet, ip-10-1-9-10.ec2.internal  Started container

ranginuitrot commented Dec 11, 2018

NAME               VERSION
jx                 1.3.651
jenkins x platform 0.0.3036
Kubernetes cluster v1.10.11-eks
kubectl            v1.13.0
helm client        v2.12.0+gd325d2a
helm server        v2.12.0+gd325d2a
git                git version 2.17.2 (Apple Git-113)
Operating System   Mac OS X 10.14.1 build 18B75

rust84 commented Dec 12, 2018

We are also experiencing this issue. Pods build up after some time. The cluster is built on EKS and has been running for one week. I followed the getting started guide to deploy Jenkins X, but the cluster itself was deployed using Terraform. We are using a static master setup.

Our configuration is quite basic at the moment: I have created two environments to test with, and some Spring Boot applications. The pipeline and other jx pods appear to run OK. The exception is these pods, which I have killed, but they keep coming back after a few hours.

Please let me know if I can provide any further detail.

Jx version

NAME               VERSION
jx                 1.3.647
jenkins x platform 0.0.3036
Kubernetes cluster v1.10.11-eks
kubectl            v1.12.2
helm client        v2.11.0+g2e55dbe
helm server        v2.11.0+g2e55dbe
git                git version 2.19.1
Operating System   Unkown Linux distribution Linux version 4.19.1-arch1-1-ARCH (builduser@heftig-16768) (gcc version 8.2.1 20180831 (GCC)) #1 SMP PREEMPT Sun Nov 4 16:49:26 UTC 2018

Describe pod

Name:           jenkins-x-gcactivities-1544617800-xsrgd
Namespace:      jx
Node:           ip-10-8-0-113.eu-west-1.compute.internal/10.8.0.113
Start Time:     Wed, 12 Dec 2018 12:30:03 +0000
Labels:         app=gcactivities
                controller-uid=a696b9a8-fe09-11e8-b319-06522f68053e
                job-name=jenkins-x-gcactivities-1544617800
                release=jenkins-x
Annotations:    <none>
Status:         Failed
IP:             10.8.0.84
Controlled By:  Job/jenkins-x-gcactivities-1544617800
Containers:
  gcactivities:
    Container ID:  docker://cc8309c71368de5c4312bd4cbe249f7aee2ceb6e0d0e142943f470f571787d7b
    Image:         jenkinsxio/jx:1.3.639
    Image ID:      docker-pullable://jenkinsxio/jx@sha256:b6ad86b6b3b54c45f358417cfcc8895e1710d3759105c0b92228db362e07510e
    Port:          <none>
    Host Port:     <none>
    Command:
      jx
    Args:
      gc
      activities
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 12 Dec 2018 12:30:04 +0000
      Finished:     Wed, 12 Dec 2018 12:30:04 +0000
    Ready:          False
    Restart Count:  0
    Environment:
      cheese:  wine
      foo:     bar
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from jenkins-x-gcactivities-token-62227 (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  jenkins-x-gcactivities-token-62227:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  jenkins-x-gcactivities-token-62227
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

Logs

Error loading team settings. environments.jenkins.io is forbidden: User "system:serviceaccount:jx:jenkins-x-gcactivities" cannot list environments.jenkins.io in the namespace "jx"
Error loading team settings. environments.jenkins.io is forbidden: User "system:serviceaccount:jx:jenkins-x-gcactivities" cannot list environments.jenkins.io in the namespace "jx"
plugins.jenkins.io is forbidden: User "system:serviceaccount:jx:jenkins-x-gcactivities" cannot list plugins.jenkins.io in the namespace "jx"

To be able to connect to the Jenkins server we need a username and API Token

error: EOF
? Jenkins user name: (admin)

ClusterRole

$ kubectl describe clusterrolebinding gcactivities-jx
Name:         gcactivities-jx
Labels:       <none>
Annotations:  <none>
Role:
  Kind:  ClusterRole
  Name:  gcactivities-jx
Subjects:
  Kind            Name                    Namespace
  ----            ----                    ---------
  ServiceAccount  jenkins-x-gcactivities  jx

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: 2018-12-10T16:11:36Z
  name: gcactivities-jx
  resourceVersion: "3111406"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/gcactivities-jx
  uid: 451ec341-fc96-11e8-b319-06522f68053e
rules:
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - get
  - create
  - update
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
  - delete
  - list
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - get
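
The "forbidden" log lines are consistent with the ClusterRole above: none of its rules mention the `environments` or `plugins` resources. The following minimal sketch makes that check explicit, assuming those custom resources live in the `jenkins.io` API group (inferred from the `environments.jenkins.io` name in the error message). The `allows` helper is hypothetical and deliberately simplified: it ignores `resourceNames`, subresources, and any additional Roles or RoleBindings that might apply to the service account.

```python
def allows(rules, api_group, resource, verb):
    """Simplified RBAC check: True if any rule matches group, resource and verb."""
    for rule in rules:
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        verbs = rule.get("verbs", [])
        if ((api_group in groups or "*" in groups)
                and (resource in resources or "*" in resources)
                and (verb in verbs or "*" in verbs)):
            return True
    return False


# The rules of the gcactivities-jx ClusterRole as pasted above.
GC_ACTIVITIES_RULES = [
    {"apiGroups": ["apiextensions.k8s.io"],
     "resources": ["customresourcedefinitions"],
     "verbs": ["get", "create", "update"]},
    {"apiGroups": [""],
     "resources": ["namespaces"],
     "verbs": ["get", "delete", "list"]},
    {"apiGroups": ["apps"],
     "resources": ["deployments"],
     "verbs": ["get"]},
]

for resource in ("environments", "plugins"):
    # Assumption: both resources belong to the "jenkins.io" API group.
    print(resource, allows(GC_ACTIVITIES_RULES, "jenkins.io", resource, "list"))
```

On a live cluster, `kubectl auth can-i list environments.jenkins.io --as=system:serviceaccount:jx:jenkins-x-gcactivities -n jx` should answer the same question against the real authorizer.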

Status

$ jx status
Jenkins X checks passed for Cluster(arn:aws:eks:eu-west-1:878062504042:cluster/dev-cluster): 3 nodes, memory 4% of 24199212Ki, cpu 23% of 6. Jenkins is running at http://jenkins.jx.eks.actual-experience.com

Contributor

sboardwell commented Dec 14, 2018

Can confirm this is also happening on our GKE cluster. Symptoms identical to @rust84 and @ranginuitrot.

$ jx version
NAME               VERSION
jx                 1.3.640
jenkins x platform 0.0.3036
Kubernetes cluster v1.10.9-gke.5
kubectl            v1.11.3
helm client        v2.11.0+g2e55dbe
helm server        v2.11.0+g2e55dbe
git                git version 2.18.0
Operating System   Mac OS X 10.13.6 build 17G4015

Should the jenkins-x-gc-activities be asking for those resources in the first place, or is something wrong in our setup? It seems that the gcactivities should not have these permissions in the first place looking at values.yaml here: https://github.com/jenkins-x/jenkins-x-platform/blob/master/values.yaml#L63

The controllerbuild on the other hand has them: https://github.com/jenkins-x/jenkins-x-platform/blob/master/values.yaml#L313

Are the permissions added at another point in time? Perhaps when adding an environment (we installed Jenkins X with the --no-default-environments option).

The workaround of adding cluster-admin to the service account would work but I would much prefer knowing what the problem is.

thienlh commented Dec 17, 2018

Can confirm this is also happening on our EKS cluster.

NAME               VERSION
jx                 1.3.661
jenkins x platform 0.0.3036
Kubernetes cluster v1.10.11-eks
kubectl            v1.13.1
helm client        v2.12.0+gd325d2a
helm server        v2.12.0+gd325d2a
git                git version 2.20.1
Operating System   Mac OS X 10.14.2 build 18C54

Contributor

sboardwell commented Dec 17, 2018

Adding @pmuir to this as he mentioned in the jenkins-x-user Slack channel that

The permissions errors in that log statement are misleading, they don't cause the code to fail
I'll get them removed

Any idea why this may be happening @pmuir?

One idea I have is that, through switching jx contexts (I had a minikube installation and a GKE installation running), the values in the yaml files created during installation on one cluster were somehow picked up by, or replaced the values of, the other installation.

/Users/sboardwell/.jx/adminSecrets.yaml
/Users/sboardwell/.jx/chartmuseumAuth.yaml
/Users/sboardwell/.jx/gitAuth.yaml
/Users/sboardwell/.jx/jenkinsAuth.yaml

Member

pmuir commented Dec 18, 2018

@ccojocar is this because of removing old secrets?

Member

pmuir commented Dec 18, 2018

Actually no, this looks like an issue in the gc /cc @jstrachan

Contributor

sboardwell commented Dec 18, 2018

Hi @jstrachan, I used one of the failing pods to recreate a test pod and then used kubectl exec -it .... bash. Running jx gc activities did indeed ask for credentials (and I can confirm that the permissions log output has no bearing on the failure - @pmuir).

I'm not sure what is expected here, but here is what I found:

  • on initial log in as root, there is no /root/.jx/jenkinsAuth.yaml present.
  • jx then asks for authentication
  • using username admin and the respective api token did not work
    • however, we have installed and configured the keycloak plugin so perhaps the "local" admin user is no longer applicable
    • testing with my own username and my own api key worked (needed to delete the jenkinsAuth.yaml first). I could have also just edited the file I guess.

So:

  • is the /root/.jx/jenkinsAuth.yaml supposed to be there when the jenkins-x-gcactivities pod is created?
  • re: local admin vs keycloak. how do we best change the jenkins authentication values?
    • the jenkinsAuth.yaml is found in the jx-install-config and the password in a number of other secrets?
    • can we change the admin user's "username" to something other than 'admin'? The jenkins secret has the parameters jenkins-admin-user: "true" and jenkins-admin-password but no jenkins-admin-username

Let me know if I can supply any more information, and thanks for all the cool work.

Contributor

sboardwell commented Dec 18, 2018

@jstrachan - I don't think this issue has been fixed by just adding the missing roles. The missing roles just removed

Error loading team settings. environments.jenkins.io is forbidden: User "system:serviceaccount:jx:jenkins-x-gcactivities" cannot list environments.jenkins.io in the namespace "jx"
Error loading team settings. environments.jenkins.io is forbidden: User "system:serviceaccount:jx:jenkins-x-gcactivities" cannot list environments.jenkins.io in the namespace "jx"
plugins.jenkins.io is forbidden: User "system:serviceaccount:jx:jenkins-x-gcactivities" cannot list plugins.jenkins.io in the namespace "jx"

from the logs. The pods are still failing because there is no authentication method.

jstrachan reopened this Dec 18, 2018

Member

jstrachan commented Dec 18, 2018

OK I think those RBAC issues are fixed - am just waiting to see if I get any failures.

Though I'm still not sure why you are getting errors about the Jenkins API Token not being present. You are using Static Jenkins masters right - you're not using Prow / Serverless jenkins?

Member

jstrachan commented Dec 18, 2018

btw version 0.0.2859 is pretty old - we just released the RBAC fix in 0.0.3086

Contributor

sboardwell commented Dec 18, 2018

Yes, I have ported your RBAC fixes and can confirm the "forbidden" log messages are gone. However, the pods are still failing because the Jenkins API token is missing. Where should it be, and how should it get there? I can't see anything in the k8s cron job or job template.

Contributor

sboardwell commented Dec 18, 2018

Btw, I'm seeing this on jenkins-x-platform-0.0.3078

Contributor

sboardwell commented Dec 18, 2018

You are using Static Jenkins masters

yes, sorry, I'm using a static master

Member

jstrachan commented Dec 18, 2018

The API token should be created for you when you first import a project into Jenkins X. Have you done that yet? Do you have any projects? E.g. does jx get pipeline return anything?

Member

jstrachan commented Dec 18, 2018

Usually, when using the static Jenkins master, the API token has to be set up to be able to create the Staging and Production environments.

Contributor

sboardwell commented Dec 18, 2018

No, no projects yet. I created a test pipeline job by hand. Let me check to see if it works.

Member

jstrachan commented Dec 18, 2018

I wonder if you are hitting a problem where an older cluster that has been upgraded does not have the necessary Jenkins secret? E.g. if you rm ~/.jx/jenkinsAuth.yaml and then create/import a project, it should create the necessary Jenkins secret.

Member

jstrachan commented Dec 18, 2018

usually the initial install creates the Jenkins API token secret as it sets up CI/CD for the Environments

Contributor

sboardwell commented Dec 18, 2018

Yes the jenkins secret is there, complete with api token, etc.

Contributor

sboardwell commented Dec 18, 2018

$ k get secrets jenkins -o yaml | ksd
apiVersion: v1
data:
  jenkins-admin-api-token: {REDACTED}
  jenkins-admin-password: {REDACTED}
  jenkins-admin-user: admin
  jenkins-bearer-token: ""
kind: Secret
metadata:
  creationTimestamp: "2018-12-16T19:59:23Z"
  labels:
    app: jenkins
    chart: jenkins-0.10.31
    heritage: Tiller
    release: jenkins-x
  name: jenkins
  namespace: jx
  resourceVersion: "14531"
  selfLink: /api/v1/namespaces/jx/secrets/jenkins
  uid: 15a140f3-016d-11e9-8c5f-1e557db74be1
type: Opaque
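
Kubernetes stores the values under a Secret's `data` key base64-encoded, so the output above has presumably already been decoded by the `ksd` filter (assumed here to be a secret-decoding helper; the thread does not say). A minimal sketch of that decoding step follows. The `decode_secret_data` name and the sample dictionary are hypothetical, and the sample contains only the two non-redacted fields.

```python
import base64


def decode_secret_data(data):
    """Decode every base64-encoded value of a Secret's `data` mapping to text."""
    return {key: base64.b64decode(value).decode("utf-8")
            for key, value in data.items()}


# Hypothetical raw values as `kubectl get secret jenkins -o yaml` would show them.
raw = {
    "jenkins-admin-user": "YWRtaW4=",   # base64 of "admin"
    "jenkins-bearer-token": "",         # an empty value stays empty
}

print(decode_secret_data(raw))
```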

Contributor

sboardwell commented Dec 18, 2018

Hi @rawlingsj, could you keep this open please? I believe there is still another problem that @jstrachan wants to address.

Member

rawlingsj commented Dec 18, 2018

oops - bad copy paste of issues sorry

rawlingsj reopened this Dec 18, 2018

jstrachan added a commit to jstrachan/jx that referenced this issue Dec 19, 2018

fix: handle creating a jenkins client inside pods
if we default a jenkins auth token from the Secrets we need to save the `jenkinsAuth.yaml` file as we re-load it later on and lose the defaulting logic.

lets also handle batch mode a little nicer

fixes jenkins-x#2218
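
The failure mode described in the commit message can be illustrated with a small, self-contained sketch. This is not the jx implementation: it is a hypothetical Python model in which the auth file is JSON rather than YAML (to stay within the standard library), and all names (`load_auth`, `auth_with_default`, the `user`/`token` keys) are assumptions. It shows only the general pattern: a value defaulted in memory is lost on a later reload unless it is written back to the file.

```python
import json
import os
import tempfile


def load_auth(path):
    """Load the auth file, or return an empty config if it does not exist."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def auth_with_default(path, secret_token, save):
    """Load auth; if no token is stored, default it from the Secret's token.

    With save=False the default lives only in memory, so a later
    load_auth(path) no longer sees it.
    """
    auth = load_auth(path)
    if not auth.get("token") and secret_token:
        auth = {"user": "admin", "token": secret_token}
        if save:
            with open(path, "w") as f:
                json.dump(auth, f)
    return auth


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "jenkinsAuth.json")

    # Defaulting without saving: the later reload comes back empty.
    auth_with_default(path, "example-token", save=False)
    print("reloaded without save:", load_auth(path))

    # Defaulting and saving: the later reload keeps the token.
    auth_with_default(path, "example-token", save=True)
    print("reloaded with save:", load_auth(path))
```

In a pod there is no pre-existing auth file and no terminal to answer a prompt, which would match the `error: EOF` seen after the "Jenkins user name" prompt in the logs above.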

Contributor

sboardwell commented Dec 19, 2018

Just confirming this has fixed my problem. Great work. Thanks a lot!
