Container stuck on Running at... #8

Closed
kuskoman opened this issue Jun 15, 2020 · 11 comments

@kuskoman

kuskoman commented Jun 15, 2020

I am trying to use the project with the following config:

---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: staging
  name: ecr-credentials-role
rules:
  - apiGroups: ["staging"]
    resources: ["secrets"]
    verbs: ["get", "delete"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["create"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ecr-credentials-service
  namespace: staging
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: ecr-credentials-role-binding
  namespace: staging
subjects:
  - kind: ServiceAccount
    name: ecr-credentials-service
    namespace: staging
roleRef:
  kind: Role
  name: ecr-credentials-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  namespace: staging
  name: cron-ecr-credentials-helper
spec:
  schedule: "0 */6 * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: ecr-credentials-service
          containers:
            - name: ecr-renew
              image: nabsul/k8s-ecr-login-renew:latest
              imagePullPolicy: "IfNotPresent"
              env:
                - name: DOCKER_SECRET_NAME
                  value: eu-west-1-ecr-registry
                - name: TARGET_NAMESPACE
                  value: staging
                - name: AWS_REGION
                  value: eu-west-1
                - name: AWS_ACCESS_KEY_ID
                  value: <key>
                - name: AWS_SECRET_ACCESS_KEY
                  value: <key>

Then I create a job from the cronjob so that it runs immediately:

sudo kubectl create job --from=cronjob/cron-ecr-credentials-helper ecr2

But the container never finishes its job:

$ sudo kubectl get jobs
NAME   COMPLETIONS   DURATION   AGE
ecr    0/1           5m52s      5m52s
ecr2   0/1           3m9s       3m9s

All I get is:

$ sudo kubectl logs ecr-ql9fw
Running at 2020-06-15 18:16:25.248476379 +0000 UTC

This has been happening for about 2 hours (only one run was successful). However, when I ran the container a few hours ago, it finished correctly in about 5 seconds.

I guess it may be related to the app not checking whether the AWS API is up (I am going to investigate further soon).

@kuskoman
Author

Just now I got these logs:

panic: RequestError: send request failed
caused by: Post "https://api.ecr.eu-west-1.amazonaws.com/": dial tcp: i/o timeout

goroutine 1 [running]:
main.checkErr(...)
        /app/main.go:19
main.main()
        /app/main.go:38 +0x7b7

So it seems the app is not handling connection problems with AWS.

@nabsul
Owner

nabsul commented Jun 15, 2020

Interesting. If the app is not able to connect to AWS, I'm not sure what else it can do besides fail. Any suggestions?

@kuskoman
Author

The strange thing is that it worked once, and the ECR API seems to have been up the whole time.
I will try to deploy it on another cluster and see what happens.

@kuskoman
Author

@nabsul I tried to deploy it to another cluster (k3s, ARM), but instead of the last error I am getting

standard_init_linux.go:211: exec user process caused "exec format error"

I guess I should create another issue.

@nabsul
Owner

nabsul commented Jun 15, 2020

Could you try reverting back to version v1.1? I recently released v1.2. Nothing changed in the AWS area there, but it might be good to double-check...

@nabsul
Owner

nabsul commented Jun 15, 2020

I tried to deploy it to another cluster (k3s, ARM), but instead of the last error I am getting

Is that an ARM architecture? (we can continue that on a second issue)

@kuskoman
Author

kuskoman commented Jun 15, 2020

Could you try reverting back to version v1.1

I will try in a moment.

Is that an ARM architecture?

Yes. The first run was on a Raspberry Pi with Raspberry Pi OS, and the second was on an Orange Pi running Armbian. Both failed for the same reason.

Edit: I am talking about a separate cluster; the first one is not ARM.

@kuskoman
Author

kuskoman commented Jun 15, 2020

@nabsul v1.1 seems to have the same issue:

$ sudo kubectl logs ecr-qs4zz -p
Running at 2020-06-15 19:38:52.935119562 +0000 UTC
panic: RequestError: send request failed
caused by: Post "https://api.ecr.eu-west-1.amazonaws.com/": dial tcp: i/o timeout

goroutine 1 [running]:
main.checkErr(...)
        /app/main.go:14
main.getUserAndPass(0x171c720, 0xc000010018, 0xc00010ff30, 0x1, 0x1, 0x1f)
        /app/aws.go:13 +0x222
main.main()
        /app/main.go:32 +0x234
Fetching auth data from AWS...

@nabsul
Owner

nabsul commented Jun 15, 2020

I wonder if maybe you've made some changes to your cluster's firewall or network settings that would block the tool from contacting AWS?

Can you "exec sh" into a pod in your cluster and see if you can run something like:

curl https://api.ecr.eu-west-1.amazonaws.com

@kuskoman
Author

@nabsul Thanks.
It seems this may be the issue.
For some reason I can't access the network from this pod.
I need to investigate further to find the reason, so I am not closing the issue yet.

@kuskoman
Author

Issue resolved.
The problem was the network rules on one of the worker nodes, which disallowed inbound traffic from the master.
It was hard to find because whether the error occurred depended on the node the container was scheduled on.
Thanks again @nabsul
