
Stuck in Reconciling and SSH_AUTH_SOCKET not specified #565

Closed
ZeroDeltaAlpha opened this issue Jan 28, 2022 · 9 comments
Labels
question Further information is requested

Comments

@ZeroDeltaAlpha

Hi All,

We are running the following flux components in our EKS cluster:

flux: v0.25.3
helm-controller: v0.15.0
image-automation-controller: rc-90dcdfd7
image-reflector-controller: v0.15.0
kustomize-controller: v0.19.1
notification-controller: v0.20.1
source-controller: v0.20.1

We are observing the following issues:

  • GitRepository stuck indefinitely in the reconciling (in progress) status, even when a timeout is set.

  • GitRepository reconciliation failing immediately with the message: unable to clone 'ssh://git@github.com/example/examplerepo': error creating SSH agent: "SSH agent requested but SSH_AUTH_SOCK not-specified"

We have tried a full uninstall and re-bootstrap without success. We ran the same GitRepositories in a sandbox cluster, and we also tried applying https://github.com/fluxcd/source-controller/blob/main/config/testdata/git/large-repo.yaml to make sure it wasn't an authentication issue; this workload also gets stuck reconciling. Rolling back the source-controller does not resolve the issue either.

Setting the source-controller log level to trace also doesn't reveal any issue in the logs.

Please let me know if we can provide anything further.

@hiddeco
Member

hiddeco commented Jan 28, 2022

Can you share a pseudo copy of the GitRepository, and a sanitized version of the Secret the repository refers to?

@ZeroDeltaAlpha
Author

ZeroDeltaAlpha commented Jan 28, 2022

Hi Hidde, here they are:

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
metadata:
  creationTimestamp: "2022-01-28T10:43:55Z"
  finalizers:
  - finalizers.fluxcd.io
  generation: 1
  labels:
    kustomize.toolkit.fluxcd.io/name: flux-system
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: app-development-source
  namespace: flux-system
  resourceVersion: "242546603"
  uid: ad982a4e-8e42-45d1-b095-9f929f289698
spec:
  gitImplementation: go-git
  ignore: |
    # exclude all
    /*
    # include deploy dir
    !/k8s
  interval: 2m0s
  ref:
    branch: development
  secretRef:
    name: app-development-source
  timeout: 3m0s
  url: ssh://git@github.com/corp/app
---
apiVersion: v1
data:
  identity: ecdsa-privkey
  identity.pub: ecdsa-pubkey
  known_hosts: Z2l0aHViLmNvbSBlY2RzYS1zaGEyLW5pc3RwMjU2IEFBQUFFMlZqWkhOaExYTm9ZVEl0Ym1semRIQXlOVFlBQUFBSWJtbHpkSEF5TlRZQUFBQkJCRW1LU0VOalFFZXpPbXhrWk15N29wS2d3RkI5bmt0NVlScllNak51RzVOODd1UmdnNkNMcmJvNXdBZFQveTZ2MG1LVjBVMncwV1oyWUIvKytUcG9ja2c9
kind: Secret
metadata:
  creationTimestamp: "2022-01-28T10:40:14Z"
  name: app-development-source
  namespace: flux-system
  resourceVersion: "242441682"
  uid: 84e27627-bf0a-4a79-a9ae-93672961b58f
type: Opaque

@kingdonb
Member

kingdonb commented Feb 9, 2022

Could you please try to reproduce this with source-controller upgraded to the latest Flux release?

There have been some changes to source-controller since v0.20.1.

Also, could you try setting the spec.gitImplementation to both options, and let us know if the issue reproduces for both implementations? The possible values are go-git and libgit2. (This may be enough to get you un-stuck.)

If it still reproduces on the latest version, we will of course still want to follow up on this issue, even if you are unblocked!

Another suggestion: try setting interval to a longer duration than timeout (in the manifest above, interval is 2m0s while timeout is 3m0s). The problem might be related to timing out and retrying in the wrong order.

One more thing to clarify: are there any submodules in the test repo?
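A sketch of trying the first two suggestions together on the app-development-source manifest posted earlier: gitImplementation is switched to libgit2 (switch it back to go-git to compare), and interval is raised above timeout. The 5m0s interval is an illustrative assumption, not a value suggested in the thread; only the spec fields from the original manifest are kept, and the server-populated metadata is dropped.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
metadata:
  name: app-development-source
  namespace: flux-system
spec:
  # Try each implementation in turn: go-git or libgit2.
  gitImplementation: libgit2
  ignore: |
    # exclude all
    /*
    # include deploy dir
    !/k8s
  # Illustrative value (assumption): interval is now longer than timeout.
  interval: 5m0s
  timeout: 3m0s
  ref:
    branch: development
  secretRef:
    name: app-development-source
  url: ssh://git@github.com/corp/app
```

Because the original object carries the kustomize.toolkit.fluxcd.io/name: flux-system label, it is managed by the flux-system Kustomization, so a change like this would normally be committed to the bootstrap repository; a change applied only with kubectl may be reverted on the next reconciliation.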

@kingdonb kingdonb added the question Further information is requested label Feb 9, 2022
@ghost

ghost commented Feb 20, 2022

I am having the same issue on a brand-new cluster on DigitalOcean, set up by following the steps with brew install and flux bootstrap. It fails with both go-git and libgit2, but libgit2 produces this error instead:

► annotating GitRepository wdr-frontend in flux-system namespace
✔ GitRepository annotated
◎ waiting for GitRepository reconciliation
✗ GitRepository reconciliation failed: 'unable to clone: Passthrough'

Here is my GitRepository:

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
metadata:
  name: wdr-frontend
  namespace: flux-system
spec:
  gitImplementation: libgit2
  interval: 1m0s
  ref:
    branch: main
  url: ssh://git@github.com/waltdizzy/waltdizzy-web

I would be happy to provide the information from your outlined steps, but I need some help. How do I get the Secret? How do I upgrade to the latest source-controller?

@ghost

ghost commented Feb 20, 2022

Solution: I needed to add a Secret to Kubernetes. I did not like the idea of committing a secret to Git, so I created a Secret YAML and applied it directly (be sure to base64-encode both the username and the password). Use kubectl explain gitrepository.spec to see the expected Secret format for HTTPS vs. SSH. I do not know if there is a better way to do GitOps with secrets than applying them manually, but it felt wrong to commit one alongside the Flux configuration. In any case, I think the Flux getting-started guides should call out this requirement to add the Secret up front. It may be obvious, but I think I and others missed it.

apiVersion: v1
kind: Secret
metadata:
  name: https-credentials
  namespace: flux-system
type: Opaque
data:
  username: aaa
  password: bbb==

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
metadata:
  name: wdr-frontend
  namespace: flux-system
spec:
  interval: 1m0s
  ref:
    branch: main
  url: https://github.com/waltdizzy/waltdizzy-web.git
  secretRef:
    name: https-credentials
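Hand-encoding base64 is one place this can go wrong. As a sketch, the same https-credentials Secret can use the standard Kubernetes stringData field instead, which takes plain-text values that the API server stores base64-encoded under data. The values below are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: https-credentials
  namespace: flux-system
type: Opaque
# stringData takes plain-text values; Kubernetes stores them base64-encoded under data.
stringData:
  username: "REPLACE_WITH_USERNAME"   # placeholder
  password: "REPLACE_WITH_PASSWORD"   # placeholder
```

This only changes how the values are written: the manifest still holds the credentials in readable form, so the concern about committing it to Git still applies.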

EDIT: It seems that Flux removes a manually applied Secret by design, so we must put it in the repository. Is this right?

@pjbgf
Member

pjbgf commented Mar 29, 2022

@ZeroDeltaAlpha did you get a chance to try Kingdon's recommendations for your main issue?

As for the timeouts not being honoured, the current fix is to use the libgit2 Managed Transport we recently released. Here's how to enable and use it: #636 (comment)

@ZeroDeltaAlpha
Copy link
Author

Hi @pjbgf, we moved on to ArgoCD, as this issue basically deadlocked our deployment pipeline.

Even after using flux uninstall to remove all state, GitRepositories would immediately get into a state where they could never fetch commits and never timed out, even when a timeout was set. We tried to replicate this on a sandbox cluster, where it worked fine; it only failed on our production cluster.

@pjbgf
Member

pjbgf commented Jun 29, 2022

@ZeroDeltaAlpha thank you for getting back to me on this.

The first issue was fixed by #740, so GitRepository reconciliations should no longer hang indefinitely on latest versions of Flux.

The second issue is caused by using password-protected SSH keys without providing their password to Flux. The error message is not user-friendly at all, so I created a new issue to tackle that (#802).
Information on how to provide the password to Flux is being added to the documentation for the latest API version, so that other users can find it more easily on the Flux documentation website (#801).
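Following that explanation, a Secret for a passphrase-protected key would carry the passphrase alongside the key material. The sketch below reuses the app-development-source Secret from earlier in the thread. It assumes the passphrase goes under a password key, as described in the documentation change tracked in #801; check the GitRepository docs for your Flux version before relying on that key name. The passphrase is written via the standard Kubernetes stringData field, which the API server stores base64-encoded under data.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-development-source
  namespace: flux-system
type: Opaque
data:
  # Placeholders, as in the original Secret: base64-encoded key material.
  identity: ecdsa-privkey
  identity.pub: ecdsa-pubkey
  known_hosts: Z2l0aHViLmNvbSBlY2RzYS1zaGEyLW5pc3RwMjU2IEFBQUFFMlZqWkhOaExYTm9ZVEl0Ym1semRIQXlOVFlBQUFBSWJtbHpkSEF5TlRZQUFBQkJCRW1LU0VOalFFZXpPbXhrWk15N29wS2d3RkI5bmt0NVlScllNak51RzVOODd1UmdnNkNMcmJvNXdBZFQveTZ2MG1LVjBVMncwV1oyWUIvKytUcG9ja2c9
stringData:
  # Assumption: "password" is the key name for the private key's passphrase (see #801).
  # Placeholder value; Kubernetes merges stringData into data on write.
  password: "REPLACE_WITH_KEY_PASSPHRASE"
```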

@pjbgf
Member

pjbgf commented Jun 29, 2022

Closing this in favour of #802 to track the outstanding work. But happy to re-open in case of similar occurrences in the future.

@pjbgf pjbgf closed this as completed Jun 29, 2022