after knative 0.2.0 upgrade we're intermittently seeing duplicate build pods #2561

Closed
rawlingsj opened this issue Dec 21, 2018 · 4 comments
Labels: area/build, kind/bug, priority/critical-urgent

Comments

@rawlingsj
Member

FWIW I think this is different from issue #2515, which is caused by multiple knative build controllers running in a cluster.

This issue has only started since the knative build 0.2.0 upgrade. Prow is creating a single build resource, and the knative build controller is starting two build pods with exactly the same details, i.e. org, repo, ref, build ID, etc.

This doesn't happen on every build, which suggests it could be a timing issue. This is just me speculating right now, but it could be an issue with the upstream knative build project. Let's gather facts and understand the issue so we can engage in a positive manner.

I just had a quick look and it seems this is getting called twice: https://github.com/knative/build/blob/a0c7c07/pkg/reconciler/build/build.go#L173
So I'm wondering if the build resource is getting updated (possibly by Prow) in quick succession, which triggers the reconcile function twice. If so, is this logic https://github.com/knative/build/blob/a0c7c07/pkg/reconciler/build/build.go#L159, which checks whether a build pod has already been started, actually correct? Could the cluster or pod name not yet have been updated by the first watch event by the time another is received?
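
To make the suspected timing problem concrete, here is a minimal in-memory sketch of a check-then-act reconcile. It is not the knative code: `Build`, `Cluster`, `reconcile` and the pod naming are hypothetical stand-ins, and the only assumption carried over from the above is that a reconcile starts a pod whenever the build it observed has no pod recorded yet.

```go
package main

import "fmt"

// Build is a hypothetical, stripped-down stand-in for the knative Build resource.
type Build struct {
	Name    string
	PodName string // empty until a reconcile records the pod it started
}

// Cluster is a hypothetical in-memory stand-in for the API server.
type Cluster struct {
	builds map[string]Build
	pods   []string
}

// Get returns a copy of the stored build, like a snapshot delivered by a watch event.
func (c *Cluster) Get(name string) Build { return c.builds[name] }

// reconcile has the check-then-act shape discussed above: if the observed
// build has no pod recorded, start one and only afterwards write the status.
func reconcile(c *Cluster, observed Build) {
	if observed.PodName != "" {
		return // a pod was already started for this build
	}
	pod := fmt.Sprintf("%s-pod-%d", observed.Name, len(c.pods))
	c.pods = append(c.pods, pod) // "create" the pod
	observed.PodName = pod
	c.builds[observed.Name] = observed // status is written after the pod exists
}

// runStaleEvents delivers two events whose snapshots were both taken
// before either reconcile had written the status.
func runStaleEvents() int {
	c := &Cluster{builds: map[string]Build{"build-1": {Name: "build-1"}}}
	first := c.Get("build-1")
	second := c.Get("build-1")
	reconcile(c, first)
	reconcile(c, second)
	return len(c.pods)
}

// runSpacedEvents delivers the second event only after the first
// reconcile's status write is visible.
func runSpacedEvents() int {
	c := &Cluster{builds: map[string]Build{"build-1": {Name: "build-1"}}}
	reconcile(c, c.Get("build-1"))
	reconcile(c, c.Get("build-1"))
	return len(c.pods)
}

func main() {
	fmt.Println("pods with stale snapshots:", runStaleEvents())
	fmt.Println("pods with spaced events:", runSpacedEvents())
}
```

In this toy model the stale-snapshot case ends up with two pods and the spaced case with one, which would match the intermittent behaviour we see if the real controller can observe the build before the first status update lands.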

As a side note: it would be really good to get a test case to replicate what we're seeing with duplicate builds

@rawlingsj
Member Author

/kind bug
/priority critical-urgent
/area build
/assign rawlingsj

@jenkins-x-bot added the kind/bug, priority/critical-urgent and area/build labels Dec 21, 2018
@rawlingsj
Member Author

I've deployed a possible fix to our infra which avoids a possible race condition. knative checks if the build has a status to tell whether a pod has started. It then creates the pod and updates the build status. I'm trying instead to update the build first with a dummy status, then start the pod, and afterwards update with the real status as usual. This will need to be left running for a while to check if it helps.
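
Roughly, the reordering looks like the sketch below. This is an illustration only, not the code in the commit: the types, the `"Pending"` and `"Running"` status values and the `Update` helper are hypothetical. It also assumes the placeholder update is rejected when it is based on a stale copy of the build (Kubernetes-style resource version conflict); whether the real fix relies on that is an assumption here.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict models an update rejected because it was based on a stale version.
var errConflict = errors.New("conflict: build was modified")

// Build is a hypothetical stand-in for the knative Build resource.
type Build struct {
	Name            string
	ResourceVersion int
	Status          string // "" = untouched, "Pending" = placeholder, "Running" = pod started
	PodName         string
}

// Cluster is a hypothetical in-memory stand-in for the API server.
type Cluster struct {
	builds map[string]Build
	pods   []string
}

// Get returns a copy of the stored build.
func (c *Cluster) Get(name string) Build { return c.builds[name] }

// Update stores b only if it was based on the latest stored version.
func (c *Cluster) Update(b Build) (Build, error) {
	if c.builds[b.Name].ResourceVersion != b.ResourceVersion {
		return Build{}, errConflict
	}
	b.ResourceVersion++
	c.builds[b.Name] = b
	return b, nil
}

// reconcile writes a placeholder status first, then starts the pod,
// then records the real status.
func reconcile(c *Cluster, observed Build) error {
	if observed.Status != "" {
		return nil // already claimed or started
	}
	observed.Status = "Pending"
	claimed, err := c.Update(observed)
	if err != nil {
		return err // another reconcile claimed the build first; no pod is created
	}
	pod := claimed.Name + "-pod"
	c.pods = append(c.pods, pod) // "create" the pod only after the claim succeeded
	claimed.Status = "Running"
	claimed.PodName = pod
	_, err = c.Update(claimed)
	return err
}

// runStaleEvents replays two events whose snapshots were both taken
// before either reconcile ran.
func runStaleEvents() (int, error) {
	c := &Cluster{builds: map[string]Build{"build-1": {Name: "build-1"}}}
	first := c.Get("build-1")
	second := c.Get("build-1")
	if err := reconcile(c, first); err != nil {
		return len(c.pods), err
	}
	err := reconcile(c, second)
	return len(c.pods), err
}

func main() {
	pods, err := runStaleEvents()
	fmt.Println("pods:", pods, "second reconcile:", err)
}
```

With the same two stale snapshots that could produce duplicates before, only one pod is started in this model and the second reconcile stops at the conflict.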

jenkins-x/build@c527eb3

@rawlingsj
Member Author

possible fix in a PR upstream knative/build#519

@rawlingsj
Member Author

Closing, as we have released a patched version to use until the upstream PR is merged: jenkins-x-charts/knative-build#14
