Avoids race condition on first gitjob creation with polling enabled #4986

Merged
0xavi0 merged 1 commit into rancher:main from 0xavi0:gitjob-twice-at-beginning
Apr 16, 2026

Conversation

@0xavi0 0xavi0 commented Apr 15, 2026

When polling is enabled, Fleet creates the gitjob that runs fleet apply twice. The first creation is triggered by the creation of the GitRepo itself, and the second happens when the polling commit arrives asynchronously from the quartz scheduler.

This caused concurrent executions of fleet apply, which resulted in Bundle creation conflicts.

Also, when deleting a previous job, Fleet was not deleting its child pods. This PR now deletes them as well, and requeues after deleting a previous job so that Kubernetes has time to actually delete the fleet apply pods, which avoids race conditions.
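The delete-then-requeue flow described above can be illustrated with a small, standard-library-only sketch. This is not the code from this PR: the `job` and `cluster` types, the `reconcile` function, and the `requeueDelay` value are all hypothetical stand-ins, and the in-memory `cluster` only models the idea that a stale job is removed in one pass and the new job is created in a later pass.

```go
package main

import "time"

// job is a hypothetical stand-in for the Kubernetes Job that runs fleet apply.
type job struct {
	commit string
}

// cluster is a hypothetical in-memory stand-in for the state held by the API server.
type cluster struct {
	current *job // the Job that currently exists, if any
}

// requeueDelay is an assumed value; the delay used by the PR is not stated here.
const requeueDelay = 2 * time.Second

// reconcile performs one pass for the desired commit. A non-zero return value
// means "retry later"; no new Job is created in that pass.
func reconcile(c *cluster, desiredCommit string) time.Duration {
	if c.current != nil && c.current.commit != desiredCommit {
		// Remove the stale Job (in a real cluster, together with its pods)
		// and come back later instead of creating the new Job right away.
		c.current = nil
		return requeueDelay
	}
	if c.current == nil {
		// Nothing is running for this GitRepo, so it is safe to create the Job.
		c.current = &job{commit: desiredCommit}
	}
	return 0
}
```

In this sketch, the first pass for a new commit only deletes and requeues, and the second pass creates the new job, so two jobs for the same GitRepo never coexist.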

Refers to: #4984

Additional Information

Checklist

- [ ] I have updated the documentation via a pull request in the fleet-product-docs repository.

Signed-off-by: Xavi Garcia <xavi.garcia@suse.com>
@0xavi0 0xavi0 added this to the v2.14.2 milestone Apr 15, 2026
@0xavi0 0xavi0 self-assigned this Apr 15, 2026
@0xavi0 0xavi0 added this to Fleet Apr 15, 2026
@0xavi0 0xavi0 marked this pull request as ready for review April 15, 2026 16:16
@0xavi0 0xavi0 requested a review from a team as a code owner April 15, 2026 16:16
Copilot AI review requested due to automatic review settings April 15, 2026 16:16

Copilot AI left a comment

Pull request overview

This PR addresses a race in Fleet’s gitjob controller when polling is enabled, where multiple fleet-apply Jobs can be created concurrently during initial GitRepo creation / first polling commit, leading to Bundle creation conflicts. It also adjusts Job deletion behavior to better avoid concurrent fleet-apply pod execution.

Changes:

  • Prevent gitjob creation when polling is enabled but Status.Commit is still empty.
  • Delete the previous gitjob when the commit changes and requeue to allow time for termination before creating the next Job.
  • Add unit tests covering the “empty old commit” deletion case and shouldCreateJob behavior with/without polling.
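The first change in the list above can be sketched as a small predicate. This is a simplified illustration under stated assumptions, not the actual `shouldCreateJob` from `gitjob_controller.go`: the `gitRepoState` struct, its field names, and the `shouldCreateJobSketch` function are hypothetical and only model the rule "with polling enabled, do not create a job while the status commit is still empty".

```go
package main

// gitRepoState holds only the fields this sketch needs.
// The type and its field names are hypothetical, not Fleet's API.
type gitRepoState struct {
	pollingEnabled bool   // whether the GitRepo is polled for new commits
	statusCommit   string // commit recorded in the status; empty until first known
	jobExists      bool   // whether a job for the current commit already exists
}

// shouldCreateJobSketch reports whether a new job may be created in this pass.
func shouldCreateJobSketch(s gitRepoState) bool {
	if s.jobExists {
		// A job for this commit is already there; do not create another one.
		return false
	}
	if s.pollingEnabled && s.statusCommit == "" {
		// The first polled commit has not arrived yet. Creating a job now
		// would be followed by a second one as soon as the commit is set.
		return false
	}
	return true
}
```

With polling disabled the empty commit does not block creation in this sketch, which mirrors the "with/without polling" cases mentioned for the unit tests.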

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| internal/cmd/controller/gitops/reconciler/gitjob_controller.go | Adds requeue-after-delete flow, updates previous-job deletion semantics, and blocks job creation on empty commit when polling is enabled. |
| internal/cmd/controller/gitops/reconciler/gitjob_test.go | Adds tests for deleting an empty-commit previous job and for shouldCreateJob behavior with polling enabled/disabled. |

Comment thread internal/cmd/controller/gitops/reconciler/gitjob_controller.go
@weyfonk weyfonk moved this to 👀 In review in Fleet Apr 16, 2026

@weyfonk weyfonk left a comment

LGTM 👍

@0xavi0 0xavi0 merged commit 595d66f into rancher:main Apr 16, 2026
26 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Fleet Apr 16, 2026
0xavi0 added a commit to 0xavi0/fleet that referenced this pull request Apr 16, 2026
…ancher#4986)

@0xavi0 0xavi0 moved this from ✅ Done to Needs QA review in Fleet Apr 16, 2026
0xavi0 added a commit to 0xavi0/fleet that referenced this pull request Apr 16, 2026
…ancher#4986)

0xavi0 added a commit that referenced this pull request Apr 16, 2026
…4986) (#4996)
