Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source controller fails to clone when specifying just the revision (and default branch main at origin) #315

Closed
squaremo opened this issue Mar 22, 2021 · 19 comments · Fixed by #462
Assignees

Comments

@squaremo
Copy link
Member

If I create a GitRepository with a spec like this:

spec:
  url: https://github.com/squaremo/flux2-example
  ref:
    commit: 92f0a9e59583732a052178427e376b77ae9e248b

then the source controller fails to clone the repo. Naively, I would expect it to do the equivalent of

git clone --no-checkout https://github.com/squaremo/flux2-example
cd flux2-example; git checkout 92f0a9e59583732a052178427e376b77ae9e248b

Unfortunately, the .branch field has a default of master, and the checkout code checks out the branch first, then the commit (whether or not it's on the branch, so far as I can tell). This seems like a needless step; but it also means you can't specify just a commit, you have to name a branch that exists at the origin, otherwise it'll fail.

squaremo added a commit to weaveworks-experiments/fleeet that referenced this issue Mar 22, 2021
See fluxcd/source-controller#315.

Signed-off-by: Michael Bridgen <mikeb@squaremobius.net>
squaremo added a commit to weaveworks-experiments/fleeet that referenced this issue Mar 22, 2021
See fluxcd/source-controller#315.

Signed-off-by: Michael Bridgen <mikeb@squaremobius.net>
@jonathan-innis
Copy link
Contributor

@squaremo interested in picking this one up

@jonathan-innis
Copy link
Contributor

This can be achieved in a fairly straight-forward way. A question about UX:

  • The return from the branch step typically returns the branch; however, there's a roundabout way to achieve this with commit as we essentially have to iterate over the branches to get the branch that contains the commit. If the branch name isn't strictly needed we can avoid this

@stefanprodan
Copy link
Member

The ref.commit field is for pinning a branch, I would not break the <branch>/<commit> revision format as users depend on it to confirm syncs are working via flux commands, events and notifications. If we remove the branch/tag from the revision string, then we will not be able to expose vars like GIT_BRANCH and GIT_TAG that users are asking for.

@squaremo
Copy link
Member Author

The ref.commit field is for pinning a branch

How do I target a commit without knowing or caring which branch it's on?

Why would I want to do this -- I'm treating GitRepository as a compilation target, i.e., it is constructed by another controller. It's not the end of the world if that controller has to supply a branch, it just complicates using sourcev1 as an API, since that has to be traced through the layers above which don't otherwise need it.

as users depend on it to confirm syncs are working via flux commands, events and notifications

Are people parsing that field? It seems like a mistake to encourage that, since it's brittle.

we will not be able to expose vars like GIT_BRANCH and GIT_TAG that users are asking for

Presumably if you ask for a branch, then GIT_TAG is not set, and if you ask for a tag, GIT_BRANCH is not set. This is like those situations, except you haven't asked for either of a branch or tag.

@stefanprodan
Copy link
Member

@squaremo my point is that if both the branch and the commit are specified, we should set the revision as <branch>/<commit>, if the commit doesn't exists in the specified branch we should error out. I'm ok with setting the revision to <commit> only if the branch is not present in the definition.

@squaremo
Copy link
Member Author

my point is that if both the branch and the commit are specified, we should set the revision as <branch>/<commit>

OK yes I see, that makes sense.

if the commit doesn't exists in the specified branch we should error out

This also makes sense -- does it do this now though? I suspect not, since it clones with the branch then does a checkout of the commit.

@hiddeco
Copy link
Member

hiddeco commented Mar 31, 2021

does it do this now though? I suspect not, since it clones with the branch then does a checkout of the commit.

It does, just ran into this while working on kustomize-controller tests:

  ref:
    branch: main
    commit: 16e375850ade4e5c77cc5ad37d0b643ff9aca44c

results in: git commit '16e375850ade4e5c77cc5ad37d0b643ff9aca44c' not found: object not found

While

  ref:
    branch: age-encryption
    commit: 16e375850ade4e5c77cc5ad37d0b643ff9aca44c

gives back: Fetched revision: age-encryption/16e375850ade4e5c77cc5ad37d0b643ff9aca44c

@squaremo
Copy link
Member Author

It does [...]

I stand corrected :-)

@simongottschlag
Copy link

simongottschlag commented Apr 3, 2021

What we are talking about is using a detached head, maybe we could specify this using a boolean (detachedHead: true, defaults to false) and when using it we skip the branch checkout?

The ref could maybe be in the format detachedHead/checksum?

@squaremo
Copy link
Member Author

squaremo commented Apr 3, 2021

What we are talking about is using a detached head, maybe we could specify this using a boolean (detachedHead: true, defaults to false) and when using it we skip the branch checkout?

Whether or not you check out the branch first, once you checkout a commit it's a detached head.

@simongottschlag
Copy link

As per Slack with @squaremo (https://cloud-native.slack.com/archives/C01A402P0R2/p1617451966151300) a boolean is redundant.

@jonathan-innis
Copy link
Contributor

To try to condense some of the discussion, what is the work to be done here? As I understand it, most of the desired behavior is currently covered by the current source-controller application logic. What is the delta here?

@squaremo
Copy link
Member Author

The change is to let people specify just a commit, without specifying a branch as well. When the branch is given, it should behave as it does now, including recording <branch>/<commit> as the revision, and returning an error if the commit is not on the given branch. When a branch is not given, it can do something else, e.g., record the revision as just the SHA1.

@jonathan-innis
Copy link
Contributor

Does this not break the current default behavior? If a user specifies no branch, the expectation is that this will check out the master branch?

@kingdonb
Copy link
Member

kingdonb commented May 14, 2021

@jonathan-innis The user must specify one of either branch, tag, semver, or commit according to gitrepo spec for type GitRepositoryRef. If the content of this struct is omitted, the struct is filled out automatically with a branch ref to master, that is the current behavior and I don't think it is changing.

If you specify a commit hash, Flux checks out that commit hash only if it is on the branch specified. This is also currently the behavior implemented, not changing. If the user specified only a commit hash and no branch ref, Flux will no longer fill the branch ref with a default value of master.

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
metadata:
  name: foo
  namespace: flux-system
spec:
  gitImplementation: go-git
  interval: 1m0s
  ref:
    branch: ''
    commit: 1b0d21cc635e28d51992d53e1ecc23a3f6cf4ac5
  timeout: 20s
  url: https://github.com/kingdonb/hephynator

If I create this source now, in 0.13.4, Flux fills out the branch ref as master anyway. I have to know what branch it's on in order to specify the git SHA in commit, (I explicitly didn't ask for the master branch, but Flux didn't allow me to do that.)

Yes this is a slight breaking change but I think the number of users who are specifying a commit hash in their source and want it to be rejected if the commit hash wasn't on the (implicit default) master branch, or reject the commit if the master branch didn't exist (but I also don't want to speak its name in the source YAML) ... must be miniscule. If this is your use case, you should be explicit and request a commit hash from the branch you want.

This is surprising behavior and we should axe it, even if it is technically a breaking change IMHO.

@stealthybox
Copy link
Member

stealthybox commented May 22, 2021

Should we even support restricting the commit to the specified branch?
I personally would have never invented or expected that behavior with my mental model and usage of git.
I would always expect a detached HEAD.

I view the commit as a complete override over all other values because it's the most specific.
The only case where I see it adds value/safety is if the platform-admin deploys an OPA policy, mutating webhook, or similar to restrict the object to have a specific branch name that cannot be changed by the owner of the object.

If we continue supporting this behavior, we should really make sure that the error message is way more clear, because reporting that the object isn't found is super misleading:

results in: git commit '16e375850ade4e5c77cc5ad37d0b643ff9aca44c' not found: object not found

The error should specify that it's not part of the particular branch and also print the branch name.

It might be difficult to differentiate the default branch value from one that's explicitly set if we're trying to still support this alongside "Any Commit from any branch".

@jonathan-innis
Copy link
Contributor

I think we need to come to a decision on how the API should react here. I agree with @stealthybox where specifying both a branch and a commit is redundant as a commit should exist outside of a branch construct.

From an ease-of-use perspective, the API is much more readable and easier understood if it's an order of precedence thing,

Branch < Tag < Semver < Commit

@ellieayla
Copy link

We got bit by this, trying to pin a single spec.ref.commit sourcing from a repo where the default branch was main. We switched to populating both spec.ref.commit and spec.ref.branch=main.

hiddeco added a commit that referenced this issue Oct 24, 2021
This commit changes the `gogit` behavior for commit checkouts,
now allowing one to reference to just a commit while omitting any
branch reference. Doing this creates an Artifact with a
`HEAD/<commit>` revision.

If both a `branch` and `commit` are defined, the commit is expected
to exist within the branch. This results in a more efficient clone
of just the target branch, and also makes this change backwards
compatible.

Fixes #407
Fixes #315

Signed-off-by: Hidde Beydals <hello@hidde.co>
hiddeco added a commit that referenced this issue Oct 24, 2021
This commit changes the `gogit` behavior for commit checkouts,
now allowing one to reference to just a commit while omitting any
branch reference. Doing this creates an Artifact with a
`HEAD/<commit>` revision.

If both a `branch` and `commit` are defined, the commit is expected
to exist within the branch. This results in a more efficient clone
of just the target branch, and also makes this change backwards
compatible.

Fixes #407
Fixes #315

Signed-off-by: Hidde Beydals <hello@hidde.co>
hiddeco added a commit that referenced this issue Oct 24, 2021
This commit changes the `gogit` behavior for commit checkouts,
now allowing one to reference to just a commit while omitting any
branch reference. Doing this creates an Artifact with a
`HEAD/<commit>` revision.

If both a `branch` and `commit` are defined, the commit is expected
to exist within the branch. This results in a more efficient clone
of just the target branch, and also makes this change backwards
compatible.

Fixes #407
Fixes #315

Signed-off-by: Hidde Beydals <hello@hidde.co>
hiddeco added a commit that referenced this issue Oct 24, 2021
This commit changes the `gogit` behavior for commit checkouts,
now allowing one to reference to just a commit while omitting any
branch reference. Doing this creates an Artifact with a
`HEAD/<commit>` revision.

If both a `branch` and `commit` are defined, the commit is expected
to exist within the branch. This results in a more efficient clone
of just the target branch, and also makes this change backwards
compatible.

Fixes #407
Fixes #315

Signed-off-by: Hidde Beydals <hello@hidde.co>
hiddeco added a commit that referenced this issue Oct 24, 2021
This commit changes the `gogit` behavior for commit checkouts,
now allowing one to reference to just a commit while omitting any
branch reference. Doing this creates an Artifact with a
`HEAD/<commit>` revision.

If both a `branch` and `commit` are defined, the commit is expected
to exist within the branch. This results in a more efficient clone
of just the target branch, and also makes this change backwards
compatible.

Fixes #407
Fixes #315

Signed-off-by: Hidde Beydals <hello@hidde.co>
hiddeco added a commit that referenced this issue Oct 24, 2021
This commit changes the `gogit` behavior for commit checkouts,
now allowing one to reference to just a commit while omitting any
branch reference. Doing this creates an Artifact with a
`HEAD/<commit>` revision.

If both a `branch` and `commit` are defined, the commit is expected
to exist within the branch. This results in a more efficient clone
of just the target branch, and also makes this change backwards
compatible.

Fixes #407
Fixes #315

Signed-off-by: Hidde Beydals <hello@hidde.co>
hiddeco added a commit that referenced this issue Oct 24, 2021
This commit changes the `gogit` behavior for commit checkouts,
now allowing one to reference to just a commit while omitting any
branch reference. Doing this creates an Artifact with a
`HEAD/<commit>` revision.

If both a `branch` and `commit` are defined, the commit is expected
to exist within the branch. This results in a more efficient clone
of just the target branch, and also makes this change backwards
compatible.

Fixes #407
Fixes #315

Signed-off-by: Hidde Beydals <hello@hidde.co>
@hiddeco
Copy link
Member

hiddeco commented Oct 24, 2021

I think we need to come to a decision on how the API should react here. I agree with @stealthybox where specifying both a branch and a commit is redundant as a commit should exist outside of a branch construct.

It is not redundant, as the information can be used to perform a more efficient clone, in which just the history of the targeted branch is fetched.

Given this, I made changes in #462 to allow defining a commit without a branch configured, but defining a branch and a commit will fallback to the previous behavior (and in the case of go-git, does a more efficient clone).

darkowlzz pushed a commit that referenced this issue Oct 26, 2021
This commit changes the `gogit` behavior for commit checkouts,
now allowing one to reference to just a commit while omitting any
branch reference. Doing this creates an Artifact with a
`HEAD/<commit>` revision.

If both a `branch` and `commit` are defined, the commit is expected
to exist within the branch. This results in a more efficient clone
of just the target branch, and also makes this change backwards
compatible.

Fixes #407
Fixes #315

Signed-off-by: Hidde Beydals <hello@hidde.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants