Bug 1708648: pkg/oc/cli/admin/release/new: Guard against inconsistent sources #22816

Closed

wking
Member

@wking wking commented May 10, 2019

Sometimes folks attempt to build release images with images built from different commits from the same Git repository. This can sometimes break the release and is almost always a bad idea. This commit fails fast in this situation, because it's harder to root-cause this if we fail later on. I'd be ok with a flag to allow divergence for callers who feel like they know what they're doing, but have not added it in this commit [edit: now I have]. And of course, the real fix is to not do this ;), although I don't think that means this oc code shouldn't guard against it.
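A minimal Go sketch of the guard described above, to show its shape. It assumes a simplified, hypothetical imageSource type (tag name, Git repository, commit) in place of the imageapi.ImageStream that the real ensureConsistentSources takes. Like the first revision of this PR, it reports only the first divergence it finds for a repository. The repository URLs are placeholders.

```go
package main

import "fmt"

// imageSource is a hypothetical stand-in for the per-image source metadata
// oc reads from the image stream (the real code takes *imageapi.ImageStream).
type imageSource struct {
	Name     string // release tag name, e.g. "machine-config-operator"
	Location string // Git repository the image was built from
	Commit   string // Git commit the image was built from
}

// ensureConsistentSources returns an error if two images built from the same
// Git repository report different commits. Images without source metadata
// are skipped. Only the first divergence per repository is reported.
func ensureConsistentSources(images []imageSource) error {
	first := map[string]imageSource{} // repository -> first image seen
	for _, img := range images {
		if img.Location == "" || img.Commit == "" {
			continue
		}
		prev, ok := first[img.Location]
		if !ok {
			first[img.Location] = img
			continue
		}
		if prev.Commit != img.Commit {
			return fmt.Errorf("%s: %s was built from %s but %s was built from %s",
				img.Location, prev.Name, prev.Commit, img.Name, img.Commit)
		}
	}
	return nil
}

func main() {
	images := []imageSource{
		{"machine-config-operator", "https://example.com/mco", "aaa"},
		{"machine-config-daemon", "https://example.com/mco", "bbb"},
		{"cli", "https://example.com/oc", "ccc"},
	}
	if err := ensureConsistentSources(images); err != nil {
		fmt.Println("error:", err)
	}
}
```

Failing fast like this surfaces the mismatch before the release payload is assembled, which is the root-causing benefit the description argues for.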

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 10, 2019
@smarterclayton
Contributor

/hold

until we identify what the ART fix is (which might be using this to gate sync)

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 10, 2019
@wking
Member Author

wking commented May 10, 2019

> until we identify what the ART fix is (which might be using this to gate sync)

Under what ART outcome would we not want this guard? Even if we are confident that we will never do this as a result of some ART/CI fix, the guard won't hurt us, and it might help other folks who are cutting their own releases with this tooling.

@smarterclayton
Contributor

So one case this blocks, which I'm not sure I want to give up on entirely, is where we decide to rebuild a particular image for a specific fix without touching anything else. The image stream is our composition tool. We do some sanity checks in oc, but this is a very strong check, and there may be legitimate cases it rejects.

If we put this in here, we can't bypass it later.

@cgwalters
Member

> If we put this in here, we can't bypass it later.

Add a --allow-image-divergence option?

Member

@cgwalters cgwalters left a comment

LGTM!

pkg/oc/cli/admin/release/new.go (outdated)
@wking wking force-pushed the release-ensure-consistent-commits branch from 3094f69 to d1aebc9 Compare May 10, 2019 20:11
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 10, 2019
@wking
Member Author

wking commented May 10, 2019

> Add a --allow-image-divergence option?

I've added --allow-commit-divergence in 3094f69 -> d1aebc9.

@wking
Member Author

wking commented May 13, 2019

/retest

Now that Friday's Route 53 meltdown has been resolved.

@wking wking force-pushed the release-ensure-consistent-commits branch from d1aebc9 to c9d8e9d Compare May 13, 2019 17:24
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wking
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: enj

If they are not already assigned, you can assign the PR to them by writing /assign @enj in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking
Member Author

wking commented May 13, 2019

Bumped the completions to fix the verify failure with d1aebc9 -> c9d8e9d.

@wking
Member Author

wking commented May 13, 2019

unit:

    cloud_request_manager_test.go:98: Timeout waiting for "10.0.1.13" address to appear

/retest

@wking
Member Author

wking commented May 13, 2019

unit:

could not resolve inputs: could not determine inputs for step [input:machine-os-content-base]: could not resolve base image: imagestreamtags.image.openshift.io "4.2" not found

Since resolved by changes I didn't follow closely enough to understand ;).

/retest

@wking
Member Author

wking commented May 14, 2019

unit:

E0513 23:08:08.049450   15398 plugin_watcher.go:130] error failed to get plugin info using RPC GetInfo at socket /tmp/volume/plugin_test082342293/plugin-5.sock, err: rpc error: code = DeadlineExceeded desc = context deadline exceeded when handling create event: "/tmp/volume/plugin_test082342293/plugin-5.sock": CREATE
--- FAIL: TestPluginRegistration (30.29s)
    plugin_watcher_test.go:275: Timed out while waiting for registration status 0

/retest

@wking
Member Author

wking commented May 14, 2019

All green now :).

@wking wking force-pushed the release-ensure-consistent-commits branch from c9d8e9d to 0a1557b Compare May 14, 2019 23:20
@wking wking changed the title pkg/oc/cli/admin/release/new: Guard against commit divergence Bug 1708648: pkg/oc/cli/admin/release/new: Guard against inconsistent sources May 14, 2019
@stevekuznetsov
Contributor

/retest

@wking
Member Author

wking commented May 15, 2019

unit:

FAIL: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/storage/value/encrypt/envelope TestGRPCService 1.6s
FAIL: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/storage/value/encrypt/envelope TestUnsupportedVersion 1.81s

/retest

@wking
Member Author

wking commented May 15, 2019

unit:

FAIL: github.com/openshift/origin/pkg/build/controller/build TestHandleBuild 3.03s

/retest

@wking
Member Author

wking commented May 15, 2019

unit:

FAIL: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/storage/value/encrypt/envelope TestGRPCService 1.18s
FAIL: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/storage/value/encrypt/envelope TestUnsupportedVersion 1.81s

/retest

@wking
Member Author

wking commented May 17, 2019

All green :).

@@ -1466,6 +1475,32 @@ func pruneEmptyDirectories(dir string) error {
})
}

// ensureConsistentSources checks that images built from the same Git repository are from the same Git commit.
func ensureConsistentSources(is *imageapi.ImageStream) error {
Contributor

nit: this function should return an error listing all the different commits for a location, not just the difference from the first one found.

Member Author

> nit: this function should return the error with all the different commits for a location...

Dup of your other comment? I'll mark this resolved based on the change I referenced there, but please re-open if there is another facet of this that I'm overlooking.

Contributor

For example, today MCO has 4 images.

When all 4 differ, the error will be something like:

    mcc was xx mco was x for mco repo
    mcs was xy mco was x for mco repo
    mcd was xz mco was x for mco repo

But I guess a better error message would be:

    mco repo has inconsistent commits mco x mcc xx mcs xy mcd xz
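A sketch of the per-repository message suggested above, grouping images by repository and listing every image's commit at once instead of one pairwise line per mismatch. The image type is hypothetical, and the short names and commits are the illustrative values from the example; this is not claimed to match what the PR eventually implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// image is a hypothetical (tag name, source repository, commit) triple.
type image struct{ Name, Location, Commit string }

// divergenceMessages returns one message per repository whose images were
// built from more than one commit, listing every image in that repository
// with its commit. Repositories are reported in first-seen order.
func divergenceMessages(images []image) []string {
	byLocation := map[string][]image{}
	var locations []string
	for _, img := range images {
		if img.Location == "" {
			continue // no source information; nothing to compare
		}
		if _, seen := byLocation[img.Location]; !seen {
			locations = append(locations, img.Location)
		}
		byLocation[img.Location] = append(byLocation[img.Location], img)
	}

	var msgs []string
	for _, loc := range locations {
		group := byLocation[loc]
		commits := map[string]bool{}
		parts := make([]string, 0, len(group))
		for _, img := range group {
			commits[img.Commit] = true
			parts = append(parts, img.Name+" "+img.Commit)
		}
		if len(commits) > 1 {
			msgs = append(msgs, fmt.Sprintf("%s repo has inconsistent commits: %s", loc, strings.Join(parts, ", ")))
		}
	}
	return msgs
}

func main() {
	images := []image{
		{"mco", "mco", "x"}, {"mcc", "mco", "xx"}, {"mcs", "mco", "xy"}, {"mcd", "mco", "xz"},
		{"cli", "oc", "abc"},
	}
	for _, msg := range divergenceMessages(images) {
		fmt.Println(msg) // mco repo has inconsistent commits: mco x, mcc xx, mcs xy, mcd xz
	}
}
```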

Member Author

> When all 4 differ error will be something like...

Yeah, I'm fine with that. Doesn't seem like something that is going to come up often, and we can always come back and polish the error message if it does.

Sometimes folks attempt to build release images with images built from
different commits from the same Git repository.  This can sometimes
break the release [1] and is almost always a bad idea.  This commit
fails fast in this situation, because it's harder to root-cause this
if we fail later on.  Folks who want to take responsibility for any
divergence can set --allow-commit-divergence, in which case we warn
and continue instead of erroring out.  And of course, the real fix for
folks who want to avoid this is to fix the source they're pointing oc
at; this guard in oc is just covering their back ;).  Completions
bumped with:

  $ make build WHAT=cmd/oc
  $ hack/update-generated-completions.sh

I'm personally fine with ensureConsistentSources returning only the
first difference found, because I don't expect there to be many within
a given source stream.  But Abhinav asked for all of them [2], so I'm
using NewAggregate to collect them.  From the NewAggregate docs [3]:

  If the slice is empty, this returns nil.

so we don't need a len(errs) guard locally.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1708648
[2]: openshift#22816 (comment)
[3]: https://godoc.org/k8s.io/apimachinery/pkg/util/errors#NewAggregate
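The collect-everything approach from the commit message, sketched with the Go standard library's errors.Join (Go 1.20+) swapped in for k8s.io/apimachinery's NewAggregate so the example has no external dependencies. errors.Join likewise returns nil when it is given no errors, which is what makes a len(errs) guard unnecessary; its combined message is formatted differently from NewAggregate's. The built type and collectDivergence helper are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// built is a hypothetical (image name, commit) pair for one repository.
type built struct{ Name, Commit string }

// collectDivergence compares every image against the first one and collects
// one error per mismatch rather than returning on the first.
func collectDivergence(images []built) error {
	if len(images) == 0 {
		return nil
	}
	want := images[0]
	var errs []error
	for _, img := range images[1:] {
		if img.Commit != want.Commit {
			errs = append(errs, fmt.Errorf("%s was built from %s, but %s was built from %s",
				img.Name, img.Commit, want.Name, want.Commit))
		}
	}
	// errors.Join returns nil when errs is empty, so no len(errs) guard is needed.
	return errors.Join(errs...)
}

func main() {
	fmt.Println(collectDivergence([]built{{"mco", "x"}, {"mcd", "x"}})) // <nil>
	fmt.Println(collectDivergence([]built{{"mco", "x"}, {"mcc", "xx"}, {"mcd", "xz"}}))
}
```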
@wking wking force-pushed the release-ensure-consistent-commits branch from 0a1557b to 2d09425 Compare May 19, 2019 13:57
@wking
Member Author

wking commented May 19, 2019

unit:

FAIL: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/storage/value/encrypt/envelope TestGRPCService 1.2s
FAIL: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/storage/value/encrypt/envelope TestUnsupportedVersion 1.43s

/test unit

@wking
Member Author

wking commented May 23, 2019

/retest

@wking
Member Author

wking commented May 23, 2019

All green, just waiting on approval and a hold-pull :).

@openshift-ci-robot

@wking: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 10, 2019
@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 8, 2019
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 13, 2019
@enj
Contributor

enj commented Oct 16, 2019

/uncc

@stlaz @sttts @mfojtik

@openshift-ci-robot openshift-ci-robot removed the request for review from enj October 16, 2019 15:13
@openshift-ci-robot

@wking: The following tests failed, say /retest to rerun them all:

Test name | Commit | Details | Rerun command
--- | --- | --- | ---
ci/prow/images-artifacts | 2d09425 | link | /test images-artifacts
ci/prow/e2e-aws-upgrade | 2d09425 | link | /test e2e-aws-upgrade
ci/prow/e2e-gcp-upgrade | 2d09425 | link | /test e2e-gcp-upgrade

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closed this PR.

Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

8 participants