
Restoring deployment config with history leads to weird state #20729

Closed · mfojtik opened this issue Aug 22, 2018 · 16 comments

Labels: component/apps, kind/bug, lifecycle/rotten, priority/P2, sig/master

Comments

mfojtik (Member) commented Aug 22, 2018

Consider this scenario:

# *(this is broken, you have to specify the command manually with oc edit...)
$ oc create deploymentconfig test --image=centos:7 -- /bin/sleep infinity
# run this 3 times:
$ oc rollout latest test
$ oc get rc
NAME      DESIRED   CURRENT   READY     AGE
test-1    0         0         0         1m
test-2    0         0         0         1m
test-3    1         1         1         1m

$ oc get rc,dc -o yaml > backup.yaml
$ oc delete all --all

# Now restore everything back:

$ oc create -f backup.yaml

What happens here is that all the restored replication controllers are deleted right after they are created, and a new RC is then created with revision=1. This is because the replication controllers carry ownerRefs pointing to the DC that was deleted; the UID does not match the newly created DC, so the garbage collector removes them.
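For illustration, each exported RC in backup.yaml carries a block like the one below (the uid shown is a made-up placeholder); after the restore it still points at the uid of the DC that no longer exists, which is why the garbage collector reaps the RC:

$ grep -A 6 'ownerReferences' backup.yaml
  ownerReferences:
  - apiVersion: apps.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: DeploymentConfig
    name: test
    uid: 1c5a8f6e-0000-0000-0000-000000000000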

If you edit backup.yaml, remove all ownerRef fields from the RCs, and recreate everything, then the 3 RCs will stay, but the revision of the DC is set to 1 instead of 3. My guess is that the adoption is broken, or we simply forget to bump the revision to match the currently active RC.

That means when you run oc rollout latest test, it reports a successful rollout, but nothing happens (only the DC revision is bumped) until you have called the command three times; on the fourth time it actually triggers a new rollout. My guess here is that the controller fails to create the RC because it already exists, and then on reconcile it sees that the RC is there and just bumps the revision.
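A short sketch of the state after the ownerRef-stripped restore (output values are illustrative); the DC's revision lives in status.latestVersion, and each oc rollout latest only increments it until it passes the name of the newest existing RC:

$ oc get dc test -o jsonpath='{.status.latestVersion}'
1
$ oc get rc -o name
replicationcontroller/test-1
replicationcontroller/test-2
replicationcontroller/test-3
# test-4 only gets created once latestVersion has been bumped past 3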

*#20728

mfojtik (Member, Author) commented Aug 22, 2018

@smarterclayton I believe this has been broken since 3.7 (haven't confirmed that).

mfojtik (Member, Author) commented Aug 22, 2018

/cc @openshift/sig-master

mfojtik added the kind/bug, priority/P2, and component/apps labels on Aug 22, 2018
tnozicka (Contributor) commented:

The fact that we can't roll out immediately after adoption is a DC bug. How the owner references are exported is more of a CLI matter, possibly related to the new plans w.r.t. export.
/cc @soltysh

mfojtik (Member, Author) commented Aug 22, 2018

Yeah, there should be a way to remove the UUID inside ownerRefs. I would expect this also affects builds (@bparees).

tnozicka (Contributor) commented:

> Yeah, there should be a way to remove the UUID inside ownerRefs.

Technically, you need to remove the whole ownerRef.

mfojtik (Member, Author) commented Aug 22, 2018

To be clear, there seem to be two issues:

  1. The backup.yaml has ownerRefs, which breaks adoption. This is more of a CLI problem; we can document that you have to strip the ownerRefs, and we should have an upstream issue created since this likely affects upstream workloads as well.

  2. The revision not matching the 3 RCs is our bug; the deployment config controller should set the revision to 3 after it observes that the latest running RC is test-3 (I don't think we store the revision in the RC, just the name?).
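For the second point, a quick way to see the mismatch on a restored cluster; the latest-version annotation name below is an assumption (it may differ by release), so treat this as a sketch:

# assumption: controller-created RCs carry a latest-version annotation; if not,
# the name suffix (test-3) is the only place the revision survives
$ oc get rc test-3 -o jsonpath='{.metadata.annotations.openshift\.io/deployment-config\.latest-version}'
3
$ oc get dc test -o jsonpath='{.status.latestVersion}'
1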

bparees (Contributor) commented Aug 22, 2018

> Yeah, there should be a way to remove the UUID inside ownerRefs. I would expect this also affects builds (@bparees).

I think @deads2k would tell us this is an import problem (export should export everything it knows; import needs to be more judicious about what gets stripped/mutated/cleaned during import). But yeah, I would guess builds and also various service broker objects (including TemplateInstances, I think) would be affected by this... pretty much anything with an ownerRef, no?

mfojtik (Member, Author) commented Aug 22, 2018

In this case "import" is oc create, and I don't think we want any special logic there for treating ownerRefs that point to non-existing UUIDs. /cc @soltysh

bparees (Contributor) commented Aug 22, 2018

Yeah, I understand; the point is that import needs to exist because that is where such logic would need to live (or some secondary tooling that you run to process your exported resources before you send them to oc create).
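For illustration, such secondary tooling could be as small as a filter that strips the stale ownerReferences before the objects reach oc create (a sketch assuming jq and a JSON export instead of YAML):

$ oc get rc,dc -o json \
    | jq 'del(.items[].metadata.ownerReferences)' \
    > backup.json
# later, restore without the stale owner references:
$ oc create -f backup.json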

tnozicka (Contributor) commented:
I like oc create --strip-ownerrefs

soltysh (Member) commented Aug 23, 2018

> In this case "import" is oc create, and I don't think we want any special logic there for treating ownerRefs that point to non-existing UUIDs. /cc @soltysh

Nope, create must be as generic as possible. What you're describing is import, and that's different from create.

> I like oc create --strip-ownerrefs

Nada!

bparees (Contributor) commented Aug 23, 2018

+1 to @soltysh's responses. Overloading create doesn't seem like the right answer here.

openshift-bot (Contributor) commented:
Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label on Nov 26, 2018
openshift-bot (Contributor) commented:
Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Dec 26, 2018
openshift-bot (Contributor) commented:
Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented:
@openshift-bot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting /reopen.
> Mark the issue as fresh by commenting /remove-lifecycle rotten.
> Exclude this issue from closing again by commenting /lifecycle frozen.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
