Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1783221: lib/resourcemerge/core: Fix panic on container/port removal #282

Conversation

wking
Copy link
Member

@wking wking commented Dec 17, 2019

Avoid:

$ go test ./lib/resourcemerge/
panic: runtime error: index out of range [recovered]
			 panic: runtime error: index out of range

goroutine 38 [running]:
testing.tRunner.func1(0xc0001ab000)
  /home/trking/sdk/go1.12.9/src/testing/testing.go:830 +0x392
panic(0xccb520, 0x163f880)
  /home/trking/sdk/go1.12.9/src/runtime/panic.go:522 +0x1b5
github.com/openshift/cluster-version-operator/lib/resourcemerge.ensureContainers(0xc0000bbd57, 0xc0001d4040, 0xc0001cd760, 0x1, 0x1)
  /home/trking/.local/lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:69 +0x840
github.com/openshift/cluster-version-operator/lib/resourcemerge.ensurePodSpec(0xc0001c5d57, 0xc0001d4010, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0001cd760, 0x1, ...)
  /home/trking/.local/lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:28 +0xc6
github.com/openshift/cluster-version-operator/lib/resourcemerge.TestEnsurePodSpec.func1(0xc0001ab000)
  /home/trking/.local/lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core_test.go:276 +0xc7
testing.tRunner(0xc0001ab000, 0xc0001d8770)
  /home/trking/sdk/go1.12.9/src/testing/testing.go:865 +0xc0
created by testing.(*T).Run
  /home/trking/sdk/go1.12.9/src/testing/testing.go:916 +0x35a
FAIL		github.com/openshift/cluster-version-operator/lib/resourcemerge	0.010s

when removing an entry mutated the existing slice without re-entering the for i, whatever := range *existing.

No fix yet, but there's a test that reproduces the bug's panic.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Dec 17, 2019
@openshift-ci-robot
Copy link
Contributor

@wking: This pull request references Bugzilla bug 1783221, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

WIP: Bug 1783221: lib/resourcemerge/core: Fix panic on container/port removal

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 17, 2019
@wking wking force-pushed the resource-merge-index-mutating-existing branch from 252c62b to caec9c2 Compare December 17, 2019 00:27
@wking wking changed the title WIP: Bug 1783221: lib/resourcemerge/core: Fix panic on container/port removal Bug 1783221: lib/resourcemerge/core: Fix panic on container/port removal Dec 17, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 17, 2019
@wking
Copy link
Member Author

wking commented Dec 17, 2019

Fixed by iterating through existing from the back of the slice, so any removals affect indexes that we've already covered.

@wking
Copy link
Member Author

wking commented Dec 17, 2019

/cherrypick release-4.3

@openshift-cherrypick-robot

@wking: once the present PR merges, I will cherry-pick it on top of release-4.3 in a new PR and assign it to you.

In response to this:

/cherrypick release-4.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking
Copy link
Member Author

wking commented Dec 17, 2019

/cherrypick release-4.2

@openshift-cherrypick-robot

@wking: once the present PR merges, I will cherry-pick it on top of release-4.2 in a new PR and assign it to you.

In response to this:

/cherrypick release-4.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking
Copy link
Member Author

wking commented Dec 17, 2019

/cherrypick release-4.1

@openshift-cherrypick-robot

@wking: once the present PR merges, I will cherry-pick it on top of release-4.1 in a new PR and assign it to you.

In response to this:

/cherrypick release-4.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Avoid:

  $ go test ./lib/resourcemerge/
  panic: runtime error: index out of range [recovered]
  			 panic: runtime error: index out of range

  goroutine 38 [running]:
  testing.tRunner.func1(0xc0001ab000)
    .../sdk/go1.12.9/src/testing/testing.go:830 +0x392
  panic(0xccb520, 0x163f880)
    .../sdk/go1.12.9/src/runtime/panic.go:522 +0x1b5
  github.com/openshift/cluster-version-operator/lib/resourcemerge.ensureContainers(0xc0000bbd57, 0xc0001d4040, 0xc0001cd760, 0x1, 0x1)
    .../lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:69 +0x840
  github.com/openshift/cluster-version-operator/lib/resourcemerge.ensurePodSpec(0xc0001c5d57, 0xc0001d4010, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0001cd760, 0x1, ...)
    .../lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:28 +0xc6
  github.com/openshift/cluster-version-operator/lib/resourcemerge.TestEnsurePodSpec.func1(0xc0001ab000)
    .../lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core_test.go:276 +0xc7
  testing.tRunner(0xc0001ab000, 0xc0001d8770)
    .../sdk/go1.12.9/src/testing/testing.go:865 +0xc0
  created by testing.(*T).Run
    .../sdk/go1.12.9/src/testing/testing.go:916 +0x35a
  FAIL		github.com/openshift/cluster-version-operator/lib/resourcemerge	0.010s

(with the core_test.go but the old core.go) when removing an entry
mutated the existing slice without re-entering the:

  for i, whatever := range *existing

With this commit, we iterate from the back of the existing slice, so
any removals affect indexes that we've already covered.  For both
containers and service ports, any appends happen later in the
function, so we don't need to worry about slice expansion at this
point.

The buggy logic was originally from 3d1ad76 (Remove containers if
requested in Update, 2019-04-26, openshift#178).
@wking wking force-pushed the resource-merge-index-mutating-existing branch from caec9c2 to e8e80c2 Compare December 17, 2019 00:32
@wking
Copy link
Member Author

wking commented Dec 17, 2019

Hmm, buggy logic was originally from 3d1ad76 (#178), so we may not need to backport to 4.1. Not sure why that change wasn't backported, but it was linked to rhbz#1710172, which ended up getting closed NOTABUG. I'm happy manually adjusting the 4.1 backport to include both #178 and this change.

@patrickdillon
Copy link
Contributor

/lgtm

I might have suggested a comment to explain reverse iteration but git blame plus comprehensive commit message is probably better.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 17, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

5 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

3 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@wking
Copy link
Member Author

wking commented Dec 17, 2019

/hold

No need for retest noise with intergration broken:

!!! [1217 23:36:29] Timed out waiting for kube-apiserver:  to answer at https://172.16.114.9:8443/healthz; tried 60 waiting 1 between each
!!! [1217 23:36:29] check kube-apiserver logs: /tmp/shared/logs/kube-apiserver.log

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 17, 2019
@sdodson
Copy link
Member

sdodson commented Feb 3, 2020

/retest

@sdodson
Copy link
Member

sdodson commented Feb 3, 2020

@wking PTAL, unsure if anything has changed in the 1.5 months since a hold was put on this or if we're just hoping to stop retests

@wking
Copy link
Member Author

wking commented Feb 3, 2020

Hah, yeah, probably fixed integration by now :p.

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 3, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

7 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit abfb48d into openshift:master Feb 4, 2020
@openshift-ci-robot
Copy link
Contributor

@wking: All pull requests linked via external trackers have merged. Bugzilla bug 1783221 has been moved to the MODIFIED state.

In response to this:

Bug 1783221: lib/resourcemerge/core: Fix panic on container/port removal

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@wking: new pull request created: #313

In response to this:

/cherrypick release-4.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@wking: #282 failed to apply on top of branch "release-4.2":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	lib/resourcemerge/core.go
M	lib/resourcemerge/core_test.go
Falling back to patching base and 3-way merge...
Auto-merging lib/resourcemerge/core_test.go
Auto-merging lib/resourcemerge/core.go
CONFLICT (content): Merge conflict in lib/resourcemerge/core.go
Patch failed at 0001 lib/resourcemerge/core: Fix panic on container/port removal

In response to this:

/cherrypick release-4.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@wking: #282 failed to apply on top of branch "release-4.1":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	lib/resourcemerge/core.go
A	lib/resourcemerge/core_test.go
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): lib/resourcemerge/core_test.go deleted in HEAD and modified in lib/resourcemerge/core: Fix panic on container/port removal. Version lib/resourcemerge/core: Fix panic on container/port removal of lib/resourcemerge/core_test.go left in tree.
Auto-merging lib/resourcemerge/core.go
CONFLICT (content): Merge conflict in lib/resourcemerge/core.go
Patch failed at 0001 lib/resourcemerge/core: Fix panic on container/port removal

In response to this:

/cherrypick release-4.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking wking deleted the resource-merge-index-mutating-existing branch February 5, 2020 01:14
wking added a commit to wking/cluster-version-operator that referenced this pull request Feb 6, 2020
Avoid:

  $ go test ./lib/resourcemerge/
  panic: runtime error: index out of range [recovered]
  			 panic: runtime error: index out of range

  goroutine 38 [running]:
  testing.tRunner.func1(0xc0001ab000)
    .../sdk/go1.12.9/src/testing/testing.go:830 +0x392
  panic(0xccb520, 0x163f880)
    .../sdk/go1.12.9/src/runtime/panic.go:522 +0x1b5
  github.com/openshift/cluster-version-operator/lib/resourcemerge.ensureContainers(0xc0000bbd57, 0xc0001d4040, 0xc0001cd760, 0x1, 0x1)
    .../lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:69 +0x840
  github.com/openshift/cluster-version-operator/lib/resourcemerge.ensurePodSpec(0xc0001c5d57, 0xc0001d4010, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0001cd760, 0x1, ...)
    .../lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:28 +0xc6
  github.com/openshift/cluster-version-operator/lib/resourcemerge.TestEnsurePodSpec.func1(0xc0001ab000)
    .../lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core_test.go:276 +0xc7
  testing.tRunner(0xc0001ab000, 0xc0001d8770)
    .../sdk/go1.12.9/src/testing/testing.go:865 +0xc0
  created by testing.(*T).Run
    .../sdk/go1.12.9/src/testing/testing.go:916 +0x35a
  FAIL		github.com/openshift/cluster-version-operator/lib/resourcemerge	0.010s

(with the core_test.go but the old core.go) when removing an entry
mutated the existing slice without re-entering the:

  for i, whatever := range *existing

With this commit, we iterate from the back of the existing slice, so
any removals affect indexes that we've already covered.  For both
containers and service ports, any appends happen later in the
function, so we don't need to worry about slice expansion at this
point.

The buggy logic was originally from 3d1ad76 (Remove containers if
requested in Update, 2019-04-26, openshift#178).

This commit cherry-picks e8e80c2 (lib/resourcemerge/core: Fix panic
on container/port removal, 2019-12-16, openshift#282) back to the 4.2 branch.
But since f268b6d (Add support to EnsureServicePorts, 2019-11-25,
branch, I've only kept the container portion of e8e80c2 in this
commit.
wking added a commit to wking/cluster-version-operator that referenced this pull request Feb 6, 2020
Avoid:

  $ go test ./lib/resourcemerge/
  panic: runtime error: index out of range [recovered]
  			 panic: runtime error: index out of range

  goroutine 38 [running]:
  testing.tRunner.func1(0xc0001ab000)
    .../sdk/go1.12.9/src/testing/testing.go:830 +0x392
  panic(0xccb520, 0x163f880)
    .../sdk/go1.12.9/src/runtime/panic.go:522 +0x1b5
  github.com/openshift/cluster-version-operator/lib/resourcemerge.ensureContainers(0xc0000bbd57, 0xc0001d4040, 0xc0001cd760, 0x1, 0x1)
    .../lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:69 +0x840
  github.com/openshift/cluster-version-operator/lib/resourcemerge.ensurePodSpec(0xc0001c5d57, 0xc0001d4010, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0001cd760, 0x1, ...)
    .../lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:28 +0xc6
  github.com/openshift/cluster-version-operator/lib/resourcemerge.TestEnsurePodSpec.func1(0xc0001ab000)
    .../lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core_test.go:276 +0xc7
  testing.tRunner(0xc0001ab000, 0xc0001d8770)
    .../sdk/go1.12.9/src/testing/testing.go:865 +0xc0
  created by testing.(*T).Run
    .../sdk/go1.12.9/src/testing/testing.go:916 +0x35a
  FAIL		github.com/openshift/cluster-version-operator/lib/resourcemerge	0.010s

(with the core_test.go but the old core.go) when removing an entry
mutated the existing slice without re-entering the:

  for i, whatever := range *existing

With this commit, we iterate from the back of the existing slice, so
any removals affect indexes that we've already covered.  For both
containers and service ports, any appends happen later in the
function, so we don't need to worry about slice expansion at this
point.

The buggy logic was originally from 3d1ad76 (Remove containers if
requested in Update, 2019-04-26, openshift#178).

This commit cherry-picks e8e80c2 (lib/resourcemerge/core: Fix panic
on container/port removal, 2019-12-16, openshift#282) back to the 4.2 branch.
But since f268b6d (Add support to EnsureServicePorts, 2019-11-25, openshift#272)
and its EnsureServicePorts was never backported to the 4.2 branch,
I've only kept the container portion of e8e80c2 in this commit.
@wking
Copy link
Member Author

wking commented Feb 6, 2020

@wking: #282 failed to apply on top of branch "release-4.2":

Manually backported (dropping the port changes) to 4.2 in #314.

wking added a commit to wking/cluster-version-operator that referenced this pull request Feb 24, 2020
Avoid:

  $ go test ./lib/resourcemerge/
  panic: runtime error: index out of range [recovered]
  			 panic: runtime error: index out of range

  goroutine 38 [running]:
  testing.tRunner.func1(0xc0001ab000)
    .../sdk/go1.12.9/src/testing/testing.go:830 +0x392
  panic(0xccb520, 0x163f880)
    .../sdk/go1.12.9/src/runtime/panic.go:522 +0x1b5
  github.com/openshift/cluster-version-operator/lib/resourcemerge.ensureContainers(0xc0000bbd57, 0xc0001d4040, 0xc0001cd760, 0x1, 0x1)
    .../lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:69 +0x840
  github.com/openshift/cluster-version-operator/lib/resourcemerge.ensurePodSpec(0xc0001c5d57, 0xc0001d4010, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0001cd760, 0x1, ...)
    .../lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:28 +0xc6
  github.com/openshift/cluster-version-operator/lib/resourcemerge.TestEnsurePodSpec.func1(0xc0001ab000)
    .../lib/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core_test.go:276 +0xc7
  testing.tRunner(0xc0001ab000, 0xc0001d8770)
    .../sdk/go1.12.9/src/testing/testing.go:865 +0xc0
  created by testing.(*T).Run
    .../sdk/go1.12.9/src/testing/testing.go:916 +0x35a
  FAIL		github.com/openshift/cluster-version-operator/lib/resourcemerge	0.010s

(with the core_test.go but the old core.go) when removing an entry
mutated the existing slice without re-entering the:

  for i, whatever := range *existing

With this commit, we iterate from the back of the existing slice, so
any removals affect indexes that we've already covered.  For both
containers and service ports, any appends happen later in the
function, so we don't need to worry about slice expansion at this
point.

The buggy logic was originally from 3d1ad76 (Remove containers if
requested in Update, 2019-04-26, openshift#178).

This commit cherry-picks e8e80c2 (lib/resourcemerge/core: Fix panic
on container/port removal, 2019-12-16, openshift#282) back to the 4.2 branch.
But since f268b6d (Add support to EnsureServicePorts, 2019-11-25, openshift#272)
and its EnsureServicePorts was never backported to the 4.2 branch,
I've only kept the container portion of e8e80c2 in this commit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants