Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1909502: pkg/operator: tolerate removal of etcd records from proxy config #2315

Merged
merged 1 commit into from Jan 16, 2021

Conversation

hexfusion
Copy link
Contributor

@hexfusion hexfusion commented Jan 4, 2021

Since 4.4 etcd no longer has a DNS dependency. Currently, we are still passing etcd records to the proxy config which should be removed.

In order to remove the deprecated etcd records from the proxy config, we must tolerate the old etcd records.
This PR allows the toleration of this divergence providing the user a chance to update the NoProxy config and in the meantime allow CI to pass for the below PRs.

blocks:

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 4, 2021
@hexfusion hexfusion changed the title [wip] pkg/operator: tolerate removal of etcd records from proxy config [wip] Bug 1909502: pkg/operator: tolerate removal of etcd records from proxy config Jan 4, 2021
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Jan 4, 2021
@openshift-ci-robot
Copy link
Contributor

@hexfusion: This pull request references Bugzilla bug 1909502, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

[wip] Bug 1909502: pkg/operator: tolerate removal of etcd records from proxy config

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion
Copy link
Contributor Author

/retest

@hexfusion hexfusion changed the title [wip] Bug 1909502: pkg/operator: tolerate removal of etcd records from proxy config Bug 1909502: pkg/operator: tolerate removal of etcd records from proxy config Jan 5, 2021
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 5, 2021
@hexfusion
Copy link
Contributor Author

hexfusion commented Jan 5, 2021

From proxy test cluster
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1346306484308283392

/etc/machine-config-daemon/currentconfig
NO_PROXY=.cluster.local,.svc,.us-west-1.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.ci-ln-w076dpt-d5d6b.origin-ci-int-aws.dev.rhcloud.com,localhost\n
/etc/mcs-machine-config-content.json:338:                                "contents": "[Unit]\n[Service]\nEnvironment=HTTP_PROXY=http://ewolinet:5f6ccbbbafc66013d012839921ada773@35.196.128.173:3128/\nEnvironment=HTTPS_PROXY=http://ewolinet:5f6ccbbbafc66013d012839921ada773@35.196.128.173:3128/\nEnvironment=NO_PROXY=.cluster.local,.svc,.us-west-1.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.ci-ln-w076dpt-d5d6b.origin-ci-int-aws.dev.rhcloud.com,localhost\n",
 $ oc get proxy -o json | jq '.items[0].status.noProxy'
".cluster.local,.svc,.us-west-1.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.ci-ln-w076dpt-d5d6b.origin-ci-int-aws.dev.rhcloud.com,etcd-0.,[etcd-1](https://issues.redhat.com/browse/etcd-1).,[etcd-2](https://issues.redhat.com/browse/etcd-2).,localhost"
$ oc logs -n openshift-machine-config-operator machine-config-daemon-nhhsz -c machine-config-daemon
I0105 14:35:42.014625    5900 start.go:108] Version: machine-config-daemon-4.6.0-202006240615.p0-542-gc85d770f-dirty (c85d770f65130f48efa47ab3f309662877ad2d09)
I0105 14:35:42.018761    5900 start.go:121] Calling chroot("/rootfs")
I0105 14:35:42.018919    5900 rpm-ostree.go:261] Running captured: rpm-ostree status --json
I0105 14:35:42.284149    5900 daemon.go:224] Booted osImageURL: registry.svc.ci.openshift.org/ci-ln-w076dpt/stable@sha256:52c20c647d964d98066603a00412a18b876d6da1f7235e409f184794790f47a6 (47.83.202101041743-0)
I0105 14:35:42.404616    5900 daemon.go:231] Installed Ignition binary version: 2.8.1
I0105 14:35:42.497463    5900 start.go:97] Copied self to /run/bin/machine-config-daemon on host
I0105 14:35:42.500829    5900 update.go:1844] Starting to manage node: ip-10-0-220-241.us-west-1.compute.internal
I0105 14:35:42.500913    5900 metrics.go:105] Registering Prometheus metrics
I0105 14:35:42.502061    5900 metrics.go:110] Starting metrics listener on 127.0.0.1:8797
I0105 14:35:42.508337    5900 rpm-ostree.go:261] Running captured: rpm-ostree status
I0105 14:35:42.545589    5900 daemon.go:863] State: idle
Deployments:
* pivot://registry.svc.ci.openshift.org/ci-ln-w076dpt/stable@sha256:52c20c647d964d98066603a00412a18b876d6da1f7235e409f184794790f47a6
              CustomOrigin: Managed by machine-config-operator
                   Version: 47.83.202101041743-0 (2021-01-04T17:46:18Z)

  ostree://950ea2dee9038fd09c88883ad258b4c3fa91cde7155409fb385e348387074d46
                   Version: 47.83.202012030221-0 (2020-12-03T02:25:01Z)
I0105 14:35:42.545609    5900 rpm-ostree.go:261] Running captured: journalctl --list-boots
I0105 14:35:42.550667    5900 daemon.go:870] journalctl --list-boots:
-1 097e9a0e68984f7fa9557144f0aba666 Tue 2021-01-05 14:15:17 UTC—Tue 2021-01-05 14:32:00 UTC
 0 d5f8226b9f634475bd25823731ed1c6a Tue 2021-01-05 14:32:17 UTC—Tue 2021-01-05 14:35:42 UTC
I0105 14:35:42.550682    5900 rpm-ostree.go:261] Running captured: systemctl list-units --state=failed --no-legend
I0105 14:35:42.557422    5900 daemon.go:885] systemd service state: OK
I0105 14:35:42.557437    5900 daemon.go:617] Starting MachineConfigDaemon
I0105 14:35:42.557516    5900 daemon.go:624] Enabling Kubelet Healthz Monitor
I0105 14:35:43.514422    5900 daemon.go:395] Node ip-10-0-220-241.us-west-1.compute.internal is part of the control plane
I0105 14:35:45.622219    5900 node.go:24] No machineconfiguration.openshift.io/currentConfig annotation on node ip-10-0-220-241.us-west-1.compute.internal: map[volumes.kubernetes.io/controller-managed-attach-detach:true], in cluster bootstrap, loading initial node annotation from /etc/machine-config-daemon/node-annotations.json
I0105 14:35:45.622899    5900 node.go:45] Setting initial node config: rendered-master-d7e09bd37c2c9ec80abe64782130d660
I0105 14:35:45.665914    5900 daemon.go:781] In bootstrap mode
E0105 14:35:45.666005    5900 writer.go:135] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-d7e09bd37c2c9ec80abe64782130d660" not found
I0105 14:35:47.669738    5900 daemon.go:781] In bootstrap mode
E0105 14:35:47.670189    5900 writer.go:135] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-d7e09bd37c2c9ec80abe64782130d660" not found
I0105 14:36:19.689761    5900 daemon.go:781] In bootstrap mode
I0105 14:36:19.689789    5900 daemon.go:809] Current+desired config: rendered-master-d7e09bd37c2c9ec80abe64782130d660
I0105 14:36:19.696682    5900 daemon.go:1061] No bootstrap pivot required; unlinking bootstrap node annotations
I0105 14:36:19.696734    5900 daemon.go:1099] Validating against pending config rendered-master-d7e09bd37c2c9ec80abe64782130d660
I0105 14:36:19.716080    5900 daemon.go:1110] Validated on-disk state
I0105 14:36:19.748807    5900 daemon.go:1165] Completing pending config rendered-master-d7e09bd37c2c9ec80abe64782130d660
I0105 14:36:19.748895    5900 update.go:1844] completed update for config rendered-master-d7e09bd37c2c9ec80abe64782130d660
I0105 14:36:19.760796    5900 daemon.go:1181] In desired config rendered-master-d7e09bd37c2c9ec80abe64782130d660

@openshift-ci-robot
Copy link
Contributor

@hexfusion: This pull request references Bugzilla bug 1909502, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1909502: pkg/operator: tolerate removal of etcd records from proxy config

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

1 similar comment
@openshift-ci-robot
Copy link
Contributor

@hexfusion: This pull request references Bugzilla bug 1909502, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1909502: pkg/operator: tolerate removal of etcd records from proxy config

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion
Copy link
Contributor Author

/test e2e-agnostic-upgrade

@hexfusion
Copy link
Contributor Author

/test e2e-aws-serial

pkg/operator/sync.go Outdated Show resolved Hide resolved
pkg/operator/sync.go Outdated Show resolved Hide resolved
@hexfusion hexfusion force-pushed the fix-proxy-config branch 2 times, most recently from 5711f04 to 4691c78 Compare January 5, 2021 22:07
@hexfusion
Copy link
Contributor Author

/hold
When I initially put up this PR the intention was to tolertate a transient condition in CI. But the more I think about this, during the upgrade to 4.7 we need to also ensure the etcd records are removed from existing configs. For this reason more thought is required for the solution.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 6, 2021
@hexfusion
Copy link
Contributor Author

/hold cancel

At this point I am not 100% sure what other direction we would go here. I think this PR will mitigate any sync issues with etcd records in the NoProxy config. It seems though that a doc should be added to the release outlining manual removal of the stale records themselves. It is hard to say what the cluster might still be doing with these old etcd A records if we remove them we could break something. I think it would be reasonable to tolerate them and ask the user to remove?

If someone has another solution to automating removal please let me know.

cc @kikisdeliveryservice @staebler PTAL

@kikisdeliveryservice
Copy link
Contributor

kikisdeliveryservice commented Jan 13, 2021

I'd like a successful e2e-aws run before approving (and it's reqd to merge anyway) but we're waiting on the kubelet fix upstream for ci to unblock.

But just to understand we will merge this, then the two prs above, then revert this?

/retest

@hexfusion
Copy link
Contributor Author

I'd like a successful e2e-aws run before approving (and it's reqd to merge anyway) but we're waiting on the kubelet fix upstream for ci to unblock.

But just to understand we will merge this, then the two prs above, then revert this?

/retest

correct

@hexfusion
Copy link
Contributor Author

/test e2e-aws

@hexfusion
Copy link
Contributor Author

/retest

@hexfusion
Copy link
Contributor Author

/test e2e-ovn-step-registry

@hexfusion
Copy link
Contributor Author

cc @kikisdeliveryservice think CI is back !

@hexfusion
Copy link
Contributor Author

/retest

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 15, 2021
@kikisdeliveryservice
Copy link
Contributor

/assign @staebler

pkg/operator/sync_test.go Outdated Show resolved Hide resolved
pkg/operator/sync.go Outdated Show resolved Hide resolved
@kikisdeliveryservice
Copy link
Contributor

good q above from @marun

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 15, 2021
@hexfusion hexfusion force-pushed the fix-proxy-config branch 2 times, most recently from 9dc6834 to 812b23d Compare January 15, 2021 21:50
@kikisdeliveryservice
Copy link
Contributor

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 15, 2021
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
@hexfusion
Copy link
Contributor Author

hexfusion commented Jan 15, 2021

Based on comments from @staebler speaking with @marun as well as reviewing API[1], we feel proxy should never contain the newline rune at end of the config. I have updated the tests to reflect.

I had based my test on the BZ[2] which after looking at it further is not the actual value.

[1] https://github.com/openshift/machine-config-operator/blob/master/vendor/github.com/openshift/api/config/v1/types_proxy.go#L82
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1909502#c2

@marun
Copy link

marun commented Jan 15, 2021

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 15, 2021
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hexfusion, kikisdeliveryservice, marun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 16, 2021

@hexfusion: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-ovn-step-registry 81df151 link /test e2e-ovn-step-registry

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit c64b38f into openshift:master Jan 16, 2021
@openshift-ci-robot
Copy link
Contributor

@hexfusion: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with /bugzilla refresh.

Bugzilla bug 1909502 has not been moved to the MODIFIED state.

In response to this:

Bug 1909502: pkg/operator: tolerate removal of etcd records from proxy config

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants