Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Beef up clusterwide proxy support #1999

Merged

Conversation

2uasimojo
Copy link
Member

@2uasimojo 2uasimojo commented Apr 17, 2023

This PR rounds out our support for clusterwide proxies by doing two
things:

  • Previously, we relied on an outside agent (typically OLM) to set
    HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables on the
    hive-operator deployment We would pass these values down to the
    controllers and thence to workload pods such as provisioner and
    deprovisioner. This commit adds logic to discover those values from the
    cluster proxy object if not already set.
  • Install the cluster proxy's trusted CA bundle in all provision and
    deprovision pods. The ultimate source of this bundle is the ConfigMap
    referenced by the proxy's spec.trustedCA.name. We take advantage of the
    Cluster Network Operator's facility [1] to inject the merged CA bundle --
    which includes the proxy's trusted CA bundle -- into a ConfigMap in the
    appropriate namespace and mount that ConfigMap on the pods.

[1] https://docs.openshift.com/container-platform/4.12/networking/configuring-a-custom-pki.html#certificate-injection-using-operators_configuring-a-custom-pki

HIVE-2198

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 17, 2023
@codecov
Copy link

codecov bot commented Apr 17, 2023

Codecov Report

Merging #1999 (253408e) into master (ed93f91) will increase coverage by 0.00%.
The diff coverage is 63.46%.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1999   +/-   ##
=======================================
  Coverage   57.71%   57.72%           
=======================================
  Files         186      186           
  Lines       25171    25219   +48     
=======================================
+ Hits        14528    14557   +29     
- Misses       9409     9425   +16     
- Partials     1234     1237    +3     
Impacted Files Coverage Δ
contrib/pkg/utils/alibabacloud/alibabacloud.go 0.00% <0.00%> (ø)
contrib/pkg/utils/aws/aws.go 0.00% <0.00%> (ø)
contrib/pkg/utils/azure/azure.go 0.00% <0.00%> (ø)
contrib/pkg/utils/gcp/gcp.go 0.00% <0.00%> (ø)
contrib/pkg/utils/ibmcloud/ibmcloud.go 0.00% <0.00%> (ø)
contrib/pkg/utils/openstack/openstack.go 0.00% <0.00%> (ø)
contrib/pkg/utils/ovirt/ovirt.go 0.00% <0.00%> (ø)
contrib/pkg/utils/vsphere/vsphere.go 0.00% <0.00%> (ø)
pkg/constants/constants.go 100.00% <ø> (ø)
.../clusterdeployment/clusterdeployment_controller.go 62.36% <47.61%> (-0.22%) ⬇️
... and 1 more

@2uasimojo 2uasimojo force-pushed the HIVE-2198/clusterwide-proxy-2 branch 3 times, most recently from d526970 to 3ec5962 Compare April 18, 2023 17:11
@2uasimojo
Copy link
Member Author

/hold pending verification

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 18, 2023
@2uasimojo 2uasimojo force-pushed the HIVE-2198/clusterwide-proxy-2 branch 2 times, most recently from 5a7880f to e5c883c Compare April 19, 2023 23:21
@2uasimojo
Copy link
Member Author

/hold cancel

Pending ack by @jianping-shu, but I think this might be ready to go now.

/assign @abutcher
/cc @lleshchi

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 19, 2023
@2uasimojo
Copy link
Member Author

/retest

1 similar comment
@abutcher
Copy link
Member

/retest

This commit rounds out our support for clusterwide proxies by doing two
things:
- Previously, we relied on an outside agent (typically OLM) to set
`HTTP_PROXY`, `HTTPS_PROXY`, and `NO_PROXY` environment variables on the
hive-operator deployment  We would pass these values down to the
controllers and thence to workload pods such as provisioner and
deprovisioner. This commit adds logic to discover those values from the
cluster proxy object if not already set.
- Install the cluster proxy's trusted CA bundle in all provision and
deprovision pods. The ultimate source of this bundle is the ConfigMap
referenced by the proxy's spec.trustedCA.name. We take advantage of the
Cluster Network Operator's facility [1] to inject the merged CA bundle --
which includes the proxy's trusted CA bundle -- into a ConfigMap in the
appropriate namespace and mount that ConfigMap on the pods.

[1] https://docs.openshift.com/container-platform/4.12/networking/configuring-a-custom-pki.html#certificate-injection-using-operators_configuring-a-custom-pki

HIVE-2198
@2uasimojo 2uasimojo force-pushed the HIVE-2198/clusterwide-proxy-2 branch from e5c883c to 253408e Compare April 20, 2023 21:58
@@ -2239,7 +2239,7 @@ func TestClusterDeploymentReconcile(t *testing.T) {
func() runtime.Object {
cd := testClusterDeploymentWithDefaultConditions(testClusterDeploymentWithInitializedConditions(testClusterDeployment()))
cd.Status.InstallRestarts = 1
cd.Spec.InstallAttemptsLimit = pointer.Int32Ptr(2)
cd.Spec.InstallAttemptsLimit = pointer.Int32(2)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ntr: latent, resolve linter complaints throughout

@2uasimojo
Copy link
Member Author

/retest

@abutcher
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 21, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 21, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo, abutcher

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 21, 2023

@2uasimojo: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci openshift-ci bot merged commit dc41d5e into openshift:master Apr 21, 2023
9 checks passed
@2uasimojo 2uasimojo deleted the HIVE-2198/clusterwide-proxy-2 branch April 21, 2023 16:00
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Apr 24, 2023
In HIVE-2198 / openshift#1999 / 253408e we started injecting a ConfigMap
containing a CA bundle for better proxy support. Alas, we were only
injecting it in the provisioning path. For hives that never had the
earlier code, this probably would have been fine, as the ConfigMap would
continue to exist until the cluster was deleted, whereupon it could
properly be mounted on the deprovisioning pods. But for spokes that
existed prior to upgrading to that code, and which were never
re-reconciled before they were deleted (or, conceivably, if a user
deleted the ConfigMap at some point in there) the ConfigMap would not
exist, resulting in mount failures attempting to bring up the
deprovisioning pods.

This commit moves the ConfigMap injection earlier in the flow, ensuring
it exists for both provisioning and deprovisioning paths.

HIVE-2207
tsorya pushed a commit to tsorya/hive that referenced this pull request Apr 27, 2023
In HIVE-2198 / openshift#1999 / 253408e we started injecting a ConfigMap
containing a CA bundle for better proxy support. Alas, we were only
injecting it in the provisioning path. For hives that never had the
earlier code, this probably would have been fine, as the ConfigMap would
continue to exist until the cluster was deleted, whereupon it could
properly be mounted on the deprovisioning pods. But for spokes that
existed prior to upgrading to that code, and which were never
re-reconciled before they were deleted (or, conceivably, if a user
deleted the ConfigMap at some point in there) the ConfigMap would not
exist, resulting in mount failures attempting to bring up the
deprovisioning pods.

This commit moves the ConfigMap injection earlier in the flow, ensuring
it exists for both provisioning and deprovisioning paths.

HIVE-2207
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request May 4, 2023
HIVE-2198 / openshift#1999 / 253408e introduced a `Watch()` on `Proxy` so we can
update environment variables and CA bundles when it changes. But `Proxy`
is an OpenShift CRD, so that `Watch()` blows up the operator when
running on vanilla k8s (and similar).

This commit conditions that `Watch()` such that we only add it if we're
running on OpenShift.

HIVE-2210
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request May 5, 2023
In HIVE-2198 / openshift#1999 / 253408e we started injecting a ConfigMap
containing a CA bundle for better proxy support. Alas, we were only
injecting it in the provisioning path. For hives that never had the
earlier code, this probably would have been fine, as the ConfigMap would
continue to exist until the cluster was deleted, whereupon it could
properly be mounted on the deprovisioning pods. But for spokes that
existed prior to upgrading to that code, and which were never
re-reconciled before they were deleted (or, conceivably, if a user
deleted the ConfigMap at some point in there) the ConfigMap would not
exist, resulting in mount failures attempting to bring up the
deprovisioning pods.

This commit moves the ConfigMap injection earlier in the flow, ensuring
it exists for both provisioning and deprovisioning paths.

HIVE-2207

(cherry picked from commit af76f94)
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request May 5, 2023
HIVE-2198 / openshift#1999 / 253408e introduced a `Watch()` on `Proxy` so we can
update environment variables and CA bundles when it changes. But `Proxy`
is an OpenShift CRD, so that `Watch()` blows up the operator when
running on vanilla k8s (and similar).

This commit conditions that `Watch()` such that we only add it if we're
running on OpenShift.

HIVE-2210

(cherry picked from commit f1f63cb)
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request May 11, 2023
In HIVE-2198 / openshift#1999 / 253408e we started injecting a ConfigMap
containing a CA bundle for better proxy support. Alas, we were only
injecting it in the provisioning path. For hives that never had the
earlier code, this probably would have been fine, as the ConfigMap would
continue to exist until the cluster was deleted, whereupon it could
properly be mounted on the deprovisioning pods. But for spokes that
existed prior to upgrading to that code, and which were never
re-reconciled before they were deleted (or, conceivably, if a user
deleted the ConfigMap at some point in there) the ConfigMap would not
exist, resulting in mount failures attempting to bring up the
deprovisioning pods.

This commit moves the ConfigMap injection earlier in the flow, ensuring
it exists for both provisioning and deprovisioning paths.

HIVE-2207

(cherry picked from commit af76f94)
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request May 11, 2023
HIVE-2198 / openshift#1999 / 253408e introduced a `Watch()` on `Proxy` so we can
update environment variables and CA bundles when it changes. But `Proxy`
is an OpenShift CRD, so that `Watch()` blows up the operator when
running on vanilla k8s (and similar).

This commit conditions that `Watch()` such that we only add it if we're
running on OpenShift.

HIVE-2210

(cherry picked from commit f1f63cb)
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Jun 15, 2023
It is valid to want to clean up a ClusterDeployment and its associated
artifacts by deleting the whole namespace. Unfortunately, that results
in the hive-trusted-cabundle ConfigMap (introduced by 253408e / openshift#1999)
being deleted before the CD gets a chance to finish cleaning up. Then
the clusterdeployment controller hot loops trying to recreate the
ConfigMap, which fails because you can't create new things in a
namespace that's terminating.

With this commit, we add OwnerReferences from that ConfigMap to any
ClusterDeployments in the same namespace. This should prevent the
ConfigMap from being deleted until the owning CDs have finished cleaning
up and been garbage collected.

HIVE-2249
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Jun 16, 2023
Deleting a namespace containing a ClusterDeployment and its associated
artifacts results in an impossible situation: the deletion of the
ClusterDeployment wants to trigger a deprovision, but we cannot create
the necessary artifacts (e.g. ClusterDeprovision CR, deprovision Pod) in
a namespace that is terminating. We have logic to skip deprovisioning
and simply leak the cloud resources in this case.

...but we forgot to incorporate this logic for the hive-trusted-cabundle
ConfigMap (introduced by 253408e / openshift#1999), and would hot-loop
attempting and failing to create it, never reaching the aforementioned
"give up" logic, resulting in wedging the namespace. This commit adds
the logic, resulting in the same behavior as before.

NOTE: It is still not supported / a good idea to delete a namespace
containing extant ClusterDeployments, as this leaks cloud resources as
described above.

HIVE-2249
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Jul 3, 2023
Deleting a namespace containing a ClusterDeployment and its associated
artifacts results in an impossible situation: the deletion of the
ClusterDeployment wants to trigger a deprovision, but we cannot create
the necessary artifacts (e.g. ClusterDeprovision CR, deprovision Pod) in
a namespace that is terminating. We have logic to skip deprovisioning
and simply leak the cloud resources in this case.

...but we forgot to incorporate this logic for the hive-trusted-cabundle
ConfigMap (introduced by 253408e / openshift#1999), and would hot-loop
attempting and failing to create it, never reaching the aforementioned
"give up" logic, resulting in wedging the namespace. This commit adds
the logic, resulting in the same behavior as before.

NOTE: It is still not supported / a good idea to delete a namespace
containing extant ClusterDeployments, as this leaks cloud resources as
described above.

HIVE-2249
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants