Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revendor installer #2052

Merged
merged 7 commits into from
Aug 16, 2023
Merged

Conversation

lleshchi
Copy link
Contributor

@lleshchi lleshchi commented Jun 23, 2023

xref: https://issues.redhat.com/browse/HIVE-2144

Includes following revendoring updates:

- github.com/openshift/installer v0.9.0-master.0.20230216120756-2659252502ba => v0.9.0-master.0.20230721132804-2c449439afd9
- k8s.io/api v0.26.2 => v0.27.2
- k8s.io/api v0.26.2 => v0.27.2
- k8s.io/client-go v0.26.2 => v0.27.2
- sigs.k8s.io/controller-runtime v0.14.2 => v0.15.0
- google.golang.org/grpc v1.52.0 => v1.56.1
- github.com/metal3-io/baremetal-operator v0.0.0-20220128094204-28771f489634 => v0.0.0-20230531194024-8dde0991ffdd
- github.com/openshift/machine-api-operator v0.2.1-0.20220930152820-30825f121cc5 => v0.2.1-0.20230613002216-b15f199bf388

Update to installer includes:

- Deprecated legacy VSphere platform fields
The deprecated fields were included in pkg/clusterresource/vsphere.go
and switched over to the new fields in pkg/controller/machinepool/vsphereactuator.go

- Accommodate changes to function signatures:
    - func (a *AzureActuator) GenerateMachineSets
    - func (a *AWSActuator) validateSubnets
    - func (a *AWSActuator) getSubnetsByAvailabilityZone 

Update to controller-runtime v0.15.0 includes:

- Change to the behavior of the fake client
deletionTimestamp. The fake client will panic if initialized with an
object that has a DeletionTimestamp and no finalizer, and will
silently ignore a DeletionTimestamp on an object in a Create request,
matching the behavior of the kube-apiserver. Tests have been rewritten
to work with this new behavior.
https://github.com/kubernetes-sigs/controller-runtime/pull/2316

- Requirement to register a fake client
WithStatusSubresource in order to successfully update the Status
of an object. Due to this change, the helper function
NewFakeClientBuilder() is added to pkg/test/fake to allow for easier
setup of a fake client with the necessary sub-resources.
https://github.com/kubernetes-sigs/controller-runtime/pull/2259

Other noteworthy changes:

- New singleton scheme introduced to codebase.  
The hive specific scheme is registered once, with all necessary
packages, and used throughout the codebase, removing the need to specify
unique schemes for each use case, and decreasing the overall amount of
imports.

- Due to the introduction of status subresource awareness in the fake
client, it has been uncovered that, in cases of an assignment conflicts
between claims for a cluster, the AssignmentConflict condition was not
being set (requiring a `Status().Update()` call rather than `Update()`).
Update log message to reflect that spec (and not status) is being
updated, and add TODO to investigate behavior in
https://issues.redhat.com/browse/HIVE-2274

- The updated controller-runtime version requirement to explicitly set
status sub-resource uncovered an issue in
reconcileExistingInstallingClusterInstall in which the cd conditions
were not being updated.
Create a DeepCopy of the `conditions` variable, which was previously just
a pointer to cd.Status.Conditions. Pass this variable (instead of
cd.Status.Conditions) to controllerutils.SetClusterDeploymentCondition
so as not to mutate cd.Status.Conditions, but rather to append the new
conditions to the `conditions` variable.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 23, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 23, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 23, 2023
},
}
err := installertypesconversion.ConvertInstallConfig(ic)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the one that needs to not use the converter: the resulting install-config needs to work for all OCP versions, and <4.14* won't understand the new format.

Let's add a comment to watch https://issues.redhat.com/browse/SPLAT-1093 for potential removal of the deprecated fields though. We can't allow that to happen while we still support a <4.14 release.

*4.13 with a feature gate enabled

pkg/controller/machinepool/vsphereactuator.go Outdated Show resolved Hide resolved
pkg/imageset/updateinstaller.go Outdated Show resolved Hide resolved
pkg/installmanager/installmanager.go Outdated Show resolved Hide resolved
@lleshchi lleshchi force-pushed the HIVE-2144 branch 2 times, most recently from 454a865 to 1a9514a Compare July 5, 2023 18:07
@lleshchi
Copy link
Contributor Author

lleshchi commented Jul 5, 2023

/test e2e

2 similar comments
@lleshchi
Copy link
Contributor Author

lleshchi commented Jul 6, 2023

/test e2e

@lleshchi
Copy link
Contributor Author

lleshchi commented Jul 7, 2023

/test e2e

@2uasimojo
Copy link
Member

/test e2e

409, could be a flake

@2uasimojo
Copy link
Member

/test e2e-azure

@lleshchi lleshchi force-pushed the HIVE-2144 branch 2 times, most recently from 36dcf49 to d6109c2 Compare July 10, 2023 19:20
pkg/remoteclient/fake.go Outdated Show resolved Hide resolved
pkg/test/fake/fake.go Outdated Show resolved Hide resolved
@lleshchi lleshchi force-pushed the HIVE-2144 branch 3 times, most recently from 502295f to 0c0da01 Compare July 12, 2023 18:33
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 15, 2023
@lleshchi lleshchi force-pushed the HIVE-2144 branch 3 times, most recently from eb18ccd to 6959c03 Compare July 17, 2023 18:39
Copy link
Member

@2uasimojo 2uasimojo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to fix these two things while working on #2057

pkg/controller/machinepool/vsphereactuator.go Outdated Show resolved Hide resolved
pkg/controller/machinepool/vsphereactuator.go Outdated Show resolved Hide resolved
cmd/manager/main.go Outdated Show resolved Hide resolved
@@ -424,7 +422,7 @@ func TestDNSEndpointReconcile(t *testing.T) {

cut := &ReconcileDNSEndpoint{
Client: fakeClient,
scheme: scheme.Scheme,
scheme: scheme,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tech debt: this field isn't actually used anywhere, and I imagine we have similar cases elsewhere. We should scrub such things.

This need not be tackled as part of the revendor, unless you want to, but please open a card.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing again, I see this is probably the case in a lot of our reconciler structs.

@lleshchi
Copy link
Contributor Author

/retest

Copy link
Member

@2uasimojo 2uasimojo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. If it weren't for the awsprivatelink thing (below) I would say land it, but since we're doing that, please also add the comments in the new scheme and fake client libs. The other stuff can be ignored.

contrib/pkg/awsprivatelink/awsprivatelink.go was added while you were working on this revendor. It is still using the global scheme. Please fix.

(I wonder if there's some way we can add a custom linting rule to enforce avoiding the global scheme. Another project for another day...)

@@ -103,3 +126,29 @@ func getVSphereOSImage(masterMachine *machineapi.Machine, scheme *runtime.Scheme
logger.WithField("image", osImage).Debug("resolved image to use for new machinesets")
return osImage, nil
}

// Copied from https://github.com/openshift/installer/blob/f7731922a0f17a8339a3e837f72898ac77643611/pkg/types/vsphere/conversion/installconfig.go#L75-L97
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we should open an installer issue and request that they make these public so we don't have to maintain copies.

log.WithError(err).Fatal("could not create a decoder")
}
decoder := admission.NewDecoder(scheme)
// FIXME check for error?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't error unless scheme is nil, in which case it will panic. Which is okay on startup of a controller.

@@ -941,7 +940,7 @@ func TestReconcileClusterPool(t *testing.T) {
name: "no scale up with max concurrent and some deleting",
existing: []runtime.Object{
initializedPoolBuilder.Build(testcp.WithSize(5), testcp.WithMaxConcurrent(2)),
unclaimedCDBuilder("c1").GenericOptions(testgeneric.Deleted()).Build(testcd.Running()),
unclaimedCDBuilder("c1").GenericOptions(testgeneric.Deleted(), testgeneric.WithFinalizer("test-finalizer")).Build(testcd.Running()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you used a const for "test-finalizer" in other files, why not here too?

@@ -122,7 +121,7 @@ func TestReconcileClusterPoolNamespace_Reconcile_Movement(t *testing.T) {
namespaceBuilder: namespaceBuilder,
resources: []runtime.Object{
testcd.FullBuilder(namespaceName, "test-cd", scheme).
GenericOptions(testgeneric.Deleted()).
GenericOptions(testgeneric.Deleted(), testgeneric.WithFinalizer("test-finalizer")).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -274,7 +275,7 @@ func TestReconcileClusterSync_NoWorkToDo(t *testing.T) {
},
{
name: "deleted ClusterDeployment",
cd: cdBuilder(scheme).GenericOptions(testgeneric.Deleted()).Build(),
cd: cdBuilder(scheme).GenericOptions(testgeneric.Deleted(), testgeneric.WithFinalizer("test-finalizer")).Build(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

...etc.

@@ -424,7 +422,7 @@ func TestDNSEndpointReconcile(t *testing.T) {

cut := &ReconcileDNSEndpoint{
Client: fakeClient,
scheme: scheme.Scheme,
scheme: scheme,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing again, I see this is probably the case in a lot of our reconciler structs.

"sigs.k8s.io/controller-runtime/pkg/client/fake"
)

func NewFakeClientBuilder() *fake.ClientBuilder {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really need some comments on this thing. I understand it because we talked about it, but our posterity would have no clue what's going on here.

@@ -0,0 +1,70 @@
package scheme

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a file-level comment here explaining what we're doing (building a singleton scheme containing all the things we're gonna care about in our project) and why (not s'posed to be using the global one, and it sucks piecing it together everywhere).

@2uasimojo
Copy link
Member

2uasimojo commented Aug 14, 2023

As noted offline, if e2e-pool "fails" for infra reasons after our actual test succeeds, like this, I think it is acceptable to override that context. (That flake is tracked by https://issues.redhat.com/browse/DPTP-3611 fmr.)

Includes following updates:
- k8s.io/api v0.26.2 => v0.27.2
- k8s.io/api v0.26.2 => v0.27.2
- k8s.io/client-go v0.26.2 => v0.27.2
- sigs.k8s.io/controller-runtime v0.14.2 => v0.15.0
- google.golang.org/grpc v1.52.0 => v1.56.1
- github.com/metal3-io/baremetal-operator v0.0.0-20220128094204-28771f489634 => v0.0.0-20230531194024-8dde0991ffdd
- github.com/openshift/machine-api-operator v0.2.1-0.20220930152820-30825f121cc5 => v0.2.1-0.20230613002216-b15f199bf388

Signed-off-by: Leah Leshchinsky <lleshchi@redhat.com>
- func (a *AzureActuator) GenerateMachineSets
- func (a *AWSActuator) validateSubnets
- func (a *AWSActuator) getSubnetsByAvailabilityZone

Signed-off-by: Leah Leshchinsky <lleshchi@redhat.com>
The legacy VSphere platform fields are deprecated.

Rename to deprecated fields in pkg/clusterresource/vsphere.go
and switch to the new fields in pkg/controller/machinepool/vsphereactuator.go

Signed-off-by: Leah Leshchinsky <lleshchi@redhat.com>
Signed-off-by: Leah Leshchinsky <lleshchi@redhat.com>
The updated controller-runtime version requirement to explicitly set
status sub-resource uncovered an issue in
reconcileExistingInstallingClusterInstall in which the cd conditions
were not being updated.

Create a DeepCopy of the `conditions` variable, which was previously just
a pointer to cd.Status.Conditions. Pass this variable (instead of
cd.Status.Conditions) to controllerutils.SetClusterDeploymentCondition
so as not to mutate cd.Status.Conditions, but rather to append the new
conditions to the `conditions` variable.

Signed-off-by: Leah Leshchinsky <lleshchi@redhat.com>
Due to the introduction of status subresource awareness in the fake
client, it has been uncovered that, in cases of an assignment conflicts
between claims for a cluster, the AssignmentConflict condition was not
being set (requiring a `Status().Update()` call rather than `Update()`).

Update log message to reflect that spec (and not status) is being
updated, and add TODO to investigate behavior in
https://issues.redhat.com/browse/HIVE-2274

Signed-off-by: Leah Leshchinsky <lleshchi@redhat.com>
@lleshchi lleshchi force-pushed the HIVE-2144 branch 2 times, most recently from abfcfd6 to 5af4d4c Compare August 16, 2023 12:08
@lleshchi
Copy link
Contributor Author

/retest

Rewrite unittests to accommodate changes in controller-runtime
v0.15.0 and create a hive specific scheme.

One such major change involves the behavior of the fake client
deletionTimestamp. The fake client will panic if initialized with an
object that has a DeletionTimestamp and no finalizer, and will
silently ignore a DeletionTimestamp on an object in a Create request,
matching the behavior of the kube-apiserver. Tests have been rewritten
to work with this new behavior.
kubernetes-sigs/controller-runtime#2316

This update also includes a requirement to register a fake client
WithStatusSubresource in order to successfully update the Status
subresource of an object. Due to this change, the helper function
NewFakeClientBuilder() is added to pkg/test/fake to allow for easier
setup of a fake client with the necessary sub-resources.
kubernetes-sigs/controller-runtime#2259

Lastly, a new singleton scheme has been introduced to the codebase.
The hive specific scheme is registered once, with all necessary
packages, and used throughout the codebase, removing the need to specify
unique schemes for each use case, and decreasing the overall amount of
imports.

Signed-off-by: Leah Leshchinsky <lleshchi@redhat.com>
@2uasimojo
Copy link
Member

Thanks for the updates.

/test e2e-azure

⬆️ infra flake

/test e2e-gcp

Can't tell exactly what went wrong here, but may be a temporary permissions problem?

Cluster operator insights SCAAvailable is False with Forbidden: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 403: {"code":"ACCT-MGMT-11","href":"/api/accounts_mgmt/v1/errors/11","id":"11","kind":"Error","operation_id":"c21056ba-e89b-4ff8-8d14-be5546a63d85","reason":"Account with ID 2DUeKzzTD9ngfsQ6YgkzdJn1jA4 denied access to perform create on Certificate with HTTP call POST /api/accounts_mgmt/v1/certificates"}"

And in case those pass without further changes...

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 16, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 16, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo, lleshchi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@2uasimojo
Copy link
Member

BTW:

As noted offline, if e2e-pool "fails" for infra reasons after our actual test succeeds, like this, I think it is acceptable to override that context. (That flake is tracked by https://issues.redhat.com/browse/DPTP-3611 fmr.)

We reverted to stop using HCP hubs, so this should no longer be an issue.

@openshift-merge-robot openshift-merge-robot merged commit 3c31b4d into openshift:master Aug 16, 2023
8 of 10 checks passed
@2uasimojo
Copy link
Member

Oh. Crap. e2e-azure and -gcp weren't required, so this just merged. Let's open a dummy PR to get those tested for real please.

@2uasimojo 2uasimojo mentioned this pull request Aug 16, 2023
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Sep 13, 2023
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Sep 18, 2023
Since the recent revendor (openshift#2052 / 26a0b81), OpenstackProviderSpec
(embedded in MachineSets under OpenStack) stopped being able to
unmarshal. This is because its schema moved from
github.com/openshift/cluster-api-provider-openstack to
github.com/openshift/api/machine/v1alpha1 and the revendor picked up
this move from the installer code: see
openshift/installer#6382

This commit moves our references accordingly to match that upstream
code.

Quirk 1: Strangely, this one type is in o/api/machine v1alpha1 even
though everything else in o/api/machine is in v1beta1.

Quirk 2: o/api/machine v1alpha1's registration methods don't actually
register the OpenstackProviderSpec type. (Why??) So we have to register
it explicitly, as the installer does.

This more or less reverts the prior attempt at openshift#2114 / 1f1ec2c, which
turned out to be completely redundant (the added registration call ended
up in the same place as the one on the previous line).

HIVE-2308
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Sep 26, 2023
This was missed in openshift#2052 / 26a0b81.

HIVE-2308

(cherry picked from commit 1f1ec2c)
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Sep 26, 2023
Since the recent revendor (openshift#2052 / 26a0b81), OpenstackProviderSpec
(embedded in MachineSets under OpenStack) stopped being able to
unmarshal. This is because its schema moved from
github.com/openshift/cluster-api-provider-openstack to
github.com/openshift/api/machine/v1alpha1 and the revendor picked up
this move from the installer code: see
openshift/installer#6382

This commit moves our references accordingly to match that upstream
code.

Quirk 1: Strangely, this one type is in o/api/machine v1alpha1 even
though everything else in o/api/machine is in v1beta1.

Quirk 2: o/api/machine v1alpha1's registration methods don't actually
register the OpenstackProviderSpec type. (Why??) So we have to register
it explicitly, as the installer does.

This more or less reverts the prior attempt at openshift#2114 / 1f1ec2c, which
turned out to be completely redundant (the added registration call ended
up in the same place as the one on the previous line).

HIVE-2308

(cherry picked from commit 43376d8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants