WRKLDS-728: Capabilities: drop build/apps APIService when capabilities are not enabled #532

Merged

Conversation

ingvagabund
Member

Unless a capability is unknown, install APIService object for corresponding API only when enabled.
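
The rule in the description can be sketched in isolation. This is a minimal illustration, not the operator's code: `capabilitySet`, `newCapabilitySet` and `shouldInstall` are hypothetical names, and the capability strings are only examples.

```go
package main

import "fmt"

// capabilitySet is a hypothetical stand-in for a set of ClusterVersion
// capability names; it is not a type from the operator.
type capabilitySet map[string]struct{}

func newCapabilitySet(names ...string) capabilitySet {
	s := capabilitySet{}
	for _, n := range names {
		s[n] = struct{}{}
	}
	return s
}

// Has reports whether the named capability is in the set.
func (s capabilitySet) Has(name string) bool {
	_, ok := s[name]
	return ok
}

// shouldInstall applies the rule from the description: an unknown capability
// never blocks installation, while a known one must also be enabled.
func shouldInstall(known, enabled capabilitySet, capability string) bool {
	if !known.Has(capability) {
		return true
	}
	return enabled.Has(capability)
}

func main() {
	known := newCapabilitySet("Build", "DeploymentConfig")
	enabled := newCapabilitySet("Build")

	for _, c := range []string{"Build", "DeploymentConfig", "SomethingElse"} {
		fmt.Printf("%s: install=%v\n", c, shouldInstall(known, enabled, c))
	}
}
```

Treating an unknown capability as installable keeps the behaviour unchanged on clusters whose ClusterVersion does not report the capability at all.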

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 2, 2023
@openshift-ci-robot

openshift-ci-robot commented May 2, 2023

@ingvagabund: This pull request references WRKLDS-728 which is a valid jira issue.

In response to this:

Unless a capability is unknown, install APIService object for corresponding API only when enabled.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ingvagabund
Member Author

/retest-required

1 similar comment
@ingvagabund
Member Author

/retest-required

@ingvagabund ingvagabund force-pushed the capabilities-build-dc branch 3 times, most recently from 70ebfd5 to 96e3280 Compare July 19, 2023 12:42
@ingvagabund ingvagabund force-pushed the capabilities-build-dc branch 2 times, most recently from c1b2527 to dc8a55f Compare August 3, 2023 20:36
@ingvagabund
Member Author

/retest-required

@ingvagabund ingvagabund changed the title WIP: WRKLDS-728: Capabilities: drop build/apps APIService when capabilities are not enabled WRKLDS-728: Capabilities: drop build/apps APIService when capabilities are not enabled Aug 4, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 4, 2023
}

func apiServicesReferences() []configv1.ObjectReference {
ret := []configv1.ObjectReference{}
for _, apiService := range apiServices() {
ret = append(ret, configv1.ObjectReference{Group: "apiregistration.k8s.io", Resource: "apiservices", Name: apiService.Spec.Version + "." + apiService.Spec.Group})
for _, apiService := range apiServiceGroupVersions {
Contributor

Why doesn't this have to operate only on the enabled services?

Member Author

apiServicesReferences() is consumed by .WithClusterOperatorStatusController which uses the references to build a list of related objects. In case a reference has no relevant object, it gets ignored.

Contributor

ok, it looks like this is purely informational. It doesn't have any impact on the core logic.
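
The exchange above can be illustrated with a small sketch. It uses local stand-in types rather than the real `configv1.ObjectReference`, and `withExistingObjects` is a hypothetical helper: how the status controller actually skips references without an object is not shown in this thread, so it is modelled here as a plain filter.

```go
package main

import "fmt"

// GroupVersion and ObjectReference are local stand-ins for the real API types.
type GroupVersion struct{ Group, Version string }

type ObjectReference struct{ Group, Resource, Name string }

// apiServiceReferences builds one reference per group/version, regardless of
// whether the capability behind that API is enabled.
func apiServiceReferences(gvs []GroupVersion) []ObjectReference {
	refs := make([]ObjectReference, 0, len(gvs))
	for _, gv := range gvs {
		refs = append(refs, ObjectReference{
			Group:    "apiregistration.k8s.io",
			Resource: "apiservices",
			Name:     gv.Version + "." + gv.Group,
		})
	}
	return refs
}

// withExistingObjects is a hypothetical model of the consumer: references
// whose object does not exist are simply dropped.
func withExistingObjects(refs []ObjectReference, existing map[string]bool) []ObjectReference {
	out := []ObjectReference{}
	for _, r := range refs {
		if existing[r.Name] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	refs := apiServiceReferences([]GroupVersion{
		{Group: "build.openshift.io", Version: "v1"},
		{Group: "template.openshift.io", Version: "v1"},
	})
	// Only the template APIService exists in this example.
	existing := map[string]bool{"v1.template.openshift.io": true}
	fmt.Println(len(refs), len(withExistingObjects(refs, existing)))
}
```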

{Group: "security.openshift.io", Version: "v1"},
{Group: "template.openshift.io", Version: "v1"},
func apiServices(clusterVersionInformer configinformersconfigv1.ClusterVersionInformer) ([]*apiregistrationv1.APIService, []*apiregistrationv1.APIService, error) {
clusterVersion, err := clusterVersionInformer.Lister().Get("version")
Contributor

I think that we should change either the NewAPIServiceController controller or this function. Otherwise we cannot guarantee that the clusterVersionInformer will be synced. Am I right?

Contributor

In addition to that I think that we should add a reactor to the NewAPIServiceController to catch changes to clusterVersionInformer. WDYT?

Member Author

apiServices can have various implementations, so any change needs to happen inside apiService.

Are you suggesting invoking WaitForCacheSync? Checking https://github.com/openshift/cluster-openshift-apiserver-operator/blob/master/pkg/operator/starter.go#L368-L374, none of the informers waits for its cache to be fully synced.

Contributor

I think that we should add clusterVersionInformer to WithInformers here https://github.com/openshift/library-go/blob/75c7d51fc4155264ba71551d22d0a1968b7bf989/pkg/operator/apiserver/controller/apiservice/apiservice_controller.go#L66 because

// WithInformers is used to register event handlers and get the caches synchronized functions.

Contributor

And maybe we should change GetAPIServicesToMangeFunc to accept clusterVersionInformer configinformersconfigv1.ClusterVersionInformer?

Contributor

An alternative would be to add WaitForCacheSync to the apiServices function and somehow trigger the controller to call apiServices when clusterVersionInformer changes or receives an event.

Contributor

Does that make sense?

Member Author

I think that we should add clusterVersionInformer to WithInformers here

openshift/library-go#1562

Member Author

+1 for passing informers, as the list of managed API services may have many different implementations which do not need clusterVersionInformer.

Contributor

ok, thanks.

configMap := resourceread.ReadConfigMapV1OrDie(v311_00_assets.MustAsset("v3.11.0/openshift-apiserver/cm.yaml"))
defaultConfig := v311_00_assets.MustAsset("v3.11.0/config/defaultconfig.yaml")

clusterVersion, err := clusterVersionInformer.Lister().Get("version")
Contributor

I have the same question here. Should we guarantee that the clusterVersionInformer is synced?

Contributor

And whether we need to add a reactor to the controller.

Member Author

That's where the informer passed in dc8a55f#diff-0d623dfd885adb20f991bda4c2453aebd732ca6dbb4d1d4be6e79805c3b48de6R243 takes effect. Every time the cluster version changes, OpenShiftAPIServerWorkload.Sync triggers.

Contributor

okay, so it looks like we don't need an informer, we need a Lister here :) Please change it to a Lister.

Member Author

Done
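
As a rough illustration of the lister suggestion: code that only reads the ClusterVersion can depend on a narrow read-only interface, which a unit test can satisfy with a trivial fake. Everything below is a stand-in written for this sketch (`ClusterVersion`, `clusterVersionLister`, `fakeLister`, `enabledCapabilities`); the generated lister in the real client has its own types.

```go
package main

import "fmt"

// ClusterVersion is a simplified stand-in for the real API object.
type ClusterVersion struct {
	Name                string
	KnownCapabilities   []string
	EnabledCapabilities []string
}

// clusterVersionLister is the narrow, read-only dependency. It is a
// hypothetical interface modelled on a lister's Get method.
type clusterVersionLister interface {
	Get(name string) (*ClusterVersion, error)
}

// fakeLister is an in-memory implementation suitable for unit tests.
type fakeLister map[string]*ClusterVersion

func (f fakeLister) Get(name string) (*ClusterVersion, error) {
	cv, ok := f[name]
	if !ok {
		return nil, fmt.Errorf("clusterversion %q not found", name)
	}
	return cv, nil
}

// enabledCapabilities reads the "version" object through the lister only;
// it needs no informer, event handlers, or started factory.
func enabledCapabilities(lister clusterVersionLister) ([]string, error) {
	cv, err := lister.Get("version")
	if err != nil {
		return nil, err
	}
	return cv.EnabledCapabilities, nil
}

func main() {
	lister := fakeLister{
		"version": {Name: "version", EnabledCapabilities: []string{"Build"}},
	}
	caps, err := enabledCapabilities(lister)
	fmt.Println(caps, err)
}
```

Reacting to ClusterVersion changes stays the job of whichever controller registers the informer; the function that reads the object only needs the lister.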

requiredConfigMap, _, err := resourcemerge.MergePrunedConfigMap(
&openshiftcontrolplanev1.OpenShiftAPIServerConfig{},
configMap,
"config.yaml",
nil,
defaultConfig,
bytes,
Contributor

Do you have a unit test to cover merging?

Member Author

PerGroupOptions: []openshiftcontrolplanev1.PerGroupOptions{},
},
}
if knownCaps.Has(configv1.ClusterVersionCapabilityBuild) && !capsEnabled.Has(configv1.ClusterVersionCapabilityBuild) {
Contributor

Will you have a test that exercises disabling these APIs?

Member Author

Contributor

I think that we should have an e2e test for it, wdyt?

Member Author

Updating/adding e2es is the next step once this PR and openshift/cluster-openshift-controller-manager-operator#291 are merged.

Contributor

Where are you planning to add that test? Will it be in this repo or in origin?

Member Author

I am planning on origin, as we need to check for disabled controllers as well, e.g. by reading the config or parsing the controllers' logs. On the other hand, we first need to update the origin tests to react to a missing API so that other tests for Builds/DCs (e.g. those running in parallel) do not fail. Ultimately, each operator repo might have its own version of the e2e. The e2e (going here or into origin) will need a separate PR.

Contributor

Ideally we would add a test here and then also run it from the origin repo. Are you planning to add the test before we ship 4.14?

Member Author

Yes. All tests need to land before 4.14 ships.

Contributor

OK. I would like to avoid shipping a feature without the tests. Thanks.

}

if knownCaps.Has(configv1.ClusterVersionCapabilityDeploymentConfig) && !capsEnabled.Has(configv1.ClusterVersionCapabilityDeploymentConfig) {
klog.Infof("Capability %q not enabled, disabling 'openshift.io/apps' controller", configv1.ClusterVersionCapabilityDeploymentConfig)
Contributor

Won't this spam the logs on every Sync?

Contributor

ping

Member Author

I will increase the log level to 4

Member Author

Done

@ingvagabund
Member Author

/retest-required

@ingvagabund
Member Author

ingvagabund commented Aug 6, 2023

[sig-cli] oc adm must-gather runs successfully for audit logs [apigroup:config.openshift.io][apigroup:oauth.openshift.io] [Suite:openshift/conformance/parallel]

The last two e2e-aws-own failed due to empty openshift-apiserver audit logs

All OA instances are reporting a bunch of:

2023-08-03T21:07:10.547646790Z W0803 21:07:10.547610       1 logging.go:59] [core] [Channel #91 SubChannel #92] grpc: addrConn.createTransport failed to connect to {
2023-08-03T21:07:10.547646790Z   "Addr": "10.0.43.149:2379",
2023-08-03T21:07:10.547646790Z   "ServerName": "10.0.43.149",
2023-08-03T21:07:10.547646790Z   "Attributes": null,
2023-08-03T21:07:10.547646790Z   "BalancerAttributes": null,
2023-08-03T21:07:10.547646790Z   "Type": 0,
2023-08-03T21:07:10.547646790Z   "Metadata": null

… capabilities are not enabled

Unless a capability is unknown, install APIService object and
run apiserver for corresponding API only when enabled.
configInformers.Config().V1().ClusterVersions().Informer()

ctx := context.Background()
configInformers.Start(ctx.Done())
Contributor

And having a lister instead of an informer will simplify these unit tests :)

Member Author

Done

apiServersConfig.APIServers.PerGroupOptions = append(apiServersConfig.APIServers.PerGroupOptions, openshiftcontrolplanev1.PerGroupOptions{Name: openshiftcontrolplanev1.OpenShiftAppsAPIserver, DisabledVersions: []string{"v1"}})
}

bytes, err := json.Marshal(apiServersConfig)
Contributor

Mhm, should we be passing YAML?

Member Author

Done

t.Fatal(err)
}

cm, err := kubeClient.CoreV1().ConfigMaps("openshift-apiserver").Get(ctx, "config", metav1.GetOptions{})
Contributor

Should we check kubeClient.Actions() instead?

Member Author

What for?

Contributor

We usually do this to check what resource would be sent to the server and how many times. The Get method reads the state stored in memory, which may not necessarily correspond to the resource that would be sent to the server.

Member Author

Something like https://github.com/openshift/openshift-apiserver/blob/064c2d0ef0ecaeda2bcc4387eaaa7258cee5adcf/pkg/project/apiserver/registry/project/proxy/proxy_test.go#L96-L101? Is the goal here to add just a test for Matches("update", "configmaps")? Or to inspect the corresponding action more closely? Do you have an example?

Contributor

Member Author

Done
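
The difference between reading back state and inspecting recorded actions, as raised in this thread, can be shown with a toy client. The real fake clientset exposes `Actions()` and its actions have a `Matches(verb, resource)` method; the `recordingClient` below is only a hypothetical stand-in that mimics that shape so the sketch stays self-contained.

```go
package main

import "fmt"

// action mimics the verb/resource pair a fake client records per request.
type action struct{ Verb, Resource string }

func (a action) Matches(verb, resource string) bool {
	return a.Verb == verb && a.Resource == resource
}

// recordingClient is a hypothetical in-memory client that stores config maps
// and records every call made against it.
type recordingClient struct {
	actions []action
	data    map[string]string
}

func newRecordingClient() *recordingClient {
	return &recordingClient{data: map[string]string{}}
}

func (c *recordingClient) UpdateConfigMap(name, content string) {
	c.actions = append(c.actions, action{Verb: "update", Resource: "configmaps"})
	c.data[name] = content
}

func (c *recordingClient) GetConfigMap(name string) (string, bool) {
	c.actions = append(c.actions, action{Verb: "get", Resource: "configmaps"})
	v, ok := c.data[name]
	return v, ok
}

func (c *recordingClient) Actions() []action { return c.actions }

// countMatching returns how many recorded actions match verb and resource.
func countMatching(actions []action, verb, resource string) int {
	n := 0
	for _, a := range actions {
		if a.Matches(verb, resource) {
			n++
		}
	}
	return n
}

func main() {
	c := newRecordingClient()
	c.UpdateConfigMap("config", "a")
	c.UpdateConfigMap("config", "a") // a redundant write that Get cannot reveal

	got, _ := c.GetConfigMap("config")
	fmt.Println(got, countMatching(c.Actions(), "update", "configmaps"))
}
```

Reading the object back only shows the final stored content, whereas the recorded actions also show how many writes were issued.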

for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {

kubeClient := fake.NewSimpleClientset(
Contributor

please rename it to fakeKubeClient

Member Author

TestOperatorConfigProgressingCondition uses kubeClient as well. Do you want to rename it in the other test(s) too?

Contributor

sure, if this is not a problem for you that would be fine :)

Member Author

Done

}

config := openshiftcontrolplanev1.OpenShiftAPIServerConfig{}
if err := json.NewDecoder(bytes.NewBuffer([]byte(cm.Data["config.yaml"]))).Decode(&config); err != nil {
Contributor

oas uses a versioned decoder. Could we do the same here?

scheme := runtime.NewScheme()
utilruntime.Must(openshiftcontrolplanev1.Install(scheme))
codecs := serializer.NewCodecFactory(scheme)
obj, err := runtime.Decode(codecs.UniversalDecoder(openshiftcontrolplanev1.GroupVersion, configv1.GroupVersion), configContent)
if err != nil {
	return err
}
config := obj.(*openshiftcontrolplanev1.OpenShiftAPIServerConfig)

https://github.com/openshift/openshift-apiserver/blob/df3ca642f426cf2f34abeae152a9fce80b44c63e/pkg/cmd/openshift-apiserver/cmd.go#L129

Member Author

Done

name string
knownCapabilities []configv1.ClusterVersionCapability
enabledCapabilities []configv1.ClusterVersionCapability
expectedPerGroupOptions []openshiftcontrolplanev1.PerGroupOptions
Contributor

Would it make sense to validate the whole config and not only PerGroupOptions?
Assuming that the other fields remain unchanged, this shouldn't require much more effort.

Member Author

Done

}

if knownCaps.Has(configv1.ClusterVersionCapabilityDeploymentConfig) && !capsEnabled.Has(configv1.ClusterVersionCapabilityDeploymentConfig) {
klog.V(4).Infof("Capability %q not enabled, disabling 'openshift.io/apps' controller", configv1.ClusterVersionCapabilityDeploymentConfig)
Contributor

Mhm, I still think it will be logged on every sync. What is the value of spamming the log file with this information? Ideally we would log it just once, right?

Contributor

Maybe we need a new condition? Does the service controller set a condition?

Member Author

At log level 4 it is useful to know that the operator reads the capabilities properly. Otherwise, I'd need to get the CM, decode it, read the list of disabled apiservers, and compare it. That is too much additional code for just a single log line.

Member Author

The API service controller sets the new Degraded condition for API service endpoints until the disabled APIService objects are deleted. The test is reported through the already existing conditions for operands.
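
For reference, the "log it just once" idea raised in this thread could look roughly like the sketch below. This is not what the PR does (it keeps the message and lowers it to verbosity 4). `transitionLogger` and its methods are hypothetical names, and the injected `logf` stands in for whatever logging call is used.

```go
package main

import (
	"fmt"
	"sync"
)

// transitionLogger remembers the last state reported per capability and
// emits a message only when that state changes. Hypothetical helper.
type transitionLogger struct {
	mu   sync.Mutex
	last map[string]bool
	logf func(format string, args ...interface{})
}

func newTransitionLogger(logf func(format string, args ...interface{})) *transitionLogger {
	return &transitionLogger{last: map[string]bool{}, logf: logf}
}

// Report records whether the capability is currently disabled and logs only
// on the first call or on a change. It returns true if a line was logged.
func (l *transitionLogger) Report(capability string, disabled bool) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	prev, seen := l.last[capability]
	if seen && prev == disabled {
		return false
	}
	l.last[capability] = disabled
	if disabled {
		l.logf("Capability %q not enabled, disabling its API\n", capability)
	} else {
		l.logf("Capability %q enabled\n", capability)
	}
	return true
}

func main() {
	logger := newTransitionLogger(func(format string, args ...interface{}) {
		fmt.Printf(format, args...)
	})
	// Simulate three syncs with an unchanged state, then a change.
	for i := 0; i < 3; i++ {
		logger.Report("DeploymentConfig", true)
	}
	logger.Report("DeploymentConfig", false)
}
```

The trade-off is the extra state that has to be kept between syncs, which is the kind of additional code the author argues against for a single log line.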

@openshift-ci
Contributor

openshift-ci bot commented Aug 11, 2023

@ingvagabund: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-operator-encryption-rotation fe768dd link true /test e2e-gcp-operator-encryption-rotation
ci/prow/e2e-gcp-operator-encryption fe768dd link true /test e2e-gcp-operator-encryption

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@tkashem
Contributor

tkashem commented Aug 14, 2023

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2023
@openshift-ci
Contributor

openshift-ci bot commented Aug 14, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ingvagabund, tkashem

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 14, 2023
@openshift-merge-robot openshift-merge-robot merged commit 00f7e4c into openshift:master Aug 14, 2023
7 checks passed
@ingvagabund ingvagabund deleted the capabilities-build-dc branch August 14, 2023 16:22
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

5 participants