Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Partition reconciliation #2469

Merged

Conversation

fgiloux
Copy link
Contributor

@fgiloux fgiloux commented Dec 9, 2022

Summary

This PR implements a reconciliation mechanism so that the endpoints listed in the status of an APIExportEndpointSlice match the filtering provided by the specified partition.

Design document

Related issue(s)

Follows #2432
Fixes #2333

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 9, 2022
@openshift-ci openshift-ci bot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Dec 9, 2022
@fgiloux
Copy link
Contributor Author

fgiloux commented Dec 9, 2022

/hold

this PR is based on #2432 and will be rebased once it has merged

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 9, 2022
@fgiloux fgiloux force-pushed the partition-reconciliation branch 3 times, most recently from dc435b3 to 731507d Compare December 20, 2022 07:51
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 20, 2022
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 23, 2022
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 6, 2023
@fgiloux
Copy link
Contributor Author

fgiloux commented Jan 7, 2023

/retest

@fgiloux fgiloux force-pushed the partition-reconciliation branch 2 times, most recently from 2d4ab7b to ba989f8 Compare January 10, 2023 08:28
@fgiloux fgiloux force-pushed the partition-reconciliation branch 2 times, most recently from 7fc8039 to 33a632d Compare January 10, 2023 11:26
@fgiloux
Copy link
Contributor Author

fgiloux commented Jan 10, 2023

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 10, 2023
@@ -224,6 +254,32 @@ func (c *controller) enqueueAllAPIExportEndpointSlices(shard interface{}) {
}
}

// enqueuePartition maps a Partition to APIExportEndpointSlices for enqueuing.
func (c *controller) enqueuePartition(obj interface{}) {
key, err := kcpcache.DeletionHandlingMetaClusterNamespaceKeyFunc(obj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not to use an obj to get slices?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the previous comment. This is only used for enqueuing.

Copy link
Contributor

@p0lyn0mial p0lyn0mial Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it seems we could make use of the object to find what we need

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are partition names unique across workspace?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, does it mean we always have to find a local partition for the given slice?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to list endpoints for the given cluster and check which ones reference a particular name instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it mean we always have to find a local partition for the given slice?

yes that is the intended design, partition and APIExportEndpointSlices being in the same workspace. There can however be multiple of them in different workspaces for covering different regions for instance.
Otherwise we would need to make them global and at this point in time I don't see the need for it.

would it make sense to list endpoints for the given cluster and check which ones reference a particular name instead?

there is max one endpoint per shard, the Partition resource allows to specify shards for which endpoints get added to the APIExportEndpointSlice.

getPartition: func(clusterName logicalcluster.Name, name string) (*topologyv1alpha1.Partition, error) {
return partitionClusterInformer.Lister().Cluster(clusterName).Get(name)
},
getAPIExportEndpointSlicesByPartition: func(key string) ([]interface{}, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not to change the signature of this method to return strong types?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also the signature could accept partition name not a key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would mean looping one more time, although I expect the loop to be small. This is however only used for enqueuing and not exposed to the reconcile (there is a separate struct).

},
)

if err := apiExportEndpointSliceClusterInformer.Informer().AddIndexers(cache.Indexers{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use indexers.AddIfNotPresentOrDie

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I will do that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

listShards: func() ([]*corev1alpha1.Shard, error) {
return shardClusterInformer.Lister().List(labels.Everything())
listShards: func(selector labels.Selector) ([]*corev1alpha1.Shard, error) {
return shardClusterInformer.Lister().List(selector)
},
getAPIExportEndpointSlice: func(clusterName logicalcluster.Name, name string) (*apisv1alpha1.APIExportEndpointSlice, error) {
return apiExportEndpointSliceClusterInformer.Lister().Cluster(clusterName).Get(name)
},
getAPIExport: func(path logicalcluster.Path, name string) (*apisv1alpha1.APIExport, error) {
return indexers.ByPathAndName[*apisv1alpha1.APIExport](apisv1alpha1.Resource("apiexports"), apiExportClusterInformer.Informer().GetIndexer(), path, name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we make it local ? (doesn't use the global lister)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is the other way around, only looking at the cache server. This was discussed and done in the previous PR (APIExportEndpointSlice reconciliation)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we change the name of the variable ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my strategy so far has been to use local and global prefix when both are used and not to have any prefix otherwise. I thought it was the consensus. It seems it is not the case (anymore?). I will add the global prefix to shardClusterInformer and apiExportClusterInformer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

logger := logging.WithObject(logging.WithReconciler(klog.Background(), ControllerName), obj.(*topologyv1alpha1.Partition))
logging.WithQueueKey(logger, key).V(2).Info("queuing APIExportEndpointSlices because Partition changed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too verbose, make it V(4)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not consistent across the other controllers, sometimes V(2) is used, not more often V(4). I can use V(4) here if you want

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -64,35 +68,103 @@ func (r *endpointsReconciler) reconcile(ctx context.Context, apiExportEndpointSl
}
apiExport, err := r.getAPIExport(apiExportPath, apiExportEndpointSlice.Spec.APIExport.Name)
if err != nil {
reason := apisv1alpha1.InternalErrorReason
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this an internal error so that the conditions stay stable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how will we know the error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is an internal error not to leak any information to the user(discussed in the previous PR).
The error details are available in the logs for a privileged user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an error could be used for troubleshooting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hence in the logs

)
conditions.MarkFalse(
apiExportEndpointSlice,
apisv1alpha1.APIExportEndpointSliceURLsReady,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so a single transient error can flip this condition false, should we do better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is an error, which is recoverable the condition is indeed set to false because the reconciliation was not successful but it is queued again. If it is transient it flips back to true when the condition has gone.
The alternative is to set it as "unknown"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set to unknown

apisv1alpha1.APIExportEndpointSliceURLsReady,
apisv1alpha1.ErrorGeneratingURLsReason,
conditionsv1alpha1.ConditionSeverityError,
"error listing shards",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how will we known the err?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logs for privileged users.


if err = r.updateEndpoints(ctx, apiExportEndpointSlice, apiExport); err != nil {
if err = r.updateEndpoints(ctx, apiExportEndpointSlice, apiExport, shards); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or pass the selector :)

@fgiloux fgiloux force-pushed the partition-reconciliation branch 3 times, most recently from 6090da8 to 5b6195a Compare January 11, 2023 12:27
apisv1alpha1.PartitionValid,
apisv1alpha1.InternalErrorReason,
conditionsv1alpha1.ConditionSeverityError,
err.Error(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't you want to hide errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I wanted to avoid leaking information and it does not. APIExportEndpointSlice and Partitions are cluster scoped resources in the same workspace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An APIExport from a different workspace may be referenced in the APIExportEndpointSlice. It is not the case for a Partition (only considering the same workspace). By creating an APIExportEndpointSlice (or an APIBinding) a user may get additional information on the APIExport and other workspaces if the error was returned. This scenario does not happen for Partition.

Copy link
Contributor

@p0lyn0mial p0lyn0mial Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that an invalid request - a one that would use an incorrect APIExport would be rejected by https://github.com/kcp-dev/kcp/pull/2560/files#r1064359708

apisv1alpha1.PartitionValid,
apisv1alpha1.PartitionInvalidReferenceReason,
conditionsv1alpha1.ConditionSeverityError,
err.Error(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't you want to hide errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

// Partition reference is invalid.
PartitionInvalidReferenceReason = "PartitionInvalidReference"
// PartitionNotFoundReason is a reason for the PartitionValid condition that the referenced Partition is not found.
PartitionNotFoundReason = "PartitionNotFound"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not used, is it? (not a blocker to merge)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true. I will remove it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

apisv1alpha1.PartitionValid,
apisv1alpha1.PartitionInvalidReferenceReason,
conditionsv1alpha1.ConditionSeverityError,
err.Error(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like exposing this error. Rule of thumb: we must not give a hint about existence of a workspace ever. Here, this is not that clear. (follow-up is fine)

@sttts
Copy link
Member

sttts commented Jan 11, 2023

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 11, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 11, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 11, 2023
@fgiloux
Copy link
Contributor Author

fgiloux commented Jan 11, 2023

/retest

2 similar comments
@fgiloux
Copy link
Contributor Author

fgiloux commented Jan 11, 2023

/retest

@fgiloux
Copy link
Contributor Author

fgiloux commented Jan 11, 2023

/retest

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 11, 2023
@fgiloux
Copy link
Contributor Author

fgiloux commented Jan 12, 2023

@sttts rebased and repushed. Checks are now successful but lgtm label has been removed.

…rtition

Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
@sttts
Copy link
Member

sttts commented Jan 12, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 12, 2023
@openshift-merge-robot openshift-merge-robot merged commit 929f7a6 into kcp-dev:main Jan 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feature: support of Partition logic in APIExportEndpointSlice
4 participants