Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

馃悰Fix multiple CRDs/workspaces/versions #2751

Merged

Conversation

ncdc
Copy link
Member

@ncdc ncdc commented Feb 6, 2023

Summary

  • Fix the actual bug
  • Add an e2e test that verifies that we can support multiple CRDs for the same group+resource in different workspaces, with different API versions.

k8s PRs:

Related issue(s)

Fixes #2738

@ncdc ncdc force-pushed the fix-multiple-crds-different-versions branch 2 times, most recently from cbb0219 to 4e453a8 Compare February 7, 2023 18:58
@ncdc ncdc changed the title 馃悰 WIP Fix multiple CRDs/workspaces/versions 馃悰Fix multiple CRDs/workspaces/versions Feb 7, 2023
@ncdc ncdc force-pushed the fix-multiple-crds-different-versions branch 2 times, most recently from 8d8048b to 6d4f676 Compare February 8, 2023 17:05
@ncdc
Copy link
Member Author

ncdc commented Feb 8, 2023

Got a green test run! Doing 1 more with a slight update to the kube PR

@ncdc
Copy link
Member Author

ncdc commented Feb 8, 2023

/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 8, 2023
@ncdc ncdc requested review from s-urbaniak, stevekuznetsov and sttts and removed request for kylape and shawn-hurley February 8, 2023 17:06
@ncdc
Copy link
Member Author

ncdc commented Feb 8, 2023

/retest

@@ -143,3 +150,232 @@ func TestPartialMetadataCRD(t *testing.T) {
require.True(t, enabled)
}
}

func TestPartialMetadataSameCRDMultipleWorkspaces(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test exercises the partialmetadata issue.

Do you think we should also cover the watch-cache issue with a new test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably add to the existing watch cache test?

@ncdc
Copy link
Member Author

ncdc commented Feb 9, 2023

/retest

@ncdc ncdc force-pushed the fix-multiple-crds-different-versions branch from 6d4f676 to a996da8 Compare February 9, 2023 12:25
@ncdc
Copy link
Member Author

ncdc commented Feb 9, 2023

Updated to pull in the latest kube

@ncdc
Copy link
Member Author

ncdc commented Feb 9, 2023

@p0lyn0mial let's merge this and then follow up with a watch cache test?

@ncdc
Copy link
Member Author

ncdc commented Feb 9, 2023

Ugh I really need to nail down what's going on with quota & gc

@p0lyn0mial
Copy link
Contributor

@p0lyn0mial let's merge this and then follow up with a watch cache test?

sgtm, thanks

@ncdc
Copy link
Member Author

ncdc commented Feb 9, 2023

Same test failure as what this PR is trying to fix, but a different root cause now:

  1. 14:53:16.384656: Org workspace created for test assigned a logical cluster
  2. 14:53:45.262625: permissionclaimlabel_reconcile.go tried to patch in-vw-before to apply claim labels, but got a 404
  3. 14:53:46.725584: e2e starts its 30s Eventually loop

Need to figure out why it's getting a 404

@ncdc ncdc force-pushed the fix-multiple-crds-different-versions branch 5 times, most recently from 2716cf2 to 80879a5 Compare February 9, 2023 21:02
@@ -187,6 +188,18 @@ func (c *Controller) process(ctx context.Context, gvr schema.GroupVersionResourc
return nil, fmt.Errorf("object to synchronize is expected to be Unstructured, but is %T", obj)
}

if actualVersion := upstreamObj.GetAnnotations()[conversion.KCPOriginalAPIVersionAnnotation]; actualVersion != "" {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -352,7 +353,20 @@ func (c *Controller) processResource(ctx context.Context, key string) error {
}
unstr = unstr.DeepCopy()

return c.reconcileResource(ctx, lclusterName, unstr, gvr)
actualGVR := gvr
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an e2e test that verifies that we can support multiple CRDs for the
same group+resource in different workspaces, with different API
versions.

Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
Pull in changes that fix wildcard partial metadata requests when there
are CRDs serving different versions of the same group/resource in
different workspaces.

Pull in changes that fix the watch cache for normal CRDs - it was
breaking because each workspace has its own watch cache per normal CRD,
yet the cache was listing/watching CRs from all workspaces, leading to
conversion errors.

Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
The dynamic discovery shared informer factory does a wildcard partial
metadata list/watch against specific API versions. When an informer
event comes in for a specific item, it's for a specific API version
(whatever the factory used), but that doesn't necessarily mean it's the
API version of the actual object being retrieved (the API version
doesn't matter because we're only getting partial metadata).

But if you want to UPDATE the object, you must know its appropriate API
version. This is now presented via annotations in
kcp.io/original-api-version. This updates the permission claim
reconcilers to use the annotation when making dynamic client requests.

Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
The dynamic discovery shared informer factory does a wildcard partial
metadata list/watch against specific API versions. When an informer
event comes in for a specific item, it's for a specific API version
(whatever the factory used), but that doesn't necessarily mean it's the
API version of the actual object being retrieved (the API version
doesn't matter because we're only getting partial metadata).

But if you want to UPDATE/DELETE the object, you must know its appropriate API
version. This is now presented via annotations in kcp.io/original-api-version.
This updates the reconciler to use the annotation when making dynamic client
requests.

Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
The dynamic discovery shared informer factory does a wildcard partial
metadata list/watch against specific API versions. When an informer
event comes in for a specific item, it's for a specific API version
(whatever the factory used), but that doesn't necessarily mean it's the
API version of the actual object being retrieved (the API version
doesn't matter because we're only getting partial metadata).

But if you want to UPDATE/DELETE the object, you must know its appropriate API
version. This is now presented via annotations in kcp.io/original-api-version.
This updates the spec syncer to use the annotation when making dynamic client
requests against the upstream.

Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
@ncdc ncdc force-pushed the fix-multiple-crds-different-versions branch from bdf1d7e to b82da8a Compare February 10, 2023 19:43
@stevekuznetsov
Copy link
Contributor

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 10, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 10, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ncdc, stevekuznetsov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ncdc,stevekuznetsov]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ncdc
Copy link
Member Author

ncdc commented Feb 10, 2023

Pushed a commit similar to #2684 to deflake the status half of the test.

Flake was kcp-dev/contrib-tmc#53

Make sure the informer factories have synced the specific GVRs we care
about in the test before clearing test client actions. This way, we know
we won't get watch actions from the informers after we've cleared the
actions, which is what was causing the flakiness.

Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
@ncdc ncdc force-pushed the fix-multiple-crds-different-versions branch from 59fb6f1 to fbcfc28 Compare February 10, 2023 20:27
@stevekuznetsov
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 10, 2023
@openshift-merge-robot openshift-merge-robot merged commit 66f8f29 into kcp-dev:main Feb 10, 2023
@kcp-ci-bot kcp-ci-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: sometimes preexisting resources don't get permission claims labels applied
5 participants