refactor search reconciling logic #3670

Merged
merged 1 commit into karmada-io:master on Jun 30, 2023

Conversation

Contributor

@tedli tedli commented Jun 14, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

  1. Cluster changes didn't trigger registry reconciliation.
  2. A cluster may end up with more than one informer manager.

Which issue(s) this PR fixes:
Fixes #3669

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 14, 2023
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 14, 2023
Member

@whitewindmills whitewindmills left a comment

Kindly add e2e case to cover this scenario.

@tedli tedli marked this pull request as draft June 15, 2023 06:05
@karmada-bot karmada-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 15, 2023
@tedli tedli marked this pull request as ready for review June 15, 2023 11:40
@karmada-bot karmada-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 15, 2023
@codecov-commenter

codecov-commenter commented Jun 15, 2023

Codecov Report

Merging #3670 (8fddeea) into master (c921acd) will increase coverage by 0.01%.
The diff coverage is 57.89%.


@@            Coverage Diff             @@
##           master    #3670      +/-   ##
==========================================
+ Coverage   55.68%   55.70%   +0.01%     
==========================================
  Files         222      223       +1     
  Lines       21195    21233      +38     
==========================================
+ Hits        11803    11828      +25     
- Misses       8765     8774       +9     
- Partials      627      631       +4     
Flag Coverage Δ
unittests 55.70% <57.89%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/util/common.go 57.89% <57.89%> (ø)

... and 2 files with indirect coverage changes

@XiShanYongYe-Chang
Member

Thanks @tedli @whitewindmills
I'll take a review ASAP.

Member

@XiShanYongYe-Chang XiShanYongYe-Chang left a comment

Thanks~
Still reviewing.


c.queue.Add(cls)
clusterSet := sets.New[string]()
for _, registry := range []*searchv1alpha1.ResourceRegistry{newRR, oldRR} {
Member

This processing may introduce some redundancy: some clusters in the rr haven't changed, so we could focus only on the ones with differences.

Contributor Author

This controller reconciles the relationship between clusters and registries, and the key of the controller's queue is the cluster name.
At this point we don't know whether a cluster changed; all we know is that if reflect.DeepEqual(oldRR.Spec, newRR.Spec) is false, the registry must have changed, so some reconciliation is needed. Because the queue key is the cluster name, that leaves only two choices: 1) enqueue the cluster names and reconcile at the cluster level, which is what I did; 2) introduce another worker whose queue key is the registry name and reconcile at the registry level.
However, even with method 2 we would still have to retrieve the clusters of the old registry spec and of the new registry spec to find out what changed, and then reconcile the changed clusters, because reconciling the per-cluster informers is what we ultimately want.
The redundancy doesn't seem avoidable: even if the cluster-registry relationship is unchanged, the resources may still have changed. For example, previously we wanted to cache Pod, but now we no longer want to cache Pod; the cluster-registry relation didn't change, but the resource set did. That's why I use the diff logic.
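
For illustration, here is a minimal sketch of the cluster-keyed enqueue described above. The Controller shape, the handler name, and the assumption that target clusters are listed by name (label-selector targeting omitted) are all illustrative, not the exact PR code:

package search

import (
	"reflect"

	"k8s.io/apimachinery/pkg/util/sets"
	"k8s.io/client-go/util/workqueue"

	searchv1alpha1 "github.com/karmada-io/karmada/pkg/apis/search/v1alpha1"
)

// Controller is trimmed to the single field this sketch needs.
type Controller struct {
	queue workqueue.RateLimitingInterface
}

// onResourceRegistryUpdate enqueues every cluster referenced by either the old
// or the new registry spec whenever the spec changed; the per-cluster worker
// then diffs registries and resources for that cluster.
func (c *Controller) onResourceRegistryUpdate(oldRR, newRR *searchv1alpha1.ResourceRegistry) {
	if reflect.DeepEqual(oldRR.Spec, newRR.Spec) {
		return // spec unchanged: nothing to reconcile
	}
	clusterSet := sets.New[string]()
	for _, registry := range []*searchv1alpha1.ResourceRegistry{newRR, oldRR} {
		clusterSet.Insert(registry.Spec.TargetCluster.ClusterNames...)
	}
	for cluster := range clusterSet {
		c.queue.Add(cluster) // the queue key is the cluster name
	}
}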

Member

ok.

Comment on lines +203 to +204
defer func() {
updated = cr
}()
Member

cr is a pointer type; do we need to handle this step separately?

Contributor Author

No, we don't have to do this.

But doing it gives us two benefits: 1) the input cr argument may be nil; 2) at the caller level it's explicit that cr may be changed by this function, which is more expressive.

It's similar to append: a = append(a, obj).
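
As a standalone illustration of this defer pattern (hypothetical types, not the PR code):

package main

import "fmt"

type clusterRegistry struct {
	registries map[string]struct{}
}

// reconcile returns the possibly-replaced pointer through a named result that
// a defer assigns, so the caller may pass nil and must re-assign the result,
// just like a = append(a, obj).
func reconcile(cr *clusterRegistry) (updated *clusterRegistry) {
	defer func() {
		updated = cr
	}()
	if cr == nil {
		cr = &clusterRegistry{registries: map[string]struct{}{}} // input may be nil
	}
	cr.registries["example"] = struct{}{}
	return
}

func main() {
	var cr *clusterRegistry
	cr = reconcile(cr)              // caller re-assigns, as with append
	fmt.Println(len(cr.registries)) // prints 1
}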

Member

ok, thx~

pkg/search/controller.go Outdated Show resolved Hide resolved
Comment on lines 211 to 221
for _, name := range added {
cr.registries[name] = struct{}{}
addedResources = c.getRegistryAddedResources(registries[name], cr, addedResources)
}
for _, name := range removed {
delete(cr.registries, name)
for resource, previousRegistries := range cr.resources {
delete(previousRegistries, name)
if len(previousRegistries) == 0 {
removedResources = append(removedResources, resource)
}
}
for _, resource := range removedResources {
delete(cr.resources, resource)
}
}
Member

Do we need to handle deletions first before handling additions, because the added elements may contain deleted ones?

Contributor Author

I didn't get you; do you mean it's better to handle removed (L215) before added (L211)?

Either is OK.

Member

Yes, I mean delete first and then add: if the added and removed sets contain the same element, removing after adding would clear an entry that should remain.

Contributor Author

done
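
A runnable toy version of the ordering concern (hypothetical data): if the same name ever appears in both sets, removing after adding would wipe out the freshly added entry, so removals are applied first.

package main

import "fmt"

func main() {
	registries := map[string]struct{}{"rr-a": {}, "rr-b": {}}
	removed := []string{"rr-b"} // dropped from the old matched set
	added := []string{"rr-b"}   // but present again in the new one

	// Delete first, then add: rr-b survives as intended.
	for _, name := range removed {
		delete(registries, name)
	}
	for _, name := range added {
		registries[name] = struct{}{}
	}
	fmt.Println(registries) // map[rr-a:{} rr-b:{}]
}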

Comment on lines +227 to +228
for _, registry := range matched {
addedResources = c.getRegistryAddedResources(registry, cr, addedResources)
}
Member

This processing step feels redundant. All the matched RRs should have already been processed.

Contributor Author

No. There may be added registries and removed ones, but the remaining matched registries may still have their resources changed.
For example, a registry targets clusters labeled aaa=bbb and selects Pod and Node; then the cluster label changes and the resource set changes too, e.g. Event is added.
So we need to handle not only registries being added or removed, but also resource changes within the registries that remain matched.
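
For concreteness, a hypothetical registry mutation of the kind described, sketched against the Karmada API types (imports elided; treat the exact field shapes as assumptions):

// Both the target-cluster selector and the resource selectors change at once,
// so clusters that stay matched still need their informers reconciled.
oldRR := &searchv1alpha1.ResourceRegistry{
	Spec: searchv1alpha1.ResourceRegistrySpec{
		TargetCluster: policyv1alpha1.ClusterAffinity{
			LabelSelector: &metav1.LabelSelector{MatchLabels: map[string]string{"aaa": "bbb"}},
		},
		ResourceSelectors: []searchv1alpha1.ResourceSelector{
			{APIVersion: "v1", Kind: "Pod"},
			{APIVersion: "v1", Kind: "Node"},
		},
	},
}
newRR := oldRR.DeepCopy()
newRR.Spec.TargetCluster.LabelSelector.MatchLabels["aaa"] = "ccc" // cluster label target changed
newRR.Spec.ResourceSelectors = append(newRR.Spec.ResourceSelectors,
	searchv1alpha1.ResourceSelector{APIVersion: "v1", Kind: "Event"}) // resource set changed too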

cr = &crv
addedRegistries, removedRegistries = util.DiffKey(cr.registries, matchedRegistries)
}
if len(addedRegistries) > 0 {
Member

Can we omit these judgments to improve the code readability and reduce cyclomatic complexity? Same as below.

Contributor Author

Yes. At first I thought we didn't need to stop the informer, just remove the unneeded resources, so I did a diff to find out what was added and what was removed.

But that doesn't work: we have to stop an informer whenever something changes and initialize a new one, unless we do a huge refactor of the informer.

Comment on lines +356 to +358
if resourcesChanged {
klog.Infof("Create new informer for cluster %s", cluster)
c.InformerManager.Stop(cluster)
}
Member

Do we need to stop the informer before making changes to the resources, or can we directly modify the current informer?

Contributor Author

No. As in the comment above, I initially had the same idea, so I did a diff to find out what was added and what was removed, but it just didn't work.

You can look at the first commit of this PR; I did exactly that, modifying the current informer.

Contributor Author

type SingleClusterInformerManager interface {
// UnwatchResource removes all existing event handlers from the informer of the given
// resource.
UnwatchResource(resource schema.GroupVersionResource)

Here, in the first commit of this PR, I tried to add UnwatchResource(resource schema.GroupVersionResource), without success.

Member

ok, thx~
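
To summarize the approach the PR settled on, a rough fragment; the helper names clusterDynamicClient and backendHandler are hypothetical, while Stop/ForCluster/ForResource/Start follow Karmada's generic informer-manager interfaces:

// If the resource set for a cluster changed, stop the whole single-cluster
// informer manager and rebuild it; dynamic informers cannot un-watch a
// resource in place.
func (c *Controller) reconcileClusterInformer(cluster string, resources []schema.GroupVersionResource, resourcesChanged bool) {
	if resourcesChanged {
		klog.Infof("Create new informer for cluster %s", cluster)
		c.InformerManager.Stop(cluster)
	}
	mgr := c.InformerManager.ForCluster(cluster, c.clusterDynamicClient(cluster), 0)
	for _, gvr := range resources {
		mgr.ForResource(gvr, c.backendHandler(cluster)) // re-register every watched resource
	}
	mgr.Start()
}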

}
)

func TestDiffKey(t *testing.T) {
Member

Can this test achieve condition coverage? Consider the boundary conditions.

Contributor Author

done
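
A sketch of the boundary cases such a table-driven test can cover, assuming DiffKey(previous, current) returns the keys only in current as added and the keys only in previous as removed (testing and sets imports elided):

func TestDiffKeyBoundaries(t *testing.T) {
	tests := []struct {
		name                   string
		previous, current      map[string]struct{}
		wantAdded, wantRemoved sets.Set[string]
	}{
		{name: "both nil"},
		{name: "add into empty", current: map[string]struct{}{"karmada": {}}, wantAdded: sets.New("karmada")},
		{name: "remove to empty", previous: map[string]struct{}{"kubefed": {}}, wantRemoved: sets.New("kubefed")},
		{name: "identical keys", previous: map[string]struct{}{"a": {}}, current: map[string]struct{}{"a": {}}},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			added, removed := DiffKey(tt.previous, tt.current)
			if !tt.wantAdded.Equal(sets.New(added...)) || !tt.wantRemoved.Equal(sets.New(removed...)) {
				t.Errorf("DiffKey() = (%v, %v), want (%v, %v)", added, removed, tt.wantAdded, tt.wantRemoved)
			}
		})
	}
}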

Member

@XiShanYongYe-Chang XiShanYongYe-Chang left a comment

Thank you. Everything has been reviewed once, and only some testing issues remain.

var proxyList *corev1.NodeList
gomega.Eventually(func(g gomega.Gomega) {
var err error
proxyList, err = proxyClient.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
Member

The changes in this PR should not involve the search/proxy functionality. This test may have been placed in the wrong location; it can be moved up to around line 344.

Contributor Author

done

@@ -710,5 +710,72 @@ var _ = ginkgo.Describe("[karmada-search] karmada search testing", ginkgo.Ordere
})
})
})
ginkgo.Describe("update cluster testing", ginkgo.Ordered, func() {
ginkgo.Context("cluster labeled after registry created", ginkgo.Ordered, func() {
Member

This line seems redundant.

Contributor Author

I didn't touch this test case; it was added by another committer previously. It seems GitHub's diff algorithm got confused: it marks this line green as if it were modified by this PR, but it isn't.

t.Errorf("removed = %v, want []string{\"kubefed\"}", removed)
}
})
return
Member

This line can be removed.

Member

echo lint errors here:

Installing golangci-lint v1.52.2
golangci/golangci-lint info checking GitHub for tag 'v1.52.2'
golangci/golangci-lint info found version: 1.52.2 for v1.52.2/linux/amd64
golangci/golangci-lint info installed /home/runner/go/bin/golangci-lint
Error: pkg/util/common_test.go:86:2: S1023: redundant `return` statement (gosimple)
	return
	^

Please review the above warnings.
If the above warnings do not make sense, feel free to file an issue.
Error: Process completed with exit code 1.

Contributor Author

done

@@ -710,5 +756,42 @@ var _ = ginkgo.Describe("[karmada-search] karmada search testing", ginkgo.Ordere
})
})
})
ginkgo.Describe("update cluster testing", ginkgo.Ordered, func() {
Member

This newly added test block can be removed.

Contributor Author

done

m1Dynamic = framework.GetClusterDynamicClient(member1)
gomega.Expect(m1Dynamic).ShouldNot(gomega.BeNil())
proxyConfig := *restConfig
proxyConfig.Host += "/apis/search.karmada.io/v1alpha1/proxying/karmada/proxy"
Member

Hi @tedli, this e2e test might not be written correctly: its coverage does not include the modified content. This revision adds handling for cluster modification events, but that only applies to the search cache, not the search proxy capability.

Have you tested the new processing logic locally? If it passes the local test, please share the test results.
We can then submit an E2E test for the modification in a subsequent PR, to help land the current refactoring.

Member

A friendly reminder that @tedli might not be familiar with E2E tests. Can we go with a simple PR first? If you think we need an E2E test, open an issue to track it.

Member

I'm saying that because this PR has been blocked for weeks.

Member

Yes, I think we can start by removing the E2E part and add e2e testing in a separate PR. That test can be completed by anyone who is interested; I created issue #3731 to track it. What do you think, @tedli?

Contributor Author

Hi all,
Sorry for the late reply. I will remove the e2e test case, and I'm glad to take a deeper dive into Karmada's e2e tests and then finish #3731.

Member

Thanks @tedli

@XiShanYongYe-Chang
Member

I think we can merge now :)
Before merging it, can you help squash those commits into one commit?

Signed-off-by: lizhen6 <lizhen6@360.cn>
@XiShanYongYe-Chang
Member

/lgtm
Asking for approval from @RainbowMango.
/cc @RainbowMango

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Jun 30, 2023
Member

@RainbowMango RainbowMango left a comment

/approve

@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 30, 2023
@karmada-bot karmada-bot merged commit dfd12ef into karmada-io:master Jun 30, 2023
12 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

registries not reconciled when clusters changed (add, update, del)
6 participants