Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CFE-1037: Fix issues pertaining to DNSNameResolver controller #6

Merged
merged 2 commits into from
May 7, 2024

Conversation

arkadeepsen
Copy link
Contributor

This PR adds the support to provide port on which the CoreDNS is listening on as an input. RBAC permissions are added for getting events related to DNSNameResolver status updates. It also fixes the issue of deadlock condition in the resolver.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 30, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Apr 30, 2024

@arkadeepsen: This pull request references CFE-1037 which is a valid jira issue.

In response to this:

This PR adds the support to provide port on which the CoreDNS is listening on as an input. RBAC permissions are added for getting events related to DNSNameResolver status updates. It also fixes the issue of deadlock condition in the resolver.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@arkadeepsen
Copy link
Contributor Author

/retest

@@ -149,6 +147,11 @@ func (resolver *Resolver) Add(
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return and the one on the line 138 are not protected any longer since you don't defer the unlock call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the unlock before the return.

Comment on lines 238 to 242
resolver.dnsLock.Unlock()

// Iterate through the dnsNameList slice.
for _, dnsName := range dnsNameList {
resolver.dnsLock.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be a context switch between the unlock and lock here. The state of the resolver can change in the mean time. Any reason you call Lock twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to release the lock before sending the signal on the delete channel. That's why I have unlocked before the loop and then taken the lock inside it. Probably the lock can be released before sending the signal to the channel and then again take the lock on return.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, we can create another slice of DNS names which needs sending signal to the delete channel. Once the changes have been made to the resolver.dnsNames then we can release the lock and send the signals one by one in a loop.

}

// Delete is called whenever a DNSNameResolver object is deleted.
func (resolver *Resolver) Delete(objName string) {
resolver.dnsLock.Lock()
defer resolver.dnsLock.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as in Add() about the unsafe returns down the code (github doesn't allow to put a comment there).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the unlock before the returns.

Add support to provide port on which the CoreDNS is listening on as an input. Add RBAC permissions for getting events related to DNSNameResolver status updates. Fix the issue of deadlock condition in the resolver.
Copy link
Contributor

@alebedev87 alebedev87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this design of the locking more, definitely . Looks clear, leaves little space for races. Thanks!

Just some nit picks.

@@ -163,12 +161,12 @@ func (r *reconciler) Reconcile(ctx context.Context, request reconcile.Request) (
// event then this will ensure that the resolver sends the resolution request
// for the DNS name, so that the status of the object gets updated with the
// corresponding IP addresses.
r.resolver.Add(dnsName, []ocpnetworkv1alpha1.DNSNameResolverResolvedAddress{}, matchesRegular, request.Name)
r.resolver.AddDNSNameResolverResolvedName(dnsName, []ocpnetworkv1alpha1.DNSNameResolverResolvedAddress{}, matchesRegular, request.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the name be more concise? Something like

Suggested change
r.resolver.AddDNSNameResolverResolvedName(dnsName, []ocpnetworkv1alpha1.DNSNameResolverResolvedAddress{}, matchesRegular, request.Name)
r.resolver.AddResolvedName(dnsName, []ocpnetworkv1alpha1.DNSNameResolverResolvedAddress{}, matchesRegular, request.Name)

Or even shorter:

Suggested change
r.resolver.AddDNSNameResolverResolvedName(dnsName, []ocpnetworkv1alpha1.DNSNameResolverResolvedAddress{}, matchesRegular, request.Name)
r.resolver.AddName(dnsName, []ocpnetworkv1alpha1.DNSNameResolverResolvedAddress{}, matchesRegular, request.Name)

@@ -129,7 +127,7 @@ func (r *reconciler) Reconcile(ctx context.Context, request reconcile.Request) (
// Check if the DNSNameResolver resource is deleted. If so, delete DNS names matching the DNSNameResolver resource
// from the resolver.
if errors.IsNotFound(err) {
r.resolver.Delete(request.Name)
r.resolver.DeleteDNSNameResolver(request.Name)
Copy link
Contributor

@alebedev87 alebedev87 May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be consistent in naming with the Add method? Something like

Suggested change
r.resolver.DeleteDNSNameResolver(request.Name)
r.resolver.DeleteResolvedName(request.Name)

Comment on lines 88 to 89
nextDNSName, nextLookupTime, numIPs, exists = resolver.add(addedResolvedAddressDetails.dnsName,
addedResolvedAddressDetails.resolvedAddresses, addedResolvedAddressDetails.matchesRegular, addedResolvedAddressDetails.objName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just pass the details struct down to add() instead of N args?

regularObjInfo map[string]string
wildcardObjInfo map[string]sets.Set[string]
dnsLock sync.Mutex

added chan struct{}
added chan resolvedAddressDetails
deleted chan string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about unification of 2 channels? Something like:

added chan dnsDetails
deleted chan dnsDetails

Where dnsDetails is renamed resolvedAddressDetails.

Then when you Add or Delete in controller you can do:

resolver.AddName(dnsDetials{dnsName, resolvedAddresses, matchesRegulr, objName})
resolved.DeleteName(dnsDetails{objName})

Copy link

openshift-ci bot commented May 7, 2024

@arkadeepsen: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@alebedev87
Copy link
Contributor

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 7, 2024
Copy link

openshift-ci bot commented May 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alebedev87

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 7, 2024
@arkadeepsen
Copy link
Contributor Author

@snarayan-redhat @melvinjoseph86 @CFields651 adding the labels as the changes in this PR will be imported into openshift/cluster-dns-operator#394

/label docs-approved
/label qe-approved
/label px-approved

@openshift-ci openshift-ci bot added docs-approved Signifies that Docs has signed off on this PR qe-approved Signifies that QE has signed off on this PR labels May 7, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented May 7, 2024

@arkadeepsen: This pull request references CFE-1037 which is a valid jira issue.

In response to this:

This PR adds the support to provide port on which the CoreDNS is listening on as an input. RBAC permissions are added for getting events related to DNSNameResolver status updates. It also fixes the issue of deadlock condition in the resolver.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the px-approved Signifies that Product Support has signed off on this PR label May 7, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 5fa57d9 into openshift:main May 7, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. docs-approved Signifies that Docs has signed off on this PR jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants