
Bug 1898417: GCP the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service #500

Merged

Merged 1 commit into openshift:master on Jan 13, 2021

Conversation

miheer (Contributor) commented Dec 1, 2020

GCP: the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service

@miheer miheer changed the title Bug 1898417 - GCP: the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service [WIP] Bug 1898417 - GCP: the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service Dec 1, 2020
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 1, 2020
@miheer miheer force-pushed the replace-dnsRecord branch 4 times, most recently from 537103f to b617bc6 Compare December 1, 2020 11:31
"google.golang.org/api/googleapi"
"net/http"
Contributor:

Standard Go libraries should be imported first, so I would undo this change.

miheer (Contributor, Author) Dec 4, 2020:

gofmt does not allow this; I think it orders the imports alphabetically.

sgreene570 (Contributor) Dec 4, 2020:

If you add the blank line back, sorting shouldn't matter, i.e.:

Suggested change
"net/http"
import (
	"context"
	"net/http"

	"google.golang.org/api/googleapi"
	...

Comment on lines 60 to 79
delete := p.Delete(record, zone)
if delete != nil {
return delete
}
Contributor:

Suggested change
	delete := p.Delete(record, zone)
	if delete != nil {
		return delete
	}
	if err := p.Delete(record, zone); err != nil {
		return err
	}

return delete
}
create := p.Ensure(record, zone)
if create != nil {
Contributor:

Same suggestion as above.
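
Taken together, the two suggestions amount to a Replace implementation along these lines (a minimal sketch, assuming Replace mirrors the existing Ensure and Delete signatures; illustrative, not the exact code in this PR):

func (p *Provider) Replace(record *iov1.DNSRecord, zone configv1.DNSZone) error {
	// Remove the stale record set first; propagate any error to the caller.
	if err := p.Delete(record, zone); err != nil {
		return err
	}
	// Re-create the record set with the current targets.
	return p.Ensure(record, zone)
}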

@miheer miheer force-pushed the replace-dnsRecord branch 4 times, most recently from 32d3c02 to af0ebd5 Compare December 15, 2020 09:37
openshift-merge-robot (Contributor) commented Dec 15, 2020

@miheer: The following test failed, say /retest to rerun all failed tests:

Test name: ci/prow/e2e-aws-operator
Commit: af0ebd5
Rerun command: /test e2e-aws-operator

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@miheer miheer changed the title [WIP] Bug 1898417 - GCP: the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service Bug 1898417 - GCP: the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service Dec 15, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 15, 2020
lihongan (Contributor):

/bugzilla cc-qa

openshift-ci-robot (Contributor):

@lihongan: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

/bugzilla cc-qa

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

lihongan (Contributor):

@miheer could you please update the title of this pull request to "Bug 1898417: xxxx" (bug ID followed by a colon)?

@miheer miheer changed the title Bug 1898417 - GCP: the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service Bug 1898417: GCP the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service Dec 16, 2020
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Dec 16, 2020
openshift-ci-robot (Contributor):

@miheer: This pull request references Bugzilla bug 1898417, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1898417: GCP the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

miheer (Contributor, Author) commented Dec 16, 2020

@miheer could you please update the title of this pull request to Bug 1898417: xxxx (bug ID followed by colon)

Done @lihongan

lihongan (Contributor):

/bugzilla cc-qa

openshift-ci-robot (Contributor):

@lihongan: This pull request references Bugzilla bug 1898417, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @lihongan

In response to this:

/bugzilla cc-qa

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@@ -2,6 +2,8 @@ package gcp

import (
"context"
"fmt"
"log"
Contributor:

Please don't import the log package; instead, import github.com/openshift/cluster-ingress-operator/pkg/log and declare a logger as the other DNS providers do:

import (
	// ...
	logf "github.com/openshift/cluster-ingress-operator/pkg/log"
)

var (
	_   dns.Provider = &Provider{}
	log              = logf.Logger.WithName("dns")
)

oldRecord := p.dnsService.ResourceRecordSets.List(p.config.Project, zone.ID).Name(record.Spec.DNSName)
if err := oldRecord.Pages(ctx, func(page *gdnsv1.ResourceRecordSetsListResponse) error {
for _, resourceRecordSet := range page.Rrsets {
log.Println(fmt.Printf("%#v\n", resourceRecordSet))
Contributor:

This can be deleted, or replaced with something like this: log.Info("found old DNS resource record set", "resourceRecordSet", resourceRecordSet).

}
return nil
}); err != nil {
log.Fatal(err)
Contributor:

Why is this fatal? Why not just return err?
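
A minimal sketch of returning the error instead, assuming the surrounding Pages call shown in the diff and the structured logger suggested above:

if err := oldRecord.Pages(ctx, func(page *gdnsv1.ResourceRecordSetsListResponse) error {
	for _, resourceRecordSet := range page.Rrsets {
		log.Info("found old DNS resource record set", "resourceRecordSet", resourceRecordSet)
	}
	return nil
}); err != nil {
	return err // propagate the error instead of terminating the operator process
}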

_, err := call.Do()
if ae, ok := err.(*googleapi.Error); ok && ae.Code == http.StatusNotFound {
return nil
}
Contributor:

You need to add a return err after this if block so that the anonymous function returns the error value, which the Pages method will then return to be handled below.
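
A sketch of the suggested fix inside the anonymous function, assuming call.Do() is the delete call shown in the diff:

_, err := call.Do()
if ae, ok := err.(*googleapi.Error); ok && ae.Code == http.StatusNotFound {
	return nil // the record set no longer exists, so there is nothing to delete
}
return err // any other error (or nil) is returned so Pages can surface it to the caller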

if record.Generation == record.Status.ObservedGeneration && recordIsAlreadyPublishedToZone(record, &zone) {
log.Info("skipping zone to which the DNS record is already published", "record", record.Spec, "dnszone", zone)
continue
}
Contributor:

Why move this block?

log.Info("replacing DNS record", "record", record.Spec, "dnszone", zone)

if err := r.dnsProvider.Replace(record, zone); err != nil {
log.Error(err, "failed to replace DNS record to zone", "record", record.Spec, "dnszone", zone)
Contributor:

Suggested change
log.Error(err, "failed to replace DNS record to zone", "record", record.Spec, "dnszone", zone)
log.Error(err, "failed to replace DNS record in zone", "record", record.Spec, "dnszone", zone)

Or just delete " to zone".

condition.Message = fmt.Sprintf("The DNS provider failed to replace the record: %v", err)
result.RequeueAfter = 30 * time.Second
} else {
log.Info("replaced DNS record to zone", "record", record.Spec, "dnszone", zone)
Contributor:

Suggested change
log.Info("replaced DNS record to zone", "record", record.Spec, "dnszone", zone)
log.Info("replaced DNS record in zone", "record", record.Spec, "dnszone", zone)

condition.Reason = "ProviderSuccess"
condition.Message = "The DNS provider succeeded in replacing the record"
}

Contributor:

You need to update statuses and then continue here to avoid executing the Ensure logic below (and failing). Alternatively, add an else block and pull the Ensure logic into it, which would be simpler and more readable:

		if record.Generation == record.Status.ObservedGeneration && recordIsAlreadyPublishedToZone(record, &zone) {
			log.Info("skipping zone to which the DNS record is already published", "record", record.Spec, "dnszone", zone)
			continue
		}

		condition := iov1.DNSZoneCondition{
			Status:             string(operatorv1.ConditionUnknown),
			Type:               iov1.DNSRecordFailedConditionType,
			LastTransitionTime: metav1.Now(),
		}
		if recordIsAlreadyPublishedToZone(record, &zone) {
			if err := r.dnsProvider.Replace(record, zone); err != nil {
				log.Error(err, "failed to replace DNS record", "record", record.Spec, "dnszone", zone)
				condition.Status = string(operatorv1.ConditionTrue)
				condition.Reason = "ProviderError"
				condition.Message = fmt.Sprintf("The DNS provider failed to replace the record: %v", err)
				result.RequeueAfter = 30 * time.Second
			} else {
				log.Info("replaced DNS record", "record", record.Spec, "dnszone", zone)
				condition.Status = string(operatorv1.ConditionFalse)
				condition.Reason = "ProviderSuccess"
				condition.Message = "The DNS provider succeeded in replacing the record"
			}
		} else {
			if err := r.dnsProvider.Ensure(record, zone); err != nil {
				log.Error(err, "failed to publish DNS record to zone", "record", record.Spec, "dnszone", zone)
				condition.Status = string(operatorv1.ConditionTrue)
				condition.Reason = "ProviderError"
				condition.Message = fmt.Sprintf("The DNS provider failed to ensure the record: %v", err)
				result.RequeueAfter = 30 * time.Second
			} else {
				log.Info("published DNS record to zone", "record", record.Spec, "dnszone", zone)
				condition.Status = string(operatorv1.ConditionFalse)
				condition.Reason = "ProviderSuccess"
				condition.Message = "The DNS provider succeeded in ensuring the record"
			}
		}
		statuses = append(statuses, iov1.DNSZoneStatus{
			DNSZone:    zone,
			Conditions: []iov1.DNSZoneCondition{condition},
		})


condition := iov1.DNSZoneCondition{
Status: string(operatorv1.ConditionUnknown),
Type: iov1.DNSRecordFailedConditionType,
LastTransitionTime: metav1.Now(),
}

if record.Generation != record.Status.ObservedGeneration && recordIsAlreadyPublishedToZone(record, &zone) {
log.Info("replacing DNS record", "record", record.Spec, "dnszone", zone)
Contributor:

Probably don't need this log statement.

miheer (Contributor, Author):

OK, I just added it so that the logs show when the replace action is taken.

pkg/dns/dns.go Outdated
@@ -13,11 +13,15 @@ type Provider interface {

// Delete will delete record.
Delete(record *iov1.DNSRecord, zone configv1.DNSZone) error

//Replace will replace the record
Contributor:

Suggested change
//Replace will replace the record
// Replace will replace the record
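
With the suggestion applied, the interface would read roughly as follows (a sketch; the Ensure doc comment and the Replace signature are assumed to mirror the Delete method shown in the diff):

type Provider interface {
	// Ensure will create or update record.
	Ensure(record *iov1.DNSRecord, zone configv1.DNSZone) error

	// Delete will delete record.
	Delete(record *iov1.DNSRecord, zone configv1.DNSZone) error

	// Replace will replace the record.
	Replace(record *iov1.DNSRecord, zone configv1.DNSZone) error
}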

@@ -13,10 +13,13 @@ import (

gdnsv1 "google.golang.org/api/dns/v1"
"google.golang.org/api/option"

logf "github.com/openshift/cluster-ingress-operator/pkg/log"
Contributor:

This can be grouped with "github.com/openshift/cluster-ingress-operator/pkg/dns".
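
A sketch of the suggested grouping, with standard-library imports first and the openshift/cluster-ingress-operator packages together (other imports in the file elided):

import (
	// standard-library imports ...

	gdnsv1 "google.golang.org/api/dns/v1"
	"google.golang.org/api/option"

	"github.com/openshift/cluster-ingress-operator/pkg/dns"
	logf "github.com/openshift/cluster-ingress-operator/pkg/log"
)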

condition.Reason = "ProviderError"
condition.Message = fmt.Sprintf("The DNS provider failed to ensure the record: %v", err)
result.RequeueAfter = 30 * time.Second
if record.Generation != record.Status.ObservedGeneration && recordIsAlreadyPublishedToZone(record, &zone) {
Contributor:

We check record.Generation == record.Status.ObservedGeneration && recordIsAlreadyPublishedToZone(record, &zone) above and return if that condition is true, so we can simplify this condition as follows:

Suggested change
if record.Generation != record.Status.ObservedGeneration && recordIsAlreadyPublishedToZone(record, &zone) {
if recordIsAlreadyPublishedToZone(record, &zone) {

miheer (Contributor, Author):

So we don't need to check the Generation? @Miciah

miheer (Contributor, Author):

Is it because we already checked that condition on line 254? @Miciah

Comment on lines 281 to 283
} else if record.Generation == record.Status.ObservedGeneration && recordIsAlreadyPublishedToZone(record, &zone) {
log.Info("skipping zone to which the DNS record is already published", "record", record.Spec, "dnszone", zone)
continue
Contributor:

The else if block is redundant and can be deleted.

miheer (Contributor, Author) commented Jan 8, 2021

@Miciah Here is what I did: when "oc delete svc router-default -n openshift-ingress" hung, I removed the finalizer from the service; the command then completed and the external IP was assigned.

Also, this issue does not seem to be related to this PR. I deleted the router-default service from a cluster that did not have this PR's fix, and that deletion also hung; I had to remove the finalizer from the service to get the hung delete command to complete. After that, the external IP was assigned.

Miciah (Contributor) commented Jan 8, 2021

Error syncing load balancer: failed to add load balancer cleanup finalizer: Service "router-default" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"service.kubernetes.io/load-balancer-cleanup"}\nThe kube-controller-manager logs may contain more details.)

That does look like a new issue. The service controller adds the finalizer, but it should be adding it when the service is created. How quickly did you delete the service after it was created? Can you check the service controller logs for clues? This likely warrants a new Bugzilla report.

miheer (Contributor, Author) commented Jan 8, 2021

miheer (Contributor, Author) commented Jan 12, 2021

/retest

miheer (Contributor, Author) commented Jan 12, 2021

@Miciah I think we don't have any control over the Kubernetes service-level code, so before deleting the service we need to delete the finalizers and then perform the delete action.

Shall we close this BZ? https://bugzilla.redhat.com/show_bug.cgi?id=1914127

miheer (Contributor, Author) commented Jan 12, 2021

@Miciah also, can you approve this PR?

Miciah (Contributor) commented Jan 12, 2021

The last couple of updates fixed the only remaining outstanding feedback, so
/lgtm
Thanks!

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 12, 2021
openshift-bot (Contributor):

/retest

Please review the full test history for this PR and help us cut down flakes.

6 similar comments

miheer (Contributor, Author) commented Jan 12, 2021

/retest

Miciah (Contributor) commented Jan 13, 2021

/test ?

openshift-ci-robot (Contributor):

@Miciah: The following commands are available to trigger jobs:

  • /test e2e-aws
  • /test e2e-aws-operator
  • /test e2e-azure
  • /test e2e-upgrade
  • /test images
  • /test unit
  • /test verify

Use /test all to run the following jobs:

  • pull-ci-openshift-cluster-ingress-operator-master-e2e-aws
  • pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator
  • pull-ci-openshift-cluster-ingress-operator-master-e2e-upgrade
  • pull-ci-openshift-cluster-ingress-operator-master-images
  • pull-ci-openshift-cluster-ingress-operator-master-unit
  • pull-ci-openshift-cluster-ingress-operator-master-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

miheer (Contributor, Author) commented Jan 13, 2021

/retest

lihongan (Contributor):

/lgtm

openshift-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lihongan, Miciah, miheer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit c5d307f into openshift:master Jan 13, 2021
openshift-ci-robot (Contributor):

@miheer: Bugzilla bug 1898417 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state.

In response to this:

Bug 1898417: GCP the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/severity-high: Referenced Bugzilla bug's severity is high for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.

7 participants