
Bug 1898417: GCP the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service #500

Merged

Merged 1 commit into openshift:master on Jan 13, 2021

Conversation

miheer (Contributor) commented Dec 1, 2020

GCP: the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service

@miheer miheer changed the title Bug 1898417 - GCP: the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service [WIP] Bug 1898417 - GCP: the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service Dec 1, 2020
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 1, 2020
@miheer miheer force-pushed the replace-dnsRecord branch 4 times, most recently from 537103f to b617bc6 Compare December 1, 2020 11:31
"google.golang.org/api/googleapi"
"net/http"
Contributor:

Standard Go libraries should be imported first, so I would undo this change.

miheer (Contributor, Author) Dec 4, 2020:

gofmt does not allow this; I think it orders the imports alphabetically.

sgreene570 (Contributor) Dec 4, 2020:

If you add the blank line back, sorting shouldn't matter, i.e.:

Suggested change
"net/http"
import (
	"context"
	"net/http"

	"google.golang.org/api/googleapi"
	...

Comment on lines 60 to 79
delete := p.Delete(record, zone)
if delete != nil {
return delete
}
Contributor:

Suggested change
	delete := p.Delete(record, zone)
	if delete != nil {
		return delete
	}
	if err := p.Delete(record, zone); err != nil {
		return err
	}

return delete
}
create := p.Ensure(record, zone)
if create != nil {
Contributor:

Same suggestion as above.
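
Taken together, the two suggestions amount to a Replace implementation along these lines (a minimal sketch, assuming Replace mirrors the existing Ensure and Delete signatures; illustrative, not the exact code in this PR):

func (p *Provider) Replace(record *iov1.DNSRecord, zone configv1.DNSZone) error {
	// Remove the stale record set first; propagate any error to the caller.
	if err := p.Delete(record, zone); err != nil {
		return err
	}
	// Re-create the record set with the current targets.
	return p.Ensure(record, zone)
}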

@miheer miheer force-pushed the replace-dnsRecord branch 4 times, most recently from 32d3c02 to af0ebd5 Compare December 15, 2020 09:37
openshift-merge-robot (Contributor) commented Dec 15, 2020

@miheer: The following test failed, say /retest to rerun all failed tests:

Test name: ci/prow/e2e-aws-operator
Commit: af0ebd5
Rerun command: /test e2e-aws-operator

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@miheer miheer changed the title [WIP] Bug 1898417 - GCP: the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service Bug 1898417 - GCP: the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service Dec 15, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 15, 2020
lihongan (Contributor):

/bugzilla cc-qa

openshift-ci-robot (Contributor):

@lihongan: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

/bugzilla cc-qa

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

lihongan (Contributor):

@miheer could you please update the title of this pull request to "Bug 1898417: xxxx" (bug ID followed by a colon)?

@miheer miheer changed the title Bug 1898417 - GCP: the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service Bug 1898417: GCP the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service Dec 16, 2020
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Dec 16, 2020
openshift-ci-robot (Contributor):

@miheer: This pull request references Bugzilla bug 1898417, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1898417: GCP the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

miheer (Contributor, Author) commented Dec 16, 2020

@miheer could you please update the title of this pull request to Bug 1898417: xxxx (bug ID followed by colon)

Done @lihongan

lihongan (Contributor):

/bugzilla cc-qa

openshift-ci-robot (Contributor):

@lihongan: This pull request references Bugzilla bug 1898417, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @lihongan

In response to this:

/bugzilla cc-qa

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@@ -2,6 +2,8 @@ package gcp

import (
"context"
"fmt"
"log"
Contributor:

Please don't import the log package; instead, import github.com/openshift/cluster-ingress-operator/pkg/log and declare a logger as the other DNS providers do:

import (
	// ...
	logf "github.com/openshift/cluster-ingress-operator/pkg/log"
)

var (
	_   dns.Provider = &Provider{}
	log              = logf.Logger.WithName("dns")
)

oldRecord := p.dnsService.ResourceRecordSets.List(p.config.Project, zone.ID).Name(record.Spec.DNSName)
if err := oldRecord.Pages(ctx, func(page *gdnsv1.ResourceRecordSetsListResponse) error {
for _, resourceRecordSet := range page.Rrsets {
log.Println(fmt.Printf("%#v\n", resourceRecordSet))
Contributor:

This can be deleted, or replaced with something like this: log.Info("found old DNS resource record set", "resourceRecordSet", resourceRecordSet).

}
return nil
}); err != nil {
log.Fatal(err)
Contributor:

Why is this fatal? Why not just return err?
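
A minimal sketch of returning the error instead, assuming the surrounding Pages call shown in the diff and the structured logger suggested above:

if err := oldRecord.Pages(ctx, func(page *gdnsv1.ResourceRecordSetsListResponse) error {
	for _, resourceRecordSet := range page.Rrsets {
		log.Info("found old DNS resource record set", "resourceRecordSet", resourceRecordSet)
	}
	return nil
}); err != nil {
	return err // propagate the error instead of terminating the operator process
}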

_, err := call.Do()
if ae, ok := err.(*googleapi.Error); ok && ae.Code == http.StatusNotFound {
return nil
}
Contributor:

You need to add a return err after this if block so that the anonymous function returns the error value, which the Pages method will then return to be handled below.
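
A sketch of the suggested fix inside the anonymous function, assuming call.Do() is the delete call shown in the diff:

_, err := call.Do()
if ae, ok := err.(*googleapi.Error); ok && ae.Code == http.StatusNotFound {
	return nil // the record set no longer exists, so there is nothing to delete
}
return err // any other error (or nil) is returned so Pages can surface it to the caller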

if record.Generation == record.Status.ObservedGeneration && recordIsAlreadyPublishedToZone(record, &zone) {
log.Info("skipping zone to which the DNS record is already published", "record", record.Spec, "dnszone", zone)
continue
}
Contributor:

Why move this block?

log.Info("replacing DNS record", "record", record.Spec, "dnszone", zone)

if err := r.dnsProvider.Replace(record, zone); err != nil {
log.Error(err, "failed to replace DNS record to zone", "record", record.Spec, "dnszone", zone)
Contributor:

Suggested change
log.Error(err, "failed to replace DNS record to zone", "record", record.Spec, "dnszone", zone)
log.Error(err, "failed to replace DNS record in zone", "record", record.Spec, "dnszone", zone)

Or just delete " to zone".

condition.Message = fmt.Sprintf("The DNS provider failed to replace the record: %v", err)
result.RequeueAfter = 30 * time.Second
} else {
log.Info("replaced DNS record to zone", "record", record.Spec, "dnszone", zone)
Contributor:

Suggested change
log.Info("replaced DNS record to zone", "record", record.Spec, "dnszone", zone)
log.Info("replaced DNS record in zone", "record", record.Spec, "dnszone", zone)

condition.Reason = "ProviderSuccess"
condition.Message = "The DNS provider succeeded in replacing the record"
}

Contributor:

You need to update statuses and then continue here to avoid executing the Ensure logic below (and failing). Alternatively, add an else block and pull the Ensure logic into it, which would be simpler and more readable:

		if record.Generation == record.Status.ObservedGeneration && recordIsAlreadyPublishedToZone(record, &zone) {
			log.Info("skipping zone to which the DNS record is already published", "record", record.Spec, "dnszone", zone)
			continue
		}

		condition := iov1.DNSZoneCondition{
			Status:             string(operatorv1.ConditionUnknown),
			Type:               iov1.DNSRecordFailedConditionType,
			LastTransitionTime: metav1.Now(),
		}
		if recordIsAlreadyPublishedToZone(record, &zone) {
			if err := r.dnsProvider.Replace(record, zone); err != nil {
				log.Error(err, "failed to replace DNS record", "record", record.Spec, "dnszone", zone)
				condition.Status = string(operatorv1.ConditionTrue)
				condition.Reason = "ProviderError"
				condition.Message = fmt.Sprintf("The DNS provider failed to replace the record: %v", err)
				result.RequeueAfter = 30 * time.Second
			} else {
				log.Info("replaced DNS record", "record", record.Spec, "dnszone", zone)
				condition.Status = string(operatorv1.ConditionFalse)
				condition.Reason = "ProviderSuccess"
				condition.Message = "The DNS provider succeeded in replacing the record"
			}
		} else {
			if err := r.dnsProvider.Ensure(record, zone); err != nil {
				log.Error(err, "failed to publish DNS record to zone", "record", record.Spec, "dnszone", zone)
				condition.Status = string(operatorv1.ConditionTrue)
				condition.Reason = "ProviderError"
				condition.Message = fmt.Sprintf("The DNS provider failed to ensure the record: %v", err)
				result.RequeueAfter = 30 * time.Second
			} else {
				log.Info("published DNS record to zone", "record", record.Spec, "dnszone", zone)
				condition.Status = string(operatorv1.ConditionFalse)
				condition.Reason = "ProviderSuccess"
				condition.Message = "The DNS provider succeeded in ensuring the record"
			}
		}
		statuses = append(statuses, iov1.DNSZoneStatus{
			DNSZone:    zone,
			Conditions: []iov1.DNSZoneCondition{condition},
		})


condition := iov1.DNSZoneCondition{
Status: string(operatorv1.ConditionUnknown),
Type: iov1.DNSRecordFailedConditionType,
LastTransitionTime: metav1.Now(),
}

if record.Generation != record.Status.ObservedGeneration && recordIsAlreadyPublishedToZone(record, &zone) {
log.Info("replacing DNS record", "record", record.Spec, "dnszone", zone)
Contributor:

Probably don't need this log statement.

miheer (Contributor, Author):

OK, I just added it so that the logs show when the replace action is taken.

pkg/dns/dns.go Outdated
@@ -13,11 +13,15 @@ type Provider interface {

// Delete will delete record.
Delete(record *iov1.DNSRecord, zone configv1.DNSZone) error

//Replace will replace the record
Contributor:

Suggested change
//Replace will replace the record
// Replace will replace the record
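
With the suggestion applied, the interface would read roughly as follows (a sketch; the Ensure doc comment and the Replace signature are assumed to mirror the Delete method shown in the diff):

type Provider interface {
	// Ensure will create or update record.
	Ensure(record *iov1.DNSRecord, zone configv1.DNSZone) error

	// Delete will delete record.
	Delete(record *iov1.DNSRecord, zone configv1.DNSZone) error

	// Replace will replace the record.
	Replace(record *iov1.DNSRecord, zone configv1.DNSZone) error
}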

@@ -13,10 +13,13 @@ import (

gdnsv1 "google.golang.org/api/dns/v1"
"google.golang.org/api/option"

logf "github.com/openshift/cluster-ingress-operator/pkg/log"
Contributor:

This can be grouped with "github.com/openshift/cluster-ingress-operator/pkg/dns".
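
A sketch of the suggested grouping, with standard-library imports first and the openshift/cluster-ingress-operator packages together (other imports in the file elided):

import (
	// standard-library imports ...

	gdnsv1 "google.golang.org/api/dns/v1"
	"google.golang.org/api/option"

	"github.com/openshift/cluster-ingress-operator/pkg/dns"
	logf "github.com/openshift/cluster-ingress-operator/pkg/log"
)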

condition.Reason = "ProviderError"
condition.Message = fmt.Sprintf("The DNS provider failed to ensure the record: %v", err)
result.RequeueAfter = 30 * time.Second
if record.Generation != record.Status.ObservedGeneration && recordIsAlreadyPublishedToZone(record, &zone) {
Contributor:

We check record.Generation == record.Status.ObservedGeneration && recordIsAlreadyPublishedToZone(record, &zone) above and return if that condition is true, so we can simplify this condition as follows:

Suggested change
if record.Generation != record.Status.ObservedGeneration && recordIsAlreadyPublishedToZone(record, &zone) {
if recordIsAlreadyPublishedToZone(record, &zone) {

miheer (Contributor, Author):

So we don't need to check the Generation? @Miciah

miheer (Contributor, Author):

Is it because we already checked that condition on line 254? @Miciah

Comment on lines 281 to 283
} else if record.Generation == record.Status.ObservedGeneration && recordIsAlreadyPublishedToZone(record, &zone) {
log.Info("skipping zone to which the DNS record is already published", "record", record.Spec, "dnszone", zone)
continue
Contributor:

The else if block is redundant and can be deleted.

miheer (Contributor, Author) commented Jan 8, 2021

@Miciah Here is what I did: when "oc delete svc router-default -n openshift-ingress" hung, I removed the finalizer from the service; the command then completed and the external IP was assigned.

Also, this issue does not seem to be related to this PR. I deleted the router-default service from a cluster that did not have this PR's fix, and that deletion also hung; I had to remove the finalizer from the service to get the hung delete command to complete. After that, the external IP was assigned.

Miciah (Contributor) commented Jan 8, 2021

Error syncing load balancer: failed to add load balancer cleanup finalizer: Service "router-default" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"service.kubernetes.io/load-balancer-cleanup"}\nThe kube-controller-manager logs may contain more details.)

That does look like a new issue. The service controller adds the finalizer, but it should be adding it when the service is created. How quickly did you delete the service after it was created? Can you check the service controller logs for clues? This likely warrants a new Bugzilla report.

miheer (Contributor, Author) commented Jan 8, 2021

miheer (Contributor, Author) commented Jan 12, 2021

/retest

miheer (Contributor, Author) commented Jan 12, 2021

@Miciah I think we don't have any control over the Kubernetes service-level code, so before deleting the service we need to delete the finalizers and then perform the delete action.

Shall we close this BZ? https://bugzilla.redhat.com/show_bug.cgi?id=1914127

miheer (Contributor, Author) commented Jan 12, 2021

@Miciah also, can you approve this PR?

Miciah (Contributor) commented Jan 12, 2021

The last couple of updates fixed the only remaining outstanding feedback, so
/lgtm
Thanks!

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 12, 2021
openshift-bot (Contributor):

/retest

Please review the full test history for this PR and help us cut down flakes.

6 similar comments

miheer (Contributor, Author) commented Jan 12, 2021

/retest

Miciah (Contributor) commented Jan 13, 2021

/test ?

openshift-ci-robot (Contributor):

@Miciah: The following commands are available to trigger jobs:

  • /test e2e-aws
  • /test e2e-aws-operator
  • /test e2e-azure
  • /test e2e-upgrade
  • /test images
  • /test unit
  • /test verify

Use /test all to run the following jobs:

  • pull-ci-openshift-cluster-ingress-operator-master-e2e-aws
  • pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator
  • pull-ci-openshift-cluster-ingress-operator-master-e2e-upgrade
  • pull-ci-openshift-cluster-ingress-operator-master-images
  • pull-ci-openshift-cluster-ingress-operator-master-unit
  • pull-ci-openshift-cluster-ingress-operator-master-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

miheer (Contributor, Author) commented Jan 13, 2021

/retest

lihongan (Contributor):

/lgtm

openshift-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lihongan, Miciah, miheer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit c5d307f into openshift:master Jan 13, 2021
openshift-ci-robot (Contributor):

@miheer: Bugzilla bug 1898417 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state.

In response to this:

Bug 1898417: GCP the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/severity-high: Referenced Bugzilla bug's severity is high for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.

7 participants