Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: bubble up kcert status message when it's failed #14962

Merged
merged 1 commit into from Apr 23, 2024

Conversation

ckcd
Copy link
Contributor

@ckcd ckcd commented Feb 29, 2024

Fixes #14530

Proposed Changes

  • KCert Status Message gets bubbled up to its parent Route when it's failed

Release Note

Certificate generation errors are bubbled up to its parent Route.

Copy link

knative-prow bot commented Feb 29, 2024

Welcome @ckcd! It looks like this is your first PR to knative/serving 🎉

Copy link

knative-prow bot commented Feb 29, 2024

Hi @ckcd. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow knative-prow bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 29, 2024
@knative-prow knative-prow bot added area/API API objects and controllers area/networking labels Feb 29, 2024
@ckcd ckcd force-pushed the ck2402_bubble_up_kcert_failed branch from 0ea19a9 to 790f52d Compare February 29, 2024 06:27
@ckcd
Copy link
Contributor Author

ckcd commented Feb 29, 2024

@ReToCode @gabo1208 PTAL, thanks.

@ckcd ckcd force-pushed the ck2402_bubble_up_kcert_failed branch 2 times, most recently from 29ca43c to 2fc98c9 Compare February 29, 2024 07:52
@izabelacg
Copy link
Member

/ok-to-test

1 similar comment
@Cali0707
Copy link
Member

Cali0707 commented Mar 5, 2024

/ok-to-test

@knative-prow knative-prow bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 5, 2024
Copy link

codecov bot commented Mar 5, 2024

Codecov Report

Attention: Patch coverage is 69.23077% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 84.95%. Comparing base (c2d0af1) to head (8d86ca3).
Report is 62 commits behind head on main.

Files Patch % Lines
pkg/reconciler/route/route.go 50.00% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14962      +/-   ##
==========================================
+ Coverage   84.11%   84.95%   +0.84%     
==========================================
  Files         213      213              
  Lines       16783    13115    -3668     
==========================================
- Hits        14117    11142    -2975     
+ Misses       2315     1618     -697     
- Partials      351      355       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ckcd ckcd force-pushed the ck2402_bubble_up_kcert_failed branch from 2fc98c9 to fefef66 Compare March 6, 2024 15:35
@ckcd
Copy link
Contributor Author

ckcd commented Mar 11, 2024

@izabelacg @Cali0707 @ReToCode Could please help to look at this PR? thanks.

Status: "False",
Reason: "CommonName Too Long",
Status: "Unknown",
Reason: "The ready condition of Cert Manager Certificate does not exist.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message does not say much (not saying the old one was much better).

@@ -451,8 +451,8 @@ func TestCertificateNotReadyWithBubbledUpMessage(t *testing.T) {
Conditions: duckv1.Conditions{
{
Type: "Ready",
Status: "False",
Reason: "CommonName Too Long",
Status: "Unknown",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why unknown instead of false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to set up different tests for different states.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I was also confused because the reason message here was moved to the other tests. What about extracting the message outside of the test and re-use it in both tests? It seems like you want to test for the actual states (failed/ready) and the message just has to be there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ReToCode sure, will do this change. Thanks for your idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, way better now.


apistest.CheckConditionFailed(r, RouteConditionCertificateProvisioned, t)
}

func TestCertificateFailedWithBubbledUpMessage(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I get the difference between these two tests. We mark the certificate provisioning as failed in both cases. Could you please elaborate the idea behind it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @gabo1208 said in #14530, the difference is Status=Unkown is a status in progress which means it's just not ready for now, and the Status=False is a final state with failure.

@ReToCode
Copy link
Member

As API is also part of this:

/cc @dprotaso

@knative-prow knative-prow bot requested a review from dprotaso March 11, 2024 10:38
@ckcd ckcd force-pushed the ck2402_bubble_up_kcert_failed branch from fefef66 to 24f0f45 Compare March 12, 2024 05:42
@knative-prow knative-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 12, 2024
Copy link
Member

@ReToCode ReToCode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve
/hold for @dprotaso

@knative-prow knative-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 12, 2024
@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Mar 12, 2024
@ckcd ckcd force-pushed the ck2402_bubble_up_kcert_failed branch from 24f0f45 to 95adad8 Compare March 12, 2024 06:55
@knative-prow knative-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 12, 2024
@ckcd
Copy link
Contributor Author

ckcd commented Mar 12, 2024

Seems test failed because of data race for some irrelevant tests.
https://github.com/knative/serving/actions/runs/8244156246/job/22547213240?pr=14962

I will rebase code and re-trigger the tests.

--

Now all successful.

@ckcd
Copy link
Contributor Author

ckcd commented Mar 21, 2024

@dprotaso could you please take a look at this? thanks.

@dprotaso
Copy link
Member

dprotaso commented Mar 26, 2024

closing/reopening so that the PR runs against the latest commit in main (not this is only for github actions - prow is smart about this)

@dprotaso dprotaso closed this Mar 26, 2024
@dprotaso dprotaso reopened this Mar 26, 2024
@ckcd ckcd force-pushed the ck2402_bubble_up_kcert_failed branch from 95adad8 to e7288b1 Compare March 27, 2024 01:27
@ckcd
Copy link
Contributor Author

ckcd commented Mar 27, 2024

@dprotaso seems all ut flow passed now. please check it, thanks.

@ReToCode
Copy link
Member

/assign @dprotaso

@ckcd
Copy link
Contributor Author

ckcd commented Mar 27, 2024

/retest

Copy link
Member

@dprotaso dprotaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes look good - just a few minor things

Comment on lines 440 to 446
}{
{
name: "Ready with empty message",
cert: &netv1alpha1.Certificate{},
status: corev1.ConditionTrue,
wantMessage: "",
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind compacting this to match the list style we use in other parts of the repo

see:

isReady bool
}{{
name: "empty status should not be ready",
status: RouteStatus{},
isReady: false,
}, {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, thanks for your suggestion. will do this change.

pkg/reconciler/route/route.go Show resolved Hide resolved
@ckcd ckcd force-pushed the ck2402_bubble_up_kcert_failed branch from e7288b1 to 250ab9c Compare March 28, 2024 10:14
@ckcd
Copy link
Contributor Author

ckcd commented Mar 28, 2024

@dprotaso Thanks for your reply! I made some changes according to your idea, please take a look. Thx~

Copy link
Member

@dprotaso dprotaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the test - some comments

Comment on lines 3159 to 3179
WantCreates: []runtime.Object{
ingressWithTLS(
Route("default", "becomes-ready", WithConfigTarget("config"), WithURL,
WithRouteUID("12-34"), WithRouteGeneration(1)),
&traffic.Config{
Targets: map[string]traffic.RevisionTargets{
traffic.DefaultTarget: {{
TrafficTarget: v1.TrafficTarget{
LatestRevision: ptr.Bool(true),
ConfigurationName: "config",
RevisionName: "config-00001",
Percent: ptr.Int64(100),
},
}},
},
}, nil, nil),
simpleK8sService(
Route("default", "becomes-ready", WithConfigTarget("config"), WithRouteUID("12-34")),
WithExternalName("becomes-ready.default.example.com"),
),
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't expect creates in this scenario since we just want to test route propagation. I'd suggest moving these to the Objects list

Comment on lines 3181 to 3183
Object: certificateWithStatus(resources.MakeCertificates(Route("default", "becomes-ready", WithConfigTarget("config"), WithURL, WithRouteUID("12-34")),
map[string]string{"becomes-ready.default.example.com": ""}, netcfg.CertManagerCertificateClassName, "example.com")[0], failedCertStatus()),
}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certificate shouldn't be updated. What's in Objects should be the updated certificate

Comment on lines 3185 to 3196
Object: Route("default", "becomes-ready", WithConfigTarget("config"),
WithRouteUID("12-34"), WithRouteGeneration(1), WithRouteObservedGeneration,
// Populated by reconciliation when all traffic has been assigned.
WithAddress, WithRouteConditionsHTTPDowngrade,
MarkTrafficAssigned, MarkIngressNotConfigured, WithStatusTraffic(
v1.TrafficTarget{
RevisionName: "config-00001",
Percent: ptr.Int64(100),
LatestRevision: ptr.Bool(true),
}), MarkIngressNotConfigured,
// The certificate is not ready. So we want to have HTTP URL.
WithURL),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The route should be in failed state but I don't see that here

Comment on lines 3205 to 3209
// This certificate's DNS name is not the host name needed by the input Route.
ActionImpl: clientgotesting.ActionImpl{
Namespace: "default",
Verb: "delete",
Resource: netv1alpha1.SchemeGroupVersion.WithResource("certificates"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This action (deleting the certificate seems unexpected) I'm wondering if we are setting up the test correctly. I wouldn't expect the certificate to be deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dprotaso actually for this check that Route is correctly updated when Certificate is not ready test case, it also has above WantUpdates, WantDeletes and so on.

And for both not ready and failed in this table_test.go it's hard to check the route status = notReady/failed because in this line it will override the condition with HTTPDowngrade info. That's because we use HTTPProtocol=HTTPEnabled when init NewTestReconciler.

Copy link
Member

@dprotaso dprotaso Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually for this check that Route is correctly updated when Certificate is not ready test case, it also has above WantUpdates, WantDeletes and so on.

Yeah two things - some of the fixtures are off. Meaning they don't really reflect the state a resource can be in. Secondly, each test case varies from whichi point in time a reconciliation occurs so the starting fixtures can be different.

In the scenario where the certificate provisioning has failed. I would expect we were already in a a 'steady state' where all the objects have been created and updated but the only the certificate has been marked as 'failed'

Thus to patch your code the fixtures for this scenario would be

diff --git a/pkg/reconciler/route/table_test.go b/pkg/reconciler/route/table_test.go
index 52080a089..809690182 100644
--- a/pkg/reconciler/route/table_test.go
+++ b/pkg/reconciler/route/table_test.go
@@ -3130,10 +3130,47 @@ func TestReconcileEnableExternalDomainTLS(t *testing.T) {
 	}, {
 		Name: "check that Route is correctly updated when Certificate is failed",
 		Objects: []runtime.Object{
-			Route("default", "becomes-ready", WithConfigTarget("config"), WithRouteUID("12-34"), WithRouteGeneration(1)),
+			Route("default", "becomes-ready", WithConfigTarget("config"), WithRouteUID("12-34"), WithRouteGeneration(1),
+				// Populated by reconciliation when all traffic has been assigned.
+				WithAddress,
+				MarkTrafficAssigned,
+				WithStatusTraffic(
+					v1.TrafficTarget{
+						RevisionName:   "config-00001",
+						Percent:        ptr.Int64(100),
+						LatestRevision: ptr.Bool(true),
+					}),
+				// The certificate is not ready. So we want to have HTTP URL.
+				WithInitRouteConditions,
+				MarkTrafficAssigned,
+				WithRouteObservedGeneration,
+				MarkCertificateNotReady,
+				WithAddress,
+				MarkIngressReady,
+				WithURL,
+			),
 			cfg("default", "config",
 				WithConfigGeneration(1), WithLatestCreated("config-00001"), WithLatestReady("config-00001")),
 			rev("default", "config", 1, MarkRevisionReady, WithRevName("config-00001")),
+			simpleK8sService(
+				Route("default", "becomes-ready", WithConfigTarget("config"), WithRouteUID("12-34")),
+				WithExternalName(pkgnet.GetServiceHostname("private-istio-ingressgateway", "istio-system")),
+			),
+			ingressWithTLS(
+				Route("default", "becomes-ready", WithConfigTarget("config"), WithURL,
+					WithRouteUID("12-34"), WithRouteGeneration(1)),
+				&traffic.Config{
+					Targets: map[string]traffic.RevisionTargets{
+						traffic.DefaultTarget: {{
+							TrafficTarget: v1.TrafficTarget{
+								LatestRevision:    ptr.Bool(true),
+								ConfigurationName: "config",
+								RevisionName:      "config-00001",
+								Percent:           ptr.Int64(100),
+							},
+						}},
+					},
+				}, nil, nil, withReadyIngress),
 			// MakeCertificates will create a certificate with DNS name "*.test-ns.example.com" which is not the host name
 			// needed by the input Route.
 			&netv1alpha1.Certificate{
@@ -3151,64 +3188,35 @@ func TestReconcileEnableExternalDomainTLS(t *testing.T) {
 					},
 				},
 				Spec: netv1alpha1.CertificateSpec{
-					DNSNames: []string{"abc.test.example.com"},
+					Domain:     "example.com",
+					DNSNames:   []string{"becomes-ready.default.example.com"},
+					SecretName: "route-12-34",
 				},
 				Status: failedCertStatus(),
 			},
 		},
-		WantCreates: []runtime.Object{
-			ingressWithTLS(
-				Route("default", "becomes-ready", WithConfigTarget("config"), WithURL,
-					WithRouteUID("12-34"), WithRouteGeneration(1)),
-				&traffic.Config{
-					Targets: map[string]traffic.RevisionTargets{
-						traffic.DefaultTarget: {{
-							TrafficTarget: v1.TrafficTarget{
-								LatestRevision:    ptr.Bool(true),
-								ConfigurationName: "config",
-								RevisionName:      "config-00001",
-								Percent:           ptr.Int64(100),
-							},
-						}},
-					},
-				}, nil, nil),
-			simpleK8sService(
-				Route("default", "becomes-ready", WithConfigTarget("config"), WithRouteUID("12-34")),
-				WithExternalName("becomes-ready.default.example.com"),
-			),
-		},
-		WantUpdates: []clientgotesting.UpdateActionImpl{{
-			Object: certificateWithStatus(resources.MakeCertificates(Route("default", "becomes-ready", WithConfigTarget("config"), WithURL, WithRouteUID("12-34")),
-				map[string]string{"becomes-ready.default.example.com": ""}, netcfg.CertManagerCertificateClassName, "example.com")[0], failedCertStatus()),
-		}},
 		WantStatusUpdates: []clientgotesting.UpdateActionImpl{{
 			Object: Route("default", "becomes-ready", WithConfigTarget("config"),
 				WithRouteUID("12-34"), WithRouteGeneration(1), WithRouteObservedGeneration,
 				// Populated by reconciliation when all traffic has been assigned.
-				WithAddress, WithRouteConditionsHTTPDowngrade,
-				MarkTrafficAssigned, MarkIngressNotConfigured, WithStatusTraffic(
+				WithAddress,
+				WithRouteConditionsHTTPDowngrade,
+				MarkTrafficAssigned,
+				WithStatusTraffic(
 					v1.TrafficTarget{
 						RevisionName:   "config-00001",
 						Percent:        ptr.Int64(100),
 						LatestRevision: ptr.Bool(true),
-					}), MarkIngressNotConfigured,
+					}),
 				// The certificate is not ready. So we want to have HTTP URL.
-				WithURL),
-		}},
-		WantEvents: []string{
-			Eventf(corev1.EventTypeNormal, "Created", "Created placeholder service %q", "becomes-ready"),
-			Eventf(corev1.EventTypeNormal, "Updated", "Updated Spec for Certificate %s/%s", "default", "route-12-34"),
-			Eventf(corev1.EventTypeNormal, "Deleted", "Deleted orphaned Knative Certificate %s/%s", "default", "route-12-34"),
-			Eventf(corev1.EventTypeNormal, "Created", "Created Ingress %q", "becomes-ready"),
-		},
-		WantDeletes: []clientgotesting.DeleteActionImpl{{
-			// This certificate's DNS name is not the host name needed by the input Route.
-			ActionImpl: clientgotesting.ActionImpl{
-				Namespace: "default",
-				Verb:      "delete",
-				Resource:  netv1alpha1.SchemeGroupVersion.WithResource("certificates"),
-			},
-			Name: "route-12-34",
+				WithInitRouteConditions,
+				MarkTrafficAssigned,
+				WithRouteObservedGeneration,
+				WithRouteConditionsHTTPDowngrade,
+				WithAddress,
+				MarkIngressReady,
+				WithURL,
+			),
 		}},
 		Key: "default/becomes-ready",
 	}, {

Here we have no updates to resources and nothing being created - but we see the route's status having the HTTPDowngrade.

Unsure if you wanted to disable the HTTPDowngrade option.

Also I wouldn't expect the certificate to be deleted - I think that's because the fixture wasn't accurate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ckcd just following up here

Copy link
Contributor Author

@ckcd ckcd Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dprotaso Thanks for your analysis! And sorry for my late reply.

I made some changes for the test case according to your suggestions. And also add a MarkCertificateProvisionFailed for testing. Please check it. Thanks.

Signed-off-by: ckcd <curtis@mail.ustc.edu.cn>
@ckcd ckcd force-pushed the ck2402_bubble_up_kcert_failed branch from 250ab9c to 8d86ca3 Compare April 15, 2024 15:50
@dprotaso
Copy link
Member

/lgtm
/approve

thanks @ckcd

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Apr 23, 2024
Copy link

knative-prow bot commented Apr 23, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ckcd, dprotaso, ReToCode

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 23, 2024
@dprotaso
Copy link
Member

/hold cancel

@knative-prow knative-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 23, 2024
@dprotaso
Copy link
Member

/override "style / suggester / shell"
/override "style / suggester / yaml"
/override "style / suggester / github_actions"

Copy link

knative-prow bot commented Apr 23, 2024

@dprotaso: Overrode contexts on behalf of dprotaso: style / suggester / github_actions, style / suggester / shell, style / suggester / yaml

In response to this:

/override "style / suggester / shell"
/override "style / suggester / yaml"
/override "style / suggester / github_actions"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow knative-prow bot merged commit f65df07 into knative:main Apr 23, 2024
64 checks passed
@ckcd
Copy link
Contributor Author

ckcd commented Apr 23, 2024

@dprotaso Thank you for your patience and help!!

@dprotaso
Copy link
Member

likewise 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/API API objects and controllers area/networking lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Propagate the status when a Knative Certificate fails to be created
5 participants