Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry checker for DNS failure from Ingress #2351

Merged
merged 1 commit into from
Nov 17, 2021

Conversation

nak3
Copy link
Contributor

@nak3 nak3 commented Nov 17, 2021

This patch fixes the recent flake in service_to_service_test.go on Kind/Istio.
istio-ingressgateway returns 502 with no such host error when it fails to resolve the cluster domain.

/cc @markusthoemmes

@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Nov 17, 2021
@knative-prow-robot knative-prow-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. area/test-and-release approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Nov 17, 2021
@nak3
Copy link
Contributor Author

nak3 commented Nov 17, 2021

knative/serving#12294 runs this patch and seems error could be avoided.

@codecov
Copy link

codecov bot commented Nov 17, 2021

Codecov Report

Merging #2351 (6bd2b4b) into main (0b0c339) will decrease coverage by 0.84%.
The diff coverage is 52.63%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2351      +/-   ##
==========================================
- Coverage   64.52%   63.67%   -0.85%     
==========================================
  Files         226      227       +1     
  Lines        9668     9814     +146     
==========================================
+ Hits         6238     6249      +11     
- Misses       3154     3288     +134     
- Partials      276      277       +1     
Impacted Files Coverage Δ
injection/sharedmain/main.go 5.63% <50.00%> (ø)
test/spoof/spoof.go 54.54% <50.00%> (+1.15%) ⬆️
test/spoof/error_checks.go 85.71% <100.00%> (+1.09%) ⬆️
controller/controller.go 86.13% <0.00%> (-0.37%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d82be48...6bd2b4b. Read the comment docs.

Copy link
Contributor

@markusthoemmes markusthoemmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 17, 2021
@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: markusthoemmes, nak3

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dprotaso
Copy link
Member

This patch fixes the recent flake in service_to_service_test.go on Kind/Istio.

I guess to clarify this isn't a flake as it consistently failed in Kind running on GitHub actions. Thus I wonder if the retry is masking some other issue.

The other concerns I have, and let me know if this is just paranoia, is that by including this error checker by default we could mask issues in other ingresses and wouldn't know about it

@dprotaso
Copy link
Member

To be more specific this only failed on GitHub actions with kind 1.22 & net-istio

@nak3
Copy link
Contributor Author

nak3 commented Nov 18, 2021

I was also concerned but it retries only for the no DNS record in the internal DNS temporary.
I think it is difficult for ingress/coredns to guarantee that *.svc.cluster.local record is registered as soon as k8s svc is created so we need to retry it 🤔

@dprotaso
Copy link
Member

Then maybe as part of probing we should ensure that DNS works as expected.

K8s 1.22 introduces a newer CoreDNS version - ie. with endpoint slice support thus maybe istio isn't playing well with it

@markusthoemmes
Copy link
Contributor

Sounds fair enough! Shall we track this and make sure we do our due diligence here?

@nak3
Copy link
Contributor Author

nak3 commented Nov 22, 2021

Sure, opened knative/networking#575

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test-and-release cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants