Non-deterministic config generation causing frequent inbound listener reload #18088
Comments
I increased the log level on pilot and see this for every connected pod every 5 minutes:
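(For anyone wanting to reproduce this, a sketch of one way to raise pilot's log verbosity — assuming the istio-pilot Deployment in the istio-system namespace and the 1.3-era `--log_output_level` flag:)

```sh
# Sketch: raise the discovery container's log level to debug.
# Assumes the istio-pilot Deployment in the istio-system namespace.
kubectl -n istio-system edit deployment istio-pilot
# ...then add to the discovery container's args:
#   --log_output_level=default:debug
# and follow the logs to watch for push/disconnect events:
kubectl -n istio-system logs -f deployment/istio-pilot -c discovery
```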
On a side note, could anyone tell me the implications of setting a very large …
@nrjpoddar I can confirm we're changing nothing whatsoever, but every 5 minutes we see this. I will give …
Can you provide a diff of the Envoy config dump, especially the listeners, before and after this happens?
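(For reference, a sketch of how one might capture that diff via the sidecar's Envoy admin endpoint; `POD` is a placeholder pod name, and 15000 is the admin port in Istio sidecars:)

```sh
# Sketch: capture the Envoy config before and after a push, then diff.
kubectl exec "$POD" -c istio-proxy -- curl -s localhost:15000/config_dump > before.json
# ...wait for the ~5-minute push to happen, then capture again:
kubectl exec "$POD" -c istio-proxy -- curl -s localhost:15000/config_dump > after.json
diff before.json after.json
```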
Yeah, if it does then it's the same underlying issue. Remember …
As pointed out in #18043, it's almost the same issue but seems to be limited to the …
Nope, …
Caught a diff!
Before:
After:
It seems to be non-deterministic ordering when we have two services pointing at the same pod (different ports) on the …
The service is:
The endpoints are:
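(To illustrate the shape of that setup — the names and ports below are hypothetical, not the actual services — two Services with the same selector but different ports, so both resolve to the same pod:)

```yaml
# Hypothetical illustration of the reported shape: two Services selecting
# the same pod on different ports.
apiVersion: v1
kind: Service
metadata:
  name: app-http        # hypothetical name
spec:
  selector:
    app: my-app         # both Services select the same pods
  ports:
  - name: http
    port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: app-metrics     # hypothetical name
spec:
  selector:
    app: my-app
  ports:
  - name: http-metrics
    port: 9090
    targetPort: 9090
```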
I've made @howardjohn and @duderino aware of this, and raised a separate issue (#18089) to cover the fact that listener reloads can result in 503UC (they're just exacerbated by this issue), and another issue (#18090) to question why pilot is pushing config every 5 minutes.
It would be useful to have a stat from Envoy of the number of configs that were unique, so we could see a spike in LDS drains, measure how efficient pilot is about not sending duplicate config, etc. I will send Envoy a feature request.
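(In the meantime, Envoy's existing listener-manager stats give a rough proxy for churn — a sketch, with stat names from Envoy's admin `/stats` endpoint and `POD` as a placeholder:)

```sh
# Sketch: watch existing listener-manager stats as a rough proxy for LDS churn.
kubectl exec "$POD" -c istio-proxy -- curl -s localhost:15000/stats \
  | grep -E 'listener_manager\.(lds\.update_success|listener_modified|total_listeners_draining)'
```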
This is fixed in 1.3.4.
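(For anyone on the Helm install, a sketch of picking up the fix — assuming the release name `istio`, the istio-system namespace, and charts unpacked from the 1.3.4 release archive:)

```sh
# Sketch: upgrade an existing Helm-based install to 1.3.4.
# Assumes release name "istio" and charts from the 1.3.4 release archive.
helm upgrade istio install/kubernetes/helm/istio --namespace istio-system
```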
Bug description
Following on from #18086 (I'm raising separate issues as I continue to debug #18043), I periodically (roughly every 5 minutes) see pilot-agent seemingly disconnect from pilot with:
e.g.:
These correlate with big spikes in XDS pushes:
I believe these are causing listeners to reload, which in turn causes connections to be reset, as seen in the metrics:
This is also, I believe, the cause of my 503UCs on long-running requests, because drainDuration defaults to 45s:
istio/install/kubernetes/helm/istio/templates/configmap.yaml
Line 211 in 362fb98
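(For context, a sketch of the relevant fragment of that mesh config — field names per the Helm template, values as shipped by default:)

```yaml
# Fragment of the mesh config defaults referenced above (sketch).
defaultConfig:
  drainDuration: 45s           # how long Envoy drains connections during a listener drain
  parentShutdownDuration: 1m0s
```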
There is nothing in the discovery logs to indicate why these pushes are happening, and no config is changing.
Affected product area (please put an X in all that apply)
[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[x] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
Expected behavior
These XDS pushes, if they're normal, should not reset listeners.
Steps to reproduce the bug
Version (include the output of istioctl version --remote and kubectl version)
1.3.3
How was Istio installed?
Helm
Environment where bug was observed (cloud vendor, OS, etc)
GKE 1.14
Additionally, please consider attaching a cluster state archive (dump file) to this issue.