404 NR when using browser on multiple ingress gateways #9429
Comments
|
We faced the same issue, and it seems to be connected to TCP connections that the browser keeps open. We ended up using a single gateway with multiple services. Scenario:
Checking TCP connections (used incognito window on chrome and connection list from
|
|
Thanks @bernardmo, this is exactly the issue we are facing! I am trying your workaround and I will let you know. |
|
I can confirm that this workaround is effective but managing all services in the same configuration object makes automation really difficult. Thanks again @bernardmo for your help. |
|
Can you provide @rshriram this looks like envoy making decisions about connections rather than a request based on sni. |
|
I'm fairly certain I am now encountering this in a 1.1 test cluster as well... As soon as I added a second service I began seeing 404s from the browser. Whichever service I hit first in a browser session would work, following requests to other services would 404 consistently. |
|
@mandarjog My cluster is currently using a 1.1 daily build. About to update to a newer daily to verify it still happens. Config dump from current cluster: https://gist.github.com/blackbaud-brandonstirnaman/a21d578814ba4abe54eda480b5a95674 Edit: Can reproduce on latest daily as well. |
|
Sorry I missed this. When you said two ingress gateways, what are their gateway specs like? Do they have distinct ways to be identified using the selector labels in the gateway? |
|
Also what do you mean by adding a service to the gateway? Did you mean adding a gateway spec and a virtual service? |
|
Well, it looks like this is not related to having multiple ingressgateway instances. It occurs when you have 2 virtualservices configured on 2 subdomains and each has its own gateway configured for that subdomain. Am I clear, @rshriram? |
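A minimal sketch of the configuration described above may help readers follow the rest of the thread. It is not taken from the reporter's cluster: the gateway names, the `istio: ingressgateway` selector, and the certificate paths are assumptions for illustration, and the host names are the ones used later in this thread. Both gateways sit on the same ingressgateway, serve the same wildcard certificate, and each declares a single subdomain.

```yaml
# Hypothetical reproduction sketch: two Gateways, one per subdomain,
# sharing one ingressgateway deployment and one wildcard certificate.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: service1-gateway          # hypothetical name
spec:
  selector:
    istio: ingressgateway         # assumed default ingressgateway label
  servers:
  - port:
      number: 443
      name: https-service1
      protocol: HTTPS
    tls:
      mode: SIMPLE
      # Assumed mount path of the shared *.test.com certificate
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
      privateKey: /etc/istio/ingressgateway-certs/tls.key
    hosts:
    - "service1.test.com"
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: service2-gateway          # hypothetical name
spec:
  selector:
    istio: ingressgateway         # same ingressgateway as above
  servers:
  - port:
      number: 443
      name: https-service2
      protocol: HTTPS
    tls:
      mode: SIMPLE
      # Same certificate files as service1-gateway
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
      privateKey: /etc/istio/ingressgateway-certs/tls.key
    hosts:
    - "service2.test.com"
```

In this sketch each virtualservice would list only its own gateway under `gateways`.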
|
Can confirm, istio 1.0.3, same behavior. Going to |
|
@yciabaud Let me try to describe an example that fits the problem scenario.
Am I correct that this describes a configuration that will have the connection problem? If so, I'm wondering why the two
Would that have the same problem, or is the problem only because there are two Gateways using the same certs? |
|
Changed my |
|
Well @frankbu, the original need was for each service to work independently of the certificates used in the gateways, so each exposed service has its own gateway on a common ingressgateway. It may not be the way it was designed, but we wanted to define each host without a wildcard in the gateways and to configure each service independently. I will try a different configuration, but this side effect still looks like an issue. |
|
@yciabaud I think it also works if each gateway uses its own certs, as is shown in this example: https://preliminary.istio.io/docs/tasks/traffic-management/secure-ingress/#configure-a-tls-ingress-gateway-for-multiple-hosts. Unless, maybe that also has the same browser issue? I'm trying to understand exactly what does and doesn't work so we can figure out if it's a bug that can be fixed, or an intrinsic limitation. |
|
OK, I will try using separate certs for each host, and then I will try duplicating my wildcard cert in the gateway. |
|
@frankbu I can confirm that using different certificates on different gateways is working as expected. |
|
@yciabaud can you give me the envoy configuration from the gateway when you have the wildcard cert? curl localhost:15000/config_dump should be sufficient. I am specifically looking for the configuration of the listeners on the 443 port that is hosting the two services with single wildcard cert. |
|
how long does this issue last after adding the new gateway with same wildcard cert? |
|
I don't have access to my cluster ATM but the issue occurs right at the moment we add the second gateway. |
|
Okay, now I know the issue, and only part of it is solvable with Istio. In the background, the gateway has been updated with new configs (for service2.test.com). This causes Envoy to spin up new listener threads (crudely speaking) while the old one is still serving the browser with service1.test.com. The new one has both, but no one is talking to it yet. When you clear the browser cache, you are effectively releasing the old one, allowing Envoy to shut down the older listener and always use the new one. These are H2 semantics, unfortunately: https://daniel.haxx.se/blog/2016/08/18/http2-connection-coalescing/

Now, we could partially mitigate this problem by not doing a listener update in Envoy (just a route update should work for your use case). But doing so involves making the following compromise:
will be treated effectively as one gateway.

This is the best we can do code-wise. That said, I would suggest the following alternative. Instead of creating redundant gateways, create a gateway once for every unique certificate and use the certificate's domain (in your case *.test.com). For example:

And use individual virtual services for each host, i.e. one for service1.test.com and another for service2.test.com, both referring to the same gateway. This way, you have to create the gateway only once. You can keep adding virtual services for the hosts you expose out of the *.test.com domain, and you can refer to this gateway in your individual virtual services. |
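The alternative suggested above could look roughly like the following sketch. It is an illustration, not the configuration originally posted in this comment: the resource names, the `istio: ingressgateway` selector, the certificate paths, and the destination service names and ports are all assumptions. One gateway is created for the `*.test.com` certificate, and each host gets its own virtual service that refers to it.

```yaml
# Hypothetical sketch: one Gateway per unique certificate,
# one VirtualService per exposed host.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: test-com-gateway          # hypothetical name
spec:
  selector:
    istio: ingressgateway         # assumed default ingressgateway label
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      # Assumed mount path of the *.test.com certificate
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
      privateKey: /etc/istio/ingressgateway-certs/tls.key
    hosts:
    - "*.test.com"                # the certificate's domain
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: service1                  # hypothetical name
spec:
  hosts:
  - "service1.test.com"
  gateways:
  - test-com-gateway              # shared wildcard gateway
  http:
  - route:
    - destination:
        host: service1            # assumed Kubernetes service name
        port:
          number: 80              # assumed service port
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: service2                  # hypothetical name
spec:
  hosts:
  - "service2.test.com"
  gateways:
  - test-com-gateway              # same gateway, different host
  http:
  - route:
    - destination:
        host: service2            # assumed Kubernetes service name
        port:
          number: 80              # assumed service port
```

Exposing a further host under `*.test.com` would then only require another virtual service, leaving the gateway (and so the listener's TLS settings) unchanged.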
|
Thank you @rshriram, you're right and I get it now. The first solution mitigates the problem, but I will follow your advice and use a wildcard gateway even if this is less easy for me to automate. I guess this issue can be closed now, since your investigation explained the root cause and there is no perfect solution. Thinking about it, maybe Istio should not accept 2 gateways with the same certificate, since this may lead to this issue. |
|
Added documentation for this: istio/istio.io#2970 |
|
Closing since the problem is identified and there is no solution to provide. |
|
Not to be a pest, but is there a way to have a wildcard certificate (*.example.com) on one gateway and a regular certificate (foo.example.com) on another gateway, such that Chromium/Firefox will not accidentally route down the wildcard connection when both are on the same ingress-gateway? We encountered this problem, and the workaround proposed in this ticket didn't help (because, obviously, we can't use two different certificates on one gateway). I have also opened a bug on Chromium to see if there's some consensus on how to properly solve this issue (or at least to provoke some dialog). https://bugs.chromium.org/p/chromium/issues/detail?id=954160 |
|
This issue should be reopened with an external ref envoyproxy/envoy#6767 @istio/wg-networking-maintainers |
|
Here's the CVE for this vulnerability https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-11767 |
|
/reopen |
|
This is tracked in #13589 |
Describe the bug
When using browsers (tested on multiple browsers, multiple OSes, multiple devices and multiple connections), we get many 404 NR responses when contacting our services on an additional ingress gateway.
The service loads correctly each time I clear the browser's cache, and when using command-line clients.
Routing is done by hostname in separate gateway and virtual service configurations. All services are exposed on port 443 using different port names but the same TLS certificate per gateway.
The ingress log shows the correct hostname, but Envoy does not find any route.
Expected behavior
We expect the requests to be routed to the service, or a way to find out whether something is missing in the configuration.
Steps to reproduce the bug
Version
Kubernetes v1.10.5 on AWS (same issue on v1.9.9)
Calico
Istio: 1.0.2
Installation
Istio installed using helm chart with a first ingress gateway.
Second gateway installed using helm in a namespace.
Services installed using helm charts.
Environment
Kubernetes deployed on AWS using Kops with coreos images.
Cluster state
istio-dump.tar.gz
I could not dump pods and deployments due to private information in environment variables