
Inconsistent cluster/mapping configurations are used when multiple mappings point to the same upstream service #3112

Closed
guoyiang opened this issue Dec 11, 2020 · 6 comments

Comments


guoyiang commented Dec 11, 2020

Describe the bug
When multiple mappings are configured to use the same upstream service, they all share the same Envoy cluster, even though each mapping can be created with different parameters. As a result, a mapping may behave differently from its configuration.

For example, Ambassador is proxying to an Envoy instance that is capable of performing gRPC transcoding. That Envoy reverse proxy serves both a RESTful API and a gRPC API. Two mappings are created, one for the REST API and the other for gRPC, each with a different prefix. The gRPC mapping has grpc: true configured, but that setting may be ignored when Ambassador configures the Envoy cluster behind the gRPC route, and gRPC requests fail.

To Reproduce

  1. Create mappings as follows:
apiVersion: getambassador.io/v1
kind: Mapping
metadata:
  name: test
spec:
  prefix: /helloworld
  rewrite: ""
  service: test-service
  connect_timeout_ms: 1000
---
apiVersion: getambassador.io/v1
kind: Mapping
metadata:
  name: testgrpc
spec:
  prefix: /helloworld.Greeter/
  rewrite: ""
  service: test-service
  grpc: true
  connect_timeout_ms: 5000
  2. Go to the Ambassador diagnostics page and check the testgrpc mapping. Mappings "test" and "testgrpc" are merged together, using the same cluster with the following configuration:
{
    "connect_timeout": "1.000s",
    "dns_lookup_family": "V4_ONLY",
    "lb_policy": "ROUND_ROBIN",
    "load_assignment": {
        "cluster_name": "cluster_test_service_dev",
        "endpoints": [
            {
                "lb_endpoints": [
                    {
                        "endpoint": {
                            "address": {
                                "socket_address": {
                                    "address": "test-service",
                                    "port_value": 80,
                                    "protocol": "TCP"
                                }
                            }
                        }
                    }
                ]
            }
        ]
    },
    "name": "cluster_test_service_dev",
    "type": "STRICT_DNS"
}

Actual behavior
gRPC requests fail. Testing the gRPC API using evans fails with the message "server closed the stream without sending trailers".

connect_timeout is set to 1s, which is not what is set in the mapping.

This is because the cluster behind the testgrpc mapping does NOT have the http2_protocol_options option. Envoy won't initiate an HTTP/2 connection to the upstream, which causes proxied gRPC requests to fail as described here. This can be observed in the Envoy debug log, where the http1 handler is used.

Expected behavior
Each mapping's Envoy cluster should be configured according to the configuration of the mapping itself.

The testgrpc mapping should work as expected with a 5s connect timeout, which means http2_protocol_options should be added to its Envoy cluster and connect_timeout set to 5s.

The test mapping should have a connect timeout of 1s.
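
For illustration, this is roughly what a cluster honoring the testgrpc mapping would look like. This is only a sketch based on the cluster dump above (the exact field layout depends on the Ambassador/Envoy version, and in practice it implies generating distinct clusters for the two mappings):

{
    "connect_timeout": "5.000s",
    "dns_lookup_family": "V4_ONLY",
    "http2_protocol_options": {},
    "lb_policy": "ROUND_ROBIN",
    "load_assignment": { ...same endpoints as above... },
    "name": "cluster_test_service_dev",
    "type": "STRICT_DNS"
}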

Versions (please complete the following information):

  • Ambassador: 1.8.1
  • Kubernetes environment: Azure Kubernetes Service
  • Version: [e.g. 1.18.10]

Additional context
A quick look inside this code suggests that the cluster config is cached by cluster name, and the cached cluster config is used as-is without verifying that all mappings pointing to it are configured the same way. According to this code, the configuration from the first mapping, sorted alphabetically, ends up being used.


guoyiang commented Dec 14, 2020

A few ways to work around this issue:

  1. Use different service values in the two mappings. This can be achieved by specifying the port differently: in our case, port 80 is added explicitly for the gRPC mapping (service: test-service:80), while the HTTP mapping has only the hostname with the port implicit (service: test-service). See the sketch after this list.

  2. Use cluster_tag to enforce a different cluster name for the gRPC mapping.
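
As a sketch of workaround 1, reusing the test-service mappings from the reproduction above, the gRPC mapping names the port explicitly so the two mappings no longer resolve to the same cluster:

apiVersion: getambassador.io/v1
kind: Mapping
metadata:
  name: test
spec:
  prefix: /helloworld
  rewrite: ""
  service: test-service        # implicit port
  connect_timeout_ms: 1000
---
apiVersion: getambassador.io/v1
kind: Mapping
metadata:
  name: testgrpc
spec:
  prefix: /helloworld.Greeter/
  rewrite: ""
  service: test-service:80     # explicit port, so a different cluster name is generated
  grpc: true
  connect_timeout_ms: 5000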


stale bot commented Feb 14, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Feb 14, 2021
@dvaldivia

@guoyiang can you share what you mean by cluster_tag?

stale bot removed the stale label Feb 26, 2021
@guoyiang

@dvaldivia I don't recall exactly what I did back then, but looking at the docs, cluster_tag is an attribute that can be added to a Mapping to customize the generated cluster name. When present, it enforces a different cluster name, which works around this issue.

Using cluster_tag
If the cluster_tag attribute is present, its value will be prepended to cluster names generated from the Mapping. This provides a simple mechanism for customizing the cluster name when working with metrics.
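
As a sketch of that workaround applied to the gRPC mapping from the reproduction above (the tag value grpc is just an illustrative choice):

apiVersion: getambassador.io/v1
kind: Mapping
metadata:
  name: testgrpc
spec:
  prefix: /helloworld.Greeter/
  rewrite: ""
  service: test-service
  grpc: true
  connect_timeout_ms: 5000
  cluster_tag: grpc   # prepended to the generated cluster name, so this mapping gets its own cluster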

@cindymullins-dw

Closing as there is an apparent workaround. If the issue persists on 2.x please reopen.


juanjoku commented Mar 2, 2023

But then... is it still mandatory to use cluster_tag with Emissary 3.x as a workaround for this problem?
It is not clear to me whether it has been fixed in any version.

Thx!
