Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Istio 1.3 to Istio 1.4 Upgrade causes RouteConfigurationValidationError's. #19665

Closed
dominicgunn opened this issue Dec 18, 2019 · 7 comments · Fixed by #19679
Closed

Istio 1.3 to Istio 1.4 Upgrade causes RouteConfigurationValidationError's. #19665

dominicgunn opened this issue Dec 18, 2019 · 7 comments · Fixed by #19679

Comments

@dominicgunn
Copy link

@dominicgunn dominicgunn commented Dec 18, 2019

Bug description

We currently run Istio v1.3.6 in our clusters, and things are working just fine, however when trying to upgrade to Istio v1.4.2 the Istio Pilot bombs out with the following error message, followed by ~40,000 lines worth of configuration.

2019-12-18T13:37:41.020003Z	warn	ads	ADS:RDS: ACK ERROR 172.30.112.5:44284 router~172.30.112.5~my-gateway-76964dfd44-485vr.istio-system~istio-system.svc.cluster.local-18 (my-gateway-76964dfd44-485vr.istio-system) 
    Internal:Proto constraint validation failed (RouteConfigurationValidationError.VirtualHosts[i]: ["embedded message failed validation"] | caused by VirtualHostValidationError.Routes[i]: ["embedded message failed validation"] | caused by RouteValidationError.Match: ["embedded message failed validation"] | caused by field: "path_specifier", reason: is required): name: "http.443"

I realize this is likely something on our side, but I'm not quite sure what, my best guess is it's something to do with the Envoy generated configuration shown below, particularly the 1: "" block, but I'm not sure.

    match {
      case_sensitive {
        value: true
      }
      headers {
        name: "the-service"
        exact_match: "my-service"
      }
      10 {
        2: "^/my-url.*"
        1: ""
      }
    }

The matching Istio configuration for the VS looks like this, which works just fine in v1.3.6.

  - match:
    - headers:
        sms-routing:
          exact: the-service
        sms-servicename:
          exact: my-service
      uri:
        regex: ^/my-url.*
    route:
    - destination:
        host: my-service.default.svc.cluster.local
        subset: default
  - match:
    - headers:
        sms-routing:
          exact: the-service
        sms-servicename:
          exact: my-service
    route:
    - destination:
        host: my-service.default.svc.cluster.local
        subset: default

I could be way off on my analysis, but at current, this is my best guess. Could anybody point me in the right direction>

Expected behavior

Upgrade to v1.4.2 from v1.3.6 is seamless.

Steps to reproduce the bug

We have thousands of routes, and VS's so it could really be anything, but I've posted my best guess above.

Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)

$ istioctl version --remote
client version: 1.4.2
citadel version: 1.3.6
galley version: 1.3.6
ingressgateway version: 1.3.6
pilot version: 1.3.6
policy version: 1.3.6
sidecar-injector version: 1.3.6
telemetry version: 1.3.6
rw-internal-ingressgateway version:
rw-internal-ingressgateway version:
data plane version: 1.3.6 (144 proxies), 1.2.2 (31 proxies), 1.4.2 (11 proxies)
$  kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.9", GitCommit:"500f5aba80d71253cc01ac6a8622b8377f4a7ef9", GitTreeState:"clean", BuildDate:"2019-11-13T11:13:04Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

How was Istio installed?

The helm template method.

Environment where bug was observed (cloud vendor, OS, etc)

AWS, CoreOS.

@howardjohn

This comment has been minimized.

Copy link
Member

@howardjohn howardjohn commented Dec 18, 2019

I applied the same config, with 1.4.2, and it correctly is sent to Envoy:

          "match": {
           "case_sensitive": true,
           "headers": [
            {
             "name": "sms-routing",
             "exact_match": "sms"
            },
            {
             "name": "sms-servicename",
             "exact_match": "stream-reporting"
            }
           ],
           "safe_regex": {
            "google_re2": {},
            "regex": "^/report/.*"
           }
          },

You can see its the same config as in the issue above. The weird 10 field seems to just be protobuf serialization, its referring to safe_regex: https://github.com/envoyproxy/envoy/blob/master/api/envoy/api/v2/route/route.proto#L399

So I am not sure why envoy is complaining about missing path specifier when it is set...

@howardjohn

This comment has been minimized.

Copy link
Member

@howardjohn howardjohn commented Dec 18, 2019

@lambdai any ideas here? This is looking like an Envoy bug, but maybe I am missing something?

I also looked at pilots code, we set path specifier everywhere

@lambdai

This comment has been minimized.

Copy link
Member

@lambdai lambdai commented Dec 18, 2019

match {
case_sensitive {
value: true
}
headers {
name: "the-service"
exact_match: "my-service"
}
10 {
2: "^/my-url.*"
1: ""
}
}

@dominicgunn How did you obtain the above config? Specificically the 10 part.
Is it from envoy? where?

It seems to me the envoy/istio-proxy cannot recognize the new field. So the printer prints the field number instead of field name.
It also explains why the validation is failing: the validation require exactly one of the field, while pilot does provides one safe_regex but envoy's cannot serialize from it

@lambdai

This comment has been minimized.

Copy link
Member

@lambdai lambdai commented Dec 18, 2019

One of the solution is not to use safe_regex until you update istio-proxy.

Alternatively @howardjohn: Is it safe to update dataplane first if the upgrade is from 1.3.6 to 1.4.2?

@lambdai

This comment has been minimized.

Copy link
Member

@lambdai lambdai commented Dec 18, 2019

Talked to @howardjohn, istio use safe_regex to replace regex since 1.4
Alternatively,

  1. upgrade the control plane with the option, something like the use-legacy-regex
  2. upgrade dataplane
  3. Revert the flag use-legacy-regex

@howardjohn Does above sound good? What is the precise option name?

@howardjohn

This comment has been minimized.

Copy link
Member

@howardjohn howardjohn commented Dec 18, 2019

Set PILOT_ENABLE_UNSAFE_REGEX on pilot deployment, upgrade dataplane, then remove that.

We should also make it so pilot will send regex to older proxies

@howardjohn howardjohn self-assigned this Dec 18, 2019
@dominicgunn

This comment has been minimized.

Copy link
Author

@dominicgunn dominicgunn commented Dec 18, 2019

Thanks for the awesome work guys, appreciate you looking into this.

howardjohn added a commit to howardjohn/istio that referenced this issue Dec 18, 2019
Fixes istio#19665

This field was introduced in Istio versions 1.4 only, so this will break
anyone using regex with 1.3.6 proxy and 1.4 Pilot (eg during upgrade)
istio-testing added a commit that referenced this issue Dec 19, 2019
Fixes #19665

This field was introduced in Istio versions 1.4 only, so this will break
anyone using regex with 1.3.6 proxy and 1.4 Pilot (eg during upgrade)
istio-testing added a commit to istio-testing/istio that referenced this issue Dec 19, 2019
Fixes istio#19665

This field was introduced in Istio versions 1.4 only, so this will break
anyone using regex with 1.3.6 proxy and 1.4 Pilot (eg during upgrade)
istio-testing added a commit that referenced this issue Dec 19, 2019
Fixes #19665

This field was introduced in Istio versions 1.4 only, so this will break
anyone using regex with 1.3.6 proxy and 1.4 Pilot (eg during upgrade)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants
You can’t perform that action at this time.