Description
opened on Sep 15, 2022
Describe the bug
After deploying NGINX Service Mesh (NSM) with the integrated NGINX+ ingress controller, a VirtualServer configured for a service that is not in the mesh returns 502 Bad Gateway. This is a problem because I want to keep some solutions OUT OF THE MESH so they cannot access protected services.
The NSM is configured to have mTLS set to strict mode to drop traffic from outside of the service mesh as the cluster has both services that are part of the mesh and services that are not part of the mesh.
To Reproduce
Steps to reproduce the behavior:
I used helmfile to encapsulate and configure Helm charts.
- Install o11y
    URLS=(https://docs.nginx.com/nginx-service-mesh/examples/{prometheus,grafana,otel-collector,jaeger}.yaml)
    for URL in ${URLS[*]}; do curl -sOL $URL; done
    for FILE in {prometheus,grafana,otel-collector,jaeger}.yaml; do kubectl apply -f $FILE; done

- Install NSM
    cat << EOF > nsm.yaml
    repositories:
      # https://artifacthub.io/packages/helm/nginx/nginx-service-mesh
      - name: nginx-stable
        url: https://helm.nginx.com/stable
    releases:
      - name: nsm
        namespace: nginx-mesh
        chart: nginx-stable/nginx-service-mesh
        values:
          - prometheusAddress: prometheus.nsm-monitoring.svc:9090
            telemetry:
              exporters:
                otlp:
                  host: otel-collector.nsm-monitoring.svc
                  port: 4317
              samplerRatio: 1
            tracing: null
            mtls:
              mode: strict
            autoInjection:
              disable: true
    EOF
    helmfile -f nsm.yaml apply
- Install NGINX+ IC
    # assume nginx-plus images are in local accessible GCR
    cat << EOF > nginx_ic.yaml
    repositories:
      # https://artifacthub.io/packages/helm/nginx/nginx-ingress
      - name: nginx-stable
        url: https://helm.nginx.com/stable
    releases:
      # NOTE: tutorial online uses 'nginx-ingress' for namespace
      - name: nginx-ingress
        namespace: kube-addons
        chart: nginx-stable/nginx-ingress
        version: 0.14.0
        values:
          - controller:
              nginxplus: true
              image:
                repository: gcr.io/{{ requiredEnv "GCR_PROJECT_ID" }}/nginx-plus-ingress
                tag: 2.3.0
              # NGINX Configmap
              config:
                entries:
                  ssl-redirect: "True"
                  http2: "True"
              ingressClass: nginx
              # NGINX IC CRDs
              enableCustomResources: true
              enableCertManager: true
              enableExternalDNS: true
              # Prometheus must be installed
              enableLatencyMetrics: true
            nginxServiceMesh:
              enable: true
              enableEgress: true
    EOF
    helmfile -f nginx_ic.yaml apply
- Install External DNS and Cert-Manager
NOTE: For real DNS and the ACME DNS01 challenge to work, the services must have read/write access to DNS (Route 53, Cloud DNS, Azure DNS, etc.). The snippet below is oriented to GKE with GCR and Cloud DNS.

    export DNS_PROJECT_ID="<your-cloud-dns-zone-project>"
    export DNS_SA_EMAIL="<your-gsa-with-access-to-cloud-dns-zone>"
    export DNS_DOMAIN="example.com" # replace me

    cat << EOF > kube_addons.yaml
    repositories:
      # https://artifacthub.io/packages/helm/cert-manager/cert-manager
      - name: jetstack
        url: https://charts.jetstack.io
      # https://artifacthub.io/packages/helm/bitnami/external-dns
      - name: bitnami
        url: https://charts.bitnami.com/bitnami
    releases:
      - name: external-dns
        namespace: kube-addons
        chart: bitnami/external-dns
        version: 6.8.1
        values:
          - provider: google
            google:
              zoneVisibility: public
              project: {{ env "DNS_PROJECT_ID" }}
            sources:
              - crd
              - service
              - ingress
            # use with NGINX VirtualServer CRD
            crd:
              create: false
              apiversion: externaldns.nginx.org/v1
              kind: DNSEndpoint
            serviceAccount:
              annotations:
                # google workload identity annotation
                iam.gke.io/gcp-service-account: {{ requiredEnv "DNS_SA_EMAIL" }}
            nodeSelector:
              # deploy on nodes that support workload identity
              iam.gke.io/gke-metadata-server-enabled: "true"
            logLevel: {{ env "EXTERNALDNS_LOG_LEVEL" | default "debug" }}
            domainFilters:
              - {{ requiredEnv "DNS_DOMAIN" }}
            txtOwnerId: external-dns
            rbac:
              create: true
              apiVersion: v1
            policy: upsert-only
      - name: cert-manager
        namespace: kube-addons
        chart: jetstack/cert-manager
        version: 1.9.1
        values:
          - installCRDs: true
            extraArgs:
              - --cluster-resource-namespace=kube-addons
            global:
              logLevel: 2
            serviceAccount:
              annotations:
                # google workload identity annotation
                iam.gke.io/gcp-service-account: {{ requiredEnv "DNS_SA_EMAIL" }}
            nodeSelector:
              # deploy on nodes that support workload identity
              iam.gke.io/gke-metadata-server-enabled: "true"
    EOF

    cat << EOF > issuers.yaml
    repositories:
      # https://artifacthub.io/packages/helm/itscontained/raw
      - name: itscontained
        url: https://charts.itscontained.io
    releases:
      - name: cert-manager-issuers
        chart: itscontained/raw
        namespace: kube-addons
        version: 0.2.5
        disableValidation: true
        values:
          - resources:
              - apiVersion: cert-manager.io/v1
                kind: ClusterIssuer
                metadata:
                  name: letsencrypt-staging
                spec:
                  acme:
                    server: https://acme-staging-v02.api.letsencrypt.org/directory
                    email: {{ requiredEnv "ACME_ISSUER_EMAIL" }}
                    privateKeySecretRef:
                      name: letsencrypt-staging
                    solvers:
                      - dns01:
                          cloudDNS:
                            project: {{ env "DNS_PROJECT_ID" }}
              - apiVersion: cert-manager.io/v1
                kind: ClusterIssuer
                metadata:
                  name: letsencrypt-prod
                spec:
                  acme:
                    server: https://acme-v02.api.letsencrypt.org/directory
                    email: {{ requiredEnv "ACME_ISSUER_EMAIL" }}
                    privateKeySecretRef:
                      name: letsencrypt-prod
                    solvers:
                      - dns01:
                          cloudDNS:
                            project: {{ env "DNS_PROJECT_ID" }}
    EOF

    helmfile -f kube_addons.yaml apply
    helmfile -f issuers.yaml apply
- Install Ratel outside of mesh
    cat << EOF > ratel.yaml
    repositories:
      # https://artifacthub.io/packages/helm/itscontained/raw
      - name: itscontained
        url: https://charts.itscontained.io
    releases:
      - name: ratel
        chart: itscontained/raw
        namespace: ratel
        version: 0.2.5
        disableValidation: true
        values:
          - resources:
              - apiVersion: apps/v1
                kind: Deployment
                metadata:
                  name: dgraph-ratel
                spec:
                  selector:
                    matchLabels:
                      app: dgraph
                      component: ratel
                  replicas: 1
                  template:
                    metadata:
                      labels:
                        app: dgraph
                        component: ratel
                    spec:
                      containers:
                        - name: dgraph-ratel
                          image: docker.io/dgraph/ratel:v21.03.2
                          imagePullPolicy:
                          command:
                            - dgraph-ratel
                          ports:
                            - name: http-ratel
                              containerPort: 8000
              - apiVersion: v1
                kind: Service
                metadata:
                  name: dgraph-ratel
                  labels:
                    app: dgraph
                    component: ratel
                spec:
                  type: ClusterIP
                  ports:
                    - port: 80
                      targetPort: 8000
                      name: http-ratel
                  selector:
                    app: dgraph
                    component: ratel
    EOF

    cat << EOF > ratel_vs.yaml
    repositories:
      # https://artifacthub.io/packages/helm/itscontained/raw
      - name: itscontained
        url: https://charts.itscontained.io
    releases:
      - name: ratel-virtualserver
        chart: itscontained/raw
        namespace: ratel
        version: 0.2.5
        disableValidation: true
        values:
          - resources:
              - apiVersion: k8s.nginx.org/v1
                kind: VirtualServer
                metadata:
                  name: dgraph-http
                spec:
                  host: ratel.{{ requiredEnv "DNS_DOMAIN" }}
                  tls:
                    secret: tls-secret
                    cert-manager:
                      cluster-issuer: {{ requiredEnv "ACME_ISSUER_NAME" }}
                  externalDNS:
                    enable: true
                  upstreams:
                    - name: ratel
                      service: dgraph-ratel
                      port: 80
                  routes:
                    - path: /
                      action:
                        pass: ratel
    EOF

    helmfile -f ratel.yaml apply
    helmfile -f ratel_vs.yaml apply
- Access the website, for example:
curl https://ratel.$DNS_DOMAIN
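The helmfiles above read several variables through `requiredEnv` (`GCR_PROJECT_ID`, `DNS_SA_EMAIL`, `DNS_DOMAIN`, `ACME_ISSUER_EMAIL`, `ACME_ISSUER_NAME`), and only some of them are exported in the steps shown. A minimal pre-flight sketch for anyone reproducing this follows; `check_required_env` is a hypothetical helper name, not part of helmfile, and it treats an empty value the same as an unset one.

```sh
# Pre-flight check (hypothetical helper): print every variable name passed as
# an argument that is unset or empty, and return non-zero if any is missing.
check_required_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

It could be run before the first apply, for example `check_required_env GCR_PROJECT_ID DNS_SA_EMAIL DNS_DOMAIN ACME_ISSUER_EMAIL ACME_ISSUER_NAME && helmfile -f nginx_ic.yaml apply`.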
Expected behavior
I expected the gateway (NGINX+ IC) to route traffic to back-end services that are not meshed as well as to services that are meshed. This matters because Ratel is only a client application, and should it ever be compromised, it should NOT be able to reach the private database cluster or any other service on the mesh.
Actual behavior
In the logs below, I globally replaced my registered domain with example.com.
2022/09/15 03:33:42 [error] 47#47: *378 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:42 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:42 [error] 47#47: *380 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET /apple-touch-icon-precomposed.png HTTP/2.0", upstream: "https://10.104.0.40:8000/apple-touch-icon-precomposed.png", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:42 +0000] "GET /apple-touch-icon-precomposed.png HTTP/2.0" 502 157 "-" "Safari/15608.4.9.1.3 CFNetwork/1121.1.2 Darwin/19.2.0 (x86_64)" "-"
2022/09/15 03:33:42 [error] 47#47: *380 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET /apple-touch-icon.png HTTP/2.0", upstream: "https://10.104.0.40:8000/apple-touch-icon.png", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:42 +0000] "GET /apple-touch-icon.png HTTP/2.0" 502 157 "-" "Safari/15608.4.9.1.3 CFNetwork/1121.1.2 Darwin/19.2.0 (x86_64)" "-"
2022/09/15 03:33:45 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:45 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:47 [error] 47#47: *378 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:47 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:47 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:47 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:51 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:51 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:52 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:52 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:55 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
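Every error line above shows the ingress controller opening a TLS connection (`https://`) to port 8000, the container port declared as `http-ratel` in the Ratel Deployment. A `wrong version number` handshake error is commonly seen when the peer answers with something other than TLS, so the log is consistent with the controller speaking TLS to a plain-HTTP pod. The sketch below condenses such a log into one line per upstream; `summarize_upstream_tls_errors` is a hypothetical helper name, and it assumes the error-line layout shown above.

```sh
# Count TLS handshake failures per upstream (scheme://host:port) from
# nginx error-log lines read on stdin. Access-log lines are ignored.
summarize_upstream_tls_errors() {
  grep 'SSL_do_handshake() failed' \
    | sed -n 's|.*upstream: "\([a-z]*://[^/"]*\).*|\1|p' \
    | sort | uniq -c | sort -rn
}
```

Piping the ingress controller pod's log output into this function should make it easy to confirm that all failures target the same non-meshed upstream.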
Your environment
- Version of the Ingress Controller: nginx/1.21.6 (nginx-plus-r27)
- Version of Kubernetes: 1.22.11
- Kubernetes platform: GKE
- Using NGINX Plus
Additional context
I can provide scripts to provision Cloud DNS, GKE, GCR, and configure access with Google Service Accounts and Workload Identity using gcloud and gsutil if needed.
I also deployed a backend distributed graph database, Dgraph, but since that was supposed to be in the mesh and works fine, I didn't include it here. The Ratel service only bootstraps the client, so it shouldn't have access to the strict service mesh.