This repository has been archived by the owner on Mar 14, 2024. It is now read-only.

VirtualServers no longer work when integrated with NSM #75

Closed
darkn3rd opened this issue Sep 15, 2022 · 2 comments


@darkn3rd

Does nginx-ingress-controller have some sort of auto-detection that integrates it with NSM's mTLS?

After deploying NSM with the integrated NGINX+ ingress controller, a VirtualServer for a service that is not in the mesh returns 502 Bad Gateway. This is bad because I want to keep some solutions OUT OF THE MESH so they cannot access protected services.

ACTUAL RESULTS

In the logs below, my registered domain has been globally replaced with example.com.

2022/09/15 03:33:42 [error] 47#47: *378 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:42 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:42 [error] 47#47: *380 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET /apple-touch-icon-precomposed.png HTTP/2.0", upstream: "https://10.104.0.40:8000/apple-touch-icon-precomposed.png", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:42 +0000] "GET /apple-touch-icon-precomposed.png HTTP/2.0" 502 157 "-" "Safari/15608.4.9.1.3 CFNetwork/1121.1.2 Darwin/19.2.0 (x86_64)" "-"
2022/09/15 03:33:42 [error] 47#47: *380 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET /apple-touch-icon.png HTTP/2.0", upstream: "https://10.104.0.40:8000/apple-touch-icon.png", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:42 +0000] "GET /apple-touch-icon.png HTTP/2.0" 502 157 "-" "Safari/15608.4.9.1.3 CFNetwork/1121.1.2 Darwin/19.2.0 (x86_64)" "-"
2022/09/15 03:33:45 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:45 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:47 [error] 47#47: *378 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:47 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:47 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:47 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:51 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:51 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:52 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:52 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:55 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
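
The "wrong version number" error above typically means NGINX sent a TLS handshake to an upstream that answered in plain HTTP. As a rough aid for checking which upstreams are affected, the helper below (a hypothetical function, not part of any NGINX tooling) counts failed upstream handshakes per upstream address from error-log lines on stdin. It assumes the log format shown above.

```shell
# Count failed upstream TLS handshakes per upstream address.
# Reads NGINX error-log lines on stdin; assumes the format shown in this issue.
count_handshake_failures() {
  grep -F 'SSL_do_handshake() failed' \
    | sed -n 's|.*upstream: "\(https://[^/"]*\).*|\1|p' \
    | sort \
    | uniq -c
}
```

One possible use is to pipe the ingress controller pod's logs (for example from `kubectl logs` in the `kube-addons` namespace) into `count_handshake_failures`.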

EXPECTED RESULTS

I expected the gateway (NGINX+ IC) to route traffic to back-end services that are not meshed as well as to services that are meshed. This matters because Ratel is only a client application; should it ever be compromised, it should NOT be able to reach the private database cluster or any other services on the mesh.

STEPS TO REPRODUCE

I used helmfile to encapsulate and configure Helm charts.

  1. Install o11y
    URLS=(https://docs.nginx.com/nginx-service-mesh/examples/{prometheus,grafana,otel-collector,jaeger}.yaml)
    for URL in "${URLS[@]}"; do curl -sOL "$URL"; done
    for FILE in {prometheus,grafana,otel-collector,jaeger}.yaml; do kubectl apply -f "$FILE"; done
  2. Install NSM
    cat << EOF > nsm.yaml
    repositories:
      # https://artifacthub.io/packages/helm/nginx/nginx-service-mesh
      - name: nginx-stable
        url: https://helm.nginx.com/stable
    
    releases:
      - name: nsm
        namespace: nginx-mesh
        chart: nginx-stable/nginx-service-mesh
        values:
          - prometheusAddress: prometheus.nsm-monitoring.svc:9090
            telemetry:
              exporters:
                otlp:
                  host: otel-collector.nsm-monitoring.svc
                  port: 4317
              samplerRatio: 1
            tracing: null
            mtls:
              mode: strict
            autoInjection:
              disable: true
    EOF
    helmfile -f nsm.yaml apply
  3. Install NGINX+ IC
    # assumes NGINX Plus images are in an accessible GCR registry and GCR_PROJECT_ID is exported
    cat << EOF > nginx_ic.yaml
    repositories:
      # https://artifacthub.io/packages/helm/nginx/nginx-ingress
      - name: nginx-stable
        url: https://helm.nginx.com/stable
    
    releases:
      # NOTE: tutorial online uses 'nginx-ingress' for namespace
      - name: nginx-ingress
        namespace: kube-addons
        chart: nginx-stable/nginx-ingress
        version: 0.14.0
        values:
          - controller:
              nginxplus: true
              image:
                repository: gcr.io/{{ requiredEnv "GCR_PROJECT_ID" }}/nginx-plus-ingress
                tag: 2.3.0
              # NGINX Configmap
              config:
                entries:
                  ssl-redirect: "True"
                  http2: "True"
              ingressClass: nginx
              # NGINX IC CRDs
              enableCustomResources: true
              enableCertManager: true
              enableExternalDNS: true
              # Prometheus must be installed
              enableLatencyMetrics: true
            nginxServiceMesh:
              enable: true
              enableEgress: true
    EOF
    helmfile -f nginx_ic.yaml apply
  4. Install External DNS and Cert-Manager
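    NOTE: For real DNS and the ACME DNS01 challenge to work, these services need read/write access to a DNS provider (Route 53, Cloud DNS, Azure DNS, etc.). The snippet below targets GKE with GCR and Cloud DNS.
    export DNS_PROJECT_ID="<your-cloud-dns-zone-project>"
    export DNS_SA_EMAIL="<your-gsa-with-access-to-cloud-dns-zone>"
    export DNS_DOMAIN="example.com" # replace me
    # also read via requiredEnv by issuers.yaml and ratel_vs.yaml (placeholder values)
    export ACME_ISSUER_EMAIL="<your-email>"
    export ACME_ISSUER_NAME="letsencrypt-staging" # or letsencrypt-prod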
    
    cat << EOF > kube_addons.yaml
    repositories:
      # https://artifacthub.io/packages/helm/cert-manager/cert-manager
      - name: jetstack
        url: https://charts.jetstack.io
      # https://artifacthub.io/packages/helm/bitnami/external-dns
      - name: bitnami
        url: https://charts.bitnami.com/bitnami
    
    releases:
      - name: external-dns
        namespace: kube-addons
        chart: bitnami/external-dns
        version: 6.8.1
        values:
          - provider: google
            google:
              zoneVisibility: public
              project: {{ env "DNS_PROJECT_ID" }}
            sources:
              - crd
              - service
              - ingress
            # use with NGINX VirtualServer CRD
            crd:
              create: false
              apiversion: externaldns.nginx.org/v1
              kind: DNSEndpoint
            serviceAccount:
              annotations:
                # GKE Workload Identity annotation
                iam.gke.io/gcp-service-account: {{ requiredEnv "DNS_SA_EMAIL" }}
            nodeSelector:
              # deploy on nodes that support Workload Identity
              iam.gke.io/gke-metadata-server-enabled: "true"
            logLevel: {{ env "EXTERNALDNS_LOG_LEVEL" | default "debug" }}
            domainFilters:
              - {{ requiredEnv "DNS_DOMAIN" }}
            txtOwnerId: external-dns
            rbac:
              create: true
              apiVersion: v1
            policy: upsert-only
    
      - name: cert-manager
        namespace: kube-addons
        chart: jetstack/cert-manager
        version: 1.9.1
        values:
          - installCRDs: true
            extraArgs:
              - --cluster-resource-namespace=kube-addons
            global:
              logLevel: 2
            serviceAccount:
              annotations:
                # GKE Workload Identity annotation
                iam.gke.io/gcp-service-account: {{ requiredEnv "DNS_SA_EMAIL" }}
            nodeSelector:
              # deploy on nodes that support Workload Identity
              iam.gke.io/gke-metadata-server-enabled: "true"
    EOF
    
    cat << EOF > issuers.yaml
    repositories:
      # https://artifacthub.io/packages/helm/itscontained/raw
      - name: itscontained
        url: https://charts.itscontained.io
    
    releases:
      - name: cert-manager-issuers
        chart: itscontained/raw
        namespace: kube-addons
        version:  0.2.5
        disableValidation: true
        values:
          - resources:
              - apiVersion: cert-manager.io/v1
                kind: ClusterIssuer
                metadata:
                  name: letsencrypt-staging
                spec:
                  acme:
                    server: https://acme-staging-v02.api.letsencrypt.org/directory
                    email: {{ requiredEnv "ACME_ISSUER_EMAIL" }}
                    privateKeySecretRef:
                      name: letsencrypt-staging
                    solvers:
                      - dns01:
                          cloudDNS:
                            project: {{ env "DNS_PROJECT_ID" }}
    
              - apiVersion: cert-manager.io/v1
                kind: ClusterIssuer
                metadata:
                  name: letsencrypt-prod
                spec:
                  acme:
                    server: https://acme-v02.api.letsencrypt.org/directory
                    email: {{ requiredEnv "ACME_ISSUER_EMAIL" }}
                    privateKeySecretRef:
                      name: letsencrypt-prod
                    solvers:
                      - dns01:
                          cloudDNS:
                            project: {{ env "DNS_PROJECT_ID" }}
    EOF
    
    helmfile -f kube_addons.yaml apply
    helmfile -f issuers.yaml apply
  5. Install Ratel outside of mesh
    cat << EOF > ratel.yaml
    repositories:
      # https://artifacthub.io/packages/helm/itscontained/raw
      - name: itscontained
        url: https://charts.itscontained.io
    
    releases:
      - name: ratel
        chart: itscontained/raw
        namespace: ratel
        version:  0.2.5
        disableValidation: true
        values:
          - resources:
              - apiVersion: apps/v1
                kind: Deployment
                metadata:
                  name: dgraph-ratel
                spec:
                  selector:
                    matchLabels:
                      app: dgraph
                      component: ratel
                  replicas: 1
                  template:
                    metadata:
                      labels:
                        app: dgraph
                        component: ratel
                    spec:
                      containers:
                        - name: dgraph-ratel
                          image: docker.io/dgraph/ratel:v21.03.2
                          imagePullPolicy:
                          command:
                            - dgraph-ratel
                          ports:
                            - name: http-ratel
                              containerPort: 8000
    
              - apiVersion: v1
                kind: Service
                metadata:
                  name: dgraph-ratel
                  labels:
                    app: dgraph
                    component: ratel
                spec:
                  type: ClusterIP
                  ports:
                    - port: 80
                      targetPort: 8000
                      name: http-ratel
                  selector:
                    app: dgraph
                    component: ratel
    EOF
    
    cat << EOF > ratel_vs.yaml
    repositories:
      # https://artifacthub.io/packages/helm/itscontained/raw
      - name: itscontained
        url: https://charts.itscontained.io
    
    releases:
      - name: ratel-virtualserver
        chart: itscontained/raw
        namespace: ratel
        version:  0.2.5
        disableValidation: true
        values:
          - resources:
              - apiVersion: k8s.nginx.org/v1
                kind: VirtualServer
                metadata:
                  name: dgraph-http
                spec:
                  host: ratel.{{ requiredEnv "DNS_DOMAIN" }}
                  tls:
                    secret: tls-secret
                    cert-manager:
                      cluster-issuer: {{ requiredEnv "ACME_ISSUER_NAME" }}
                  externalDNS:
                    enable: true
                  upstreams:
                    - name: ratel
                      service: dgraph-ratel
                      port: 80
                  routes:
                    - path: /
                      action:
                        pass: ratel
    EOF
    
    helmfile -f ratel.yaml apply
    helmfile -f ratel_vs.yaml apply
  6. Access the website, for example:
    curl https://ratel.$DNS_DOMAIN

WORKAROUNDS

As a workaround, the sidecar proxy container has to be injected into the workload for any VirtualServer to work (and, I deduce, possibly any Ingress as well).

helmfile --file ratel.yaml template \
  | nginx-meshctl inject \
  | kubectl apply --namespace "ratel" --filename -

But now anything running in this pod can access private services on the mesh. So either another layer must be added to further restrict access from this service, or network policies are required to wall off access.
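
A minimal sketch of the network-policy option, under stated assumptions: the protected database lives in a namespace called `dgraph` (a hypothetical name, not from this setup), the cluster's CNI actually enforces NetworkPolicy, and namespaces carry the automatic `kubernetes.io/metadata.name` label. The policy admits ingress to every pod in that namespace from pods in any namespace except `ratel`. It is untested against NSM and only writes the manifest to a file.

```shell
# Writes a NetworkPolicy manifest; "dgraph" is an assumed namespace for the protected services.
cat << EOF > protected_netpol.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-from-ratel
  namespace: dgraph
spec:
  # select every pod in the protected namespace
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        # allow pods from any namespace whose name is not "ratel"
        - namespaceSelector:
            matchExpressions:
              - key: kubernetes.io/metadata.name
                operator: NotIn
                values:
                  - ratel
EOF
```

The file could then be applied with `kubectl apply --filename protected_netpol.yaml`. Because only pod peers are listed, traffic from sources outside the cluster's pod network would also be blocked, which may or may not be wanted.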

OTHER

I noticed that nginx-meshctl errors out if a VirtualServer resource is among the manifests. The tool shouldn't error out on CRDs created by NGINX. I would file a bug for that one, but the free tool has no open-source repository.

@sjberman
Collaborator

sjberman commented Sep 15, 2022

Per our documentation:

"All communication between NGINX Plus Ingress Controller and the upstream Services occurs over mTLS, using the certificates and keys generated by the SPIRE server. Therefore, NGINX Plus Ingress Controller can only route traffic to Services in the mesh that have an mtls-mode of permissive or strict. In cases where you need to route traffic to both mTLS and non-mTLS Services, you may need another Ingress Controller that does not participate in the mTLS fabric."

I believe this is the issue you're dealing with.
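
The "another Ingress Controller" option from the quoted documentation might look like the helmfile below. This is a sketch rather than a verified setup: the release name `nginx-ingress-nomesh`, the namespace `ingress-nomesh`, and the class `nginx-nomesh` are placeholders, and the values keys are the ones already used in step 3 of the issue. It has not been tested alongside an existing mesh-enabled release.

```shell
# Writes a helmfile for a second, non-mesh NGINX+ IC release (names are placeholders).
cat << EOF > nginx_ic_nomesh.yaml
repositories:
  - name: nginx-stable
    url: https://helm.nginx.com/stable

releases:
  - name: nginx-ingress-nomesh
    namespace: ingress-nomesh
    chart: nginx-stable/nginx-ingress
    version: 0.14.0
    values:
      - controller:
          nginxplus: true
          image:
            repository: gcr.io/{{ requiredEnv "GCR_PROJECT_ID" }}/nginx-plus-ingress
            tag: 2.3.0
          # distinct class so this controller only handles resources addressed to it
          ingressClass: nginx-nomesh
          enableCustomResources: true
          enableCertManager: true
          enableExternalDNS: true
        nginxServiceMesh:
          # this controller stays outside the mTLS fabric
          enable: false
EOF
```

After `helmfile -f nginx_ic_nomesh.yaml apply`, the Ratel VirtualServer would presumably need `ingressClassName: nginx-nomesh` in its spec so that the non-mesh controller picks it up.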

Regarding your second issue about nginx-meshctl and CRDs, you can open an issue in this repository and discuss your details there.

@darkn3rd
Author

darkn3rd commented Sep 16, 2022

I will close this one then. It is not ideal, but it is designed this way.
