BadRequest error when configuring external Vault with Kubernetes authentication #3637

Closed
GlassOfWhiskey opened this issue Feb 7, 2021 · 5 comments
Labels: kind/bug, lifecycle/rotten

@GlassOfWhiskey

When trying to configure Vault as an external certificate Issuer, the ClusterIssuer resource never reaches the Ready status and reports the following message:

Failed to initialize Vault client: error reading Kubernetes service account token from issuer-token-9dzk8: error calling Vault server: Error making API request. URL: POST http://vault.cert-manager:8200/v1/auth/kubernetes/login Code: 400. Raw Message: 400 Bad Request
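
For reference, the reported status message can be read back with a standard kubectl command (the issuer name is the one defined in the reproduction steps below):

kubectl describe clusterissuer vault-cluster-issuer   # Status/Conditions shows Ready=False with the message above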

After debugging with tcpdump, the request that generates the 400 error is the following:

09:50:03.089664 IP 252.135.0.1.1296 > juju-df5650-145-lxd-3.8200: Flags [P.], seq 4142211567:4142212651, ack 3722483870, win 502, options [nop,nop,TS val 3742515432 ecr 3797097798], length 1084
E..p.y@.?.+......BB... ...!.........du.....
..@..S.FPOST v1/auth/kubernetes/login HTTP/1.1
Host: vault.cert-manager:8200
User-Agent: Go-http-client/1.1
Content-Length: 935
Accept-Encoding: gzip

{"jwt":"***","role":"issuer"}

where the content of the JWT has been substituted with stars.
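
A dump like the one above can be obtained with something along these lines (the interface and the port filter are assumptions matching the Vault service, not taken from the report):

tcpdump -i any -nn -A 'tcp port 8200'   # -A prints the HTTP payload in ASCII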

Now, the same request issued with curl from a debugging Alpine pod in the cert-manager namespace, forcing all the headers to be identical with the following command

curl -v --request POST -H "User-Agent: Go-http-client/1.1" -H "Accept:" -H "Content-Type:" -H "Accept-Encoding: gzip" -d $'{"jwt":"***","role":"issuer"}' http://vault.cert-manager:8200/v1/auth/kubernetes/login

produces a correct result:

{
  "request_id": "7947cf73-04a6-9dc9-74eb-9d59d2df83ef",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": null,
  "wrap_info": null,
  "warnings": null,
  "auth": {
    "client_token": "***",
    "accessor": "***",
    "policies": [
      "default",
      "kubernetes-pki"
    ],
    "token_policies": [
      "default",
      "kubernetes-pki"
    ],
    "metadata": {
      "role": "issuer",
      "service_account_name": "issuer",
      "service_account_namespace": "cert-manager",
      "service_account_secret_name": "issuer-token-9dzk8",
      "service_account_uid": "683576fd-e3e1-40ff-9282-ddb63f210df8"
    },
    "lease_duration": 1200,
    "renewable": true,
    "entity_id": "d88d11fb-e711-d90c-885c-266f62b77a51",
    "token_type": "service",
    "orphan": true
  }
}
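
For completeness, the JWT used in this curl test can be extracted from the service account token secret named in the error message (a sketch, assuming the token-secret layout used by these Kubernetes versions):

kubectl get secret issuer-token-9dzk8 -n cert-manager -o jsonpath='{.data.token}' | base64 -d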

Capturing this request with tcpdump as well, the output is the following:

09:58:05.241387 IP 252.153.0.1.6031 > juju-df5650-145-lxd-3.8200: Flags [P.], seq 3290090115:3290091200, ack 829201698, win 502, options [nop,nop,TS val 416887426 ecr 770110039], length 1085
E..q..@.?.
......BB... .....1l.".....Y.....
..2.-..WPOST /v1/auth/kubernetes/login HTTP/1.1
Host: vault.cert-manager:8200
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip
Content-Length: 935

{"jwt":"***","role":"issuer"}

The only difference I can observe is in the length, which differs by 1 byte (1084 vs. 1085): comparing the two captures, the missing byte is the leading slash in the request line (POST v1/auth/kubernetes/login vs. POST /v1/auth/kubernetes/login), while all the headers and the content are the same.

Expected behaviour:
The ClusterIssuer should reach the Ready state.

Steps to reproduce the bug:

  • Set up an external Vault cluster
  • Connect it to Kubernetes using Service and Endpoints resources
    kind: Service
    apiVersion: v1
    metadata:
      name: vault
      namespace: cert-manager
    spec:
      type: ClusterIP
      ports:
      - port: 8200
        targetPort: 8200
    ---
    kind: Endpoints
    apiVersion: v1
    metadata:
      name: vault
      namespace: cert-manager
    subsets:
    - addresses:
      - ip: <vault1.ip>
      ports:
      - port: 8200
  • Enable a PKI secrets engine, configuring the TTL to one year
    vault secrets enable \
       -path='kubernetes-pki' \
       -description='Kubernetes cert-manager PKI backend' \
       -default-lease-ttl=8760h \
       -max-lease-ttl=8760h \
       pki
  • Generate an internal root CA, then configure the CA/CRL URLs and a signing role
    vault write kubernetes-pki/root/generate/internal \
       common_name='Vault Root Certification Authority (kubernetes-pki)' \
       ttl=87599h
    
    vault write kubernetes-pki/config/urls \
       issuing_certificates='http://vault.cert-manager:8200/v1/kubernetes-pki/ca' \
       crl_distribution_points='http://vault.cert-manager:8200/v1/kubernetes-pki/crl'
    
    vault write kubernetes-pki/roles/hpc4ai \
       allowed_domains='my.domain.com' \
       allow_subdomains=true \
       max_ttl=72h
  • Create a Vault policy (vault policy write kubernetes-pki -) with the following content
    path "kubernetes-pki*" {
       capabilities = ["read", "list"]
    }
    
    path "kubernetes-pki/roles/hpc4ai" {
       capabilities = ["create", "update"]
    }
    
    path "kubernetes-pki/sign/hpc4ai" {
       capabilities = ["create", "update"]
    }
    
    path "kubernetes-pki/issue/hpc4ai" {
       capabilities = ["create"]
    }
    
  • Configure a ServiceAccount bound to the system:auth-delegator ClusterRole
    kind: ServiceAccount
    apiVersion: v1
    metadata:
      name: issuer
      namespace: cert-manager
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: role-vault-tokenreview-binding
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:auth-delegator
    subjects:
      - kind: ServiceAccount
        name: issuer
        namespace: cert-manager
  • Configure the Vault Kubernetes authentication (a CLI sanity check for this step is sketched after this list)
    vault auth enable kubernetes
    
    vault write auth/kubernetes/config \
       token_reviewer_jwt='<vault-auth token>' \
       kubernetes_host='<Kubernetes Master Host>' \
       kubernetes_ca_cert='<Kubernetes Cluster CA>'
    
    vault write auth/kubernetes/role/issuer \
       bound_service_account_names='issuer' \
       bound_service_account_namespaces='cert-manager' \
       policies='kubernetes-pki' \
       ttl=20m
  • Configure a ClusterIssuer in Kubernetes
    kind: ClusterIssuer
    apiVersion: cert-manager.io/v1
    metadata:
      name: vault-cluster-issuer
      namespace: cert-manager
    spec:
      vault:
        path: kubernetes-pki/sign/hpc4ai
        server: http://vault.cert-manager:8200
        auth:
          kubernetes:
            mountPath: v1/auth/kubernetes
            role: issuer
            secretRef:
              name: <Issuer Secret Name>
              key: token
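
As noted above, the Kubernetes auth configuration can be sanity-checked directly with the Vault CLI before involving cert-manager; a sketch, reusing the issuer role from the steps above and a service account JWT extracted as shown earlier:

# obtain the service account JWT (same command as in the curl test above)
SA_JWT=$(kubectl get secret issuer-token-9dzk8 -n cert-manager -o jsonpath='{.data.token}' | base64 -d)

# attempt a login against the same auth mount the ClusterIssuer uses
vault write auth/kubernetes/login role=issuer jwt="$SA_JWT"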

Environment details:

  • Kubernetes version: v1.17.17 and v1.20.2 (bug happens with both versions)
  • Cloud-provider/provisioner: OpenStack Stein
  • cert-manager version: v1.1.0
  • Install method: Helm chart

/kind bug

@jetstack-bot added the kind/bug label on Feb 7, 2021

@vineetmadan commented Jul 6, 2021

Any update on this? I'm having the same issue with both Issuer and ClusterIssuer.

@jetstack-bot (Collaborator) commented

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale

@jetstack-bot added the lifecycle/stale label on Oct 4, 2021

@jetstack-bot (Collaborator) commented

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle rotten
/remove-lifecycle stale

@jetstack-bot added the lifecycle/rotten label and removed the lifecycle/stale label on Nov 3, 2021

@jetstack-bot (Collaborator) commented

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to jetstack.
/close

@jetstack-bot (Collaborator) commented

@jetstack-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to jetstack.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
