This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

waypoint.server.grpc: failed to register hostname: authentication handshake failed: x509 #4904

Open
rastakajakwanna opened this issue Sep 15, 2023 · 2 comments

Comments

@rastakajakwanna

Describe the bug
The URL service stopped working, and Waypoint keeps showing the following confusing message instead:

The release did not provide a URL and the URL service is disabled on the
server, so no further URL information can be automatically provided. If
this is unexpected, please ensure the Waypoint server has both the URL service
enabled and advertise addresses set.

The Waypoint server logs, however, say something completely different (the URL service client initialized successfully):

2023-09-15T13:31:47.616Z [DEBUG] waypoint.server.singleprocess.url_service: API token not set in config, initializing guest account
2023-09-15T13:31:47.617Z [DEBUG] waypoint.server.singleprocess.url_service: using saved URL guest token
2023-09-15T13:31:47.852Z [DEBUG] waypoint.server.singleprocess.url_service: connection is ready
2023-09-15T13:31:47.853Z [INFO]  waypoint.server.singleprocess.url_service: URL service client successfully initialized

This is the error logged when Waypoint tries to register a new hostname with the URL service:

2023-09-15T14:39:39.231Z [ERROR] waypoint.server.grpc: failed to register hostname: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: authentication handshake failed: x509: certificate has expired or is not yet valid: current time 2023-09-15T14:37:44Z is after 2023-09-01T10:30:41Z\""
2023-09-15T14:39:39.231Z [INFO]  waypoint.server.grpc: error creating default hostname: err="rpc error: code = Unavailable desc = failed to register hostname"

I tried the TRACE verbosity level, but it adds no details beyond the information above.

And yes, auto_url is set to true in waypoint.hcl.

I've also made sure -advertise-addr is defined (previously I was using the Helm chart's default values):

      spec:
        containers:
        - args:
          - server
          - run
          - -accept-tos
          - -db=/data/data.db
          - -listen-grpc=0.0.0.0:9701
          - -listen-http=0.0.0.0:9702
          - -listen-http-insecure=0.0.0.0:9703
          - -advertise-addr=waypoint-server.waypoint.svc.cluster.local:9701
          - -advertise-tls-skip-verify=true
          - -url-enabled=true
          - -vv
          command:
          - waypoint
          env:
          - name: HOME
            value: /home/waypoint
          image: docker.io/hashicorp/waypoint:0.11.4

The Waypoint server's self-signed certificate is evidently regenerated on every server start:

processing: https://waypoint-server.waypoint:9701
*   Trying 172.30.32.182:9701...
* Connected to waypoint-server.waypoint (172.30.32.182) port 9701
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN: server did not agree on a protocol. Uses default.
* Server certificate:
*  subject: O=Waypoint
*  start date: Sep 15 14:06:30 2023 GMT
*  expire date: Sep 12 14:06:30 2033 GMT
*  issuer: O=Waypoint
*  SSL certificate verify result: self-signed certificate (18), continuing anyway.
* using HTTP/1.x

Steps to Reproduce
I have no idea what went wrong. Neither the waypoint.hcl content nor our GitLab CI definition changed. Judging from the pipeline output, the problem started around the expiration date of the unknown TLS certificate (2023-09-01T10:30:41Z in the log above).

Expected behavior
A temporary hostname is created when all conditions are met.

I would suggest improving the URL service logging so that all network communication is logged at the TRACE verbosity level. That would help identify the failing endpoint and the real reason for the failure.

Waypoint Platform Versions
Additional version and platform information to help triage the issue if applicable:

  • Waypoint CLI Version: CLI: v0.11.4 (7128fba)
  • Waypoint Server Platform and Version: kubernetes, Server: v0.11.4
  • Waypoint Plugin: kubernetes

Additional context
N/A

@rmmr

rmmr commented Sep 28, 2023

I'm also having this exact same issue.

@supaspoida
Contributor

I've been seeing this as well, and someone has reported something similar on the forums: https://discuss.hashicorp.com/t/url-service-expired-cert/58009

Looking at my Waypoint server logs, I could see that the certificate issue seems related to the https://control.hzn.network domain, which is the default for -url-control-addr on a fresh Waypoint server install. I'm planning to dig in today to find out whether the URL service works without that value set. But since all of this worked previously, it feels like our only recourse is for someone at HashiCorp to kick whatever certificate rotation process is failing on that domain.

That said, the Waypoint URL status reported by HashiCorp has stayed green this whole time, which leaves me some hope that this is a misconfiguration in our installs: https://status.hashicorp.com/
