HTTP-01 IPv6 to IPv4 fallback not working properly #2770

cpu · 2017-05-17T16:44:53Z

A user in IRC noticed that they were suffering HTTP-01 validation failures for a domain that previously worked. On investigation, it appears the domain had both an AAAA record and an A record, but the AAAA address wasn't working. I expected the IPv6 to IPv4 fallback code to mask this issue, but the validation record shows it did not: addressesTried is empty and addressUsed is the IPv6 address:

    "validationRecord": [
      {
        "url": "http://xxxxx/.well-known/acme-challenge/XXXXXXXXXXXXX",
        "hostname": "XXXXX",
        "port": "80",
        "addressesResolved": [
          "92.XXX.XXX.XXX",
          "2001:XXXX:XXXX:XXXX::111"
        ],
        "addressUsed": "2001:XXXX:XXXX:XXXX::111",
        "addressesTried": null
      }
    ]

The VA logged:

    HTTP request to http://xxxxx/.well-known/acme-challenge/XXXXXXXXXXXXX failed. err=[&url.Error{Op:"Get", URL:"http://xxxxx/.well-known/acme-challenge/XXXXXXXXXXXXX", Err:(*http.httpError)(0xc420c89260)}] errStr=[Get http://xxxxx/.well-known/acme-challenge/XXXXXXXXXXXXX: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)]

The root cause is that the VA's HTTP-01 dialer wrapper re-uses the same underlying net.Dialer for the initial connection and the subsequent fallback connection, so by the time the fallback runs the timeout has already been expended.

cpu · 2017-05-22T14:56:19Z

One false-positive for this issue I've seen so far is a host with an A and AAAA record failing an HTTP-01 challenge because the webserver on the AAAA IP returned a 404 while the A webserver had the correct webroot configured. This doesn't meet the conditions for the retry because the failure is at the HTTP challenge validation level and not the IP connectivity level.
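The distinction drawn above, between a failure at the IP connectivity level (which qualifies for a retry over IPv4) and a failure at the HTTP level such as a 404 (which does not), can be sketched as follows. This is only an illustration and not Boulder's actual code: `shouldFallback`, `errDial`, and `errHTTPStatus` are hypothetical names introduced for this example.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// Example errors, both made up for this sketch.
var (
	// errDial models a connection-level failure on the IPv6 address.
	errDial = &net.OpError{Op: "dial", Net: "tcp6", Err: errors.New("network is unreachable")}
	// errHTTPStatus models an application-level failure such as a 404.
	errHTTPStatus = errors.New("unexpected HTTP status 404")
)

// shouldFallback is a hypothetical helper. It reports whether an error from
// the first (IPv6) attempt is a connection-level failure, the only kind for
// which retrying over IPv4 would be appropriate under the rule described above.
func shouldFallback(err error) bool {
	if err == nil {
		return false
	}
	// Timeouts reported by the network stack.
	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		return true
	}
	// Failures to establish the TCP connection (refused, unreachable, ...).
	var opErr *net.OpError
	return errors.As(err, &opErr) && opErr.Op == "dial"
}

func main() {
	fmt.Println("dial error:", shouldFallback(errDial))
	fmt.Println("HTTP 404:  ", shouldFallback(errHTTPStatus))
}
```

A server that answers on the AAAA address with the wrong content never produces a dial error, so under this rule it is treated as a misconfiguration rather than a reason to fall back.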

jsha · 2017-05-22T15:07:11Z

That seems like a situation where it's consistent with our other behavior to treat the validation as failed due to a misconfigured server.

cpu · 2017-05-22T15:19:59Z

@jsha I agree, that's why I called it a false positive.

jangrewe · 2017-05-22T15:30:30Z

This also happened to me, although with dehydrated.
The issue for me was that the IPv6 address in my AAAA wasn't properly routed by my ISP, but the IPv4 still worked fine.

    Processing pokemap.berlin with alternative names: www.pokemap.berlin dev.pokemap.berlin
     + Checking domain name(s) of existing cert... changed!
     + Domain name(s) are not matching!
     + Names in old certificate: dev.pokemap.berlin pmg.faked.org pokemap.berlin www.pokemap.berlin
     + Configured names: dev.pokemap.berlin pokemap.berlin www.pokemap.berlin
     + Forcing renew.
     + Checking expire date of existing cert...
     + Valid till Jun 14 22:01:00 2017 GMT Certificate will expire
    (Less than 30 days). Renewing!
     + Signing domains...
     + Generating private key...
     + Generating signing request...
     + Requesting challenge for pokemap.berlin...
     + Requesting challenge for www.pokemap.berlin...
     + Requesting challenge for dev.pokemap.berlin...
     + Responding to challenge for pokemap.berlin...
     + Responding to challenge for www.pokemap.berlin...
     + Responding to challenge for dev.pokemap.berlin...
    ERROR: Challenge is invalid! (returned: invalid) (result: {
      "type": "http-01",
      "status": "invalid",
      "error": {
        "type": "urn:acme:error:connection",
        "detail": "Could not connect to dev.pokemap.berlin",
        "status": 400
      },
      "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/aL5EsF2NuL8zBQHzgBw8qFxKFmnrlt91RdfNxm30lAk/1210320188",
      "token": "MQyfKns3ATGGPMS-pKSieBlrWpjE86FIj2pnuYcAFfo",
      "keyAuthorization": "MQyfKns3ATGGPMS-pKSieBlrWpjE86FIj2pnuYcAFfo.ymn7rrjFsLBQUTzWYgdoacDjsIe-B36saKrAYkAh2Tk",
      "validationRecord": [
        {
          "url": "http://dev.pokemap.berlin/.well-known/acme-challenge/MQyfKns3ATGGPMS-pKSieBlrWpjE86FIj2pnuYcAFfo",
          "hostname": "dev.pokemap.berlin",
          "port": "80",
          "addressesResolved": [
            "87.128.111.190",
            "2003:a:37f:ef4f::"
          ],
          "addressUsed": "2003:a:37f:ef4f::",
          "addressesTried": []
        }
      ]
    })
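For anyone trying to recognise this symptom in their own client output, here is a minimal sketch that parses a validation record and checks for the pattern seen in this issue: an IPv4 address was resolved, the IPv6 address was used, and nothing is listed in addressesTried. The struct covers only the fields quoted above, and `parseRecord` and `noFallbackAttempted` are hypothetical helper names. Note that the same pattern also appears when the IPv6 connection succeeded and the challenge failed at the HTTP level, so it shows that no fallback happened, not why.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// validationRecord holds the subset of fields shown in the records quoted in
// this issue.
type validationRecord struct {
	Hostname          string   `json:"hostname"`
	AddressesResolved []string `json:"addressesResolved"`
	AddressUsed       string   `json:"addressUsed"`
	AddressesTried    []string `json:"addressesTried"`
}

// sampleRecord is the record from the output above, trimmed to those fields.
const sampleRecord = `{
  "hostname": "dev.pokemap.berlin",
  "addressesResolved": ["87.128.111.190", "2003:a:37f:ef4f::"],
  "addressUsed": "2003:a:37f:ef4f::",
  "addressesTried": []
}`

// parseRecord decodes one validation record from JSON.
func parseRecord(data string) (validationRecord, error) {
	var rec validationRecord
	err := json.Unmarshal([]byte(data), &rec)
	return rec, err
}

// noFallbackAttempted (hypothetical helper) reports whether the record shows
// an IPv6 address in use, no earlier addresses tried, and at least one IPv4
// address that could have been used instead.
func noFallbackAttempted(rec validationRecord) bool {
	used := net.ParseIP(rec.AddressUsed)
	if used == nil || used.To4() != nil {
		return false // unparsable, or IPv4 was used
	}
	if len(rec.AddressesTried) != 0 {
		return false // something was tried before the address used
	}
	for _, a := range rec.AddressesResolved {
		if ip := net.ParseIP(a); ip != nil && ip.To4() != nil {
			return true
		}
	}
	return false
}

func main() {
	rec, err := parseRecord(sampleRecord)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(rec.Hostname, "no fallback attempted:", noFallbackAttempted(rec))
}
```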

sahsanu · 2017-05-22T15:44:00Z

Regarding this ipv6 preference https://community.letsencrypt.org/t/certbot-ipv6-address-on-domain-misconfigured-and-challenges-fail-prefer-ipv6/34626

In this case the user has a domain with both records, A and AAAA, but the web server is only configured for IPv4. The IPv6 request reaches the web server but not the right virtual host, so the challenge obviously fails. I don't know whether it is worth falling back to IPv4 in this case.

cpu · 2017-05-22T15:45:43Z

@sahsanu - Thanks for commenting. That thread is the same one I mentioned earlier in this issue as a false positive (I should have linked to it, apologies). In this case I don't expect a fallback and everything appears to be working as intended.

jsha · 2017-05-22T16:17:27Z

Got it! I didn't understand that part. :-)

sahsanu · 2017-05-23T13:06:16Z

Just another "false positive" https://community.letsencrypt.org/t/404-on-well-known-acme-challenge-but-accessable-from-browser/34730

I can understand the decision to prefer AAAA if both records are available for a domain, but sadly we are still living in an IPv4 world. The use case for IPv6 is very limited, as the majority of domestic ISPs don't provide IPv6 to their customers. Also, a lot of people rent dedicated, VPS, or shared hosting where the hoster auto-configures DNS with both addresses (IPv4 and IPv6), but people don't care about IPv6 (yet) and don't configure their services to use it properly, so I'm afraid we will see a lot of cases with this "false positive" issue ;).

cpu · 2017-05-23T13:49:02Z

> so I'm afraid we will see a lot of cases with this "false positive" issue ;).

I will be posting an announcement in the community forum about the IPv6 preference today. Hopefully that will help clear up the confusion.

Ultimately if you run a website that publishes an AAAA address that doesn't work you're going to run into problems sooner or later!

jsha · 2017-05-30T17:53:47Z

From the entry in https://community.letsencrypt.org/t/unable-to-update-challenge-the-challenge-is-not-pending/35118/3, looking in the logs, it appears that the problem was a timeout. So one possibility is that the fallback doesn't happen correctly on timeouts, perhaps because the first try uses up all of the available time?

cpu · 2017-06-12T13:35:59Z

@jsha That's indeed a possibility. We didn't increase the timeout to accommodate making two back-to-back requests.

mhofman · 2017-06-13T07:57:49Z

Same problem here. The IPv6 address times out since the HTTP server isn't listening on that interface, and the IPv4 is never checked ("addressesTried": []).

Why not run both in parallel and let the faster one win? If you want to give IPv6 a preference, you can start that check a second before the IPv4.
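The suggestion above (run both attempts in parallel, give IPv6 a head start, let the first success win) could look roughly like the sketch below. It is not part of Boulder; `attempt`, `raceWithHeadStart`, and `runDemo` are hypothetical names, and the two attempts are simulated rather than real network requests. Go's `net.Dialer` applies a similar RFC 6555 scheme (see its `FallbackDelay` field) when asked to dial a host name; the sketch does it by hand to make the head start explicit.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// attempt is a hypothetical stand-in for "connect to one address and fetch
// the challenge response".
type attempt func(ctx context.Context) (string, error)

// raceWithHeadStart starts preferred immediately and starts fallback once
// headStart has elapsed, or earlier if preferred has already failed. The
// first successful result wins and the other attempt is cancelled.
func raceWithHeadStart(ctx context.Context, preferred, fallback attempt, headStart time.Duration) (string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // stops whichever attempt is still running on return

	type result struct {
		val string
		err error
	}
	results := make(chan result, 2) // buffered so a late finisher never blocks
	run := func(a attempt) {
		go func() {
			v, err := a(ctx)
			results <- result{v, err}
		}()
	}

	run(preferred)
	timer := time.NewTimer(headStart)
	defer timer.Stop()

	pending, fallbackStarted := 1, false
	var lastErr error
	for pending > 0 {
		select {
		case <-timer.C:
			if !fallbackStarted {
				fallbackStarted = true
				pending++
				run(fallback)
			}
		case r := <-results:
			pending--
			if r.err == nil {
				return r.val, nil
			}
			lastErr = r.err
			if !fallbackStarted { // preferred failed fast: do not wait for the timer
				fallbackStarted = true
				pending++
				run(fallback)
			}
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
	return "", lastErr
}

// runDemo simulates a working or a blackholed IPv6 path.
func runDemo(ipv6Works bool) (string, error) {
	ipv6 := func(ctx context.Context) (string, error) {
		if ipv6Works {
			return "ipv6", nil
		}
		<-ctx.Done() // hang until cancelled, like an unroutable address
		return "", ctx.Err()
	}
	ipv4 := func(ctx context.Context) (string, error) { return "ipv4", nil }
	return raceWithHeadStart(context.Background(), ipv6, ipv4, 50*time.Millisecond)
}

func main() {
	for _, works := range []bool{true, false} {
		winner, err := runDemo(works)
		fmt.Println("ipv6 works:", works, "winner:", winner, "err:", err)
	}
}
```

One trade-off of this design is that both addresses may receive a request for the same challenge, which the sequential fallback avoids.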

mirion · 2017-06-15T09:25:12Z

Any news on this issue? In my case, the IPv6 address is unreachable (out of my direct control). For example curl -v6 returns "Immediate connect fail", "Network is unreachable".

Thanks

cpu · 2017-06-15T12:54:27Z

No news - if the issue isn't assigned & placed into a milestone for a sprint then it's safe to assume it isn't being actively worked on yet.

ArchangeGabriel · 2017-07-01T21:51:33Z

I’ve just stumbled upon this. My server apparently drops its IPv6 connectivity from time to time (no idea why at this point), and then the challenge verification fails by timing out. I would have expected a fallback to IPv4, but apparently not.

cpu · 2017-07-07T21:19:50Z

I've updated this issue description to reflect my understanding after some debugging & working on a fix. The fallback problem is isolated to HTTP-01 and I have opened #2852 with a proposed fix.

The implementation of the dialer used by the HTTP01 challenge, constructed with `resolveAndConstructDialer`, used the same wrapped `net.Dialer` for both the initial IPv6 connection, and any subsequent IPv4 fallback connections. This caused the IPv4 fallback to never succeed for cases where the initial IPv6 connection expended the `validationTimeout`.

This commit updates the http01Dialer (newly renamed from `dialer` since it is in fact specific to HTTP01 challenges) to use a fresh dialer for each connection.

To facilitate testing the http01Dialer maintains a count of how many dialer instances it has constructed. We use this in a unit test to ensure the correct behaviour without a great deal of new mocking/interfaces.

Resolves #2770
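To illustrate the idea of a fresh dialer per connection, here is a minimal sketch. It is not the code from #2852: `fallbackDialer`, its fields, and `demo` are hypothetical, and the per-attempt timeout is an assumption made for the example. The sketch borrows the testing idea from the commit message by counting how many dialers were constructed. The demo uses two loopback ports instead of real IPv6 and IPv4 addresses.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"time"
)

// fallbackDialer is a hypothetical illustration, not Boulder's http01Dialer.
type fallbackDialer struct {
	addrs          []string      // preferred address first, e.g. IPv6 then IPv4
	perAttempt     time.Duration // timeout applied to each attempt separately
	dialerCount    int           // number of net.Dialer values constructed
	addressesTried []string      // addresses that failed before the one used
	addressUsed    string        // last address a connection was attempted to
}

// dial tries each address in order and returns the first connection made.
func (d *fallbackDialer) dial(ctx context.Context) (net.Conn, error) {
	if len(d.addrs) == 0 {
		return nil, errors.New("no addresses to dial")
	}
	var lastErr error
	for _, addr := range d.addrs {
		// The dialer and its timeout are created inside the loop, so every
		// attempt gets its own full budget. A deadline fixed once before the
		// loop would instead be shared, and a slow first failure could leave
		// nothing for the fallback.
		nd := &net.Dialer{Timeout: d.perAttempt}
		d.dialerCount++
		d.addressUsed = addr
		conn, err := nd.DialContext(ctx, "tcp", addr)
		if err == nil {
			return conn, nil
		}
		d.addressesTried = append(d.addressesTried, addr)
		lastErr = err
	}
	return nil, lastErr
}

// demo dials a closed loopback port first and a listening one second.
func demo() (*fallbackDialer, error) {
	live, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	defer live.Close()
	dead, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	deadAddr := dead.Addr().String()
	dead.Close() // nothing listens here any more, so dialing it fails

	d := &fallbackDialer{
		addrs:      []string{deadAddr, live.Addr().String()},
		perAttempt: time.Second,
	}
	conn, err := d.dial(context.Background())
	if err != nil {
		return d, err
	}
	conn.Close()
	return d, nil
}

func main() {
	d, err := demo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("dialers constructed:", d.dialerCount)
	fmt.Println("tried:", d.addressesTried, "used:", d.addressUsed)
}
```

A per-attempt budget only helps if whatever wraps the dial (for example an overall request timeout) also leaves room for two attempts, which is the concern raised earlier in this thread.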

derekatkins · 2017-10-30T15:45:23Z

Hi,
Does this require a change on the client or is this a server change? I just started hitting this issue myself. My IPv6 service is "broken" so, even though I have both A and AAAA records, only IPv4 will successfully reach my server. I'm getting this timeout when I try to renew.
I just upgraded certbot to certbot-0.19.0-1.fc25.noarch but it didn't seem to fix the problem.
If it requires a change to the service, has this change been pushed to the LetsEncrypt service?

cpu · 2017-10-30T15:54:36Z

Hi @derekatkins - the fallback behaviour is a server-side change, and has been deployed to production already.

The catch is that it's not a complete solution for 100% of all broken IPv6 configurations. In practice there are a handful of cases where IPv6 will not validate for ACME and is broken, but in which the actual IPv6 connectivity works enough to prevent a fallback from occurring. At this point we've decided that we can't invest any more resources in improving the fallback and are not pursuing additional improvements to the server-side code.

> My IPv6 service is "broken" so, even though I have both A and AAAA records, only IPv4 will successfully reach my server. I'm getting this timeout when I try to renew.

@derekatkins I recommend that you resolve the IPv6 connectivity or remove the AAAA record entirely. Unfortunately these are the only two options that will be able to fix your problem.

If you need further help diagnosing the problem I recommend starting a new forum topic in the Let's Encrypt Community Forum. Thanks!
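As a starting point for checking whether a published AAAA address actually works, the sketch below resolves a host name and attempts a TCP connection to port 80 on every address returned, roughly what running curl against each address by hand would tell you. `probeHost` and `defaultTimeout` are hypothetical names chosen for this example, and the result reflects connectivity from the machine running it, which may differ from what the Let's Encrypt validation servers see.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"time"
)

// defaultTimeout is an arbitrary value picked for this sketch.
const defaultTimeout = 5 * time.Second

// probeHost resolves host and tries to open a TCP connection to port on each
// resolved address. The map holds nil for reachable addresses and the dial
// error for the others.
func probeHost(host, port string, timeout time.Duration) (map[string]error, error) {
	ctx := context.Background()
	ips, err := net.DefaultResolver.LookupIPAddr(ctx, host)
	if err != nil {
		return nil, err
	}
	results := make(map[string]error, len(ips))
	for _, ip := range ips {
		d := &net.Dialer{Timeout: timeout}
		conn, err := d.DialContext(ctx, "tcp", net.JoinHostPort(ip.String(), port))
		if err == nil {
			conn.Close()
		}
		results[ip.String()] = err
	}
	return results, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: probe <hostname>")
		return
	}
	results, err := probeHost(os.Args[1], "80", defaultTimeout)
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	for addr, dialErr := range results {
		if dialErr != nil {
			fmt.Printf("%s: FAILED (%v)\n", addr, dialErr)
		} else {
			fmt.Printf("%s: ok\n", addr)
		}
	}
}
```

If the IPv6 address fails here while the IPv4 address succeeds, the two options given above apply: repair the IPv6 path or remove the AAAA record.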

derekatkins · 2017-10-30T16:05:51Z

Interesting. I would think that a lack-of-connect would trigger the fallback after the connect() times out -- which is my case.

I'll work on getting the AAAA records removed (I don't control the DNS) until I get IPv6 working again.

darkain · 2018-03-28T15:54:40Z

I just ran into this issue as well. The error messages in the main client were not descriptive enough to even clue me in to why I was receiving timeouts, and only on one of my domains (I have several with shared IP addresses). After an entire day of investigating, it became apparent that this was the only domain with dual-stack listed in DNS, and that there were some IPv6 routing issues upstream between Let's Encrypt and my servers. In this particular case, because no TCP connection could even be made and the attempt just timed out, shouldn't that qualify as a condition for downgrading to IPv4? This is EXACTLY how web browsers handle this exact situation. Sadly, even today in 2018, there are still routing issues with the IPv6 global network at the backbone/BGP level. Because of this, my production web site literally went offline: I could not renew certs through Let's Encrypt and simply hit the rate limit (only 5?), when it really seems like an IPv4 fallback should have been preferred.

alfonsonishikawa · 2018-05-14T15:55:55Z

Any updates on this? I rely on IPv4 connection too. I don't have an AAAA record defined, only an A record, and it still does not work.

jsha · 2018-05-14T17:22:33Z

> I don't have AAAA record defined, only A, and still does not work.

It sounds like you have a different problem. I recommend posting on https://community.letsencrypt.org/. Thanks!

navara · 2018-07-21T07:15:25Z

Another thumbs up for this problem.

We do have two domains with IPv6 on port 443 enabled, and those update certificates correctly. The remaining domains are for our own use, however, and not published to clients, so without IPv4 (no need for it, only universities use IPv6 there).
When the server connects, it reaches one of those public domains and fails the check without reverting to the IPv4 version of the site, where the file is created and accessible - I see a letsencrypt record in one of their logs.

I can symlink all challenge dirs into one, but option -ipv4only for certbot would be cooler...

jsha · 2018-12-13T23:26:42Z

Hi @navara! It sounds like you've got a configuration problem. I recommend posting on https://community.letsencrypt.org/.

I'm going to lock this conversation for now - I think most followups are best sent to the forum. Thanks all!

cpu added area/va kind/bug labels May 17, 2017

cpu mentioned this issue May 22, 2017

The ACME server was probably unable to reach - but can from browser and elsewhere win-acme/win-acme#443

Closed

jsha modified the milestone: Sprint 2017-06-27 Jun 27, 2017

jsha assigned cpu Jun 27, 2017

cpu added this to the Sprint 2017-06-27 milestone Jul 7, 2017

cpu mentioned this issue Jul 7, 2017

Fix HTTP-01 IPv6 to IPv4 fallback with fresh dialer per conn. #2852

Merged

cpu changed the title ~~IPv6 to IPv4 fallback not working properly~~ HTTP-01 IPv6 to IPv4 fallback not working properly Jul 7, 2017

cpu mentioned this issue Jul 10, 2017

Renewal stopped working between May 9 and June 22 hlandau/acmetool#265

Closed

cpu closed this as completed in #2852 Jul 10, 2017

cpu mentioned this issue Jul 10, 2017

Fix HTTP-01 IPv6 to IPv4 fallback with fresh dialer per conn. (#2852) #2855

Closed

letsencrypt locked as resolved and limited conversation to collaborators Dec 13, 2018
