This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Docker on Windows fails to resolve localhost, causing health check to fail #9764

Closed
brendan-mccoy opened this issue Apr 8, 2021 · 9 comments
Labels
Z-Upstream-Bug This issue requires a fix in an upstream dependency.

Comments

@brendan-mccoy

Description

I'm playing around with the latest Docker image, and it keeps reporting that it is in an unhealthy state. I checked, and it seems this is because the configured health check is "curl -fSs http://localhost:8008/health || exit 1", the /etc/hosts file doesn't exist, and so localhost can't be resolved. At the time of this posting the image has sha256:d0bca6bb6f4009297601c06a8e3c22e193fde993290e969677b3b30e70bc3286

Steps to reproduce

  • docker pull matrixdotorg/synapse:latest
  • docker run -it --rm --mount type=volume,src=synapse-data,dst=/data -e SYNAPSE_SERVER_NAME=my.matrix.host -e SYNAPSE_REPORT_STATS=yes matrixdotorg/synapse:latest generate
  • docker run -d --name synapse --mount type=volume,src=synapse-data,dst=/data -p 8008:8008 matrixdotorg/synapse:latest
  • wait for the health check to run
  • open a shell in the container and run "cat /etc/hosts"

The health check fails with "curl: (6) Could not resolve host: localhost", because the file "/etc/hosts" doesn't exist.
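To see why a missing hosts file breaks "localhost" while an IP literal keeps working, here is a simplified model of the static lookup a resolver does against a hosts file. It is only an illustration, not how glibc implements it; hosts_lookup is a hypothetical helper name. An address such as 127.0.0.1 never goes through this step at all, because it is already numeric.

```shell
#!/bin/sh
# Hypothetical helper: simplified model of looking a name up in a hosts file.
# Not the real resolver; it only shows why no file means no "localhost".
#
# Usage: hosts_lookup NAME [HOSTS_FILE]
# Prints the first address whose line lists NAME. Returns 1 if the file is
# missing/unreadable or no line matches.
hosts_lookup() {
    name="$1"
    hosts_file="${2:-/etc/hosts}"
    # No hosts file at all: there is nothing to look the name up in.
    [ -r "$hosts_file" ] || return 1
    awk -v name="$name" '
        # strip trailing comments
        { sub(/#.*/, "") }
        # field 1 is the address, the remaining fields are host names
        NF >= 2 {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; found = 1; exit }
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
}
```

Against a file like the one from the Linux host later in this thread, "hosts_lookup localhost" prints 127.0.0.1; inside the affected container, where /etc/hosts does not exist, it returns 1 straight away, which corresponds to curl's "Could not resolve host: localhost".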

Version information

  • Version: {"server_version":"1.31.0","python_version":"3.8.9"}

  • Install method: Docker 20.10.0

  • Platform: Windows Server 2019 Standard (Core)
@anoadragon453
Member

Thanks for the detailed reproduction steps! Interestingly, following them I didn't end up with the same result. Exec-ing into the container gave me:

root@61f2bb34918e:/# curl http://localhost:8008/health && echo
OK
root@61f2bb34918e:/# 

I also happen to have an /etc/hosts file:

root@61f2bb34918e:/# cat /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
172.17.0.4	61f2bb34918e
root@61f2bb34918e:/#

I'm running on a Linux host, though. It appears this is a problem with Docker on Windows specifically, and that even 127.0.0.1 won't resolve properly.

One solution may be to run docker using the WSL2 backend instead: nforgeio/neonKUBE#968.

I also wonder whether querying http://0.0.0.0:8008/health would work instead.

@anoadragon453 anoadragon453 added the Z-Upstream-Bug This issue requires a fix in an upstream dependency. label Apr 9, 2021
@anoadragon453 anoadragon453 changed the title Docker image missing /etc/hosts, causes image's health check to fail Docker on Windows does not support loopback addresses, causing health check to fail Apr 9, 2021
@brendan-mccoy
Author

Unfortunately MS hasn't graced Windows Server 2019 with WSL2 yet, and it may just not come until the next LTSC release 🙃.

I do think this is an image-specific issue, though, as my other Linux images behave differently. I think that's because they generate a hosts file that allows them to resolve "localhost":

# cat /etc/hosts
127.0.0.1       localhost localhost.localdomain
::1             localhost localhost.localdomain

The above is what I get for those other images, only this one seems to be different.

# curl http://0.0.0.0:8008/health
OK

# curl http://127.0.0.1:8008/health
OK

# curl http://localhost:8008/health
curl: (6) Could not resolve host: localhost

# cat /etc/hosts
cat: /etc/hosts: No such file or directory

@anoadragon453
Member

Oh interesting, so it's not an issue with loopback, but rather just DNS.

In that case we can just update the check to look at 127.0.0.1 instead 🙂
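Until the image changes, the built-in check can also be overridden per container with docker run's --health-cmd flag. A minimal sketch, assuming the volume name, container name and port from the reproduction steps; start_synapse is a hypothetical wrapper name, and this has not been tried on an LCOW host.

```shell
#!/bin/sh
# Health check that targets the loopback IP literal, so no name lookup
# (and therefore no /etc/hosts entry) is needed inside the container.
HEALTH_CMD='curl -fSs http://127.0.0.1:8008/health || exit 1'

# Hypothetical wrapper: same run command as the reproduction steps, but
# with the image's built-in health check replaced by HEALTH_CMD.
start_synapse() {
    docker run -d --name synapse \
        --mount type=volume,src=synapse-data,dst=/data \
        -p 8008:8008 \
        --health-cmd "$HEALTH_CMD" \
        matrixdotorg/synapse:latest
}
```

Calling start_synapse after sourcing this file starts the container; "docker inspect --format '{{.State.Health.Status}}' synapse" then shows whether the overridden check passes.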

@anoadragon453
Member

(though I do wonder whether the lack of localhost resolving correctly will cause other problems down the line...)

@anoadragon453
Copy link
Member

@brendan-mccoy Does resolving localhost work if you supply the --add-host="localhost:127.0.0.1" flag to the docker run command?

@brendan-mccoy
Author

Nope, I can't seem to get that or "extra_hosts" working. Probably a limitation on Windows.

In any case, a "fix" (if you call making stuff work on Windows a fix ;) ) would be for the container to have a hosts file prepopulated with the localhost entry, or, I guess, to change the health check not to use a name.

It looks like all the Alpine-based images I've used have the hosts file prefilled, and the Debian/Ubuntu-based ones don't.

@brendan-mccoy
Author

Anyway, I decided to just make a Linux VM to run Docker in. Maybe when WSL2 is on an LTSC release of Windows Server I'll try again. No clue why the hosts file only seems not to get populated when running on LCOW; I guess that's to be expected with an experimental and no-longer-developed solution ;).

I'm not sure of the root cause, but if somebody trying to run this on a Windows Docker host comes across this, they'll know what's going on. I'll leave it to the maintainers to decide what to do with this issue.

@anoadragon453 anoadragon453 changed the title Docker on Windows does not support loopback addresses, causing health check to fail Docker on Windows fails to resolve localhost, causing health check to fail Apr 12, 2021
@anoadragon453
Member

@brendan-mccoy thanks for your responses. It looks like alpine do specifically add in an /etc/hosts file to their images, while Debian and Ubuntu do not (see the linked tar files' contents). Note that our image is based on python:3.8-slim, which in turn is based on debian:buster.

I'm wary of changing the healthcheck as it's just a band-aid on the root problem. We assume that localhost will work in Synapse, so having the healthcheck do the same can be a good early warning sign when localhost can't be resolved.
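One way to keep that early-warning value while making failures easier to diagnose would be a check that still probes localhost but reports curl's exit status in words: 6 means the host name could not be resolved, 7 means the connection failed, 22 means the server returned an HTTP error (with -f), and 28 means a timeout. This is a sketch only; describe_curl_status and check_health are hypothetical names, and this is not the check shipped in the image.

```shell
#!/bin/sh
# Hypothetical helper: translate a curl exit status into a likely cause.
# The numbers are curl's documented exit codes.
describe_curl_status() {
    case "$1" in
        0)  echo "healthy" ;;
        6)  echo "could not resolve host (name lookup failed, e.g. no /etc/hosts entry)" ;;
        7)  echo "could not connect (is Synapse listening?)" ;;
        22) echo "server returned an HTTP error status" ;;
        28) echo "timed out" ;;
        *)  echo "other curl failure (exit $1)" ;;
    esac
}

# Hypothetical health check: still uses "localhost" (which Synapse relies on),
# but logs why it failed instead of a bare exit 1.
check_health() {
    curl -fSs "http://localhost:8008/health" >/dev/null
    status=$?
    echo "health check: $(describe_curl_status "$status")" >&2
    [ "$status" -eq 0 ]
}
```

With this, the LCOW container from this issue would log "could not resolve host" rather than a generic failure, pointing straight at the missing hosts file.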

I've created an issue in debian's image issue tracker to see if they can include /etc/hosts in their image: debuerreotype/docker-debian-artifacts#127. There may be a specific reason they've chosen not to, or it may have just been an oversight. The fact alpine do explicitly add one in is interesting though.

@anoadragon453
Member

anoadragon453 commented Apr 13, 2021

Upstream (debian docker images) has stated that they indeed don't intend to add an /etc/hosts file, as it can conflict with container runtimes that do try to populate the file automatically: debuerreotype/docker-debian-artifacts#127

They also helpfully pointed towards Docker's response on the matter, which boils down to "this is still considered experimental and probably will stay that way".

With those responses from upstream, there's not much we can do I'm afraid. Workarounds involve switching to a different engine, or perhaps adding an /etc/hosts file in at runtime.
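The second workaround, adding an /etc/hosts file at runtime, might look roughly like this. It is a sketch under assumptions: ensure_localhost_entry is a hypothetical helper, it must run as root before Synapse starts (for example from a wrapper entrypoint), and it has not been tried on an LCOW host, where /etc/hosts may not even be writable.

```shell
#!/bin/sh
# Hypothetical runtime workaround: make sure a hosts file maps "localhost"
# to the loopback addresses, creating the file if it does not exist.
#
# Usage: ensure_localhost_entry [HOSTS_FILE]
ensure_localhost_entry() {
    hosts_file="${1:-/etc/hosts}"
    # If an IPv4 localhost line is already present (e.g. written by the
    # container runtime), leave the file alone to avoid duplicate entries.
    if [ -f "$hosts_file" ] &&
        grep -Eq '^[[:space:]]*127\.0\.0\.1[[:space:]]+([^#]*[[:space:]])?localhost([[:space:]]|$)' "$hosts_file"; then
        return 0
    fi
    # Otherwise append (or create) minimal loopback entries.
    printf '127.0.0.1\tlocalhost\n::1\tlocalhost ip6-localhost ip6-loopback\n' >> "$hosts_file"
}
```

Because it only writes when no localhost entry exists, it should not clash with runtimes that populate /etc/hosts themselves, which was upstream's concern about shipping the file in the image.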

anoadragon453 added a commit that referenced this issue Apr 14, 2021
Context is in #9764 (comment).

I struggled to find a more official link for this. The problem occurs when using WSL1 instead of WSL2, which some Windows platforms (at least Server 2019) still don't have. Docker have updated their documentation to paint a much happier picture now given WSL2's support.

The last sentence here can probably be removed once WSL1 is no longer around... though that will likely not be for a very long time.