Expand docker version evaluation. #4834

Closed

@donaldguy donaldguy commented Jan 24, 2024

EDIT: apparently contrary to my impression of their plans, they cut a docker 25.0.1 this morning, so this is kinda for no one now. The broader idea could still be helpful in future, but lmk if just want to close

Proposed change

  1. Expand the EvaluateDockerVersion to include both a minimum version check and a list of known bad versions for poor supervised-installer users. (I would also accept, even perhaps prefer e.g. a constraint in https://github.com/home-assistant/supervised-installer/blob/main/homeassistant-supervised/DEBIAN/control but this is already here)

  2. expand test suite accordingly

  3. do away with the apparently otherwise unused supported_version property of DockerInfo, concentrating that logic in the evaluation itself
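
A minimal sketch of what item 1 describes, combining a minimum-version check with a list of known bad versions. The constant values and the function names `parse_version` and `docker_version_unsupported` are assumptions made for illustration; they are not the supervisor's actual code or thresholds.

```python
# Assumed values for illustration only; the real minimum and the
# list of bad versions would be chosen by the maintainers.
MIN_VERSION = (20, 10, 0)
KNOWN_BAD_VERSIONS = {(25, 0, 0)}


def parse_version(raw: str) -> tuple:
    """Parse the leading 'major.minor.patch' of a Docker version string."""
    # Drop suffixes such as '-ce' or '+azure' before splitting.
    core = raw.split("-", 1)[0].split("+", 1)[0]
    numbers = [int(part) for part in core.split(".")[:3]]
    # Pad short versions like '25.0' so tuple comparison stays consistent.
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def docker_version_unsupported(raw: str) -> bool:
    """Return True if the version is too old or on the known-bad list."""
    version = parse_version(raw)
    return version < MIN_VERSION or version in KNOWN_BAD_VERSIONS
```

With these assumed values, 25.0.0 is flagged while 24.0.7 and 25.0.1 pass, which matches the situation described in this PR.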

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality to the supervisor)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

As I discussed at #4827 (comment)

there is already a fix landed upstream in moby for this ~bug but they are not interested in fast-tracking a patch release apparently.

Ditto, it is perhaps foolish to expect a supervisor release to be forthcoming either, but I'd already spent ~5 hours struggling with debugging this issue myself (errantly over-focalizing on pyudev getting denied on udev_monitor_new_from_netlink... for some reason), so figured I might as well spend 2 hours more closing the loop

  • Link to documentation pull request: [... there should perhaps be one for https://www.home-assistant.io/more-info/unsupported/docker_version, but I'll let reviewer have say first ]

Checklist

  • The code change is tested and works locally.

    I have verified that the additions do not hamper supervisor startup (and enough else for the setup wizard to show up at http://localhost:9123) on docker 24.0.7 in devcontainer (running under orbstack on an M1 MacBook Air).

    I have not yet investigated exact behavior/ordering of failures during startup under docker-ce 25.0.0

  • Local tests pass. Your PR cannot be merged unless tests pass

  • There is no commented out code in this PR.

  • I have followed the development checklist

  • The code has been formatted using Black (black --fast supervisor tests)

  • Tests have been added to verify that the new code works.

@home-assistant home-assistant bot left a comment

Hi @donaldguy

It seems you haven't yet signed a CLA. Please do so here.

Once you do that we will be able to review and accept this pull request.

Thanks!

@home-assistant home-assistant bot marked this pull request as draft January 24, 2024 02:19
@home-assistant
Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍

Learn more about our pull request process.

@donaldguy donaldguy marked this pull request as ready for review January 24, 2024 02:20
@donaldguy
Author

donaldguy commented Jan 24, 2024

To expound a little/summarize #4827:

  1. the supervisor bootstraps (or finds from previous run) a docker bridge network hassio that passes through these CIDRs:

    DOCKER_NETWORK_MASK = ip_network("172.30.32.0/23")
    DOCKER_NETWORK_RANGE = ip_network("172.30.33.0/24")

  2. the apparent expected behavior is for addons to populate the upper range (172.30.33.0/24) while home-assistant itself (supervisor/observer/multicast/etc) grab (fixed?) low IPs at the bottom of the subnet, generally supervisor at 172.30.32.2, etc.

  3. Docker/moby upstream (inadvertently) landed a change (issue moby/moby#47120, "Can no longer set an IP address inside of a subnet range when subnet range is larger than IP range") that tightened restrictions on how container network attachment could work, limiting it only to the IPRange, not the entire subnet.

  4. As such, after upgrading to Docker 25.0.0 (which has landed in the repos included by running the curl bash in the supervised-installer README) and attempting to restart the supervisor and/or any other major component, these containers quietly failed to get any assignment in the hassio network and all landed in the default docker bridge network instead (172.17.0.0/16, so the supervisor, as the first container run, got 172.17.0.2, but only that IP, as opposed to e.g. when it is bootstrapping the network)

  5. For unclear reasons, this failed/mis-assignment is not clear to other components, so e.g. the /etc/hosts inside the hassio_cli container still hardcodes/computes? supervisor to 172.30.32.2 - as such no ha supervisor commands work

  6. The supervisor is also broadly unhappy and prints loads of errors, before settling on blocked from execution, system is not healthy - privileged ; despite the container in fact being privileged (I went down a deep rabbit hole around the netlink udev refusing to initiate, which I thought might be the core issue)
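
To make items 1 through 4 concrete, here is a small sketch using Python's standard `ipaddress` module. It only uses the CIDRs and addresses quoted above; the variable names other than the two constants are made up for illustration.

```python
from ipaddress import ip_address, ip_network

DOCKER_NETWORK_MASK = ip_network("172.30.32.0/23")   # whole hassio subnet
DOCKER_NETWORK_RANGE = ip_network("172.30.33.0/24")  # IPRange, upper half

supervisor_ip = ip_address("172.30.32.2")

# The supervisor's usual address is inside the subnet...
in_subnet = supervisor_ip in DOCKER_NETWORK_MASK
# ...but outside the IPRange, which is what Docker 25.0.0 started enforcing.
in_ip_range = supervisor_ip in DOCKER_NETWORK_RANGE
# The address it ended up with on the default bridge is not in hassio at all.
fallback_in_subnet = ip_address("172.17.0.2") in DOCKER_NETWORK_MASK

print(in_subnet, in_ip_range, fallback_in_subnet)  # True False False
```

This is why the low fixed addresses were rejected under 25.0.0, and why /etc/hosts entries pointing at 172.30.32.2 no longer reached the supervisor.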

As such the hardcoding of a check on docker version in this specific case is a rather coarse instrument, but I suspect it may come up again in the future.

A more appropriate specific solution might be to (simply) bubble this error:

    # Attach Network
    try:
        self.network.connect(container, aliases=alias, ipv4_address=ipv4_address)
    except docker.errors.APIError as err:
        raise DockerError(
            f"Can't link container to hassio-net: {err}", _LOGGER.error
        ) from err

more aggressively as ~fatal in the case of any essential component (supervisor, observer, dns?, etc.)

or more aggressively attempt to do discovery and routing to ~misplaced components
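
A rough sketch of the first option, treating a failed network attachment as fatal for essential components. Everything here is hypothetical: `ESSENTIAL_CONTAINERS`, `FatalNetworkError`, the `attach_network` helper and its `connect` callable are stand-ins so the example runs on its own, and `OSError` stands in for `docker.errors.APIError`.

```python
import logging

_LOGGER = logging.getLogger(__name__)

# Hypothetical list of containers whose attachment failure should be fatal.
ESSENTIAL_CONTAINERS = {"hassio_supervisor", "hassio_observer", "hassio_dns"}


class DockerError(Exception):
    """Stand-in for the supervisor's DockerError."""


class FatalNetworkError(DockerError):
    """Hypothetical error for an essential container missing from hassio."""


def attach_network(connect, name: str, ipv4_address: str) -> None:
    """Attach a container, escalating failures for essential components."""
    try:
        connect(name, ipv4_address)
    except OSError as err:  # stand-in for docker.errors.APIError
        message = f"Can't link {name} to hassio-net: {err}"
        if name in ESSENTIAL_CONTAINERS:
            _LOGGER.critical(message)
            raise FatalNetworkError(message) from err
        _LOGGER.error(message)
        raise DockerError(message) from err
```

A caller could then catch `FatalNetworkError` separately and stop startup early, instead of letting the container run on the default bridge.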

Warns about home-assistant#4827 as well as potential future regressions

pushing a same-tree commit to see if that continues past the now satisfied
CLA check
-async def test_evaluation(coresys: CoreSys):
-    """Test evaluation."""
+async def test_evaluation_supported(coresys: CoreSys):
+    """Test real evaluation with a current docker daemon."""
Author

Suggested change
-    """Test real evaluation with a current docker daemon."""
+    """Test evaluation with a known good docker version."""

I kinda thought at one point it would be good to just let it hit the real thing, but I could not finagle getting pytest not to mock it out.

@codyc1515

From an architecture side, would it not make more sense to just decide on a few main versions and support those only? Same as what happens with Python versions today.

@donaldguy
Author

From an architecture side, would it not make more sense to just decide on a few main versions and support those only? Same as what happens with Python versions today.

I wouldn't disagree, and obvs the de facto practice over in hassos land is that there is a single pin at a time

I would say broadly that:

  1. the docker API is nominally versioned, along with the software, like this: https://docs.docker.com/engine/api/#api-version-matrix so in theory that is the thing-to-pin-to that is correct
  2. but then you see aberrant version-specific failures like the one in 25.0.0; I'm not sure how commonly such are apt to arise, but they suggest something like this PR is still warranted nevertheless

and/but then speaking as a supervised-installer user, I would say (again, as in PR descrip) that this pin/constraint should absolutely be in the debian package metadata (https://github.com/home-assistant/supervised-installer/blob/main/homeassistant-supervised/DEBIAN/control) or at least the https://github.com/home-assistant/supervised-installer/blob/main/homeassistant-supervised/DEBIAN/postinst should call the relevant apt-mark holds

I am willing to have HASSIO holding-back this machine from taking on newer docker engine, but I am pretty peeved to have it break on apt upgrade


That said, my suspicion is that HA et al are facing down a bigger container runtime reckoning sooner-rather-than later, with cgroups v1 officially officially deprecated (for sometime-this-year removal) in systemd 255

and with that a question of e.g. jumping ship(/"jumping whale") from docker/moby entirely to podman, or systemd-nspawn, or whatever.

It did not escape my notice that the /usr/sbin/hassio-supervisor run command recently grew a containerd.sock mount

So ... there's that.

@agners
Member

agners commented Jan 31, 2024

EDIT: apparently contrary to my impression of their plans, they cut a docker 25.0.1 this morning, so this is kinda for no one now. The broader idea could still be helpful in future, but lmk if just want to close

Wrt close: A bit on the fence whether we should add this code, since with 25.0.1 this is kinda already in the past at this point.

On the other hand, maybe it indeed helps for the future. Question: Did you test that with 25.0.0? Did the message appear early enough so it would have been useful for people?

@mdegat01
Contributor

The issue with this approach is in order for anyone to see this message we have to update supervisor. Like if Docker has a bad version in the future, using this as a mechanism for messaging means we need to urgently get a new version of supervisor to stable with this evaluation updated and the new bad version listed. We can't anticipate future bad versions of docker so existing versions of supervisor won't be aware to warn people.

Even then it would still only help people that had not updated docker yet. People that have updated docker already have unhealthy systems and so are pretty stuck and unlikely to be able to update supervisor to receive this message.

I think #355 is the better approach. The supervised installer should probably pin to a known good version and not simply install the latest. In addition, if a known bad version of docker remains out as latest for a longer time, we do actually have a way to get messages out to users without needing to update Supervisor or Home Assistant: this is why https://alerts.home-assistant.io/ was created. We can create a new alert and ensure anyone with a supervised type installation sees it as an alert in their Home Assistant so they are aware of the potential issue. To me this seems like a better approach than handling it in supervisor here.

github-actions bot commented Mar 1, 2024

There hasn't been any activity on this pull request recently. This pull request has been automatically marked as stale because of that and will be closed if no further activity occurs within 7 days.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Mar 1, 2024
@github-actions github-actions bot closed this Mar 8, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Mar 10, 2024
4 participants