Use non-persistent connection for MPD #94507
Conversation
In other words, don't rely on the connection management provided by "python-mpd2". Instead of keeping the connection to MPD open, we explicitly connect before and disconnect after each command. There is probably a bit of overhead to this, but as the integration uses a local-polling approach to begin with, no functionality is lost or degraded. This change greatly hardens the MPD integration against both network issues and problems with the daemon itself. All connection-related failure modes have effectively been removed.
Only "async_get_media_image" attempts to connect, all others are either called from there, or from the main "async_update" method (see previous commit) which also attempts to connect. So, this commit mainly revolves around gracefully handling situations where no connection is available when trying to retrieve MPD state. Finally, note the removal of "self._commands". This property is only used at the start of "_async_get_file_image_response" and was thus changed into a local variable.
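The connect-before / disconnect-after flow described above can be sketched roughly as follows. This is an illustrative stand-alone example, not the PR's actual code: `_FakeMPDClient` is a hypothetical stand-in for python-mpd2's `MPDClient` (the integration uses its async variant), and `run_command` is an invented helper name.

```python
# Minimal sketch of the connect-before / disconnect-after pattern.
# _FakeMPDClient is a hypothetical stand-in for python-mpd2's MPDClient.
class _FakeMPDClient:
    def __init__(self):
        self.connected = False

    def connect(self, host, port):
        self.connected = True

    def disconnect(self):
        self.connected = False

    def status(self):
        if not self.connected:
            raise ConnectionError("Not connected to MPD")
        return {"state": "play", "volume": "50"}


def run_command(client, command, *args):
    """Open a fresh connection, run a single command, always disconnect."""
    try:
        client.connect("localhost", 6600)
        return getattr(client, command)(*args)
    finally:
        # Disconnecting unconditionally means no stale connection can
        # ever leave the integration stuck in "limbo".
        client.disconnect()


client = _FakeMPDClient()
status = run_command(client, "status")
```

The `try`/`finally` is the important part: even if the command itself raises, the connection is torn down, so every subsequent command starts from a known-disconnected state.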
These all need to explicitly connect to MPD as part of their flow.
I think the change is fine since the implementation is polling, with an interval of 120s.
A few comments though
```python
# Cleanly disconnect in case connection is not in valid state
_LOGGER.debug("Error updating status: %s", error)
self._disconnect()
return self._is_available not in [None, False]
```
Why do we need to differentiate between None and False, is it just to control debug logs?
Also, I'd suggest to simplify:
return bool(self._is_available)
Or maybe:
return self._is_available is True
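For what it's worth, the two suggested forms agree whenever `_is_available` holds one of `None`, `False`, or `True` (the only values it takes here); they would only diverge for other truthy values:

```python
# Both suggestions collapse None and False to an unavailable result.
for value in (None, False, True):
    assert bool(value) == (value is True)

# They only differ for truthy non-bool values, which _is_available never holds:
value = "connected"
assert bool(value) is True
assert (value is True) is False
```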
Why do we need to differentiate between None and False, is it just to control debug logs?
Correct.
Whenever a device initially goes offline (or is offline when HA starts), a single warning is logged. If the device remains offline, all subsequent failed connection attempts raise a debug message instead.
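The warn-once scheme described here could be sketched as below. This is an illustrative outline with invented names (`ConnectionTracker`, `on_failure`, `on_success`), not the PR's actual class:

```python
import logging

_LOGGER = logging.getLogger(__name__)


class ConnectionTracker:
    """Sketch: warn on the first failure, debug while it keeps failing."""

    def __init__(self):
        # None: no attempt made yet; True/False: last known availability.
        self._is_available = None

    def on_failure(self, error):
        if self._is_available is False:
            # Still offline: avoid cluttering the log with duplicates.
            _LOGGER.debug("Connection still failing: %s", error)
        else:
            # Device just went offline (or was offline at start): warn once.
            _LOGGER.warning("Connection failed: %s", error)
        self._is_available = False

    def on_success(self):
        self._is_available = True

    @property
    def available(self):
        return self._is_available is True
```

Because the tri-state value only drops to `False` after the first failure, the `is False` check is what distinguishes "just went offline" (warning) from "still offline" (debug).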
Personal preference of keeping the logs clear of duplicate messages/clutter... I guess the grouping HA does in its log viewer UI more or less achieves the same; if this is unwanted / against conventions, let me know.
The requested simplification (`return self._is_available is True`) was applied.
The distinction between `False` and `None` (to control debug logging) remains, pending my remark/question:
> Personal preference of keeping the logs clear of duplicate messages/clutter... I guess the grouping HA does in its log viewer UI more or less achieves the same; if this is unwanted / against conventions, let me know.
I guess it's more a question for @emontnemery - personally I think logging a WARN on the first failure and logging a DEBUG on subsequent failures makes sense to me. Noisy logging is unhelpful, and the code looks like it'll behave as designed.
Yes, warn on first failure and debug after that is fine, it was just not clear to me that's what was happening.
Please take a look at the requested changes, and use the Ready for review button when you are done, thanks!
Why not fix the mpd integration so that when there is an exception, it reconnects? This is what I have been doing in my MPRIS integration, with the added safety of a heartbeat plus disconnect and retry when the heartbeat isn't received in the expected time window.
That's the first thing I tried; never managed to get that to work though: Once the MPD integration and its underlying library start to disagree about the connection state, nothing seems to work any more on my end. Any command issued from the integration (e.g. "ping", "disconnect") simply results in a "cannot execute this command on a disconnected MPD instance" response from the library... If you've got some example code that works properly on your end, feel free to share! I'd love to give it a try here.
BTW, you must change uses of
Used to be initialised as None, which caused "NoneType is not iterable" type of issues in case of an unexpected disconnect (at which point status gets set to None again). Instead of guarding against None everywhere, using an empty dictionary seemed more prudent... Furthermore, its members are now accessed more cautiously to prevent potential KeyErrors in similar cases.
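The difference is easy to demonstrate; a small sketch (the `status` variable here is illustrative, standing in for the integration's status attribute):

```python
# With status initialised to (and reset to) an empty dict instead of None,
# callers can use .get() and iteration without None-guards.
status = {}  # instead of: status = None

# .get() returns a default instead of raising KeyError on missing keys...
assert status.get("volume") is None
assert status.get("state", "unknown") == "unknown"

# ...and iterating an empty dict is a harmless no-op, whereas iterating None
# raises "TypeError: 'NoneType' object is not iterable".
assert list(status) == []
```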
This method doesn't need a connection; it calls into two other methods that actually connect to MPD. Attempting to connect from here resulted in a livelock.
LGTM, thanks @thijsputman!
If you want to change how the log level is decided, please open a follow-up PR.
(Close + Open to rerun CI against the current tip of dev)
Proposed change
Instead of relying on the (persistent) connection handling of `python-mpd2` (which is not working optimally; see for example Mic92/python-mpd2#31), explicitly connect and disconnect for every command sent to MPD.

This resolves #57633 (comment) (and probably some related issues) where the MPD integration would get stuck in "limbo": disconnected and unable to reconnect at the same time. Outside of a full restart of Home Assistant, there was no way of recovering from this.
With the three MPD players in my network, this issue would occur once per week; a slight hiccup in the Wi-Fi was often already enough to require a full restart of HA.
This change introduces a bit of overhead, but as the integration is local-polling (i.e. nothing actually gets pushed back to us over the persistent connection), I'd say the impact is minimal. The MPD protocol doesn't really appear to be geared towards persistent connections anyway.
The best solution would probably be a fix in `python-mpd2` itself, but I'm afraid that's a bit out of scope for me at the moment.

I've tried to keep changes to a minimum (didn't totally succeed) and use an elegant/modern Python approach. I'm not very familiar with Python, so there might be some oddities here and there...
I've been running this code locally for about two months (with three MPD players on the network) and it's been rock solid: Disconnects happen from time to time, the integration always recovers and never gets stuck in "limbo".
In case anyone is wondering: I did initially try to simply brute-force some additional connection attempts into the integration. That didn't work at all: once `python-mpd2` loses the connection, there doesn't appear to be an (obvious) way to get it to reconnect again.

Type of change
Additional information
Checklist
- The code has been formatted using Black (`black --fast homeassistant tests`)

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:
- Updated and included derived files by running: `python3 -m script.hassfest`.
- New dependencies have been added to `requirements_all.txt`. Updated by running `python3 -m script.gen_requirements_all`.
- Untested files have been added to `.coveragerc`.