Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move connect_single to SmartDevice.connect #538

Merged
merged 30 commits into from
Nov 21, 2023

Conversation

bdraco
Copy link
Member

@bdraco bdraco commented Oct 31, 2023

home-assistant/core#104213 will save the device type so we can avoid an update cycle to probe the device type at HA startup which will connect the device faster and reduce the risk of timeout during startup.

The power strips currently take upwards of 6-7s to do an update cycle: 2023-11-19 11:00:12.099 DEBUG (MainThread) [kasa.device_factory] Device 192.168.211.141 with known type (strip) took 6.90 seconds to connect which causes them to fail to connect. By saving the device type we decrease the chance the connect will fail since we only have to do one cycle in the allotted 10s.

Backstory: The HA tplink integration used to require UDP to set up devices. We rewrote the whole HA integration a few years ago to avoid needing to having working discovery to set up the devices since UDP is unreliable with these devices. Unfortunately, 0.5.4 brought back the UDP requirement so we now have a regression as seen in:

Fixes #543
Fixes #545

Copy link

codecov bot commented Oct 31, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (27c4799) 81.74% compared to head (3bab4f6) 82.22%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #538      +/-   ##
==========================================
+ Coverage   81.74%   82.22%   +0.47%     
==========================================
  Files          28       30       +2     
  Lines        2421     2458      +37     
  Branches      685      692       +7     
==========================================
+ Hits         1979     2021      +42     
+ Misses        382      379       -3     
+ Partials       60       58       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bdraco bdraco marked this pull request as ready for review October 31, 2023 21:27
Copy link
Member

@rytilahti rytilahti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first impression is that this looks good. Would you mind adapting the cli tool to use the same mapping?

TYPE_TO_CLASS = {

kasa/discover.py Outdated Show resolved Hide resolved
@bdraco
Copy link
Member Author

bdraco commented Nov 1, 2023

My first impression is that this looks good. Would you mind adapting the cli tool to use the same mapping?

TYPE_TO_CLASS = {

I changed the enum values to match the cli and built the dict lookup from it

kasa/smartdevice.py Outdated Show resolved Hide resolved
@sdb9696
Copy link
Collaborator

sdb9696 commented Nov 1, 2023

Hi @bdraco, apologies, I don't really understand this PR so wanted to ask you about it.

It seems like the change will bypass any connection in connect_single if a device_type is passed into the connect_single function. This seems counter-intuitive as the function is called connect_single so you would expect it to connect. If you want to bypass the connection why would you call connect_single at all? Wouldn't it be simpler and clearer what's happening to just construct the device in HA if that's what you want.

async def async_setup_entry(hass: HomeAssistant, entry: ConfigEntry) -> bool:
    """Set up TPLink from a config entry."""
    host = entry.data[CONF_HOST]
    try:
        #replace this:
        #device: SmartDevice = await Discover.discover[or connect]_single(host)
        #with this:
        device: SmartDevice = DEVICE_TYPE_TO_CLASS.get(device_type))
    except SmartDeviceException as ex:
        raise ConfigEntryNotReady from ex

Two other questions:

  • In my experience with these HA errors they seem to be intermittent and the ConfigEntryNotReady has retry logic built in which corrects the problem. Are you not finding that's happening?
  • Is the discovery timeout PR you added Make timeout adjustable #494 not helping here? Are you sure bypassing discovery isn't just going to move the underlying problem to the data coordinator?

Apologies in advance if I missed something obvious here.

@bdraco
Copy link
Member Author

bdraco commented Nov 1, 2023

Wouldn't it be simpler and clearer what's happening to just construct the device in HA if that's what you want.

Ideally ha knows as little as possible about how to construct the device and the library takes care of it. It's more about abstracting that complexity away from HA

@sdb9696
Copy link
Collaborator

sdb9696 commented Nov 1, 2023

I don’t think there’s really much complexity as it’s only one line of code as per my example above. You could anyway wrap that in a function call.

It just doesn’t seem to make sense to introduce a new function connect_single() and then another change to it to not connect at all. Why add connect_single in the first place? Instead just add a get_device_from_type() function.

Sorry, I’m not trying to be difficult but these changes seem designed to solve a specific HA problem that I’m worried they won’t solve, and will make the api confusing in the process.

N.B. @rytilahti asked me to modify discover_single to use the same UDP logic path as discover() in order to simplify things. Bearing in mind that’s it’s not even released to HA yet so can’t be causing any of the HA issues, maybe he could opine on this.

@bdraco
Copy link
Member Author

bdraco commented Nov 1, 2023

What you are proposing above won't work because it makes the exception catch and retry ineffective since

    except SmartDeviceException as ex:
        raise ConfigEntryNotReady from ex

will never run with that change and if the device is offline, the config entry will fail to load without retry.

@sdb9696
Copy link
Collaborator

sdb9696 commented Nov 2, 2023

What you are proposing above won't work because it makes the exception catch and retry ineffective since

will never run with that change and if the device is offline, the config entry will fail to load without retry.

Agreed, that's what I was saying when I asked this:

In my experience with these HA errors they seem to be intermittent and the ConfigEntryNotReady has retry logic built in which corrects the problem. Are you not finding that's happening?

What I was suggesting above was that IF you're going to avoid a device.update() with your device_type PR home-assistant/core#103150 do it in a new method rather than a connect_single() that doesn't do a connect.

Taking a step back all this started with trying to avoid UDP but as we've agreed UDP isn't the issue because it isn't live yet. There will still be connection errors whether they happen in setup_entry or data update. However one of the advantages of the UDP change is that it will tell us if the connection issues are related to firmware updates because it will report the devices as unsupported.

Anyway, to help re-iterate that I'm really trying to help here :) I tested the 0.5.4 implementation of connect_single with HA and I think there is an issue. It errors here in async_setup_entry:

found_mac = dr.format_mac(device.mac)

because the new device has not had .update() called on it and @requires_update. I think you need to add the following line to connect_single right after you create the new device:

dev.update_from_discover_info(unknown_dev._last_update)

However if you go ahead with this PR and the associated HA PR and avoid calling device.update() altogether I think you will error on the same line in async_setup_entry because the device object will not have a mac to check. I really don't see how you can avoid calling device.update() in async_setup_entry.

I do have a suggestion to try to improve the resilience here and maybe get to the bottom of the causes of 99449: If async_setup_entry can't find the device, we could schedule a full discover() and then check the mac addresses against all the entry.unique_id's and if they match then update the host name for the entry. This could help eliminate some of the dhcp/dns type issues some of the users were reporting. What do you think? I'm happy to draft a PR if you think it's worth it.

@trevorwarwick
Copy link

While you guys are debating what the ultimate correct fix should be, please bear in mind that for many people this integration is currently just totally broken... So getting it working even imperfectly would be a good start, and maybe the code can be finessed later ? Thanks for all the efforts so far, of course.

@bdraco
Copy link
Member Author

bdraco commented Nov 2, 2023

What I was suggesting above was that IF you're going to avoid a device.update() with your device_type PR home-assistant/core#103150 do it in a new method rather than a connect_single() that doesn't do a connect.

👍

Taking a step back all this started with trying to avoid UDP but as we've agreed UDP isn't the issue because it isn't live yet. There will still be connection errors whether they happen in setup_entry or data update. However one of the advantages of the UDP change is that it will tell us if the connection issues are related to firmware updates because it will report the devices as unsupported.

Anyway, to help re-iterate that I'm really trying to help here :)

I get that, and thats why I'm continuing to engage on the subject.

I tested the 0.5.4 implementation of connect_single with HA and I think there is an issue. It errors here in async_setup_entry:

found_mac = dr.format_mac(device.mac)

because the new device has not had .update() called on it and @requires_update. I think you need to add the following line to connect_single right after you create the new device:

dev.update_from_discover_info(unknown_dev._last_update)

However if you go ahead with this PR and the associated HA PR and avoid calling device.update() altogether I think you will error on the same line in async_setup_entry because the device object will not have a mac to check. I really don't see how you can avoid calling device.update() in async_setup_entry.

home-assistant/core#103150 is marked a draft and needs #538 because of this problem which is why https://github.com/python-kasa/python-kasa/pull/538/files#diff-5e81b58a19fa4037d13d8a879cd13f4619ee94c91e8c2e889a7991c552e06afbR361 was added

I do have a suggestion to try to improve the resilience here and maybe get to the bottom of the causes of [99449]

I think there are multiple problems

  • The time it takes to connect up the device via discover_single or connect_single. If we can't update fast enough before the timeout it will fail to set up
  • Broken broadcast/multicast/devices on other subnets so we can't see IP changes
  • Devices that stop responding to discovery

(home-assistant/core#99449): If async_setup_entry can't find the device, we could schedule a full discover() and then check the mac addresses against all the entry.unique_id's and if they match then update the host name for the entry. This could help eliminate some of the dhcp/dns type issues some of the users were reporting. What do you think? I'm happy to draft a PR if you think it's worth it.

Discovery is already run periodically in the background and any ip changes are fed into integration discovery which calls https://github.com/home-assistant/core/blob/401bb90215091b933671ab138315bd851d0a1aee/homeassistant/components/tplink/config_flow.py#L38 and https://github.com/home-assistant/core/blob/401bb90215091b933671ab138315bd851d0a1aee/homeassistant/components/tplink/config_flow.py#L49 which will update the ip address if it changes which will reload the config entry so the next attempt will use the newly discovered ip.

@rytilahti rytilahti added the enhancement New feature or request label Nov 2, 2023
kasa/discover.py Outdated Show resolved Hide resolved
kasa/discover.py Outdated Show resolved Hide resolved
@rytilahti
Copy link
Member

rytilahti commented Nov 3, 2023

So, to my understanding discover_single and connect_single solve two conceptually different use cases:

  1. discover_single provides a way to discover a single device (or more when using broadcast datagrams) using the UDP-based discovery protocol.
  2. connect_single provides a way to create a device instance by avoiding using the discovery protocol at all, leveraging the fact that we can decide on the device type by querying the device, with this PR skipping the model detection altogether. This does not leverage the discovery protocol at all, but tries to access the device directly using the TCP-based communication protocol.

I'm starting to wonder if whole idea of having these methods inside the discover class was completely wrong in the first place, and maybe there should be a separate factory class (or a classmethod inside the Device class) to construct device instances..? The connect_single has not yet appeared in any release, so we can still adjust this for the time being.

The classmethod-based approach would have a signature maybe like this:

T = TypeVar("DeviceTypeVar", SmartDevice, SmartPlug, SmartBulb, Smart...)
@classmethod
def create(cls, host, credentials, device_type: Optional[T], ...) -> T:
    pass

@sdb9696
Copy link
Collaborator

sdb9696 commented Nov 3, 2023

So, to my understanding discover_single and connect_single solve two conceptually different use cases:

  1. discover_single provides a way to discover a single device (or more when using broadcast datagrams) using the UDP-based discovery protocol.
  2. connect_single provides a way to create a device instance by avoiding using the discovery protocol at all, leveraging the fact that we can decide on the device type by querying the device, with this PR skipping the model detection altogether. This does not leverage the discovery protocol at all, but tries to access the device directly using the TCP-based communication protocol.

The question for me is that I'm not sure why there's a need for connect_single and I don't think it's going to fix these issues.

  • In 0.5.3 discover_single did a direct connection. It didn't use the discovery UDP protocol. (You asked me to change it for 0.5.4 so there is only one "discovery path")
  • The issues 99449 100663 currently being reported by HA users are with 0.5.3 release. 0.5.4 is not live yet in HA.
  • So connect_single in 0.5.4 will be the same as discover_single in 0.5.3. I don't see how this is going to solve the current user issues. The problem is occurring in the _query method, all the error logs are coming from there.
  • Also the logs we've been given are showing a variety of errors, many are connection errors rather than timeouts. I can't actually see any timeout errors in the discovery part, only connection errors (When the msg after the last colon in the log is blank its a timeout, otherwise it's a connection error.)

I've been trying to recreate the reported errors on my devices and yesterday I was having some luck with one of the devices. A few of the errors we don't currently retry like ConnectionRefused and HostUnreachable were actually intermittent. Also if I did a small delay between retries on connection errors I was getting a connection whereas if I removed the delay I wasn't. I was hoping to get some log data together for @bdraco but I can't get it to behave in quite the same way today.

I think there's some fairly safe and simple work we could do to the _query function in protocol to improve the reliability. I also think it could help to have separate timeouts for connect vs read similar to the requests library. I'm happy to pull a PR together for this if you agree.

EDIT: Some log snippets:

Connection errors during discovery:

Config entry 'Living Room Lights KS220M(US)' for tplink integration not ready yet: Unable to connect to the device: 192.168.1.63:9999: [Errno 104] Connect call failed ('192.168.1.63', 9999); Retrying in background
Config entry 'Master Bedroom 2 KL125(US)' for tplink integration not ready yet: Unable to query the device 192.168.1.33:9999: Connection lost; Retrying in background
Config entry 'Living Room Lights KS220M(US)' for tplink integration not ready yet: Unable to query the device 192.168.1.63:9999: [Errno 104] Connection reset by peer; Retrying in background

Connection and timeout errors during data refresh:

2023-09-27 11:01:56.453 ERROR (MainThread) [homeassistant.components.tplink.coordinator] Error fetching 192.168.30.170 data: Unable to connect to the device: 192.168.30.170:9999: [Errno 113] Connect call failed ('192.168.30.170', 9999)
2023-09-27 11:02:45.333 INFO (MainThread) [homeassistant.components.tplink.coordinator] Fetching 192.168.30.170 data recovered
2023-09-27 11:03:20.249 ERROR (MainThread) [homeassistant.components.tplink.coordinator] Error fetching 192.168.30.170 data: Unable to connect to the device: 192.168.30.170:9999: 

Timeout errors during data refresh:

2023-10-08 15:00:45.520 ERROR (MainThread) [homeassistant.components.tplink.coordinator] Error fetching [IP] data: Unable to connect to the device: [IP]:9999:
2023-10-08 15:00:45.521 DEBUG (MainThread) [homeassistant.components.tplink.coordinator] Finished fetching [IP] data in 20.011 seconds (success: False)

@bdraco
Copy link
Member Author

bdraco commented Nov 8, 2023

I'm tempted to wait until #542 is released to see if makes a difference, and if we still have a problem, than proceed with this refactor

@bdraco
Copy link
Member Author

bdraco commented Nov 18, 2023

It seems like we do need to get rid of the UDP based on home-assistant/core#103977

bdraco added a commit to home-assistant/core that referenced this pull request Nov 18, 2023
I am going to attempt a fix for #103977
via python-kasa/python-kasa#538

I am picking up codeowner on this for the forseeable future to watch
for issues as well
@bdraco
Copy link
Member Author

bdraco commented Nov 19, 2023

2023-11-19 11:00:12.099 DEBUG (MainThread) [kasa.device_factory] Device 192.168.211.141 with known type (strip) took 6.90 seconds to connect

no really surprising why the power strips failed before

@bdraco
Copy link
Member Author

bdraco commented Nov 19, 2023

The other devices are nice and fast. ... its the children that kill it

2023-11-19 11:00:05.325 DEBUG (MainThread) [kasa.device_factory] Device 192.168.211.203 with known type (plug) took 0.12 seconds to connect
2023-11-19 11:00:05.375 DEBUG (MainThread) [kasa.device_factory] Device 192.168.214.242 with known type (lightstrip) took 0.17 seconds to connect

@bdraco
Copy link
Member Author

bdraco commented Nov 19, 2023

retested this with home-assistant/core#104213 + home-assistant/core#104208. All my power strips are solid now

I think we can backport home-assistant/core#104213 but home-assistant/core#104208 will have to wait for 2023.12.x since its technically a breaking change. I can get the strip thats not responding to UDP to set up after home-assistant/core#104213 and its mostly working even if its sugglish/fails to respond sometimes

@rytilahti rytilahti added the breaking change Breaking change label Nov 20, 2023
Copy link
Member

@rytilahti rytilahti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a short "Initialization" section to the docs describing how downstreams should use the lib, either in https://python-kasa.readthedocs.io/en/latest/smartdevice.html or https://python-kasa.readthedocs.io/en/latest/design.html

kasa/smartdevice.py Show resolved Hide resolved
@rytilahti
Copy link
Member

Uhh, messed up by splitting the review comments... Oh well, this approach looks good to me especially as it clearly separates the concerns and simplifies the code. This will be a breaking change as we remove the connect_single from the existing API, but given it's been only there for a single release that's fine.

@sdb9696 it should be enough to do the discovery only once and then re-use the connection parameters for future accesses. This way we can get rid off the whole discovery data path inside homeassistant for the devices that are already configured.

@bdraco
Copy link
Member Author

bdraco commented Nov 20, 2023

I'll retest this in HA shortly

@bdraco
Copy link
Member Author

bdraco commented Nov 20, 2023

Retested in HA. Everything still works

Copy link
Member

@rytilahti rytilahti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a very minor doc nit, I tested that all my devices work as expected using the cli tool.

**************

When the IP Address of a device is known, it can be connected with :meth:`~kasa.SmartDevice.connect()`.
If the IP Address is not known, it can be discovered using :func:`~kasa.Discover.discover_devices`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If the IP Address is not known, it can be discovered using :func:`~kasa.Discover.discover_devices`.
If the IP Address is not known, it can be discovered using :func:`~kasa.Discover.discover`.

Note the wrong function name. I would also swap this and the previous sentence, so something like:

Use to perform broadcast, udp-based discovery on the network. This will return you a list of device instances based on the discovery replies.
If the device's host is already known, you can use to construct a device instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good now, but I'm really jet lagged so I'll check it again in the morning. Than I think we should get a release out to address the linked issues in HA

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it looked fine to me so I went and merged it already... I can do release prep on the weekend if that works.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdraco I think it might be prudent to keep HA doing discover_single instead of direct tcp connection for the next HA release. I understand there are some issues at the moment with subnets and udp but the klap changes might not fully work with TCP connect and we may need to sync up on the error handling in HA.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather get the users who are using the integration today fixed before we focus on adding new things.

Copy link
Member

@rytilahti rytilahti Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this changes anything regarding to klap, or auth support in general. The discovery will still done using the discovery protocol and what changes is only that homeassistant can optimize after-discovery accesses when it knows the connection parameters.

So basically the discovered (or inputted) connection settings are stored inside the config entry, which are then used for future connections. What this means for authenticated devices is that the integration will need to be modified to catch the exception on invalid creds / creds needed to let the user to change them, but that should be enough. Or am I mistaken?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a great path forward to me unless I'm missing something as well 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Sorry, I just meant that we should incorporate that error handling and connection parameters before going ahead with the draft HA PR as-is.

It’s not essential because as @bdraco says, it’s new functionality, but atm it could raise errors for anyone who adds a klap device via discovery then restarts.

If we can fix rapidly in HA then not so much of a problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a commit to home-assistant/core@4d3b86b to make sure klap devices don't start a discovery flow so they don't appear before we have code in place to support them in HA

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me know if there are any other places we should check as well.

Once we get this one bumped in HA and that PR out the door, we can start on a KLAP one to get it supported in HA

bdraco and others added 2 commits November 21, 2023 23:10
Co-authored-by: Teemu R. <tpr@iki.fi>
@rytilahti rytilahti changed the title Move connect_single to SmartDevice.connect and allow passing in the device type Move connect_single to SmartDevice.connect Nov 21, 2023
@rytilahti rytilahti merged commit e98252f into python-kasa:master Nov 21, 2023
29 checks passed
@bdraco bdraco deleted the connect_single_device_type branch November 22, 2023 00:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Breaking change enhancement New feature or request
Projects
None yet
4 participants