TP-Link Kasa Integration not working. Device status shows "Failed setup, will retry". #103977

Closed
JackBeQuick87 opened this issue Nov 14, 2023 · 30 comments

Comments

@JackBeQuick87

JackBeQuick87 commented Nov 14, 2023

The problem

All devices using the TP-Link Kasa integration show "Failed setup, will retry". Hovering over the status shows "Unable to get discovery response for [device's hostname]."

Issue appeared immediately after updating to Home-Assistant v2023.11.
The integration was known to work before upgrading (I upgraded from v2023.9 to v2023.11, skipping 2023.10).

Debug log shows evidence of successful connection to device.

What version of Home Assistant Core has the issue?

core-2023.11.2

What was the last working version of Home Assistant Core?

core-2023.9 (or closest build)

What type of installation are you running?

Home Assistant Container

Integration causing the issue

tplink

Link to integration documentation on our website

https://www.home-assistant.io/integrations/tplink

Diagnostics information

home-assistant_tplink_2023-11-14T17-58-32.092Z.log

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Nothing abnormal was found.

Additional information

  • Multiple devices of varying models produce similar errors and logs. None of the devices work.
  • All devices were added by manually entering their hostname/FQDN.
    • Devices are on a different VLAN (with routing rules configured), so I do not expect auto discovery to work. These devices worked prior to the update.
@home-assistant

Hey there @rytilahti, @TheGardenMonkey, mind taking a look at this issue as it has been labeled with an integration (tplink) you are listed as a code owner for? Thanks!


@joshua-nord

I experienced the same symptoms on a previously working setup when I upgraded to Home-Assistant v2023.11. I worked around it by allowing traffic from the HomeAssistant controller to the devices on ports 9999 and 20002, both UDP and TCP, which appear to be Kasa's discovery ports.
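
To check whether a firewall between VLANs is the culprit, a small probe along the following lines may help. This is only a sketch: it assumes the device speaks the legacy Kasa protocol on port 9999 (a JSON query obfuscated with an XOR autokey scheme starting at 171), and the function names and the `192.0.2.10` address are placeholders invented for illustration. It does not cover port 20002, which is used by a different discovery format.

```python
import json
import socket
from typing import Optional

INITIAL_KEY = 171  # assumed starting key of the XOR autokey obfuscation


def kasa_encrypt(plaintext: bytes) -> bytes:
    """Obfuscate a payload: each output byte becomes the key for the next."""
    key = INITIAL_KEY
    out = bytearray()
    for byte in plaintext:
        key ^= byte
        out.append(key)
    return bytes(out)


def kasa_decrypt(ciphertext: bytes) -> bytes:
    """Reverse kasa_encrypt: each cipher byte is the key for the next one."""
    key = INITIAL_KEY
    out = bytearray()
    for byte in ciphertext:
        out.append(key ^ byte)
        key = byte
    return bytes(out)


def probe_udp(host: str, port: int = 9999, timeout: float = 3.0) -> Optional[dict]:
    """Send a get_sysinfo query over UDP; return the reply or None on timeout."""
    query = json.dumps({"system": {"get_sysinfo": None}}, separators=(",", ":"))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(kasa_encrypt(query.encode()), (host, port))
            data, _addr = sock.recvfrom(65535)
        except OSError:  # includes timeouts and unreachable networks
            return None
    return json.loads(kasa_decrypt(data))


def probe_tcp(host: str, port: int = 9999, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the device port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    device = "192.0.2.10"  # placeholder address, replace with a real device
    print("TCP 9999 reachable:", probe_tcp(device))
    print("UDP 9999 reply:", probe_udp(device))
```

If the TCP probe succeeds but the UDP probe returns nothing when run from the Home Assistant host, a missing UDP rule is a plausible explanation.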

@sweharris

sweharris commented Nov 14, 2023

Just to note that I'm also seeing identical issues (as noted in a comment on #99449); as I reported there I'm seeing get_sysinfo calls working and seeing UDP traffic flowing fine.

My setup also has the devices on a separate VLAN to HomeAssistant, but the firewall rules permit HA to talk to the IoT VLAN without restriction.

@JackBeQuick87
Author

JackBeQuick87 commented Nov 15, 2023

Looking through the commit history involving the tplink integration, there was a recent commit that changed to using a newer version of the python-kasa package. I found a related issue in that repository: python-kasa/python-kasa#543.

The problem in Home-Assistant seems to be rooted in the python-kasa package.

As a workaround, you can re-add the devices using their IP addresses. This will obviously cause problems if you do not use fixed IPs or DHCP reservations.

A fix of the python-kasa library is still needed.

@JackBeQuick87
Author

I experienced the same symptoms on a previously working setup when I upgraded to Home-Assistant v2023.11. I worked around it by allowing traffic from the HomeAssistant controller to the devices on ports 9999 and 20002, both UDP and TCP, which appear to be Kasa's discovery ports.

Thanks, but this problem seems to be different. My HomeAssistant instance can already/still reach the Kasa devices on those ports.

@sweharris

As a workaround, you can re-add the devices using their IP addresses. This will obviously cause problems if you do not use fixed IPs or DHCP reservations.

This seems to be a good workaround.

In my case I stopped the service, edited .storage/core.config_entries and replaced the DNS names with IP addresses, and then restarted. The devices are now properly found.

Thanks!
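
The manual edit described above can be sketched in Python. This is an illustration under assumptions rather than a supported tool: it assumes the file is JSON with entries under `data.entries`, each carrying a `domain` and a `data.host` field, and the helper names and the hostname-to-IP mapping are made up for the example. Stop Home Assistant before running anything like this.

```python
import json
from pathlib import Path


def replace_tplink_hosts(config: dict, host_map: dict) -> int:
    """Swap hostnames for IPs in tplink entries; return how many changed."""
    changed = 0
    for entry in config.get("data", {}).get("entries", []):
        if entry.get("domain") != "tplink":
            continue  # leave every other integration untouched
        data = entry.get("data", {})
        new_host = host_map.get(data.get("host"))
        if new_host is not None and new_host != data["host"]:
            data["host"] = new_host
            changed += 1
    return changed


def rewrite_file(path: Path, host_map: dict) -> int:
    """Apply replace_tplink_hosts to a file on disk, keeping a .bak copy."""
    original = path.read_text(encoding="utf-8")
    config = json.loads(original)
    changed = replace_tplink_hosts(config, host_map)
    if changed:
        path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
        path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return changed


if __name__ == "__main__":
    # Hypothetical mapping; the path is relative to the HA config directory.
    mapping = {"bedroomlamp.lan": "192.168.1.70"}
    print(rewrite_file(Path(".storage/core.config_entries"), mapping))
```

Swapping the keys and values of the mapping would reverse the change once hostnames work again.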

@JackBeQuick87
Author

As a workaround, you can re-add the devices using their IP addresses. This will obviously cause problems if you do not use fixed IPs or DHCP reservations.

This seems to be a good workaround.

In my case I stopped the service, edited .storage/core.config_entries and replaced the DNS names with IP addresses, and then restarted. The devices are now properly found.

Thanks!

Ah, thank you so much for that tip with the core.config_entries file!

If, in the GUI, you try to add a new device using an IP of a device that was formerly registered using its hostname, then it seems the HA core or integration will automatically replace the device's host name with the newly entered IP. However, I don't know if this process would work in reverse, whenever the hostname issue is fixed. Using your method of editing the core.config_entries file will be convenient for the switch back to hostnames, too.

@TheCodeJanitor-dotcom

TheCodeJanitor-dotcom commented Nov 15, 2023

I started experiencing this also. I have four TP-Link devices, all configured with static IP addresses on a subnet. By entering the host IP when adding the devices to the integration, they work, for a time ranging from minutes to hours, then stop working with the "Failed setup" indication. They always work from the Kasa app.
I examined core.config_entries, and found that the host addresses for all four devices were set to the address of one of my routers.
I edited the file, replacing the host addresses with the correct static addresses, and restarted HA. All devices worked again, but eventually failed over the next few hours.
The host addresses in core.config_entries had reverted to that same router address. I assume this is a function of the auto-discovery mechanism, but I don't know enough to be sure, or how to disable it.

@latteetanne

I'm having this same behaviour. Can ping and use all three of them via official app but Home Assistant discovery can't find any of them via auto discovery and gives out error "No devices found on the network". If I try to enter IP when adding manually I get "Failed to connect" error. Prior to deleting and trying to re-add them I also checked that in core.config_entries they had same IP addresses as they currently do so it's weird. They used to work really reliably until now. :(

@TheCodeJanitor-dotcom

TheCodeJanitor-dotcom commented Nov 17, 2023

I started experiencing this also. I have four TP-Link devices, all configured with static IP addresses on a subnet. By entering the host IP when adding the devices to the integration, they work, for a time ranging from minutes to hours, then stop working with the "Failed setup" indication. They always work from the Kasa app. I examined core.config_entries, and found that the host addresses for all four devices were set to the address of one of my routers. I edited the file, replacing the host addresses with the correct static addresses, and restarted HA. All devices worked again, but eventually failed over the next few hours. The host addresses in core.config_entries had reverted to that same router address. I assume this is a function of the auto-discovery mechanism, but I don't know enough to be sure, or how to disable it.

I found a workaround that may only be applicable in my particular subnet configuration:
I used IP/MAC binding in my subnets' router to pin the TP-Link/Kasa device MAC addresses to the static IPs I had previously assigned them. This seems to prevent whatever discovery mechanism kept changing the IP addresses in core.config_entries, and I haven't had a recurrence of 'Failed setup' (yet).

@TheCodeJanitor-dotcom

I found a workaround that may only be applicable in my particular subnet configuration: I used IP/MAC binding in my subnets' router to pin the TP-Link/Kasa device MAC addresses to the static IPs I had previously assigned them. This seems to prevent whatever discovery mechanism kept changing the IP addresses in core.config_entries, and I haven't had a recurrence of 'Failed setup' (yet).

Forget it. Failed. Same result, the IP of the device reverts to the WAN address of the subnet router.

bdraco added a commit that referenced this issue Nov 18, 2023
I am going to attempt a fix for #103977
via python-kasa/python-kasa#538

I am picking up codeowner on this for the foreseeable future to watch
for issues as well
@bdraco bdraco mentioned this issue Nov 18, 2023
20 tasks
@bdraco bdraco self-assigned this Nov 18, 2023
rytilahti pushed a commit that referenced this issue Nov 18, 2023
I am going to attempt a fix for #103977
via python-kasa/python-kasa#538

I am picking up codeowner on this for the foreseeable future to watch
for issues as well
@TermiNaderTL

This comment was marked as duplicate.

@seancrites

I too run with a vlan/firewalled app & iot networks. Also have been having the same issue where my TP-LINK (HS105?) switch suddenly stopped working within the past month or two. I've also got DHCP reservation giving predictable IP addresses to the TP-LINK as well.

My old firewall was just permitting TCP 9999 from HA OS -> IOT. I enabled logging to see what was not permitted between the two and found that HA OS is now sending traffic via UDP on 9999 & 200002. I updated my firewall and that fixed the issue.

@TheCodeJanitor-dotcom

I too run with a vlan/firewalled app & iot networks. Also have been having the same issue where my TP-LINK (HS105?) switch suddenly stopped working within the past month or two. I've also got DHCP reservation giving predictable IP addresses to the TP-LINK as well.

My old firewall was just permitting TCP 9999 from HA OS -> IOT. I enabled logging to see what was not permitted between the two and found that HA OS is now sending traffic via UDP on 9999 & 200002. I updated my firewall and that fixed the issue.

I disabled SPI on my routers, and got a different behavior, but discovered something interesting: my IOT router is also TP-Link, and the router is responding to the probes from HA OS, but obviously not in the way that the TP-Link integration is expecting.

So, all four of my TP-Link outlets (a KP303, an HS300, an HS107, and an EP40) get tried, but a failure for each is reported from the TP-Link routers' IP address...

It also appears that whatever fix was submitted above has stalled...

@iointerrupt

iointerrupt commented Dec 9, 2023

It appears to be a DNS-related issue in how python-kasa parses hostnames. For example, running

kasa --host bedroomlamp.lan on

fails with:

No --type defined, discovering..
Got error: SmartDeviceException('Unable to get discovery response for bedroomlamp.lan')

Whereas specifying the IP address of bedroomlamp.lan works fine: kasa --host 1.2.3.4 on

[EDIT]: For now, I have reverted my python-kasa install back to 0.5.3 and it's working as expected. Looks like something changed in 0.5.4.
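
Since passing an IP address works where a hostname does not, one possible stop-gap is to resolve the name yourself and hand the resulting address to whatever talks to the device. The sketch below uses only the Python standard library and does not call python-kasa; `resolve_host` is a hypothetical helper name, and `bedroomlamp.lan` is the hostname from the example above.

```python
import ipaddress
import socket


def resolve_host(host: str) -> str:
    """Return host unchanged if it is an IP literal, else its first IPv4 address."""
    try:
        ipaddress.ip_address(host)
        return host  # already an address, nothing to resolve
    except ValueError:
        pass
    # Raises socket.gaierror if the name cannot be resolved.
    infos = socket.getaddrinfo(
        host, None, family=socket.AF_INET, type=socket.SOCK_DGRAM
    )
    return infos[0][4][0]


if __name__ == "__main__":
    try:
        print(resolve_host("bedroomlamp.lan"))
    except socket.gaierror as err:
        print("DNS lookup failed:", err)
```

If the lookup itself fails here, the problem lies with name resolution on the host rather than with the library.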

@trustno1foxm

I too run with a vlan/firewalled app & iot networks. Also have been having the same issue where my TP-LINK (HS105?) switch suddenly stopped working within the past month or two. I've also got DHCP reservation giving predictable IP addresses to the TP-LINK as well.

My old firewall was just permitting TCP 9999 from HA OS -> IOT. I enabled logging to see what was not permitted between the two and found that HA OS is now sending traffic via UDP on 9999 & 200002. I updated my firewall and that fixed the issue.

oh that was my bugfixing! thank you! nice that there hasn't been any documentation about that...

@Shredder5262

I believe my issue is related to this also. I see the error message below in the logs and since updating to HA core 2023.12.2 (I'm on 2023.12.3 now) I have been experiencing either large delays with my devices turning on or not turning on at all.

Logger: homeassistant.components.tplink.coordinator
Source: helpers/update_coordinator.py:332
Integration: TP-Link Kasa Smart (documentation, issues)
First occurred: December 14, 2023 at 7:32:59 PM (218 occurrences)
Last logged: 10:15:51 AM

Error fetching 192.168.1.30 data: Unable to query the device 192.168.1.30:9999: [Errno 104] Connection reset by peer
Error fetching 192.168.1.195 data: Unable to query the device 192.168.1.195:9999: [Errno 104] Connection reset by peer
Error fetching 192.168.1.148 data: Unable to query the device 192.168.1.148:9999:
Error fetching 192.168.1.30 data: Unable to connect to the device: 192.168.1.30:9999: [Errno 104] Connect call failed ('192.168.1.30', 9999)
Error fetching 192.168.1.30 data: Unable to query the device 192.168.1.30:9999:

@mmccool

mmccool commented Dec 19, 2023

Two things to add here:

  • I had a similar problem but then realized the device in question was not powered up. However, it looked similar, going through a reinitialization/retry/reload loop. But really it should just be marked as "unavailable." Sometimes I will not have devices powered up - in this case the switch is downstream from my solar battery and I have a rule to turn off the solar battery's AC inverter when it is not needed to save power. This MAY be a different issue (should be unavailable instead of init looping), but it looks the same. After I powered the device up it worked fine - I have three KP400's; the only TP-Link devices I have.
  • This problem behaves suspiciously like "Zigbee is failing constantly, requires reloading/reinitialization" #105506, although that issue is for the ZHA integration. But it started at the same time, with 2023.12.1 - so maybe a common root cause?

@Shredder5262

I've only had 1 occurrence of a device outright failing to connect and instead showing offline; otherwise the devices seem to respond as they should. The error message itself seems to be a lot of noise that I'm trying to reduce in my environment.

@tomlyo

tomlyo commented Dec 21, 2023

I have this happen on all my Kasa devices whenever my WiFi goes out temporarily. As soon as it's back up all of them are in a state of "Failed setup, will retry". I've restarted Home Assistant in each case, and the OS itself (running HassOS). The devices work fine in the Kasa app, and they work in Home Assistant IF I remove the device, then re-add it by its static IP address.

In my case, the switches are on a different VLAN than my HASS server. It's a bit frustrating, but every time we get a power outage, or my Wifi AP goes offline, I just have to go through deleting all my Kasa devices, and re-adding them. Also discovery is just wack, not sure what it's doing as I get logs like this:

2023-12-20 19:37:33.242 DEBUG (MainThread) [kasa.discover] [DISCOVERY] ('192.168.68.100', 9999) >> {'system': {'get_sysinfo': None}}
2023-12-20 19:37:33.246 DEBUG (MainThread) [kasa.discover] Waiting a total of 10 seconds for responses...
2023-12-20 19:37:39.826 DEBUG (MainThread) [kasa.discover] [DISCOVERY] ('192.168.68.101', 9999) >> {'system': {'get_sysinfo': None}}
2023-12-20 19:37:39.829 DEBUG (MainThread) [kasa.discover] Waiting a total of 10 seconds for responses...
2023-12-20 19:37:40.803 DEBUG (MainThread) [kasa.discover] [DISCOVERY] ('192.168.68.109', 9999) >> {'system': {'get_sysinfo': None}}
2023-12-20 19:37:40.807 DEBUG (MainThread) [kasa.discover] Waiting a total of 10 seconds for responses...
2023-12-20 19:37:48.649 DEBUG (MainThread) [kasa.discover] [DISCOVERY] ('192.168.68.100', 9999) >> {'system': {'get_sysinfo': None}}

Those IPs are completely random, as my network and VLANs use nothing even close to that.

EDIT:

Re-reading the OP, "Unable to get discovery response for [device's hostname]." In my case this would make sense why the devices aren't connecting because it's trying completely random IP addresses, that aren't even the IP addresses I specified when I set the devices up in the integration.

In my case my switches use IP address 192.168.1.70-79

@joeidea

joeidea commented Dec 24, 2023

I've had similar problems with the Kasa HS300 smart power strip. I first noticed at 2023.11.2. After reading here, I re-did the integration specifying IP (I use DHCP IP address reservation); that worked, for a while. It broke again about 2 weeks ago; but after about 2 days resolved by itself. I am now at Core 2023.12.3.

It broke again about 2 days ago. The behavior is different. I re-did the integration, specifying IP, but discovery fails. I then leave the Host blank, and discovery works, shows the correct IP address; but no entities are discovered, and I see endless failed/retry loop.

I will watch closely this week. I believe the trigger is my router rebooting (which I do weekly for maintenance.).

I resolved (temporarily?) doing this:

  1. Shutdown HA VM (Windows 11 Hyper-V)
  2. Change DHCP IP (192.168.1xxx)
  3. Reboot Router
  4. Start HA VM.

@ecroskery

Went from HASS 2023.8 to 2024.1 today and after the upgrade the Kasa devices all failed to initialize and I found this thread.

My Kasa devices are also on a different VLAN from HASS and I was previously allowing HASS to connect to the Kasa devices via TCP on port 9999

With the information here I added a rule to allow HASS to also connect via UDP on port 20002 (* not 200002) and everything seems to be working again just fine.

Thanks very much for the information to get this working again

@paulbraren

I have Home Assistant with Terminal installed. Not sure of the exact command for allowing discovery via UDP port 20002, but I'm curious, if I get that figured out, and add my 4 EP25 devices seen by Home Assistant, do you think those devices will persist through future updates? Eventually hoping Home Assistant incorporates a fix for these latest rev. EP25 devices not being discoverable, but I have no idea how long that may take. I'm admittedly rather new to the Home Assistant community.

@rytilahti
Member

@paulbraren that's a different issue from what is being discussed here, sorry.

Anyway, when #105143 gets merged, the UDP connectivity will only be required for the initial discovery and for potential firmware updates that change the low-level communication parameters (different encryption etc.) which are only available using the discovery protocol. All device communications are done using TCP.

In practice, this means that as long as IP addresses remain stable or homeassistant is in the same network and can use DHCP traffic to update the addresses, the integration should keep working without any issues even on less favorable network conditions.

@paulbraren
Copy link

Thank you for the polite redirect to #105143 that I've been watching, much appreciated. I have DHCP reservations for my 4 new EP25 devices, so the IP addresses won't change. It's good to hear UDP is only used initially for discovery, thanks for clarifying. Looking forward to finally being able to pair my Lutron Pico remotes with my 4 new EP25 devices, especially since that works so well with my older EP25 units that were easily discovered by Home Assistant.

@sdb9696
Contributor

sdb9696 commented Feb 8, 2024

This issue should be fixed in the latest HA release 2024.02

@sdb9696
Contributor

sdb9696 commented Feb 8, 2024

@home-assistant close

@home-assistant home-assistant bot closed this as completed Feb 8, 2024
@TheCodeJanitor-dotcom

Thank you all for your efforts, it appears to be working correctly now!

@TheCodeJanitor-dotcom

As before, the integration worked for a few minutes, but auto-discovery apparently continues to run and corrupts the configuration of my Kasa devices (all outlets/outlet strips). Possibly related to the presence of a TP-Link SR20 router on my network, as the error message reports "Unable to connect to the device: :9999"

@rytilahti
Member

Without any network traces or other information, it is really time-consuming to figure out what is going wrong and how to fix it. The way DHCP discovery works is that it listens for broadcasted DHCP requests, and uses the requested IP addresses within those requests and the MAC address of the packet to inform integrations about address changes.
Perhaps the router is acting as a weird, non-standards-conforming relay that sets its own IP address in the requests the homeassistant host is receiving?

If that's the case, there are no easy real fixes that can be done at homeassistant's end. One stop-gap solution, as long as your devices have static IP addresses, is to disable the dhcp discovery completely. To my knowledge, this could be done by removing the default_config and manually configuring all wanted integrations listed on that page. This approach is not recommended, as any future changes to defaults will not be automatically applied, and it will most likely break unexpectedly at some point in the future.
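
To make the DHCP mechanism described above more concrete, here is a rough sketch of pulling the client MAC and the requested IP address out of a raw DHCP payload. It is not Home Assistant's implementation; the function name is hypothetical, and the field offsets follow the standard BOOTP/DHCP layout (client hardware address at byte 28, options after the magic cookie at byte 236, option 50 for the requested address, option 53 for the message type). The `giaddr` field is included because a relay agent fills it in, which may help when inspecting a capture for relay behaviour.

```python
import socket

MAGIC_COOKIE = b"\x63\x82\x53\x63"
OPT_PAD = 0
OPT_REQUESTED_IP = 50
OPT_MESSAGE_TYPE = 53
OPT_END = 255


def parse_dhcp(payload: bytes) -> dict:
    """Extract the MAC, relay address, message type and requested IP."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP payload")
    hlen = min(payload[2], 16)  # hardware address length, capped at field size
    info = {
        "mac": payload[28:28 + hlen].hex(":"),
        "giaddr": socket.inet_ntoa(payload[24:28]),  # set by relay agents
        "message_type": None,
        "requested_ip": None,
    }
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:  # single-byte padding option, no length field
            i += 1
            continue
        if i + 1 >= len(payload):
            break  # truncated option header
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == OPT_MESSAGE_TYPE and len(value) == 1:
            info["message_type"] = value[0]
        elif code == OPT_REQUESTED_IP and len(value) == 4:
            info["requested_ip"] = socket.inet_ntoa(value)
        i += 2 + length
    return info
```

Feeding this the UDP payload of captured DHCP requests (for example from a packet capture taken on the Home Assistant host) would show which address and MAC each request carries, which is the kind of network trace asked for above.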

@github-actions github-actions bot locked and limited conversation to collaborators Mar 11, 2024