Zombie fix (Refresh command 0x12) #491

Open
wants to merge 3 commits into base: master

Elendilon

Discussed in issue #445; see also the commit comment. This may need further work (due to edge cases in the way each manufacturer handles this command) or restructuring.

This state occurs on some newer devices when they lose power and come back online with no internet access.
This fixes devices that fail to come back online because of this issue, and also handles devices that came back online but were not controllable until their state was switched.
Every manufacturer I have handles this situation differently; there are likely still outstanding issues with manufacturers I do not have devices from, or even with other firmware versions on devices from the manufacturers I have tested.
Tested on: Gosung, Sunco, Treatlife, Helloify, Supernight

Add a REFRESH command (0x12) to pytuya
Handle the refresh command in pytuya similarly to the handshake poll
Call the refresh command if we successfully connect but fail to update the initial status
Handle new edge cases, including:
* Incorrect switches to type_0d
* Heartbeats timing out before refresh is done
* Refresh command triggering two responses, one of which is an empty 0x08 status response

Yet to do:
* Configurable dpIds to refresh per device
* Decode status responses before deciding who to dispatch them to
* General cleanup for a better flow
@igr91

igr91 commented May 31, 2021

So, yesterday I set up a single-gang WiFi smart switch in HA + localtuya after getting the required keys. This one has the WA2 board, so no custom firmware is possible. It worked perfectly in its own VLAN with no DNS or WAN access, completely isolated. HA reports the firmware version as 3.3.

After power cycling, it went zombie, but it would still connect to its access point and respond to pings.

After applying this PR and rebooting HA, it picked up the switch's status and started working right away just fine, just like yesterday. I didn't need to resync the device with the Smart Life app nor give it any connectivity.

Great work!

@Maronato

Maronato commented Jun 2, 2021

This PR fixed my issues as well. LGTM!

@JVital2013

Worked for me as well on some newer smart bulbs I have. Good job!

I think this also fixes #87

@jmkraan74

hi,

I'm struggling with the same, but I am completely new to HA. How do I implement your fix?

thnx for all the effort!

@Elendilon
Author

> hi,
>
> I'm struggling with the same, but I am completely new to HA. How do I implement your fix?
>
> thnx for all the effort!

If you want to download before it is merged, you can check it out with git using "Open with" in the upper right, or you can download it as a zip file at:

https://github.com/Elendilon/localtuya/archive/2ee7e5c78831588f1a4da7428baad135c605bb62.zip

Then install the code normally as a custom component. Upload the custom_components/localtuya files in the zip to your HA's config/custom_components/localtuya directory. You can do so over SSH/SCP (install the HA SSH addon), or via the file manager you can install as an addon. HA's forum would be appropriate to search for more information on how to install, run and debug custom components.
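For anyone who prefers to script the copy step, here is a minimal sketch using only the Python standard library. The function name and the example paths are hypothetical; it assumes the zip has the usual GitHub layout with a single top-level folder containing `custom_components/localtuya`.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_from_zip(zip_path, ha_config_dir):
    """Copy custom_components/localtuya from a downloaded archive into HA's config dir."""
    target = Path(ha_config_dir) / "custom_components" / "localtuya"
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        # The archive wraps everything in one top-level folder, so search one level down.
        matches = [p for p in Path(tmp).glob("*/custom_components/localtuya") if p.is_dir()]
        if not matches:
            raise FileNotFoundError("custom_components/localtuya not found in archive")
        if target.exists():
            # Replace the old copy instead of merging, so no stale files are left behind.
            shutil.rmtree(target)
        shutil.copytree(matches[0], target)
    return target
```

A call might look like `install_from_zip("localtuya.zip", "/config")`, where both paths depend on your setup. Home Assistant needs a restart afterwards to load the new files.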

@AdmiralStipe

> Worked for me as well on some newer smart bulbs I have. Good job!
>
> I think this also fixes #87

Unfortunately it doesn't. When I tried it, all my Tuya plugs became unavailable and stayed that way.

@chandr1000

chandr1000 commented Jun 30, 2021

Applied on my HASS instance. It solved the "unavailable" problem while the internet access was blocked.
Tested with 8 lights & 1 power switch.

Need one more reviewer to approve the changes.

@eggwhalefrog

I can confirm that it's working for my Novostella bulb as well.
This has been driving me crazy, thanks for the fix!


@jrochate jrochate left a comment

Works fine.

@joshuaspence

Can confirm that it fixes my issue as well

@lloydw

lloydw commented Oct 17, 2021

This fixed my down lights too.


@Pheoxy Pheoxy left a comment

Seems to be working fine from here after running for a while.

@Arakon

Arakon commented Nov 15, 2021

I noticed one issue with this fix: it causes the device to become unavailable exactly once per minute and then come back, triggering all related automations every time. After returning to the current release, this stops.
[two screenshots attached]

@Elendilon
Author

@Arakon If you want to turn on debug logging and then post a log, I can see if I can tell what your devices are doing. None of mine do the same; but as noted above every device manufacturer seems to treat refresh slightly differently.

@Arakon

Arakon commented Nov 21, 2021

> @Arakon If you want to turn on debug logging and then post a log, I can see if I can tell what your devices are doing. None of mine do the same; but as noted above every device manufacturer seems to treat refresh slightly differently.

Okay.. I'm not sure if it was just some really weird glitch or what, but I just reinstalled your version and the issue no longer happens. Only difference is that I also added a Tuya power socket since I tried last time. It's possible that I messed up copying all files and folders over last time, or that it simply glitched out. If it comes back, I'll try to get a debug log.

@shtrom

shtrom commented Nov 24, 2021

Doesn't seem to work on my mid-2021 Mirabella globe (DPs in the 20-26 range). I get a "Connection to device succeeded but no datapoints found, please try again." error when trying to add it. It works fine without the Internet block, up until the globe is powered off and on again.

Debug log of trying to add the device
homeassistant    | 2021-11-24 23:03:34 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Sending command status (device type: type_0a)
homeassistant    | 2021-11-24 23:03:34 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Send payload: b'{"gwId":"<LOCAL_ID>","devId":"ebafe767a15e74b8d5eknd"}'
homeassistant    | 2021-11-24 23:03:34 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Waiting for sequence number 0
homeassistant    | 2021-11-24 23:03:34 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Dispatching message TuyaMessage(seqno=0, cmd=10, retcode=1, payload=b'\xf2\xe5\xeb\x86\xd4]\x97\x1b\x1d\xc8\xef\xaaU\x95\x9f\x1e\x80G\x98A\x13\xd6\xdb\xc0\x83\x15!4\x1f\x1c( ', crc=1611821793)
homeassistant    | 2021-11-24 23:03:34 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Dispatching sequence number 0
homeassistant    | 2021-11-24 23:03:34 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] switching to dev_type type_0d
homeassistant    | 2021-11-24 23:03:34 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Re-send status due to device type change (type_0a -> type_0d)
homeassistant    | 2021-11-24 23:03:34 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Sending command status (device type: type_0d)
homeassistant    | 2021-11-24 23:03:34 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Send payload: b'{"devId":"<LOCAL_ID>","uid":"ebafe767a15e74b8d5eknd","t":"1637755414","dps":{"1":null,"2":null,"3":null,"4":null,"5":null,"6":null,"7":null,"8":null,"9":null,"10":null}}'
homeassistant    | 2021-11-24 23:03:34 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Waiting for sequence number 1
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Dispatching message TuyaMessage(seqno=1, cmd=13, retcode=0, payload=b'', crc=808133514)
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Dispatching sequence number 1
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Decrypted payload: {}
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Sending command status (device type: type_0d)
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Send payload: b'{"devId":"<LOCAL_ID>","uid":"ebafe767a15e74b8d5eknd","t":"1637755415","dps":{"1":null,"11":null,"12":null,"13":null,"14":null,"15":null,"16":null,"17":null,"18":null,"19":null,"20":null}}'
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Waiting for sequence number 2
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Dispatching message TuyaMessage(seqno=0, cmd=8, retcode=0, payload=b'3.3\x00\x00\x00\x00\x00\x00\x98\xf1\x00\x00\x00\x01WIB\xd0r&\x81\xa5\xac\x0b\x13\x95\n\x0f$\x99\xf0\xc0\x7f\n\x93b\x96\x93\x8a\xe4\xa2\xbe\xb5$O\xe6\x02E\x84\xa1)\x133\xbe\xbbv\xb9C%]4W', crc=1101513801)
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Got status update
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Decrypted payload: {"dps":{},"type":"query","t":13590}
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Dispatching message TuyaMessage(seqno=2, cmd=13, retcode=0, payload=b'', crc=2380353348)
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Dispatching sequence number 2
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Decrypted payload: {}
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Sending command status (device type: type_0d)
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Send payload: b'{"devId":"<LOCAL_ID>","uid":"<LOCAL_ID>","t":"1637755415","dps":{"1":null,"21":null,"22":null,"23":null,"24":null,"25":null,"26":null,"27":null,"28":null,"29":null,"30":null}}'
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Waiting for sequence number 3
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Dispatching message TuyaMessage(seqno=0, cmd=8, retcode=0, payload=b'3.3\x00\x00\x00\x00\x00\x00\x98\xf2\x00\x00\x00\x01WIB\xd0r&\x81\xa5\xac\x0b\x13\x95\n\x0f$\x99\xf0\xc0\x7f\n\x93b\x96\x93\x8a\xe4\xa2\xbe\xb5$O\xe6\x02E\x84\xa1)\x133\xbe\xbbv\xb9C%]4W', crc=3268285578)
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Got status update
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Decrypted payload: {"dps":{},"type":"query","t":13590}
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Dispatching message TuyaMessage(seqno=3, cmd=13, retcode=0, payload=b'', crc=1350014657)
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Dispatching sequence number 3
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Decrypted payload: {}
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Sending command status (device type: type_0d)
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Send payload: b'{"devId":"<LOCAL_ID>","uid":"<LOCAL_ID>","t":"1637755415","dps":{"1":null,"100":null,"101":null,"102":null,"103":null,"104":null,"105":null,"106":null,"107":null,"108":null,"109":null,"110":null}}'
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Waiting for sequence number 4
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Dispatching message TuyaMessage(seqno=4, cmd=13, retcode=0, payload=b'', crc=755273881)
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Dispatching sequence number 4
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Decrypted payload: {}
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Detected dps: {}
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Closing connection
homeassistant    | 2021-11-24 23:03:35 DEBUG (MainThread) [custom_components.localtuya.pytuya] [eba...knd] Connection lost: None

Confusingly, I don't think there are any of the debug log messages added by this PR in the log above, but I can confirm that the code in my custom_components/localtuya has the changes...
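A log like this is easier to read once the dispatched messages are tallied. The sketch below is only a reading aid, not part of localtuya: the regular expression is written against the `TuyaMessage(seqno=..., cmd=..., retcode=...)` text shown in the log, and the sample lines are abbreviated copies with placeholder payload and crc values.

```python
import re
from collections import Counter

# Matches the start of the TuyaMessage repr as it appears in the debug log.
MESSAGE_RE = re.compile(r"TuyaMessage\(seqno=(\d+), cmd=(\d+), retcode=(\d+)")


def summarize_messages(log_text):
    """Count (cmd, retcode) pairs among the dispatched messages in a debug log."""
    return Counter(
        (int(cmd), int(retcode))
        for _seqno, cmd, retcode in MESSAGE_RE.findall(log_text)
    )


# Abbreviated sample lines; payload and crc values are placeholders.
sample = (
    "Dispatching message TuyaMessage(seqno=0, cmd=10, retcode=1, payload=b'...', crc=1)\n"
    "Dispatching message TuyaMessage(seqno=1, cmd=13, retcode=0, payload=b'', crc=2)\n"
    "Dispatching message TuyaMessage(seqno=0, cmd=8, retcode=0, payload=b'3.3...', crc=3)\n"
)
print(summarize_messages(sample))
```

Applied to the full log above, the same tally gives one cmd=10 reply with retcode=1, four cmd=13 replies with empty payloads, and two cmd=8 status messages whose decrypted dps is empty.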

I do, however, also get a timeout. I have been able to revive the device by force-setting the DPs that I know it has with tuya-cli. Every time I set a DP to a (valid) value, it becomes visible via a get and usable by localtuya:

tuya-cli set --id <LOCAL_ID> --key <KEY> --ip <IP> --protocol-version 3.3 --dps 20  --set true
tuya-cli set --id <LOCAL_ID> --key <KEY> --ip <IP> --protocol-version 3.3 --dps 21  --set 'white'
tuya-cli set --id <LOCAL_ID> --key <KEY> --ip <IP> --protocol-version 3.3 --dps 22  --set 500
tuya-cli set --id <LOCAL_ID> --key <KEY> --ip <IP> --protocol-version 3.3 --dps 25  --set '030e0d00000000000000001f403e8'
tuya-cli set --id <LOCAL_ID> --key <KEY> --ip <IP> --protocol-version 3.3 --dps 26  --set 0

as suggested as a fix in #574

@Elendilon
Author

@shtrom

Ugh, this is yet another unique way this interaction can happen. The flow that the code is using is:

1. (Original) Connect
2. (Original) Try to get the status
3. (Original) If the request fails, switch to the "new" type of status packet
4. (Original) Try to get the status using the "new" type of status packet
5. (New) If that request fails, switch back to the "original" packet type, then send the REFRESH command
6. (New) Try to get the status
7. (New) If that fails, switch to the "new" type of status packet
8. (New) Try to get the status
9. (Original) If that failed, bomb out and log a connection error
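The sequence above can be sketched as a small function. This is a simplified model, not the code in the PR: `StatusError`, `initial_status`, the `device` interface (`set_type`, `status`, `refresh`) and the `FakeZombie` test double are all hypothetical names, and the "original" and "new" packet types are mapped to the `type_0a` and `type_0d` labels seen in the debug log.

```python
class StatusError(Exception):
    """Raised when a status request fails (hypothetical name)."""


def initial_status(device):
    """Walk the fallback sequence and return the first status that succeeds."""
    # Original flow: try the original packet type, then the "new" one.
    for dev_type in ("type_0a", "type_0d"):
        device.set_type(dev_type)
        try:
            return device.status()
        except StatusError:
            pass
    # New flow: switch back to the original type, send REFRESH, then retry both types.
    device.set_type("type_0a")
    device.refresh()
    for dev_type in ("type_0a", "type_0d"):
        device.set_type(dev_type)
        try:
            return device.status()
        except StatusError:
            pass
    # Nothing worked: report a connection error, as the original flow does.
    raise ConnectionError("could not get initial status")


class FakeZombie:
    """Test double: fails every status request until it has been refreshed."""

    def __init__(self):
        self.refreshed = False
        self.dev_type = None

    def set_type(self, dev_type):
        self.dev_type = dev_type

    def refresh(self):
        self.refreshed = True

    def status(self):
        if not self.refreshed:
            raise StatusError("no usable status yet")
        return {"dps": {"1": True}}


print(initial_status(FakeZombie()))
```

Note that a device which answers the second status request with a valid but empty packet returns early from the first loop and never reaches the refresh step.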

In this debug log it looks like your path through this part is:

1. Connect
2. Try to get the status; the device responds with garbage
3. Switch to the "new" status packet type
4. Try to get the status, using the "new" type
5. Device responds successfully (thus we continue on the success flow instead of the error/recovery flow above)

So you are not making it into my code at all, because your device responds with a valid packet instead of a failure; it's just that the valid packet doesn't have any useful information. Eventually, during the normal/original code flow, you try to get a list of usable DPS to continue the config flow, but that list is empty and the config flow bombs out. There is one line missing from your debug log that I would expect with my code, "Started heartbeat loop", so it is possible you are not actually using my code? But given the rest of the log, you wouldn't have made it into any other new code anyway.

I can add something that detects empty dps as an error pretty easily, but I'm not sure of all the other repercussions that might have. It's been very hard to dance around all the various ways different devices respond to different scenarios.

Your device, given that the commands you ran work, will initialize if sent any "set" command. Not all devices will, though, and it is most likely a bug in the manufacturer's implementation that it does. So far, they all initialize if sent a "refresh" command, but some devices will kill the connection if sent a refresh command while they are already initialized, so we can't just send it every time we try to connect (or periodically, like another PR is trying to do to solve the energy options not updating).

Some devices respond with a single packet of type refresh when sent a refresh command; others respond with two packets, and this tuya library was always set up for "one message sent, one response", so I had to implement a hack to handle that. Some devices will respond to the "heartbeat" command even when not initialized, but others just close the connection or respond with garbage if you do anything except send "refresh", so I had to move starting the heartbeat to after we successfully get a good response back.

Ideally there would be some way to ask the device if it is initialized, and that "garbage" response we get at first may be saying exactly that, but since we can't decode it we don't know. Ideally there is also some concrete flow that will work with all devices from all manufacturers, but I'm not even sure that is the case, given that googling finds plenty of people with issues using the official app when the internet is down and it is in local mode. This probably isn't something every manufacturer tests for.
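To make the "one or two responses" problem concrete, here is one possible way to judge the outcome of a refresh. It is a guess at the shape of such a check, not the hack used in the PR: the `Message` tuple is a stand-in for the library's message type, and `refresh_succeeded` is a hypothetical helper. The command numbers 0x08 (status) and 0x12 (refresh) are the ones named in this thread.

```python
from collections import namedtuple

# Stand-in for the library's message type; only the fields needed here.
Message = namedtuple("Message", "seqno cmd dps")

STATUS = 0x08
REFRESH = 0x12


def refresh_succeeded(responses):
    """Decide whether a refresh worked, given every message that arrived for it.

    A device may answer with a single refresh packet, or with two packets of
    which one is an empty 0x08 status. Empty status packets are skipped
    instead of being taken as the answer.
    """
    for msg in responses:
        if msg.cmd == STATUS and not msg.dps:
            continue  # the empty 0x08 companion packet carries no information
        if msg.cmd in (STATUS, REFRESH):
            return True
    return False


two_packets = [Message(0, STATUS, {}), Message(5, REFRESH, {})]
print(refresh_succeeded(two_packets))
```

Collecting all responses before deciding is the design choice here; it sidesteps the "one message sent, one response" assumption at the cost of waiting for a short window after each refresh.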

Anyway, I will make an update later tonight that detects empty dps responses as an error - but I can't test it, so I'll send you a branch to test if you are willing.

@Elendilon
Author

@shtrom

Sorry for the delay, I've been busy with Thanksgiving stuff. I haven't even had a chance to upload and run this, so it may even have a syntax error, heh. But I added a line that should throw an exception if we get a valid status packet that contains an empty dps object. Just one file is modified.

https://github.com/Elendilon/localtuya/blob/zombie_fix_dps/custom_components/localtuya/pytuya/__init__.py
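The idea of treating an empty dps object as a failure can be sketched as follows. This is an illustration written against the decrypted payloads shown in the debug log, not the line in the linked branch; `EmptyDpsError` and `parse_status` are hypothetical names.

```python
import json


class EmptyDpsError(Exception):
    """Hypothetical exception for a valid status packet with no datapoints."""


def parse_status(decrypted_payload):
    """Decode a status payload, raising if it carries an empty dps object."""
    status = json.loads(decrypted_payload)
    # Only complain when a dps key is present but empty; a bare {} reply
    # (as seen for the cmd=13 responses in the log) is left alone.
    if "dps" in status and not status["dps"]:
        raise EmptyDpsError("status packet decoded but dps is empty")
    return status


try:
    parse_status('{"dps":{},"type":"query","t":13590}')
except EmptyDpsError as err:
    print("treated as failure:", err)
```

Raising here lets the caller fall through to its error/recovery path instead of continuing with a status that has nothing in it.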

@shtrom

shtrom commented Dec 2, 2021

@Elendilon thanks for that.

I'll give that a go as soon as I can but.... The globe has stopped misbehaving. I did notice before that, despite all DNS traffic being blocked, it somehow magically managed to know an IP for the Tuya cloud, and try to connect to it, which resulted in a zombie state with no DPs.

It now seems to have stopped doing that. I suspect it might have had some sort of DNS cache that would survive a power-cycle (even with a day's wait), and that cache would have eventually expired, leaving it to behave normally, and be happy to expose its DPs.

It's all conjecture at this point, but I'll keep an eye on it, and try your fix when things go wrong again.

@markdor

markdor commented Dec 21, 2021

This fixed my Tuya Lights which I also power on/off manually. Please accept this PR.

@Elendilon
Author

There is another PR implementing this same 0x12 command (for a different purpose). That PR is almost ready to be pushed. Once it is, I will pull that code here (they conflict slightly) and also do a bit of a rework on how a lack of response to the 0x12 command is handled (we found a few more/different ways devices handle receiving the command, and one of them is to revive but just not respond at all; so I need to keep trying the connection attempt if the response times out).
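A device that revives without answering could be handled along these lines. This is a sketch under assumptions, not the planned rework: `refresh_until_alive`, its `send_refresh` and `get_status` callables, and the `attempts` and `timeout` defaults are all hypothetical.

```python
import asyncio


async def refresh_until_alive(send_refresh, get_status, attempts=3, timeout=5.0):
    """Send a refresh, then ask for status; a silent device is retried, not failed."""
    for _ in range(attempts):
        try:
            await asyncio.wait_for(send_refresh(), timeout)
        except asyncio.TimeoutError:
            pass  # some devices revive without ever answering the refresh
        try:
            return await asyncio.wait_for(get_status(), timeout)
        except asyncio.TimeoutError:
            continue  # no status yet, so go around again
    raise ConnectionError("device did not respond after refresh")
```

The key point is that a timeout on the refresh itself is not treated as an error; only running out of attempts without any status is.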

@sibowler

sibowler commented Jan 4, 2022

Hi @Elendilon - happy to help test once you've merged this change in with the most recent changes. I've been working through "awakening" zombie devices, but your solution seems to be a lot more elegant than I've been able to figure out yet.

@LogSpider

LogSpider commented Jan 16, 2022

@Elendilon thank you, you saved my day. I'm new to localtuya and got it working yesterday with some LED strips and ceiling lights. I was happy that everything worked: everything was blocked in my isolated VLAN with pfSense, and I also saw that all devices were displayed as "offline" in the Tuya IoT dev console...

...until my wife switched off the power...

I searched for hours, enabled and disabled my pfSense firewall rules, no chance. After a power cycle nothing worked anymore.

EDIT: With your branch Lampux-RGBceilinglight is working after power cycle.
Thank you again.

Tested devices working with your fix (not the Lumary strips):
Lampux-RGBceilinglight

Tested Devices that stay unavailable with your fix:
Lumary rgb tunable white led strip
Lumary RGB Tunable White LED Strip A2

@Pirateguybrush

Is this likely to be integrated soon?

If not - I have localtuya installed via HACS. How would I install this fix, and would it interfere with HACS ability to update/manage localtuya?

@Idan37S

Idan37S commented Feb 8, 2022

I'm really waiting for this to be merged as well,
Right now it's working great for me but I always have to take the changes manually after every update.

leeyuentuen pushed a commit to leeyuentuen/localtuya that referenced this pull request Feb 13, 2022
leeyuentuen pushed a commit to leeyuentuen/localtuya that referenced this pull request Feb 13, 2022
leeyuentuen pushed a commit to leeyuentuen/localtuya that referenced this pull request Feb 14, 2022
leeyuentuen added a commit to leeyuentuen/localtuya that referenced this pull request Feb 14, 2022
Merge original request rospogrigio#491 from upstream branch
leeyuentuen pushed a commit to leeyuentuen/localtuya that referenced this pull request Feb 14, 2022
@manisar2

Confirming that with this PR, my Globe bulbs have started working (the unavailability problem after power restart is gone). Thanks!

joshuaspence added a commit to joshuaspence/home-assistant-config that referenced this pull request Feb 24, 2022
@Pirateguybrush

I copied the files over my existing localtuya files, and all my lights were unavailable after a reboot (before rebooting, they were working). Rebooted twice for good measure. I have two types of lights, what information would be useful to help troubleshoot this?

Rolled back using the HACS redownload option, and they were working again.

@codingcatgirl

Sooooo, how's progress on this? It would be nice to finally see this merged.

@lloydw

lloydw commented Apr 26, 2022

> I copied the files over my existing localtuya files, and all my lights were unavailable after a reboot (before rebooting, they were working). Rebooted twice for good measure. I have two types of lights, what information would be useful to help troubleshoot this?
>
> Rolled back using the HACS redownload option, and they were working again.

@Pirateguybrush this branch is getting pretty old; it only works with Home Assistant 2021.12 and earlier. The latest master of rospogrigio:master needs to be merged into this branch for it to work with newer versions.

Any chance of merging in master @Elendilon ?

@codingcatgirl

The last LocalTuya update looked like it addressed a similar issue. After removing my LocalTuya devices and setting them up anew, all issues are gone for me. If you're waiting for this PR to be merged, maybe check whether that works for you too.

@lloydw

lloydw commented May 7, 2022

Thanks for the tip @codingcatgirl .

Grabbing the latest did not fix my lights initially, however I have ported the additional relevant changes in a new PR here: #817
