
Beta 0.114 All OZW devices report unavailable until used #38658

Closed
VinceRMP opened this issue Aug 8, 2020 · 42 comments

Comments

@VinceRMP commented Aug 8, 2020

The problem

When I reboot Home Assistant, all Z-Wave devices are unavailable and unusable until I manually use the device or, in the case of a sensor, until it reports some information.
So for a switch I have to manually turn the device on or off to make it available again.
Sensors become available again when they send an update.

When I force a reboot of the (external) OZW container, everything is available once OZW has finished loading. I noticed this on the dev builds too, but since those are development builds, that can happen. I tried several dev builds again to see when it stopped working for me: the first broken build is 0.114.0-dev20200724, so 0.114.0-dev20200723 still works as expected. My production system (Home Assistant OS), which runs 0.113.3, kept working without any problem and keeps receiving all the information.
After updating it to the latest beta, the same problem occurred there as on my test system (Home Assistant Container).

I have around 81 Z-Wave devices in my network.

Environment

arch x86_64
chassis vm
dev false
docker true
docker_version 19.03.5
hassio true
host_os HassOS 3.13
installation_type Home Assistant OS
os_name Linux
os_version 4.19.115
python_version 3.8.3
supervisor 229
version 0.114.0b2
virtualenv false
  • Home Assistant Core release with the issue: 0.114.0-dev20200724 and 0.114.0b2
  • Last working Home Assistant Core release (if known): 0.113.3 and 0.114.0-dev20200723
  • Operating environment (OS/Container/Supervised/Core): Home Assistant OS and Container
  • Integration causing this issue: OZW
  • Link to integration documentation on our website: https://www.home-assistant.io/integrations/ozw

Problem-relevant configuration.yaml

Traceback/Error logs

No errors reported

Additional information

@MartinHjelmare (Member)

This sounds like symptoms of the cache invalidation that was part of the zwave add-on version 0.5.2.

@probot-home-assistant

ozw documentation
ozw source
(message by IssueLinks)

@VinceRMP (Author) commented Aug 8, 2020

The OZW cache file was rebuilt some time ago, if that is what you mean?
I use a separate Docker container for the OZW daemon (the all-in-one version, build 170), so I don't think that should be the issue.
When I downgraded the dev build to 0.114.0-dev20200723 it worked as expected.
Also, since I'm running a production and a test system, I could run the different builds side by side.
The build before 0.114.0-dev20200724 worked, but every build starting from that one showed the problem.
Those builds used the same MQTT server and OZW daemon, at the same time.

@VinceRMP (Author) commented Aug 8, 2020

Maybe it is MQTT related; I'm seeing a similar issue here:
#38661
And if that is the case, maybe this PR is related: #37890
That date matches when my problem first occurred in the dev builds.

@VinceRMP (Author)

I did some more testing, and when I reconfigure MQTT from the integrations menu, all devices are immediately available again. So this is not ozw related but MQTT related.

@probot-home-assistant

mqtt documentation
mqtt source
(message by IssueLinks)

@probot-home-assistant

Hey there @home-assistant/core, @emontnemery, mind taking a look at this issue as it's been labeled with an integration (mqtt) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)

@emontnemery (Contributor) commented Aug 10, 2020

maybe this PR is related: #37890

That PR can't be related; it only changed some tests.

What are the commit IDs for 0.114.0-dev20200724 and 0.114.0-dev20200723?

@VinceRMP (Author)

Those are the nightly dev builds, which I pulled from Docker Hub.
I don't know where I can get the commit IDs for those builds.

@MartinHjelmare (Member) commented Aug 10, 2020

There don't seem to be any changes to the mqtt integration in that date range.

Would you be able to capture some log info during a start of Home Assistant with both the "bad" and the "good" build?

logger:
  default: info
  logs:
    homeassistant.core: debug
    homeassistant.components.mqtt: debug
    homeassistant.components.ozw: debug

If you can capture the docker ozwdaemon logs, that could also help.

@emontnemery (Contributor)

Note: I edited @MartinHjelmare's comment to request logs from both the good and the bad version.

@VinceRMP (Author)

I attached both log files. I used 0.113.3 and 0.114b4; hopefully that is OK too.
home-assistant logs.zip

OZW logs can be done, but since I'm running two installations, that would take some more work.
For a clean log file I would have to stop my production system, and I can't do that right now.

@emontnemery (Contributor) commented Aug 10, 2020

Assuming it's a problem for all ozw entities, here's an example.
Working log: the MQTT message for sensor.ozw_multisensor_zolder_unknown_3 is received, and the state is first initialized as unavailable and then updated.
Note that it takes 30 seconds before the state is updated, though.

2020-08-10 18:05:35 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on OpenZWave/1/node/2/instance/1/commandclass/49/value/36310272034226194/ (retained): b'{\n    "Label": "Unknown",\n    "Value": 31.5,\n    "Units": "",\n    "ValueSet": false,\n    "ValuePolled": false,\n    "ChangeVerified": false,\n    "Min": 0,\n    "Max": 0,\n    "Type": "Decimal",\n    "Instance": 1,\n    "CommandClass": "COMMAND_CLASS_SENSOR_MULTILEVEL",\n    "Index": 129,\n    "Node": 2,\n    "Genre": "User",\n    "Help": "",\n    "ValueIDKey": 36310272034226194,\n    "ReadOnly": true,\n    "WriteOnly": false,\n    "Event": "valueAdded",\n    "TimeStamp": 1597064390\n}'
2020-08-10 18:05:35 DEBUG (MainThread) [homeassistant.components.ozw] [VALUE ADDED] node_id: 2 - label: Unknown - value: 31.5 - value_id: 36310272034226194 - CC: CommandClass.SENSOR_MULTILEVEL
2020-08-10 18:05:35 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=sensor.ozw_multisensor_zolder_unknown_3, old_state=None, new_state=<state sensor.ozw_multisensor_zolder_unknown_3=unavailable; unit_of_measurement=, friendly_name=OZW Multisensor Zolder: Unknown @ 2020-08-10T18:05:35.450914+02:00>>
2020-08-10 18:06:05 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=sensor.ozw_multisensor_zolder_unknown_3, old_state=<state sensor.ozw_multisensor_zolder_unknown_3=unavailable; unit_of_measurement=, friendly_name=OZW Multisensor Zolder: Unknown @ 2020-08-10T18:05:35.450914+02:00>, new_state=<state sensor.ozw_multisensor_zolder_unknown_3=31.5; node_id=2, unit_of_measurement=, friendly_name=OZW Multisensor Zolder: Unknown @ 2020-08-10T18:06:05.432410+02:00>>

Failing log: the MQTT message for sensor.ozw_multisensor_zolder_unknown_3 is received, but the state is never updated:

2020-08-10 18:08:03 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on OpenZWave/1/node/2/instance/1/commandclass/49/value/36310272034226194/ (retained): b'{\n    "Label": "Unknown",\n    "Value": 31.5,\n    "Units": "",\n    "ValueSet": false,\n    "ValuePolled": false,\n    "ChangeVerified": false,\n    "Min": 0,\n    "Max": 0,\n    "Type": "Decimal",\n    "Instance": 1,\n    "CommandClass": "COMMAND_CLASS_SENSOR_MULTILEVEL",\n    "Index": 129,\n    "Node": 2,\n    "Genre": "User",\n    "Help": "",\n    "ValueIDKey": 36310272034226194,\n    "ReadOnly": true,\n    "WriteOnly": false,\n    "Event": "valueAdded",\n    "TimeStamp": 1597064390\n}'
2020-08-10 18:08:03 DEBUG (MainThread) [homeassistant.components.ozw] [VALUE ADDED] node_id: 2 - label: Unknown - value: 31.5 - value_id: 36310272034226194 - CC: CommandClass.SENSOR_MULTILEVEL
2020-08-10 18:08:03 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=sensor.ozw_multisensor_zolder_unknown_3, old_state=None, new_state=<state sensor.ozw_multisensor_zolder_unknown_3=unavailable; unit_of_measurement=, friendly_name=OZW Multisensor Zolder: Unknown @ 2020-08-10T18:08:03.339592+02:00>>

@emontnemery (Contributor) commented Aug 10, 2020

@VinceRMP In the "bad" case log, it seems not all OZW devices are stuck as unavailable.
For example, switch.basic_wall_controller_instance_1_switch_2 and sensor.ozw_dimmer_zolder_instance_1_power both seem to get their state correctly. Can you confirm this is the case, and give examples of entities stuck as unavailable and of entities working normally?

@VinceRMP (Author) commented Aug 10, 2020

It happens to all OZW devices unless they receive an update (in the case of a sensor) or I manually switch the device on or off. When I reboot HA, some of the devices return as available, but that is because they are triggered, for example the motion sensors. It also seems an automation is able to activate a switch while I can't do it from the UI; I haven't checked this fully, but I noticed it earlier today.
What also causes devices to become available is my production system, which turns devices on and off through automations. What I do at the moment to restore everything instantly is a quick change in the MQTT settings to trigger a refresh/discovery (I change the birth message QoS setting from 0 to 1, and back at the next reboot; this also applies to my prod system, which runs 0.114b4).

@VinceRMP (Author)

The ones where I notice it does not work are, for example:
sensor.humidity_woonkamer
This one is shown on my floor plan, so it alerts me when it fails.

Most switches also fail, for example:
switch.synology
switch.meterkast

These devices are always turned on, so they never change status.

@emontnemery (Contributor) commented Aug 10, 2020

For switch.synology (working case):
The MQTT message for switch.synology is received, and the state is first initialized as unavailable and then updated.
Note that it takes 30 seconds before the state is updated, the same as in the previous example.

2020-08-10 18:05:36 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on OpenZWave/1/node/13/instance/1/commandclass/37/value/222904336/ (retained): b'{\n    "Label": "Switch",\n    "Value": true,\n    "Units": "",\n    "ValueSet": true,\n    "ValuePolled": false,\n    "ChangeVerified": false,\n    "Min": 0,\n    "Max": 0,\n    "Type": "Bool",\n    "Instance": 1,\n    "CommandClass": "COMMAND_CLASS_SWITCH_BINARY",\n    "Index": 0,\n    "Node": 13,\n    "Genre": "User",\n    "Help": "Turn On/Off Device",\n    "ValueIDKey": 222904336,\n    "ReadOnly": false,\n    "WriteOnly": false,\n    "Event": "valueChanged",\n    "TimeStamp": 1597064664\n}'
2020-08-10 18:05:36 DEBUG (MainThread) [homeassistant.components.ozw] [VALUE ADDED] node_id: 13 - label: Switch - value: True - value_id: 222904336 - CC: CommandClass.SWITCH_BINARY
2020-08-10 18:05:36 DEBUG (MainThread) [homeassistant.components.ozw.entity] Adding Node_id=13 Generic_command_class=16, Specific_command_class=0, Command_class=CommandClass.SWITCH_BINARY, Index=ValueIndex.ALARM_TYPE, Value type=ValueType.BOOL, Genre=ValueGenre.USER as switch
2020-08-10 18:05:36 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=switch.synology, old_state=None, new_state=<state switch.synology=unavailable; friendly_name=Synology, icon=mdi:nas @ 2020-08-10T18:05:36.267241+02:00>>
2020-08-10 18:06:05 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=switch.synology, old_state=<state switch.synology=unavailable; friendly_name=Synology, icon=mdi:nas @ 2020-08-10T18:05:36.267241+02:00>, new_state=<state switch.synology=on; node_id=13, friendly_name=Synology, icon=mdi:nas @ 2020-08-10T18:06:05.426897+02:00>>

For switch.synology (failing case):

2020-08-10 18:08:03 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on OpenZWave/1/node/13/instance/1/commandclass/37/value/222904336/ (retained): b'{\n    "Label": "Switch",\n    "Value": true,\n    "Units": "",\n    "ValueSet": true,\n    "ValuePolled": false,\n    "ChangeVerified": false,\n    "Min": 0,\n    "Max": 0,\n    "Type": "Bool",\n    "Instance": 1,\n    "CommandClass": "COMMAND_CLASS_SWITCH_BINARY",\n    "Index": 0,\n    "Node": 13,\n    "Genre": "User",\n    "Help": "Turn On/Off Device",\n    "ValueIDKey": 222904336,\n    "ReadOnly": false,\n    "WriteOnly": false,\n    "Event": "valueChanged",\n    "TimeStamp": 1597064664\n}'
2020-08-10 18:08:03 DEBUG (MainThread) [homeassistant.components.ozw] [VALUE ADDED] node_id: 13 - label: Switch - value: True - value_id: 222904336 - CC: CommandClass.SWITCH_BINARY
2020-08-10 18:08:03 DEBUG (MainThread) [homeassistant.components.ozw.entity] Adding Node_id=13 Generic_command_class=16, Specific_command_class=0, Command_class=CommandClass.SWITCH_BINARY, Index=ValueIndex.ALARM_TYPE, Value type=ValueType.BOOL, Genre=ValueGenre.USER as switch
2020-08-10 18:08:03 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=switch.synology, old_state=None, new_state=<state switch.synology=unavailable; friendly_name=Synology, icon=mdi:nas @ 2020-08-10T18:08:03.988276+02:00>>

@emontnemery (Contributor)

sensor.humidity_woonkamer seems OK in the "failing" log:

2020-08-10 18:08:10 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=sensor.humidity_woonkamer, old_state=<state sensor.humidity_woonkamer=unavailable; restored=True, supported_features=0, device_class=humidity, unit_of_measurement=%, friendly_name=Vochtigheid Woonkamer @ 2020-08-10T18:08:05.795414+02:00>, new_state=<state sensor.humidity_woonkamer=43.0; node_id=77, unit_of_measurement=%, friendly_name=Vochtigheid Woonkamer, device_class=humidity @ 2020-08-10T18:08:10.372409+02:00>>

@VinceRMP (Author)

Is the 30 seconds a problem?
After Home Assistant reports it has finished starting up, it always takes a while (10-20 seconds) before the Z-Wave devices are completely loaded. But since the 81 devices expose 441 entities, I did not see that as a problem. After that, everything is quick and stable.

Do you have an idea why the states are not updated on the latest releases? Could it be a timing issue, perhaps?
The devices work without a problem once activated. Only the first initialization after a reboot seems to fail or never finish.

@probot-home-assistant

ozw documentation
ozw source
(message by IssueLinks)

@emontnemery (Contributor)

@MartinHjelmare can you take a look again? From what I can understand, MQTT messages are received and acknowledged by the ozw integration.

@probot-home-assistant

Hey there @cgarwood, @marcelveldt, mind taking a look at this issue as it's been labeled with an integration (ozw) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)

@emontnemery (Contributor) commented Aug 10, 2020

PR #38005 matches the 30s delay until the state is updated (30s is the default poll interval).

@VinceRMP Maybe you could try reverting #38005 (just remove the should_poll function from homeassistant/components/ozw/entity.py)
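For context, the override being removed follows the standard Home Assistant push-entity pattern; a minimal sketch (the class name here is illustrative, not the actual ozw source):

from homeassistant.helpers.entity import Entity

class OzwPushEntity(Entity):
    """Illustrative push-based entity; state arrives via MQTT from ozwdaemon."""

    @property
    def should_poll(self) -> bool:
        # Returning False opts the entity out of the platform's periodic
        # update cycle, so Home Assistant never schedules async_update()
        # and relies entirely on pushed state.
        return False

With the property removed, the base Entity default (should_poll = True) applies again, so the entity is polled on the default scan interval, which would be what previously papered over a missed initial push.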

@VinceRMP (Author)

I just checked, and that seems to solve the problem. I did a few reboots, and OZW works every time after the reboot. I tried it on the latest 0.115 dev build. If you want, I can also try it on the 0.114b4 build?

@MartinHjelmare (Member)

I think the problem is that we create the entity too early, before the complete node state has been reported by the broker. The polling worked around that by scheduling a state update after 30 or so seconds. When we removed the polling, the issue surfaced.

@MartinHjelmare (Member) commented Aug 11, 2020

We've been looking at this but fail to reproduce it. The polling lead seems to be a false one, and the other ideas I had seem to be dead ends. Two different test environments work fine with the latest dev branch.

Can we get an mqtt dump where the problem exists?
https://www.home-assistant.io/docs/mqtt/service/#service-mqttdump

@VinceRMP (Author)

Yes, sure, no problem. Just with the default settings and directly after the reboot, I assume?

@MartinHjelmare (Member)

Yes, that sounds good. Increase the listening time to at least 30 seconds to capture all the messages.
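For reference, a service call along these lines should produce the dump (a sketch based on the linked service docs; the topic filter assumes ozwdaemon's default OpenZWave prefix):

service: mqtt.dump
data:
  topic: "OpenZWave/#"
  duration: 30

The captured messages are written to mqtt_dump.txt in the Home Assistant configuration directory.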

@VinceRMP (Author)

See the attached file; this was run on the latest nightly dev build.
But I must say, when I run the dump, all devices become available as soon as I press the button. Somehow everything is refreshed at that moment, it seems?

mqtt_dump.txt

(If this really only happens on my side, I can now automate the service call to work around the issue.)
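Such a workaround automation could look like the following (a sketch; the alias is made up, and the dump parameters mirror the service call above):

automation:
  - alias: "Workaround: refresh OZW state after startup"
    trigger:
      - platform: homeassistant
        event: start
    action:
      - service: mqtt.dump
        data:
          topic: "OpenZWave/#"
          duration: 30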

@MartinHjelmare (Member)

Can you test with the Supervisor zwave add-on for ozwdaemon instead of the standalone Docker container? It would be interesting to see if there's a difference in behavior.

@VinceRMP (Author) commented Aug 11, 2020

I could, but that would need some reconfiguration.
At the moment I use a separate MQTT container, and the OZW daemon also runs as a standalone solution.
I don't want to reconfigure my production system for this, since that is the only installation using the Supervisor.
My beta/dev system uses the Home Assistant Container installation.

So you want me to create a new system using Home Assistant OS, load my current configuration, change my separate MQTT and OZW daemon to the ones supplied by the Supervisor, and see if the same problem occurs?

@MartinHjelmare (Member)

Yes, if you can do that, it could help pinpoint the issue. I understand that's not very easily done. If you don't have a spare computer it might be too much work.

@VinceRMP (Author)

It's no problem; I'm glad I can help with the issue.
I'm running everything on a VMware server, so it's no big deal.
I have already set up the new Supervisor installation, and when I import my configuration the problem is there again. So that stays the same.
I have set up the MQTT part already, but to move the OZW installation I need to make some time: I have to move the USB stick to the new installation, and that creates some downtime.
I will see if I can do that this evening, or else tomorrow morning.

@VinceRMP (Author) commented Aug 11, 2020

If it still gives problems, do you want some log files?
If so, which ones would you need: the same as I supplied earlier, or different ones, or with different settings?

@marcelveldt (Member)

Thanks for your assistance with this. We were not able to reproduce the issue in our environments, so it would be great if you can help us out with testing. We want to make sure there isn't some global issue introduced by recent changes.

@marcelveldt (Member)

For now, the test with the OZW daemon add-on will suffice. After that we can do some more debugging with logs.

@VinceRMP (Author)

No problem. I'm a happy HA user, and since this is the only way I can contribute, I'm happy to do so.
I have just moved my stick to the new system, and it's generating the cache file. I hoped all the devices would be recognised in HA as the same entities, and they are, which makes testing a lot easier. Is it necessary to have all the battery-powered devices activated/available as well?

@VinceRMP (Author)

OK, so I have some good news, I guess. I can't reproduce it either when using the requested configuration. Just to make sure, I activated most of the battery devices; I'm only missing 7 that aren't easy to reach. So it doesn't seem to be a global issue.

Still not sure what the problem on my side is, though, and whether other people will have problems when using external Docker applications.

@MartinHjelmare (Member)

Big thanks for taking the time to test so thoroughly!

We can look at the changes in OpenZWave and ozwdaemon between the versions used in the add-on and the Docker image you're running. Maybe there are clues there?

@VinceRMP (Author) commented Aug 11, 2020

No problem, glad I can help.
I did some more testing: I switched my OZW container to the HA MQTT container, and that also works without a problem on the test system. After that I switched my production system to this HA MQTT container as well, which also works without a problem. So somehow something is not working with the separate MQTT server, it seems?
0.113.3 connects fine to this container, so I assume the OZW connection is not the problem.
The settings used for the MQTT connection from HA are the same for both MQTT containers.
The MQTT container I use, which does not seem to work anymore, uses default settings.
I only changed the username and password and disabled anonymous access.

Edit: I just realised that my last test with my production system may not be valid. Earlier today I made an automation which executes mqtt.dump, which forces the refresh needed to make it work. I could redo that test to make sure, but that won't be today.

@kpine (Contributor) commented Aug 11, 2020

If the mqtt.dump call is influencing the behavior, maybe try getting a topic dump using a third-party utility? mosquitto_sub can produce a similar dump.

mosquitto_sub -t OpenZWave/# -h broker -v -W 5
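Here -t subscribes to the whole OpenZWave topic tree, -h selects the broker host, -v prints each topic together with its payload, and -W disconnects after the given number of seconds; use -W 30 to match the 30-second capture window requested above.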

@VinceRMP (Author)

I checked the functionality again from my production system (0.114.0b4) against the HA MQTT test container connected to the separate OZW container. This does work with the mqtt dump automation disabled. So changing the MQTT server somehow triggers the problem.

Attached are two MQTT dumps from the mosquitto_sub command (30 seconds).
After an HA reboot, when not all devices are available:
mqtt_failing.log
After running the MQTT dump from HA to force all devices to become available:
mqtt_working.log
