Future of Z-Wave Integration #81

Closed
cgarwood opened this issue Sep 24, 2018 · 51 comments

@cgarwood
commented Sep 24, 2018

The Thinking Big blog post talks about better Z-Wave integration by replacing OpenZWave with the SiLabs public SDK.

I'm guessing this will take a while to build out. So my first question is: what do we want to do regarding improvements to the current Z-Wave implementation (converting to config entries, implementing the device registry, etc.)? Is it worth coding that into the current implementation, or should we leave it as-is and focus on building the new integration?

Outside of moving to config entries and the device registry, are there any major architecture changes that need to be addressed? Has anyone put thought into a migration process, or has it mostly been an idea that hasn't gotten a lot of attention yet?

Mostly just want to get the discussion started so we can have a solid foundation and idea where to focus development time and how to respond to any issues that crop up with the current implementation.

@balloob

Member

commented Sep 24, 2018

I think that the old Z-Wave implementation will stay around for a long time, because getting to feature parity will take a while. So yes, I think we should move forward with config entry and device registry.

Config entry could be as simple as trying to auto find the USB stick, test it and then configure it.
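As a rough illustration of the "auto find the USB stick, test it and then configure it" idea, here is a minimal sketch. Everything in it is an assumption made for illustration: the `PortInfo` structure, the `find_zwave_stick` helper, the made-up USB IDs and the `probe` callback are hypothetical and are not taken from Home Assistant or OpenZWave. In practice the port list could come from something like pyserial's `serial.tools.list_ports.comports()`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class PortInfo:
    """Minimal description of a serial port (hypothetical structure)."""
    device: str          # e.g. "/dev/ttyACM0"
    vid: Optional[int]   # USB vendor ID, if known
    pid: Optional[int]   # USB product ID, if known


def find_zwave_stick(
    ports: Iterable[PortInfo],
    known_ids: Set[Tuple[int, int]],
    probe: Callable[[str], bool],
) -> Optional[str]:
    """Return the first port whose USB IDs are known and that passes the probe."""
    for port in ports:
        if (port.vid, port.pid) in known_ids and probe(port.device):
            return port.device
    return None


# Example with made-up USB IDs and a fake probe that always succeeds.
ports = [
    PortInfo("/dev/ttyUSB0", 0x1111, 0x2222),
    PortInfo("/dev/ttyACM0", 0xAAAA, 0xBBBB),
]
found = find_zwave_stick(ports, {(0xAAAA, 0xBBBB)}, probe=lambda device: True)
print(found)
```

The `probe` step is where a real implementation would open the port and check that a Z-Wave controller answers before creating the config entry.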

Device info should be very very simple, as Open Z-Wave has all that data available. The nice thing is that a new Z-Wave implementation will be able to fill the slots of the same entities and devices, as the identifiers are provided by the Z-Wave network, not openzwave.

@DubhAd

commented Sep 24, 2018

This also potentially gives three integrations for Z-Wave:

  1. OZW (local)
  2. Z-Ware
  3. Z/IP

The last two could support multiple configurations, since they may be in other buildings. There have been a number of people who've wanted similar functionality.

@DubhAd

commented Oct 21, 2018

Given the amount of flak the current Z-Wave integration is getting right now (go browse #zwave if you haven't), is there any news?

@daringer

commented Nov 5, 2018

hey hey together,

@DubhAd (alias tinkerer) on Discord made me aware of this thread, maybe I can help here. I have a rather big network (70+ nodes), which means inclusion and maintenance, especially shifting e.g. 10 nodes to a new configuration, is really not fun. ozwcp helped "sometimes", but, well, nothing to add here 😄

Sooo, long story short: zwave-core

Some targets and status right now:

  • first public release to github yesterday, wanted to have it on a certain level first
  • REST API already exposes nearly the full ZWave stack
    (scenes missing, assoc groups adding/removing nodes, some other minor stuff)
  • a web frontend (ozwcp replacement) that uses the REST API and websocket-based push so it does not miss a single signal; overall it's "ok" and worth calling an ozwcp replacement, though with a pretty long list of details still to be done
  • next step would be MQTT: use the frontend to pick/combine whatever is needed and publish it; at this point at the latest this might get interesting for HA

Overall the latter is the designated goal for me, because I clearly see this abstraction as highly needed for networks above a certain size, not to mention physical separation of HA and Z-Wave. Bottom line, the architecture is also not bound specifically to python-openzwave, as I see openzwave as the next weak spot after this step.

Hope I'll find time to update the README today for some more details.
best

@balloob

Member

commented Nov 6, 2018

@DubhAd: Given the amount of flak the current Z-Wave integration is getting right now

I actually don't think that there is too much flak. I did some asking and the things missing were CentralScene, Barrier and sending null to clear a code on a lock.

@pvizeli is actually preparing a soft-fork of OZW for the next release, which will be the master branch of OZW + Barrier support. CentralScene cannot be easily cherry-picked and sending null was never merged into OZW because of other issues.

@daringer I applaud the effort but I think that your code is too much a WIP to be considered for HA. I think that having a standalone process manage Z-Wave would be great. It's something we also mention in my research doc. However, if we are going to go that direction, we really should not rely on OZW.

@daringer

commented Nov 6, 2018

Oops, sorry if I left the impression it's mature, absolutely not! It's more like version 0.1.
I did not intend to jump in and replace Z-Wave from one day to another.

The motivation on my side is that I actually have major issues maintaining my network in HA (e.g., try to change 4 config values on 10 nodes, this drove me crazy). Further, the whole configuration mechanism for Z-Wave nodes in HA is not really fun, not to mention the "blind" inclusion (which is really painful for battery devices). This is just my approach to make HA a little better for me.

On the backend side, sure, OZW is sub-par, but I haven't had the chance to form an opinion on the two newer ones...

@balloob

Member

commented Nov 6, 2018

Check my research doc for some alternatives.

If "blind" inclusion is a problem in HA, I would prefer we fix it in HA without requiring each user to build out their own external HA platform 😉 Same if our config panel is not up to par. I wonder if running Z-Wave with a virtual serial port that connects to another container would already solve your problems. Anyway, this is going off-topic and we should not continue this discussion here.

@riemers

commented Nov 30, 2018

The only pain I always feel is the startup time if you have 80+ nodes. If a node is dead it also doesn't always show that, and you have to look really carefully to see what the problem is. If the dead node cannot be excluded from your network, it will really screw up your speed and startup to the point that it is not workable.

But in the end, if it would start up and use the state that was saved, so it would just work fast straight away, it would be peachy in my eyes. Now I have to wait roughly 10 minutes before I can test an automation with Z-Wave devices, and that really takes its toll.

@lukas-hetzenecker

commented Dec 12, 2018

@balloob In your research doc you seem to mention that the "Z/IP gateway" is a binary blob, with no source code provided.
But are you sure about that? The "Z/IP Gateway SDK" is freely available and it does contain the source code of the gateway. The binary can be compiled from this source via cmake.
Granted, it isn't licensed under the GPL - but at least we have access to the source code?

@riemers

This comment was marked as off-topic.

commented Jan 2, 2019

IMHO, I don't think this thread is the right place for this discussion. This topic is about where we want to go with Z-Wave in the future. That said, you could look into https://community.home-assistant.io/t/best-way-to-create-dedicated-zwave-instance/67712, perhaps that might fit your need too.

@kdschlosser

commented Feb 17, 2019

I wanted to chime in on this. Though the Z-Wave specification is public (some of it is), the most important component is not: the serial communication with the Z-Stick. The folks over at OpenZWave have spent an unimaginable amount of time reverse engineering that whole protocol. This is not something I think anyone at Home Assistant would want to do (I know I wouldn't).

As far as the startup time is concerned, I think this is partly the fault of OpenZWave, partly the Z-Stick, and partly Home Assistant. I do know that on the OpenZWave end they are trying to get the network to load properly without having to scan the entire network. There also appears to be an issue with the most popular Z-Stick, the Aeotec Z-Stick Gen5. This stick has some kind of design flaw: when it sends out a command it waits for a response, and another command cannot be sent until that response is returned. This is extremely inefficient and, I would imagine, part of the speed issues. This is not the fault of the developers at OpenZWave, though they are trying to come up with some way to work around the issue.

Now, I also mentioned that part of the issue is Home Assistant. I could be wrong in my interpretation of what is taking place; if I am, please correct me.

When Home Assistant starts up it creates an entity that represents each device. Instead of grabbing device states when they are needed, everything gets lumped into a single call. With Z-Wave there is no way to get "everything" from a device in a single call; each variable on a node is a separate request. Z-Wave is not a very fast network, and if you combine that with a poor network layout that has bottlenecks, this is going to cause big issues. Not to mention that Home Assistant is now trying to update information for however many nodes there are, all at the same time. I have not specifically looked at the code for the Z-Wave portion of Home Assistant, but if it follows the coding layout of the rest of the components there are going to be Z-Wave network bottlenecks and huge latency when device states change.

I would imagine one of these two things is happening: either the information is being requested from each device every 10 seconds, or polling is set up on the network and every variable for every device is being polled for data changes. Either one by itself is very bad: excessive polling means network congestion, and polling caused by Home Assistant means network congestion. So now the eyebrow goes up and the question is how we get the needed data correctly.

To a degree we do both. Set up Z-Wave to poll for the important things, like the state of a light or the state of an alarm. Now here is the sneaky bit: the polling done by the network can be changed dynamically, on a per-variable basis. A device may have, say, 20 variables, and we are only interested in one of them. So instead of setting a global polling interval, there is the ability to poll only that single variable. If you have 20 devices and each device has 20 variables, a global poll would mean 400 requests and 400 responses, which seems a wee bit excessive if we only want to know a single variable from each device. I am thinking that 20 requests and 20 responses would be the better way to go. Would there be a need to poll the lights for their state if you have the alarm sensors armed? Typically an armed alarm means you are not there, and if no one is there, would the light states change? This also works vice versa: there are things on the network that will need to be polled based on time of day or the state of another device.
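The request counts above (400 versus 20) and the idea of switching polling off while the alarm is armed can be sketched as follows. This is only an illustration under stated assumptions: `PollPlan`, `on_alarm_armed` and the variable names are hypothetical and do not correspond to the python-openzwave API.

```python
from dataclasses import dataclass, field


@dataclass
class PollPlan:
    """Tracks which (node_id, variable) pairs are polled (hypothetical helper)."""
    polled: set = field(default_factory=set)

    def enable(self, node_id, variable):
        self.polled.add((node_id, variable))

    def disable(self, node_id, variable):
        self.polled.discard((node_id, variable))

    def requests_per_cycle(self):
        # One request (and one response) per polled variable.
        return len(self.polled)


NODES = range(1, 21)                         # 20 devices
VARIABLES = [f"var{i}" for i in range(20)]   # 20 variables per device

# Global polling: every variable on every device.
global_plan = PollPlan()
for node in NODES:
    for var in VARIABLES:
        global_plan.enable(node, var)

# Selective polling: only the one variable we care about per device.
selective_plan = PollPlan()
for node in NODES:
    selective_plan.enable(node, "var0")

print(global_plan.requests_per_cycle())     # 400
print(selective_plan.requests_per_cycle())  # 20


def on_alarm_armed(plan, light_nodes):
    """Stop polling light state while nobody is home (assumed automation hook)."""
    for node in light_nodes:
        plan.disable(node, "var0")
```

An automation could call `on_alarm_armed` when the alarm is armed and re-enable the same variables when it is disarmed.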

Also, instead of adding a whole gaggle of devices that are constantly polled by Home Assistant, there needs to be a single entity added for the network. That entity gets polled by Home Assistant and handles what does and does not actually need to be queried from the devices.

Here is something that gets done with media_player devices. Every 10 seconds we update the volume, the state, the mute, the source... even if the device is not opened. The volume indicator, mute indicator and source are only visible if the device is open, so why are we updating those things when they are not being used? There are methods and properties in place that simply return the stored value. How come these are not being used to get the state from the device when it is actually needed?

My point is there is a massive amount of network IO happening on the zwave network that does not need to take place.

The performance issues with the initial loading of Home Assistant and the Z-Wave nodes can also be fixed by creating a file with a list of the devices on the network. This is what gets used to initially populate Home Assistant. Then start the network in a separate thread and let Home Assistant go about its business. The devices will get updated as the variables are updated from the initial scan of the network.
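A minimal sketch of that startup idea, assuming a JSON cache file and a background thread. The file layout, `load_cached_nodes`, `start_network_scan` and the fake scan generator are all hypothetical; a real scan would be driven by the Z-Wave library rather than a generator.

```python
import json
import tempfile
import threading
from pathlib import Path


def load_cached_nodes(cache_file: Path) -> dict:
    """Return node_id -> info from the cache, or an empty dict if none exists."""
    if not cache_file.exists():
        return {}
    return {int(k): v for k, v in json.loads(cache_file.read_text()).items()}


def start_network_scan(nodes: dict, scan, done: threading.Event) -> threading.Thread:
    """Run `scan` in a background thread; it yields (node_id, info) updates."""
    def worker():
        for node_id, info in scan():
            nodes[node_id] = info   # refresh the cached entry with live data
        done.set()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# Example: populate from the cache immediately, then let the scan catch up.
cache = Path(tempfile.mkdtemp()) / "zwave_nodes.json"
cache.write_text(json.dumps({"2": {"name": "Hall light", "state": "unknown"}}))
nodes = load_cached_nodes(cache)    # entities could be created from this right away


def fake_scan():
    # Stand-in for the real network scan (assumption).
    yield 2, {"name": "Hall light", "state": "on"}


done = threading.Event()
start_network_scan(nodes, fake_scan, done)
done.wait(timeout=5)
print(nodes[2]["state"])
```

The point of the design is that the main program never waits on the scan; it only sees entries change as live values arrive.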

I also want you to be able to trust what I am saying about how the Z-Wave network operates. I am the guy who wrote the portion of python_openzwave that made it possible for Home Assistant to have Z-Wave available for Windows users (it wasn't the easiest thing to get working).

I have also submitted this PR: OpenZWave/python-openzwave#134. It is going to make the library easy to use as well as improve performance. It allows for a proper representation of a node on the network, instead of having to make all kinds of calls to the device to find out what it is capable of doing.

I know python_openzwave very well. I also have a good handle on libopenzwave. What I do not know is Home Assistant. If someone who knows Home Assistant inside and out is willing to work on the Z-Wave portion to get it performing better, I would be more than willing to share any information I have.

If you are running Windows I can also show you an implementation I wrote which is rocket fast: <1 second state updates from a device (not Z-Wave Plus) and <1 second to change the state of a device, on a well-populated network. I have close to 70 devices on my network and it does not take that long to start up.

@riemers

commented Feb 17, 2019

I appreciate the time and effort you put into looking into all of this. The points made give me a warm feeling 👍 Let's hope @balloob can help out or give someone a nudge to see if this can be a good way forward for improving Z-Wave. I once moved from Fibaro Home Center to HA, and this is the only thing that was better in HC (although we are getting closer and closer).

p.s. what does "does not take that long to start up" mean? 10 seconds, 20?

@kdschlosser

commented Feb 17, 2019

3-4 seconds. if that.

@riemers

commented Feb 17, 2019

If people who can truly help with this issue need hardware they don't have, I'll sponsor it if needed (assuming Z-Wave sticks and such). Just trying to chip in, assuming that's OK.

@kdschlosser

commented Feb 17, 2019

The reason I got involved in python-openzwave (the backend that hass uses) was that hass offered support for Z-Wave, but only on *nix OSes and not on Windows. Upon investigating I found that it would not compile on Windows, so I set out on a mission to fix the thing. My knowledge of Python was more limited than it is now, so it took me about a year to get it to compile properly. I had to fix the two Python packaging libraries, distutils and setuptools; neither of them would support the whole plethora of possible Windows C compilers and SDKs. Now, while the current openzwave library that hass uses does work, it is extremely limited, and it will only work if you have Visual Studio 2017 installed.

Me being the kind of person I am, I could not leave it at that. Not everyone is going to have VS2017 installed, and VS2017 is a fat cow, weighing in at around 20 gigs of disk space. I wanted to offer more. I could have left it and given those specific requirements, but I felt that was not the right thing to do. The most current version of the code sets up an exact replica of the build environment that Visual Studio does. It will detect if you have any of the following compilers installed: VS 2008 to current, Visual C 2008 to current, and also Visual C Build Tools and Visual Studio Build Tools. The last two are everything needed to compile C code without the bulk of having an IDE installed.
It will detect all versions of .NET, NETFX and all of the Windows SDKs, HTML Help, FSharp and others...

I moved away from using MSBuild to compile libopenzwave, so there is no more converting Visual Studio solutions (which could have led to a whole assortment of problems). In a nutshell, the libopenzwave extension is no longer built by Visual Studio; it is now built using distutils, internal to Python. Some of the benefits of using distutils are more control over the compiler options (being able to hide warnings), the biggest being multi-threaded compiling of libopenzwave. So now, instead of having to wait 10-15 minutes, the whole install process from start to finish takes less than 2 minutes, and all MS compilers from 2008 until now are supported.

@kdschlosser

commented Feb 17, 2019

I did a PR last night for Home Assistant that adds detection of the serial port the Z-Stick is connected to, so there is no need to go and check. This is important because the Windows formatting of a COM port is not what everyone thinks it is. It is not COM10 or COM22; it is actually a virtual namespace path, very similar to what *nix OSes do with a file on the HDD, except Windows does it in memory (database).
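To make the COM port point concrete: Win32 exposes serial ports through the device namespace prefix `\\.\`, and ports numbered above 9 have to be opened with that prefix. The helper below is a small sketch of normalizing a name; `to_win32_device_path` is a hypothetical function written for illustration and is not the code from the PR mentioned above.

```python
import re


def to_win32_device_path(port: str) -> str:
    """Prefix a bare COM port name with the Win32 device namespace."""
    if port.startswith("\\\\.\\"):
        return port                       # already a device path
    if re.fullmatch(r"COM\d+", port, flags=re.IGNORECASE):
        return "\\\\.\\" + port.upper()
    raise ValueError(f"not a COM port name: {port!r}")


print(to_win32_device_path("COM10"))
```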

I was glancing at the code for the zwave component and it is hard to follow what is happening. I am going to have to add a function logger to see what the data path is. This will also tell us where specifically in the code the bottleneck is.

@johnflorin

commented Feb 18, 2019

Fascinating discussion here!

I'd just like to add that at least for me usability would be vastly improved if some kind of wizard could be set up for adding/removing/resetting ZWave devices, maybe something that could just read the OZW log and present it in a readable way, telling you if something succeeded or failed.

I say this because it would surely be easier to implement than a full stack change, and at the moment, if things don't go absolutely perfectly with those 3 ZWave actions, you are left to go through a gigantic OZW log, trying to figure out what happened.

This is immensely frustrating, by far the worst thing in HA (for me)...Vera, for example, has a reasonable wizard. I left Vera after the magic of HA showed me that it was not always seeing state changes in my door/window sensors, but if we could somehow have what we have now but with a more informative UI, it would be exceptional.

@riemers

commented Feb 18, 2019

I would first go for improving the Z-Wave stack. Although what you mention is true, once configured I have no problems anymore, whereas the startup speed when trying out new automations or new setups keeps being a pain. But there is no reason not to do both if enough volunteers can work on it, since the frontend and the Z-Wave stack can be worked on separately up to a point.

@kdschlosser

commented Feb 18, 2019

@johnflorin

I too came from the Vera. The reason I left was the same issues that Home Assistant is plagued with: slow boot times, slow device updates, things of that nature. I searched and searched and found no standalone controller that was "up to snuff". The standalone controllers I found all tried to be something they are not: they are Z-Wave controllers, but they all tried to be home automation controllers. I initially thought this simple thing made the software on them bloated and slow. Then I started looking at software for the PC, and found that the Z-Wave implementations that run on a PC suffered from the same latency problems. That's when I started really digging into the Z-Wave protocol. I wanted to know why there was this large latency problem. Several factors can cause latency, but the one I have found to be the cause is a poor design of how the information is grabbed from the individual devices.

This is my interpretation of the Z-Wave protocol's transmit/receive design.
Z-Wave is a "mesh" network. Any device that has a hard-wired power source is also a repeater.
So issue #1: a network with mostly battery-powered devices is not going to work properly.
Issue #2: devices are pretty dumb, to say the least. They only have the ability to do a single thing at a time. So if a light switch is being used via the manual rocker, it will stop sending and receiving network packets. There is no "queue" for the packets; the device essentially acts like it has been disconnected from the network. The same thing takes place while it is sending a packet. Normally this would not be too much of a problem, because in a mesh network a single device will have a handful of "neighbor" devices, so if the first doesn't respond it moves on to the next. Once all friends have been tried and the information cannot be sent or repeated, the device simply gives up, and the data never makes it where it is supposed to go. This is going to appear to the user as latency. Because of this design limitation, the problem is going to happen on a network that is too congested. So polling devices for all of their stats all the time, even when the device is not in use, is going to cause an enormous amount of network traffic, which leads to poor response times from devices.

What I have found in almost all implementations is that the software implements no kind of polling management. They either leave it to the user to set up, or they simply ask for everything every single time. If it is left to the user, the user has no clue about the problems that can occur. And the latter is a problem once you get, say, more than 10 devices on the network.

I wrote a program that creates an image of the network connections. It shows all of the "friends" a node has; the more friends a node has, the better. This is a great tool for diagnosing a bad network design, bad in terms of the physical layout of the devices. I built the ability to show this graph into my implementation.

[Image: node_layout]

I also educated the hell out of the users, the same as is being done here. I gave the users the ability to set what to poll and when to poll. I designed it so that the user could set the polling for each variable on each device dynamically, so that through automations the polling could be adjusted when necessary. Like I described above, there is no need to poll for light state changes when the alarm sensors are armed. Network management is the key to offering great Z-Wave control, but in order to do that, tools and education need to be available to the users. Let's face it: as a user I want to be able to control everything my device is able to do. "If you can't hack it then you don't own it". But if I do not know what all of the adjustments and settings are, I can very easily make a mess of things. So we do need to provide some kind of management. The software should have a threshold for monitoring polling and throw up a warning if that threshold gets broken.

Z-Wave Plus is a step in the right direction, but still not the solution. If you have 80 Z-Wave Plus switches and you issue an "all on" command, what is the network traffic going to look like when all of these devices try to report their state at the same time? We are back to square one again. Now what really burns my biscuit is this: there are SoCs that cost about $0.50 USD and have more than enough power to perform the tasks needed, and they do not get used in a $50.00 Z-Wave light switch. The ESP8266 chip is an example of this. Instead they use chips that are not capable of doing more than a single thing at a time.
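The suggestion of a polling threshold with a warning could look roughly like the sketch below. The function names and the limit of 5 requests per second are assumptions chosen only for illustration; the thread does not state what a sensible limit for a real Z-Wave network would be.

```python
import logging

_LOGGER = logging.getLogger("zwave.polling")


def polling_load(polled_values: int, interval_s: float) -> float:
    """Requests per second generated by polling `polled_values` every interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return polled_values / interval_s


def check_polling_load(polled_values: int, interval_s: float,
                       max_requests_per_s: float) -> bool:
    """Return True if the load is within the limit, otherwise log a warning."""
    load = polling_load(polled_values, interval_s)
    if load > max_requests_per_s:
        _LOGGER.warning(
            "Polling %d values every %.0fs is %.1f requests/s (limit %.1f)",
            polled_values, interval_s, load, max_requests_per_s,
        )
        return False
    return True


# 400 polled values every 10 s versus 20 polled values every 10 s,
# checked against an assumed limit of 5 requests per second.
print(check_polling_load(400, 10, 5.0))
print(check_polling_load(20, 10, 5.0))
```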

While some of the problem is protocol design, the other side of it is poor device design. Add poor software design to that and you end up with a Z-Wave control system that functions like all the rest of them.

Now, we have no control over the first two design problems. What we do have control over is the third, and if the software is designed properly we can make the other two issues partially transparent to the user.

Make recommendations for devices. Have a page dedicated to this, or some kind of easily accessible forum where users can post about devices. Some devices perform better than others. Most people who look at PC-based solutions for Z-Wave control already have a standalone control system they do not like. I see this all over the place: the de facto Z-Stick being listed is the Aeotec Gen5 Z-Stick, which is the wrong answer. I posted earlier about the problems with it, and that is a problem the manufacturer has no intention of fixing either. From my understanding the zwave.me Z-Stick does not have this issue. I have not looked at its full list of specs yet, and I have no personal experience with it, but from my reading this is the Z-Stick that should be used. This is the kind of information that should be made available to the user.

Here is another example of a device suggestion. These are the only devices that allow this to be done.
If you get GE/Jasco dimmer switches for your media room, you have the ability to set the ramping rates. Ramping rates are how fast the light dims up and down. Most dimmers do offer this feature, but the GE/Jasco ones are the only ones that offer two sets: one for manual rocker use, and a second for a command received over the network. So if you wanted to give a full movie theatre experience and dim the lights slowly when the movie starts, you can do that with these dimmers without affecting the dimming speed when manually using the rocker. Information like that is very important, and it should be easily available to the user.

Here is another case in point. Free HA software? Yes please. Click download, click install, then go and use it. Nowhere in that process did you see me state that I went to read instructions. This is what happens; we all do it. So add links in the software, in their face, not where they would have to inadvertently stumble upon them. Through the config file they can have the ability to hide the links if they want, but that is then a conscious decision by the user. The link that appears in the integrations for zwave is a great place to put one; it should lead right to a forum section on setting up the Z-Wave network. There should be another when adding a device, pointing to a forum page for adding devices and device suggestions. Another would be in the properties page for a device, leading to a page where the user can read about the assortment of available settings, what they do, and how to properly set things up. HOVER-OVER popup descriptions!!!! I am a huge fan of these. They allow for a nice clean settings page but also provide detail when needed.

Adding Z-Wave should not be done partially. It needs to be done the correct way, and the time needs to be invested into making it work properly. The one I wrote took me a year; it was not something that happened overnight. The python_openzwave library is very raw. You cannot go from a UI directly to the library; there needs to be a layer in the middle that is able to manage the devices. That is why I came up with the command class structure and submitted it as a PR to python_openzwave. The other thing is that Python (CPython, because of its global interpreter lock) effectively only uses a single core of a processor per process. Case in point: you will never see Python use more than 100 / (number of cores) % CPU. Z-Wave needs to be run in its own process. Threads, threads, threads: because of the network design of Z-Wave, threading is your friend. The use of callbacks can be key as well. Pull the software Z-Wave polling out of the main polling loop and set up a separate process to handle it. The design of libopenzwave is that if you have network polling enabled for a variable, it always stores the most up-to-date value for that variable, and you get a dispatch telling you when it has updated. This should not be allowed to run in the same process as the main program.
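The "layer in the middle" with stored values and update dispatches might be sketched like this. It is a simplified, single-process illustration: `ValueCache` and its methods are hypothetical, and the separate process argued for above is left out; in that design `handle_update` would be fed from whatever channel connects the two processes.

```python
from collections import defaultdict


class ValueCache:
    """Stores the latest value per (node_id, variable) and notifies listeners."""

    def __init__(self):
        self._values = {}
        self._listeners = defaultdict(list)

    def subscribe(self, key, callback):
        """Register callback(key, value) for updates to one value."""
        self._listeners[key].append(callback)

    def handle_update(self, key, value):
        """Called by the Z-Wave layer when it reports a new value (assumed hook)."""
        self._values[key] = value
        for callback in self._listeners[key]:
            callback(key, value)

    def get(self, key, default=None):
        """Return the last stored value without any network traffic."""
        return self._values.get(key, default)


cache = ValueCache()
seen = []
cache.subscribe((5, "level"), lambda key, value: seen.append(value))

cache.handle_update((5, "level"), 99)   # pushed by the network layer
print(cache.get((5, "level")))          # read from memory, no polling
```

Reading state then costs nothing on the Z-Wave network, which is the behaviour the comment asks for from the UI-facing side.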

Tell me if I am right in this assumption: most people do not care about CPU use or how much memory the program is using if the trade-off is a fast HA program that works well. Case in point: look at the Vera. They offer an "upgraded" version of the hardware, but it still performs like crap. Is it worth paying the additional money for it? No. But people are spending the additional money because they want a better product. That means they would be willing to spend money to increase resources if the end result was a better-running HA solution. But you need to design a program that is capable of consuming the resources, and is not limited to a defined percentage of the CPU or to constantly releasing large data objects that need to be recreated over and over again because you want to free up memory. Doing that takes 3 times the processor ticks that would be needed to create the object and leave it be (garbage collection). Start up the Z-Wave side of things when the program starts, not after the core is loaded; by the time the core is done loading, Z-Wave would be ready. Home Assistant does not have a speedy startup routine, and adding the startup of Z-Wave so that it runs consecutively with everything else, as well as the core, is a wee bit bonkers.

Each Z-Wave device is just that: a separate device that needs to be interfaced with. The Z-Stick is merely a network gateway. So go ahead and bring up a graphical representation of your home network (255 nodes, 192.168.1 addressing). How long does it take to enumerate all of the devices? It's not fast, and most people only have maybe 10 devices. With Z-Wave networks it is very common to see 50... 80... 100 devices. The network has a bandwidth limit that is a very small fraction of a home network's, and it is slow, I mean real slow. Now, knowing what you know about the speed of enumerating LAN devices, apply that to the Z-Wave network. 10 devices = 3 seconds on a LAN (and that is being really, really generous), so 3 x 100 / 10 = 30 seconds. That is if the networks ran at the same speed. If we factor in speed and bandwidth we could easily be at 1 minute 30 seconds.
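The back-of-the-envelope estimate above can be written out as a tiny calculation. The `slowdown` factor of 3 is an assumption picked only to reproduce the "1 minute 30 seconds" figure; the comment does not say which factor was meant.

```python
def enumeration_seconds(devices, seconds_per_10_devices=3.0, slowdown=1.0):
    """Scale the LAN figure (3 s per 10 devices) to a given device count."""
    return seconds_per_10_devices * devices / 10 * slowdown


print(enumeration_seconds(100))               # 30.0 at LAN speed
print(enumeration_seconds(100, slowdown=3))   # 90.0 with an assumed 3x slowdown
```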

Design, design, design, design. I cannot stress it enough. The software can be made to properly address these road bumps, which will lead to a nice smooth road providing the best possible ride.

@zerox1212

commented Feb 18, 2019

I come from an industrial automation background, where time-sensitive automation of real-world devices is commonplace. I was a bit surprised by how bad Z-Wave is compared to the promises. I don't know if it makes sense, but maybe a network scheduler for Z-Wave would help. Working with X number of devices in a fixed frame with priorities might be an easy way to optimize communication, since we have no way to manage the network (neighbors).
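One possible reading of "a fixed frame with priorities" is a scheduler that releases at most a fixed number of commands per frame, most urgent first. The sketch below assumes that reading; `FrameScheduler`, the slot count and the priority numbers are hypothetical.

```python
import heapq
import itertools


class FrameScheduler:
    """Hands out at most `slots_per_frame` commands per frame, by priority."""

    def __init__(self, slots_per_frame):
        self.slots_per_frame = slots_per_frame
        self._heap = []
        self._counter = itertools.count()   # keeps FIFO order within a priority

    def submit(self, priority, node_id, command):
        """Queue a command; a lower priority number means more urgent."""
        heapq.heappush(self._heap, (priority, next(self._counter), node_id, command))

    def next_frame(self):
        """Return the commands to transmit in the next frame."""
        frame = []
        while self._heap and len(frame) < self.slots_per_frame:
            _, _, node_id, command = heapq.heappop(self._heap)
            frame.append((node_id, command))
        return frame


scheduler = FrameScheduler(slots_per_frame=2)
scheduler.submit(5, 10, "poll level")   # background polling
scheduler.submit(0, 3, "turn on")       # user action, most urgent
scheduler.submit(5, 11, "poll level")

print(scheduler.next_frame())   # [(3, 'turn on'), (10, 'poll level')]
print(scheduler.next_frame())   # [(11, 'poll level')]
```

User-triggered commands jump ahead of background polling, while the per-frame cap bounds how much traffic is put on the network at once.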

@kdschlosser

commented Feb 19, 2019

There is a single statement in the Z-Wave specification that made me realize how bad the protocol actually is, and that it was never going to live up to the statements made. It states:

write the control software in a manner that does not rely on a response from the nodes.

Now... that would mean that either the request never made it there or the response to the request never made it back. Either one is a horrible thing. That is why broadcast transmission is not used all that much on LANs, or if it is, a packet typically gets sent a slew of times with fingers crossed in the hope that at least one of them makes it there. That is the reason why it is typically only used to advertise the presence of a device on a network and not for any kind of control protocol.

I think that the zwave protocol is all built upon the broadcast scheme. I do not know what the ding dongs were thinking when they wrote the specification. They obviously never looked at the history of the broadcast packet: just about every protocol specification that tried to implement some kind of control or data passing using broadcast messages never ended up getting used at any kind of large scale. They died, essentially. The last one to try, I believe, was the xPL protocol used by devices like the Squeezebox, which died as well.

@kdschlosser

commented Feb 19, 2019

The point is that the whole sit-there-and-wait-for-a-response approach is not going to work. Each and every time a command is to be sent to a device, a new thread needs to be created. That thread can then sit there and wait for an answer, or wait for the state to get changed via a network polling loop. Otherwise the program ends up in an unresponsive state while this process is taking place.
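
A rough sketch of that idea, using a small worker pool instead of one brand-new thread per command. Everything here is hypothetical: `ReplyBoard`, `send_command` and the `transmit` callback stand in for whatever the real stack offers, and the sketch assumes only one outstanding command per node.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ReplyBoard:
    """Lets a worker wait for the reply to the command it sent to one node."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiting = {}  # node_id -> (event, one-slot list holding the reply)

    def expect(self, node_id):
        slot = (threading.Event(), [])
        with self._lock:
            self._waiting[node_id] = slot
        return slot

    def deliver(self, node_id, value):
        """Called by the (hypothetical) receive loop when a report arrives."""
        with self._lock:
            slot = self._waiting.pop(node_id, None)
        if slot is not None:
            slot[1].append(value)
            slot[0].set()


def send_command(pool, board, transmit, node_id, command, timeout_s=1.0):
    """Queue a command and return a Future, so the caller never blocks."""
    event, reply = board.expect(node_id)

    def worker():
        transmit(node_id, command)
        # Only this worker waits; None means no reply arrived in time.
        return reply[0] if event.wait(timeout_s) else None

    return pool.submit(worker)
```

The caller gets a `Future` back immediately and can attach a callback or check it later, so a node that never answers only ties up one worker until the timeout expires.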

@gerdesj

commented Feb 28, 2019

@kdschlosser - I think the reason for that "does not rely on a response from the nodes" is to deal with battery-only nodes. Zwave (et al.) has to do some odd-looking things because it has to deal with some very odd circumstances. This is really, really not Ethernet! Zwave has to deal with devices that sit idle for days on end and then suddenly connect, squirt a value and vanish within a very short time. Everything is designed with minimalism for these things.

Again, this is not Ethernet. Dump/read state to/from persistent storage and move on.

Have a long think about what these things can do. You can buy a battery powered "switch" (window/door open/shut) detector for a few pounds and expect the battery to last for years.

@zerox1212

commented Mar 1, 2019

@gerdesj I think his point is that z wave does not use a mesh network as an advantage, it uses it because without it network paths would constantly drop because nodes can only do one thing at a time. Of course this is a byproduct of the radio communication (basically nodes are half duplex communication).

This design is exacerbating issues with control software like OZW that only implements the raw protocol and lacks optimization. There are also some questions about the Aeotec Z-Stick Gen5 and if it is async or not. If the controller firmware is not async by design then most people running this stick will never get good network performance simply because a single device timing out stops the entire network.

@kdschlosser

commented Mar 1, 2019

@zerox1212 hit the nail right on the head.

the mesh network was used so they could use a low power radio. it was also designed so that if a network path was not achieved going a specific route it could take another one. They did design the protocol so that it could be used by devices that had low power processors in them. so that if a device was busy doing something it could seek out an alternate route. I am thinking that maybe the original design was for the network to work in a half duplex fashion. but not the whole device.

the design hangup is a bit complex. I think that what appears to be half duplex communications really is not. the whole device is what is half duplex so to speak. and the fault is in the firmware of the devices and the "processor" ability of the devices.

if you set up this test scenario you will then see what I am talking about.

Create a zwave network with 2 nodes, the first node being the controller and the second node being a Z-Wave Plus dimmer. We want to use Z-Wave Plus to keep polling from skewing the results, and having only a 2-node network removes skewing because of network congestion.
Change the ramp rate on the dimmer so it does a nice slow gradual increase or decrease of the light over, say, 10 or 15 seconds.
What we are trying to assess here is how many status updates the controller will receive from the device if you manually press the rocker on the switch. Because there is no polling, and because the dimmer is the only controllable device on the network, you should get a status update for every single level change (or you would think so, anyway). What actually happens is quite different. You may only get the final level. You may get 2-3 status changes. But you will not get a status change for every level the dimmer passes through.

So the question now turns into: why does the device not send the status changes? And the answer is poor design: an underpowered device that does not have the ability to do more than a single task.

Now armed with the knowledge of the crappy design. place this scenario into your bean...

the numbers used here are for example. they are not factual numbers. they are used to show the scope of the problem.
so say a node has 10 friends. and it is trying to send some data. and then say that node 1 thru 9 are busy doing tasks and only node 10 is free but it happens to be that node 10 is the last one it is going to try. the only way a node will know if a device is busy is when it tries to send a packet to it. and that send has to time out. so put a 1 second timeout on the packet send. so now you have 9 seconds that node is going to be busy trying to send a packet. and during those 9 seconds it is not going to receive any packets either because remember a node can only do a single thing at a time.
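
The arithmetic of that scenario, as a tiny sketch with the same made-up numbers. The worst case is the one described above; the average figure additionally assumes neighbors are tried in a uniformly random order, which is my simplification and not how Z-Wave routing actually picks a hop.

```python
def worst_case_blocked_s(busy_neighbors, timeout_s):
    """Every busy neighbor is tried, and times out, before the free one."""
    return busy_neighbors * timeout_s


def average_blocked_s(busy_neighbors, free_neighbors, timeout_s):
    """Expected wait when neighbors are tried in a uniformly random order.

    On average busy / (free + 1) busy neighbors come before the first free one.
    """
    return busy_neighbors / (free_neighbors + 1) * timeout_s


print(worst_case_blocked_s(9, 1.0))   # 9.0 seconds, the case described above
print(average_blocked_s(9, 1, 1.0))   # 4.5 seconds under the random-order assumption
```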

So throw in massive amounts of polling for status changes: because in the HA world no one wants to learn that a light was turned on or off a minute after it happened, the polling times get set to 1 or 2 seconds. Some dumb ass that was involved in creating the protocol thought that 60-second polling was a fantastic idea, that the application should just update the icon or status blindly when the user pressed the button to, say, turn a light on, and that when the program eventually got the status from the polling cycle and it had not changed, it should try to change it again. In the real world this is not how it gets done, nor should it be, to be honest with ya, because of this scenario: a light gets turned on via an application, then before the application gets the status update the light is manually turned off via the rocker. The status that should be returned is that it is off. But if the application is expecting to see an on status, it will think that the command never made it and send the command to turn the light on once again. This leads to very undesirable behavior. So a polling interval of 1 or 2 seconds is almost a MUST.

Could you imagine what a zwave network would run like if there were 250 nodes on it? It would be useless.

@kdschlosser

commented Mar 1, 2019

Now ZWave Plus is a step in the right direction. It is going to reduce the amount of time a device is busy doing something else. But it is still only a band-aid. The correct fix is to have the devices manufactured with the ability to do multiple tasks at the same time. This is going to be the absolute last thing they will change, if at all, because they want to keep the price tag on the devices low(ish), and we know that a 50 cent chip would cause the cost of the product to go up by 10.00 at a minimum. I am willing to bet that if a single company actually did this, created devices that could do more than a single thing at a time, and charged 10.00 more for the device than their competitors, in time their devices would be flying off the shelves. I know that I would replace the 75 or so light switches and dimmers in my house with that brand. I would rather spend 50 bucks on a great product than spend 40 on a piece of crap product. I think that most people would feel the same way.

I have not once seen anyone anywhere stating that "this zwave device is awesome, this is the product that you should purchase". Not for any brand. I do not think there is any current zwave device that a user would recommend as a must-have over another one. They all function the same... like crap... I do have to say that the Jasco dimmers have a feature that no other dimmer I am aware of has, and that is the ability to adjust the ramp rate for the rocker and for a zwave command separately. That would be the only reason I would recommend them over any other dimmer.

@ryanwinter

commented Mar 1, 2019

I would recommend the Leviton dimmer. Don't like the GE ones.

@kdschlosser

commented Mar 1, 2019

I like the GE/Jasco switches because of that split ramp rate.
The split ramp rates are nice for media rooms when you want the full movie theater effect but do not want to change the behavior of the rocker on the switches. I also use the split ramp rates for my bathrooms. after say midnight i change the rocker ramp to ramp up super slow and i leave it fast on zwave so if i need the lights on in a hurry i can use a touch pad to turn them on. no one likes getting blinded in the middle of the night when they have to use the bathroom. LOL.

And also because up is on and down is off, unlike the Leviton switches.

@kdschlosser

commented Mar 1, 2019

And to add to the reasons I am using the Jasco switches: they are sold at the local big box hardware store, and if I have a problem with one I can easily exchange it. I had one that was 2 years old and you had to press pretty firmly to get the thing to turn on. I took it back and they gave me a new one. The other thing I dislike about the Leviton switches is that you have to use wire nuts; they have pigtails coming out of the switch. My house was built during the Vietnam war, so metal was in short supply, and this was before the use of plastics for switch and outlet boxes, so my boxes are fiberglass. They can just fit a zwave switch in them, and only without wire nuts.

@kdschlosser

commented Mar 2, 2019

I am writing an animated 3D node graph that can be used to diagnose network congestion issues. I am not going to be able to determine the exact path a packet takes to get to a specific device; it will do a "best guess" for that portion. But it is going to show every variable for every device that is being polled. It is going to show all of the nodes and all of the connections to the "neighbors". The more neighbors that are available, the larger the node is going to be in size. The packets themselves will be animated, moving between the nodes along a route they would take to get to a device. My thought behind it is this: if you see a node that is small (low number of friends) and you see a whole mess of packets always going through that node to get to other devices, then this is a weak area in the network that is going to be prone to congestion issues, and adding devices or a range extender in the physical vicinity of that node would be a good idea to improve latency. It is a basic simulation to show the number of polling packets that have been set up to run, so at any given point in time you will be able to see how many packets are whipping around on the network. It's not going to be a real-time animation, but it will create a snapshot of what the network is set up to do at the point the animation is created. It will be a nice tool to help with performance. The program creates an HTML file or a JavaScript script that can be embedded into a web page (the HTML file can be loaded without being embedded). I do not know how to add something like this to HASS, so if there is someone that wants to take on that challenge, let me know and I can shoot over the code I have thus far.

@kdschlosser

commented Mar 3, 2019

Here is a video clip of the animation. This is a simulation of a network with 50 total nodes.
Which nodes are neighbors is set randomly, as is the number of neighbors a node has (with a max of 10).

There are 100 polling packets sent out from the controller. this animation shows the request packet as well as the response packet. so there are a total of 200 packets that are transmitted. there are many routes to and from the controller to a node. that route is chosen at random. You will see what appears like packets simply disappear.. and that is essentially what is taking place. the packet gets lost.

z_wave_animation.zip
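
For anyone who wants to play with the numbers without the animation, here is a rough sketch of the same kind of simulation. It is not the program from the comment above: the function names and thresholds are invented, and routes are picked as shortest paths by breadth-first search rather than at random.

```python
import random
from collections import Counter, deque


def build_random_mesh(node_count, max_neighbors, rng):
    """Random symmetric neighbor table (needs max_neighbors >= 2); node 1 is the controller."""
    neighbors = {n: set() for n in range(1, node_count + 1)}

    def link(a, b):
        if a != b and len(neighbors[a]) < max_neighbors and len(neighbors[b]) < max_neighbors:
            neighbors[a].add(b)
            neighbors[b].add(a)

    for n in range(2, node_count + 1):   # chain first so every node is reachable
        link(n - 1, n)
    for _ in range(node_count * 3):      # then sprinkle in random extra links
        link(rng.randint(1, node_count), rng.randint(1, node_count))
    return neighbors


def shortest_route(neighbors, source, target):
    """Breadth-first search; returns the node list from source to target."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in sorted(neighbors[node]):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    route, node = [], target
    while node is not None:
        route.append(node)
        node = previous[node]
    return route[::-1]


def relay_load(neighbors, controller, polled_nodes):
    """Packets relayed by each intermediate node (one request plus one response per poll)."""
    load = Counter()
    for target in polled_nodes:
        for hop in shortest_route(neighbors, controller, target)[1:-1]:
            load[hop] += 2
    return load


def weak_spots(neighbors, load, max_links=3, min_packets=20):
    """Nodes with few neighbors that still relay a lot of traffic."""
    return sorted(n for n, packets in load.items()
                  if len(neighbors[n]) <= max_links and packets >= min_packets)


mesh = build_random_mesh(50, 10, random.Random(0))
load = relay_load(mesh, controller=1, polled_nodes=range(2, 51))
print(weak_spots(mesh, load))
```

The `max_links` and `min_packets` cut-offs are arbitrary; the point is only that "small node, lots of traffic" can be computed from a neighbor table before anything is drawn.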

@kdschlosser

commented Mar 3, 2019

The program I wrote will work outside of a simulation, where data can be sent into it to be used instead of random number generation. The animation can be zoomed and paused, and you have turntable rotation and orbital rotation as well.

@riemers

commented Mar 4, 2019

So what's the next step? Wait for the PR on openzwave? Didn't we already have our own version on HA? If it does improve stuff we can just as well add that too (assuming it's all doable). Not much activity on the openzwave one yet. If it's as simple as loading another version/library and presto, then I don't mind testing.

@Fishwaldo

commented Mar 5, 2019

OZW author here. Going to make one comment because I feel things are misunderstood.

  1. Z-Wave is a low bandwidth protocol, 100 kbit/s at most, but if multiple hops are required its latency goes through the roof. At the RF layer it’s half duplex.

  2. On a non-encrypted network, a single command sent from software to a device generates 3 packets: the original message, an ACK that the ZWave stick sent the message, and an ACK that the node received it (but no indication of whether it’s a valid command etc.).
    On a secured S0 network, a single command generates a minimum of 7 packets.

In OZW we always do a GET after a SET, to see if the value was applied. So a simple On command will generate 7 packets on a non-encrypted network, and a minimum of 14 packets on a secured network.

Now, here is my main point: polling on a Z-Wave network is evil. Polling a single instance in a device will generate 4 packets on a non-secured network and 12 packets on a secured network. If you are polling for multiple values on a device, multiply the above by the number of values and then multiply by the rest of the devices you poll, and you can see that all of a sudden we have a LOT of traffic on a low bandwidth protocol. No amount of async work is going to speed that up, as the bottleneck is the network, not the devices etc.
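
To put numbers on that multiplication, a back-of-the-envelope sketch. The per-poll packet counts are the ones quoted above; the function name and the example network (20 devices, 3 values each, polled every 2 seconds) are made up.

```python
# Packets generated by polling one value once, as quoted in this comment.
PACKETS_PER_POLL = {"unsecured": 4, "s0": 12}


def polling_packets_per_minute(devices, values_per_device, interval_s, security="unsecured"):
    """Rough polling traffic: devices x values x packets per poll x polls per minute."""
    polls_per_minute = 60 / interval_s
    return devices * values_per_device * PACKETS_PER_POLL[security] * polls_per_minute


print(polling_packets_per_minute(20, 3, 2))                 # 7200.0
print(polling_packets_per_minute(20, 3, 2, security="s0"))  # 21600.0
```

Stretching the interval to 5 minutes for just 2 single-value devices, as in the setup described further down, brings the same estimate to well under 2 packets per minute.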

All too often I have had HASS users come to OZW complaining about latency. A few minutes looking over the logs and I see that they are aggressively polling the network, and they wonder why it takes two minutes to report status changes etc.

Solution: read the device manuals and UNDERSTAND how association groups are configured on that device. This is not anything new in ZWave Plus, (other than giving association group 1 a fancy name called Lifeline).

And when I say “understand” association groups, don’t just configure every group to report. More than likely you will get multiple reports.

And if you can configure a device to send say, a Multilevel report instead of a Basic SET, do it!

I have more than 120 devices on my network, at least a dozen or so might be dead (due to development etc.), and I have sub-second response times. I poll exactly 2 devices once every 5 minutes (temp/humidity sensors - more on this below).

The other big thing that affects network latency is the devices themselves. Quality control and interoperability from manufacturers has improved 10x over the last 3-4 years, but there are still some crappy devices out there. Bugs in firmware can result in timeouts (as the protocol has no way to signal that a device is acting on a command, only that a packet was received), and the specs have timing information specified as well. There are a few manufacturers that are hugely popular but have horrendous bugs, and a few that do a LOT of beta testing before release. Aeotec is rock solid and I have no idea where the negative feedback about their sticks comes from, as it just works without fuss. ZWave.me tries to lock functionality behind license keys (at least the last time I looked), and the Sigma/SiLabs sticks are minimal but work fine.

My final point is about update frequency and security. The last person from HASS that complained about latency in the OZW forums had configured 4-5 power meters to report once per second. As power meters include multiple values, this meant about 20+ messages (excluding ACKs etc.) per second. Ask yourself: do you really need per-second resolution on the power usage of your lamps/lights/temperature etc.?

Regarding security, this is a trade-off (and my real-life job is as a director at an IT security vendor, so I don’t take this lightly). Ask yourself when including devices if the temp/humidity sensor is transmitting sensitive information and needs to be encrypted. The switch for the porch light? Etc. Chances are, if you are going to get hacked, or have someone spying on you, they are going to go after your WiFi, or hell, just break a window to get in. But it is each to their own to make that decision; just be prepared to make some trade-offs with respect to performance/latency.

In closing - you can get a decent performing zwave network if you take the time to configure it correctly.

This is not to say that OZW is perfect. The list of things to be improved is long. My biggest gripe with doing OZW, and what burns me out is the long list of complaints and or “demands” to fix things, but very few people actually step up and offer patches or ask what can be done. I see above complaints about multilevel dimmers that take time to change values. Did you know the dev branch provides you two values for them? The current and final value. (For devices that support that version of the command class).

And those that are about to complain about the slow pace of OZW development, see my point above. Instead of complaining, send patches, discuss on the mailing list, offer help (not only to me but to the dozens of people a month that ask “how do I?” questions). I have a full-time job, two kids and a family, plus all my other commitments. The less work I’m doing on “support”, the more time I’m working on the code.

And if you think you can do a better job than me, but don’t want to work with the OZW community, fork it and go right ahead.

@riemers

commented Mar 5, 2019

I too have sub second responses, and i don't poll anything. I do associations 1 group to hass and this works like a champ. The only complaint is the startup time. I know where it comes from, sadly i don't have the knowledge to 'improve' on that but i have never had complaints on OZW about things being slow. Once started it is fast enough for me. A default fibaro power plug sends power usages like madness, so yes you need to tweak those too.

You don't want to be rude to people but in your case i would just tell them to look in a FAQ where those things are mentioned and close the issue. I am no pro programmer but i tell everybody if i can help i will. Including OZW where possible. 💪

We all want the same thing in the end.

@Fishwaldo

commented Mar 5, 2019

Also - regarding the comment about devices only being able to do a single thing at a time. This is strictly not true. First, you have to realize that for a device to work for months on a battery, it has to be very efficient in hardware. They are essentially single CPU, but with a lot of functionality like PWM (for dimming) and ADC (for taking measurements) implemented in hardware rather than software. So a device can dim and send a packet at the same time.

The official Z-Wave SDK for nodes is also a partial event-driven loop and mostly async (I’ve seen the source). So the ability to have a responsive node is provided by SiLabs/Sigma. Now the actual device vendor may take shortcuts with their firmware implementation etc., but that comes back to my comment about quality/QA testing. If you’re going with the cheapest option you can find, don’t expect to get a Ferrari!

Personally, I’d much prefer to spend $100 on a light switch now that just works, than $50 on a sub-par one that ends up with my wife complaining and me wasting countless hours putting hacks in place to make it work.

@Fishwaldo

commented Mar 5, 2019

@riemers: 2 FAQs/wikis, a website to upload log files that tells you common problems and solutions, mailing lists and issue trackers that point people in the right direction, and still.... hahaha. Not sure what else can be done.

But I get it - Zwave is billed as simple, and for someone with a few devices etc it is. When your network starts to grow, or you invest in crappy devices, I guess that’s when experience is required (and probably why Sigma/SIlabs have a “certified Installer Program”).

I’ve been working on OZW since 2014 at least, and still to this day get surprised at the bugs that pop up.

@kdschlosser

commented Mar 5, 2019

@Fishwaldo

I do agree with you 100% on the poor design of the protocol (outside of the control of you and the many others that have helped make OZW) and the crappy devices out there (again outside of the control of OZW) do not help matters. and then toss in a shake of a horribly slow network design. and you end up with a challenge. Now if you then mix in uneducated users, and throw in poor code (not OZW) you end up with a MiCasaVerde Vera type of experience.

As I stated above, the majority of latency problems come from 2 places: the first being uneducated users, and the second being that there is no regulation built into the implementation of OZW. Users get excited to see it working, they start exploring and clicking, end up making a mess, and then place the blame on the software. Now as developers we know this is a P.E.B.K.A.C. (Problem Exists Between Keyboard And Chair) problem. So as developers we should try to limit, or at the very least warn of, potential issues when settings are set in a manner that is going to cause undesirable results. That being stated, I do not expect all of this to be done by OZW. OZW has no way to display such information to the users other than log files. This is better left to the authors of the program that OZW is being used in, so that the programmers can dial in what is good and what is bad for their specific program's needs.

And for the life of me I cannot seem to locate where I read about an issue with the Aeotec Z-Sticks, that they will sit and wait for a reply instead of letting another packet be transmitted. But as we all know, everything we read on the internet is truth, yes?? LOL.. It is something I had read. I could have sworn it was on an OZW group or a GitHub issue somewhere, but I could be mistaken. You would know better than I do whether it is a true statement or not.

Polling is a necessary evil with Z-Wave because of the possibility of dropped packets, and if managed properly everything will run fine. One of the best features of OZW is that you have dynamic control of the polling. No sense in polling security sensors that are not armed, and no sense in polling lights and dimmers if the security sensors are armed. Time of day is another tool that can be used to decide when and when not to poll different devices.
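
As a sketch of what such a polling policy could look like: the function below only decides whether a device should be polled right now, and leaves the actual enabling and disabling to whatever the library exposes. The device kinds, the `should_poll` name and the 01:00 to 05:00 quiet window are my own placeholders, not OZW settings.

```python
from datetime import time


def should_poll(device_kind, alarm_armed, now, quiet_start=time(1, 0), quiet_end=time(5, 0)):
    """Decide whether a device is worth polling at this moment."""
    if device_kind == "security_sensor":
        # Sensors only matter while the alarm is armed.
        return alarm_armed
    if device_kind in ("light", "dimmer"):
        if alarm_armed:
            # Nobody is home to flip a rocker, so skip the lights.
            return False
        # Time-of-day rule: skip lights during the quiet window.
        return not (quiet_start <= now < quiet_end)
    return True


print(should_poll("security_sensor", alarm_armed=False, now=time(12, 0)))  # False
print(should_poll("dimmer", alarm_armed=False, now=time(19, 30)))          # True
```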

Another way to trim down on network congestion is to go old-school HA for some of your lights. An example would be basement lights: move the switch to above the basement door, use a rocker switch (non-Z-Wave), and attach a small fixed wheel to the top of the door, so when you open the door the wheel rolls over the rocker and turns the light on, and off when you close it (one of my favorites). That is one less device connected to a Z-Wave network that does not really have to be there, but it still gives you that HA feel... HA = "Home Automation". HA != "Home Automation through the use of wireless electronic devices".

LMAO @ Sigma Certified Installer Program. I would be interested to see what the course curriculum is

@Fishwaldo No one was complaining about OZW here either.. 😉

@kdschlosser

commented Mar 5, 2019

Also - regarding the comment about devices only being able to do a single thing at a time. This is strictly not true. First, you have to realize that for a device to work for months on a battery, it has to be very efficient in hardware. They are essentially single CPU, but with a lot of functionality like PWM (for dimming) and ADC (for taking measurements) implemented in hardware rather than software. So a device can dim and send a packet at the same time.

I have not seen the code nor the hardware specifications. I would imagine that Sigma has designed a chip that is the radio and holds the protocol stack. Either that chip will directly control a relay or a PWM-enabled chip, or it will have to send a command into a CPU of sorts. Now if the device manufacturer does not implement some kind of central CPU, then the Sigma chip will sit there and wait until the other component releases it, and then go back to sending and receiving. And if there is a CPU and it is of a crappy design, be it from firmware or whatever, and the CPU ends up sitting there waiting for, say, some kind of information from a PWM chip with no interrupt request built into the device, then if a packet comes in the Sigma chip is going to sit there and wait until the CPU is done screwing about with the PWM. So we end up with the same issue.

You also mentioned that the RF layer is half duplex.. and what makes it half duplex? the hardware does and nothing more. I am sure there is a way that a device could use 2 RF chips to make it full duplex.

The official Z-Wave SDK for nodes is also a partial event-driven loop and mostly async (I’ve seen the source). So the ability to have a responsive node is provided by SiLabs/Sigma. Now the actual device vendor may take shortcuts with their firmware implementation etc., but that comes back to my comment about quality/QA testing. If you’re going with the cheapest option you can find, don’t expect to get a Ferrari!

I do have a question on this. the use of ACK packets on an async com network does not seem proper. My question would be what is async about it?

Personally, I’d much prefer to spend $100 on a light switch now that just works, than $50 on a sub-par one that ends up with my wife complaining and me wasting countless hours putting hacks in place to make it work.

tell me what device this is.. I have yet to see such a thing. I too would buy it in a heartbeat. I would replace all of my switches and dimmers.

@jrowberg

commented Mar 5, 2019

You also mentioned that the RF layer is half duplex.. and what makes it half duplex? the hardware does and nothing more. I am sure there is a way that a device could use 2 RF chips to make it full duplex.

Quick comment from a hardware engineer who has worked extensively with Bluetooth (obviously different but not in the fundamentals I'm about to mention). There is "a way," yes, but doing so makes the design far more complex. It isn't just a matter of throwing an extra radio silicon module in. You have to deal with receiver saturation and RF path management issues that don't exist in a half-duplex arrangement. Sending and receiving simultaneously is really hard unless you have significant physical distance between transmitter and receiver and (of course) two separate antennas. Most of the commonly used inter-device communication protocols (Bluetooth, Wifi, etc.) are half duplex.

At a hardware level, trying to make Z-Wave into a full-duplex system is almost certainly a dead end.

@kdschlosser

commented Mar 5, 2019

I am going to agree and disagree with the Bluetooth thing. I thought Bluetooth is able to send and receive at the same time so long as it is not to/from the same device. so it is half duplex in a way and then again it is not. I would imagine that Bluetooth devices that are able to do this do have 2 radios in them.

And if Z-Wave was able to do the same it would greatly increase its speed because of the use of a mesh network: it could receive a packet and transmit it on the other radio while receiving another on the first.

I thought that dual band WiFi allows for full duplex communication. IDK I could be wrong..

@jrowberg

commented Mar 5, 2019

Bluetooth and Wifi use time-division duplexing for bidirectional data, i.e. one radio rapidly alternating between transmit and receive mode based on predetermined time slots:

https://en.wikipedia.org/wiki/Duplex_(telecommunications)#Time-division_duplexing

This is virtual full-duplex if done quickly, but it's still only one radio, one antenna, and one direction at a time.

It is possible that a 2.4 GHz and 5 GHz WiFi router can achieve true full-duplex, but this is two radios, two antennas, and separate frequencies. Taking Z-Wave this direction is, as I mentioned before, most likely a complete non-starter.

@loe

commented Mar 5, 2019

I am going to agree and disagree with the Bluetooth thing. I thought Bluetooth is able to send and receive at the same time so long as it is not to/from the same device. so it is half duplex in a way and then again it is not. I would imagine that Bluetooth devices that are able to do this do have 2 radios in them.

And if Z-Wave was able to do the same it would greatly increase its speed because of the use of a mesh network: it could receive a packet and transmit it on the other radio while receiving another on the first.

I thought that dual band WiFi allows for full duplex communication. IDK I could be wrong..

You are wrong.

https://en.wikipedia.org/wiki/Duplex_(telecommunications)
https://en.wikipedia.org/wiki/IEEE_802.11

@loe

commented Mar 5, 2019

And if Z-Wave was able to do the same it would greatly increase its speed because of the use of a mesh network: it could receive a packet and transmit it on the other radio while receiving another on the first.

Sure, this is theoretically possible; after all it's just software, you can do anything.

It will never happen, both for cost and size reasons. So let's move on.

@kdschlosser

commented Mar 5, 2019

since we are going to spout IEEE standards why not go to IEEE and not mostly wrong Wikipedia???

https://ieeexplore.ieee.org/document/4753832

yeah OK thumbs down me because i am right...

@kdschlosser

commented Mar 5, 2019

@kdschlosser

commented Mar 5, 2019

It is possible that a 2.4 GHz and 5 GHz WiFi router can achieve true full-duplex, but this is two radios, two antennas, and separate frequencies.

And this is what I was talking about in the first place: using two radios, yes? So it is the same thing. They make dual-band PCIe laptop WiFi cards that would very easily fit inside a dimmer switch, so creating a device that uses a low-power RF signal and has two radios should not be an issue.

The whole point behind this is that Z-Wave will lose ground to other mesh networks like Zigbee and Thread because of one simple thing: its network is slow. They need to do something to speed it up, and the easiest thing is going to be adding that second radio. Otherwise they are going to have to change to a higher frequency and up the radio power. The latter would be more difficult, I would think, because of backwards compatibility (which would cause two radios to be installed in the devices anyway).

@balloob

Member

commented Mar 5, 2019

@Fishwaldo thank you so much for your response. It is, like your work, very much appreciated. I will create an issue to make sure that the Z-Wave component default settings are not messing with performance.

I think that this thread has lived out its usefulness. For discussion on how to configure networks etc, please use our dedicated Z-Wave category on the forums or use the #zwave channel on Discord.

I will close this issue now.

@DeanRoddey

commented Mar 27, 2019

Not that I should be helping out the competition... :-) But Fishwaldo is very much correct. I'm the author of CQC. We have our own bespoke Z-Wave implementation, and doing one is BRUTAL. It's a hundred times more difficult than is remotely justified by the ultimate results that you get, no matter how much you try. Z-Wave should really just die; unfortunately, it never seems to get the memo.

To address a comment above about using the previous configuration as an initial optimistic setup, that's what I do. The bulk of the time, that's going to be a valid assumption. If it's not, you aren't really much worse off since the user should be aware of this and take action if he's completely redone the network. What I do is get the basic node info only. If that matches the previous configuration, I go with that and hope for the best. If some nodes don't match the previous config, only those nodes are knocked back to rescan mode. And that's also usually likely to be a reasonable bet.

It allows us to get up and going reasonably quickly (using the Z-Wave form of that word), so that we at least can move straight towards getting data, and skip all of the discovery bits most of the time, and if only a node or two has changed, then at least minimize scanning.
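The optimistic startup described above could be sketched roughly as follows. This is only an illustration of the idea, not CQC's or OpenZWave's actual code: `BasicNodeInfo`, its fields, and `plan_startup` are hypothetical names, and the real "basic node info" a controller returns may contain different fields.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class BasicNodeInfo:
    """Hypothetical stand-in for the basic info a node reports at startup."""
    basic_class: int
    generic_class: int
    specific_class: int
    listening: bool


def plan_startup(
    saved: Dict[int, BasicNodeInfo],
    live: Dict[int, BasicNodeInfo],
) -> Tuple[Set[int], Set[int], Set[int]]:
    """Split node ids into (trusted, rescan, missing).

    trusted: basic info matches the saved configuration, so reuse it as-is.
    rescan:  node is new or its basic info changed, so interview it again.
    missing: node was in the saved configuration but is no longer reported.
    """
    trusted = {nid for nid, info in live.items() if saved.get(nid) == info}
    rescan = set(live) - trusted
    missing = set(saved) - set(live)
    return trusted, rescan, missing


# Illustrative data only: ids and class values are made up.
saved_config = {
    2: BasicNodeInfo(4, 0x10, 0x01, True),
    3: BasicNodeInfo(4, 0x11, 0x01, True),
    4: BasicNodeInfo(4, 0x20, 0x01, False),
}
live_nodes = {
    2: BasicNodeInfo(4, 0x10, 0x01, True),
    3: BasicNodeInfo(4, 0x21, 0x01, True),  # a different device now holds id 3
    5: BasicNodeInfo(4, 0x10, 0x01, True),  # newly included node
}
trusted, rescan, missing = plan_startup(saved_config, live_nodes)
print("trusted:", trusted, "rescan:", rescan, "missing:", missing)
```

Only the nodes in `rescan` pay the cost of a full interview; everything in `trusted` can start reporting data right away. Note that a match on basic info is still only a bet, for the reason given in the next paragraph: without a unique device id, a swapped device of the same type is indistinguishable.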

One of the most stupidly stupid moves on the part of the original design was not requiring a unique id (MAC type address) to be included in each device. Without that we have absolutely no way of knowing if what was there is still there. And we are dependent on associations for simple numbers that can easily change and totally screw up the whole system, instead of remembering absolute ids that we can always re-locate and know if they have disappeared.

It's a horrible system, and I hope so much that Zigbee can ultimately destroy it, though it probably will never happen.

@Fishwaldo

commented Apr 1, 2019

Hi, I have another comment (after hanging out in your Discord channel for a while).

Regarding the Heal Network function running every night:

Background: In pre-ZWave+ devices, there was one node on your network (usually the controller) that was designated as a Static Update Controller (SUC). Its purpose was to generate a network topology and distribute routes to all the nodes in your network. When you added a device, or moved the network around, you would need to get the SUC to update its topology. Additionally, if a node couldn't communicate with another node, it would ask the SUC for routing info to get there. The Heal Network functionality in OZW was what you would use to get the SUC to update its topology.

The kicker: when recalculating the topology, the SUC would use 100% of the ZWave bandwidth, and it could take several hours to complete on large networks. This all happened in the background as well, so even though the Heal Network command would say it is complete, it is really only saying that it "sent the command to recalculate the topology".

When ZWave+ was introduced (around that time anyway), a new feature called Explorer Frames was added to the ZWave spec. This enabled a node to maintain its own routing information, so it didn't need to ask a SUC (though it can fall back to a SUC if needed). Basically, an explorer frame would be sent if a node did not get an acknowledgement for a message it sent, and the explorer frame would be broadcast to all other nodes saying "Do you know how to reach node x?". They would respond, and the node would update its own routing table with the shortest path. (If it can't find a path shorter than 4 hops, it would ask the SUC.)

Now, when a node transmits a message to another node (or controller), it will use the last known good route. If there has been a Heal Network command recently, then usually that path is the shortest path to the destination. Sounds good, but in reality the shortest path may not always be the best path (RF interference, RF reflections from metallic devices, etc.). That path may be intermittent depending on where the cat is sleeping, whose microwave in the street is on, which car just drove by, etc., and so not the most reliable (with dropped packets). So the sending node may end up transmitting an explorer frame, finding another route through another node, and using that.

Over time, explorer frames should get the network routes using the most stable paths (which might be longer, but more reliable). Running Heal Network every night destroys that gathered "intelligence" and may end up making your network more unreliable!
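As a toy model of the behaviour described above (not the actual Z-Wave routing algorithm), the sketch below keeps a last known good route per destination, re-explores only when that route stops working, and treats a heal as throwing the learned routes away. `Node`, `explore`, `route_is_up` and `MAX_HOPS` are hypothetical names, the breadth-first search is just a stand-in for what an explorer frame achieves, and reading the hop limit as "at most 4 hops" is an assumption.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Links = Dict[int, Set[int]]  # node id -> ids of nodes currently in radio range
MAX_HOPS = 4  # assumed reading of the hop limit mentioned above


def route_is_up(route: List[int], links: Links) -> bool:
    """True if every hop on the route is still a working link."""
    return all(b in links.get(a, set()) for a, b in zip(route, route[1:]))


def explore(links: Links, src: int, dst: int) -> Optional[List[int]]:
    """Stand-in for an explorer frame: shortest working path within MAX_HOPS."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        if len(path) - 1 == MAX_HOPS:
            continue  # hop budget used up on this branch
        for neighbour in sorted(links.get(path[-1], set())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # a real node would fall back to asking the SUC here


class Node:
    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.last_good: Dict[int, List[int]] = {}

    def send(self, dst: int, links: Links) -> Optional[List[int]]:
        """Use the last known good route; explore only when it fails."""
        route = self.last_good.get(dst)
        if route is None or not route_is_up(route, links):
            route = explore(links, self.node_id, dst)
            if route is None:
                self.last_good.pop(dst, None)
                return None
            self.last_good[dst] = route
        return route

    def heal(self) -> None:
        """Model a heal as discarding every route learned so far."""
        self.last_good.clear()


links = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
node = Node(1)
first = node.send(4, links)   # nothing cached yet, so explore
links[2].discard(4)           # the 2 <-> 4 link becomes unreliable
links[4].discard(2)
second = node.send(4, links)  # cached route fails, so explore again
links[2].add(4)               # the flaky link comes back
links[4].add(2)
third = node.send(4, links)   # the learned route is kept
print(first, second, third)
```

In this model the node settles on the route through node 3 once the link between 2 and 4 has failed on it, and keeps using it even after that link returns. Calling `node.heal()` erases that choice, so the next send goes back through the flaky link.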

So - Heal Networks should only be used when:

  1. Adding or removing a node
  2. Moving a node around
  3. Moving the Controller.
  4. Removing a Dead Node

This applies to both pre- and post-ZWave+ devices. If you have pre-ZWave+ devices, I'd strongly recommend you replace them; there are a number of other advantages to having the latest devices, such as bandwidth.
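The rule of thumb in the list above can be written down as a small policy check. This is a sketch only: `TopologyEvent`, `HEAL_TRIGGERS` and `should_heal` are hypothetical names, not part of OpenZWave or Home Assistant, and the event set simply mirrors the four list items.

```python
from enum import Enum, auto
from typing import Iterable


class TopologyEvent(Enum):
    """Hypothetical events an automation layer might observe."""
    NODE_ADDED = auto()
    NODE_REMOVED = auto()
    NODE_MOVED = auto()
    CONTROLLER_MOVED = auto()
    DEAD_NODE_REMOVED = auto()
    NIGHTLY_TIMER = auto()
    NODE_VALUE_CHANGED = auto()


# Only physical or membership changes to the network justify a heal.
HEAL_TRIGGERS = frozenset({
    TopologyEvent.NODE_ADDED,
    TopologyEvent.NODE_REMOVED,
    TopologyEvent.NODE_MOVED,
    TopologyEvent.CONTROLLER_MOVED,
    TopologyEvent.DEAD_NODE_REMOVED,
})


def should_heal(events: Iterable[TopologyEvent]) -> bool:
    """Return True only if at least one event changed the network topology."""
    return any(event in HEAL_TRIGGERS for event in events)


print(should_heal([TopologyEvent.NIGHTLY_TIMER]))
print(should_heal([TopologyEvent.NODE_VALUE_CHANGED, TopologyEvent.NODE_MOVED]))
```

The point of the design is that a timer is deliberately absent from `HEAL_TRIGGERS`: a heal is tied to a change in the network, never to the clock.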

It would be great to have this info (and my previous comment) added to the FAQ or your documentation somewhere. I've seen a lot of frustrated users coming into your support channel who are either being given the wrong advice or are just clueless about ZWave in general and unable to find the right information.

@balloob

Member

commented Apr 1, 2019

Opened a PR to turn off auto heal by default: home-assistant/home-assistant#22628
