Z-Wave binding should retry connecting to the interface device if first attempt failed #3487
I ran into the same issue while writing my own openhab2 docker image.
The two things I did to fix the issue were:

As the image runs OpenHAB as root (something that really has to be changed), I'd say (1) should not be an issue. It indeed seems deleting
Retrying the connection might still be worthwhile, for example with the Aeon Z-Wave stick, whose built-in battery lets you unplug it and carry the controller to the device you want to include. The Z-Wave plugin should be able to handle a controller that has been reconnected.
While I don’t disagree, this isn’t so simple at the moment, as there are issues with the Java serial driver that make it difficult. I suggest we close this issue since it seems resolved.
Which one is difficult: providing reasonable error messages or retrying the connection?
Well, for the error message, we just use the response from the driver. And the issue with reconnecting is that if you remove the Z-Stick as you suggest, the serial driver crashes, so reconnecting isn't possible.
I am seeing the same behaviour, i.e. OpenHAB reporting that /dev/ttyACM0 does not exist, and for me it is rather erratic: sometimes it works and sometimes it doesn't. It usually takes me a few tries to start OpenHAB; just an hour ago I had to restart upwards of 15 times before it established a connection. I am not running OpenHAB in a container and I don't see a pattern in when it works and when it doesn't. My gut feeling is that it happens more often now than it did when I had fewer bindings loaded in OpenHAB. I did see #2760 discussing potential race conditions between bindings that use serial ports. Since I'm using at least one other binding that uses a serial port, this could apply to me, but I don't have any further data.

As @cdjackson points out, it seems that OpenHAB dies completely when the USB stick is unplugged. I too have an Aeon stick, so I pull it out every now and then to add new units to my Z-Wave network. Having OpenHAB not die would be nice; having OpenHAB reconnect when the USB stick is reinserted would naturally be even nicer. I haven't looked into the crash much, but can't we just catch an exception and trigger a reconnect (possibly scheduled for a later point in time)? Anyway, I think we are dealing with two issues here.
I suggest we keep this issue about retrying the connection until one can be established. If we deal with the crash in a separate issue, then once that is fixed we should get connection retries automatically, not just at startup but continuously.
No, that was my previous point in the other thread (I think). From what I’ve seen, it completely bombs, so it must be a very low-level issue. Currently we catch all exceptions and restart the threads if there’s been an exception within the send/receive thread. This doesn’t catch this issue, which is why I believe it’s likely in the driver (i.e. outside of openHAB). In general, if you’re having lots of problems with the COM port not opening, I would try to resolve that, as it is probably not normal and may indicate another issue. I’m not saying I’m against retries, but they would be a bodge to work around another problem, and it’s probably also worth looking for the source of the issue on your system.
@cdjackson, if it is some external factor preventing OpenHAB from opening the serial device then yes, I agree that I should fix that, but to me it looks like OpenHAB is the one having issues. There's no point in arguing based on guesses, so I'm running some tests right now. I put together a dirty little shell script that continuously restarts openHAB and tails the log files looking for failure or success of the zwave connection. My idea is to run it with only the zwave binding loaded (as this was what I used initially, with great success) and then with all other bindings/modules loaded, to see if there is a difference. If there is, I think it's fairly safe to conclude that OpenHAB is the issue: perhaps not the zwave module by itself, but openHAB as a whole. Even with the issue pinpointed to OpenHAB, I agree that your argument about finding the root cause still stands. Implementing reconnects in the zwave module would be a kludge if there is an actual race condition in serial device initialization in one or more OpenHAB modules.
Here's the comparison of 50 starts between having just the zwave module and all my modules, which are the following:
And the result:
I think we can conclude that there is a conflict between modules, since using just zwave gives a 100% success rate. I have yet to single out the module (if there is one) that is causing the conflict; I will do so as soon as I have some time :) For reference, here's the test script, if someone else wants to do the same:

```bash
#!/bin/bash
# Restart OpenHAB 50 times and record in openhab-serial.log whether the
# Z-Wave binding managed to open the serial port on each start.
ZWAVE_LOG=/var/log/openhab/zwave.log
RESULTS=openhab-serial.log

for i in $(seq 1 50); do
    echo "Stopping OpenHAB..."
    /etc/init.d/openhab stop
    echo "Emptying zwave debug log"
    : > "$ZWAVE_LOG"
    echo "Starting OpenHAB..."
    /etc/init.d/openhab start

    echo -n "Checking for serial connection..."
    # Wait until the binding starts connecting to the serial port.
    until grep -q "Connecting to serial port" "$ZWAVE_LOG"; do
        echo -n "."
        sleep 1
    done
    echo " Found!"

    # Then wait for either the failure or the success message.
    result=
    until [ -n "$result" ]; do
        if grep -q "Serial Error: Port /dev/ttyACM0 does not exist" "$ZWAVE_LOG"; then
            result=failure
        elif grep -q "Serial port is initialized" "$ZWAVE_LOG"; then
            result=success
        else
            echo -n "."
            sleep 1
        fi
    done
    echo "connect $i: $result" | tee -a "$RESULTS"
done
```
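A small helper along these lines could turn the results file into the success rates quoted in this thread. It is a sketch that assumes each line of openhab-serial.log has the form `connect <n>: success` or `connect <n>: failure`, as written by the test script; the function name `summarize_runs` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: summarise openhab-serial.log, where each completed run
# is one line of the form "connect <n>: success" or "connect <n>: failure".
summarize_runs() {
    local log=${1:-openhab-serial.log} ok fail total
    # grep -c prints the number of matching lines (0 if none match).
    ok=$(grep -c ': success$' "$log")
    fail=$(grep -c ': failure$' "$log")
    total=$(( ok + fail ))
    if (( total == 0 )); then
        echo "no completed runs in $log" >&2
        return 1
    fi
    # Integer percentage, rounded down.
    echo "$ok/$total successful ($(( 100 * ok / total ))%)"
}
```

For example, 17 successful starts out of 50 would be reported as `17/50 successful (34%)`.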
This might still be the same issue with the serial driver that caused the race condition previously, as I don't think that was ever properly solved. I'll see how hard it is to add the retry over the next week or so. Serial ports under Java are a problem :(
Just googling a few permutations of "rxtx", "crash", "java serial" and similar, I see quite a few recommendations to use jSSC, and there are examples of other projects, like Arduino, switching from rxtx to jSSC because rxtx is unmaintained and in bad shape. Is this something we would consider as well?
I'd need to check the constraints, but I looked at a number of options about 12 months back and there isn't an ideal solution. Many libraries have compatibility issues (i.e. they are only supported on certain platforms), aren't maintained, or have license issues... I'll have another look at jSSC, as I'm not against changing if there's a better solution.
I ran another test series, this time with the zwave binding and benq:
So we can see that using benq together with zwave certainly lowers the odds of success, but not as much as loading all the modules: this is roughly 50/50. These are the only two modules I have configured to talk to serial devices, but perhaps one of the other modules includes some serial code that competes as well, because with all modules loaded I'm closer to 1/3 (17 successes out of 50). Also, I haven't looked at the success rate of the benq binding with the zwave binding loaded.
Sounds like the same race condition to me; not surprising, as it wasn't fixed previously. Creating a retry in the zwave binding will potentially solve the issue for Z-Wave, but of course won't fix the problem for other bindings using the serial services. Really this needs a solution at the source: either fix the driver or we all change to a new library...
I agree with this, but I would like to have reconnects as well, since that would let us resume working after unplugging and replugging the USB stick (when including new Z-Wave devices) :)
No. This won't work. When you unplug the stick, OH crashes and needs to be restarted. This will need to be solved first, and I believe it is a bug in the serial driver.
I know it won't work now but once rxtx is fixed or replaced by jssc it will work :) Just saying I want two activities:
Did you have a chance to look at jSSC? I might give it a shot and implement it in one module to try it out. I would probably start with the Benq projector binding, since I wrote the serial transport for that.
I think we should separate these out, as you're now asking for something quite different. Reconnecting after removing the stick to include more devices is not the same as retrying the initial connection, and will require more re-initialisation of the binding. I looked at jSSC briefly; I noted it was quite old (not maintained for 2 years, I think). I need to look at it further though...
I've had a read through the open issues with jSSC, and there are quite a few very worrying ones. It seems to me that there's no support; the maintainer made the following statement 1 year ago (though it seems there have been no updates to the code for 2 years):
It seems that the support on Linux is poor (i.e. it seems to use polling), and I suspect that if this is still the case (and I assume it is, since there has been no activity), it will be a showstopper for OH, since many people use low-end Linux distros (e.g. Debian on RPi). High CPU load due to polling is (IMHO) a non-starter, as the Pi is already low end. (There is also an open issue specifically reporting 100% CPU load on ARM.) I also see that @maggu2810 looked at this around the time we were looking at alternatives due to the race condition. Maybe he can also comment? I would not be keen to move to a different library that is not well used or supported. Yes, at the moment there are a few issues, and while this is not good, and clearly a significant problem for those experiencing them, the current library is stable and usable for most people. I would be concerned that moving to another library with known issues will lead to further problems, at least in the short term, so we should be sure that whatever we do is a positive move before we make changes... As I said earlier, serial comms in Java is not well supported :(
AFAIK openHAB uses https://github.com/NeuronRobotics/nrjavaserial for the serial transport. This is an actively maintained fork of RXTX. If there is an issue, it can be raised there.
Yes, that's correct. The point we're making here is that we're looking for an alternative (as we did 12 months back) because there are problems with the existing library...
Hi @cdjackson, I had a look at the jSSC code a long time ago. At that time I also tried some work on nrjavaserial, but the code is very bloated, and I believe this has historic reasons. It is not easy to implement new features without finding someone who knows why the old (forked) code is designed that way.
IIRC, I also realized that jSSC doesn't use the select system call (or something similar) to read from the serial line, so it wastes a lot of CPU time polling. I started to change that but stopped because of the "dead" (or stalled) upstream development (and I could only work on the native code for Linux).
Okay, but then it should be a matter of opening a bug with nrjavaserial. I don't know why I got the idea we were using rxtx... sorry about that.
Have you read https://dev.eclipse.org/mhonarc/lists/iot-wg/msg00366.html and the following replies, e.g. https://dev.eclipse.org/mhonarc/lists/iot-wg/msg00369.html?
I would certainly be keen to see something ESR compatible. A while ago I discussed with Kai the idea of getting a direct USB driver working, but it didn't work out. As you say though, the current solutions are not workable for ESH. We "just" need to find @Someone to work on it :)
I have added the dk.comm stuff to a GitHub repo of my own, so I can build it as a separate project. I'll see when I can find some time for further testing, bug fixing, etc.
Greetings,
I'm running OpenHAB in a Docker container using tdecker's image, which works fine. But when restarting the container (instead of creating a new one), Z-Wave sometimes breaks with the following log lines:
The device is already there (`ls -la /dev/ttyACM0` added to the startup script succeeds, even before OpenHAB is started), but there seems to be some race condition, perhaps because connections to the stick are still open or something similar. This is really annoying, and I can also imagine other setups where this would break. What about retrying the connection to the stick after a given time, say three attempts with 30 seconds in between? The stick is a common Z-Wave ME ZME_UZB1.
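The retry proposed here would ultimately have to live in the binding itself. As an external stop-gap in the same spirit as the restart-and-check test script earlier in the thread, the sketch below restarts OpenHAB until the Z-Wave log reports that the serial port was initialized, giving up after a fixed number of attempts. It assumes the init script, log path and log message used in that test script; the function names and the 60-second wait for the binding to start are hypothetical choices, not part of openHAB.

```bash
#!/bin/bash
# Sketch: retry the Z-Wave serial connection by restarting OpenHAB.
# Log path and messages come from the test script in this thread;
# the function names are hypothetical.
ZWAVE_LOG=${ZWAVE_LOG:-/var/log/openhab/zwave.log}

# retry ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD up to ATTEMPTS times, sleeping DELAY seconds after each failure
# except the last. Returns 0 on the first success, otherwise the exit
# status of the final attempt.
retry() {
    local attempts=$1 delay=$2 i status=1
    shift 2
    for (( i = 1; i <= attempts; i++ )); do
        "$@" && return 0
        status=$?
        if (( i < attempts )); then
            echo "attempt $i/$attempts failed (exit $status), retrying in ${delay}s" >&2
            sleep "$delay"
        fi
    done
    return "$status"
}

# Succeeds if the Z-Wave log says the serial port was initialized.
zwave_serial_ok() {
    grep -q "Serial port is initialized" "${1:-$ZWAVE_LOG}"
}

# One attempt: restart OpenHAB with an empty Z-Wave log, give the binding
# WAIT seconds (assumed default: 60) to open the port, then check the log.
restart_openhab_and_check() {
    local wait=${1:-60}
    /etc/init.d/openhab stop
    : > "$ZWAVE_LOG"
    /etc/init.d/openhab start
    sleep "$wait"
    zwave_serial_ok "$ZWAVE_LOG"
}
```

Calling `retry 3 30 restart_openhab_and_check` would give three attempts, 30 seconds apart, as suggested above. As pointed out earlier in the thread, this only papers over the underlying serial-port race condition rather than fixing it.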