[onewire] - Only tries initial owserver connection once? #4859
Comments
I haven't been able to find the official site for the Java owfs library being used here. Does anyone know where that lives? |
|
Could be. Tough to tell. Thanks for the pointer.
|
This looks like it's a fairly easy fix. But there's already a "retry" setting for the binding. Do we need a new one or can we use this one?
I've done an initial implementation using this one that appears to work nicely. This includes a new optional 'retryInterval' configuration setting that defines the time between retries (default of 60 seconds). I don't have an owserver to test against, however. |
My understanding is that the current #onewire:retry= setting applies to the 1wire sensor network retry interval, not the owserver connection-attempt status. That is, let's say you have 25 1Wire sensors on a 1wire bus, and one or more sensors is not giving reliable data readings, e.g. due to electrical noise or physical bus wiring problems -- in that case, the binding uses #onewire:retry= count to know how many times to re-attempt a read on the non-responsive sensors, before giving up & moving on to other sensors. That is separate from the issue I'm asking/requesting on, which is the overall connection to the owserver daemon. I retested this morning (by killing & restarting the owserver daemon in my setup) and can verify that after the owserver daemon connection drops, OH + onewire binding never re-attempts the connection (i.e. to localhost:4304.) I have a working owserver here and can test for you, please advise on where to download an updated JAR for testing. Thank you. |
Also, as I understand it the current #onewire:retry= for sensors is an absolute number of tries (before fail / giving up,) not an interval. Suggest clarifying that by making the new config option something like:
I propose the default retry interval to be 60 seconds, with a minimum interval of 1 second (to avoid flooding a network or hammering a server,) and no maximum interval. To cover all possible future use cases (e.g. maybe involving some kind of fancy super-paranoid network firewalling setup) perhaps it would be good to include a "DISABLE" option as well, which would prevent any re-try attempt (after the first startup attempt upon initial OpenHAB startup.) |
Here's a jar of my initial attempt that uses the existing retry config setting. Based on your input, I agree we need a new setting. I'll work on that soon. I think a minimum interval of 5 seconds sounds better, but I agree a "no retry" option is needed. Probably this will be implemented by setting the number of retries to 0. |
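As an illustration of the settings discussed above, here is a minimal sketch of how the values might be normalized. It assumes a 60-second default, a 5-second minimum, and a retry count of 0 meaning "no retry", as proposed in this thread. The class and method names are hypothetical and are not taken from the binding's source.

```java
/** Hypothetical helper: normalizes the retry settings discussed in this thread. */
final class RetrySettingsSketch {
    static final long DEFAULT_INTERVAL_SECONDS = 60; // default proposed in the thread
    static final long MIN_INTERVAL_SECONDS = 5;      // minimum proposed in the thread

    /** Returns the interval to use for a raw config value, which may be null or blank. */
    static long intervalSeconds(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_INTERVAL_SECONDS;
        }
        long parsed;
        try {
            parsed = Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            // Assumption: fall back to the default instead of failing on bad input.
            return DEFAULT_INTERVAL_SECONDS;
        }
        // Clamp small values so a misconfiguration cannot hammer the server.
        return Math.max(parsed, MIN_INTERVAL_SECONDS);
    }

    /** A retry count of 0 (or less) disables reconnecting after the first attempt. */
    static boolean reconnectEnabled(int retries) {
        return retries > 0;
    }
}
```

With this sketch, a configured interval of `1` would be raised to 5 seconds, and an unset interval would yield 60 seconds.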
Tested.
Comments: The owserver:4304 TCP connection should be retried at a specified interval indefinitely, because an OH setup that depends on 1Wire sensor data (e.g. temperature) will not function correctly without that data. I agree that 5 seconds is probably fine as a minimum interval; that gives good responsiveness while keeping CPU/network load reasonable. My comments are inserted into the log output, marked with ### below:
|
Another observation & request: In the previous (1.8.3 / committed) onewire binding, when the owserver:4304 connection dies, OpenHAB very quickly becomes aware that sensor data is no longer updating, and it changes all the sensor outputs to undefined/null states (e.g. it shows a "-" symbol instead of, say, "64.5ºF"). With your new test version 1.9.0, I notice that when the owserver:4304 connection is dropped, OpenHAB is not informed that the sensor data is stale / no longer updated. I think it's important for the old behavior to be preserved, not only for continuity with historical behavior but because I think it's "the correct thing to do." There are any number of situations where stale sensor data could be detrimental, e.g. in a thermostat control loop. |
Update ~12 hours later, continued stable operation w/ your revised 1.9.0 version onewire JAR. |
Implemented the new setting:
The new jar is ready for testing. As to the behaviour of OpenHAB being "informed that the sensor data is stale", I'm not sure how that happens. I would have thought that sensor values would only change to UNDEF after a sensor read failure. |
A binding should only post updates when the underlying source has in some way assertively indicated what the new state should be. A loss of connectivity to the source doesn't seem like a sufficient reason for the binding to assert that its items' states are unknown, but there may be extenuating circumstances with this binding that I'm unfamiliar with. Previous binding behavior should be considered, but not necessarily as the highest priority among other considerations. There might be a reason to use the Expire binding in conjunction with this binding. |
Sorry for the slow response. I had a chance to test @9037568's latest JAR (from 4 days ago) today: |
Back to the issue of the 1.8.3 binding behavior vs 1.9.0 behavior upon loss of server connection:
Regarding @watou remarks on this:
I argue that if the OneWire binding can no longer connect to its sole source of data, and that data source is known to be a constantly updated stream of sensor information such as temperature, light level, etc., that in fact IS a strong case for the binding no longer knowing what state the sensors are in. (I can see how this may not apply to other bindings, because they are different in nature, but OneWire is very frequently used in a constantly-updating sensor mode.) In a thermostat control situation (which is my particular use case, but I can imagine plenty of others), a heater-control rule that believes a temperature is 66.2 ºF indefinitely is at a minimum a potentially huge waste of energy, and at worst a dire physical hazard if it leads to overtemperature and fire. If OpenHAB reports the sensor data as "null," the rule can simply shut off the heater as a precaution (and perhaps send a notification). I request that, if it is not too much work, the 1.9.0 binding keep continuity with the 1.8.3 binding in this regard. Attached screenshots show examples of "stale" sensor data that no longer updates, vs. "nulled" sensor data which 1.8.3 throws within seconds of losing an owserver:4304 socket connection. Example of safely nulled sensor info from 1.8.3 binding, with no owserver socket connection: Example of "stale, stuck" sensor info from 1.9.0 binding, with no owserver socket connection: |
It's true that the binding may no longer know recent sensors' states, but that's not the same as knowing that all the states should be changed to reflect a loss of connectivity to a different "bridge" component. Having the binding work like that makes it different from almost all other 1.x bindings. (Eclipse SmartHome has an online state for things to address the binding connection question, that isn't present in OH1.) Some bindings can keep a separate item up to date with the last update time, and/or you can use timers in rules or the expire binding. In any case, the proper state would be These points are separate from the question of whether there is a need for complete compatibility with the previous version of the binding. Just my .02. |
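To make the "last update time" idea concrete, here is a small sketch of a staleness check combined with the thermostat precaution described earlier in the thread. It is plain Java rather than openHAB rule syntax, and every name in it is hypothetical. It only illustrates the logic a rule or timer would need.

```java
/** Hypothetical sketch: treat a reading as unusable once it is older than a limit. */
final class StalenessSketch {
    /** True when the last update is older than maxAgeMillis, measured at nowMillis. */
    static boolean isStale(long lastUpdateMillis, long nowMillis, long maxAgeMillis) {
        return nowMillis - lastUpdateMillis > maxAgeMillis;
    }

    /** Heat only when the reading is fresh and below the setpoint; stale data turns the heater off. */
    static boolean heaterShouldRun(double tempF, double setpointF, boolean stale) {
        return !stale && tempF < setpointF;
    }
}
```

Under this sketch, a reading of 66.2 ºF that stopped updating would no longer keep a heater running once it exceeded the chosen maximum age.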
I understand your reasoning & explanations, & thanks for the correction on .UNDEF vs. .NULL. If it is difficult (>15 minutes of coding?) to restore the previous behavior to keep compatibility with 1.8.3, I will try working around using additional methods. But if it's relatively easy to keep I'd sure prefer not to lose that (as I see it) functionality. I did not anticipate that my original request (server-retry functionality) would cause loss of the quick null-state reporting that 1.8.3 does. |
Just out of curiosity, if you were to install the Expire Binding and add |
I'd like to see a log of the never-ending retries. I don't see how it could happen at present. |
I had looked at the Expire binding earlier but for some reason, maybe cloudy-headedness, at the time it didn't seem like it would do what I want. But on consideration I think it may work fine. I will test it for my use case. |
My config options:
Normal stream of debug data (TRACE level, all onewire binding logs to a single, separate log file). This is what the logs look like when everything is working:
I manually kill -9'ed the owserver daemon here.
I manually restarted the owserver daemon here.
|
These two lines show that I may have given out the wrong jar file.
That second line should show:
Try grabbing the jar from the cloudbees CI build. |
I tested w/ the build you linked above, it now gives the expected result.
|
I should also note, AFTER the binding times out and gives up attempting the server reconnection, at that point OpenHAB sets the items as "UNDEFINED," the same behavior as 1.8.3. |
Based on that, we could restore the original "nulling" behaviour by doing only one reconnect attempt on read/write failures. This jar does that. This would only perform the full reconnect cycle on startup or update. |
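The read path described above might look roughly like the following sketch: one reconnect attempt after a failed read, then give up so the item can be reported as undefined. This is an assumption about the shape of the logic, not the code in the jar. The class name, method name, and the use of an empty `Optional` to stand for "undefined" are all hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

/** Hypothetical sketch: a read that allows exactly one reconnect attempt on failure. */
final class ReadWithSingleReconnectSketch {
    /**
     * Tries the read; if it throws, runs reconnect once and tries the read again.
     * An empty result stands for "report the item as undefined".
     */
    static Optional<String> read(Callable<String> readValue, Runnable reconnect) {
        try {
            // A null reading is also treated as undefined.
            return Optional.ofNullable(readValue.call());
        } catch (Exception firstFailure) {
            try {
                reconnect.run();
                return Optional.ofNullable(readValue.call());
            } catch (Exception secondFailure) {
                // Reconnect or second read failed: give up quickly instead of cycling.
                return Optional.empty();
            }
        }
    }
}
```

Keeping the read path to a single reconnect is what lets the items fall back to an undefined state within seconds, while the full reconnect cycle stays reserved for startup and configuration updates.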
Can you clarify "...on startup or update.."? -- I understand "startup;" what does "update" refer to? When the OpenHAB configuration file is re-parsed while OH is running? |
Correct, when the openhab.cfg file is updated/modified while OH is running. |
* Update the onewire binding to retry failed connections. Fixes openhab#4859. * Fix several problems noted by @watou. * Add @author tag.
OpenHAB 1.8.3 with onewire binding active appears to only attempt a connection to the designated owserver server (in this case at 127.0.0.1) ONCE, when OpenHAB initially starts up.
If the owserver daemon isn't ready & listening when OpenHAB starts, the OH onewire binding --> owserver TCP connection fails, and is never retried.
Similarly, if the owserver daemon needs to be restarted at some later time, perhaps because it quit of its own accord or as part of other system maintenance, OpenHAB & onewire binding never again look for the owserver.
Expected Behavior
Suggested behavior: Make available in the onewire section of openhab.cfg a configuration option defining how frequently the onewire binding should retry and/or reset a broken or never-established owserver TCP connection.
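The requested behavior amounts to a retry loop around the connection attempt. The following is a minimal sketch under stated assumptions: the class name, method name, and the convention that `maxAttempts <= 0` means "retry indefinitely" are all hypothetical and do not describe the binding's actual implementation.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch: retry a connection attempt at a fixed interval. */
final class ReconnectLoopSketch {
    /**
     * Calls connect until it succeeds, sleeping intervalMillis after each failure.
     * maxAttempts <= 0 means retry indefinitely; otherwise the last failure is rethrown.
     */
    static <T> T connectWithRetry(Callable<T> connect, int maxAttempts, long intervalMillis)
            throws Exception {
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                return connect.call();
            } catch (Exception e) {
                if (maxAttempts > 0 && attempt >= maxAttempts) {
                    throw e; // give up and report the last failure
                }
                Thread.sleep(intervalMillis); // wait before the next attempt
            }
        }
    }
}
```

For the setup described in this issue, a call such as `connectWithRetry(() -> new java.net.Socket("127.0.0.1", 4304), 0, 60_000L)` would keep trying once a minute until the owserver daemon accepts the connection.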
Current Behavior
When openhab.sh is first started, the onewire binding loads and attempts to connect to the defined owserver IP / port (in my case, 127.0.0.1:4304). If the owserver is not responding or is not started, openhab.sh & onewire binding never again try to locate the owserver. It is necessary to kill & restart openhab.sh to re-establish the connection.
Your Environment