Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Busch-Jaeger 6735/01] Device becomes offline/unreachable #124

Closed
weakfl opened this issue Jan 29, 2018 · 16 comments
Closed

[Busch-Jaeger 6735/01] Device becomes offline/unreachable #124

weakfl opened this issue Jan 29, 2018 · 16 comments

Comments

@weakfl
Copy link
Contributor

weakfl commented Jan 29, 2018

I've noticed that the battery-operated BJ control element becomes unresponsive after working just fine for maybe a day, or immediately if restarting OH.
I think you mentioned something that sounded a bit similar with another battery powered device (I think a ST motion sensor)?

Currently I have to remove the device in PaperUI and add it back to make it work again.

I still have to gather a debug log and will post it as soon as I can catch it. So far I can only say that the device doesn't respond to the node discovery request on restart:

11318: Node discovery request IEEE_ADDRESS failed. Wait before retry.
@cdjackson
Copy link
Contributor

If you can provide a "full" startup log it might be useful at least to see what is happening there at least.

@weakfl
Copy link
Contributor Author

weakfl commented Jan 29, 2018

Thanks Chris!
A couple of observations:

  1. Before I could capture the startup log, I noticed that the switch had removed itself from the network, which hadn't happened before. I should have noticed as the LEDs on the switch start blinking when the device is not part of the network. So here's also a log leading up to the removal:
    zigbee_issue124_self_removal_20180129123457.log.zip

  2. After that I've removed/re-added the device, tested it and restarted OH. Again, the device didn't come online. But I've pressed the switch just in case and noticed the events still show up in the log. It seems like they're not processed though, the attached OH channel didn't fire.
    Here's the log: zigbee_issue124_startup_20180129132824.log.zip

@weakfl
Copy link
Contributor Author

weakfl commented Jan 29, 2018

I forgot:
Network address: 5630
IEEE: D85DEF11A1002A12

@weakfl
Copy link
Contributor Author

weakfl commented Jan 29, 2018

Update: the switch removed itself from the network again (it was offline in PaperUI when it happened). So looks like that's caused by changes in newer versions of the binding.

Btw, your changes to the startup code are quite noticeable, it's a lot faster now!

@cdjackson
Copy link
Contributor

So looks like that's caused by changes in newer versions of the binding.

By this do you mean you think the binding is removing the device from the network? I don't think this is the case, and I don't think there has been anything like this added to the binding.

If this is what you think is happening though, then I'd need to see a log showing how the binding is removing the device.

@weakfl
Copy link
Contributor Author

weakfl commented Jan 29, 2018

By this do you mean you think the binding is removing the device from the network? I don't think this is the case, and I don't think there has been anything like this added to the binding.

I first thought the device is initiating the removal, but looks like it's indeed triggered by the binding. Must have been introduced after 2.3.0~20180124001658-1.

If this is what you think is happening though, then I'd need to see a log showing how the binding is removing the device.

2018-01-29 14:17:17.221 [DEBUG] [bee.handler.ZigBeeCoordinatorHandler] - D85DEF11A1002A12: ZigBee leave command
2018-01-29 14:17:17.224 [DEBUG] [bee.handler.ZigBeeCoordinatorHandler] - D85DEF11A1002A12: ZigBee leave command to 29794
2018-01-29 14:17:17.240 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - TX CMD: ManagementLeaveRequest [0/0 -> 29794/0, cluster=0034, TID=DE, deviceAddress=D85DEF11A1002A12, removeChildrenRejoin=false]
2018-01-29 14:17:17.250 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - TX APS: ZigBeeApsFrame [sourceAddress=0/0, destinationAddress=29794/0, profile=0000, cluster=52, addressMode=DEVICE, radius=31, sequence=222, payload=00 12 2A 00 A1 11 EF 5D D8 00]
2018-01-29 14:17:17.256 [DEBUG] [ngle.telegesis.ZigBeeDongleTelegesis] - Telegesis send: TelegesisSendUnicastCommand [address=29794, sourceEp=0, destEp=0, profileId=0, clusterId=52, messageData=00 12 2A 00 A1 11 EF 5D D8 00, messageId=null]
2018-01-29 14:17:17.262 [DEBUG] [gesis.internal.TelegesisFrameHandler] - TX Telegesis queue: 2
2018-01-29 14:17:17.279 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - D85DEF11A1002A12: Handler dispose.
2018-01-29 14:17:17.283 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - D85DEF11A1002A12: Polling stopped
2018-01-29 14:17:17.286 [DEBUG] [converter.ZigBeeConverterSwitchOnoff] - D85DEF11A1002A12: Closing device on/off cluster
2018-01-29 14:17:17.311 [DEBUG] [org.openhab.binding.zigbee          ] - ServiceEvent UNREGISTERING - {org.eclipse.smarthome.core.thing.binding.firmware.FirmwareUpdateHandler}={service.id=269, service.bundleid=206, service.scope=singleton} - org.openhab.binding.zigbee
2018-01-29 14:17:17.331 [DEBUG] [gesis.internal.TelegesisFrameHandler] - TELEGESIS RX: Data 4F 4B
2018-01-29 14:17:17.332 [DEBUG] [gesis.internal.TelegesisFrameHandler] - Telegesis command complete: TelegesisSendMulticastCommand [radius=31, address=65532, sourceEp=0, destEp=0, profileId=0, clusterId=54, messageData=00 00 01, status=SUCCESS]
2018-01-29 14:17:17.333 [DEBUG] [gesis.internal.TelegesisFrameHandler] - TX Telegesis: TelegesisSendUnicastCommand [address=0, sourceEp=0, destEp=0, profileId=0, clusterId=54, messageData=00 00 01, messageId=null]
2018-01-29 14:17:17.337 [DEBUG] [gesis.internal.TelegesisFrameHandler] - TELEGESIS TX: Data 41 54 2B 53 45 4E 44 55 43 41 53 54 42 3A 30 33 2C 30 30 30 30 2C 30 30 2C 30 30 2C 30 30 30 30 2C 30 30 33 36 0D 00 00 01 0D 0A
2018-01-29 14:17:17.497 [DEBUG] [gesis.internal.TelegesisFrameHandler] - TELEGESIS RX: Data 53 45 51 3A 39 44
2018-01-29 14:17:17.499 [DEBUG] [gesis.internal.TelegesisFrameHandler] - TELEGESIS RX: Data 4F 4B
2018-01-29 14:17:17.501 [DEBUG] [gesis.internal.TelegesisFrameHandler] - Telegesis command complete: TelegesisSendUnicastCommand [address=0, sourceEp=0, destEp=0, profileId=0, clusterId=54, messageData=00 00 01, messageId=157, status=SUCCESS]
2018-01-29 14:17:17.503 [DEBUG] [gesis.internal.TelegesisFrameHandler] - TX Telegesis: TelegesisSendUnicastCommand [address=29794, sourceEp=0, destEp=0, profileId=0, clusterId=52, messageData=00 12 2A 00 A1 11 EF 5D D8 00, messageId=null]
2018-01-29 14:17:17.507 [DEBUG] [gesis.internal.TelegesisFrameHandler] - TELEGESIS TX: Data 41 54 2B 53 45 4E 44 55 43 41 53 54 42 3A 30 41 2C 37 34 36 32 2C 30 30 2C 30 30 2C 30 30 30 30 2C 30 30 33 34 0D 00 12 2A 00 A1 11 EF 5D D8 00 0D 0A
2018-01-29 14:17:17.512 [DEBUG] [gesis.internal.TelegesisFrameHandler] - TELEGESIS RX: Data 52 58 3A 30 30 30 30 2C 30 30 30 30 2C 30 30 2C 30 30 2C 38 30 33 36 2C 30 32 3A 00 00 2C 30 30 2C 46 46
2018-01-29 14:17:17.514 [DEBUG] [gesis.internal.TelegesisFrameHandler] - Telegesis event received: TelegesisReceiveMessageEvent [ieeeAddress=null, networkAddress=0, profileId=0, destinationEp=0, sourceEp=0, clusterId=32822, messageData=00 00, rssi=0, lqi=255]
2018-01-29 14:17:17.516 [DEBUG] [ngle.telegesis.ZigBeeDongleTelegesis] - Telegesis RX: TelegesisReceiveMessageEvent [ieeeAddress=null, networkAddress=0, profileId=0, destinationEp=0, sourceEp=0, clusterId=32822, messageData=00 00, rssi=0, lqi=255]
2018-01-29 14:17:17.518 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - RX APS: ZigBeeApsFrame [sourceAddress=0/0, destinationAddress=0/0, profile=0000, cluster=32822, addressMode=null, radius=0, sequence=0, payload=00 00]
2018-01-29 14:17:17.521 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - RX CMD: ManagementPermitJoiningResponse [0/0 -> 0/0, cluster=8036, TID=NULL, status=SUCCESS]
2018-01-29 14:17:17.531 [DEBUG] [gesis.internal.TelegesisFrameHandler] - TELEGESIS RX: Data 41 43 4B 3A 39 44
2018-01-29 14:17:17.532 [DEBUG] [gesis.internal.TelegesisFrameHandler] - Telegesis event received: TelegesisAckMessageEvent [messageId=157]
2018-01-29 14:17:17.534 [DEBUG] [ngle.telegesis.ZigBeeDongleTelegesis] - Telegesis RX: TelegesisAckMessageEvent [messageId=157]
2018-01-29 14:17:17.536 [DEBUG] [ngle.telegesis.ZigBeeDongleTelegesis] - Unhandled Telegesis Frame: TelegesisAckMessageEvent [messageId=157]
2018-01-29 14:17:17.643 [DEBUG] [gesis.internal.TelegesisFrameHandler] - TELEGESIS RX: Data 53 45 51 3A 39 46
2018-01-29 14:17:17.646 [DEBUG] [gesis.internal.TelegesisFrameHandler] - TELEGESIS RX: Data 4F 4B
2018-01-29 14:17:17.648 [DEBUG] [gesis.internal.TelegesisFrameHandler] - Telegesis command complete: TelegesisSendUnicastCommand [address=29794, sourceEp=0, destEp=0, profileId=0, clusterId=52, messageData=00 12 2A 00 A1 11 EF 5D D8 00, messageId=159, status=SUCCESS]
2018-01-29 14:17:22.332 [DEBUG] [gesis.internal.TelegesisFrameHandler] - TELEGESIS RX: Data 4E 41 43 4B 3A 39 46
2018-01-29 14:17:22.349 [DEBUG] [gesis.internal.TelegesisFrameHandler] - Telegesis event received: TelegesisNackMessageEvent [messageId=159]
2018-01-29 14:17:22.352 [DEBUG] [ngle.telegesis.ZigBeeDongleTelegesis] - Telegesis RX: TelegesisNackMessageEvent [messageId=159]
2018-01-29 14:17:22.355 [DEBUG] [ngle.telegesis.ZigBeeDongleTelegesis] - Unhandled Telegesis Frame: TelegesisNackMessageEvent [messageId=159]
2018-01-29 14:17:33.523 [DEBUG] [gesis.internal.TelegesisFrameHandler] - TELEGESIS RX: Data 4E 4F 44 45 4C 45 46 54 3A 37 34 36 32 2C 44 38 35 44 45 46 31 31 41 31 30 30 32 41 31 32
2018-01-29 14:17:33.527 [DEBUG] [gesis.internal.TelegesisFrameHandler] - Telegesis event received: TelegesisDeviceLeftNetworkEvent [networkAddress=29794, ieeeAddress=D85DEF11A1002A12]
2018-01-29 14:17:33.530 [DEBUG] [ngle.telegesis.ZigBeeDongleTelegesis] - Telegesis RX: TelegesisDeviceLeftNetworkEvent [networkAddress=29794, ieeeAddress=D85DEF11A1002A12]
2018-01-29 14:17:33.532 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - D85DEF11A1002A12: Node 29794 is removed from the network
2018-01-29 14:17:33.589 [DEBUG] [nal.ZigBeeNetworkStateSerializerImpl] - Saving ZigBee network state: done.

Do you need a longer part of the log?

@cdjackson
Copy link
Contributor

Do you need a longer part of the log?

Maybe the part before this - this just seems to show ZigBee stuff - not the information from the core as to why this was triggered.

There's two ways that I know of for a device to leave the network and that's if the thing is deleted, or if a specific leave command is sent. Neither of these are new - the binding has always done this, but the question is why it now seems to have started sending these requests. I assume you are not deleting the thing in HABmin/PaperUI or sending configuration somewhere that is causing this??

@weakfl
Copy link
Contributor Author

weakfl commented Jan 29, 2018

I assume you are not deleting the thing in HABmin/PaperUI or sending configuration somewhere that is causing this??

Stupid me!!! Of course I copied exactly the part of the log where I actually removed the thing myself to make it responsive again 😞
Sorry for the confusion!

So there's nothing coming from the binding, the device is just leaving:
zigbee_issue124_20180129135414.log

@cdjackson
Copy link
Contributor

As you say, there's no real clue what's happening here. I assume that before it decided to leave it had been working ok?

I wonder if the device leaves if there's no communications for a certain time? I think it would be strange as it means if you have a power cut for many hours, then all your devices leave. One thing I did change recently was the default polling period- it was increased recently to further reduce needless network congestion.

Otherwise I'm not sure why it would just leave...

@weakfl
Copy link
Contributor Author

weakfl commented Jan 30, 2018

I assume that before it decided to leave it had been working ok?

I'm not sure yet, but it looks to be leaving only when it is also offline in PaperUI. Or at least it will leave pretty fast after a OH restart when it seems to lose connectivity for whatever reason. I still have to capture a debug log after it has been working ok though. It's currently working fine after I've re-added it yesterday.

I wonder if the device leaves if there's no communications for a certain time? [...] One thing I did change recently was the default polling period- it was increased recently to further reduce needless network congestion.

I don't think the polling interval is an issue as it's still working. If it was an issue I would expect it to have left the network meanwhile.

Did you notice anything unusual in the startup log? I don't understand why the device is shown as offline in PaperUI after a restart but On/Off commands are still showing up in the log (although not firing any OH events).

@cdjackson
Copy link
Contributor

From a quick look at the startup log, it seems that the device initially didn't respond, but once you started pressing the button on the device, it started to respond again (although not 100% reliably). Possibly this is a battery device management issue (it seems battery devices are a pain everywhere!).

@weakfl
Copy link
Contributor Author

weakfl commented Jan 31, 2018

Thanks for looking into it Chris. Device is still working atm so I'll let debug logging on and see how things develop. Maybe a firmware update will help, we'll see.

How exactly do you examine these logs? I'm always looking for the IEEE and network address of the device I'm interested in but couldn't spot anything unusual. What else should I look for?

@cdjackson
Copy link
Contributor

cdjackson commented Jan 31, 2018 via email

@weakfl
Copy link
Contributor Author

weakfl commented Feb 5, 2018

Thanks for the info Chris.

Regarding the issue at hand: the device is still responsive after a week.
So I'm not sure how to proceed here. Atm it looks like the worst case scenario is that I have to re-add the device on OH restarts.

If it is a general issue with management of battery powered devices we should see similar problems with the ST motion sensors, which is not the case.

So I think we can close here?

@weakfl
Copy link
Contributor Author

weakfl commented Feb 7, 2018

@cdjackson thoughts?

@cdjackson
Copy link
Contributor

Ok, let's close this and if there's future issues or comments we can open another issue. I'm sure there are improvements needed in the binding but we should keep issues specific if possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants