Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let device finish All-Link cleanup process #84

Closed
wants to merge 2 commits into from

Conversation

mstovenour
Copy link
Contributor

Sending commands to a device while it is performing the All-Link
cleanup process will at best abort that cleanup. In the case
of battery powered devices, they may competely ignore the requests

Changes will wait for All-Link cleanup report (success or failure)
before sending queued commands to battery powered devices.

Resolves #83

Sending commands to a device while it is performing the All-Link
cleanup process will at best abort that cleanup.  In the case
of battery powered devices, they may competely ignore the requests

Changes will wait for All-Link cleanup report (success or failure)
before sending queued commands to battery powered devices.

Resolves pyinsteon#83
@teharris1
Copy link
Collaborator

Maybe I am not understanding this PR but it would seem to me the idea would be to run the queued commands if the current topic is not a clean up topic. The current code has the queued command run only when it is a clean-up topic. So I would think the == should be !=.

Do you have a real-world example of this or is this problem only theoretical?

The problem statement feels inconsistent with the solution. Battery-powered devices can only receive a command when they are awake, which means the device had to wake up first which would produce some kind of message like an on/off command. The initial message would never be a clean-up message. Once the original message fires, this will trigger the queued commands to process which would now run until complete. The cleanup commands would be a result of the original command rather than an original command. What would seem more appropriate here is to allow a sleep time between the wakeup time and the running of the queued commands.

@mstovenour
Copy link
Contributor Author

it would seem to me the idea would be to run the queued commands if the current topic is not a clean up topic

That is not the intent. The intent is to never run the queued commands when I know the device is trying to run it's cleanup process. The library should never send anything if it knows a device is in the cleanup process, not even a message for another device, since the PLM transmitted message will abort the device cleanup unfinished. I quoted that from the Insteon Developers Guide in the Issue I opened for this PR. The problem here is the library doesn't track the device state machine so I have no way to know if the device is or is not in the clean up process. A simple analog for that process ending is to rely on the final message, an all-link cleanup report broadcast. It is possible to miss this broadcast but then the next time the device is triggered gives another roll of the dice.

Do you have a real-world example of this or is this problem only theoretical?

The motion I reference in the Issue is a real world example. When I first switched to your new library every motion event was banging on this motion sensor and it never once responded. After the fix I moved on to other issues with this motion sensor. I would also add that there is no point in retrying the 2e, 0x00, 0xff command after 2-3 seconds. The device is asleep again by then. If the first try doesn't work, retries will not either.

The problem statement feels inconsistent with the solution. Battery-powered devices can only receive a command when they are awake, which means the device had to wake up first which would produce some kind of message like an on/off command. The initial message would never be a clean-up message.

You are correct about the initial message, however the device sends multiple messages during the cleanup process and remains awake the whole time. When a device is triggered (e.g. motion sensed, button pushed, etc.) it sends a series of messages in the all-link cleanup process. This is what makes Insteon both "instant on" and reliable. 1st it sends an all-link broadcast which will activate every reponder that hears the broadcast. 2nd it sends an all-link direct message to every device for which it is a controller. 3rd it finally sends an all-link clean up report. There is an example of this cleanup process in the whitepaper on page 49-50. Unfortunately they omitted the example of the cleanup report but the text on page 50 does describe it.

The PR waits for that last message in the process. The device is still awake at that point. This is a reliable way to talk to a battery powered device. All you need to do is trigger the device, wait for the all-link cleanup process to finish, then send it some commands. On the rare occasion where that last broadcast is lost, then simply try triggering again.

Once the original message fires, this will trigger the queued commands to process which would now run until complete. The cleanup commands would be a result of the original command rather than an original command. What would seem more appropriate here is to allow a sleep time between the wake up time and the running of the queued commands.

A delay is a possible fix and in fact that is exactly how I got past this issue initially while learning the library. However the required time is non-deterministic, exactly how long to wait? 2.6 seconds worked for me initially. Wait just 100-200ms too long and the device is asleep again. The ALDB was empty so the device only sent the initial and final broadcast taking about 2.6 seconds. However once the library added the default links (with a controller entry for the PLM), device now needed to send the all-link direct command to the PLM; and the timer needed to be extended. Link more devices to this motion and more time is needed. Also the all-link cleanup messages are acked by a responder. So the device might retransmit adding even more variability to the process. The correct solution is to wait until the device declares that it has finished the process (i.e. the all-link cleanup report).

@teharris1
Copy link
Collaborator

Thanks for the explanation and my apologies for not reading the original error you posted.

The problem here is the library doesn't track the device state machine so I have no way to know if the device is or is not in the clean up process.

I agree with this statement so I propose we address this. Here is my recommendation:

  1. Create a new property of DeviceBase called cleanup_in_process
  2. Create a CleanUpManager that will track the state of the cleanup. This manager will listen for outbound messages from a device (i.e. on/off etc.) and start tracking the cleanup process. The cleanup process is a known process based on the ALDB. In other words, if there are 3 controller recorded for group 1 in the ALDB, there will be 3 cleanup messages. When the last cleanup message is sent, the manager will set the cleanup_in_process to False.
  3. Change either DeviceBase or ModemBase to wait for cleanup_in_process to be False before sending any message to it.

Thoughts?

@teharris1
Copy link
Collaborator

Thinking about the third point, this could be in OutboundHandlerBase. This way no device message will be sent without confirming cleanup_in_process.

@mstovenour
Copy link
Contributor Author

I really like your idea to create a state that tracks the cleanup process for each device. For point 2 though, the PLM will only see two or three messages below. The PLM never sees the direct cleanup messages to from a device to other linked devices.

  1. The device will send an All-Link group broadcast that the PLM hears
  2. "If" the PLM is a responder of the all-link recall group (as recorded by the device ALDB); then the PLM will receive one all-link direct message (source device, target PLM). The default links that are added ensure this is received. Note the PLM will not see the other direct messages to other linked device.
  3. The device will send an All-Link broadcast cleanup status report that the PLM hears

I think the state should be set true when 1 occurs and set false when 3 occurs. I don't think 2 should have any influence on the state.

We should then have a timeout that will set the state to false after a period of time to catch the case when the broadcast cleanup status report is missed. Note this can happen if there is any other traffic on the network because the device will abort the clean up process.

@teharris1
Copy link
Collaborator

Good call on the All-Link broadcast clean up. That would be the notification for the end of the cleanup process.

Yes, the timeout is important in case we miss a message. Broadcast messages are notorious for being lost. They do not repeat like direct messages. The timeout can likely be set based on the number of links in the ALDB for the group.

Another option for the state tracker is an asyncio.Lock rather than a bool. This will allow for async with device.cleanup_in_process (which sould then be cleanup_lock or something like that.

@teharris1
Copy link
Collaborator

How are you coming on this? I want to include it in release 1.1 which I am hoping is out in about 4 weeks. This will be a quality focused release with some feature enhancements.

@teharris1 teharris1 changed the base branch from master to dev May 17, 2021 13:57
@teharris1
Copy link
Collaborator

@mstovenour Any update on this?

@teharris1 teharris1 deleted the branch pyinsteon:dev April 29, 2022 19:27
@teharris1 teharris1 closed this Apr 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Battery powererd device never responds to polling requests
2 participants