
[Feature] SendAutoResubscribeRequest should by default detect when a node comes online again using DNS-SD/mDNS #29663

Open
Emill opened this issue Oct 9, 2023 · 3 comments

Comments

@Emill

Emill commented Oct 9, 2023

Feature description

I'm trying out SendAutoResubscribeRequest of the ReadClient to see how quickly subscriptions can be re-established when I power on the remote device again after it has been turned off for some time. I noticed this time can be extremely long, so I had to check the SDK implementation to see what is going on.

Currently, it seems the default resubscribe policy is to use a Fibonacci backoff algorithm that simply uses a timer to retry every now and then, with a maximum interval of as much as 1.5 hours(!), according to CHIP_RESUBSCRIBE_MAX_RETRY_WAIT_INTERVAL_MS. I'm not sure this will ever be reached, though, since CHIP_RESUBSCRIBE_MAX_FIBONACCI_STEP_INDEX is 14, but in any case the waiting times can be very long. During every attempt, an mDNS resolver runs for 45 seconds, and if nothing is found, it goes quiet again until the next retry.
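To make the timing concrete, here is a minimal Python sketch of how such a Fibonacci backoff could grow. The step index of 14 is the value quoted above; the 10-second multiplier, the exact cap in milliseconds, and the "fall back to the cap once the index is exceeded" branch are assumptions made for illustration, not something confirmed in this issue. Any jitter the SDK may apply is left out so the numbers stay deterministic.

```python
# Sketch only: NOT the SDK's code. Names and values marked ASSUMPTION are
# illustrative guesses; check the SDK configuration headers for the real ones.

MAX_FIBONACCI_STEP_INDEX = 14           # value quoted in the issue text
WAIT_TIME_MULTIPLIER_MS = 10_000        # ASSUMPTION: 10 s per Fibonacci unit
MAX_RETRY_WAIT_INTERVAL_MS = 5_538_000  # ASSUMPTION: roughly the "1.5 hours" cap


def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number, with fibonacci(0) == 0."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def max_wait_ms(num_retries: int) -> int:
    """Upper bound on the wait before retry number `num_retries`."""
    if num_retries <= MAX_FIBONACCI_STEP_INDEX:
        return fibonacci(num_retries) * WAIT_TIME_MULTIPLIER_MS
    # ASSUMPTION: beyond the last Fibonacci step, the fixed cap applies.
    return MAX_RETRY_WAIT_INTERVAL_MS


if __name__ == "__main__":
    for retry in range(0, 17):
        print(f"retry {retry:2d}: wait up to {max_wait_ms(retry) / 60_000:.1f} min")
```

Under these assumptions the wait before the 14th retry is already about an hour, which is the kind of delay described above.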

This is far too long for a good default user experience: when a user turns on a device, they expect the Matter controller to see the device become "reachable" within a couple of seconds.

DNS-SD over mDNS already supports detecting when a device gets powered on, since the device announces itself over the network when it comes online. If a DNS-SD resolver is active at that time, it will detect this immediately. If that UDP packet gets lost for some reason, the resolver will notice the device a bit later, because it sends mDNS queries every now and then.

My suggestion to mitigate this issue is to do the following when a "timeout" error occurs for a subscription:

  • Register a DNS-SD resolver with the node as its target.
  • As long as the device is present on the network according to DNS-SD (i.e. the operational node can be mapped to an IP address), run the backoff algorithm as usual.
  • As long as the device is not present on the network according to DNS-SD, don't attempt to connect to the node (there is no point, since we don't have an IP address anyway). As soon as the status goes from unresolved to resolved, immediately try to connect to it, bypassing or resetting the Fibonacci backoff.

The DNS-SD resolver could be unregistered when a session gets established.
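The steps above can be sketched as a small state machine. This is a hypothetical illustration in Python, not SDK code: the class name, method names, and the `actions` log are all invented here, and the real resolver, timer, and session plumbing are replaced by plain method calls so the transitions are easy to follow.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResubscribePolicy:
    """Hypothetical sketch of the proposed DNS-SD-aware resubscribe policy."""

    resolver_registered: bool = False
    resolved: bool = False
    retry_index: int = 0
    actions: List[str] = field(default_factory=list)  # log of decisions taken

    def on_subscription_timeout(self) -> None:
        # Step 1: start watching the node via DNS-SD.
        self.resolver_registered = True
        self.resolved = False
        self.retry_index = 0
        self.actions.append("register_resolver")

    def on_resolver_update(self, resolved: bool) -> None:
        if not self.resolver_registered:
            return
        was_resolved = self.resolved
        self.resolved = resolved
        if resolved and not was_resolved:
            # Unresolved -> resolved: reset the backoff and connect right away.
            self.retry_index = 0
            self.actions.append("connect_now")

    def on_backoff_timer(self) -> None:
        # Only retry while DNS-SD says the node has an address.
        if self.resolver_registered and self.resolved:
            self.retry_index += 1
            self.actions.append("connect_retry")

    def on_session_established(self) -> None:
        # The resolver is no longer needed once a session exists.
        self.resolver_registered = False
        self.actions.append("unregister_resolver")
```

For example, a timeout followed by a timer tick while the node is still unresolved records only `register_resolver`; the first `on_resolver_update(True)` then triggers `connect_now` without waiting for the next backoff interval.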

I know I can override the default implementation, but I had hoped this was already present as the default in the SDK.

Platform

all

Platform Version(s)

No response

Anything else?

No response

@bzbarsky-apple
Contributor

@Emill See #25091

@Emill
Author

Emill commented Oct 9, 2023

Yep, that's another way to solve the same issue.

I honestly find it a bit strange that SendAutoResubscribeRequest exists and has logic in it to reconnect and resubscribe. If you for some reason create two different subscriptions with auto-reconnect, then you suddenly have two independent reconnect flows active with independent timers if I'm not mistaken.

It would make more sense to have some automatic CASE session establisher that automatically (forever) tries to re-create a CASE session whenever the previous one drops. When it reconnects, I can invoke the subscribe commands I need, as well as perform other interactions.
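As a rough illustration of that idea, here is a hypothetical Python sketch in which one object owns the single reconnect flow for a node and every interested party just registers a callback. `SessionKeeper`, `connect`, and the listener names are invented for this example; what exactly counts as a session being lost is deliberately left to the caller.

```python
from typing import Callable, List


class SessionKeeper:
    """Hypothetical: one shared reconnect flow per node, with listeners."""

    def __init__(self, connect: Callable[[], bool]) -> None:
        self._connect = connect  # returns True when a session was established
        self._listeners: List[Callable[[], None]] = []
        self.connected = False
        self.attempts = 0

    def on_connected(self, listener: Callable[[], None]) -> None:
        """Register work to redo after every reconnect (e.g. a subscribe)."""
        self._listeners.append(listener)

    def notify_session_lost(self) -> None:
        """Called by whatever mechanism decides the session is unusable."""
        self.connected = False

    def try_reconnect(self) -> bool:
        """Make one attempt; a single timer would call this repeatedly."""
        if self.connected:
            return True
        self.attempts += 1
        if self._connect():
            self.connected = True
            for listener in self._listeners:
                listener()
        return self.connected
```

With two subscriptions registered as listeners, there is still only one attempt counter and one timer driving `try_reconnect`, rather than two independent flows.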

#26718 also makes sense to me. At the end of the day, I simply need my controller to be aware of changes happening on devices (all of them or some selection) on my fabric, no matter how many times a device goes offline and comes online.

Feel free to close the issue if you see it as a duplicate.

@bzbarsky-apple
Contributor

> Yep, that's another way to solve the same issue.

It's the same way, fundamentally. The "notice when a thing appears on the network" DNS-SD operation is called "browse".

> I honestly find it a bit strange that SendAutoResubscribeRequest exists and has logic in it to reconnect and resubscribe.

It's a convenience API for something that controllers have to do all the time....

> then you suddenly have two independent reconnect flows active with independent timers if I'm not mistaken.

That's correct. For the use cases this is intended for, there is only one subscription around anyway.

> whenever the previous one drops

A CASE session has no concept of "drops".


2 participants