-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatic Leader ID timeout selection #9032
Conversation
…how the Leader Id propagation trickle timer is managed. It implements the algorithm proposed in SPEC-1167 for automatic management of the maximum advertisement interval with a modification to set the lower bound on the calculated value to 12.
…mer header as it was not implemented or used.
Size Report of OpenThread
|
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #9032 +/- ##
========================================
Coverage 82.37% 82.38%
========================================
Files 546 547 +1
Lines 74599 74781 +182
========================================
+ Hits 61451 61607 +156
- Misses 13148 13174 +26
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mmb-davidsmith , thanks for submitting this. Overall, this looks great. I commented on a number of minor nits.
…fically covers the following: - converts `TrickleTimer::GetLastTimerStart` to return a value rather than taking a parameter for return - makes `TrickleTimer::GetLastTimerStart` private - removes the unused `TrickleTimer::GetIntervalMin` and `TrickleTimer::GetIntervalMax` - adjusts the mechanism for iterating over the router table - moves to use `OT_MIN` and `OT_MAX` for the calculation of the `advertiseIntervalMax` value.
@jwhui, in terms of the issues with the The markdown lint check failed to pull a URL which loads when I go manually, so not sure if there's an adjustment needed there for retries or something in the script. |
It looks like the added preprocessor statements are tripping up |
I've remove the preprocessor statement but it still seems to be failing. From what I can tell it's now something related to a missing javascript library. |
The current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mmb-davidsmith.
Some suggestions below.
src/core/thread/mle_router.cpp
Outdated
} | ||
|
||
uint32_t advertiseIntervalMax = OT_MIN(kAdvertiseIntervalMax, (neighbors + 1) * 4); | ||
advertiseIntervalMax = OT_MAX(12, advertiseIntervalMax); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to define constants for 12
and 4
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll define these in src/core/thread/mle_types.hpp
mTimeInInterval = aIntervalMax; | ||
} | ||
|
||
TimerMilli::FireAt(newFireTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fully agree that if current mInterval
is longer than new aIntervalMax
we want to emulate as if we used the new aIntervalMax
when we scheduled the current interval. But I feel we may need to consider more cases here. Let me work out some cases (using a diagram) and will post here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am not mistaken, (under mMode == kModeTrickle
) if we have mPhase == kBeforeRandomTime
and mTimeInInterval < aIntervalMax
then the new calculated newFireTime
may be incorrect.
An example:
|<------mInterval = 20 sec --------------------------->|
| | |
|<------- new aIntervalMax = 15 sec ----->| |
| | | |
|<-- mTimeInInterval=11 sec -->| | |
| | |
^
Now
mInterval = 20 sec
,mTimeInInterval = 11 sec
, newaIntervalMax = 15 sec
- Since
mTimeInInterval < aIntervalMax
we keepmTimeInInterval
as is (will remain11
). - We set the
mInterval
to15
. - Since we are in
mPhase == kBeforeRandomTime
,GetLastTimerStart()
will return the start of current trickle interval newFireTime
will point to 15 sec after the start of trickle interval.- When the timer fires, we are still in
kBeforeRandomTime
phase so we do the following:
case kBeforeRandomTime:
// We reached end of random `mTimeInInterval` (aka `t`)
// within the current interval. Trickle timer invokes
// handler if and only if the counter is less than the
// redundancy constant.
mPhase = kAfterRandomTime;
TimerMilli::Start(mInterval - mTimeInInterval);
VerifyOrExit(mCounter < mRedundancyConstant);
break;
Which then start the timer for remaining time to end of current interval calculated as mInterval - mTimeInInterval = 15 - 11 = 4
, so the overall interval becomes 15 + 4 = 19
second long which is not what we want.
I think in this case,
- The timer is already scheduled to fire at
mTimeInInterval
after start (or timet
) which we keep the same, so we don't need to change the timer firetime. - We only need to update the
mInterval
toaIntervalMax
so that after timer fires, we apply the shorter interval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to capture the possible cases and list them below. Once we're aligned on the desired behaviour, I'll ensure that my code is correctly implementing it.
-
aIntervalMax
>=mInterval
- doesn't matter timer type, do nothing except updatemIntervalMax
and let it affect the next run(question: what happens if
aIntervalMax
is selected such that the currently running timer will fire at a time less thanaIntervalMax / 2
in Trickle mode. I think technically we're out of spec, but I would be tempted to just leave it to resolve after the next timer run) -
aIntervalMax
<mInterval
- plain timer mode - just reschedule toaIntervalMax
-
aIntervalMax
<mInterval
- trickle mode- in
kBeforeRandomTime
aIntervalMax
<mTimeInInterval
- need to setmInterval
=aIntervalMax
& re-schedulemTimeInInterval
to fire ataIntervalMax
(from start of interval)aIntervalMax
>=mTimeInInterval
- need to setmInterval
=aIntervalMax
- in
kAfterRandomTime
aIntervalMax
<mTimeInInterval
- not possible asmTimeInInterval
is the transition point tokAfterRandomTime
aIntervalMax
>=mTimeInInterval
- need to setmInterval
=aIntervalMax
& re-schedule timer to fire ataIntervalMax
(from start of interval)
- in
Please let me know if you agree with my logic above or if I should make updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Looks good. 👍
I think ii.a
is possible, like this:
|<------ mInterval = 30--------------------------->|
|<-- mTimeInInterval=18 sec -->| ^ |
| | ^ |
|<-- aMaxInterval = 10 ->| | ^ |
| ^ |
Now
But I think we need to do the same in this case as in ii.b
change mInterval
and re-schedule to fire aIntervalMax
from last interval start (which will then fire immediately)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may be able to combine the cases:
-
The only case that seems to be different is
i.b
. -
In all other cases we can do the same set of things:
- Change
mInterval = aIntervalMax
- Change
mTimeInInterval = aIntervalMax
(it only matters ini.a
case, ignored in other cases and harmless to change) - And re-schedule timer to fire
aIntervalMax
from the last start interval. - And we keep
mPhase
as is.
- Change
Does this make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the diagram for ii.a
, that makes it clear that it is possible. As you say, taking the same approach as ii.b
should work.
I think that even in i.b
we need to set mInterval
= aIntervalMax
(according to the logic above)
So with all of this worked through, I think the only change necessary is to update GetLastTimerStart()
to GetLastIntervalStart()
- `TrickleTimer::GetLastTimerStart` and `TrickleTImer::SetIntervalMax` handle kModePlainTimer now - in `TrickleTimer::GetLastTimerStart` the variable `aStartTime` is renamed to `startTime` as it's no longer a parameter - documentation for the private method is removed - `MleRouter::RecalculateAdvertiseInterval`: - removes a superfluous scope - declares all variables at the start of the function - uses existing function `RouterTable::GetNeighborCount` to count the number of neighbor routers - uses the `Min` and `Max` helper functions which are more modern than the `OT_MIN` and `OT_MAX` functions - uses named constants for the multiplier and lower bound - adds constant definitions for the multiplier and lower bound Outstanding Items: - the `TrickleTimer::SetIntervalMax` will need to have adjustments done based on the conclusion of the discussion on the draft PR. - depending on outcome of `TrickleTimer::SetIntervalMax`, the `TrickleTimer::GetLastTimerStart` may be renamed to `TrickleTimer::GetLastIntervalStart` and updated to always return the start of the interval vs the specific timer within the interval.
Are there any resources about how to add new unit tests / integration tests. Ultimately I would like to figure out how to add some unit tests for the trickle timer changes and some simulator tests for the overall functionality. Am I correct in my understanding that the unit tests would need to be a part of this PR, and then for the simulator tests, I would need to add the necessary functions in https://github.com/openthread/ot-ns/tree/main/pylibs/stress_tests and then reference them in a PR in this repo which updated the matrix on the workflow? |
We don't have existing unit tests for the trickle timer. The closest we have is the timer unit tests. In short, we basically implement a mock platform and test the behavior the the specific component. For functional tests, one option is to follow the scripts in tests/scripts/thread-cert. This would allow you to include the tests in this PR. We generally prefer to have tests included with the PR, but I admit there are many PRs where we do not enforce this. |
…sed in the pull request. The primary change is that `GetLastTimerStart` needed to be changed to `GetLastIntervalStart` to ensure all calculations were being started from the beginning of trickle interval instead of the specific phase.
@jwhui - I'm just trying to write some unit tests for the trickle timer and wanted to understand this bit of code from
My understanding of C math is that the only time that this would ever select Should this be simplified? |
@mmb-davidsmith , thanks for pointing this out. I reviewed the code as well and I agree with your conclusions. I think this might have been the result of avoiding a div-by-0 warning from static analyzer on this line of code: openthread/src/core/common/random.cpp Line 148 in 4cba902
But I think the |
…ckle modes. This covers the standard operation of both, but does not yet test starting the set max interval functionality.
@jwhui / @abtink - I've made a new file for implementing my trickle timer tests. Unfortunately I'm getting issues with the pretty tool that I'm not understanding. Specifically it's saying
I have tried putting |
I think tool is suggesting for the method to be declared as uint32_t GetFiredCounter(void) const { return mFiredCounter; } |
Thanks. C++ isn't a language I use regularly so I'm a little less familiar with some of the constructs. |
@jwhui Is it expected that the |
This occurs when codecov API limits are exceeded. Waiting a bit and re-running the job typically resolves the check. |
@@ -169,6 +177,8 @@ class TrickleTimer : public TimerMilli | |||
void FireAtIfEarlier(void) {} | |||
void GetFireTime(void) {} | |||
|
|||
TimeMilli GetLastIntervalStart(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declare as const
?
TimeMilli GetLastIntervalStart(void); | |
TimeMilli GetLastIntervalStart(void) const; |
@@ -53,6 +53,78 @@ TrickleTimer::TrickleTimer(Instance &aInstance, Handler aHandler) | |||
{ | |||
} | |||
|
|||
TimeMilli TrickleTimer::GetLastIntervalStart(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TimeMilli TrickleTimer::GetLastIntervalStart(void) | |
TimeMilli TrickleTimer::GetLastIntervalStart(void) const |
TimeMilli newFireTime = GetLastIntervalStart(); | ||
newFireTime += aIntervalMax; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new line after variable declarations?
TimeMilli newFireTime = GetLastIntervalStart(); | |
newFireTime += aIntervalMax; | |
TimeMilli newFireTime = GetLastIntervalStart(); | |
newFireTime += aIntervalMax; |
mTimeInInterval = aIntervalMax; | ||
} | ||
|
||
TimerMilli::FireAt(newFireTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in i.b
case we should not re-schedule the timer.
mInterval = aIntervalMax; | ||
if (mTimeInInterval > aIntervalMax) | ||
{ | ||
mTimeInInterval = aIntervalMax; | ||
} | ||
|
||
TimerMilli::FireAt(newFireTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is what I would suggest:
- In all cases we do
mInterval = aIntervalMax;
. - The
if()
is fori.b
case where we exit (and won't change the timer). - In all other cases, we
mTimeInInterval = aIntervalMax
(only matters for1.a
case but harmless to do in other cases), and re-schedule the timer.
mInterval = aIntervalMax; | |
if (mTimeInInterval > aIntervalMax) | |
{ | |
mTimeInInterval = aIntervalMax; | |
} | |
TimerMilli::FireAt(newFireTime); | |
mInterval = aIntervalMax; | |
if ((mPhase == kBeforeRandomTime)) && (aIntervalMax >= mTimeInInterval)) | |
{ | |
ExitNow(); | |
} | |
mTimeInInterval = aIntervalMax; | |
TimerMilli::FireAt(newFireTime); |
VerifyOrExit(IsRouterOrLeader()); | ||
|
||
neighbors = Get<RouterTable>().GetNeighborCount(); | ||
advertiseIntervalMax = Min(kAdvertiseIntervalMax, (neighbors + 1) * kNeighborAdvertiseIntervalMultiplier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can use Get<RouterTable>().GetNeighborCount()
directly here (and not define a local neighbors
variable)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also avoid assigning the uint8 return value into the uint32 first, which isn't needed.
Note that there was discussion on only using the Router neighbors that have Link Quality >= 2 for this count.
@@ -0,0 +1,314 @@ | |||
/* | |||
* Copyright (c) 2016, The OpenThread Authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Copyright (c) 2016, The OpenThread Authors. | |
* Copyright (c) 2023, The OpenThread Authors. |
@@ -494,6 +494,11 @@ class MleRouter : public Mle | |||
mDiscoveryRequestCallback.Set(aCallback, aContext); | |||
} | |||
|
|||
/** | |||
* This method recalculates the MLE Advertisement Trickle timer interval based on SPEC-1167 algorithm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the main
branch now all the wording on "This method" and "This function" have been removed.
Also the description is incomplete, because the function also starts the Trickle timer in case it wasn't yet running.
That seems important enough to also have it stated in the function's name.
E.g. RecalculateAdvertiseIntervalAndStartAdvertising()
@@ -881,6 +881,8 @@ void RouterTable::HandleTableChanged(void) | |||
#if OPENTHREAD_CONFIG_HISTORY_TRACKER_ENABLE | |||
Get<Utils::HistoryTracker>().RecordRouterTableChange(); | |||
#endif | |||
|
|||
Get<Mle::MleRouter>().RecalculateAdvertiseInterval(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose that the Adv Trickle timer was not running (for whatever reason - it seems to be possible, looking at the code) and an event comes in that the router table is changed. Is it really intended here to start the Trickle timer now? (which is a side effect of RecalculateAdvertiseInterval()
)
If yes, why wasn't it started here in the original code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took care of this in new PR -> #9307.
Submitted
|
Closing as @abtink has submitted a PR which supersedes this PR. |
This commit adds an initial implementation of the proposed change to how the Leader Id propagation trickle timer is managed. It implements the algorithm proposed in SPEC-1167 for automatic management of the maximum advertisement interval with a modification to set the lower bound on the calculated value to 12.