-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat gh project 3 heartbeat calculation improvement #89
Feat gh project 3 heartbeat calculation improvement #89
Conversation
enforce a stricter health check record update condition check on seq
if in the classic way(without device info), heartbeat with outdated sequence will be discarded
@@ -242,6 +251,9 @@ export class ConstantIntervalHeartbeatSyncStrategy implements HeartbeatSyncStrat | |||
this.result = HealthCheckResult.OnTime; | |||
// no old next heartbeat time for the first heartbeat, use the current arrival time. | |||
oldNextHeartbeatTime = heartbeatArriveTime; | |||
// no old record for reference, use the seq provided by the device. If the seq is | |||
// NaN, it means no sequence provided by the device, then use 0. | |||
oldSeq = (isNaN(deviceSyncInfo.sequence) && 0) || deviceSyncInfo.sequence; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(isNaN(NaN) && 0) || undefined
-> undefined
you should use a ternary operator instead
isNaN(deviceSyncInfo.sequence) ? 0 : deviceSyncInfo.sequence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -252,9 +264,17 @@ export class ConstantIntervalHeartbeatSyncStrategy implements HeartbeatSyncStrat | |||
nextHeartbeatTime: nextHeartbeatTime, | |||
syncState: HeartbeatSyncState.InSync, | |||
syncRecoveryCount: 0, // sync recovery count = 0 means no recovery needed | |||
seq: 1, // set to 1 because it is the first heartbeat | |||
// use the device sequence if it exists or 1 as the initial sequence | |||
seq: (!isNaN(deviceSyncInfo.sequence) && deviceSyncInfo.sequence) || 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ternary pls, this one is probably fine but the ternary is clearer.
!isNaN(deviceSyncInfo.sequence) ? deviceSyncInfo.sequence : 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
if (useDeviceSyncInfo) { | ||
delayCalculationMethod = 'by device send time'; | ||
// check if the sequence is in an incremental order compared to the data in the db | ||
// if not, the heartbea should be marked as outdated and to be dropped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heartbeat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
// in this situation, there are race conditions happening between some | ||
// heartbeat requests. The reason is one autoscale handler is taking much | ||
// longer to process another heartbeat and unable to complete before this | ||
// heartbeat arrives at the handler (by a parallel cloud function thread). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically: 'parallel cloud function process' (maybe even on a separate machine)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
// for the other hb (new seq > old seq + 1), the delay cannot be calculated | ||
// thus discarding the delay calculation, and trust it is an on-time hb | ||
else { | ||
delay = -1; // on-time hb must have a negative delay |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like it would be cleaner to just have null
and then handle the null
case later. -1ms
delay will be shown in the logs otherwise?
To properly account for this situation I guess you would need to store the last 3 heartbeat sequnce numbers in the database so you could retroactively count the missing sequence numbers?
10 -> 7 8 9 (+10): OK
12 -> 8 9 10 (+12): 1 missing...
11 -> 9 10 12 (+11): OK again
This is a simplified diagram for explanation. The final solution would need one record per heartbeat.. Could store the timestamp also so that could be (re)considered when the missing entries appear later.
(afterward delete them if the sequence number is lower than seq - 10 or so?)
This would allow you to calculate the loss count at each heartbeat based on multiple records from the db, rather than storing one record and overwriting each time and having issues with race conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea! I believe since we now have the sequence from the sender, we can also make good use of the sequence. Let me think it again.
// calculate delay using the arrive time (classic method) | ||
else { | ||
// NOTE: | ||
// heartbeatArriveTime: the starting time of the function execution, considerred as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
considerred -> considered
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
snapshot | ||
); | ||
// NOTE: strictly update the record when the sequence to update is equal to or greater | ||
// than the seq in the db ton ensure data not to fall back to old value in race conditions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ton ensure -> to ensure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
}, | ||
snapshot | ||
); | ||
// NOTE: strictly update the record when the sequence to update is equal to or greater |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keeping multiple heartbeat records instead of overwriting them might improve this situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea! The current heartbeat monitoring has been improved. Will take this idea into consideration next time.
…tings remove caching of loadSettings
Bumping version to 3.3.2
clean stale health check records before redo primary election
latest QA/DEV release created: 3.3.3-dev.1 |
core/autoscale-core.ts
Outdated
// the action for updating primary record | ||
let action: 'save' | 'delete' | 'noop' = 'noop'; | ||
let redoElection = false; | ||
let reloadPrimarRecord = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reloadPrimaryRecord ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
points += this.weightMethod1(rec); | ||
points += this.weightMethod2(rec, allHealthCheckRecords); | ||
points += this.weightMethod3(rec, allHealthCheckRecords); | ||
points += this.weightMethod4(rec); | ||
points += this.weightMethod5(rec, allHealthCheckRecords); | ||
points += this.weightMethod6(rec, allHealthCheckRecords); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the weight methods should have descriptive names instead of 1 2 3 4 5 6....
haveChecksumWeight() + sharedChecksumWeight() + groupedChecksumWeight() + ... etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some weigh methods have mixed and complicated conditions which are hard to have a self-descriptive name so I gave up naming them in that way. I prefer to keep their naming well-formatted and with a better jsdoc. It also does the job. Any good IDE could display the information properly.
@jamie-pate Let me know whether the self-descriptive naming is still strongly desired to apply here in this case or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
* @returns {number} point | ||
*/ | ||
weightMethod1(rec: HealthCheckRecord): number { | ||
return (rec.deviceChecksum !== null && 1) || 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer rec.deviceChecksum !== null ? 1 : 0
misusing boolean operators instead is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
432a578
to
971b741
Compare
207e7ff
into
staging_project_3_heartbeat_calculation_improvement
this PR target the issue #84
resolve #84 #85