fix: Align UnhealthyInstance alarm period with underlying metric period
GuUnhealthyInstancesAlarm is built on top of the UnHealthyHostCount
metric, which the load balancer measures and publishes at 60-second
intervals.

Setting the alarm period to 5 minutes requires CloudWatch to
aggregate the underlying metric into 5-minute buckets (five 1-minute
data points each) and take a statistic over each bucket (in our case,
the maximum). The output of this aggregation is the set of `alarm
data points` that CloudWatch uses to decide whether the alarm is in
the ALARM state.
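To make the bucketing concrete, here is a simplified TypeScript model. It assumes one sample per minute and fixed, non-overlapping buckets, which is a simplification of how CloudWatch actually aggregates; `bucketMax` and the sample data are hypothetical and not part of CloudWatch or this repository.

```typescript
// Simplified model (an assumption, not CloudWatch's real implementation):
// aggregate 1-minute samples into fixed, non-overlapping buckets.

/** Hypothetical helper: split `samples` into consecutive buckets of `size`
 *  samples and keep the maximum of each bucket (the Maximum statistic).
 *  A trailing partial bucket is kept as-is. */
function bucketMax(samples: number[], size: number): number[] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const buckets: number[] = [];
  for (let start = 0; start < samples.length; start += size) {
    // slice() is never empty here because start < samples.length.
    buckets.push(Math.max(...samples.slice(start, start + size)));
  }
  return buckets;
}

// Ten minutes of made-up UnHealthyHostCount samples, one per minute.
const perMinute: number[] = [0, 0, 1, 0, 0, 0, 0, 0, 0, 2];

// 5-minute periods: each alarm data point is the max of five samples.
const fiveMinutePoints = bucketMax(perMinute, 5); // [1, 2]

// 1-minute periods: each alarm data point is exactly one sample.
const oneMinutePoints = bucketMax(perMinute, 1); // identical to perMinute
```

With 5-minute buckets, the single unhealthy minute at index 2 turns the whole first bucket into a breaching data point; with 1-minute periods, the alarm data points map one-to-one onto the metric.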

The results of this bucketing can be unstable, because CloudWatch
alarms are evaluated over a rolling window. This in turn makes the
alarm itself unstable: it is prone to false recoveries shortly after
it first fires, and to false alarms after it recovers.
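Whether the alarm fires comes down to an "M out of N" check: it alarms when at least `datapointsToAlarm` of the last `evaluationPeriods` alarm data points breach the threshold. The sketch below assumes the `threshold: 1`, `GREATER_THAN_OR_EQUAL_TO_THRESHOLD` and `TreatMissingData.NOT_BREACHING` settings from this alarm; `isInAlarm` is a hypothetical helper, not a CloudWatch API, and it ignores how CloudWatch schedules its evaluations.

```typescript
// Simplified "M out of N" evaluation (a sketch, not CloudWatch's implementation).
// Assumes threshold 1, GREATER_THAN_OR_EQUAL_TO_THRESHOLD and
// TreatMissingData.NOT_BREACHING, as configured for this alarm.

type Datapoint = number | undefined; // undefined models a missing data point

/** Hypothetical helper: true if at least `datapointsToAlarm` of the most
 *  recent `evaluationPeriods` data points are >= `threshold`. */
function isInAlarm(
  datapoints: Datapoint[], // oldest first, one per alarm period
  evaluationPeriods: number, // N
  datapointsToAlarm: number, // M
  threshold: number,
): boolean {
  const recent = datapoints.slice(-evaluationPeriods);
  // NOT_BREACHING: a missing data point never counts towards M.
  const breaching = recent.filter((d) => d !== undefined && d >= threshold).length;
  return breaching >= datapointsToAlarm;
}

// One hour of 1-minute data points: 30 unhealthy minutes, then 30 healthy ones.
const halfUnhealthy: Datapoint[] = [
  ...new Array<Datapoint>(30).fill(1),
  ...new Array<Datapoint>(30).fill(0),
];
const alarmAfterChange = isInAlarm(halfUnhealthy, 60, 30, 1); // true

// One fewer unhealthy minute is not enough.
const almostHalf: Datapoint[] = [
  ...new Array<Datapoint>(29).fill(1),
  ...new Array<Datapoint>(31).fill(0),
];
const noAlarm = isInAlarm(almostHalf, 60, 30, 1); // false
```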

Setting the alarm period to the same period as the underlying metric
keeps the two synchronised, which makes alarm conditions much more
stable.
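A quick arithmetic check on the values changed in this commit (taken from the diff of `ec2-alarms.ts`): both configurations look back over the same hour and require half of the data points to breach. `AlarmConfig` and `summarize` are illustrative names, not part of the codebase, and the last field assumes the Maximum statistic over 1-minute samples.

```typescript
// Before/after values from the diff of src/constructs/cloudwatch/ec2-alarms.ts.
// AlarmConfig and summarize are illustrative names, not part of the codebase.

interface AlarmConfig {
  periodMinutes: number;
  evaluationPeriods: number;
  datapointsToAlarm: number;
}

const beforeChange: AlarmConfig = { periodMinutes: 5, evaluationPeriods: 12, datapointsToAlarm: 6 };
const afterChange: AlarmConfig = { periodMinutes: 1, evaluationPeriods: 60, datapointsToAlarm: 30 };

function summarize(c: AlarmConfig) {
  return {
    // Total time the alarm looks back over (evaluationInterval in the diff).
    windowMinutes: c.periodMinutes * c.evaluationPeriods,
    // Share of data points in that window that must breach.
    breachingFraction: c.datapointsToAlarm / c.evaluationPeriods,
    // Assuming the Maximum statistic over 1-minute samples, a single
    // unhealthy minute is enough to make a whole period breach.
    minUnhealthyMinutes: c.datapointsToAlarm,
  };
}

const beforeSummary = summarize(beforeChange); // { windowMinutes: 60, breachingFraction: 0.5, minUnhealthyMinutes: 6 }
const afterSummary = summarize(afterChange); // { windowMinutes: 60, breachingFraction: 0.5, minUnhealthyMinutes: 30 }
```

Under this simplified model, the old configuration could alarm on as few as six unhealthy minutes spread across six 5-minute periods, while the new one needs 30 unhealthy minutes within the hour.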
jorgeazevedo committed Jan 27, 2022
1 parent a7c03c0 commit 54ef1ef
Showing 1 changed file with 3 additions and 3 deletions.
src/constructs/cloudwatch/ec2-alarms.ts (6 changes: 3 additions, 3 deletions)
```diff
@@ -63,8 +63,8 @@ export class GuUnhealthyInstancesAlarm extends GuAlarm {
   constructor(scope: GuStack, props: GuUnhealthyInstancesAlarmProps) {
     const alarmName = `Unhealthy instances for ${props.app} in ${scope.stage}`;
 
-    const period = Duration.minutes(5);
-    const evaluationPeriods = 12;
+    const period = Duration.minutes(1);
+    const evaluationPeriods = 60;
     const evaluationInterval = Duration.minutes(period.toMinutes() * evaluationPeriods).toHumanString();
 
     const alarmDescription = `${props.app}'s instances have failed healthchecks several times over the last ${evaluationInterval}.
@@ -80,7 +80,7 @@ export class GuUnhealthyInstancesAlarm extends GuAlarm {
       treatMissingData: TreatMissingData.NOT_BREACHING,
       threshold: 1,
       comparisonOperator: ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
-      datapointsToAlarm: 6,
+      datapointsToAlarm: 30,
       evaluationPeriods,
     };
     super(scope, AppIdentity.suffixText(props, "UnhealthyInstancesAlarm"), alarmProps);
```
