Add minSize and maxSize of service scaleup and scaledown, deadletter queue threshold, info to doc (#211)

* Closes #208, #207, #206, #182, #149, #72, #15

(cherry picked from commit 8de328df79ccf52b8d612c625891555808c2fa0e)

* Add minSize as option

* update jest tests

* Change MinSize to 0

* update jest

* indentation and minSize to 0

* Add deadletterThreshold info in Worker-retry-cycle
tapaswenipathak committed Jun 12, 2018
1 parent 3ac7786 commit 5a92803
Showing 3 changed files with 51 additions and 11 deletions.
31 changes: 31 additions & 0 deletions docs/worker-retry-cycle.md
@@ -0,0 +1,31 @@
## Worker retry cycle

Any single Watchbot message will be attempted up to `deadletterThreshold` times. Each time the message fails it is put back into the queue with an increasing backoff interval before it can be attempted again. These intervals look like:

attempt number | backoff interval (s)
--- | ---
1 | 2
2 | 4
3 | 8
4 | 16
5 | 32
6 | 64
7 | 128
8 | 256
9 | 512
10 | --> dead letter queue

This means that after a failure on the 9th attempt, the message will be invisible for at least 512 seconds before it is retried. Providing an increasing backoff interval with an increasing number of failures helps alleviate load that your processing may be placing on external systems.
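
As a rough illustration, the table above can be reproduced with a short sketch. It assumes the interval is `2^attempt` seconds, which matches the table but is not confirmed here as Watchbot's exact implementation. `backoffSeconds` and `backoffSchedule` are hypothetical helper names.

```javascript
// Sketch only: assumes the backoff doubles with every failed attempt
// (2^attempt seconds), which matches the table above.
// backoffSeconds and backoffSchedule are hypothetical helpers.
function backoffSeconds(attempt) {
  return Math.pow(2, attempt);
}

function backoffSchedule(deadletterThreshold) {
  const schedule = [];
  // The final attempt has no backoff: if it fails, the message goes to
  // the dead letter queue instead of being retried.
  for (let attempt = 1; attempt < deadletterThreshold; attempt++) {
    schedule.push({ attempt, backoff: backoffSeconds(attempt) });
  }
  return schedule;
}

const schedule = backoffSchedule(10);
const totalSeconds = schedule.reduce((sum, entry) => sum + entry.backoff, 0);

console.log(schedule.map((entry) => entry.backoff).join(', '));
console.log(totalSeconds); // 1022
```

For a threshold of 10, the nine intervals add up to 1022 seconds, or roughly 17 minutes.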

The default `deadletterThreshold` is 10. You can set a different value when
creating the Watchbot service.
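
A minimal sketch of how a custom value could flow through to the queue configuration, assuming user options are merged over the defaults and the result is used as the SQS `maxReceiveCount`. `applyDefaults` and `redrivePolicy` are hypothetical helpers written for illustration. They are not Watchbot functions.

```javascript
// Hypothetical helpers for illustration only; not part of Watchbot's API.
// Default values are taken from this commit: maxSize 1, minSize 0,
// deadletterThreshold 10.
function applyDefaults(options = {}) {
  return Object.assign(
    { maxSize: 1, minSize: 0, deadletterThreshold: 10 },
    options
  );
}

// Assumption: the threshold becomes the queue's maxReceiveCount, so a
// message moves to the dead letter queue after that many receives.
function redrivePolicy(options) {
  return { maxReceiveCount: applyDefaults(options).deadletterThreshold };
}

console.log(redrivePolicy({}).maxReceiveCount); // 10
console.log(redrivePolicy({ deadletterThreshold: 5 }).maxReceiveCount); // 5
```

With a threshold of 5, a message would reach the dead letter queue after its 5th failed attempt rather than its 10th.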

Each time a message fails during processing, it is recorded in [the WorkerErrors or FailedWorkerPlacement metrics](./logging-and-metrics.md#custom-metrics). The [WorkerErrors alarm](./alarms.md#workererrors) will trigger whenever there are more than a configured number of failed attempts per minute. The [FailedWorkerPlacement alarm](./alarms.md#failedworkerplacement) will trigger if there are more than 5 failed placements per minute.

With the default `deadletterThreshold` of 10, a message whose 10th attempt fails will have been retrying for a minimum of 17 minutes. At that point it falls into a dead letter queue.

## The dead letter queue

If a message fails processing `deadletterThreshold` times, Watchbot will stop attempting it. The message will be dropped into a second SQS queue, called a dead letter queue. When there are **any** messages visible in this queue, Watchbot will trip the [DeadLetter alarm](./alarms.md#deadletter). This helps to give visibility into edge-case messages that may highlight a bug in worker code.

Once a message is in the dead letter queue, it will stay there until it is manually removed or until it expires after 14 days. See [the CLI documentation](./command-line-utilities.md#dead-letter) for instructions for interacting with the dead letter queue.
27 changes: 18 additions & 9 deletions lib/template.js
@@ -35,9 +35,12 @@ const pkg = require(path.resolve(__dirname, '..', 'package.json'));
  * to specify this in order to differentiate the resources.
  * @param {String} [options.family] - the name of the task definition family
  * that watchbot will create revisions of.
- * @param {Number|ref} [options.workers=1] - the maximum number of worker
- * containers that can be launched to process jobs concurrently. This parameter
- * can be provided as either a number or a reference, i.e. `{"Ref": "..."}`.
+ * @param {Number|ref} [options.maxSize=1] - the maximum size for the service to
+ * scale up to. This parameter can be provided as either a number or a reference,
+ * i.e. `{"Ref": "..."}`.
+ * @param {Number|ref} [options.minSize=0] - the minimum size for the service to
+ * scale down to. This parameter can be provided as either a number or a reference,
+ * i.e. `{"Ref": "..."}`.
  * @param {String} [options.mounts=''] - if your worker containers need to mount
  * files or folders from the host EC2 file system, specify those mounts with this parameter.
  * A single persistent mount point can be specified as `{host location}:{container location}`,
@@ -97,6 +100,10 @@ const pkg = require(path.resolve(__dirname, '..', 'package.json'));
  * of 1-minute periods before an alarm is triggered. The default is 1 period, or
  * 1 minute. This parameter can be provided as either a number or a reference,
  * i.e. `{"Ref": "..."}`.
+ * @param {Number|ref} [options.deadletterThreshold=10] - the number of times a
+ * message is delivered to the source queue before being moved to the
+ * dead-letter queue. This parameter can be provided as either a number or a
+ * reference, i.e. `{"Ref": "..."}`.
  */
 module.exports = (options = {}) => {
   ['service', 'serviceVersion', 'command', 'cluster'].forEach((required) => {
@@ -111,14 +118,16 @@ module.exports = (options = {}) => {
       env: {},
       messageTimeout: 600,
       messageRetention: 1209600,
-      workers: 1,
+      maxSize: 1,
+      minSize: 0,
       mounts: '',
       privileged: false,
       family: options.service,
       errorThreshold: 10,
       alarmThreshold: 40,
       alarmPeriods: 24,
-      failedPlacementAlarmPeriods: 1
+      failedPlacementAlarmPeriods: 1,
+      deadletterThreshold: 10
     },
     options
   );
@@ -211,7 +220,7 @@ module.exports = (options = {}) => {
       MessageRetentionPeriod: options.messageRetention,
       RedrivePolicy: {
         deadLetterTargetArn: cf.getAtt(prefixed('DeadLetterQueue'), 'Arn'),
-        maxReceiveCount: 10
+        maxReceiveCount: options.deadletterThreshold
       }
     }
   };
@@ -440,8 +449,8 @@ module.exports = (options = {}) => {
         '/',
         cf.getAtt(prefixed('Service'), 'Name')
       ]),
-      MinCapacity: 0,
-      MaxCapacity: options.workers,
+      MinCapacity: options.minSize,
+      MaxCapacity: options.maxSize,
       RoleARN: cf.getAtt(prefixed('ScalingRole'), 'Arn')
     }
   };
@@ -458,7 +467,7 @@ module.exports = (options = {}) => {
       MetricAggregationType: 'Average',
       StepAdjustments: [
         {
-          ScalingAdjustment: Math.ceil(options.workers / 10),
+          ScalingAdjustment: Math.ceil(options.maxSize / 10),
           MetricIntervalLowerBound: 0.0
         }
       ]
4 changes: 2 additions & 2 deletions test/__snapshots__/template.jest.js.snap
@@ -365,7 +365,7 @@ Object {
         "StepAdjustments": Array [
           Object {
             "MetricIntervalLowerBound": 0,
-            "ScalingAdjustment": 9,
+            "ScalingAdjustment": 1,
           },
         ],
       },
@@ -447,7 +447,7 @@ Object {
   },
   "SoupScalingTarget": Object {
     "Properties": Object {
-      "MaxCapacity": 90,
+      "MaxCapacity": 1,
       "MinCapacity": 0,
       "ResourceId": Object {
         "Fn::Join": Array [