
FIFO queues #279

Merged
merged 2 commits into master from fifo
Oct 16, 2018

Conversation

rclark
Contributor

@rclark rclark commented Aug 21, 2018

This PR explores adding a configuration option that allows you to launch a watchbot stack that operates against a FIFO queue. On this first pass, these are the biggest adjustments that watchbot and its users need to be aware of:

  • Usually watchbot makes an SNS topic to feed an SQS queue. When an SQS message is received, it is expected that the message came via SNS, which leads to the Subject and Message (SNS message attributes) being parsed and exposed to the worker as environment variables. You can't send messages to a FIFO queue via SNS topic -- instead you have to send them directly to the queue. This means the parsing assumptions that watchbot makes may be invalid. The least invasive approach I can think of here would be to adjust message parsing in a backwards compatible way, and expose the MessageGroupId to the worker as the Subject environment variable, and the Message as the SQS message body.

  • When feeding a message to a FIFO queue, you must provide a MessageGroupId. This acts as a sort of "shard" for the queue -- FIFO behavior is only guaranteed for messages that share an identical MessageGroupId. We need some additional documentation in here to explain this.

  • Scaling behavior in watchbot is based on the total number of messages in the queue. If there are 50 messages, concurrent processing scales up to 50 (or the configured MaxSize, if lower). In a FIFO queue, you would really want to scale on the number of MessageGroupIds being fed to your queue. If there are 50 messages in the queue but only 2 MessageGroupIds, watchbot would launch 50 containers for processing, but only 2 at a time would be able to process messages.
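To make that last point concrete, here is a minimal sketch -- the helper is hypothetical, not part of watchbot: useful FIFO concurrency is capped by the number of distinct MessageGroupIds, not by the message count.

```javascript
// Hypothetical helper, not part of watchbot: the concurrency a FIFO-backed
// stack can actually use. SQS delivers at most one in-flight message per
// MessageGroupId, so the number of distinct group ids caps useful
// parallelism, alongside the configured MaxSize.
function effectiveConcurrency(messageCount, distinctGroupIds, maxSize) {
  return Math.min(messageCount, distinctGroupIds, maxSize);
}

// 50 messages but only 2 MessageGroupIds: only 2 workers make progress.
console.log(effectiveConcurrency(50, 2, 50)); // 2
```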

Currently this PR locks the number of containers to 1 if you specify fifo: true, and turns off autoscaling entirely. This is probably not the right approach to take. Given that something like "number of MessageGroupIds" is not a metric that SQS provides in CloudWatch, and given that it is entirely dependent on how the user sends messages to the queue, I don't think we're going to find an ideal scaling behavior here. We should probably re-enable scaling, and document that with a FIFO queue it's important to consider up-front how many MessageGroupIds your queue will have, and to set your stack's MaxSize to that value or lower.
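As a concrete reference for the direct-send path described above, here is a sketch of building the parameters a producer would pass to SQS `sendMessage` for a FIFO queue. The helper, queue URL, group id, and dedupe id are all placeholders, not part of watchbot; under the backwards-compatible parsing proposed above, `MessageGroupId` would surface to the worker as the `Subject` environment variable and the message body as `Message`.

```javascript
// Hypothetical producer-side helper: SendMessage parameters for a FIFO queue.
// MessageGroupId is required on FIFO queues; a MessageDeduplicationId is also
// required unless the queue enables content-based deduplication.
function fifoMessageParams(queueUrl, groupId, body, dedupeId) {
  return {
    QueueUrl: queueUrl,
    MessageGroupId: groupId,
    MessageDeduplicationId: dedupeId,
    MessageBody: body
  };
}

// With the aws-sdk, this would be sent roughly as:
//   new AWS.SQS().sendMessage(params).promise();
const params = fifoMessageParams(
  'https://sqs.us-east-1.amazonaws.com/123456789012/my-stack.fifo', // placeholder
  'shard-1',
  JSON.stringify({ work: 1 }),
  'job-0001'
);
```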

fixes #278
cc @davidtheclark

Contributor Author

@rclark rclark left a comment

lib/template.js Outdated
@@ -461,7 +477,7 @@ module.exports = (options = {}) => {
Type: 'AWS::ECS::Service',
Properties: {
Cluster: options.cluster,
DesiredCount: options.minSize,
DesiredCount: !options.fifo ? options.minSize : 1,
Contributor Author

👉 pinning the number of containers to 1

lib/template.js Outdated
@@ -475,7 +491,7 @@ module.exports = (options = {}) => {
options.placementStrategies;


Resources[prefixed('ScalingRole')] = {
if (!options.fifo) Resources[prefixed('ScalingRole')] = {
Contributor Author

👉 disabling autoscaling

lib/template.js Outdated
@@ -511,7 +527,7 @@ module.exports = (options = {}) => {
}
};

Resources[prefixed('ScalingTarget')] = {
if (!options.fifo) Resources[prefixed('ScalingTarget')] = {
Contributor Author

👉 disabling autoscaling

lib/template.js Outdated
@@ -528,7 +544,7 @@ module.exports = (options = {}) => {
}
};

Resources[prefixed('ScaleUp')] = {
if (!options.fifo) Resources[prefixed('ScaleUp')] = {
Contributor Author

👉 disabling autoscaling

lib/template.js Outdated
@@ -548,7 +564,7 @@ module.exports = (options = {}) => {
}
};

Resources[prefixed('ScaleUpTrigger')] = {
if (!options.fifo) Resources[prefixed('ScaleUpTrigger')] = {
Contributor Author

👉 disabling autoscaling

lib/template.js Outdated
@@ -734,7 +750,7 @@ module.exports = (options = {}) => {
}
};

Resources[prefixed('LambdaScalingRole')] = {
if (!options.fifo) Resources[prefixed('LambdaScalingRole')] = {
Contributor Author

👉 disabling autoscaling

lib/template.js Outdated

Resources[prefixed('ScalingLambda')] = {
if (!options.fifo) Resources[prefixed('ScalingLambda')] = {
Contributor Author

👉 disabling autoscaling

lib/template.js Outdated
@@ -785,7 +801,7 @@ module.exports = (options = {}) => {
}
};

Resources[prefixed('CustomScalingResource')] = {
if (!options.fifo) Resources[prefixed('CustomScalingResource')] = {
Contributor Author

👉 disabling autoscaling

@davidtheclark
Contributor

@rclark I added some documentation to this branch. I think that the functionality is tested by the combination of template validation and snapshots. So this is ready for a review.

This branch seems to be working well in our prototype. But I'd be happy to sit on it until the functionality is more proven, if you're hesitant to merge into master.

Contributor Author

@rclark rclark left a comment

Thanks @davidtheclark -- just a couple of documentation points to cover. Then we should merge/tag/publish a minor version when you're ready to roll with your first fifo system.


To use a FIFO queue, set the template option `fifo: true`.

A FIFO queue behaves differently and has some limitations, so you should read the AWS documentation and the details below if you are considering the switch.
Contributor Author

Could we link out here to what AWS documentation is helpful to read?


### Triggering FIFO workers

With a standard queue, you trigger a Watchbot worker by sending an SNS message to the watcher's topic. With a FIFO queue, you will send a message directly to the watcher's FIFO queue, instead. Make sure you have read about the AWS documentation about the expected for these messages (for example, you need to include a `MessageGroupId`).
Contributor Author

Make sure you have read about the AWS documentation about the expected for these messages

This sentence is a little clumsy. Additional about and expected vs expectations (I think?)

Contributor

Haha.

@davidtheclark
Contributor

@rclark the AWS docs for dead-letter queues suggest not using one for most FIFO queues:

Don’t use a dead-letter queue with a FIFO queue if you don’t want to break the exact order of messages or operations.

Do you think we should worry about this and add the ability to turn off the dead-letter queue? Or should we just recommend being cognizant of it, and maybe setting the deadletterThreshold to 1000?

@rclark
Contributor Author

rclark commented Aug 27, 2018

Perhaps we should get more clarity here, but generally I don't believe this suggestion. If you have a dead-letter threshold of 10, the FIFO queue should try that message 10 times and, if it can't be processed, drop it into the dead-letter queue. Now the FIFO queue can move on to the next message. Strictly speaking you are now out of order. This could be important for some applications, where maybe the next message literally cannot proceed without the first one succeeding. But for many use cases that may not be true -- the series of messages may not depend on each other, but need ordering for another purpose.

So maybe there's a strictFifo option that doesn't have a dead letter queue? Maybe fifo: 'strict' or fifo: 'lenient' or something?
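(For context, the wiring in question is SQS's RedrivePolicy. Here is a sketch of the CloudFormation fragment as a plain JS object, the way lib/template.js builds resources -- the resource names and values are illustrative, not watchbot's actual prefixed names. Note that a FIFO queue's dead-letter queue must itself be a FIFO queue.)

```javascript
// Illustrative CloudFormation fragment as a plain JS object. A FIFO queue's
// dead-letter queue must also be FIFO, and maxReceiveCount plays the role of
// watchbot's deadletterThreshold option.
const deadLetterThreshold = 10;
const resources = {
  DeadLetterQueue: {
    Type: 'AWS::SQS::Queue',
    Properties: { QueueName: 'my-stack-dead.fifo', FifoQueue: true }
  },
  WorkQueue: {
    Type: 'AWS::SQS::Queue',
    Properties: {
      QueueName: 'my-stack.fifo',
      FifoQueue: true,
      RedrivePolicy: {
        deadLetterTargetArn: { 'Fn::GetAtt': ['DeadLetterQueue', 'Arn'] },
        maxReceiveCount: deadLetterThreshold
      }
    }
  }
};
```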

@davidtheclark
Contributor

Yeah, I agree that the suggestion does not apply to our flagship FIFO use case, where we actually do want a dead-letter queue. What do you think of ignoring this situation for now? If in the future somebody needs the strict behavior we could add a 'strict' setting for fifo, in addition to true (which could maybe be treated as the equivalent of 'lenient'). 👍 👎 ?

@rclark
Contributor Author

rclark commented Aug 28, 2018

What do you think of ignoring this situation for now?

Sure: build it when the need arises.

@davidtheclark
Contributor

davidtheclark commented Oct 14, 2018

I remembered that we need to merge and release this! Everything has been working just fine in our Publisher tests so far. Don't want to let this branch get more stale.

I squashed and rebased off master. @rclark could you take a quick look over the diff to double-check that my rebase didn't introduce any suspect lines? Then I will, as you suggest above, "merge/tag/publish a minor version".

cc @mapbox/frontend-platform

@rclark rclark merged commit 182c5af into master Oct 16, 2018
@rclark rclark deleted the fifo branch October 16, 2018 13:02
@rclark
Contributor Author

rclark commented Oct 16, 2018

Published as v4.13.0
