Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent possible duplicate ECS services. #884

Merged
merged 11 commits into from
Jun 17, 2016
Merged

Prevent possible duplicate ECS services. #884

merged 11 commits into from
Jun 17, 2016

Conversation

ejholmes
Copy link
Contributor

@ejholmes ejholmes commented Jun 17, 2016

After #878 I discovered an issue where duplicate ECS services got created from the same CloudFormation update. This does 2 things to prevent that from happening:

  1. I'd recommend setting a 30 minute visibility timeout on the sqs queue. We currently give custom provisioners 20 minutes to complete their work, so this should be more than enough time.
  2. We set the ClientToken property to ensure 100% idempotency of the CreateService call.

@ejholmes
Copy link
Contributor Author

ejholmes commented Jun 17, 2016

Ok, after testing locally, I discovered that the ClientToken is linked to the service name, which means adding the ClientToken had no effect, since we were generating a random, non-deterministic postfix and appending that to the service name when creating a service.

To fix this, I changed it so that the postfix is generated deterministically based on the properties of the ECS service that would require a replacement, if they were to change.

Functionally, this works well and guarantees that duplicate ECS services don't get created, but I'm not a fan of the length and format of the postfix (base64 encoded sha1):

If anyone has any suggestions on how to shorten this, but keep it sufficiently unique and avoid collisions, I'm all ears. Maybe truncating the sha would be fine?

@ejholmes
Copy link
Contributor Author

ejholmes commented Jun 17, 2016

Maybe it's fine the way it is. My concerns are mostly cosmetic =.

It does push things closer to the limit of the size of the service name (255) but you still have 228 characters, which is a pretty massive app + process name.

@ejholmes ejholmes force-pushed the dedup branch 2 times, most recently from a70ba39 to 323ca08 Compare June 17, 2016 06:41
@ejholmes
Copy link
Contributor Author

ejholmes commented Jun 17, 2016

Ok, really happy with this now. The postfix is now the base62 encoded value of the Sum64 of the hash function, which means the maximum size of the string is 11 characters, and it's alphanumeric.

I also pulled in https://github.com/mitchellh/hashstructure, which is a really nice library for generating a hash of a go struct.

@@ -0,0 +1,29 @@
// Package base62 implements conversion to and from base62. Useful for url shorteners.
package base62
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was stripped from an internal project.

Role *string
TaskDefinition *string
TaskDefinition *string `hash:"ignore"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty nifty. Fields that can be updated, we just add this tag.

// is deterministic based on the non updateable fields of the
// service, otherwise ClientToken has no effect on idempotency.
func postfix(p *ECSServiceProperties) (string, error) {
h, err := p.Hash()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to think of, it would be sufficient to just hash the request id instead of the properties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this more, hashing the request id would be better. If we hash the props, you could potentially get into a scenario where a service gets re-created with the same name as a service that was previously deleted. We've seen a lot of weird behavior when that happens (complaining about the load balancer not existing, saying it's in a draining state, etc), so best to avoid it.

@ejholmes
Copy link
Contributor Author

Ok, simplified this a bunch so we just generate a hash of the request id, then use that as a postfix for the service name.

Think this is good to review now.

@@ -292,7 +300,8 @@ func TestRequiresReplacement(t *testing.T) {
},
}

for _, tt := range tests {
for i, tt := range tests {
t.Log(i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

@mwildehahn
Copy link
Contributor

👍

"Type": "AWS::SQS::Queue"
"Type": "AWS::SQS::Queue",
"Properties": {
"VisibilityTimeout": "1800"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems really long - you handle de-duping requests here as well, right? Otherwise we could see things being re-submitted 30 minutes later after a crash or something. In the case where de-duping is handled, what's the idea behind making this so long? Would 5 minutes be sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point. Probably the best thing to do would be to leave the default, and then use the ChangeMessageVisibility api call to increase it when needed.

I'll revert the current change for now, since the de-duping is sufficient to fix the current issue.

@ejholmes ejholmes merged commit b836b09 into master Jun 17, 2016
@ejholmes ejholmes deleted the dedup branch June 17, 2016 22:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants