Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

9.1.0 - IAM role management #395

Merged
merged 14 commits into from
Jun 13, 2024
Merged

9.1.0 - IAM role management #395

merged 14 commits into from
Jun 13, 2024

Conversation

mapsam
Copy link
Member

@mapsam mapsam commented May 17, 2024

What changed?

This is a change to the 9.x branch of ecs-watchbot that merges some of the IAM roles to reduce the number of roles created per stack. It allows the user to pass a pre-defined autoscaling role with the options.autoscalingRoleArn parameter, which enables a user to re-use a role throughout multiple stacks if they desire.

Role Retain Description
WatchbotRole this is the primary role and is still created for each stack. It now has multiple policies that encompass the permissions from the two additional lambda roles
WatchBotLambdaScalingRole this role has been removed and is now a policy in WatchbotRole
WatchBotLambdaTotalMessagesRole this role has been removed and is now a policy in WatchbotRole
WatchbotScalingRole ❌* this role is now conditionally created if the user does not provide a string or ref with options.autoscalingRoleArn when building the template

Docs are included in the README to provide guidance on what permissions the auto-scaling role requires in order to work properly.

10.x vs. 9.x

Given ecs-watchbot has been migrated to a CDK stack in 10.x, this release focuses exclusively on a 9.x backport to support the most users. 10.x roles will be updated as well but in a separate release. The watchbot binaries are still being built by this branch and maintained by Mapbox.

Test binaries can be found at the following location:

https://s3.amazonaws.com/watchbot-binaries/linux/v9.1.0-dev.3/watchbot

Deploy test

I was able to create a new stack called ecs-watchbot-iam-test-staging successfully with the following configuration (some options are obfuscated):

const bot = watchbot.template({
  cluster: cf.ref('Cluster'),
  service: 'ecs-watchbot-iam-test',
  serviceVersion: cf.ref('GitSha'),
  command: 'node index.js',
  env: { 
    AWS_STACK_NAME: cf.stackName 
  },
  maxSize: 5,
  reservation: { memory: 512 },
  notificationTopic: cf.importValue('val'),
  fargatePublicIp: 'ENABLED',
  fargateSecurityGroups: [cf.importValue('val')],
  fargateSubnets: cf.split(',',  cf.importValue('val')),
  capacity: 'FARGATE_SPOT',
  autoscalingRoleArn: 'arn:aws:iam::ACCOUNT:role/ecs-watchbot-role',
});

I verified the following:

  • Only 1 IAM role is created ✅
  • Stack custom metrics are created ✅
  • Stack scales out when messages are > 0 for more than 5 minutes ✅
  • Stack scales in when messages are < 1 for more than 10 minutes ✅
  • ECS service scaleout works as expected ✅
  • Utility lambdas succeed ✅

Screen Shot 2024-05-21 at 11 24 15 AM

Screen Shot 2024-05-21 at 11 22 02 AM

Screen Shot 2024-05-21 at 11 39 27 AM

Screen Shot 2024-05-21 at 11 12 46 AM

Screen Shot 2024-05-21 at 11 05 48 AM

Looking at metrics is a nice way to visually see the service scale out when messages are >0 and scale in when messages are <1. I sent 100 messages in bulk to the test service, then purged the queue after scale out was confirmed. Subsequently, the service scaled in.

Screen Shot 2024-05-21 at 11 39 05 AM

Checklist

Complete the steps below before merge where applicable:

  • I included JIRA ticket code in the PR header
  • I read and understood the CONTRIBUTING.md doc.
  • I used conventional commits for the PR title.
  • I used proper typescript comments to document the changes made
  • I made changes to relevant tests or added new tests if applicable
  • I updated the package.json, package-lock.json and readme.md to reflect the new correct Watchbot version
    • DO NOT MERGE WITHOUT MAKING THESE CHANGES
  • I updated the changelog.md

Copy link

@LandonHarvey LandonHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM the only thing is the question on whether we want to default to creating a role if one is not provided.

docs/building-a-template.md Show resolved Hide resolved
@mapsam mapsam marked this pull request as ready for review May 21, 2024 18:40
@mapsam mapsam requested a review from a team as a code owner May 21, 2024 18:40
@@ -233,7 +233,7 @@ const Resources = {
Owner: 'mapbox',
Repo: 'ecs-watchbot',
PollForSourceChanges: 'false',
Branch: 'master',
Branch: 'mapsam/iam-role-options',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ reminder to change this value to 9.x

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't Forget

@@ -246,7 +247,7 @@ module.exports = (options = {}) => {
Statement: [
{
Effect: 'Allow',
Principal: { Service: ['ecs-tasks.amazonaws.com'] },
Principal: { Service: ['ecs-tasks.amazonaws.com', 'lambda.amazonaws.com'] },
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The roles created in ma-iam only had the service principal of ecs.amazonaws.com. Will that cause issues when interacting with the lambdas and should we add that in?

@mapsam
Copy link
Member Author

mapsam commented May 23, 2024

Just wrapped up another round of testing and this time tested both a fargate setup and an ec2 cluster setup, both seem to work as expected:

Fargate

Screen Shot 2024-05-23 at 1 42 28 PM

EC2

Screen Shot 2024-05-23 at 1 42 47 PM

These are both using scaling roles that look like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "application-autoscaling:*",
                "cloudwatch:DescribeAlarms",
                "cloudwatch:PutMetricAlarm",
                "ecs:DescribeServices",
                "ecs:UpdateService"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

And a trust policy using the principal ecs-tasks.amazonaws.com

@mapsam mapsam merged commit da57105 into 9.x Jun 13, 2024
1 check passed
@mapsam mapsam deleted the mapsam/iam-role-options branch June 13, 2024 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants