
Add config option for complex matching against s3 buckets #113

Merged
merged 35 commits into from Jun 15, 2020

Conversation

@rhoboat (Contributor) commented May 5, 2020

Addresses #108

Purpose

Allow terminating AWS resources using more complex matching rules, supplied via a config file.

Scope

This PR is scoped to support nuking s3 buckets based on matching by name.

CLI changes

I propose we allow something like this:

cloud-nuke aws --config path/to/file.yaml

Yaml config format

The structure of the config should reflect the way AWS describes resources. S3 buckets have names and regions, so for the scope of this project:

s3:
  include:
    names_regex:
      - ^alb-alb-.*-access-logs$
      - ...

The proposed format at a meta level is this:

[RESOURCE_TYPE]:
  [include | exclude]:
    [FIELD_PLURAL][_regex]:
      - [ list of matchers ]
      OR
      [key]: [value]

where

  • resource_type can be s3, iam_role, etc. We are punting on supporting region for now.
  • include has rules that a resource must match to be nuked
  • exclude has rules that a resource must not match to be nuked
  • matching keys look like:
    • tags_regex: would accept a list of regular expressions
    • names_regex: would accept a list of regular expressions
    • regions (assumes plaintext, exact matching)
  • for resources like ec2_instance, we need to support key-value maps
    • e.g. Name: ^test_.*
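The include/exclude semantics above can be sketched in Go. This is a minimal illustration of the proposed rules, not the PR's actual code; `matchesAny` and `shouldNuke` are hypothetical helper names:

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesAny reports whether name matches at least one of the given
// regular expressions. (Illustrative helper, not cloud-nuke's implementation.)
func matchesAny(name string, patterns []string) bool {
	for _, p := range patterns {
		if regexp.MustCompile(p).MatchString(name) {
			return true
		}
	}
	return false
}

// shouldNuke applies the proposed semantics: a resource is nuked if it
// matches an include rule (or no include rules are given) and matches
// no exclude rule.
func shouldNuke(name string, includeRegex, excludeRegex []string) bool {
	included := len(includeRegex) == 0 || matchesAny(name, includeRegex)
	excluded := matchesAny(name, excludeRegex)
	return included && !excluded
}

func main() {
	include := []string{`^alb-alb-.*-access-logs$`}
	fmt.Println(shouldNuke("alb-alb-test-access-logs", include, nil)) // true
	fmt.Println(shouldNuke("my-prod-bucket", include, nil))           // false
}
```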

Looking Ahead


Eventually, we might support something like:

ec2_instance:
  include:
    tags_regex:
      Name: ^test_.*

iam_role:
  include:
    names_regex:
      - ^test_.*
      - ^cloud-nuke-TestListEksClusters-.*

s3:
  include:
    names_regex:
      - ^alb-alb-.*-access-logs$

The following would nuke only s3 buckets that match the names_regex but are not in us-west-1, if called with cloud-nuke aws --resource-type s3 --config path/to/config.yaml:

s3:
  include:
    names_regex:
      - ^alb-alb-.*-access-logs$
  exclude:
    regions:
      - us-west-1

How to test locally

  1. Clone this repo in your regular development directory (no need to use $GO_DIR)
  2. Log into nuclear wasteland (if you need access, we need to grant it to you)
  3. Run some commands like:
    • go run main.go aws --resource-type s3 --config config/mocks/s3_cleanup.yaml
    • try the command with and without the --resource-type option
    • try the command with and without the --config option
  4. Run tests against sandbox environment, locally:
    • cd {repo}/aws
      houston exec sandbox -- go test -v -run TestFilterS3

Summary of trial project


Notably completed today:

  • Add --config path/to/file.yaml support to the command line.
  • Add support for a simple config yaml including s3 buckets by name, using regexp.
  • Confirm via manual testing that these following commands work:
    • go run main.go aws: nuke all s3 buckets
    • go run main.go aws --config ./config/mocks/s3_include_names.yaml: nuke only s3 buckets in all regions that match any of the regexps.
    • go run main.go aws --config ./config/mocks/s3_include_names.yaml --region us-east-1: nuke only s3 buckets in us-east-1 that match any of the regexps.

Notably not done:

  • Add test coverage in config/config_test.go
  • Support other kinds of keys in the yaml config, like exclude_names_regex, exclude_names, and include_names.

Of course this is still far from complete, but it might lay some groundwork for the future. There are still a few TODOs in the code that signal what might be worth checking. One caveat to mention: I haven't written Go in years and was never an expert, so some of the conventions might be lost on me. (But that doesn't make it any less fun for me!)

@rhoboat rhoboat marked this pull request as ready for review May 5, 2020 16:41
@yorinasub17 (Contributor):

> I'm not sure if the aws specifier is important to include, so could it be this?

IIRC, urfave/cli (the CLI framework we are using) is not very ergonomic about argument parsing: if you go with the latter option, you always have to provide the flag before the subcommand, which may or may not be intuitive. E.g.:

cloud-nuke --config /path/to/config aws --region us-east-1

Given that, I think it would probably be better to repeat the args for each subcommand for CLI consistency.

@rhoboat (Contributor, Author) commented May 5, 2020

@yorinasub17 Updated the description to reflect that. Thanks for explaining why that's important.

@yorinasub17 (Contributor):

> I propose we use a config format that is universal, consistent with whatever we might already be using elsewhere, and something users use a lot. Does yaml fit the bill?

Yaml makes sense to me!

> I propose we use the _filter to be clear, but I'm open to thoughts here.

I prefer being more explicit, e.g. include_names_regex. This makes it clear that:

  • Each entry is a regex.
  • The regex is for inclusion, and not exclusion. This also allows us to expand to include exclude_names_regex in the future.
  • The regex matches on names.

@yorinasub17 (Contributor):

NIT: tag_filter should be a map instead of a list of maps:

ec2_instance:
  tag_filter:
    Name: ^test_.*

@rhoboat (Contributor, Author) commented May 5, 2020

> I prefer being more explicit, e.g. include_names_regex. This makes it clear that:
>
> * Each entry is a regex.
> * The regex is for inclusion, and not exclusion. This also allows us to expand to include `exclude_names_regex` in the future.
> * The regex matches on names.

So, the yaml format could be something like this:

[resource_type]:
  [include|exclude]_[field_plural]_[format]:
    - [ list of matchers ]

Maybe we assume plaintext matching if they only specify include_names? Or we require include_names_plaintext?

About the ec2_instance one, I don't think the Name: ^test makes sense. How about this?

ec2_instance:
  include_tags_regex:
    - ^test_.*

@yorinasub17 (Contributor) left a comment

Looks like a good start! I think having this in a config file makes sense and being able to specify regions would be useful. I could use a bit more clarification around the edge cases though, as mentioned in the comments below.

@yorinasub17 (Contributor) commented May 5, 2020

> About the ec2_instance one, I don't think the Name: ^test makes sense, how about this?

Well, tags are key-value pairs so what you want to do is filter by tags that match keys and values. So you either need to be specific about matching the tag keys (which I think is what you were going for in the alternative proposal), or matching values of specific tags (which is what I was going for in my suggestion). Does that make sense?

EDIT: To put another way, I think you need two kinds of tags filters. One for matching keys, and one for matching the key-value pair. This is similar to the AWS filters system.
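The two kinds of tag filters described above could be sketched in Go like this. This is an illustrative sketch only; the helper names are hypothetical and this is not the actual AWS filters API or the PR's implementation:

```go
package main

import (
	"fmt"
	"regexp"
)

// anyTagKeyMatches reports whether any tag key matches the pattern
// (the first kind of filter: matching on keys alone).
func anyTagKeyMatches(tags map[string]string, keyPattern string) bool {
	re := regexp.MustCompile(keyPattern)
	for k := range tags {
		if re.MatchString(k) {
			return true
		}
	}
	return false
}

// tagValueMatches reports whether the tag named key exists and its value
// matches the pattern (the second kind: matching a specific key's value,
// e.g. Name: ^test_.* from the proposal).
func tagValueMatches(tags map[string]string, key, valuePattern string) bool {
	v, ok := tags[key]
	return ok && regexp.MustCompile(valuePattern).MatchString(v)
}

func main() {
	tags := map[string]string{"Name": "test_instance", "Team": "infra"}
	fmt.Println(anyTagKeyMatches(tags, `^Team$`))          // true
	fmt.Println(tagValueMatches(tags, "Name", `^test_.*`)) // true
	fmt.Println(tagValueMatches(tags, "Name", `^prod_.*`)) // false
}
```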

@yorinasub17 (Contributor):

> Maybe we assume plaintext matching if they only specify include_names? Or we require include_names_plaintext?

I think assuming include_names as plain text, exact matches is a reasonable assumption to make. 👍

@rhoboat (Contributor, Author) commented May 5, 2020

Okay, I get it now about the key-value pair matching, for tags. Updating the description now.

@rhoboat (Contributor, Author) commented May 5, 2020

Supporting regions as a high-level "resource type" is a little inconsistent with the rest of the config format. I think we could hash that out later? Maybe regions are just special.

I think everything else feels consistent enough. Let me know if we want to move forward with implementing something for nuking s3 buckets by name.

@yorinasub17 (Contributor):

+1 to focusing on just the s3 names filter and punting on supporting regions!

@rhoboat (Contributor, Author) commented May 5, 2020

@yorinasub17 Okay! Great. Updated the description with what we've discussed.

@yorinasub17 (Contributor):

The updated description LGTM!

@yorinasub17 (Contributor) left a comment

Implementation looks like a good start! Had one minor suggestion in terms of organization but overall the end to end direction looks good.

Global style nit: Use go fmt to format the code to be consistent with the rest of the repo (e.g., consistent tabbing).

@rhoboat (Contributor, Author) commented May 6, 2020

@yorinasub17 Thanks so much for your comments! I don't have time to take care of those concerns at the moment. But they will be helpful for the future.

I've updated the PR description with wrap-up comments (collapsed in a Summary). 🎉

Thanks, all!

@rhoboat rhoboat self-assigned this Jun 4, 2020
@rhoboat rhoboat changed the title Add config option for complex matching against s3 buckets WIP: Add config option for complex matching against s3 buckets Jun 4, 2020
@rhoboat rhoboat changed the title WIP: Add config option for complex matching against s3 buckets Add config option for complex matching against s3 buckets Jun 10, 2020
@rhoboat (Contributor, Author) commented Jun 10, 2020

  • Added some test coverage
  • Supports exclude: now

@rhoboat (Contributor, Author) commented Jun 12, 2020

@brikis98 @yorinasub17 Could use another review, possibly someone to pull and run this, too. I noticed we can't delete s3 buckets that have no region, such as the two buckets that are currently stuck in our nuclear wasteland account. Even the AWS UI doesn't let me delete them.

@zackproser (Contributor):

> @brikis98 @yorinasub17 Could use another review, possibly someone to pull and run this, too. I noticed we can't delete s3 buckets that have no region, such as the two buckets that are currently stuck in our nuclear wasteland account. Even the AWS UI doesn't let me delete them.

Happy to guinea pig this at some point. I haven't used cloud-nuke yet so it would be good learning for me. LMK if there are specific configs / accounts you'd like flexed!

@rhoboat (Contributor, Author) commented Jun 12, 2020

@zackproser I'm using the nuclear wasteland account. If you look at s3_test.go and search for Filter, you'll find a test that runs through 3 different types of mock config files. The config_test.go file also references a bunch of mock config files in the config/mocks dir. Look through there to see what kinds of things I'm testing.

I'd love if we found a config file that actually breaks functionality.

I have updated the description to describe how to pull and run and test this out!

@yorinasub17 (Contributor) left a comment

LGTM! Only had minor nits regarding testing, and one feature improvement for the future.

Let me kick off a build now so you get one test cycle. I'll give you the instructions for kicking off forks in slack.


if !reflect.DeepEqual(configObj, emptyConfig()) {
	assert.Fail(t, "Config should be empty, %+v\n", configObj)
}

Contributor commented:

Not for this PR, but perhaps in the next iteration we should look into seeing if this can be thrown as an error? It would be useful to catch typos (e.g., if I mistyped include, I would want it to give me an error rather than move along as if the key didn't exist!)

Contributor (Author) commented:

I think this would be awesome. Any unparseable key in the config file should raise an informative error.

@yorinasub17 (Contributor):

Looks like the CI build failed on TestFilterS3Bucket_Config, with error:

--- FAIL: TestFilterS3Bucket_Config (17.52s)
    --- FAIL: TestFilterS3Bucket_Config/Exclude (10.54s)
        s3_test.go:560: 
            	Error Trace:	s3_test.go:560
            	Error:      	Not equal: 
            	            	expected: 6
            	            	actual  : 7
            	Test:       	TestFilterS3Bucket_Config/Exclude

This runs in phxdevops, so perhaps there's a bucket in there that is matching?

@rhoboat (Contributor, Author) commented Jun 15, 2020

When I run this locally against phxdevops there are 191 buckets matching config/mocks/s3_exclude_names.yaml so that can't be what's happening, right?

houston exec dev -- go run main.go aws --resource-type s3 --dry-run \
  --config config/mocks/s3_exclude_names.yaml

@rhoboat (Contributor, Author) commented Jun 15, 2020

Just FYI, we are running against the sandbox account, I believe. Unless these labels are wrong:

Using environment variables from project settings and/or contexts:
  ...
  SANDBOX_AWS_ACCESS_KEY_ID=**REDACTED**
  SANDBOX_AWS_SECRET_ACCESS_KEY=**REDACTED**

@rhoboat (Contributor, Author) commented Jun 15, 2020

This last test run passed. Rerunning.

@rhoboat (Contributor, Author) commented Jun 15, 2020

Running these tests locally against both phxdevops and sandbox.

e.g.:

cd cloud-nuke/aws
houston exec sandbox -- go test -v -run TestFilterS3
houston exec phx -- go test -v -run TestFilterS3

Okay, figured it out. It's related to which random region is chosen to create these test buckets and filter against. So we need to choose a non-random region for this test, or else first nuke all test buckets in whatever random region the test chooses.

@zackproser (Contributor) commented Jun 15, 2020

> @zackproser I'm using the nuclear wasteland account, and if you look at s3_test.go, search for Filter you'll find a test that runs through 3 different types of mock config files. The config_test.go file also references a bunch of mock config files in the config/mocks dir. Look through there to see what kinds of things I'm testing.
>
> I'd love if we found a config file that actually breaks functionality.
>
> I have updated the description to describe how to pull and run and test this out!

Thanks for updating with testing instructions! Just a quick update to say the happy paths you specified are working for me and I had zero issues running this against nuclear-wasteland as described. I'll try to flex it a bit more as you suggest.

- // Create test buckets
- awsParams, err := newS3TestAWSParams()
+ // Create AWS session in ca-central-1
+ awsParams, err := newS3TestAWSParams("ca-central-1")

Contributor (Author) commented:

Hardcode the region for this test only.

cleanupBuckets, err := getAllS3Buckets(awsParams.awsSession, time.Now().Add(1*time.Hour), []string{awsParams.region}, "", 100, *configObj)
require.NoError(t, err, "Failed to list S3 Buckets in ca-central-1")

nukeAllS3Buckets(awsParams.awsSession, cleanupBuckets[awsParams.region], 1000)

@rhoboat (Contributor, Author) commented Jun 15, 2020

Is there another way to do this other than 1. list all the buckets, 2. nuke the listed buckets?

Contributor commented:

AFAIK, this is the model of how cloud nuke works, so unfortunately, no this is the only way for now.

@rhoboat rhoboat merged commit c0ace86 into gruntwork-io:master Jun 15, 2020
@rhoboat rhoboat deleted the matching-s3-108 branch July 7, 2020 18:53