This would prevent known images from being deleted, e.g. if you've pushed a Git tag like
Same as above -- this could represent a named image that you manually placed into the ECR repository. Again, less likely to represent "cruft" and more likely to be something you don't want to lose.
Just a way to be explicit about images that you want to keep. I don't see any clear way to accommodate this in auto-cleanup mode, but I think it is a fair tradeoff, esp. if the previous two features can be maintained.
@rclark, I've added the following validations:
A few thoughts:
And a few questions:
🤔 The Github API requests would be a pretty significant possible point-of-failure.
Is that what the manual-mode script did? Could we instead use
@rclark the current status of this PR is:
With regard to tests:
Mind reviewing when you have a moment? Thanks!
utils.sh
local region=$1
local repo=$2
cleanup_filepath=./scripts/cleanup.js
cleanup_filepath ${region} ${repo}
Looks like a missing $ in front of cleanup_filepath here.
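A minimal sketch of what the corrected call could look like. The function name `run_cleanup` and the `CLEANUP_FILEPATH` override are hypothetical; neither appears in the diff above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: the real function name in utils.sh is not shown in the diff.
run_cleanup() {
  local region=$1
  local repo=$2
  # CLEANUP_FILEPATH is an assumed override hook for illustration only.
  local cleanup_filepath=${CLEANUP_FILEPATH:-./scripts/cleanup.js}
  # With the leading $, bash expands the variable and runs the script it names.
  # Without it, bash looks for a command literally called "cleanup_filepath".
  "${cleanup_filepath}" "${region}" "${repo}"
}
```

Quoting the expansions also keeps a region or repository name with unusual characters from being split into several arguments.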
test/utils.test.sh
test_region="us-east-1"
test_repo="test-repo"

function cleanup_filepath() {
Yeah since you're actually calling an external file, you can't just mask the bash function in order to mock this call. This test is working now because of the other missing $ I noted, which would actually break at runtime.
Instead you need to do something like:
- make cleanup.js live somewhere in $PATH when conex is running, perhaps symlinked as cleanup_ecr
- make the call to this script cleanup_ecr instead of ./path/to/cleanup.js
- in your test, you can mask that cleanup_ecr command. I'm not 100% sure if a function will work, or if you'll need to use an alias, or put an actual dummy file in $PATH before the real cleanup_ecr
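The dummy-file option from the steps above could be sketched roughly like this. The function name `run_cleanup` and the stub's output format are assumptions for illustration; only `cleanup_ecr`, the region, and the repo come from the thread.

```shell
#!/usr/bin/env bash
# Code under test (hypothetical name): calls cleanup_ecr by name,
# so the command is resolved through $PATH rather than a fixed file path.
run_cleanup() {
  local region=$1
  local repo=$2
  cleanup_ecr "${region}" "${repo}"
}

# Test setup: write a stub cleanup_ecr into a temp directory and put that
# directory at the front of $PATH so it is found before any real cleanup_ecr.
mock_dir=$(mktemp -d)
cat > "${mock_dir}/cleanup_ecr" <<'EOF'
#!/bin/sh
# Stub: report the arguments instead of touching ECR.
echo "cleanup_ecr called with: $*"
EOF
chmod +x "${mock_dir}/cleanup_ecr"
PATH="${mock_dir}:${PATH}"

test_region="us-east-1"
test_repo="test-repo"
result=$(run_cleanup "${test_region}" "${test_repo}")

if [ "${result}" = "cleanup_ecr called with: us-east-1 test-repo" ]; then
  echo "ok - run_cleanup passes region and repo to cleanup_ecr"
else
  echo "not ok - got: ${result}"
fi

rm -rf "${mock_dir}"
```

On the open question: in bash, a function named cleanup_ecr defined in the test shell should also take precedence over a file on $PATH, because functions are resolved before the PATH search. Aliases are not expanded in non-interactive shells unless `shopt -s expand_aliases` is set, so they are the least convenient of the three options.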
👋 @rclark @emilymdubois I spent some time working on this today. Regarding the puzzle of checking whether a sha/tag is valid on GitHub, could one feasible solution be the addition of an extra tag to each image called
To start off, we would also have to manually add these additional tags to existing images as a one-time thing, but this fixes the long-term process for cleaning up images without depending on the GitHub API. AFAICT the GitHub API rate limits by GitHub user per https://developer.github.com/v3/#rate-limiting, and we get 60 requests per hour per mapbox employee (since we all have access to all repositories, however retrieving different people's access tokens will be hard), which would be 😕 considering the number of repositories and images we have.
If I am following, you could accomplish this by simply prepending "commit" images with
Yep, exactly - looking for a means to avoid hitting the github API while deleting images. I suggested adding a new tag:
An additional constraint that came up when I spoke to @emilymcafee last week is being cautious when we delete images that correspond to merge commits, so we would leave something like 50 merge commits around in each repository. The question was really about how to identify these merge commits. I spent some time looking at GitHub payloads for various kinds of merge commits, since GitHub doesn't have a clear identifier for these commits. What I saw:
Regular merge commit
Squashed merge commit
Rebased merge commit
I did write some code for parsing the above, which works on all my test fixtures, but a nicer way to think about this seems to be saving the 50 latest images from the default GitHub branch instead of saving merge commit images. The two are roughly analogous, since most images on a default branch are built via a merge commit, barring other noisy commits like edits to documentation. The only problem I can think of with this approach is when people actively develop on the default GitHub branch.
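As a rough sketch of the "keep the latest N" idea, assuming the images built from the default branch are already available as lines of `pushed_at<TAB>tag` (epoch seconds, then the image tag). That input format and the function name `select_for_deletion` are hypothetical, and how to decide which images belong to the default branch is not covered here.

```shell
#!/usr/bin/env bash
# select_for_deletion KEEP
# Reads "pushed_at<TAB>tag" lines on stdin (assumed format) and prints the tags
# that fall outside the KEEP most recently pushed images.
select_for_deletion() {
  local keep=$1
  # Sort newest first by the numeric timestamp, skip the first KEEP lines,
  # then print only the tag column of what remains.
  sort -k1,1nr | tail -n "+$((keep + 1))" | cut -f2
}

# Example: keep the 2 newest of 4 images; tags "b" and "a" are the candidates.
printf '100\ta\n300\tc\n200\tb\n400\td\n' | select_for_deletion 2
```

With a keep count of 50, a repository holding 50 or fewer default-branch images would produce no output, so nothing would be selected for deletion.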
@arunasank let's hold on new commits here for a minute -- we need to be able to clearly describe the behavior we're going for. So far I'm still pretty confused, and given that the tests aren't passing, I'm not convinced that this is even doing what it intends to do.
After discussion with @arunasank this morning, I'm running with this description. Please let me know if this is clear.
* cleanup refactor, tests pass
* remove extraneous node.js packages
Next step is more exhaustive testing on a staging system. Here are @arunasank's suggestions: For example, I think making sure that all of these cases work would be really useful.
And cases from all of the 27 possibilities. At the very least, I think we should test the cases where the total number of images is less than the max, equal to the max, and greater than the max (which you already tested).
This last commit sets up a testing framework where you provide the initial number of generic, priority, and custom images, and also the expected final numbers. The test setup code
This test code is long and convoluted, but it seems like the right way to create and persist the test rubric that @arunasank suggested we try out in staging. Some of the tests pass, and others fail. What this means is that I still cannot easily answer the question "given ECR initial state X, what will be the cleanup outcome Y". I'm not comfortable running this out until it's easy to correctly answer that question.
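One possible reading of the 27 possibilities, which the thread does not spell out, is the three image categories (generic, priority, custom) each starting below, at, or above some limit. Under that assumption, the rubric could be enumerated with something like the following; `list_cases` and the below/at/above labels are hypothetical.

```shell
#!/usr/bin/env bash
# Print every combination of three image categories crossed with three
# starting levels (3 x 3 x 3 = 27). The levels are an assumed interpretation.
list_cases() {
  local generic priority custom
  for generic in below at above; do
    for priority in below at above; do
      for custom in below at above; do
        echo "generic=${generic} priority=${priority} custom=${custom}"
      done
    done
  done
}

list_cases
```

Each printed line could then be paired with an expected final count per category, which would give a direct way to state "given ECR initial state X, the cleanup outcome should be Y".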
Worked through my problems with the complex test suite and uncovered a couple of issues in our logic:
Generally speaking, I think we should consider abandoning this automated approach. We can ask support for limit increases on repositories that are filling up. We can also build better tooling for manual-mode selection of images to remove from a repository. Automation feels complex and error-prone, and the risk of removing an image that is in use is pretty serious.
@rclark @arunasank are we good to close this pull request in favor of rate limit increases and tools for manually deleting images?
I think I am 👍 on closing here, please close if you agree @arunasank.
Refs #68
This PR initializes an alternative way to clean up ECR images to keep the registry below 1,000 images on a continual basis.
I've added steps to scripts/send-job.sh that:
There are a few scripts/cleanup.js features that I haven't implemented in this PR:
@rclark, would love to chat about the original intent of each of these features so I can evaluate if they're worth persisting.