
add workflow to clean caches after PR gets closed #2450

Merged: 1 commit into invoke-ai:main on Feb 1, 2023

Conversation

@mauwii (Contributor) commented Jan 29, 2023

This helps at least a bit to get rid of all those huge caches

@mauwii mauwii enabled auto-merge January 29, 2023 01:26
@mauwii mauwii added the CI-CD Continuous integration / Continuous delivery label Jan 29, 2023
@keturn (Contributor) commented Jan 29, 2023

Do we need to manage this ourselves? Won't the cache manager do this itself as the old ones are no longer used?

@mauwii (Contributor, Author) commented Jan 29, 2023

> Do we need to manage this ourselves? Won't the cache manager do this itself as the old ones are no longer used?

Those links should give a good explanation:

https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#force-deleting-cache-entries

https://github.com/actions/cache/blob/main/tips-and-workarounds.md#force-deletion-of-caches-overriding-default-cache-eviction-policy

Personally, I would even disable caching of the Hugging Face models and just redownload them every time, since the cache is meant more to avoid spending compute on rebuilding artifacts over and over. But since caching those models was requested more than once, I thought it would be nice of us to delete unneeded caches early and thereby save GH some storage 😅

But after testing this cache-cleanup action successfully, I would leave it in place regardless of whether we disable caching of the HF models or not (since I also cache the pip packages now, which are indeed no longer needed after a PR gets merged) 🤓
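The workflow file merged in this PR is not reproduced in this thread, but the pattern from the linked actions/cache tips-and-workarounds page is roughly the following sketch (the `gh-actions-cache` extension commands and the `refs/pull/<n>/merge` ref come from that documentation, not from this PR):

```yaml
name: Cleanup caches after PR close
on:
  pull_request:
    types:
      - closed

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - name: Delete caches scoped to the closed PR's merge ref
        run: |
          gh extension install actions/gh-actions-cache

          # Collect all cache keys that were created for this PR's ref.
          cacheKeysForPR=$(gh actions-cache list -R "$REPO" -B "$BRANCH" | cut -f 1)

          # Don't fail the workflow if a cache was already evicted.
          set +e
          for cacheKey in $cacheKeysForPR; do
            gh actions-cache delete "$cacheKey" -R "$REPO" -B "$BRANCH" --confirm
          done
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          REPO: ${{ github.repository }}
          BRANCH: refs/pull/${{ github.event.pull_request.number }}/merge
```

The `pull_request: closed` trigger fires for both merged and abandoned PRs, which is exactly when the PR's caches become dead weight against the repository's 10 GB cache quota.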

@mauwii (Contributor, Author) commented Jan 29, 2023

@lstein @ebr @keturn @psychedelicious

What are your thoughts about caching the HF models? Leave them enabled, or remove them and save a lot of storage space?

@psychedelicious (Collaborator) commented

My understanding of GitHub Actions caching:

  • Caches removed after 7 days of no access
  • Caches are limited to a total of 10GB, and when we go over, GH evicts caches until we are under
  • Caches created by feature branches cannot be used elsewhere, but caches created by base branches (e.g. main) can be used by feature branches

For our purposes, caching will speed up our tests only by preventing the need for a large download as the test sets up.

So if I understand correctly, I'd say we make our caches on a base branch (main) so PRs can use those caches. Then let GH manage clearing caches as needed.
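As a sketch of that suggestion (the path and key names here are illustrative, not the project's actual configuration), a cache step shared between main and PR runs could look like this: a run on main writes the cache, and PR runs pick it up through `restore-keys`, since caches created on a base branch are visible to feature branches:

```yaml
# Hypothetical cache step; path and key names are illustrative only.
- name: Cache Hugging Face models
  uses: actions/cache@v3
  with:
    path: ~/.cache/huggingface
    # Exact key match when the (assumed) model config is unchanged.
    key: hf-models-${{ runner.os }}-${{ hashFiles('configs/models.yaml') }}
    # Fallback prefix match lets PR runs reuse a cache written on main.
    restore-keys: |
      hf-models-${{ runner.os }}-
```

A downside of this layout, touched on below, is that nothing guarantees the cache on main still exists when a PR run starts.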

@keturn (Contributor) commented Jan 30, 2023

> What are your thoughts about caching the HF-Models? Leave enabled or remove them and save a lot of storage space?

🤷 While working on the diffusers branch, I had them enabled for a while, because I thought having cache locality for the runners would be the polite thing to do for stuff that would otherwise incur tens of gigabytes of traffic upstream.

Then the runners would literally break at the end of the run while trying to put the cache together, because they didn't have enough disk space to hold all of that and build the zip archive of it (effectively requiring the runner to have twice as much available filesystem space), so we disabled them.

Then huggingface.co had a bad day, so someone said we needed to cache these things to provide a buffer against upstream server outages, so I think we added the caching back in again?

Honestly, I've lost track at this point.

Our integration tests seem too demanding for this low-tier GitHub infrastructure; we'd probably be better served by focusing on containerization and obtaining access to higher-spec hosts.

@mauwii (Contributor, Author) commented Jan 31, 2023

> My understanding of the Github actions caching:
>
>   • Caches removed after 7 days of no access
>   • Caches are limited to a total of 10GB, and when we go over, GH evicts caches until we are under
>   • Caches created by feature branches cannot be used elsewhere, but caches created by base branches (e.g. main) can be used by feature branches

I would say yes

> For our purposes, caching will speed up our tests only by preventing the need for a large download as the test sets up.

It can also be the opposite: when the cache somehow has a hiccup, restoring can suddenly take up to 60 minutes (at which point the default timeout triggers).

> So if I understand correctly, I'd say we make our caches on a base branch (main) so PRs can use those caches. Then let GH manage clearing caches as needed.

Unfortunately, it is not easy to guarantee that the cache on main is always there, and as soon as it isn't, the PR runners start creating this 7 GB cache over and over again.

But as @keturn mentioned, it also happened that huggingface was down and no downloads were working at all; then a cache was pretty handy. So I guess there is no single answer to this, which is why I asked for some opinions 🙈

But well, then I will leave them as they are for now. This PR still makes a lot of sense either way, since you know for sure that you won't need a cache for a closed PR anymore 😅

edit: Since it actually just happened again, here is an example of a cache that loads forever

@lstein (Collaborator) left a comment

Let's give this a try and see whether it helps or hurts in the long run.

@mauwii mauwii merged commit 31146eb into invoke-ai:main Feb 1, 2023
@mauwii mauwii deleted the add/ci/clean-caches branch February 2, 2023 00:46
Labels: CI-CD (Continuous integration / Continuous delivery)
4 participants