add workflow to clean caches after PR gets closed #2450
Conversation
Do we need to manage this ourselves? Won't the cache manager do this itself as the old ones are no longer used?
Those links should give a good explanation: Personally I would even disable the caching of the Huggingface models and just redownload them every time, since the cache is meant more to avoid spending compute power on building artifacts over and over. But since it was asked more than once to cache those models, I thought it would be nice of us to delete unneeded caches early and thereby save GH some storage 😅 But after having tested this cache cleanup action successfully now, I would leave it in place regardless of whether we disable caching of the HF models or not (since I also cache the pip packages now, which are indeed not needed anymore after a PR gets merged) 🤓
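(The workflow file itself isn't quoted in this thread. A minimal sketch of such a cleanup job, following the pattern GitHub documents using the `actions/gh-actions-cache` CLI extension, could look like the following; the workflow name and step details are assumptions, not necessarily what this PR adds.)

```yaml
# Sketch only -- the real workflow in this PR may differ.
name: Cleanup caches of closed PRs

on:
  pull_request:
    types: [closed]

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - name: Delete caches scoped to the closed PR
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          REPO: ${{ github.repository }}
          BRANCH: refs/pull/${{ github.event.pull_request.number }}/merge
        run: |
          gh extension install actions/gh-actions-cache

          # List all cache keys created for this PR's merge ref
          cacheKeys=$(gh actions-cache list -R "$REPO" -B "$BRANCH" | cut -f 1)

          # Don't fail the job if a key was already evicted
          set +e
          for key in $cacheKeys; do
            gh actions-cache delete "$key" -R "$REPO" -B "$BRANCH" --confirm
          done
```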
@lstein @ebr @keturn @psychedelicious What are your thoughts about caching the HF-Models? Leave enabled or remove them and save a lot of storage space?
My understanding of the GitHub Actions caching:
For our purposes, caching speeds up our tests only by avoiding a large download during test setup. So if I understand correctly, I'd say we create our caches on a base branch (main) so PRs can use those caches, then let GH manage clearing caches as needed. A sketch of that approach is below.
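(As a sketch of that approach; the paths and key names here are assumptions for illustration, not the repo's actual configuration. A cache step like this, run on main, produces an entry that PR branches can restore, since caches created on the base branch are visible to PRs, and `restore-keys` allows a partial match when no exact key exists.)

```yaml
# Assumed paths/keys for illustration; not the repo's actual workflow.
- name: Cache pip downloads
  uses: actions/cache@v3
  with:
    path: ~/.cache/pip
    key: pip-${{ runner.os }}-${{ hashFiles('**/requirements*.txt') }}
    # Fall back to the newest matching cache from main when no exact match exists
    restore-keys: |
      pip-${{ runner.os }}-
```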
🤷 While working on the diffusers branch, I had them enabled for a while, because I thought having cache locality for the runners would be the polite thing to do for stuff that would otherwise incur tens of gigabytes of upstream traffic. Then the runners would literally break at the end of the run while trying to put the cache together, because they didn't have enough disk space to hold all of that and build the archive zip of it at the end (effectively requiring the runner to have twice as much available filesystem space), so we disabled them. Then huggingface.co had a bad day, so someone said we needed to cache these things to provide a buffer against upstream server outages, so I think we added the caching back in again? Honestly, I've lost track at this point. Our integration tests seem too demanding for this low-tier GitHub infrastructure, and we'd probably be better served by focusing on containerization and obtaining access to higher-spec hosts.
I would say yes
It can also be the opposite: when the cache somehow has a hiccup, restoring can suddenly take up to 60 minutes (at which point the default timeout triggers).
Unfortunately it is not easy to make sure that the cache on main is always there, and as soon as it isn't, the runners for PRs start to create this 7GB cache over and over again. But as @keturn mentioned, it also happened that huggingface was down and no downloads were working at all; then a cache was pretty handy. So I guess there is no single answer to this, which is why I asked for some opinions 🙈 But well, then I will leave them as they are for now, but then this PR makes a lot of sense, since you know for sure that you won't need a cache for a closed PR anymore 😅
edit: Since it actually just happened again, here is an example of a cache which loads forever
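(One possible mitigation for the hanging-restore case, offered as a sketch: a step-level `timeout-minutes` on the cache step makes a stuck restore fail fast instead of running into the 60-minute default. The step name, cached path, and timeout value here are assumptions.)

```yaml
# Sketch: fail a stuck cache restore quickly instead of waiting ~60 minutes.
- name: Cache HF models
  uses: actions/cache@v3
  timeout-minutes: 10   # step-level timeout; assumed value
  with:
    path: ~/.cache/huggingface   # assumed cache location
    key: hf-models-${{ runner.os }}-${{ hashFiles('configs/models.yaml') }}
```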
Force-pushed from 0694758 to 229514a
Let's give this a try and see whether it helps or hurts in the long run.
Force-pushed from 6ce75bd to 8dfe8fd
Force-pushed from 8dfe8fd to 99cd598
This helps at least a bit to get rid of all those huge caches.