Make periodic cache cleanup retention times configurable #7018

Closed
wszeboreq opened this issue Oct 4, 2018 · 34 comments
@wszeboreq

Expected Behavior

Periodic Gradle cache cleanup was implemented in #1085, and comment #1085 (comment) summarizes the implemented strategy. It would be great to make the hardcoded cleanup parameters configurable, at least the 7/30-day retention times. This would add flexibility and allow tuning the periodic cache cleanup to fit different scenarios and environments.

Current Behavior

Currently, the 7/30-day retention times described in that comment are not configurable.

Context

Our projects use some dependency artifacts with significant disk sizes (e.g. 1 GB). Practically every day we get a new version of the majority of these dependencies. This makes '.gradle/caches/modules-2/files-2.1' grow bigger each day, leading to free disk space problems on the build servers. With a 30-day cache retention time, this scenario may require 30 GB of free cache disk space for each of the bigger dependencies.
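
The 30 GB figure above is simple multiplication, and the same arithmetic shows what a shorter retention time would save. A minimal sketch, assuming one new version per day as described; the function name is made up for illustration and is not part of Gradle:

```python
def worst_case_cache_gb(artifact_gb, new_versions_per_day, retention_days):
    """Upper bound on disk space held by the retained versions of one dependency."""
    return artifact_gb * new_versions_per_day * retention_days


# A 1 GB artifact with one new version per day, as in the scenario above.
print(worst_case_cache_gb(1, 1, 30))  # 30-day retention
print(worst_case_cache_gb(1, 1, 7))   # 7-day retention
```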

Your Environment

------------------------------------------------------------
Gradle 4.10.1
------------------------------------------------------------

Build time:   2018-09-12 11:33:27 UTC
Revision:     76c9179ea9bddc32810f9125ad97c3315c544919

Kotlin DSL:   1.0-rc-6
Kotlin:       1.2.61
Groovy:       2.4.15
Ant:          Apache Ant(TM) version 1.9.11 compiled on March 23 2018
JVM:          1.8.0_151 (Oracle Corporation 25.151-b12)
OS:           Windows 10 10.0 amd64
@oehme oehme added a:feature A new functionality from:contributor labels Oct 4, 2018
@marcphilipp marcphilipp self-assigned this Oct 4, 2018
@marcphilipp marcphilipp added this to the 5.1 RC1 milestone Oct 16, 2018
@Barteks2x

Would it also be possible to give an option to completely disable it? On a development machine where space usage is not an issue, I don't want to wait for Gradle to redownload things just because I didn't actually use Gradle in a week.

@marcphilipp
Contributor

@Barteks2x That's part of #6084.

@marcphilipp marcphilipp modified the milestones: 5.1 RC1, 5.2 RC1 Dec 4, 2018
@lukeu
Contributor

lukeu commented Dec 22, 2018

I praise the auto-cleanup feature, but I work on various projects (like open source, or in-house utils) intermittently. I might dip in only after 3, 6 or 12 months to tweak something. I might be offline / travelling when I do so, and suddenly I find I can't tweak something simple [Update: or even just launch the existing app from my IDE] because my day-to-day work has cleaned out required dependencies.

One thing that has hit me is that I sometimes customise Eclipse's classpath (as a pragmatic workaround, or for knocking up small utilities quickly) by adding .jars directly from Gradle's cache. I don't mind updating these occasionally (as new versions come in, old versions go away), but it would be good if I could extend the timeout so that it doesn't happen too often.

Finally, I frequently git-bisect way back in history: multiple years is not uncommon.

Disc space is a non-issue for me. I would use settings like 90/365.

@lukeu
Contributor

lukeu commented Dec 22, 2018

Just curious about a workaround. (My main projects will likely stay on 4.10 for some time longer. And hmm.... if any side-project happens to run 4.10 I guess it would zap the cache too.)

Maybe a cron job? Would it be sufficient to, say, run gradlew tasks on all the projects whose caches I want to keep from being flushed? Or would I actually have to build them?

@marcphilipp
Contributor

gradlew tasks would not be sufficient, you'd have to build them in order to keep the dependencies in the cache.

@Clintonio

Just throwing in support for this issue. It will be very useful for CI agents to be able to decrease the time period for a company that has a wide range of different builds with different dependency sets but limited disc space on CI agents.

@big-guy big-guy modified the milestones: 5.2 RC1, 5.3 RC1 Jan 22, 2019
@big-guy big-guy added the @core Issue owned by GBT Core label Jan 31, 2019
@big-guy big-guy modified the milestones: 5.3 RC1, 5.4 RC1 Jan 31, 2019
@big-guy big-guy modified the milestones: 5.4 RC1, 6.0 RC1 Mar 4, 2019
@Mahoney
Contributor

Mahoney commented Mar 18, 2019

Wouldn’t it make sense to be able to set a maximum cache size, and only use LRU eviction until the cache is below that limit?
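
A size cap with least-recently-used eviction, as suggested here, could be sketched roughly as follows. This is an illustration only, not how Gradle's cleanup works; `CacheEntry`, `select_lru_evictions` and the byte counts are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEntry:
    path: str          # hypothetical cache-relative path
    size_bytes: int
    last_used: float   # time of the most recent access, seconds since epoch


def select_lru_evictions(entries, max_total_bytes):
    """Return the entries to delete, least recently used first,
    so that the remaining entries fit within max_total_bytes."""
    total = sum(e.size_bytes for e in entries)
    evicted = []
    # Oldest access time first: these are the least recently used entries.
    for entry in sorted(entries, key=lambda e: e.last_used):
        if total <= max_total_bytes:
            break
        evicted.append(entry)
        total -= entry.size_bytes
    return evicted


entries = [
    CacheEntry("a.jar", 400, last_used=100.0),
    CacheEntry("b.jar", 300, last_used=300.0),
    CacheEntry("c.jar", 500, last_used=200.0),
]
for e in select_lru_evictions(entries, max_total_bytes=800):
    print("would delete", e.path)
```

Nothing is evicted while the cache is under the cap, so rarely used dependencies survive as long as there is room for them.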

@lukeu
Contributor

lukeu commented Mar 18, 2019

For the benefit of others watching / finding this, it looks like we can disable it completely with:

org.gradle.cache.cleanup=false

in the gradle.properties. See a7257ab for #6371 (Oct 2018, since v5.0)

Would be great to document this. (I'd been keeping an eye out for such a fix, but I missed it because the discussion here indicated it wasn't done & it appeared under "external contributions" in the 5.0 release notes, rather than closed-issues.)

@marcphilipp
Contributor

@lukeu It's not yet documented because it only works when specified in ~/.gradle/gradle.properties.

@wszeboreq
Author

Any chance this will be implemented in an upcoming release? We are still very interested in being able to configure the retention times, even via some unofficial properties.

@big-guy big-guy modified the milestones: 6.0 RC1, 6.2 RC1 Sep 18, 2019
@big-guy big-guy removed this from the 6.2 RC1 milestone Jan 8, 2020
@skjolber

skjolber commented Apr 7, 2020

@marcphilipp @big-guy Any hope for this? See also #9841.

@marcphilipp marcphilipp removed their assignment Apr 8, 2020
@sirinath

Another strategy would be to look at the latest version of each cached dependency and resolve its dependencies within the cache. Once this is done, delete the rest of the versions.

If there are group and name changes, some old artefacts may remain. When an artefact is the latest version but is not transitively referenced by a recent version of another project, it can be cleaned up in least recently used order. Once these are deleted, the newest artefacts transitively referenced by other newest artefacts can be deleted in least recently used order. This also deals with group and name changes, because those artefacts become stale once the newer version with a new group or name is used.

This will reduce the chance of cache misses, as what gets deleted is most likely old stuff.

@DanySK

DanySK commented Jul 1, 2020

I believe there is a difference in the behavior we desire in these two conditions:

  • Development environments -- users typically have multiple projects, and their dependency sets overlap. For these cases, the current strategy (possibly with configuration options) works well: if something is unused for long enough, it gets cleared
  • CI environments -- here, typically, every environment deals with a single build. There is no overlap with other projects, so the dependencies of the build at hand are all that needs caching. For this kind of environment, it would make sense to provide a task or a CLI option clearing:
    • non-current wrapper versions
    • jars which are no longer part of the dependencies
    • any file that may trigger a cache repack (timestamps, logs, etc.)

@skjolber

skjolber commented Jul 1, 2020

@ljacomet did the internal chat produce a conclusion? Is this feature interfering with your enterprise offering?

@sirinath

sirinath commented Jul 1, 2020

A purely oldest-first strategy may remove dependencies of less frequently updated projects. If you have many such projects, you could end up using bandwidth to re-download these.

You can have many projects which you maintain and contribute to, but not frequently. So some strategy to determine what is and is not needed may have better success in choosing what is removed.

@Fentonator

Oddly enough, neither of those options would actually work very well for me, at least at work: we have to retain multiple different versions of the same library for different builds, primarily because some of them are dependencies of third-party libraries that are only certified against a very specific set, but only matter within that specific build.

Additionally, we manage multiple active releases of our various products, some of them going back several years, but some of us also regularly work offline, and just opening a workspace does not appear to be sufficient to cause Gradle to consider something 'touched'. Since dev environments generally don't run Gradle directly, this means that going on vacation to somewhere without good network access presents a real risk of suddenly losing critical libraries.

Really, this seems like a classic case of a problem where there simply isn't going to be a single "right" answer. Good defaults, absolutely, but the ability to tune it (including all the way to "don't remove anything automatically, let me be responsible for me") really does matter.

@sirinath

sirinath commented Jul 11, 2020

If there is a way to configure active projects and versions in a list, then the dependencies of these projects will be retained. The dependencies of the remaining projects will be removed based on which were least recently built, retaining common dependencies used in more recent projects. In doing so, the most recent versions of the artefacts should have a higher retention priority in case new projects need them, but should eventually be purged in least frequently used order once all old versions are purged.

So in cleaning the cache, what should be looked at is not the dependencies themselves but the projects that have been built which use them.

To summarise, prioritisation will be based on:

  • configuration on retention and clearing information
  • when the projects were last built
  • dependencies of the built projects
  • the latest versions of the artefacts

@lukeu
Contributor

lukeu commented Jul 11, 2020

@sirinath, I'm not sure I see how your ideas are substantially different from what Gradle already does. It may have merit but I think the issue to hand is really one of ergonomics - smoothly supporting a range of use-cases:

  • at one extreme: CI build farms with space-constrained VMs
  • at the other: git-bisecting back a year in "Project A" on a laptop on an airplane and have each build succeed (which we often could before); and also without clobbering Project B's dependencies (which isn't built often, but is still run while on said airplane).

In this case I suspect that adding more smarts will still likely fail to meet common scenarios. Really we just need some knobs: letting expiry times be adjustable and/or a cache size cap.
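
The "adjustable expiry time" knob can be illustrated with a small sketch in which the retention period is a parameter rather than a constant. It does not reflect Gradle's actual implementation; `expired_entries` and the sample paths are hypothetical, and the day counts are the ones mentioned in this thread (7, 30, 365).

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def expired_entries(last_used_by_path, retention_days, now=None):
    """Return the sorted paths whose last use is older than retention_days."""
    if now is None:
        now = time.time()
    cutoff = now - retention_days * SECONDS_PER_DAY
    return sorted(p for p, used in last_used_by_path.items() if used < cutoff)


now = 100 * SECONDS_PER_DAY  # fixed reference time so the example is reproducible
last_used = {
    "project-a/lib.jar": 95 * SECONDS_PER_DAY,  # used 5 days ago
    "project-b/lib.jar": 80 * SECONDS_PER_DAY,  # used 20 days ago
    "project-c/lib.jar": 5 * SECONDS_PER_DAY,   # used 95 days ago
}
for days in (7, 30, 365):
    print(days, expired_entries(last_used, days, now=now))
```

A space-constrained CI agent could pick the short end of the range, while a laptop used for bisecting old history could pick the long end.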

@skjolber

skjolber commented Aug 25, 2020

For reference, I have implemented a working PoC for using the journal to clear dependency (jar) files. It runs as part of the cleanup step in a CircleCI orb, improving the build time by reducing dependency downloads. We'll be running it for a few weeks to see if something breaks.

Looking at the code, it became clear that configuration of the cleanup times is just not sufficient (and seems to clean up only outdated gradle versions - I was never able to make the cleanup actually delete any files).

@NHendricks

Is there a chance to configure the cleanup behavior in the near future? A switch to optionally deactivate it would help as a short-term solution.
Especially when working with different Gradle versions in different projects, it leads to a lot of network traffic and waiting time because the cleanup wipes a lot of files I have to reload again later. As @lukeu stated last year, disc space should not be an issue in most cases.

@jjohannes jjohannes removed the @core Issue owned by GBT Core label Mar 22, 2021
@gchallen

gchallen commented Aug 5, 2021

I'm new to this issue, but a bit confused about the status. Using Gradle 6.8.3, setting org.gradle.cache.cleanup=false in $GRADLE_USER_HOME/gradle.properties does not fix the issue. In our case we have essentially a CI build running in a container without network access, and so it starts to fail 30 days later when Gradle starts pruning the cache. I really need a way to disable this behavior.

One terrible but working workaround is to just touch every file in the Gradle cache to update the timestamps. Unfortunately this is also, in our case, fairly slow, and frustratingly stupid.
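
For anyone who wants to script the touch workaround, here is a rough sketch using only the Python standard library. `touch_tree` is a made-up helper name, and whether refreshed file timestamps are enough to keep entries in every cache directory is taken from the comment above, not verified against Gradle's code.

```python
import os
import time
from pathlib import Path


def touch_tree(root, timestamp=None):
    """Set the access and modification time of every regular file under root.

    Returns the number of files touched.
    """
    if timestamp is None:
        timestamp = time.time()
    count = 0
    for path in Path(root).rglob("*"):
        if path.is_file():
            os.utime(path, (timestamp, timestamp))
            count += 1
    return count


if __name__ == "__main__":
    # Assumes the default Gradle user home; adjust if GRADLE_USER_HOME is set.
    caches = Path.home() / ".gradle" / "caches"
    if caches.is_dir():
        print("touched", touch_tree(caches), "files")
```

On a large cache this walks every file, which matches the "fairly slow" experience described above.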

@naqaden

naqaden commented Aug 5, 2021

After wrestling with this problem for weeks, I also resorted to touch, automated daily when I'm not at my machine. I have to use old versions of Gradle for some projects so this is the only catch-all solution I've found.

@bigdaz
Member

bigdaz commented Aug 18, 2021

@gchallen I've been looking into this and can clarify the current behaviour. The org.gradle.cache.cleanup=false property is only used to control the cleanup of version-specific caches in the Gradle user home directory. If my understanding is correct, this means that setting org.gradle.cache.cleanup=false has no impact on the cleanup of cross-version caches like caches/modules-2 and caches/build-cache-1.

@gchallen

gchallen commented Oct 3, 2021

I'm disappointed to see this still open. In the meantime, I'm trying to touch the dependencies into the future and I'll see if that works. But an option to disable this entirely would still be great!

@skjolber

Here is an updated PoC library for clearing the cache - this time using regular Java instead of a build script. See it in action here: tidy-cache-github-action.

@big-guy big-guy added this to the 8.0 RC1 milestone Dec 2, 2022
@big-guy big-guy modified the milestones: 8.0 RC1, 8.1 RC1 Dec 20, 2022
@big-guy
Member

big-guy commented Dec 20, 2022

This is partially delivered in 8.0

@BoD

BoD commented Feb 5, 2023

@big-guy Mind clarifying what "partially delivered" precisely means please? (Link to documentation or code would be appreciated). Thanks!

Edit: seems to be #22756

@ghale
Member

ghale commented Feb 6, 2023

The ability to configure these values will be available in 8.0. See "Improved Gradle User Home Cache Cleanup" in the 8.0 release notes and the section on configuring cache cleanup in the userguide docs. There may be some further changes in 8.1, but what's available in 8.0 should handle most use cases.
