[nix-local-build] Garbage collecting the store #3333
This is a bit of an interesting problem. On the one hand, it's intractable to determine the GC roots, because
We should pick up the GC work that was done by the GSoC student (Vishal Agrawal) last summer. The approach to tracking roots that they came up with was essentially to register them centrally. I don't recall the exact details, but the gist was a central directory with symlinks to the locations of local build trees (or to ghc environment files specifying the required libs). Then on GC we scan those roots, and ignore (and delete) any stale, dangling symlinks.
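That scan-and-prune pass might look something like the following sketch (the directory layout and the `scanRoots` name are assumptions for illustration, not the actual GSoC code):

```haskell
import Control.Monad (forM)
import System.Directory  -- listDirectory, getSymbolicLinkTarget, ...
import System.FilePath ((</>))

-- | One GC pass over the central roots directory: follow each symlink,
-- keep the targets that still exist (these are the live roots), and
-- delete any dangling links along the way.
scanRoots :: FilePath -> IO [FilePath]
scanRoots rootsDir = do
  names <- listDirectory rootsDir
  fmap concat . forM names $ \name -> do
    let link = rootsDir </> name
    target <- getSymbolicLinkTarget link
    alive  <- doesPathExist target
    if alive
      then pure [target]            -- live root: its closure is retained
      else removeFile link >> pure []  -- stale registration: drop it
  ```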
Moved from "planned" to "stretch goals" in Last Mile for `cabal new-build` (HSOC2017), Aug 7, 2017.
@hvr and I had a chat about this a few days ago. Here's what we came up with:
This is the representation of the pinned packages:
```haskell
type PinnedPackages = Map UnitId PinState

data PinState
  = UsedBy [PinUse]
    -- ^ Some use of the package prevents it from being GCed.
    --   The list may be a Set instead.
  | Explicit
    -- ^ The user explicitly ran something like `cabal new-gc pin pkgid`.

instance Monoid PinState where
  -- the list is concatenated, and 'Explicit' is the absorbing element

data PinUse
  = Project FilePath
    -- ^ A project/package somewhere in the filesystem requires this
    --   package. The pinning is done when new-* is invoked in the
    --   project and cabal solves for a plan.
  | Installed FilePath
    -- ^ The exe/lib was new-installed.
```
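To make the "`Explicit` is absorbing" behaviour concrete, here is a runnable sketch of that monoid; the instance bodies are my reading of the comment above, not code from the proposal:

```haskell
data PinUse
  = Project FilePath    -- ^ a project on disk requires this package
  | Installed FilePath  -- ^ the exe/lib was new-installed
  deriving (Eq, Show)

data PinState
  = UsedBy [PinUse]  -- ^ these uses keep the package alive
  | Explicit         -- ^ the user pinned it explicitly
  deriving (Eq, Show)

instance Semigroup PinState where
  Explicit  <> _         = Explicit            -- 'Explicit' absorbs
  _         <> Explicit  = Explicit
  UsedBy xs <> UsedBy ys = UsedBy (xs ++ ys)   -- uses accumulate

instance Monoid PinState where
  mempty = UsedBy []  -- no uses at all: the package is collectable
```

With this, combining pin information from several sources is just `mconcat`.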
As suggested above, a run of
@hvr, please just edit this comment if I left anything out
@fgaz well, for executables you also need to take into account that an
For libraries explicitly installed via "new-install" it's not so clear to me what to use as the retainer entity (unless we install into a "package environment", but that's only supported with GHC 8.0.2 and later)
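For reference, a GHC package environment file (the GHC 8.0.2+ feature mentioned above) is a plain-text file of directives; the paths and package ids below are made up for illustration:

```
clear-package-db
global-package-db
package-db /home/user/.cabal/store/ghc-8.2.2/package.db
package-id base-4.10.1.0
package-id containers-0.5.10.2
```

Each `package-id` line names a unit that should be visible, which is exactly the kind of per-environment retainer information a GC could scan.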
I have started writing a
Now considering better ways of determining roots. First question: why not the following?

```haskell
data PinUse = Project FilePath | Installed FilePath | Explicit
```

But that is only a rather superficial change anyway, I guess.
Also, how do we determine from
More importantly: what about the case where the same project is tested with multiple GHC versions? At the moment, when switching compilers, the old
```haskell
type RootSources = Map FilePath (Map CompilerVersion [UnitId])
```
or something in a similar direction. Though this requires that projects actively update this central repository not only once to register, but almost whenever they create a new plan.
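A tiny runnable sketch of that direction (types simplified to strings, and `registerPlan` is a hypothetical helper, not cabal API):

```haskell
import qualified Data.Map.Strict as Map

-- Simplified stand-ins; cabal has proper types for these.
type CompilerVersion = String
type UnitId          = String

-- Roots registered per project, split by compiler version, as proposed.
type RootSources = Map.Map FilePath (Map.Map CompilerVersion [UnitId])

-- | Registering a new plan replaces only the entry for that compiler,
-- so units needed by the same project under another GHC stay rooted.
registerPlan
  :: FilePath -> CompilerVersion -> [UnitId] -> RootSources -> RootSources
registerPlan proj cv units =
  Map.insertWith Map.union proj (Map.singleton cv units)
  -- Map.union is left-biased: the new plan for 'cv' wins, other
  -- compiler versions registered for 'proj' are kept untouched.
```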
Yes, those are all good points. We already have a concept of separate directories for different configurations, e.g. with and without optimization. It might be good to cache plans separately for each configuration as well. See also #3343
I'd be happy to take any patch that makes your life easier on this front.
Not (yet) implemented
Sorry, but I won't be making PRs against cabal. That codebase intimidates me too much.
I have resolved most of the issues, although an important bit remains: pkgdbgc still does not track profiling, optimization level, or other flags, so there is a risk of, e.g., garbage-collecting the profiling-enabled dependencies because your last compile was with profiling disabled.
I am not entirely convinced that having multiple plan.jsons is a good idea. It seems somewhat likely that the user would end up accumulating several plans for various combinations of flags, which in turn effectively requires garbage-collecting outdated plans too. A lazier approach is to support specifying the build directory and to pass the responsibility to the user. But it is indeed rather lazy.
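One way to close the flag-tracking gap described above would be to key the registered roots by a build "flavor" as well; a hypothetical sketch (the `Flavor` type and `registerFlavored` are mine, not pkgdbgc's):

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical: the flags that select a distinct set of store units.
data Flavor = Flavor
  { profiling :: Bool
  , optLevel  :: Int
  } deriving (Eq, Ord, Show)

type UnitId = String

-- Roots keyed by (project, flavor): a profiling build no longer
-- overwrites the roots of the vanilla build of the same project,
-- so neither build's dependencies get collected by mistake.
type FlavoredRoots = Map.Map (FilePath, Flavor) [UnitId]

registerFlavored
  :: FilePath -> Flavor -> [UnitId] -> FlavoredRoots -> FlavoredRoots
registerFlavored proj fl units = Map.insert (proj, fl) units
```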