[do design for] garbage collection in a floxy way. #1054

Closed
stahnma opened this issue Feb 20, 2024 · 16 comments
Assignees
Labels
enhancement Improvement to existing functionality product Tickets relevant to the flox product team and/or functional requirements team-cli Tickets relevant to the flox CLI team

Comments

@stahnma
Contributor

stahnma commented Feb 20, 2024

Describe the feature:

We need a way for users to prune stuff in the nix store that isn't "Invoke this nix thing". We need a way that feels native to flox.

In an ideal world, we probably have a manual command to be run for immediate cleanup and something that does stuff periodically so that disk space isn't completely eaten.

Describe a use case of this feature:

As a user,
I want to not have a full disk because of flox
so that I can use my computer to do other stuff.

rm -rf .flox for managed environments leaves orphans that don't get garbage collected

Acceptance criteria:

  1. There is a "do it now" method of cleaning up disk
  2. Optional: There is some batch/periodic processing to clean things up.
  3. There are docs and tests.
@stahnma stahnma added team-cli Tickets relevant to the flox CLI team product Tickets relevant to the flox product team and/or functional requirements labels Feb 20, 2024
@rossturk
Contributor

rossturk commented Feb 20, 2024

In the docs we might want to do a bit of work to explain what GC actually prunes - i.e., if there is an existing .flox/run directory, it will have a corresponding gcroots symlink and will therefore be protected from GC.

Ideally users would have a way to "list all of the environments on this machine that are taking up space in the store": basically an alias to ls /nix/var/nix/gcroots/auto, perhaps with some additional info about each environment's footprint to help a user decide what to do about it. It would be nice if a user could conclude "I'm not going to use X environment for a while but I still need it around, so flox can go ahead and clear its cache".
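A rough sketch of what such a listing could do under the hood, assuming the entries in the gcroots directory are symlinks whose targets disappear when an environment is deleted. The function name `list_gc_roots` is hypothetical and not part of flox; the footprint of each root is not computed here.

```python
import os
from pathlib import Path

# Directory named in the comment above; passed as a parameter so the
# sketch can be pointed at a scratch directory instead.
GCROOTS_AUTO = Path("/nix/var/nix/gcroots/auto")


def list_gc_roots(gcroots_dir=GCROOTS_AUTO):
    """Return (link_name, target, alive) for every symlink in gcroots_dir.

    `alive` is False when the link's target no longer exists, e.g. a root
    left behind by an environment that was deleted by hand.
    """
    roots = []
    for entry in sorted(Path(gcroots_dir).iterdir()):
        if not entry.is_symlink():
            continue
        target = os.readlink(entry)
        # os.path.exists follows symlinks, so a dangling chain reports False.
        roots.append((entry.name, target, os.path.exists(entry)))
    return roots
```

Roots reported as not alive are the ones a GC run would no longer be held back by.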

@ghudgins
Contributor

related issue https://github.com/flox/product/issues/471

@ghudgins ghudgins changed the title Need garbage collection in a floxy way. [do design for] garbage collection in a floxy way. Mar 11, 2024
@ysndr
Contributor

ysndr commented Mar 19, 2024

rm -rf .flox for managed environments leaves orphans that don't get garbage collected

detecting these overlaps with #934

@ghudgins ghudgins added greenkeeping enhancement Improvement to existing functionality and removed greenkeeping labels Mar 27, 2024
@zmitchell zmitchell self-assigned this Apr 1, 2024
@zmitchell
Contributor

Regarding design, I think it goes like this:

  • We add a new command: flox gc, flox prune, flox collect-garbage, whatever you want to call it. I'll use flox gc for the moment.
  • When a user runs this command:
    • Look in the ~/.local/share/flox/links directory, which contains reverse links for all the managed environments on the system.
      • For each symlink see if the target still exists
      • If the target doesn't exist, delete the symlink
      • Print a message saying that we've cleaned up "metadata" for an environment that used to be at <path>
    • Run nix-collect-garbage
      • Use a spinner that explains this could take a minute
      • When we're done, print how much space was freed (this information is printed by nix-collect-garbage)
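The link-cleanup half of this list could be sketched roughly as follows. This is only an illustration of the idea, not flox's implementation: the function names are hypothetical, the links directory is passed in as a parameter, and the check is a plain "does the target still exist" test.

```python
import os
from pathlib import Path


def prune_stale_links(links_dir):
    """Delete symlinks in links_dir whose targets are gone.

    Returns the paths the removed links used to point at, so the caller
    can tell the user which environments were cleaned up.
    """
    removed = []
    for link in sorted(Path(links_dir).iterdir()):
        if not link.is_symlink():
            continue
        target = os.readlink(link)
        if not os.path.exists(link):  # follows the link; False if dangling
            link.unlink()
            removed.append(target)
    return removed


def format_report(removed):
    """Build the 'Removed metadata ...' message, or '' if nothing was removed."""
    if not removed:
        return ""
    header = "Removed metadata for environments deleted from:"
    return "\n".join([header, *removed])
```

A caller would print `format_report(prune_stale_links(links_dir))` before starting the store garbage collection.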

The output would look like this:

$ flox gc
Removed metadata for environments deleted from:
<path>
<path>
<path>
<spinner> Deleting unused packages

then when the garbage collection is actually done we'll get rid of the spinner and show how much space was freed:

$ flox gc
Removed metadata for environments deleted from:
<path>
<path>
<path>
Removed unused packages: 38462.93 MiB freed
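For the final line, the freed amount would have to be pulled out of the garbage collector's output. A minimal sketch, assuming the summary line has the shape `<N> store paths deleted, <size> freed`; the exact wording may differ between Nix versions, and the helper names here are hypothetical.

```python
import re

# Assumed shape of the summary line; adjust if the Nix version in use
# prints something different.
_SUMMARY = re.compile(r"(\d+) store paths deleted, ([\d.]+ \w+) freed")


def freed_space(gc_output):
    """Return the freed size (e.g. '38462.93 MiB') or None if not found."""
    matches = _SUMMARY.findall(gc_output)
    # Use the last match in case the text contains several summaries.
    return matches[-1][1] if matches else None


def summary_line(gc_output):
    """Build the message shown once the spinner is removed."""
    freed = freed_space(gc_output)
    if freed is None:
        return "Removed unused packages"
    return f"Removed unused packages: {freed} freed"
```

Falling back to a message without a size keeps the command usable if the output format changes.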

Docs

  • Add a new page in the command reference
    • Explain what exactly it will delete (packages that aren't used by any environment)

Questions

  • Offhand I can't remember if remote environments also store links in that same links directory. I'll do homework and update this design once I find out.
  • If you pull an environment into myproject, manually delete the .flox directory, then pull another environment into myproject, I think you'll end up with two symlinks that point to the same directory. The symlinks will have different names since I think the environment name is encoded into that symlink, so you can tell the symlinks apart. But this means that you need a check a little more involved than "does .flox exist". I think that can be amended to "does .flox exist, and if so, is it the correct environment".

@ysndr
Contributor

ysndr commented Apr 2, 2024

Offhand I can't remember if remote environments also store links in that same links directory. I'll do homework and update this design once I find out.

They do; links to the wrapped managed env are added there during construction of those envs.
We'll have a slightly broader view once the registration of envs drops out of #1264, in whatever form that takes.

If you pull an environment into myproject, manually delete the .flox directory, then pull another environment into myproject, I think you'll end up with two symlinks that point to the same directory. The symlinks will have different names since I think the environment name is encoded into that symlink

the symlink name does not include the environment name


Nix's garbage collection does not currently allow the removal of specific gc-roots; at least, the last time I tried there were issues with doing this granularly. I think we could come up with something in pkgdb that is a bit more targeted than nix gc'ing everything, though that can probably wait until we support building. Until then it is just annoying for nix users...

@ghudgins
Contributor

ghudgins commented Apr 2, 2024

side question: do you have any ideas for automatically triggering this functionality. could that be async?

@tomberek
Contributor

tomberek commented Apr 2, 2024

some thoughts:

  • nix store gc will be non-blocking, can be async
  • with store settings min/max can allow this to be done by the daemon without ever needing user interaction and runs async (@ghudgins )
  • might want to ensure things like nixpkgs, inputs, registry paths, are also protected from GC.
  • targeted GC would be nice, but is a refinement for the above (note: this feature was recently talked about on the Nix team, so upstream'ing is a possibility)
  • there is also https://github.com/risicle/nix-heuristic-gc to take a look at from a design perspective

@mkenigs
Contributor

mkenigs commented Apr 2, 2024

  * Run `nix-collect-garbage`
  • might want to ensure things like nixpkgs, inputs, registry paths, are also protected from GC.

Just a note: this may also delete nixpkgs source trees which we have to re-download for pkgdb operations. If so it probably means we're doing an eval somewhere we don't need to.

@zmitchell
Copy link
Contributor

zmitchell commented Apr 2, 2024

side question: do you have any ideas for automatically triggering this functionality. could that be async?

Yeah, Tom pointed out a couple of options, but I would make this configurable. For instance, nix-collect-garbage does use some CPU resources (chonker's CPU was at 30-40% while running GC), so you may not want that running at an inopportune time.

If you make it configurable you either need to make it something that we trigger, or in order to set it you would have the flox CLI set something in nix.conf, meaning that it won't show up in the manifest without some tortured workarounds. For that reason I wouldn't use the daemon options.

Instead I would do this:

  • Make a config option auto-gc defaulted to false
  • When auto-gc is true check when the last time GC was run (you can just store a datetime somewhere in ~/.cache/flox or something). If we haven't run GC in X days, kick off an async GC using nix store gc. Store the GC date in the file. Also run flox gc.
  • When auto-gc is false, check whether we've run GC in the last X days and print a suggestion to run GC.
  • When the user runs flox gc call nix-collect-garbage so they can see how much space they freed. Update the GC time in the file.

Something that falls out of the above is we probably want a flag to only delete the stale environments instead of running the full GC. That will be useful in the auto-gc case above when we've already run nix store gc.
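The decision logic in the bullets above could look something like this. It is a sketch under assumptions: the function names, the timestamp file format (an ISO 8601 datetime), and treating "never run" as overdue are all illustrative choices, not settled design.

```python
from datetime import datetime, timedelta
from pathlib import Path


def read_last_gc(stamp_file):
    """Return the stored GC datetime, or None if missing or unreadable."""
    try:
        return datetime.fromisoformat(Path(stamp_file).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def write_last_gc(stamp_file, when):
    """Record when GC last ran (e.g. in a file under ~/.cache/flox)."""
    path = Path(stamp_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(when.isoformat())


def gc_action(auto_gc, last_gc, now, max_age_days):
    """Decide what to do on a CLI invocation.

    Returns "run" (kick off an async GC), "suggest" (print a hint to run
    GC manually), or "none" (GC ran recently enough).
    """
    # Assumption: if GC has never been recorded, treat it as overdue.
    due = last_gc is None or now - last_gc >= timedelta(days=max_age_days)
    if not due:
        return "none"
    return "run" if auto_gc else "suggest"
```

Keeping the decision a pure function of the config value, the stored date, and the current time makes the `X` days threshold easy to test without touching the store.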

@mkenigs
Contributor

mkenigs commented Apr 2, 2024

* When `auto-gc` is `true` check when the last time GC was run (you can just store a datetime somewhere in `~/.cache/flox` or something). If we haven't run GC in `X` days, kick off an async GC using `nix store gc`. Store the GC date in the file.

That would also do flox gc, right?

@zmitchell
Contributor

That would also do flox gc, right?

Yes, updated the comment.

I think for a first pass we can just do the manual GC command in one ticket followed by a ticket for auto-gc as an enhancement whenever we prioritize it.

@mkenigs
Contributor

mkenigs commented Apr 2, 2024

I think for a first pass we can just do the manual GC command in one ticket followed by a ticket for auto-gc as an enhancement whenever we prioritize it.

🚢
Most of the work will be the flox gc component which we'll need to do no matter what

@ghudgins
Contributor

ghudgins commented Apr 2, 2024

all the managed environments on the system.

Can this be amended to exclude the current revision of any remote environments (just clean up older versions, but not my default env that I activate with -r)? It would be nice to drop stale managed env generations when this is run.

I think for a first pass we can just do the manual GC command in one ticket followed by a ticket for auto-gc as an enhancement whenever we prioritize it.

Agree, we should move this and the other proposed enhancements into issues once we have the first one. There are other things in this thread we can do to make the experience really nice. We can create those enhancements once we have the base implementation in our hands, and review this full issue before closing.

@limeytexan
Contributor

limeytexan commented Apr 2, 2024

... it would be nice to drop stale managed env generations when this is run

This might be scope for a different issue entirely, but it might also be nice to delete the gcroots for stale/previous managed env generations automatically without requiring the user to run flox gc in the first place. The only reason to retain gcroots for previous generations is to facilitate rapid rollbacks, and the likelihood of wanting to roll back to a previous generation diminishes over time, and not everyone will share the same idea of what that retention period should be.
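One way to picture a configurable retention period: given the creation time of each generation, pick the ones older than the cutoff while always keeping the current generation. Everything here is hypothetical for illustration, including the function name and the idea that generations are identified by a number with a creation datetime.

```python
from datetime import datetime, timedelta


def stale_generations(generations, current, now, retention_days):
    """Return generation numbers whose gcroots could be dropped.

    `generations` maps a generation number to its creation datetime.
    The current generation is never returned, regardless of its age.
    """
    cutoff = now - timedelta(days=retention_days)
    return sorted(
        gen
        for gen, created in generations.items()
        if gen != current and created < cutoff
    )
```

Making `retention_days` a user setting would address the point that people disagree on how long rollbacks should stay cheap.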

@stahnma
Contributor Author

stahnma commented Apr 10, 2024

@zmitchell please add link.

@zmitchell
Contributor

The design is at the bottom of the description for the environment registry ticket: #1287


8 participants