Improvements for limactl prune, to not delete everything #1410

Open
afbjorklund opened this issue Mar 8, 2023 · 16 comments

Comments

@afbjorklund
Member

afbjorklund commented Mar 8, 2023

Description

It can be good to keep some images around, for instance:

  • those that are "in use" by a currently running instance, even though the actual files have been copied from the cache

  • those that are referenced by the current version of a template, while pruning the old and obsolete earlier versions

When not removing the entire cache, maybe also remove:

  • temporary files (data.tmp) left by previous aborted downloads

  • obsolete versions of nerdctl-full, or other cached images/packages
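
A minimal sketch of how the leftover temporary files could be located, assuming aborted downloads leave files whose names start with data.tmp inside each by-url-sha256/<hash>/ directory. The function name list_tmp_files is made up for this sketch, and it only lists files; it deletes nothing.

```shell
#!/bin/sh
# List leftover temporary download files under a download directory.
# Assumption: temp files are named data.tmp* and sit next to the "data" file.
list_tmp_files() {
  download_dir=$1
  # A missing cache directory simply means there is nothing to report.
  [ -d "$download_dir" ] || return 0
  find "$download_dir" -type f -name 'data.tmp*' -print
}
```

It could be called as list_tmp_files "${XDG_CACHE_HOME:-$HOME/.cache}/lima/download/by-url-sha256", and the output reviewed before removing anything.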


Partial implementations:

@afbjorklund
Member Author

Here is my script to check what I currently have in the cache:

#!/bin/sh
export LC_ALL=C
cache_dir=${XDG_CACHE_HOME:-$HOME/.cache}
for hash in "$cache_dir/lima/download/by-url-sha256/"*; do
  echo "#	$hash"
  du -hs "$hash/data"* |
  sed -e "s|$hash/data|$(cat "$hash/url")|"
done
echo "#	$cache_dir/lima"
du -hs "$cache_dir/lima" | sed -e "s|$cache_dir/lima|total|"

@jay7x

jay7x commented Mar 8, 2023

As I replied in #1409, my use case is to clean up downloads that are not used by existing VMs (like docker image prune). I see no real benefit to keep downloads referenced by bundled templates (I'm using my own templates in about half of cases).

Everything that is not referenced by existing VM configs will be deleted by the implementation in #1409 (i.e. temporary files and obsolete versions of nerdctl are already covered there).

@afbjorklund
Member Author

afbjorklund commented Mar 8, 2023

Actually I was thinking more of flags or additional options, not so much the default behaviour (the suggested one seems OK to me).

Just that I sometimes delete the "default" VM instance to save some space, only to recreate it again some days later.

@afbjorklund
Member Author

afbjorklund commented Mar 8, 2023

A "--dry-run" option could also be useful; as it is now, I never run limactl prune while I am developing Lima.

Or maybe it could prompt for confirmation before continuing, like nerdctl system prune --all does?

@jay7x

jay7x commented Mar 8, 2023

Well.. your example with "default" makes some sense.. but I'd prefer to keep things simple.. i.e. either implement this for all templates or not implement it at all :)

Dry run sounds like a good idea too.. I don't like prompts, but Lima already asks for confirmation in some places.. so it might be better to follow the same behavior 🤔

@afbjorklund
Member Author

I think your PR is an improvement to the current command. I was just wondering if there could be other improvements, too.

Then again, I also wanted to cache packages and images to make it on par with vagrant and minikube...

So don't want to complicate things, at least not needlessly.

@jay7x

jay7x commented Mar 9, 2023

@AkihiroSuda proposed the following in the PR discussion

> Wondering if something like limactl prune --keep="3 days" may make more sense.
>
> Images that have been used within 3 days will be kept in the cache, regardless of whether they are actually used by any active instance, so that you can avoid downloading again when you create an instance from one of them.

@jandubois
Member

jandubois commented Mar 9, 2023

> I see no real benefit to keep downloads referenced by bundled templates (I'm using my own templates in about half of cases).

I would find this useful though:

  • For bundled templates that I don't use I would not have any cached images, so it doesn't matter either way.

  • For templates that I do use (both bundled and my own), over time they migrate to newer versions of the same image. So I end up keeping old versions in the cache that I no longer reference. They would be removed by this.

For this reason it will become useful to copy your own templates into the /usr/local/share/lima/examples/ directory [1], so the current versions of their images would not be pruned.

I guess using a "keep only recently used templates" option would allow for more aggressive pruning, but maybe there should also be a way for the user to select which images to delete manually. Just show a list of all images and their sizes, and let them delete one at a time?

For me personally the only pruning option I would use is "delete everything that is not used by the local templates or by a current instance". The "current instance" addition helps with images that are not created from a template:// URL, but if I have an active instance, chances are that I might want to delete and recreate it.

Footnotes

  1. I feel like the directory should be called templates instead of examples. Or we should support templates in addition to examples so users can keep their own templates separate from the bundled examples. Sorry, off-topic for this discussion.

@jay7x

jay7x commented Mar 9, 2023

So after some thoughts I would suggest the following:

  1. Add the ability to prune downloads referenced by neither existing VMs nor templates (-u, --unref, --unreferenced)
  2. Add the ability to prune downloads older than X days (created more than X days ago by stat() results) (-d, --days) UPD: Not sure it's possible w/o touching some file in cache directory if atime is disabled on the FS...

Questions:

  1. From Akihiro's suggestion, it seems he sees the CLI options the "inverted" way ("prune except X").. we should decide on a CLI option style here.
  2. There is a download directory under the Lima cache. Does that mean we can have something else cached? If so, I'd say we should introduce CLI subcommands for every type of cached item, as I did in #1409 (Make cache pruning more flexible). So rephrasing.. shall the command be limactl prune or limactl prune downloads (or something else)?

@jay7x

jay7x commented Mar 9, 2023

JFYI I've updated #1409 with the ability to keep the template-referenced downloads too.

@jay7x

jay7x commented Mar 9, 2023

Another point of concern is fallback images.. limactl prune is suggested as a way to update fallback image. So I see 2 consequences here:

  1. limactl prune should still cleanup the cache by default
  2. As fallback images are always referenced by a template/VM config, they will likely always be outdated..

Considering this, I might change my mind in favor of the "inverted" way ("prune everything except specified") too..

@jandubois
Member

>   1. Add the ability to prune downloads older than X days (created more than X days ago by stat() results) (-d, --days) UPD: Not sure it's possible w/o touching some file in cache directory if atime is disabled on the FS...

I understood the suggestion to be about "usage" time, not "download". I'm not sure how you could actually implement this though. The closest I can think of is to keep a symlink in the instance directory back to the cache directory, and "touch" the cache directory every time you start/stop a VM, or any time you run limactl ls. But if you just keep running a VM and working with it, I don't know how you would keep the usage timestamp current.
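
The symlink idea could look roughly like the sketch below. It is only an illustration: the link name cache-entry and the function touch_cache_entry are invented here, and nothing in this thread says Lima creates such a link. It would be called from wherever an instance is started, stopped, or listed, so it does not solve the long-running-VM problem mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: refresh the "last used" time of a cache entry by
# following a symlink kept in the instance directory.
# Assumption: the link is called "cache-entry" (a name made up for this sketch).
touch_cache_entry() {
  link="$1/cache-entry"
  # Skip instances that have no link, or whose cache entry is already gone.
  [ -L "$link" ] && [ -d "$link" ] || return 0
  # touch follows the symlink to the cache directory; -c never creates files.
  touch -c "$link"
}
```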

Maybe that is overkill; I can't tell, as I wouldn't be using time-based pruning myself.

> Questions:
>
>   1. From Akihiro's suggestion, it seems he sees the CLI options the "inverted" way ("prune except X").. we should decide on a CLI option style here.

It feels a bit weird to me though. It is the opposite of what docker image prune does, which requires the --all option to delete everything.

It is also contrary to what "pruning" means: the selective trimming of dead or overgrown branches, not the total removal of the whole plant.

@jandubois
Member

> limactl prune is suggested as a way to update fallback image.

So I wonder if this should have a special option to include them instead:

limactl prune --fallback-images

Fallback images would be identified as images in templates that don't specify a digest.
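
A rough sketch of that rule, for illustration only. It assumes a template whose top-level images: list has one "- " item per image with location: and (optionally) digest: keys on separate lines, and it scans the text with awk rather than parsing YAML, so unusual formatting would defeat it. The function name fallback_locations is made up.

```shell
#!/bin/sh
# Print the location of every entry under "images:" that has no digest.
# Assumption: one key per line, list items introduced by "- ".
fallback_locations() {
  awk '
    # Report the item collected so far if it had a location but no digest.
    function flush() {
      if (in_item && loc != "" && !has_digest) print loc
      in_item = 0; loc = ""; has_digest = 0
    }
    # A new top-level key ends the previous section.
    /^[^ \t#-]/ { flush(); in_images = ($0 ~ /^images:/); next }
    !in_images { next }
    # "- " starts a new list item; strip it so the key checks below apply.
    /^[ \t]*-[ \t]/ { flush(); in_item = 1; sub(/^[ \t]*-[ \t]*/, "") }
    /^[ \t]*location:/ {
      loc = $0
      sub(/^[ \t]*location:[ \t]*/, "", loc)
      gsub(/"/, "", loc)
    }
    /^[ \t]*digest:/ { has_digest = 1 }
    END { flush() }
  ' "$1"
}
```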

@jay7x

jay7x commented Mar 10, 2023

> Fallback images would be identified as images in templates that don't specify a digest.

Hmm.. this might work! TY for the suggestion 🤔

@jay7x

jay7x commented Mar 19, 2023

In the latest #1409 update I've simplified things and removed the prune subcommands. New CLI switches added instead:

  -F, --fallback       Only prune images without a digest specified (fallback images usually)
  -U, --unreferenced   Only prune downloads not referenced in any VM or template

afbjorklund changed the title from "Improvements for limactl prune" to "Improvements for limactl prune, to not delete everything" on Mar 19, 2023

@jandubois
Member

How should flags combine?

There are 2 obvious mechanisms:

Flags create a union (combined via or)

This means we start with an empty bucket, and each flag adds the matching cache entries to the bucket. This is partially how it is currently implemented:

limactl prune --no-digest-only --unreferenced-only

This will delete all cache entries that either don't have a digest, or are not referenced by an instance or template.

There are 2 problems with this:

  • Without specifying any flags the command assumes that all entries should be removed, so the bucket is already full. Which means the command above really means:

    limactl prune --all AND (--no-digest-only OR --unreferenced-only)
    

    Which is at least somewhat confusing conceptually.

  • This mechanism doesn't provide any additional functionality; the command is equivalent to:

    limactl prune --no-digest-only
    limactl prune --unreferenced-only
    

Flags produce an intersection (combined via and)

In this model we start with a full bucket and remove any entries that are not matched by all flags.

This is conceptually compatible with our default mode of deleting everything when no flags are provided. We probably don't want to change that for backwards compatibility reasons.

More importantly this offers the user more control that is not available via the union model:

limactl prune --unreferenced --matches ubuntu --older-than 30

This would only remove no longer referenced Ubuntu images that have not been accessed for at least 30 days.

I strongly recommend that we use this model!
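
To make the difference concrete, here is a small sketch that combines two flag results both ways. Each input file is assumed to hold the cache entries selected by one flag, one name per line; the function names union_of and intersection_of are invented for this illustration and are not part of limactl.

```shell
#!/bin/sh
export LC_ALL=C  # stable ordering for sort and comm

# Union model: prune entries selected by ANY flag.
# $1 and $2 are files listing the entries matched by each flag.
union_of() {
  sort -u "$1" "$2"
}

# Intersection model: prune only entries selected by ALL flags.
intersection_of() {
  a=$(mktemp) && b=$(mktemp) || return 1
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  comm -12 "$a" "$b"   # keep only lines present in both sorted lists
  rm -f "$a" "$b"
}
```

With a made-up "no digest" list of alpine-old and ubuntu-fallback, and a made-up "unreferenced" list of alpine-old and debian-old, the union selects all three entries, while the intersection selects only alpine-old. Each extra flag can only shrink the intersection, which is what gives the user finer control.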

Additional flags

I find the no-digest-only name confusing because I have to think if it means (no-digest)-only or no-(digest-only). IMO once we go to the intersection model we can just drop the only part and rename the options --unreferenced and --no-digest.

--unused

This would select all images that are not currently in use by an existing instance. This is also the current (IMO incorrect) implementation of --unreferenced-only.

This option is useful because starting with Lima 0.16 on macOS all local copies will be done via clonefile (if possible), so the cached image does not take up any additional space and therefore there is no benefit to pruning it.

--older-than DAYS

I believe @jay7x is already working on this. We'll need to define what exactly this means: "Older than what?". I propose this means the last time the cached entry has been copied/cloned locally. So ideally it would be the "last access" date, but I'm not sure if clonefile will update it, as technically it doesn't access the file content, just the metadata. We can record the last access time ourselves in downloader.localCopy if we have to.
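
If the timestamp had to be recorded by hand, one possible shape is a marker file per cache entry, sketched below. The marker name last-used and both function names are assumptions made for this example, and the sketch says nothing about how downloader.localCopy would actually do it.

```shell
#!/bin/sh
# Hypothetical marker file stored inside each cache entry directory.
marker=last-used

# Record that a cache entry was just copied or cloned.
record_use() {
  touch "$1/$marker"
}

# Print the cache entries under $1 whose marker is more than $2 days old.
# Entries without a marker are skipped rather than guessed at.
# Note: find implementations differ slightly in how -mtime rounds to days.
older_than() {
  for entry in "$1"/*/; do
    f="${entry}$marker"
    [ -f "$f" ] || continue
    if [ -n "$(find "$f" -mtime "+$2")" ]; then
      printf '%s\n' "${entry%/}"
    fi
  done
}
```

Because the marker is updated explicitly, this does not depend on atime being enabled on the filesystem or on whether clonefile touches the source file.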

--matches SUBSTRING

This one I just made up in the example above. I think it makes sense to prune just images for a particular distro. The SUBSTRING would be matched against the filename.
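
A possible reading of that, sketched under assumptions: since the cached file itself is just called data, "filename" is taken here to mean the last path component of the download URL stored in each entry's url file (the layout used by the lima-cache script earlier in this thread). The function name matches is invented.

```shell
#!/bin/sh
# Print cache entries under $1 whose URL filename contains the substring $2.
# Assumption: each entry directory holds a "url" file with the download URL.
matches() {
  download_dir=$1
  substring=$2
  for entry in "$download_dir"/*/; do
    [ -f "${entry}url" ] || continue
    url=$(cat "${entry}url")
    name=${url##*/}            # keep only the part after the last slash
    case $name in
      *"$substring"*) printf '%s\n' "${entry%/}" ;;
    esac
  done
}
```

Matching only the last path component means a URL such as https://example.com/ubuntu/alpine.iso would not be selected by "ubuntu"; whether the whole URL should be matched instead is a design choice left open here.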

Future work (out-of-scope for now)

I think it would be useful to be able to show the current cache contents and their sizes. limactl prune --dry-run is a poor man's substitute, but it doesn't display individual files unless you specify filter options, and it doesn't display the file sizes (like @afbjorklund's lima-cache script does).

It may make sense to create limactl cache prune and limactl cache ls commands, but I'm not sure how important that really is.


3 participants