Add ability to see when an image has last been used by a container #4237

Open
kencochrane opened this issue Feb 19, 2014 · 34 comments
Labels
area/runtime kind/feature

Comments

@kencochrane
Contributor

Right now, it is hard to know when an image was last used by a container, so when you need to do some image cleanup, you have to either guess or track that data outside of docker.

It would be nice if we added a new attribute to the docker images command that showed the last time the image was used, so that we can use it when running an image cleanup job. Ideally the timestamp would follow the whole image hierarchy and not just the last image in the tree, but also its parents, grandparents, etc.

I'm not sure whether the image's last_update should be set when the container is started or when it is stopped, but it's probably best to update it at both times.

@jessfraz added the kind/feature label Feb 25, 2015
@slok

slok commented May 11, 2015

Hi!
I want to start contributing to docker and messing around with docker's code base, and I believe this is a good start. What do you think?

If everyone is OK with it, #dibs

@duglin
Contributor

duglin commented May 11, 2015

I would start with a proposed design first to make sure people are ok with it. Wouldn't want you to waste your time coding stuff when people may not be ok with the idea in general.

@slok

slok commented May 11, 2015

I was thinking of adding a "LAST UPDATED" or "LAST USED" column (or similar) to the "history" and "images" commands. This field would be like the created one, but it would be updated when the image is created (import, build, pull...), when a container runs/starts, and when a container stops.
It would be stored in the config JSON file, although I saw that an image's layer size is stored in a separate file. Is that because of the frequent updates, to avoid rewriting the whole JSON? I could use a similar approach if performance is affected. The file should be locked while updates are made, since scripts could be creating containers from the same image at the same time.

Finally, as @kencochrane said, it should update all the images in the graph (<none>:<none> too) with the new timestamp.

@slok

slok commented May 28, 2015

I'm thinking about which actions should update the last_updated field. At first I'm thinking of:

  • pull, build, load
  • stop, run

Should other actions like restart, kill... be considered? What do you think?

@duglin
Contributor

duglin commented May 28, 2015

I would think a LastUsed field would only need to be updated upon container.create(). A value of nil would mean that, while it's been pulled/downloaded, it has yet to be used by any container.

@slok

slok commented May 29, 2015

Well, thinking of it as a graph and not only as a single image: when you pull an image that has another image as its parent, I think the whole graph needs to be updated.

@duglin
Contributor

duglin commented May 29, 2015

I agree the entire graph should be "touched" upon container.create()
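
To illustrate the "touch the whole graph" idea, here is a minimal sketch in Python (not Docker's Go code). The names `Image`, `touch_lineage` and `last_used` are hypothetical and only stand in for whatever the daemon would actually store. The assumption is that each image knows its parent, and that `None` plays the role of the nil value for "pulled but never used".

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Image:
    """Hypothetical stand-in for an image record; not Docker's real type."""
    id: str
    parent: Optional["Image"] = None
    last_used: Optional[datetime] = None  # None: never used by a container


def touch_lineage(image: Image, when: Optional[datetime] = None) -> None:
    """Stamp the image and every one of its ancestors with the same time."""
    when = when or datetime.now(timezone.utc)
    node: Optional[Image] = image
    while node is not None:
        node.last_used = when
        node = node.parent


# Two images sharing one parent; only the lineage of "app" is touched.
base = Image("base")
app = Image("app", parent=base)
other = Image("other", parent=base)

touch_lineage(app, datetime(2015, 5, 29, tzinfo=timezone.utc))
```

After the call, `app` and `base` carry the timestamp, while `other` stays at `None` because no container was created from it.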

@slok

slok commented May 29, 2015

I meant when you "create" an image (pull, build... )

@ndeloof
Contributor

ndeloof commented Dec 4, 2015

@calavera can you please explain why this issue has been closed?
I'm looking into implementing a docker garbage collector, and this would be a big help for this use case.

@tiborvass
Contributor

👍 for this feature, I also needed this today.

Based on #20925 (comment) I believe the concern here is about keeping the Image data immutable.

Whether we should have a mutable metadata struct for images or implement the feature with events is, I think, still open to debate.

I'm reopening this issue; the use case is more than relevant.

@dnephin
Member

dnephin commented Aug 15, 2016

This would be very useful; it would fix one of the pain points I hit with https://github.com/yelp/docker-custodian

A section for daemon-local mutable metadata would have multiple uses, including #25728.

@titpetric

Is somebody working on this? I'd like to take a stab at it if it has been abandoned. From the comments:

  1. store a LastUsed as @duglin suggested on container.create (if still applicable),
  2. extend docker images to list this column as well in the style of CREATED

Open to additional suggestions or warnings with some up to date info. Warnings in the sense of "because we now have docker swarm, LastUsed needs to be stored [...]". HMU

@thaJeztah
Member

@titpetric we're going to be releasing "data management" commands in docker 1.13; #26108

However this could still be useful if we want a docker prune --filter ... option

@mlaventure wdyt?

@mlaventure
Contributor

For sorting and pruning purposes this would indeed be useful.

But it seems the PR from @slok (#13621) was refused at the time. Are the reasons behind the rejection still a valid concern today?

/cc @icecrime @crosbymichael @jfrazelle @calavera

@mark-church

For garbage collection purposes, it may be best to define the filter as last in use. We should care less about the container.create() and more about the last time the image was active as a container (AKA when the last container of that image died).

@ryanwohara

Will this ever be properly supported, so that we can see when an image was last in use? This is a huge problem with Docker containers in a CI environment where you run out of disk space and still want to preserve the most recently used containers.

@microadam

Same use case as @ryanwohara. Does anyone have any workarounds for this at the moment? Some external script that can keep track of last use dates?

@docwhat

docwhat commented May 11, 2018

My docker-gc has some go code that does this. It watches events.

@fr-sgujrati

fr-sgujrati commented Jun 18, 2018

I extract the events of the last few days using this command:

docker events --since 2017-06-01T15:04:05 \
--until 2018-06-18T12:04:05 \
--filter image=<account_number>.dkr.ecr.us-west-1.amazonaws.com/<repository_name> \
--filter event=start --format '{{ .Time }} {{ .From }}'

My use case is that, given two versions of an image, I need to know whether the older version has been started within the last few days. If it has not been started, I remove it.
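
As a rough sketch of how the output of that command could be post-processed, the Python function below (hypothetical name `last_start_by_image`) keeps the most recent start time per image. It assumes each output line looks like `<unix-seconds> <image>`, which is what the `'{{ .Time }} {{ .From }}'` format should print; the sample lines and image names are made up.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable


def last_start_by_image(lines: Iterable[str]) -> Dict[str, datetime]:
    """Map each image name to the time of its most recent start event.

    Assumes lines of the form "<unix-seconds> <image>".
    """
    latest: Dict[str, datetime] = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank lines or lines without an image name
        seconds, image = parts
        started = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
        if image not in latest or started > latest[image]:
            latest[image] = started
    return latest


# Made-up sample output; real lines would come from `docker events`.
sample = [
    "1528000000 example/app:v1",
    "1529000000 example/app:v2",
    "1528500000 example/app:v1",
]
starts = last_start_by_image(sample)
```

An image that is missing from the result was not started inside the queried window, which is the condition used above to decide on removal.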

@ndeloof
Contributor

ndeloof commented Jun 19, 2018

@fr-sgujrati this will tell you the last time the image was used to start a container, but not how long that container has been running. Say you run a long-term service as a docker container: you might consider the image expired by TTL while the container is still active (or was only recently stopped), so the actual "last use" of the image is much more recent.

@thaJeztah
Member

With #31497 merged, there's the possibility to add additional information to images without modifying the image's data itself. If someone wants to work on this, I can see that change being accepted; what's needed is a design (as in: when do we update the "last accessed" / "last used" date? Perhaps each time a container is created and/or a container that uses the image exits (or starts?)); that looks like the tricky bit.

@ndeloof
Contributor

ndeloof commented Jul 16, 2018

You're right about the tricky part.

An API that gives the "last use" status of an image will need to know:

  • whether the image is in use by existing containers, in which case lastUse=now()
  • when the last container using this image was removed

I suggest a new lastContainerUse metadata field, set when a container is removed. There could maybe also be a usedByContainer counter, updated as containers are created/removed. I'm not sure this should be stored as metadata, as it can be computed from active containers.

When the new "last use" API is invoked, it computes last_use = activeContainer ? now() : lastContainerUse
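
The rule above can be sketched as a small Python function. This is only an illustration: the function name `last_use` and its parameters are hypothetical, and it assumes the daemon can supply a count of containers currently using the image plus the stored removal time of the last one.

```python
from datetime import datetime, timezone
from typing import Optional


def last_use(active_containers: int,
             last_container_removed: Optional[datetime],
             now: Optional[datetime] = None) -> Optional[datetime]:
    """Return the effective "last use" time of an image.

    If any container still uses the image, the image counts as in use
    right now. Otherwise the stored removal time of the last container
    is returned, which is None when no container ever used the image.
    """
    if active_containers > 0:
        return now or datetime.now(timezone.utc)
    return last_container_removed


removed = datetime(2018, 7, 1, tzinfo=timezone.utc)
today = datetime(2018, 7, 16, tzinfo=timezone.utc)

in_use = last_use(2, removed, now=today)   # image still backs two containers
idle = last_use(0, removed, now=today)     # last container already removed
never = last_use(0, None, now=today)       # pulled but never used
```

Computing the result at query time means a long-running service never looks stale, which addresses the TTL concern raised earlier in the thread.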

@basickarl

This would be a nice feature. We are caching images to help reduce build times, but we would also like to automatically clean up, via a cron job, redundant images which aren't used.

@mawl

mawl commented Aug 15, 2018

Same problem here in a CI environment: I would like to remove images whose container start events are older than a couple of weeks.

@fr-sgujrati Using docker events sounds promising, but the event log only stores the last 1000 events, so older start/destroy events get lost.

Or is it possible to increase the event log size in any way?

Here's my PowerShell script:

$images = docker image ls --format "{{.Repository}}:{{.Tag}}"
# 4 weeks, expressed in hours
$since = 4*7*24

foreach ($image in $images) {
    Write-Output "-----------------------------------------------"
    Write-Output "Check start events for $($image) since $($since)h"    
    $output = docker events --since "$($since)h" --until "0m" --filter image=$($image) --filter="type=container" --filter event=start --format "{{.From}}"
        
    if($output){
        Write-Output "$($image) used - skipped."
    } else {
        Write-Output "$($image) not used - remove..."
        docker image rm $image
    }
}

@fr-sgujrati

fr-sgujrati commented Aug 15, 2018

@mawl I ended up using something else. My use case was a container running on a robot, which could be updated periodically. Because of these updates, the robot ended up having many older versions of the container. At any given point in time, the newest version is in the running state. If the newest version has any issues, a previous stable version is used. If a version has been running fine for half a day, it is safe to assume that older images (how old? described below) will not be used and can be removed. I created a routine which periodically does the following:

  1. Find the running version of the service (say v1)
  2. Find how long it has been running (say 2 days) (docker ps --format '{{.Image}} {{.RunningFor}}')
  3. Find all versions of the images of service, and the time they were created (say, v1:2 days, v2:10 days, v3:20 days) (docker images --format '{{.Repository}}:{{.Tag}} {{.CreatedSince}}')
  4. Check for each version obtained in step 3
     a. If (the version is not currently running
            and the version was created more than 15 days ago
            and the current version has been running for more than half a day) then
          delete the version
  
  In the example above (specified in Step 1 to 3), it checks as follows
  1. Check v1:2 days - since it is currently running, nothing happens
  2. Check v2:10 days - it is not running and it was created 10 days ago, so nothing happens
  3. Check v3:20 days - it is not running, it was created 20 days ago and current version (v1) is running for 2 days, so this version is removed.

It's not clean, but works for me.
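
The routine described in this comment can be sketched as follows. This is a hedged Python illustration, not the commenter's actual script: the function name `versions_to_delete` and its parameters are hypothetical, and the inputs are assumed to be already parsed into `timedelta` values (the `.RunningFor` and `.CreatedSince` fields print human-readable text, and parsing that is left out here).

```python
from datetime import timedelta
from typing import Dict, List


def versions_to_delete(running: str,
                       running_for: timedelta,
                       created_ago: Dict[str, timedelta],
                       max_age: timedelta = timedelta(days=15),
                       min_uptime: timedelta = timedelta(hours=12)) -> List[str]:
    """Return the versions that the rule in step 4 would delete.

    Nothing is deleted until the running version has been up for more
    than min_uptime; after that, every non-running version created more
    than max_age ago is selected.
    """
    if running_for <= min_uptime:
        return []
    return [version for version, age in created_ago.items()
            if version != running and age > max_age]


# The example from steps 1 to 3: v1 has been running for 2 days.
ages = {
    "v1": timedelta(days=2),
    "v2": timedelta(days=10),
    "v3": timedelta(days=20),
}
print(versions_to_delete("v1", timedelta(days=2), ages))  # ['v3']
```

The 15-day and half-day thresholds are taken from the description and exposed as defaults so they can be tuned per deployment.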

@thomasf
Contributor

thomasf commented Oct 5, 2018

Using intermediate containers in builds also creates problems, since docker image prune will just delete them because they have no tag associated. For some applications I have started using docker labels to help CI run and hopefully not fill up disks. Here is a slightly shortened version of a django+node build where I change the label ci.docker.prune_dangling all the time so that I can use it for a somewhat controlled way of pruning. Being able to see when an image layer was last used would really help in situations like this.

# prune dangling might be used like this on a ci/build server to prune at different rates:
# docker image prune -f --filter label=ci.docker.prune_dangling=soon --filter until=2h
# docker image prune -f --filter label=ci.docker.prune_dangling=week --filter until=168h
# docker image prune -f --filter until=1440h

from node:8-stretch as jsbuilder
label ci.docker.prune_dangling=week
RUN yarn global add webpack@4.12.1 webpack-cli@3.0.8 jest prettier
copy jsapps/package*.json /opt/foo/jsapps/
workdir /opt/foo/jsapps
run npm ci
label ci.docker.prune_dangling=soon
copy jsapps/ /opt/foo/jsapps/
run npm run prod && rm -rf node_modules

from python:3.6 as pythonbuilder
label ci.docker.prune_dangling=week
run apt-get update -qq && \
    apt-get install -y gettext && \
    rm -rf /var/lib/apt/lists/*
workdir /opt/foo
copy requirements* /opt/foo/
copy requirements /opt/foo/requirements
run pip -q \
        wheel \
        --wheel-dir /wheel \
        --find-links /wheel \
        --no-cache-dir \
        -r requirements.txt \
        -r requirements/requirements-docker.txt
run pip install \
        --find-links /wheel \
        --no-index \
        --no-cache-dir \
        -r requirements.txt \
        -r requirements/requirements-docker.txt
label ci.docker.prune_dangling=soon
copy . /opt/foo
copy --from=jsbuilder /opt/foo/jsapps/build /opt/foo/jsapps/build
run mkdir -p /opt/foo_media && \
    DATABASE_URL=sqlite:////tmp/no.db \
    ENV_FILE=/opt/foo/docker.build.env \
    python manage.py collectstatic -v0 --noinput && \
    rm -rf /opt/foo_media
run find /opt/foo_static -type f -size +200c ! -iname '*.gz' -execdir gzip -9 --keep --force {} \;
run DATABASE_URL=sqlite:////tmp/no.db \
    ENV_FILE=/opt/foo/docker.build.env \
    python manage.py compilemessages --no-color
run python -m compileall -q /opt/foo

from python:3.6
label ci.docker.prune_dangling=week
workdir /opt/foo
env STATICFILES_STORAGE=django.contrib.staticfiles.storage.ManifestStaticFilesStorage
copy --from=pythonbuilder /wheel /wheel
copy requirements* /opt/foo/
copy requirements /opt/foo/requirements
run pip install \
        --find-links /wheel \
        --no-index \
        --no-cache-dir \
        -r requirements.txt \
        -r requirements/requirements-docker.txt
label ci.docker.prune_dangling=soon
copy . /opt/foo
copy --from=pythonbuilder /opt/foo /opt/foo
copy --from=pythonbuilder /opt/foo_static /opt/foo_static
run mkdir -p /opt/foo_media

@thaJeztah
Member

Note that, when using buildkit as a builder (DOCKER_BUILDKIT=1 docker build...), the build cache is now stored separately from the image cache, so intermediate build steps no longer create "dangling" / "untagged" images.

When using buildkit, the build cache can be pruned separately with docker builder prune (besides being included in docker system prune).

 docker builder prune --help

Usage:	docker builder prune

Remove build cache

Options:
  -a, --all                  Remove all unused images, not just dangling ones
      --filter filter        Provide filter values (e.g. 'unused-for=24h')
  -f, --force                Do not prompt for confirmation
      --keep-storage bytes   Amount of disk space to keep for cache

@MrSapps

MrSapps commented Mar 18, 2019

Was any progress made on this? I have the same CI clean up use case.

@stepchowfun

I also have the same CI clean up use case. I created Docuum to address this problem. Like docker-gc, it watches events to learn when images are used. However, it makes some slightly different design choices. In particular:

  • It keeps some small state on disk (in a YAML file) to remember all the timestamps if/when it is restarted. As far as I can tell, docker-gc will lose all timestamp data if/when the process dies (correct me if I'm wrong here).
  • Rather than enforcing a maximum image age, it enforces a maximum disk usage.
  • It runs immediately when new events come in, rather than on a timer. A nice benefit of this is that it uses no CPU when there is no Docker activity, so you can run it as a daemon on your laptop without worrying about draining your battery.

Hope this is helpful. Contributions welcome.
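
For readers who want a feel for the "maximum disk usage" policy, here is a minimal Python sketch of least-recently-used eviction under a byte budget. It is not Docuum's actual code: the function name `images_to_evict` and the data layout are hypothetical, and it makes the simplifying assumption that image sizes are additive, which ignores layers shared between images.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def images_to_evict(images: Dict[str, Tuple[datetime, int]],
                    max_bytes: int) -> List[str]:
    """Pick least recently used images until the total fits max_bytes.

    images maps an image ID to (last_used, size_in_bytes).
    """
    total = sum(size for _, size in images.values())
    evict: List[str] = []
    # Walk the images from the oldest last-used timestamp to the newest.
    for image_id, (_, size) in sorted(images.items(), key=lambda kv: kv[1][0]):
        if total <= max_bytes:
            break
        evict.append(image_id)
        total -= size
    return evict


# Made-up inventory: IDs, last-used times and sizes are illustrative only.
inventory = {
    "a": (datetime(2020, 1, 1, tzinfo=timezone.utc), 400),
    "b": (datetime(2020, 1, 3, tzinfo=timezone.utc), 300),
    "c": (datetime(2020, 1, 2, tzinfo=timezone.utc), 500),
}
print(images_to_evict(inventory, 800))  # ['a']
print(images_to_evict(inventory, 700))  # ['a', 'c']
```

A size budget, unlike an age limit, keeps as much cache as the disk allows while still bounding usage.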

@docwhat

docwhat commented Jan 8, 2020

Cool!

docker-gc was good enough without storage so I never implemented it.

Also I wasn’t smart enough to allow selecting storage on the fly in Go.

I like your ideas though. I never thought to use it on my laptop.


@bureado

bureado commented Sep 12, 2022

In case it helps readers, https://github.com/Azure/eraser

@pharapeti

Has there been any progress in implementing a solution for this issue?

@tfreiling989

tfreiling989 commented Jan 30, 2023

@stepchowfun, I tried out Docuum, but I experienced it removing newly created images that were not yet used, which was problematic.

UPDATE: Nvm, I think I just set the limit too small. Relevant resolved issue: stepchowfun/docuum#137

@stepchowfun

Hi @tfreiling989, please file any Docuum-related bug reports against the Docuum repo. Thanks!
