Move to using cache directory in KITOPS_HOME instead of the system temp dir #762

Merged
amisevsk merged 3 commits into kitops-ml:main from amisevsk:cache-directory on Feb 13, 2025

@amisevsk
Contributor

Description

Stop using the system's normal temporary directory for temp files, and instead use $KITOPS_HOME/cache. This makes it easier to mount storage (e.g. in containers) to support large cached files. However, since this directory will not be emptied by the system, we have to clean it up ourselves.

To manage automatic cleanup (in case a command is cancelled via SIGINT, for example), we do something similar to how we handle the ingest directory for kit pull: if a command completes successfully, we'll clear its corresponding cache subdir (e.g. a successful import will clear all cached import files). This limits how much data can leak into the cache.

This PR also includes a new command: kit cache [info|clear]. The info subcommand can be used to see the total size of the cache, if it's not empty. The clear subcommand can be used to manually clean up any stray files.

The changes in this PR do not touch kit pull, since it manages its own ingest directory and I didn't want to risk introducing a subtle bug. However, the current cache implementation should support that too, as a future improvement.

Finally, this change opens the door to resumable Hugging Face downloads, which can be implemented in another PR.

Linked issues

Closes #758

To enable using Kit in a containerized environment, avoid using the
system-wide temporary directory. Instead, use $KITOPS_HOME/cache for
temporary files in import, pack, etc. This allows for mounting a volume
to $KITOPS_HOME/cache to add storage to a container.

Some functions, such as unpacking the dev mode harness and setting up
tests, still use the default system temporary directory, to ensure those
files are cleaned up.
Comment thread pkg/cmd/cache/cmd.go Outdated
import (
"fmt"
"io"
"kitops/pkg/lib/filesystem/cache"
Member

Should this be aliased (e.g. fsCache) so it isn't confused with the cache package in here?

Contributor Author

Added the alias and also renamed the cmd package to kitcache (like we have with kitimport and kitinit).

Since cancelling a Kit command with SIGINT can leave files in the cache,
and since we can no longer rely on the system to clean these files up
eventually, we'll need to do something similar to what we do for kit
pull: when a command completes successfully, assume that that command's
cache directory should be empty and so remove any files present.

This should clean up files left over after previous cancelled runs while
not impacting any one action.
@amisevsk amisevsk merged commit 4b39969 into kitops-ml:main Feb 13, 2025
@amisevsk amisevsk deleted the cache-directory branch February 13, 2025 02:19
Development

Successfully merging this pull request may close these issues.

Move away from using the system temporary directory in kit commands
