Feature Request: tool to list and delete cached models #16393

@sultanqasim

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

I'd love to have a tool to list and delete cached models (that were fetched automatically when using the -hf option). This would be akin to Ollama's ls and rm commands.

Motivation

A lot of people (myself included) use the -hf option to automatically fetch models from Hugging Face. This places models in a model cache directory, which can get rather big over time. Each model typically has at least three associated files (manifest, gguf, and etag), and sometimes five (adding an mmproj gguf and its associated etag). Manually managing files in the cache is cumbersome. It would be nice to have an elegant way to see which models are cached and how much space they take, along with a single command to delete all cached files associated with a model.

Possible Implementation

I built a Python script to do this for my own convenience: https://gist.github.com/sultanqasim/5b6d9654236e18dea4896d3c9ce2dc1b

The output of the script looks like this:

$ ./llama-cache ls
Name                                                        Size (GB) Modified
--------------------------------------------------------------------------------
ibm-granite/granite-4.0-h-small-GGUF:Q4_K_M                 18.1      2025-10-02 13:34
unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF:Q4_K_XL            16.5      2025-08-07 00:32
unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF:IQ4_XS     12.7      2025-07-24 12:10
unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_XL           16.5      2025-10-01 17:19
unsloth/Qwen3-14B-GGUF:Q4_K_XL                              8.5       2025-08-01 23:03

$ ./llama-cache rm unsloth/Qwen3-14B-GGUF:Q4_K_XL
Deleted: /Users/sultan/Library/Caches/llama.cpp/unsloth_Qwen3-14B-GGUF_Qwen3-14B-UD-Q4_K_XL.gguf
Deleted: /Users/sultan/Library/Caches/llama.cpp/unsloth_Qwen3-14B-GGUF_Qwen3-14B-UD-Q4_K_XL.gguf.json
Deleted: /Users/sultan/Library/Caches/llama.cpp/manifest=unsloth_Qwen3-14B-GGUF=Q4_K_XL.json
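The gist linked above is the full script; as a rough illustration of the idea, the listing and removal logic could be sketched roughly as below. This is a minimal sketch, not the gist's actual code: it assumes the filename conventions visible in the output above (manifest=org_repo=quant.json for manifests, with weight and etag files sharing the org_repo prefix), and the cache path and LLAMA_CACHE fallback are assumptions for illustration, not llama.cpp's canonical lookup logic.

```python
import os
from datetime import datetime
from pathlib import Path


def cache_dir() -> Path:
    # Hypothetical default path (macOS-style, as in the output above);
    # llama.cpp resolves its cache directory per-platform.
    return Path(os.environ.get(
        "LLAMA_CACHE",
        Path.home() / "Library" / "Caches" / "llama.cpp"))


def list_models(cache: Path):
    """Yield (name, size_gb, mtime) for each cached model manifest.

    Assumes manifests are named manifest=<org>_<repo>=<quant>.json and
    that all other files for the model share the <org>_<repo>_ prefix.
    """
    for manifest in sorted(cache.glob("manifest=*.json")):
        # e.g. manifest=unsloth_Qwen3-14B-GGUF=Q4_K_XL.json
        _, org_repo, quant = manifest.stem.split("=", 2)
        # Caveat: only the first underscore can be mapped back to the
        # org/repo separator; repo names containing underscores would
        # need smarter handling.
        name = f"{org_repo.replace('_', '/', 1)}:{quant}"
        files = [manifest] + [p for p in cache.iterdir()
                              if p.name.startswith(org_repo + "_")]
        size_gb = sum(p.stat().st_size for p in files) / 1e9
        mtime = datetime.fromtimestamp(manifest.stat().st_mtime)
        yield name, size_gb, mtime


def remove_model(cache: Path, name: str):
    """Delete every cache file belonging to `name` (org/repo:quant)."""
    org_repo, _, quant = name.partition(":")
    flat = org_repo.replace("/", "_")
    targets = [cache / f"manifest={flat}={quant}.json"]
    targets += [p for p in cache.iterdir()
                if p.name.startswith(flat + "_")]
    for p in targets:
        if p.exists():
            p.unlink()
            print(f"Deleted: {p}")
```

A real implementation inside llama.cpp itself could reuse the same path-construction code the -hf download path already uses, rather than re-deriving names from the filenames as this sketch does.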

Labels

enhancement (New feature or request)