# mac-llmops Examples

First, we initialize the model manager. By default it auto-detects the machine's memory size and uses the default Hugging Face (HF) cache location; both can be overridden.

In [1]:
import mac_llmops


models = mac_llmops.Models()
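
For readers curious what such defaults could look like, here is a minimal sketch using only the standard library. It is not taken from mac-llmops: the function names `detect_total_memory_bytes` and `default_hf_cache_dir` are hypothetical, and the cache lookup is a simplified reading of the `HF_HUB_CACHE` and `HF_HOME` environment variables that Hugging Face documents.

```python
import os
from pathlib import Path


def detect_total_memory_bytes() -> int:
    """Return physical RAM in bytes via POSIX sysconf (macOS and Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def default_hf_cache_dir() -> Path:
    """Resolve the hub cache directory (simplified; hypothetical helper)."""
    # An explicit HF_HUB_CACHE takes priority over everything else.
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    # Otherwise the cache lives in the "hub" folder under HF_HOME.
    hf_home = os.environ.get("HF_HOME", "~/.cache/huggingface")
    return Path(hf_home).expanduser() / "hub"


print(f"memory: {detect_total_memory_bytes() / 2**30:.1f} GiB")
print(f"cache:  {default_hf_cache_dir()}")
```

Reading the location from the environment keeps the sketch in step with other Hugging Face tools that share the same cache.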

We can display all models in our local cache that might fit in memory.

In [2]:
models.list()

['mlx-community/Llama-3.2-3B-Instruct',
 'mlx-community/Llama-3.3-70B-Instruct-3bit',
 'mlx-community/Meta-Llama-3.1-8B-Instruct-bf16',
 'mlx-community/Llama-3.1-8B-Instruct',
 'mlx-community/Meta-Llama-3-8B-Instruct-4bit',
 'mlx-community/Meta-Llama-3.1-8B-Instruct-4bit',
 'mlx-community/Llama-3.2-3B-Instruct-4bit',
 'mlx-community/Meta-Llama-3.1-8B-Instruct-8bit',
 'mlx-community/Llama-3.2-3B-Instruct-8bit',
 'microsoft/Phi-3-medium-128k-instruct',
 'mlx-community/Qwen2.5-Coder-14B-Instruct-bf16',
 'mlx-community/phi-4-bf16',
 'ibm-granite/granite-3.1-8b-instruct']
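
A listing like the one above can be approximated by walking the cache directory. The sketch below is an assumption about how this might work, not the library's implementation: it relies on the hub cache layout, where a repo `org/name` is stored in a folder called `models--org--name` with its files under `blobs`, and it uses a hypothetical `headroom` fraction as the rule for "might fit in memory".

```python
from pathlib import Path


def cached_models(cache_dir: Path) -> dict[str, int]:
    """Map each cached repo id to its size on disk in bytes."""
    sizes = {}
    for repo_dir in sorted(cache_dir.glob("models--*")):
        # "models--org--name" -> "org/name"
        repo_id = repo_dir.name.removeprefix("models--").replace("--", "/", 1)
        blobs = repo_dir / "blobs"
        sizes[repo_id] = sum(f.stat().st_size for f in blobs.glob("*") if f.is_file())
    return sizes


def might_fit(sizes: dict[str, int], memory_bytes: int, headroom: float = 0.75) -> list[str]:
    """Keep repos whose files are no larger than a fraction of total memory.

    The 0.75 default is an illustrative assumption, not a measured threshold.
    """
    budget = memory_bytes * headroom
    return [repo_id for repo_id, size in sizes.items() if size <= budget]
```

Calling `might_fit(cached_models(cache_dir), memory_bytes)` would then return the repo ids to display. Size on disk is only a rough proxy for memory use at inference time, which is why the listing is described as models that *might* fit.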

We can also search the local cache for specific models that might fit in memory.

In [3]:
models.search_local("mlx llama 3.2 bf16")

[]
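
One plausible reading of the query string, consistent with the results shown in this notebook, is that every whitespace-separated term must appear somewhere in the repo id, ignoring case. The `matches` helper below is a hypothetical sketch of that rule, not the library's actual matching logic.

```python
def matches(query: str, repo_id: str) -> bool:
    """True when every whitespace-separated query term occurs in the repo id."""
    haystack = repo_id.lower()
    return all(term in haystack for term in query.lower().split())


cached = [
    "mlx-community/Llama-3.2-3B-Instruct",
    "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "mlx-community/phi-4-bf16",
]

# No cached repo contains all four terms, so the result is empty.
print([r for r in cached if matches("mlx llama 3.2 bf16", r)])  # prints []

# Dropping "bf16" matches both Llama 3.2 entries.
print([r for r in cached if matches("mlx llama 3.2", r)])
```

Under this rule, term order does not matter, so `"bf16 llama"` and `"llama bf16"` would select the same models.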

Since the local search returned no results, we can run the same search online.

In [4]:
models.search_online("mlx llama 3.2 bf16")

['mlx-community/Llama-3.2-1B-Instruct-bf16',
 'mlx-community/Llama-3.2-3B-Instruct-bf16',
 'mlx-community/Llama-3.2-3B-bf16',
 'mlx-community/Hermes-3-Llama-3.2-3B-bf16']

If we find a model we like, it can be added to the local cache.

In [5]:
models.add("mlx-community/Llama-3.2-3B-Instruct-bf16")

Fetching 8 files:   0%|          | 0/8 [00:00<?, ?it/s]

config.json:   0%|          | 0.00/969 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/16.0k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/296 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/54.5k [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/21.9k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/5.37G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/1.06G [00:00<?, ?B/s]

Now the model is in our local cache.

In [6]:
models.search_local("mlx llama 3.2 bf16")

['mlx-community/Llama-3.2-3B-Instruct-bf16']

If we need to free up some disk space, we can also delete models from the local cache.

In [7]:
models.delete("mlx-community/Llama-3.2-3B-Instruct-bf16")

Searching the local cache again confirms that the model is gone.

In [8]:
models.search_local("mlx llama 3.2 bf16")

[]
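
Deleting a cached model, as in cell 7, amounts to removing the repo's folder from the cache. The sketch below illustrates the idea with the standard library only. `delete_cached_model` is a hypothetical helper that assumes the `models--org--name` folder layout of the hub cache; it is not how mac-llmops is known to implement `delete`.

```python
import shutil
from pathlib import Path


def delete_cached_model(cache_dir: Path, repo_id: str) -> bool:
    """Remove a repo's folder from the cache; return False if it was absent."""
    # "org/name" -> "models--org--name"
    repo_dir = cache_dir / ("models--" + repo_id.replace("/", "--"))
    if not repo_dir.is_dir():
        return False
    shutil.rmtree(repo_dir)
    return True
```

Returning a boolean rather than raising makes a repeated delete harmless, which suits interactive use in a notebook.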