Releases: ggml-org/Llama-macOS
0.32.0: The llama has left the barn
LlamaBarn is now Llama. It's the same app with a new name, and your settings and downloaded models carry over automatically.
Because of the rename, this update isn't automatic -- download and install the new app.
Also in this release:
- Default server port is now 8080 to match llama.cpp, was 2276
- Set a custom port in Settings, including 2276 if you relied on the old one
- Deeplinks now use the llama:// scheme; llamabarn:// still works
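For scripts that talk to the local server, the port change can be absorbed by trying the new default first and falling back to the old one. The following is a minimal sketch, not part of the app: the host 127.0.0.1, the helper names, and the use of a GET /health probe (an endpoint that llama.cpp's llama-server provides) are assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional
import urllib.error
import urllib.request

NEW_DEFAULT_PORT = 8080  # default from 0.32.0 on, per the notes above
OLD_DEFAULT_PORT = 2276  # default before 0.32.0


def candidate_base_urls(custom_port: Optional[int] = None) -> list[str]:
    """Return base URLs to try, most likely first, without duplicates.

    Assumption: the server listens on 127.0.0.1.
    """
    ports: list[int] = []
    for port in (custom_port, NEW_DEFAULT_PORT, OLD_DEFAULT_PORT):
        if port is not None and port not in ports:
            ports.append(port)
    return [f"http://127.0.0.1:{port}" for port in ports]


def http_probe(base_url: str, timeout: float = 1.0) -> bool:
    """Return True if GET /health answers 200 (assumed readiness check)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: treat as unreachable.
        return False


def first_reachable(
    urls: Iterable[str], probe: Callable[[str], bool] = http_probe
) -> Optional[str]:
    """Return the first URL the probe accepts, or None if none respond."""
    for url in urls:
        if probe(url):
            return url
    return None
```

Calling `first_reachable(candidate_base_urls())` checks 8080 and then 2276; passing the port chosen in Settings as `custom_port` puts it at the front of the list.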
0.31.1
0.31.0: llama.cpp and catalog, unbundled
LlamaBarn used to bundle the llama.cpp engine inside the app. Now it shares one install with your command line: if you already have llama.cpp the app uses it, and if the app installs it you have it in your terminal too.
- Uses an existing llama.cpp install, from the install script at https://llama.app or from Homebrew, if you have one
- Shows the in-use llama.cpp build in the footer
- Ships as a 1.6 MB .dmg (down from 7 MB)
We've also replaced the built-in catalog. Instead of a fixed list in the app, a new "Recommended for your Mac" section suggests models that fit, to get you started. For browsing, there's now a curated catalog at https://llama.app that's more up to date and easier to maintain.
Heads-up: a future update will rename LlamaBarn to Llama. It updates automatically and your models and settings carry over. Two things to know if you connect to the app: the local server will move to port 8080 (from 2276) to match llama.cpp's default, and llamabarn:// deep links will become llama://.
0.30.0: Install models via deeplinks
- Add Qwen 3.6 family: 27B and 35B-A3B
- Install models from Hugging Face via llamabarn:// deeplinks
- Pause and resume in-progress downloads; partials survive app quit
- Enable prompt-based speculative decoding by default
- Find sideloaded models in HF cache subdirectories; fix split-shard quant labels
- Fix MoE compatibility for sideloaded models using measured memory
- Fix sideloaded estimation hanging forever when llama-fit-params failed
- Improve sideload memory estimate accuracy
- Move models.ini to Application Support; ~/.llamabarn no longer required
- Update llama.cpp to b8902
0.29.1
0.29.0: Sideloaded models
This release opens LlamaBarn up beyond the curated catalog: any GGUF model in your Hugging Face cache now shows up in the installed list with the same one-click load, run, and delete as curated models, with context tiers sized to your device automatically.
- Detect and support sideloaded GGUF models from the Hugging Face cache
- Match llama-server format for sideloaded model IDs so IDs are portable
- Default every model to the 4K context tier for a smaller footprint
- Show the model's native max context alongside the device-fit tier
- Show every size in catalog family drawers, with installed ones badged
- Keep deprecated families like Qwen3 visible for already-installed models
- Add a caption under Launch at login explaining idle resource use
- Show friendlier HTTP download errors
- Fix Gemma 4 download URLs after Hugging Face repo file renames
- Update llama.cpp to b8797
0.28.0: Gemma 4
- Add Gemma 4 models to catalog: 31B, 26B-A4B, E4B, E2B
- Update llama.cpp to b8648
0.27.0: Hugging Face cache
New downloads are now stored in ~/.cache/huggingface/hub/ using the standard Hugging Face cache layout. This means models downloaded by LlamaBarn can be used by llama.cpp and other Hugging Face-aware tools. Existing models in ~/.llamabarn/ continue to work.
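To see what other tools would find in that cache, the standard layout (models--{org}--{name}/snapshots/{revision}/...) can be walked directly. This is a rough sketch under that layout assumption; the function names are hypothetical, and it is not how LlamaBarn itself scans the cache.

```python
from pathlib import Path
from typing import Optional


def default_hub_dir() -> Path:
    """Default cache location named in the release note above."""
    return Path.home() / ".cache" / "huggingface" / "hub"


def list_cached_gguf(hub_dir: Optional[Path] = None) -> dict[str, list[str]]:
    """Map each cached model repo ID to the GGUF file names found in it.

    Assumes the standard Hugging Face hub cache layout:
    models--{org}--{name}/snapshots/{revision}/[subdirs/]file.gguf
    """
    hub = hub_dir or default_hub_dir()
    if not hub.is_dir():
        return {}
    found: dict[str, list[str]] = {}
    for repo_dir in sorted(hub.glob("models--*")):
        # "models--org--name" -> "org/name"
        repo_id = repo_dir.name[len("models--"):].replace("--", "/")
        # A set removes duplicates when several revisions hold the same file.
        names = {p.name for p in repo_dir.glob("snapshots/*/**/*.gguf")}
        if names:
            found[repo_id] = sorted(names)
    return found
```

Calling `list_cached_gguf()` with no argument scans the default location and returns an empty dict when the cache directory does not exist.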