Releases · ggml-org/Llama-macOS

LlamaBarn used to bundle the llama.cpp engine inside the app. Now it shares one install with your command line: if you already have llama.cpp the app uses it, and if the app installs it you have it in your terminal too.

Uses an existing llama.cpp install, from the install script at https://llama.app or from Homebrew, if you have one
Shows the in-use llama.cpp build in the footer
Ships as a 1.6 MB .dmg (down from 7 MB)

We've also replaced the built-in catalog. Instead of a fixed list in the app, a new "Recommended for your Mac" section suggests models that fit your hardware to get you started. For browsing, there's now a curated catalog at https://llama.app that's more up to date and easier to maintain.

Heads-up: a future update will rename LlamaBarn to Llama. The app will update automatically, and your models and settings will carry over. Two things to know if you connect to the app: the local server will move from port 2276 to port 8080 to match llama.cpp's default, and llamabarn:// deep links will become llama://.
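If you script against the app, a client can be written to tolerate both the old and new port and link scheme during the transition. The following is a minimal sketch, not part of the app: the function names are hypothetical, the port check is a plain TCP connect (it does not verify that the listener is actually the app), and the deep link path used in the usage note is made up for illustration.

```python
import socket

NEW_PORT = 8080  # port the note above says the server will move to
OLD_PORT = 2276  # port used before the rename


def candidate_base_urls(host="127.0.0.1"):
    """Return base URLs to try, new port first, then the old one."""
    return [f"http://{host}:{port}" for port in (NEW_PORT, OLD_PORT)]


def first_open_port(host="127.0.0.1", ports=(NEW_PORT, OLD_PORT), timeout=0.5):
    """Return the first port that accepts a TCP connection, or None.

    This only checks that something is listening; it does not confirm
    that the listener is the app's server.
    """
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue
    return None


def migrate_deeplink(link):
    """Rewrite the llamabarn:// scheme to llama://, leaving the rest as-is."""
    old_scheme = "llamabarn://"
    if link.startswith(old_scheme):
        return "llama://" + link[len(old_scheme):]
    return link
```

For example, `migrate_deeplink("llamabarn://example/path")` returns `"llama://example/path"`, while links with any other scheme are returned unchanged.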


25 Apr 06:50

erusev

0.30.0

fb8b18b

0.30.0: Install models via deeplinks

Add Qwen 3.6 family: 27B and 35B-A3B
Install models from Hugging Face via llamabarn:// deeplinks
Pause and resume in-progress downloads; partials survive app quit
Enable prompt-based speculative decoding by default
Find sideloaded models in HF cache subdirectories; fix split-shard quant labels
Fix MoE compatibility for sideloaded models using measured memory
Fix sideloaded estimation hanging forever when llama-fit-params failed
Improve sideload memory estimate accuracy
Move models.ini to Application Support; ~/.llamabarn no longer required
Update llama.cpp to b8902
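To illustrate the split-shard labeling issue mentioned above: llama.cpp's gguf-split tool names shards with a `-00001-of-00003.gguf` style suffix, so a quantization label has to be read from the part of the filename before that suffix. This is a hedged sketch, not the app's code: the function names are hypothetical, and treating the last `-` or `.` separated token as the quant label is a heuristic assumption that will not hold for every filename.

```python
import re

# Shard suffix as produced by llama.cpp's gguf-split: <stem>-00001-of-00003.gguf
SHARD_RE = re.compile(r"^(?P<stem>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.gguf$")


def split_shard(filename):
    """Return (stem, shard_index, shard_total) for a GGUF filename.

    Single-file models are reported as shard 1 of 1.
    """
    match = SHARD_RE.match(filename)
    if match:
        return match["stem"], int(match["index"]), int(match["total"])
    if filename.endswith(".gguf"):
        return filename[: -len(".gguf")], 1, 1
    raise ValueError(f"not a GGUF filename: {filename!r}")


def quant_label(filename):
    """Guess the quant label as the last token of the stem (heuristic)."""
    stem, _, _ = split_shard(filename)
    return re.split(r"[-.]", stem)[-1]
```

Stripping the shard suffix first means every shard of the same model yields the same label, rather than a label like `00003`.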


16 Apr 11:55

erusev

0.29.1

8f6a7e6

0.29.1

Fix notarization. 0.29.0 shipped unnotarized due to a build pipeline bug.


15 Apr 08:53

erusev

0.29.0

6f3ae72

0.29.0: Sideloaded models

This release opens LlamaBarn up beyond the curated catalog: any GGUF model in your Hugging Face cache now shows up in the installed list with the same one-click load, run, and delete as curated models, with context tiers sized to your device automatically.

Detect and support sideloaded GGUF models from the Hugging Face cache
Match llama-server format for sideloaded model IDs so IDs are portable
Default every model to the 4K context tier for a smaller footprint
Show the model's native max context alongside the device-fit tier
Show every size in catalog family drawers, with installed ones badged
Keep deprecated families like Qwen3 visible for already-installed models
Add a caption under Launch at login explaining idle resource use
Show friendlier HTTP download errors
Fix Gemma 4 download URLs after Hugging Face repo file renames
Update llama.cpp to b8797
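As rough background on why a smaller default context tier reduces the footprint: for a standard transformer, the KV cache grows linearly with context length. The sketch below uses the common back-of-the-envelope formula and a hypothetical model shape (32 layers, 8 KV heads, head dimension 128, 16-bit cache). It is an assumption-laden estimate, not how the app sizes tiers, and it ignores model weights, compute buffers, sliding-window attention, and quantized caches.

```python
def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV cache size: one K and one V tensor per layer.

    bytes_per_elem=2 assumes a 16-bit cache type.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Hypothetical model shape, chosen only for illustration.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for n_ctx in (4096, 32768):
    mib = kv_cache_bytes(n_ctx, **shape) / 1024**2
    print(f"{n_ctx:>6} tokens: {mib:.0f} MiB")
```

With this made-up shape, the estimate is 512 MiB at 4K context and eight times that at 32K, which is the kind of difference a conservative default avoids until you ask for more.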


03 Apr 15:37

erusev

0.28.0

db6077e

0.28.0: Gemma 4

Add Gemma 4 models to catalog: 31B, 26B-A4B, E4B, E2B
Update llama.cpp to b8648


01 Apr 13:39

erusev

0.27.0

b622157

0.27.0: Hugging Face cache

New downloads are now stored in ~/.cache/huggingface/hub/ using the standard Hugging Face cache layout. This means models downloaded by LlamaBarn can be used by llama.cpp and other Hugging Face aware tools. Existing models in ~/.llamabarn/ continue to work.
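Because the standard Hugging Face cache layout stores each repo under `models--<org>--<name>/snapshots/<revision>/`, other tools can find these files with a simple directory walk. The following is a minimal sketch under that assumption. The function name is hypothetical, it does not honor environment overrides of the cache location, and it is not how LlamaBarn itself scans the cache.

```python
from pathlib import Path


def list_cached_ggufs(hub_dir=None):
    """Return (repo_id, path) pairs for GGUF files in a Hugging Face hub cache.

    Defaults to ~/.cache/huggingface/hub when hub_dir is not given.
    """
    hub = Path(hub_dir) if hub_dir else Path.home() / ".cache" / "huggingface" / "hub"
    if not hub.is_dir():
        return []

    found = []
    for repo_dir in sorted(hub.glob("models--*")):
        # "models--org--name" -> "org/name"; repos without an org stay as-is.
        repo_id = repo_dir.name[len("models--"):].replace("--", "/", 1)
        # Files may sit directly in a snapshot or in subdirectories of it.
        for gguf in sorted(repo_dir.glob("snapshots/*/**/*.gguf")):
            found.append((repo_id, gguf))
    return found


if __name__ == "__main__":
    for repo_id, path in list_cached_ggufs():
        print(repo_id, path.name)
```

Snapshot entries are usually symlinks into the repo's `blobs/` directory, so matching by filename under `snapshots/` avoids listing the same file twice.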


24 Mar 13:56

erusev

0.26.0

4c935cb

0.26.0

Add Qwen 3.5 models, replacing Qwen 3
Add Hugging Face token option to settings
Update llama.cpp to b8496


18 Feb 14:23

erusev

0.25.0

1d71643

0.25.0

Add custom models folder setting
Update llama.cpp to b8088



0.32.0: The llama has left the barn


0.31.1


0.31.0: llama.cpp and catalog, unbundled
