Releases: ggml-org/Llama-macOS

0.32.0: The llama has left the barn

17 Jun 09:55

LlamaBarn is now Llama. It's the same app with a new name, and your settings and downloaded models carry over automatically.

Because of the rename, this update doesn't install automatically: download and install the new app yourself.

Also in this release:

  • Default server port is now 8080 (was 2276), matching llama.cpp's default
  • Set a custom port in Settings, including 2276 if you relied on the old one
  • Deeplinks now use the llama:// scheme; llamabarn:// still works
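
For scripts that connect to the local server, the port change can be absorbed by trying the new default first and falling back to the old one. The following is a minimal sketch, not the app's own logic. It assumes the server answers llama-server's OpenAI-compatible /v1/models route on localhost; the function names and the custom_port parameter are hypothetical.

```python
import json
import urllib.error
import urllib.request

NEW_DEFAULT_PORT = 8080  # default from 0.32.0 onward
OLD_DEFAULT_PORT = 2276  # default before 0.32.0


def candidate_base_urls(custom_port=None, host="127.0.0.1"):
    """Return base URLs to try, most likely first, without duplicates."""
    ports = []
    for port in (custom_port, NEW_DEFAULT_PORT, OLD_DEFAULT_PORT):
        if port is not None and port not in ports:
            ports.append(port)
    return [f"http://{host}:{port}" for port in ports]


def first_reachable(base_urls, timeout=1.0):
    """Return the first base URL whose /v1/models route answers with JSON, else None.

    Assumption: the app serves llama-server's OpenAI-compatible API.
    """
    for base in base_urls:
        try:
            with urllib.request.urlopen(f"{base}/v1/models", timeout=timeout) as resp:
                json.load(resp)  # a JSON body is taken as a sign of a live server
                return base
        except (urllib.error.URLError, OSError, ValueError):
            continue  # not listening, timed out, or not JSON: try the next port
    return None
```

Calling first_reachable(candidate_base_urls()) probes 8080 and then 2276. Pass custom_port if you set a different port in Settings.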

0.31.1

12 Jun 08:33

The app now also accepts llama:// links, ahead of the upcoming rename to Llama. Existing llamabarn:// links continue to work.
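
If you generate or store deep links, both schemes are accepted, so migration can be a plain prefix rewrite. This sketch assumes only the scheme changes and that everything after it stays the same; the notes above do not specify the link format, so the example path in the usage note is hypothetical.

```python
OLD_SCHEME = "llamabarn://"
NEW_SCHEME = "llama://"


def to_new_scheme(link):
    """Rewrite a llamabarn:// link to llama://, leaving the rest untouched.

    Links that use any other scheme are returned unchanged.
    """
    if link.lower().startswith(OLD_SCHEME):
        return NEW_SCHEME + link[len(OLD_SCHEME):]
    return link
```

For example, to_new_scheme("llamabarn://example/path") returns "llama://example/path", where "example/path" is a placeholder rather than a real link target.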

0.31.0: llama.cpp and catalog, unbundled

11 Jun 13:14

LlamaBarn used to bundle the llama.cpp engine inside the app. Now it shares one install with your command line: if you already have llama.cpp the app uses it, and if the app installs it you have it in your terminal too.

  • Uses an existing llama.cpp install, if you have one, from Homebrew or the install script at https://llama.app
  • Shows the in-use llama.cpp build in the footer
  • Ships as a 1.6 MB .dmg (down from 7 MB)
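
Because the engine is now a shared install, a script can check whether llama.cpp is visible from the command line. This sketch looks for the llama-server binary on PATH using the standard library. How the app itself locates llama.cpp is not described here, so treat this as an illustration only; the function name is hypothetical.

```python
import shutil


def find_llama_server(path=None):
    """Return the full path to llama-server if it is on PATH, else None.

    `path` overrides the PATH search string, which is mainly useful for testing.
    """
    return shutil.which("llama-server", path=path)


if __name__ == "__main__":
    location = find_llama_server()
    if location:
        print(f"llama-server found at {location}")
    else:
        print("llama-server is not on PATH")
```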

We've also replaced the built-in catalog. Instead of a fixed list in the app, a new "Recommended for your Mac" section suggests models that fit, to get you started. For browsing, there's now a curated catalog at https://llama.app that's more up to date and easier to maintain.

Heads-up: a future update will rename LlamaBarn to Llama. It updates automatically and your models and settings carry over. Two things to know if you connect to the app: the local server will move to port 8080 (from 2276) to match llama.cpp's default, and llamabarn:// deep links will become llama://.

0.30.0: Install models via deeplinks

25 Apr 06:50

  • Add Qwen 3.6 family: 27B and 35B-A3B
  • Install models from Hugging Face via llamabarn:// deeplinks
  • Pause and resume in-progress downloads; partials survive app quit
  • Enable prompt-based speculative decoding by default
  • Find sideloaded models in HF cache subdirectories; fix split-shard quant labels
  • Fix MoE compatibility for sideloaded models using measured memory
  • Fix sideloaded estimation hanging forever when llama-fit-params failed
  • Improve sideload memory estimate accuracy
  • Move models.ini to Application Support; ~/.llamabarn no longer required
  • Update llama.cpp to b8902

0.29.1

16 Apr 11:55

Fix notarization. 0.29.0 shipped unnotarized due to a build pipeline bug.

0.29.0: Sideloaded models

15 Apr 08:53

This release opens LlamaBarn up beyond the curated catalog: any GGUF model in your Hugging Face cache now shows up in the installed list with the same one-click load, run, and delete as curated models, with context tiers sized to your device automatically.

  • Detect and support sideloaded GGUF models from the Hugging Face cache
  • Match llama-server format for sideloaded model IDs so IDs are portable
  • Default every model to the 4K context tier for a smaller footprint
  • Show the model's native max context alongside the device-fit tier
  • Show every size in catalog family drawers, with installed ones badged
  • Keep deprecated families like Qwen3 visible for already-installed models
  • Add a caption under Launch at login explaining idle resource use
  • Show friendlier HTTP download errors
  • Fix Gemma 4 download URLs after Hugging Face repo file renames
  • Update llama.cpp to b8797

0.28.0: Gemma 4

03 Apr 15:37

  • Add Gemma 4 models to catalog: 31B, 26B-A4B, E4B, E2B
  • Update llama.cpp to b8648

0.27.0: Hugging Face cache

01 Apr 13:39

New downloads are now stored in ~/.cache/huggingface/hub/ using the standard Hugging Face cache layout. This means models downloaded by LlamaBarn can be used by llama.cpp and other Hugging Face-aware tools. Existing models in ~/.llamabarn/ continue to work.
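
In the standard layout, each repository lives in a models--{org}--{name} directory with files under snapshots/{revision}/. The sketch below lists the GGUF files it finds there, grouped by repository. It is an illustration, not the app's scanner: the function name is hypothetical, it ignores any environment variables that relocate the cache, and the repo ID decoding would be wrong for a repository whose own name contains "--".

```python
from pathlib import Path


def find_gguf_files(hub_dir=None):
    """Map repo IDs ("org/name") to the GGUF files found in a Hugging Face hub cache."""
    hub = Path(hub_dir) if hub_dir else Path.home() / ".cache" / "huggingface" / "hub"
    found = {}
    if not hub.is_dir():
        return found  # no cache yet
    # "**" also matches zero directories, so files directly in a snapshot are included.
    for path in sorted(hub.glob("models--*/snapshots/*/**/*.gguf")):
        repo_dir = path.relative_to(hub).parts[0]  # e.g. "models--org--name"
        repo_id = repo_dir[len("models--"):].replace("--", "/")
        found.setdefault(repo_id, []).append(path)
    return found


if __name__ == "__main__":
    for repo_id, files in find_gguf_files().items():
        print(repo_id)
        for file in files:
            print("   ", file.name)
```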

0.26.0

24 Mar 13:56

  • Add Qwen 3.5 models, replacing Qwen 3
  • Add Hugging Face token option to settings
  • Update llama.cpp to b8496

0.25.0

18 Feb 14:23

  • Add custom models folder setting
  • Update llama.cpp to b8088