Community-driven benchmark database for running LLMs locally on Apple Silicon Macs.
Goal: Build a comprehensive, reproducible performance database so anyone can look up how fast a given LLM runs on their specific Mac — and find the optimal settings for it.
Browse results by chip generation:
| Generation | Link | Status |
|---|---|---|
| Apple M1 | View results | Awaiting contributions |
| Apple M2 | View results | Awaiting contributions |
| Apple M3 | View results | Awaiting contributions |
| Apple M4 | View results | Awaiting contributions |
| Apple M5 | View results | 1 config, 22 models |
Each generation page contains separate tables for every variant (base, Pro, Max, Ultra) and hardware configuration (CPU cores, GPU cores, RAM).
Full results index with cross-generation comparison: `results/README.md`
```sh
git clone https://github.com/enescingoz/mac-llm-bench.git
cd mac-llm-bench

# Install dependencies
brew install llama.cpp
pip3 install huggingface-hub

# Run a quick smoke test (~0.8GB download)
./bench.sh --quick

# Benchmark all models that fit in your RAM
./bench.sh --auto

# Regenerate result tables after benchmarking
python3 scripts/generate_results.py
```

We use llama-bench as the core benchmark — standardized, content-agnostic, and fully reproducible. It measures raw prompt processing and token generation speed at fixed token counts (pp128, pp256, pp512, tg128, tg256). No custom prompts, no subjectivity, and no need to ever re-benchmark if test cases change.
| Metric | Source | Description |
|---|---|---|
| pp128 / pp256 / pp512 (tok/s) | llama-bench | Prompt processing speed |
| tg128 / tg256 (tok/s) | llama-bench | Text generation speed |
| Peak Memory (GB) | /usr/bin/time | Maximum RAM usage |
| Perplexity | llama-perplexity | Quality on WikiText-2 (optional) |
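The metrics above reduce to simple formulas: throughput is tokens divided by wall time, and perplexity is the exponential of the mean negative log-likelihood per token. A minimal sketch for intuition; the numeric inputs are made up, not real benchmark data:

```python
import math

def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput as reported for the pp/tg metrics: tokens / wall time."""
    return n_tokens / elapsed_s

def perplexity(nlls: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nlls) / len(nlls))

# Hypothetical numbers for illustration only:
print(round(tokens_per_second(512, 1.6), 1))   # pp512 over 1.6 s -> 320.0
print(round(perplexity([2.0, 2.0, 2.0]), 3))   # exp(2.0) -> 7.389
```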
Currently benchmarking 5 model families (22 models total):
| Family | Models | Sizes |
|---|---|---|
| Gemma 3 (Google) | 4 models | 1B, 4B, 12B, 27B |
| Qwen 3 (Alibaba) | 7 models | 0.6B, 1.7B, 4B, 8B, 14B, 32B, 30B-A3B MoE |
| DeepSeek R1 Distill | 3 models | 7B, 14B, 32B |
| Phi-4 (Microsoft) | 4 models | Mini 3.8B, Mini Reasoning 3.8B, 14B, Reasoning Plus 14B |
| Mistral | 4 models | 7B v0.3, Nemo 12B, Small 3.1 24B, Devstral Small 24B |
All models are ungated — no HuggingFace login required. More model families can be added via PR. Run `./bench.sh --list` to see all available models.
We aim to cover every Apple Silicon configuration:
```
M1 / M2 / M3 / M4 / M5
  × base / Pro / Max / Ultra
  × various CPU/GPU core counts
  × various RAM sizes (8GB – 256GB)
```
Results are organized by generation → variant → hardware config. See CONTRIBUTING.md for how to add your machine.
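Raw results for a machine land in a directory named after its hardware, following the `{chip}_{cpu}c-{gpu}g_{ram}gb` pattern shown in the repository layout. A sketch of building that name; the exact spelling of the chip token (e.g. `m4-pro`) is an assumption, not confirmed by the repo:

```python
def config_dir(chip: str, cpu_cores: int, gpu_cores: int, ram_gb: int) -> str:
    """Build a results directory name: {chip}_{cpu}c-{gpu}g_{ram}gb."""
    return f"{chip}_{cpu_cores}c-{gpu_cores}g_{ram_gb}gb"

# Hypothetical M4 Pro with 12 CPU cores, 16 GPU cores, 48 GB RAM:
print(config_dir("m4-pro", 12, 16, 48))  # -> m4-pro_12c-16g_48gb
```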
Find optimal settings for each model on your hardware:
```sh
./bench.sh --model gemma-3-4b --sweep        # Quick sweep
./bench.sh --model gemma-3-4b --sweep-full   # Exhaustive sweep
```

Repository layout:

```
mac-llm-bench/
├── bench.sh                    # Main CLI
├── models.yaml                 # Model registry
├── requirements.txt            # Python dependencies
├── lib/                        # Benchmark scripts
├── scripts/
│   └── generate_results.py     # Generates result tables from raw data
├── results/
│   ├── README.md               # Auto-generated index
│   └── m1/ ... m5/             # Per-generation results
│       ├── README.md           # Auto-generated tables
│       └── raw/                # Raw JSON benchmark data
│           └── {chip}_{cpu}c-{gpu}g_{ram}gb/
│               └── {model}_{quant}_ngl{n}.json
├── schemas/
│   └── result.schema.json      # Result JSON format
├── CONTRIBUTING.md             # How to submit results
└── GUIDE.md                    # User guide
```
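Each raw result filename encodes the model, quantization, and `-ngl` (GPU layer count) value as `{model}_{quant}_ngl{n}.json`. A sketch of splitting off the `ngl` suffix; since both model and quant names may contain underscores (e.g. GGUF quants like `Q4_K_M`), the sketch leaves the model/quant split to the registry, and the example filename is illustrative:

```python
import re

# Match {stem}_ngl{n}.json, where stem = {model}_{quant}.
NGL_RE = re.compile(r"^(?P<stem>.+)_ngl(?P<ngl>\d+)\.json$")

def parse_result_name(filename: str):
    """Return (model_and_quant_stem, ngl) for a raw result filename."""
    m = NGL_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized result filename: {filename!r}")
    return m.group("stem"), int(m.group("ngl"))

stem, ngl = parse_result_name("gemma-3-4b_Q4_K_M_ngl99.json")
print(stem, ngl)  # -> gemma-3-4b_Q4_K_M 99
```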
- GUIDE.md — Detailed user guide for benchmarking
- CONTRIBUTING.md — How to submit results and add models
- results/ — All benchmark results
- macOS on Apple Silicon (M1/M2/M3/M4/M5)
- llama.cpp — `brew install llama.cpp`
- huggingface-hub — `pip3 install huggingface-hub`
- Python 3 (pre-installed on macOS)
MIT