This repository contains tools and scripts for learning and benchmarking MLX on Apple Silicon, specifically comparing it against Ollama.
- Python 3.11+
- uv (Recommended for fast dependency management)
- Ollama (for comparison benchmarks)
- Clone the repository:

  ```sh
  git clone <repository-url>
  cd mlx-learning
  ```
- Install dependencies using uv:

  ```sh
  uv sync
  ```

  This will create a virtual environment and install all required packages.
We provide a CLI tool, `mlx-bench`, to compare generation speed between MLX and Ollama.
To run the default benchmark (comparing Qwen 3.5 9B):

```sh
uv run mlx-bench
```

Options:
- `--ollama-model`: Specify the Ollama model tag (default: `qwen3.5:latest`)
- `--mlx-model`: Specify the MLX model path or HuggingFace repo (default: `mlx-community/Qwen3.5-9B-MLX-4bit`)
- `--prompt`: Custom prompt text
- `--max-tokens`: Maximum number of tokens to generate
- `--verbose`: Show generated text and detailed logs
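As a rough sketch of what a benchmark like this measures, tokens/sec is simply tokens generated divided by wall-clock time. The helper below is hypothetical (not the actual `mlx-bench` implementation) and assumes a zero-argument callable that runs one generation and returns the number of tokens it produced:

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[], int]) -> float:
    """Time one generation call and return throughput in tokens/sec.

    `generate` is any zero-argument callable that performs a single
    generation and returns its token count (a hypothetical interface,
    used here only to illustrate the measurement).
    """
    start = time.perf_counter()
    n_tokens = generate()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed


if __name__ == "__main__":
    # Stand-in generator that pretends to produce 128 tokens instantly;
    # a real benchmark would call into MLX or Ollama here.
    rate = tokens_per_second(lambda: 128)
    print(f"{rate:.2f} tokens/sec")
```

The same timing wrapper can be pointed at either engine, which keeps the comparison symmetric.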
Example:
```sh
uv run mlx-bench --prompt "Explain black holes" --max-tokens 256
```

Linting & formatting:

```sh
uv run ruff check .
uv run ruff format .
```

Type checking:

```sh
uv run mypy .
```

Running tests:

```sh
uv run pytest
```

We tested the generation performance of MLX vs. Ollama using Qwen 3.5 9B (4-bit quantization).
| Engine | Model | Tokens/sec | Relative Speed |
|---|---|---|---|
| Ollama | qwen3.5:latest (9B) | 18.58 | 1.00x |
| MLX | Qwen3.5-9B-MLX-4bit | 28.35 | 1.53x |
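The Relative Speed column is just each engine's throughput divided by the Ollama baseline. For the numbers in the table above:

```python
# Throughput figures from the table (tokens/sec)
ollama_tps = 18.58  # Ollama baseline
mlx_tps = 28.35     # MLX

# Relative speed is normalized against the Ollama baseline
relative_speed = round(mlx_tps / ollama_tps, 2)
print(relative_speed)  # → 1.53
```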
Run it yourself:

```sh
uv run mlx-bench --max-tokens 128
```

Project layout:
- `src/mlx_learning/`: Source code package
- `tests/`: Unit and integration tests
- `pyproject.toml`: Project configuration and dependencies