Basic information
- Board URL (official): https://www.apple.com/mac-mini/
- Board purchased from: Apple (direct)
- Board purchase date: October 29, 2024 (arrived Nov 11, 2024)
- Board specs (as tested): M4 (10-core CPU, 10-core GPU, 16-core Neural Engine), 32 GB RAM, 1 TB SSD, 10 GbE
- Board price (as tested): $1,499.00 (USD)
Linux/system information
```
# output of `screenfetch`
jgeerling@jeff-mini
OS: 64bit macOS
Kernel: arm64 Darwin 24.1.0
Uptime: 5h 39m
Packages: 183
Shell: zsh 5.9
Resolution: 3840x2160
DE: Aqua
WM: Quartz Compositor
WM Theme: Blue (Dark)
Font: FMonoMedium
Disk: 190G / 995G (20%)
CPU: Apple M4
GPU: Apple M4
RAM: 3974MiB / 32768MiB

# output of `uname -a`
Darwin jeff-mini.local 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:06:23 PDT 2024; root:xnu-11215.41.3~3/RELEASE_ARM64_T8132 arm64
```
Benchmark results
CPU
- Geekbench 6: 3678 single-core / 14678 multi-core (https://browser.geekbench.com/v6/cpu/8791920)
- HPL (geerlingguy/top500-benchmark): 299.93 Gflops (7.57 Gflops/W)
- Cinebench 2024: 169 single-core / 893 multi-core / 3787 GPU
Power
- Idle power draw (at wall): 4.1 W
- Maximum simulated power draw (`stress-ng --matrix 0`): 31.2 W
- During Geekbench multicore benchmark: 36 W
- During `top500` HPL benchmark: 39.6 W
- During Cinebench 2024: 38 W
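The 7.57 Gflops/W efficiency figure in the CPU section matches the HPL result divided by the wall power measured during the HPL run. A minimal Python check, assuming that is how the figure was derived:

```python
# Values copied from the CPU and Power results above.
HPL_GFLOPS = 299.93  # top500-benchmark HPL result
HPL_WATTS = 39.6     # power draw at the wall during the HPL run


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Energy efficiency: floating-point throughput per watt drawn."""
    return gflops / watts


if __name__ == "__main__":
    print(f"{gflops_per_watt(HPL_GFLOPS, HPL_WATTS):.2f} Gflops/W")
```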
Disk
Internal Apple Storage
| Benchmark | Result |
|---|---|
| AmorphousDiskMark 4K random read QD64 | 1113.00 MB/s |
| AmorphousDiskMark 4K random write QD64 | 121.97 MB/s |
| AmorphousDiskMark 1M sequential read | 3017.64 MB/s |
| AmorphousDiskMark 1M sequential write | 3196.68 MB/s |
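AmorphousDiskMark reports the 4K random tests as throughput. Dividing by the transfer size gives an approximate IOPS figure. This sketch assumes the tool's "MB" means 10^6 bytes and that a "4K" transfer is 4096 bytes; if either assumption is wrong, the results scale accordingly.

```python
BLOCK_BYTES = 4096   # assumption: a "4K" transfer is 4 KiB
MB = 1_000_000       # assumption: 1 MB = 10^6 bytes


def mbps_to_iops(mb_per_s: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Convert throughput in MB/s to I/O operations per second."""
    return mb_per_s * MB / block_bytes


# 4K random results from the table above, in MB/s.
results = {
    "4K random read QD64": 1113.00,
    "4K random write QD64": 121.97,
}

for name, mbps in results.items():
    print(f"{name}: ~{mbps_to_iops(mbps):,.0f} IOPS")
```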
Network
iperf3 results:
- `iperf3 -c $SERVER_IP`: 9.40 Gbps
- `iperf3 -c $SERVER_IP --reverse`: 9.38 Gbps
- `iperf3 -c $SERVER_IP --bidir`: 9.37 Gbps up, 7.73 Gbps down
The 10 GbE connection adds about 2 W to total system power draw.
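For context on the iperf3 numbers: 10 GbE cannot deliver a full 10 Gbps of TCP payload, because every frame carries protocol overhead. A rough ceiling can be worked out as below, assuming a standard 1500-byte MTU, IPv4, and TCP with the timestamps option enabled (the MTU and header sizes are assumptions; they were not recorded in the test).

```python
LINE_RATE_GBPS = 10.0
MTU = 1500  # assumption: standard (non-jumbo) frames

# Per-frame overhead on the wire outside the MTU, in bytes.
PREAMBLE_SFD = 8
ETH_HEADER = 14
FCS = 4
INTERFRAME_GAP = 12

# Headers inside the MTU, in bytes (assumption: IPv4 + TCP timestamps).
IPV4_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMPS = 12


def tcp_goodput_ceiling(line_rate_gbps: float = LINE_RATE_GBPS, mtu: int = MTU) -> float:
    """Maximum TCP payload rate when every frame is full-size."""
    wire_bytes = PREAMBLE_SFD + ETH_HEADER + mtu + FCS + INTERFRAME_GAP
    payload_bytes = mtu - IPV4_HEADER - TCP_HEADER - TCP_TIMESTAMPS
    return line_rate_gbps * payload_bytes / wire_bytes


ceiling = tcp_goodput_ceiling()
print(f"ceiling: {ceiling:.2f} Gbps")
print(f"measured 9.40 Gbps = {9.40 / ceiling:.1%} of ceiling")
```

Under these assumptions, the single-direction results are very close to what the link can carry.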
GPU
Memory
tinymembench results:
```
tinymembench v0.4.10 (simple benchmark for memory throughput and latency)
==========================================================================
== Memory bandwidth tests ==
== ==
== Note 1: 1MB = 1000000 bytes ==
== Note 2: Results for 'copy' tests show how many bytes can be ==
== copied per second (adding together read and writen ==
== bytes would have provided twice higher numbers) ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
== to first fetch data into it, and only then write it to the ==
== destination (source -> L1 cache, L1 cache -> destination) ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in ==
== brackets ==
==========================================================================
C copy backwards : 30582.1 MB/s (2.4%)
C copy backwards (32 byte blocks) : 30488.7 MB/s (2.7%)
C copy backwards (64 byte blocks) : 30756.7 MB/s (0.8%)
C copy : 31050.0 MB/s (0.7%)
C copy prefetched (32 bytes step) : 31217.9 MB/s (0.3%)
C copy prefetched (64 bytes step) : 31255.8 MB/s (1.8%)
C 2-pass copy : 25266.3 MB/s (1.3%)
C 2-pass copy prefetched (32 bytes step) : 25340.9 MB/s (1.5%)
C 2-pass copy prefetched (64 bytes step) : 25332.9 MB/s (1.5%)
C fill : 45000.0 MB/s (10.3%)
C fill (shuffle within 16 byte blocks) : 35503.4 MB/s (2.5%)
C fill (shuffle within 32 byte blocks) : 37420.2 MB/s (4.0%)
C fill (shuffle within 64 byte blocks) : 41411.4 MB/s (6.9%)
NEON 64x2 COPY : 44108.2 MB/s (1.9%)
NEON 64x2x4 COPY : 44995.9 MB/s (3.8%)
NEON 64x1x4_x2 COPY : 43933.6 MB/s (3.5%)
NEON 64x2 COPY prefetch x2 : 38081.9 MB/s (4.2%)
NEON 64x2x4 COPY prefetch x1 : 37652.8 MB/s (0.8%)
NEON 64x2 COPY prefetch x1 : 38499.6 MB/s (1.7%)
NEON 64x2x4 COPY prefetch x1 : 36585.0 MB/s (1.6%)
---
standard memcpy : 44986.9 MB/s (2.0%)
standard memset : 69795.4 MB/s (1.2%)
---
NEON LDP/STP copy : 44254.6 MB/s (4.7%)
NEON LDP/STP copy pldl2strm (32 bytes step) : 45326.7 MB/s (4.9%)
NEON LDP/STP copy pldl2strm (64 bytes step) : 43931.5 MB/s (3.8%)
NEON LDP/STP copy pldl1keep (32 bytes step) : 44670.6 MB/s (2.6%)
NEON LDP/STP copy pldl1keep (64 bytes step) : 44082.8 MB/s (1.3%)
NEON LD1/ST1 copy : 42881.2 MB/s (1.6%)
NEON STP fill : 80754.5 MB/s (5.4%)
NEON STNP fill : 68623.4 MB/s (0.5%)
ARM LDP/STP copy : 43418.8 MB/s (0.3%)
ARM STP fill : 82462.2 MB/s (5.2%)
ARM STNP fill : 68986.8 MB/s (1.3%)
==========================================================================
== Memory latency test ==
== ==
== Average time is measured for random memory accesses in the buffers ==
== of different sizes. The larger is the buffer, the more significant ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM ==
== accesses. For extremely large buffer sizes we are expecting to see ==
== page table walk with several requests to SDRAM for almost every ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest). ==
== ==
== Note 1: All the numbers are representing extra time, which needs to ==
== be added to L1 cache latency. The cycle timings for L1 cache ==
== latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
== two independent memory accesses at a time. In the case if ==
== the memory subsystem can't handle multiple outstanding ==
== requests, dual random read has the same timings as two ==
== single reads performed one after another. ==
==========================================================================
block size : single random read / dual random read
1024 : 0.0 ns / 0.0 ns
2048 : 0.0 ns / 0.0 ns
4096 : 0.0 ns / 0.0 ns
8192 : 0.0 ns / 0.0 ns
16384 : 0.0 ns / 0.0 ns
32768 : 0.0 ns / 0.0 ns
65536 : 0.0 ns / 0.1 ns
131072 : 0.0 ns / 0.0 ns
262144 : 2.0 ns / 3.0 ns
524288 : 2.9 ns / 3.8 ns
1048576 : 3.4 ns / 4.1 ns
2097152 : 3.7 ns / 4.1 ns
4194304 : 5.1 ns / 5.6 ns
8388608 : 6.1 ns / 6.3 ns
16777216 : 12.7 ns / 17.6 ns
33554432 : 49.1 ns / 71.5 ns
67108864 : 71.1 ns / 91.0 ns
```
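Per Note 2 in the tinymembench header, copy results count each byte once, so the combined read + write traffic is roughly double the reported figure. Applied to the `standard memcpy` result (using 1 MB = 10^6 bytes, per Note 1), a quick sketch:

```python
# 'standard memcpy' result from the tinymembench output above, in MB/s.
MEMCPY_MB_S = 44986.9


def bus_traffic_gb_s(copy_mb_s: float) -> float:
    """Approximate combined read + write traffic for a copy test, in GB/s.

    Per tinymembench's Note 2, adding read and written bytes together gives
    twice the reported copy rate. 1 GB = 1000 MB here (10^9 bytes).
    """
    return 2 * copy_mb_s / 1000


print(f"memcpy: {MEMCPY_MB_S / 1000:.1f} GB/s copied, "
      f"~{bus_traffic_gb_s(MEMCPY_MB_S):.1f} GB/s read + write")
```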
sbc-bench results
The script doesn't run on macOS.
Phoronix Test Suite
Results from pi-general-benchmark.sh:
- pts/encode-mp3: DNF (doesn't install on macOS)
- pts/x264 4K: 12.82 fps
- pts/x264 1080p: 55.53 fps
- pts/phpbench: 1125967
- pts/build-linux-kernel (defconfig): DNF (doesn't run on macOS)
Results from the same tests run inside a Docker container:
- pts/encode-mp3: 4.250 s
- pts/x264 4K: 25.27 fps
- pts/x264 1080p: 108.50 fps
- pts/phpbench: 932720
- pts/build-linux-kernel (defconfig): 383.776 s
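For the three tests that completed both natively and in Docker, the relative difference can be computed directly from the numbers above. This sketch assumes higher is better for all three (frames per second for x264, score for phpbench); it only quantifies the gap and says nothing about its cause.

```python
# (benchmark, native macOS result, Docker result); higher is better for all.
RESULTS = [
    ("pts/x264 4K (fps)", 12.82, 25.27),
    ("pts/x264 1080p (fps)", 55.53, 108.50),
    ("pts/phpbench (score)", 1125967, 932720),
]


def docker_vs_native(native: float, docker: float) -> float:
    """Ratio of Docker to native result; > 1 means Docker scored higher."""
    return docker / native


for name, native, docker in RESULTS:
    print(f"{name}: Docker/native = {docker_vs_native(native, docker):.2f}x")
```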
Additional Benchmarks
Ollama (LLMs)
See: https://github.com/geerlingguy/ollama-benchmark?tab=readme-ov-file#findings and geerlingguy/ai-benchmarks#2
| System | CPU/GPU | Model | Eval Rate | Power (Peak) |
|---|---|---|---|---|
| M4 Mac mini (10 core CPU) / 32GB | GPU | llama3.2:3b | 41.31 Tokens/s | 30.1 W |
| M4 Mac mini (10 core CPU) / 32GB | GPU | llama3.1:8b | 20.95 Tokens/s | 29.4 W |
| M4 Mac mini (10 core CPU) / 32GB | GPU | llama2:13b | 13.60 Tokens/s | 29.8 W |
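Since 1 W is 1 J/s, dividing the eval rate by power gives tokens generated per joule. The table lists peak power, so this sketch yields a conservative estimate; average power during a run is likely lower, which would make the true figure somewhat higher.

```python
# (model, eval rate in tokens/s, peak power in W) from the table above.
RUNS = [
    ("llama3.2:3b", 41.31, 30.1),
    ("llama3.1:8b", 20.95, 29.4),
    ("llama2:13b", 13.60, 29.8),
]


def tokens_per_joule(tokens_per_s: float, watts: float) -> float:
    """Tokens generated per joule of energy (1 W = 1 J/s)."""
    return tokens_per_s / watts


for model, rate, watts in RUNS:
    print(f"{model}: {tokens_per_joule(rate, watts):.2f} tokens/J")
```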