Interactive calculator for LLM serving — compute, memory, and KV cache bandwidth estimation.
Live demo: https://simpx.github.io/llmcalc/
- Architecture analysis: parses model config (params, KV/token, per-token FLOPs)
- Deployment planning: GPUs per instance, DP instances per machine
- Hardware fitting: memory allocation bars for GPU HBM
- Workload bandwidth estimation: bucket-based analysis for agentic multi-turn workloads
- Topology visualization: see instances, memory layout, and traffic flow at a glance
| Architecture | Example models | Status |
|---|---|---|
| MLA + DSA + MoE | GLM-5, DeepSeek-V3.2 | ✅ Built-in + config.json paste |
| Hybrid GQA + Linear Attn + MoE | Qwen-series hybrids | ✅ Via config.json paste |
| GQA + MoE | DeepSeek-V3, Mixtral | 🚧 Planned |
| Dense MHA/GQA | LLaMA-3, Qwen3 dense | 🚧 Planned |
# Clone and open
git clone https://github.com/simpx/llmcalc.git
cd llmcalc
open index.html # or: python -m http.server 8000

Or use the hosted version at https://simpx.github.io/llmcalc/
- Select model: Preset (GLM-5) or paste a HuggingFace config.json
- Deployment: GPUs per instance and DP replicas per machine
- Hardware: GPU type, HBM size, MFU
- Workload: Buckets with (T, h) per bucket
- Hit rate: Local cache hit rate to compute network bandwidth
The traffic overview panel on the right updates in real time as you change parameters.
All derived values are computed from architecture params:
avg_pos = T × (1 + h) / 2
FLOPs/tok = LinearConst + PosCoef × avg_pos
X = (Peak × MFU × 10⁶) / FLOPs/tok tokens/s
Write BW = X × KV_per_token GiB/s
Read raw = Write × h / (1 - h) GiB/s (amortized)
External BW = Read raw × (1 - h_local) GiB/s (goes to network)
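The chain of formulas above can be sketched as one small function. This is only an illustration, not the calculator's actual code: the function name `estimateBandwidth`, its parameter names, and the choice of base units (FLOP/s, FLOPs, bytes) are assumptions. Working in base units means the 10⁶ scale factor from the formula does not appear. `h` is read here as the fraction of a bucket's context that is served from cache, which is an interpretation of the (T, h) bucket parameters.

```javascript
// Sketch only: names and base units are assumptions, not llmcalc internals.
const GIB = 2 ** 30; // bytes per GiB

function estimateBandwidth(p) {
  if (p.h < 0 || p.h >= 1) throw new RangeError("h must be in [0, 1)");
  // avg_pos = T × (1 + h) / 2
  const avgPos = (p.T * (1 + p.h)) / 2;
  // FLOPs/tok = LinearConst + PosCoef × avg_pos (both in plain FLOPs here)
  const flopsPerToken = p.linearConst + p.posCoef * avgPos;
  // X = Peak × MFU / FLOPs/tok, in tokens/s (peak given in FLOP/s)
  const tokensPerSec = (p.peakFlops * p.mfu) / flopsPerToken;
  // Write BW = X × KV_per_token, converted from bytes/s to GiB/s
  const writeGiBs = (tokensPerSec * p.kvBytesPerToken) / GIB;
  // Read raw = Write × h / (1 - h)
  const readRawGiBs = (writeGiBs * p.h) / (1 - p.h);
  // External BW = Read raw × (1 - h_local)
  const externalGiBs = readRawGiBs * (1 - p.hLocal);
  return { avgPos, flopsPerToken, tokensPerSec, writeGiBs, readRawGiBs, externalGiBs };
}

// Hypothetical inputs, chosen only to exercise the function.
const example = estimateBandwidth({
  T: 32768, h: 0.8, hLocal: 0.9,
  linearConst: 8e10, posCoef: 1e6,
  peakFlops: 1e15, mfu: 0.4,
  kvBytesPerToken: 40000,
});
console.log(example);
```

Keeping every intermediate value in the returned object makes it easy to compare each step against the numbers shown in the UI.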
For MLA + DSA:
- LinearConst = projections + FFN + lm_head + MLA bounded attn body
- PosCoef = 2 × index_n_heads × index_head_dim × num_layers / 10⁹
- KV/token = (kv_lora_rank × bytes + qk_rope_dim × bytes + index_head_dim × bytes) × num_layers
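As a rough illustration of the PosCoef and KV/token terms above, the following sketch evaluates them from a handful of architecture parameters. The function name `mlaDsaTerms` and the camelCase field names are hypothetical and mirror the symbols in the formulas rather than actual config.json keys. The sample values are chosen for illustration and are not read from any shipped config. LinearConst is left out because its components are not spelled out here.

```javascript
// Sketch only: field names mirror the formula symbols, not real config.json keys.
function mlaDsaTerms(cfg) {
  // PosCoef = 2 × index_n_heads × index_head_dim × num_layers / 10⁹
  const posCoef = (2 * cfg.indexNHeads * cfg.indexHeadDim * cfg.numLayers) / 1e9;
  // KV/token = (kv_lora_rank + qk_rope_dim + index_head_dim) × bytes × num_layers
  const kvBytesPerToken =
    (cfg.kvLoraRank + cfg.qkRopeDim + cfg.indexHeadDim) * cfg.bytes * cfg.numLayers;
  return { posCoef, kvBytesPerToken };
}

// Illustrative values (assumed); bytes = 1 corresponds to an 8-bit KV cache.
const demo = mlaDsaTerms({
  kvLoraRank: 512,
  qkRopeDim: 64,
  indexNHeads: 64,
  indexHeadDim: 128,
  numLayers: 61,
  bytes: 1,
});
console.log(demo.kvBytesPerToken, "bytes of KV cache per token");
console.log(demo.posCoef, "PosCoef (already divided by 10⁹)");
```

Factoring `bytes` out of the sum gives the same result as the formula, since every term is stored at the same element width in this sketch.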
Single-file HTML with CDN-hosted dependencies:
- Tailwind CSS (styling)
- Chart.js (bandwidth chart)
- Vanilla JavaScript (no build step)
MIT