llmcalc

Interactive calculator for LLM serving — compute, memory, and KV cache bandwidth estimation.

Live demo: https://simpx.github.io/llmcalc/

Features

Architecture analysis: parses model config (params, KV/token, per-token FLOPs)
Deployment planning: GPUs per instance, DP instances per machine
Hardware fitting: memory allocation bars for GPU HBM
Workload bandwidth estimation: bucket-based analysis for agentic multi-turn workloads
Topology visualization: see instances, memory layout, and traffic flow at a glance
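Agentic multi-turn workloads map naturally onto the (T, h) pairs used later in this README. The sketch below shows one plausible way to produce those pairs from a session trace. It is not taken from llmcalc. It assumes that each turn resends the whole previous context and that all of that prefix is a cache hit, so T is the context length at a turn and h is the fraction already cached. `turnsToTH`, `bucketize` and the bucket edges are hypothetical.

```javascript
// Hypothetical helpers, not part of llmcalc. Assumption: every turn resends
// the full previous context, and that whole prefix is a prefix-cache hit.

// Convert the number of new tokens added per turn into (T, h) pairs, where
// T is the context length at that turn and h is the fraction already cached.
function turnsToTH(newTokensPerTurn) {
  const turns = [];
  let context = 0;
  for (const added of newTokensPerTurn) {
    const T = context + added;
    const h = T > 0 ? context / T : 0;
    turns.push({ T, h });
    context = T;
  }
  return turns;
}

// Group turns into buckets by context length. `edges` are inclusive upper
// bounds in ascending order; turns longer than the last edge are dropped.
function bucketize(turns, edges) {
  const buckets = edges.map((upper) => ({ upper, count: 0, sumH: 0 }));
  for (const { T, h } of turns) {
    const bucket = buckets.find((b) => T <= b.upper);
    if (bucket === undefined) continue;
    bucket.count += 1;
    bucket.sumH += h;
  }
  return buckets
    .filter((b) => b.count > 0)
    .map((b) => ({ upper: b.upper, count: b.count, meanH: b.sumH / b.count }));
}

// Example: a session whose turns add 1000, 1000 and 2000 tokens.
const turns = turnsToTH([1000, 1000, 2000]);
console.log(turns); // T = 1000, 2000, 4000; h = 0, 0.5, 0.5
console.log(bucketize(turns, [2000, 8000]));
```

Averaging h within a bucket is only one choice; a real trace analysis might weight turns by their uncached tokens instead.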

Supported Architectures

Architecture	Example models	Status
MLA + DSA + MoE	GLM-5, DeepSeek-V3.2	✅ Built-in + config.json paste
Hybrid GQA + Linear Attn + MoE	Qwen-series hybrids	✅ Via config.json paste
GQA + MoE	DeepSeek-V3, Mixtral	🚧 Planned
Dense MHA/GQA	LLaMA-3, Qwen3 dense	🚧 Planned

Quick Start

# Clone and open
git clone https://github.com/simpx/llmcalc.git
cd llmcalc
open index.html   # macOS; or serve it: python3 -m http.server 8000, then browse to http://localhost:8000

Or use the hosted version at https://simpx.github.io/llmcalc/

Usage Flow

Select model: Preset (GLM-5) or paste a HuggingFace config.json
Deployment: GPUs per instance and DP replicas per machine
Hardware: GPU type, HBM size, MFU
Workload: Buckets with (T, h) per bucket
Hit rate: local cache hit rate (h_local), used to compute network bandwidth
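The deployment and hardware steps boil down to simple arithmetic. The sketch below is a hypothetical illustration, not llmcalc's code: it assumes weights are sharded evenly across the GPUs of one instance, that a fixed per-GPU reserve covers activations and runtime overhead, and that the rest of each GPU's HBM holds KV cache. `planDeployment`, `gpusPerMachine` and `reservedGiB` are names introduced here for illustration.

```javascript
// Hypothetical deployment/HBM sketch, not llmcalc's actual code.
// Assumptions: weights are sharded evenly across the GPUs of one instance,
// a fixed per-GPU reserve covers activations and runtime overhead, and the
// remaining HBM of every GPU in the instance holds KV cache.
const GiB = 2 ** 30;

function planDeployment({
  gpusPerMachine,  // GPUs in one machine (hypothetical input)
  gpusPerInstance, // GPUs serving one model replica
  hbmGiB,          // HBM per GPU, in GiB
  paramsB,         // total parameters, in billions
  bytesPerParam,   // e.g. 1 for FP8 weights, 2 for BF16
  reservedGiB,     // per-GPU reserve (hypothetical input)
  kvBytesPerToken, // KV bytes per token across all layers
}) {
  // DP instances per machine: whole replicas that fit on the machine.
  const dpInstances = Math.floor(gpusPerMachine / gpusPerInstance);
  // Weight shard held by each GPU of an instance.
  const weightsGiBPerGpu = (paramsB * 1e9 * bytesPerParam) / gpusPerInstance / GiB;
  // Whatever is left after weights and reserve goes to KV cache (never negative).
  const kvGiBPerGpu = Math.max(0, hbmGiB - weightsGiBPerGpu - reservedGiB);
  // Tokens of KV one instance can hold if the cache is partitioned across its GPUs.
  const kvTokensPerInstance = Math.floor((kvGiBPerGpu * gpusPerInstance * GiB) / kvBytesPerToken);
  return { dpInstances, weightsGiBPerGpu, kvGiBPerGpu, kvTokensPerInstance };
}

console.log(planDeployment({
  gpusPerMachine: 8, gpusPerInstance: 4, hbmGiB: 80, paramsB: 100,
  bytesPerParam: 1, reservedGiB: 8, kvBytesPerToken: 85888,
}));
```

If the KV cache is replicated rather than partitioned across an instance's GPUs (which can happen with MLA under tensor parallelism), drop the `gpusPerInstance` factor from `kvTokensPerInstance`.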

The traffic overview panel on the right updates in real time as you change parameters.

Formulas

All derived values are computed from the architecture parameters together with the workload, hardware, and hit-rate inputs:

avg_pos     = T × (1 + h) / 2
FLOPs/tok   = LinearConst + PosCoef × avg_pos
X           = (Peak × MFU × 10⁶) / FLOPs/tok       tokens/s
Write BW    = X × KV_per_token                     GiB/s
Read raw    = Write BW × h / (1 - h)               GiB/s (amortized)
External BW = Read raw × (1 - h_local)             GiB/s (goes to network)
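Note that avg_pos = T × (1 + h) / 2 is the midpoint of positions hT through T, that is, the average position of the (1 - h) × T uncached tokens that must be prefilled. That reading suggests T is a request's context length and h its prefix-cache hit fraction, which also explains why Read raw = Write BW × h / (1 - h): each request writes (1 - h) × T tokens of KV and reads h × T. The sketch below chains the formulas for one bucket under that interpretation. It is not llmcalc's code; the function name, the unit choices (FLOPs/tok in GFLOP, Peak in PFLOP/s, which is what makes the 10⁶ factor yield tokens/s), and the sample numbers are assumptions.

```javascript
// Sketch of the README formulas for one (T, h) bucket; not llmcalc's code.
// Unit assumptions (ours): linearConstG and posCoefG give GFLOP per token,
// peakPF is peak throughput in PFLOP/s for the unit being modelled (so
// x 1e6 turns PFLOP/s over GFLOP/token into tokens/s), mfu is a fraction
// in (0, 1], and kvBytesPerToken is in bytes.
const GiB = 2 ** 30;

function bucketTraffic({ T, h, hLocal, linearConstG, posCoefG, peakPF, mfu, kvBytesPerToken }) {
  // Read raw divides by (1 - h), so a 100% hit rate is undefined.
  if (!(h >= 0 && h < 1)) throw new RangeError("h must be in [0, 1)");
  const avgPos = (T * (1 + h)) / 2;
  const flopsPerTokG = linearConstG + posCoefG * avgPos;
  const X = (peakPF * mfu * 1e6) / flopsPerTokG; // tokens/s
  const writeBW = (X * kvBytesPerToken) / GiB;   // GiB/s of new KV written
  const readRaw = (writeBW * h) / (1 - h);       // GiB/s of cached KV read (amortized)
  const externalBW = readRaw * (1 - hLocal);     // GiB/s that misses the local cache
  return { avgPos, flopsPerTokG, X, writeBW, readRaw, externalBW };
}

// Illustrative inputs only: two hypothetical buckets and made-up constants.
const buckets = [{ T: 32000, h: 0.9 }, { T: 128000, h: 0.95 }];
for (const { T, h } of buckets) {
  console.log(T, h, bucketTraffic({
    T, h, hLocal: 0.5, linearConstG: 0.08, posCoefG: 0.001,
    peakPF: 1, mfu: 0.4, kvBytesPerToken: 85888,
  }));
}
```

As h approaches 1, Read raw grows without bound relative to Write BW, which is why high-hit-rate buckets dominate network traffic in this model.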

For MLA + DSA:

LinearConst = projections + FFN + lm_head + MLA bounded attn body
PosCoef = 2 × index_n_heads × index_head_dim × num_layers / 10⁹
KV/token = (kv_lora_rank × bytes + qk_rope_dim × bytes + index_head_dim × bytes) × num_layers
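KV/token and PosCoef depend only on a few config.json fields, so both can be read straight from a pasted config. The sketch below is a hypothetical illustration, not llmcalc's parser. It assumes HuggingFace DeepSeek-style keys, mapping the README's qk_rope_dim and num_layers to `qk_rope_head_dim` and `num_hidden_layers`, and it assumes 2-byte (BF16) cache entries by default. The function name `mlaDsaFromConfig` and the sample values are illustrative.

```javascript
// Hypothetical sketch, not llmcalc's parser. It assumes HuggingFace
// DeepSeek-style config.json keys: the README's qk_rope_dim and num_layers
// are read from qk_rope_head_dim and num_hidden_layers.
function mlaDsaFromConfig(config, bytes = 2 /* assumed BF16 cache entries */) {
  const need = ["kv_lora_rank", "qk_rope_head_dim", "index_head_dim", "index_n_heads", "num_hidden_layers"];
  for (const key of need) {
    if (typeof config[key] !== "number") throw new Error(`config.json is missing "${key}"`);
  }
  const layers = config.num_hidden_layers;
  // KV/token = (kv_lora_rank + qk_rope_dim + index_head_dim) × bytes × num_layers
  const kvBytesPerToken =
    (config.kv_lora_rank + config.qk_rope_head_dim + config.index_head_dim) * bytes * layers;
  // PosCoef = 2 × index_n_heads × index_head_dim × num_layers / 10⁹ (GFLOP per token per position)
  const posCoefG = (2 * config.index_n_heads * config.index_head_dim * layers) / 1e9;
  return { kvBytesPerToken, posCoefG };
}

// Illustrative values; check them against the model's actual config.json.
const pasted = `{"kv_lora_rank": 512, "qk_rope_head_dim": 64, "index_head_dim": 128,
                 "index_n_heads": 64, "num_hidden_layers": 61}`;
console.log(mlaDsaFromConfig(JSON.parse(pasted)));
// kvBytesPerToken = (512 + 64 + 128) × 2 × 61 = 85888
```

Factoring `bytes` out of the sum is equivalent to the README's per-term form only when all three components share one dtype; a model that stores, say, the indexer keys in a narrower format would need a per-term byte width.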

Tech Stack

Single-file HTML with CDN-hosted dependencies:

Tailwind CSS (styling)
Chart.js (bandwidth chart)
Vanilla JavaScript (no build step)

License

MIT
