diff --git a/site/index.html b/site/index.html
index 68cc9c4..8624585 100644
--- a/site/index.html
+++ b/site/index.html
@@ -480,7 +480,7 @@

Compression vs Quality

-vs llama.cpp
+vs llama.cpp KV compression

Same 4-bit budget, 3.5x less quality degradation:

PPL Degradation at 4-bit (lower is better)
@@ -494,6 +494,23 @@

vs llama.cpp

+When to use which?
+
+llama.cpp is excellent. The difference is integration scope, not capability:
+
+| Scenario | quant.cpp | llama.cpp |
+| --- | --- | --- |
+| WASM browser demo | 192 KB binary | Tensor graph too large |
+| Microcontroller / RTOS | #include only | Needs build system |
+| Game engine plugin | Drop one .h file | 250K LOC build |
+| Learn in an afternoon | 16K LOC | 250K+ LOC |
+| GPU throughput | Basic | Full Metal/CUDA |
+| Model coverage | 7 architectures | 100+ |
+
+Use llama.cpp for speed on a workstation. Use quant.cpp when you need to ship LLM inference inside something.
+

Context Length on 8GB Mac

@@ -572,12 +589,28 @@

Glossary

Try It Yourself

-Three lines of Python. No GPU, no API key, no setup.
-
-pip install quantcpp
+
+Python one-liner or C single-header. No GPU, no API key, no setup.
+
+Python
+
+pip install quantcpp
 
 from quantcpp import Model
 m = Model.from_pretrained("Llama-3.2-1B")
 print(m.ask("What is gravity?"))
+
+C (single header)
+
+#include "quant.h"
+
+int main() {
+    quant_model* m = quant_load("model.gguf");
+    quant_generate(quant_new(m, NULL),
+        "Hello!", print_token, NULL);
+}
+// cc app.c -lm -lpthread
+

GitHub PyPI

@@ -715,7 +748,7 @@


 'ch5.label':'Chapter 5','ch5.title':'Benchmarks','ch5.desc':'All measurements on Llama 3.2 1B Instruct (Q8_0 GGUF), Apple M1 Pro, 8 threads.',
 'ch6.label':'Chapter 6','ch6.title':'Research Foundations','ch6.desc':'Each technique in quant.cpp is grounded in peer-reviewed research:',
 'gl.label':'Reference','gl.title':'Glossary',
-'cta.title':'Try It Yourself','cta.desc':'Three lines of Python. No GPU, no API key, no setup.',
+'cta.title':'Try It Yourself','cta.desc':'Python one-liner or C single-header. No GPU, no API key, no setup.',
 }, ko: {
 'nav.problem':'문제점','nav.solution':'핵심 발견','nav.techniques':'4가지 기술',
@@ -748,7 +781,7 @@


 'ch5.label':'챕터 5','ch5.title':'벤치마크','ch5.desc':'모든 측정: Llama 3.2 1B Instruct (Q8_0 GGUF), Apple M1 Pro, 8 스레드.',
 'ch6.label':'챕터 6','ch6.title':'연구 기반','ch6.desc':'quant.cpp의 각 기술은 동료 심사를 거친 연구에 기반합니다:',
 'gl.label':'참조','gl.title':'용어집',
-'cta.title':'직접 해보기','cta.desc':'Python 3줄. GPU도, API 키도, 설정도 필요 없습니다.',
+'cta.title':'직접 해보기','cta.desc':'Python 한 줄 또는 C 헤더 하나. GPU도, API 키도, 설정도 필요 없습니다.',
 }
 };