Cognitive Cost of LLM Quantization

Thesis: Quantization trades reasoning quality for efficiency, but how much? This project measures that tradeoff with CARS (Cognitive Accuracy per Resource-Second), a single metric that combines accuracy, memory, and speed.

The CARS Metric

CARS = Reasoning Accuracy / (VRAM_GB × Latency_s)

Component	What it measures
Reasoning Accuracy	% correct on reasoning benchmarks (GSM8K, ARC)
VRAM_GB	Peak GPU memory during inference
Latency_s	Average per-sample inference time

Higher CARS = more reasoning accuracy per GB-second of GPU resource (peak VRAM × per-sample latency).
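The formula is simple enough to check by hand. The sketch below is not the code in src/cars.py. It is a minimal standalone illustration that assumes accuracy is given as a percentage (0 to 100). The function name `cars` and the input numbers are hypothetical, chosen only to make the arithmetic easy to follow. They are not measured results.

```python
def cars(accuracy_pct: float, vram_gb: float, latency_s: float) -> float:
    """Return accuracy / (VRAM_GB * Latency_s), following the CARS formula.

    Assumption: accuracy_pct is a percentage in [0, 100].
    """
    if vram_gb <= 0 or latency_s <= 0:
        raise ValueError("vram_gb and latency_s must be positive")
    return accuracy_pct / (vram_gb * latency_s)


# Hypothetical inputs for illustration only (NOT benchmark results).
baseline = cars(accuracy_pct=60.0, vram_gb=6.0, latency_s=0.5)   # 60 / 3.0
quantized = cars(accuracy_pct=54.0, vram_gb=2.0, latency_s=0.5)  # 54 / 1.0

print(f"baseline CARS:  {baseline:.1f}")
print(f"quantized CARS: {quantized:.1f}")
print(f"ratio:          {quantized / baseline:.2f}x")
```

With these made-up inputs, the quantized configuration loses 6 accuracy points but scores higher on CARS because it uses a third of the memory. Because the metric is a ratio, a large drop in VRAM or latency can outweigh a modest accuracy loss, so it is worth reading CARS alongside the raw accuracy.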

Models Under Test

Model	Parameters	Quantization	Expected VRAM
Llama-3.2-3B	3B	FP16 (baseline)	~6 GB
Llama-3.2-3B	3B	AWQ 4-bit	~2 GB
Llama-3.2-3B	3B	GPTQ 4-bit	~2 GB
Llama-3-8B	8B	8-bit bitsandbytes (baseline)	~9 GB
Llama-3-8B	8B	AWQ 4-bit	~5 GB
Llama-3-8B	8B	GPTQ 4-bit	~5 GB
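The Expected VRAM column can be sanity-checked with a weights-only estimate: parameter count times bits per parameter. The sketch below uses the nominal parameter counts from the table (3B and 8B) and a hypothetical helper, `weight_memory_gb`. It is a lower bound, since it ignores the KV cache, activations, and the scales and zero-points that quantized formats store, which is one plausible reason the table values sit somewhat above these numbers.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes); excludes runtime overhead."""
    return n_params * bits_per_param / 8 / 1e9


# (label, nominal parameter count, bits per parameter)
configs = [
    ("Llama-3.2-3B FP16", 3e9, 16),
    ("Llama-3.2-3B 4-bit", 3e9, 4),
    ("Llama-3-8B 8-bit", 8e9, 8),
    ("Llama-3-8B 4-bit", 8e9, 4),
]

for label, n_params, bits in configs:
    print(f"{label:<20} >= {weight_memory_gb(n_params, bits):.1f} GB")
```

Peak VRAM as measured during inference is what feeds the CARS formula. This estimate is only a quick check that a configuration should fit on the target GPU.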

Benchmarks

GSM8K — Grade school math word problems (multi-step reasoning)
ARC-Easy — Elementary science questions
ARC-Challenge — Harder science questions requiring reasoning

Quick Start

Install

pip install -e .

Run a benchmark

# Small model, quick validation
python -m src.benchmark --model meta-llama/Llama-3.2-3B --tasks arc_easy --limit 10

# Full benchmark suite
python -m src.benchmark --model meta-llama/Llama-3.2-3B --tasks gsm8k arc_easy arc_challenge

View CARS scores

python -m src.cars results/

Colab Setup

This project runs on Google Colab Free (T4 GPU). See notebooks/colab_setup.ipynb for SSH tunnel setup, then run:

./setup_colab.sh

Project Structure

src/
├── cars.py          # CARS metric calculator + comparison table
├── benchmark.py     # Benchmark runner (wraps lm-evaluation-harness)
└── models.py        # Model loading helpers (FP16/AWQ/GPTQ)
notebooks/
└── colab_setup.ipynb  # Colab SSH tunnel setup
results/             # Benchmark output JSONs
setup_colab.sh       # One-command Colab provisioning

License

MIT
