📖 Article: https://www.ramankishore.com/blog/cache-containment/
This repository is a companion demo to the article above.
The article explains the system-design concepts; this repo lets you run the failure modes and observe them live.
The README is intentionally focused on setup, execution, and observation.
Repository layout:
- api/: API service (FastAPI). Handles requests, interacts with the cache, and exposes metrics.
- cache/: In-memory cache implementation. This is where contained vs. non-contained behavior is implemented.
- db/: Mock database service with a fixed-size connection pool and simulated latency.
- loadgen/: Load generator used to apply steady and overload traffic to the system.
- prometheus/: Prometheus scrape configuration for API and DB metrics.
- grafana/: Provisioned Grafana setup (datasource + dashboard) so metrics are visible out of the box.
- docker-compose.yml: Wires all services together.
- .env.example / .env: Simulation configuration. All behavior is driven via environment variables.
The demo contrasts:
- A cache that admits all work (non-contained)
- A cache that caps admission under overload (contained)
- How each choice affects DB pressure, queues, latency, and failures

All behavior is driven by configuration, not code changes. Further details are in the article linked above.
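The contained behavior amounts to an admission cap on cache misses: hits are always served, but only a bounded number of misses may be in flight to the database at once, and the rest are shed immediately. The sketch below is illustrative only; the class and parameter names are assumptions, not the actual implementation in cache/:

```python
import threading


class ContainedCache:
    """Toy sketch of miss-admission capping (illustrative names only).

    Hits are always served from memory. On a miss, at most `max_inflight`
    requests may proceed to the backing store concurrently; any further
    miss is rejected immediately instead of queueing on the database.
    """

    def __init__(self, loader, max_inflight=8):
        self._data = {}
        self._lock = threading.Lock()
        self._loader = loader  # stands in for the DB call
        self._slots = threading.BoundedSemaphore(max_inflight)

    def get(self, key):
        with self._lock:
            if key in self._data:
                return self._data[key]  # hit: never touches the DB
        if not self._slots.acquire(blocking=False):
            # Contained path: shed the miss rather than pile onto the DB.
            raise RuntimeError("shed: miss capacity exhausted")
        try:
            value = self._loader(key)  # simulated DB lookup
            with self._lock:
                self._data[key] = value
            return value
        finally:
            self._slots.release()
```

The non-contained variant is the same code with the semaphore check removed: every miss goes straight to the database, which is exactly what saturates the fixed-size connection pool under a miss surge.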
Clone the repository:
git clone https://github.com/kramankishore/cache-containment
cd cache-containment
Create the environment file:
cp .env.example .env

All simulation behavior is controlled via .env. The important toggle:
CONTAINED_CACHE=1  # enable cache containment
CONTAINED_CACHE=0  # disable cache containment
Start all services using Docker Compose (detached mode):
docker compose up -d --build
This starts:
- API → http://localhost:8000
- DB → http://localhost:8001
- Prometheus → http://localhost:9090
- Grafana → http://localhost:3000
Open Grafana at http://localhost:3000 and select the dashboard titled "Cache Containment". It is auto-provisioned and works out of the box.
Key panels include:
- Incoming request rate (RPS)
- Cache miss rate
- DB active connections
- DB waiting requests
- API p95 latency
- API failure rate
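You can also query these signals directly in Prometheus. A p95 latency query typically has the shape below; the histogram metric name here is an assumption, so substitute whatever the API actually exports (check http://localhost:8000's metrics endpoint or the Grafana panel definitions):

```
histogram_quantile(0.95,
  sum by (le) (rate(api_request_duration_seconds_bucket[1m])))
```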
The load generator runs outside Docker for easy tuning.
Create and activate a virtual environment:
python3 -m venv .venv
source ./.venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Start the load:
python loadgen/steady.py
You can modify RPS, test duration, and keyspace size directly inside loadgen/steady.py.
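A steady generator of this kind is, at its core, a fixed-rate loop over a bounded keyspace. The function below is an illustrative sketch, not loadgen/steady.py itself; the names and defaults are assumptions:

```python
import random
import time


def steady_load(fire, rps=50, duration=1.0, keyspace=1000):
    """Call `fire(key)` at a roughly fixed rate for `duration` seconds.

    `fire` stands in for one request to the API (e.g. an HTTP GET for a
    random key); rps, duration, and keyspace mirror the knobs described
    above. Returns the number of requests issued.
    """
    interval = 1.0 / rps
    deadline = time.monotonic() + duration
    sent = 0
    while time.monotonic() < deadline:
        fire(random.randrange(keyspace))
        sent += 1
        time.sleep(interval)  # naive pacing; drifts slightly under load
    return sent
```

The keyspace size matters: a small keyspace keeps the hit rate high at steady state, which is what makes the subsequent cache-clear shock so visible.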
To simulate a real-world shock (cold cache / invalidation):
curl -X POST http://localhost:8000/_control/clear_cache
This forces a sudden cache miss surge and downstream load.
Run with the non-contained cache by setting the flag in .env:
CONTAINED_CACHE=0

Wait for the system to reach steady state, then trigger the disturbance (clear the cache) and observe the metrics in the Grafana dashboard described above.
Stop the containers:
docker compose down -v

Switch to the contained cache by updating the flag in .env:
CONTAINED_CACHE=1

Restart the stack and repeat the test.
Compare:
- queue growth
- DB saturation
- tail latency
- failure rate
- the other signals discussed in the article
To stop and remove everything:
docker compose down -v
docker volume prune
This repository is a failure-mode demonstration, not a benchmark.
The value is in observing how overload propagates, where it is absorbed, and what different design choices optimize for.