CacheForge

title: CacheForge Environment
emoji: ⚡
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web

CacheForge

A production-grade multi-tier cache optimisation environment built on the OpenEnv specification. An RL agent observes live cache health metrics and tunes TTL, capacity, eviction policy, and tier placement to maximise hit rate while minimising latency and memory waste. Includes task-based evaluation with deterministic graders (0.0–1.0 scoring).

Designed to evaluate both reinforcement learning policies and LLM-based decision agents.

This environment is fully compliant with OpenEnv and supports both local Docker execution and remote evaluation via Hugging Face Spaces.

Real-World Motivation

Large-scale web services such as Google, Netflix, Amazon, and Cloudflare rely on multi-tier caching to serve very high request volumes. Manual cache tuning is often suboptimal and brittle:

Static TTLs can't adapt to shifting traffic patterns.
Over-provisioned caches waste memory; under-provisioned ones spike latency.
Eviction policy choice (LRU vs LFU vs FIFO) depends on workload skew.

CacheForge models this problem as an RL environment: the agent receives real-time cache telemetry and must learn a policy that generalises across traffic patterns of increasing difficulty.

RL Loop

┌─────────┐   observation    ┌───────┐   action    ┌─────────────┐
│  Agent  │ ◄──────────────  │ Cache │ ◄────────── │  Agent      │
│  (LLM)  │ ──────────────► │ Env   │ ──────────► │  Decision   │
└─────────┘   reward         └───────┘             └─────────────┘

Observe: hit rate, latency, memory usage, request distribution
Act: adjust TTL, resize capacity, set eviction policy, shift tiers
Reward: composite signal balancing hit rate, latency, and memory
Repeat for up to 200 steps per episode
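
The loop above can be sketched in a few lines of Python. The `ToyCacheEnv` class below is a hypothetical stand-in, not the CacheForge server: its dynamics are invented so the loop runs without a network connection, and `choose_action` is a placeholder policy. Only the field names and the 200-step cap come from this README.

```python
import random

MAX_STEPS = 200  # episode cap stated in this README


class ToyCacheEnv:
    """Hypothetical stand-in environment; dynamics are invented for illustration."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.steps = 0
        self.hit_rate = 0.5

    def reset(self):
        self.steps = 0
        self.hit_rate = 0.5
        return {"hit_rate": self.hit_rate, "done": False, "reward": 0.0}

    def step(self, action):
        # Toy rule: growing the cache nudges the hit rate up, plus small noise.
        drift = 0.5 * action["resize_cache"] + self.rng.uniform(-0.01, 0.01)
        self.hit_rate = min(1.0, max(0.0, self.hit_rate + drift))
        self.steps += 1
        done = self.steps >= MAX_STEPS
        return {"hit_rate": self.hit_rate, "done": done, "reward": 2.0 * self.hit_rate}


def choose_action(obs):
    """Placeholder policy: grow the cache while the hit rate is low."""
    grow = 0.05 if obs["hit_rate"] < 0.9 else 0.0
    return {"adjust_ttl": 0, "resize_cache": grow,
            "eviction_policy": "LRU", "tier_shift": "none"}


def run_episode(env):
    """Observe, act, collect reward, and repeat until the episode ends."""
    obs = env.reset()
    total_reward, steps = 0.0, 0
    while not obs["done"]:
        obs = env.step(choose_action(obs))
        total_reward += obs["reward"]
        steps += 1
    return steps, total_reward


steps, total = run_episode(ToyCacheEnv(seed=42))
print(steps, round(total, 2))
```

Swapping `ToyCacheEnv` for the real client should leave `run_episode` unchanged in shape, although the real client returns result objects rather than plain dicts.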

Observation Space

Field	Type	Range	Description
`hit_rate`	`float`	0.0 – 1.0	Cache hit rate across all tiers
`miss_rate`	`float`	0.0 – 1.0	Cache miss rate (1 - hit_rate)
`avg_latency`	`float`	≥ 0.0	Average request latency (ms)
`memory_usage`	`float`	≥ 0.0	Total memory as fraction of capacity
`request_rate`	`int`	≥ 0	Requests processed this step
`hot_keys_ratio`	`float`	0.0 – 1.0	Fraction of requests hitting hot keys
`cache_distribution`	`dict`	—	Per-tier utilisation `{L1, L2, L3}`
`done`	`bool`	—	Episode termination flag
`reward`	`float`	—	Step reward value
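
As a rough illustration of the table above, the fields can be mirrored in a plain dataclass with a range check. This is a sketch only: the project's actual models live in models.py as Pydantic classes, and the name `ObservationSketch` and its `problems` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ObservationSketch:
    """Illustrative mirror of the observation table; not the project's Pydantic model."""
    hit_rate: float
    miss_rate: float
    avg_latency: float
    memory_usage: float
    request_rate: int
    hot_keys_ratio: float
    cache_distribution: dict = field(default_factory=dict)
    done: bool = False
    reward: float = 0.0

    def problems(self):
        """Return a list of human-readable range violations (empty if none)."""
        issues = []
        for name in ("hit_rate", "miss_rate", "hot_keys_ratio"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                issues.append(f"{name}={value} is outside [0, 1]")
        if abs(self.hit_rate + self.miss_rate - 1.0) > 1e-6:
            issues.append("hit_rate + miss_rate should equal 1")
        if self.avg_latency < 0 or self.memory_usage < 0 or self.request_rate < 0:
            issues.append("latency, memory usage and request rate must be non-negative")
        return issues


obs = ObservationSketch(0.8, 0.2, 12.5, 0.6, 1000, 0.3,
                        {"L1": 0.5, "L2": 0.3, "L3": 0.2})
print(obs.problems())
```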

Action Space

Field	Type	Range	Description
`adjust_ttl`	`int`	-10 to +10	Global TTL delta (seconds)
`resize_cache`	`float`	-0.2 to +0.2	Relative capacity resize
`eviction_policy`	`str`	`"LRU"` / `"LFU"` / `"FIFO"`	Eviction strategy
`tier_shift`	`str`	`"none"` / `"L1→L2"` / `"L2→L3"`	Tier data migration
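
LLM agents sometimes emit values outside these ranges, so it can help to sanitise an action before sending it. The helper below is a hypothetical sketch, not part of the CacheForge codebase; the default values used for missing fields are assumptions.

```python
VALID_POLICIES = {"LRU", "LFU", "FIFO"}
VALID_SHIFTS = {"none", "L1→L2", "L2→L3"}


def sanitise_action(raw):
    """Clamp numeric fields to the documented ranges and reject unknown enum values."""
    policy = raw.get("eviction_policy", "LRU")  # default is an assumption
    shift = raw.get("tier_shift", "none")       # default is an assumption
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown eviction_policy: {policy!r}")
    if shift not in VALID_SHIFTS:
        raise ValueError(f"unknown tier_shift: {shift!r}")
    return {
        "adjust_ttl": max(-10, min(10, int(raw.get("adjust_ttl", 0)))),
        "resize_cache": max(-0.2, min(0.2, float(raw.get("resize_cache", 0.0)))),
        "eviction_policy": policy,
        "tier_shift": shift,
    }


print(sanitise_action({"adjust_ttl": 25, "resize_cache": -0.5,
                       "eviction_policy": "LFU", "tier_shift": "none"}))
```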

Tasks

CacheForge defines three tasks of increasing difficulty, each backed by a different workload generator. Each task is scored by a deterministic grader that returns a value in [0.0, 1.0]. Select a task with reset(mode=...).

Task 1 — Easy (Static Workload)


Mode	`easy`
Workload	Static Zipf (α = 1.2), fixed key-space
Goal	Maximise cache hit rate
Grader	`score = clamp(hit_rate / 0.75, 0.001, 0.998)`
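
The grader formula above translates directly into Python. The function names are illustrative; the authoritative implementation is in tasks.py.

```python
def clamp(value, low, high):
    """Restrict value to the closed range [low, high]."""
    return max(low, min(high, value))


def grade_easy(hit_rate):
    """Easy-task score: hit rate relative to a 0.75 target, clamped to [0.001, 0.998]."""
    return clamp(hit_rate / 0.75, 0.001, 0.998)


for hr in (0.0, 0.375, 0.75, 0.9):
    print(hr, grade_easy(hr))
```

Because of the upper clamp, any hit rate of 0.7485 or above (0.998 × 0.75) already earns the maximum score.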

Task 2 — Medium (Mixed Workload)


Mode	`medium`
Workload	Alternating between two Zipf distributions every 50 steps
Goal	Balance hit rate and latency
Grader	`score = 0.5 × hit_rate + 0.5 × (1 - min(latency / 50, 1))`
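
A minimal sketch of the medium grader follows. The final clamp to [0.001, 0.998] is an assumption borrowed from the easy task, since the formula in the table does not show the bounds; the function name is illustrative.

```python
def grade_medium(hit_rate, latency_ms):
    """Medium-task score: equal weight on hit rate and on latency below 50 ms."""
    latency_term = 1.0 - min(latency_ms / 50.0, 1.0)
    raw = 0.5 * hit_rate + 0.5 * latency_term
    # Assumed bounds, matching the easy grader's clamp.
    return max(0.001, min(0.998, raw))


print(grade_medium(0.8, 10.0))
```

Latency at or above 50 ms contributes nothing to the score, so in that regime the score depends on hit rate alone and cannot exceed 0.5.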

Task 3 — Hard (Dynamic Workload)


Mode	`hard`
Workload	Sine-wave α modulation + Gaussian noise, continuous drift
Goal	Maintain high, stable performance under unpredictable load
Grader	4-component score including stability bonus (hit-rate variance penalty)

score = 0.4 × hit_rate
      + 0.2 × (1 - normalised_latency)
      + 0.2 × (1 - memory_penalty)
      + 0.2 × stability_bonus

The stability component discourages agents that spike then crash — consistent performance is rewarded.
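
To make the stability idea concrete, here is one possible way to assemble the four components. This README gives only the weights, so every normalisation below is an assumption: latency is divided by 50 ms as in the other formulas, the memory penalty scales overuse above 0.85, and the stability bonus is one minus the hit-rate variance divided by 0.25 (the largest variance possible for values in [0, 1]). The real grader in tasks.py may differ.

```python
from statistics import pvariance


def grade_hard(hit_rates, avg_latency_ms, memory_usage):
    """Hypothetical 4-component score; the normalisations are assumptions."""
    mean_hit = sum(hit_rates) / len(hit_rates)
    normalised_latency = min(avg_latency_ms / 50.0, 1.0)
    memory_penalty = min(max(0.0, memory_usage - 0.85) / 0.15, 1.0)
    stability_bonus = 1.0 - min(pvariance(hit_rates) / 0.25, 1.0)
    raw = (0.4 * mean_hit
           + 0.2 * (1.0 - normalised_latency)
           + 0.2 * (1.0 - memory_penalty)
           + 0.2 * stability_bonus)
    return max(0.001, min(0.998, raw))


steady = grade_hard([0.8, 0.8, 0.8, 0.8], 10.0, 0.5)
spiky = grade_hard([1.0, 0.6, 1.0, 0.6], 10.0, 0.5)
print(steady, spiky)
```

Both runs have the same mean hit rate, but the spiky run scores lower because its variance reduces the stability term.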

All graders keep scores strictly inside the open interval (0, 1) to comply with evaluation constraints. The maximum is capped at 0.998 rather than 1.0 so that rounding during evaluation cannot produce an invalid boundary value such as "1.000".

Reward Function

Per-step reward (continuous, non-sparse):

reward = +2.0 × hit_rate
         -1.5 × normalised_latency   (latency / 50ms)
         -1.0 × memory_overuse       (max(0, usage - 0.85))

This provides a meaningful partial-progress signal every step.
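
The reward formula can be written as a small function. This sketch follows the formula as printed; whether the environment caps normalised latency at 1.0 is not stated here, so it is left uncapped. The function name is illustrative.

```python
def step_reward(hit_rate, latency_ms, memory_usage):
    """Per-step reward: hit rate bonus minus latency and memory-overuse penalties."""
    normalised_latency = latency_ms / 50.0            # uncapped; capping is not documented
    memory_overuse = max(0.0, memory_usage - 0.85)    # only usage above 85% is penalised
    return 2.0 * hit_rate - 1.5 * normalised_latency - 1.0 * memory_overuse


print(step_reward(0.8, 10.0, 0.9))
```

For example, a hit rate of 0.8 at 10 ms latency and 90% memory usage gives 1.6 − 0.3 − 0.05 = 1.25.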

Setup Instructions

Local Development

# Install dependencies
uv sync

# Start the environment server
uv run python -m server.app

# Server runs at http://localhost:8000
# API docs at http://localhost:8000/docs

Docker

# Build (Dockerfile is at project root)
docker build -t cacheforge-env:latest .

# Run
docker run -p 8000:8000 cacheforge-env:latest

Deploy to Hugging Face Spaces

openenv push --repo-id <your-username>/cacheforge

🌐 Live Deployment

CacheForge is deployed and publicly accessible on Hugging Face Spaces:

👉 https://tuhindev2029-cacheforge.hf.space

Health Check

curl https://tuhindev2029-cacheforge.hf.space/health

Reset Example

curl -X POST https://tuhindev2029-cacheforge.hf.space/reset \
  -H "Content-Type: application/json" \
  -d '{}'

Step Example

curl -X POST https://tuhindev2029-cacheforge.hf.space/step \
  -H "Content-Type: application/json" \
  -d '{
    "action": {
      "adjust_ttl": 1,
      "resize_cache": 0.05,
      "eviction_policy": "LRU",
      "tier_shift": "none"
    }
  }'

The server keeps state only within an episode, so call /reset before each sequence of /step calls.
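
The same requests can be prepared from Python with the standard library. The helper below is a hypothetical sketch that only builds the request objects; the send step is left as a comment so the snippet runs without network access.

```python
import json
from urllib.request import Request

BASE_URL = "https://tuhindev2029-cacheforge.hf.space"


def build_post(base_url, path, payload):
    """Build a JSON POST request for an endpoint such as /reset or /step."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


reset_req = build_post(BASE_URL, "/reset", {})
step_req = build_post(BASE_URL, "/step", {
    "action": {
        "adjust_ttl": 1,
        "resize_cache": 0.05,
        "eviction_policy": "LRU",
        "tier_shift": "none",
    }
})
print(reset_req.full_url, step_req.full_url)

# To send a request (requires network access):
#   from urllib.request import urlopen
#   with urlopen(reset_req) as resp:
#       print(json.load(resp))
```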

Example Usage

Python Client

from client import CacheforgeEnv
from models import CacheforgeAction

with CacheforgeEnv(base_url="http://localhost:8000") as env:
    result = env.reset(mode="easy", seed=42)
    print(f"Initial hit rate: {result.observation.hit_rate}")

    action = CacheforgeAction(
        adjust_ttl=3,
        resize_cache=0.1,
        eviction_policy="LFU",
        tier_shift="none",
    )
    result = env.step(action)
    print(f"Reward: {result.reward:.3f}")

cURL

# Reset
curl -X POST http://localhost:8000/reset \
  -H "Content-Type: application/json" \
  -d '{"seed": 42, "mode": "easy"}'

# Step
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{
    "action": {
      "adjust_ttl": 3,
      "resize_cache": 0.1,
      "eviction_policy": "LFU",
      "tier_shift": "none"
    }
  }'

Running Inference

The inference.py script runs an LLM agent against the environment using an OpenAI-compatible client (HuggingFace Router). It supports two execution modes:

Local Mode (server already running)

export HF_TOKEN="your_hf_token"

# Start server first
uv run python -m server.app

# Then run inference
python inference.py

Remote Mode (Hugging Face Spaces — no Docker required)

export HF_TOKEN="your_hf_token"
export API_BASE_URL="https://tuhindev2029-cacheforge.hf.space"

python inference.py

Default model: Qwen/Qwen2.5-72B-Instruct (configurable via MODEL_NAME).

The script automatically connects to the environment using API_BASE_URL when set, otherwise defaults to local Docker execution.

The agent interacts with the environment via HTTP API calls, making the system compatible with both local and remote deployments.
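
The environment-variable handling described above might look roughly like the sketch below. It is not taken from inference.py: the function name, the local default URL, and the decision to fail when HF_TOKEN is missing are all assumptions made for illustration.

```python
import os

DEFAULT_BASE_URL = "http://localhost:8000"   # assumed local default
DEFAULT_MODEL = "Qwen/Qwen2.5-72B-Instruct"  # default model named in this README


def resolve_settings(environ=None):
    """Read HF_TOKEN, API_BASE_URL and MODEL_NAME, falling back to defaults."""
    env = os.environ if environ is None else environ
    token = env.get("HF_TOKEN")
    if not token:
        # Assumption: treat a missing token as a configuration error.
        raise RuntimeError("HF_TOKEN must be set before running inference")
    return {
        "hf_token": token,
        "base_url": env.get("API_BASE_URL") or DEFAULT_BASE_URL,
        "model": env.get("MODEL_NAME") or DEFAULT_MODEL,
    }


print(resolve_settings({"HF_TOKEN": "your_hf_token"})["base_url"])
```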

Baseline Results

Task	Mode	Score
Easy	Static Zipf	0.998
Medium	Mixed Zipf	0.856
Hard	Dynamic + noise	0.912

Baseline uses an LLM agent (Qwen/Qwen2.5-72B-Instruct via HuggingFace Router) with a fixed seed for reproducible runs. Scores range from 0.856 to 0.998 across the three difficulty levels. Each task uses a different grader formula, so scores are not directly comparable between tasks.

All scores are computed using deterministic graders defined in tasks.py, ensuring reproducibility across runs. Scores exceed the success threshold (0.6) across all tasks.

Project Structure

cacheforge/
├── Dockerfile                    # Container build
├── .dockerignore                 # Docker build exclusions
├── .gitignore                    # Git exclusions
├── openenv.yaml                  # OpenEnv manifest
├── pyproject.toml                # Dependencies & metadata
├── uv.lock                       # Locked dependency versions
├── LICENSE                       # BSD 3-Clause
├── models.py                     # Action & Observation Pydantic models
├── client.py                     # CacheforgeEnv client (WebSocket)
├── tasks.py                      # Task definitions & graders
├── inference.py                  # Baseline inference script
├── README.md                     # This file
└── server/
    ├── __init__.py
    ├── app.py                    # FastAPI server
    ├── cacheforge_environment.py # Core environment simulation
    └── requirements.txt          # Server dependencies

License

BSD 3-Clause license. See LICENSE for details.
