💡 The Problem: LLMs have limited context windows, require efficient token management, need proper authentication, and demand production-grade reliability.
✅ The Solution:
**RELAY Context Engine** provides intelligent context window management, automatic compression, semantic deduplication, JWT authentication, and distributed rate limiting with full observability.
| 🔐 JWT Auth | 📊 Token Management | 🗜️ Compression |
|---|---|---|
| Role-based access control | Smart token budgeting | ZSTD/LZ4 compression |

| 🚦 Rate Limiting | 📈 Observability | 🏗️ Production Ready |
|---|---|---|
| Distributed sliding window | OpenTelemetry + Prometheus | Docker + Kubernetes ready |
## Architecture

```
┌──────────────┐
│    Client    │
└──────┬───────┘
       │ HTTPS
       ▼
┌──────────────┐
│   FastAPI    │
└──────┬───────┘
       │
       ├─────────────────┬─────────────────┬─────────────────┐
       │                 │                 │                 │
       ▼                 ▼                 ▼                 ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   JWT Auth   │  │   Context    │  │ Compression  │  │OpenTelemetry │
│              │  │   Engine     │  │   Engine     │  │              │
│  Role-Based  │  │              │  │              │  │  Metrics &   │
│  Access Ctrl │  │  Token Mgmt  │  │  ZSTD/LZ4    │  │   Tracing    │
└──────────────┘  └──────┬───────┘  └──────────────┘  └──────────────┘
                         │
                         ├────────────────┬────────────────┐
                         │                │                │
                         ▼                ▼                ▼
                  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
                  │    Redis     │ │  PostgreSQL  │ │  Vector DB   │
                  │   (Cache)    │ │  (Sessions)  │ │ (Embeddings) │
                  └──────────────┘ └──────────────┘ └──────────────┘
```
```
┌─────────────────────────────────────────────────────────────────────────┐
│                                 CLIENTS                                 │
│   ┌──────────────┐      ┌──────────────┐      ┌──────────────┐          │
│   │   Web App    │      │  Mobile App  │      │   CLI Tool   │          │
│   └───────┬──────┘      └───────┬──────┘      └───────┬──────┘          │
│           │                     │                     │                 │
│           └─────────────────────┼─────────────────────┘                 │
│                                 │ HTTPS                                 │
└─────────────────────────────────┼───────────────────────────────────────┘
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                               API GATEWAY                               │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                           Load Balancer                           │  │
│  └───────────────────────────────┬───────────────────────────────────┘  │
│                                  ▼                                      │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                       Authentication (JWT)                        │  │
│  └───────────────────────────────┬───────────────────────────────────┘  │
│                                  ▼                                      │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                            Rate Limiter                           │  │
│  └───────────────────────────────┬───────────────────────────────────┘  │
└──────────────────────────────────┼──────────────────────────────────────┘
                                   ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                             REST ENDPOINTS                              │
│ ┌────────────┐ ┌──────────────┐ ┌───────────┐ ┌────────────┐ ┌──────┐   │
│ │POST        │ │GET           │ │GET        │ │DELETE      │ │GET   │   │
│ │/messages   │ │/context/{id} │ │/stats     │ │/context    │ │stream│   │
│ └─────┬──────┘ └──────┬───────┘ └─────┬─────┘ └─────┬──────┘ └──┬───┘   │
│       │               │               │             │           │       │
│       └───────────────┴──────────┬────┴─────────────┴───────────┘       │
└──────────────────────────────────┼──────────────────────────────────────┘
                                   ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                              CORE SERVICES                              │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                           Context Engine                          │  │
│  │          (Compression • Summarization • Drift Detection)          │  │
│  └───────────────────────────────┬───────────────────────────────────┘  │
│                     ┌────────────┴─────────────────────┐                │
│                     ▼                                  ▼                │
│  ┌────────────────────────────────────┐  ┌──────────────────────────┐   │
│  │           Token Manager            │  │     Budget Optimizer     │   │
│  │   (Budget Tracking • Allocation)   │  │   (Knapsack Algorithm)   │   │
│  └────────────────────────────────────┘  └──────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘
```
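The Budget Optimizer above is labeled with a knapsack algorithm. A minimal sketch of that idea, assuming context atoms carry a token cost and a relevance score (this interface is hypothetical, not the actual implementation):

```python
# Illustrative 0/1 knapsack over a token budget; the real Budget
# Optimizer's data model and API may differ.
def select_atoms(atoms: list[tuple[str, int, float]], budget: int) -> list[str]:
    """Pick atom ids maximizing total relevance within a token budget.

    atoms: (atom_id, token_cost, relevance_score) triples.
    """
    # best[b] = (best score, chosen atom ids) achievable with budget b
    best: list[tuple[float, list[str]]] = [(0.0, [])] * (budget + 1)
    for atom_id, cost, score in atoms:
        # Iterate budgets high-to-low so each atom is used at most once
        for b in range(budget, cost - 1, -1):
            cand = (best[b - cost][0] + score, best[b - cost][1] + [atom_id])
            if cand[0] > best[b][0]:
                best[b] = cand
    return best[budget][1]

atoms = [("sys", 300, 9.0), ("recent", 500, 8.0), ("old", 700, 3.0)]
print(select_atoms(atoms, budget=900))  # ['sys', 'recent']
```

Dynamic programming is exact but O(n · budget); for large budgets a greedy score-per-token heuristic is a common cheaper substitute.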
### Request Pipeline

```
┌────────┐              ┌─────────────┐           ┌───────────────┐
│ Client │              │ API Gateway │           │ Context Engine│
└───┬────┘              └──────┬──────┘           └───────┬───────┘
    │                          │                          │
    │  1. POST /messages       │                          │
    │─────────────────────────>│                          │
    │                          │  2. Process Message      │
    │                          │─────────────────────────>│
    │                          │                          │
    │                          │                          │ 3. Count Tokens
    │                          │                          │──────────────┐
    │                          │                          │              │
    │                          │                          │<─────────────┤
    │                          │                          │  Token Count
    │                          │                          │
    │                          │                          │ 4. Optimize Selection
    │                          │                          │──────────────┐
    │                          │                          │              │
    │                          │                          │<─────────────┤
    │                          │                          │  Selected Atoms
    │                          │                          │
    │                          │                          │ 5. Cache Context
    │                          │                          │──────────────┐
    │                          │                          │              │
    │                          │                          │ 6. Persist Session
    │                          │                          │──────────────┐
    │                          │                          │              │
    │                          │  7. Response             │
    │                          │<─────────────────────────│
    │  8. 200 OK + Context ID  │                          │
    │<─────────────────────────│                          │
    │                          │                          │
```
Pipeline Steps:
- Client Request → Sends message with metadata
- API Gateway → Validates auth, checks rate limits
- Token Counting → Calculates token cost of incoming data
- Budget Optimization → Runs knapsack algorithm for optimal atom selection
- Context Caching → Stores compressed context in Redis
- Session Persistence → Saves session state to PostgreSQL
- Response Generation → Returns context ID and summary
- Client Receipt → Receives confirmation for future retrieval
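The steps above can be condensed into a toy sketch (function and storage names here are illustrative stand-ins, with plain dicts in place of Redis and PostgreSQL):

```python
# Hypothetical condensation of pipeline steps 3-7; not RELAY internals.
def process_message(message: str, budget: int, cache: dict, sessions: dict) -> str:
    # 3. Count tokens (naive whitespace split as a stand-in for a real tokenizer)
    tokens = len(message.split())
    # 4. Optimize selection: truncate to the budget if the message exceeds it
    selected = message if tokens <= budget else " ".join(message.split()[:budget])
    # 5. Cache context (Redis in the real system; a dict here)
    context_id = f"ctx-{len(cache) + 1}"
    cache[context_id] = selected
    # 6. Persist session state (PostgreSQL in the real system)
    sessions[context_id] = {"tokens": min(tokens, budget)}
    # 7. Return the context ID for the gateway to send back
    return context_id

cache, sessions = {}, {}
ctx = process_message("hello world", budget=100, cache=cache, sessions=sessions)
print(ctx)  # ctx-1
```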
---
## Quick Start
### 1. Installation
```bash
# Clone the repository
git clone https://github.com/ramsterr/RELAY-2.git
cd RELAY-2
# Install dependencies
pip install -e ".[dev]"
# Or use Make
make install
```
### 2. Configuration

Create a `.env` file:

```bash
# Required: At least one LLM provider
OPENAI_API_KEY=sk-...
# or
ANTHROPIC_API_KEY=sk-ant-...

# Database (optional - defaults shown)
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/relay

# Redis (optional - defaults shown)
REDIS_URL=redis://localhost:6379/0

# Security (optional - auto-generated)
SECRET_KEY=your-secret-key
```

### 3. Run the Server

```bash
# Development
uvicorn relay_context.main:app --reload

# Or use Make
make run

# Docker
docker-compose up -d
```

### 4. Basic Usage

```python
from relay_context.main import create_app
from relay_context.security.auth import AuthService, UserRole
from relay_context.config.settings import Settings, Environment
from pydantic import SecretStr

# Create app
settings = Settings(
    environment=Environment.TESTING,
    security__secret_key=SecretStr("test-key"),
    llm__openai_api_key=SecretStr("sk-test"),
)
app = create_app(settings)

# Get auth token
auth = AuthService(settings)
token = auth.create_access_token("user-1", UserRole.USER)

# Use the API
# POST /api/v1/context/messages
# {
#   "session_id": "session-001",
#   "role": "user",
#   "content": "Hello, help me write a function"
# }
# Headers: Authorization: Bearer <token>
```

## API Reference

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | /api/v1/context/messages | Add message to context | ✅ |
| GET | /api/v1/context/{session_id} | Get context window | ✅ |
| GET | /api/v1/context/{session_id}/stats | Get window statistics | ✅ |
| DELETE | /api/v1/context/{session_id} | Clear context window | ✅ |
| GET | /api/v1/context/{session_id}/stream | Stream messages (SSE) | ✅ |
| GET | /health | Health check | ❌ |
| GET | /metrics | Prometheus metrics | ❌ |

```bash
# JWT Bearer Token
curl -H "Authorization: Bearer <token>" ...

# API Key
curl -H "X-API-Key: relay_<key>" ...
```

| Role | Permissions |
|---|---|
| ADMIN | Full access to all operations |
| USER | Read/Write context, limited sessions |
| VIEWER | Read-only access |
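
The roles above can be modeled as a simple permission lookup. A minimal sketch, with hypothetical permission sets derived from the table (the actual AuthService enforcement may differ):

```python
from enum import Enum

class UserRole(Enum):  # mirrors the roles table; illustrative only
    ADMIN = "admin"
    USER = "user"
    VIEWER = "viewer"

# Assumed permission sets; the project's real RBAC rules may be richer
PERMISSIONS = {
    UserRole.ADMIN: {"read", "write", "delete", "admin"},
    UserRole.USER: {"read", "write"},
    UserRole.VIEWER: {"read"},
}

def can(role: UserRole, action: str) -> bool:
    """True if the role's permission set includes the action."""
    return action in PERMISSIONS[role]

print(can(UserRole.VIEWER, "write"))  # False
print(can(UserRole.ADMIN, "delete"))  # True
```
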
## Core Components

### Context Engine

Manages context windows with intelligent token budgeting:

```python
from relay_context.core.context_engine import ContextEngine, MessageRole
from relay_context.core.token_manager import TokenManager
from relay_context.core.compression import CompressionEngine

token_manager = TokenManager("gpt-4")
compression = CompressionEngine(algorithm="zstd", level=3)
engine = ContextEngine(settings=settings, token_manager=token_manager, compression=compression)

# Add messages
await engine.add_message("session-001", MessageRole.USER, "Hello!")
context = await engine.get_context("session-001")
```

### Token Manager

Smart token counting and budget management:
```python
from relay_context.core.token_manager import TokenManager

manager = TokenManager("gpt-4")
count = await manager.count_tokens("Hello world")  # 2 tokens

# Truncate to budget
truncated = await manager.truncate_to_budget(long_text, max_tokens=1000)
```

### Compression Engine

Reduce context size with ZSTD/LZ4:
```python
from relay_context.core.compression import CompressionEngine

engine = CompressionEngine(algorithm="zstd", level=3)
compressed = await engine.compress_text("long context...")
restored = await engine.decompress_text(compressed)
```

### Authentication

JWT-based auth with RBAC:
```python
from relay_context.security.auth import AuthService, UserRole

auth = AuthService(settings)
token = auth.create_access_token("user-id", UserRole.ADMIN)

# Verify token
payload = auth.verify_token(token)
```

### Rate Limiter

Distributed sliding window rate limiter:
```python
from relay_context.security.rate_limiter import SlidingWindowRateLimiter

limiter = SlidingWindowRateLimiter(redis_client, requests=100, window_seconds=60)
allowed, info = await limiter.is_allowed("user-1")
# info = {"remaining": 99, "reset_at": ...}
```

## Features

| Feature | Description | Status |
|---|---|---|
| 🔐 JWT Auth | Token-based authentication with RBAC | ✅ |
| 🚦 Rate Limiting | Distributed sliding window limiter | ✅ |
| 🗜️ Compression | ZSTD/LZ4 context compression | ✅ |
| 📊 Metrics | Prometheus + OpenTelemetry | ✅ |
| 🐳 Docker | Containerized deployment | ✅ |
| 🔄 Async | Full async/await architecture | ✅ |
| 🛡️ Security | CORS, headers, validation | ✅ |
## Monitoring

```mermaid
graph LR
    subgraph Collection["Metrics Collection"]
        App[Application]
        Sys[System]
        DB[Database]
    end
    subgraph Storage["Storage"]
        Prometheus[Prometheus TSDB]
    end
    subgraph Visualization["Visualization"]
        Grafana[Grafana]
        Alerts[Alert Manager]
    end
    App --> Prometheus
    Sys --> Prometheus
    DB --> Prometheus
    Prometheus --> Grafana
    Prometheus --> Alerts
    style Collection fill:#e3f2fd
    style Storage fill:#e8f5e9
    style Visualization fill:#fff3e0
```
| Metric | Type | Alert Threshold | Description |
|---|---|---|---|
| relay_context_tokens_used | Gauge | > 90% | Token utilization |
| relay_context_drift_score | Gauge | > 0.5 | Context drift |
| relay_http_request_duration | Histogram | p99 > 500ms | Latency |
| relay_cache_hit_rate | Gauge | < 0.7 | Cache efficiency |
| relay_error_rate | Counter | > 1% | Error rate |
## Health Checks

```bash
curl http://localhost:8080/health
```

Response:

```json
{
  "status": "healthy",
  "version": "2.0.0",
  "environment": "development",
  "checks": {
    "database": {"status": "healthy", "latency_ms": 2},
    "redis": {"status": "healthy", "latency_ms": 1}
  },
  "uptime_seconds": 1234.56
}
```

```
+---------------------------------------------------------------+
| RELAY CONTEXT ENGINE                             [Health: ✅]  |
+---------------------------------------------------------------+
| [TOKEN METRICS]     | [RATE LIMITS]         | [SESSIONS]      |
|                     |                       |                 |
| Used: 45.2k/128k    | Remaining: 87/100     | Active: 12      |
| [████████░░] 35%    | [███████░░] 87%       | Total: 156      |
+---------------------+-----------------------+-----------------+
| [COMPRESSION]                     | [AUTH]                    |
|                                   |                           |
| Ratio: 2.3x                       | JWT: ✅ Active            |
| Saved: 23.4k tokens               | API Keys: 5               |
+-----------------------------------+---------------------------+
```
## Deployment

```bash
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f relay

# Stop
docker-compose down
```

| Service | Port | Description |
|---|---|---|
| API | 8080 | Main FastAPI application |
| PostgreSQL | 5432 | Database |
| Redis | 6379 | Cache & rate limiting |
| Prometheus | 9090 | Metrics collection |
| Grafana | 3000 | Dashboards |
## Testing

```bash
# Run all tests
pytest

# With coverage
pytest --cov=src/relay_context --cov-report=term-missing

# Run specific test categories
pytest tests/unit/
pytest tests/integration/
pytest tests/e2e/
pytest tests/load/
```

```
============ 151 passed, 15 skipped, 22 warnings in 5.95s =============
Coverage: 81%
```
## Configuration

| Category | Option | Default | Description |
|---|---|---|---|
| Context | max_window_tokens | 128000 | Max tokens per session |
| Context | compression_algorithm | zstd | Compression method |
| Context | ttl_seconds | 3600 | Session TTL |
| Security | rate_limit_requests | 100 | Requests per window |
| Security | rate_limit_window_seconds | 60 | Window duration |
| Database | pool_size | 20 | DB connection pool |
| Redis | max_connections | 50 | Redis connection pool |
```bash
# Application
APP_NAME=relay-context
ENVIRONMENT=production
LOG_LEVEL=INFO

# Database
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/relay

# Redis
REDIS_URL=redis://localhost:6379/0

# Security
SECRET_KEY=your-secret-key-min-32-chars
JWT_EXPIRATION_MINUTES=60

# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```

📚 For complete configuration options, see Complete Documentation
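
How such options can be overridden from the environment is sketched below with a stdlib dataclass. The env var names and this class are illustrative; the project reads configuration through its own Settings class:

```python
import os
from dataclasses import dataclass

@dataclass
class ContextSettings:
    """Illustrative mapping of a few options from the config table
    to environment variables; not the project's real Settings class."""
    max_window_tokens: int = 128000
    compression_algorithm: str = "zstd"
    ttl_seconds: int = 3600

    @classmethod
    def from_env(cls) -> "ContextSettings":
        # Fall back to the documented defaults when a variable is unset
        return cls(
            max_window_tokens=int(os.getenv("MAX_WINDOW_TOKENS", "128000")),
            compression_algorithm=os.getenv("COMPRESSION_ALGORITHM", "zstd"),
            ttl_seconds=int(os.getenv("TTL_SECONDS", "3600")),
        )

os.environ["MAX_WINDOW_TOKENS"] = "64000"
settings = ContextSettings.from_env()
print(settings.max_window_tokens, settings.compression_algorithm)  # 64000 zstd
```
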
## Project Structure

```mermaid
graph TD
    Root[RELAY-2/] --> Src[src/relay_context/]
    Root --> Tests[tests/]
    Root --> Monitoring[monitoring/]
    Root --> Infra[infrastructure/]
    Root --> Docker[docker-compose.yml]
    Src --> API[api/]
    Src --> Core[core/]
    Src --> Security[security/]
    Src --> Config[config/]
    Tests --> Unit[unit/]
    Tests --> Integration[integration/]
    Tests --> E2E[e2e/]
    Tests --> Load[load/]
    style Root fill:#e3f2fd
    style Src fill:#f3e5f5
    style Tests fill:#e8f5e9
    style Core fill:#fff3e0
```
```
RELAY-2/
├── src/relay_context/
│   ├── api/             # REST API endpoints
│   ├── core/            # Context engine, token manager, compression
│   ├── security/        # Auth, rate limiting
│   ├── infrastructure/  # Database, Redis, telemetry
│   └── config/          # Settings & configuration
├── tests/
│   ├── unit/            # Unit tests
│   ├── integration/     # API tests
│   ├── e2e/             # End-to-end tests
│   └── load/            # Performance tests
├── monitoring/          # Prometheus, Grafana configs
├── infrastructure/      # OpenTelemetry config
├── docker-compose.yml
├── Dockerfile
├── Makefile
└── pyproject.toml
```
## Contributing

```bash
# Setup
make install

# Run linter
make lint

# Run tests
make test

# Format code
make format
```

```mermaid
flowchart LR
    Fork[Fork Repo] --> Clone[Clone & Branch]
    Clone --> Code[Make Changes]
    Code --> Test[Test Locally]
    Test --> Commit[Commit Changes]
    Commit --> Push[Push to Fork]
    Push --> PR[Create Pull Request]
    PR --> Review[Code Review]
    Review --> Merge[Merge to Main]
    style Fork fill:#e3f2fd
    style Code fill:#f3e5f5
    style Test fill:#e8f5e9
    style Merge fill:#c8e6c9
```
## Resources

| Resource | Link | Description |
|---|---|---|
| 📖 Full Documentation | Complete Docs | In-depth technical reference |
| 🐛 Issue Tracker | GitHub Issues | Report bugs & request features |
| 💬 Discussions | GitHub Discussions | Ask questions & share ideas |
| 📊 Roadmap | Project Board | See what's coming next |
MIT License - see LICENSE for details.
Built with ❤️ for production-grade LLM context management. 🐱
