💡 The Problem: LLMs have limited context windows, require efficient token management, need proper authentication, and demand production-grade reliability.
✅ The Solution:
**RELAY Context Engine** provides intelligent context window management, automatic compression, semantic deduplication, JWT authentication, and distributed rate limiting with full observability.
| 🔐 JWT Auth | 📊 Token Management | 🗜️ Compression |
|---|---|---|
| Role-based access control | Smart token budgeting | ZSTD/LZ4 compression |

| 🚦 Rate Limiting | 📈 Observability | 🏗️ Production Ready |
|---|---|---|
| Distributed sliding window | OpenTelemetry + Prometheus | Docker + Kubernetes ready |
## Architecture

```
┌──────────────┐
│    Client    │
└──────┬───────┘
       │ HTTPS
       ▼
┌──────────────┐
│   FastAPI    │
└──────┬───────┘
       │
       ├─────────────────┬─────────────────┬─────────────────┐
       │                 │                 │                 │
       ▼                 ▼                 ▼                 ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   JWT Auth   │  │   Context    │  │ Compression  │  │OpenTelemetry │
│              │  │   Engine     │  │   Engine     │  │              │
│  Role-Based  │  │              │  │              │  │  Metrics &   │
│  Access Ctrl │  │  Token Mgmt  │  │  ZSTD/LZ4    │  │   Tracing    │
└──────────────┘  └──────┬───────┘  └──────────────┘  └──────────────┘
                         │
                         ├────────────────┬────────────────┐
                         │                │                │
                         ▼                ▼                ▼
                  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
                  │    Redis     │ │  PostgreSQL  │ │  Vector DB   │
                  │   (Cache)    │ │  (Sessions)  │ │ (Embeddings) │
                  └──────────────┘ └──────────────┘ └──────────────┘
```
```
┌─────────────────────────────────────────────────────────────────────────┐
│                                 CLIENTS                                 │
│   ┌──────────────┐      ┌──────────────┐      ┌──────────────┐          │
│   │   Web App    │      │  Mobile App  │      │   CLI Tool   │          │
│   └───────┬──────┘      └───────┬──────┘      └───────┬──────┘          │
│           │                     │                     │                 │
│           └─────────────────────┼─────────────────────┘                 │
│                                 │ HTTPS                                 │
└─────────────────────────────────┼───────────────────────────────────────┘
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                               API GATEWAY                               │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                           Load Balancer                           │  │
│  └───────────────────────────────┬───────────────────────────────────┘  │
│                                  ▼                                      │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                       Authentication (JWT)                        │  │
│  └───────────────────────────────┬───────────────────────────────────┘  │
│                                  ▼                                      │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                            Rate Limiter                           │  │
│  └───────────────────────────────┬───────────────────────────────────┘  │
└──────────────────────────────────┼──────────────────────────────────────┘
                                   ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                             REST ENDPOINTS                              │
│ ┌────────────┐ ┌──────────────┐ ┌───────────┐ ┌────────────┐ ┌──────┐   │
│ │POST        │ │GET           │ │GET        │ │DELETE      │ │GET   │   │
│ │/messages   │ │/context/{id} │ │/stats     │ │/context    │ │stream│   │
│ └─────┬──────┘ └──────┬───────┘ └─────┬─────┘ └─────┬──────┘ └──┬───┘   │
│       │               │               │             │           │       │
│       └───────────────┴──────────┬────┴─────────────┴───────────┘       │
└──────────────────────────────────┼──────────────────────────────────────┘
                                   ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                              CORE SERVICES                              │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                           Context Engine                          │  │
│  │          (Compression • Summarization • Drift Detection)          │  │
│  └───────────────────────────────┬───────────────────────────────────┘  │
│                     ┌────────────┴─────────────────────┐                │
│                     ▼                                  ▼                │
│  ┌────────────────────────────────────┐  ┌──────────────────────────┐   │
│  │           Token Manager            │  │     Budget Optimizer     │   │
│  │   (Budget Tracking • Allocation)   │  │   (Knapsack Algorithm)   │   │
│  └────────────────────────────────────┘  └──────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘
```
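The Budget Optimizer above is labeled with a knapsack algorithm. A minimal sketch of that idea, assuming context atoms carry a token cost and a relevance score (this interface is hypothetical, not the actual implementation):

```python
# Illustrative 0/1 knapsack over a token budget; the real Budget
# Optimizer's data model and API may differ.
def select_atoms(atoms: list[tuple[str, int, float]], budget: int) -> list[str]:
    """Pick atom ids maximizing total relevance within a token budget.

    atoms: (atom_id, token_cost, relevance_score) triples.
    """
    # best[b] = (best score, chosen atom ids) achievable with budget b
    best: list[tuple[float, list[str]]] = [(0.0, [])] * (budget + 1)
    for atom_id, cost, score in atoms:
        # Iterate budgets high-to-low so each atom is used at most once
        for b in range(budget, cost - 1, -1):
            cand = (best[b - cost][0] + score, best[b - cost][1] + [atom_id])
            if cand[0] > best[b][0]:
                best[b] = cand
    return best[budget][1]

atoms = [("sys", 300, 9.0), ("recent", 500, 8.0), ("old", 700, 3.0)]
print(select_atoms(atoms, budget=900))  # ['sys', 'recent']
```

Dynamic programming is exact but O(n · budget); for large budgets a greedy score-per-token heuristic is a common cheaper substitute.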
### Request Pipeline

```
┌────────┐              ┌─────────────┐           ┌───────────────┐
│ Client │              │ API Gateway │           │ Context Engine│
└───┬────┘              └──────┬──────┘           └───────┬───────┘
    │                          │                          │
    │  1. POST /messages       │                          │
    │─────────────────────────>│                          │
    │                          │  2. Process Message      │
    │                          │─────────────────────────>│
    │                          │                          │
    │                          │                          │ 3. Count Tokens
    │                          │                          │──────────────┐
    │                          │                          │              │
    │                          │                          │<─────────────┤
    │                          │                          │  Token Count
    │                          │                          │
    │                          │                          │ 4. Optimize Selection
    │                          │                          │──────────────┐
    │                          │                          │              │
    │                          │                          │<─────────────┤
    │                          │                          │  Selected Atoms
    │                          │                          │
    │                          │                          │ 5. Cache Context
    │                          │                          │──────────────┐
    │                          │                          │              │
    │                          │                          │ 6. Persist Session
    │                          │                          │──────────────┐
    │                          │                          │              │
    │                          │  7. Response             │
    │                          │<─────────────────────────│
    │  8. 200 OK + Context ID  │                          │
    │<─────────────────────────│                          │
    │                          │                          │
```
Pipeline Steps:
- Client Request → Sends message with metadata
- API Gateway → Validates auth, checks rate limits
- Token Counting → Calculates token cost of incoming data
- Budget Optimization → Runs knapsack algorithm for optimal atom selection
- Context Caching → Stores compressed context in Redis
- Session Persistence → Saves session state to PostgreSQL
- Response Generation → Returns context ID and summary
- Client Receipt → Receives confirmation for future retrieval
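The steps above can be condensed into a toy sketch (function and storage names here are illustrative stand-ins, with plain dicts in place of Redis and PostgreSQL):

```python
# Hypothetical condensation of pipeline steps 3-7; not RELAY internals.
def process_message(message: str, budget: int, cache: dict, sessions: dict) -> str:
    # 3. Count tokens (naive whitespace split as a stand-in for a real tokenizer)
    tokens = len(message.split())
    # 4. Optimize selection: truncate to the budget if the message exceeds it
    selected = message if tokens <= budget else " ".join(message.split()[:budget])
    # 5. Cache context (Redis in the real system; a dict here)
    context_id = f"ctx-{len(cache) + 1}"
    cache[context_id] = selected
    # 6. Persist session state (PostgreSQL in the real system)
    sessions[context_id] = {"tokens": min(tokens, budget)}
    # 7. Return the context ID for the gateway to send back
    return context_id

cache, sessions = {}, {}
ctx = process_message("hello world", budget=100, cache=cache, sessions=sessions)
print(ctx)  # ctx-1
```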
---
## Quick Start
### 1. Installation
```bash
# Clone the repository
git clone https://github.com/ramsterr/RELAY-2.git
cd RELAY-2
# Install dependencies
pip install -e ".[dev]"
# Or use Make
make install
```
### 2. Configuration

Create a `.env` file:

```bash
# Required: At least one LLM provider
OPENAI_API_KEY=sk-...
# or
ANTHROPIC_API_KEY=sk-ant-...

# Database (optional - defaults shown)
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/relay

# Redis (optional - defaults shown)
REDIS_URL=redis://localhost:6379/0

# Security (optional - auto-generated)
SECRET_KEY=your-secret-key
```

### 3. Run the Server

```bash
# Development
uvicorn relay_context.main:app --reload

# Or use Make
make run

# Docker
docker-compose up -d
```

### 4. Basic Usage

```python
from relay_context.main import create_app
from relay_context.security.auth import AuthService, UserRole
from relay_context.config.settings import Settings, Environment
from pydantic import SecretStr

# Create app
settings = Settings(
    environment=Environment.TESTING,
    security__secret_key=SecretStr("test-key"),
    llm__openai_api_key=SecretStr("sk-test"),
)
app = create_app(settings)

# Get auth token
auth = AuthService(settings)
token = auth.create_access_token("user-1", UserRole.USER)

# Use the API
# POST /api/v1/context/messages
# {
#   "session_id": "session-001",
#   "role": "user",
#   "content": "Hello, help me write a function"
# }
# Headers: Authorization: Bearer <token>
```

## API Reference

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | /api/v1/context/messages | Add message to context | ✅ |
| GET | /api/v1/context/{session_id} | Get context window | ✅ |
| GET | /api/v1/context/{session_id}/stats | Get window statistics | ✅ |
| DELETE | /api/v1/context/{session_id} | Clear context window | ✅ |
| GET | /api/v1/context/{session_id}/stream | Stream messages (SSE) | ✅ |
| GET | /health | Health check | ❌ |
| GET | /metrics | Prometheus metrics | ❌ |

```bash
# JWT Bearer Token
curl -H "Authorization: Bearer <token>" ...

# API Key
curl -H "X-API-Key: relay_<key>" ...
```

| Role | Permissions |
|---|---|
| ADMIN | Full access to all operations |
| USER | Read/Write context, limited sessions |
| VIEWER | Read-only access |
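
The roles above can be modeled as a simple permission lookup. A minimal sketch, with hypothetical permission sets derived from the table (the actual AuthService enforcement may differ):

```python
from enum import Enum

class UserRole(Enum):  # mirrors the roles table; illustrative only
    ADMIN = "admin"
    USER = "user"
    VIEWER = "viewer"

# Assumed permission sets; the project's real RBAC rules may be richer
PERMISSIONS = {
    UserRole.ADMIN: {"read", "write", "delete", "admin"},
    UserRole.USER: {"read", "write"},
    UserRole.VIEWER: {"read"},
}

def can(role: UserRole, action: str) -> bool:
    """True if the role's permission set includes the action."""
    return action in PERMISSIONS[role]

print(can(UserRole.VIEWER, "write"))  # False
print(can(UserRole.ADMIN, "delete"))  # True
```
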
## Core Components

### Context Engine

Manages context windows with intelligent token budgeting:

```python
from relay_context.core.context_engine import ContextEngine, MessageRole
from relay_context.core.token_manager import TokenManager
from relay_context.core.compression import CompressionEngine

token_manager = TokenManager("gpt-4")
compression = CompressionEngine(algorithm="zstd", level=3)
engine = ContextEngine(settings=settings, token_manager=token_manager, compression=compression)

# Add messages
await engine.add_message("session-001", MessageRole.USER, "Hello!")
context = await engine.get_context("session-001")
```

### Token Manager

Smart token counting and budget management:
```python
from relay_context.core.token_manager import TokenManager

manager = TokenManager("gpt-4")
count = await manager.count_tokens("Hello world")  # 2 tokens

# Truncate to budget
truncated = await manager.truncate_to_budget(long_text, max_tokens=1000)
```

### Compression Engine

Reduce context size with ZSTD/LZ4:
```python
from relay_context.core.compression import CompressionEngine

engine = CompressionEngine(algorithm="zstd", level=3)
compressed = await engine.compress_text("long context...")
restored = await engine.decompress_text(compressed)
```

### Authentication

JWT-based auth with RBAC:
```python
from relay_context.security.auth import AuthService, UserRole

auth = AuthService(settings)
token = auth.create_access_token("user-id", UserRole.ADMIN)

# Verify token
payload = auth.verify_token(token)
```

### Rate Limiter

Distributed sliding window rate limiter:
```python
from relay_context.security.rate_limiter import SlidingWindowRateLimiter

limiter = SlidingWindowRateLimiter(redis_client, requests=100, window_seconds=60)
allowed, info = await limiter.is_allowed("user-1")
# info = {"remaining": 99, "reset_at": ...}
```

## Features

| Feature | Description | Status |
|---|---|---|
| 🔐 JWT Auth | Token-based authentication with RBAC | ✅ |
| 🚦 Rate Limiting | Distributed sliding window limiter | ✅ |
| 🗜️ Compression | ZSTD/LZ4 context compression | ✅ |
| 📊 Metrics | Prometheus + OpenTelemetry | ✅ |
| 🐳 Docker | Containerized deployment | ✅ |
| 🔄 Async | Full async/await architecture | ✅ |
| 🛡️ Security | CORS, headers, validation | ✅ |
## Monitoring

```mermaid
graph LR
    subgraph Collection["Metrics Collection"]
        App[Application]
        Sys[System]
        DB[Database]
    end
    subgraph Storage["Storage"]
        Prometheus[Prometheus TSDB]
    end
    subgraph Visualization["Visualization"]
        Grafana[Grafana]
        Alerts[Alert Manager]
    end
    App --> Prometheus
    Sys --> Prometheus
    DB --> Prometheus
    Prometheus --> Grafana
    Prometheus --> Alerts
    style Collection fill:#e3f2fd
    style Storage fill:#e8f5e9
    style Visualization fill:#fff3e0
```
| Metric | Type | Alert Threshold | Description |
|---|---|---|---|
| relay_context_tokens_used | Gauge | > 90% | Token utilization |
| relay_context_drift_score | Gauge | > 0.5 | Context drift |
| relay_http_request_duration | Histogram | p99 > 500ms | Latency |
| relay_cache_hit_rate | Gauge | < 0.7 | Cache efficiency |
| relay_error_rate | Counter | > 1% | Error rate |
## Health Checks

```bash
curl http://localhost:8080/health
```

Response:

```json
{
  "status": "healthy",
  "version": "2.0.0",
  "environment": "development",
  "checks": {
    "database": {"status": "healthy", "latency_ms": 2},
    "redis": {"status": "healthy", "latency_ms": 1}
  },
  "uptime_seconds": 1234.56
}
```

```
+---------------------------------------------------------------+
| RELAY CONTEXT ENGINE                             [Health: ✅]  |
+---------------------------------------------------------------+
| [TOKEN METRICS]     | [RATE LIMITS]         | [SESSIONS]      |
|                     |                       |                 |
| Used: 45.2k/128k    | Remaining: 87/100     | Active: 12      |
| [████████░░] 35%    | [███████░░] 87%       | Total: 156      |
+---------------------+-----------------------+-----------------+
| [COMPRESSION]                     | [AUTH]                    |
|                                   |                           |
| Ratio: 2.3x                       | JWT: ✅ Active            |
| Saved: 23.4k tokens               | API Keys: 5               |
+-----------------------------------+---------------------------+
```
## Deployment

```bash
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f relay

# Stop
docker-compose down
```

| Service | Port | Description |
|---|---|---|
| API | 8080 | Main FastAPI application |
| PostgreSQL | 5432 | Database |
| Redis | 6379 | Cache & rate limiting |
| Prometheus | 9090 | Metrics collection |
| Grafana | 3000 | Dashboards |
## Testing

```bash
# Run all tests
pytest

# With coverage
pytest --cov=src/relay_context --cov-report=term-missing

# Run specific test categories
pytest tests/unit/
pytest tests/integration/
pytest tests/e2e/
pytest tests/load/
```

```
============ 151 passed, 15 skipped, 22 warnings in 5.95s =============
Coverage: 81%
```
## Configuration

| Category | Option | Default | Description |
|---|---|---|---|
| Context | max_window_tokens | 128000 | Max tokens per session |
| Context | compression_algorithm | zstd | Compression method |
| Context | ttl_seconds | 3600 | Session TTL |
| Security | rate_limit_requests | 100 | Requests per window |
| Security | rate_limit_window_seconds | 60 | Window duration |
| Database | pool_size | 20 | DB connection pool |
| Redis | max_connections | 50 | Redis connection pool |
```bash
# Application
APP_NAME=relay-context
ENVIRONMENT=production
LOG_LEVEL=INFO

# Database
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/relay

# Redis
REDIS_URL=redis://localhost:6379/0

# Security
SECRET_KEY=your-secret-key-min-32-chars
JWT_EXPIRATION_MINUTES=60

# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```

📚 For complete configuration options, see Complete Documentation
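
How such options can be overridden from the environment is sketched below with a stdlib dataclass. The env var names and this class are illustrative; the project reads configuration through its own Settings class:

```python
import os
from dataclasses import dataclass

@dataclass
class ContextSettings:
    """Illustrative mapping of a few options from the config table
    to environment variables; not the project's real Settings class."""
    max_window_tokens: int = 128000
    compression_algorithm: str = "zstd"
    ttl_seconds: int = 3600

    @classmethod
    def from_env(cls) -> "ContextSettings":
        # Fall back to the documented defaults when a variable is unset
        return cls(
            max_window_tokens=int(os.getenv("MAX_WINDOW_TOKENS", "128000")),
            compression_algorithm=os.getenv("COMPRESSION_ALGORITHM", "zstd"),
            ttl_seconds=int(os.getenv("TTL_SECONDS", "3600")),
        )

os.environ["MAX_WINDOW_TOKENS"] = "64000"
settings = ContextSettings.from_env()
print(settings.max_window_tokens, settings.compression_algorithm)  # 64000 zstd
```
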
## Project Structure

```mermaid
graph TD
    Root[RELAY-2/] --> Src[src/relay_context/]
    Root --> Tests[tests/]
    Root --> Monitoring[monitoring/]
    Root --> Infra[infrastructure/]
    Root --> Docker[docker-compose.yml]
    Src --> API[api/]
    Src --> Core[core/]
    Src --> Security[security/]
    Src --> Config[config/]
    Tests --> Unit[unit/]
    Tests --> Integration[integration/]
    Tests --> E2E[e2e/]
    Tests --> Load[load/]
    style Root fill:#e3f2fd
    style Src fill:#f3e5f5
    style Tests fill:#e8f5e9
    style Core fill:#fff3e0
```
```
RELAY-2/
├── src/relay_context/
│   ├── api/             # REST API endpoints
│   ├── core/            # Context engine, token manager, compression
│   ├── security/        # Auth, rate limiting
│   ├── infrastructure/  # Database, Redis, telemetry
│   └── config/          # Settings & configuration
├── tests/
│   ├── unit/            # Unit tests
│   ├── integration/     # API tests
│   ├── e2e/             # End-to-end tests
│   └── load/            # Performance tests
├── monitoring/          # Prometheus, Grafana configs
├── infrastructure/      # OpenTelemetry config
├── docker-compose.yml
├── Dockerfile
├── Makefile
└── pyproject.toml
```
## Contributing

```bash
# Setup
make install

# Run linter
make lint

# Run tests
make test

# Format code
make format
```

```mermaid
flowchart LR
    Fork[Fork Repo] --> Clone[Clone & Branch]
    Clone --> Code[Make Changes]
    Code --> Test[Test Locally]
    Test --> Commit[Commit Changes]
    Commit --> Push[Push to Fork]
    Push --> PR[Create Pull Request]
    PR --> Review[Code Review]
    Review --> Merge[Merge to Main]
    style Fork fill:#e3f2fd
    style Code fill:#f3e5f5
    style Test fill:#e8f5e9
    style Merge fill:#c8e6c9
```
## Resources

| Resource | Link | Description |
|---|---|---|
| 📖 Full Documentation | Complete Docs | In-depth technical reference |
| 🐛 Issue Tracker | GitHub Issues | Report bugs & request features |
| 💬 Discussions | GitHub Discussions | Ask questions & share ideas |
| 📊 Roadmap | Project Board | See what's coming next |
MIT License - see LICENSE for details.
Built with ❤️ for production-grade LLM context management. 🐱
