# RELAY Context Engine 🐱

**Production-Grade LLM Context Management System**


📖 Read Full Documentation


> 💡 **The Problem:** LLMs have limited context windows, require efficient token management, need proper authentication, and demand production-grade reliability.

> ✅ **The Solution:** RELAY Context Engine provides intelligent context-window management, automatic compression, semantic deduplication, JWT authentication, and distributed rate limiting with full observability.


## Core Capabilities

| Capability | Description |
|------------|-------------|
| 🔐 JWT Auth | Role-based access control |
| 📊 Token Management | Smart token budgeting |
| 🗜️ Compression | ZSTD/LZ4 compression |
| 🚦 Rate Limiting | Distributed sliding window |
| 📈 Observability | OpenTelemetry + Prometheus |
| 🏗️ Production Ready | Docker + Kubernetes ready |

## Architecture Overview

### System Flow

```
┌──────────────┐
│    Client    │
└──────┬───────┘
       │ HTTPS
       ▼
┌──────────────┐
│   FastAPI    │
└──────┬───────┘
       │
       ├─────────────────┬─────────────────┬──────────────────┐
       │                 │                 │                  │
       ▼                 ▼                 ▼                  ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐  ┌──────────────┐
│   JWT Auth   │ │   Context    │ │ Compression  │  │OpenTelemetry │
│              │ │   Engine     │ │   Engine     │  │              │
│ Role-Based   │ │              │ │              │  │ Metrics &    │
│ Access Ctrl  │ │ Token Mgmt   │ │ ZSTD/LZ4     │  │ Tracing      │
└──────────────┘ └──────┬───────┘ └──────────────┘  └──────────────┘
                        │
                        ├──────────────┬──────────────┐
                        │              │              │
                        ▼              ▼              ▼
                 ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
                 │    Redis     │ │ PostgreSQL   │ │ Vector DB    │
                 │   (Cache)    │ │  (Sessions)  │ │(Embeddings)  │
                 └──────────────┘ └──────────────┘ └──────────────┘
```

### API Layer Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                              CLIENTS                                    │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐               │
│  │   Web App    │    │  Mobile App  │    │   CLI Tool   │               │
│  └───────┬──────┘    └───────┬──────┘    └───────┬──────┘               │
│          │                   │                   │                      │
│          └───────────────────┼───────────────────┘                      │
│                              │ HTTPS                                    │
└──────────────────────────────┼──────────────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                           API GATEWAY                                   │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                    Load Balancer                                │    │
│  └────────────────────────────┬────────────────────────────────────┘    │
│                               ▼                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                  Authentication (JWT)                           │    │
│  └────────────────────────────┬────────────────────────────────────┘    │
│                               ▼                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                   Rate Limiter                                  │    │
│  └────────────────────────────┬────────────────────────────────────┘    │
└───────────────────────────────┼─────────────────────────────────────────┘
                                ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         REST ENDPOINTS                                  │
│  ┌────────────┐ ┌──────────────┐ ┌───────────┐ ┌────────────┐ ┌──────┐ │
│  │POST        │ │GET           │ │GET        │ │DELETE      │ │GET   │ │
│  │/messages   │ │/context/{id} │ │/stats     │ │/context    │ │stream│ │
│  └─────┬──────┘ └──────┬───────┘ └─────┬─────┘ └─────┬──────┘ └──┬───┘ │
│        │               │               │             │           │     │
│        └───────────────┴───────┬───────┴─────────────┴───────────┘     │
└────────────────────────────────┼────────────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         CORE SERVICES                                   │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                    Context Engine                               │    │
│  │         (Compression • Summarization • Drift Detection)         │    │
│  └────────────────────────────┬────────────────────────────────────┘    │
│                               ├──────────────────┐                      │
│                               ▼                  ▼                      │
│  ┌────────────────────────────────────┐ ┌──────────────────────────┐    │
│  │        Token Manager               │ │     Budget Optimizer     │    │
│  │   (Budget Tracking • Allocation)   │ │  (Knapsack Algorithm)    │    │
│  └────────────────────────────────────┘ └──────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────┘
```

### Data Processing Pipeline

```
┌────────┐         ┌─────────────┐         ┌───────────────┐
│ Client │         │ API Gateway │         │ Context Engine│
└───┬────┘         └──────┬──────┘         └───────┬───────┘
    │                     │                        │
    │  1. POST /messages  │                        │
    │────────────────────>│                        │
    │                     │  2. Process Message    │
    │                     │───────────────────────>│
    │                     │                        │
    │                     │                        │  3. Count Tokens
    │                     │                        │──────────────┐
    │                     │                        │              │
    │                     │                        │<─────────────┤
    │                     │                        │  Token Count │
    │                     │                        │
    │                     │                        │  4. Optimize Selection
    │                     │                        │──────────────┐
    │                     │                        │              │
    │                     │                        │<─────────────┤
    │                     │                        │ Selected Atoms
    │                     │                        │
    │                     │                        │  5. Cache Context
    │                     │                        │──────────────┐
    │                     │                        │              │
    │                     │                        │  6. Persist Session
    │                     │                        │──────────────┐
    │                     │                        │              │
    │                     │  7. Response           │
    │                     │<───────────────────────│
    │  8. 200 OK +        │                        │
    │     Context ID      │                        │
    │<────────────────────│                        │
    │                     │                        │
```

**Pipeline Steps:**

1. **Client Request** → sends message with metadata
2. **API Gateway** → validates auth, checks rate limits
3. **Token Counting** → calculates the token cost of incoming data
4. **Budget Optimization** → runs a knapsack algorithm for optimal atom selection
5. **Context Caching** → stores compressed context in Redis
6. **Session Persistence** → saves session state to PostgreSQL
7. **Response Generation** → returns context ID and summary
8. **Client Receipt** → receives confirmation for future retrieval
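Step 4 is the heart of the pipeline: each candidate context "atom" carries a token cost and a relevance score, and the optimizer picks the subset with the highest total relevance that fits the token budget. A minimal 0/1 knapsack sketch of that selection (the `Atom` fields and `select_atoms` helper are illustrative, not RELAY's actual API):

```python
from dataclasses import dataclass

@dataclass
class Atom:
    text: str
    tokens: int   # token cost of including this atom
    score: float  # relevance score

def select_atoms(atoms: list[Atom], budget: int) -> list[Atom]:
    """0/1 knapsack over token cost, maximizing total relevance score."""
    # best[t] = (best total score, chosen indices) achievable within t tokens
    best = [(0.0, [])] * (budget + 1)
    for i, a in enumerate(atoms):
        if a.tokens > budget:
            continue
        # iterate budgets downward so each atom is used at most once
        for t in range(budget, a.tokens - 1, -1):
            cand = best[t - a.tokens][0] + a.score
            if cand > best[t][0]:
                best[t] = (cand, best[t - a.tokens][1] + [i])
    return [atoms[i] for i in best[budget][1]]

atoms = [Atom("system prompt", 50, 10.0),
         Atom("old chit-chat", 80, 1.0),
         Atom("recent question", 40, 9.0),
         Atom("retrieved doc", 60, 6.0)]
picked = select_atoms(atoms, budget=150)
print([a.text for a in picked])  # ['system prompt', 'recent question', 'retrieved doc']
```

With a 150-token budget the low-value chit-chat is dropped even though it would fit on its own, which is exactly why a knapsack beats naive truncation.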

---

## Quick Start

### 1. Installation

```bash
# Clone the repository
git clone https://github.com/ramsterr/RELAY-2.git
cd RELAY-2

# Install dependencies
pip install -e ".[dev]"

# Or use Make
make install
```

### 2. Configuration

Create a .env file:

```bash
# Required: At least one LLM provider
OPENAI_API_KEY=sk-...
# or
ANTHROPIC_API_KEY=sk-ant-...

# Database (optional - defaults shown)
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/relay

# Redis (optional - defaults shown)
REDIS_URL=redis://localhost:6379/0

# Security (optional - auto-generated)
SECRET_KEY=your-secret-key
```

### 3. Run the Server

```bash
# Development
uvicorn relay_context.main:app --reload

# Or use Make
make run

# Docker
docker-compose up -d
```

### 4. Hello World

```python
from relay_context.main import create_app
from relay_context.security.auth import AuthService, UserRole
from relay_context.config.settings import Settings, Environment
from pydantic import SecretStr

# Create app
settings = Settings(
    environment=Environment.TESTING,
    security__secret_key=SecretStr("test-key"),
    llm__openai_api_key=SecretStr("sk-test"),
)
app = create_app(settings)

# Get auth token
auth = AuthService(settings)
token = auth.create_access_token("user-1", UserRole.USER)

# Use the API
# POST /api/v1/context/messages
# {
#   "session_id": "session-001",
#   "role": "user",
#   "content": "Hello, help me write a function"
# }
# Headers: Authorization: Bearer <token>
```

## API Reference

### Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/context/messages` | Add message to context |
| GET | `/api/v1/context/{session_id}` | Get context window |
| GET | `/api/v1/context/{session_id}/stats` | Get window statistics |
| DELETE | `/api/v1/context/{session_id}` | Clear context window |
| GET | `/api/v1/context/{session_id}/stream` | Stream messages (SSE) |
| GET | `/health` | Health check |
| GET | `/metrics` | Prometheus metrics |

### Authentication

```bash
# JWT Bearer Token
curl -H "Authorization: Bearer <token>" ...

# API Key
curl -H "X-API-Key: relay_<key>" ...
```

### Roles & Permissions

| Role | Permissions |
|------|-------------|
| ADMIN | Full access to all operations |
| USER | Read/write context, limited sessions |
| VIEWER | Read-only access |

## Core Components

### Context Engine

Manages context windows with intelligent token budgeting:

```python
from relay_context.core.context_engine import ContextEngine, MessageRole
from relay_context.core.token_manager import TokenManager
from relay_context.core.compression import CompressionEngine

token_manager = TokenManager("gpt-4")
compression = CompressionEngine(algorithm="zstd", level=3)
engine = ContextEngine(settings=settings, token_manager=token_manager, compression=compression)

# Add messages
await engine.add_message("session-001", MessageRole.USER, "Hello!")
context = await engine.get_context("session-001")
```
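Conceptually, the engine keeps a rolling window per session and evicts the oldest content once the token budget is exceeded (the real engine also compresses and summarizes). A minimal in-process sketch of that eviction policy, with a hypothetical `ContextWindow` class and a crude word-count tokenizer standing in for the real one:

```python
from collections import deque

class ContextWindow:
    """Illustrative rolling window: evicts oldest messages once over budget."""
    def __init__(self, max_tokens: int, count_tokens=lambda s: len(s.split())):
        self.max_tokens = max_tokens
        self.count = count_tokens  # crude stand-in for a real tokenizer
        self.messages: deque[tuple[str, str, int]] = deque()  # (role, content, cost)
        self.used = 0

    def add(self, role: str, content: str) -> None:
        cost = self.count(content)
        self.messages.append((role, content, cost))
        self.used += cost
        # evict oldest first until the window fits the budget again
        while self.used > self.max_tokens and len(self.messages) > 1:
            _, _, evicted = self.messages.popleft()
            self.used -= evicted

w = ContextWindow(max_tokens=6)
w.add("user", "one two three")
w.add("assistant", "four five")
w.add("user", "six seven")  # pushes window over budget, so the oldest is evicted
print(w.used, [m[1] for m in w.messages])  # 4 ['four five', 'six seven']
```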

### Token Manager

Smart token counting and budget management:

```python
from relay_context.core.token_manager import TokenManager

manager = TokenManager("gpt-4")
count = await manager.count_tokens("Hello world")  # 2 tokens

# Truncate to budget
truncated = await manager.truncate_to_budget(long_text, max_tokens=1000)
```
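The idea behind `truncate_to_budget` can be sketched with a simple word-level tokenizer standing in for real model-specific token counting (e.g. tiktoken for GPT models). The helpers below are illustrative, not RELAY's implementation:

```python
def count_tokens(text: str) -> int:
    # Stand-in: one token per whitespace-separated word.
    # Real counters are model-specific and usually sub-word based.
    return len(text.split())

def truncate_to_budget(text: str, max_tokens: int) -> str:
    # Keep only as many tokens as the budget allows, oldest-first.
    words = text.split()
    return " ".join(words[:max_tokens])

long_text = " ".join(f"w{i}" for i in range(2000))
out = truncate_to_budget(long_text, max_tokens=1000)
print(count_tokens(out))  # 1000
```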

### Compression Engine

Reduce context size with ZSTD/LZ4:

```python
from relay_context.core.compression import CompressionEngine

engine = CompressionEngine(algorithm="zstd", level=3)
compressed = await engine.compress_text("long context...")
restored = await engine.decompress_text(compressed)
```
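The compress/restore round trip looks the same regardless of codec. Since `zstd`/`lz4` bindings require extra packages, this sketch uses stdlib `zlib` as a stand-in to show the lossless round trip and how a compression ratio is measured:

```python
import zlib

def compress_text(text: str, level: int = 6) -> bytes:
    # zlib stands in for zstd/lz4 here; the interface shape is the same
    return zlib.compress(text.encode("utf-8"), level)

def decompress_text(blob: bytes) -> str:
    return zlib.decompress(blob).decode("utf-8")

context = "the same phrase repeats in chat history. " * 200
blob = compress_text(context)
assert decompress_text(blob) == context  # lossless round trip
print(f"ratio: {len(context.encode()) / len(blob):.1f}x")
```

Repetitive conversational context compresses extremely well, which is why storing compressed windows in Redis pays off.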

### Authentication

JWT-based auth with RBAC:

```python
from relay_context.security.auth import AuthService, UserRole

auth = AuthService(settings)
token = auth.create_access_token("user-id", UserRole.ADMIN)

# Verify token
payload = auth.verify_token(token)
```
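Under the hood a JWT is just `header.payload.signature`: two base64url-encoded JSON segments plus an HMAC over them. A stdlib-only HS256 sketch — the function names mirror the API above but are illustrative, and real services should use a maintained library such as PyJWT:

```python
import base64, hashlib, hmac, json, time

def _b64(data: bytes) -> str:
    # base64url without padding, as JWTs use
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def create_access_token(subject: str, role: str, secret: str, ttl: int = 3600) -> str:
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({"sub": subject, "role": role,
                               "exp": int(time.time()) + ttl}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(token: str, secret: str) -> dict:
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):   # constant-time comparison
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] < time.time():
        raise ValueError("expired")
    return claims

token = create_access_token("user-1", "ADMIN", secret="test-key")
print(verify_token(token, "test-key")["role"])  # ADMIN
```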

### Rate Limiting

Distributed sliding window rate limiter:

```python
from relay_context.security.rate_limiter import SlidingWindowRateLimiter

limiter = SlidingWindowRateLimiter(redis_client, requests=100, window_seconds=60)
allowed, info = await limiter.is_allowed("user-1")
# info = {"remaining": 99, "reset_at": ...}
```
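The sliding-window algorithm itself is simple: remember the timestamps of recent requests, expire those older than the window, and allow a request only while the count is under the limit. RELAY shares this state via Redis; the in-process sketch below (hypothetical, with an injectable clock for determinism) shows the core logic:

```python
import time
from collections import defaultdict, deque

class SlidingWindowRateLimiter:
    """In-process sketch; a distributed version keeps the same state in Redis."""
    def __init__(self, requests: int, window_seconds: float):
        self.limit = requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # key -> timestamps of allowed requests

    def is_allowed(self, key: str, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        while q and q[0] <= now - self.window:  # expire hits outside the window
            q.popleft()
        allowed = len(q) < self.limit
        if allowed:
            q.append(now)
        return allowed, {"remaining": self.limit - len(q),
                         "reset_at": (q[0] + self.window) if q else now}

limiter = SlidingWindowRateLimiter(requests=3, window_seconds=60)
for _ in range(3):
    print(limiter.is_allowed("user-1", now=0.0)[0])  # True, three times
print(limiter.is_allowed("user-1", now=1.0)[0])      # False: limit hit
print(limiter.is_allowed("user-1", now=61.0)[0])     # True: window slid past old hits
```

Unlike a fixed-window counter, this never allows a burst of 2× the limit straddling a window boundary.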

## Production Features

| Feature | Description |
|---------|-------------|
| 🔐 JWT Auth | Token-based authentication with RBAC |
| 🚦 Rate Limiting | Distributed sliding-window limiter |
| 🗜️ Compression | ZSTD/LZ4 context compression |
| 📊 Metrics | Prometheus + OpenTelemetry |
| 🐳 Docker | Containerized deployment |
| 🔄 Async | Full async/await architecture |
| 🛡️ Security | CORS, security headers, validation |

## Monitoring & Observability

### Metrics Dashboard

```mermaid
graph LR
    subgraph Collection["Metrics Collection"]
        App[Application]
        Sys[System]
        DB[Database]
    end

    subgraph Storage["Storage"]
        Prometheus[Prometheus TSDB]
    end

    subgraph Visualization["Visualization"]
        Grafana[Grafana]
        Alerts[Alert Manager]
    end

    App --> Prometheus
    Sys --> Prometheus
    DB --> Prometheus
    Prometheus --> Grafana
    Prometheus --> Alerts

    style Collection fill:#e3f2fd
    style Storage fill:#e8f5e9
    style Visualization fill:#fff3e0
```

### Key Metrics

| Metric | Type | Alert Threshold | Description |
|--------|------|-----------------|-------------|
| `relay_context_tokens_used` | Gauge | > 90% | Token utilization |
| `relay_context_drift_score` | Gauge | > 0.5 | Context drift |
| `relay_http_request_duration` | Histogram | p99 > 500ms | Latency |
| `relay_cache_hit_rate` | Gauge | < 0.7 | Cache efficiency |
| `relay_error_rate` | Counter | > 1% | Error rate |

### Health Check

```bash
curl http://localhost:8080/health
```

Response:

```json
{
  "status": "healthy",
  "version": "2.0.0",
  "environment": "development",
  "checks": {
    "database": {"status": "healthy", "latency_ms": 2},
    "redis": {"status": "healthy", "latency_ms": 1}
  },
  "uptime_seconds": 1234.56
}
```
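One detail worth making explicit: the top-level `status` is an aggregate of the per-component checks. A small sketch of that worst-status aggregation (the intermediate `degraded` state is an assumption for illustration, not confirmed by the source):

```python
def overall_status(checks: dict) -> str:
    # Overall health is the worst component status:
    # any "unhealthy" -> unhealthy, any "degraded" -> degraded, else healthy.
    statuses = {c["status"] for c in checks.values()}
    if "unhealthy" in statuses:
        return "unhealthy"
    if "degraded" in statuses:
        return "degraded"
    return "healthy"

checks = {"database": {"status": "healthy", "latency_ms": 2},
          "redis": {"status": "healthy", "latency_ms": 1}}
print(overall_status(checks))  # healthy
```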

### Real-time Dashboard

```
+---------------------------------------------------------------+
|  RELAY CONTEXT ENGINE                            [Health: ✅]  |
+---------------------+-----------------------+-----------------+
|  [TOKEN METRICS]    |  [RATE LIMITS]        |  [SESSIONS]     |
|                     |                       |                 |
|  Used: 45.2k/128k   |  Remaining: 87/100    |  Active: 12     |
|  [████░░░░░░] 35%   |  [█████████░] 87%     |  Total: 156     |
+---------------------+-----------------------+-----------------+
|  [COMPRESSION]                    |  [AUTH]                   |
|                                   |                           |
|  Ratio: 2.3x                      |  JWT: ✅ Active           |
|  Saved: 23.4k tokens              |  API Keys: 5              |
+-----------------------------------+---------------------------+
```

## Docker & Deployment

### Quick Start with Docker

```bash
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f relay

# Stop
docker-compose down
```

### Services

| Service | Port | Description |
|---------|------|-------------|
| API | 8080 | Main FastAPI application |
| PostgreSQL | 5432 | Database |
| Redis | 6379 | Cache & rate limiting |
| Prometheus | 9090 | Metrics collection |
| Grafana | 3000 | Dashboards |

## Testing

```bash
# Run all tests
pytest

# With coverage
pytest --cov=src/relay_context --cov-report=term-missing

# Run specific test categories
pytest tests/unit/
pytest tests/integration/
pytest tests/e2e/
pytest tests/load/
```

### Test Results

```
============ 151 passed, 15 skipped, 22 warnings in 5.95s =============
Coverage: 81%
```

## Configuration Reference

### Quick Settings

| Category | Option | Default | Description |
|----------|--------|---------|-------------|
| Context | `max_window_tokens` | 128000 | Max tokens per session |
| Context | `compression_algorithm` | zstd | Compression method |
| Context | `ttl_seconds` | 3600 | Session TTL |
| Security | `rate_limit_requests` | 100 | Requests per window |
| Security | `rate_limit_window_seconds` | 60 | Window duration |
| Database | `pool_size` | 20 | DB connection pool |
| Redis | `max_connections` | 50 | Redis connection pool |

### Environment Variables

```bash
# Application
APP_NAME=relay-context
ENVIRONMENT=production
LOG_LEVEL=INFO

# Database
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/relay

# Redis
REDIS_URL=redis://localhost:6379/0

# Security
SECRET_KEY=your-secret-key-min-32-chars
JWT_EXPIRATION_MINUTES=60

# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```
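These variables are mapped onto typed settings at startup. A stdlib-only sketch of that mapping using the defaults above (the real project's `Settings` presumably uses pydantic settings; this loader is purely illustrative):

```python
import os
from dataclasses import dataclass, field

def _env(name: str, default: str) -> str:
    return os.environ.get(name, default)

@dataclass
class Settings:
    """Illustrative env-driven settings loader with the documented defaults."""
    app_name: str = field(default_factory=lambda: _env("APP_NAME", "relay-context"))
    environment: str = field(default_factory=lambda: _env("ENVIRONMENT", "development"))
    database_url: str = field(default_factory=lambda: _env(
        "DATABASE_URL", "postgresql+asyncpg://user:pass@localhost:5432/relay"))
    redis_url: str = field(default_factory=lambda: _env("REDIS_URL", "redis://localhost:6379/0"))
    jwt_expiration_minutes: int = field(
        default_factory=lambda: int(_env("JWT_EXPIRATION_MINUTES", "60")))

os.environ["ENVIRONMENT"] = "production"   # env overrides the default
print(Settings().environment)  # production
```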

📚 For complete configuration options, see Complete Documentation


## Project Structure

```mermaid
graph TD
    Root[RELAY-2/] --> Src[src/relay_context/]
    Root --> Tests[tests/]
    Root --> Monitoring[monitoring/]
    Root --> Infra[infrastructure/]
    Root --> Docker[docker-compose.yml]

    Src --> API[api/]
    Src --> Core[core/]
    Src --> Security[security/]
    Src --> Config[config/]

    Tests --> Unit[unit/]
    Tests --> Integration[integration/]
    Tests --> E2E[e2e/]
    Tests --> Load[load/]

    style Root fill:#e3f2fd
    style Src fill:#f3e5f5
    style Tests fill:#e8f5e9
    style Core fill:#fff3e0
```

```
RELAY-2/
├── src/relay_context/
│   ├── api/            # REST API endpoints
│   ├── core/           # Context engine, token manager, compression
│   ├── security/       # Auth, rate limiting
│   ├── infrastructure/ # Database, Redis, telemetry
│   └── config/         # Settings & configuration
├── tests/
│   ├── unit/           # Unit tests
│   ├── integration/    # API tests
│   ├── e2e/            # End-to-end tests
│   └── load/           # Performance tests
├── monitoring/         # Prometheus, Grafana configs
├── infrastructure/     # OpenTelemetry config
├── docker-compose.yml
├── Dockerfile
├── Makefile
└── pyproject.toml
```

## Contributing

```bash
# Setup
make install

# Run linter
make lint

# Run tests
make test

# Format code
make format
```

### Contribution Workflow

```mermaid
flowchart LR
    Fork[Fork Repo] --> Clone[Clone & Branch]
    Clone --> Code[Make Changes]
    Code --> Test[Test Locally]
    Test --> Commit[Commit Changes]
    Commit --> Push[Push to Fork]
    Push --> PR[Create Pull Request]
    PR --> Review[Code Review]
    Review --> Merge[Merge to Main]

    style Fork fill:#e3f2fd
    style Code fill:#f3e5f5
    style Test fill:#e8f5e9
    style Merge fill:#c8e6c9
```

## Support & Resources

| Resource | Link | Description |
|----------|------|-------------|
| 📖 Full Documentation | Complete Docs | In-depth technical reference |
| 🐛 Issue Tracker | GitHub Issues | Report bugs & request features |
| 💬 Discussions | GitHub Discussions | Ask questions & share ideas |
| 📊 Roadmap | Project Board | See what's coming next |

## License

MIT License - see LICENSE for details.


Built with ❤️ for production-grade LLM context management. 🐱

📖 Read Complete Documentation
