Radium LLM Proxy Gateway


Radium is an open-source, unified HTTP API gateway for accessing multiple AI model providers. Built in Rust for resource efficiency and high performance, it sits between your applications and LLM backends as a central proxy, handling every request and response through a single interface. Radium delivers uniform endpoints for text and chat completions, intelligent fallback logic, and complete observability.

Key Features

  • 🚀 High Performance: Leverages Rust's speed and memory safety for low-latency, high-throughput proxying.
  • 🔌 Multi-Provider Support: Seamlessly connects to OpenAI, Anthropic, AWS Bedrock, Cohere, and more.
  • ⚡ Flexible Integration: Minimal configuration required for various LLM backends.
  • 📊 Built-in Monitoring: Prometheus metrics and comprehensive observability.
  • 🛠️ Developer-Friendly: Simple setup, clear documentation, and extensible design.
  • 🔄 Fallback Support: Automatic failover between providers for reliability.
  • 🌐 CORS Support: Configurable Cross-Origin Resource Sharing.
  • 📝 Structured Logging: Configurable logging with rotation and timestamps.
  • 🐳 Docker Ready: Container support with multi-platform builds.
  • 📈 Scalable Architecture: Connection pooling and request timeout handling.
  • 📄 Open Source: Licensed under Apache 2.0.

Supported Providers

Provider          Key        Configuration Section  Status
OpenAI            openai     [openai]               ✅
Anthropic         anthropic  [anthropic]            ⏳
Azure OpenAI      azure      [azure_openai]         ⏳
AWS Bedrock       bedrock    [bedrock]              ⏳
Cohere            cohere     [cohere]               ⏳
Google Vertex AI  vertex     [vertex]               ⏳

(✅ = available, ⏳ = planned)

Getting Started

Prerequisites

  • Rust: Ensure you have Rust installed (version 1.93 or later); install it via rustup
  • Git: Required to clone the repository
  • API Keys: Valid API keys for your chosen LLM providers
  • Optional: Docker for containerized deployment

Installation

  1. Clone the repository:
git clone https://github.com/riipandi/radium.git && cd radium
  2. Build the project:
# Using cargo directly
cargo build --release

# Or using just (recommended)
just build
  3. Set up configuration:
# Copy example configuration
cp config.toml.example config.toml

# Edit with your API keys and settings
nano config.toml

Configuration

Create your config.toml file by copying and editing the provided config.toml.example.
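
As a quick-start sketch, the snippet below writes a minimal config.toml from the shell. The [openai] section name comes from the provider table above; the other names (the [server] block, listen_addr, api_key) are illustrative assumptions, so verify everything against config.toml.example.

# Write a minimal config sketch; key names are assumptions, check config.toml.example
cat > config.toml <<'EOF'
# Hypothetical server block; the examples in this README use port 8000
[server]
listen_addr = "0.0.0.0:8000"

# Section name taken from the provider table; the api_key field is assumed
[openai]
api_key = "sk-..."
EOF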

Running the Server

# Using cargo
cargo run -- serve

# Using just (with auto-reload for development)
just dev

# Using built binary
./target/release/radium serve

# With custom config path
./target/release/radium serve --config /path/to/config.toml

API Endpoints

Radium provides OpenAI-compatible API endpoints:

  • POST /v1/chat/completions - Chat completions with conversation context
  • POST /v1/text/completions - Simple text completions
  • GET /metrics - Prometheus metrics for monitoring

Base URL

http://localhost:8000
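
As a quick smoke test, a single request like the one below exercises the chat completions endpoint. The JSON body mirrors the OpenAI-style payload used in the benchmark section further down; replace your-api-key with a real key for your configured provider.

# Send one OpenAI-compatible chat completion request through the gateway
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'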

Performance Benchmarks

Here are example benchmarks using the bombardier HTTP benchmarking tool:

Health Check Endpoint Performance

Test Configuration:

  • Concurrent connections: 125
  • Number of requests: 100,000
  • Target endpoint: GET /healthz
  • Environment: local development server

bombardier -c 125 -n 100000 -m GET http://localhost:8000/healthz

Example Results:

Statistics        Avg      Stdev        Max
  Reqs/sec     35160.01    8333.65   46008.85
  Latency        3.55ms     2.04ms    44.74ms
  HTTP codes:
    1xx - 0, 2xx - 100000, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:    16.64MB/s

Load Testing Different Scenarios

# Light load - 50 concurrent connections, 1000 requests
bombardier -n 1000 -c 50 http://localhost:8000/healthz

# Medium load - 100 concurrent connections, 2500 requests with 10s duration
bombardier -d 10s -n 2500 -c 100 http://localhost:8000/healthz

# Heavy load - 500 concurrent connections, 10000 requests with 10s duration
bombardier -d 10s -n 10000 -c 500 http://localhost:8000/healthz

# Sustained load test - 30 seconds duration
bombardier -c 100 -d 30s http://localhost:8000/healthz

API Endpoint Benchmarks

For testing actual LLM proxy endpoints:

# Test chat completions endpoint (requires valid API key)
bombardier -n 100 -c 10 -m POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -b '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}' \
  http://localhost:8000/v1/chat/completions

Performance Notes:

  • Performance of the LLM proxy endpoints depends on upstream provider latency
  • Connection pooling and keep-alive significantly improve throughput
  • Memory usage remains stable under high concurrent load

Documentation

For detailed documentation, see:

Monitoring

Radium provides comprehensive monitoring through Prometheus metrics at the /metrics endpoint, including:

  • Request counts by provider, model, and status
  • Request latency histograms
  • Token usage statistics
  • Error rates and types
  • Connection pool statistics
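
To spot-check the exporter, scrape the endpoint directly. The exact metric names depend on Radium's exporter and are not documented here, so the filter pattern below is only an assumption:

# Fetch the Prometheus exposition output and filter for request-related series
# (metric naming is an assumption; adjust the pattern to the actual names)
curl -s http://localhost:8000/metrics | grep -i request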

Docker Support

Radium includes full Docker support with multi-platform builds:

# Build Docker image
just docker-build

# Run with Docker
just docker-run serve

# Using Docker Compose
just compose-up

Contributing

We welcome contributions to make Radium even better!

  • Read our Contributing Guidelines before submitting changes
  • Fork the repository and create a feature branch
  • Submit a pull request with a clear title and description
  • Join the discussion on GitHub Issues

Join the flow. Amplify your AI-powered applications with Radium! 🚀

License

Radium is licensed under the Apache License 2.0. See the LICENSE file for more information.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you shall be licensed under the Apache License 2.0, without any additional terms or conditions.

Copyrights in this project are retained by their contributors.


🤫 Psst! If you like my work you can support me via GitHub sponsors.
