Release v1.0.0
Initial public release of Adaptive Rate Limiter.
Added
Core Features
- Provider-Agnostic Architecture: Works with any OpenAI-compatible API (OpenAI, Anthropic, Venice, Groq, Together, etc.)
- Adaptive Rate Limiting: Intelligent rate limit discovery from response headers
- Streaming Support: Refund-based token accounting for streaming responses
- Multi-Tenant Isolation: Namespace-based isolation for multi-tenant applications
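As a rough illustration of what discovering limits from response headers can look like, the sketch below reads the `x-ratelimit-*` headers that OpenAI-style APIs return. The helper name `parse_rate_limit_headers` and the shape of the returned dict are assumptions made for this example, not this library's API.

```python
from typing import Mapping, Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a header value to int; return None if missing or malformed."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> dict:
    """Hypothetical helper: pull request/token limits out of OpenAI-style headers."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return {
        "requests_limit": _to_int(lowered.get("x-ratelimit-limit-requests")),
        "requests_remaining": _to_int(lowered.get("x-ratelimit-remaining-requests")),
        "tokens_limit": _to_int(lowered.get("x-ratelimit-limit-tokens")),
        "tokens_remaining": _to_int(lowered.get("x-ratelimit-remaining-tokens")),
    }
```

Missing or unparsable headers come back as `None`, so a caller can fall back to configured defaults instead of failing.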
Scheduling Modes
- Basic Mode: Simple direct execution with retry logic for low-volume use cases
- Intelligent Mode: Advanced queuing with bucket-based scheduling and rate limit discovery
- Account Mode: Account-level request management for multi-tenant applications
Backends
- MemoryBackend: In-memory state storage for single-instance deployments
- RedisBackend: Distributed state storage with Lua scripts for atomic operations
  - distributed_check_and_reserve.lua: Atomic capacity reservation
  - distributed_recover_orphan.lua: Orphaned reservation recovery
  - distributed_release_capacity.lua: Capacity release operations
  - distributed_release_streaming.lua: Streaming response cleanup
  - distributed_update_rate_limits.lua: Rate limit state updates
  - distributed_update_rate_limits_429.lua: 429 response handling
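Checking capacity and reserving it need to happen as one atomic step; otherwise two workers can both see free capacity and over-reserve. The single-process sketch below shows the same check-and-reserve idea, with a lock standing in for Redis's atomic script execution. `CapacityBucket` and its methods are hypothetical and do not reflect the contents of the shipped Lua scripts.

```python
import threading


class CapacityBucket:
    """Hypothetical in-process stand-in for an atomic check-and-reserve."""

    def __init__(self, capacity: int) -> None:
        self._remaining = capacity
        self._lock = threading.Lock()

    def check_and_reserve(self, tokens: int) -> bool:
        # The check and the decrement happen under one lock, so no other
        # thread can reserve between them.
        with self._lock:
            if tokens > self._remaining:
                return False
            self._remaining -= tokens
            return True

    def release(self, tokens: int) -> None:
        # Return unused capacity, e.g. after a failed request.
        with self._lock:
            self._remaining += tokens

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._remaining
```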
Protocols & Interfaces
- ClientProtocol: Define how clients connect to APIs
- ProviderInterface: Extensible provider system for rate limit parsing
- ClassifierProtocol: Request classification for routing
- StreamingResponseProtocol: Streaming response handling
State Management
- StateManager: Centralized state management with configurable cache policies
  - CachePolicy.WRITE_THROUGH: Immediate persistence for production safety
  - CachePolicy.WRITE_BACK: Deferred writes for performance optimization
  - CachePolicy.WRITE_AROUND: Direct backend writes for read-heavy workloads
- Bulk operations support for efficient state updates
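The trade-off between the three policies is easiest to see side by side. The sketch below contrasts their usual textbook semantics, using a plain dict as the backend. `TinyCache`, `Policy`, and `flush` are hypothetical names, and the library's `StateManager` may differ in detail.

```python
from enum import Enum


class Policy(Enum):
    WRITE_THROUGH = "write_through"
    WRITE_BACK = "write_back"
    WRITE_AROUND = "write_around"


class TinyCache:
    """Hypothetical cache in front of a dict 'backend', for illustration only."""

    def __init__(self, backend: dict, policy: Policy) -> None:
        self.backend = backend
        self.policy = policy
        self.cache: dict = {}
        self.dirty: set = set()

    def set(self, key: str, value: object) -> None:
        if self.policy is Policy.WRITE_THROUGH:
            # Cache and backend are updated together.
            self.cache[key] = value
            self.backend[key] = value
        elif self.policy is Policy.WRITE_BACK:
            # Only the cache is updated now; the backend catches up on flush().
            self.cache[key] = value
            self.dirty.add(key)
        else:
            # WRITE_AROUND: write straight to the backend and drop any stale
            # cached copy so the next read reloads it.
            self.backend[key] = value
            self.cache.pop(key, None)

    def get(self, key: str) -> object:
        if key not in self.cache:
            self.cache[key] = self.backend[key]
        return self.cache[key]

    def flush(self) -> None:
        # Persist deferred writes (only WRITE_BACK ever marks keys dirty).
        for key in self.dirty:
            self.backend[key] = self.cache[key]
        self.dirty.clear()
```

With write-back, anything still marked dirty is lost if the process dies before `flush()` runs, which is the durability cost paid for fewer backend writes.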
Reservation System
- ReservationTracker: Token capacity reservation and tracking
- ReservationContext: Context manager for automatic reservation cleanup
- Heap-based cleanup for expired reservations
- Orphan recovery mechanisms
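A min-heap keyed on expiry time lets cleanup touch only the reservations that have actually expired, instead of scanning all of them. The sketch below is one minimal way to do that, assuming reservation ids are never reused. `ExpiringReservations` is a hypothetical class, not `ReservationTracker` itself.

```python
import heapq
from typing import Dict, List, Tuple


class ExpiringReservations:
    """Hypothetical tracker that expires reservations via a min-heap."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []  # (expires_at, reservation_id)
        self._active: Dict[str, int] = {}         # reservation_id -> tokens

    def reserve(self, reservation_id: str, tokens: int, expires_at: float) -> None:
        # Assumption: each reservation_id is used at most once.
        self._active[reservation_id] = tokens
        heapq.heappush(self._heap, (expires_at, reservation_id))

    def release(self, reservation_id: str) -> int:
        # Lazy deletion: the heap entry stays and is skipped during cleanup.
        return self._active.pop(reservation_id, 0)

    def cleanup(self, now: float) -> int:
        """Drop reservations that expired at or before `now`; return tokens freed."""
        freed = 0
        while self._heap and self._heap[0][0] <= now:
            _, reservation_id = heapq.heappop(self._heap)
            freed += self._active.pop(reservation_id, 0)
        return freed
```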
Streaming Support
- StreamingInFlightTracker: Track streaming response lifecycle
- StreamingReservationContext: Context manager for streaming operations
- StreamingIterator: Async iterator wrapper with token accounting
- StreamingInFlightEntry: Entry tracking for in-flight streaming requests
- Automatic token refunds on stream completion
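Refund-based accounting generally means reserving a worst-case token estimate before the stream starts, then returning the unused part once real usage is known. The sketch below wraps an async iterator to do that. `TokenBudget`, `metered_stream`, and the one-token-per-chunk counting are simplifying assumptions, not the behavior of `StreamingIterator`.

```python
import asyncio
from typing import AsyncIterator


class TokenBudget:
    """Hypothetical token pool used only for this illustration."""

    def __init__(self, capacity: int) -> None:
        self.available = capacity

    def reserve(self, tokens: int) -> None:
        if tokens > self.available:
            raise RuntimeError("not enough capacity")
        self.available -= tokens

    def refund(self, tokens: int) -> None:
        self.available += tokens


async def metered_stream(
    chunks: AsyncIterator[str], budget: TokenBudget, reserved: int
) -> AsyncIterator[str]:
    """Yield chunks, then refund whatever part of the reservation went unused."""
    budget.reserve(reserved)
    used = 0
    try:
        async for chunk in chunks:
            used += 1  # Assumption: one token per chunk, to keep the sketch short.
            yield chunk
    finally:
        # Runs on normal completion, on errors, and when the generator is closed early.
        budget.refund(max(reserved - used, 0))


async def fake_stream() -> AsyncIterator[str]:
    # Stand-in for a provider's streaming response.
    for word in ("hello", "streaming", "world"):
        yield word


async def main() -> int:
    budget = TokenBudget(capacity=100)
    received = [c async for c in metered_stream(fake_stream(), budget, reserved=50)]
    assert received == ["hello", "streaming", "world"]
    return budget.available
```

Putting the refund in `finally` means capacity is returned even if the stream fails partway through, so an aborted response does not permanently consume the reservation.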
Observability
- UnifiedMetricsCollector: Main collector for all rate limiter metrics
- 30+ named metric constants available for instrumentation
- Both dict and Prometheus output formats supported
- Built-in Prometheus metrics via optional prometheus-client
- Request latency histograms
- Queue depth gauges
- Rate limit state metrics
Exception Hierarchy
- RateLimiterError: Base exception for all rate limiter errors
- CapacityExceededError: Rate limit capacity exceeded with retry-after
- BucketNotFoundError: Unknown bucket identifier
- ReservationCapacityError: Reservation tracker at capacity
- BackendConnectionError: Backend connection failures
- BackendOperationError: Backend operation failures
- ConfigurationError: Invalid configuration
- QueueOverflowError: Request queue overflow with backpressure
- TooManyFailedRequestsError: Circuit breaker for failure rate protection
Type System
- DiscoveredBucket: Bucket configuration discovered from providers (bucket_id, RPM/TPM limits)
- RateLimitInfo: Parsed rate limit response data
- RequestMetadata: Request metadata for scheduling decisions
- ResourceType: Type-safe resource type constants (TEXT, IMAGE, AUDIO, EMBEDDING, GENERIC)
- QueuedRequest, QueueInfo, ScheduleResult for queue management
- RateLimitType, RateLimitBucket, LimitCheckResult for rate limit types
Documentation
- Comprehensive README with Quick Start guide
- API reference documentation
- Backend configuration guide
- Provider implementation guide
- Streaming support documentation
Testing Infrastructure
- Unit tests for all core components
- Integration tests for backend consistency
- Redis cluster integration tests
- Lua script integration tests
- End-to-end workflow tests
- Benchmark tests for concurrent scaling and scheduler overhead
Technical Details
- Python: Requires Python 3.10+
- Dependencies: pydantic
- Optional Dependencies:
  - [metrics]: prometheus-client for Prometheus integration
  - [redis]: redis for distributed backends
  - [full]: All optional dependencies
- License: Apache-2.0
Full Changelog: https://github.com/sethbang/adaptive-rate-limiter/commits/v1.0.0