Skip to content

Avoid draining Postgres connection pool on wrong API key requests.#252

Merged
think-in-universe merged 6 commits intomainfrom
check-keys-in-mem
Dec 12, 2025
Merged

Avoid draining Postgres connection pool on wrong API key requests.#252
think-in-universe merged 6 commits intomainfrom
check-keys-in-mem

Conversation

@PierreLeGuen
Copy link
Contributor

No description provided.

- Added a new rate limiting middleware to control API key usage, preventing excessive requests.
- Introduced a Bloom filter for efficient API key validation and caching to improve performance.
- Updated the API key repository to support fetching active key hashes for Bloom filter initialization and incremental updates.
- Refactored API key generation and hashing functions for better modularity and reusability.
- Enhanced the AuthService to utilize the new caching and Bloom filter mechanisms for validating API keys.
@PierreLeGuen
Copy link
Contributor Author

Fixes #253

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@claude
Copy link

claude bot commented Dec 11, 2025

Code Review - PR 252

This PR adds Bloom filter-based API key validation and rate limiting to prevent connection pool exhaustion from invalid API keys. The implementation is well-structured, but I have identified several CRITICAL issues that need fixing before merge.

CRITICAL ISSUES:

  1. Race Condition in Rate Limiter Counter (crates/api/src/middleware/rate_limit.rs:36-40)

The Counter::clone() implementation is unsafe and creates a race condition. When cloning, you are reading the current value with Relaxed ordering and creating a new atomic. This is not an atomic clone operation. If another thread increments between the load and the new Counter creation, the clone will have an incorrect value.

Impact: Rate limit counts can be incorrect under concurrent load, potentially allowing more requests than intended or incorrectly blocking legitimate requests.

Fix: Remove the Clone implementation entirely - you do not actually need it since Counter is wrapped in Arc. The cache returns Arc, so cloning the Arc is sufficient.

  1. Missing Index on created_at Column (crates/database/src/repositories/api_key.rs:573-577)

The incremental Bloom filter sync query filters on created_at but there is likely no index on created_at, causing a full table scan every 10 seconds.

Impact: Performance degradation as the api_keys table grows. At scale (millions of keys), this could take seconds and strain the database.

Fix: Add a migration to create an index on api_keys(created_at) or a composite index on (is_active, deleted_at, created_at).

  1. Bloom Filter Not Synchronized After Key Revocation (crates/services/src/auth/mod.rs:478-490)

The incremental sync only adds new keys created after last_sync. If a key is revoked, it remains in the Bloom filter forever. Revoked keys will pass the Bloom filter check and hit the database unnecessarily.

Impact: Defeats the purpose of the optimization. Attackers using revoked keys will still query the database on every request.

Recommendation: Add a full rebuild every hour to clear out revoked keys.

  1. Bloom Filter Task Panic = Silent Failure (crates/services/src/auth/mod.rs:427-428)

If Bloom filter creation fails, the service panics during initialization.

Impact: Service startup failure in production. Hard to debug since it is a panic, not a graceful error.

Fix: Handle the error gracefully with proper error logging and fallback.

  1. Rate Limit Middleware Ordering Issue (crates/api/src/lib.rs:726-738)

Rate limiting happens AFTER authentication. Invalid API keys still hit the database for authentication before rate limiting is checked.

Impact: Does not fully solve the connection pool exhaustion problem. An attacker with many invalid keys can still drain connections during authentication.

Solution: Add a separate pre-auth rate limiter that works on raw API key strings before hashing/validation, applied BEFORE auth middleware.

SUMMARY - Cannot merge until these are addressed:

  1. Remove unsafe Clone implementation for Counter
  2. Add database index on api_keys.created_at
  3. Add periodic Bloom filter rebuild to remove revoked keys
  4. Handle Bloom filter creation failure gracefully
  5. Add pre-auth rate limiter before authentication

After fixes: This will be an excellent optimization for preventing database overload from invalid API keys.

Copy link
Contributor

@Evrard-Nil Evrard-Nil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should help quite a bit.
We may want to introduce signed key later, which we can differentiate with a different prefix for example, and deprecate other keys over time.

- Updated rate limiting middleware to clone the state for better handling in route definitions.
- Introduced a constant for maximum capacity in the rate limit cache to enhance maintainability.
- Added a new SQL migration to create an index on the api_keys table for optimized incremental Bloom filter sync queries.
- Enhanced the AuthService to periodically rebuild the Bloom filter, ensuring it remains up-to-date with active keys and improves performance.
use super::auth::AuthenticatedApiKey;
use crate::models::ErrorResponse;

const DEFAULT_API_KEY_RATE_LIMIT: u32 = 1000; // requests per minute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideally we should make the rate limit configurable. 1K / min may not be enough for some enterprise customers when the load is really high, although they may rotate with multiple API keys.

Copy link
Contributor

@think-in-universe think-in-universe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@think-in-universe think-in-universe merged commit f9ff3d0 into main Dec 12, 2025
1 check passed
@think-in-universe think-in-universe deleted the check-keys-in-mem branch December 12, 2025 08:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants