
feat: implement OpenAI-compatible proxy with Vercel AI Gateway #19

Merged
wtfsayo merged 6 commits into main from feat/openai-proxy-vercel-gateway
Oct 13, 2025
Conversation


@wtfsayo wtfsayo commented Oct 13, 2025

Summary

This PR implements an OpenAI-compatible proxy layer using Vercel AI Gateway, allowing the application to act as a drop-in replacement for OpenAI's API.

Changes

  • ✅ Add chat completions endpoint (/api/v1/chat/completions)
  • ✅ Add embeddings endpoint (/api/v1/embeddings)
  • ✅ Add models list endpoint (/api/v1/models)
  • ✅ Add model-specific endpoint (/api/v1/models/{model})
  • ✅ Update middleware to allow new proxy endpoints
  • ✅ Add provider configuration and routing logic
  • ✅ Add test script for comprehensive proxy testing
  • ✅ Update environment variables documentation

Key Features

  • Full OpenAI API compatibility for chat completions
  • Support for embeddings API
  • Dynamic model listing and details
  • Proper authentication via API keys
  • Error handling and validation
  • Usage tracking and analytics

Testing

Run the test script:

./scripts/test-openai-proxy.sh

Environment Variables

Add to your .env.local:

VERCEL_AI_GATEWAY_API_KEY=your_vercel_gateway_key_here

- Add chat completions endpoint (/api/v1/chat/completions)
- Add embeddings endpoint (/api/v1/embeddings)
- Add models list endpoint (/api/v1/models)
- Add model-specific endpoint (/api/v1/models/{model})
- Update middleware to allow new proxy endpoints
- Add provider configuration and routing logic
- Add test script for OpenAI proxy functionality
- Update environment variables documentation
…tion

- Fix apiKey type error in chat completions (undefined to null conversion)
- Remove unused encoder variable in streaming handler
- Remove unused catch variable parameter
- Fix readonly array type in usage chart YAxis domain
- Fix unknown type errors in test scripts
- Exclude scripts folder from tsconfig to prevent build-time type checking

Critical Fixes:
- Fix credit deduction race condition by checking credits BEFORE API calls
- Add pre-flight credit validation for both chat completions and embeddings
- Fix hardcoded embedding pricing to use proper calculateCost function
- Add fallback token estimation for streaming when usage data is missing
- Fix provider detection logic to handle provider-prefixed models (e.g., 'openai/gpt-4o')
- Add comprehensive input validation for all endpoints

Improvements:
- Add normalizeModelName() helper to handle model name variations
- Add estimateTokens() and estimateRequestCost() for pre-flight checks
- Add checkSufficientCredits() for reusable credit validation
- Return 402 status code for insufficient credits
- Log warnings when credit deduction fails after API calls
- Store prompt messages in streaming mode for better audit trails

Security & Code Quality:
- Remove API key logging from test script (security issue)
- Replace console.error with logger.error for consistency
- Add proper error handling for insufficient credits
- Validate empty messages arrays and null content

All changes address critical billing/revenue issues identified in PR review.

Critical Fixes:
- Make non-streaming credit deduction synchronous (prevents revenue loss)
- Return 402 error if credit deduction fails after API call
- Increase credit buffer from 20% to 50% for safety margin
- Fix message validation to allow null content for tool/function calls
- Fix type safety for multimodal content in estimateRequestCost

Error Handling Improvements:
- Add structured error propagation from Vercel Gateway
- Properly parse and forward OpenAI-formatted errors with status codes
- Add timeout handling (60s for chat/embeddings, 30s for models)
- Use AbortController for all gateway requests

Performance & Reliability:
- Only analytics (usage records, generation records) run in background
- Critical billing operations (credit deduction) are synchronous
- Better error messages for timeout scenarios
- Proper cleanup of timeout handlers

This addresses all critical issues from PR review:
- Issue #1: Synchronous credit deduction ✅
- Issue #2: Increased credit buffer ✅
- Issue #3: Tool call message validation ✅
- Issue #4: Multimodal content type safety ✅
- Issue #5: Gateway error propagation ✅
- Issue #6: Timeout handling ✅

wtfsayo commented Oct 13, 2025

@claude final review


claude bot commented Oct 13, 2025

Claude finished @wtfsayo's task


Final Review Complete ✅

I've conducted a comprehensive review of your OpenAI-compatible proxy implementation. This is a solid PR with well-structured code. Below are my findings organized by severity.


🎯 Overall Assessment

Strong Points:

  • ✅ Full OpenAI API compatibility achieved
  • ✅ Proper authentication and authorization flow
  • ✅ Comprehensive error handling with structured errors
  • ✅ Credit checking before API calls (good billing protection)
  • ✅ Support for both streaming and non-streaming responses
  • ✅ Excellent test coverage script
  • ✅ Clean separation of concerns with provider abstraction

Areas for Improvement:
There are several issues ranging from critical to minor that should be addressed before merging.


🔴 Critical Issues

1. Race Condition in Streaming Credit Deduction

Location: app/api/v1/chat/completions/route.ts:410-423

The streaming response returns immediately to the client (line 477), but credit deduction happens asynchronously in the background (lines 410-423). If deduction fails, the user has already received the response.

Problem:

// Returns immediately at line 477
return new Response(readable, { ... });

// But credit deduction happens later (lines 410-423)
const deductResult = await deductCredits(...);
if (!deductResult.success) {
  logger.error(...); // User already got the response!
}

Impact: Users could receive free service if credits run out during streaming.

Recommendation: Consider one of these approaches:

  1. Pre-deduct estimated credits (like you do for embeddings with 50% buffer), then adjust after completion
  2. Implement a post-completion reconciliation process
  3. Block the stream start until pre-deduction succeeds

2. Inconsistent Error Response for Embeddings Credit Deduction

Location: app/api/v1/embeddings/route.ts:190-196

Credits are deducted in a background async block, so failures are only logged, not returned to the user. Unlike chat completions (line 245 in chat/completions/route.ts), embeddings always return success even if credit deduction fails.

Problem:

(async () => {
  const deductResult = await deductCredits(...);
  if (!deductResult.success) {
    logger.error(...); // Only logged, response already sent
  }
})().catch(() => {});

Impact: Free service when credits are insufficient.

Recommendation: Move credit deduction before return Response.json(data) (line 225) to match the pattern in chat completions.


🟡 High Priority Issues

3. Missing Type Safety in Provider Error Handling

Location: app/api/v1/chat/completions/route.ts:180-186

The error type guard is too permissive and uses runtime checks instead of proper TypeScript types.

Current:

if (error && typeof error === 'object' && 'error' in error && 'status' in error) {
  const gatewayError = error as { status: number; error: { message: string; type?: string; code?: string } };

Recommendation: Define a proper type guard:

interface GatewayError {
  status: number;
  error: {
    message: string;
    type?: string;
    code?: string;
  };
}

function isGatewayError(error: unknown): error is GatewayError {
  return (
    typeof error === 'object' &&
    error !== null &&
    'status' in error &&
    typeof error.status === 'number' &&
    'error' in error &&
    typeof error.error === 'object'
  );
}

4. Inconsistent Logging Usage

Location: app/api/v1/models/route.ts:57

This file uses console.error instead of the logger utility used everywhere else.

Current:

console.error("Error fetching models:", error);

Should be:

logger.error("Error fetching models:", error);

5. Missing Input Validation for Token Limits

Location: app/api/v1/chat/completions/route.ts:40-69

The validation checks for required fields but doesn't validate:

  • Maximum message array length
  • Maximum content length per message
  • Token limits (max_tokens parameter)

Recommendation: Add validation:

if (request.max_tokens && request.max_tokens > MAX_TOKENS_LIMIT) {
  return Response.json({
    error: {
      message: `max_tokens cannot exceed ${MAX_TOKENS_LIMIT}`,
      type: "invalid_request_error",
      param: "max_tokens",
      code: "invalid_value",
    }
  }, { status: 400 });
}

🟠 Medium Priority Issues

6. Potential Memory Issue with Streaming Content Buffer

Location: app/api/v1/chat/completions/route.ts:331

The fullContent string accumulates all streaming content without bounds. For very long responses, this could cause memory issues.

Current:

let fullContent = "";
// ...
if (content) {
  fullContent += content; // Unbounded accumulation
}

Recommendation: Add a content length limit or implement chunked storage for very long responses.

7. Token Estimation Could Be More Accurate

Location: lib/pricing.ts:117-119

The token estimation uses a simple 4-character-per-token heuristic, which is inaccurate for many cases.

Current:

export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

Recommendation: Consider using a more accurate tokenizer like js-tiktoken for better estimates, especially since you use estimates for credit checking.
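Short of pulling in a real tokenizer, the heuristic can at least blend word and character counts. This is an illustrative interim refinement, not a substitute for js-tiktoken; the 1.3 tokens-per-word average and the 0.75/0.25 weights are rough assumptions, not tuned values.

```typescript
// Dependency-free refinement of the length/4 heuristic: blend a
// word-based estimate with the character-based one. Still approximate.
function estimateTokens(text: string): number {
  const words = text.split(/\s+/).filter(Boolean).length;
  const chars = text.length;
  // English averages roughly 1.3 tokens per word and ~4 chars per token.
  return Math.ceil(words * 1.3 * 0.75 + (chars / 4) * 0.25);
}
```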

8. Timeout Consistency Issues

Location: Multiple files

Different timeout values across similar operations:

  • Chat completions: 60 seconds (route.ts:21)
  • Embeddings: 60 seconds (embeddings/route.ts:137)
  • Models fetch: 30 seconds (models/route.ts:29)
  • Gateway provider: 60 seconds (vercel-gateway.ts:17)

Recommendation: Consider defining timeout constants in a shared config file for consistency.
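A shared config module could look like the following sketch. The default values mirror those in the PR; the environment-variable override and the naming are assumptions about how the team might want to tune timeouts per deployment.

```typescript
// One place for the timeouts currently scattered across routes.
const DEFAULT_TIMEOUTS_MS = {
  chatCompletions: 60_000,
  embeddings: 60_000,
  models: 30_000,
} as const;

// Optional env override, e.g. TIMEOUT_MODELS_MS=45000 (hypothetical variable name).
function timeoutFor(op: keyof typeof DEFAULT_TIMEOUTS_MS): number {
  const override = process.env[`TIMEOUT_${op.toUpperCase()}_MS`];
  const parsed = override ? Number(override) : NaN;
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_TIMEOUTS_MS[op];
}
```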

9. Missing Rate Limiting Headers

Location: All API routes

OpenAI's API returns rate limit headers (x-ratelimit-*). Your proxy doesn't forward or set these headers.

Recommendation: Consider adding rate limit headers to help clients implement proper backoff strategies:

return Response.json(data, {
  headers: {
    'x-ratelimit-limit-requests': '...',
    'x-ratelimit-remaining-requests': '...',
  }
});

🟢 Minor Issues & Suggestions

10. Unused Import

Location: app/api/v1/chat/completions/route.ts:3

VercelGatewayProvider is imported but only used indirectly through getProvider().

11. Inconsistent Provider String

Location: Multiple files

The provider is sometimes "vercel-gateway" (usage records) and sometimes dynamically determined via getProviderFromModel(). This could cause confusion in analytics.

Example:

  • Line 268: provider: "vercel-gateway"
  • Line 109: const provider = getProviderFromModel(model)

Recommendation: Be consistent—either always use "vercel-gateway" since that's the actual provider, or use the upstream provider (openai/anthropic) for pricing purposes and "vercel-gateway" for routing.

12. Environment Variable Documentation

The .env.local file shows AI_GATEWAY_API_KEY but the PR description mentions VERCEL_AI_GATEWAY_API_KEY. Your code correctly supports both (with fallback), but the PR description should mention both options.

13. Test Script Robustness

Location: scripts/test-openai-proxy.sh:143

The multi-provider test only tests two models. Consider testing more providers to ensure full compatibility:

  • Google (gemini models)
  • Meta (llama models)

14. Missing JSDoc Comments

Consider adding JSDoc comments to exported functions, especially in lib/pricing.ts and lib/providers/vercel-gateway.ts, to improve developer experience.

15. Type Narrowing for Message Content

Location: lib/providers/types.ts:4

The content field allows both string and complex multimodal arrays, but the type could be more specific:

content: string | Array<
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string; detail?: "auto" | "low" | "high" } }
>;

📝 Code Quality Observations

Excellent Practices Observed:

  1. ✅ Proper use of database transactions for credit operations
  2. ✅ Comprehensive error messages with proper error codes
  3. ✅ Background async operations for non-critical analytics
  4. ✅ Proper cleanup with clearTimeout() in all timeout scenarios
  5. ✅ Good use of AbortController for request cancellation
  6. ✅ Proper model name normalization handling provider prefixes
  7. ✅ 50% buffer on credit estimates (conservative approach)

Test Script Quality:

The test script (scripts/test-openai-proxy.sh) is excellent:

  • ✅ Tests all major endpoints
  • ✅ Tests both streaming and non-streaming
  • ✅ Tests URL encoding edge cases
  • ✅ Good visual output with colors
  • ✅ Clear error messages

🎬 Recommendations for Next Steps

Before Merging:

  1. MUST FIX: Address critical issues #1 and #2 (credit deduction race conditions)
  2. SHOULD FIX: Address high priority issues #3-5 (type safety, logging, validation)
  3. CONSIDER: Medium priority issues #6-9 for production readiness

Post-Merge Improvements:

  1. Add integration tests using an actual OpenAI SDK client
  2. Add monitoring/alerting for credit deduction failures
  3. Consider implementing request caching for model list endpoint
  4. Add metrics for response times and error rates per provider

📊 Summary

Category Count
Critical Issues 2
High Priority 3
Medium Priority 4
Minor Issues 6

This is a well-architected feature with good separation of concerns and comprehensive testing. The critical issues around credit deduction need to be resolved before merging to prevent potential revenue loss. Once those are fixed, this will be a solid OpenAI-compatible proxy implementation.

Great work on the overall structure and completeness! 🎉



…outes

- Add shared provider factory in lib/providers
- Extend AIProvider to support embeddings and models
- Implement embeddings, listModels, getModel in VercelGatewayProvider
- Refactor chat/completions, embeddings, and models routes to use provider
- Keep non-OpenAI routes untouched

vercel bot commented Oct 13, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project eliza-cloud-v2: deployment Ready, preview available, updated Oct 13, 2025 3:19pm (UTC)


coderabbitai bot commented Oct 13, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



claude bot commented Oct 13, 2025

Pull Request Review: OpenAI-Compatible Proxy Implementation

Thank you for this comprehensive implementation! This PR adds a solid OpenAI-compatible proxy layer. Here's my detailed review:


✅ Strengths

Architecture & Design

  • Clean abstraction: The provider pattern (AIProvider interface + VercelGatewayProvider) is well-structured and extensible
  • OpenAI compatibility: Proper adherence to OpenAI API specifications for chat completions, embeddings, and models endpoints
  • Type safety: Comprehensive TypeScript types in lib/providers/types.ts provide excellent type coverage

Security & Authentication

  • Proper authentication: Uses existing requireAuthOrApiKey for consistent auth across all endpoints
  • API key validation: Correctly validates and increments API key usage
  • Credit checks: Pre-flight credit verification prevents unauthorized usage

Business Logic

  • Credit management: Synchronous credit deduction before response prevents giving away free service
  • Cost estimation: Handles both string and multimodal content in token estimation
  • Error handling: Structured error responses matching OpenAI format

🔴 Critical Issues

1. Race Condition in Streaming Credit Deduction

Location: app/api/v1/chat/completions/route.ts:306-416

Issue: In streaming responses, credits are deducted after the stream completes in an async background task. If the user disconnects or the process crashes, credits may never be deducted.

// Current code (line 306)
(async () => {
  // ... streaming logic ...
  // Credit deduction happens here AFTER streaming
  const deductResult = await deductCredits(...);
})();

// Return streaming response immediately (line 463)
return new Response(readable, { ... });

Risk: Users could abuse this by rapidly disconnecting during streams to get free tokens.

Recommendation:

  • Option 1: Deduct estimated cost upfront, then adjust after completion
  • Option 2: Use a reservation system that locks credits before streaming
  • Option 3: Implement stream monitoring with guaranteed cleanup

2. Missing Input Validation for Message Content

Location: app/api/v1/chat/completions/route.ts:63-95

Issue: While the code validates that messages have content OR tool_calls, it doesn't validate the structure of multimodal content arrays.

// Missing validation for:
content: [
  { type: "text", text: "..." },
  { type: "image_url", image_url: { url: "..." } }
]

Risk: Malformed multimodal content could crash the endpoint or cause unexpected token estimation errors.

Recommendation: Add schema validation for multimodal content structure, especially for image URLs and token estimation.
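A minimal structural check could be sketched as follows. The accepted part shapes come from the snippet above; a schema library such as zod would be the fuller fix, and the function name here is hypothetical.

```typescript
// Accepts a plain string, or an array of well-formed multimodal parts.
function isValidContent(content: unknown): boolean {
  if (typeof content === "string") return true;
  if (!Array.isArray(content)) return false;
  return content.every((part) => {
    if (part === null || typeof part !== "object") return false;
    const p = part as Record<string, unknown>;
    if (p.type === "text") return typeof p.text === "string";
    if (p.type === "image_url") {
      const img = p.image_url as Record<string, unknown> | undefined;
      return typeof img === "object" && img !== null && typeof img.url === "string";
    }
    return false; // unknown part type: reject rather than crash later
  });
}
```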

3. Variable Shadowing Bug

Location: app/api/v1/embeddings/route.ts:111

Issue: The variable provider is declared twice - once from getProviderFromModel (line 68) and again from getProvider (line 111).

const provider = getProviderFromModel(request.model);  // line 68
// ...
const provider = getProvider();  // line 111 - SHADOWING!

Risk: This will cause a compilation error or runtime issues. The second declaration shadows the first, breaking the pricing calculation.

Fix: Rename one of the variables:

const providerName = getProviderFromModel(request.model);
// ...
const providerInstance = getProvider();

⚠️ High Priority Issues

4. Insufficient Error Handling in Stream Processing

Location: app/api/v1/chat/completions/route.ts:349-375

Parse errors in streaming chunks are silently ignored:

try {
  const parsed = JSON.parse(data);
  // ...
} catch {
  // Ignore parse errors - RISKY!
}

Recommendation: Log parse errors and implement a counter. If too many parse errors occur, abort the stream to prevent silent failures.
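The counter approach can be sketched like this; the threshold of 5 and the `onFatal` callback (which would wrap something like `controller.abort()` in the real handler) are illustrative assumptions.

```typescript
const MAX_PARSE_ERRORS = 5; // illustrative threshold, not from the PR

// Returns a parser that tolerates a few malformed SSE chunks but signals
// a fatal condition once the threshold is crossed, instead of ignoring forever.
function makeChunkParser(onFatal: () => void) {
  let parseErrors = 0;
  return (data: string): unknown | null => {
    try {
      return JSON.parse(data);
    } catch {
      parseErrors += 1;
      if (parseErrors >= MAX_PARSE_ERRORS) onFatal();
      return null;
    }
  };
}
```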

5. Token Estimation Accuracy

Location: lib/pricing.ts:117-119

The token estimation formula (text.length / 4) is overly simplistic and may be inaccurate for:

  • Code blocks
  • Non-English text (especially Asian languages)
  • Special characters and emojis

Recommendation: Consider using a proper tokenizer library like @dqbd/tiktoken or gpt-tokenizer for accurate token counting, especially since credit estimation has a 50% buffer.

6. Missing Environment Variable Validation

Location: lib/providers/index.ts:5-11

The provider initialization could fail at runtime with unclear errors:

export function getProvider(): AIProvider {
  const apiKey = process.env.VERCEL_AI_GATEWAY_API_KEY || process.env.AI_GATEWAY_API_KEY;
  if (!apiKey) {
    throw new Error("VERCEL_AI_GATEWAY_API_KEY or AI_GATEWAY_API_KEY not configured");
  }
  return new VercelGatewayProvider(apiKey);
}

Recommendation: Add startup validation to ensure required environment variables are set, preventing runtime failures during user requests.
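A startup assertion could be as small as the sketch below. The variable names come from the PR; where to call it (e.g. a Next.js instrumentation hook) is an assumption.

```typescript
// Fail fast at boot instead of during a user request.
function assertGatewayEnv(env: Record<string, string | undefined> = process.env): void {
  const key = env.VERCEL_AI_GATEWAY_API_KEY || env.AI_GATEWAY_API_KEY;
  if (!key) {
    throw new Error(
      "Set VERCEL_AI_GATEWAY_API_KEY or AI_GATEWAY_API_KEY before serving traffic"
    );
  }
}
```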


🟡 Medium Priority Issues

7. Hardcoded Timeout Value

Location: lib/providers/vercel-gateway.ts:21

private timeout = 60000; // 60 seconds - hardcoded

Recommendation: Make timeout configurable via environment variable to support longer-running models or different deployment environments.

8. Background Analytics Error Handling

Location: app/api/v1/chat/completions/route.ts:242-289

Background analytics failures are caught but could leave orphaned data:

(async () => {
  try {
    // Analytics...
  } catch (error) {
    logger.error("[OpenAI Proxy] Analytics error:", error);
    // No cleanup or retry mechanism
  }
})().catch(() => {});

Recommendation: Implement a queue-based approach for analytics to ensure eventual consistency, or at least log more details for manual reconciliation.

9. Embeddings Credit Buffer Inconsistency

Location: app/api/v1/embeddings/route.ts:83-84

The comment notes the buffer was increased from 20% to 50%, but chat completions applies the same 50% buffer without any such note. Consider documenting why embeddings needed the larger buffer.

10. Missing Rate Limiting

The OpenAI proxy endpoints don't appear to implement rate limiting beyond credit checks. This could lead to abuse scenarios where users spam requests until credits run out.

Recommendation: Add rate limiting per API key or per organization to prevent abuse.
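As a sketch of the per-key approach, a fixed-window counter might look like the following. This in-memory version is illustrative only: on serverless deployments each instance has its own map, so production would need a shared store such as Redis, and the limit/window values are placeholders.

```typescript
// Fixed-window rate limiter keyed by API key (or org id).
class FixedWindowLimiter {
  private hits = new Map<string, { windowStart: number; count: number }>();
  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request is allowed in the current window.
  allow(key: string, now = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { windowStart: now, count: 1 }); // new window
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```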


🟢 Low Priority / Code Quality

11. Type Safety in Error Handling

Location: app/api/v1/chat/completions/route.ts:166-174

The error type checking could be more robust:

if (error && typeof error === 'object' && 'error' in error && 'status' in error) {
  const gatewayError = error as { status: number; error: { message: string; type?: string; code?: string } };

Recommendation: Define a proper error type guard or use a type-safe error handling library.

12. Inconsistent Logging

Some endpoints use logger.info for completion, others use logger.debug. Standardize logging levels across all endpoints.

13. Magic Numbers

Location: lib/pricing.ts:150,160

const estimatedOutputTokens = 500; // Why 500?
return Math.ceil(totalCost * 1.5); // Why 1.5?

Recommendation: Extract these as named constants with documentation explaining the rationale.

14. Model-Specific Endpoint Path Handling

Location: app/api/v1/models/[...model]/route.ts:32

The code joins model segments with "/" which is correct, but there's no validation that the resulting model string is valid.

Recommendation: Add validation or sanitization of the model parameter to prevent potential issues.


📊 Test Coverage

Good: The bash test script (scripts/test-openai-proxy.sh) provides comprehensive integration testing.

Missing:

  • No unit tests for individual functions
  • No automated tests for error scenarios
  • No tests for edge cases (empty messages, very long inputs, etc.)
  • No tests for concurrent requests and race conditions

Recommendation: Add Jest/Vitest unit tests for:

  • Token estimation accuracy
  • Cost calculation edge cases
  • Error handling paths
  • Credit deduction race conditions
  • Multimodal content parsing

🔒 Security Concerns

15. No Input Size Limits

The endpoints don't enforce limits on:

  • Number of messages in a chat request
  • Length of individual messages
  • Size of embeddings input arrays

Risk: Could lead to DoS attacks or excessive memory usage.

Recommendation: Add configurable limits based on model constraints.

16. Image URL Validation

If multimodal content with image URLs is supported, there's no validation that:

  • URLs are from allowed domains
  • URLs don't point to internal services (SSRF protection)
  • Image sizes are reasonable

Recommendation: Implement URL validation and consider proxying images through your infrastructure.
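A rough guard might look like the sketch below. The blocklist is illustrative and deliberately incomplete: a real deployment also needs to validate the host after DNS resolution (redirect and rebinding attacks bypass hostname checks).

```typescript
// Reject non-http(s) schemes and obvious private/loopback/metadata hosts.
function isSafeImageUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  const host = url.hostname;
  if (host === "localhost" || host === "169.254.169.254") return false; // cloud metadata
  if (/^(10\.|127\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.)/.test(host)) return false;
  return true;
}
```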


📈 Performance Considerations

17. Database Queries in Hot Path

Location: lib/pricing.ts:28-34

Every request queries the database for pricing information:

const pricing = await db.query.modelPricing.findFirst({ ... });

Impact: This could become a bottleneck under high load.

Recommendation:

  • Cache model pricing in memory with periodic refresh
  • Use Redis for distributed caching
  • Consider pricing as part of application config
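The in-memory option can be sketched as a small TTL cache. Here `loadPricing` is a hypothetical stand-in for the `db.query.modelPricing.findFirst` call, and the 5-minute TTL is an arbitrary example.

```typescript
type Pricing = { inputPerMToken: number; outputPerMToken: number };

// Wrap the DB lookup in a per-model TTL cache.
function makePricingCache(
  loadPricing: (model: string) => Promise<Pricing>,
  ttlMs = 5 * 60_000
) {
  const cache = new Map<string, { value: Pricing; expires: number }>();
  return async (model: string): Promise<Pricing> => {
    const hit = cache.get(model);
    if (hit && hit.expires > Date.now()) return hit.value; // fresh: skip the DB
    const value = await loadPricing(model);                // miss: hit the DB
    cache.set(model, { value, expires: Date.now() + ttlMs });
    return value;
  };
}
```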

18. Multiple Database Round Trips

In chat completions, there are separate queries for:

  1. Credit check (checkSufficientCredits)
  2. Credit deduction (deductCredits)
  3. Usage record creation (createUsageRecord)
  4. Generation record creation (createGeneration)

Recommendation: Batch operations where possible or use database transactions more efficiently.


📝 Documentation & Maintenance

19. Missing JSDoc Comments

Key functions lack documentation:

  • handleStreamingResponse
  • handleNonStreamingResponse
  • estimateRequestCost

Recommendation: Add JSDoc comments explaining parameters, return values, and important behavior (especially around credit deduction timing).

20. Incomplete Environment Documentation

The PR description mentions adding VERCEL_AI_GATEWAY_API_KEY to .env.local, but it's not clear:

  • How to obtain this key
  • Whether it's required in all environments
  • Fallback behavior if not configured

Recommendation: Add comprehensive documentation about environment variables and setup process.


🎯 Recommendations Summary

Must Fix Before Merge:

  1. Fix variable shadowing bug in embeddings/route.ts
  2. Implement proper credit deduction for streaming responses
  3. Add input validation for multimodal content

Should Fix Soon:

  1. Add unit tests for critical paths
  2. Implement proper token counting (consider tiktoken)
  3. Add rate limiting
  4. Cache model pricing data
  5. Add input size limits

Nice to Have:

  1. Improve error type safety
  2. Standardize logging
  3. Extract magic numbers to constants
  4. Add comprehensive JSDoc comments

🎉 Conclusion

This is a well-architected implementation with good separation of concerns and proper authentication. The main concerns are around streaming credit deduction race conditions and missing input validation.

Once the critical issues are addressed, this will be a solid foundation for OpenAI proxy functionality.

Overall Assessment: ⚠️ Approve with Changes Required

Great work on this feature! Let me know if you'd like me to elaborate on any of these points or help implement the fixes.


Review generated with attention to code quality, security, and best practices for production systems.

Renames the 'provider' variable to 'providerName' and 'gatewayProvider' for clarity and consistency in the embeddings API route. Updates all relevant references to improve code readability and prevent confusion between provider name and provider instance.

claude bot commented Oct 13, 2025

Pull Request Review: OpenAI-Compatible Proxy with Vercel AI Gateway

Thank you for this comprehensive implementation! This is a well-structured PR that adds significant functionality. I've conducted a thorough review covering code quality, security, performance, and potential issues.


Overall Assessment

Strong Points:

  • Clean architecture with proper separation of concerns
  • OpenAI-compatible API implementation
  • Comprehensive error handling
  • Good logging throughout
  • Database transaction safety for credit deductions

Areas for Improvement:
Security concerns, race conditions in streaming, and some edge cases that need attention.


Critical Issues

1. Race Condition in Streaming Credit Deduction (app/api/v1/chat/completions/route.ts:300-460)

Severity: High

The streaming implementation has a critical race condition where credits are deducted AFTER the response has been sent to the client.

Problems:

  • User receives full response before payment
  • If credit deduction fails, service was provided for free
  • No way to stop the stream if credits become insufficient mid-stream
  • Users could exploit this by canceling requests before deduction completes

Recommendation:
Pre-deduct estimated credits before streaming (using estimateRequestCost), then reconcile after completion with a refund/additional charge. This matches the non-streaming flow.

2. Missing Function Call Validation (app/api/v1/chat/completions/route.ts:61-95)

Severity: Medium

The validation logic checks for function_call but the TypeScript types don't include it. The types only define tool_calls but the OpenAI API also supports the legacy function_call field.

Recommendation:
Add function_call to the OpenAIChatMessage interface in lib/providers/types.ts

3. Token Estimation Accuracy (lib/pricing.ts:117-119)

Severity: Medium

The simple character-based token estimation (text.length / 4) is quite inaccurate:

  • GPT models use different tokenization than "4 chars per token"
  • Multimodal content (images) is just stringified, leading to massive underestimation
  • Code and special characters tokenize differently than natural language

Recommendation:
Consider using a proper tokenization library like tiktoken or js-tiktoken for accurate token counting.


High Priority Issues

4. Unsafe Error Object Throwing (lib/providers/vercel-gateway.ts:60-93)

Throwing plain objects instead of Error instances breaks stack traces and error tracking tools.

Recommendation:
Create a custom GatewayError class that extends Error.
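Such a class might look like the sketch below; the payload shape follows the OpenAI-style error objects used elsewhere in this PR, and the field names are illustrative.

```typescript
// Error subclass so gateway failures carry a real stack trace while still
// exposing the structured OpenAI-style payload for the response body.
class GatewayError extends Error {
  constructor(
    public readonly status: number,
    public readonly payload: { message: string; type?: string; code?: string }
  ) {
    super(payload.message);
    this.name = "GatewayError";
  }
}
```

Callers can then use `instanceof GatewayError` instead of the permissive structural checks flagged above.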

5. Silent Analytics Failures (app/api/v1/chat/completions/route.ts:233-290)

Analytics errors are logged but swallowed. Consider:

  • Alerting on repeated failures (might indicate DB issues)
  • Queuing failed analytics for retry
  • Monitoring the error rate

6. Insufficient Buffer for Embeddings (app/api/v1/embeddings/route.ts:84)

The 50% buffer might not be sufficient for embeddings. Consider model-specific buffers or increase to 100%.


Code Quality & Best Practices

7. Missing Input Validation

Some endpoints lack size/rate limit validation:

  • No max message count validation
  • No max tokens validation
  • No max content length validation
  • No max array length for batch embeddings

Recommendation:
Add validation for max messages (100), max message length (32000), max embeddings batch (2048).

8. Hard-coded Fallback Pricing (lib/pricing.ts:59-84)

The hard-coded fallback pricing values should be:

  • Documented with the date they were accurate
  • Monitored and updated regularly
  • Possibly configured via environment variables

Security Considerations

9. API Key Exposure in Logs

Ensure logger implementations don't accidentally log sensitive data. Verify that:

  • API keys are redacted in logs
  • Message content is handled appropriately (might contain PII)
  • Error objects don't expose internal details to users

10. Rate Limiting

This implementation doesn't include rate limiting. Consider adding:

  • Per-API-key rate limits
  • Per-organization rate limits
  • Per-IP rate limits for unauthenticated endpoints

11. Model Access Control

All authenticated users can access all models. Consider:

  • Organization-level model access controls
  • Usage quotas per model
  • Premium model restrictions

Performance Considerations

12. Database Queries in Hot Path

Every request does multiple DB queries. Consider:

  • Cache model pricing (rarely changes)
  • Consider optimistic concurrency for credit balance
  • Batch analytics writes

13. Streaming Parser Overhead (app/api/v1/chat/completions/route.ts:338-366)

The streaming handler parses every chunk to extract usage data. Issues:

  • Repeated parsing in hot loop
  • Try-catch in loop has performance overhead
  • Splitting on every chunk might split mid-line

Recommendation:
Use a proper SSE parser or buffer incomplete lines.
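The line-buffering idea can be sketched as follows: keep the trailing partial line across chunks so `JSON.parse` never sees a split event. The function name and callback are illustrative.

```typescript
// Feed raw chunks in; complete, non-empty lines come out via the callback.
function makeSseLineFeed(onLine: (line: string) => void) {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // last element may be an incomplete line
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed) onLine(trimmed);
    }
  };
}
```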


Testing Concerns

14. Test Coverage

The test script is excellent! However, missing test cases for:

  • Error scenarios (invalid API key, insufficient credits, malformed requests)
  • Edge cases (empty messages, null content, very long inputs)
  • Tool/function calling
  • Multimodal content (images)
  • Concurrent request handling

What's Done Well

  1. Excellent Error Handling Structure - Proper OpenAI-compatible error format
  2. Good Logging - Comprehensive logging throughout
  3. Transaction Safety - Credit deduction uses database transactions
  4. Type Safety - Strong TypeScript typing
  5. Provider Abstraction - Clean provider interface allows future extensions
  6. Middleware Integration - Proper authentication flow
  7. Comprehensive Test Script - Great starting point for testing

Recommended Action Items (Priority Order)

  1. Fix streaming credit deduction race condition (Critical)
  2. Add function_call to TypeScript types (Quick fix)
  3. Improve token estimation accuracy (or document limitations)
  4. Add input validation (size limits)
  5. Implement proper error classes
  6. Add rate limiting
  7. Cache model pricing
  8. Add error scenario tests
  9. Document API and configuration
  10. Add monitoring/alerting for analytics failures

Summary

This is a solid implementation with good architecture and error handling. The main concerns are around the streaming credit deduction and token estimation accuracy, both of which could lead to billing issues. With the recommended fixes, this will be production-ready.

Recommendation: Approve with changes - Fix the critical streaming issue before merging.

Great work overall! The code is clean, well-structured, and demonstrates good understanding of the OpenAI API specification.

@wtfsayo wtfsayo merged commit a975090 into main Oct 13, 2025
4 checks passed
lalalune pushed a commit that referenced this pull request Dec 13, 2025
feat: implement OpenAI-compatible proxy with Vercel AI Gateway