feat: implement OpenAI-compatible proxy with Vercel AI Gateway #19
Conversation
- Add chat completions endpoint (/api/v1/chat/completions)
- Add embeddings endpoint (/api/v1/embeddings)
- Add models list endpoint (/api/v1/models)
- Add model-specific endpoint (/api/v1/models/{model})
- Update middleware to allow new proxy endpoints
- Add provider configuration and routing logic
- Add test script for OpenAI proxy functionality
- Update environment variables documentation
…tion
- Fix apiKey type error in chat completions (undefined to null conversion)
- Remove unused encoder variable in streaming handler
- Remove unused catch variable parameter
- Fix readonly array type in usage chart YAxis domain
- Fix unknown type errors in test scripts
- Exclude scripts folder from tsconfig to prevent build-time type checking
Critical Fixes:
- Fix credit deduction race condition by checking credits BEFORE API calls
- Add pre-flight credit validation for both chat completions and embeddings
- Fix hardcoded embedding pricing to use proper calculateCost function
- Add fallback token estimation for streaming when usage data is missing
- Fix provider detection logic to handle provider-prefixed models (e.g., 'openai/gpt-4o')
- Add comprehensive input validation for all endpoints

Improvements:
- Add normalizeModelName() helper to handle model name variations
- Add estimateTokens() and estimateRequestCost() for pre-flight checks
- Add checkSufficientCredits() for reusable credit validation
- Return 402 status code for insufficient credits
- Log warnings when credit deduction fails after API calls
- Store prompt messages in streaming mode for better audit trails

Security & Code Quality:
- Remove API key logging from test script (security issue)
- Replace console.error with logger.error for consistency
- Add proper error handling for insufficient credits
- Validate empty messages arrays and null content

All changes address critical billing/revenue issues identified in PR review.
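The pre-flight check described in this commit might look roughly like the sketch below. The helper names `estimateTokens()` and `checkSufficientCredits()` come from the commit message, but their bodies here are illustrative assumptions, not the PR's actual code.

```typescript
// Sketch only: bodies are assumptions. estimateTokens() uses the simple
// 4-characters-per-token heuristic discussed later in this review thread.
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

export function checkSufficientCredits(
  balance: number,
  estimatedCost: number,
  buffer = 0.5, // safety margin applied on top of the estimate
): boolean {
  // Gate the upstream API call; callers return a 402 when this is false.
  return balance >= estimatedCost * (1 + buffer);
}
```

The point of the buffer is that a character-based estimate can undershoot real token usage, so the check is deliberately conservative.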
Critical Fixes:
- Make non-streaming credit deduction synchronous (prevents revenue loss)
- Return 402 error if credit deduction fails after API call
- Increase credit buffer from 20% to 50% for safety margin
- Fix message validation to allow null content for tool/function calls
- Fix type safety for multimodal content in estimateRequestCost

Error Handling Improvements:
- Add structured error propagation from Vercel Gateway
- Properly parse and forward OpenAI-formatted errors with status codes
- Add timeout handling (60s for chat/embeddings, 30s for models)
- Use AbortController for all gateway requests

Performance & Reliability:
- Only analytics (usage records, generation records) run in background
- Critical billing operations (credit deduction) are synchronous
- Better error messages for timeout scenarios
- Proper cleanup of timeout handlers

This addresses all critical issues from PR review:
- Issue #1: Synchronous credit deduction ✅
- Issue #2: Increased credit buffer ✅
- Issue #3: Tool call message validation ✅
- Issue #4: Multimodal content type safety ✅
- Issue #5: Gateway error propagation ✅
- Issue #6: Timeout handling ✅
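The AbortController-with-cleanup pattern this commit describes could look roughly like the following. This is a sketch; `withAbortTimeout` is an assumed helper name, not code from the PR.

```typescript
// Sketch: run a gateway call under an AbortController-based timeout and
// always clear the timer afterwards (the "proper cleanup" noted above).
function withAbortTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  return run(controller.signal).finally(() => clearTimeout(timer));
}

// Illustrative usage with the timeouts named in the commit
// (60s for chat/embeddings, 30s for model listing):
// const res = await withAbortTimeout(
//   (signal) => fetch(gatewayUrl, { method: "POST", signal }),
//   60_000,
// );
```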
@claude final review
Claude finished @wtfsayo's task: Final Review Complete ✅

I've conducted a comprehensive review of your OpenAI-compatible proxy implementation. This is a solid PR with well-structured code. Below are my findings organized by severity.

🎯 Overall Assessment

Strong Points:
Areas for Improvement:

🔴 Critical Issues

1. Race Condition in Streaming Credit Deduction

Location: The streaming response returns immediately to the client (line 477), but credit deduction happens asynchronously in the background (lines 410-423). If deduction fails, the user has already received the response.

Problem:

```typescript
// Returns immediately at line 477
return new Response(readable, { ... });

// But credit deduction happens later (lines 410-423)
const deductResult = await deductCredits(...);
if (!deductResult.success) {
  logger.error(...); // User already got the response!
}
```

Impact: Users could receive free service if credits run out during streaming.

Recommendation: Consider one of these approaches:
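One candidate approach is a reserve-then-settle pattern: hold the estimated cost before the stream starts, then settle against actual usage when it ends. The sketch below is illustrative; the class, method names, and in-memory store are not from the PR.

```typescript
type Reservation = { userId: string; amount: number };

class CreditLedger {
  private balances = new Map<string, number>();

  constructor(initial: Record<string, number> = {}) {
    for (const [user, bal] of Object.entries(initial)) this.balances.set(user, bal);
  }

  balance(userId: string): number {
    return this.balances.get(userId) ?? 0;
  }

  // Hold the estimated cost up front, before any tokens are streamed.
  reserve(userId: string, estimated: number): Reservation | null {
    const bal = this.balance(userId);
    if (bal < estimated) return null; // maps to a 402 before streaming begins
    this.balances.set(userId, bal - estimated);
    return { userId, amount: estimated };
  }

  // When the stream finishes (or aborts), settle against actual usage
  // and refund the unused portion of the hold.
  settle(res: Reservation, actualCost: number): void {
    const refund = Math.max(0, res.amount - actualCost);
    this.balances.set(res.userId, this.balance(res.userId) + refund);
  }
}
```

With this shape, a mid-stream disconnect can never yield free tokens: the hold was taken before the first chunk was sent, and at worst the user is refunded less than the full hold.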
2. Inconsistent Error Response for Embeddings Credit Deduction

Location: Credits are deducted in a background async block, so failures are only logged, not returned to the user. Unlike chat completions (line 245 in chat/completions/route.ts), embeddings always return success even if credit deduction fails.

Problem:

```typescript
(async () => {
  const deductResult = await deductCredits(...);
  if (!deductResult.success) {
    logger.error(...); // Only logged, response already sent
  }
})().catch(() => {});
```

Impact: Free service when credits are insufficient.

Recommendation: Move credit deduction before the response is sent.

🟡 High Priority Issues

3. Missing Type Safety in Provider Error Handling

Location: The error type guard is too permissive and uses runtime checks instead of proper TypeScript types.

Current:

```typescript
if (error && typeof error === 'object' && 'error' in error && 'status' in error) {
  const gatewayError = error as { status: number; error: { message: string; type?: string; code?: string } };
```

Recommendation: Define a proper type guard:

```typescript
interface GatewayError {
  status: number;
  error: {
    message: string;
    type?: string;
    code?: string;
  };
}

function isGatewayError(error: unknown): error is GatewayError {
  return (
    typeof error === 'object' &&
    error !== null &&
    'status' in error &&
    typeof error.status === 'number' &&
    'error' in error &&
    typeof error.error === 'object'
  );
}
```

4. Inconsistent Logging Usage

Location: This file uses `console.error` while the rest of the codebase uses `logger`.

Current:

```typescript
console.error("Error fetching models:", error);
```

Should be:

```typescript
logger.error("Error fetching models:", error);
```

5. Missing Input Validation for Token Limits

Location: The validation checks for required fields but doesn't validate:
Recommendation: Add validation:

```typescript
if (request.max_tokens && request.max_tokens > MAX_TOKENS_LIMIT) {
  return Response.json({
    error: {
      message: `max_tokens cannot exceed ${MAX_TOKENS_LIMIT}`,
      type: "invalid_request_error",
      param: "max_tokens",
      code: "invalid_value",
    }
  }, { status: 400 });
}
```

🟠 Medium Priority Issues

6. Potential Memory Issue with Streaming Content Buffer

Location: The streamed content accumulates in memory without a bound.

Current:

```typescript
let fullContent = "";
// ...
if (content) {
  fullContent += content; // Unbounded accumulation
}
```

Recommendation: Add a content length limit or implement chunked storage for very long responses.

7. Token Estimation Could Be More Accurate

Location: The token estimation uses a simple 4-character-per-token heuristic, which is inaccurate for many cases.

Current:

```typescript
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```

Recommendation: Consider using a more accurate tokenizer.

8. Timeout Consistency Issues

Location: Multiple files

Different timeout values across similar operations:
Recommendation: Consider defining timeout constants in a shared config file for consistency.

9. Missing Rate Limiting Headers

Location: All API routes

OpenAI's API returns rate limit headers (`x-ratelimit-limit-requests`, `x-ratelimit-remaining-requests`); this proxy does not. Recommendation: Consider adding rate limit headers to help clients implement proper backoff strategies:

```typescript
return Response.json(data, {
  headers: {
    'x-ratelimit-limit-requests': '...',
    'x-ratelimit-remaining-requests': '...',
  }
});
```

🟢 Minor Issues & Suggestions

10. Unused Import

Location:
11. Inconsistent Provider String

Location: Multiple files

The provider string is recorded inconsistently, sometimes as the gateway and sometimes as the upstream provider. Example:
Recommendation: Be consistent: either always use "vercel-gateway" since that's the actual provider, or use the upstream provider (openai/anthropic) for pricing purposes and "vercel-gateway" for routing.

12. Environment Variable Documentation

13. Test Script Robustness

Location: The multi-provider test only tests two models. Consider testing more providers to ensure full compatibility:
14. Missing JSDoc Comments

Consider adding JSDoc comments to exported functions.

15. Type Narrowing for Message Content

Location: The message content type:

```typescript
content: string | Array<
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string; detail?: "auto" | "low" | "high" } }
>;
```

📝 Code Quality Observations

Excellent Practices Observed:
Test Script Quality:

The test script:
🎬 Recommendations for Next Steps

Before Merging:
Post-Merge Improvements:
📊 Summary
This is a well-architected feature with good separation of concerns and comprehensive testing. The critical issues around credit deduction need to be resolved before merging to prevent potential revenue loss. Once those are fixed, this will be a solid OpenAI-compatible proxy implementation. Great work on the overall structure and completeness! 🎉

[Job Run](https://github.com/elizaOS/eliza-cloud-v2/actions/runs/${{ github.run_id }}) | Branch: feat/openai-proxy-vercel-gateway
…outes

- Add shared provider factory in lib/providers
- Extend AIProvider to support embeddings and models
- Implement embeddings, listModels, getModel in VercelGatewayProvider
- Refactor chat/completions, embeddings, and models routes to use provider
- Keep non-OpenAI routes untouched
Pull Request Review: OpenAI-Compatible Proxy Implementation

Thank you for this comprehensive implementation! This PR adds a solid OpenAI-compatible proxy layer. Here's my detailed review:

✅ Strengths

Architecture & Design
Security & Authentication
Business Logic
🔴 Critical Issues

1. Race Condition in Streaming Credit Deduction

Location: app/api/v1/chat/completions/route.ts:306-416

Issue: In streaming responses, credits are deducted after the stream completes in an async background task. If the user disconnects or the process crashes, credits may never be deducted.

```typescript
// Current code (line 306)
(async () => {
  // ... streaming logic ...
  // Credit deduction happens here AFTER streaming
  const deductResult = await deductCredits(...);
})();

// Return streaming response immediately (line 463)
return new Response(readable, { ... });
```

Risk: Users could abuse this by rapidly disconnecting during streams to get free tokens.

Recommendation:
2. Missing Input Validation for Message Content

Location: app/api/v1/chat/completions/route.ts:63-95

Issue: While the code validates that messages have content OR tool_calls, it doesn't validate the structure of multimodal content arrays.

```typescript
// Missing validation for:
content: [
  { type: "text", text: "..." },
  { type: "image_url", image_url: { url: "..." } }
]
```

Risk: Malformed multimodal content could crash the endpoint or cause unexpected token estimation errors.

Recommendation: Add schema validation for multimodal content structure, especially for image URLs and token estimation.

3. Variable Shadowing Bug

Location: app/api/v1/embeddings/route.ts:111

Issue: The variable `provider` is declared twice: once from getProviderFromModel (line 68) and again from getProvider (line 111).

```typescript
const provider = getProviderFromModel(request.model); // line 68
// ...
const provider = getProvider(); // line 111 - SHADOWING!
```

Risk: This will cause a compilation error or runtime issues. The second declaration shadows the first, breaking the pricing calculation.

Fix: Rename one of the variables:

```typescript
const providerName = getProviderFromModel(request.model);
// ...
const providerInstance = getProvider();
```
Renames the 'provider' variable to 'providerName' and 'gatewayProvider' for clarity and consistency in the embeddings API route. Updates all relevant references to improve code readability and prevent confusion between provider name and provider instance.
Pull Request Review: OpenAI-Compatible Proxy with Vercel AI Gateway

Thank you for this comprehensive implementation! This is a well-structured PR that adds significant functionality. I've conducted a thorough review covering code quality, security, performance, and potential issues.

Overall Assessment

Strong Points:
Areas for Improvement:

Critical Issues

1. Race Condition in Streaming Credit Deduction (app/api/v1/chat/completions/route.ts:300-460)

Severity: High

The streaming implementation has a critical race condition where credits are deducted AFTER the response has been sent to the client. Problems:
Recommendation:

2. Missing Function Call Validation (app/api/v1/chat/completions/route.ts:61-95)

Severity: Medium

The validation logic checks for function_call but the TypeScript types don't include it. The types only define tool_calls, but the OpenAI API also supports the legacy function_call field.

Recommendation:

3. Token Estimation Accuracy (lib/pricing.ts:117-119)

Severity: Medium

The simple character-based token estimation (text.length / 4) is quite inaccurate:
Recommendation:

High Priority Issues

4. Unsafe Error Object Throwing (lib/providers/vercel-gateway.ts:60-93)

Throwing plain objects instead of Error instances breaks stack traces and error tracking tools.

Recommendation:

5. Silent Analytics Failures (app/api/v1/chat/completions/route.ts:233-290)

Analytics errors are logged but swallowed. Consider:
6. Insufficient Buffer for Embeddings (app/api/v1/embeddings/route.ts:84)

The 50% buffer might not be sufficient for embeddings. Consider model-specific buffers or increase to 100%.

Code Quality & Best Practices

7. Missing Input Validation

Some endpoints lack size/rate limit validation:
Recommendation:

8. Hard-coded Fallback Pricing (lib/pricing.ts:59-84)

The hard-coded fallback pricing values should be:
Security Considerations

9. API Key Exposure in Logs

Ensure logger implementations don't accidentally log sensitive data. Verify that:
10. Rate Limiting

This implementation doesn't include rate limiting. Consider adding:
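A minimal per-key token bucket is one common starting point. This is a sketch only; capacity and refill parameters are illustrative, not values from the PR.

```typescript
// Sketch: a token-bucket limiter. Each key (e.g. an API key) gets a bucket
// that refills continuously; a request is allowed if a token is available.
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
    this.last = Date.now();
  }

  tryRemove(count = 1): boolean {
    const now = Date.now();
    const elapsedSec = (now - this.last) / 1000;
    // Refill proportionally to elapsed time, capped at capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = now;
    if (this.tokens < count) return false;
    this.tokens -= count;
    return true;
  }
}
```

A per-key map of buckets, checked before the gateway call, would also make it straightforward to populate the rate-limit response headers discussed elsewhere in this review.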
11. Model Access Control

All authenticated users can access all models. Consider:
Performance Considerations

12. Database Queries in Hot Path

Every request does multiple DB queries. Consider:
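Caching hot lookups (such as API-key validation) behind a short TTL is one way to cut per-request queries. The sketch below is illustrative; the class name and eviction strategy are assumptions, not the PR's code.

```typescript
// Sketch: a tiny in-memory TTL cache for per-request lookups.
// Entries expire after ttlMs and are lazily evicted on read.
class TtlCache<K, V> {
  private store = new Map<K, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  get(key: K): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // lazy eviction
      return undefined;
    }
    return entry.value;
  }

  set(key: K, value: V): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

The trade-off is staleness: a short TTL (a few seconds) bounds how long a revoked key or changed balance snapshot can be served from cache.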
13. Streaming Parser Overhead (app/api/v1/chat/completions/route.ts:338-366)

The streaming handler parses every chunk to extract usage data. Issues:
Recommendation:

Testing Concerns

14. Test Coverage

The test script is excellent! However, missing test cases for:
What's Done Well
Recommended Action Items (Priority Order)
Summary

This is a solid implementation with good architecture and error handling. The main concerns are around the streaming credit deduction and token estimation accuracy, both of which could lead to billing issues. With the recommended fixes, this will be production-ready.

Recommendation: Approve with changes. Fix the critical streaming issue before merging.

Great work overall! The code is clean, well-structured, and demonstrates good understanding of the OpenAI API specification.
feat: implement OpenAI-compatible proxy with Vercel AI Gateway
Summary
This PR implements an OpenAI-compatible proxy layer using Vercel AI Gateway, allowing the application to act as a drop-in replacement for OpenAI's API.
Changes
- Chat completions (`/api/v1/chat/completions`)
- Embeddings (`/api/v1/embeddings`)
- Models list (`/api/v1/models`)
- Model details (`/api/v1/models/{model}`)

Key Features
Testing
Run the test script:
Environment Variables
Add to your `.env.local`: