Retry upon Azure GPT-4 model overloaded requests error (#23)

- Upgrade to the latest Next.js and migrate the API code to the App Router
- Retry on the `That model is currently overloaded with other requests` error
- Revise and update the README documentation for clarity and more detailed instructions
- Significantly update the Next.js and TypeScript configuration

Resolves #21 #22
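The retry item above can be pictured as a small generic helper. This is a minimal sketch under stated assumptions, not the code in this commit. `retryWithDelay` and `isFinalStatus` are hypothetical names. The defaults (3 retries, a fixed 1000 ms delay) come from `MAX_RETRY_COUNT` and `RETRY_DELAY` in `app/v1/chat/completions/route.ts`. So does the rule that a status below 300, or exactly 400, is returned immediately.

```typescript
// Hypothetical helper, not part of this commit: retries an async attempt with a
// fixed delay. Defaults mirror MAX_RETRY_COUNT = 3 and RETRY_DELAY = 1000 in route.ts.
function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

// Same rule as the route: success (< 300) and 400 Bad Request are never retried
function isFinalStatus(status: number): boolean {
  return status < 300 || status === 400
}

async function retryWithDelay<T extends { status: number }>(
  attempt: () => Promise<T>,
  maxRetries = 3,
  delayMs = 1000
): Promise<T> {
  let retries = 0
  while (true) {
    const result = await attempt()
    // Return a final result, or the last result once the retry budget is spent
    if (isFinalStatus(result.status) || retries >= maxRetries) {
      return result
    }
    retries++
    console.log(`Status ${result.status}, retry ${retries}/${maxRetries}`)
    await delay(delayMs)
  }
}
```

With these defaults, an upstream that keeps answering 500 is called four times in total: the first attempt plus three retries. Its last response is then returned. A fixed delay keeps the sketch faithful to the route; exponential backoff would be a straightforward variation.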
scalaone · Jul 24, 2023 · 055d02a
1 parent 79f2953

Showing 23 changed files with 1,770 additions and 5,620 deletions.
diff --git a/.eslintrc.json b/.eslintrc.json
diff --git a/.vscode/extensions.json b/.vscode/extensions.json
diff --git a/.vscode/settings.json b/.vscode/settings.json
diff --git a/README.en-US.md b/README.en-US.md
@@ -2,31 +2,37 @@
 
 [English](./README.en-US.md) | Simplified Chinese
 
-Azure OpenAI Proxy is a tool that transforms OpenAI API requests into Azure OpenAI API requests. This allows applications that are compatible only with OpenAI to use Azure Open AI seamlessly.
+Azure OpenAI Proxy is a tool that transforms OpenAI API requests into Azure OpenAI API requests, allowing OpenAI-compatible applications to seamlessly use Azure OpenAI.
 
 ## Prerequisites
 
-To use Azure OpenAI Proxy, you need an Azure OpenAI account.
+An Azure OpenAI account is required to use Azure OpenAI Proxy.
 
 ## Azure Deployment
 
 [![Deploy to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fscalaone%2Fazure-openai-proxy%2Fmain%2Fdeploy%2Fazure-deploy.json)
 
+Please note:
+
+- Select the region that matches your Azure OpenAI resource for best performance.
+- If deployment fails because the 'proxywebapp' name is already taken, change the resource prefix and redeploy.
+- The deployed proxy app is part of a B1 pricing tier Azure web app plan, which can be modified in the Azure Portal after deployment.
+
 ## Docker Deployment
 
-Run the following command to deploy using Docker:
+To deploy using Docker, execute the following command:
 
 `docker run -d -p 3000:3000 scalaone/azure-openai-proxy`
 
 ## Local Execution and Testing (Command Line)
 
-Follow the steps below:
+Follow these steps:
 
 1. Install NodeJS 18.
 2. Clone the repository in the command line window.
 3. Run `npm install` to install the dependencies.
 4. Run `npm start` to start the application.
-5. Use the script below for testing. Replace `AZURE_RESOURCE_ID`, `AZURE_MODEL_DEPLOYMENT`, and `AZURE_API_KEY` before executing. `AZURE_API_VERSION` is optional, its default value is `2023-05-15`.
+5. Use the script below for testing. Replace `AZURE_RESOURCE_ID`, `AZURE_MODEL_DEPLOYMENT`, and `AZURE_API_KEY` before running. `AZURE_API_VERSION` is optional and defaults to `2023-05-15`.
 
 <details>
 <summary>Test script</summary>
@@ -60,44 +66,44 @@ The azure-openai-proxy has been tested and confirmed to work with the following
 
 | Application Name                                                | Docker-compose File                                             |
 | --------------------------------------------------------------- | --------------------------------------------------------------- |
-| [chatbot-ui](https://github.com/mckaywrigley/chatbot-ui)        | [docker-compose.yml](./e2e/chatbot-ui/docker-compose.yml)       |
+| [chatgpt-lite](https://github.com/blrchen/chatgpt-lite)         | [docker-compose.yml](./e2e/chatgpt-lite/docker-compose.yml)     |
 | [chatgpt-next-web](https://github.com/Yidadaa/ChatGPT-Next-Web) | [docker-compose.yml](./e2e/chatgpt-next-web/docker-compose.yml) |
+| [chatbot-ui](https://github.com/mckaywrigley/chatbot-ui)        | [docker-compose.yml](./e2e/chatbot-ui/docker-compose.yml)       |
 | [chatgpt-web](https://github.com/Chanzhaoyu/chatgpt-web)        | [docker-compose.yml](./e2e/chatgpt-web/docker-compose.yml)      |
-| [chatgpt-lite](https://github.com/blrchen/chatgpt-lite)         | [docker-compose.yml](./e2e/chatgpt-lite/docker-compose.yml)     |
 | [chatgpt-minimal](https://github.com/blrchen/chatgpt-minimal)   | [docker-compose.yml](./e2e/chatgpt-minimal/docker-compose.yml)  |
 
 To test locally, follow these steps:
 
 1. Clone the repository in a command-line window.
-2. Update the `OPENAI_API_KEY` environment variable with `AZURE_RESOURCE_ID:AZURE_MODEL_DEPLOYMENT:AZURE_API_KEY`. Alternatively, update the OPENAI_API_KEY value directly in the docker-compose.yml file.
+2. Set the `OPENAI_API_KEY` environment variable to `AZURE_RESOURCE_ID:AZURE_MODEL_DEPLOYMENT:AZURE_API_KEY`. Alternatively, set the `OPENAI_API_KEY` value directly in the docker-compose.yml file.
 3. Navigate to the directory containing the `docker-compose.yml` file for the application you want to test.
-4. Run the build command: `docker-compose build`.
+4. Execute the build command: `docker-compose build`.
 5. Start the service: `docker-compose up -d`.
-6. Launch the application locally using the exposed port defined in the docker-compose.yml file. For example, visit http://localhost:3000.
+6. Access the application locally using the port defined in the docker-compose.yml file. For example, visit http://localhost:3000.
 
 ## FAQs
 
 <details>
 <summary>Q: What are `AZURE_RESOURCE_ID`, `AZURE_MODEL_DEPLOYMENT`, and `AZURE_API_KEY`?</summary>
 
-A: You can find these in the Azure management portal. Refer to the image below for details:
+A: These can be found in the Azure management portal. See the image below for reference:
 
 ![resource-and-model](./resource-and-model.jpg)
 
 </details>
 
 <details>
-<summary>Q: How can I use GPT-4?</summary>
+<summary>Q: How can I use gpt-4 and gpt-4-32k models?</summary>
 
-A: To use GPT-4, use the key format as follows:
+A: To use gpt-4 and gpt-4-32k models, follow the key format below:
 
 `AZURE_RESOURCE_ID:gpt-3.5-turbo|AZURE_MODEL_DEPLOYMENT,gpt-4|AZURE_MODEL_DEPLOYMENT,gpt-4-32k|AZURE_MODEL_DEPLOYMENT:AZURE_API_KEY:AZURE_API_VERSION`
 
 </details>
 
 ## Contributing
 
-We welcome various PR submissions.
+We welcome all PR submissions.
 
 ## Disclaimer
 

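The key format in the FAQ above packs four things into one colon-separated string: the resource, the model-to-deployment mapping, the API key, and an optional API version. The sketch below shows one way to parse it. `parseProxyKey`, `azureChatUrl`, and the `ProxyKey` shape are hypothetical names. The splitting rules, the fallback to the first mapped deployment, and the default version `2023-05-15` are modeled on `chat()` in `app/v1/chat/completions/route.ts` from this commit. The up-front validation is an addition that the route does not perform.

```typescript
// Hypothetical parser for the key format in the FAQ:
//   RESOURCE_ID:MODEL|DEPLOYMENT,MODEL|DEPLOYMENT:API_KEY[:API_VERSION]
// or, with a single deployment, RESOURCE_ID:DEPLOYMENT:API_KEY[:API_VERSION]
const DEFAULT_API_VERSION = '2023-05-15'

interface ProxyKey {
  resourceId: string
  deployments: Record<string, string> // model name -> deployment name
  fallbackDeployment: string
  apiKey: string
  apiVersion: string
}

function parseProxyKey(key: string): ProxyKey {
  const [resourceId, mapping, apiKey, apiVersion] = key.split(':')
  // Added validation; route.ts does not check for missing parts
  if (!resourceId || !mapping || !apiKey) {
    throw new Error('Expected RESOURCE_ID:DEPLOYMENT_OR_MAPPING:API_KEY[:API_VERSION]')
  }
  let deployments: Record<string, string> = {}
  let fallbackDeployment = mapping
  if (mapping.includes('|')) {
    // "gpt-4|dep4,gpt-4-32k|dep32k" -> { 'gpt-4': 'dep4', 'gpt-4-32k': 'dep32k' }
    deployments = Object.fromEntries(
      mapping.split(',').map((pair) => pair.split('|') as [string, string])
    )
    // As in route.ts, unknown models fall back to the first mapped deployment
    fallbackDeployment = Object.values(deployments)[0]
  }
  return {
    resourceId,
    deployments,
    fallbackDeployment,
    apiKey,
    apiVersion: apiVersion || DEFAULT_API_VERSION
  }
}

// Builds the Azure chat-completions URL the same way chat() in route.ts does
function azureChatUrl(key: ProxyKey, model: string): string {
  const deployment = key.deployments[model] || key.fallbackDeployment
  return (
    `https://${key.resourceId}.openai.azure.com/openai/deployments/${deployment}` +
    `/chat/completions?api-version=${key.apiVersion}`
  )
}
```

For example, `azureChatUrl(parseProxyKey('myres:gpt-4|dep4:secret'), 'gpt-4')` yields `https://myres.openai.azure.com/openai/deployments/dep4/chat/completions?api-version=2023-05-15`.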
diff --git a/README.md b/README.md
@@ -2,17 +2,23 @@
 
 [English](./README.en-US.md) | Simplified Chinese
 
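-Azure OpenAI Proxy is an OpenAI API proxy tool that converts OpenAI API requests into Azure OpenAI API requests, so that applications that only support OpenAI can use Azure OpenAI seamlessly.
+Azure OpenAI Proxy is a proxy tool for the OpenAI API. It converts OpenAI API requests into Azure OpenAI API requests, letting applications that only support OpenAI use Azure OpenAI seamlessly.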
 
-## Requirements
+## Prerequisites
 
-An Azure OpenAI account is required to use Azure OpenAI Proxy.
+You need an Azure OpenAI account to use Azure OpenAI Proxy.
 
 ## Azure Deployment
 
 [![Deploy to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fscalaone%2Fazure-openai-proxy%2Fmain%2Fdeploy%2Fazure-deploy.json)
 
-## Docker Deployment
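+Please note:
+
+- Select the region that matches your Azure OpenAI resource for the best performance.
+- If deployment fails because the 'proxywebapp' name is already taken, simply change the resource prefix and redeploy.
+- The deployed proxy app runs under a B1 pricing tier Azure web app plan, which you can update in the Azure Portal after deployment.
+
+## Docker Deployment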
 
 ```bash
 docker run -d -p 3000:3000 scalaone/azure-openai-proxy
@@ -24,7 +30,7 @@ docker run -d -p 3000:3000 scalaone/azure-openai-proxy
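 2. Clone the repository in a command-line window.
 3. Run `npm install` to install the dependencies.
 4. Run `npm start` to start the application.
-5. Run the script below to test. Before running it, replace `AZURE_RESOURCE_ID`, `AZURE_MODEL_DEPLOYMENT`, `AZURE_API_KEY`, `AZURE_API_VERSION`. The `AZURE_API_VERSION` parameter is optional and currently defaults to `2023-05-15`.
+5. Run the script below to test. Before running it, replace `AZURE_RESOURCE_ID`, `AZURE_MODEL_DEPLOYMENT`, `AZURE_API_KEY`, and `AZURE_API_VERSION`. The `AZURE_API_VERSION` parameter is optional and defaults to `2023-05-15`.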
 
 <details>
 <summary>Test script</summary>
@@ -56,12 +62,12 @@ curl -X "POST" "http://localhost:3000/v1/chat/completions" \
 
 The following applications have been tested and confirmed to work with azure-openai-proxy:
 
-| App Name                                                        | E2E Docker-compose file                                         |
+| Application Name                                                | E2E Test Docker-compose File                                    |
 | --------------------------------------------------------------- | --------------------------------------------------------------- |
-| [chatbot-ui](https://github.com/mckaywrigley/chatbot-ui)        | [docker-compose.yml](./e2e/chatbot-ui/docker-compose.yml)       |
+| [chatgpt-lite](https://github.com/blrchen/chatgpt-lite)         | [docker-compose.yml](./e2e/chatgpt-lite/docker-compose.yml)     |
 | [chatgpt-next-web](https://github.com/Yidadaa/ChatGPT-Next-Web) | [docker-compose.yml](./e2e/chatgpt-next-web/docker-compose.yml) |
+| [chatbot-ui](https://github.com/mckaywrigley/chatbot-ui)        | [docker-compose.yml](./e2e/chatbot-ui/docker-compose.yml)       |
 | [chatgpt-web](https://github.com/Chanzhaoyu/chatgpt-web)        | [docker-compose.yml](./e2e/chatgpt-web/docker-compose.yml)      |
-| [chatgpt-lite](https://github.com/blrchen/chatgpt-lite)         | [docker-compose.yml](./e2e/chatgpt-lite/docker-compose.yml)     |
 | [chatgpt-minimal](https://github.com/blrchen/chatgpt-minimal)   | [docker-compose.yml](./e2e/chatgpt-minimal/docker-compose.yml)  |
 
 To run the tests locally, follow these steps:
@@ -84,8 +90,8 @@ A: These can be found in the Azure management portal; see the annotations in the image below
 </details>
 
 <details>
-<summary>Q: How to support GPT-4</summary>
-A: To use GPT-4, use a key in the following format:
+<summary>Q: How to use the gpt-4 and gpt-4-32k models</summary>
+A: To use the gpt-4 and gpt-4-32k models, use a key in the following format:
 
 `AZURE_RESOURCE_ID:gpt-3.5-turbo|AZURE_MODEL_DEPLOYMENT,gpt-4|AZURE_MODEL_DEPLOYMENT,gpt-4-32k|AZURE_MODEL_DEPLOYMENT:AZURE_API_KEY:AZURE_API_VERSION`
 

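The README's test script calls the proxy with curl, and a TypeScript client can build the same request. In this minimal sketch, `buildProxyRequest`, `ChatMessage`, and `ProxyRequest` are hypothetical names. The only contract assumed is what `route.ts` in this commit reads: a `POST` to `/v1/chat/completions` whose `Authorization: Bearer <key>` header carries the colon-separated key. The JSON body is forwarded to Azure unchanged. The base URL and message content in the usage line are placeholders.

```typescript
// Hypothetical client-side helper: assembles a chat-completions request for the proxy.
// route.ts strips "Bearer " from the Authorization header and splits the rest on ':'.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant'
  content: string
}

interface ProxyRequest {
  url: string
  init: { method: 'POST'; headers: Record<string, string>; body: string }
}

function buildProxyRequest(
  baseUrl: string,
  proxyKey: string, // RESOURCE_ID:DEPLOYMENT_OR_MAPPING:API_KEY[:API_VERSION]
  model: string,
  messages: ChatMessage[],
  stream = false
): ProxyRequest {
  return {
    // Trailing slashes are trimmed so "http://localhost:3000/" also works
    url: `${baseUrl.replace(/\/+$/, '')}/v1/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${proxyKey}`,
        'Content-Type': 'application/json'
      },
      // Forwarded to Azure as-is; `model` also selects the deployment when a mapping is used
      body: JSON.stringify({ model, messages, stream })
    }
  }
}
```

With the Docker container from the README running, the following two calls should reach the proxy's route handler:

1. `const { url, init } = buildProxyRequest('http://localhost:3000', 'AZURE_RESOURCE_ID:AZURE_MODEL_DEPLOYMENT:AZURE_API_KEY', 'gpt-3.5-turbo', [{ role: 'user', content: 'Hello' }])`
2. `await fetch(url, init)`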
diff --git a/public/favicon.ico b/app/favicon.ico
diff --git a/app/layout.tsx b/app/layout.tsx
@@ -0,0 +1,17 @@
+import type { Metadata } from 'next'
+import { Inter } from 'next/font/google'
+
+const inter = Inter({ subsets: ['latin'] })
+
+export const metadata: Metadata = {
+  title: 'Create Next App',
+  description: 'Generated by create next app'
+}
+
+export default function RootLayout({ children }: { children: React.ReactNode }) {
+  return (
+    <html lang="en">
+      <body className={inter.className}>{children}</body>
+    </html>
+  )
+}
diff --git a/app/page.tsx b/app/page.tsx
@@ -0,0 +1,8 @@
+import Image from 'next/image'
+
+export default function Home() {
+  return (
+    <main>
+    </main>
+  )
+}
diff --git a/app/v1/chat/completions/route.ts b/app/v1/chat/completions/route.ts
@@ -0,0 +1,116 @@
+import { NextRequest, NextResponse } from 'next/server'
+
+const DEFAULT_API_VERSION = '2023-05-15'
+const MAX_RETRY_COUNT = 3
+const RETRY_DELAY = 1000
+
+export async function POST(request: NextRequest) {
+  const apiKey = request.headers.get('authorization')?.replace('Bearer ', '')
+  if (!apiKey) {
+    return NextResponse.json({ message: 'Unauthenticated' }, { status: 401 })
+  }
+  const body = await request.json()
+
+  let retryCount = 0
+  while (true) {
+    const response = await chat(apiKey, body)
+    const status = response.status
+    // Success (< 300) and 400 Bad Request are returned as-is; other statuses are retried
+    if (status < 300 || status === 400) {
+      return response
+    }
+    if (retryCount >= MAX_RETRY_COUNT) {
+      return response
+    }
+    retryCount++
+    console.log(`Status ${status}, retry ${retryCount}/${MAX_RETRY_COUNT}`)
+    await delay(RETRY_DELAY)
+  }
+}
+
+async function chat(apiKey: string, body: any) {
+  const [resourceId, mapping, azureApiKey, apiVersion] = apiKey.split(':')
+  const model = body['model']
+
+  // get deployment id
+  let deploymentId
+  if (mapping.includes('|')) {
+    const modelMapping = Object.fromEntries(mapping.split(',').map((pair) => pair.split('|')))
+    deploymentId = modelMapping[model] || Object.values(modelMapping)[0]
+  } else {
+    deploymentId = mapping
+  }
+
+  const url = `https://${resourceId}.openai.azure.com/openai/deployments/${deploymentId}/chat/completions?api-version=${
+    apiVersion || DEFAULT_API_VERSION
+  }`
+  const response = await fetch(url, {
+    method: 'POST',
+    headers: {
+      'api-key': azureApiKey,
+      'Content-Type': 'application/json'
+    },
+    body: JSON.stringify(body)
+  })
+  console.log(`[${resourceId}][${deploymentId}] ${response.status} ${response.statusText}`)
+  let resultStream: ReadableStream | undefined
+  let isFirstEventData = true
+  const status: number = await new Promise((resolve) => {
+    const decoder = new TextDecoder()
+    resultStream = new ReadableStream(
+      {
+        async pull(controller) {
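+          const reader = response.body!.getReader()
+
+          while (true) {
+            const { value, done } = await reader.read()
+            if (done) {
+              // Resolve here too, so an empty body cannot leave the status promise pending
+              resolve(response.status)
+              controller.close()
+              return // stop reading: enqueueing after close() would throw
+            }
+            if (isFirstEventData) {
+              isFirstEventData = false
+              // Only the first chunk is inspected for the "model overloaded" error
+              const data = decoder.decode(value)
+              resolve(shouldRetry(data) ? 500 : response.status)
+            }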
+            controller.enqueue(value)
+          }
+        }
+      },
+      {
+        highWaterMark: 1,
+        size(chunk) {
+          return chunk.length
+        }
+      }
+    )
+  })
+  return new Response(resultStream, {
+    status: status,
+    headers: response.headers
+  })
+}
+
+function delay(ms: number) {
+  return new Promise((resolve) => setTimeout(resolve, ms))
+}
+
+function shouldRetry(data: string) {
+  let shouldRetry = false
+  try {
+    const json = data.startsWith('data: ') ? data.match(/^data: (.*?)$/m)?.[1] : data
+    const jobject = JSON.parse(json ?? '')
+    if (
+      jobject?.error?.message?.startsWith('That model is currently overloaded with other requests')
+    ) {
+      shouldRetry = true
+    }
+  } catch (e) {
+    console.error(`first event data string: ${data}`)
+    console.error(`parse json error: ${e}`)
+  }
+  return shouldRetry
+}
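`chat()` above learns the status by inspecting the first chunk from inside a `ReadableStream` that is itself wrapped in a `Promise`. An alternative structure reads the first chunk eagerly, then returns a stream that replays that chunk before forwarding the rest. It is sketched here as a swapped-in technique, not the commit's code. `peekFirstChunk`, `isOverloaded`, and `OVERLOADED_PREFIX` are hypothetical names. The error prefix and the `data: ` handling follow `shouldRetry`. Like the original, the sketch assumes the error text fits in the first chunk.

```typescript
// Hypothetical alternative to the Promise + ReadableStream construction in chat().
const OVERLOADED_PREFIX = 'That model is currently overloaded with other requests'

// Reads the first chunk eagerly, then returns a stream that replays that chunk
// followed by the remaining upstream chunks, unchanged.
async function peekFirstChunk(
  body: ReadableStream<Uint8Array>
): Promise<{ first: string; stream: ReadableStream<Uint8Array> }> {
  const reader = body.getReader()
  const head = await reader.read()
  // Chunk still waiting to be replayed; undefined when the body was empty
  let pending: Uint8Array | undefined = head.done ? undefined : head.value
  const first = pending ? new TextDecoder().decode(pending) : ''
  const stream = new ReadableStream<Uint8Array>({
    async pull(controller) {
      if (pending) {
        controller.enqueue(pending)
        pending = undefined
        return
      }
      const next = await reader.read()
      if (next.done) {
        controller.close()
      } else {
        controller.enqueue(next.value)
      }
    },
    cancel(reason) {
      // Propagate cancellation to the upstream body so it is not left dangling
      return reader.cancel(reason)
    }
  })
  return { first, stream }
}

// Same check as shouldRetry(): SSE responses start with "data: {...}",
// non-streaming ones are plain JSON.
function isOverloaded(firstChunk: string): boolean {
  const payload = firstChunk.startsWith('data: ')
    ? firstChunk.match(/^data: (.*)$/m)?.[1]
    : firstChunk
  try {
    const message = JSON.parse(payload ?? '')?.error?.message
    return typeof message === 'string' && message.startsWith(OVERLOADED_PREFIX)
  } catch {
    return false // not parseable JSON, so not treated as an overload error
  }
}
```

Inside `chat()`, `first` would feed the overload check and `stream` would become the body of the returned `Response`. The replay stream also defines `cancel`, so a caller that discards a response before retrying can call `stream.cancel()` to release the upstream body. The original stream has no hook for this.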