
Commit

docs: update rate limit related section
Signed-off-by: TomeHirata <tomu.hirata@gmail.com>
TomeHirata committed Jan 22, 2024
1 parent acca49f commit 7d1aa2b
Showing 3 changed files with 24 additions and 13 deletions.
22 changes: 21 additions & 1 deletion docs/source/llms/deployments/index.rst
@@ -101,6 +101,9 @@ For details about the configuration file's parameters (including parameters for
name: gpt-3.5-turbo
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
- name: chat
endpoint_type: llm/v1/chat
@@ -284,6 +287,9 @@ Here's an example of a provider configuration within an endpoint:
name: gpt-4
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
In the above configuration, ``openai`` is the `provider` for the model.

@@ -324,6 +330,11 @@ an endpoint in the MLflow Deployments Server consists of the following fields:
* **name**: The name of the model to use. For example, ``gpt-3.5-turbo`` for OpenAI's ``GPT-3.5-Turbo`` model.
* **config**: Contains any additional configuration details required for the model. This includes specifying the API base URL and the API key.

* **limit**: The rate limit settings applied to this endpoint. The ``limit`` field contains the following fields:

  * **renewal_period**: The time unit of the rate limit, one of [second|minute|hour|day|month|year].
  * **calls**: The number of calls this endpoint will accept within the specified time unit.

Here's an example of an endpoint configuration:

.. code-block:: yaml
@@ -336,6 +347,9 @@ Here's an example of an endpoint configuration:
name: gpt-3.5-turbo
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
In the example above, a request sent to the completions endpoint would be forwarded to the
``gpt-3.5-turbo`` model provided by ``openai``.
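The ``calls``/``renewal_period`` semantics above can be sketched as a fixed-window counter. This is an illustrative model only — the class name, clock injection, and period table below are assumptions for the sketch, not MLflow internals:

```python
import time


class FixedWindowLimit:
    """Illustrative sketch: allow at most `calls` per `renewal_period`."""

    PERIODS = {
        "second": 1,
        "minute": 60,
        "hour": 3_600,
        "day": 86_400,
        "month": 2_592_000,   # 30 days, for illustration
        "year": 31_536_000,   # 365 days, for illustration
    }

    def __init__(self, calls: int, renewal_period: str, clock=time.monotonic):
        self.calls = calls
        self.window = self.PERIODS[renewal_period]
        self.clock = clock
        self.window_start = None
        self.count = 0

    def allow(self) -> bool:
        """Return True if one more call fits in the current window."""
        now = self.clock()
        if self.window_start is None or now - self.window_start >= self.window:
            self.window_start = now  # start a fresh window
            self.count = 0
        if self.count < self.calls:
            self.count += 1
            return True
        return False
```

With ``calls: 10`` and ``renewal_period: minute``, the eleventh ``allow()`` within the same minute returns ``False``, and the counter resets once the window rolls over.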
@@ -423,10 +437,13 @@ Here is an example of a single-endpoint configuration:
name: gpt-3.5-turbo
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
In this example, we define an endpoint named ``chat`` that corresponds to the ``llm/v1/chat`` type, which
-will use the ``gpt-3.5-turbo`` model from OpenAI to return query responses from the OpenAI service.
+will use the ``gpt-3.5-turbo`` model from OpenAI to return query responses from the OpenAI service, and accept up to 10 requests per minute.
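From a client's perspective, exceeding the limit surfaces as an HTTP 429 response. A minimal retry sketch — the helper name, the injected ``send`` callable, and the backoff policy are illustrative assumptions, not MLflow client behavior:

```python
import time


def call_with_backoff(send, payload, retries=3, sleep=time.sleep):
    """Call `send(payload)` (e.g. an HTTP POST to the endpoint) and retry
    with exponential backoff while the server answers HTTP 429
    (rate limit exceeded)."""
    response = send(payload)
    for attempt in range(retries):
        if response.status_code != 429:
            return response
        sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...
        response = send(payload)
    return response
```

Injecting ``send`` and ``sleep`` keeps the helper transport-agnostic and easy to test without a live server.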

The MLflow Deployments Server configuration is very easy to update.
Simply edit the configuration file and save your changes, and the MLflow Deployments Server will automatically
@@ -681,6 +698,9 @@ An example configuration for Azure OpenAI is:
openai_deployment_name: "{your_deployment_name}"
openai_api_base: "https://{your_resource_name}-azureopenai.openai.azure.com/"
openai_api_version: "2023-05-15"
limit:
renewal_period: minute
calls: 10
.. note::
2 changes: 1 addition & 1 deletion examples/deployments/deployments_server/openai/config.yaml
@@ -7,7 +7,7 @@ endpoints:
config:
openai_api_key: $OPENAI_API_KEY
limit:
-renewal_period: "minute"
+renewal_period: minute
calls: 10

- name: completions
13 changes: 2 additions & 11 deletions mlflow/deployments/server/app.py
@@ -1,11 +1,10 @@
from pathlib import Path
-from typing import Any, Dict, List, Optional, Type, Union
+from typing import Any, Dict, List, Optional, Union

from fastapi import FastAPI, HTTPException, Request
from fastapi.exceptions import RequestValidationError
from fastapi.openapi.docs import get_swagger_ui_html
from fastapi.responses import FileResponse, RedirectResponse
-from pydantic import BaseModel, ValidationError
+from pydantic import BaseModel
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address
@@ -78,14 +77,6 @@ def get_dynamic_route(self, route_name: str) -> Optional[Route]:
return self.dynamic_routes.get(route_name)


-async def parse_request_schema(request: Request, cls: Type[BaseModel]) -> BaseModel:
-    payload = await request.json()
-    try:
-        return cls(**payload)
-    except ValidationError as e:
-        raise RequestValidationError(e.errors())


def _create_chat_endpoint(config: RouteConfig):
prov = get_provider(config.model.provider)(config)

