docs: update rate limit related section
Signed-off-by: TomeHirata <tomu.hirata@gmail.com>
TomeHirata committed Jan 22, 2024
1 parent acca49f commit e0aa4fa
Showing 2 changed files with 22 additions and 2 deletions.
22 changes: 21 additions & 1 deletion docs/source/llms/deployments/index.rst
@@ -101,6 +101,9 @@ For details about the configuration file's parameters (including parameters for
name: gpt-3.5-turbo
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
- name: chat
endpoint_type: llm/v1/chat
@@ -284,6 +287,9 @@ Here's an example of a provider configuration within an endpoint:
name: gpt-4
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
In the above configuration, ``openai`` is the `provider` for the model.

@@ -324,6 +330,11 @@ an endpoint in the MLflow Deployments Server consists of the following fields:
* **name**: The name of the model to use. For example, ``gpt-3.5-turbo`` for OpenAI's ``GPT-3.5-Turbo`` model.
* **config**: Contains any additional configuration details required for the model. This includes specifying the API base URL and the API key.

* **limit**: Specifies the rate limit enforced on this endpoint. The ``limit`` field contains the following fields:

* **renewal_period**: The time unit of the rate limit; one of ``second``, ``minute``, ``hour``, ``day``, ``month``, or ``year``.
* **calls**: The number of calls this endpoint will accept within the specified time unit.
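
The ``renewal_period``/``calls`` semantics can be sketched as a fixed-window counter. This is a minimal illustration only, not the server's actual implementation; the ``FixedWindowLimiter`` class below is hypothetical:

.. code-block:: python

    import time

    class FixedWindowLimiter:
        """Hypothetical sketch of a ``calls`` per ``renewal_period`` limit."""

        # Window lengths in seconds (month/year are approximations).
        PERIODS = {
            "second": 1, "minute": 60, "hour": 3600,
            "day": 86400, "month": 2_592_000, "year": 31_536_000,
        }

        def __init__(self, calls, renewal_period):
            self.calls = calls
            self.window = self.PERIODS[renewal_period]
            self.window_start = time.monotonic()
            self.count = 0

        def allow(self):
            now = time.monotonic()
            if now - self.window_start >= self.window:
                # The window has elapsed: renew it and reset the counter.
                self.window_start = now
                self.count = 0
            if self.count < self.calls:
                self.count += 1
                return True
            return False

With ``calls: 10`` and ``renewal_period: minute``, as in the examples below, the first 10 calls within a minute are admitted and the 11th is rejected until the window renews.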

Here's an example of an endpoint configuration:

.. code-block:: yaml
@@ -336,6 +347,9 @@ Here's an example of an endpoint configuration:
name: gpt-3.5-turbo
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
In the example above, a request sent to the completions endpoint would be forwarded to the
``gpt-3.5-turbo`` model provided by ``openai``.
@@ -423,10 +437,13 @@ Here is an example of a single-endpoint configuration:
name: gpt-3.5-turbo
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
In this example, we define an endpoint named ``chat`` that corresponds to the ``llm/v1/chat`` type, which
-will use the ``gpt-3.5-turbo`` model from OpenAI to return query responses from the OpenAI service.
+will use the ``gpt-3.5-turbo`` model from OpenAI to return query responses from the OpenAI service, and accept up to 10 requests per minute.
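
Requests beyond the configured limit are rejected until the window renews, so a client can recover by retrying with backoff. A hypothetical helper sketch (``RateLimitExceeded`` stands in for whatever error the client surfaces when the server rejects a rate-limited request):

.. code-block:: python

    import time

    class RateLimitExceeded(Exception):
        """Hypothetical stand-in for a rate-limit error raised by a client."""

    def call_with_backoff(fn, max_retries=3, base_delay=1.0):
        """Call ``fn``, retrying with exponential backoff on rate-limit errors."""
        for attempt in range(max_retries + 1):
            try:
                return fn()
            except RateLimitExceeded:
                if attempt == max_retries:
                    raise
                # Sleep base_delay, 2*base_delay, 4*base_delay, ... between attempts.
                time.sleep(base_delay * 2 ** attempt)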

The MLflow Deployments Server configuration is very easy to update.
Simply edit the configuration file and save your changes, and the MLflow Deployments Server will automatically
@@ -681,6 +698,9 @@ An example configuration for Azure OpenAI is:
openai_deployment_name: "{your_deployment_name}"
openai_api_base: "https://{your_resource_name}-azureopenai.openai.azure.com/"
openai_api_version: "2023-05-15"
limit:
renewal_period: minute
calls: 10
.. note::
2 changes: 1 addition & 1 deletion examples/deployments/deployments_server/openai/config.yaml
@@ -7,7 +7,7 @@ endpoints:
config:
openai_api_key: $OPENAI_API_KEY
limit:
-        renewal_period: "minute"
+        renewal_period: minute
calls: 10

- name: completions
