docs: update rate limit related section
Signed-off-by: TomeHirata <tomu.hirata@gmail.com>
TomeHirata committed Jan 22, 2024
1 parent acca49f commit e0aa4fa
Showing 2 changed files with 22 additions and 2 deletions.
22 changes: 21 additions & 1 deletion docs/source/llms/deployments/index.rst
@@ -101,6 +101,9 @@ For details about the configuration file's parameters (including parameters for
name: gpt-3.5-turbo
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
- name: chat
endpoint_type: llm/v1/chat
@@ -284,6 +287,9 @@ Here's an example of a provider configuration within an endpoint:
name: gpt-4
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
In the above configuration, ``openai`` is the `provider` for the model.

@@ -324,6 +330,11 @@ an endpoint in the MLflow Deployments Server consists of the following fields:
* **name**: The name of the model to use. For example, ``gpt-3.5-turbo`` for OpenAI's ``GPT-3.5-Turbo`` model.
* **config**: Contains any additional configuration details required for the model. This includes specifying the API base URL and the API key.

* **limit**: Specifies the rate limit enforced on this endpoint. The ``limit`` field contains the following fields:

* **renewal_period**: The time unit of the rate limit; one of ``second``, ``minute``, ``hour``, ``day``, ``month``, or ``year``.
* **calls**: The number of calls this endpoint will accept within the specified time unit.
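
The ``renewal_period``/``calls`` semantics can be sketched as a fixed-window counter. This is a minimal illustration only, not the server's actual implementation; the ``FixedWindowLimiter`` class below is hypothetical:

.. code-block:: python

    import time

    class FixedWindowLimiter:
        """Hypothetical sketch of a ``calls`` per ``renewal_period`` limit."""

        # Window lengths in seconds (month/year are approximations).
        PERIODS = {
            "second": 1, "minute": 60, "hour": 3600,
            "day": 86400, "month": 2_592_000, "year": 31_536_000,
        }

        def __init__(self, calls, renewal_period):
            self.calls = calls
            self.window = self.PERIODS[renewal_period]
            self.window_start = time.monotonic()
            self.count = 0

        def allow(self):
            now = time.monotonic()
            if now - self.window_start >= self.window:
                # The window has elapsed: renew it and reset the counter.
                self.window_start = now
                self.count = 0
            if self.count < self.calls:
                self.count += 1
                return True
            return False

With ``calls: 10`` and ``renewal_period: minute``, as in the examples below, the first 10 calls within a minute are admitted and the 11th is rejected until the window renews.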

Here's an example of an endpoint configuration:

.. code-block:: yaml
@@ -336,6 +347,9 @@ Here's an example of an endpoint configuration:
name: gpt-3.5-turbo
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
In the example above, a request sent to the completions endpoint would be forwarded to the
``gpt-3.5-turbo`` model provided by ``openai``.
@@ -423,10 +437,13 @@ Here is an example of a single-endpoint configuration:
name: gpt-3.5-turbo
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
In this example, we define an endpoint named ``chat`` that corresponds to the ``llm/v1/chat`` type, which
-will use the ``gpt-3.5-turbo`` model from OpenAI to return query responses from the OpenAI service.
+will use the ``gpt-3.5-turbo`` model from OpenAI to return query responses from the OpenAI service, and accept up to 10 requests per minute.
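
Requests beyond the configured limit are rejected until the window renews, so a client can recover by retrying with backoff. A hypothetical helper sketch (``RateLimitExceeded`` stands in for whatever error the client surfaces when the server rejects a rate-limited request):

.. code-block:: python

    import time

    class RateLimitExceeded(Exception):
        """Hypothetical stand-in for a rate-limit error raised by a client."""

    def call_with_backoff(fn, max_retries=3, base_delay=1.0):
        """Call ``fn``, retrying with exponential backoff on rate-limit errors."""
        for attempt in range(max_retries + 1):
            try:
                return fn()
            except RateLimitExceeded:
                if attempt == max_retries:
                    raise
                # Sleep base_delay, 2*base_delay, 4*base_delay, ... between attempts.
                time.sleep(base_delay * 2 ** attempt)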

The MLflow Deployments Server configuration is very easy to update.
Simply edit the configuration file and save your changes, and the MLflow Deployments Server will automatically
@@ -681,6 +698,9 @@ An example configuration for Azure OpenAI is:
openai_deployment_name: "{your_deployment_name}"
openai_api_base: "https://{your_resource_name}-azureopenai.openai.azure.com/"
openai_api_version: "2023-05-15"
limit:
renewal_period: minute
calls: 10
.. note::
2 changes: 1 addition & 1 deletion examples/deployments/deployments_server/openai/config.yaml
@@ -7,7 +7,7 @@ endpoints:
config:
openai_api_key: $OPENAI_API_KEY
limit:
-        renewal_period: "minute"
+        renewal_period: minute
calls: 10

- name: completions
