
Commit

docs: update rate limit related section
Signed-off-by: TomeHirata <tomu.hirata@gmail.com>
TomeHirata committed Jan 22, 2024
1 parent acca49f commit 7d1aa2b
Showing 3 changed files with 24 additions and 13 deletions.
22 changes: 21 additions & 1 deletion docs/source/llms/deployments/index.rst
@@ -101,6 +101,9 @@ For details about the configuration file's parameters (including parameters for
name: gpt-3.5-turbo
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
- name: chat
endpoint_type: llm/v1/chat
@@ -284,6 +287,9 @@ Here's an example of a provider configuration within an endpoint:
name: gpt-4
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
In the above configuration, ``openai`` is the `provider` for the model.

@@ -324,6 +330,11 @@ an endpoint in the MLflow Deployments Server consists of the following fields:
* **name**: The name of the model to use. For example, ``gpt-3.5-turbo`` for OpenAI's ``GPT-3.5-Turbo`` model.
* **config**: Contains any additional configuration details required for the model. This includes specifying the API base URL and the API key.

* **limit**: The rate limit settings applied to this endpoint. The ``limit`` field contains the following fields:

  * **renewal_period**: The time unit of the rate limit, one of [second|minute|hour|day|month|year].
  * **calls**: The number of calls this endpoint will accept within the specified time unit.

Here's an example of an endpoint configuration:

.. code-block:: yaml
@@ -336,6 +347,9 @@ Here's an example of an endpoint configuration:
name: gpt-3.5-turbo
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
In the example above, a request sent to the completions endpoint would be forwarded to the
``gpt-3.5-turbo`` model provided by ``openai``.
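The ``calls``/``renewal_period`` semantics above can be sketched as a fixed-window counter. This is an illustrative model only — the class name, clock injection, and period table below are assumptions for the sketch, not MLflow internals:

```python
import time


class FixedWindowLimit:
    """Illustrative sketch: allow at most `calls` per `renewal_period`."""

    PERIODS = {
        "second": 1,
        "minute": 60,
        "hour": 3_600,
        "day": 86_400,
        "month": 2_592_000,   # 30 days, for illustration
        "year": 31_536_000,   # 365 days, for illustration
    }

    def __init__(self, calls: int, renewal_period: str, clock=time.monotonic):
        self.calls = calls
        self.window = self.PERIODS[renewal_period]
        self.clock = clock
        self.window_start = None
        self.count = 0

    def allow(self) -> bool:
        """Return True if one more call fits in the current window."""
        now = self.clock()
        if self.window_start is None or now - self.window_start >= self.window:
            self.window_start = now  # start a fresh window
            self.count = 0
        if self.count < self.calls:
            self.count += 1
            return True
        return False
```

With ``calls: 10`` and ``renewal_period: minute``, the eleventh ``allow()`` within the same minute returns ``False``, and the counter resets once the window rolls over.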
@@ -423,10 +437,13 @@ Here is an example of a single-endpoint configuration:
name: gpt-3.5-turbo
config:
openai_api_key: $OPENAI_API_KEY
limit:
renewal_period: minute
calls: 10
In this example, we define an endpoint named ``chat`` that corresponds to the ``llm/v1/chat`` type, which
-will use the ``gpt-3.5-turbo`` model from OpenAI to return query responses from the OpenAI service.
+will use the ``gpt-3.5-turbo`` model from OpenAI to return query responses from the OpenAI service, and accept up to 10 requests per minute.
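From a client's perspective, exceeding the limit surfaces as an HTTP 429 response. A minimal retry sketch — the helper name, the injected ``send`` callable, and the backoff policy are illustrative assumptions, not MLflow client behavior:

```python
import time


def call_with_backoff(send, payload, retries=3, sleep=time.sleep):
    """Call `send(payload)` (e.g. an HTTP POST to the endpoint) and retry
    with exponential backoff while the server answers HTTP 429
    (rate limit exceeded)."""
    response = send(payload)
    for attempt in range(retries):
        if response.status_code != 429:
            return response
        sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...
        response = send(payload)
    return response
```

Injecting ``send`` and ``sleep`` keeps the helper transport-agnostic and easy to test without a live server.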

The MLflow Deployments Server configuration is very easy to update.
Simply edit the configuration file and save your changes, and the MLflow Deployments Server will automatically
@@ -681,6 +698,9 @@ An example configuration for Azure OpenAI is:
openai_deployment_name: "{your_deployment_name}"
openai_api_base: "https://{your_resource_name}-azureopenai.openai.azure.com/"
openai_api_version: "2023-05-15"
limit:
renewal_period: minute
calls: 10
.. note::
2 changes: 1 addition & 1 deletion examples/deployments/deployments_server/openai/config.yaml
@@ -7,7 +7,7 @@ endpoints:
config:
openai_api_key: $OPENAI_API_KEY
limit:
-renewal_period: "minute"
+renewal_period: minute
calls: 10

- name: completions
13 changes: 2 additions & 11 deletions mlflow/deployments/server/app.py
@@ -1,11 +1,10 @@
from pathlib import Path
-from typing import Any, Dict, List, Optional, Type, Union
+from typing import Any, Dict, List, Optional, Union

from fastapi import FastAPI, HTTPException, Request
from fastapi.exceptions import RequestValidationError
from fastapi.openapi.docs import get_swagger_ui_html
from fastapi.responses import FileResponse, RedirectResponse
-from pydantic import BaseModel, ValidationError
+from pydantic import BaseModel
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address
@@ -78,14 +77,6 @@ def get_dynamic_route(self, route_name: str) -> Optional[Route]:
return self.dynamic_routes.get(route_name)


-async def parse_request_schema(request: Request, cls: Type[BaseModel]) -> BaseModel:
-    payload = await request.json()
-    try:
-        return cls(**payload)
-    except ValidationError as e:
-        raise RequestValidationError(e.errors())


def _create_chat_endpoint(config: RouteConfig):
prov = get_provider(config.model.provider)(config)

