Add rate limit to deployment api #10779

TomeHirata · 2024-01-04T04:36:04Z

Related Issues/PRs

#9939
Copy of [2023-11-18] One-Decision Doc_ Rate limits for OSS AI Gateway.pdf

What changes are proposed in this pull request?

This PR introduces the ability to configure endpoint-level rate limits for the MLflow Deployments Server.

How is this PR tested?

Existing unit/integration tests
New unit/integration tests
Manual tests

Does this PR require documentation update?

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

github-actions · 2024-01-04T04:36:25Z

Documentation preview for 7d1aa2b will be available here when this CircleCI job completes successfully.

More info

Ignore this comment if this PR does not change the documentation.
It takes a few minutes for the preview to be available.
The preview is updated when a new commit is pushed to this PR.
This comment was created by https://github.com/mlflow/mlflow/actions/runs/7610887829.

mlflow/deployments/server/app.py

TomeHirata · 2024-01-04T04:53:02Z

@harupy would you mind taking a look at this PR when you have time?

harupy · 2024-01-09T02:09:38Z

@TomeHirata Thanks for filing the PR! I was taking a vacation. I'll review this PR this week.

TomeHirata · 2024-01-17T03:00:45Z

@harupy Thank you for reviewing this PR! Can I ask if there is anything I should add or modify on this PR?

harupy · 2024-01-17T09:05:27Z

@TomeHirata I'm manually testing this PR now :)

harupy · 2024-01-17T10:26:01Z

Does rate limiting work correctly for you? I'm testing with the following config but hit Rate limit exceeded before I make 5 requests:

    limit:
      renewal_period: "minute"
      calls: 5

harupy · 2024-01-17T10:33:34Z

tests/deployments/server/test_server_app.py

+        assert resp.json() == test_response
+
+
+def test_rate_limit():


harupy · 2024-01-17T10:51:42Z

This change fixes the issue:

diff --git a/mlflow/deployments/server/app.py b/mlflow/deployments/server/app.py
index 235a075c90..6af2b0717d 100644
--- a/mlflow/deployments/server/app.py
+++ b/mlflow/deployments/server/app.py
@@ -68,12 +68,12 @@ class GatewayAPI(FastAPI):
                 methods=["POST"],
             )
             # TODO: Remove Gateway server URLs after deprecation window elapses
-            self.add_api_route(
-                path=f"{MLFLOW_GATEWAY_ROUTE_BASE}{route.name}{MLFLOW_QUERY_SUFFIX}",
-                endpoint=_route_type_to_endpoint(route, limiter),
-                methods=["POST"],
-                include_in_schema=False,
-            )
+            # self.add_api_route(
+            #     path=f"{MLFLOW_GATEWAY_ROUTE_BASE}{route.name}{MLFLOW_QUERY_SUFFIX}",
+            #     endpoint=_route_type_to_endpoint(route, limiter),
+            #     methods=["POST"],
+            #     include_in_schema=False,
+            # )
             self.dynamic_routes[route.name] = route.to_route()
 
     def get_dynamic_route(self, route_name: str) -> Optional[Route]:

harupy · 2024-01-17T11:12:11Z

laurentS/slowapi#173 might be the cause. For backward compatibility, we create two routes for one endpoint, one with gateway prefix and one with deployments prefix. The route handlers have the same __name__ and lead to duplicate request checks. This is a hack, but de-duplicating __name__ as shown below works:

diff --git a/mlflow/deployments/server/app.py b/mlflow/deployments/server/app.py
index 235a075c90..133864f5f3 100644
--- a/mlflow/deployments/server/app.py
+++ b/mlflow/deployments/server/app.py
@@ -64,13 +64,13 @@ class GatewayAPI(FastAPI):
                 path=(
                     MLFLOW_DEPLOYMENTS_ENDPOINTS_BASE + route.name + MLFLOW_DEPLOYMENTS_QUERY_SUFFIX
                 ),
-                endpoint=_route_type_to_endpoint(route, limiter),
+                endpoint=_route_type_to_endpoint(route, limiter, "deployments"),
                 methods=["POST"],
             )
             # TODO: Remove Gateway server URLs after deprecation window elapses
             self.add_api_route(
                 path=f"{MLFLOW_GATEWAY_ROUTE_BASE}{route.name}{MLFLOW_QUERY_SUFFIX}",
-                endpoint=_route_type_to_endpoint(route, limiter),
+                endpoint=_route_type_to_endpoint(route, limiter, "gateway"),
                 methods=["POST"],
                 include_in_schema=False,
             )
@@ -115,7 +115,7 @@ async def _custom(request: Request):
     return request.json()
 
 
-def _route_type_to_endpoint(config: RouteConfig, limiter: Limiter):
+def _route_type_to_endpoint(config: RouteConfig, limiter: Limiter, key: str):
     provider_to_factory = {
         RouteType.LLM_V1_CHAT: _create_chat_endpoint,
         RouteType.LLM_V1_COMPLETIONS: _create_completions_endpoint,
@@ -125,6 +125,7 @@ def _route_type_to_endpoint(config: RouteConfig, limiter: Limiter):
         handler = factory(config)
         if limit := config.limit:
             limit_value = f"{limit.calls}/{limit.renewal_period}"
+            handler.__name__ = f"{handler.__name__}_{config.name}_{key}"
             return limiter.limit(limit_value)(handler)
         else:
             return handler

TomeHirata · 2024-01-17T17:54:28Z

@harupy Thank you for raising the issue, this is a nice catch. I think you're right: the Limiter object stores the rate limit settings in a dict whose key is f"{view_func.__module__}.{view_func.__name__}", and there seems to be no way for users to specify the key name other than overwriting the handler's __name__ field (ref).
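
The failure mode described above can be reproduced with a toy model. This is a simplified sketch, not slowapi's actual code: `ToyLimiter`, `make_handler`, and `calls_until_blocked` are hypothetical names, and the counter never resets. It only assumes what the thread states, namely that limits are registered under a key built from the handler's `__module__` and `__name__`, so two handlers with the same name end up sharing one entry and each request is checked twice.

```python
from collections import defaultdict


class ToyLimiter:
    """Simplified stand-in for a decorator-based rate limiter (not slowapi's real code)."""

    def __init__(self):
        # Registered limits, keyed by "<module>.<name>" of the decorated handler.
        self.route_limits = defaultdict(list)
        # Hit counters, keyed the same way (a single window that never resets, for brevity).
        self.hits = defaultdict(int)

    def limit(self, max_calls):
        def decorator(func):
            key = f"{func.__module__}.{func.__name__}"
            self.route_limits[key].append(max_calls)

            def wrapper(*args, **kwargs):
                # Every limit registered under this key is checked, and each check counts as a hit.
                for allowed in self.route_limits[key]:
                    self.hits[key] += 1
                    if self.hits[key] > allowed:
                        raise RuntimeError("Rate limit exceeded")
                return func(*args, **kwargs)

            return wrapper

        return decorator


def make_handler():
    # Each call returns a new function object, but all of them share the name "handler".
    def handler():
        return "ok"

    return handler


def calls_until_blocked(endpoint, attempts=10):
    """Return how many calls succeed before the limiter rejects one."""
    succeeded = 0
    for _ in range(attempts):
        try:
            endpoint()
        except RuntimeError:
            break
        succeeded += 1
    return succeeded


# Two routes for one endpoint: both handlers register under the same key.
shared = ToyLimiter()
deployments_route = shared.limit(5)(make_handler())
gateway_route = shared.limit(5)(make_handler())
duplicated_result = calls_until_blocked(deployments_route)

# Renaming each handler before decorating gives every route its own key.
fixed = ToyLimiter()
renamed_deployments = make_handler()
renamed_deployments.__name__ = "handler_chat_deployments"
fixed_deployments_route = fixed.limit(5)(renamed_deployments)
renamed_gateway = make_handler()
renamed_gateway.__name__ = "handler_chat_gateway"
fixed_gateway_route = fixed.limit(5)(renamed_gateway)
fixed_result = calls_until_blocked(fixed_deployments_route)
```

In this model the shared key lets only two of the configured five calls through, while the renamed handlers get the full five. That matches the symptom reported earlier in the thread, though the exact counts in slowapi may differ.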

harupy · 2024-01-18T01:59:54Z

@TomeHirata tests/gateway/test_integration.py::test_invalid_query_request_raises failed. It could be related to the changes in this PR.

TomeHirata · 2024-01-19T02:02:15Z

@harupy The test failed because request schema validation was lost as a side effect of changing the endpoint signature to just Request. I'm guessing we can resolve it by configuring a customized Route class, so let me look into that. Of course, any suggestions are appreciated.

harupy · 2024-01-19T06:47:37Z

@TomeHirata Got it. Thanks for the explanation! Is it possible to remove validate_limit to avoid importing limits?

examples/deployments/deployments_server/openai/config.yaml

TomeHirata · 2024-01-19T07:31:21Z

@harupy It's possible, but don't we need a validation for limits?

harupy · 2024-01-19T07:50:39Z

I don't think we need to validate limit values if we need to import limits but curious what error slowapi would raise for an invalid limit value.

TomeHirata · 2024-01-19T11:24:10Z

> I don't think we need to validate limit values if we need to import limits but curious what error slowapi would raise for an invalid limit value.

If the format of the limit parameter is invalid, slowapi does not raise an exception; it only logs an error, so rate limiting silently fails (ref). So I think it's better to validate the limit parameter. We could write our own validation, but it might diverge from the real behavior unless we use the limits library (https://github.com/alisaifee/limits/), to which slowapi delegates the actual rate limiting.
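
To illustrate the "own validation" option mentioned above, here is a minimal sketch of an eager check that fails loudly at configuration time. It is a stand-in, not the PR's implementation: the regular expression and the `is_valid_limit` helper are assumptions, and the allowed units are taken from the time units listed in this PR's docs. As noted in the comment above, a hand-rolled check like this can drift from what the limits library actually accepts.

```python
import re

# Time units listed for `renewal_period` in this PR's docs.
ALLOWED_PERIODS = {"second", "minute", "hour", "day", "month", "year"}
# Assumed shape of the limit string: "<calls>/<renewal_period>", e.g. "5/minute".
_LIMIT_PATTERN = re.compile(r"^(\d+)/([a-z]+)$")


def validate_limit(limit_value: str) -> str:
    """Raise ValueError for a malformed limit string instead of failing silently later."""
    match = _LIMIT_PATTERN.match(limit_value)
    if match is None:
        raise ValueError(
            f"Invalid limit format: {limit_value!r} (expected '<calls>/<period>')"
        )
    calls, period = int(match.group(1)), match.group(2)
    if calls <= 0:
        raise ValueError(f"calls must be positive, got {calls}")
    if period not in ALLOWED_PERIODS:
        raise ValueError(f"Unknown renewal_period: {period!r}")
    return limit_value


def is_valid_limit(limit_value: str) -> bool:
    """Convenience wrapper (hypothetical) that reports validity as a boolean."""
    try:
        validate_limit(limit_value)
    except ValueError:
        return False
    return True
```

With this sketch, `is_valid_limit("5/minute")` is true, while values such as `"5/fortnight"` or `"five/minute"` are rejected when the config is loaded rather than being discovered when rate limiting quietly does nothing.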

harupy · 2024-01-22T07:10:53Z

Got it, let's add validation. Can we import limits in validate_limit?

Signed-off-by: TomeHirata <tomu.hirata@gmail.com>

TomeHirata · 2024-01-22T14:43:59Z

@harupy Hi, I've moved the import statement to the validation logic and updated the examples of the docs. Could you take another look?

harupy · 2024-01-23T01:13:30Z

docs/source/llms/deployments/index.rst

+* **limit**: Specify the rate limit setting this endpoint will follow. The limit field contains the following fields:
+
+    * **renewal_period**: The time unit of the rate limit, one of [second|minute|hour|day|month|year].
+    * **calls**: The number of calls this endpoint will accept within the specified time unit.
+
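
As a rough illustration of how the two documented fields combine, the sketch below maps a parsed `limit` block to the `"<calls>/<renewal_period>"` string built by the f-string in the app.py diff earlier in this thread. `LimitSketch` and the `endpoint_config` dict (including the endpoint name `chat`) are hypothetical and are not MLflow's actual config classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitSketch:
    """Hypothetical mirror of the `limit` block; not MLflow's actual config class."""

    renewal_period: str  # e.g. "minute"
    calls: int  # e.g. 5

    def to_limit_string(self) -> str:
        # Same "<calls>/<renewal_period>" shape as the f-string in the app.py diff.
        return f"{self.calls}/{self.renewal_period}"


# A parsed endpoint entry, as it might look after loading the YAML config (assumed shape).
endpoint_config = {"name": "chat", "limit": {"renewal_period": "minute", "calls": 5}}
limit = LimitSketch(**endpoint_config["limit"])
limit_string = limit.to_limit_string()
```

Here `limit_string` is `"5/minute"`, which is the value passed to the limiter for an endpoint configured with 5 calls per minute.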


thanks for updating the docs!

harupy

LGTM!

TomeHirata marked this pull request as ready for review January 17, 2024 02:59

harupy reviewed Jan 17, 2024

TomeHirata force-pushed the gateway-rate-limit branch 2 times, most recently from 2d833eb to 3e72dbe Compare January 17, 2024 18:01

TomeHirata force-pushed the gateway-rate-limit branch 2 times, most recently from defc7e8 to bd3e0ae Compare January 18, 2024 17:18

TomeHirata marked this pull request as draft January 18, 2024 18:08

harupy reviewed Jan 19, 2024

TomeHirata marked this pull request as ready for review January 22, 2024 11:19

TomeHirata added 3 commits January 22, 2024 11:19

feat: add rate limit to deployment api

0ecf629

Signed-off-by: TomeHirata <tomu.hirata@gmail.com>

fix: fix unit tests and address slowapi issues

1ceb291

Signed-off-by: TomeHirata <tomu.hirata@gmail.com>

chore: add validation for request schema

08c34f7

Signed-off-by: TomeHirata <tomu.hirata@gmail.com>

TomeHirata added 3 commits January 22, 2024 11:20

chore: add validation for limit value

f361239

Signed-off-by: TomeHirata <tomu.hirata@gmail.com>

fix: fix the errors happenning for pydantic v2

a9e93c6

Signed-off-by: TomeHirata <tomu.hirata@gmail.com>

feat: receive both request and body

acca49f

Signed-off-by: TomeHirata <tomu.hirata@gmail.com>

TomeHirata force-pushed the gateway-rate-limit branch from e0287cb to acca49f Compare January 22, 2024 11:20

github-actions bot added the area/deployments, area/docs, and rn/feature labels Jan 22, 2024

docs: update rate limit related section

7d1aa2b

Signed-off-by: TomeHirata <tomu.hirata@gmail.com>

TomeHirata force-pushed the gateway-rate-limit branch from e0aa4fa to 7d1aa2b Compare January 22, 2024 12:09

harupy reviewed Jan 23, 2024

harupy approved these changes Jan 24, 2024

harupy merged commit 6c0768a into mlflow:master Jan 24, 2024
38 checks passed
