Add rate limit to deployment api #10779

Merged
merged 7 commits into from Jan 24, 2024


TomeHirata
Contributor

@TomeHirata TomeHirata commented Jan 4, 2024

Related Issues/PRs

#9939
Copy of [2023-11-18] One-Decision Doc_ Rate limits for OSS AI Gateway.pdf

What changes are proposed in this pull request?

This PR introduces the ability to configure endpoint-level rate limits for the MLflow Deployments Server.

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes


github-actions bot commented Jan 4, 2024

Documentation preview for 7d1aa2b will be available here when this CircleCI job completes successfully.


@TomeHirata
Contributor Author

@harupy would you mind taking a look at this PR when you have time?

@harupy
Member

harupy commented Jan 9, 2024

@TomeHirata Thanks for filing the PR! I was taking a vacation. I'll review this PR this week.

@TomeHirata TomeHirata marked this pull request as ready for review January 17, 2024 02:59
@TomeHirata
Contributor Author

@harupy Thank you for reviewing this PR! Can I ask if there is anything I should add or modify on this PR?

@harupy
Member

harupy commented Jan 17, 2024

@TomeHirata I'm manually testing this PR now :)

@harupy
Member

harupy commented Jan 17, 2024

Does rate limiting work correctly for you? I'm testing with the following config but hit Rate limit exceeded before I make 5 requests:

    limit:
      renewal_period: "minute"
      calls: 5
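For reference, the diff later in this thread builds the limit string passed to the limiter as `f"{limit.calls}/{limit.renewal_period}"`. A minimal sketch of that mapping for the config above, where the `Limit` dataclass and `to_limit_string` helper are hypothetical stand-ins rather than MLflow's actual classes:

```python
from dataclasses import dataclass


@dataclass
class Limit:
    # Hypothetical stand-in for the endpoint's `limit` config section.
    calls: int
    renewal_period: str


def to_limit_string(limit: Limit) -> str:
    # Same formatting as the `limit_value` expression in the PR diff.
    return f"{limit.calls}/{limit.renewal_period}"


limit_value = to_limit_string(Limit(calls=5, renewal_period="minute"))
print(limit_value)  # 5/minute
```

With this config, the endpoint would be expected to accept five calls per minute before returning a rate limit error.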

assert resp.json() == test_response


def test_rate_limit():
Member


nice test!

@harupy
Member

harupy commented Jan 17, 2024

This change fixes the issue:

diff --git a/mlflow/deployments/server/app.py b/mlflow/deployments/server/app.py
index 235a075c90..6af2b0717d 100644
--- a/mlflow/deployments/server/app.py
+++ b/mlflow/deployments/server/app.py
@@ -68,12 +68,12 @@ class GatewayAPI(FastAPI):
                 methods=["POST"],
             )
             # TODO: Remove Gateway server URLs after deprecation window elapses
-            self.add_api_route(
-                path=f"{MLFLOW_GATEWAY_ROUTE_BASE}{route.name}{MLFLOW_QUERY_SUFFIX}",
-                endpoint=_route_type_to_endpoint(route, limiter),
-                methods=["POST"],
-                include_in_schema=False,
-            )
+            # self.add_api_route(
+            #     path=f"{MLFLOW_GATEWAY_ROUTE_BASE}{route.name}{MLFLOW_QUERY_SUFFIX}",
+            #     endpoint=_route_type_to_endpoint(route, limiter),
+            #     methods=["POST"],
+            #     include_in_schema=False,
+            # )
             self.dynamic_routes[route.name] = route.to_route()
 
     def get_dynamic_route(self, route_name: str) -> Optional[Route]:

@harupy
Member

harupy commented Jan 17, 2024

laurentS/slowapi#173 might be the cause. For backward compatibility, we create two routes for one endpoint, one with gateway prefix and one with deployments prefix. The route handlers have the same __name__ and lead to duplicate request checks. This is a hack, but de-duplicating __name__ as shown below works:

diff --git a/mlflow/deployments/server/app.py b/mlflow/deployments/server/app.py
index 235a075c90..133864f5f3 100644
--- a/mlflow/deployments/server/app.py
+++ b/mlflow/deployments/server/app.py
@@ -64,13 +64,13 @@ class GatewayAPI(FastAPI):
                 path=(
                     MLFLOW_DEPLOYMENTS_ENDPOINTS_BASE + route.name + MLFLOW_DEPLOYMENTS_QUERY_SUFFIX
                 ),
-                endpoint=_route_type_to_endpoint(route, limiter),
+                endpoint=_route_type_to_endpoint(route, limiter, "deployments"),
                 methods=["POST"],
             )
             # TODO: Remove Gateway server URLs after deprecation window elapses
             self.add_api_route(
                 path=f"{MLFLOW_GATEWAY_ROUTE_BASE}{route.name}{MLFLOW_QUERY_SUFFIX}",
-                endpoint=_route_type_to_endpoint(route, limiter),
+                endpoint=_route_type_to_endpoint(route, limiter, "gateway"),
                 methods=["POST"],
                 include_in_schema=False,
             )
@@ -115,7 +115,7 @@ async def _custom(request: Request):
     return request.json()
 
 
-def _route_type_to_endpoint(config: RouteConfig, limiter: Limiter):
+def _route_type_to_endpoint(config: RouteConfig, limiter: Limiter, key: str):
     provider_to_factory = {
         RouteType.LLM_V1_CHAT: _create_chat_endpoint,
         RouteType.LLM_V1_COMPLETIONS: _create_completions_endpoint,
@@ -125,6 +125,7 @@ def _route_type_to_endpoint(config: RouteConfig, limiter: Limiter):
         handler = factory(config)
         if limit := config.limit:
             limit_value = f"{limit.calls}/{limit.renewal_period}"
+            handler.__name__ = f"{handler.__name__}_{config.name}_{key}"
             return limiter.limit(limit_value)(handler)
         else:
             return handler

@TomeHirata
Contributor Author

TomeHirata commented Jan 17, 2024

@harupy Thank you for raising the issue, this is a nice catch. I think you're right: the Limiter object stores the rate limit settings in a dict whose key is f"{view_func.__module__}.{view_func.__name__}", and it seems there is no way for users to specify the key name other than overwriting the handler's __name__ field (ref).
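The collision described above can be reproduced with a toy model. This is a sketch only: `ToyLimiter`, `make_handler`, and `count_allowed` are hypothetical names, and the counting logic is an assumption about how duplicate checks exhaust the limit early, not slowapi's actual code. The only detail taken from the discussion is that limits are registered under a key built from the handler's `__module__` and `__name__`.

```python
from collections import defaultdict


class ToyLimiter:
    """Toy model only: limits are registered under "<module>.<name>"."""

    def __init__(self):
        self.route_limits = defaultdict(list)  # key -> registered call limits
        self.hits = defaultdict(int)           # key -> checks counted so far

    def limit(self, max_calls):
        def decorator(func):
            key = f"{func.__module__}.{func.__name__}"
            self.route_limits[key].append(max_calls)

            def wrapper(*args, **kwargs):
                # Every limit registered under this key is checked, and each
                # check counts as one hit against the shared counter.
                for allowed in self.route_limits[key]:
                    self.hits[key] += 1
                    if self.hits[key] > allowed:
                        raise RuntimeError("Rate limit exceeded")
                return func(*args, **kwargs)

            return wrapper

        return decorator


def make_handler():
    # Each call returns a new function object, but all share the name "handler".
    def handler():
        return "ok"

    return handler


def count_allowed(rename: bool, max_calls: int = 5) -> int:
    """Return how many requests succeed on the first of two routes."""
    limiter = ToyLimiter()
    wrapped = []
    for prefix in ("deployments", "gateway"):
        handler = make_handler()
        if rename:
            # The workaround from the diff: make the name unique per route.
            handler.__name__ = f"{handler.__name__}_chat_{prefix}"
        wrapped.append(limiter.limit(max_calls)(handler))

    allowed = 0
    try:
        for _ in range(max_calls * 2):
            wrapped[0]()
            allowed += 1
    except RuntimeError:
        pass
    return allowed


print(count_allowed(rename=False))  # 2
print(count_allowed(rename=True))   # 5
```

In this toy model, two handlers sharing a name register two limits under one key, so each request is counted twice and only two of five requests succeed. Giving each handler a unique `__name__` restores the expected five.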

@TomeHirata TomeHirata force-pushed the gateway-rate-limit branch 2 times, most recently from 2d833eb to 3e72dbe Compare January 17, 2024 18:01
@harupy
Member

harupy commented Jan 18, 2024

@TomeHirata tests/gateway/test_integration.py::test_invalid_query_request_raises failed. It could be related to the changes in this PR.

@TomeHirata TomeHirata force-pushed the gateway-rate-limit branch 2 times, most recently from defc7e8 to bd3e0ae Compare January 18, 2024 17:18
@TomeHirata TomeHirata marked this pull request as draft January 18, 2024 18:08
@TomeHirata
Contributor Author

TomeHirata commented Jan 19, 2024

@harupy The test failed due to the lack of request schema validation as a side effect of the endpoint signature change to just Request. I'm guessing we can resolve it by configuring a customized Route class, so let me investigate around that. Of course, any suggestions are appreciated.

@harupy
Member

harupy commented Jan 19, 2024

@TomeHirata Got it. Thanks for the explanation! Is it possible to remove validate_limit to avoid importing limits?

@TomeHirata
Contributor Author

@harupy It's possible, but don't we need a validation for limits?

@harupy
Member

harupy commented Jan 19, 2024

I don't think we need to validate limit values if doing so requires importing limits, but I'm curious what error slowapi would raise for an invalid limit value.

@TomeHirata
Contributor Author

TomeHirata commented Jan 19, 2024

I don't think we need to validate limit values if doing so requires importing limits, but I'm curious what error slowapi would raise for an invalid limit value.

If the format of the limit parameter is invalid, slowapi does not raise an exception and only logs an error, so rate limiting silently fails (ref). So I think it's better to validate the limit parameter. We could write our own validation, but it might diverge from slowapi's behavior unless we use the https://github.com/alisaifee/limits/ library, to which slowapi delegates the actual rate limiting.

@harupy
Member

harupy commented Jan 22, 2024

Got it, let's add validation. Can we import limits in validate_limit?

@TomeHirata TomeHirata marked this pull request as ready for review January 22, 2024 11:19
Signed-off-by: TomeHirata <tomu.hirata@gmail.com>
@github-actions github-actions bot added area/deployments MLflow Deployments client APIs, server, and third-party Deployments integrations area/docs Documentation issues rn/feature Mention under Features in Changelogs. labels Jan 22, 2024
@TomeHirata
Contributor Author

@harupy Hi, I've moved the import statement to the validation logic and updated the examples of the docs. Could you take another look?

Comment on lines +333 to +337
* **limit**: Specify the rate limit setting this endpoint will follow. The limit field contains the following fields:

* **renewal_period**: The time unit of the rate limit, one of [second|minute|hour|day|month|year].
* **calls**: The number of calls this endpoint will accept within the specified time unit.
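As an illustration of the two documented fields, here is a minimal standard-library sketch of a check on them. The PR itself validates through the `limits` library, so `check_limit_config` is a hypothetical stand-in, and requiring `calls` to be a positive integer is an assumption not stated in the docs above.

```python
# Time units listed in the documentation for `renewal_period`.
VALID_RENEWAL_PERIODS = {"second", "minute", "hour", "day", "month", "year"}


def check_limit_config(calls, renewal_period) -> str:
    """Validate the two fields and return the "<calls>/<period>" string."""
    # Assumption: calls must be a positive integer (bool is rejected explicitly
    # because it is a subclass of int in Python).
    if isinstance(calls, bool) or not isinstance(calls, int) or calls <= 0:
        raise ValueError(f"calls must be a positive integer, got {calls!r}")
    if renewal_period not in VALID_RENEWAL_PERIODS:
        raise ValueError(
            f"renewal_period must be one of {sorted(VALID_RENEWAL_PERIODS)}, "
            f"got {renewal_period!r}"
        )
    return f"{calls}/{renewal_period}"


print(check_limit_config(5, "minute"))  # 5/minute
```

Failing fast at config load time matters here because, as noted earlier in the thread, an invalid limit string would otherwise only produce a logged error and leave the endpoint unlimited.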

Member


thanks for updating the docs!

Member

@harupy harupy left a comment


LGTM!

@harupy harupy merged commit 6c0768a into mlflow:master Jan 24, 2024
38 checks passed
Labels
area/deployments MLflow Deployments client APIs, server, and third-party Deployments integrations area/docs Documentation issues rn/feature Mention under Features in Changelogs.