New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rate limit to deployment api #10779
Conversation
Documentation preview for 7d1aa2b will be available here when this CircleCI job completes successfully. More info
|
@harupy would you mind taking a look at this PR when you have time? |
@TomeHirata Thanks for filing the PR! I was taking a vacation. I'll review this PR this week. |
@harupy Thank you for reviewing this PR! Can I ask if there is anything I should add or modify on this PR? |
@TomeHirata I'm manually testing this PR now :) |
Does rate limiting work correctly for you? I'm testing with the following config but hit
|
assert resp.json() == test_response | ||
|
||
|
||
def test_rate_limit(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice test!
This change fixes the issue: diff --git a/mlflow/deployments/server/app.py b/mlflow/deployments/server/app.py
index 235a075c90..6af2b0717d 100644
--- a/mlflow/deployments/server/app.py
+++ b/mlflow/deployments/server/app.py
@@ -68,12 +68,12 @@ class GatewayAPI(FastAPI):
methods=["POST"],
)
# TODO: Remove Gateway server URLs after deprecation window elapses
- self.add_api_route(
- path=f"{MLFLOW_GATEWAY_ROUTE_BASE}{route.name}{MLFLOW_QUERY_SUFFIX}",
- endpoint=_route_type_to_endpoint(route, limiter),
- methods=["POST"],
- include_in_schema=False,
- )
+ # self.add_api_route(
+ # path=f"{MLFLOW_GATEWAY_ROUTE_BASE}{route.name}{MLFLOW_QUERY_SUFFIX}",
+ # endpoint=_route_type_to_endpoint(route, limiter),
+ # methods=["POST"],
+ # include_in_schema=False,
+ # )
self.dynamic_routes[route.name] = route.to_route()
def get_dynamic_route(self, route_name: str) -> Optional[Route]: |
laurentS/slowapi#173 might be the cause. For backward compatibility, we create two routes for one endpoint, one with diff --git a/mlflow/deployments/server/app.py b/mlflow/deployments/server/app.py
index 235a075c90..133864f5f3 100644
--- a/mlflow/deployments/server/app.py
+++ b/mlflow/deployments/server/app.py
@@ -64,13 +64,13 @@ class GatewayAPI(FastAPI):
path=(
MLFLOW_DEPLOYMENTS_ENDPOINTS_BASE + route.name + MLFLOW_DEPLOYMENTS_QUERY_SUFFIX
),
- endpoint=_route_type_to_endpoint(route, limiter),
+ endpoint=_route_type_to_endpoint(route, limiter, "deployments"),
methods=["POST"],
)
# TODO: Remove Gateway server URLs after deprecation window elapses
self.add_api_route(
path=f"{MLFLOW_GATEWAY_ROUTE_BASE}{route.name}{MLFLOW_QUERY_SU
FFIX}",
- endpoint=_route_type_to_endpoint(route, limiter),
+ endpoint=_route_type_to_endpoint(route, limiter, "gateway"),
methods=["POST"],
include_in_schema=False,
)
@@ -115,7 +115,7 @@ async def _custom(request: Request):
return request.json()
-def _route_type_to_endpoint(config: RouteConfig, limiter: Limiter):
+def _route_type_to_endpoint(config: RouteConfig, limiter: Limiter, key: str):
provider_to_factory = {
RouteType.LLM_V1_CHAT: _create_chat_endpoint,
RouteType.LLM_V1_COMPLETIONS: _create_completions_endpoint,
@@ -125,6 +125,7 @@ def _route_type_to_endpoint(config: RouteConfig, limiter: L
imiter):
handler = factory(config)
if limit := config.limit:
limit_value = f"{limit.calls}/{limit.renewal_period}"
+ handler.__name__ = f"{handler.__name__}_{config.name}_{key}"
return limiter.limit(limit_value)(handler)
else:
return handler |
@harupy Thank you for raising the issue, this is a nice catch. I think you're right, the Limiter object stores the rate limits settings as a dict whose key is |
2d833eb
to
3e72dbe
Compare
@TomeHirata |
defc7e8
to
bd3e0ae
Compare
@harupy The test failed due to the lack of request schema validation as a side effect of the endpoint signature change to just |
@TomeHirata Got it. Thanks for the explanation! Is it possible to remove |
@harupy It's possible, but don't we need a validation for limits? |
I don't think we need to validate limit values if we need to import |
If the format of the limit parameter is invalid, slowapi does not raise an exception and just emits a log error, leading to a failure of rate limiting (ref). So I think it's better to have a validation for the limit parameter. We could have our own validation, but there might be a discrepancy unless we use https://github.com/alisaifee/limits/ library which slowapi delegates the actual rate limitting. |
Got it, let's add validation. Can we import |
Signed-off-by: TomeHirata <tomu.hirata@gmail.com>
Signed-off-by: TomeHirata <tomu.hirata@gmail.com>
Signed-off-by: TomeHirata <tomu.hirata@gmail.com>
Signed-off-by: TomeHirata <tomu.hirata@gmail.com>
Signed-off-by: TomeHirata <tomu.hirata@gmail.com>
Signed-off-by: TomeHirata <tomu.hirata@gmail.com>
e0287cb
to
acca49f
Compare
Signed-off-by: TomeHirata <tomu.hirata@gmail.com>
e0aa4fa
to
7d1aa2b
Compare
@harupy Hi, I've moved the import statement to the validation logic and updated the examples of the docs. Could you take another look? |
* **limit**: Specify the rate limit setting this endpoint will follow. The limit field contains the following fields: | ||
|
||
* **renewal_period**: The time unit of the rate limit, one of [second|minute|hour|day|month|year]. | ||
* **calls**: The number of calls this endpoint will accept within the specified time unit. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for updating the docs!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Related Issues/PRs
#9939
Copy of [2023-11-18] One-Decision Doc_ Rate limits for OSS AI Gateway.pdf
What changes are proposed in this pull request?
This PR will introduce the ability to configure endpoint-level rate limits for MLflow Deployments Server
How is this PR tested?
Does this PR require documentation update?
Release Notes
Is this a user-facing change?
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts
: Artifact stores and artifact loggingarea/build
: Build and test infrastructure for MLflowarea/deployments
: MLflow Deployments client APIs, server, and third-party Deployments integrationsarea/docs
: MLflow documentation pagesarea/examples
: Example codearea/model-registry
: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models
: MLmodel format, model serialization/deserialization, flavorsarea/recipes
: Recipes, Recipe APIs, Recipe configs, Recipe Templatesarea/projects
: MLproject format, project running backendsarea/scoring
: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra
: MLflow Tracking server backendarea/tracking
: Tracking Service, tracking client APIs, autologgingInterface
area/uiux
: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker
: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy
: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows
: Windows supportLanguage
language/r
: R APIs and clientslanguage/java
: Java APIs and clientslanguage/new
: Proposals for new client languagesIntegrations
integrations/azure
: Azure and Azure ML integrationsintegrations/sagemaker
: SageMaker integrationsintegrations/databricks
: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/none
- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/breaking-change
- The PR will be mentioned in the "Breaking Changes" sectionrn/feature
- A new user-facing feature worth mentioning in the release notesrn/bug-fix
- A user-facing bug fix worth mentioning in the release notesrn/documentation
- A user-facing documentation change worth mentioning in the release notes