[DO NOT REVIEW] AI gateway rate limits design & quick POC #9939

harupy · 2023-10-13T10:25:19Z

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

mlflow gateway start --config-path examples/gateway/openai/config.yaml

curl -X 'POST' 'http://127.0.0.1:5000/gateway/chat/invocations' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
  "max_tokens": 10,
  "messages": [
    {
      "role": "user",
      "content": "hello"
    }
  ]
}'
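
For reference, the same request can be built from Python. This is a minimal sketch using only the standard library; the helper name build_chat_request and the GATEWAY_URL constant are illustrative and not part of the PR. It only constructs the request, so it runs without a gateway; the commented lines show how it could be sent to a running one.

```python
import json
import urllib.request

# Address used in the curl command above.
GATEWAY_URL = "http://127.0.0.1:5000"


def build_chat_request(content: str, max_tokens: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat route's invocations endpoint."""
    payload = {
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{GATEWAY_URL}/gateway/chat/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("hello")

# To send it against a running gateway:
#     with urllib.request.urlopen(request) as response:
#         print(json.load(response))
```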

How is this PR tested?

Existing unit/integration tests
New unit/integration tests
Manual tests

Does this PR require documentation update?

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

github-actions · 2023-10-13T10:25:36Z

Documentation preview for 15ce328 will be available here when this CircleCI job completes successfully.

More info

Ignore this comment if this PR does not change the documentation.
It takes a few minutes for the preview to be available.
The preview is updated when a new commit is pushed to this PR.
This comment was created by https://github.com/mlflow/mlflow/actions/runs/6529357779.

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy · 2023-10-16T05:27:36Z

examples/gateway/openai/config.yaml

+    limit:
+      renewal_period: "minute"
+      calls: 1


Global limit vs. per-route limit.

per-route limit
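
One way to picture the per-route option is the sketch below. It takes a dict that mirrors the example config and turns each route's limit block into a "calls/period" string, the notation slowapi accepts for limits. The Limit class, the per_route_limits helper, and the set of allowed periods are assumptions made for illustration, not the schema the PR settled on.

```python
from dataclasses import dataclass

# Periods accepted by this sketch; an assumption, not taken from the PR.
ALLOWED_PERIODS = {"second", "minute", "hour", "day", "month", "year"}


@dataclass(frozen=True)
class Limit:
    calls: int
    renewal_period: str

    def __post_init__(self) -> None:
        # Reject values that cannot form a meaningful limit.
        if self.calls <= 0:
            raise ValueError(f"calls must be positive, got {self.calls}")
        if self.renewal_period not in ALLOWED_PERIODS:
            raise ValueError(f"unsupported renewal_period: {self.renewal_period!r}")

    def to_limit_string(self) -> str:
        # e.g. Limit(calls=1, renewal_period="minute") -> "1/minute"
        return f"{self.calls}/{self.renewal_period}"


def per_route_limits(config: dict) -> dict:
    """Map each route name to its limit string; routes without a limit are skipped."""
    limits = {}
    for route in config.get("routes", []):
        if "limit" in route:
            limits[route["name"]] = Limit(**route["limit"]).to_limit_string()
    return limits


# Dict equivalent of the example config with the proposed limit block.
config = {
    "routes": [
        {
            "name": "chat",
            "route_type": "llm/v1/chat",
            "limit": {"renewal_period": "minute", "calls": 1},
        }
    ]
}
limits = per_route_limits(config)
```

A global limit would instead be a single Limit applied to every route, which is simpler to configure but cannot distinguish cheap routes from expensive ones.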

harupy · 2023-10-16T05:27:58Z

mlflow/gateway/app.py

@@ -6,8 +6,14 @@
 from fastapi.openapi.docs import get_swagger_ui_html
 from fastapi.responses import FileResponse, RedirectResponse
 from pydantic import BaseModel
+from slowapi import Limiter, _rate_limit_exceeded_handler


https://github.com/laurentS/slowapi

https://github.com/laurentS/slowapi/blob/master/docs/examples.md#examples

harupy · 2023-10-16T07:09:45Z

examples/gateway/openai/config.yaml

@@ -1,6 +1,9 @@
 routes:
  - name: chat
    route_type: llm/v1/chat
+    limit:


Do we need multiple limits?

harupy · 2023-10-20T01:13:11Z

@TomeHirata Would you be interested in working on this?

TomeHirata · 2023-10-23T04:03:05Z

@harupy Yes, is there any ticket for this task? If so, please assign this task to me!

TomeHirata · 2023-12-18T22:23:54Z

@harupy I read the doc and understood the overall direction. Before starting the implementation, can I ask if there is a way for me to see this link?
I also saw that mlflow.gateway.get_limits is deprecated. Should I reuse that method or create a new one?

harupy · 2023-12-19T01:39:40Z

can I ask if there is a way for me to see this link?

Sorry, this is a link to our internal repo.

I also saw that mlflow.gateway.get_limits is deprecated. Should I reuse that method or create a new one?

mlflow.gateway has been deprecated and replaced by mlflow.deployments. We no longer have get_limits. The rate limit info should be included in the response of get_endpoint. I can take care of that part.

TomeHirata · 2023-12-20T15:27:24Z

Thank you for the reply. So what did the link refer to? Is it the same as the example code in the design doc?
Regarding the scope of the change, is my understanding correct that the necessary changes are as follows? Am I missing anything?

Read the rate limit configuration from the endpoint yaml
Implement the rate limiting with slowapi
Include the rate limit info in the response of get_endpoint
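
The third item could look roughly like the sketch below. The RateLimit and Endpoint classes and the describe_endpoint helper are hypothetical stand-ins and do not reflect the actual mlflow.deployments types; the point is only that the limit travels with the endpoint description and is empty when no limit is configured.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RateLimit:
    calls: int
    renewal_period: str


@dataclass
class Endpoint:
    name: str
    endpoint_type: str
    # None means no rate limit is configured for this endpoint.
    limit: Optional[RateLimit] = None


def describe_endpoint(endpoint: Endpoint) -> dict:
    """Serialize an endpoint, including its rate limit when one is set."""
    return asdict(endpoint)


chat = Endpoint("chat", "llm/v1/chat", RateLimit(calls=1, renewal_period="minute"))
response = describe_endpoint(chat)
```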

harupy · 2023-12-21T00:21:05Z

Is it the same as the example code in the design doc?

Yes :)

Regarding the scope of the change, is my understanding correct that the necessary changes are as follows? Am I missing anything?

Yes, that's correct!

TomeHirata · 2023-12-21T00:34:00Z

Thank you, I will start working on this shortly. Should I take over this PR or create a new one?
Update: I cannot take over this PR since this is not my repo, so I will create a new one.

harupy marked this pull request as draft October 13, 2023 10:25

github-actions bot added the rn/none label (List under Small Changes in Changelogs.) Oct 13, 2023

harupy added 2 commits October 16, 2023 14:20

rate-limits

2093af6

Signed-off-by: harupy <hkawamura0130@gmail.com>

Make storage_uri configurable

15ce328

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy force-pushed the gateway-rate-limits branch from a51c5e0 to 15ce328 Compare October 16, 2023 05:20

harupy changed the title ~~[POC] Gateway rate limits~~ [POC] Gateway rate limits design Oct 20, 2023

harupy changed the title ~~[POC] Gateway rate limits design~~ [Gateway rate limits design & quick POC Oct 20, 2023

harupy changed the title ~~[Gateway rate limits design & quick POC~~ AI gateway rate limits design & quick POC Oct 20, 2023

harupy changed the title ~~AI gateway rate limits design & quick POC~~ [DO NOT REVIEW] AI gateway rate limits design & quick POC Oct 20, 2023

harupy mentioned this pull request Nov 16, 2023

Add a guard job to prevent accidental auto-merging of PRs when cross-version tests fail #10210

Merged

37 tasks

TomeHirata mentioned this pull request Jan 4, 2024

Add rate limit to deployment api #10779

Merged

37 tasks

harupy closed this Feb 21, 2024


serena-ruan Oct 17, 2023
