19 changes: 18 additions & 1 deletion src/quota/__init__.py
@@ -1 +1,18 @@
"""Quota management."""
"""Quota management.
Tokens and token quota limits
Tokens are small chunks of text, which can be as small as one character or as
large as one word. Tokens are the units of measurement used to quantify the
amount of text that the service sends to, or receives from, a large language
model (LLM). Every interaction with the Service and the LLM is counted in
tokens.
LLM providers typically charge for their services using a token-based pricing model.
Token quota limits define the number of tokens that can be used in a certain
timeframe. Implementing token quota limits helps control costs, encourage more
efficient use of queries, and regulate demand on the system. In a multi-user
configuration, token quota limits help provide equal access to all users
ensuring everyone has an opportunity to submit queries.
Comment on lines +5 to +17
⚠️ Potential issue | 🟡 Minor

Correct the token description and add the missing comma.

Tokens in BPE vocabularies can span more than one full word, so saying they are “as large as one word” is inaccurate. The last sentence also needs a comma before “ensuring” to set off the participial phrase. Please tighten both sentences.

-Tokens are small chunks of text, which can be as small as one character or as
-large as one word.
+Tokens are small chunks of text, which can be as small as a single character and
+may span multiple characters, including whitespace or fragments of words.
@@
-In a multi-user configuration, token quota limits help provide equal access to all users
-ensuring everyone has an opportunity to submit queries.
+In a multi-user configuration, token quota limits help provide equal access to all users,
+ensuring everyone has an opportunity to submit queries.
Suggested change
Tokens are small chunks of text, which can be as small as a single character and
may span multiple characters, including whitespace or fragments of words. Tokens are the units of measurement used to quantify the
amount of text that the service sends to, or receives from, a large language
model (LLM). Every interaction with the Service and the LLM is counted in
tokens.
LLM providers typically charge for their services using a token-based pricing model.
Token quota limits define the number of tokens that can be used in a certain
timeframe. Implementing token quota limits helps control costs, encourage more
efficient use of queries, and regulate demand on the system. In a multi-user
configuration, token quota limits help provide equal access to all users,
ensuring everyone has an opportunity to submit queries.
🤖 Prompt for AI Agents
In src/quota/__init__.py around lines 5 to 17, the token definition is
inaccurate and the final sentence is missing a comma: update the token sentence
to say tokens can be smaller than a word or span multiple words (e.g.,
subword/BPE units) instead of “as large as one word,” and add a comma before
“ensuring” in the final sentence so it reads “…access to all users, ensuring
everyone has an opportunity to submit queries.”

"""
32 changes: 31 additions & 1 deletion src/quota/quota_limiter.py
@@ -1,4 +1,34 @@
"""Abstract class that is the parent for all quota limiter implementations."""
"""Abstract class that is the parent for all quota limiter implementations.

Quota usage can be limited per user or per service or group of services (which
typically run in one cluster). Each limit is configured as a separate _quota
limiter_ of type `user_limiter` or `cluster_limiter` (a name that makes sense
in an OpenShift deployment). There are three configuration options for each
limiter:

1. `period` is specified in a human-readable form; see
https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT
for all possible options. When the end of the period is reached, the quota is
reset or increased.
1. `initial_quota` is the value the quota is set to at the beginning of the period.
1. `quota_increase` (if specified) is the value by which the quota is increased
when the end of the period is reached.

There are two basic use cases:

1. When the quota needs to be reset to a specific value periodically (for
example, on a weekly or monthly basis), set `initial_quota` to the required value.
1. When the quota needs to be increased by a specific value periodically (for
example, on a daily basis), specify `quota_increase`.

Technically, it is possible to specify both `initial_quota` and
`quota_increase`. In that case, at the end of each period the quota is
*reset* to `initial_quota + quota_increase`.

Please note that any number of quota limiters can be configured. For example,
two user quota limiters can be set to:
- increase quota by 100,000 tokens each day
- reset quota to 10,000,000 tokens each month
"""

from abc import ABC, abstractmethod
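
The period-end rules described in the docstring (reset, increase, or both) can be sketched as a single function. This is a hedged illustration under stated assumptions, not the limiter implemented in this PR: the function name `quota_at_period_end` is hypothetical, and treating an unset option as `None` is an assumption about how "not specified" is represented.

```python
from typing import Optional


def quota_at_period_end(
    current: int,
    initial_quota: Optional[int] = None,
    quota_increase: Optional[int] = None,
) -> int:
    """Return the quota value after a period ends (hypothetical helper)."""
    if initial_quota is not None:
        # Reset use case; if quota_increase is also set, the docstring says
        # the quota is reset to initial_quota + quota_increase.
        return initial_quota + (quota_increase or 0)
    if quota_increase is not None:
        # Increase use case: add to whatever quota is left.
        return current + quota_increase
    # Neither option is set: assume the quota stays unchanged.
    return current
```

With the two example limiters from the docstring, a daily limiter would call this with `quota_increase=100_000` and a monthly limiter with `initial_quota=10_000_000`, each on its own period.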
