# How to handle rate limits

When you call the OpenAI API repeatedly, you may encounter error messages that say `429: 'Too Many Requests'` or `RateLimitError`. These errors mean that you have exceeded the API's rate limits.

This guide shares tips for avoiding and handling rate limit errors.

For an example script that throttles parallel requests to avoid rate limit errors, see [api_request_parallel_processor.py](api_request_parallel_processor.py).

## Why rate limits exist

Rate limits are a common practice for APIs, and they're put in place for a few different reasons.

- First, they help protect against abuse or misuse of the API. For example, a malicious actor could flood the API with requests in an attempt to overload it or cause disruptions in service. By setting rate limits, OpenAI can prevent this kind of activity.
- Second, rate limits help ensure that everyone has fair access to the API. If one person or organization makes an excessive number of requests, it could bog down the API for everyone else. By throttling the number of requests that a single user can make, OpenAI ensures that everyone has an opportunity to use the API without experiencing slowdowns.
- Lastly, rate limits can help OpenAI manage the aggregate load on its infrastructure. If requests to the API increase dramatically, it could tax the servers and cause performance issues. By setting rate limits, OpenAI can help maintain a smooth and consistent experience for all users.

Although hitting rate limits can be frustrating, rate limits exist to protect the reliable operation of the API for its users.

## Default rate limits

As of Jan 2023, the default rate limits are:

<table>
<thead>
  <tr>
    <th></th>
    <th>Text Completion &amp; Embedding endpoints </th>
    <th>Code &amp; Edit endpoints</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>Free trial users </td>
    <td>
        <ul>
            <li>20 requests / minute</li>
            <li>150,000 tokens / minute</li>
        </ul>
    </td>
    <td>
        <ul>
            <li>20 requests / minute</li>
            <li>150,000 tokens / minute</li>
        </ul>
    </td>
  </tr>
  <tr>
    <td>Pay-as-you-go users (in your first 48 hours)</td>
    <td>
        <ul>
            <li>60 requests / minute</li>
            <li>250,000 davinci tokens / minute (and proportionally more for cheaper models)</li>
        </ul>
    </td>
    <td>
        <ul>
            <li>20 requests / minute</li>
            <li>150,000 tokens / minute</li>
        </ul>
    </td>
  </tr>
  <tr>
    <td>Pay-as-you-go users (after your first 48 hours)</td>
    <td>
        <ul>
            <li>3,000 requests / minute</li>
            <li>250,000 davinci tokens / minute (and proportionally more for cheaper models)</li>
        </ul>
    </td>
    <td>
        <ul>
            <li>20 requests / minute</li>
            <li>150,000 tokens / minute</li>
        </ul>
    </td>
  </tr>
</tbody>
</table>

For reference, 1,000 tokens is roughly a page of text.
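
Because requests and tokens are limited separately, the limit that actually constrains a workload depends on how many tokens each request uses. The sketch below estimates the effective request rate under both limits. The function name and the `tokens_per_request` figure are illustrative assumptions, not part of the OpenAI library; the limit values are the pay-as-you-go numbers from the table above.

```python
def max_requests_per_minute(
    rpm_limit: int, tpm_limit: int, tokens_per_request: int
) -> float:
    """Estimate the request rate allowed once both limits are considered."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    # The token limit permits at most tpm_limit / tokens_per_request requests
    # per minute; the lower of the two ceilings is the one that binds.
    return min(rpm_limit, tpm_limit / tokens_per_request)


# Small requests (assumed 50 tokens each): the request limit binds.
print(max_requests_per_minute(3_000, 250_000, 50))

# Page-sized requests (assumed 1,000 tokens each): the token limit binds.
print(max_requests_per_minute(3_000, 250_000, 1_000))
```

If the request limit is the binding one, there is unused token capacity, which is the situation where batching several prompts into one request can help.
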

### Other rate limit resources

Read more about OpenAI's rate limits in these other resources:

- [Guide: Rate limits](https://beta.openai.com/docs/guides/rate-limits/overview)
- [Help Center: Is API usage subject to any rate limits?](https://help.openai.com/en/articles/5955598-is-api-usage-subject-to-any-rate-limits)
- [Help Center: How can I solve 429: 'Too Many Requests' errors?](https://help.openai.com/en/articles/5955604-how-can-i-solve-429-too-many-requests-errors)

### Requesting a rate limit increase

If you'd like your organization's rate limit increased, please fill out the following form:

- [OpenAI Rate Limit Increase Request form](https://forms.gle/56ZrwXXoxAN1yt6i9)


## Example rate limit error

A rate limit error occurs when API requests are sent too quickly. If you are using the OpenAI Python library, the error will look something like:

```
RateLimitError: Rate limit reached for default-codex in organization org-{id} on requests per min. Limit: 20.000000 / min. Current: 24.000000 / min. Contact support@openai.com if you continue to have issues or if you’d like to request an increase.
```

Below is example code for triggering a rate limit error.

```python
import openai  # for making OpenAI API requests

# request a bunch of completions in a loop
for _ in range(100):
    openai.Completion.create(
        model="code-cushman-001",
        prompt="def magic_function():\n\t",
        max_tokens=10,
    )
```


## How to avoid rate limit errors

### Retrying with exponential backoff

One easy way to avoid rate limit errors is to automatically retry requests with a random exponential backoff. Retrying with exponential backoff means performing a short sleep when a rate limit error is hit, then retrying the unsuccessful request. If the request is still unsuccessful, the sleep length is increased and the process is repeated. This continues until the request is successful or until a maximum number of retries is reached.

This approach has many benefits:

- Automatic retries mean you can recover from rate limit errors without crashes or missing data
- Exponential backoff means that your first retries can be tried quickly, while still benefiting from longer delays if your first few retries fail
- Adding random jitter to the delay helps keep retries from all hitting at the same time

Note that unsuccessful requests count toward your per-minute limit, so continuously resending a request won't work.

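
To make the growth of the sleep length concrete, the sketch below lists the delays that would be used before each retry. It uses the same update rule as the manual implementation later in this guide (multiply the delay by the base, optionally stretched by a random factor between 1 and 2). The function name `backoff_delays` and its defaults are assumptions made for illustration.

```python
import random


def backoff_delays(
    initial_delay: float = 1.0,
    exponential_base: float = 2.0,
    max_retries: int = 5,
    jitter: bool = True,
    rng=random.random,
):
    """Return the sleep length (in seconds) used before each retry."""
    delays = []
    delay = initial_delay
    for _ in range(max_retries):
        # Grow the delay exponentially; with jitter enabled, stretch it
        # further by a random factor in [1, 2).
        delay *= exponential_base * (1 + jitter * rng())
        delays.append(delay)
    return delays


# Without jitter the schedule is deterministic and doubles each time.
print(backoff_delays(jitter=False))

# With jitter each delay is at least as long as its deterministic counterpart.
print(backoff_delays(jitter=True))
```
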

Below are a few example solutions.

#### Example #1: Using the Tenacity library

[Tenacity](https://tenacity.readthedocs.io/en/latest/) is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything.

To add exponential backoff to your requests, you can use the `tenacity.retry` [decorator](https://peps.python.org/pep-0318/). The following example uses the `tenacity.wait_random_exponential` function to add random exponential backoff to a request.

Note that the Tenacity library is a third-party tool, and OpenAI makes no guarantees about its reliability or security.

```python
import openai  # for OpenAI API calls
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
)  # for exponential backoff


# retry the request with exponential backoff
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def completion_with_backoff(**kwargs):
    return openai.Completion.create(**kwargs)


completion_with_backoff(model="text-davinci-002", prompt="Once upon a time,")
```

```
<OpenAIObject text_completion id=cmpl-72EfrUviNnvZyvBliqxQdiC4vbFs5 at 0x11b72c0b0> JSON: {
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "logprobs": null,
      "text": " there was a beautiful dragon.\n\nShe was the most beautiful dragon anyone had"
    }
  ],
  "created": 1680767419,
  "id": "cmpl-72EfrUviNnvZyvBliqxQdiC4vbFs5",
  "model": "text-davinci-002",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 16,
    "prompt_tokens": 5,
    "total_tokens": 21
  }
}
```

#### Example #2: Using the backoff library

Another library that provides function decorators for backoff and retry is [backoff](https://pypi.org/project/backoff/).

Like Tenacity, the backoff library is a third-party tool, and OpenAI makes no guarantees about its reliability or security.

```python
import backoff  # for exponential backoff
import openai  # for OpenAI API calls


# retry the request with exponential backoff
@backoff.on_exception(backoff.expo, openai.error.RateLimitError)
def completions_with_backoff(**kwargs):
    return openai.Completion.create(**kwargs)


completions_with_backoff(model="text-davinci-002", prompt="Once upon a time,")
```

```
<OpenAIObject text_completion id=cmpl-72Emkh4vXhmQsqGsteCrQLZ6yZZkQ at 0x11b72c5f0> JSON: {
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "logprobs": null,
      "text": " there was a beautiful princess. She was kind and gentle, and loved by all"
    }
  ],
  "created": 1680767846,
  "id": "cmpl-72Emkh4vXhmQsqGsteCrQLZ6yZZkQ",
  "model": "text-davinci-002",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 16,
    "prompt_tokens": 5,
    "total_tokens": 21
  }
}
```

#### Example #3: Manual backoff implementation

If you don't want to use third-party libraries, you can implement your own backoff logic.

```python
# imports
import random
import time

import openai


# define a retry decorator
def retry_with_exponential_backoff(
    func,
    initial_delay: float = 1,
    exponential_base: float = 2,
    jitter: bool = True,
    max_retries: int = 10,
    errors: tuple = (openai.error.RateLimitError,),
):
    """Retry a function with exponential backoff."""

    def wrapper(*args, **kwargs):
        # Initialize variables
        num_retries = 0
        delay = initial_delay

        # Loop until a successful response or max_retries is hit or an exception is raised
        while True:
            try:
                return func(*args, **kwargs)

            # Retry on specified errors
            except errors as e:
                # Increment retries
                num_retries += 1

                # Check if max retries has been reached
                if num_retries > max_retries:
                    raise Exception(
                        f"Maximum number of retries ({max_retries}) exceeded."
                    )

                # Increment the delay
                delay *= exponential_base * (1 + jitter * random.random())

                # Sleep for the delay
                time.sleep(delay)

            # Raise exceptions for any errors not specified
            except Exception as e:
                raise e

    return wrapper


# A decorator is a function that takes another function as its argument and
# returns a new function wrapping it. Writing "@decorator" above a function
# definition applies the decorator, adding behavior (such as caching, input
# validation, or authorization) without modifying the decorated function's
# source code.
@retry_with_exponential_backoff
def completions_with_backoff(**kwargs):
    return openai.Completion.create(**kwargs)


completions_with_backoff(model="text-davinci-002", prompt="Once upon a time,")
```

```
<OpenAIObject text_completion id=cmpl-5oowRsCXv3AkUgVJyyo3TQrVq7hIT at 0x111024220> JSON: {
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "logprobs": null,
      "text": " a man decided to greatly improve his karma by turning his life around.\n\n"
    }
  ],
  "created": 1662793903,
  "id": "cmpl-5oowRsCXv3AkUgVJyyo3TQrVq7hIT",
  "model": "text-davinci-002",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 16,
    "prompt_tokens": 5,
    "total_tokens": 21
  }
}
```

## How to maximize throughput of batch processing given rate limits

If you're processing real-time requests from users, backoff and retry is a great strategy to minimize latency while avoiding rate limit errors.

However, if you're processing large volumes of batch data, where throughput matters more than latency, there are a few other things you can do in addition to backoff and retry.

### Proactively adding delay between requests

If you are constantly hitting the rate limit, then backing off, then hitting the rate limit again, then backing off again, it's possible that a good fraction of your request budget will be 'wasted' on requests that need to be retried. This limits your processing throughput, given a fixed rate limit.

Here, one potential solution is to calculate your rate limit and add a delay equal to its reciprocal (e.g., if your rate limit is 20 requests per minute, add a delay of 3–6 seconds to each request). This can help you operate near the rate limit ceiling without hitting it and incurring wasted requests.

#### Example of adding delay to a request

```python
# imports
import time
import openai


# Define a function that adds a delay to a Completion API call
def delayed_completion(delay_in_seconds: float = 1, **kwargs):
    """Delay a completion by a specified amount of time."""

    # Sleep for the delay
    time.sleep(delay_in_seconds)

    # Call the Completion API and return the result
    return openai.Completion.create(**kwargs)


# Calculate the delay based on your rate limit
rate_limit_per_minute = 20
delay = 60.0 / rate_limit_per_minute

delayed_completion(
    delay_in_seconds=delay,
    model="text-davinci-002",
    prompt="Once upon a time,"
)
```

```
<OpenAIObject text_completion id=cmpl-5oowVVZnAzdCPtUJ0rifeamtLcZRp at 0x11b2c7680> JSON: {
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "logprobs": null,
      "text": " there was an idyllic little farm that sat by a babbling brook"
    }
  ],
  "created": 1662793907,
  "id": "cmpl-5oowVVZnAzdCPtUJ0rifeamtLcZRp",
  "model": "text-davinci-002",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 16,
    "prompt_tokens": 5,
    "total_tokens": 21
  }
}
```
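
A fixed sleep before every call also waits out time that the previous request already spent in flight. As a possible refinement, the sketch below sleeps only for the part of the interval that has not yet elapsed. The `RequestThrottler` class is a hypothetical helper written for this guide, not part of the OpenAI library; the `clock` and `sleep` parameters exist only so the behavior can be checked without real waiting.

```python
import time


class RequestThrottler:
    """Space out calls so that at most `rate_limit_per_minute` start per minute."""

    def __init__(self, rate_limit_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rate_limit_per_minute
        self.clock = clock
        self.sleep = sleep
        self.last_start = None  # time at which the previous call was released

    def wait(self):
        """Block until a full interval has passed since the previous call."""
        now = self.clock()
        if self.last_start is not None:
            remaining = self.min_interval - (now - self.last_start)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_start = now


throttler = RequestThrottler(rate_limit_per_minute=20)

# Intended usage (commented out so the sketch runs without an API key):
# for prompt in prompts:
#     throttler.wait()
#     openai.Completion.create(model="text-davinci-002", prompt=prompt)
```
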



### Batching requests

The OpenAI API has separate limits for requests per minute and tokens per minute.

If you're hitting the limit on requests per minute, but have headroom on tokens per minute, you can increase your throughput by batching multiple tasks into each request. This will allow you to process more tokens per minute, especially with the smaller models.

Sending in a batch of prompts works exactly the same as a normal API call, except that you pass in a list of strings to the `prompt` parameter instead of a single string.

**Warning:** the response object may not return completions in the order of the prompts, so always remember to match responses back to prompts using the `index` field.

#### Example without batching

```python
import openai  # for making OpenAI API requests


num_stories = 10
prompt = "Once upon a time,"

# serial example, with one story completion per request
for _ in range(num_stories):
    response = openai.Completion.create(
        model="curie",
        prompt=prompt,
        max_tokens=20,
    )

    # print story
    print(prompt + response.choices[0].text)
```

```
Once upon a time, before there were grandiloquent tales of the massacre at Fort Mims, there were stories of
Once upon a time, a full-sized search and rescue was created. However, CIDIs are the addition of requiring
Once upon a time, Schubert was hot with the films. “Schubert sings of honey, flowers,
Once upon a time, you could watch these films on your VCR, sometimes years after their initial theatrical release, and there
Once upon a time, there was a forest. In that forest, the forest animals ruled. The forest animals had their homes
Once upon a time, there were two programs that complained about false positive scans. Peacock and Midnight Manager alike, only
Once upon a time, a long, long time ago, tragedy struck. it was the darkest of nights, and there was
Once upon a time, when Adam was a perfect little gentleman, he was presented at Court as a guarantee of good character.
Once upon a time, Adam and Eve made a mistake. They ate the fruit from the tree of immortality and split the co
```

#### Example with batching

```python
import openai  # for making OpenAI API requests


num_stories = 10
prompts = ["Once upon a time,"] * num_stories

# batched example, with 10 story completions per request
response = openai.Completion.create(
    model="curie",
    prompt=prompts,
    max_tokens=20,
)

# match completions to prompts by index
stories = [""] * len(prompts)
for choice in response.choices:
    stories[choice.index] = prompts[choice.index] + choice.text

# print stories
for story in stories:
    print(story)
```

```
Once upon a time, there were two sisters, Eliza Pickering and Ariana 'Ari' Lucas. When these lovely
Once upon a time, Keene was stung by a worm — actually, probably a python — snaking through his leg
Once upon a time, there was a professor of physics during the depression. It was difficult, during this time, to get
Once upon a time, before you got sick, you told stories to all and sundry, and your listeners believed in you
Once upon a time, there was one very old nice donkey. He was incredibly smart, in a very old, kind of
Once upon a time, the property of a common lodging house was a common cup for all the inhabitants. Betimes a constant
Once upon a time, in an unspecified country, there was a witch who had an illegal product. It was highly effective,
Once upon a time, a long time ago, I turned 13, my beautiful dog Duncan swept me up into his jaws like
Once upon a time, as a thoroughly reformed creature from an army of Nazis, he took On Judgement Day myself and his
Once upon a time, C
```
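
When there are more prompts than you want to send in a single request, one option is to split them into fixed-size batches and restore prompt order within each batch using `index`. The sketch below shows both steps on plain Python data. The helper names and the batch size are assumptions for illustration, and the dictionaries merely stand in for the choices of a real response.

```python
from typing import Dict, Iterator, List


def chunk_prompts(prompts: List[str], batch_size: int = 20) -> Iterator[List[str]]:
    """Yield consecutive slices of `prompts`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(prompts), batch_size):
        yield prompts[start:start + batch_size]


def texts_in_prompt_order(choices: List[Dict], num_prompts: int) -> List[str]:
    """Place each completion text at the position given by its `index` field."""
    texts = [""] * num_prompts
    for choice in choices:
        texts[choice["index"]] = choice["text"]
    return texts


# Five prompts in batches of two give batches of sizes 2, 2, and 1.
batches = list(chunk_prompts(["a", "b", "c", "d", "e"], batch_size=2))
print([len(batch) for batch in batches])

# Stand-in choices that arrive out of order are put back in prompt order.
shuffled = [{"index": 1, "text": " second"}, {"index": 0, "text": " first"}]
print(texts_in_prompt_order(shuffled, num_prompts=2))
```
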

## Example parallel processing script

We've written an example script for parallel processing large quantities of API requests: [api_request_parallel_processor.py](api_request_parallel_processor.py).

The script combines some handy features:

- Streams requests from file, to avoid running out of memory for giant jobs
- Makes requests concurrently, to maximize throughput
- Throttles both request and token usage, to stay under rate limits
- Retries failed requests, to avoid missing data
- Logs errors, to diagnose problems with requests

Feel free to use it as is or modify it to suit your needs.
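
To illustrate what throttling both request and token usage can look like, here is a minimal sketch of a capacity tracker that refills both budgets continuously over time. It is not taken from the script itself; the `CapacityTracker` class, its method names, and the refill rule are assumptions chosen for illustration.

```python
import time


class CapacityTracker:
    """Track remaining request and token capacity, refilled as time passes."""

    def __init__(self, max_requests_per_minute, max_tokens_per_minute, clock=time.monotonic):
        self.max_requests = max_requests_per_minute
        self.max_tokens = max_tokens_per_minute
        self.available_requests = float(max_requests_per_minute)
        self.available_tokens = float(max_tokens_per_minute)
        self.clock = clock
        self.last_update = clock()

    def _refill(self):
        # Add back capacity in proportion to elapsed time, capped at the limit.
        now = self.clock()
        elapsed = now - self.last_update
        self.last_update = now
        self.available_requests = min(
            self.max_requests,
            self.available_requests + self.max_requests * elapsed / 60.0,
        )
        self.available_tokens = min(
            self.max_tokens,
            self.available_tokens + self.max_tokens * elapsed / 60.0,
        )

    def try_consume(self, tokens):
        """Reserve one request of `tokens` tokens; return False if it must wait."""
        self._refill()
        if self.available_requests >= 1 and self.available_tokens >= tokens:
            self.available_requests -= 1
            self.available_tokens -= tokens
            return True
        return False


tracker = CapacityTracker(max_requests_per_minute=20, max_tokens_per_minute=150_000)
print(tracker.try_consume(tokens=500))
```

A caller would check `try_consume` before sending each request and pause briefly whenever it returns `False`.
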