From a202231750febe06bce7f2cc6d904ef36a39ce42 Mon Sep 17 00:00:00 2001
From: fpagny
Date: Thu, 27 Feb 2025 18:10:42 +0100
Subject: [PATCH 1/2] Update rate-limits.mdx

---
 .../reference-content/rate-limits.mdx | 24 +++++++++----------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/pages/generative-apis/reference-content/rate-limits.mdx b/pages/generative-apis/reference-content/rate-limits.mdx
index f51b8b9df6..cef29d8141 100644
--- a/pages/generative-apis/reference-content/rate-limits.mdx
+++ b/pages/generative-apis/reference-content/rate-limits.mdx
@@ -21,28 +21,28 @@ Any model served through Scaleway Generative APIs gets limited by:
 
 These limits only apply if you created a Scaleway Account and registered a valid payment method. Otherwise, stricter limits apply to ensure usage stays within Free Tier only.
 
+## How can I increase the rate limits?
+
+We actively monitor usage and will improve rates based on feedback.
+If you need to increase your rate limits, [contact our support team](https://console.scaleway.com/support/create), providing details on the model used and specific use case.
+Note that for increase up to x5 or x10 volumes, we highly recommend to use dedicated deployments with [Managed Inference](https://console.scaleway.com/inference/deployments), which provides exactly the same feature and API compatibility.
+
 ### Chat models
 
 | Model string | Requests per minute | Total tokens per minute |
 |-----------------|-----------------|-----------------|
-| `llama-3.1-8b-instruct` | 300 | 100K |
-| `llama-3.1-70b-instruct` | 300 | 100K |
-| `mistral-nemo-instruct-2407`| 300 | 100K |
-| `pixtral-12b-2409`| 300 | 100K |
-| `qwen2.5-32b-instruct`| 300 | 100K |
+| `llama-3.1-8b-instruct` | 300 | 200K |
+| `llama-3.1-70b-instruct` | 300 | 200K |
+| `mistral-nemo-instruct-2407`| 300 | 200K |
+| `pixtral-12b-2409`| 300 | 200K |
+| `qwen2.5-32b-instruct`| 300 | 200K |
 
 ### Embedding models
 
 | Model string | Requests per minute | Input tokens per minute |
 |-----------------|-----------------|-----------------|
-| `sentence-t5-xxl` | 100 | 200K |
-| `bge-multilingual-gemma2` | 100 | 200K |
+| `bge-multilingual-gemma2` | 300 | 400K |
 
 ## Why do we set rate limits?
 
 These limits safeguard against abuse or misuse of Scaleway Generative APIs, helping to ensure fair access to the API with consistent performance.
-
-## How can I increase the rate limits?
-
-We actively monitor usage and will improve rates based on feedback.
-If you need to increase your rate limits, contact us via the support team, providing details on the model used and specific use case.
From f44c7a933ec63bcf2ea4b575d3a7070d0f01cb54 Mon Sep 17 00:00:00 2001
From: fpagny
Date: Fri, 28 Feb 2025 10:10:50 +0100
Subject: [PATCH 2/2] Update pages/generative-apis/reference-content/rate-limits.mdx

Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>
---
 pages/generative-apis/reference-content/rate-limits.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pages/generative-apis/reference-content/rate-limits.mdx b/pages/generative-apis/reference-content/rate-limits.mdx
index cef29d8141..809babe1da 100644
--- a/pages/generative-apis/reference-content/rate-limits.mdx
+++ b/pages/generative-apis/reference-content/rate-limits.mdx
@@ -25,7 +25,7 @@ These limits only apply if you created a valid
 
 We actively monitor usage and will improve rates based on feedback.
 If you need to increase your rate limits, [contact our support team](https://console.scaleway.com/support/create), providing details on the model used and specific use case.
-Note that for increase up to x5 or x10 volumes, we highly recommend to use dedicated deployments with [Managed Inference](https://console.scaleway.com/inference/deployments), which provides exactly the same feature and API compatibility.
+Note that for increases of up to x5 or x10 volumes, we highly recommend using dedicated deployments with [Managed Inference](https://console.scaleway.com/inference/deployments), which provides exactly the same features and API compatibility.
 
 ### Chat models
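
The requests-per-minute caps patched above typically surface to clients as HTTP 429 responses when exceeded. As a reviewer's note, here is a minimal sketch of a capped exponential backoff schedule a client could use between retries; the function name and the `base`/`cap` defaults are illustrative assumptions, not part of the patched docs or the Scaleway API:

```python
# Illustrative sketch, not from the Scaleway docs: capped exponential backoff
# delays for retrying rate-limited (HTTP 429) requests.

def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0):
    """Yield delays in seconds: base * 2**attempt, capped at `cap`."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))

# Delays a client would sleep between successive retries:
print(list(backoff_delays(6)))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

Adding jitter to each delay is a common refinement so that many throttled clients do not retry in lockstep.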