From 8c0f5a01635855f105e53d8517a92576e018f134 Mon Sep 17 00:00:00 2001
From: fpagny
Date: Fri, 4 Apr 2025 11:07:19 +0200
Subject: [PATCH 1/2] feat(genapi): add maximum concurrent requests

---
 .../additional-content/organization-quotas.mdx | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/pages/organizations-and-projects/additional-content/organization-quotas.mdx b/pages/organizations-and-projects/additional-content/organization-quotas.mdx
index d77818bb6b..5c801903be 100644
--- a/pages/organizations-and-projects/additional-content/organization-quotas.mdx
+++ b/pages/organizations-and-projects/additional-content/organization-quotas.mdx
@@ -168,6 +168,7 @@ Managed Inference Deployments are limited to a maximum number of nodes, dependin
 Generative APIs are rate limited based on:
 - Tokens per minute (total input and output tokens)
 - Requests per minute
+- Concurrent requests (total active HTTP session at the same time)
 
 [Contact our support team](https://console.scaleway.com/support/create) if you want to increase your quotas above these limits.
 
@@ -194,6 +195,9 @@ Generative APIs are rate limited based on:
 | qwen2.5-32b-instruct | 300 | 300 |
 | bge-multilingual-gemma2 | 300 | 300 |
 
+| Concurrent requests | [Payment method validated](/billing/how-to/add-payment-method/#how-to-add-a-credit-card) | Payment method and [identity validated](/account/how-to/verify-identity/) |
+|-------------|:----------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------:|
+| All models | 25 | 25 |
 
 ## Apple silicon
 

From 1e235824ef1da0214d3a3a693e3c8d76c0d6d7a8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?N=C3=A9da?= <87707325+nerda-codes@users.noreply.github.com>
Date: Fri, 4 Apr 2025 15:11:07 +0200
Subject: [PATCH 2/2] Update pages/organizations-and-projects/additional-content/organization-quotas.mdx

---
 .../additional-content/organization-quotas.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pages/organizations-and-projects/additional-content/organization-quotas.mdx b/pages/organizations-and-projects/additional-content/organization-quotas.mdx
index 5c801903be..f61a701229 100644
--- a/pages/organizations-and-projects/additional-content/organization-quotas.mdx
+++ b/pages/organizations-and-projects/additional-content/organization-quotas.mdx
@@ -168,7 +168,7 @@ Managed Inference Deployments are limited to a maximum number of nodes, dependin
 Generative APIs are rate limited based on:
 - Tokens per minute (total input and output tokens)
 - Requests per minute
-- Concurrent requests (total active HTTP session at the same time)
+- Concurrent requests (total active HTTP sessions at the same time)
 
 [Contact our support team](https://console.scaleway.com/support/create) if you want to increase your quotas above these limits.