From 6db955484f00149095a2f455ac0e789a6844be2a Mon Sep 17 00:00:00 2001
From: fpagny
Date: Tue, 3 Jun 2025 14:23:26 +0200
Subject: [PATCH 1/5] fix(genapi): update quota documentation

---
 pages/generative-apis/troubleshooting/fixing-common-issues.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pages/generative-apis/troubleshooting/fixing-common-issues.mdx b/pages/generative-apis/troubleshooting/fixing-common-issues.mdx
index 1f6a986d79..821c10c5b0 100644
--- a/pages/generative-apis/troubleshooting/fixing-common-issues.mdx
+++ b/pages/generative-apis/troubleshooting/fixing-common-issues.mdx
@@ -90,9 +90,9 @@ Below are common issues that you may encounter when using Generative APIs, their
 ### Solution
 - Smooth out your API requests rate by limiting the number of API requests you perform over a given minute so that you remain below your [Organization quotas for Generative APIs](/organizations-and-projects/additional-content/organization-quotas/#generative-apis).
 - [Add a payment method](/billing/how-to/add-payment-method/#how-to-add-a-credit-card) and [validate your identity](/account/how-to/verify-identity/) to increase automatically your quotas [based on standard limits](/organizations-and-projects/additional-content/organization-quotas/#generative-apis).
-- [Ask our support](https://console.scaleway.com/support/tickets/create) to raise your quota.
 - Reduce the size of the input or output tokens processed by your API requests.
 - Use [Managed Inference](/managed-inference/), where these quotas do not apply (your throughput will be only limited by the amount of Inference Deployment your provision)
+- Contact your existing Scaleway account manager or [our Sales team ](https://www.scaleway.com/en/contact-sales/) to discuss about volume commitment for specific models that will allow us to increase your quota proportionnaly.
 
 ## 429: Too Many Requests - You exceeded your current threshold of concurrent requests
 

From 04c80dfcea67ea491ee167be9f921b1d41021508 Mon Sep 17 00:00:00 2001
From: Benedikt Rollik
Date: Tue, 3 Jun 2025 14:31:52 +0200
Subject: [PATCH 2/5] Apply suggestions from code review

---
 pages/generative-apis/troubleshooting/fixing-common-issues.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pages/generative-apis/troubleshooting/fixing-common-issues.mdx b/pages/generative-apis/troubleshooting/fixing-common-issues.mdx
index 821c10c5b0..b43c6c9c61 100644
--- a/pages/generative-apis/troubleshooting/fixing-common-issues.mdx
+++ b/pages/generative-apis/troubleshooting/fixing-common-issues.mdx
@@ -92,7 +92,7 @@ Below are common issues that you may encounter when using Generative APIs, their
 - [Add a payment method](/billing/how-to/add-payment-method/#how-to-add-a-credit-card) and [validate your identity](/account/how-to/verify-identity/) to increase automatically your quotas [based on standard limits](/organizations-and-projects/additional-content/organization-quotas/#generative-apis).
 - Reduce the size of the input or output tokens processed by your API requests.
 - Use [Managed Inference](/managed-inference/), where these quotas do not apply (your throughput will be only limited by the amount of Inference Deployment your provision)
-- Contact your existing Scaleway account manager or [our Sales team ](https://www.scaleway.com/en/contact-sales/) to discuss about volume commitment for specific models that will allow us to increase your quota proportionnaly.
+- Contact your assigned Scaleway account manager or [our Sales team](https://www.scaleway.com/en/contact-sales/) to discuss volume commitments for specific models, which will enable us to increase your quota proportionally.
 
 ## 429: Too Many Requests - You exceeded your current threshold of concurrent requests
 

From a03a693b0e5e583675fa1a7dc560a9e042587bec Mon Sep 17 00:00:00 2001
From: fpagny
Date: Tue, 3 Jun 2025 14:33:19 +0200
Subject: [PATCH 3/5] fix(genapi): update lifecycle faq

---
 pages/generative-apis/faq.mdx | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/pages/generative-apis/faq.mdx b/pages/generative-apis/faq.mdx
index bcaf657655..7037b01aee 100644
--- a/pages/generative-apis/faq.mdx
+++ b/pages/generative-apis/faq.mdx
@@ -114,11 +114,14 @@ Yes, Scaleway's Generative APIs are designed to be compatible with OpenAI librar
 To get started, explore the [Generative APIs Playground](/generative-apis/quickstart/#start-with-the-generative-apis-playground) in the Scaleway console. For application integration, refer to our [Quickstart guide](/generative-apis/quickstart/), which provides step-by-step instructions on accessing, configuring, and using a Generative APIs endpoint.
 
 ## Are there any rate limits for API usage?
-Yes, API rate limits define the maximum number of requests a user can make within a specific time frame to ensure fair access and resource allocation between users. If you require increased rate limits (by a factor from 2 to 5 times), you can request them by [creating a ticket](https://console.scaleway.com/support/tickets/create). If you require even higher rate limits, especially to absorb infrequent peak loads, we recommend using [Managed Inference](https://console.scaleway.com/inference/deployments) instead with dedicated provisioned capacity.
+Yes, API rate limits define the maximum number of requests a user can make within a specific time frame to ensure fair access and resource allocation between users. If you require increased rate limits we recommend either:
+- using [Managed Inference](https://console.scaleway.com/inference/deployments) which provides dedicated capacity and doesn't enforce rate limits (you remain limited by the total provisioned capacity)
+- contact your existing Scaleway account manager or our Sales team to discuss about volume commitment for specific models that will allow us to increase your quota proportionnaly.
+
 Refer to our dedicated [documentation](/generative-apis/reference-content/rate-limits/) for more information on rate limits.
 
 ## What is the model lifecycle for Generative APIs?
-Scaleway is dedicated to updating and offering the latest versions of generative AI models, ensuring improvements in capabilities, accuracy, and safety. As new versions of models are introduced, you can explore them through the Scaleway console. Learn more in our dedicated [documentation](/generative-apis/reference-content/model-lifecycle/).
+Scaleway is dedicated to updating and offering the latest versions of generative AI models, while ensuring olders models remain accessible for significant period of time and ensure your production applications reliability. Learn more in our [model lifecycle policy](/generative-apis/reference-content/model-lifecycle/).
 
 ## What are the SLAs applicable to Generative APIs?
 We are currently working on defining our SLAs for Generative APIs. We will provide more information on this topic soon.

From 2452df0bf71c9545cce8d2e28fc47d3fef2584e1 Mon Sep 17 00:00:00 2001
From: Benedikt Rollik
Date: Wed, 4 Jun 2025 10:08:26 +0200
Subject: [PATCH 4/5] Apply suggestions from code review

Co-authored-by: ldecarvalho-doc <82805470+ldecarvalho-doc@users.noreply.github.com>
---
 pages/generative-apis/faq.mdx | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/pages/generative-apis/faq.mdx b/pages/generative-apis/faq.mdx
index 7037b01aee..98f0084b10 100644
--- a/pages/generative-apis/faq.mdx
+++ b/pages/generative-apis/faq.mdx
@@ -114,14 +114,14 @@ Yes, Scaleway's Generative APIs are designed to be compatible with OpenAI librar
 To get started, explore the [Generative APIs Playground](/generative-apis/quickstart/#start-with-the-generative-apis-playground) in the Scaleway console. For application integration, refer to our [Quickstart guide](/generative-apis/quickstart/), which provides step-by-step instructions on accessing, configuring, and using a Generative APIs endpoint.
 
 ## Are there any rate limits for API usage?
-Yes, API rate limits define the maximum number of requests a user can make within a specific time frame to ensure fair access and resource allocation between users. If you require increased rate limits we recommend either:
-- using [Managed Inference](https://console.scaleway.com/inference/deployments) which provides dedicated capacity and doesn't enforce rate limits (you remain limited by the total provisioned capacity)
-- contact your existing Scaleway account manager or our Sales team to discuss about volume commitment for specific models that will allow us to increase your quota proportionnaly.
+Yes, API rate limits define the maximum number of requests a user can make within a specific time frame to ensure fair access and resource allocation between users. If you require increased rate limits we recommend either:
+- using [Managed Inference](https://console.scaleway.com/inference/deployments), which provides dedicated capacity and doesn't enforce rate limits (you remain limited by the total provisioned capacity)
+- Contact your existing Scaleway account manager or our Sales team to discuss volume commitment for specific models that will allow us to increase your quota proportionally.
 
 Refer to our dedicated [documentation](/generative-apis/reference-content/rate-limits/) for more information on rate limits.
 
 ## What is the model lifecycle for Generative APIs?
-Scaleway is dedicated to updating and offering the latest versions of generative AI models, while ensuring olders models remain accessible for significant period of time and ensure your production applications reliability. Learn more in our [model lifecycle policy](/generative-apis/reference-content/model-lifecycle/).
+Scaleway is dedicated to updating and offering the latest versions of generative AI models, while ensuring older models remain accessible for a significant time, and also ensuring the reliability of your production applications. Learn more in our [model lifecycle policy](/generative-apis/reference-content/model-lifecycle/).
 
 ## What are the SLAs applicable to Generative APIs?
 We are currently working on defining our SLAs for Generative APIs. We will provide more information on this topic soon.

From 48a83f19170e65c534d328370b940cf57dda25eb Mon Sep 17 00:00:00 2001
From: Benedikt Rollik
Date: Wed, 4 Jun 2025 10:34:44 +0200
Subject: [PATCH 5/5] Apply suggestions from code review

Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>
---
 pages/generative-apis/faq.mdx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pages/generative-apis/faq.mdx b/pages/generative-apis/faq.mdx
index 98f0084b10..0fc4ef375e 100644
--- a/pages/generative-apis/faq.mdx
+++ b/pages/generative-apis/faq.mdx
@@ -115,8 +115,8 @@ To get started, explore the [Generative APIs Playground](/generative-apis/quicks
 
 ## Are there any rate limits for API usage?
 Yes, API rate limits define the maximum number of requests a user can make within a specific time frame to ensure fair access and resource allocation between users. If you require increased rate limits we recommend either:
-- using [Managed Inference](https://console.scaleway.com/inference/deployments), which provides dedicated capacity and doesn't enforce rate limits (you remain limited by the total provisioned capacity)
-- Contact your existing Scaleway account manager or our Sales team to discuss volume commitment for specific models that will allow us to increase your quota proportionally.
+- Using [Managed Inference](https://console.scaleway.com/inference/deployments), which provides dedicated capacity and doesn't enforce rate limits (you remain limited by the total provisioned capacity)
+- Contacting your existing Scaleway account manager or our Sales team to discuss volume commitment for specific models that will allow us to increase your quota proportionally.
 
 Refer to our dedicated [documentation](/generative-apis/reference-content/rate-limits/) for more information on rate limits.
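
Note on the "Smooth out your API requests rate" bullet kept as context in the troubleshooting hunks of patches 1 and 2: one way a client could stay under a per-minute request quota is a small sliding-window limiter. The following is only a minimal sketch and not part of the patched documentation. The class name `MinuteRateLimiter` and its parameters are hypothetical, and the actual quota value is not stated in these patches, so it has to be taken from the Organization quotas page.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Hypothetical client-side limiter: at most max_requests per sliding window."""

    def __init__(self, max_requests, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests      # assumed quota, not taken from the patches
        self.window_seconds = window_seconds
        self._clock = clock                   # injectable for testing
        self._sleep = sleep
        self._sent = deque()                  # timestamps of requests still inside the window

    def acquire(self):
        """Block until one more request fits in the window, then record it."""
        while True:
            now = self._clock()
            # Forget requests that have already left the window.
            while self._sent and now - self._sent[0] >= self.window_seconds:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest recorded request expires.
            self._sleep(self.window_seconds - (now - self._sent[0]))
```

Calling `limiter.acquire()` immediately before each API request would delay the call whenever the window is full, which spreads bursts out instead of letting them trigger a 429 response. This only addresses the request-rate quota; it does not account for token-based quotas or for the concurrent-request threshold.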