From bc8467b210ecc4a15e1756b1da5ace40656bc9e1 Mon Sep 17 00:00:00 2001 From: Rowena Date: Wed, 8 Oct 2025 17:32:44 +0200 Subject: [PATCH 1/3] fix(genapis): add reasoning models --- .../how-to/query-reasoning-models.mdx | 165 ++++++++++++++++++ pages/generative-apis/menu.ts | 4 + 2 files changed, 169 insertions(+) create mode 100644 pages/generative-apis/how-to/query-reasoning-models.mdx diff --git a/pages/generative-apis/how-to/query-reasoning-models.mdx b/pages/generative-apis/how-to/query-reasoning-models.mdx new file mode 100644 index 0000000000..a2bdceac39 --- /dev/null +++ b/pages/generative-apis/how-to/query-reasoning-models.mdx @@ -0,0 +1,165 @@ +--- +title: How to query reasoning models +description: Learn how to interact with powerful reasoning models using Scaleway's Generative APIs service. +tags: generative-apis ai-data language-models chat-completions-api reasoning think +dates: + validation: 2025-10-07 + posted: 2025-10-07 +--- +import Requirements from '@macros/iam/requirements.mdx' + +Scaleway's Generative APIs service allows users to interact with language models benefitting from additional reasoning capabilities. + +A reasoning model is a language model that is capable of carrying out multiple inference steps and systematically verifying intermediate results before producing answers. You can specify how much effort it should put into reasoning via dedicated parameters, and access reasoning metadata (summaries, step counts, etc.) in its outputs. Even with default parameters, such models are designed to perform better on reasoning tasks like maths and logic problems, than non-reasoning language models. + +Language models supporting the reasoning feature include `gpt-oss-120b`. See [Supported Models](/generative-apis/reference-content/supported-models/) for a full list. 
+
+You can interact with reasoning models in the following ways:
+
+- Use the [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground) in the Scaleway [console](https://console.scaleway.com) to test models, adapt parameters, and observe how your changes affect the output in real time.
+- Use the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response).
+
+
+
+- A Scaleway account logged into the [console](https://console.scaleway.com)
+- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
+- A valid [API key](/iam/how-to/create-api-keys/) for API authentication
+- Python 3.7+ installed on your system
+
+## Querying reasoning language models via the playground
+
+### Accessing the playground
+
+Scaleway provides a web playground for instruct-based models hosted on Generative APIs.
+
+1. Navigate to **Generative APIs** under the **AI** section of the [Scaleway console](https://console.scaleway.com/) side menu. The list of models you can query displays.
+2. Click the name of the chat model you want to try. Alternatively, click next to the chat model, and click **Try model** in the menu. Ensure that you choose a model with [reasoning capabilities](/generative-apis/reference-content/supported-models/).
+
+The web playground displays.
+
+### Using the playground
+
+1. Enter a prompt at the bottom of the page, or use one of the suggested prompts in the conversation area.
+2. Edit the hyperparameters listed on the right column, for example the default temperature for more or less randomness on the outputs.
+3. Switch models at the top of the page, to observe the capabilities of chat models offered via Generative APIs.
+4. Click **View code** to get code snippets configured according to your settings in the playground.
+
+
+You cannot currently set values for parameters such as `reasoning_effort`, or access reasoning metadata in the model's output, via the console playground. Query the models programatically as shown below in order to access the full reasoning featureset.
+
+
+## Querying reasoning language models via API
+
+You can query models programmatically using your favorite tools or languages.
+In the example that follows, we will use the OpenAI Python client.
+
+### Chat Completions API or Responses API?
+
+Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) allow you to access and control reasoning for supported models. Scaleway's support of the Responses API is currently in beta.
+
+Note however, that the Responses API was introduced in part to better support features for reasoning workflows, among other tasks. It provides richer support for reasoning than Chat Completions, for example by providing chain-of-thought reasoning summaries in its responses.
+
+For more information on Chat Completions versus Responses API, see the information provided in the [querying language models](/generative-apis/how-to/query-language-models/#chat-completions-api-or-responses-api) documentation.
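To make the comparison concrete, here is a minimal sketch of the request shape each API expects for the same reasoning query. These are plain dictionaries shown for illustration only; the field names follow the full client examples later on this page, and the prompt text is our own:

```python
# Illustrative request shapes only; see the complete client examples on this page.

# Chat Completions: reasoning effort is a flat, top-level parameter,
# and the conversation is passed as `messages`.
chat_request = {
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "Is 127 a prime number?"}],
    "reasoning_effort": "medium",
}

# Responses API: reasoning options are grouped into a nested `reasoning`
# object, and `input` takes the place of `messages`.
responses_request = {
    "model": "gpt-oss-120b",
    "input": [{"role": "user", "content": "Is 127 a prime number?"}],
    "reasoning": {"effort": "medium"},
}
```

The nested `reasoning` object is also what lets the Responses API return structured reasoning items back, rather than a single text field.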
+ +### Installing the OpenAI SDK + +Install the OpenAI SDK using pip: + +```bash +pip install openai +``` + +### Initializing the client + +Initialize the OpenAI client with your base URL and API key: + +```python +from openai import OpenAI + +# Initialize the client with your base URL and API key +client = OpenAI( + base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL + api_key="" # Your unique API secret key from Scaleway +) +``` + +### Generating a chat completion with reasoning + +You can now create a chat completion with reasoning, using either the Chat Completions or Responses API, as shown in the following examples: + + + + + + ```python + # Create a chat completion using the 'gpt-oss-120b' model + response = client.chat.completions.create( + model="gpt-oss-120b", + messages=[{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}], + temperature=0.2, # Adjusts creativity + max_completion_tokens=512, # Limits the length of the output + top_p=0.7, # Controls diversity through nucleus sampling. You usually only need to use temperature. + reasoning_effort="medium" + ) + + # Print the generated response + print(f"Reasoning: {response.choices[0].message.reasoning_content}") + print(f"Answer: {response.choices[0].message.content}") + ``` + + This code sends a message to the model, as well as specifying the effort to make with reasoning, and returns an answer based on your input. The model's reasoning metadata can be accessed as well as its answer, with outputs such as: + + ```python + Reasoning: The user wants a description of a futuristic city with advanced tech and green energy solutions. Should be creative, vivid, detailed. No disallowed content. Provide description. 
+ + Answer: **City of Luminara – A Blueprint for the Future** + ``` + + + + + + ```python + response = client.responses.create( + model="gpt-oss-120b", + input=[{"role": "user", "content": "Briefly describe a futuristic city with advanced technology and green energy solutions."}], + temperature=0.2, # Adjusts creativity + max_output_tokens=100, # Limits the length of the output + top_p=0.7, # Controls diversity through nucleus sampling. You usually only need to use temperature. + reasoning={"effort":"medium"} + ) + # Print the generated response. Here, the last output message will contain the final content. + # Previous outputs will contain reasoning content. + for output in response.output: + if output.type == "reasoning": + print(f"Reasoning: {output.content[0].text}") + if output.type == "message": + print(f"Answer: {output.content[0].text}") + ``` + This code sends a message to the model, as well as specifying the effort to make with reasoning, and returns an answer based on your input. The model's reasoning metadata can be accessed as well as its answer, with outputs such as: + + ```python + Reasoning: The user asks: "Briefly describe a futuristic city with advanced technology and green energy solutions." They want a brief description. Should be concise but vivid. Provide details: architecture, transport, energy, AI, sustainability. Probably a paragraph or a few sentences. Ensure it's brief. Let's produce a short description. + + Answer: **Solaris Arcadia** rises from a reclaimed river delta, its skyline a lattice of translucent, self‑healing bioglass towers that + ``` + + + +## Exceptions and legacy models + +Some legacy models such as `deepseek-r1-distill-llama-70b` do not output reasoning data as described above, but make it available in the `content` field of the response inside special tags, as shown in the example below: + +``` +response.content = " The user asks for questions about mathematics (...) Answer is 42." 
+``` + +The reasoning content is inside the ``...`` tags, and you can parse the response accordingly to access such content. There is, however, a known bug which can lead to the model to omit the opening `` tag, so we suggest taking care when parsing such outputs. + +Note that the `reasoning_effort` parameter is not available for this model. + + +## Impact on token generation + +Reasoning models generate reasoning tokens, which are billable. Generally these are in the model's output as part of the reasoning summary. To limit generation of reasoning tokens, you can adjust settings for the **reasoning effort** and **max completion/output tokens** parameters. Alternatively, use a non-reasoning model to avoid generation of reasoning tokens and subsequent billing. + diff --git a/pages/generative-apis/menu.ts b/pages/generative-apis/menu.ts index d0f17519fc..29b967a40a 100644 --- a/pages/generative-apis/menu.ts +++ b/pages/generative-apis/menu.ts @@ -22,6 +22,10 @@ export const generativeApisMenu = { label: 'Query language models', slug: 'query-language-models', }, + { + label: "Query reasoning models", + slug: "query-reasoning-models" + }, { label: 'Query vision models', slug: 'query-vision-models', From 6648a70c0ea36402748cae1dddc14377488d2c0a Mon Sep 17 00:00:00 2001 From: Rowena Jones <36301604+RoRoJ@users.noreply.github.com> Date: Mon, 13 Oct 2025 14:22:30 +0200 Subject: [PATCH 2/3] Apply suggestions from code review Co-authored-by: fpagny --- .../how-to/query-reasoning-models.mdx | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/pages/generative-apis/how-to/query-reasoning-models.mdx b/pages/generative-apis/how-to/query-reasoning-models.mdx index a2bdceac39..35257506b8 100644 --- a/pages/generative-apis/how-to/query-reasoning-models.mdx +++ b/pages/generative-apis/how-to/query-reasoning-models.mdx @@ -10,7 +10,7 @@ import Requirements from '@macros/iam/requirements.mdx' Scaleway's Generative APIs service allows users to interact with 
language models benefitting from additional reasoning capabilities.
 
-A reasoning model is a language model that is capable of carrying out multiple inference steps and systematically verifying intermediate results before producing answers. You can specify how much effort it should put into reasoning via dedicated parameters, and access reasoning metadata (summaries, step counts, etc.) in its outputs. Even with default parameters, such models are designed to perform better on reasoning tasks like maths and logic problems, than non-reasoning language models.
+A reasoning model is a language model that is capable of carrying out multiple inference steps and systematically verifying intermediate results before producing answers. You can specify how much effort it should put into reasoning via dedicated parameters, and access reasoning content in its outputs. Even with default parameters, such models are designed to perform better on reasoning tasks like maths and logic problems, than non-reasoning language models.
 
 Language models supporting the reasoning feature include `gpt-oss-120b`. See [Supported Models](/generative-apis/reference-content/supported-models/) for a full list.
 
@@ -40,7 +40,7 @@ The web playground displays.
 ### Using the playground
 
 1. Enter a prompt at the bottom of the page, or use one of the suggested prompts in the conversation area.
-2. Edit the hyperparameters listed on the right column, for example the default temperature for more or less randomness on the outputs.
+2. Edit the parameters listed in the right column, for example, the default temperature for more or less randomness in the outputs.
 3. Switch models at the top of the page, to observe the capabilities of chat models offered via Generative APIs.
 4. Click **View code** to get code snippets configured according to your settings in the playground.
 
@@ -57,7 +57,7 @@ In the example that follows, we will use the OpenAI Python client.
 Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) allow you to access and control reasoning for supported models. Scaleway's support of the Responses API is currently in beta.
 
-Note however, that the Responses API was introduced in part to better support features for reasoning workflows, among other tasks. It provides richer support for reasoning than Chat Completions, for example by providing chain-of-thought reasoning summaries in its responses.
+Note however, that the Responses API was introduced in part to better support features for reasoning workflows, among other tasks. It provides richer support for reasoning than Chat Completions, for example by providing chain-of-thought reasoning content in its responses.
 
 For more information on Chat Completions versus Responses API, see the information provided in the [querying language models](/generative-apis/how-to/query-language-models/#chat-completions-api-or-responses-api) documentation.
 
@@ -124,7 +124,7 @@ You can now create a chat completion with reasoning, using either the Chat Compl
 model="gpt-oss-120b",
 input=[{"role": "user", "content": "Briefly describe a futuristic city with advanced technology and green energy solutions."}],
 temperature=0.2, # Adjusts creativity
- max_output_tokens=100, # Limits the length of the output
+ max_output_tokens=512, # Limits the length of the output
 top_p=0.7, # Controls diversity through nucleus sampling. You usually only need to use temperature.
 reasoning={"effort":"medium"}
 )
 # Print the generated response. Here, the last output message will contain the final content.
 # Previous outputs will contain reasoning content.
for output in response.output: if output.type == "reasoning": - print(f"Reasoning: {output.content[0].text}") + print(f"Reasoning: {output.content[0].text}") # output.content[0].text can only be used with openai >= 1.100.0 if output.type == "message": print(f"Answer: {output.content[0].text}") ``` @@ -161,5 +161,5 @@ Note that the `reasoning_effort` parameter is not available for this model. ## Impact on token generation -Reasoning models generate reasoning tokens, which are billable. Generally these are in the model's output as part of the reasoning summary. To limit generation of reasoning tokens, you can adjust settings for the **reasoning effort** and **max completion/output tokens** parameters. Alternatively, use a non-reasoning model to avoid generation of reasoning tokens and subsequent billing. +Reasoning models generate reasoning tokens, which are billable. Generally these are in the model's output as part of the reasoning content. To limit generation of reasoning tokens, you can adjust settings for the **reasoning effort** and **max completion/output tokens** parameters. Alternatively, use a non-reasoning model to avoid generation of reasoning tokens and subsequent billing. 
From c0a6d46ac3bc1bafc240feac77b824b833c3aab4 Mon Sep 17 00:00:00 2001 From: Rowena Jones <36301604+RoRoJ@users.noreply.github.com> Date: Wed, 15 Oct 2025 10:23:08 +0200 Subject: [PATCH 3/3] Apply suggestions from code review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Océane Co-authored-by: Néda <87707325+nerda-codes@users.noreply.github.com> --- .../how-to/query-reasoning-models.mdx | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/pages/generative-apis/how-to/query-reasoning-models.mdx b/pages/generative-apis/how-to/query-reasoning-models.mdx index 35257506b8..9653f69ae9 100644 --- a/pages/generative-apis/how-to/query-reasoning-models.mdx +++ b/pages/generative-apis/how-to/query-reasoning-models.mdx @@ -8,9 +8,9 @@ dates: --- import Requirements from '@macros/iam/requirements.mdx' -Scaleway's Generative APIs service allows users to interact with language models benefitting from additional reasoning capabilities. +Scaleway's Generative APIs service allows users to interact with language models benefiting from additional reasoning capabilities. -A reasoning model is a language model that is capable of carrying out multiple inference steps and systematically verifying intermediate results before producing answers. You can specify how much effort it should put into reasoning via dedicated parameters, and access reasoning content in its outputs. Even with default parameters, such models are designed to perform better on reasoning tasks like maths and logic problems, than non-reasoning language models. +A reasoning model is a language model that is capable of carrying out multiple inference steps and systematically verifying intermediate results before producing answers. You can specify how much effort it should put into reasoning via dedicated parameters, and access reasoning content in its outputs. 
Even with default parameters, such models are designed to perform better on reasoning tasks like maths and logic problems, than non-reasoning language models.
+A reasoning model is a language model that is capable of carrying out multiple inference steps and systematically verifying intermediate results before producing answers. You can specify how much effort it should put into reasoning via dedicated parameters, and access reasoning content in its outputs. Even with default parameters, such models are designed to perform better on reasoning tasks like maths and logic problems than non-reasoning language models.
 
 Language models supporting the reasoning feature include `gpt-oss-120b`. See [Supported Models](/generative-apis/reference-content/supported-models/) for a full list.
 
@@ -45,7 +45,7 @@ The web playground displays.
 4. Click **View code** to get code snippets configured according to your settings in the playground.
 
 
-You cannot currently set values for parameters such as `reasoning_effort`, or access reasoning metadata in the model's output, via the console playground. Query the models programatically as shown below in order to access the full reasoning featureset.
+You cannot currently set values for parameters such as `reasoning_effort`, or access reasoning metadata in the model's output, via the console playground. Query the models programmatically as shown below in order to access the full reasoning feature set.
 
 
 ## Querying reasoning language models via API
@@ -57,7 +57,7 @@ In the example that follows, we will use the OpenAI Python client.
 
 Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) allow you to access and control reasoning for supported models. Scaleway's support of the Responses API is currently in beta.
 
-Note however, that the Responses API was introduced in part to better support features for reasoning workflows, among other tasks. It provides richer support for reasoning than Chat Completions, for example by providing chain-of-thought reasoning content in its responses.
+Note however, that the Responses API was introduced in part to better support features for reasoning workflows, among other tasks. It provides richer support for reasoning than Chat Completions, for example, by providing chain-of-thought reasoning content in its responses.
 
 For more information on Chat Completions versus Responses API, see the information provided in the [querying language models](/generative-apis/how-to/query-language-models/#chat-completions-api-or-responses-api) documentation.
 
@@ -139,7 +139,7 @@ You can now create a chat completion with reasoning, using either the Chat Compl
 This code sends a message to the model, as well as specifying the effort to make with reasoning, and returns an answer based on your input. The model's reasoning metadata can be accessed as well as its answer, with outputs such as:
 
 ```python
- Reasoning: The user asks: "Briefly describe a futuristic city with advanced technology and green energy solutions." They want a brief description. Should be concise but vivid. Provide details: architecture, transport, energy, AI, sustainability. Probably a paragraph or a few sentences. Ensure it's brief. Let's produce a short description.
+ Reasoning: The user asks: "Briefly describe a futuristic city with advanced technology and green energy solutions." They want a brief description. Should be concise but vivid. Provide details: architecture, transport, energy, AI, and sustainability. Probably a paragraph or a few sentences. Ensure it's brief. Let's produce a short description.
 
 Answer: **Solaris Arcadia** rises from a reclaimed river delta, its skyline a lattice of translucent, self‑healing bioglass towers that
 ```
 
@@ -154,12 +154,12 @@ Some legacy models such as `deepseek-r1-distill-llama-70b` do not output reasoni
 response.content = "<think> The user asks for questions about mathematics (...) </think> Answer is 42."
 ```
 
-The reasoning content is inside the `<think>`...`</think>` tags, and you can parse the response accordingly to access such content. There is, however, a known bug which can lead to the model to omit the opening `<think>` tag, so we suggest taking care when parsing such outputs.
+The reasoning content is inside the `<think>`...`</think>` tags, and you can parse the response accordingly to access such content. There is, however, a known bug that can lead the model to omit the opening `<think>` tag, so we suggest taking care when parsing such outputs.
 
 Note that the `reasoning_effort` parameter is not available for this model.
 
 
 ## Impact on token generation
 
-Reasoning models generate reasoning tokens, which are billable. Generally these are in the model's output as part of the reasoning content. To limit generation of reasoning tokens, you can adjust settings for the **reasoning effort** and **max completion/output tokens** parameters. Alternatively, use a non-reasoning model to avoid generation of reasoning tokens and subsequent billing.
+Reasoning models generate reasoning tokens, which are billable. Generally, these are in the model's output as part of the reasoning content. To limit the generation of reasoning tokens, you can adjust settings for the **reasoning effort** and **max completion/output tokens** parameters. Alternatively, use a non-reasoning model to avoid the generation of reasoning tokens and subsequent billing.
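For the legacy models described above, reasoning can be separated from the final answer with a little defensive parsing. The sketch below is illustrative (the helper name is ours, not part of any SDK) and accounts for the known bug where the opening `<think>` tag may be missing:

```python
import re

def split_reasoning(content: str) -> tuple[str, str]:
    """Split raw legacy-model output into (reasoning, answer).

    Treats everything before a closing </think> tag as reasoning,
    even when the model omitted the opening <think> tag.
    """
    match = re.search(r"(?:<think>)?(.*?)</think>\s*(.*)", content, flags=re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    # No think tags at all: the whole content is the answer.
    return "", content.strip()

reasoning, answer = split_reasoning(
    "<think> The user asks for questions about mathematics (...) </think> Answer is 42."
)
print(answer)  # Answer is 42.
```

Keying the split on the closing tag rather than the opening one is what makes the parser robust to the missing-`<think>` bug.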