From 8371ccdbaaaf126f22315c5c3d7afdfeb5e19c99 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Tue, 15 Apr 2025 11:26:48 +0200 Subject: [PATCH 01/16] feat(infr): add catalog page --- .../reference-content/models.mdx | 107 ++++++++++++++++++ 1 file changed, 107 insertions(+) create mode 100644 pages/managed-inference/reference-content/models.mdx diff --git a/pages/managed-inference/reference-content/models.mdx b/pages/managed-inference/reference-content/models.mdx new file mode 100644 index 0000000000..998e7c7724 --- /dev/null +++ b/pages/managed-inference/reference-content/models.mdx @@ -0,0 +1,107 @@ +--- +meta: + title: Managed Inference model catalog + description: Deploy your own secure Mixtral-8x7b-Instruct model with Scaleway Managed Inference. Privacy-focused, fully managed. +content: + h1: Managed Inference model catalog + paragraph: This page provides information on the Mixtral-8x7b-instruct-v0.1 model +tags: +dates: + validation: 2025-03-19 + posted: 2024-05-28 +categories: + - ai-data +--- +A quick overview of available models and their core attributes. Expand any model below to see usage examples, curl commands, and detailed capabilities. + + +## Summary table + +| Model Name | Provider | Context Size | Modalities | Instances | Endpoint | +|------------|----------|--------------|------------|-----------|----------| +| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | [Mistral](https://mistral.ai/technology/#models) | 32k | Text | H100 (FP8), H100-2 (BF16) | `/v1/chat/completions` | +| [`molmo-72b-0924`](#molmo-72b-0924) | [Allen Institute](https://molmo.allenai.org/blog) | 50k | Multimodal | H100-2 (FP8) | `/v1/chat/completions` | + + +## Model details + + +## Mixtral-8x7b-instruct-v0.1 +### Overview + +| Attribute | Details | +|----------------------|---------------------------------------------------| +| Provider | [Mistral](https://mistral.ai/technology/#models) | +| Context Size | 32k tokens | +| Compatible Instances | H100 (FP8), H100-2 (BF16) | + +### Model names + +```bash +mistral/mixtral-8x7b-instruct-v0.1:fp8 +mistral/mixtral-8x7b-instruct-v0.1:bf16 +``` + +### How to use (Text Inference) + +```bash +curl -s \ +-H "Authorization: Bearer " \ +-H "Content-Type: application/json" \ +--request POST \ +--url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ +--data '{"model":"mistral/mixtral-8x7b-instruct-v0.1:fp8","messages":[{"role":"user","content":"Sing me a song about Scaleway"}],"max_tokens":200,"top_p":1,"temperature":1}' +``` + + + Ideal for instructional content, multilingual understanding, and code generation. 
+ + + + + + + +## Molmo-72b-0924 + +### Overview + +| Attribute | Details | +|----------------------|------------------------------------------------------------------| +| Provider | [Allen Institute for AI](https://molmo.allenai.org/blog) | +| License | Apache 2.0 | +| Context Size | 50k tokens | +| Compatible Instances | H100-2 (FP8) | + +### Model name + +```bash +allenai/molmo-72b-0924:fp8 +``` + +### How to use (Image + Text) + +```bash +curl -s \ +-H "Authorization: Bearer " \ +-H "Content-Type: application/json" \ +--request POST \ +--url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ +--data '{ + "model": "allenai/molmo-72b-0924:fp8", + "messages": [{ + "role": "user", + "content": [ + {"type": "text", "text": "Describe this image"}, + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}} + ] + }], + "temperature": 0.7 +}' +``` + + + Known limitations: No system role, no structured output (`response_format`), and supports 1 image max per request. + + + From f6dfa9b7d7e52b0b93b9602bac1561b350f1e3f7 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Wed, 16 Apr 2025 15:09:18 +0200 Subject: [PATCH 02/16] docs(infr): add model catalog page --- .../reference-content/models.mdx | 1193 ++++++++++++++++- 1 file changed, 1127 insertions(+), 66 deletions(-) diff --git a/pages/managed-inference/reference-content/models.mdx b/pages/managed-inference/reference-content/models.mdx index 998e7c7724..f1d04426b0 100644 --- a/pages/managed-inference/reference-content/models.mdx +++ b/pages/managed-inference/reference-content/models.mdx @@ -1,10 +1,10 @@ --- meta: title: Managed Inference model catalog - description: Deploy your own secure Mixtral-8x7b-Instruct model with Scaleway Managed Inference. Privacy-focused, fully managed. + description: Deploy your own model with Scaleway Managed Inference. Privacy-focused, fully managed. content: h1: Managed Inference model catalog - paragraph: This page provides information on the Mixtral-8x7b-instruct-v0.1 model + paragraph: This page provides information on the Scaleway Managed Inference product catalog tags: dates: validation: 2025-03-19 @@ -12,96 +12,1157 @@ dates: categories: - ai-data --- -A quick overview of available models and their core attributes. Expand any model below to see usage examples, curl commands, and detailed capabilities. +A quick overview of available models in Scaleway's catalog and their core attributes. Expand any model below to see usage examples, curl commands, and detailed capabilities. 
## Summary table

-| Model Name | Provider | Context Size | Modalities | Instances | Endpoint |
-|------------|----------|--------------|------------|-----------|----------|
-| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | [Mistral](https://mistral.ai/technology/#models) | 32k | Text | H100 (FP8), H100-2 (BF16) | `/v1/chat/completions` |
-| [`molmo-72b-0924`](#molmo-72b-0924) | [Allen Institute](https://molmo.allenai.org/blog) | 50k | Multimodal | H100-2 (FP8) | `/v1/chat/completions` |
-

| Model Name | Provider | Context Size | Modalities | Instances | License |
|------------|----------|--------------|------------|-----------|---------|
| `mixtral-8x7b-instruct-v0.1` | Mistral | 32k tokens | Text | H100 | Apache 2.0 |
| `llama-3.1-70b-instruct` | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community |
| `llama-3.1-8b-instruct` | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3 community |
| `llama-3-70b-instruct` | Meta | 8k tokens | Text | H100 | Llama 3 community |
| `llama-3.3-70b-instruct` | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3 community |
| `llama-3-nemotron-70b` | Nvidia | up to 128k tokens | Text | H100, H100-2 | Llama 3.1 community |
| `deepseek-r1-distill-70b` | Deepseek | up to 131k tokens | Text | H100, H100-2 | MIT |
| `deepseek-r1-distill-8b` | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | Apache 2.0 |
| `mistral-7b-instruct-v0.3` | Mistral | 32k tokens | Text | L4, L40S, H100, H100-2 | Apache 2.0 |
| `mistral-small-24b-instruct-2501` | Mistral | 32k tokens | Text | L40S, H100, H100-2 | Apache 2.0 |
| `mistral-nemo-instruct-2407` | Mistral | 128k tokens | Text | L40S, H100, H100-2 | Apache 2.0 |
| `moshiko-0.1-8b` | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 |
| `moshika-0.1-8b` | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 |
| `wizardlm-70b-v1.0` | WizardLM | 4,096 tokens | Text | H100, H100-2 | Llama 2 community |
| `pixtral-12b-2409` | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 |
| `molmo-72b-0924` | Allen AI | 50k tokens | Multimodal | H100-2 | Apache 2.0 |
| `qwen2.5-coder-32b-instruct` | Qwen | up to 32k tokens | Code | H100, H100-2 | Apache 2.0 |
| `sentence-t5-xxl` | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 |

## Model details

  Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.

## Text models

### Mixtral-8x7b-instruct-v0.1

  Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants.
  Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.
-```bash -mistral/mixtral-8x7b-instruct-v0.1:fp8 -mistral/mixtral-8x7b-instruct-v0.1:bf16 -``` + | Attribute | Details | + |----------------------|---------| + | Provider | Mistral | + | Context Size | 32k tokens | + | License | Apache 2.0 | + | Compatible Instances | H100 | -### How to use (Text Inference) + #### Model names -```bash -curl -s \ --H "Authorization: Bearer " \ --H "Content-Type: application/json" \ ---request POST \ ---url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ ---data '{"model":"mistral/mixtral-8x7b-instruct-v0.1:fp8","messages":[{"role":"user","content":"Sing me a song about Scaleway"}],"max_tokens":200,"top_p":1,"temperature":1}' -``` + ```bash + mistral/mixtral-8x7b-instruct-v0.1:fp8 + mistral/mixtral-8x7b-instruct-v0.1:bf16 + ``` + #### Sending Inference requests - - Ideal for instructional content, multilingual understanding, and code generation. - + To perform inference tasks with your Mixtral model deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"mistral/mixtral-8x7b-instruct-v0.1:fp8", "messages":[{"role": "user","content": "Sing me a song about Scaleway"}], "max_tokens": 200, "top_p": 1, "temperature": 1, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + ### LLaMA 3.1 70B Instruct + + Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. + Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. + | Attribute | Details | + |----------------------|---------| + | Provider | Meta | + | Context Size | 32k tokens | + | License | Llama 3 community | + | Compatible Instances | H100 17k (FP8), H100-2 128k (FP8), 70k (BF16) | + + + #### Sending Managed Inference requests + + To perform inference tasks with your Llama-3.1 deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"meta/llama-3.1-70b-instruct:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + -## Molmo-72b-0924 -### Overview +### Llama-3.1-8b-instruct model -| Attribute | Details | -|----------------------|------------------------------------------------------------------| -| Provider | [Allen Institute for AI](https://molmo.allenai.org/blog) | -| License | Apache 2.0 | -| Context Size | 50k tokens | -| Compatible Instances | H100-2 (FP8) | + Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. 
+ Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. -### Model name + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Meta](https://llama.meta.com/llama3/) | + | License | [Llama 3.1 community](https://llama.meta.com/llama3_1/license/) | + | Compatible Instances | L4, L40S, H100, H100-2 (FP8, BF16) | + | Context Length | up to 128k tokens | -```bash -allenai/molmo-72b-0924:fp8 -``` + #### Model names -### How to use (Image + Text) + ```bash + meta/llama-3.1-8b-instruct:fp8 + meta/llama-3.1-8b-instruct:bf16 + ``` -```bash -curl -s \ --H "Authorization: Bearer " \ --H "Content-Type: application/json" \ ---request POST \ ---url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ ---data '{ - "model": "allenai/molmo-72b-0924:fp8", - "messages": [{ - "role": "user", - "content": [ - {"type": "text", "text": "Describe this image"}, - {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}} - ] - }], - "temperature": 0.7 -}' -``` + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | L4 | 96k (FP8), 27k (BF16) | + | L40S | 128k (FP8, BF16) | + | H100 | 128k (FP8, BF16) | + | H100-2 | 128k (FP8, BF16) | + + #### Sending Managed Inference requests + + To perform inference tasks with your Llama-3.1 deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"meta/llama-3.1-8b-instruct:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + + + + +### Llama-3-70b-instruct + + Meta’s Llama 3 is an iteration of the open-access Llama family. + Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility and responsibly spearheading the deployment of LLMs. + With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries in reasoning and coding capabilities. 
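
Like the other chat models in this catalog, a Llama 3 deployment can also be driven by any OpenAI-compatible client instead of raw `curl`. The sketch below is a minimal, unofficial example: it assumes the `openai` Python package (v1+) is installed and that the `/v1/chat/completions` route behaves as the curl examples on this page suggest; `<Deployment UUID>` and `<IAM API key>` are placeholders, and the model name used is the one given later in this section.

```python
# Minimal sketch, assuming an OpenAI-compatible /v1/chat/completions route.
# <Deployment UUID> and <IAM API key> are placeholders to replace.
from openai import OpenAI

client = OpenAI(
    base_url="https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1",
    api_key="<IAM API key>",
)

response = client.chat.completions.create(
    model="meta/llama-3-70b-instruct:fp8",
    messages=[{"role": "user", "content": "Sing me a song about Xavier Niel"}],
    max_tokens=500,
    temperature=0.7,
)
print(response.choices[0].message.content)
```
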
+ + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Meta](https://llama.meta.com/llama3/) | + | Compatible Instances | H100, H100-2 (FP8) | + | Context size | 8192 tokens | + + #### Model names + + ```bash + meta/llama-3-70b-instruct:fp8 + ``` + + #### Compatible Instances + + - [H100 (FP8)](https://www.scaleway.com/en/h100-pcie-try-it-now/) + - H100-2 (FP8) + + #### Sending Managed Inference requests + + To perform inference tasks with your Llama-3 deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"meta/llama-3-70b-instruct:fp8", "messages":[{"role": "user","content": "Sing me a song about Xavier Niel"}], "max_tokens": 500, "top_p": 1, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + + + + +### Llama-3.3-70b-instruct + + Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model. + This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications. + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Meta](https://www.llama.com/) | + | License | [Llama 3.3 community](https://www.llama.com/llama3_3/license/) | + | Compatible Instances | H100 (FP8), H100-2 (FP8, BF16) | + | Context length | Up to 131k tokens | + + #### Model names + + ```bash + meta/llama-3.3-70b-instruct:bf16 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | H100 | 15k (FP8) | + | H100-2 | 131k (FP8), 62k (BF16) | + + #### Sending Managed Inference requests + + To perform inference tasks with your Llama-3.3 deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"meta/llama-3.3-70b-instruct:bf16", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + + + + +### Llama-3.1-Nemotron-70b-instruct + + Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions. + NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses. 
+ + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Nvidia](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct) | + | License | [Llama 3.1 community](https://llama.meta.com/llama3_1/license/) | | + | Compatible Instances | H100 (FP8), H100-2 (FP8) | + | Context Length | up to 128k tokens | + + #### Model names + + ```bash + meta/llama-3.1-nemotron-70b-instruct:fp8 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | H100 | 16k (FP8) | + | H100-2 | 128k (FP8) | + + + #### Sending Managed Inference requests + + To perform inference tasks with your Llama-3.1-Nemotron-70b-instruct deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"meta/llama-3.1-nemotron-70b-instruct:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + + + + +### DeepSeek-R1-Distill-Llama-70B + + Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1. + DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use case such as mathematics and coding tasks. + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | + | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) | + | Compatible Instances | H100 (FP8), H100-2 (FP8, BF16) | + | Context Length | up to 131k tokens | + + #### Model names + + ```bash + deepseek/deepseek-r1-distill-llama-70b:bf16 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | H100 | 15k (FP8) | + | H100-2 | 131k (FP8), 56k (BF16) | + + #### Sending Managed Inference requests + + To perform inference tasks with your DeepSeek R1 Distill Llama deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"deepseek/deepseek-r1-distill-llama-70b:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + Ensure that the `messages` array is properly formatted with roles (user, assistant) and content. + + + + This model is better used without `system prompt`, as suggested by the model provider. + + + + + + +### DeepSeek-R1-Distill-Llama-8B + + Released January 21, 2025, Deepseek’s R1 Distilled Llama 8B is a distilled version of the Llama model family based on Deepseek R1. 
+ DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks. + + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) | + | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) | + | Compatible Instances | L4, L40S, H100 (FP8, BF16) | + | Context Length | up to 131k tokens | + + #### Model names + + ```bash + deepseek/deepseek-r1-distill-llama-8b:bf16 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | L4 | 90k (FP8), 39k (BF16) | + | L40S | 131k (FP8, BF16) | + | H100 | 131k (FP8, BF16) | + + #### Sending Managed Inference requests + + To perform inference tasks with your DeepSeek R1 Distill Llama deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"deepseek/deepseek-r1-distill-llama-8b:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + Ensure that the `messages` array is properly formatted with roles (user, assistant) and content. + + + + This model is better used without `system prompt`, as suggested by the model provider. + + + + + + +### Mistral-7b-instruct-v0.3 + + The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. + This model is open-weight and distributed under the Apache 2.0 license. + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Mistral](https://mistral.ai/technology/#models) | + | Compatible Instances | L4, L40S, H100, H100-2 (BF16) | + | Context size | 32K tokens | + + #### Model name + + ```bash + mistral/mistral-7b-instruct-v0.3:bf16 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | L4 | 32k (BF16) | + | L40S | 32k (BF16) | + | H100 | 32k (BF16) | + | H100-2 | 32k (BF16) | + + #### Sending Inference requests + + To perform inference tasks with your Mistral model deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"mistral/mistral-7b-instruct-v0.3:bf16", "messages":[{"role": "user","content": "Explain Public Cloud in a nutshell."}], "top_p": 1, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. 
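
Every curl example on this page sets `"stream": false`, which returns one final payload. Setting the flag to `true` streams tokens as they are generated, which is usually what interactive chat front ends want. A hedged sketch follows, assuming the deployment honors the OpenAI-style streaming protocol implied by the `stream` flag; the endpoint, key, and model name placeholders are the same as in the curl example above.

```python
# Streaming sketch for the Mistral-7b deployment described above.
# Placeholders <Deployment UUID> and <IAM API key> must be replaced.
from openai import OpenAI

client = OpenAI(
    base_url="https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1",
    api_key="<IAM API key>",
)

stream = client.chat.completions.create(
    model="mistral/mistral-7b-instruct-v0.3:bf16",
    messages=[{"role": "user", "content": "Explain Public Cloud in a nutshell."}],
    temperature=0.7,
    stream=True,  # deltas arrive incrementally instead of one final message
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
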
### Mistral-small-24b-instruct-2501

  Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral.
  This model is open-weight and distributed under the Apache 2.0 license.

  | Attribute | Details |
  |-----------------|------------------------------------|
  | Provider | [Mistral](https://mistral.ai/technology/#models) |
  | Compatible Instances | L40S, H100, H100-2 (FP8) |
  | Context size | 32K tokens |

  #### Model name

  ```bash
  mistral/mistral-small-24b-instruct-2501:fp8
  ```

  #### Compatible Instances

  | Instance type | Max context length |
  | ------------- |-------------|
  | L40S | 20k (FP8) |
  | H100 | 32k (FP8) |
  | H100-2 | 32k (FP8) |

  #### Sending Inference requests

  To perform inference tasks with your Mistral model deployed at Scaleway, use the following command:

  ```bash
  curl -s \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  --request POST \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
  --data '{"model":"mistral/mistral-small-24b-instruct-2501:fp8", "messages":[{"role": "user","content": "Tell me about Scaleway."}], "top_p": 1, "temperature": 0.7, "stream": false}'
  ```

  Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.

  Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content.

### Mistral-nemo-instruct-2407

  Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA.
  This model is open-weight and distributed under the Apache 2.0 license.
  It was trained on a large proportion of multilingual and code data.

  | Attribute | Details |
  |-----------------|------------------------------------|
  | Provider | [Mistral](https://mistral.ai/technology/#models) |
  | Compatible Instances | L40S, H100, H100-2 (FP8) |
  | Context size | 128K tokens |

  #### Model name

  ```bash
  mistral/mistral-nemo-instruct-2407:fp8
  ```

  #### Compatible Instances

  | Instance type | Max context length |
  | ------------- |-------------|
  | L40S | 128k (FP8) |
  | H100 | 128k (FP8) |
  | H100-2 | 128k (FP8) |

  #### Sending Inference requests

  Unlike previous Mistral models, Mistral Nemo requires smaller temperatures. It is recommended to use a temperature of 0.35.

  To perform inference tasks with your Mistral model deployed at Scaleway, use the following command:

  ```bash
  curl -s \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  --request POST \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
  --data '{"model":"mistral/mistral-nemo-instruct-2407:fp8", "messages":[{"role": "user","content": "Sing me a song about Xavier Niel"}], "top_p": 1, "temperature": 0.35, "stream": false}'
  ```

  Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.

  The model name allows Scaleway to put your prompts in the expected format.

  Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content.

### Moshiko-0.1-8b

  Kyutai's Moshi is a speech-text foundation model for real-time dialogue.
Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity.
  While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model.
  Moshiko is the variant of Moshi with a male voice in English.

  | Attribute | Details |
  |-----------------|------------------------------------|
  | Provider | [Kyutai](https://github.com/kyutai-labs/moshi) |
  | Compatible Instances | L4, H100 (FP8, BF16) |
  | Context size | 4096 tokens |

  #### Model names

  ```bash
  kyutai/moshiko-0.1-8b:bf16
  kyutai/moshiko-0.1-8b:fp8
  ```

  #### Compatible Instances

  | Instance type | Max context length |
  | ------------- |-------------|
  | L4 | 4096 (FP8, BF16) |
  | H100 | 4096 (FP8, BF16) |

  #### How to use it

  To perform inference tasks with your Moshi deployed at Scaleway, a WebSocket API is exposed for real-time dialogue and is accessible at the following endpoint:

  ```bash
  wss://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat
  ```

  #### Testing the WebSocket endpoint

  To test the endpoint, use the following command:

  ```bash
  curl -i --http1.1 \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
  -H "Sec-WebSocket-Version: 13" \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat"
  ```

  Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.

  Authentication can be done using the `token` query parameter, which should be set to your IAM API key, if headers are not supported (e.g., in a browser).

  The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection.

  #### Interacting with the model

  We provide code samples in various programming languages (Python, Rust, TypeScript) to interact with the model using the WebSocket API, as well as a simple web interface.
  Those code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples).
  This repository contains instructions on how to run the code samples and interact with the model.

### Moshika-0.1-8b

  Kyutai's Moshi is a speech-text foundation model for real-time dialogue.
  Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity.
  While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model.
  Moshika is the variant of Moshi with a female voice in English.
| Attribute | Details |
  |-----------------|------------------------------------|
  | Provider | [Kyutai](https://github.com/kyutai-labs/moshi) |
  | Compatible Instances | L4, H100 (FP8, BF16) |
  | Context size | 4096 tokens |

  #### Model names

  ```bash
  kyutai/moshika-0.1-8b:bf16
  kyutai/moshika-0.1-8b:fp8
  ```

  #### Compatible Instances

  | Instance type | Max context length |
  | ------------- |-------------|
  | L4 | 4096 (FP8, BF16) |
  | H100 | 4096 (FP8, BF16) |

  #### How to use it

  To perform inference tasks with your Moshi deployed at Scaleway, a WebSocket API is exposed for real-time dialogue and is accessible at the following endpoint:

  ```bash
  wss://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat
  ```

  #### Testing the WebSocket endpoint

  To test the endpoint, use the following command:

  ```bash
  curl -i --http1.1 \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
  -H "Sec-WebSocket-Version: 13" \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat"
  ```

  Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.

  Authentication can be done using the `token` query parameter, which should be set to your IAM API key, if headers are not supported (e.g., in a browser).

  The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection.

  #### Interacting with the model

  We provide code samples in various programming languages (Python, Rust, TypeScript) to interact with the model using the WebSocket API, as well as a simple web interface.
  Those code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples).
  This repository contains instructions on how to run the code samples and interact with the model.

### WizardLM-70B-V1.0

  WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants.
  With its extensive training in diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors.

  | Attribute | Details |
  |-----------------|------------------------------------|
  | Provider | [WizardLM](https://wizardlm.github.io/WizardLM2/) |
  | Compatible Instances | H100 (FP8), H100-2 (FP16) |
  | Context size | 4,096 tokens |

  #### Model names

  ```bash
  wizardlm/wizardlm-70b-v1.0:fp8
  wizardlm/wizardlm-70b-v1.0:fp16
  ```

  #### Compatible Instances

  - [H100 (FP8)](https://www.scaleway.com/en/h100-pcie-try-it-now/)
  - [H100-2 (FP16)](https://www.scaleway.com/en/h100-pcie-try-it-now/)

  #### Sending Inference requests

  To perform inference tasks with your WizardLM model deployed at Scaleway, use the following command:

  ```bash
  curl -s \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  --request POST \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
  --data '{"model":"wizardlm/wizardlm-70b-v1.0:fp8", "messages":[{"role": "user","content": "Say hello to Scaleway Inference"}], "max_tokens": 200, "top_p": 1, "temperature": 1, "stream": false}'
  ```

  Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.
+ + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + + +## Multimodal models + + + +### Pixtral-12b-2409 + + Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder. + It can analyze images and offer insights from visual content alongside text. + This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. + + Pixtral is open-weight and distributed under the Apache 2.0 license. + + + Pixtral 12B can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. + + + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Mistral](https://mistral.ai/technology/#models) | + | Compatible Instances | L40S, H100, H100-2 (bf16) | + | Context size | 128k tokens | + + #### Model name + + ```bash + mistral/pixtral-12b-2409:bf16 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | L40S | 50k (BF16) + | H100 | 128k (BF16) + | H100-2 | 128k (BF16) + + #### Sending Inference requests + + + Unlike previous Mistral models, Pixtral can take an `image_url` in the content array. + + + To perform inference tasks with your Pixtral model deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scw.cloud/v1/chat/completions" \ + --data '{ + "model": "mistral/pixtral-12b-2409:bf16", + "messages": [ + { + "role": "user", + "content": [ + {"type" : "text", "text": "Describe this image in detail please."}, + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, + {"type" : "text", "text": "and this one as well."}, + {"type": "image_url", "image_url": {"url": "https://www.wolframcloud.com/obj/resourcesystem/images/a0e/a0ee3983-46c6-4c92-b85d-059044639928/6af8cfb971db031b.png"}} + ] + } + ], + "top_p": 1, + "temperature": 0.7, + "stream": false + }' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + #### Passing images to Pixtral + + 1. Image URLs + If the image is available online, you can just include the image URL in your request as demonstrated above. This approach is simple and does not require any encoding. + + 2. Base64 encoded image + Base64 encoding is a standard way to transform binary data, like images, into a text format, making it easier to transmit over the internet. + + The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload. 
```python
  import base64
  from io import BytesIO
  from PIL import Image

  def encode_image(img):
      buffered = BytesIO()
      img.save(buffered, format="JPEG")
      encoded_string = base64.b64encode(buffered.getvalue()).decode("utf-8")
      return encoded_string

  img = Image.open("path_to_your_image.jpg")
  base64_img = encode_image(img)

  payload = {
      "messages": [
          {
              "role": "user",
              "content": [
                  {"type": "text", "text": "What is this image?"},
                  {
                      "type": "image_url",
                      "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"},
                  },
              ],
          }
      ],
      ... # other parameters
  }

  ```

  #### Receiving Managed Inference responses

  Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the Managed Inference server.
  Process the output data according to your application's needs. The response will contain the output generated by the visual language model based on the input provided in the request.

  Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.

  #### Frequently Asked Questions

  ##### What types of images are supported by Pixtral?
  - Bitmap (or raster) image formats, which store images as grids of individual pixels, are supported: in particular PNG, JPEG, WEBP, and non-animated GIFs.
  - Vector formats (e.g. SVG) and layered formats (e.g. PSD) are not supported.

  ##### Are other files supported?
  Only bitmaps can be analyzed by Pixtral; PDFs and videos are not supported.

  ##### Is there a limit to the size of each image?
  Image size is limited in two ways:
  - Directly, by the maximum context window. For example, since image tokens are squares of 16x16 pixels, a single 1024x1024 image takes up at most `4096` tokens (i.e. `(1024*1024)/(16*16)`).
  - Indirectly, by model accuracy: resolutions above 1024x1024 will not increase output accuracy, since images wider or taller than 1024 pixels are automatically downscaled to fit within 1024x1024 dimensions. Note that the aspect ratio is preserved (images are not cropped, only additionally compressed).

  ##### What is the maximum number of images per conversation?
  One conversation can handle up to 12 images (per request). A 13th image will return a 413 error.

### Molmo-72b-0924

  Molmo 72B is the powerhouse of the Molmo family, a family of multimodal models developed by the renowned research lab Allen Institute for AI.
  Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.

  Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source.
  Its base model is Qwen2-72B ([Tongyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)).

  Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint.
+ + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Allen Institute for AI](https://molmo.allenai.org/blog) | + | License | Apache 2.0 | | + | Compatible Instances | H100-2 (FP8) | + | Context size | 50k tokens | + + #### Model name + + ```bash + allenai/molmo-72b-0924:fp8 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | H100-2 | 50k (FP8) + + #### Sending inference requests + + + Unlike regular chat models, Molmo-72b can take an `image_url` in the content array. + + + To perform inference tasks with your Molmo-72b model deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scw.cloud/v1/chat/completions" \ + --data '{ + "model": "allenai/molmo-72b-0924:fp8", + "messages": [ + { + "role": "user", + "content": [ + {"type" : "text", "text": "Describe this image in detail please."}, + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}} + ] + } + ], + "top_p": 1, + "temperature": 0.7, + "stream": false + }' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + #### Passing images to Molmo-72b + + ##### Image URLs + If the image is available online, you can just include the image URL in your request as demonstrated above. This approach is simple and does not require any encoding. + + ##### Base64 encoded image + Base64 encoding is a standard way to transform binary data, like images, into a text format, making it easier to transmit over the internet. + + The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload. + + ```python + import base64 + from io import BytesIO + from PIL import Image + + def encode_image(img): + buffered = BytesIO() + img.save(buffered, format="JPEG") + encoded_string = base64.b64encode(buffered.getvalue()).decode("utf-8") + return encoded_string + + img = Image.open("path_to_your_image.jpg") + base64_img = encode_image(img) + + payload = { + "messages": [ + { + "role": "user", + "content": [ + {"type": "text", "text": "What is this image?"}, + { + "type": "image_url", + "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"}, + }, + ], + } + ], + ... # other parameters + } + + ``` + + #### Frequently Asked Questions + + ##### What types of images are supported by Molmo-72b? + - Bitmap (or raster) image formats, meaning storing images as grids of individual pixels, are supported: PNG, JPEG, WEBP, and non-animated GIFs in particular. + - Vector image formats (SVG, PSD) are not supported. + + ##### Are other file types supported? + Only bitmaps can be analyzed by Molmo. PDFs and videos are not supported. + + ##### Is there a limit to the size of each image? + The only limitation is the context window (1 token for each 16x16 pixel). + + ##### What is the maximum amount of images per conversation? + One conversation can handle a maximum of 1 image (per request). Sending more than one image will return a 400 error. + + + +## Code models + + + +### Qwen2.5-coder-32b-instruct + + Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages. 
+ With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning. + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Qwen](https://qwenlm.github.io/) | + | License | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) | + | Compatible Instances | H100, H100-2 (INT8) | + | Context Length | up to 32k tokens | + + #### Model names + + ```bash + qwen/qwen2.5-coder-32b-instruct:int8 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | H100 | 32k (INT8) + | H100-2 | 32k (INT8) + + #### Sending Managed Inference requests + + To perform inference tasks with your Qwen2.5-coder deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"qwen/qwen2.5-coder-32b-instruct:int8", "messages":[{"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful code assistant."},{"role": "user","content": "Write a quick sort algorithm."}], "max_tokens": 1000, "temperature": 0.8, "stream": false}' + ``` + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + + +## Embeddings models + + + +### Sentence-t5-xxl + + The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture. + Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information. + This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework. It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal. + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [sentence-transformers](https://www.sbert.net/) | + | Compatible Instances | L4 (FP32) | + | Context size | 512 tokens | + + #### Model name + + ```bash + sentence-transformers/sentence-t5-xxl:fp32 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | L4 | 512 (FP32) | + + #### Sending Managed Inference requests + + To perform inference tasks with your Embedding model deployed at Scaleway, use the following command: + + ```bash + curl https://.ifr.fr-par.scaleway.com/v1/embeddings \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + -d '{ + "input": "Embeddings can represent text in a numerical format.", + "model": "sentence-transformers/sentence-t5-xxl:fp32" + }' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. 
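
The endpoint returns raw vectors, which only become useful once compared or indexed. Below is a minimal, unofficial sketch of computing sentence similarity from two embeddings; it assumes the response follows the usual OpenAI-style `{"data": [{"embedding": [...]}]}` shape, which this page does not confirm, and it uses the same placeholder endpoint, key, and model name as the curl example above.

```python
# Hedged sketch: the response shape is assumed, not documented here.
# Replace <Deployment UUID> and <IAM API key> before running.
import math
import requests

ENDPOINT = "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/embeddings"
HEADERS = {"Authorization": "Bearer <IAM API key>", "Content-Type": "application/json"}

def embed(text: str) -> list[float]:
    payload = {"input": text, "model": "sentence-transformers/sentence-t5-xxl:fp32"}
    resp = requests.post(ENDPOINT, headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Sentence similarity is where this model does best (see above).
print(cosine(embed("Embeddings represent text numerically."),
             embed("Text can be encoded as numeric vectors.")))
```
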
+ + \ No newline at end of file From 68313df0e0127b7395125341cf686dd336e9f4d7 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Wed, 16 Apr 2025 18:01:17 +0200 Subject: [PATCH 03/16] docs(infr): update --- .../reference-content/models.mdx | 1211 +++-------------- 1 file changed, 198 insertions(+), 1013 deletions(-) diff --git a/pages/managed-inference/reference-content/models.mdx b/pages/managed-inference/reference-content/models.mdx index f1d04426b0..4324a44bde 100644 --- a/pages/managed-inference/reference-content/models.mdx +++ b/pages/managed-inference/reference-content/models.mdx @@ -19,24 +19,24 @@ A quick overview of available models in Scaleway's catalog and their core attrib | Model Name | Provider | Context Size | Modalities | Instances | License | |------------|----------|--------------|------------|-----------|---------| -| `mixtral-8x7b-instruct-v0.1` | Mistral | 32k tokens | Text | H100 | Apache 2.0 | -| `llama-3.1-70b-instruct` | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community | -| `llama-3.1-8b-instruct` | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3 community | -| `llama-3-70b-instruct` | Meta | 8k tokens | Text | H100 | Llama 3 community | -| `llama-3.3-70b-instruct` | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3 community | -| `llama-3-nemotron-70b` | Nvidia | up to 128k tokens | Text | H100, H100-2 |Lllama 3.3 community | -| `deepseek-r1-distill-70b` | Deepseek | up to 131k tokens | Text | H100, H100-2 | MIT | -| `deepseek-r1-distill-8b` | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | Apache 2.0 | -| `mistral-7b-instruct-v0.3` | Mistral | 32k tokens | Text | L4, L40S, H100, H100-1 | Apache 2.0 | -| `mistral-small-24b-instruct-2501` | Mistral | 32k tokens | Text | L40S, H100, H100-2 | Apache 2.0 | -| `mistral-nemo-instruct-2407` | Mistral | 128k | Text | L40S, H100, H100-2 | Apache 2.0 | -| `moshiko-0.1-8b` | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 | -| `moshika-0.1-8b` | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 | -| `wizardlm-70b-v1.0` | WizardLM | 4,096 tokens | Text | H100, H100-2 | Lllama 2 community | -| `pixtral-12b-2409` | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 | -| `molmo-72b-0924` | Allen AI | 50k | Multimodal | H100-2 | Apache 2.0 | -| `qwen2.5-coder-32b-instruct` | Qwen | up to 32k | Code | H100, H100-2 | Qianwen License | -| `sentence-t5-xxl` | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 | +| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 | +| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community | +| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3 community | +| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | Llama 3 community | +| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3 community | +| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | up to 128k tokens | Text | H100, H100-2 |Lllama 3.3 community | +| [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | up to 131k tokens | Text | H100, H100-2 | MIT | +| [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | Apache 2.0 | +| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 
32k tokens | Text | L4, L40S, H100, H100-2 | Apache 2.0 |
| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k tokens | Text | L40S, H100, H100-2 | Apache 2.0 |
| [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k tokens | Text | L40S, H100, H100-2 | Apache 2.0 |
| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 |
| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 |
| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | Llama 2 community |
| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 |
| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k tokens | Multimodal | H100-2 | Apache 2.0 |
| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k tokens | Code | H100, H100-2 | Apache 2.0 |
| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 |

## Model details

  Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.

## Text models

### Mixtral-8x7b-instruct-v0.1

Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants.
Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.
- #### Sending Managed Inference requests +- Structured output supported: Yes +- Function calling: No +- Supported languages: English, French, German, Spanish - To perform inference tasks with your Llama-3.1 deployed at Scaleway, use the following command: +#### Model names - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"meta/llama-3.1-70b-instruct:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - The model name allows Scaleway to put your prompts in the expected format. - - - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - - - - - - -### Llama-3.1-8b-instruct model - - Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. - Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. +``` +mistral/mixtral-8x7b-instruct-v0.1:fp8 +mistral/mixtral-8x7b-instruct-v0.1:bf16 +``` - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Meta](https://llama.meta.com/llama3/) | - | License | [Llama 3.1 community](https://llama.meta.com/llama3_1/license/) | - | Compatible Instances | L4, L40S, H100, H100-2 (FP8, BF16) | - | Context Length | up to 128k tokens | +### Llama-3.1-70b-instruct - #### Model names +Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. +Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. - ```bash - meta/llama-3.1-8b-instruct:fp8 - meta/llama-3.1-8b-instruct:bf16 - ``` - - #### Compatible Instances +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - | Instance type | Max context length | - | ------------- |-------------| - | L4 | 96k (FP8), 27k (BF16) | - | L40S | 128k (FP8, BF16) | - | H100 | 128k (FP8, BF16) | - | H100-2 | 128k (FP8, BF16) | +#### Model names - #### Sending Managed Inference requests +``` +meta/llama-3.1-70b-instruct:fp8 +meta/llama-3.1-70b-instruct:bf16 +``` - To perform inference tasks with your Llama-3.1 deployed at Scaleway, use the following command: +### Llama-3.1-8b-instruct - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"meta/llama-3.1-8b-instruct:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' - ``` +Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. +Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - The model name allows Scaleway to put your prompts in the expected format. 
- +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - +#### Model names - +``` +meta/llama-3.1-8b-instruct:fp8 +meta/llama-3.1-8b-instruct:bf16 +``` - ### Llama-3-70b-instruct - Meta’s Llama 3 is an iteration of the open-access Llama family. - Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility and responsibly spearheading the deployment of LLMs. - With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries in reasoning and coding capabilities. +Meta’s Llama 3 is an iteration of the open-access Llama family. +Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility and responsibly spearheading the deployment of LLMs. +With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries in reasoning and coding capabilities. - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Meta](https://llama.meta.com/llama3/) | - | Compatible Instances | H100, H100-2 (FP8) | - | Context size | 8192 tokens | +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - #### Model names + #### Model name - ```bash - meta/llama-3-70b-instruct:fp8 ``` - - #### Compatible Instances - - - [H100 (FP8)](https://www.scaleway.com/en/h100-pcie-try-it-now/) - - H100-2 (FP8) - - #### Sending Managed Inference requests - - To perform inference tasks with your Llama-3 deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"meta/llama-3-70b-instruct:fp8", "messages":[{"role": "user","content": "Sing me a song about Xavier Niel"}], "max_tokens": 500, "top_p": 1, "temperature": 0.7, "stream": false}' + meta/llama-3-70b-instruct:fp8 ``` - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - The model name allows Scaleway to put your prompts in the expected format. - - - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - - - - - ### Llama-3.3-70b-instruct - Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model. - This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications. 
- - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Meta](https://www.llama.com/) | - | License | [Llama 3.3 community](https://www.llama.com/llama3_3/license/) | - | Compatible Instances | H100 (FP8), H100-2 (FP8, BF16) | - | Context length | Up to 131k tokens | - - #### Model names - - ```bash - meta/llama-3.3-70b-instruct:bf16 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | H100 | 15k (FP8) | - | H100-2 | 131k (FP8), 62k (BF16) | - - #### Sending Managed Inference requests - - To perform inference tasks with your Llama-3.3 deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"meta/llama-3.3-70b-instruct:bf16", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. +Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model. +This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications. - - The model name allows Scaleway to put your prompts in the expected format. - +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - +#### Model name - - - +``` +meta/llama-3.3-70b-instruct:bf16 +``` ### Llama-3.1-Nemotron-70b-instruct - Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions. - NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses. 
- - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Nvidia](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct) | - | License | [Llama 3.1 community](https://llama.meta.com/llama3_1/license/) | | - | Compatible Instances | H100 (FP8), H100-2 (FP8) | - | Context Length | up to 128k tokens | - - #### Model names - - ```bash - meta/llama-3.1-nemotron-70b-instruct:fp8 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | H100 | 16k (FP8) | - | H100-2 | 128k (FP8) | - - - #### Sending Managed Inference requests - - To perform inference tasks with your Llama-3.1-Nemotron-70b-instruct deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"meta/llama-3.1-nemotron-70b-instruct:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - The model name allows Scaleway to put your prompts in the expected format. - +Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions. +NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses. - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. (to verify) - +#### Model name - +``` +meta/llama-3.1-nemotron-70b-instruct:fp8 +``` ### DeepSeek-R1-Distill-Llama-70B - Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1. - DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use case such as mathematics and coding tasks. +Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1. +DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use case such as mathematics and coding tasks. 
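+
+As suggested by the model provider, this model is best used without a `system prompt`.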
- | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | - | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) | - | Compatible Instances | H100 (FP8), H100-2 (FP8, BF16) | - | Context Length | up to 131k tokens | +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, Simplified Chinese +#### Model name - #### Model names - - ```bash - deepseek/deepseek-r1-distill-llama-70b:bf16 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | H100 | 15k (FP8) | - | H100-2 | 131k (FP8), 56k (BF16) | - - #### Sending Managed Inference requests - - To perform inference tasks with your DeepSeek R1 Distill Llama deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"deepseek/deepseek-r1-distill-llama-70b:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - Ensure that the `messages` array is properly formatted with roles (user, assistant) and content. - - - - This model is better used without `system prompt`, as suggested by the model provider. - - - - - +``` +deepseek/deepseek-r1-distill-llama-70b:bf16 +``` ### DeepSeek-R1-Distill-Llama-8B - Released January 21, 2025, Deepseek’s R1 Distilled Llama 8B is a distilled version of the Llama model family based on Deepseek R1. - DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks. - - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) | - | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) | - | Compatible Instances | L4, L40S, H100 (FP8, BF16) | - | Context Length | up to 131k tokens | - - #### Model names +Released January 21, 2025, Deepseek’s R1 Distilled Llama 8B is a distilled version of the Llama model family based on Deepseek R1. +DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks. 
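+
+As with the 70B version, the model provider suggests using this model without a `system prompt`.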
- ```bash - deepseek/deepseek-r1-distill-llama-8b:bf16 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | L4 | 90k (FP8), 39k (BF16) | - | L40S | 131k (FP8, BF16) | - | H100 | 131k (FP8, BF16) | - - #### Sending Managed Inference requests - - To perform inference tasks with your DeepSeek R1 Distill Llama deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"deepseek/deepseek-r1-distill-llama-8b:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - Ensure that the `messages` array is properly formatted with roles (user, assistant) and content. - +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, Simplified Chinese - - This model is better used without `system prompt`, as suggested by the model provider. - +#### Model names - - - +``` +deepseek/deepseek-r1-distill-llama-8b:bf16 +``` ### Mistral-7b-instruct-v0.3 - The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. - This model is open-weight and distributed under the Apache 2.0 license. - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Mistral](https://mistral.ai/technology/#models) | - | Compatible Instances | L4, L40S, H100, H100-2 (BF16) | - | Context size | 32K tokens | - - #### Model name - - ```bash - mistral/mistral-7b-instruct-v0.3:bf16 - ``` - - #### Compatible Instances +The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. +This model is open-weight and distributed under the Apache 2.0 license. - | Instance type | Max context length | - | ------------- |-------------| - | L4 | 32k (BF16) | - | L40S | 32k (BF16) | - | H100 | 32k (BF16) | - | H100-2 | 32k (BF16) | +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English - #### Sending Inference requests +#### Model name - To perform inference tasks with your Mistral model deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"mistral/mistral-7b-instruct-v0.3:bf16", "messages":[{"role": "user","content": "Explain Public Cloud in a nutshell."}], "top_p": 1, "temperature": 0.7, "stream": false}' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - The model name allows Scaleway to put your prompts in the expected format. - - - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. 
- - - - - +``` +mistral/mistral-7b-instruct-v0.3:bf16 +``` ### Mistral-small-24b-base-2501 - Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral. - This model is open-weight and distributed under the Apache 2.0 license. - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Mistral](https://mistral.ai/technology/#models) | - | Compatible Instances | L40S, H100, H100-2 (FP8) | - | Context size | 32K tokens | - - #### Model name - - ```bash - mistral/mistral-small-24b-instruct-2501:fp8 - ``` - - #### Compatible Instances +Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral. +This model is open-weight and distributed under the Apache 2.0 license. - | Instance type | Max context length | - | ------------- |-------------| - | L40 | 20k (FP8) | - | H100 | 32k (FP8) | - | H100-2 | 32k (FP8) | - - #### Sending Inference requests - - To perform inference tasks with your Mistral model deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"mistral/mistral-small-24b-instruct-2501:fp8", "messages":[{"role": "user","content": "Tell me about Scaleway."}], "top_p": 1, "temperature": 0.7, "stream": false}' - ``` +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish. - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. +#### Model name - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - - - - - +``` +mistral/mistral-small-24b-instruct-2501:fp8 +``` ### Mistral-nemo-instruct-2407 - Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA. - This model is open-weight and distributed under the Apache 2.0 license. - It was trained on a large proportion of multilingual and code data. - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Mistral](https://mistral.ai/technology/#models) | - | Compatible Instances | L40S, H100, H100-2 (FP8) | - | Context size | 128K tokens | - - #### Model name - - ```bash - mistral/mistral-nemo-instruct-2407:fp8 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | L40 | 128k (FP8) | - | H100 | 128k (FP8) | - | H100-2 | 128k (FP8) | - - #### Sending Inference requests - - - Unlike previous Mistral models, Mistral Nemo requires smaller temperatures. It is recommend to use a temperature of 0.35. 
- - - To perform inference tasks with your Mistral model deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"mistral/mistral-nemo-instruct-2407:fp8", "messages":[{"role": "user","content": "Sing me a song about Xavier Niel"}], "top_p": 1, "temperature": 0.35, "stream": false}' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. +Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA. +This model is open-weight and distributed under the Apache 2.0 license. +It was trained on a large proportion of multilingual and code data. - - The model name allows Scaleway to put your prompts in the expected format. - +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - +#### Model name - - - +``` +mistral/mistral-nemo-instruct-2407:fp8 +``` ### Moshiko-0.1-8b - Kyutai's Moshi is a speech-text foundation model for real-time dialogue. - Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. - While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. - Moshiko is the variant of Moshi with a male voice in English. - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Kyutai](https://github.com/kyutai-labs/moshi) | - | Compatible Instances | L4, H100 (FP8, BF16) | - | Context size | 4096 tokens | - - #### Model names - - ```bash - kyutai/moshiko-0.1-8b:bf16 - kyutai/moshiko-0.1-8b:fp8 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | L4 | 4096 (FP8, BF16) | - | H100 | 4096 (FP8, BF16) | - - #### How to use it - - To perform inference tasks with your Moshi deployed at Scaleway, a WebSocket API is exposed for real-time dialogue and is accessible at the following endpoint: - - ```bash - wss://.ifr.fr-par.scaleway.com/api/chat - ``` - - #### Testing the WebSocket endpoint - - To test the endpoint, use the following command: - - ```bash - curl -i --http1.1 \ - -H "Authorization: Bearer " \ - -H "Connection: Upgrade" \ - -H "Upgrade: websocket" \ - -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \ - -H "Sec-WebSocket-Version: 13" \ - --url "https://.ifr.fr-par.scaleway.com/api/chat" - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. +Kyutai's Moshi is a speech-text foundation model for real-time dialogue. +Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. +While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. 
+Moshiko is the variant of Moshi with a male voice in English. - - Authentication can be done using the `token` query parameter, which should be set to your IAM API key, if headers are not supported (e.g., in a browser). - +- Structured output supported: No +- Function calling: No +- Supported languages: English - The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection. +#### Model names - #### Interacting with the model - - We provide code samples in various programming languages (Python, Rust, typescript) to interact with the model using the WebSocket API as well as a simple web interface. - Those code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples). - This repository contains instructions on how to run the code samples and interact with the model. - - - - +``` +kyutai/moshiko-0.1-8b:bf16 +kyutai/moshiko-0.1-8b:fp8 +``` ### Moshika-0.1-8b +Kyutai's Moshi is a speech-text foundation model for real-time dialogue. +Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. +While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. +Moshika is the variant of Moshi with a female voice in English. - Kyutai's Moshi is a speech-text foundation model for real-time dialogue. - Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. - While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. - Moshiko is the variant of Moshi with a male voice in English. - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Kyutai](https://github.com/kyutai-labs/moshi) | - | Compatible Instances | L4, H100 (FP8, BF16) | - | Context size | 4096 tokens | - - #### Model names - - ```bash - kyutai/moshiko-0.1-8b:bf16 - kyutai/moshiko-0.1-8b:fp8 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | L4 | 4096 (FP8, BF16) | - | H100 | 4096 (FP8, BF16) | - - #### How to use it - - To perform inference tasks with your Moshi deployed at Scaleway, a WebSocket API is exposed for real-time dialogue and is accessible at the following endpoint: - - ```bash - wss://.ifr.fr-par.scaleway.com/api/chat - ``` - - #### Testing the WebSocket endpoint +- Structured output supported: No +- Function calling: No +- Supported languages: English - To test the endpoint, use the following command: - - ```bash - curl -i --http1.1 \ - -H "Authorization: Bearer " \ - -H "Connection: Upgrade" \ - -H "Upgrade: websocket" \ - -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \ - -H "Sec-WebSocket-Version: 13" \ - --url "https://.ifr.fr-par.scaleway.com/api/chat" - ``` +#### Model names - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - Authentication can be done using the `token` query parameter, which should be set to your IAM API key, if headers are not supported (e.g., in a browser). 
- - - The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection. - - #### Interacting with the model - - We provide code samples in various programming languages (Python, Rust, typescript) to interact with the model using the WebSocket API as well as a simple web interface. - Those code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples). - This repository contains instructions on how to run the code samples and interact with the model. - - - - +``` +kyutai/moshika-0.1-8b:bf16 +kyutai/moshika-0.1-8b:fp8 +``` ### WizardLM-70B-V1.0 - WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants. - With its extensive training in diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors. - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [WizardLM](https://wizardlm.github.io/WizardLM2/) | - | Compatible Instances | H100 (FP8) - H100-2 (FP16) | - | Context size | 4,096 tokens | - - #### Model names - - ```bash - wizardlm/wizardlm-70b-v1.0:fp8 - wizardlm/wizardlm-70b-v1.0:fp16 - ``` - - #### Compatible Instances - - - [H100-1 (INT8)](https://www.scaleway.com/en/h100-pcie-try-it-now/) - - [H100-2 (FP16)](https://www.scaleway.com/en/h100-pcie-try-it-now/) - - #### Sending Inference requests - - To perform inference tasks with your WizardLM model deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"wizardlm/wizardlm-70b-v1.0:fp8", "messages":[{"role": "user","content": "Say hello to Scaleway's Inference"}], "max_tokens": 200, "top_p": 1, "temperature": 1, "stream": false}' - ``` +WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants. +With its extensive training in diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors. - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. +- Structured output supported: Yes +- Function calling: No +- Supported languages: English (to be verified) - - The model name allows Scaleway to put your prompts in the expected format. - +#### Model names - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - - - +``` +wizardlm/wizardlm-70b-v1.0:fp8 +wizardlm/wizardlm-70b-v1.0:fp16 +``` ## Multimodal models - - ### Pixtral-12b-2409 - Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder. - It can analyze images and offer insights from visual content alongside text. - This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. +Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder. +It can analyze images and offer insights from visual content alongside text. 
+This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. - Pixtral is open-weight and distributed under the Apache 2.0 license. - - - Pixtral 12B can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. - +Pixtral is open-weight and distributed under the Apache 2.0 license. + + Pixtral 12B can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. + - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Mistral](https://mistral.ai/technology/#models) | - | Compatible Instances | L40S, H100, H100-2 (bf16) | - | Context size | 128k tokens | +- Structured output supported: Yes +- Function calling: No +- Supported languages: English, French, German, Spanish (to be verified) #### Model name - ```bash - mistral/pixtral-12b-2409:bf16 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | L40S | 50k (BF16) - | H100 | 128k (BF16) - | H100-2 | 128k (BF16) - - #### Sending Inference requests - - - Unlike previous Mistral models, Pixtral can take an `image_url` in the content array. - - - To perform inference tasks with your Pixtral model deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scw.cloud/v1/chat/completions" \ - --data '{ - "model": "mistral/pixtral-12b-2409:bf16", - "messages": [ - { - "role": "user", - "content": [ - {"type" : "text", "text": "Describe this image in detail please."}, - {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, - {"type" : "text", "text": "and this one as well."}, - {"type": "image_url", "image_url": {"url": "https://www.wolframcloud.com/obj/resourcesystem/images/a0e/a0ee3983-46c6-4c92-b85d-059044639928/6af8cfb971db031b.png"}} - ] - } - ], - "top_p": 1, - "temperature": 0.7, - "stream": false - }' ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - The model name allows Scaleway to put your prompts in the expected format. - - - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - - - #### Passing images to Pixtral - - 1. Image URLs - If the image is available online, you can just include the image URL in your request as demonstrated above. This approach is simple and does not require any encoding. - - 2. Base64 encoded image - Base64 encoding is a standard way to transform binary data, like images, into a text format, making it easier to transmit over the internet. - - The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload. - - - ```python - import base64 - from io import BytesIO - from PIL import Image - - def encode_image(img): - buffered = BytesIO() - img.save(buffered, format="JPEG") - encoded_string = base64.b64encode(buffered.getvalue()).decode("utf-8") - return encoded_string - - img = Image.open("path_to_your_image.jpg") - base64_img = encode_image(img) - - payload = { - "messages": [ - { - "role": "user", - "content": [ - {"type": "text", "text": "What is this image?"}, - { - "type": "image_url", - "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"}, - }, - ], - } - ], - ... 
# other parameters - } - + mistral/pixtral-12b-2409:bf16 ``` - #### Receiving Managed Inference responses - - Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the managed Managed Inference server. - Process the output data according to your application's needs. The response will contain the output generated by the visual language model based on the input provided in the request. - - - Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently. - - - #### Frequently Asked Questions - - ##### What types of images are supported by Pixtral? - - Bitmap (or raster) image formats, meaning storing images as grids of individual pixels, are supported: PNG, JPEG, WEBP, and non-animated GIFs in particular. - - Vector image formats (SVG, PSD) are not supported. - - ##### Are other files supported? - Only bitmaps can be analyzed by Pixtral, PDFs and videos are not supported. - - ##### Is there a limit to the size of each image? - Images size are limited: - - Directly by the maximum context window. As an example, since tokens are squares of 16x16 pixels, the maximum context window taken by a single image is `4096` tokens (ie. `(1024*1024)/(16*16)`) - - Indirectly by the model accuracy: resolution above 1024x1024 will not increase model output accuracy. Indeed, images above 1024 pixels width or height will be automatically downscaled to fit within 1024x1024 dimension. Note that image ratio and overall aspect is preserved (images are not cropped, only additionaly compressed). - - ##### What is the maximum amount of images per conversation? - One conversation can handle up to 12 images (per request). The 13rd will return a 413 error. - - - - - ### Molmo-72b-0924 - Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI. - Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. - - Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source. - Its base model is Qwen2-72B ([Twonyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)). - - - Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. - - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Allen Institute for AI](https://molmo.allenai.org/blog) | - | License | Apache 2.0 | | - | Compatible Instances | H100-2 (FP8) | - | Context size | 50k tokens | - - #### Model name - - ```bash - allenai/molmo-72b-0924:fp8 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | H100-2 | 50k (FP8) - - #### Sending inference requests - - - Unlike regular chat models, Molmo-72b can take an `image_url` in the content array. 
- - - To perform inference tasks with your Molmo-72b model deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scw.cloud/v1/chat/completions" \ - --data '{ - "model": "allenai/molmo-72b-0924:fp8", - "messages": [ - { - "role": "user", - "content": [ - {"type" : "text", "text": "Describe this image in detail please."}, - {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}} - ] - } - ], - "top_p": 1, - "temperature": 0.7, - "stream": false - }' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - The model name allows Scaleway to put your prompts in the expected format. - - - #### Passing images to Molmo-72b - - ##### Image URLs - If the image is available online, you can just include the image URL in your request as demonstrated above. This approach is simple and does not require any encoding. - - ##### Base64 encoded image - Base64 encoding is a standard way to transform binary data, like images, into a text format, making it easier to transmit over the internet. - - The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload. +Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI. +Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. - ```python - import base64 - from io import BytesIO - from PIL import Image +Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source. +Its base model is Qwen2-72B ([Twonyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)). - def encode_image(img): - buffered = BytesIO() - img.save(buffered, format="JPEG") - encoded_string = base64.b64encode(buffered.getvalue()).decode("utf-8") - return encoded_string - - img = Image.open("path_to_your_image.jpg") - base64_img = encode_image(img) - - payload = { - "messages": [ - { - "role": "user", - "content": [ - {"type": "text", "text": "What is this image?"}, - { - "type": "image_url", - "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"}, - }, - ], - } - ], - ... # other parameters - } - - ``` - - #### Frequently Asked Questions - - ##### What types of images are supported by Molmo-72b? - - Bitmap (or raster) image formats, meaning storing images as grids of individual pixels, are supported: PNG, JPEG, WEBP, and non-animated GIFs in particular. - - Vector image formats (SVG, PSD) are not supported. - - ##### Are other file types supported? - Only bitmaps can be analyzed by Molmo. PDFs and videos are not supported. + + Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. + - ##### Is there a limit to the size of each image? - The only limitation is the context window (1 token for each 16x16 pixel). +- Structured output supported: Yes +- Function calling: No +- Supported languages: English, French, German, Spanish (to be verified) - ##### What is the maximum amount of images per conversation? 
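+
+Note that one conversation can handle a maximum of 1 image per request; sending more than one image returns a 400 error.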
- One conversation can handle a maximum of 1 image (per request). Sending more than one image will return a 400 error. +#### Model name - +``` +allenai/molmo-72b-0924:fp8 +``` ## Code models - - ### Qwen2.5-coder-32b-instruct - Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages. - With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning. +Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages. +With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning. - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Qwen](https://qwenlm.github.io/) | - | License | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) | - | Compatible Instances | H100, H100-2 (INT8) | - | Context Length | up to 32k tokens | +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic. - #### Model names +#### Model name - ```bash - qwen/qwen2.5-coder-32b-instruct:int8 ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | H100 | 32k (INT8) - | H100-2 | 32k (INT8) - - #### Sending Managed Inference requests - - To perform inference tasks with your Qwen2.5-coder deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"qwen/qwen2.5-coder-32b-instruct:int8", "messages":[{"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful code assistant."},{"role": "user","content": "Write a quick sort algorithm."}], "max_tokens": 1000, "temperature": 0.8, "stream": false}' + qwen/qwen2.5-coder-32b-instruct:int8 ``` - - The model name allows Scaleway to put your prompts in the expected format. - - - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - - - - ## Embeddings models - - ### Sentence-t5-xxl - The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture. - Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information. - This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework. It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal. - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [sentence-transformers](https://www.sbert.net/) | - | Compatible Instances | L4 (FP32) | - | Context size | 512 tokens | - - #### Model name +The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture. 
+Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information. +This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework. It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal. - ```bash - sentence-transformers/sentence-t5-xxl:fp32 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | L4 | 512 (FP32) | - - #### Sending Managed Inference requests - - To perform inference tasks with your Embedding model deployed at Scaleway, use the following command: - - ```bash - curl https://.ifr.fr-par.scaleway.com/v1/embeddings \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - -d '{ - "input": "Embeddings can represent text in a numerical format.", - "model": "sentence-transformers/sentence-t5-xxl:fp32" - }' - ``` +- Structured output supported: No +- Function calling: No +- Supported languages: English (to be verified) - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. +#### Model name - \ No newline at end of file +``` +sentence-transformers/sentence-t5-xxl:fp32 +``` \ No newline at end of file From 9a6419f870ef253e4337f14d6005215917d77d4c Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Thu, 17 Apr 2025 11:10:16 +0200 Subject: [PATCH 04/16] docs(infr): add table --- .../reference-content/models.mdx | 24 ++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/pages/managed-inference/reference-content/models.mdx b/pages/managed-inference/reference-content/models.mdx index 4324a44bde..93dd58b49d 100644 --- a/pages/managed-inference/reference-content/models.mdx +++ b/pages/managed-inference/reference-content/models.mdx @@ -17,7 +17,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib ## Summary table -| Model Name | Provider | Context Size | Modalities | Instances | License | +| Model Name | Provider | Context Length | Modalities | Instances | License | |------------|----------|--------------|------------|-----------|---------| | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 | | [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community | @@ -38,6 +38,28 @@ A quick overview of available models in Scaleway's catalog and their core attrib | [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | Qianwen License | | [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 | + +| Model Name | Structured output supported | Function calling | Supported languages | +| --- | --- | --- | --- | +| `Mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Spanish | +| `Llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | +| `Llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | +| `Llama-3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | +| `Llama-3.3-70b-instruct` | Yes 
| Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | +| `Llama-3.1-Nemotron-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) | +| `DeepSeek-R1-Distill-Llama-70B` | Yes | Yes | English, Simplified Chinese | +| `DeepSeek-R1-Distill-Llama-8B` | Yes | Yes | English, Simplified Chinese | +| `Mistral-7b-instruct-v0.3` | Yes | Yes | English | +| `Mistral-small-24b-base-2501` | Yes | Yes | English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish | +| `Mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi | +| `Moshiko-0.1-8b` | No | No | English | +| `Moshika-0.1-8b` | No | No | English | +| `WizardLM-70B-v1.0` | Yes | No | English (to be verified) | +| `Pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) | +| `Molmo-72b-0924` | Yes | No | English, French, German, Spanish (to be verified) | +| `Qwen2.5-coder-32b-instruct` | Yes | Yes | Over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic | +| `Sentence-t5-xxl` | No | No | English (to be verified) | + ## Model details From 236bc4189e77e0828b08ab81014fead9fb11391d Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Thu, 17 Apr 2025 13:45:26 +0200 Subject: [PATCH 05/16] docs(infr): test --- .../reference-content/models2.mdx | 366 ++++++++++++++++++ 1 file changed, 366 insertions(+) create mode 100644 pages/managed-inference/reference-content/models2.mdx diff --git a/pages/managed-inference/reference-content/models2.mdx b/pages/managed-inference/reference-content/models2.mdx new file mode 100644 index 0000000000..1eb2e0bc7e --- /dev/null +++ b/pages/managed-inference/reference-content/models2.mdx @@ -0,0 +1,366 @@ +--- +meta: + title: Managed Inference model catalog + description: Deploy your own model with Scaleway Managed Inference. Privacy-focused, fully managed. +content: + h1: Managed Inference model catalog + paragraph: This page provides information on the Scaleway Managed Inference product catalog +tags: +dates: + validation: 2025-03-19 + posted: 2024-05-28 +categories: + - ai-data +--- +A quick overview of available models in Scaleway's catalog and their core attributes. Expand any model below to see usage examples, curl commands, and detailed capabilities. 
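+
+Most models in this catalog are served behind an OpenAI-compatible `/v1/chat/completions` endpoint on the deployment (Moshi uses a WebSocket API, and embeddings models use `/v1/embeddings`). A minimal request is sketched below; `<Deployment UUID>` and `<IAM API key>` are placeholders, and the model name shown is one example from the tables that follow:
+
+```bash
+curl -s \
+-H "Authorization: Bearer <IAM API key>" \
+-H "Content-Type: application/json" \
+--request POST \
+--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
+--data '{"model":"meta/llama-3.1-8b-instruct:fp8", "messages":[{"role": "user","content": "Hello!"}], "max_tokens": 200, "stream": false}'
+```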
+
+## Summary table
+| Model Name | Provider | Context Length | Modalities | Instances | License |
+|------------|----------|--------------|------------|-----------|---------|
+| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 |
+| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community |
+| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3 community |
+| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | Llama 3 community |
+| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3 community |
+| [`llama-3.1-nemotron-70b-instruct`](#llama-31-nemotron-70b-instruct) | Nvidia | up to 128k tokens | Text | H100, H100-2 | Llama 3.1 community |
+| [`deepseek-r1-distill-llama-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | up to 131k tokens | Text | H100, H100-2 | MIT |
+| [`deepseek-r1-distill-llama-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | MIT |
+| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k tokens | Text | L4, L40S, H100, H100-2 | Apache 2.0 |
+| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k tokens | Text | L40S, H100, H100-2 | Apache 2.0 |
+| [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k tokens | Text | L40S, H100, H100-2 | Apache 2.0 |
+| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 |
+| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 |
+| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | Llama 2 community |
+| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 |
+| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k tokens | Multimodal | H100-2 | Apache 2.0 |
+| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k tokens | Code | H100, H100-2 | Apache 2.0 |
+| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 |
+
+| Model Name | Structured output supported | Function calling | Supported languages |
+| --- | --- | --- | --- |
+| `Mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Spanish |
+| `Llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+| `Llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+| `Llama-3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+| `Llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+| `Llama-3.1-Nemotron-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) |
+| `DeepSeek-R1-Distill-Llama-70B` | Yes | Yes | English, Simplified Chinese |
+| `DeepSeek-R1-Distill-Llama-8B` | Yes | Yes | English, Simplified Chinese |
+| `Mistral-7b-instruct-v0.3` | Yes | Yes | English |
+| `Mistral-small-24b-base-2501` | Yes | Yes | English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish |
+| `Mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi |
+| `Moshiko-0.1-8b` | No | No | English |
+| `Moshika-0.1-8b` | No | No | English |
+| `WizardLM-70B-v1.0` | Yes | No | English (to be verified) |
+| `Pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) |
+| `Molmo-72b-0924` | Yes | No | English, French, German, Spanish (to be verified) |
+| `Qwen2.5-coder-32b-instruct` | Yes | Yes | Over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic |
+| `Sentence-t5-xxl` | No | No | English (to be verified) |
+
+## Model details
+
+ Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
+
+## Text models
+
+### Mixtral-8x7b-instruct-v0.1
+Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants.
+Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | No |
+| Supported languages | English, French, German, Spanish |
+
+#### Model names
+```
+mistral/mixtral-8x7b-instruct-v0.1:fp8
+mistral/mixtral-8x7b-instruct-v0.1:bf16
+```
+
+### Llama-3.1-70b-instruct
+Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
+Llama 3.1 was designed to match the best proprietary models and to outperform many of the available open-source models on common industry benchmarks.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+
+#### Model names
+```
+meta/llama-3.1-70b-instruct:fp8
+meta/llama-3.1-70b-instruct:bf16
+```
+
+### Llama-3.1-8b-instruct
+Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
+Llama 3.1 was designed to match the best proprietary models and to outperform many of the available open-source models on common industry benchmarks.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+
+#### Model names
+```
+meta/llama-3.1-8b-instruct:fp8
+meta/llama-3.1-8b-instruct:bf16
+```
+
+### Llama-3-70b-instruct
+Meta’s Llama 3 is an iteration of the open-access Llama family.
+Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility, while responsibly spearheading the deployment of LLMs.
+With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries in reasoning and coding capabilities.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+
+#### Model name
+```
+meta/llama-3-70b-instruct:fp8
+```
+
+### Llama-3.3-70b-instruct
+Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model.
+This model is still text-only (text in/text out).
However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+
+#### Model name
+```
+meta/llama-3.3-70b-instruct:bf16
+```
+
+### Llama-3.1-Nemotron-70b-instruct
+Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions.
+NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) |
+
+#### Model name
+```
+meta/llama-3.1-nemotron-70b-instruct:fp8
+```
+
+### DeepSeek-R1-Distill-Llama-70B
+Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1.
+DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English, Simplified Chinese |
+
+#### Model name
+```
+deepseek/deepseek-r1-distill-llama-70b:bf16
+```
+
+### DeepSeek-R1-Distill-Llama-8B
+Released January 21, 2025, Deepseek’s R1 Distilled Llama 8B is a distilled version of the Llama model family based on Deepseek R1.
+DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English, Simplified Chinese |
+
+#### Model name
+```
+deepseek/deepseek-r1-distill-llama-8b:bf16
+```
+
+### Mistral-7b-instruct-v0.3
+The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of its release, it matched the capabilities of models up to 30B parameters.
+This model is open-weight and distributed under the Apache 2.0 license.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English |
+
+#### Model name
+```
+mistral/mistral-7b-instruct-v0.3:bf16
+```
+
+### Mistral-small-24b-base-2501
+Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral.
+This model is open-weight and distributed under the Apache 2.0 license.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | Dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish |
+
+#### Model name
+```
+mistral/mistral-small-24b-instruct-2501:fp8
+```
+
+### Mistral-nemo-instruct-2407
+Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA.
+This model is open-weight and distributed under the Apache 2.0 license.
+It was trained on a large proportion of multilingual and code data.
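+
+Unlike previous Mistral models, Mistral Nemo requires lower sampling temperatures; a temperature of 0.35 is recommended.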
+ +| Attribute | Value | +|-----------|-------| +| Structured output supported | Yes | +| Function calling | Yes | +| Supported languages | English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi | + +#### Model name +``` +mistral/mistral-nemo-instruct-2407:fp8 +``` + +### Moshiko-0.1-8b +Kyutai's Moshi is a speech-text foundation model for real-time dialogue. +Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. +While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. +Moshiko is the variant of Moshi with a male voice in English. + +| Attribute | Value | +|-----------|-------| +| Structured output supported | No | +| Function calling | No | +| Supported languages | English | + +#### Model names +``` +kyutai/moshiko-0.1-8b:bf16 +kyutai/moshiko-0.1-8b:fp8 +``` + +### Moshika-0.1-8b +Kyutai's Moshi is a speech-text foundation model for real-time dialogue. +Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. +While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. +Moshika is the variant of Moshi with a female voice in English. + +| Attribute | Value | +|-----------|-------| +| Structured output supported | No | +| Function calling | No | +| Supported languages | English | + +#### Model names +``` +kyutai/moshika-0.1-8b:bf16 +kyutai/moshika-0.1-8b:fp8 +``` + +### WizardLM-70B-V1.0 +WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants. +With its extensive training in diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors. + +| Attribute | Value | +|-----------|-------| +| Structured output supported | Yes | +| Function calling | No | +| Supported languages | English (to be verified) | + +#### Model names +``` +wizardlm/wizardlm-70b-v1.0:fp8 +wizardlm/wizardlm-70b-v1.0:fp16 +``` + +## Multimodal models + +### Pixtral-12b-2409 +Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder. +It can analyze images and offer insights from visual content alongside text. +This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. +Pixtral is open-weight and distributed under the Apache 2.0 license. + +| Attribute | Value | +|-----------|-------| +| Structured output supported | Yes | +| Function calling | No | +| Supported languages | English, French, German, Spanish (to be verified) | + + + Pixtral 12B can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. + + +#### Model name +``` +mistral/pixtral-12b-2409:bf16 +``` + +### Molmo-72b-0924 +Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI. +Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. 
This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. +Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source. +Its base model is Qwen2-72B ([Twonyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)). + +| Attribute | Value | +|-----------|-------| +| Structured output supported | Yes | +| Function calling | No | +| Supported languages | English, French, German, Spanish (to be verified) | + + + Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. + + +#### Model name +``` +allenai/molmo-72b-0924:fp8 +``` + +## Code models + +### Qwen2.5-coder-32b-instruct +Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages. +With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning. + +| Attribute | Value | +|-----------|-------| +| Structured output supported | Yes | +| Function calling | Yes | +| Supported languages | over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic | + +#### Model name +``` +qwen/qwen2.5-coder-32b-instruct:int8 +``` + +## Embeddings models + +### Sentence-t5-xxl +The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture. +Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information. +This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework. It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal. 
+ +| Attribute | Value | +|-----------|-------| +| Structured output supported | No | +| Function calling | No | +| Supported languages | English (to be verified) | + +#### Model name +``` +sentence-transformers/sentence-t5-xxl:fp32 +``` From cbce849bbee4aa4e7bbde92fc04fc4db4f6fbb00 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Fri, 18 Apr 2025 11:30:31 +0200 Subject: [PATCH 06/16] feat(infr): add model catalog --- menu/navigation.json | 104 ++--- .../{models2.mdx => model-catalog.mdx} | 14 +- .../reference-content/models.mdx | 375 ------------------ 3 files changed, 62 insertions(+), 431 deletions(-) rename pages/managed-inference/reference-content/{models2.mdx => model-catalog.mdx} (98%) delete mode 100644 pages/managed-inference/reference-content/models.mdx diff --git a/menu/navigation.json b/menu/navigation.json index 8c65f19497..ef374612b9 100644 --- a/menu/navigation.json +++ b/menu/navigation.json @@ -872,6 +872,10 @@ "label": "Support for function calling in Scaleway Managed Inference", "slug": "function-calling-support" }, + { + "label": "Managed Inference model catalog", + "slug": "model-catalog" + }, { "label": "BGE-Multilingual-Gemma2 model", "slug": "bge-multilingual-gemma2" @@ -3300,7 +3304,7 @@ "slug": "faq" }, { - "items": [ + "items": [ { "label": "Order an InterLink", "slug": "order-interlink" @@ -3310,61 +3314,61 @@ "slug": "complete-provisioning" }, { - "label": "Configure an InterLink", - "slug": "configure-interlink" - }, - { - "label": "Create a routing policy", - "slug": "create-routing-policy" - }, - { - "label": "Delete an InterLink", - "slug": "delete-interlink" - } - ], - "label": "How to", - "slug": "how-to" - }, - { - "items": [ - { - "label": "InterLink API Reference", - "slug": "https://www.scaleway.com/en/developers/api/interlink/" - } - ], - "label": "API/CLI", - "slug": "api-cli" - }, - { - "items": [ - { - "label": "InterLink overview", - "slug": "overview" - }, - { - "label": "InterLink provisioning", - "slug": "provisioning" - }, - { - "label": "Configuring an InterLink", - "slug": "configuring" + "label": "Configure an InterLink", + "slug": "configure-interlink" + }, + { + "label": "Create a routing policy", + "slug": "create-routing-policy" + }, + { + "label": "Delete an InterLink", + "slug": "delete-interlink" + } + ], + "label": "How to", + "slug": "how-to" }, { - "label": "InterLink statuses", - "slug": "statuses" + "items": [ + { + "label": "InterLink API Reference", + "slug": "https://www.scaleway.com/en/developers/api/interlink/" + } + ], + "label": "API/CLI", + "slug": "api-cli" }, { - "label": "Using BGP communities", - "slug": "bgp-communities" + "items": [ + { + "label": "InterLink overview", + "slug": "overview" + }, + { + "label": "InterLink provisioning", + "slug": "provisioning" + }, + { + "label": "Configuring an InterLink", + "slug": "configuring" + }, + { + "label": "InterLink statuses", + "slug": "statuses" + }, + { + "label": "Using BGP communities", + "slug": "bgp-communities" + } + ], + "label": "Additional Content", + "slug": "reference-content" } ], - "label": "Additional Content", - "slug": "reference-content" - } - ], - "label": "InterLink", - "slug": "interlink" - }, + "label": "InterLink", + "slug": "interlink" + }, { "items": [ { diff --git a/pages/managed-inference/reference-content/models2.mdx b/pages/managed-inference/reference-content/model-catalog.mdx similarity index 98% rename from pages/managed-inference/reference-content/models2.mdx rename to pages/managed-inference/reference-content/model-catalog.mdx index 
1eb2e0bc7e..d441bd68b1 100644 --- a/pages/managed-inference/reference-content/models2.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -7,15 +7,16 @@ content: paragraph: This page provides information on the Scaleway Managed Inference product catalog tags: dates: - validation: 2025-03-19 - posted: 2024-05-28 + validation: 2025-04-18 + posted: 2024-04-18 categories: - ai-data --- A quick overview of available models in Scaleway's catalog and their core attributes. Expand any model below to see usage examples, curl commands, and detailed capabilities. -## Summary table -| Model Name | Provider | Context Length | Modalities | Instances | License | +## Models technical summary + +| Model name | Provider | Context length | Modalities | Instances | License | |------------|----------|--------------|------------|-----------|---------| | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 | | [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community | @@ -36,7 +37,8 @@ A quick overview of available models in Scaleway's catalog and their core attrib | [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | Qianwen License | | [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 | -| Model Name | Structured output supported | Function calling | Supported languages | +## Models feature summary +| Model name | Structured output supported | Function calling | Supported languages | | --- | --- | --- | --- | | `Mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Spanish | | `Llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | @@ -151,7 +153,7 @@ NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune t |-----------|-------| | Structured output supported | Yes | | Function calling | Yes | -| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. (to verify) | +| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) | #### Model name ``` diff --git a/pages/managed-inference/reference-content/models.mdx b/pages/managed-inference/reference-content/models.mdx deleted file mode 100644 index 93dd58b49d..0000000000 --- a/pages/managed-inference/reference-content/models.mdx +++ /dev/null @@ -1,375 +0,0 @@ ---- -meta: - title: Managed Inference model catalog - description: Deploy your own model with Scaleway Managed Inference. Privacy-focused, fully managed. -content: - h1: Managed Inference model catalog - paragraph: This page provides information on the Scaleway Managed Inference product catalog -tags: -dates: - validation: 2025-03-19 - posted: 2024-05-28 -categories: - - ai-data ---- - -A quick overview of available models in Scaleway's catalog and their core attributes. Expand any model below to see usage examples, curl commands, and detailed capabilities. 
- -## Summary table - -| Model Name | Provider | Context Length | Modalities | Instances | License | -|------------|----------|--------------|------------|-----------|---------| -| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 | -| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community | -| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3 community | -| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | Llama 3 community | -| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3 community | -| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | up to 128k tokens | Text | H100, H100-2 |Lllama 3.3 community | -| [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | up to 131k tokens | Text | H100, H100-2 | MIT | -| [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | Apache 2.0 | -| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k tokens | Text | L4, L40S, H100, H100-1 | Apache 2.0 | -| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k tokens | Text | L40S, H100, H100-2 | Apache 2.0 | -| [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | Apache 2.0 | -| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 | -| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 | -| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | Lllama 2 community | -| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 | -| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Multimodal | H100-2 | Apache 2.0 | -| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | Qianwen License | -| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 | - - -| Model Name | Structured output supported | Function calling | Supported languages | -| --- | --- | --- | --- | -| `Mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Spanish | -| `Llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3.1-Nemotron-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) | -| `DeepSeek-R1-Distill-Llama-70B` | Yes | Yes | English, Simplified Chinese | -| `DeepSeek-R1-Distill-Llama-8B` | Yes | Yes | English, Simplified Chinese | -| `Mistral-7b-instruct-v0.3` | Yes | Yes | English | -| `Mistral-small-24b-base-2501` | Yes | Yes | English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish | -| `Mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, 
Chinese, Japanese, Korean, Arabic, and Hindi | -| `Moshiko-0.1-8b` | No | No | English | -| `Moshika-0.1-8b` | No | No | English | -| `WizardLM-70B-v1.0` | Yes | No | English (to be verified) | -| `Pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) | -| `Molmo-72b-0924` | Yes | No | English, French, German, Spanish (to be verified) | -| `Qwen2.5-coder-32b-instruct` | Yes | Yes | Over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic | -| `Sentence-t5-xxl` | No | No | English (to be verified) | - -## Model details - - - Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently. - - -## Text models - -### Mixtral-8x7b-instruct-v0.1 - -Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants. -Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences. - -- Structured output supported: Yes -- Function calling: No -- Supported languages: English, French, German, Spanish - -#### Model names - -``` -mistral/mixtral-8x7b-instruct-v0.1:fp8 -mistral/mixtral-8x7b-instruct-v0.1:bf16 -``` - -### Llama-3.1-70b-instruct - -Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. -Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - -#### Model names - -``` -meta/llama-3.1-70b-instruct:fp8 -meta/llama-3.1-70b-instruct:bf16 -``` - -### Llama-3.1-8b-instruct - -Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. -Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. - - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - -#### Model names - -``` -meta/llama-3.1-8b-instruct:fp8 -meta/llama-3.1-8b-instruct:bf16 -``` - - -### Llama-3-70b-instruct - -Meta’s Llama 3 is an iteration of the open-access Llama family. -Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility and responsibly spearheading the deployment of LLMs. -With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries in reasoning and coding capabilities. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - - #### Model name - - ``` - meta/llama-3-70b-instruct:fp8 - ``` - - -### Llama-3.3-70b-instruct - -Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model. -This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications. 
- -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - -#### Model name - -``` -meta/llama-3.3-70b-instruct:bf16 -``` - -### Llama-3.1-Nemotron-70b-instruct - -Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions. -NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. (to verify) - -#### Model name - -``` -meta/llama-3.1-nemotron-70b-instruct:fp8 -``` - -### DeepSeek-R1-Distill-Llama-70B - -Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1. -DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use case such as mathematics and coding tasks. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, Simplified Chinese -#### Model name - -``` -deepseek/deepseek-r1-distill-llama-70b:bf16 -``` - -### DeepSeek-R1-Distill-Llama-8B - -Released January 21, 2025, Deepseek’s R1 Distilled Llama 8B is a distilled version of the Llama model family based on Deepseek R1. -DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks. - - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, Simplified Chinese - -#### Model names - -``` -deepseek/deepseek-r1-distill-llama-8b:bf16 -``` - -### Mistral-7b-instruct-v0.3 - -The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. -This model is open-weight and distributed under the Apache 2.0 license. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English - -#### Model name - -``` -mistral/mistral-7b-instruct-v0.3:bf16 -``` - -### Mistral-small-24b-base-2501 - -Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral. -This model is open-weight and distributed under the Apache 2.0 license. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish. - -#### Model name - -``` -mistral/mistral-small-24b-instruct-2501:fp8 -``` - -### Mistral-nemo-instruct-2407 - -Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA. -This model is open-weight and distributed under the Apache 2.0 license. -It was trained on a large proportion of multilingual and code data. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. - -#### Model name - -``` -mistral/mistral-nemo-instruct-2407:fp8 -``` - -### Moshiko-0.1-8b - -Kyutai's Moshi is a speech-text foundation model for real-time dialogue. 
-Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. -While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. -Moshiko is the variant of Moshi with a male voice in English. - -- Structured output supported: No -- Function calling: No -- Supported languages: English - -#### Model names - -``` -kyutai/moshiko-0.1-8b:bf16 -kyutai/moshiko-0.1-8b:fp8 -``` - -### Moshika-0.1-8b - -Kyutai's Moshi is a speech-text foundation model for real-time dialogue. -Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. -While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. -Moshika is the variant of Moshi with a female voice in English. - -- Structured output supported: No -- Function calling: No -- Supported languages: English - -#### Model names - -``` -kyutai/moshika-0.1-8b:bf16 -kyutai/moshika-0.1-8b:fp8 -``` - -### WizardLM-70B-V1.0 - -WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants. -With its extensive training in diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors. - -- Structured output supported: Yes -- Function calling: No -- Supported languages: English (to be verified) - -#### Model names - -``` -wizardlm/wizardlm-70b-v1.0:fp8 -wizardlm/wizardlm-70b-v1.0:fp16 -``` - -## Multimodal models - -### Pixtral-12b-2409 - -Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder. -It can analyze images and offer insights from visual content alongside text. -This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. - -Pixtral is open-weight and distributed under the Apache 2.0 license. - - - Pixtral 12B can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. - - -- Structured output supported: Yes -- Function calling: No -- Supported languages: English, French, German, Spanish (to be verified) - - #### Model name - - ``` - mistral/pixtral-12b-2409:bf16 - ``` - -### Molmo-72b-0924 - -Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI. -Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. - -Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source. -Its base model is Qwen2-72B ([Twonyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)). - - - Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. 
- - -- Structured output supported: Yes -- Function calling: No -- Supported languages: English, French, German, Spanish (to be verified) - -#### Model name - -``` -allenai/molmo-72b-0924:fp8 -``` - -## Code models - -### Qwen2.5-coder-32b-instruct - -Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages. -With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic. - -#### Model name - - ``` - qwen/qwen2.5-coder-32b-instruct:int8 - ``` - -## Embeddings models - -### Sentence-t5-xxl - -The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture. -Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information. -This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework. It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal. - -- Structured output supported: No -- Function calling: No -- Supported languages: English (to be verified) - -#### Model name - -``` -sentence-transformers/sentence-t5-xxl:fp32 -``` \ No newline at end of file From 83cc5923ec84adb6c31546ea64103485b12c2e79 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Tue, 22 Apr 2025 09:44:08 +0200 Subject: [PATCH 07/16] Apply suggestions from code review Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com> --- pages/managed-inference/reference-content/model-catalog.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index d441bd68b1..2fe69e5a6c 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -84,7 +84,7 @@ mistral/mixtral-8x7b-instruct-v0.1:bf16 ### Llama-3.1-70b-instruct Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. -Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. +Llama 3.1 was designed to match the best proprietary models and outperform many of the available open source on common industry benchmarks. | Attribute | Value | |-----------|-------| @@ -100,7 +100,7 @@ meta/llama-3.1-70b-instruct:bf16 ### Llama-3.1-8b-instruct Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. -Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. +Llama 3.1 was designed to match the best proprietary models and outperform many of the available open source on common industry benchmarks. 
| Attribute | Value | |-----------|-------| @@ -162,7 +162,7 @@ meta/llama-3.1-nemotron-70b-instruct:fp8 ### DeepSeek-R1-Distill-Llama-70B Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1. -DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use case such as mathematics and coding tasks. +DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks. | Attribute | Value | |-----------|-------| From 904cdc5308b1983889b4876d632c69beadc942c7 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Tue, 22 Apr 2025 09:59:31 +0200 Subject: [PATCH 08/16] fix(infr): fix typos --- pages/managed-inference/reference-content/model-catalog.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index 2fe69e5a6c..0592b1595f 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -19,7 +19,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib | Model name | Provider | Context length | Modalities | Instances | License | |------------|----------|--------------|------------|-----------|---------| | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 | -| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community | +| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | up to 128k tokens | Text | H100, H100-2 | Llama 3 community | | [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3 community | | [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | Llama 3 community | | [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3 community | @@ -34,7 +34,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib | [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | Lllama 2 community | | [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 | | [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Multimodal | H100-2 | Apache 2.0 | -| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | Qianwen License | +| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | Apache 2.0 | | [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 | ## Models feature summary From 2dfb5fe96b8eef2b1df79cb7ac3bbe81d0c4340d Mon Sep 17 00:00:00 2001 From: fpagny Date: Tue, 22 Apr 2025 11:00:20 +0200 Subject: [PATCH 09/16] fix(inference): supported languages --- .../reference-content/model-catalog.mdx | 36 +++++++++---------- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index 0592b1595f..ff2a49d76c 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ 
b/pages/managed-inference/reference-content/model-catalog.mdx @@ -40,24 +40,24 @@ A quick overview of available models in Scaleway's catalog and their core attrib ## Models feature summary | Model name | Structured output supported | Function calling | Supported languages | | --- | --- | --- | --- | -| `Mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Spanish | -| `Llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3.1-Nemotron-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) | -| `DeepSeek-R1-Distill-Llama-70B` | Yes | Yes | English, Simplified Chinese | -| `DeepSeek-R1-Distill-Llama-8B` | Yes | Yes | English, Simplified Chinese | -| `Mistral-7b-instruct-v0.3` | Yes | Yes | English | -| `Mistral-small-24b-base-2501` | Yes | Yes | English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish | -| `Mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi | -| `Moshiko-0.1-8b` | No | No | English | -| `Moshika-0.1-8b` | No | No | English | -| `WizardLM-70B-v1.0` | Yes | No | English (to be verified) | -| `Pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) | -| `Molmo-72b-0924` | Yes | No | English, French, German, Spanish (to be verified) | -| `Qwen2.5-coder-32b-instruct` | Yes | Yes | Over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic | -| `Sentence-t5-xxl` | No | No | English (to be verified) | +| `mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Italian, Spanish | +| `llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | +| `llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | +| `llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | +| `llama-3.1-Nemotron-70b-instruct` | Yes | Yes | English | +| `llama-3-70b-instruct` | Yes | Yes | English | +| `deepseek-r1-distill-llama-70B` | Yes | Yes | English, Chinese | +| `deepseek-r1-distill-llama-8B` | Yes | Yes | English, Chinese | +| `mistral-7b-instruct-v0.3` | Yes | Yes | English | +| `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi | +| `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese | +| `moshiko-0.1-8b` | No | No | English | +| `moshika-0.1-8b` | No | No | English | +| `wizardLM-70b-v1.0` | Yes | No | English | +| `pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) | +| `molmo-72b-0924` | Yes | No | English | +| `qwen2.5-coder-32b-instruct` | Yes | Yes | Over 29 languages, including Chinese, English, 
French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic | +| `sentence-t5-xxl` | No | No | English | ## Model details From ccda4e577f957f107043f0f8c45aaf4de748f15d Mon Sep 17 00:00:00 2001 From: fpagny Date: Tue, 22 Apr 2025 17:28:05 +0200 Subject: [PATCH 10/16] fix(inference): update licenses --- .../reference-content/model-catalog.mdx | 47 ++++++++++--------- 1 file changed, 26 insertions(+), 21 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index ff2a49d76c..eb0ff42f6e 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -16,31 +16,33 @@ A quick overview of available models in Scaleway's catalog and their core attrib ## Models technical summary -| Model name | Provider | Context length | Modalities | Instances | License | +| Model name | Provider | Context length (tokens) | Modalities | Instances | License | |------------|----------|--------------|------------|-----------|---------| -| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 | -| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | up to 128k tokens | Text | H100, H100-2 | Llama 3 community | -| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3 community | -| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | Llama 3 community | -| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3 community | -| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | up to 128k tokens | Text | H100, H100-2 | Lllama 3.3 community | -| [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | up to 131k tokens | Text | H100, H100-2 | MIT | -| [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | Apache 2.0 | -| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k tokens | Text | L4, L40S, H100, H100-1 | Apache 2.0 | -| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k tokens | Text | L40S, H100, H100-2 | Apache 2.0 | -| [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | Apache 2.0 | -| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 | -| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 | -| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | Lllama 2 community | -| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 | -| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Multimodal | H100-2 | Apache 2.0 | -| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | Apache 2.0 | -| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 | +| [`gemma-3-27b-it`](#gemma-3-27b-it) | Google | 32k | Text | H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) | +| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | up to 128k tokens | Text | H100, H100-2 | [Llama 3.1 
community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) | +| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | +| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | [Llama 3 community](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE) | +| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | [Llama 3.3 community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | +| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | up to 128k tokens | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) | +| [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | up to 131k tokens | Text | H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) | +| [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | +| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k tokens | Text | L4, L40S, H100, H100-1 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k tokens | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | +| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | +| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | [Llama 2 community](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/LICENSE.txt) | +| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Multimodal | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`bge-multilingual-gemma2`](#bge-multilingual-gemma2) | No | No | [Gemma](https://ai.google.dev/gemma/terms) | +| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | ## Models feature summary | Model name | Structured output 
supported | Function calling | Supported languages | | --- | --- | --- | --- | -| `mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Italian, Spanish | +| `gemma-3-27b-it` | Yes | Partial | English, Chinese, Japanese, Korean and 31 additional languages | | `llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | | `llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | | `llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | @@ -51,14 +53,17 @@ A quick overview of available models in Scaleway's catalog and their core attrib | `mistral-7b-instruct-v0.3` | Yes | Yes | English | | `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi | | `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese | +| `mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Italian, Spanish | | `moshiko-0.1-8b` | No | No | English | | `moshika-0.1-8b` | No | No | English | | `wizardLM-70b-v1.0` | Yes | No | English | | `pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) | | `molmo-72b-0924` | Yes | No | English | -| `qwen2.5-coder-32b-instruct` | Yes | Yes | Over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic | +| `qwen2.5-coder-32b-instruct` | Yes | Yes | English, French, Spanish, Portuguese, German, Italian, Russian, Chinese, Japanese, Korean, Vietnamese, Thai, Arabic and 16 additional languages. | +| `bge-multilingual-gemma2` | No | No | English, French, Chinese, Japanese, Korean | | `sentence-t5-xxl` | No | No | English | + ## Model details Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently. 
From a165bd1d832559da683e0e4d2a02e73772c62831 Mon Sep 17 00:00:00 2001 From: fpagny Date: Tue, 22 Apr 2025 17:37:12 +0200 Subject: [PATCH 11/16] fix(inference): models supported features --- pages/managed-inference/reference-content/model-catalog.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index eb0ff42f6e..bfa5716d17 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -53,11 +53,11 @@ A quick overview of available models in Scaleway's catalog and their core attrib | `mistral-7b-instruct-v0.3` | Yes | Yes | English | | `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi | | `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese | -| `mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Italian, Spanish | +| `mixtral-8x7b-instruct-v0.1` | Yes | Yes | English, French, German, Italian, Spanish | | `moshiko-0.1-8b` | No | No | English | | `moshika-0.1-8b` | No | No | English | | `wizardLM-70b-v1.0` | Yes | No | English | -| `pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) | +| `pixtral-12b-2409` | Yes | Yes | English | | `molmo-72b-0924` | Yes | No | English | | `qwen2.5-coder-32b-instruct` | Yes | Yes | English, French, Spanish, Portuguese, German, Italian, Russian, Chinese, Japanese, Korean, Vietnamese, Thai, Arabic and 16 additional languages. 
| | `bge-multilingual-gemma2` | No | No | English, French, Chinese, Japanese, Korean | From 041af5f89230dfce66dc231b2072e0ff225398ce Mon Sep 17 00:00:00 2001 From: fpagny Date: Wed, 23 Apr 2025 12:06:28 +0200 Subject: [PATCH 12/16] fix(inference): update context length and tasks --- .../reference-content/model-catalog.mdx | 40 +++++++++---------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index bfa5716d17..fc8132bbc0 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -16,28 +16,28 @@ A quick overview of available models in Scaleway's catalog and their core attrib ## Models technical summary -| Model name | Provider | Context length (tokens) | Modalities | Instances | License | +| Model name | Provider | Maximum Context length (tokens) | Modalities | Instances | License | |------------|----------|--------------|------------|-----------|---------| -| [`gemma-3-27b-it`](#gemma-3-27b-it) | Google | 32k | Text | H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) | -| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | up to 128k tokens | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) | -| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | -| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | [Llama 3 community](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE) | -| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | [Llama 3.3 community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | -| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | up to 128k tokens | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) | -| [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | up to 131k tokens | Text | H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) | -| [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | -| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k tokens | Text | L4, L40S, H100, H100-1 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k tokens | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`gemma-3-27b-it`](#gemma-3-27b-it) | Google | 40k | Text, Vision | H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) | +| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | 128k | Text | H100, H100-2 | [Llama 3.3 community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | +| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta 
| 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) | +| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | 128k | Text | L4, L40S, H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | +| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k | Text | H100, H100-2 | [Llama 3 community](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE) | +| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) | +| [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | 128k | Text | H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) | +| [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | 128k | Text | L4, L40S, H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | +| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k | Text | L4, L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | -| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | -| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | [Llama 2 community](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/LICENSE.txt) | -| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Multimodal | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`bge-multilingual-gemma2`](#bge-multilingual-gemma2) | No | No | [Gemma](https://ai.google.dev/gemma/terms) | -| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4k | Text | L4, H100 | 
[CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | +| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4k | Text | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | +| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4k | Text | H100, H100-2 | [Llama 2 community](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/LICENSE.txt) | +| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k | Text, Vision | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Text, Vision | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | 32k | Code | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`bge-multilingual-gemma2`](#bge-multilingual-gemma2) | BAAI | 4k | Embeddings | L4, L40S, H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) | +| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 | Embeddings | L4 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | ## Models feature summary | Model name | Structured output supported | Function calling | Supported languages | From f90636d5e8a2d515730b569fbedde7784f065d0e Mon Sep 17 00:00:00 2001 From: fpagny Date: Wed, 23 Apr 2025 12:07:48 +0200 Subject: [PATCH 13/16] fix(inference): update task descriptions --- pages/managed-inference/reference-content/model-catalog.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index fc8132bbc0..c2ffb335ed 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -30,8 +30,8 @@ A quick overview of available models in Scaleway's catalog and their core attrib | [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4k | Text | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | -| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4k | Text | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | +| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4k | Audio to Audio | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | +| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4k | Audio to Audio| L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | | [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4k | Text | H100, H100-2 | [Llama 2 community](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/LICENSE.txt) | | [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k | Text, Vision | L40S, H100, H100-2 | [Apache 
2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Text, Vision | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |

From bd78dfa95a61cdc9228ceebd7cab356177c57877 Mon Sep 17 00:00:00 2001
From: fpagny
Date: Wed, 23 Apr 2025 13:38:50 +0200
Subject: [PATCH 14/16] feat(inference): add gemma and mistral small
 characteristics

---
 .../reference-content/model-catalog.mdx | 28 ++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx
index c2ffb335ed..f987912735 100644
--- a/pages/managed-inference/reference-content/model-catalog.mdx
+++ b/pages/managed-inference/reference-content/model-catalog.mdx
@@ -27,7 +27,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | 128k | Text | H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) |
 | [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | 128k | Text | L4, L40S, H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) |
 | [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k | Text | L4, L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
-| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
+| [`mistral-small-3.1-24b-instruct-2503 3`](#mistral-small-3.1-24b-instruct-2503) | Mistral | 128k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4k | Audio to Audio | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) |
@@ -69,18 +69,32 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
 
+## Multimodal models (Text and Vision)
+
+### Gemma-3-27b-it
+Gemma-3-27b-it is a model developed by Google to perform text processing and image analysis in many languages.
+The model was not specifically trained to output function/tool call tokens. Function calling is therefore supported, but its reliability remains limited.
+
+#### Model names
+```
+google/gemma-3-27b-it:bf16
+```
+
+### Mistral-small-3.1-24b-instruct-2503
+Mistral-small-3.1-24b-instruct-2503 is a model developed by Mistral to perform text processing and image analysis in many languages.
+This model is optimized for dense knowledge and fast token throughput relative to its size. 
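+
+As a quick sketch of how a vision request to such a deployment might look (assuming an existing Managed Inference deployment of this model; `$DEPLOYMENT_UUID` and `$SCW_SECRET_KEY` are placeholders for your deployment ID and IAM secret key, and the image URL is hypothetical):
+
+```bash
+# Hypothetical request: one image plus a text prompt, sent to the
+# OpenAI-compatible chat completions endpoint of the deployment.
+curl -s \
+-H "Authorization: Bearer $SCW_SECRET_KEY" \
+-H "Content-Type: application/json" \
+--request POST \
+--url "https://$DEPLOYMENT_UUID.ifr.fr-par.scaleway.com/v1/chat/completions" \
+--data '{
+  "model": "mistral/mistral-small-3.1-24b-instruct-2503:bf16",
+  "messages": [{
+    "role": "user",
+    "content": [
+      {"type": "text", "text": "Describe this image"},
+      {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
+    ]
+  }],
+  "max_tokens": 200
+}'
+```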
+
+#### Model names
+```
+mistral/mistral-small-3.1-24b-instruct-2503:bf16
+```
+
 ## Text models
 
 ### Mixtral-8x7b-instruct-v0.1
 Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants.
 Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | No |
-| Supported languages | English, French, German, Spanish |
-
 #### Model names
 ```
 mistral/mixtral-8x7b-instruct-v0.1:fp8

From 9d9b01942a18cbac1add1ad6716ee74465f6ee90 Mon Sep 17 00:00:00 2001
From: fpagny
Date: Wed, 23 Apr 2025 14:28:55 +0200
Subject: [PATCH 15/16] feat(inference): restructure model catalog

---
 .../reference-content/model-catalog.mdx | 208 ++++++------------
 1 file changed, 63 insertions(+), 145 deletions(-)

diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx
index f987912735..33d8aba0bf 100644
--- a/pages/managed-inference/reference-content/model-catalog.mdx
+++ b/pages/managed-inference/reference-content/model-catalog.mdx
@@ -23,7 +23,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) |
 | [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | 128k | Text | L4, L40S, H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) |
 | [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k | Text | H100, H100-2 | [Llama 3 community](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE) |
-| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) |
+| [`llama-3.1-nemotron-70b-instruct`](#llama-31-nemotron-70b-instruct) | Nvidia | 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) |
 | [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | 128k | Text | H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) |
 | [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | 128k | Text | L4, L40S, H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) |
 | [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k | Text | L4, L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
@@ -34,7 +34,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4k | Audio to Audio | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) |
 | [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4k | Text | H100, H100-2 | [Llama 2 community](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/LICENSE.txt) |
 | 
[`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k | Text, Vision | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
-| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Text, Vision | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
+| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Text, Vision | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) and [Tongyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE) |
 | [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | 32k | Code | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`bge-multilingual-gemma2`](#bge-multilingual-gemma2) | BAAI | 4k | Embeddings | L4, L40S, H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) |
 | [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 | Embeddings | L4 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
@@ -69,6 +71,10 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 ## Multimodal models (Text and Vision)
 
+
+ Vision models can understand and analyze images, not generate them. You use them through the /v1/chat/completions endpoint.
+
+
 ### Gemma-3-27b-it
 Gemma-3-27b-it is a model developed by Google to perform text processing and image analysis in many languages.
 The model was not specifically trained to output function/tool call tokens. Function calling is therefore supported, but its reliability remains limited.
@@ -89,28 +93,42 @@ This model is optimized for dense knowledge and fast token throughput
 mistral/mistral-small-3.1-24b-instruct-2503:bf16
 ```
 
+### Pixtral-12b-2409
+Pixtral is a vision language model introducing a novel architecture: a 12B-parameter multimodal decoder plus a 400M-parameter vision encoder.
+It can analyze images and offer insights from visual content alongside text.
+This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
+Pixtral is open-weight and distributed under the Apache 2.0 license.
+
+#### Model name
+```
+mistral/pixtral-12b-2409:bf16
+```
+
+### Molmo-72b-0924
+Molmo 72B is the powerhouse of the Molmo family of multimodal models, developed by the renowned research lab Allen Institute for AI.
+Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
+
+#### Model name
+```
+allenai/molmo-72b-0924:fp8
+```
+
 ## Text models
 
-### Mixtral-8x7b-instruct-v0.1
-Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants.
-Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.
+### Llama-3.3-70b-instruct
+Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model.
+This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications. 
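+
+As a minimal sketch of a text request against such a deployment (assuming an existing Managed Inference deployment of this model; `$DEPLOYMENT_UUID` and `$SCW_SECRET_KEY` are placeholders for your deployment ID and IAM secret key):
+
+```bash
+# Hypothetical request: plain text generation through the
+# OpenAI-compatible chat completions endpoint of the deployment.
+curl -s \
+-H "Authorization: Bearer $SCW_SECRET_KEY" \
+-H "Content-Type: application/json" \
+--request POST \
+--url "https://$DEPLOYMENT_UUID.ifr.fr-par.scaleway.com/v1/chat/completions" \
+--data '{"model":"meta/llama-3.3-70b-instruct:fp8","messages":[{"role":"user","content":"Summarize the difference between fine-tuning and distillation in two sentences"}],"max_tokens":200,"temperature":0.7}'
+```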
-
+#### Model names
+```
+meta/llama-3.3-70b-instruct:fp8
+meta/llama-3.3-70b-instruct:bf16
+```
 
 ### Llama-3.1-70b-instruct
 Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
 Llama 3.1 was designed to match the best proprietary models and outperform many of the available open-source models on common industry benchmarks.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
-
 #### Model names
 ```
 meta/llama-3.1-70b-instruct:fp8
@@ -121,12 +139,6 @@ meta/llama-3.1-70b-instruct:bf16
 Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
 Llama 3.1 was designed to match the best proprietary models and outperform many of the available open-source models on common industry benchmarks.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
-
 #### Model names
 ```
 meta/llama-3.1-8b-instruct:fp8
@@ -138,59 +150,27 @@ meta/llama-3.1-8b-instruct:bf16
 Meta’s Llama 3 is an iteration of the open-access Llama family.
 Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility and responsibly spearheading the deployment of LLMs.
 With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries in reasoning and coding capabilities.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
-
 #### Model name
 ```
 meta/llama-3-70b-instruct:fp8
 ```
 
-### Llama-3.3-70b-instruct
-Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model.
-This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications.
-
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
-
-#### Model name
-```
-meta/llama-3.3-70b-instruct:bf16
-```
-
 ### Llama-3.1-Nemotron-70b-instruct
 Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions.
 NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) |
-
 #### Model name
 ```
-meta/llama-3.1-nemotron-70b-instruct:fp8
+nvidia/llama-3.1-nemotron-70b-instruct:fp8
 ```
 
 ### DeepSeek-R1-Distill-Llama-70B
 Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1. 
DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks. -| Attribute | Value | -|-----------|-------| -| Structured output supported | Yes | -| Function calling | Yes | -| Supported languages | English, Simplified Chinese | - #### Model name ``` +deepseek/deepseek-r1-distill-llama-70b:fp8 deepseek/deepseek-r1-distill-llama-70b:bf16 ``` @@ -198,27 +178,26 @@ deepseek/deepseek-r1-distill-llama-70b:bf16 Released January 21, 2025, Deepseek’s R1 Distilled Llama 8B is a distilled version of the Llama model family based on Deepseek R1. DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks. -| Attribute | Value | -|-----------|-------| -| Structured output supported | Yes | -| Function calling | Yes | -| Supported languages | English, Simplified Chinese | - #### Model names ``` +deepseek/deepseek-r1-distill-llama-8b:fp8 deepseek/deepseek-r1-distill-llama-8b:bf16 ``` +### Mixtral-8x7b-instruct-v0.1 +Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants. +Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences. + +#### Model names +``` +mistral/mixtral-8x7b-instruct-v0.1:fp8 +mistral/mixtral-8x7b-instruct-v0.1:bf16 +``` + ### Mistral-7b-instruct-v0.3 The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. This model is open-weight and distributed under the Apache 2.0 license. -| Attribute | Value | -|-----------|-------| -| Structured output supported | Yes | -| Function calling | Yes | -| Supported languages | English | - #### Model name ``` mistral/mistral-7b-instruct-v0.3:bf16 @@ -228,15 +207,10 @@ mistral/mistral-7b-instruct-v0.3:bf16 Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral. This model is open-weight and distributed under the Apache 2.0 license. -| Attribute | Value | -|-----------|-------| -| Structured output supported | Yes | -| Function calling | Yes | -| Supported languages | Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish | - #### Model name ``` mistral/mistral-small-24b-instruct-2501:fp8 +mistral/mistral-small-24b-instruct-2501:bf16 ``` ### Mistral-nemo-instruct-2407 @@ -244,12 +218,6 @@ Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by This model is open-weight and distributed under the Apache 2.0 license. It was trained on a large proportion of multilingual and code data. -| Attribute | Value | -|-----------|-------| -| Structured output supported | Yes | -| Function calling | Yes | -| Supported languages | English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi | - #### Model name ``` mistral/mistral-nemo-instruct-2407:fp8 @@ -261,12 +229,6 @@ Moshi is an experimental next-generation conversational model, designed to under While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. Moshiko is the variant of Moshi with a male voice in English. 
-| Attribute | Value | -|-----------|-------| -| Structured output supported | No | -| Function calling | No | -| Supported languages | English | - #### Model names ``` kyutai/moshiko-0.1-8b:bf16 @@ -279,12 +241,6 @@ Moshi is an experimental next-generation conversational model, designed to under While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. Moshika is the variant of Moshi with a female voice in English. -| Attribute | Value | -|-----------|-------| -| Structured output supported | No | -| Function calling | No | -| Supported languages | English | - #### Model names ``` kyutai/moshika-0.1-8b:bf16 @@ -295,91 +251,53 @@ kyutai/moshika-0.1-8b:fp8 WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants. With its extensive training in diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors. -| Attribute | Value | -|-----------|-------| -| Structured output supported | Yes | -| Function calling | No | -| Supported languages | English (to be verified) | - #### Model names ``` wizardlm/wizardlm-70b-v1.0:fp8 wizardlm/wizardlm-70b-v1.0:fp16 ``` -## Multimodal models - -### Pixtral-12b-2409 -Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder. -It can analyze images and offer insights from visual content alongside text. -This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. -Pixtral is open-weight and distributed under the Apache 2.0 license. - -| Attribute | Value | -|-----------|-------| -| Structured output supported | Yes | -| Function calling | No | -| Supported languages | English, French, German, Spanish (to be verified) | +## Code models - - Pixtral 12B can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. - +### Qwen2.5-coder-32b-instruct +Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages. +With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning. #### Model name ``` -mistral/pixtral-12b-2409:bf16 +qwen/qwen2.5-coder-32b-instruct:int8 ``` -### Molmo-72b-0924 -Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI. -Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. -Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source. -Its base model is Qwen2-72B ([Twonyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)). +## Embeddings models + +### Bge-multilingual-gemma2 +BGE-Multilingual-Gemma2 tops the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard), scoring the number one spot in French and Polish, and number seven in English (as of Q4 2024). +As its name suggests, the model’s training data spans a broad range of languages, including English, Chinese, Polish, French, and more. 
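+
+As a hedged sketch of how an embedding request might look (assuming the deployment exposes the OpenAI-compatible `/v1/embeddings` endpoint; `$DEPLOYMENT_UUID` and `$SCW_SECRET_KEY` are placeholders for your deployment ID and IAM secret key):
+
+```bash
+# Hypothetical request: embed one sentence and receive a single
+# 3584-dimension vector in the response.
+curl -s \
+-H "Authorization: Bearer $SCW_SECRET_KEY" \
+-H "Content-Type: application/json" \
+--request POST \
+--url "https://$DEPLOYMENT_UUID.ifr.fr-par.scaleway.com/v1/embeddings" \
+--data '{"model":"baai/bge-multilingual-gemma2:fp32","input":"Scaleway Managed Inference keeps inference in your own deployment"}'
+```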
| Attribute | Value |
|-----------|-------|
| Embedding dimensions | 3584 |
| Matryoshka embedding | No |

 [Matryoshka embeddings](https://huggingface.co/blog/matryoshka) are embeddings trained at multiple dimension counts. As a result, the dimensions of the resulting vectors are ordered from most to least meaningful: a 3584-dimension vector can, for example, be truncated to its first 768 dimensions and used directly.

#### Model name
```
baai/bge-multilingual-gemma2:fp32
```

### Sentence-t5-xxl
The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture.
Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information.
This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework.
It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal. 
+ | Attribute | Value | |-----------|-------| -| Structured output supported | No | -| Function calling | No | -| Supported languages | English (to be verified) | +| Embedding dimensions | 768 | +| Matryoshka embedding | No | #### Model name ``` From 228ab0e4011a73b2ea019bf60be41d8d0a574aaf Mon Sep 17 00:00:00 2001 From: fpagny Date: Wed, 23 Apr 2025 14:41:02 +0200 Subject: [PATCH 16/16] fix(inference): fix anchors --- .../managed-inference/reference-content/model-catalog.mdx | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index 33d8aba0bf..3ea11e5d0e 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -27,7 +27,8 @@ A quick overview of available models in Scaleway's catalog and their core attrib | [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | 128k | Text | H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) | | [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | 128k | Text | L4, L40S, H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | | [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k | Text | L4, L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`mistral-small-3.1-24b-instruct-2503 3`](#mistral-small-3.1-24b-instruct-2503) | Mistral | 128k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mistral-small-3.1-24b-instruct-2503`](#mistral-small-31-24b-instruct-2503) | Mistral | 128k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-instruct-2501) | Mistral | 32k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4k | Audio to Audio | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | @@ -46,12 +47,13 @@ A quick overview of available models in Scaleway's catalog and their core attrib | `llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | | `llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | | `llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `llama-3.1-Nemotron-70b-instruct` | Yes | Yes | English | | `llama-3-70b-instruct` | Yes | Yes | English | +| `llama-3.1-nemotron-70b-instruct` | Yes | Yes | English | | `deepseek-r1-distill-llama-70B` | Yes | Yes | English, Chinese | | `deepseek-r1-distill-llama-8B` | Yes | Yes | English, Chinese | | 
`mistral-7b-instruct-v0.3` | Yes | Yes | English |
 | `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi |
+| `mistral-small-24b-instruct-2501` | Yes | Yes | English, French, German, Dutch, Spanish, Italian, Polish, Portuguese, Chinese, Japanese, Korean |
 | `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese |
 | `mixtral-8x7b-instruct-v0.1` | Yes | Yes | English, French, German, Italian, Spanish |
 | `moshiko-0.1-8b` | No | No | English |
@@ -203,7 +205,7 @@ mistral/mistral-7b-instruct-v0.3:bf16
 ```
 
-### Mistral-small-24b-base-2501
+### Mistral-small-24b-instruct-2501
 Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral.
 This model is open-weight and distributed under the Apache 2.0 license.