From 8371ccdbaaaf126f22315c5c3d7afdfeb5e19c99 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Tue, 15 Apr 2025 11:26:48 +0200 Subject: [PATCH 01/16] feat(infr): add catalog page --- .../reference-content/models.mdx | 107 ++++++++++++++++++ 1 file changed, 107 insertions(+) create mode 100644 pages/managed-inference/reference-content/models.mdx diff --git a/pages/managed-inference/reference-content/models.mdx b/pages/managed-inference/reference-content/models.mdx new file mode 100644 index 0000000000..998e7c7724 --- /dev/null +++ b/pages/managed-inference/reference-content/models.mdx @@ -0,0 +1,107 @@ +--- +meta: + title: Managed Inference model catalog + description: Deploy your own secure Mixtral-8x7b-Instruct model with Scaleway Managed Inference. Privacy-focused, fully managed. +content: + h1: Managed Inference model catalog + paragraph: This page provides information on the Mixtral-8x7b-instruct-v0.1 model +tags: +dates: + validation: 2025-03-19 + posted: 2024-05-28 +categories: + - ai-data +--- +A quick overview of available models and their core attributes. Expand any model below to see usage examples, curl commands, and detailed capabilities. + + +## Summary table + +| Model Name | Provider | Context Size | Modalities | Instances | Endpoint | +|------------|----------|--------------|------------|-----------|----------| +| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | [Mistral](https://mistral.ai/technology/#models) | 32k | Text | H100 (FP8), H100-2 (BF16) | `/v1/chat/completions` | +| [`molmo-72b-0924`](#molmo-72b-0924) | [Allen Institute](https://molmo.allenai.org/blog) | 50k | Multimodal | H100-2 (FP8) | `/v1/chat/completions` | + + +## Model details + + +## Mixtral-8x7b-instruct-v0.1 +### Overview + +| Attribute | Details | +|----------------------|---------------------------------------------------| +| Provider | [Mistral](https://mistral.ai/technology/#models) | +| Context Size | 32k tokens | +| Compatible Instances | H100 (FP8), H100-2 (BF16) | + +### Model names + +```bash +mistral/mixtral-8x7b-instruct-v0.1:fp8 +mistral/mixtral-8x7b-instruct-v0.1:bf16 +``` + +### How to use (Text Inference) + +```bash +curl -s \ +-H "Authorization: Bearer " \ +-H "Content-Type: application/json" \ +--request POST \ +--url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ +--data '{"model":"mistral/mixtral-8x7b-instruct-v0.1:fp8","messages":[{"role":"user","content":"Sing me a song about Scaleway"}],"max_tokens":200,"top_p":1,"temperature":1}' +``` + + + Ideal for instructional content, multilingual understanding, and code generation. 
+ + + + + + + +## Molmo-72b-0924 + +### Overview + +| Attribute | Details | +|----------------------|------------------------------------------------------------------| +| Provider | [Allen Institute for AI](https://molmo.allenai.org/blog) | +| License | Apache 2.0 | +| Context Size | 50k tokens | +| Compatible Instances | H100-2 (FP8) | + +### Model name + +```bash +allenai/molmo-72b-0924:fp8 +``` + +### How to use (Image + Text) + +```bash +curl -s \ +-H "Authorization: Bearer " \ +-H "Content-Type: application/json" \ +--request POST \ +--url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ +--data '{ + "model": "allenai/molmo-72b-0924:fp8", + "messages": [{ + "role": "user", + "content": [ + {"type": "text", "text": "Describe this image"}, + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}} + ] + }], + "temperature": 0.7 +}' +``` + + + Known limitations: No system role, no structured output (`response_format`), and supports 1 image max per request. + + + From f6dfa9b7d7e52b0b93b9602bac1561b350f1e3f7 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Wed, 16 Apr 2025 15:09:18 +0200 Subject: [PATCH 02/16] docs(infr): add model catalog page --- .../reference-content/models.mdx | 1193 ++++++++++++++++- 1 file changed, 1127 insertions(+), 66 deletions(-) diff --git a/pages/managed-inference/reference-content/models.mdx b/pages/managed-inference/reference-content/models.mdx index 998e7c7724..f1d04426b0 100644 --- a/pages/managed-inference/reference-content/models.mdx +++ b/pages/managed-inference/reference-content/models.mdx @@ -1,10 +1,10 @@ --- meta: title: Managed Inference model catalog - description: Deploy your own secure Mixtral-8x7b-Instruct model with Scaleway Managed Inference. Privacy-focused, fully managed. + description: Deploy your own model with Scaleway Managed Inference. Privacy-focused, fully managed. content: h1: Managed Inference model catalog - paragraph: This page provides information on the Mixtral-8x7b-instruct-v0.1 model + paragraph: This page provides information on the Scaleway Managed Inference product catalog tags: dates: validation: 2025-03-19 @@ -12,96 +12,1157 @@ dates: categories: - ai-data --- -A quick overview of available models and their core attributes. Expand any model below to see usage examples, curl commands, and detailed capabilities. +A quick overview of available models in Scaleway's catalog and their core attributes. Expand any model below to see usage examples, curl commands, and detailed capabilities. 
## Summary table

-| Model Name | Provider | Context Size | Modalities | Instances | Endpoint |
-|------------|----------|--------------|------------|-----------|----------|
-| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | [Mistral](https://mistral.ai/technology/#models) | 32k | Text | H100 (FP8), H100-2 (BF16) | `/v1/chat/completions` |
-| [`molmo-72b-0924`](#molmo-72b-0924) | [Allen Institute](https://molmo.allenai.org/blog) | 50k | Multimodal | H100-2 (FP8) | `/v1/chat/completions` |
-

| Model Name | Provider | Context Size | Modalities | Instances | License |
|------------|----------|--------------|------------|-----------|---------|
| `mixtral-8x7b-instruct-v0.1` | Mistral | 32k tokens | Text | H100 | Apache 2.0 |
| `llama-3.1-70b-instruct` | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community |
| `llama-3.1-8b-instruct` | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3 community |
| `llama-3-70b-instruct` | Meta | 8k tokens | Text | H100 | Llama 3 community |
| `llama-3.3-70b-instruct` | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3 community |
| `llama-3-nemotron-70b` | Nvidia | up to 128k tokens | Text | H100, H100-2 | Llama 3.1 community |
| `deepseek-r1-distill-70b` | Deepseek | up to 131k tokens | Text | H100, H100-2 | MIT |
| `deepseek-r1-distill-8b` | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | Apache 2.0 |
| `mistral-7b-instruct-v0.3` | Mistral | 32k tokens | Text | L4, L40S, H100, H100-2 | Apache 2.0 |
| `mistral-small-24b-instruct-2501` | Mistral | 32k tokens | Text | L40S, H100, H100-2 | Apache 2.0 |
| `mistral-nemo-instruct-2407` | Mistral | 128k tokens | Text | L40S, H100, H100-2 | Apache 2.0 |
| `moshiko-0.1-8b` | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 |
| `moshika-0.1-8b` | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 |
| `wizardlm-70b-v1.0` | WizardLM | 4,096 tokens | Text | H100, H100-2 | Llama 2 community |
| `pixtral-12b-2409` | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 |
| `molmo-72b-0924` | Allen AI | 50k tokens | Multimodal | H100-2 | Apache 2.0 |
| `qwen2.5-coder-32b-instruct` | Qwen | up to 32k tokens | Code | H100, H100-2 | Apache 2.0 |
| `sentence-t5-xxl` | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 |

## Model details

  Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.

## Text models

### Mixtral-8x7b-instruct-v0.1

  Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants.
  Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.
-```bash -mistral/mixtral-8x7b-instruct-v0.1:fp8 -mistral/mixtral-8x7b-instruct-v0.1:bf16 -``` + | Attribute | Details | + |----------------------|---------| + | Provider | Mistral | + | Context Size | 32k tokens | + | License | Apache 2.0 | + | Compatible Instances | H100 | -### How to use (Text Inference) + #### Model names -```bash -curl -s \ --H "Authorization: Bearer " \ --H "Content-Type: application/json" \ ---request POST \ ---url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ ---data '{"model":"mistral/mixtral-8x7b-instruct-v0.1:fp8","messages":[{"role":"user","content":"Sing me a song about Scaleway"}],"max_tokens":200,"top_p":1,"temperature":1}' -``` + ```bash + mistral/mixtral-8x7b-instruct-v0.1:fp8 + mistral/mixtral-8x7b-instruct-v0.1:bf16 + ``` + #### Sending Inference requests - - Ideal for instructional content, multilingual understanding, and code generation. - + To perform inference tasks with your Mixtral model deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"mistral/mixtral-8x7b-instruct-v0.1:fp8", "messages":[{"role": "user","content": "Sing me a song about Scaleway"}], "max_tokens": 200, "top_p": 1, "temperature": 1, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + ### LLaMA 3.1 70B Instruct + + Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. + Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. + | Attribute | Details | + |----------------------|---------| + | Provider | Meta | + | Context Size | 32k tokens | + | License | Llama 3 community | + | Compatible Instances | H100 17k (FP8), H100-2 128k (FP8), 70k (BF16) | + + + #### Sending Managed Inference requests + + To perform inference tasks with your Llama-3.1 deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"meta/llama-3.1-70b-instruct:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + -## Molmo-72b-0924 -### Overview +### Llama-3.1-8b-instruct model -| Attribute | Details | -|----------------------|------------------------------------------------------------------| -| Provider | [Allen Institute for AI](https://molmo.allenai.org/blog) | -| License | Apache 2.0 | -| Context Size | 50k tokens | -| Compatible Instances | H100-2 (FP8) | + Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. 
+ Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. -### Model name + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Meta](https://llama.meta.com/llama3/) | + | License | [Llama 3.1 community](https://llama.meta.com/llama3_1/license/) | + | Compatible Instances | L4, L40S, H100, H100-2 (FP8, BF16) | + | Context Length | up to 128k tokens | -```bash -allenai/molmo-72b-0924:fp8 -``` + #### Model names -### How to use (Image + Text) + ```bash + meta/llama-3.1-8b-instruct:fp8 + meta/llama-3.1-8b-instruct:bf16 + ``` -```bash -curl -s \ --H "Authorization: Bearer " \ --H "Content-Type: application/json" \ ---request POST \ ---url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ ---data '{ - "model": "allenai/molmo-72b-0924:fp8", - "messages": [{ - "role": "user", - "content": [ - {"type": "text", "text": "Describe this image"}, - {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}} - ] - }], - "temperature": 0.7 -}' -``` + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | L4 | 96k (FP8), 27k (BF16) | + | L40S | 128k (FP8, BF16) | + | H100 | 128k (FP8, BF16) | + | H100-2 | 128k (FP8, BF16) | + + #### Sending Managed Inference requests + + To perform inference tasks with your Llama-3.1 deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"meta/llama-3.1-8b-instruct:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + + + + +### Llama-3-70b-instruct + + Meta’s Llama 3 is an iteration of the open-access Llama family. + Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility and responsibly spearheading the deployment of LLMs. + With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries in reasoning and coding capabilities. 
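
Like the other chat models in this catalog, a Llama 3 deployment can also be driven by any OpenAI-compatible client instead of raw `curl`. The sketch below is a minimal, unofficial example: it assumes the `openai` Python package (v1+) is installed and that the `/v1/chat/completions` route behaves as the curl examples on this page suggest; `<Deployment UUID>` and `<IAM API key>` are placeholders, and the model name used is the one given later in this section.

```python
# Minimal sketch, assuming an OpenAI-compatible /v1/chat/completions route.
# <Deployment UUID> and <IAM API key> are placeholders to replace.
from openai import OpenAI

client = OpenAI(
    base_url="https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1",
    api_key="<IAM API key>",
)

response = client.chat.completions.create(
    model="meta/llama-3-70b-instruct:fp8",
    messages=[{"role": "user", "content": "Sing me a song about Xavier Niel"}],
    max_tokens=500,
    temperature=0.7,
)
print(response.choices[0].message.content)
```
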
+ + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Meta](https://llama.meta.com/llama3/) | + | Compatible Instances | H100, H100-2 (FP8) | + | Context size | 8192 tokens | + + #### Model names + + ```bash + meta/llama-3-70b-instruct:fp8 + ``` + + #### Compatible Instances + + - [H100 (FP8)](https://www.scaleway.com/en/h100-pcie-try-it-now/) + - H100-2 (FP8) + + #### Sending Managed Inference requests + + To perform inference tasks with your Llama-3 deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"meta/llama-3-70b-instruct:fp8", "messages":[{"role": "user","content": "Sing me a song about Xavier Niel"}], "max_tokens": 500, "top_p": 1, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + + + + +### Llama-3.3-70b-instruct + + Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model. + This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications. + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Meta](https://www.llama.com/) | + | License | [Llama 3.3 community](https://www.llama.com/llama3_3/license/) | + | Compatible Instances | H100 (FP8), H100-2 (FP8, BF16) | + | Context length | Up to 131k tokens | + + #### Model names + + ```bash + meta/llama-3.3-70b-instruct:bf16 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | H100 | 15k (FP8) | + | H100-2 | 131k (FP8), 62k (BF16) | + + #### Sending Managed Inference requests + + To perform inference tasks with your Llama-3.3 deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"meta/llama-3.3-70b-instruct:bf16", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + + + + +### Llama-3.1-Nemotron-70b-instruct + + Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions. + NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses. 
+ + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Nvidia](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct) | + | License | [Llama 3.1 community](https://llama.meta.com/llama3_1/license/) | | + | Compatible Instances | H100 (FP8), H100-2 (FP8) | + | Context Length | up to 128k tokens | + + #### Model names + + ```bash + meta/llama-3.1-nemotron-70b-instruct:fp8 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | H100 | 16k (FP8) | + | H100-2 | 128k (FP8) | + + + #### Sending Managed Inference requests + + To perform inference tasks with your Llama-3.1-Nemotron-70b-instruct deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"meta/llama-3.1-nemotron-70b-instruct:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + + + + +### DeepSeek-R1-Distill-Llama-70B + + Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1. + DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use case such as mathematics and coding tasks. + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | + | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) | + | Compatible Instances | H100 (FP8), H100-2 (FP8, BF16) | + | Context Length | up to 131k tokens | + + #### Model names + + ```bash + deepseek/deepseek-r1-distill-llama-70b:bf16 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | H100 | 15k (FP8) | + | H100-2 | 131k (FP8), 56k (BF16) | + + #### Sending Managed Inference requests + + To perform inference tasks with your DeepSeek R1 Distill Llama deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"deepseek/deepseek-r1-distill-llama-70b:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + Ensure that the `messages` array is properly formatted with roles (user, assistant) and content. + + + + This model is better used without `system prompt`, as suggested by the model provider. + + + + + + +### DeepSeek-R1-Distill-Llama-8B + + Released January 21, 2025, Deepseek’s R1 Distilled Llama 8B is a distilled version of the Llama model family based on Deepseek R1. 
+ DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks. + + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) | + | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) | + | Compatible Instances | L4, L40S, H100 (FP8, BF16) | + | Context Length | up to 131k tokens | + + #### Model names + + ```bash + deepseek/deepseek-r1-distill-llama-8b:bf16 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | L4 | 90k (FP8), 39k (BF16) | + | L40S | 131k (FP8, BF16) | + | H100 | 131k (FP8, BF16) | + + #### Sending Managed Inference requests + + To perform inference tasks with your DeepSeek R1 Distill Llama deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"deepseek/deepseek-r1-distill-llama-8b:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + Ensure that the `messages` array is properly formatted with roles (user, assistant) and content. + + + + This model is better used without `system prompt`, as suggested by the model provider. + + + + + + +### Mistral-7b-instruct-v0.3 + + The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. + This model is open-weight and distributed under the Apache 2.0 license. + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Mistral](https://mistral.ai/technology/#models) | + | Compatible Instances | L4, L40S, H100, H100-2 (BF16) | + | Context size | 32K tokens | + + #### Model name + + ```bash + mistral/mistral-7b-instruct-v0.3:bf16 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | L4 | 32k (BF16) | + | L40S | 32k (BF16) | + | H100 | 32k (BF16) | + | H100-2 | 32k (BF16) | + + #### Sending Inference requests + + To perform inference tasks with your Mistral model deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"mistral/mistral-7b-instruct-v0.3:bf16", "messages":[{"role": "user","content": "Explain Public Cloud in a nutshell."}], "top_p": 1, "temperature": 0.7, "stream": false}' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. 
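
Every curl example on this page sets `"stream": false`, which returns one final payload. Setting the flag to `true` streams tokens as they are generated, which is usually what interactive chat front ends want. A hedged sketch follows, assuming the deployment honors the OpenAI-style streaming protocol implied by the `stream` flag; the endpoint, key, and model name placeholders are the same as in the curl example above.

```python
# Streaming sketch for the Mistral-7b deployment described above.
# Placeholders <Deployment UUID> and <IAM API key> must be replaced.
from openai import OpenAI

client = OpenAI(
    base_url="https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1",
    api_key="<IAM API key>",
)

stream = client.chat.completions.create(
    model="mistral/mistral-7b-instruct-v0.3:bf16",
    messages=[{"role": "user", "content": "Explain Public Cloud in a nutshell."}],
    temperature=0.7,
    stream=True,  # deltas arrive incrementally instead of one final message
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
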
### Mistral-small-24b-instruct-2501

  Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral.
  This model is open-weight and distributed under the Apache 2.0 license.

  | Attribute | Details |
  |-----------------|------------------------------------|
  | Provider | [Mistral](https://mistral.ai/technology/#models) |
  | Compatible Instances | L40S, H100, H100-2 (FP8) |
  | Context size | 32K tokens |

  #### Model name

  ```bash
  mistral/mistral-small-24b-instruct-2501:fp8
  ```

  #### Compatible Instances

  | Instance type | Max context length |
  | ------------- |-------------|
  | L40S | 20k (FP8) |
  | H100 | 32k (FP8) |
  | H100-2 | 32k (FP8) |

  #### Sending Inference requests

  To perform inference tasks with your Mistral model deployed at Scaleway, use the following command:

  ```bash
  curl -s \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  --request POST \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
  --data '{"model":"mistral/mistral-small-24b-instruct-2501:fp8", "messages":[{"role": "user","content": "Tell me about Scaleway."}], "top_p": 1, "temperature": 0.7, "stream": false}'
  ```

  Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.

  Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content.

### Mistral-nemo-instruct-2407

  Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA.
  This model is open-weight and distributed under the Apache 2.0 license.
  It was trained on a large proportion of multilingual and code data.

  | Attribute | Details |
  |-----------------|------------------------------------|
  | Provider | [Mistral](https://mistral.ai/technology/#models) |
  | Compatible Instances | L40S, H100, H100-2 (FP8) |
  | Context size | 128K tokens |

  #### Model name

  ```bash
  mistral/mistral-nemo-instruct-2407:fp8
  ```

  #### Compatible Instances

  | Instance type | Max context length |
  | ------------- |-------------|
  | L40S | 128k (FP8) |
  | H100 | 128k (FP8) |
  | H100-2 | 128k (FP8) |

  #### Sending Inference requests

  Unlike previous Mistral models, Mistral Nemo requires smaller temperatures. It is recommended to use a temperature of 0.35.

  To perform inference tasks with your Mistral model deployed at Scaleway, use the following command:

  ```bash
  curl -s \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  --request POST \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
  --data '{"model":"mistral/mistral-nemo-instruct-2407:fp8", "messages":[{"role": "user","content": "Sing me a song about Xavier Niel"}], "top_p": 1, "temperature": 0.35, "stream": false}'
  ```

  Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.

  The model name allows Scaleway to put your prompts in the expected format.

  Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content.

### Moshiko-0.1-8b

  Kyutai's Moshi is a speech-text foundation model for real-time dialogue.
Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity.
  While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model.
  Moshiko is the variant of Moshi with a male voice in English.

  | Attribute | Details |
  |-----------------|------------------------------------|
  | Provider | [Kyutai](https://github.com/kyutai-labs/moshi) |
  | Compatible Instances | L4, H100 (FP8, BF16) |
  | Context size | 4096 tokens |

  #### Model names

  ```bash
  kyutai/moshiko-0.1-8b:bf16
  kyutai/moshiko-0.1-8b:fp8
  ```

  #### Compatible Instances

  | Instance type | Max context length |
  | ------------- |-------------|
  | L4 | 4096 (FP8, BF16) |
  | H100 | 4096 (FP8, BF16) |

  #### How to use it

  To perform inference tasks with your Moshi deployed at Scaleway, a WebSocket API is exposed for real-time dialogue and is accessible at the following endpoint:

  ```bash
  wss://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat
  ```

  #### Testing the WebSocket endpoint

  To test the endpoint, use the following command:

  ```bash
  curl -i --http1.1 \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
  -H "Sec-WebSocket-Version: 13" \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat"
  ```

  Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.

  Authentication can be done using the `token` query parameter, which should be set to your IAM API key, if headers are not supported (e.g., in a browser).

  The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection.

  #### Interacting with the model

  We provide code samples in various programming languages (Python, Rust, TypeScript) to interact with the model using the WebSocket API, as well as a simple web interface.
  Those code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples).
  This repository contains instructions on how to run the code samples and interact with the model.

### Moshika-0.1-8b

  Kyutai's Moshi is a speech-text foundation model for real-time dialogue.
  Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity.
  While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model.
  Moshika is the variant of Moshi with a female voice in English.
| Attribute | Details |
  |-----------------|------------------------------------|
  | Provider | [Kyutai](https://github.com/kyutai-labs/moshi) |
  | Compatible Instances | L4, H100 (FP8, BF16) |
  | Context size | 4096 tokens |

  #### Model names

  ```bash
  kyutai/moshika-0.1-8b:bf16
  kyutai/moshika-0.1-8b:fp8
  ```

  #### Compatible Instances

  | Instance type | Max context length |
  | ------------- |-------------|
  | L4 | 4096 (FP8, BF16) |
  | H100 | 4096 (FP8, BF16) |

  #### How to use it

  To perform inference tasks with your Moshi deployed at Scaleway, a WebSocket API is exposed for real-time dialogue and is accessible at the following endpoint:

  ```bash
  wss://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat
  ```

  #### Testing the WebSocket endpoint

  To test the endpoint, use the following command:

  ```bash
  curl -i --http1.1 \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
  -H "Sec-WebSocket-Version: 13" \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat"
  ```

  Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.

  Authentication can be done using the `token` query parameter, which should be set to your IAM API key, if headers are not supported (e.g., in a browser).

  The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection.

  #### Interacting with the model

  We provide code samples in various programming languages (Python, Rust, TypeScript) to interact with the model using the WebSocket API, as well as a simple web interface.
  Those code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples).
  This repository contains instructions on how to run the code samples and interact with the model.

### WizardLM-70B-V1.0

  WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants.
  With its extensive training in diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors.

  | Attribute | Details |
  |-----------------|------------------------------------|
  | Provider | [WizardLM](https://wizardlm.github.io/WizardLM2/) |
  | Compatible Instances | H100 (FP8), H100-2 (FP16) |
  | Context size | 4,096 tokens |

  #### Model names

  ```bash
  wizardlm/wizardlm-70b-v1.0:fp8
  wizardlm/wizardlm-70b-v1.0:fp16
  ```

  #### Compatible Instances

  - [H100 (FP8)](https://www.scaleway.com/en/h100-pcie-try-it-now/)
  - [H100-2 (FP16)](https://www.scaleway.com/en/h100-pcie-try-it-now/)

  #### Sending Inference requests

  To perform inference tasks with your WizardLM model deployed at Scaleway, use the following command:

  ```bash
  curl -s \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  --request POST \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
  --data '{"model":"wizardlm/wizardlm-70b-v1.0:fp8", "messages":[{"role": "user","content": "Say hello to Scaleway Inference"}], "max_tokens": 200, "top_p": 1, "temperature": 1, "stream": false}'
  ```

  Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.
+ + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + + +## Multimodal models + + + +### Pixtral-12b-2409 + + Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder. + It can analyze images and offer insights from visual content alongside text. + This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. + + Pixtral is open-weight and distributed under the Apache 2.0 license. + + + Pixtral 12B can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. + + + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Mistral](https://mistral.ai/technology/#models) | + | Compatible Instances | L40S, H100, H100-2 (bf16) | + | Context size | 128k tokens | + + #### Model name + + ```bash + mistral/pixtral-12b-2409:bf16 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | L40S | 50k (BF16) + | H100 | 128k (BF16) + | H100-2 | 128k (BF16) + + #### Sending Inference requests + + + Unlike previous Mistral models, Pixtral can take an `image_url` in the content array. + + + To perform inference tasks with your Pixtral model deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scw.cloud/v1/chat/completions" \ + --data '{ + "model": "mistral/pixtral-12b-2409:bf16", + "messages": [ + { + "role": "user", + "content": [ + {"type" : "text", "text": "Describe this image in detail please."}, + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, + {"type" : "text", "text": "and this one as well."}, + {"type": "image_url", "image_url": {"url": "https://www.wolframcloud.com/obj/resourcesystem/images/a0e/a0ee3983-46c6-4c92-b85d-059044639928/6af8cfb971db031b.png"}} + ] + } + ], + "top_p": 1, + "temperature": 0.7, + "stream": false + }' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + #### Passing images to Pixtral + + 1. Image URLs + If the image is available online, you can just include the image URL in your request as demonstrated above. This approach is simple and does not require any encoding. + + 2. Base64 encoded image + Base64 encoding is a standard way to transform binary data, like images, into a text format, making it easier to transmit over the internet. + + The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload. 
```python
  import base64
  from io import BytesIO
  from PIL import Image

  def encode_image(img):
      buffered = BytesIO()
      img.save(buffered, format="JPEG")
      encoded_string = base64.b64encode(buffered.getvalue()).decode("utf-8")
      return encoded_string

  img = Image.open("path_to_your_image.jpg")
  base64_img = encode_image(img)

  payload = {
      "messages": [
          {
              "role": "user",
              "content": [
                  {"type": "text", "text": "What is this image?"},
                  {
                      "type": "image_url",
                      "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"},
                  },
              ],
          }
      ],
      ... # other parameters
  }

  ```

  #### Receiving Managed Inference responses

  Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the Managed Inference server.
  Process the output data according to your application's needs. The response will contain the output generated by the visual language model based on the input provided in the request.

  Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.

  #### Frequently Asked Questions

  ##### What types of images are supported by Pixtral?
  - Bitmap (or raster) image formats, which store images as grids of individual pixels, are supported: in particular PNG, JPEG, WEBP, and non-animated GIFs.
  - Vector formats (e.g. SVG) and layered formats (e.g. PSD) are not supported.

  ##### Are other files supported?
  Only bitmaps can be analyzed by Pixtral; PDFs and videos are not supported.

  ##### Is there a limit to the size of each image?
  Image size is limited in two ways:
  - Directly, by the maximum context window. For example, since image tokens are squares of 16x16 pixels, a single 1024x1024 image takes up at most `4096` tokens (i.e. `(1024*1024)/(16*16)`).
  - Indirectly, by model accuracy: resolutions above 1024x1024 will not increase output accuracy, since images wider or taller than 1024 pixels are automatically downscaled to fit within 1024x1024 dimensions. Note that the aspect ratio is preserved (images are not cropped, only additionally compressed).

  ##### What is the maximum number of images per conversation?
  One conversation can handle up to 12 images (per request). A 13th image will return a 413 error.

### Molmo-72b-0924

  Molmo 72B is the powerhouse of the Molmo family, a family of multimodal models developed by the renowned research lab Allen Institute for AI.
  Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.

  Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source.
  Its base model is Qwen2-72B ([Tongyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)).

  Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint.
+ + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Allen Institute for AI](https://molmo.allenai.org/blog) | + | License | Apache 2.0 | | + | Compatible Instances | H100-2 (FP8) | + | Context size | 50k tokens | + + #### Model name + + ```bash + allenai/molmo-72b-0924:fp8 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | H100-2 | 50k (FP8) + + #### Sending inference requests + + + Unlike regular chat models, Molmo-72b can take an `image_url` in the content array. + + + To perform inference tasks with your Molmo-72b model deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scw.cloud/v1/chat/completions" \ + --data '{ + "model": "allenai/molmo-72b-0924:fp8", + "messages": [ + { + "role": "user", + "content": [ + {"type" : "text", "text": "Describe this image in detail please."}, + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}} + ] + } + ], + "top_p": 1, + "temperature": 0.7, + "stream": false + }' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + #### Passing images to Molmo-72b + + ##### Image URLs + If the image is available online, you can just include the image URL in your request as demonstrated above. This approach is simple and does not require any encoding. + + ##### Base64 encoded image + Base64 encoding is a standard way to transform binary data, like images, into a text format, making it easier to transmit over the internet. + + The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload. + + ```python + import base64 + from io import BytesIO + from PIL import Image + + def encode_image(img): + buffered = BytesIO() + img.save(buffered, format="JPEG") + encoded_string = base64.b64encode(buffered.getvalue()).decode("utf-8") + return encoded_string + + img = Image.open("path_to_your_image.jpg") + base64_img = encode_image(img) + + payload = { + "messages": [ + { + "role": "user", + "content": [ + {"type": "text", "text": "What is this image?"}, + { + "type": "image_url", + "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"}, + }, + ], + } + ], + ... # other parameters + } + + ``` + + #### Frequently Asked Questions + + ##### What types of images are supported by Molmo-72b? + - Bitmap (or raster) image formats, meaning storing images as grids of individual pixels, are supported: PNG, JPEG, WEBP, and non-animated GIFs in particular. + - Vector image formats (SVG, PSD) are not supported. + + ##### Are other file types supported? + Only bitmaps can be analyzed by Molmo. PDFs and videos are not supported. + + ##### Is there a limit to the size of each image? + The only limitation is the context window (1 token for each 16x16 pixel). + + ##### What is the maximum amount of images per conversation? + One conversation can handle a maximum of 1 image (per request). Sending more than one image will return a 400 error. + + + +## Code models + + + +### Qwen2.5-coder-32b-instruct + + Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages. 
+ With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning. + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [Qwen](https://qwenlm.github.io/) | + | License | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) | + | Compatible Instances | H100, H100-2 (INT8) | + | Context Length | up to 32k tokens | + + #### Model names + + ```bash + qwen/qwen2.5-coder-32b-instruct:int8 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | H100 | 32k (INT8) + | H100-2 | 32k (INT8) + + #### Sending Managed Inference requests + + To perform inference tasks with your Qwen2.5-coder deployed at Scaleway, use the following command: + + ```bash + curl -s \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + --request POST \ + --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ + --data '{"model":"qwen/qwen2.5-coder-32b-instruct:int8", "messages":[{"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful code assistant."},{"role": "user","content": "Write a quick sort algorithm."}], "max_tokens": 1000, "temperature": 0.8, "stream": false}' + ``` + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + + + +## Embeddings models + + + +### Sentence-t5-xxl + + The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture. + Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information. + This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework. It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal. + + | Attribute | Details | + |-----------------|------------------------------------| + | Provider | [sentence-transformers](https://www.sbert.net/) | + | Compatible Instances | L4 (FP32) | + | Context size | 512 tokens | + + #### Model name + + ```bash + sentence-transformers/sentence-t5-xxl:fp32 + ``` + + #### Compatible Instances + + | Instance type | Max context length | + | ------------- |-------------| + | L4 | 512 (FP32) | + + #### Sending Managed Inference requests + + To perform inference tasks with your Embedding model deployed at Scaleway, use the following command: + + ```bash + curl https://.ifr.fr-par.scaleway.com/v1/embeddings \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + -d '{ + "input": "Embeddings can represent text in a numerical format.", + "model": "sentence-transformers/sentence-t5-xxl:fp32" + }' + ``` + + Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. 
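
The endpoint returns raw vectors, which only become useful once compared or indexed. Below is a minimal, unofficial sketch of computing sentence similarity from two embeddings; it assumes the response follows the usual OpenAI-style `{"data": [{"embedding": [...]}]}` shape, which this page does not confirm, and it uses the same placeholder endpoint, key, and model name as the curl example above.

```python
# Hedged sketch: the response shape is assumed, not documented here.
# Replace <Deployment UUID> and <IAM API key> before running.
import math
import requests

ENDPOINT = "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/embeddings"
HEADERS = {"Authorization": "Bearer <IAM API key>", "Content-Type": "application/json"}

def embed(text: str) -> list[float]:
    payload = {"input": text, "model": "sentence-transformers/sentence-t5-xxl:fp32"}
    resp = requests.post(ENDPOINT, headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Sentence similarity is where this model does best (see above).
print(cosine(embed("Embeddings represent text numerically."),
             embed("Text can be encoded as numeric vectors.")))
```
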
+ + \ No newline at end of file From 68313df0e0127b7395125341cf686dd336e9f4d7 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Wed, 16 Apr 2025 18:01:17 +0200 Subject: [PATCH 03/16] docs(infr): update --- .../reference-content/models.mdx | 1211 +++-------------- 1 file changed, 198 insertions(+), 1013 deletions(-) diff --git a/pages/managed-inference/reference-content/models.mdx b/pages/managed-inference/reference-content/models.mdx index f1d04426b0..4324a44bde 100644 --- a/pages/managed-inference/reference-content/models.mdx +++ b/pages/managed-inference/reference-content/models.mdx @@ -19,24 +19,24 @@ A quick overview of available models in Scaleway's catalog and their core attrib | Model Name | Provider | Context Size | Modalities | Instances | License | |------------|----------|--------------|------------|-----------|---------| -| `mixtral-8x7b-instruct-v0.1` | Mistral | 32k tokens | Text | H100 | Apache 2.0 | -| `llama-3.1-70b-instruct` | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community | -| `llama-3.1-8b-instruct` | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3 community | -| `llama-3-70b-instruct` | Meta | 8k tokens | Text | H100 | Llama 3 community | -| `llama-3.3-70b-instruct` | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3 community | -| `llama-3-nemotron-70b` | Nvidia | up to 128k tokens | Text | H100, H100-2 |Lllama 3.3 community | -| `deepseek-r1-distill-70b` | Deepseek | up to 131k tokens | Text | H100, H100-2 | MIT | -| `deepseek-r1-distill-8b` | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | Apache 2.0 | -| `mistral-7b-instruct-v0.3` | Mistral | 32k tokens | Text | L4, L40S, H100, H100-1 | Apache 2.0 | -| `mistral-small-24b-instruct-2501` | Mistral | 32k tokens | Text | L40S, H100, H100-2 | Apache 2.0 | -| `mistral-nemo-instruct-2407` | Mistral | 128k | Text | L40S, H100, H100-2 | Apache 2.0 | -| `moshiko-0.1-8b` | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 | -| `moshika-0.1-8b` | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 | -| `wizardlm-70b-v1.0` | WizardLM | 4,096 tokens | Text | H100, H100-2 | Lllama 2 community | -| `pixtral-12b-2409` | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 | -| `molmo-72b-0924` | Allen AI | 50k | Multimodal | H100-2 | Apache 2.0 | -| `qwen2.5-coder-32b-instruct` | Qwen | up to 32k | Code | H100, H100-2 | Qianwen License | -| `sentence-t5-xxl` | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 | +| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 | +| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community | +| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3 community | +| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | Llama 3 community | +| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3 community | +| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | up to 128k tokens | Text | H100, H100-2 |Lllama 3.3 community | +| [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | up to 131k tokens | Text | H100, H100-2 | MIT | +| [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | Apache 2.0 | +| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 
32k tokens | Text | L4, L40S, H100, H100-2 | Apache 2.0 |
| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k tokens | Text | L40S, H100, H100-2 | Apache 2.0 |
| [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k tokens | Text | L40S, H100, H100-2 | Apache 2.0 |
| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 |
| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 |
| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | Llama 2 community |
| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 |
| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k tokens | Multimodal | H100-2 | Apache 2.0 |
| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k tokens | Code | H100, H100-2 | Apache 2.0 |
| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 |

## Model details

  Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.

## Text models

### Mixtral-8x7b-instruct-v0.1

Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants.
Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.
- #### Sending Managed Inference requests +- Structured output supported: Yes +- Function calling: No +- Supported languages: English, French, German, Spanish - To perform inference tasks with your Llama-3.1 deployed at Scaleway, use the following command: +#### Model names - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"meta/llama-3.1-70b-instruct:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - The model name allows Scaleway to put your prompts in the expected format. - - - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - - - - - - -### Llama-3.1-8b-instruct model - - Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. - Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. +``` +mistral/mixtral-8x7b-instruct-v0.1:fp8 +mistral/mixtral-8x7b-instruct-v0.1:bf16 +``` - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Meta](https://llama.meta.com/llama3/) | - | License | [Llama 3.1 community](https://llama.meta.com/llama3_1/license/) | - | Compatible Instances | L4, L40S, H100, H100-2 (FP8, BF16) | - | Context Length | up to 128k tokens | +### Llama-3.1-70b-instruct - #### Model names +Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. +Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. - ```bash - meta/llama-3.1-8b-instruct:fp8 - meta/llama-3.1-8b-instruct:bf16 - ``` - - #### Compatible Instances +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - | Instance type | Max context length | - | ------------- |-------------| - | L4 | 96k (FP8), 27k (BF16) | - | L40S | 128k (FP8, BF16) | - | H100 | 128k (FP8, BF16) | - | H100-2 | 128k (FP8, BF16) | +#### Model names - #### Sending Managed Inference requests +``` +meta/llama-3.1-70b-instruct:fp8 +meta/llama-3.1-70b-instruct:bf16 +``` - To perform inference tasks with your Llama-3.1 deployed at Scaleway, use the following command: +### Llama-3.1-8b-instruct - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"meta/llama-3.1-8b-instruct:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' - ``` +Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. +Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - The model name allows Scaleway to put your prompts in the expected format. 
- +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - +#### Model names - +``` +meta/llama-3.1-8b-instruct:fp8 +meta/llama-3.1-8b-instruct:bf16 +``` - ### Llama-3-70b-instruct - Meta’s Llama 3 is an iteration of the open-access Llama family. - Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility and responsibly spearheading the deployment of LLMs. - With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries in reasoning and coding capabilities. +Meta’s Llama 3 is an iteration of the open-access Llama family. +Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility and responsibly spearheading the deployment of LLMs. +With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries in reasoning and coding capabilities. - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Meta](https://llama.meta.com/llama3/) | - | Compatible Instances | H100, H100-2 (FP8) | - | Context size | 8192 tokens | +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - #### Model names + #### Model name - ```bash - meta/llama-3-70b-instruct:fp8 ``` - - #### Compatible Instances - - - [H100 (FP8)](https://www.scaleway.com/en/h100-pcie-try-it-now/) - - H100-2 (FP8) - - #### Sending Managed Inference requests - - To perform inference tasks with your Llama-3 deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"meta/llama-3-70b-instruct:fp8", "messages":[{"role": "user","content": "Sing me a song about Xavier Niel"}], "max_tokens": 500, "top_p": 1, "temperature": 0.7, "stream": false}' + meta/llama-3-70b-instruct:fp8 ``` - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - The model name allows Scaleway to put your prompts in the expected format. - - - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - - - - - ### Llama-3.3-70b-instruct - Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model. - This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications. 
- - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Meta](https://www.llama.com/) | - | License | [Llama 3.3 community](https://www.llama.com/llama3_3/license/) | - | Compatible Instances | H100 (FP8), H100-2 (FP8, BF16) | - | Context length | Up to 131k tokens | - - #### Model names - - ```bash - meta/llama-3.3-70b-instruct:bf16 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | H100 | 15k (FP8) | - | H100-2 | 131k (FP8), 62k (BF16) | - - #### Sending Managed Inference requests - - To perform inference tasks with your Llama-3.3 deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"meta/llama-3.3-70b-instruct:bf16", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. +Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model. +This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications. - - The model name allows Scaleway to put your prompts in the expected format. - +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - +#### Model name - - - +``` +meta/llama-3.3-70b-instruct:bf16 +``` ### Llama-3.1-Nemotron-70b-instruct - Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions. - NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses. 
- - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Nvidia](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct) | - | License | [Llama 3.1 community](https://llama.meta.com/llama3_1/license/) | | - | Compatible Instances | H100 (FP8), H100-2 (FP8) | - | Context Length | up to 128k tokens | - - #### Model names - - ```bash - meta/llama-3.1-nemotron-70b-instruct:fp8 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | H100 | 16k (FP8) | - | H100-2 | 128k (FP8) | - - - #### Sending Managed Inference requests - - To perform inference tasks with your Llama-3.1-Nemotron-70b-instruct deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"meta/llama-3.1-nemotron-70b-instruct:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - The model name allows Scaleway to put your prompts in the expected format. - +Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions. +NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses. - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. (to verify) - +#### Model name - +``` +meta/llama-3.1-nemotron-70b-instruct:fp8 +``` ### DeepSeek-R1-Distill-Llama-70B - Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1. - DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use case such as mathematics and coding tasks. +Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1. +DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use case such as mathematics and coding tasks. 
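+
+As suggested by the model provider, this model is best used without a `system prompt`.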
- | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | - | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) | - | Compatible Instances | H100 (FP8), H100-2 (FP8, BF16) | - | Context Length | up to 131k tokens | +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, Simplified Chinese +#### Model name - #### Model names - - ```bash - deepseek/deepseek-r1-distill-llama-70b:bf16 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | H100 | 15k (FP8) | - | H100-2 | 131k (FP8), 56k (BF16) | - - #### Sending Managed Inference requests - - To perform inference tasks with your DeepSeek R1 Distill Llama deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"deepseek/deepseek-r1-distill-llama-70b:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - Ensure that the `messages` array is properly formatted with roles (user, assistant) and content. - - - - This model is better used without `system prompt`, as suggested by the model provider. - - - - - +``` +deepseek/deepseek-r1-distill-llama-70b:bf16 +``` ### DeepSeek-R1-Distill-Llama-8B - Released January 21, 2025, Deepseek’s R1 Distilled Llama 8B is a distilled version of the Llama model family based on Deepseek R1. - DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks. - - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) | - | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) | - | Compatible Instances | L4, L40S, H100 (FP8, BF16) | - | Context Length | up to 131k tokens | - - #### Model names +Released January 21, 2025, Deepseek’s R1 Distilled Llama 8B is a distilled version of the Llama model family based on Deepseek R1. +DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks. 
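+
+As with the 70B version, the model provider suggests using this model without a `system prompt`.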
- ```bash - deepseek/deepseek-r1-distill-llama-8b:bf16 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | L4 | 90k (FP8), 39k (BF16) | - | L40S | 131k (FP8, BF16) | - | H100 | 131k (FP8, BF16) | - - #### Sending Managed Inference requests - - To perform inference tasks with your DeepSeek R1 Distill Llama deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"deepseek/deepseek-r1-distill-llama-8b:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - Ensure that the `messages` array is properly formatted with roles (user, assistant) and content. - +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, Simplified Chinese - - This model is better used without `system prompt`, as suggested by the model provider. - +#### Model names - - - +``` +deepseek/deepseek-r1-distill-llama-8b:bf16 +``` ### Mistral-7b-instruct-v0.3 - The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. - This model is open-weight and distributed under the Apache 2.0 license. - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Mistral](https://mistral.ai/technology/#models) | - | Compatible Instances | L4, L40S, H100, H100-2 (BF16) | - | Context size | 32K tokens | - - #### Model name - - ```bash - mistral/mistral-7b-instruct-v0.3:bf16 - ``` - - #### Compatible Instances +The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. +This model is open-weight and distributed under the Apache 2.0 license. - | Instance type | Max context length | - | ------------- |-------------| - | L4 | 32k (BF16) | - | L40S | 32k (BF16) | - | H100 | 32k (BF16) | - | H100-2 | 32k (BF16) | +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English - #### Sending Inference requests +#### Model name - To perform inference tasks with your Mistral model deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"mistral/mistral-7b-instruct-v0.3:bf16", "messages":[{"role": "user","content": "Explain Public Cloud in a nutshell."}], "top_p": 1, "temperature": 0.7, "stream": false}' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - The model name allows Scaleway to put your prompts in the expected format. - - - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. 
- - - - - +``` +mistral/mistral-7b-instruct-v0.3:bf16 +``` ### Mistral-small-24b-base-2501 - Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral. - This model is open-weight and distributed under the Apache 2.0 license. - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Mistral](https://mistral.ai/technology/#models) | - | Compatible Instances | L40S, H100, H100-2 (FP8) | - | Context size | 32K tokens | - - #### Model name - - ```bash - mistral/mistral-small-24b-instruct-2501:fp8 - ``` - - #### Compatible Instances +Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral. +This model is open-weight and distributed under the Apache 2.0 license. - | Instance type | Max context length | - | ------------- |-------------| - | L40 | 20k (FP8) | - | H100 | 32k (FP8) | - | H100-2 | 32k (FP8) | - - #### Sending Inference requests - - To perform inference tasks with your Mistral model deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"mistral/mistral-small-24b-instruct-2501:fp8", "messages":[{"role": "user","content": "Tell me about Scaleway."}], "top_p": 1, "temperature": 0.7, "stream": false}' - ``` +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish. - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. +#### Model name - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - - - - - +``` +mistral/mistral-small-24b-instruct-2501:fp8 +``` ### Mistral-nemo-instruct-2407 - Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA. - This model is open-weight and distributed under the Apache 2.0 license. - It was trained on a large proportion of multilingual and code data. - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Mistral](https://mistral.ai/technology/#models) | - | Compatible Instances | L40S, H100, H100-2 (FP8) | - | Context size | 128K tokens | - - #### Model name - - ```bash - mistral/mistral-nemo-instruct-2407:fp8 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | L40 | 128k (FP8) | - | H100 | 128k (FP8) | - | H100-2 | 128k (FP8) | - - #### Sending Inference requests - - - Unlike previous Mistral models, Mistral Nemo requires smaller temperatures. It is recommend to use a temperature of 0.35. 
- - - To perform inference tasks with your Mistral model deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"mistral/mistral-nemo-instruct-2407:fp8", "messages":[{"role": "user","content": "Sing me a song about Xavier Niel"}], "top_p": 1, "temperature": 0.35, "stream": false}' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. +Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA. +This model is open-weight and distributed under the Apache 2.0 license. +It was trained on a large proportion of multilingual and code data. - - The model name allows Scaleway to put your prompts in the expected format. - +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - +#### Model name - - - +``` +mistral/mistral-nemo-instruct-2407:fp8 +``` ### Moshiko-0.1-8b - Kyutai's Moshi is a speech-text foundation model for real-time dialogue. - Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. - While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. - Moshiko is the variant of Moshi with a male voice in English. - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Kyutai](https://github.com/kyutai-labs/moshi) | - | Compatible Instances | L4, H100 (FP8, BF16) | - | Context size | 4096 tokens | - - #### Model names - - ```bash - kyutai/moshiko-0.1-8b:bf16 - kyutai/moshiko-0.1-8b:fp8 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | L4 | 4096 (FP8, BF16) | - | H100 | 4096 (FP8, BF16) | - - #### How to use it - - To perform inference tasks with your Moshi deployed at Scaleway, a WebSocket API is exposed for real-time dialogue and is accessible at the following endpoint: - - ```bash - wss://.ifr.fr-par.scaleway.com/api/chat - ``` - - #### Testing the WebSocket endpoint - - To test the endpoint, use the following command: - - ```bash - curl -i --http1.1 \ - -H "Authorization: Bearer " \ - -H "Connection: Upgrade" \ - -H "Upgrade: websocket" \ - -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \ - -H "Sec-WebSocket-Version: 13" \ - --url "https://.ifr.fr-par.scaleway.com/api/chat" - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. +Kyutai's Moshi is a speech-text foundation model for real-time dialogue. +Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. +While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. 
+Moshiko is the variant of Moshi with a male voice in English. - - Authentication can be done using the `token` query parameter, which should be set to your IAM API key, if headers are not supported (e.g., in a browser). - +- Structured output supported: No +- Function calling: No +- Supported languages: English - The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection. +#### Model names - #### Interacting with the model - - We provide code samples in various programming languages (Python, Rust, typescript) to interact with the model using the WebSocket API as well as a simple web interface. - Those code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples). - This repository contains instructions on how to run the code samples and interact with the model. - - - - +``` +kyutai/moshiko-0.1-8b:bf16 +kyutai/moshiko-0.1-8b:fp8 +``` ### Moshika-0.1-8b +Kyutai's Moshi is a speech-text foundation model for real-time dialogue. +Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. +While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. +Moshika is the variant of Moshi with a female voice in English. - Kyutai's Moshi is a speech-text foundation model for real-time dialogue. - Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. - While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. - Moshiko is the variant of Moshi with a male voice in English. - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Kyutai](https://github.com/kyutai-labs/moshi) | - | Compatible Instances | L4, H100 (FP8, BF16) | - | Context size | 4096 tokens | - - #### Model names - - ```bash - kyutai/moshiko-0.1-8b:bf16 - kyutai/moshiko-0.1-8b:fp8 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | L4 | 4096 (FP8, BF16) | - | H100 | 4096 (FP8, BF16) | - - #### How to use it - - To perform inference tasks with your Moshi deployed at Scaleway, a WebSocket API is exposed for real-time dialogue and is accessible at the following endpoint: - - ```bash - wss://.ifr.fr-par.scaleway.com/api/chat - ``` - - #### Testing the WebSocket endpoint +- Structured output supported: No +- Function calling: No +- Supported languages: English - To test the endpoint, use the following command: - - ```bash - curl -i --http1.1 \ - -H "Authorization: Bearer " \ - -H "Connection: Upgrade" \ - -H "Upgrade: websocket" \ - -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \ - -H "Sec-WebSocket-Version: 13" \ - --url "https://.ifr.fr-par.scaleway.com/api/chat" - ``` +#### Model names - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - Authentication can be done using the `token` query parameter, which should be set to your IAM API key, if headers are not supported (e.g., in a browser). 
- - - The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection. - - #### Interacting with the model - - We provide code samples in various programming languages (Python, Rust, typescript) to interact with the model using the WebSocket API as well as a simple web interface. - Those code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples). - This repository contains instructions on how to run the code samples and interact with the model. - - - - +``` +kyutai/moshika-0.1-8b:bf16 +kyutai/moshika-0.1-8b:fp8 +``` ### WizardLM-70B-V1.0 - WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants. - With its extensive training in diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors. - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [WizardLM](https://wizardlm.github.io/WizardLM2/) | - | Compatible Instances | H100 (FP8) - H100-2 (FP16) | - | Context size | 4,096 tokens | - - #### Model names - - ```bash - wizardlm/wizardlm-70b-v1.0:fp8 - wizardlm/wizardlm-70b-v1.0:fp16 - ``` - - #### Compatible Instances - - - [H100-1 (INT8)](https://www.scaleway.com/en/h100-pcie-try-it-now/) - - [H100-2 (FP16)](https://www.scaleway.com/en/h100-pcie-try-it-now/) - - #### Sending Inference requests - - To perform inference tasks with your WizardLM model deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"wizardlm/wizardlm-70b-v1.0:fp8", "messages":[{"role": "user","content": "Say hello to Scaleway's Inference"}], "max_tokens": 200, "top_p": 1, "temperature": 1, "stream": false}' - ``` +WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants. +With its extensive training in diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors. - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. +- Structured output supported: Yes +- Function calling: No +- Supported languages: English (to be verified) - - The model name allows Scaleway to put your prompts in the expected format. - +#### Model names - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - - - +``` +wizardlm/wizardlm-70b-v1.0:fp8 +wizardlm/wizardlm-70b-v1.0:fp16 +``` ## Multimodal models - - ### Pixtral-12b-2409 - Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder. - It can analyze images and offer insights from visual content alongside text. - This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. +Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder. +It can analyze images and offer insights from visual content alongside text. 
+This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. - Pixtral is open-weight and distributed under the Apache 2.0 license. - - - Pixtral 12B can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. - +Pixtral is open-weight and distributed under the Apache 2.0 license. + + Pixtral 12B can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. + - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Mistral](https://mistral.ai/technology/#models) | - | Compatible Instances | L40S, H100, H100-2 (bf16) | - | Context size | 128k tokens | +- Structured output supported: Yes +- Function calling: No +- Supported languages: English, French, German, Spanish (to be verified) #### Model name - ```bash - mistral/pixtral-12b-2409:bf16 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | L40S | 50k (BF16) - | H100 | 128k (BF16) - | H100-2 | 128k (BF16) - - #### Sending Inference requests - - - Unlike previous Mistral models, Pixtral can take an `image_url` in the content array. - - - To perform inference tasks with your Pixtral model deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scw.cloud/v1/chat/completions" \ - --data '{ - "model": "mistral/pixtral-12b-2409:bf16", - "messages": [ - { - "role": "user", - "content": [ - {"type" : "text", "text": "Describe this image in detail please."}, - {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, - {"type" : "text", "text": "and this one as well."}, - {"type": "image_url", "image_url": {"url": "https://www.wolframcloud.com/obj/resourcesystem/images/a0e/a0ee3983-46c6-4c92-b85d-059044639928/6af8cfb971db031b.png"}} - ] - } - ], - "top_p": 1, - "temperature": 0.7, - "stream": false - }' ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - The model name allows Scaleway to put your prompts in the expected format. - - - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - - - #### Passing images to Pixtral - - 1. Image URLs - If the image is available online, you can just include the image URL in your request as demonstrated above. This approach is simple and does not require any encoding. - - 2. Base64 encoded image - Base64 encoding is a standard way to transform binary data, like images, into a text format, making it easier to transmit over the internet. - - The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload. - - - ```python - import base64 - from io import BytesIO - from PIL import Image - - def encode_image(img): - buffered = BytesIO() - img.save(buffered, format="JPEG") - encoded_string = base64.b64encode(buffered.getvalue()).decode("utf-8") - return encoded_string - - img = Image.open("path_to_your_image.jpg") - base64_img = encode_image(img) - - payload = { - "messages": [ - { - "role": "user", - "content": [ - {"type": "text", "text": "What is this image?"}, - { - "type": "image_url", - "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"}, - }, - ], - } - ], - ... 
# other parameters - } - + mistral/pixtral-12b-2409:bf16 ``` - #### Receiving Managed Inference responses - - Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the managed Managed Inference server. - Process the output data according to your application's needs. The response will contain the output generated by the visual language model based on the input provided in the request. - - - Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently. - - - #### Frequently Asked Questions - - ##### What types of images are supported by Pixtral? - - Bitmap (or raster) image formats, meaning storing images as grids of individual pixels, are supported: PNG, JPEG, WEBP, and non-animated GIFs in particular. - - Vector image formats (SVG, PSD) are not supported. - - ##### Are other files supported? - Only bitmaps can be analyzed by Pixtral, PDFs and videos are not supported. - - ##### Is there a limit to the size of each image? - Images size are limited: - - Directly by the maximum context window. As an example, since tokens are squares of 16x16 pixels, the maximum context window taken by a single image is `4096` tokens (ie. `(1024*1024)/(16*16)`) - - Indirectly by the model accuracy: resolution above 1024x1024 will not increase model output accuracy. Indeed, images above 1024 pixels width or height will be automatically downscaled to fit within 1024x1024 dimension. Note that image ratio and overall aspect is preserved (images are not cropped, only additionaly compressed). - - ##### What is the maximum amount of images per conversation? - One conversation can handle up to 12 images (per request). The 13rd will return a 413 error. - - - - - ### Molmo-72b-0924 - Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI. - Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. - - Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source. - Its base model is Qwen2-72B ([Twonyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)). - - - Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. - - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Allen Institute for AI](https://molmo.allenai.org/blog) | - | License | Apache 2.0 | | - | Compatible Instances | H100-2 (FP8) | - | Context size | 50k tokens | - - #### Model name - - ```bash - allenai/molmo-72b-0924:fp8 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | H100-2 | 50k (FP8) - - #### Sending inference requests - - - Unlike regular chat models, Molmo-72b can take an `image_url` in the content array. 
- - - To perform inference tasks with your Molmo-72b model deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scw.cloud/v1/chat/completions" \ - --data '{ - "model": "allenai/molmo-72b-0924:fp8", - "messages": [ - { - "role": "user", - "content": [ - {"type" : "text", "text": "Describe this image in detail please."}, - {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}} - ] - } - ], - "top_p": 1, - "temperature": 0.7, - "stream": false - }' - ``` - - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. - - - The model name allows Scaleway to put your prompts in the expected format. - - - #### Passing images to Molmo-72b - - ##### Image URLs - If the image is available online, you can just include the image URL in your request as demonstrated above. This approach is simple and does not require any encoding. - - ##### Base64 encoded image - Base64 encoding is a standard way to transform binary data, like images, into a text format, making it easier to transmit over the internet. - - The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload. +Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI. +Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. - ```python - import base64 - from io import BytesIO - from PIL import Image +Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source. +Its base model is Qwen2-72B ([Twonyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)). - def encode_image(img): - buffered = BytesIO() - img.save(buffered, format="JPEG") - encoded_string = base64.b64encode(buffered.getvalue()).decode("utf-8") - return encoded_string - - img = Image.open("path_to_your_image.jpg") - base64_img = encode_image(img) - - payload = { - "messages": [ - { - "role": "user", - "content": [ - {"type": "text", "text": "What is this image?"}, - { - "type": "image_url", - "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"}, - }, - ], - } - ], - ... # other parameters - } - - ``` - - #### Frequently Asked Questions - - ##### What types of images are supported by Molmo-72b? - - Bitmap (or raster) image formats, meaning storing images as grids of individual pixels, are supported: PNG, JPEG, WEBP, and non-animated GIFs in particular. - - Vector image formats (SVG, PSD) are not supported. - - ##### Are other file types supported? - Only bitmaps can be analyzed by Molmo. PDFs and videos are not supported. + + Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. + - ##### Is there a limit to the size of each image? - The only limitation is the context window (1 token for each 16x16 pixel). +- Structured output supported: Yes +- Function calling: No +- Supported languages: English, French, German, Spanish (to be verified) - ##### What is the maximum amount of images per conversation? 
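+
+Note that one conversation can handle a maximum of 1 image per request; sending more than one image returns a 400 error.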
- One conversation can handle a maximum of 1 image (per request). Sending more than one image will return a 400 error. +#### Model name - +``` +allenai/molmo-72b-0924:fp8 +``` ## Code models - - ### Qwen2.5-coder-32b-instruct - Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages. - With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning. +Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages. +With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning. - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [Qwen](https://qwenlm.github.io/) | - | License | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) | - | Compatible Instances | H100, H100-2 (INT8) | - | Context Length | up to 32k tokens | +- Structured output supported: Yes +- Function calling: Yes +- Supported languages: over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic. - #### Model names +#### Model name - ```bash - qwen/qwen2.5-coder-32b-instruct:int8 ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | H100 | 32k (INT8) - | H100-2 | 32k (INT8) - - #### Sending Managed Inference requests - - To perform inference tasks with your Qwen2.5-coder deployed at Scaleway, use the following command: - - ```bash - curl -s \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - --request POST \ - --url "https://.ifr.fr-par.scaleway.com/v1/chat/completions" \ - --data '{"model":"qwen/qwen2.5-coder-32b-instruct:int8", "messages":[{"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful code assistant."},{"role": "user","content": "Write a quick sort algorithm."}], "max_tokens": 1000, "temperature": 0.8, "stream": false}' + qwen/qwen2.5-coder-32b-instruct:int8 ``` - - The model name allows Scaleway to put your prompts in the expected format. - - - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - - - - ## Embeddings models - - ### Sentence-t5-xxl - The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture. - Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information. - This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework. It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal. - - | Attribute | Details | - |-----------------|------------------------------------| - | Provider | [sentence-transformers](https://www.sbert.net/) | - | Compatible Instances | L4 (FP32) | - | Context size | 512 tokens | - - #### Model name +The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture. 
+Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information. +This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework. It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal. - ```bash - sentence-transformers/sentence-t5-xxl:fp32 - ``` - - #### Compatible Instances - - | Instance type | Max context length | - | ------------- |-------------| - | L4 | 512 (FP32) | - - #### Sending Managed Inference requests - - To perform inference tasks with your Embedding model deployed at Scaleway, use the following command: - - ```bash - curl https://.ifr.fr-par.scaleway.com/v1/embeddings \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - -d '{ - "input": "Embeddings can represent text in a numerical format.", - "model": "sentence-transformers/sentence-t5-xxl:fp32" - }' - ``` +- Structured output supported: No +- Function calling: No +- Supported languages: English (to be verified) - Make sure to replace `` and `` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. +#### Model name - \ No newline at end of file +``` +sentence-transformers/sentence-t5-xxl:fp32 +``` \ No newline at end of file From 9a6419f870ef253e4337f14d6005215917d77d4c Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Thu, 17 Apr 2025 11:10:16 +0200 Subject: [PATCH 04/16] docs(infr): add table --- .../reference-content/models.mdx | 24 ++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/pages/managed-inference/reference-content/models.mdx b/pages/managed-inference/reference-content/models.mdx index 4324a44bde..93dd58b49d 100644 --- a/pages/managed-inference/reference-content/models.mdx +++ b/pages/managed-inference/reference-content/models.mdx @@ -17,7 +17,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib ## Summary table -| Model Name | Provider | Context Size | Modalities | Instances | License | +| Model Name | Provider | Context Length | Modalities | Instances | License | |------------|----------|--------------|------------|-----------|---------| | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 | | [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community | @@ -38,6 +38,28 @@ A quick overview of available models in Scaleway's catalog and their core attrib | [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | Qianwen License | | [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 | + +| Model Name | Structured output supported | Function calling | Supported languages | +| --- | --- | --- | --- | +| `Mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Spanish | +| `Llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | +| `Llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | +| `Llama-3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | +| `Llama-3.3-70b-instruct` | Yes 
| Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | +| `Llama-3.1-Nemotron-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) | +| `DeepSeek-R1-Distill-Llama-70B` | Yes | Yes | English, Simplified Chinese | +| `DeepSeek-R1-Distill-Llama-8B` | Yes | Yes | English, Simplified Chinese | +| `Mistral-7b-instruct-v0.3` | Yes | Yes | English | +| `Mistral-small-24b-base-2501` | Yes | Yes | English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish | +| `Mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi | +| `Moshiko-0.1-8b` | No | No | English | +| `Moshika-0.1-8b` | No | No | English | +| `WizardLM-70B-v1.0` | Yes | No | English (to be verified) | +| `Pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) | +| `Molmo-72b-0924` | Yes | No | English, French, German, Spanish (to be verified) | +| `Qwen2.5-coder-32b-instruct` | Yes | Yes | Over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic | +| `Sentence-t5-xxl` | No | No | English (to be verified) | + ## Model details From 236bc4189e77e0828b08ab81014fead9fb11391d Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Thu, 17 Apr 2025 13:45:26 +0200 Subject: [PATCH 05/16] docs(infr): test --- .../reference-content/models2.mdx | 366 ++++++++++++++++++ 1 file changed, 366 insertions(+) create mode 100644 pages/managed-inference/reference-content/models2.mdx diff --git a/pages/managed-inference/reference-content/models2.mdx b/pages/managed-inference/reference-content/models2.mdx new file mode 100644 index 0000000000..1eb2e0bc7e --- /dev/null +++ b/pages/managed-inference/reference-content/models2.mdx @@ -0,0 +1,366 @@ +--- +meta: + title: Managed Inference model catalog + description: Deploy your own model with Scaleway Managed Inference. Privacy-focused, fully managed. +content: + h1: Managed Inference model catalog + paragraph: This page provides information on the Scaleway Managed Inference product catalog +tags: +dates: + validation: 2025-03-19 + posted: 2024-05-28 +categories: + - ai-data +--- +A quick overview of available models in Scaleway's catalog and their core attributes. Expand any model below to see usage examples, curl commands, and detailed capabilities. 
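+
+Most models in this catalog are served behind an OpenAI-compatible `/v1/chat/completions` endpoint on the deployment (Moshi uses a WebSocket API, and embeddings models use `/v1/embeddings`). A minimal request is sketched below; `<Deployment UUID>` and `<IAM API key>` are placeholders, and the model name shown is one example from the tables that follow:
+
+```bash
+curl -s \
+-H "Authorization: Bearer <IAM API key>" \
+-H "Content-Type: application/json" \
+--request POST \
+--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
+--data '{"model":"meta/llama-3.1-8b-instruct:fp8", "messages":[{"role": "user","content": "Hello!"}], "max_tokens": 200, "stream": false}'
+```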
+
+## Summary table
+| Model Name | Provider | Context Length | Modalities | Instances | License |
+|------------|----------|--------------|------------|-----------|---------|
+| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 |
+| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community |
+| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3 community |
+| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | Llama 3 community |
+| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3 community |
+| [`llama-3.1-nemotron-70b-instruct`](#llama-31-nemotron-70b-instruct) | Nvidia | up to 128k tokens | Text | H100, H100-2 | Llama 3.1 community |
+| [`deepseek-r1-distill-llama-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | up to 131k tokens | Text | H100, H100-2 | MIT |
+| [`deepseek-r1-distill-llama-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | MIT |
+| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k tokens | Text | L4, L40S, H100, H100-2 | Apache 2.0 |
+| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k tokens | Text | L40S, H100, H100-2 | Apache 2.0 |
+| [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k tokens | Text | L40S, H100, H100-2 | Apache 2.0 |
+| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 |
+| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 |
+| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | Llama 2 community |
+| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 |
+| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k tokens | Multimodal | H100-2 | Apache 2.0 |
+| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k tokens | Code | H100, H100-2 | Apache 2.0 |
+| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 |
+
+| Model Name | Structured output supported | Function calling | Supported languages |
+| --- | --- | --- | --- |
+| `Mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Spanish |
+| `Llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+| `Llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+| `Llama-3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+| `Llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+| `Llama-3.1-Nemotron-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) |
+| `DeepSeek-R1-Distill-Llama-70B` | Yes | Yes | English, Simplified Chinese |
+| `DeepSeek-R1-Distill-Llama-8B` | Yes | Yes | English, Simplified Chinese |
+| `Mistral-7b-instruct-v0.3` | Yes | Yes | English |
+| `Mistral-small-24b-base-2501` | Yes | Yes | English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish |
+| `Mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi |
+| `Moshiko-0.1-8b` | No | No | English |
+| `Moshika-0.1-8b` | No | No | English |
+| `WizardLM-70B-v1.0` | Yes | No | English (to be verified) |
+| `Pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) |
+| `Molmo-72b-0924` | Yes | No | English, French, German, Spanish (to be verified) |
+| `Qwen2.5-coder-32b-instruct` | Yes | Yes | Over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic |
+| `Sentence-t5-xxl` | No | No | English (to be verified) |
+
+## Model details
+
+ Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
+
+## Text models
+
+### Mixtral-8x7b-instruct-v0.1
+Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants.
+Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | No |
+| Supported languages | English, French, German, Spanish |
+
+#### Model names
+```
+mistral/mixtral-8x7b-instruct-v0.1:fp8
+mistral/mixtral-8x7b-instruct-v0.1:bf16
+```
+
+### Llama-3.1-70b-instruct
+Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
+Llama 3.1 was designed to match the best proprietary models and to outperform many of the available open-source models on common industry benchmarks.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+
+#### Model names
+```
+meta/llama-3.1-70b-instruct:fp8
+meta/llama-3.1-70b-instruct:bf16
+```
+
+### Llama-3.1-8b-instruct
+Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
+Llama 3.1 was designed to match the best proprietary models and to outperform many of the available open-source models on common industry benchmarks.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+
+#### Model names
+```
+meta/llama-3.1-8b-instruct:fp8
+meta/llama-3.1-8b-instruct:bf16
+```
+
+### Llama-3-70b-instruct
+Meta’s Llama 3 is an iteration of the open-access Llama family.
+Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility, while responsibly spearheading the deployment of LLMs.
+With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries in reasoning and coding capabilities.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+
+#### Model name
+```
+meta/llama-3-70b-instruct:fp8
+```
+
+### Llama-3.3-70b-instruct
+Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model.
+This model is still text-only (text in/text out).
However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
+
+#### Model name
+```
+meta/llama-3.3-70b-instruct:bf16
+```
+
+### Llama-3.1-Nemotron-70b-instruct
+Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions.
+NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) |
+
+#### Model name
+```
+meta/llama-3.1-nemotron-70b-instruct:fp8
+```
+
+### DeepSeek-R1-Distill-Llama-70B
+Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1.
+DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English, Simplified Chinese |
+
+#### Model name
+```
+deepseek/deepseek-r1-distill-llama-70b:bf16
+```
+
+### DeepSeek-R1-Distill-Llama-8B
+Released January 21, 2025, Deepseek’s R1 Distilled Llama 8B is a distilled version of the Llama model family based on Deepseek R1.
+DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English, Simplified Chinese |
+
+#### Model name
+```
+deepseek/deepseek-r1-distill-llama-8b:bf16
+```
+
+### Mistral-7b-instruct-v0.3
+The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of its release, it matched the capabilities of models up to 30B parameters.
+This model is open-weight and distributed under the Apache 2.0 license.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | English |
+
+#### Model name
+```
+mistral/mistral-7b-instruct-v0.3:bf16
+```
+
+### Mistral-small-24b-base-2501
+Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral.
+This model is open-weight and distributed under the Apache 2.0 license.
+
+| Attribute | Value |
+|-----------|-------|
+| Structured output supported | Yes |
+| Function calling | Yes |
+| Supported languages | Dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish |
+
+#### Model name
+```
+mistral/mistral-small-24b-instruct-2501:fp8
+```
+
+### Mistral-nemo-instruct-2407
+Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA.
+This model is open-weight and distributed under the Apache 2.0 license.
+It was trained on a large proportion of multilingual and code data.
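+
+Unlike previous Mistral models, Mistral Nemo requires lower sampling temperatures; a temperature of 0.35 is recommended.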
+ +| Attribute | Value | +|-----------|-------| +| Structured output supported | Yes | +| Function calling | Yes | +| Supported languages | English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi | + +#### Model name +``` +mistral/mistral-nemo-instruct-2407:fp8 +``` + +### Moshiko-0.1-8b +Kyutai's Moshi is a speech-text foundation model for real-time dialogue. +Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. +While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. +Moshiko is the variant of Moshi with a male voice in English. + +| Attribute | Value | +|-----------|-------| +| Structured output supported | No | +| Function calling | No | +| Supported languages | English | + +#### Model names +``` +kyutai/moshiko-0.1-8b:bf16 +kyutai/moshiko-0.1-8b:fp8 +``` + +### Moshika-0.1-8b +Kyutai's Moshi is a speech-text foundation model for real-time dialogue. +Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. +While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. +Moshika is the variant of Moshi with a female voice in English. + +| Attribute | Value | +|-----------|-------| +| Structured output supported | No | +| Function calling | No | +| Supported languages | English | + +#### Model names +``` +kyutai/moshika-0.1-8b:bf16 +kyutai/moshika-0.1-8b:fp8 +``` + +### WizardLM-70B-V1.0 +WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants. +With its extensive training in diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors. + +| Attribute | Value | +|-----------|-------| +| Structured output supported | Yes | +| Function calling | No | +| Supported languages | English (to be verified) | + +#### Model names +``` +wizardlm/wizardlm-70b-v1.0:fp8 +wizardlm/wizardlm-70b-v1.0:fp16 +``` + +## Multimodal models + +### Pixtral-12b-2409 +Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder. +It can analyze images and offer insights from visual content alongside text. +This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. +Pixtral is open-weight and distributed under the Apache 2.0 license. + +| Attribute | Value | +|-----------|-------| +| Structured output supported | Yes | +| Function calling | No | +| Supported languages | English, French, German, Spanish (to be verified) | + + + Pixtral 12B can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. + + +#### Model name +``` +mistral/pixtral-12b-2409:bf16 +``` + +### Molmo-72b-0924 +Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI. +Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. 
This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. +Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source. +Its base model is Qwen2-72B ([Twonyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)). + +| Attribute | Value | +|-----------|-------| +| Structured output supported | Yes | +| Function calling | No | +| Supported languages | English, French, German, Spanish (to be verified) | + + + Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. + + +#### Model name +``` +allenai/molmo-72b-0924:fp8 +``` + +## Code models + +### Qwen2.5-coder-32b-instruct +Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages. +With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning. + +| Attribute | Value | +|-----------|-------| +| Structured output supported | Yes | +| Function calling | Yes | +| Supported languages | over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic | + +#### Model name +``` +qwen/qwen2.5-coder-32b-instruct:int8 +``` + +## Embeddings models + +### Sentence-t5-xxl +The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture. +Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information. +This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework. It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal. 
+ +| Attribute | Value | +|-----------|-------| +| Structured output supported | No | +| Function calling | No | +| Supported languages | English (to be verified) | + +#### Model name +``` +sentence-transformers/sentence-t5-xxl:fp32 +``` From cbce849bbee4aa4e7bbde92fc04fc4db4f6fbb00 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Fri, 18 Apr 2025 11:30:31 +0200 Subject: [PATCH 06/16] feat(infr): add model catalog --- menu/navigation.json | 104 ++--- .../{models2.mdx => model-catalog.mdx} | 14 +- .../reference-content/models.mdx | 375 ------------------ 3 files changed, 62 insertions(+), 431 deletions(-) rename pages/managed-inference/reference-content/{models2.mdx => model-catalog.mdx} (98%) delete mode 100644 pages/managed-inference/reference-content/models.mdx diff --git a/menu/navigation.json b/menu/navigation.json index 8c65f19497..ef374612b9 100644 --- a/menu/navigation.json +++ b/menu/navigation.json @@ -872,6 +872,10 @@ "label": "Support for function calling in Scaleway Managed Inference", "slug": "function-calling-support" }, + { + "label": "Managed Inference model catalog", + "slug": "model-catalog" + }, { "label": "BGE-Multilingual-Gemma2 model", "slug": "bge-multilingual-gemma2" @@ -3300,7 +3304,7 @@ "slug": "faq" }, { - "items": [ + "items": [ { "label": "Order an InterLink", "slug": "order-interlink" @@ -3310,61 +3314,61 @@ "slug": "complete-provisioning" }, { - "label": "Configure an InterLink", - "slug": "configure-interlink" - }, - { - "label": "Create a routing policy", - "slug": "create-routing-policy" - }, - { - "label": "Delete an InterLink", - "slug": "delete-interlink" - } - ], - "label": "How to", - "slug": "how-to" - }, - { - "items": [ - { - "label": "InterLink API Reference", - "slug": "https://www.scaleway.com/en/developers/api/interlink/" - } - ], - "label": "API/CLI", - "slug": "api-cli" - }, - { - "items": [ - { - "label": "InterLink overview", - "slug": "overview" - }, - { - "label": "InterLink provisioning", - "slug": "provisioning" - }, - { - "label": "Configuring an InterLink", - "slug": "configuring" + "label": "Configure an InterLink", + "slug": "configure-interlink" + }, + { + "label": "Create a routing policy", + "slug": "create-routing-policy" + }, + { + "label": "Delete an InterLink", + "slug": "delete-interlink" + } + ], + "label": "How to", + "slug": "how-to" }, { - "label": "InterLink statuses", - "slug": "statuses" + "items": [ + { + "label": "InterLink API Reference", + "slug": "https://www.scaleway.com/en/developers/api/interlink/" + } + ], + "label": "API/CLI", + "slug": "api-cli" }, { - "label": "Using BGP communities", - "slug": "bgp-communities" + "items": [ + { + "label": "InterLink overview", + "slug": "overview" + }, + { + "label": "InterLink provisioning", + "slug": "provisioning" + }, + { + "label": "Configuring an InterLink", + "slug": "configuring" + }, + { + "label": "InterLink statuses", + "slug": "statuses" + }, + { + "label": "Using BGP communities", + "slug": "bgp-communities" + } + ], + "label": "Additional Content", + "slug": "reference-content" } ], - "label": "Additional Content", - "slug": "reference-content" - } - ], - "label": "InterLink", - "slug": "interlink" - }, + "label": "InterLink", + "slug": "interlink" + }, { "items": [ { diff --git a/pages/managed-inference/reference-content/models2.mdx b/pages/managed-inference/reference-content/model-catalog.mdx similarity index 98% rename from pages/managed-inference/reference-content/models2.mdx rename to pages/managed-inference/reference-content/model-catalog.mdx index 
1eb2e0bc7e..d441bd68b1 100644 --- a/pages/managed-inference/reference-content/models2.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -7,15 +7,16 @@ content: paragraph: This page provides information on the Scaleway Managed Inference product catalog tags: dates: - validation: 2025-03-19 - posted: 2024-05-28 + validation: 2025-04-18 + posted: 2024-04-18 categories: - ai-data --- A quick overview of available models in Scaleway's catalog and their core attributes. Expand any model below to see usage examples, curl commands, and detailed capabilities. -## Summary table -| Model Name | Provider | Context Length | Modalities | Instances | License | +## Models technical summary + +| Model name | Provider | Context length | Modalities | Instances | License | |------------|----------|--------------|------------|-----------|---------| | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 | | [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community | @@ -36,7 +37,8 @@ A quick overview of available models in Scaleway's catalog and their core attrib | [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | Qianwen License | | [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 | -| Model Name | Structured output supported | Function calling | Supported languages | +## Models feature summary +| Model name | Structured output supported | Function calling | Supported languages | | --- | --- | --- | --- | | `Mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Spanish | | `Llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | @@ -151,7 +153,7 @@ NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune t |-----------|-------| | Structured output supported | Yes | | Function calling | Yes | -| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. (to verify) | +| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) | #### Model name ``` diff --git a/pages/managed-inference/reference-content/models.mdx b/pages/managed-inference/reference-content/models.mdx deleted file mode 100644 index 93dd58b49d..0000000000 --- a/pages/managed-inference/reference-content/models.mdx +++ /dev/null @@ -1,375 +0,0 @@ ---- -meta: - title: Managed Inference model catalog - description: Deploy your own model with Scaleway Managed Inference. Privacy-focused, fully managed. -content: - h1: Managed Inference model catalog - paragraph: This page provides information on the Scaleway Managed Inference product catalog -tags: -dates: - validation: 2025-03-19 - posted: 2024-05-28 -categories: - - ai-data ---- - -A quick overview of available models in Scaleway's catalog and their core attributes. Expand any model below to see usage examples, curl commands, and detailed capabilities. 
- -## Summary table - -| Model Name | Provider | Context Length | Modalities | Instances | License | -|------------|----------|--------------|------------|-----------|---------| -| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 | -| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community | -| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3 community | -| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | Llama 3 community | -| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3 community | -| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | up to 128k tokens | Text | H100, H100-2 |Lllama 3.3 community | -| [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | up to 131k tokens | Text | H100, H100-2 | MIT | -| [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | Apache 2.0 | -| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k tokens | Text | L4, L40S, H100, H100-1 | Apache 2.0 | -| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k tokens | Text | L40S, H100, H100-2 | Apache 2.0 | -| [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | Apache 2.0 | -| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 | -| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 | -| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | Lllama 2 community | -| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 | -| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Multimodal | H100-2 | Apache 2.0 | -| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | Qianwen License | -| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 | - - -| Model Name | Structured output supported | Function calling | Supported languages | -| --- | --- | --- | --- | -| `Mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Spanish | -| `Llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3.1-Nemotron-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) | -| `DeepSeek-R1-Distill-Llama-70B` | Yes | Yes | English, Simplified Chinese | -| `DeepSeek-R1-Distill-Llama-8B` | Yes | Yes | English, Simplified Chinese | -| `Mistral-7b-instruct-v0.3` | Yes | Yes | English | -| `Mistral-small-24b-base-2501` | Yes | Yes | English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish | -| `Mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, 
Chinese, Japanese, Korean, Arabic, and Hindi | -| `Moshiko-0.1-8b` | No | No | English | -| `Moshika-0.1-8b` | No | No | English | -| `WizardLM-70B-v1.0` | Yes | No | English (to be verified) | -| `Pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) | -| `Molmo-72b-0924` | Yes | No | English, French, German, Spanish (to be verified) | -| `Qwen2.5-coder-32b-instruct` | Yes | Yes | Over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic | -| `Sentence-t5-xxl` | No | No | English (to be verified) | - -## Model details - - - Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently. - - -## Text models - -### Mixtral-8x7b-instruct-v0.1 - -Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants. -Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences. - -- Structured output supported: Yes -- Function calling: No -- Supported languages: English, French, German, Spanish - -#### Model names - -``` -mistral/mixtral-8x7b-instruct-v0.1:fp8 -mistral/mixtral-8x7b-instruct-v0.1:bf16 -``` - -### Llama-3.1-70b-instruct - -Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. -Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - -#### Model names - -``` -meta/llama-3.1-70b-instruct:fp8 -meta/llama-3.1-70b-instruct:bf16 -``` - -### Llama-3.1-8b-instruct - -Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. -Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. - - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - -#### Model names - -``` -meta/llama-3.1-8b-instruct:fp8 -meta/llama-3.1-8b-instruct:bf16 -``` - - -### Llama-3-70b-instruct - -Meta’s Llama 3 is an iteration of the open-access Llama family. -Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility and responsibly spearheading the deployment of LLMs. -With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries in reasoning and coding capabilities. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - - #### Model name - - ``` - meta/llama-3-70b-instruct:fp8 - ``` - - -### Llama-3.3-70b-instruct - -Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model. -This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications. 
- -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. - -#### Model name - -``` -meta/llama-3.3-70b-instruct:bf16 -``` - -### Llama-3.1-Nemotron-70b-instruct - -Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions. -NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. (to verify) - -#### Model name - -``` -meta/llama-3.1-nemotron-70b-instruct:fp8 -``` - -### DeepSeek-R1-Distill-Llama-70B - -Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1. -DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use case such as mathematics and coding tasks. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, Simplified Chinese -#### Model name - -``` -deepseek/deepseek-r1-distill-llama-70b:bf16 -``` - -### DeepSeek-R1-Distill-Llama-8B - -Released January 21, 2025, Deepseek’s R1 Distilled Llama 8B is a distilled version of the Llama model family based on Deepseek R1. -DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks. - - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, Simplified Chinese - -#### Model names - -``` -deepseek/deepseek-r1-distill-llama-8b:bf16 -``` - -### Mistral-7b-instruct-v0.3 - -The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. -This model is open-weight and distributed under the Apache 2.0 license. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English - -#### Model name - -``` -mistral/mistral-7b-instruct-v0.3:bf16 -``` - -### Mistral-small-24b-base-2501 - -Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral. -This model is open-weight and distributed under the Apache 2.0 license. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish. - -#### Model name - -``` -mistral/mistral-small-24b-instruct-2501:fp8 -``` - -### Mistral-nemo-instruct-2407 - -Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA. -This model is open-weight and distributed under the Apache 2.0 license. -It was trained on a large proportion of multilingual and code data. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. - -#### Model name - -``` -mistral/mistral-nemo-instruct-2407:fp8 -``` - -### Moshiko-0.1-8b - -Kyutai's Moshi is a speech-text foundation model for real-time dialogue. 
-Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. -While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. -Moshiko is the variant of Moshi with a male voice in English. - -- Structured output supported: No -- Function calling: No -- Supported languages: English - -#### Model names - -``` -kyutai/moshiko-0.1-8b:bf16 -kyutai/moshiko-0.1-8b:fp8 -``` - -### Moshika-0.1-8b - -Kyutai's Moshi is a speech-text foundation model for real-time dialogue. -Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. -While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. -Moshika is the variant of Moshi with a female voice in English. - -- Structured output supported: No -- Function calling: No -- Supported languages: English - -#### Model names - -``` -kyutai/moshika-0.1-8b:bf16 -kyutai/moshika-0.1-8b:fp8 -``` - -### WizardLM-70B-V1.0 - -WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants. -With its extensive training in diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors. - -- Structured output supported: Yes -- Function calling: No -- Supported languages: English (to be verified) - -#### Model names - -``` -wizardlm/wizardlm-70b-v1.0:fp8 -wizardlm/wizardlm-70b-v1.0:fp16 -``` - -## Multimodal models - -### Pixtral-12b-2409 - -Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder. -It can analyze images and offer insights from visual content alongside text. -This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. - -Pixtral is open-weight and distributed under the Apache 2.0 license. - - - Pixtral 12B can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. - - -- Structured output supported: Yes -- Function calling: No -- Supported languages: English, French, German, Spanish (to be verified) - - #### Model name - - ``` - mistral/pixtral-12b-2409:bf16 - ``` - -### Molmo-72b-0924 - -Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI. -Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. - -Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source. -Its base model is Qwen2-72B ([Twonyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)). - - - Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. 
- - -- Structured output supported: Yes -- Function calling: No -- Supported languages: English, French, German, Spanish (to be verified) - -#### Model name - -``` -allenai/molmo-72b-0924:fp8 -``` - -## Code models - -### Qwen2.5-coder-32b-instruct - -Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages. -With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning. - -- Structured output supported: Yes -- Function calling: Yes -- Supported languages: over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic. - -#### Model name - - ``` - qwen/qwen2.5-coder-32b-instruct:int8 - ``` - -## Embeddings models - -### Sentence-t5-xxl - -The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture. -Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information. -This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework. It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal. - -- Structured output supported: No -- Function calling: No -- Supported languages: English (to be verified) - -#### Model name - -``` -sentence-transformers/sentence-t5-xxl:fp32 -``` \ No newline at end of file From 83cc5923ec84adb6c31546ea64103485b12c2e79 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Tue, 22 Apr 2025 09:44:08 +0200 Subject: [PATCH 07/16] Apply suggestions from code review Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com> --- pages/managed-inference/reference-content/model-catalog.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index d441bd68b1..2fe69e5a6c 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -84,7 +84,7 @@ mistral/mixtral-8x7b-instruct-v0.1:bf16 ### Llama-3.1-70b-instruct Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. -Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. +Llama 3.1 was designed to match the best proprietary models and outperform many of the available open source on common industry benchmarks. | Attribute | Value | |-----------|-------| @@ -100,7 +100,7 @@ meta/llama-3.1-70b-instruct:bf16 ### Llama-3.1-8b-instruct Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. -Llama 3.1 was designed to match the best proprietary models, outperform many of the available open source on common industry benchmarks. +Llama 3.1 was designed to match the best proprietary models and outperform many of the available open source on common industry benchmarks. 
| Attribute | Value | |-----------|-------| @@ -162,7 +162,7 @@ meta/llama-3.1-nemotron-70b-instruct:fp8 ### DeepSeek-R1-Distill-Llama-70B Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1. -DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use case such as mathematics and coding tasks. +DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks. | Attribute | Value | |-----------|-------| From 904cdc5308b1983889b4876d632c69beadc942c7 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Tue, 22 Apr 2025 09:59:31 +0200 Subject: [PATCH 08/16] fix(infr): fix typos --- pages/managed-inference/reference-content/model-catalog.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index 2fe69e5a6c..0592b1595f 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -19,7 +19,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib | Model name | Provider | Context length | Modalities | Instances | License | |------------|----------|--------------|------------|-----------|---------| | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 | -| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 32k tokens | Text | H100, H100-2 | Llama 3 community | +| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | up to 128k tokens | Text | H100, H100-2 | Llama 3 community | | [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3 community | | [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | Llama 3 community | | [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3 community | @@ -34,7 +34,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib | [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | Lllama 2 community | | [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 | | [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Multimodal | H100-2 | Apache 2.0 | -| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | Qianwen License | +| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | Apache 2.0 | | [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 | ## Models feature summary From 2dfb5fe96b8eef2b1df79cb7ac3bbe81d0c4340d Mon Sep 17 00:00:00 2001 From: fpagny Date: Tue, 22 Apr 2025 11:00:20 +0200 Subject: [PATCH 09/16] fix(inference): supported languages --- .../reference-content/model-catalog.mdx | 36 +++++++++---------- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index 0592b1595f..ff2a49d76c 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ 
b/pages/managed-inference/reference-content/model-catalog.mdx @@ -40,24 +40,24 @@ A quick overview of available models in Scaleway's catalog and their core attrib ## Models feature summary | Model name | Structured output supported | Function calling | Supported languages | | --- | --- | --- | --- | -| `Mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Spanish | -| `Llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `Llama-3.1-Nemotron-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) | -| `DeepSeek-R1-Distill-Llama-70B` | Yes | Yes | English, Simplified Chinese | -| `DeepSeek-R1-Distill-Llama-8B` | Yes | Yes | English, Simplified Chinese | -| `Mistral-7b-instruct-v0.3` | Yes | Yes | English | -| `Mistral-small-24b-base-2501` | Yes | Yes | English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish | -| `Mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi | -| `Moshiko-0.1-8b` | No | No | English | -| `Moshika-0.1-8b` | No | No | English | -| `WizardLM-70B-v1.0` | Yes | No | English (to be verified) | -| `Pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) | -| `Molmo-72b-0924` | Yes | No | English, French, German, Spanish (to be verified) | -| `Qwen2.5-coder-32b-instruct` | Yes | Yes | Over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic | -| `Sentence-t5-xxl` | No | No | English (to be verified) | +| `mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Italian, Spanish | +| `llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | +| `llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | +| `llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | +| `llama-3.1-Nemotron-70b-instruct` | Yes | Yes | English | +| `llama-3-70b-instruct` | Yes | Yes | English | +| `deepseek-r1-distill-llama-70B` | Yes | Yes | English, Chinese | +| `deepseek-r1-distill-llama-8B` | Yes | Yes | English, Chinese | +| `mistral-7b-instruct-v0.3` | Yes | Yes | English | +| `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi | +| `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese | +| `moshiko-0.1-8b` | No | No | English | +| `moshika-0.1-8b` | No | No | English | +| `wizardLM-70b-v1.0` | Yes | No | English | +| `pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) | +| `molmo-72b-0924` | Yes | No | English | +| `qwen2.5-coder-32b-instruct` | Yes | Yes | Over 29 languages, including Chinese, English, 
French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic | +| `sentence-t5-xxl` | No | No | English | ## Model details From ccda4e577f957f107043f0f8c45aaf4de748f15d Mon Sep 17 00:00:00 2001 From: fpagny Date: Tue, 22 Apr 2025 17:28:05 +0200 Subject: [PATCH 10/16] fix(inference): update licenses --- .../reference-content/model-catalog.mdx | 47 ++++++++++--------- 1 file changed, 26 insertions(+), 21 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index ff2a49d76c..eb0ff42f6e 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -16,31 +16,33 @@ A quick overview of available models in Scaleway's catalog and their core attrib ## Models technical summary -| Model name | Provider | Context length | Modalities | Instances | License | +| Model name | Provider | Context length (tokens) | Modalities | Instances | License | |------------|----------|--------------|------------|-----------|---------| -| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 | -| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | up to 128k tokens | Text | H100, H100-2 | Llama 3 community | -| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3 community | -| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | Llama 3 community | -| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3 community | -| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | up to 128k tokens | Text | H100, H100-2 | Lllama 3.3 community | -| [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | up to 131k tokens | Text | H100, H100-2 | MIT | -| [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | Apache 2.0 | -| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k tokens | Text | L4, L40S, H100, H100-1 | Apache 2.0 | -| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k tokens | Text | L40S, H100, H100-2 | Apache 2.0 | -| [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | Apache 2.0 | -| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 | -| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 | -| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | Lllama 2 community | -| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 | -| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Multimodal | H100-2 | Apache 2.0 | -| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | Apache 2.0 | -| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 | +| [`gemma-3-27b-it`](#gemma-3-27b-it) | Google | 32k | Text | H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) | +| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | up to 128k tokens | Text | H100, H100-2 | [Llama 3.1 
community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) | +| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | +| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | [Llama 3 community](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE) | +| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | [Llama 3.3 community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | +| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | up to 128k tokens | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) | +| [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | up to 131k tokens | Text | H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) | +| [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | +| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k tokens | Text | L4, L40S, H100, H100-1 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k tokens | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | +| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | +| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | [Llama 2 community](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/LICENSE.txt) | +| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Multimodal | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`bge-multilingual-gemma2`](#bge-multilingual-gemma2) | No | No | [Gemma](https://ai.google.dev/gemma/terms) | +| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | ## Models feature summary | Model name | Structured output 
supported | Function calling | Supported languages | | --- | --- | --- | --- | -| `mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Italian, Spanish | +| `gemma-3-27b-it` | Yes | Partial | English, Chinese, Japanese, Korean and 31 additional languages | | `llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | | `llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | | `llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | @@ -51,14 +53,17 @@ A quick overview of available models in Scaleway's catalog and their core attrib | `mistral-7b-instruct-v0.3` | Yes | Yes | English | | `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi | | `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese | +| `mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Italian, Spanish | | `moshiko-0.1-8b` | No | No | English | | `moshika-0.1-8b` | No | No | English | | `wizardLM-70b-v1.0` | Yes | No | English | | `pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) | | `molmo-72b-0924` | Yes | No | English | -| `qwen2.5-coder-32b-instruct` | Yes | Yes | Over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic | +| `qwen2.5-coder-32b-instruct` | Yes | Yes | English, French, Spanish, Portuguese, German, Italian, Russian, Chinese, Japanese, Korean, Vietnamese, Thai, Arabic and 16 additional languages. | +| `bge-multilingual-gemma2` | No | No | English, French, Chinese, Japanese, Korean | | `sentence-t5-xxl` | No | No | English | + ## Model details Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently. 
From a165bd1d832559da683e0e4d2a02e73772c62831 Mon Sep 17 00:00:00 2001 From: fpagny Date: Tue, 22 Apr 2025 17:37:12 +0200 Subject: [PATCH 11/16] fix(inference): models supported features --- pages/managed-inference/reference-content/model-catalog.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index eb0ff42f6e..bfa5716d17 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -53,11 +53,11 @@ A quick overview of available models in Scaleway's catalog and their core attrib | `mistral-7b-instruct-v0.3` | Yes | Yes | English | | `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi | | `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese | -| `mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Italian, Spanish | +| `mixtral-8x7b-instruct-v0.1` | Yes | Yes | English, French, German, Italian, Spanish | | `moshiko-0.1-8b` | No | No | English | | `moshika-0.1-8b` | No | No | English | | `wizardLM-70b-v1.0` | Yes | No | English | -| `pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) | +| `pixtral-12b-2409` | Yes | Yes | English | | `molmo-72b-0924` | Yes | No | English | | `qwen2.5-coder-32b-instruct` | Yes | Yes | English, French, Spanish, Portuguese, German, Italian, Russian, Chinese, Japanese, Korean, Vietnamese, Thai, Arabic and 16 additional languages. 
| | `bge-multilingual-gemma2` | No | No | English, French, Chinese, Japanese, Korean | From 041af5f89230dfce66dc231b2072e0ff225398ce Mon Sep 17 00:00:00 2001 From: fpagny Date: Wed, 23 Apr 2025 12:06:28 +0200 Subject: [PATCH 12/16] fix(inference): update context length and tasks --- .../reference-content/model-catalog.mdx | 40 +++++++++---------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index bfa5716d17..fc8132bbc0 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -16,28 +16,28 @@ A quick overview of available models in Scaleway's catalog and their core attrib ## Models technical summary -| Model name | Provider | Context length (tokens) | Modalities | Instances | License | +| Model name | Provider | Maximum Context length (tokens) | Modalities | Instances | License | |------------|----------|--------------|------------|-----------|---------| -| [`gemma-3-27b-it`](#gemma-3-27b-it) | Google | 32k | Text | H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) | -| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | up to 128k tokens | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) | -| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | -| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | [Llama 3 community](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE) | -| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | [Llama 3.3 community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | -| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | up to 128k tokens | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) | -| [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | up to 131k tokens | Text | H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) | -| [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | -| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k tokens | Text | L4, L40S, H100, H100-1 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k tokens | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`gemma-3-27b-it`](#gemma-3-27b-it) | Google | 40k | Text, Vision | H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) | +| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | 128k | Text | H100, H100-2 | [Llama 3.3 community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | +| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta 
| 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) | +| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | 128k | Text | L4, L40S, H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | +| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k | Text | H100, H100-2 | [Llama 3 community](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE) | +| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) | +| [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | 128k | Text | H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) | +| [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | 128k | Text | L4, L40S, H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | +| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k | Text | L4, L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | -| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | -| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | [Llama 2 community](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/LICENSE.txt) | -| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Multimodal | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k | Code | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`bge-multilingual-gemma2`](#bge-multilingual-gemma2) | No | No | [Gemma](https://ai.google.dev/gemma/terms) | -| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4k | Text | L4, H100 | 
[CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | +| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4k | Text | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | +| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4k | Text | H100, H100-2 | [Llama 2 community](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/LICENSE.txt) | +| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k | Text, Vision | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Text, Vision | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | 32k | Code | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`bge-multilingual-gemma2`](#bge-multilingual-gemma2) | BAAI | 4k | Embeddings | L4, L40S, H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) | +| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 | Embeddings | L4 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | ## Models feature summary | Model name | Structured output supported | Function calling | Supported languages | From f90636d5e8a2d515730b569fbedde7784f065d0e Mon Sep 17 00:00:00 2001 From: fpagny Date: Wed, 23 Apr 2025 12:07:48 +0200 Subject: [PATCH 13/16] fix(inference): update task descriptions --- pages/managed-inference/reference-content/model-catalog.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index fc8132bbc0..c2ffb335ed 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -30,8 +30,8 @@ A quick overview of available models in Scaleway's catalog and their core attrib | [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4k | Text | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | -| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4k | Text | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | +| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4k | Audio to Audio | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | +| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4k | Audio to Audio| L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | | [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4k | Text | H100, H100-2 | [Llama 2 community](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/LICENSE.txt) | | [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k | Text, Vision | L40S, H100, H100-2 | [Apache 
2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Text, Vision | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |

From bd78dfa95a61cdc9228ceebd7cab356177c57877 Mon Sep 17 00:00:00 2001
From: fpagny
Date: Wed, 23 Apr 2025 13:38:50 +0200
Subject: [PATCH 14/16] feat(inference): add gemma and mistral small
 characteristics

---
 .../reference-content/model-catalog.mdx | 28 ++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx
index c2ffb335ed..f987912735 100644
--- a/pages/managed-inference/reference-content/model-catalog.mdx
+++ b/pages/managed-inference/reference-content/model-catalog.mdx
@@ -27,7 +27,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | 128k | Text | H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) |
 | [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | 128k | Text | L4, L40S, H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) |
 | [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k | Text | L4, L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
-| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-base-2501) | Mistral | 32k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
+| [`mistral-small-3.1-24b-instruct-2503 3`](#mistral-small-3.1-24b-instruct-2503) | Mistral | 128k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4k | Audio to Audio | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) |
@@ -69,18 +69,32 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
 
+## Multimodal models (Text and Vision)
+
+### Gemma-3-27b-it
+Gemma-3-27b-it is a model developed by Google to perform text processing and image analysis in many languages.
+The model was not specifically trained to output function/tool call tokens. Function calling is therefore supported, but its reliability remains limited.
+
+#### Model names
+```
+google/gemma-3-27b-it:bf16
+```
+
+### Mistral-small-3.1-24b-instruct-2503
+Mistral-small-3.1-24b-instruct-2503 is a model developed by Mistral to perform text processing and image analysis in many languages.
+This model is optimized for dense knowledge and fast token throughput relative to its size. 
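+
+As a quick sketch of how a vision request to such a deployment might look (assuming an existing Managed Inference deployment of this model; `$DEPLOYMENT_UUID` and `$SCW_SECRET_KEY` are placeholders for your deployment ID and IAM secret key, and the image URL is hypothetical):
+
+```bash
+# Hypothetical request: one image plus a text prompt, sent to the
+# OpenAI-compatible chat completions endpoint of the deployment.
+curl -s \
+-H "Authorization: Bearer $SCW_SECRET_KEY" \
+-H "Content-Type: application/json" \
+--request POST \
+--url "https://$DEPLOYMENT_UUID.ifr.fr-par.scaleway.com/v1/chat/completions" \
+--data '{
+  "model": "mistral/mistral-small-3.1-24b-instruct-2503:bf16",
+  "messages": [{
+    "role": "user",
+    "content": [
+      {"type": "text", "text": "Describe this image"},
+      {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
+    ]
+  }],
+  "max_tokens": 200
+}'
+```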
+
+#### Model names
+```
+mistral/mistral-small-3.1-24b-instruct-2503:bf16
+```
+
 ## Text models
 
 ### Mixtral-8x7b-instruct-v0.1
 Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants.
 Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | No |
-| Supported languages | English, French, German, Spanish |
-
 #### Model names
 ```
 mistral/mixtral-8x7b-instruct-v0.1:fp8

From 9d9b01942a18cbac1add1ad6716ee74465f6ee90 Mon Sep 17 00:00:00 2001
From: fpagny
Date: Wed, 23 Apr 2025 14:28:55 +0200
Subject: [PATCH 15/16] feat(inference): restructure model catalog

---
 .../reference-content/model-catalog.mdx | 208 ++++++------------
 1 file changed, 63 insertions(+), 145 deletions(-)

diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx
index f987912735..33d8aba0bf 100644
--- a/pages/managed-inference/reference-content/model-catalog.mdx
+++ b/pages/managed-inference/reference-content/model-catalog.mdx
@@ -23,7 +23,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) |
 | [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | 128k | Text | L4, L40S, H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) |
 | [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k | Text | H100, H100-2 | [Llama 3 community](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE) |
-| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) |
+| [`llama-3.1-nemotron-70b-instruct`](#llama-31-nemotron-70b-instruct) | Nvidia | 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) |
 | [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | 128k | Text | H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) |
 | [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | 128k | Text | L4, L40S, H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) |
 | [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k | Text | L4, L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
@@ -34,7 +34,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4k | Audio to Audio | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) |
 | [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4k | Text | H100, H100-2 | [Llama 2 community](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/LICENSE.txt) |
 | 
[`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k | Text, Vision | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
-| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Text, Vision | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
+| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Text, Vision | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) and [Tongyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE) |
 | [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | 32k | Code | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`bge-multilingual-gemma2`](#bge-multilingual-gemma2) | BAAI | 4k | Embeddings | L4, L40S, H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) |
 | [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 | Embeddings | L4 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
@@ -69,6 +71,10 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 ## Multimodal models (Text and Vision)
 
+
+ Vision models can understand and analyze images, not generate them. You use them through the /v1/chat/completions endpoint.
+
+
 ### Gemma-3-27b-it
 Gemma-3-27b-it is a model developed by Google to perform text processing and image analysis in many languages.
 The model was not specifically trained to output function/tool call tokens. Function calling is therefore supported, but its reliability remains limited.
@@ -89,28 +93,42 @@ This model is optimized for dense knowledge and fast token throughput
 mistral/mistral-small-3.1-24b-instruct-2503:bf16
 ```
 
+### Pixtral-12b-2409
+Pixtral is a vision language model introducing a novel architecture: a 12B-parameter multimodal decoder plus a 400M-parameter vision encoder.
+It can analyze images and offer insights from visual content alongside text.
+This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
+Pixtral is open-weight and distributed under the Apache 2.0 license.
+
+#### Model name
+```
+mistral/pixtral-12b-2409:bf16
+```
+
+### Molmo-72b-0924
+Molmo 72B is the powerhouse of the Molmo family of multimodal models, developed by the renowned research lab Allen Institute for AI.
+Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
+
+#### Model name
+```
+allenai/molmo-72b-0924:fp8
+```
+
 ## Text models
 
-### Mixtral-8x7b-instruct-v0.1
-Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants.
-Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.
+### Llama-3.3-70b-instruct
+Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model.
+This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications. 
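+
+As a minimal sketch of a text request against such a deployment (assuming an existing Managed Inference deployment of this model; `$DEPLOYMENT_UUID` and `$SCW_SECRET_KEY` are placeholders for your deployment ID and IAM secret key):
+
+```bash
+# Hypothetical request: plain text generation through the
+# OpenAI-compatible chat completions endpoint of the deployment.
+curl -s \
+-H "Authorization: Bearer $SCW_SECRET_KEY" \
+-H "Content-Type: application/json" \
+--request POST \
+--url "https://$DEPLOYMENT_UUID.ifr.fr-par.scaleway.com/v1/chat/completions" \
+--data '{"model":"meta/llama-3.3-70b-instruct:fp8","messages":[{"role":"user","content":"Summarize the difference between fine-tuning and distillation in two sentences"}],"max_tokens":200,"temperature":0.7}'
+```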
-
+#### Model names
+```
+meta/llama-3.3-70b-instruct:fp8
+meta/llama-3.3-70b-instruct:bf16
+```
 
 ### Llama-3.1-70b-instruct
 Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
 Llama 3.1 was designed to match the best proprietary models and outperform many of the available open-source models on common industry benchmarks.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
-
 #### Model names
 ```
 meta/llama-3.1-70b-instruct:fp8
@@ -121,12 +139,6 @@ meta/llama-3.1-70b-instruct:bf16
 Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
 Llama 3.1 was designed to match the best proprietary models and outperform many of the available open-source models on common industry benchmarks.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
-
 #### Model names
 ```
 meta/llama-3.1-8b-instruct:fp8
@@ -138,59 +150,27 @@ meta/llama-3.1-8b-instruct:bf16
 Meta’s Llama 3 is an iteration of the open-access Llama family.
 Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility and responsibly spearheading the deployment of LLMs.
 With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries in reasoning and coding capabilities.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
-
 #### Model name
 ```
 meta/llama-3-70b-instruct:fp8
 ```
 
-### Llama-3.3-70b-instruct
-Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model.
-This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications.
-
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
-
-#### Model name
-```
-meta/llama-3.3-70b-instruct:bf16
-```
-
 ### Llama-3.1-Nemotron-70b-instruct
 Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions.
 NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) |
-
 #### Model name
 ```
-meta/llama-3.1-nemotron-70b-instruct:fp8
+nvidia/llama-3.1-nemotron-70b-instruct:fp8
 ```
 
 ### DeepSeek-R1-Distill-Llama-70B
 Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1. 
DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks. -| Attribute | Value | -|-----------|-------| -| Structured output supported | Yes | -| Function calling | Yes | -| Supported languages | English, Simplified Chinese | - #### Model name ``` +deepseek/deepseek-r1-distill-llama-70b:fp8 deepseek/deepseek-r1-distill-llama-70b:bf16 ``` @@ -198,27 +178,26 @@ deepseek/deepseek-r1-distill-llama-70b:bf16 Released January 21, 2025, Deepseek’s R1 Distilled Llama 8B is a distilled version of the Llama model family based on Deepseek R1. DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks. -| Attribute | Value | -|-----------|-------| -| Structured output supported | Yes | -| Function calling | Yes | -| Supported languages | English, Simplified Chinese | - #### Model names ``` +deepseek/deepseek-r1-distill-llama-8b:fp8 deepseek/deepseek-r1-distill-llama-8b:bf16 ``` +### Mixtral-8x7b-instruct-v0.1 +Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants. +Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences. + +#### Model names +``` +mistral/mixtral-8x7b-instruct-v0.1:fp8 +mistral/mixtral-8x7b-instruct-v0.1:bf16 +``` + ### Mistral-7b-instruct-v0.3 The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. This model is open-weight and distributed under the Apache 2.0 license. -| Attribute | Value | -|-----------|-------| -| Structured output supported | Yes | -| Function calling | Yes | -| Supported languages | English | - #### Model name ``` mistral/mistral-7b-instruct-v0.3:bf16 @@ -228,15 +207,10 @@ mistral/mistral-7b-instruct-v0.3:bf16 Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral. This model is open-weight and distributed under the Apache 2.0 license. -| Attribute | Value | -|-----------|-------| -| Structured output supported | Yes | -| Function calling | Yes | -| Supported languages | Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish | - #### Model name ``` mistral/mistral-small-24b-instruct-2501:fp8 +mistral/mistral-small-24b-instruct-2501:bf16 ``` ### Mistral-nemo-instruct-2407 @@ -244,12 +218,6 @@ Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by This model is open-weight and distributed under the Apache 2.0 license. It was trained on a large proportion of multilingual and code data. -| Attribute | Value | -|-----------|-------| -| Structured output supported | Yes | -| Function calling | Yes | -| Supported languages | English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi | - #### Model name ``` mistral/mistral-nemo-instruct-2407:fp8 @@ -261,12 +229,6 @@ Moshi is an experimental next-generation conversational model, designed to under While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. Moshiko is the variant of Moshi with a male voice in English. 
-| Attribute | Value | -|-----------|-------| -| Structured output supported | No | -| Function calling | No | -| Supported languages | English | - #### Model names ``` kyutai/moshiko-0.1-8b:bf16 @@ -279,12 +241,6 @@ Moshi is an experimental next-generation conversational model, designed to under While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. Moshika is the variant of Moshi with a female voice in English. -| Attribute | Value | -|-----------|-------| -| Structured output supported | No | -| Function calling | No | -| Supported languages | English | - #### Model names ``` kyutai/moshika-0.1-8b:bf16 @@ -295,91 +251,53 @@ kyutai/moshika-0.1-8b:fp8 WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants. With its extensive training in diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors. -| Attribute | Value | -|-----------|-------| -| Structured output supported | Yes | -| Function calling | No | -| Supported languages | English (to be verified) | - #### Model names ``` wizardlm/wizardlm-70b-v1.0:fp8 wizardlm/wizardlm-70b-v1.0:fp16 ``` -## Multimodal models - -### Pixtral-12b-2409 -Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder. -It can analyze images and offer insights from visual content alongside text. -This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. -Pixtral is open-weight and distributed under the Apache 2.0 license. - -| Attribute | Value | -|-----------|-------| -| Structured output supported | Yes | -| Function calling | No | -| Supported languages | English, French, German, Spanish (to be verified) | +## Code models - - Pixtral 12B can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. - +### Qwen2.5-coder-32b-instruct +Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages. +With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning. #### Model name ``` -mistral/pixtral-12b-2409:bf16 +qwen/qwen2.5-coder-32b-instruct:int8 ``` -### Molmo-72b-0924 -Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI. -Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. -Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source. -Its base model is Qwen2-72B ([Twonyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)). +## Embeddings models + +### Bge-multilingual-gemma2 +BGE-Multilingual-Gemma2 tops the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard), scoring the number one spot in French and Polish, and number seven in English (as of Q4 2024). +As its name suggests, the model’s training data spans a broad range of languages, including English, Chinese, Polish, French, and more. 
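+
+As a hedged sketch of how an embedding request might look (assuming the deployment exposes the OpenAI-compatible `/v1/embeddings` endpoint; `$DEPLOYMENT_UUID` and `$SCW_SECRET_KEY` are placeholders for your deployment ID and IAM secret key):
+
+```bash
+# Hypothetical request: embed one sentence and receive a single
+# 3584-dimension vector in the response.
+curl -s \
+-H "Authorization: Bearer $SCW_SECRET_KEY" \
+-H "Content-Type: application/json" \
+--request POST \
+--url "https://$DEPLOYMENT_UUID.ifr.fr-par.scaleway.com/v1/embeddings" \
+--data '{"model":"baai/bge-multilingual-gemma2:fp32","input":"Scaleway Managed Inference keeps inference in your own deployment"}'
+```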
| Attribute | Value |
|-----------|-------|
| Embedding dimensions | 3584 |
| Matryoshka embedding | No |

 [Matryoshka embeddings](https://huggingface.co/blog/matryoshka) are embeddings trained at multiple dimension counts. As a result, the dimensions of the resulting vectors are ordered from most to least meaningful: a 3584-dimension vector can, for example, be truncated to its first 768 dimensions and used directly.

#### Model name
```
baai/bge-multilingual-gemma2:fp32
```

### Sentence-t5-xxl
The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture.
Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information.
This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework.
It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal. 
+ | Attribute | Value | |-----------|-------| -| Structured output supported | No | -| Function calling | No | -| Supported languages | English (to be verified) | +| Embedding dimensions | 768 | +| Matryoshka embedding | No | #### Model name ``` From 228ab0e4011a73b2ea019bf60be41d8d0a574aaf Mon Sep 17 00:00:00 2001 From: fpagny Date: Wed, 23 Apr 2025 14:41:02 +0200 Subject: [PATCH 16/16] fix(inference): fix anchors --- .../managed-inference/reference-content/model-catalog.mdx | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index 33d8aba0bf..3ea11e5d0e 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -27,7 +27,8 @@ A quick overview of available models in Scaleway's catalog and their core attrib | [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | 128k | Text | H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) | | [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | 128k | Text | L4, L40S, H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | | [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k | Text | L4, L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`mistral-small-3.1-24b-instruct-2503 3`](#mistral-small-3.1-24b-instruct-2503) | Mistral | 128k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mistral-small-3.1-24b-instruct-2503`](#mistral-small-31-24b-instruct-2503) | Mistral | 128k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-instruct-2501) | Mistral | 32k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4k | Audio to Audio | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) | @@ -46,12 +47,13 @@ A quick overview of available models in Scaleway's catalog and their core attrib | `llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | | `llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | | `llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai | -| `llama-3.1-Nemotron-70b-instruct` | Yes | Yes | English | | `llama-3-70b-instruct` | Yes | Yes | English | +| `llama-3.1-nemotron-70b-instruct` | Yes | Yes | English | | `deepseek-r1-distill-llama-70B` | Yes | Yes | English, Chinese | | `deepseek-r1-distill-llama-8B` | Yes | Yes | English, Chinese | | 
`mistral-7b-instruct-v0.3` | Yes | Yes | English |
 | `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi |
+| `mistral-small-24b-instruct-2501` | Yes | Yes | English, French, German, Dutch, Spanish, Italian, Polish, Portuguese, Chinese, Japanese, Korean |
 | `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese |
 | `mixtral-8x7b-instruct-v0.1` | Yes | Yes | English, French, German, Italian, Spanish |
 | `moshiko-0.1-8b` | No | No | English |
@@ -203,7 +205,7 @@ mistral/mistral-7b-instruct-v0.3:bf16
 ```
 
-### Mistral-small-24b-base-2501
+### Mistral-small-24b-instruct-2501
 Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral.
 This model is open-weight and distributed under the Apache 2.0 license.