From aab809b8dd8942e8b8aabc360bf105badd774555 Mon Sep 17 00:00:00 2001
From: fpagny
Date: Tue, 8 Apr 2025 17:12:24 +0200
Subject: [PATCH 1/9] feat(inference): add custom models requirements

---
 .../reference-content/supported-models.mdx | 52 +++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 pages/managed-inference/reference-content/supported-models.mdx

diff --git a/pages/managed-inference/reference-content/supported-models.mdx b/pages/managed-inference/reference-content/supported-models.mdx
new file mode 100644
index 0000000000..ac811d2eae
--- /dev/null
+++ b/pages/managed-inference/reference-content/supported-models.mdx
@@ -0,0 +1,52 @@
+---
+meta:
+  title: Supported Models in Managed Inference
+  description: Supported Models in Managed Inference
+content:
+  h1: Supported Models in Managed Inference
+  paragraph: Supported Models in Managed Inference
+tags:
+dates:
+  validation: 2025-04-08
+  posted: 2025-04-08
+categories:
+  - ai-data
+---
+
+## Models supported on Managed Inference
+
+Managed Inference supports multiple AI models, either from:
+- [Scaleway catalog](#scaleway-catalog): A curated model list available in the [Scaleway Console](https://console.scaleway.com/inference/deployments/) or through the [Managed Inference Models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models)
+- [Custom models](#custom-models): Models imported by you from sources such as Hugging Face.
+
+## Scaleway Catalog
+
+### Multimodal models (Chat and Vision)
+
+### Chat models
+
+| Provider | Model string | Documentation | License |
+|-----------------|-----------------|-----------------|-----------------|
+| Meta | `llama-3.3-70b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 Community](https://www.llama.com/llama3_3/license/) |
+| Meta | `llama-3.1-8b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.1-8b-instruct/) | [HF](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) |
+
+### Vision models
+
+### Embedding models
+
+## Custom models
+
+
+  Custom models are still in Beta status. If you identify unsupported models, you can report the issue to us through our [Slack Community Channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or our [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference).
+
+
+### Prerequisites
+
+To deploy a model by providing its URL on Hugging Face, you need to:
+- Have access to this model with your Hugging Face credentials (if the model is "Gated", you specifically need to request access from your Hugging Face account). Note that your Hugging Face credentials will not be stored, but we still recommend creating [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens) for this purpose.
+
+### Additional considerations
+
+- When deploying custom models, you remain responsible for complying with any License requirements from the model provider.
+
- We currently

From 7dcfe6ed75c410b5f389245cd9d21751d47d1944 Mon Sep 17 00:00:00 2001
From: fpagny
Date: Tue, 8 Apr 2025 17:39:37 +0200
Subject: [PATCH 2/9] feat(inference): add custom model support

---
 .../reference-content/supported-models.mdx | 32 ++++++++++++++++---
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/pages/managed-inference/reference-content/supported-models.mdx b/pages/managed-inference/reference-content/supported-models.mdx
index ac811d2eae..58de9884d9 100644
--- a/pages/managed-inference/reference-content/supported-models.mdx
+++ b/pages/managed-inference/reference-content/supported-models.mdx
@@ -28,7 +28,7 @@ Managed Inference supports multiple AI models, either from:

| Provider | Model string | Documentation | License |
|-----------------|-----------------|-----------------|-----------------|
| Meta | `llama-3.3-70b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 Community](https://www.llama.com/llama3_3/license/) |
-| Meta | `llama-3.1-8b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.1-8b-instruct/) | [HF](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) |
+| Meta | `llama-3.1-8b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 Community](https://llama.meta.com/llama3_1/license/) |

### Vision models

@@ -42,11 +42,35 @@ ### Prerequisites

+
+  To begin with custom models deployment, we recommend you start with an existing variation of a model supported in the Scaleway Catalog. As an example, you can deploy a [quantized version (4 bits) of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit). If you then want to deploy a fine-tuned version of Llama 3.3, you can ensure the file structure you provide matches this example before creating your deployment.
+

To deploy a model by providing its URL on Hugging Face, you need to:
- Have access to this model with your Hugging Face credentials (if the model is "Gated", you specifically need to request access from your Hugging Face account). Note that your Hugging Face credentials will not be stored, but we still recommend creating [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens) for this purpose.

+The model files need to include:
+- a `config.json` file containing:
+  - `architectures` array. See [supported models architectures](#supported-models-architecture) for the exact list of supported values.
+  - `max_position_embeddings`
+- model weights in [`.safetensors`](https://huggingface.co/docs/safetensors/index) format
+- a chat template either in:
+  - `tokenizer_config.json` file as `chat_template` field
+  - `chat_template.json` file as `chat_template` field
+
+For security reasons, model formats allowing arbitrary code execution, such as [`pickle`](https://docs.python.org/3/library/pickle.html), are not supported.
+
+### Custom model lifecycle
+
+Currently, custom model deployments are considered to be valid for a long term, and we will ensure any updatse or changes to Managed Inference will not impact existing deployments.
+In case of breaking changes, leading to some custom models not being supported anymore, we will notify you at least 3 months beforehand.
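To make the file requirements above concrete, the following sketch checks a local copy of a model repository against that list before deployment. It is illustrative only: the helper name and the repository path are placeholders, and only the file names and fields come from the requirements above.

```python
# Illustrative pre-flight check for the required model files listed above.
# The helper and the local path are hypothetical; the file names and fields
# are taken from the documented requirements.
import json
from pathlib import Path

def check_model_repo(repo: Path) -> list[str]:
    problems = []
    config_path = repo / "config.json"
    if not config_path.is_file():
        problems.append("missing config.json")
    else:
        config = json.loads(config_path.read_text())
        if not config.get("architectures"):
            problems.append("config.json: missing architectures array")
        if "max_position_embeddings" not in config:
            problems.append("config.json: missing max_position_embeddings")
    if not list(repo.glob("*.safetensors")):
        problems.append("no .safetensors weight files found")
    # The chat template may live in tokenizer_config.json or chat_template.json.
    template_files = ("tokenizer_config.json", "chat_template.json")
    if not any(
        "chat_template" in json.loads((repo / name).read_text())
        for name in template_files
        if (repo / name).is_file()
    ):
        problems.append("no chat_template field found")
    return problems

print(check_model_repo(Path("./my-model")) or "required files present")
```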
+ +### License + +- When deploying custom models, you remain responsible for complying with any License requirements from the model provider, as you would do by running the model on a custom provisioned GPU. + +### Supported models architecture + +Custom Models Deployments currently support the following models architecture: `AquilaModel`, `AquilaForCausalLM`, `ArcticForCausalLM`, `BaiChuanForCausalLM`, `BaichuanForCausalLM`, `BloomForCausalLM`, `CohereForCausalLM`, `Cohere2ForCausalLM`, `DbrxForCausalLM`, `DeciLMForCausalLM`, `DeepseekForCausalLM`, `DeepseekV2ForCausalLM`, `DeepseekV3ForCausalLM`, `ExaoneForCausalLM`, `FalconForCausalLM`, `Fairseq2LlamaForCausalLM`, `GemmaForCausalLM`, `Gemma2ForCausalLM`, `GlmForCausalLM`, `GPT2LMHeadModel`, `GPTBigCodeForCausalLM`, `GPTJForCausalLM`, `GPTNeoXForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM`, `GritLM`, `InternLMForCausalLM`, `InternLM2ForCausalLM`, `InternLM2VEForCausalLM`, `InternLM3ForCausalLM`, `JAISLMHeadModel`, `JambaForCausalLM`, `LlamaForCausalLM`, `LLaMAForCausalLM`, `MambaForCausalLM`, `FalconMambaForCausalLM`, `MiniCPMForCausalLM`, `MiniCPM3ForCausalLM`, `MistralForCausalLM`, `MixtralForCausalLM`, `QuantMixtralForCausalLM`, `MptForCausalLM`, `MPTForCausalLM`, `NemotronForCausalLM`, `OlmoForCausalLM`, `Olmo2ForCausalLM`, `OlmoeForCausalLM`, `OPTForCausalLM`, `OrionForCausalLM`, `PersimmonForCausalLM`, `PhiForCausalLM`, `Phi3ForCausalLM`, `Phi3SmallForCausalLM`, `PhiMoEForCausalLM`, `Qwen2ForCausalLM`, `Qwen2MoeForCausalLM`, `RWForCausalLM`, `StableLMEpochForCausalLM`, `StableLmForCausalLM`, `Starcoder2ForCausalLM`, `SolarForCausalLM`, `TeleChat2ForCausalLM`, `XverseForCausalLM`, `BartModel`, `BartForConditionalGeneration`, `Florence2ForConditionalGeneration`, `BertModel`, `RobertaModel`, `RobertaForMaskedLM`, `XLMRobertaModel`, `DeciLMForCausalLM`, `Gemma2Model`, `GlmForCausalLM`, `GritLM`, `InternLM2ForRewardModel`, `JambaForSequenceClassification`, `LlamaModel`, `MistralModel`, `Phi3ForCausalLM`, `Qwen2Model`, `Qwen2ForCausalLM`, `Qwen2ForRewardModel`, `Qwen2ForProcessRewardModel`, `TeleChat2ForCausalLM`, `LlavaNextForConditionalGeneration`, `Phi3VForCausalLM`, `Qwen2VLForConditionalGeneration`, `Qwen2ForSequenceClassification`, `BertForSequenceClassification`, `RobertaForSequenceClassification`, `XLMRobertaForSequenceClassification`, `AriaForConditionalGeneration`, `Blip2ForConditionalGeneration`, `ChameleonForConditionalGeneration`, `ChatGLMModel`, `ChatGLMForConditionalGeneration`, `DeepseekVLV2ForCausalLM`, `FuyuForCausalLM`, `H2OVLChatModel`, `InternVLChatModel`, `Idefics3ForConditionalGeneration`, `LlavaForConditionalGeneration`, `LlavaNextForConditionalGeneration`, `LlavaNextVideoForConditionalGeneration`, `LlavaOnevisionForConditionalGeneration`, `MantisForConditionalGeneration`, `MiniCPMO`, `MiniCPMV`, `MolmoForCausalLM`, `NVLM_D`, `PaliGemmaForConditionalGeneration`, `Phi3VForCausalLM`, `PixtralForConditionalGeneration`, `QWenLMHeadModel`, `Qwen2VLForConditionalGeneration`, `Qwen2_5_VLForConditionalGeneration`, `Qwen2AudioForConditionalGeneration`, `UltravoxModel`, `MllamaForConditionalGeneration`, `WhisperForConditionalGeneration`, `EAGLEModel`, `MedusaModel`, `MLPSpeculatorPreTrainedModel` -- When deploying custom models, you remain responsible for complying with any License requirements from the model provider. 
-- We currently

From 1386fd2544f3970100ad47e892dc6d8ca5d2b8fa Mon Sep 17 00:00:00 2001
From: fpagny
Date: Tue, 8 Apr 2025 17:48:17 +0200
Subject: [PATCH 3/9] feat(inference): update custom models

---
 .../reference-content/supported-models.mdx | 29 +++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/pages/managed-inference/reference-content/supported-models.mdx b/pages/managed-inference/reference-content/supported-models.mdx
index 58de9884d9..4336486980 100644
--- a/pages/managed-inference/reference-content/supported-models.mdx
+++ b/pages/managed-inference/reference-content/supported-models.mdx
@@ -58,8 +58,37 @@ The model files need to include:
  - `tokenizer_config.json` file as `chat_template` field
  - `chat_template.json` file as `chat_template` field

+The model type needs to be one of:
+- `chat`
+- `vision`
+- `multimodal` (`chat` and `vision` currently)
+- `embedding`
+
For security reasons, model formats allowing arbitrary code execution, such as [`pickle`](https://docs.python.org/3/library/pickle.html), are not supported.

+### Supported API
+
+Depending on the model type, specific endpoints and features will be supported.
+
+#### Chat models
+
+Chat API will be expposed for this model under `/v1/chat/completions` endpoint.
+**Structured outputs** or **Function calling** are not yet supported for custom models.
+
+#### Vision models
+
+Chat API will be expposed for this model under `/v1/chat/completions` endpoint.
+**Structured outputs** or **Function calling** are not yet supported for custom models.
+
+#### Multimodal models (vision and chat)
+
+These models will be treated similarly to both Chat and Vision models.
+
+#### Embedding models
+
+Embeddings API will be exposed for this model under `/v1/embeddings` endpoint.
+
+
### Custom model lifecycle

Currently, custom model deployments are considered to be valid for a long term, and we will ensure any updatse or changes to Managed Inference will not impact existing deployments.

From 57a0ac2a41904031bf24c19c5edfac69a7244d55 Mon Sep 17 00:00:00 2001
From: Benedikt Rollik
Date: Wed, 9 Apr 2025 11:16:52 +0200
Subject: [PATCH 4/9] docs(infr): update docs

---
 menu/navigation.json | 4 +
 .../how-to/create-deployment.mdx | 7 +-
 .../reference-content/supported-models.mdx | 111 +++++++++++-------
 3 files changed, 76 insertions(+), 46 deletions(-)

diff --git a/menu/navigation.json b/menu/navigation.json
index e4e2247785..e5d1696197 100644
--- a/menu/navigation.json
+++ b/menu/navigation.json
@@ -860,6 +860,10 @@
  "label": "OpenAI API compatibility",
  "slug": "openai-compatibility"
},
+{
+  "label": "Supported models in Managed Inference",
+  "slug": "supported-models"
+},
{
  "label": "Support for function calling in Scaleway Managed Inference",
  "slug": "function-calling-support"

diff --git a/pages/managed-inference/how-to/create-deployment.mdx b/pages/managed-inference/how-to/create-deployment.mdx
index 12a15a8b57..1b43cd5ee8 100644
--- a/pages/managed-inference/how-to/create-deployment.mdx
+++ b/pages/managed-inference/how-to/create-deployment.mdx
@@ -7,7 +7,7 @@ content:
  paragraph: This page explains how to deploy a model on Scaleway Managed Inference
tags: managed-inference ai-data creating dedicated
dates:
-  validation: 2025-04-01
+  validation: 2025-04-09
  posted: 2024-03-06
---

1. Click the **AI & Data** section of the [Scaleway console](https://console.scaleway.com/), and select **Managed Inference** from the side menu to access the Managed Inference dashboard.
2.
Click **Deploy a model** to launch the model deployment wizard. 3. Provide the necessary information: - - Select the desired model and quantization to use for your deployment [from the available options](/managed-inference/reference-content/) + - Select the desired model and quantization to use for your deployment [from the available options](/managed-inference/reference-content/). + + Scaleway Managed Inference allows you to deploy various AI models, either from the Scaleway catalog or by importing a custom model. For detailed information about supported models, visit our [Supported models in Managed Inference](/managed-inference/reference-content/supported-models/) documentation. + Some models may require acceptance of an end-user license agreement. If prompted, review the terms and conditions and accept the license accordingly. diff --git a/pages/managed-inference/reference-content/supported-models.mdx b/pages/managed-inference/reference-content/supported-models.mdx index 4336486980..0c098d9f98 100644 --- a/pages/managed-inference/reference-content/supported-models.mdx +++ b/pages/managed-inference/reference-content/supported-models.mdx @@ -1,11 +1,11 @@ --- meta: - title: Supported Models in Managed Inference - description: Supported Models in Managed Inference + title: Supported models in Managed Inference + description: Explore all AI models supported by Managed Inference content: - h1: Supported Models in Managed Inference - paragraph: Supported Models in Managed Inference -tags: + h1: Supported models in Managed Inference + paragraph: Discover which AI models you can deploy using Managed Inference, either from the Scaleway Catalog or as custom models. +tags: support models custom catalog dates: validation: 2025-04-08 posted: 2025-04-08 @@ -13,93 +13,116 @@ categories: - ai-data --- -## Models supported on Managed Inference +Scaleway Managed Inference allows you to deploy various AI models, either from: -Managed Inference supports multiple AI models either from: -- [Scaleway catalog]((#scaleway-catalog)): A curated model list available in [Scaleway Console](https://console.scaleway.com/inference/deployments/) or through [Managed Inference Models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models) -- [Custom models](#custom-models): Models imported by you as a user from sources such as HuggingFace. +- [Scaleway catalog](#scaleway-catalog): A curated set of ready-to-deploy models available through the [Scaleway console](https://console.scaleway.com/inference/deployments/) or the [Managed Inference models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models) +- [Custom models](#custom-models): Models that you import, typically from sources like Hugging Face. 
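As a rough illustration of browsing the curated list programmatically, the models list can also be fetched over HTTP. The URL path and response fields below are assumptions rather than the documented contract; the Managed Inference models API reference linked above is authoritative. Scaleway APIs authenticate with a secret key in the `X-Auth-Token` header.

```python
# Illustrative sketch of listing catalog models. The endpoint path and the
# "models"/"name" response fields are hypothetical placeholders; consult the
# linked Models API reference for the real contract.
import os
import requests

MODELS_URL = "https://api.scaleway.com/inference/v1/models"  # hypothetical path

response = requests.get(
    MODELS_URL,
    headers={"X-Auth-Token": os.environ["SCW_SECRET_KEY"]},
)
response.raise_for_status()
for model in response.json().get("models", []):  # assumed field name
    print(model.get("name"))
```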
-## Scaleway Catalog +## Scaleway catalog -### Multimodal models (Chat and Vision) +### Multimodal models (chat + vision) ### Chat models -| Provider | Model string | Documentation | License | -|-----------------|-----------------|-----------------|-----------------| -| Meta | `llama-3.3-70b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 Community](https://www.llama.com/llama3_3/license/) | -| Meta | `llama-3.1-8b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 Community](https://llama.meta.com/llama3_1/license/) | +| Provider | Model identifier | Documentation | License | +|----------|------------------|----------------|---------| +| Meta | `llama-3.3-70b-instruct` | [View Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 License](https://www.llama.com/llama3_3/license/) | +| Meta | `llama-3.1-8b-instruct` | [View Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 License](https://llama.meta.com/llama3_1/license/) | ### Vision models +_More details to be added._ + ### Embedding models -## Custom models +_More details to be added._ + + +## Custom Models - Custom models are still in Beta status. If you identify unsupported models, you can report the issue to us through our [Slack Community Channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or our [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference). + Custom model support is currently in **beta**. If you encounter issues or limitations, please report them via our [Slack community channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference). -### Prerequesites +### Prerequisites - To begin with custom models deployment, we recommend you start with existing variation of models supported in the Scaleway Catalog. As an example, you can deploy a [quantized version (4 bits) of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit). If you want to then deploy a fine-tuned version of Llama 3.3, you can ensure the file structure you provide matches this example before creating your deployment. + We recommend starting with a variation of a supported model from the Scaleway catalog. + For example, you can deploy a [quantized (4-bit) version of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit). + If deploying a fine-tuned version of Llama 3.3, make sure your file structure matches the example linked above. -To deploy a model by providing its URL on Hugging Face, you need to: -- Have access to this model with your Hugging Face credentials (if the model is "Gated", you specifically need to ask access from your Hugging Face account). Note that your Hugging Face credentials will not be stored, but we still recommend you to create [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens) for this purpose. +To deploy a custom model via Hugging Face, ensure the following: + +#### Access requirements + +- You must have access to the model using your Hugging Face credentials. +- For gated models, request access through your Hugging Face account. 
+- Credentials are not stored, but we recommend using [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens). + +#### Required files + +Your model repository must include: -The model files need to include: -- a `config.json` file containing: - - `architectures` array. See [supported models architectures](#supported-models-architecture) for exact list of supported values. +- `config.json` with: + - An `architectures` array (see [supported architectures](#supported-models-architecture)) - `max_position_embeddings` -- model weigths in [`.safetensors`](https://huggingface.co/docs/safetensors/index) format -- a chat template either in: - - `tokenizer_config.json` file as `chat_template` field - - `chat_template.json` file as `chat_template` field +- Model weights in the [`.safetensors`](https://huggingface.co/docs/safetensors/index) format +- A chat template included in either: + - `tokenizer_config.json` as a `chat_template` field, or + - `chat_template.json` as a `chat_template` field + +#### Supported model types + +Your model must be one of the following types: -The model type need to either be: - `chat` - `vision` -- `multimodal` (`chat` and `vision` currently) +- `multimodal` (chat + vision) - `embedding` -For security reasons, models containing arbitrary code execution such as [`pickle`](https://docs.python.org/3/library/pickle.html) format are not supported. + + **Security Notice**
+ Models using formats that allow arbitrary code execution, such as Python [`pickle`](https://docs.python.org/3/library/pickle.html), are **not supported**. +
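For orientation, this is roughly what the required `chat_template` field looks like inside `tokenizer_config.json`. The Jinja template shown is a toy example, not the template of any real model; models ship their own template matching their prompt format.

```python
# Toy illustration of where the chat template lives. The Jinja string below is
# purely an example; a real model's tokenizer_config.json carries its own
# template plus the tokenizer's other fields.
import json

tokenizer_config = {
    # ... the tokenizer's usual fields ...
    "chat_template": (
        "{% for message in messages %}"
        "<|{{ message['role'] }}|>{{ message['content'] }}\n"
        "{% endfor %}"
        "<|assistant|>"
    ),
}

# This is the field Managed Inference looks for in tokenizer_config.json
# (or, alternatively, in a standalone chat_template.json).
print(json.dumps(tokenizer_config, indent=2))
```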
-

-### Supported API
+## API support

-Depending on the model type, specific endpoints and features will be supported.
+Depending on your model type, the following endpoints will be available:

-#### Chat models
+### Chat models

Chat API will be expposed for this model under `/v1/chat/completions` endpoint.
**Structured outputs** or **Function calling** are not yet supported for custom models.

-#### Vision models
+### Vision models

Chat API will be expposed for this model under `/v1/chat/completions` endpoint.
**Structured outputs** or **Function calling** are not yet supported for custom models.

-#### Multimodal models (vision and chat)
+### Multimodal models

These models will be treated similarly to both Chat and Vision models.

-#### Embedding models
+### Embedding models

Embeddings API will be exposed for this model under `/v1/embeddings` endpoint.

-### Custom model lifecycle
+## Custom model lifecycle

Currently, custom model deployments are considered to be valid for a long term, and we will ensure any updatse or changes to Managed Inference will not impact existing deployments.
-In case of breaking changes, leading to some custom models not being supported anymore, we will notify you at least 3 months beforehand.
+In case of breaking changes, leading to some custom models not being supported anymore, we will notify you **at least 3 months beforehand**.

-### License
+## Licensing

-- When deploying custom models, you remain responsible for complying with any License requirements from the model provider, as you would do by running the model on a custom provisioned GPU.
+When deploying custom models, **you remain responsible** for complying with any license requirements from the model provider, as you would when running the model on a GPU you provision yourself.
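Given the endpoints described in the API support section above, a deployed model can be queried with any OpenAI-compatible client (see the "OpenAI API compatibility" page added to the navigation in this patch). The base URL, model name, and credential handling below are placeholders for a specific deployment, not documented values:

```python
# Minimal sketch of calling the documented endpoints with the openai client.
# Base URL, model name, and key handling are placeholders for your deployment.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment>.example.com/v1",  # placeholder endpoint
    api_key=os.environ["SCW_SECRET_KEY"],
)

# For a chat (or vision/multimodal) deployment: served at /v1/chat/completions
chat = client.chat.completions.create(
    model="<deployed-model-name>",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(chat.choices[0].message.content)

# For an embedding deployment: served at /v1/embeddings
embedding = client.embeddings.create(model="<deployed-model-name>", input="Hello!")
print(len(embedding.data[0].embedding))
```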
-Custom Models Deployments currently support the following models architecture: `AquilaModel`, `AquilaForCausalLM`, `ArcticForCausalLM`, `BaiChuanForCausalLM`, `BaichuanForCausalLM`, `BloomForCausalLM`, `CohereForCausalLM`, `Cohere2ForCausalLM`, `DbrxForCausalLM`, `DeciLMForCausalLM`, `DeepseekForCausalLM`, `DeepseekV2ForCausalLM`, `DeepseekV3ForCausalLM`, `ExaoneForCausalLM`, `FalconForCausalLM`, `Fairseq2LlamaForCausalLM`, `GemmaForCausalLM`, `Gemma2ForCausalLM`, `GlmForCausalLM`, `GPT2LMHeadModel`, `GPTBigCodeForCausalLM`, `GPTJForCausalLM`, `GPTNeoXForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM`, `GritLM`, `InternLMForCausalLM`, `InternLM2ForCausalLM`, `InternLM2VEForCausalLM`, `InternLM3ForCausalLM`, `JAISLMHeadModel`, `JambaForCausalLM`, `LlamaForCausalLM`, `LLaMAForCausalLM`, `MambaForCausalLM`, `FalconMambaForCausalLM`, `MiniCPMForCausalLM`, `MiniCPM3ForCausalLM`, `MistralForCausalLM`, `MixtralForCausalLM`, `QuantMixtralForCausalLM`, `MptForCausalLM`, `MPTForCausalLM`, `NemotronForCausalLM`, `OlmoForCausalLM`, `Olmo2ForCausalLM`, `OlmoeForCausalLM`, `OPTForCausalLM`, `OrionForCausalLM`, `PersimmonForCausalLM`, `PhiForCausalLM`, `Phi3ForCausalLM`, `Phi3SmallForCausalLM`, `PhiMoEForCausalLM`, `Qwen2ForCausalLM`, `Qwen2MoeForCausalLM`, `RWForCausalLM`, `StableLMEpochForCausalLM`, `StableLmForCausalLM`, `Starcoder2ForCausalLM`, `SolarForCausalLM`, `TeleChat2ForCausalLM`, `XverseForCausalLM`, `BartModel`, `BartForConditionalGeneration`, `Florence2ForConditionalGeneration`, `BertModel`, `RobertaModel`, `RobertaForMaskedLM`, `XLMRobertaModel`, `DeciLMForCausalLM`, `Gemma2Model`, `GlmForCausalLM`, `GritLM`, `InternLM2ForRewardModel`, `JambaForSequenceClassification`, `LlamaModel`, `MistralModel`, `Phi3ForCausalLM`, `Qwen2Model`, `Qwen2ForCausalLM`, `Qwen2ForRewardModel`, `Qwen2ForProcessRewardModel`, `TeleChat2ForCausalLM`, `LlavaNextForConditionalGeneration`, `Phi3VForCausalLM`, `Qwen2VLForConditionalGeneration`, `Qwen2ForSequenceClassification`, `BertForSequenceClassification`, `RobertaForSequenceClassification`, `XLMRobertaForSequenceClassification`, `AriaForConditionalGeneration`, `Blip2ForConditionalGeneration`, `ChameleonForConditionalGeneration`, `ChatGLMModel`, `ChatGLMForConditionalGeneration`, `DeepseekVLV2ForCausalLM`, `FuyuForCausalLM`, `H2OVLChatModel`, `InternVLChatModel`, `Idefics3ForConditionalGeneration`, `LlavaForConditionalGeneration`, `LlavaNextForConditionalGeneration`, `LlavaNextVideoForConditionalGeneration`, `LlavaOnevisionForConditionalGeneration`, `MantisForConditionalGeneration`, `MiniCPMO`, `MiniCPMV`, `MolmoForCausalLM`, `NVLM_D`, `PaliGemmaForConditionalGeneration`, `Phi3VForCausalLM`, `PixtralForConditionalGeneration`, `QWenLMHeadModel`, `Qwen2VLForConditionalGeneration`, `Qwen2_5_VLForConditionalGeneration`, `Qwen2AudioForConditionalGeneration`, `UltravoxModel`, `MllamaForConditionalGeneration`, `WhisperForConditionalGeneration`, `EAGLEModel`, `MedusaModel`, `MLPSpeculatorPreTrainedModel` +## Supported model architectures +Custom models must conform to one of the architectures listed below. Click to expand full list. 
+ + ## Supported custom model architectures + Custom Models Deployments currently support the following models architecture: `AquilaModel`, `AquilaForCausalLM`, `ArcticForCausalLM`, `BaiChuanForCausalLM`, `BaichuanForCausalLM`, `BloomForCausalLM`, `CohereForCausalLM`, `Cohere2ForCausalLM`, `DbrxForCausalLM`, `DeciLMForCausalLM`, `DeepseekForCausalLM`, `DeepseekV2ForCausalLM`, `DeepseekV3ForCausalLM`, `ExaoneForCausalLM`, `FalconForCausalLM`, `Fairseq2LlamaForCausalLM`, `GemmaForCausalLM`, `Gemma2ForCausalLM`, `GlmForCausalLM`, `GPT2LMHeadModel`, `GPTBigCodeForCausalLM`, `GPTJForCausalLM`, `GPTNeoXForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM`, `GritLM`, `InternLMForCausalLM`, `InternLM2ForCausalLM`, `InternLM2VEForCausalLM`, `InternLM3ForCausalLM`, `JAISLMHeadModel`, `JambaForCausalLM`, `LlamaForCausalLM`, `LLaMAForCausalLM`, `MambaForCausalLM`, `FalconMambaForCausalLM`, `MiniCPMForCausalLM`, `MiniCPM3ForCausalLM`, `MistralForCausalLM`, `MixtralForCausalLM`, `QuantMixtralForCausalLM`, `MptForCausalLM`, `MPTForCausalLM`, `NemotronForCausalLM`, `OlmoForCausalLM`, `Olmo2ForCausalLM`, `OlmoeForCausalLM`, `OPTForCausalLM`, `OrionForCausalLM`, `PersimmonForCausalLM`, `PhiForCausalLM`, `Phi3ForCausalLM`, `Phi3SmallForCausalLM`, `PhiMoEForCausalLM`, `Qwen2ForCausalLM`, `Qwen2MoeForCausalLM`, `RWForCausalLM`, `StableLMEpochForCausalLM`, `StableLmForCausalLM`, `Starcoder2ForCausalLM`, `SolarForCausalLM`, `TeleChat2ForCausalLM`, `XverseForCausalLM`, `BartModel`, `BartForConditionalGeneration`, `Florence2ForConditionalGeneration`, `BertModel`, `RobertaModel`, `RobertaForMaskedLM`, `XLMRobertaModel`, `DeciLMForCausalLM`, `Gemma2Model`, `GlmForCausalLM`, `GritLM`, `InternLM2ForRewardModel`, `JambaForSequenceClassification`, `LlamaModel`, `MistralModel`, `Phi3ForCausalLM`, `Qwen2Model`, `Qwen2ForCausalLM`, `Qwen2ForRewardModel`, `Qwen2ForProcessRewardModel`, `TeleChat2ForCausalLM`, `LlavaNextForConditionalGeneration`, `Phi3VForCausalLM`, `Qwen2VLForConditionalGeneration`, `Qwen2ForSequenceClassification`, `BertForSequenceClassification`, `RobertaForSequenceClassification`, `XLMRobertaForSequenceClassification`, `AriaForConditionalGeneration`, `Blip2ForConditionalGeneration`, `ChameleonForConditionalGeneration`, `ChatGLMModel`, `ChatGLMForConditionalGeneration`, `DeepseekVLV2ForCausalLM`, `FuyuForCausalLM`, `H2OVLChatModel`, `InternVLChatModel`, `Idefics3ForConditionalGeneration`, `LlavaForConditionalGeneration`, `LlavaNextForConditionalGeneration`, `LlavaNextVideoForConditionalGeneration`, `LlavaOnevisionForConditionalGeneration`, `MantisForConditionalGeneration`, `MiniCPMO`, `MiniCPMV`, `MolmoForCausalLM`, `NVLM_D`, `PaliGemmaForConditionalGeneration`, `Phi3VForCausalLM`, `PixtralForConditionalGeneration`, `QWenLMHeadModel`, `Qwen2VLForConditionalGeneration`, `Qwen2_5_VLForConditionalGeneration`, `Qwen2AudioForConditionalGeneration`, `UltravoxModel`, `MllamaForConditionalGeneration`, `WhisperForConditionalGeneration`, `EAGLEModel`, `MedusaModel`, `MLPSpeculatorPreTrainedModel` + \ No newline at end of file From c2304f3e5c0b16f15789b49d8f95ec3c92243179 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Wed, 9 Apr 2025 12:01:14 +0200 Subject: [PATCH 5/9] feat(minfr): add chat models --- .../reference-content/supported-models.mdx | 195 +++++++++++++++--- 1 file changed, 167 insertions(+), 28 deletions(-) diff --git a/pages/managed-inference/reference-content/supported-models.mdx b/pages/managed-inference/reference-content/supported-models.mdx index 0c098d9f98..483eb155b3 100644 --- 
a/pages/managed-inference/reference-content/supported-models.mdx +++ b/pages/managed-inference/reference-content/supported-models.mdx @@ -10,13 +10,13 @@ dates: validation: 2025-04-08 posted: 2025-04-08 categories: - - ai-data + * ai-data --- Scaleway Managed Inference allows you to deploy various AI models, either from: -- [Scaleway catalog](#scaleway-catalog): A curated set of ready-to-deploy models available through the [Scaleway console](https://console.scaleway.com/inference/deployments/) or the [Managed Inference models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models) -- [Custom models](#custom-models): Models that you import, typically from sources like Hugging Face. + * [Scaleway catalog](#scaleway-catalog): A curated set of ready-to-deploy models available through the [Scaleway console](https://console.scaleway.com/inference/deployments/) or the [Managed Inference models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models) + * [Custom models](#custom-models): Models that you import, typically from sources like Hugging Face. ## Scaleway catalog @@ -24,10 +24,23 @@ Scaleway Managed Inference allows you to deploy various AI models, either from: ### Chat models -| Provider | Model identifier | Documentation | License | -|----------|------------------|----------------|---------| -| Meta | `llama-3.3-70b-instruct` | [View Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 License](https://www.llama.com/llama3_3/license/) | -| Meta | `llama-3.1-8b-instruct` | [View Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 License](https://llama.meta.com/llama3_1/license/) | +| Provider | Model identifier | Documentation | License | +|------------|-----------------------------------|--------------------------------------------------------------------------|-------------------------------------------------------| +| Allen AI | `molmo-72b-0924` | [View Details](/managed-inference/reference-content/molmo-72b-0924/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) | +| Deepseek | `deepseek-r1-distill-llama-70b` | [View Details](/managed-inference/reference-content/deepseek-r1-distill-llama-70b/) | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) | +| Deepseek | `deepseek-r1-distill-llama-8b` | [View Details](/managed-inference/reference-content/deepseek-r1-distill-llama-8b/) | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) | +| Meta | `llama-3-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3-70b-instruct/) | [Llama 3 license](https://www.llama.com/llama3/license/) | +| Meta | `llama-3-8b-instruct` | [View Details](/managed-inference/reference-content/llama-3-8b-instruct/) | [Llama 3 license](https://www.llama.com/llama3/license/) | +| Meta | `llama-3.1-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-70b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) | +| Meta | `llama-3.1-8b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 license](https://www.llama.com/llama3_1/license/) | +| Meta | `llama-3.3-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 license](https://www.llama.com/llama3_3/license/) | 
+| Nvidia | `llama-3.1-nemotron-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-nemotron-70b-instruct/)| [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) | +| Mistral | `mixtral-8x7b-instruct-v0.1` | [View Details](/managed-inference/reference-content/mixtral-8x7b-instruct-v0.1/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) | +| Mistral | `mistral-7b-instruct-v0.3` | [View Details](/managed-inference/reference-content/mistral-7b-instruct-v0.3/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) | +| Mistral | `mistral-nemo-instruct-2407` | [View Details](/managed-inference/reference-content/mistral-nemo-instruct-2407/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) | +| Mistral | `mistral-small-24b-instruct-2501` | [View Details](/managed-inference/reference-content/mistral-small-24b-instruct-2501/)| [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) | +| Mistral | `pixtral-12b-2409` | [View Details](/managed-inference/reference-content/pixtral-12b-2409/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) | +| Qwen | `qwen2.5-coder-32b-instruct` | [View Details](/managed-inference/reference-content/qwen2.5-coder-32b-instruct/) | [Apache 2.0 license](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) | ### Vision models @@ -35,10 +48,13 @@ _More details to be added._ ### Embedding models -_More details to be added._ +| Provider | Model identifier | Documentation | License | +|----------|------------------|----------------|---------| +| BAAI | `bge-multilingual-gemma2` | [View Details](/managed-inference/reference-content/bge-multilingual-gemma2/) | [Gemma Terms of Use](https://ai.google.dev/gemma/terms) | +| Sentence Transformers | `sentence-t5-xxl` | [View Details](/managed-inference/reference-content/sentence-t5-xxl/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) | -## Custom Models +## Custom models Custom model support is currently in **beta**. If you encounter issues or limitations, please report them via our [Slack community channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference). @@ -56,30 +72,30 @@ To deploy a custom model via Hugging Face, ensure the following: #### Access requirements -- You must have access to the model using your Hugging Face credentials. -- For gated models, request access through your Hugging Face account. -- Credentials are not stored, but we recommend using [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens). + * You must have access to the model using your Hugging Face credentials. + * For gated models, request access through your Hugging Face account. + * Credentials are not stored, but we recommend using [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens). 
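Before creating a deployment from a gated repository, you can verify that a token actually grants access. A minimal sketch using the `huggingface_hub` client, with an example repository id:

```python
# Sketch: confirm that a read or fine-grained token can access a gated model
# before deployment. The repository id below is just an example.
import os
from huggingface_hub import model_info

try:
    info = model_info(
        "meta-llama/Llama-3.1-8B-Instruct",
        token=os.environ["HF_TOKEN"],
    )
    print("Access OK:", info.id)
except Exception as err:  # gated or private repos raise an HTTP error here
    print("No access yet - request it on the model page first:", err)
```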
#### Required files

Your model repository must include:

-- `config.json` with:
-  - An `architectures` array (see [supported architectures](#supported-models-architecture))
-  - `max_position_embeddings`
-- Model weights in the [`.safetensors`](https://huggingface.co/docs/safetensors/index) format
-- A chat template included in either:
-  - `tokenizer_config.json` as a `chat_template` field, or
-  - `chat_template.json` as a `chat_template` field
+ * A `config.json` file containing:
+   * An `architectures` array (see [supported architectures](#supported-models-architecture) for the exact list of supported values).
+   * `max_position_embeddings`
+ * Model weights in the [`.safetensors`](https://huggingface.co/docs/safetensors/index) format
+ * A chat template included in either:
+   * `tokenizer_config.json` as a `chat_template` field, or
+   * `chat_template.json` as a `chat_template` field

#### Supported model types

Your model must be one of the following types:

-- `chat`
-- `vision`
-- `multimodal` (chat + vision)
-- `embedding`
+ * `chat`
+ * `vision`
+ * `multimodal` (chat + vision)
+ * `embedding`

  **Security Notice**
@@ -88,16 +104,16 @@ Your model must be one of the following types: ## API support -Depending on your model type, the following endpoints will be available: +Depending on the model type, specific endpoints and features will be supported. ### Chat models -Chat API will be expposed for this model under `/v1/chat/completions` endpoint. +Chat API will be exposed for this model under `/v1/chat/completions` endpoint. **Structured outputs** or **Function calling** are not yet supported for custom models. ### Vision models -Chat API will be expposed for this model under `/v1/chat/completions` endpoint. +Chat API will be exposed for this model under `/v1/chat/completions` endpoint. **Structured outputs** or **Function calling** are not yet supported for custom models. ### Multimodal models @@ -123,6 +139,129 @@ When deploying custom models, **you remain responsible** for complying with any Custom models must conform to one of the architectures listed below. Click to expand full list. - ## Supported custom model architectures - Custom Models Deployments currently support the following models architecture: `AquilaModel`, `AquilaForCausalLM`, `ArcticForCausalLM`, `BaiChuanForCausalLM`, `BaichuanForCausalLM`, `BloomForCausalLM`, `CohereForCausalLM`, `Cohere2ForCausalLM`, `DbrxForCausalLM`, `DeciLMForCausalLM`, `DeepseekForCausalLM`, `DeepseekV2ForCausalLM`, `DeepseekV3ForCausalLM`, `ExaoneForCausalLM`, `FalconForCausalLM`, `Fairseq2LlamaForCausalLM`, `GemmaForCausalLM`, `Gemma2ForCausalLM`, `GlmForCausalLM`, `GPT2LMHeadModel`, `GPTBigCodeForCausalLM`, `GPTJForCausalLM`, `GPTNeoXForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM`, `GritLM`, `InternLMForCausalLM`, `InternLM2ForCausalLM`, `InternLM2VEForCausalLM`, `InternLM3ForCausalLM`, `JAISLMHeadModel`, `JambaForCausalLM`, `LlamaForCausalLM`, `LLaMAForCausalLM`, `MambaForCausalLM`, `FalconMambaForCausalLM`, `MiniCPMForCausalLM`, `MiniCPM3ForCausalLM`, `MistralForCausalLM`, `MixtralForCausalLM`, `QuantMixtralForCausalLM`, `MptForCausalLM`, `MPTForCausalLM`, `NemotronForCausalLM`, `OlmoForCausalLM`, `Olmo2ForCausalLM`, `OlmoeForCausalLM`, `OPTForCausalLM`, `OrionForCausalLM`, `PersimmonForCausalLM`, `PhiForCausalLM`, `Phi3ForCausalLM`, `Phi3SmallForCausalLM`, `PhiMoEForCausalLM`, `Qwen2ForCausalLM`, `Qwen2MoeForCausalLM`, `RWForCausalLM`, `StableLMEpochForCausalLM`, `StableLmForCausalLM`, `Starcoder2ForCausalLM`, `SolarForCausalLM`, `TeleChat2ForCausalLM`, `XverseForCausalLM`, `BartModel`, `BartForConditionalGeneration`, `Florence2ForConditionalGeneration`, `BertModel`, `RobertaModel`, `RobertaForMaskedLM`, `XLMRobertaModel`, `DeciLMForCausalLM`, `Gemma2Model`, `GlmForCausalLM`, `GritLM`, `InternLM2ForRewardModel`, `JambaForSequenceClassification`, `LlamaModel`, `MistralModel`, `Phi3ForCausalLM`, `Qwen2Model`, `Qwen2ForCausalLM`, `Qwen2ForRewardModel`, `Qwen2ForProcessRewardModel`, `TeleChat2ForCausalLM`, `LlavaNextForConditionalGeneration`, `Phi3VForCausalLM`, `Qwen2VLForConditionalGeneration`, `Qwen2ForSequenceClassification`, `BertForSequenceClassification`, `RobertaForSequenceClassification`, `XLMRobertaForSequenceClassification`, `AriaForConditionalGeneration`, `Blip2ForConditionalGeneration`, `ChameleonForConditionalGeneration`, `ChatGLMModel`, `ChatGLMForConditionalGeneration`, `DeepseekVLV2ForCausalLM`, `FuyuForCausalLM`, `H2OVLChatModel`, `InternVLChatModel`, `Idefics3ForConditionalGeneration`, `LlavaForConditionalGeneration`, `LlavaNextForConditionalGeneration`, `LlavaNextVideoForConditionalGeneration`, 
`LlavaOnevisionForConditionalGeneration`, `MantisForConditionalGeneration`, `MiniCPMO`, `MiniCPMV`, `MolmoForCausalLM`, `NVLM_D`, `PaliGemmaForConditionalGeneration`, `Phi3VForCausalLM`, `PixtralForConditionalGeneration`, `QWenLMHeadModel`, `Qwen2VLForConditionalGeneration`, `Qwen2_5_VLForConditionalGeneration`, `Qwen2AudioForConditionalGeneration`, `UltravoxModel`, `MllamaForConditionalGeneration`, `WhisperForConditionalGeneration`, `EAGLEModel`, `MedusaModel`, `MLPSpeculatorPreTrainedModel` + ## Supported custom model architectures + Custom models deployment currently supports the following model architectures: + * `AquilaModel` + * `AquilaForCausalLM` + * `ArcticForCausalLM` + * `BaiChuanForCausalLM` + * `BaichuanForCausalLM` + * `BloomForCausalLM` + * `CohereForCausalLM` + * `Cohere2ForCausalLM` + * `DbrxForCausalLM` + * `DeciLMForCausalLM` + * `DeepseekForCausalLM` + * `DeepseekV2ForCausalLM` + * `DeepseekV3ForCausalLM` + * `ExaoneForCausalLM` + * `FalconForCausalLM` + * `Fairseq2LlamaForCausalLM` + * `GemmaForCausalLM` + * `Gemma2ForCausalLM` + * `GlmForCausalLM` + * `GPT2LMHeadModel` + * `GPTBigCodeForCausalLM` + * `GPTJForCausalLM` + * `GPTNeoXForCausalLM` + * `GraniteForCausalLM` + * `GraniteMoeForCausalLM` + * `GritLM` + * `InternLMForCausalLM` + * `InternLM2ForCausalLM` + * `InternLM2VEForCausalLM` + * `InternLM3ForCausalLM` + * `JAISLMHeadModel` + * `JambaForCausalLM` + * `LlamaForCausalLM` + * `LLaMAForCausalLM` + * `MambaForCausalLM` + * `FalconMambaForCausalLM` + * `MiniCPMForCausalLM` + * `MiniCPM3ForCausalLM` + * `MistralForCausalLM` + * `MixtralForCausalLM` + * `QuantMixtralForCausalLM` + * `MptForCausalLM` + * `MPTForCausalLM` + * `NemotronForCausalLM` + * `OlmoForCausalLM` + * `Olmo2ForCausalLM` + * `OlmoeForCausalLM` + * `OPTForCausalLM` + * `OrionForCausalLM` + * `PersimmonForCausalLM` + * `PhiForCausalLM` + * `Phi3ForCausalLM` + * `Phi3SmallForCausalLM` + * `PhiMoEForCausalLM` + * `Qwen2ForCausalLM` + * `Qwen2MoeForCausalLM` + * `RWForCausalLM` + * `StableLMEpochForCausalLM` + * `StableLmForCausalLM` + * `Starcoder2ForCausalLM` + * `SolarForCausalLM` + * `TeleChat2ForCausalLM` + * `XverseForCausalLM` + * `BartModel` + * `BartForConditionalGeneration` + * `Florence2ForConditionalGeneration` + * `BertModel` + * `RobertaModel` + * `RobertaForMaskedLM` + * `XLMRobertaModel` + * `DeciLMForCausalLM` + * `Gemma2Model` + * `GlmForCausalLM` + * `GritLM` + * `InternLM2ForRewardModel` + * `JambaForSequenceClassification` + * `LlamaModel` + * `MistralModel` + * `Phi3ForCausalLM` + * `Qwen2Model` + * `Qwen2ForCausalLM` + * `Qwen2ForRewardModel` + * `Qwen2ForProcessRewardModel` + * `TeleChat2ForCausalLM` + * `LlavaNextForConditionalGeneration` + * `Phi3VForCausalLM` + * `Qwen2VLForConditionalGeneration` + * `Qwen2ForSequenceClassification` + * `BertForSequenceClassification` + * `RobertaForSequenceClassification` + * `XLMRobertaForSequenceClassification` + * `AriaForConditionalGeneration` + * `Blip2ForConditionalGeneration` + * `ChameleonForConditionalGeneration` + * `ChatGLMModel` + * `ChatGLMForConditionalGeneration` + * `DeepseekVLV2ForCausalLM` + * `FuyuForCausalLM` + * `H2OVLChatModel` + * `InternVLChatModel` + * `Idefics3ForConditionalGeneration` + * `LlavaForConditionalGeneration` + * `LlavaNextForConditionalGeneration` + * `LlavaNextVideoForConditionalGeneration` + * `LlavaOnevisionForConditionalGeneration` + * `MantisForConditionalGeneration` + * `MiniCPMO` + * `MiniCPMV` + * `MolmoForCausalLM` + * `NVLM_D` + * `PaliGemmaForConditionalGeneration` + * `Phi3VForCausalLM` + 
* `PixtralForConditionalGeneration` + * `QWenLMHeadModel` + * `Qwen2VLForConditionalGeneration` + * `Qwen2_5_VLForConditionalGeneration` + * `Qwen2AudioForConditionalGeneration` + * `UltravoxModel` + * `MllamaForConditionalGeneration` + * `WhisperForConditionalGeneration` + * `EAGLEModel` + * `MedusaModel` + * `MLPSpeculatorPreTrainedModel` \ No newline at end of file From 23659e8cc5a51ecdce7ee525f37e2605271babf5 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Wed, 9 Apr 2025 12:02:14 +0200 Subject: [PATCH 6/9] fix(gen): small typo --- pages/managed-inference/reference-content/supported-models.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pages/managed-inference/reference-content/supported-models.mdx b/pages/managed-inference/reference-content/supported-models.mdx index 483eb155b3..bbf7418daf 100644 --- a/pages/managed-inference/reference-content/supported-models.mdx +++ b/pages/managed-inference/reference-content/supported-models.mdx @@ -10,7 +10,7 @@ dates: validation: 2025-04-08 posted: 2025-04-08 categories: - * ai-data + - ai-data --- Scaleway Managed Inference allows you to deploy various AI models, either from: From 91c7e33fa63f8ac97aaaf7c5d3ff4c49fadd1a7b Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Wed, 9 Apr 2025 12:04:41 +0200 Subject: [PATCH 7/9] feat(inference): update quickstart --- pages/managed-inference/quickstart.mdx | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/pages/managed-inference/quickstart.mdx b/pages/managed-inference/quickstart.mdx index 697316fb36..48eb89b0b3 100644 --- a/pages/managed-inference/quickstart.mdx +++ b/pages/managed-inference/quickstart.mdx @@ -38,7 +38,10 @@ Here are some of the key features of Scaleway Managed Inference: 1. Navigate to the **AI & Data** section of the [Scaleway console](https://console.scaleway.com/), and select **Managed Inference** from the side menu to access the Managed Inference dashboard. 2. Click **Create deployment** to launch the deployment creation wizard. 3. Provide the necessary information: - - Select the desired model and the quantization to use for your deployment [from the available options](/managed-inference/reference-content/) + - Select the desired model and the quantization to use for your deployment [from the available options](/managed-inference/reference-content/). + + Scaleway Managed Inference allows you to deploy various AI models, either from the Scaleway catalog or by importing a custom model. For detailed information about supported models, visit our [Supported models in Managed Inference](/managed-inference/reference-content/supported-models/) documentation. + Some models may require acceptance of an end-user license agreement. If prompted, review the terms and conditions and accept the license accordingly. 
From c6d5af76f05bdf344ff41ef570f9262af703c855 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Wed, 9 Apr 2025 12:07:07 +0200 Subject: [PATCH 8/9] feat(infr): update --- pages/managed-inference/reference-content/supported-models.mdx | 2 ++ 1 file changed, 2 insertions(+) diff --git a/pages/managed-inference/reference-content/supported-models.mdx b/pages/managed-inference/reference-content/supported-models.mdx index bbf7418daf..c2239c558b 100644 --- a/pages/managed-inference/reference-content/supported-models.mdx +++ b/pages/managed-inference/reference-content/supported-models.mdx @@ -22,6 +22,8 @@ Scaleway Managed Inference allows you to deploy various AI models, either from: ### Multimodal models (chat + vision) +_More details to be added._ + ### Chat models | Provider | Model identifier | Documentation | License | From f7b9f5a57bbd469fb124c838470b6d53bd78d414 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Fri, 11 Apr 2025 14:42:09 +0200 Subject: [PATCH 9/9] Apply suggestions from code review Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com> --- .../reference-content/supported-models.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pages/managed-inference/reference-content/supported-models.mdx b/pages/managed-inference/reference-content/supported-models.mdx index c2239c558b..845be58327 100644 --- a/pages/managed-inference/reference-content/supported-models.mdx +++ b/pages/managed-inference/reference-content/supported-models.mdx @@ -110,7 +110,7 @@ Depending on the model type, specific endpoints and features will be supported. ### Chat models -Chat API will be exposed for this model under `/v1/chat/completions` endpoint. +The Chat API will be exposed for this model under `/v1/chat/completions` endpoint. **Structured outputs** or **Function calling** are not yet supported for custom models. ### Vision models @@ -129,7 +129,7 @@ Embeddings API will be exposed for this model under `/v1/embeddings` endpoint. ## Custom model lifecycle -Currently, custom model deployments are considered to be valid for a long term, and we will ensure any updatse or changes to Managed Inference will not impact existing deployments. +Currently, custom model deployments are considered to be valid for the long term, and we will ensure any updates or changes to Managed Inference will not impact existing deployments. In case of breaking changes, leading to some custom models not being supported anymore, we will notify you **at least 3 months beforehand**. ## Licensing @@ -142,7 +142,7 @@ Custom models must conform to one of the architectures listed below. Click to ex ## Supported custom model architectures - Custom models deployment currently supports the following model architectures: + Custom model deployment currently supports the following model architectures: * `AquilaModel` * `AquilaForCausalLM` * `ArcticForCausalLM`