diff --git a/pages/managed-inference/reference-content/deepseek-r1-distill-llama-70b.mdx b/pages/managed-inference/reference-content/deepseek-r1-distill-llama-70b.mdx
index f6cf661089..d59a491267 100644
--- a/pages/managed-inference/reference-content/deepseek-r1-distill-llama-70b.mdx
+++ b/pages/managed-inference/reference-content/deepseek-r1-distill-llama-70b.mdx
@@ -19,8 +19,8 @@ categories:
 |-----------------|------------------------------------|
 | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) |
 | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
-| Compatible Instances | H100-2 (BF16) |
-| Context Length | up to 56k tokens |
+| Compatible Instances | H100 (FP8), H100-2 (FP8, BF16) |
+| Context Length | up to 131k tokens |
 
 ## Model names
 
@@ -32,7 +32,8 @@ deepseek/deepseek-r1-distill-llama-70b:bf16
 
 | Instance type | Max context length |
 | ------------- |-------------|
-| H100-2 | 56k (BF16) |
+| H100 | 15k (FP8) |
+| H100-2 | 131k (FP8), 56k (BF16) |
 
 ## Model introduction
 
diff --git a/pages/managed-inference/reference-content/deepseek-r1-distill-llama-8b.mdx b/pages/managed-inference/reference-content/deepseek-r1-distill-llama-8b.mdx
index dd9919bc93..5c6821e94a 100644
--- a/pages/managed-inference/reference-content/deepseek-r1-distill-llama-8b.mdx
+++ b/pages/managed-inference/reference-content/deepseek-r1-distill-llama-8b.mdx
@@ -19,7 +19,7 @@ categories:
 |-----------------|------------------------------------|
 | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
 | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
-| Compatible Instances | L4, L40S, H100 (BF16) |
+| Compatible Instances | L4, L40S, H100 (FP8, BF16) |
 | Context Length | up to 131k tokens |
 
 ## Model names
@@ -32,9 +32,9 @@ deepseek/deepseek-r1-distill-llama-8b:bf16
 
 | Instance type | Max context length |
 | ------------- |-------------|
-| L4 | 39k (BF16) |
-| L40S | 131k (BF16) |
-| H100 | 131k (BF16) |
+| L4 | 90k (FP8), 39k (BF16) |
+| L40S | 131k (FP8, BF16) |
+| H100 | 131k (FP8, BF16) |
 
 ## Model introduction
 
diff --git a/pages/managed-inference/reference-content/llama-3.3-70b-instruct.mdx b/pages/managed-inference/reference-content/llama-3.3-70b-instruct.mdx
index df9402482e..d129843c39 100644
--- a/pages/managed-inference/reference-content/llama-3.3-70b-instruct.mdx
+++ b/pages/managed-inference/reference-content/llama-3.3-70b-instruct.mdx
@@ -19,8 +19,8 @@ categories:
 |-----------------|------------------------------------|
 | Provider | [Meta](https://www.llama.com/) |
 | License | [Llama 3.3 community](https://www.llama.com/llama3_3/license/) |
-| Compatible Instances | H100-2 (BF16) |
-| Context length | Up to 70k tokens |
+| Compatible Instances | H100 (FP8), H100-2 (FP8, BF16) |
+| Context length | Up to 131k tokens |
 
 ## Model names
 
@@ -32,7 +32,8 @@ meta/llama-3.3-70b-instruct:bf16
 
 | Instance type | Max context length |
 | ------------- |-------------|
-| H100-2 | 62k (BF16) |
+| H100 | 15k (FP8) |
+| H100-2 | 131k (FP8), 62k (BF16) |
 
 ## Model introduction
 
@@ -76,4 +77,4 @@ Process the output data according to your application's needs. The response will
 
 <Message type="note">
   Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
-</Message>
\ No newline at end of file
+</Message>
diff --git a/pages/managed-inference/reference-content/mistral-small-24b-instruct-2501.mdx b/pages/managed-inference/reference-content/mistral-small-24b-instruct-2501.mdx
new file mode 100644
index 0000000000..9b19c023d5
--- /dev/null
+++ b/pages/managed-inference/reference-content/mistral-small-24b-instruct-2501.mdx
@@ -0,0 +1,77 @@
+---
+meta:
+  title: Understanding the Mistral-small-24b-instruct-2501 model
+  description: Deploy your own secure Mistral-small-24b-instruct-2501 model with Scaleway Managed Inference. Privacy-focused, fully managed.
+content:
+  h1: Understanding the Mistral-small-24b-instruct-2501 model
+  paragraph: This page provides information on the Mistral-small-24b-instruct-2501 model
+tags:
+dates:
+  validation: 2025-03-04
+  posted: 2025-03-04
+categories:
+  - ai-data
+---
+
+## Model overview
+
+| Attribute | Details |
+|-----------------|------------------------------------|
+| Provider | [Mistral](https://mistral.ai/technology/#models) |
+| Compatible Instances | L40S, H100, H100-2 (FP8) |
+| Context size | 32k tokens |
+
+## Model name
+
+```bash
+mistral/mistral-small-24b-instruct-2501:fp8
+```
+
+## Compatible Instances
+
+| Instance type | Max context length |
+| ------------- |-------------|
+| L40S | 20k (FP8) |
+| H100 | 32k (FP8) |
+| H100-2 | 32k (FP8) |
+
+## Model introduction
+
+Mistral Small 24B Instruct is a state-of-the-art transformer model with 24B parameters, built by Mistral.
+This model is open-weight and distributed under the Apache 2.0 license.
+
+## Why is it useful?
+
+- Mistral Small 24B offers a large context window of up to 32k tokens and provides both conversational and reasoning capabilities.
+- This model supports multiple languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
+- It supersedes Mistral Nemo Instruct, although its token throughput is slightly lower.
+
+## How to use it
+
+### Sending Inference requests
+
+To perform inference tasks with your Mistral model deployed at Scaleway, use the following command:
+
+```bash
+curl -s \
+-H "Authorization: Bearer <IAM API key>" \
+-H "Content-Type: application/json" \
+--request POST \
+--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
+--data '{"model":"mistral/mistral-small-24b-instruct-2501:fp8", "messages":[{"role": "user","content": "Tell me about Scaleway."}], "top_p": 1, "temperature": 0.7, "stream": false}'
+```
+
+Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.
+
+<Message type="note">
+  Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content.
+</Message>
+
+### Receiving Managed Inference responses
+
+Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the Managed Inference server.
+Process the output data according to your application's needs. The response will contain the output generated by the LLM model based on the input provided in the request.
+
+<Message type="note">
+  Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
+</Message>
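
The pages touched above all document the same request pattern: a POST to a `/v1/chat/completions` endpoint on the deployment URL, authenticated with an IAM API key. As a minimal illustrative sketch (not part of the diff itself), the same request shown in the Mistral curl example could also be issued with the OpenAI Python client, assuming the endpoint accepts OpenAI-style chat-completions requests as the payload suggests, and reusing the `<Deployment UUID>` and `<IAM API key>` placeholders from the documentation:

```python
# Illustrative sketch only (not part of this diff): calling a Scaleway Managed Inference
# deployment through its chat-completions endpoint, mirroring the curl example in the
# Mistral page above. Replace the placeholders with a real Deployment UUID and IAM API key.
from openai import OpenAI

client = OpenAI(
    base_url="https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1",  # deployment endpoint, as in the docs
    api_key="<IAM API key>",                                          # IAM API key, as in the docs
)

response = client.chat.completions.create(
    model="mistral/mistral-small-24b-instruct-2501:fp8",
    messages=[{"role": "user", "content": "Tell me about Scaleway."}],
    temperature=0.7,
    top_p=1,
    stream=False,
)

print(response.choices[0].message.content)
```

The `model`, `temperature`, `top_p`, and `stream` values mirror the curl payload; any other model name documented in the pages above could be substituted.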