diff --git a/changelog/june2025/2025-06-25-generative-apis-changed-llama-33-70b-maximum-context-up.mdx b/changelog/june2025/2025-06-25-generative-apis-changed-llama-33-70b-maximum-context-up.mdx
new file mode 100644
index 0000000000..0bbe3be2c9
--- /dev/null
+++ b/changelog/june2025/2025-06-25-generative-apis-changed-llama-33-70b-maximum-context-up.mdx
@@ -0,0 +1,12 @@
+---
+title: Llama 3.3 70B maximum context update
+status: changed
+date: 2025-06-25
+category: ai-data
+product: generative-apis
+---
+
+The maximum context for Llama 3.3 70B is [now reduced to 100k tokens](https://www.scaleway.com/en/docs/generative-apis/reference-content/supported-models/), down from 130k tokens previously.
+This update improves average throughput and time to first token.
+[Managed Inference](https://www.scaleway.com/en/docs/managed-inference/reference-content/model-catalog/) can still be used to support context lengths of up to 130k tokens.
+
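
For clients affected by the new limit, a minimal sketch of guarding a request against the 100k-token ceiling is shown below. It assumes the Generative APIs expose an OpenAI-compatible endpoint at `https://api.scaleway.ai/v1` and a `llama-3.3-70b-instruct` model identifier; verify both against the supported-models page linked in the entry.

```python
# Sketch: calling Llama 3.3 70B via an OpenAI-compatible endpoint while
# staying under the new 100k-token Generative APIs context limit.
# Base URL and model name are assumptions for illustration only.
from openai import OpenAI

MAX_CONTEXT_TOKENS = 100_000  # new Generative APIs limit (was 130_000)

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # assumed endpoint
    api_key="SCW_SECRET_KEY",                # placeholder credential
)

def rough_token_count(text: str) -> int:
    # Crude heuristic (~4 characters per token); use a real tokenizer
    # for accurate counts before relying on this check.
    return len(text) // 4

prompt = "Summarize the following document: ..."
if rough_token_count(prompt) > MAX_CONTEXT_TOKENS:
    raise ValueError(
        "Prompt exceeds the 100k-token limit; consider Managed Inference "
        "for context lengths of up to 130k tokens."
    )

response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",  # assumed model identifier
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```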