diff --git a/ai-data/managed-inference/reference-content/moshika-0.1-8b.mdx b/ai-data/managed-inference/reference-content/moshika-0.1-8b.mdx
new file mode 100644
index 0000000000..61e29e6efb
--- /dev/null
+++ b/ai-data/managed-inference/reference-content/moshika-0.1-8b.mdx
@@ -0,0 +1,87 @@
+---
+meta:
+  title: Understanding the Moshika-0.1-8b model
+  description: Deploy your own secure Moshika-0.1-8b model with Scaleway Managed Inference. Privacy-focused, fully managed.
+content:
+  h1: Understanding the Moshika-0.1-8b model
+  paragraph: This page provides information on the Moshika-0.1-8b model
+tags:
+dates:
+  validation: 2024-10-30
+  posted: 2024-10-30
+categories:
+  - ai-data
+---
+
+## Model overview
+
+| Attribute            | Details                                        |
+|----------------------|------------------------------------------------|
+| Provider             | [Kyutai](https://github.com/kyutai-labs/moshi) |
+| Compatible Instances | L4, H100 (FP8, BF16)                           |
+| Context size         | 4096 tokens                                    |
+
+## Model names
+
+```bash
+kyutai/moshika-0.1-8b:bf16
+kyutai/moshika-0.1-8b:fp8
+```
+
+## Compatible Instances
+
+| Instance type | Max context length |
+| ------------- | ------------------ |
+| L4            | 4096 (FP8, BF16)   |
+| H100          | 4096 (FP8, BF16)   |
+
+## Model introduction
+
+Kyutai's Moshi is a speech-text foundation model for real-time dialogue.
+Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity.
+While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time, full-duplex spoken large language model.
+Moshika is the variant of Moshi with a female voice in English.
+
+## Why is it useful?
+
+Moshi offers seamless real-time dialogue capabilities, enabling users to engage in natural conversations with the model.
+It allows the modeling of arbitrary conversational dynamics, including overlapping speech, interruptions, interjections, and more.
+In particular, this model:
+- Processes 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps, performing better than existing non-streaming models.
+- Achieves a theoretical latency of 160 ms, with a practical latency of 200 ms, making it suitable for real-time applications.
+
+## How to use it
+
+To perform inference tasks with your Moshi deployment at Scaleway, use the WebSocket API exposed for real-time dialogue at the following endpoint:
+
+```bash
+wss://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat
+```
+
+### Testing the WebSocket endpoint
+
+To test the endpoint, use the following command:
+
+```bash
+curl -i --http1.1 \
+-H "Authorization: Bearer <IAM API key>" \
+-H "Connection: Upgrade" \
+-H "Upgrade: websocket" \
+-H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
+-H "Sec-WebSocket-Version: 13" \
+--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat"
+```
+
+Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/identity-and-access-management/iam/how-to/create-api-keys/) and the UUID of the deployment you are targeting.
+
+<Tip>
+  If headers are not supported (e.g., in a browser), authentication can be done using the `token` query parameter, which should be set to your IAM API key.
+</Tip>
+
+The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection.
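+
+As an illustration, the same handshake can be checked from Python. The sketch below only opens the connection (it does not implement the audio streaming exchange itself) and assumes the third-party `websockets` package is installed; the deployment UUID and API key values are placeholders:
+
+```python
+# Minimal connectivity check for a Moshika deployment, assuming the `websockets`
+# package (pip install websockets). Releases before 14.0 name the header argument
+# `extra_headers` instead of `additional_headers`.
+import asyncio
+import websockets
+
+DEPLOYMENT_UUID = "<Deployment UUID>"  # replace with your Deployment UUID
+IAM_API_KEY = "<IAM API key>"          # replace with your IAM API key
+
+URL = f"wss://{DEPLOYMENT_UUID}.ifr.fr-par.scaleway.com/api/chat"
+
+async def main():
+    async with websockets.connect(
+        URL,
+        additional_headers={"Authorization": f"Bearer {IAM_API_KEY}"},
+    ) as ws:
+        # Reaching this point means the HTTP 101 upgrade succeeded.
+        print("WebSocket connection established")
+
+asyncio.run(main())
+```
+
+For the full streaming protocol, refer to the client examples below.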
+
+### Interacting with the model
+
+We provide code samples in several programming languages (Python, Rust, TypeScript) for interacting with the model through the WebSocket API, as well as a simple web interface.
+These code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples).
+This repository contains instructions on how to run the code samples and interact with the model.
\ No newline at end of file
diff --git a/ai-data/managed-inference/reference-content/moshiko-0.1-8b.mdx b/ai-data/managed-inference/reference-content/moshiko-0.1-8b.mdx
new file mode 100644
index 0000000000..967d23d885
--- /dev/null
+++ b/ai-data/managed-inference/reference-content/moshiko-0.1-8b.mdx
@@ -0,0 +1,86 @@
+---
+meta:
+  title: Understanding the Moshiko-0.1-8b model
+  description: Deploy your own secure Moshiko-0.1-8b model with Scaleway Managed Inference. Privacy-focused, fully managed.
+content:
+  h1: Understanding the Moshiko-0.1-8b model
+  paragraph: This page provides information on the Moshiko-0.1-8b model
+tags:
+dates:
+  validation: 2024-10-30
+categories:
+  - ai-data
+---
+
+## Model overview
+
+| Attribute            | Details                                        |
+|----------------------|------------------------------------------------|
+| Provider             | [Kyutai](https://github.com/kyutai-labs/moshi) |
+| Compatible Instances | L4, H100 (FP8, BF16)                           |
+| Context size         | 4096 tokens                                    |
+
+## Model names
+
+```bash
+kyutai/moshiko-0.1-8b:bf16
+kyutai/moshiko-0.1-8b:fp8
+```
+
+## Compatible Instances
+
+| Instance type | Max context length |
+| ------------- | ------------------ |
+| L4            | 4096 (FP8, BF16)   |
+| H100          | 4096 (FP8, BF16)   |
+
+## Model introduction
+
+Kyutai's Moshi is a speech-text foundation model for real-time dialogue.
+Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity.
+While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time, full-duplex spoken large language model.
+Moshiko is the variant of Moshi with a male voice in English.
+
+## Why is it useful?
+
+Moshi offers seamless real-time dialogue capabilities, enabling users to engage in natural conversations with the model.
+It allows the modeling of arbitrary conversational dynamics, including overlapping speech, interruptions, interjections, and more.
+In particular, this model:
+- Processes 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps, performing better than existing non-streaming models.
+- Achieves a theoretical latency of 160 ms, with a practical latency of 200 ms, making it suitable for real-time applications.
+
+## How to use it
+
+To perform inference tasks with your Moshi deployment at Scaleway, use the WebSocket API exposed for real-time dialogue at the following endpoint:
+
+```bash
+wss://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat
+```
+
+### Testing the WebSocket endpoint
+
+To test the endpoint, use the following command:
+
+```bash
+curl -i --http1.1 \
+-H "Authorization: Bearer <IAM API key>" \
+-H "Connection: Upgrade" \
+-H "Upgrade: websocket" \
+-H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
+-H "Sec-WebSocket-Version: 13" \
+--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat"
+```
+
+Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/identity-and-access-management/iam/how-to/create-api-keys/) and the UUID of the deployment you are targeting.
+
+<Tip>
+  If headers are not supported (e.g., in a browser), authentication can be done using the `token` query parameter, which should be set to your IAM API key.
+</Tip>
+
+The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection.
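+
+As an illustration, the connection can also be opened from Python using the `token` query parameter mentioned above. This is only a minimal connection sketch (it does not implement the audio streaming exchange itself) and assumes the third-party `websockets` package is installed; the deployment UUID and API key values are placeholders:
+
+```python
+# Minimal connectivity check for a Moshiko deployment, authenticating with the
+# `token` query parameter instead of the Authorization header. Assumes the
+# `websockets` package (pip install websockets).
+import asyncio
+import websockets
+
+DEPLOYMENT_UUID = "<Deployment UUID>"  # replace with your Deployment UUID
+IAM_API_KEY = "<IAM API key>"          # replace with your IAM API key
+
+URL = f"wss://{DEPLOYMENT_UUID}.ifr.fr-par.scaleway.com/api/chat?token={IAM_API_KEY}"
+
+async def main():
+    async with websockets.connect(URL) as ws:
+        # Reaching this point means the HTTP 101 upgrade succeeded.
+        print("WebSocket connection established")
+
+asyncio.run(main())
+```
+
+For the full streaming protocol, refer to the client examples below.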
+
+### Interacting with the model
+
+We provide code samples in several programming languages (Python, Rust, TypeScript) for interacting with the model through the WebSocket API, as well as a simple web interface.
+These code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples).
+This repository contains instructions on how to run the code samples and interact with the model.
\ No newline at end of file