Voxtral Realtime: enable streaming mode in CUDA CI #17844
Remove the `vr-offline` override so the CUDA CI runs Voxtral Realtime in streaming mode (the default). The streaming encoder path exercises the full pipeline, including the ring-buffer KV cache and incremental mel processing.
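The PR does not show the cache implementation. As a rough illustration of what a ring-buffer KV cache does, here is a minimal Python sketch. `RingKVCache` and its methods are hypothetical and are not the ExecuTorch or Voxtral code. A fixed number of slots is reused modulo the capacity, so once the buffer is full each new token overwrites the oldest one and memory stays bounded during streaming. Wrap-around logic like this is the kind of path a short offline run may never reach.

```python
from typing import List, Tuple

Vector = List[float]


class RingKVCache:
    """Hypothetical sliding-window KV cache backed by a fixed-size ring buffer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.keys: List[Vector] = [[] for _ in range(capacity)]
        self.values: List[Vector] = [[] for _ in range(capacity)]
        self.pos = 0  # total number of tokens written so far (absolute position)

    def append(self, k: Vector, v: Vector) -> None:
        # Wrap around: once full, the slot for the new token holds the oldest token.
        slot = self.pos % self.capacity
        self.keys[slot] = k
        self.values[slot] = v
        self.pos += 1

    def window(self) -> Tuple[List[Vector], List[Vector]]:
        """Return the retained keys and values, oldest first."""
        n = min(self.pos, self.capacity)
        start = self.pos - n  # absolute position of the oldest retained token
        order = [(start + i) % self.capacity for i in range(n)]
        return [self.keys[s] for s in order], [self.values[s] for s in order]


# Usage: write 5 tokens into a 3-slot cache; only the last 3 survive.
cache = RingKVCache(capacity=3)
for t in range(5):
    cache.append([float(t)], [float(t) * 10])
keys, values = cache.window()
print(keys)  # [[2.0], [3.0], [4.0]]
```

Ordering the window by absolute position (rather than by slot index) keeps attention inputs chronological even after the buffer has wrapped.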
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17844
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 Unrelated Failures)
As of commit 9614d5c with merge base 6db7f4c:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR updates the CUDA CI workflow to run Voxtral Realtime in its default streaming mode by removing the explicit `vr-offline` override. This helps exercise the streaming encoder pipeline in CUDA CI (e.g., incremental mel processing and the ring-buffer KV cache).
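Incremental mel processing is likewise not shown in the PR. The framing step it depends on can be sketched in Python as follows. The sketch assumes a 400-sample window and a 160-sample hop (25 ms / 10 ms at 16 kHz, Whisper-style values chosen only for illustration; the actual Voxtral Realtime parameters are not stated here), and `StreamingFramer` is hypothetical. Audio arrives in arbitrary chunks, every complete window is emitted as soon as it is available, and the overlap is carried over to the next chunk. The FFT and mel filterbank that would run on each frame are omitted.

```python
from typing import List


class StreamingFramer:
    """Hypothetical incremental framer: turns arbitrary audio chunks into analysis frames.

    Window/hop defaults are illustrative assumptions, not Voxtral's confirmed settings.
    """

    def __init__(self, win_length: int = 400, hop_length: int = 160) -> None:
        self.win_length = win_length
        self.hop_length = hop_length
        self.buffer: List[float] = []  # samples not yet fully consumed by framing

    def push(self, chunk: List[float]) -> List[List[float]]:
        """Append a chunk and return every frame that is now complete."""
        self.buffer.extend(chunk)
        frames: List[List[float]] = []
        while len(self.buffer) >= self.win_length:
            frames.append(self.buffer[: self.win_length])
            # Advance by one hop; the remaining overlap stays for the next frame.
            del self.buffer[: self.hop_length]
        return frames


# Usage: framing in 123-sample chunks matches framing the whole signal at once.
signal = [float(i) for i in range(1000)]
offline = StreamingFramer().push(signal)
framer = StreamingFramer()
streamed: List[List[float]] = []
for i in range(0, len(signal), 123):
    streamed.extend(framer.push(signal[i : i + 123]))
print(len(streamed), streamed == offline)  # 4 True
```

Because frame boundaries depend only on how many samples have accumulated, chunk size does not change the output, which is the property a streaming CI run can check against an offline pass.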
Changes:
- Remove the workflow logic that forces Voxtral Realtime into `vr-offline` mode during CUDA artifact export.
- Remove the workflow logic that forces Voxtral Realtime into `vr-offline` mode during CUDA e2e testing.
- Update related workflow comments to no longer mention "offline mode".