Add Voxtral Realtime Windows WPF application #230

Merged: seyeong-han merged 3 commits into main from add-voxtral-windows-app on Apr 15, 2026

@seyeong-han
Contributor

Summary

Adds a Windows WPF (.NET 8) desktop application for Voxtral Realtime — a real-time speech-to-text transcription tool powered by ExecuTorch. This is the Windows counterpart to the existing macOS SwiftUI application.

Key Features

  • Real-time transcription — Live speech-to-text using the Voxtral model running locally via ExecuTorch
  • Dictation mode — Speak and auto-paste transcribed text into any target application via clipboard
  • Silence detection — Peak-based audio level monitoring with configurable silence threshold and timeout for auto-stop
  • Text processing pipeline — Post-processing with configurable text replacements and snippet expansion
  • Session history — Persistent transcript storage with search and browsing
  • Global hotkey — System-wide keyboard shortcut to start/stop transcription
  • MVVM architecture — Clean separation using CommunityToolkit.Mvvm with ObservableObject view models
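
The silence-detection feature above (peak level, threshold, timeout, auto-stop) can be illustrated with a small sketch. This is not the PR's C# code: it is a language-neutral illustration in Python, and the names `peak_level` and `SilenceMonitor`, the default values, and the assumption that samples are floats normalized to [-1.0, 1.0] are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Sequence


def peak_level(samples: Sequence[float]) -> float:
    """Return the largest absolute sample value in a buffer (0.0 if empty)."""
    return max((abs(s) for s in samples), default=0.0)


@dataclass
class SilenceMonitor:
    """Tracks how long the input has stayed below a peak threshold.

    Hypothetical sketch: threshold and timeout defaults are made-up values,
    not settings taken from the PR.
    """
    threshold: float = 0.02   # peaks below this count as silence (assumed)
    timeout_s: float = 2.0    # silence duration that triggers auto-stop (assumed)
    _silent_for: float = field(default=0.0, init=False)

    def feed(self, samples: Sequence[float], duration_s: float) -> bool:
        """Consume one audio buffer; return True once auto-stop should fire."""
        if peak_level(samples) >= self.threshold:
            self._silent_for = 0.0          # speech detected: reset the timer
        else:
            self._silent_for += duration_s  # accumulate continuous silence
        return self._silent_for >= self.timeout_s
```

A caller would pass each captured buffer and its duration to `feed` and stop recording when it returns `True`. Resetting the timer on any buffer at or above the threshold means only continuous silence counts toward the timeout.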

Architecture

VoxtralRealtime/
├── Models/           # Data models (Session, Snippet, ReplacementEntry, Enums)
├── Converters/       # WPF value converters
├── Services/         # Core services
│   ├── RunnerBridge.cs         # ExecuTorch model bridge
│   ├── AudioCaptureService.cs  # NAudio-based mic capture
│   ├── ClipboardPasteService.cs # Win32 clipboard + paste
│   ├── GlobalHotkeyService.cs  # System-wide hotkey registration
│   ├── TextPipeline.cs         # Post-processing pipeline
│   ├── PersistenceService.cs   # JSON file storage
│   └── AppLogger.cs            # File-based logging
├── ViewModels/       # MVVM view models
│   ├── TranscriptStoreViewModel.cs  # Central state management
│   ├── DictationViewModel.cs        # Dictation flow + silence monitor
│   ├── SettingsViewModel.cs         # App configuration
│   ├── ReplacementStoreViewModel.cs # Text replacements
│   └── SnippetStoreViewModel.cs     # Text snippets
├── Views/            # WPF XAML views
│   ├── MainWindow              # Shell with sidebar + detail layout
│   ├── WelcomeView             # Home/landing page
│   ├── TranscriptView          # Live transcript display
│   ├── SidebarView             # Navigation sidebar
│   ├── RecordingControlsBar    # Toolbar (transcribe/pause/done)
│   ├── DictationWindow         # Floating dictation overlay
│   ├── AudioLevelControl       # Real-time audio level meter
│   ├── SettingsView            # Configuration UI
│   └── *ManagementViews        # Replacement & snippet editors
└── Resources/        # Styles and assets
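
The post-processing step handled by TextPipeline.cs (configurable replacements followed by snippet expansion) might look roughly like the sketch below. It is written in Python purely for illustration and does not reflect the actual C# implementation. The function names, the whole-word case-insensitive matching, and the replacements-before-snippets ordering are all assumptions.

```python
import re
from typing import Mapping


def substitute_phrases(text: str, rules: Mapping[str, str]) -> str:
    """Replace each trigger phrase with its target text.

    Assumed behavior: whole-word, case-insensitive matching, with longer
    triggers applied first so a short trigger cannot shadow a longer one.
    """
    ordered = sorted(rules.items(), key=lambda kv: len(kv[0]), reverse=True)
    for trigger, target in ordered:
        pattern = r"\b" + re.escape(trigger) + r"\b"
        # A function replacement keeps backslashes in `target` literal.
        text = re.sub(pattern, lambda _m, t=target: t, text, flags=re.IGNORECASE)
    return text


def run_pipeline(text: str,
                 replacements: Mapping[str, str],
                 snippets: Mapping[str, str]) -> str:
    """Apply replacements first, then expand snippets (assumed ordering)."""
    text = substitute_phrases(text, replacements)
    return substitute_phrases(text, snippets)
```

For example, `run_pipeline("built with executorch", {"executorch": "ExecuTorch"}, {})` fixes the casing of a product name. Because rules are applied one after another, the output of one rule can be matched by a later rule; a real pipeline would need to decide whether that is desirable.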

Also Included

  • .gitignore updated with .NET/C# build artifact rules (bin/, obj/, publish/, etc.)
  • build.bat for release builds
  • upload_models.py helper for model distribution
  • README.md with setup and build instructions

Test Plan

  • Built and tested manually on Windows 10/11 with .NET 8
  • Verified real-time transcription, dictation auto-paste, silence detection, and session persistence

@seyeong-han seyeong-han requested a review from mergennachin April 8, 2026 22:49
@meta-cla meta-cla Bot added the CLA Signed label (managed by the Meta Open Source bot) Apr 8, 2026
@seyeong-han seyeong-han force-pushed the add-voxtral-windows-app branch from a215713 to 401d3d8 April 8, 2026 23:39
@seyeong-han seyeong-han force-pushed the add-voxtral-windows-app branch from 401d3d8 to 20d57c5 April 9, 2026 00:43
@mergennachin
Contributor

@claude Review this PR

@seyeong-han seyeong-han merged commit 56f5fbf into main Apr 15, 2026
14 checks passed

2 participants