Add support for Apple Silicon and PyTorch device configuration#4
Conversation
|
@blink-claw-bot review |
blink-claw-bot
left a comment
There was a problem hiding this comment.
PR Review: Add support for Apple Silicon and PyTorch device configuration
Nice addition — MLX Whisper support for Apple Silicon is a solid feature. The abstraction into _transcribe_asr is the right approach. Tests are well-structured. A few issues to address:
HIGH
1. Double model load in dual-channel mode (performance regression)
_transcribe_asr() loads and deletes the whisperx model on every call. In _transcribe_dual_channel, it's called twice (mic + system channel). Previously, the model was loaded once and reused for both channels.
This doubles model load time for dual-channel whisperx transcription — that's a multi-second regression on GPU and much worse on CPU.
Suggestion: Either:
- Accept a pre-loaded model as an optional param to
_transcribe_asr, or - Split into
_load_asr_model()/_run_asr()so dual-channel can reuse the model, or - Cache the model within
_transcribe_asrfor the whisperx path (less clean)
2. whisperx remains a hard dependency even with MLX backend
The mono transcribe() function still calls whisperx.load_audio() unconditionally (line ~1008 in the new code). This means whisperx must be installed even when using MLX as the ASR backend. If someone installs only the [mlx] extra on Apple Silicon without whisperx, it will crash.
Is this intentional? If so, the README should clarify that whisperx is still required (MLX only replaces the transcription step, not audio loading/alignment/diarization). If not, load_audio needs an alternative path for MLX-only installs.
MEDIUM
3. VAD options silently ignored with MLX backend
_transcribe_asr() uses config.vad_onset / config.vad_offset for whisperx but silently ignores them for MLX. If a user passes --vad-onset / --vad-offset with --asr-backend mlx, they get no feedback that their settings have no effect.
Suggestion: Log a warning when VAD options are set but the MLX backend is active.
4. _empty_torch_cache swallows all exceptions
except Exception:
passPer project conventions (and general best practice), catch-and-ignore is a code smell. For cache cleanup it's less critical, but at minimum log the exception at DEBUG level so failures are diagnosable.
5. Redundant double cache clear
After ASR in both mono and dual paths:
_empty_torch_cache(torch, config.device)
_empty_torch_cache(torch, config.torch_device)When torch_device defaults to device (the common case), this clears the same cache twice. Add a guard:
_empty_torch_cache(torch, config.device)
if config.torch_device != config.device:
_empty_torch_cache(torch, config.torch_device)LOW / nits
6. nit: _transcribe_asr uses print() for status messages. Consistent with existing code, but worth noting for a future logging pass.
7. nit: _mlx_available() imports importlib.util inside the function. Since importlib.util is stdlib and always available, this could be a top-level import. The defensive try/except around it is unnecessary.
8. nit: Test test_transcribe_asr_normalizes_mlx_result passes "audio.wav" (a string) as the audio argument. In production, the MLX path in dual-channel receives a numpy array. Consider adding a test case with a numpy-like input to validate both code paths.
What's good
- Clean separation of ASR backends via
_transcribe_asr() _empty_torch_cacheproperly handles MPS device- Auto-detection logic (
_apple_silicon() + _mlx_available()) is sensible - MLX model alias mapping is complete and well-structured
- Test coverage is solid: config validation, auto-detection, monkeypatched backend tests
- README docs are clear and well-placed
Verdict: Request changes — the dual-channel model reload regression (#1) needs to be fixed before merge. The whisperx dependency story (#2) needs at least a documentation clarification.
🤖 Review by Claude Opus 4 via Blink-Claw-Bot
Summary: - reuse a single WhisperX ASR model for dual-channel transcription - document that MLX only replaces ASR while WhisperX remains required for audio loading, alignment, and diarization - warn when custom VAD settings are ignored by MLX - log torch cache cleanup failures and avoid duplicate cache clears - clean lint smells in the PR-touched CLI/transcription code Verification: - ruff check meet/transcribe.py meet/cli.py tests/test_transcribe.py - pytest tests/test_transcribe.py - pytest - git diff --check
|
@blink-claw-bot review with the new changes addressing the previous comments |
pretyflaco
left a comment
There was a problem hiding this comment.
Verdict
Approve. Solid, well-scoped Apple Silicon / MLX integration with good test coverage. Happy to merge after one small follow-up below.
What's good
- Clean ASR-backend abstraction in
_transcribe_asr(meet/transcribe.py:771). - Dual-channel WhisperX model reuse fix (
meet/transcribe.py:860) — addresses prior review feedback about double model load. - Torch device split for alignment/diarization throughout the pipeline (
meet/transcribe.py:1076). - MLX model alias map and auto-detect logic (
_apple_silicon,_mlx_available,_MLX_MODEL_ALIASES). - VAD-options-ignored warning under MLX backend (
meet/transcribe.py:781). - README clarifies that MLX only replaces transcription; WhisperX remains required for audio loading, alignment, and diarization (
README.md:176). - Tests cover MLX result normalization, MLX VAD warning, dual-channel WhisperX model reuse, and config validation (
tests/test_transcribe.py).
One small ask before merge
On macOS / Apple Silicon, the CLI --device still defaults to cuda (meet/cli.py:208 and meet/cli.py:471). Combined with auto-MLX selection at meet/transcribe.py:432, a Mac user running meet transcribe file.wav with no flags will hit whisperx.load_align_model(..., device="cuda") (meet/transcribe.py:1076) and fail.
Could you add a short pre-flight notice in both the transcribe and run commands when _apple_silicon() is True and device == "cuda", suggesting --device cpu --torch-device mps? Roughly:
from meet.transcribe import _apple_silicon
if _apple_silicon() and device == "cuda":
click.echo(
"Note: detected Apple Silicon. CUDA is not available on macOS — "
"re-run with `--device cpu --torch-device mps` for best results.",
err=True,
)This keeps the "Apple Silicon support" claim honest without changing global defaults. It belongs in this PR's scope since it's part of the Apple Silicon UX you're adding.
Everything else (auto-defaulting device on Darwin, GUI exposure of --asr-backend / --torch-device / --mlx-model, CI workflow, runtime validation of torch_device, always-on MLX VAD note) we'll handle as separate follow-up issues on our side — no need to expand this PR.
Thanks for the contribution and the btop screenshot proof of M1 GPU usage.
PR #4 added --asr-backend, --torch-device, and --mlx-model to the transcribe and run CLI commands but did not propagate them to the GUI. Wire them through end-to-end and surface a collapsible 'Advanced' settings panel in the recorder widget so users can toggle them without restarting. CLI changes (meet gui command): - Add --asr-backend, --torch-device, --mlx-model options mirroring those on transcribe and run. - Forward them through gui() into launch(). launch() / transcribe_kwargs: - New asr_backend, torch_device, mlx_model parameters. - device default flips from 'cuda' to None to match the CLI (consistent with the issue #8 auto-default behavior; until that lands the CLI still passes through 'cuda' explicitly via the click default). - transcribe_kwargs dict carries the three new keys so MeetRecorderWindow reads them on every run. GUI: - New Gtk.Expander 'Advanced' panel with three controls: * ASR backend combobox (auto/whisperx/mlx) * Torch device combobox (auto/cuda/cpu/mps; 'auto' maps to None so the dataclass platform-detects) * MLX model entry (free-form text; empty string maps to None so the alias resolver picks based on --model) - Selections write back into self._transcribe_kwargs immediately so the next transcription picks them up. - Expander stays collapsed by default to preserve the compact widget UX. Tests: - New tests/test_gui.py verifies launch() threads the three new kwargs into the transcribe_kwargs dict, and that defaults resolve to None / 'auto'. - Skipped when PyGObject is unavailable. Closes #5 Co-authored-by: pretyflaco <kemal@blinkbtc.com>
PR #4 was reviewed and merged with no automated checks because the repo had no CI. Add a minimal MVP workflow so future PRs surface lint and focused unit-test results. Scope (intentionally narrow for first iteration): - Ubuntu latest, Python 3.12 only (matrix expansion is follow-up). - Minimal dependency install: ruff, pytest, numpy, click, reportlab, requests, and the package itself with --no-deps. Skips the [dev] extra to avoid pulling whisperx + torch (~700MB). - Ruff scoped to the actively-maintained files (transcribe, cli, test_transcribe, test_utils). Wider scope is tracked as a follow-up to this MVP. - Pytest runs the focused unit tests; the dual-channel dispatch test imports whisperx at runtime and is deselected. tests/test_cli.py is run only when present (lands with issue #8). Closes #9
PR #4 was reviewed and merged with no automated checks because the repo had no CI. Add a minimal MVP workflow so future PRs surface lint and focused unit-test results. Scope (intentionally narrow for first iteration): - Ubuntu latest, Python 3.12 only (matrix expansion is follow-up). - Minimal dependency install: ruff, pytest, numpy, click, reportlab, requests, and the package itself with --no-deps. Skips the [dev] extra to avoid pulling whisperx + torch (~700MB). - Ruff scoped to the actively-maintained files (transcribe, cli, test_transcribe, test_utils). Wider scope is tracked as a follow-up to this MVP. - Pytest runs the focused unit tests; the dual-channel dispatch test imports whisperx at runtime and is deselected. tests/test_cli.py is run only when present (lands with issue #8). Closes #9
PR #4 was reviewed and merged with no automated checks because the repo had no CI. Add a minimal MVP workflow so future PRs surface lint and focused unit-test results. Scope (intentionally narrow for first iteration): - Ubuntu latest, Python 3.12 only (matrix expansion is follow-up). - Minimal dependency install: ruff, pytest, numpy, click, reportlab, requests, and the package itself with --no-deps. Skips the [dev] extra to avoid pulling whisperx + torch (~700MB). - Ruff scoped to the actively-maintained files (transcribe, cli, test_transcribe, test_utils). Wider scope is tracked as a follow-up to this MVP. - Pytest runs the focused unit tests; the dual-channel dispatch test imports whisperx at runtime and is deselected. tests/test_cli.py is run only when present (lands with issue #8). Closes #9
PR #4 was reviewed and merged with no automated checks because the repo had no CI. Add a minimal MVP workflow so future PRs surface lint and focused unit-test results. Scope (intentionally narrow for first iteration): - Ubuntu latest, Python 3.12 only (matrix expansion is follow-up). - Minimal dependency install: ruff, pytest, numpy, click, reportlab, requests, and the package itself with --no-deps. Skips the [dev] extra to avoid pulling whisperx + torch (~700MB). - Ruff scoped to the actively-maintained files (transcribe, cli, test_transcribe, test_utils). Wider scope is tracked as a follow-up to this MVP. - Pytest runs the focused unit tests; the dual-channel dispatch test imports whisperx at runtime and is deselected. tests/test_cli.py is run only when present (lands with issue #8). Closes #9 Co-authored-by: pretyflaco <kemal@blinkbtc.com>
Bundles five issue-PR landings: - #5 Expose ASR backend, torch device, MLX model in GUI - #6 Always note that MLX backend ignores VAD options - #7 Validate torch device availability in TranscriptionConfig - #8 Auto-default device on Apple Silicon - #9 Minimal CI workflow (ruff + focused pytest) Built on PR #4 by openoms which added the underlying ASR backend abstraction and Apple Silicon MLX Whisper integration.
…link vezir (#15) The 'Linux only' framing in the README and REQUIREMENTS.md was accurate before v0.6.0 but is now misleading. PR #4 (openoms) and the v0.6.0 follow-ups added MLX Whisper ASR + MPS torch device support, so the post-capture pipeline (transcribe, label, sync, summarize) now works on macOS Apple Silicon. Audio capture (meet record / meet run) still requires Linux because meet_record.capture shells out to ffmpeg -f pulse against PulseAudio/PipeWire monitor sources. README: - Add CI status and PyPI version badges at the top. - Rewrite Requirements section to split Linux desktop (full pipeline) from macOS Apple Silicon (post-capture pipeline). - Update transcribe --device option help to note the auto-detected default introduced by issue #8. - Replace 'Linux only' Limitations bullet with a precise per-subcommand statement. Add explicit 'Windows is not supported.' - FAQ macOS answer rewritten; new FAQ entry cross-links vezir for Mac-as-server deployments alongside the Android client. REQUIREMENTS.md: - Operating System section bumped to acknowledge macOS Apple Silicon support for the post-capture pipeline as of v0.6.0. No code changes. Co-authored-by: pretyflaco <kemal@blinkbtc.com>
Whisper's auto-detect on sparse / quiet audio (long opening silences, short initial utterances) is unreliable and can mis-classify English as ja/zh/ko, producing pages of CJK hallucinations interspersed with the correctly-transcribed English fragments. A real Blink dev-sync recorded on 2026-05-05 was rendered useless this way: the meeting was English but the .txt/.srt/.json/.summary.md/.pdf came out in Japanese hallucinations. Letting the user set the meeting language up-front via the GUI avoids the failure mode without forcing them to drop to the CLI. Adds a fourth row to the Advanced expander with a Language combobox. Values cover the alignment-models registry (en/de/fr/es/tr/fa) plus common Whisper-supported languages (it/pt/nl/ja/zh/ko/ar/ru) and the 'auto' default. Selection is written back into self._transcribe_kwargs on change, so the next recording's transcription picks it up. The CLI --language plumbing (already in place since PR #4) is unchanged; this PR only surfaces it in the GUI. Co-authored-by: pretyflaco <kemal@blinkbtc.com>
The CHANGELOG had a gap between v0.4.2 (2026-04-24) and v0.7.0
(2026-05-08) — three significant releases were never recorded:
v0.5.0 package split: capture moves to meetscribe-record
v0.6.0 Apple Silicon support (--asr-backend, --torch-device,
--mlx-model, auto-defaults). Includes PR #4 from @openoms.
v0.6.1 GUI Language dropdown — mitigates Whisper auto-detect
mis-classifying English audio as CJK on sparse audio
Content synthesized verbatim-where-possible from each GitHub release
body. Format matches the existing meetscribe CHANGELOG convention
(### Improvements / ### Fixes / ### PRs).
Also: README.md gains a one-line pointer to CHANGELOG.md under the
project intro for discoverability.
No code changes; pure docs.
Enhance transcription configuration by adding support for specifying the PyTorch device and integrating MLX Whisper for Apple Silicon, allowing users to choose between ASR backends.
btop evidence of Mac M1 GPU usage:
