Add support for Apple Silicon and PyTorch device configuration by openoms · Pull Request #4 · pretyflaco/millet

openoms · 2026-05-03T07:26:34Z

Enhance transcription configuration by adding support for specifying the PyTorch device and integrating MLX Whisper for Apple Silicon, allowing users to choose between ASR backends.

btop evidence of Mac M1 GPU usage:

…iguration

…backend options

openoms · 2026-05-03T07:29:56Z

@blink-claw-bot review

blink-claw-bot

PR Review: Add support for Apple Silicon and PyTorch device configuration

Nice addition — MLX Whisper support for Apple Silicon is a solid feature. The abstraction into _transcribe_asr is the right approach. Tests are well-structured. A few issues to address:

HIGH

1. Double model load in dual-channel mode (performance regression)

_transcribe_asr() loads and deletes the whisperx model on every call. In _transcribe_dual_channel, it's called twice (mic + system channel). Previously, the model was loaded once and reused for both channels.

This doubles model load time for dual-channel whisperx transcription — that's a multi-second regression on GPU and much worse on CPU.

Suggestion: Either:

Accept a pre-loaded model as an optional param to _transcribe_asr, or
Split into _load_asr_model() / _run_asr() so dual-channel can reuse the model, or
Cache the model within _transcribe_asr for the whisperx path (less clean)

2. whisperx remains a hard dependency even with MLX backend

The mono transcribe() function still calls whisperx.load_audio() unconditionally (line ~1008 in the new code). This means whisperx must be installed even when using MLX as the ASR backend. If someone installs only the [mlx] extra on Apple Silicon without whisperx, it will crash.

Is this intentional? If so, the README should clarify that whisperx is still required (MLX only replaces the transcription step, not audio loading/alignment/diarization). If not, load_audio needs an alternative path for MLX-only installs.

MEDIUM

3. VAD options silently ignored with MLX backend

_transcribe_asr() uses config.vad_onset / config.vad_offset for whisperx but silently ignores them for MLX. If a user passes --vad-onset / --vad-offset with --asr-backend mlx, they get no feedback that their settings have no effect.

Suggestion: Log a warning when VAD options are set but the MLX backend is active.

4. _empty_torch_cache swallows all exceptions

except Exception:
    pass

Per project conventions (and general best practice), catch-and-ignore is a code smell. For cache cleanup it's less critical, but at minimum log the exception at DEBUG level so failures are diagnosable.

5. Redundant double cache clear

After ASR in both mono and dual paths:

_empty_torch_cache(torch, config.device)
_empty_torch_cache(torch, config.torch_device)

When torch_device defaults to device (the common case), this clears the same cache twice. Add a guard:

_empty_torch_cache(torch, config.device)
if config.torch_device != config.device:
    _empty_torch_cache(torch, config.torch_device)

LOW / nits

6. nit: _transcribe_asr uses print() for status messages. Consistent with existing code, but worth noting for a future logging pass.

7. nit: _mlx_available() imports importlib.util inside the function. Since importlib.util is stdlib and always available, this could be a top-level import. The defensive try/except around it is unnecessary.

8. nit: Test test_transcribe_asr_normalizes_mlx_result passes "audio.wav" (a string) as the audio argument. In production, the MLX path in dual-channel receives a numpy array. Consider adding a test case with a numpy-like input to validate both code paths.

What's good

Clean separation of ASR backends via _transcribe_asr()
_empty_torch_cache properly handles MPS device
Auto-detection logic (_apple_silicon() + _mlx_available()) is sensible
MLX model alias mapping is complete and well-structured
Test coverage is solid: config validation, auto-detection, monkeypatched backend tests
README docs are clear and well-placed

Verdict: Request changes — the dual-channel model reload regression (#1) needs to be fixed before merge. The whisperx dependency story (#2) needs at least a documentation clarification.

🤖 Review by Claude Opus 4 via Blink-Claw-Bot

Summary: - reuse a single WhisperX ASR model for dual-channel transcription - document that MLX only replaces ASR while WhisperX remains required for audio loading, alignment, and diarization - warn when custom VAD settings are ignored by MLX - log torch cache cleanup failures and avoid duplicate cache clears - clean lint smells in the PR-touched CLI/transcription code Verification: - ruff check meet/transcribe.py meet/cli.py tests/test_transcribe.py - pytest tests/test_transcribe.py - pytest - git diff --check

openoms · 2026-05-03T09:26:49Z

@blink-claw-bot review with the new changes addressing the previous comments

pretyflaco

Verdict

Approve. Solid, well-scoped Apple Silicon / MLX integration with good test coverage. Happy to merge after one small follow-up below.

What's good

Clean ASR-backend abstraction in _transcribe_asr (meet/transcribe.py:771).
Dual-channel WhisperX model reuse fix (meet/transcribe.py:860) — addresses prior review feedback about double model load.
Torch device split for alignment/diarization throughout the pipeline (meet/transcribe.py:1076).
MLX model alias map and auto-detect logic (_apple_silicon, _mlx_available, _MLX_MODEL_ALIASES).
VAD-options-ignored warning under MLX backend (meet/transcribe.py:781).
README clarifies that MLX only replaces transcription; WhisperX remains required for audio loading, alignment, and diarization (README.md:176).
Tests cover MLX result normalization, MLX VAD warning, dual-channel WhisperX model reuse, and config validation (tests/test_transcribe.py).

One small ask before merge

On macOS / Apple Silicon, the CLI --device still defaults to cuda (meet/cli.py:208 and meet/cli.py:471). Combined with auto-MLX selection at meet/transcribe.py:432, a Mac user running meet transcribe file.wav with no flags will hit whisperx.load_align_model(..., device="cuda") (meet/transcribe.py:1076) and fail.

Could you add a short pre-flight notice in both the transcribe and run commands when _apple_silicon() is True and device == "cuda", suggesting --device cpu --torch-device mps? Roughly:

from meet.transcribe import _apple_silicon
if _apple_silicon() and device == "cuda":
    click.echo(
        "Note: detected Apple Silicon. CUDA is not available on macOS — "
        "re-run with `--device cpu --torch-device mps` for best results.",
        err=True,
    )

This keeps the "Apple Silicon support" claim honest without changing global defaults. It belongs in this PR's scope since it's part of the Apple Silicon UX you're adding.

Everything else (auto-defaulting device on Darwin, GUI exposure of --asr-backend / --torch-device / --mlx-model, CI workflow, runtime validation of torch_device, always-on MLX VAD note) we'll handle as separate follow-up issues on our side — no need to expand this PR.

Thanks for the contribution and the btop screenshot proof of M1 GPU usage.

PR #4 added --asr-backend, --torch-device, and --mlx-model to the transcribe and run CLI commands but did not propagate them to the GUI. Wire them through end-to-end and surface a collapsible 'Advanced' settings panel in the recorder widget so users can toggle them without restarting. CLI changes (meet gui command): - Add --asr-backend, --torch-device, --mlx-model options mirroring those on transcribe and run. - Forward them through gui() into launch(). launch() / transcribe_kwargs: - New asr_backend, torch_device, mlx_model parameters. - device default flips from 'cuda' to None to match the CLI (consistent with the issue #8 auto-default behavior; until that lands the CLI still passes through 'cuda' explicitly via the click default). - transcribe_kwargs dict carries the three new keys so MeetRecorderWindow reads them on every run. GUI: - New Gtk.Expander 'Advanced' panel with three controls: * ASR backend combobox (auto/whisperx/mlx) * Torch device combobox (auto/cuda/cpu/mps; 'auto' maps to None so the dataclass platform-detects) * MLX model entry (free-form text; empty string maps to None so the alias resolver picks based on --model) - Selections write back into self._transcribe_kwargs immediately so the next transcription picks them up. - Expander stays collapsed by default to preserve the compact widget UX. Tests: - New tests/test_gui.py verifies launch() threads the three new kwargs into the transcribe_kwargs dict, and that defaults resolve to None / 'auto'. - Skipped when PyGObject is unavailable. Closes #5 Co-authored-by: pretyflaco <kemal@blinkbtc.com>

PR #4 was reviewed and merged with no automated checks because the repo had no CI. Add a minimal MVP workflow so future PRs surface lint and focused unit-test results. Scope (intentionally narrow for first iteration): - Ubuntu latest, Python 3.12 only (matrix expansion is follow-up). - Minimal dependency install: ruff, pytest, numpy, click, reportlab, requests, and the package itself with --no-deps. Skips the [dev] extra to avoid pulling whisperx + torch (~700MB). - Ruff scoped to the actively-maintained files (transcribe, cli, test_transcribe, test_utils). Wider scope is tracked as a follow-up to this MVP. - Pytest runs the focused unit tests; the dual-channel dispatch test imports whisperx at runtime and is deselected. tests/test_cli.py is run only when present (lands with issue #8). Closes #9

PR #4 was reviewed and merged with no automated checks because the repo had no CI. Add a minimal MVP workflow so future PRs surface lint and focused unit-test results. Scope (intentionally narrow for first iteration): - Ubuntu latest, Python 3.12 only (matrix expansion is follow-up). - Minimal dependency install: ruff, pytest, numpy, click, reportlab, requests, and the package itself with --no-deps. Skips the [dev] extra to avoid pulling whisperx + torch (~700MB). - Ruff scoped to the actively-maintained files (transcribe, cli, test_transcribe, test_utils). Wider scope is tracked as a follow-up to this MVP. - Pytest runs the focused unit tests; the dual-channel dispatch test imports whisperx at runtime and is deselected. tests/test_cli.py is run only when present (lands with issue #8). Closes #9 Co-authored-by: pretyflaco <kemal@blinkbtc.com>

Bundles five issue-PR landings: - #5 Expose ASR backend, torch device, MLX model in GUI - #6 Always note that MLX backend ignores VAD options - #7 Validate torch device availability in TranscriptionConfig - #8 Auto-default device on Apple Silicon - #9 Minimal CI workflow (ruff + focused pytest) Built on PR #4 by openoms which added the underlying ASR backend abstraction and Apple Silicon MLX Whisper integration.

…link vezir (#15) The 'Linux only' framing in the README and REQUIREMENTS.md was accurate before v0.6.0 but is now misleading. PR #4 (openoms) and the v0.6.0 follow-ups added MLX Whisper ASR + MPS torch device support, so the post-capture pipeline (transcribe, label, sync, summarize) now works on macOS Apple Silicon. Audio capture (meet record / meet run) still requires Linux because meet_record.capture shells out to ffmpeg -f pulse against PulseAudio/PipeWire monitor sources. README: - Add CI status and PyPI version badges at the top. - Rewrite Requirements section to split Linux desktop (full pipeline) from macOS Apple Silicon (post-capture pipeline). - Update transcribe --device option help to note the auto-detected default introduced by issue #8. - Replace 'Linux only' Limitations bullet with a precise per-subcommand statement. Add explicit 'Windows is not supported.' - FAQ macOS answer rewritten; new FAQ entry cross-links vezir for Mac-as-server deployments alongside the Android client. REQUIREMENTS.md: - Operating System section bumped to acknowledge macOS Apple Silicon support for the post-capture pipeline as of v0.6.0. No code changes. Co-authored-by: pretyflaco <kemal@blinkbtc.com>

Whisper's auto-detect on sparse / quiet audio (long opening silences, short initial utterances) is unreliable and can mis-classify English as ja/zh/ko, producing pages of CJK hallucinations interspersed with the correctly-transcribed English fragments. A real Blink dev-sync recorded on 2026-05-05 was rendered useless this way: the meeting was English but the .txt/.srt/.json/.summary.md/.pdf came out in Japanese hallucinations. Letting the user set the meeting language up-front via the GUI avoids the failure mode without forcing them to drop to the CLI. Adds a fourth row to the Advanced expander with a Language combobox. Values cover the alignment-models registry (en/de/fr/es/tr/fa) plus common Whisper-supported languages (it/pt/nl/ja/zh/ko/ar/ru) and the 'auto' default. Selection is written back into self._transcribe_kwargs on change, so the next recording's transcription picks it up. The CLI --language plumbing (already in place since PR #4) is unchanged; this PR only surfaces it in the GUI. Co-authored-by: pretyflaco <kemal@blinkbtc.com>

@openoms

The CHANGELOG had a gap between v0.4.2 (2026-04-24) and v0.7.0 (2026-05-08) — three significant releases were never recorded: v0.5.0 package split: capture moves to meetscribe-record v0.6.0 Apple Silicon support (--asr-backend, --torch-device, --mlx-model, auto-defaults). Includes PR #4 from @openoms. v0.6.1 GUI Language dropdown — mitigates Whisper auto-detect mis-classifying English audio as CJK on sparse audio Content synthesized verbatim-where-possible from each GitHub release body. Format matches the existing meetscribe CHANGELOG convention (### Improvements / ### Fixes / ### PRs). Also: README.md gains a one-line pointer to CHANGELOG.md under the project intro for discoverability. No code changes; pure docs.

openoms added 2 commits May 3, 2026 07:56

feat: add support for specifying PyTorch device in transcription conf…

d7d67e1

…iguration

feat: add Apple Silicon support with MLX Whisper integration and ASR …

2066c1a

…backend options

openoms mentioned this pull request May 3, 2026

feat: add device and compute type configuration for meet transcribe to run the server on MacOS pretyflaco/vezir#1

Merged

blink-claw-bot suggested changes May 3, 2026

View reviewed changes

openoms requested a review from blink-claw-bot May 3, 2026 18:57

pretyflaco approved these changes May 4, 2026

View reviewed changes

pretyflaco merged commit 94458f9 into pretyflaco:main May 4, 2026

pretyflaco mentioned this pull request May 5, 2026

docs: reflect macOS Apple Silicon support; add CI/PyPI badges #15

Merged

pretyflaco mentioned this pull request May 5, 2026

feat(gui): add Language dropdown to Advanced settings panel #16

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for Apple Silicon and PyTorch device configuration#4

Add support for Apple Silicon and PyTorch device configuration#4
pretyflaco merged 3 commits into
pretyflaco:mainfrom
openoms:feat--apple-silicon-support

openoms commented May 3, 2026 •

edited

Loading

Uh oh!

openoms commented May 3, 2026

Uh oh!

blink-claw-bot left a comment

Uh oh!

openoms commented May 3, 2026

Uh oh!

pretyflaco left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

openoms commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openoms commented May 3, 2026

Uh oh!

blink-claw-bot left a comment

Choose a reason for hiding this comment

PR Review: Add support for Apple Silicon and PyTorch device configuration

HIGH

MEDIUM

LOW / nits

What's good

Uh oh!

openoms commented May 3, 2026

Uh oh!

pretyflaco left a comment

Choose a reason for hiding this comment

Verdict

What's good

One small ask before merge

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

openoms commented May 3, 2026 •

edited

Loading