Skip to content

Add support for Apple Silicon and PyTorch device configuration#4

Merged
pretyflaco merged 3 commits into
pretyflaco:mainfrom
openoms:feat--apple-silicon-support
May 4, 2026
Merged

Add support for Apple Silicon and PyTorch device configuration#4
pretyflaco merged 3 commits into
pretyflaco:mainfrom
openoms:feat--apple-silicon-support

Conversation

@openoms
Copy link
Copy Markdown
Contributor

@openoms openoms commented May 3, 2026

Enhance transcription configuration by adding support for specifying the PyTorch device and integrating MLX Whisper for Apple Silicon, allowing users to choose between ASR backends.

btop evidence of Mac M1 GPU usage:
image

@openoms
Copy link
Copy Markdown
Contributor Author

openoms commented May 3, 2026

@blink-claw-bot review

Copy link
Copy Markdown

@blink-claw-bot blink-claw-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Add support for Apple Silicon and PyTorch device configuration

Nice addition — MLX Whisper support for Apple Silicon is a solid feature. The abstraction into _transcribe_asr is the right approach. Tests are well-structured. A few issues to address:


HIGH

1. Double model load in dual-channel mode (performance regression)

_transcribe_asr() loads and deletes the whisperx model on every call. In _transcribe_dual_channel, it's called twice (mic + system channel). Previously, the model was loaded once and reused for both channels.

This doubles model load time for dual-channel whisperx transcription — that's a multi-second regression on GPU and much worse on CPU.

Suggestion: Either:

  • Accept a pre-loaded model as an optional param to _transcribe_asr, or
  • Split into _load_asr_model() / _run_asr() so dual-channel can reuse the model, or
  • Cache the model within _transcribe_asr for the whisperx path (less clean)

2. whisperx remains a hard dependency even with MLX backend

The mono transcribe() function still calls whisperx.load_audio() unconditionally (line ~1008 in the new code). This means whisperx must be installed even when using MLX as the ASR backend. If someone installs only the [mlx] extra on Apple Silicon without whisperx, it will crash.

Is this intentional? If so, the README should clarify that whisperx is still required (MLX only replaces the transcription step, not audio loading/alignment/diarization). If not, load_audio needs an alternative path for MLX-only installs.


MEDIUM

3. VAD options silently ignored with MLX backend

_transcribe_asr() uses config.vad_onset / config.vad_offset for whisperx but silently ignores them for MLX. If a user passes --vad-onset / --vad-offset with --asr-backend mlx, they get no feedback that their settings have no effect.

Suggestion: Log a warning when VAD options are set but the MLX backend is active.

4. _empty_torch_cache swallows all exceptions

except Exception:
    pass

Per project conventions (and general best practice), catch-and-ignore is a code smell. For cache cleanup it's less critical, but at minimum log the exception at DEBUG level so failures are diagnosable.

5. Redundant double cache clear

After ASR in both mono and dual paths:

_empty_torch_cache(torch, config.device)
_empty_torch_cache(torch, config.torch_device)

When torch_device defaults to device (the common case), this clears the same cache twice. Add a guard:

_empty_torch_cache(torch, config.device)
if config.torch_device != config.device:
    _empty_torch_cache(torch, config.torch_device)

LOW / nits

6. nit: _transcribe_asr uses print() for status messages. Consistent with existing code, but worth noting for a future logging pass.

7. nit: _mlx_available() imports importlib.util inside the function. Since importlib.util is stdlib and always available, this could be a top-level import. The defensive try/except around it is unnecessary.

8. nit: Test test_transcribe_asr_normalizes_mlx_result passes "audio.wav" (a string) as the audio argument. In production, the MLX path in dual-channel receives a numpy array. Consider adding a test case with a numpy-like input to validate both code paths.


What's good

  • Clean separation of ASR backends via _transcribe_asr()
  • _empty_torch_cache properly handles MPS device
  • Auto-detection logic (_apple_silicon() + _mlx_available()) is sensible
  • MLX model alias mapping is complete and well-structured
  • Test coverage is solid: config validation, auto-detection, monkeypatched backend tests
  • README docs are clear and well-placed

Verdict: Request changes — the dual-channel model reload regression (#1) needs to be fixed before merge. The whisperx dependency story (#2) needs at least a documentation clarification.

🤖 Review by Claude Opus 4 via Blink-Claw-Bot

Summary:
- reuse a single WhisperX ASR model for dual-channel transcription
- document that MLX only replaces ASR while WhisperX remains required for audio loading, alignment, and diarization
- warn when custom VAD settings are ignored by MLX
- log torch cache cleanup failures and avoid duplicate cache clears
- clean lint smells in the PR-touched CLI/transcription code

Verification:
- ruff check meet/transcribe.py meet/cli.py tests/test_transcribe.py
- pytest tests/test_transcribe.py
- pytest
- git diff --check
@openoms
Copy link
Copy Markdown
Contributor Author

openoms commented May 3, 2026

@blink-claw-bot review with the new changes addressing the previous comments

@openoms openoms requested a review from blink-claw-bot May 3, 2026 18:57
Copy link
Copy Markdown
Owner

@pretyflaco pretyflaco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict

Approve. Solid, well-scoped Apple Silicon / MLX integration with good test coverage. Happy to merge after one small follow-up below.

What's good

  • Clean ASR-backend abstraction in _transcribe_asr (meet/transcribe.py:771).
  • Dual-channel WhisperX model reuse fix (meet/transcribe.py:860) — addresses prior review feedback about double model load.
  • Torch device split for alignment/diarization throughout the pipeline (meet/transcribe.py:1076).
  • MLX model alias map and auto-detect logic (_apple_silicon, _mlx_available, _MLX_MODEL_ALIASES).
  • VAD-options-ignored warning under MLX backend (meet/transcribe.py:781).
  • README clarifies that MLX only replaces transcription; WhisperX remains required for audio loading, alignment, and diarization (README.md:176).
  • Tests cover MLX result normalization, MLX VAD warning, dual-channel WhisperX model reuse, and config validation (tests/test_transcribe.py).

One small ask before merge

On macOS / Apple Silicon, the CLI --device still defaults to cuda (meet/cli.py:208 and meet/cli.py:471). Combined with auto-MLX selection at meet/transcribe.py:432, a Mac user running meet transcribe file.wav with no flags will hit whisperx.load_align_model(..., device="cuda") (meet/transcribe.py:1076) and fail.

Could you add a short pre-flight notice in both the transcribe and run commands when _apple_silicon() is True and device == "cuda", suggesting --device cpu --torch-device mps? Roughly:

from meet.transcribe import _apple_silicon
if _apple_silicon() and device == "cuda":
    click.echo(
        "Note: detected Apple Silicon. CUDA is not available on macOS — "
        "re-run with `--device cpu --torch-device mps` for best results.",
        err=True,
    )

This keeps the "Apple Silicon support" claim honest without changing global defaults. It belongs in this PR's scope since it's part of the Apple Silicon UX you're adding.

Everything else (auto-defaulting device on Darwin, GUI exposure of --asr-backend / --torch-device / --mlx-model, CI workflow, runtime validation of torch_device, always-on MLX VAD note) we'll handle as separate follow-up issues on our side — no need to expand this PR.

Thanks for the contribution and the btop screenshot proof of M1 GPU usage.

@pretyflaco pretyflaco merged commit 94458f9 into pretyflaco:main May 4, 2026
pretyflaco added a commit that referenced this pull request May 4, 2026
PR #4 added --asr-backend, --torch-device, and --mlx-model to the transcribe
and run CLI commands but did not propagate them to the GUI.  Wire them through
end-to-end and surface a collapsible 'Advanced' settings panel in the recorder
widget so users can toggle them without restarting.

CLI changes (meet gui command):
- Add --asr-backend, --torch-device, --mlx-model options mirroring those
  on transcribe and run.
- Forward them through gui() into launch().

launch() / transcribe_kwargs:
- New asr_backend, torch_device, mlx_model parameters.
- device default flips from 'cuda' to None to match the CLI (consistent with
  the issue #8 auto-default behavior; until that lands the CLI still passes
  through 'cuda' explicitly via the click default).
- transcribe_kwargs dict carries the three new keys so MeetRecorderWindow
  reads them on every run.

GUI:
- New Gtk.Expander 'Advanced' panel with three controls:
    * ASR backend combobox (auto/whisperx/mlx)
    * Torch device combobox (auto/cuda/cpu/mps; 'auto' maps to None so
      the dataclass platform-detects)
    * MLX model entry (free-form text; empty string maps to None so the
      alias resolver picks based on --model)
- Selections write back into self._transcribe_kwargs immediately so the
  next transcription picks them up.
- Expander stays collapsed by default to preserve the compact widget UX.

Tests:
- New tests/test_gui.py verifies launch() threads the three new kwargs into
  the transcribe_kwargs dict, and that defaults resolve to None / 'auto'.
- Skipped when PyGObject is unavailable.

Closes #5

Co-authored-by: pretyflaco <kemal@blinkbtc.com>
pretyflaco pushed a commit that referenced this pull request May 4, 2026
PR #4 was reviewed and merged with no automated checks because the repo
had no CI.  Add a minimal MVP workflow so future PRs surface lint and
focused unit-test results.

Scope (intentionally narrow for first iteration):
- Ubuntu latest, Python 3.12 only (matrix expansion is follow-up).
- Minimal dependency install: ruff, pytest, numpy, click, reportlab,
  requests, and the package itself with --no-deps.  Skips the [dev]
  extra to avoid pulling whisperx + torch (~700MB).
- Ruff scoped to the actively-maintained files (transcribe, cli,
  test_transcribe, test_utils).  Wider scope is tracked as a follow-up
  to this MVP.
- Pytest runs the focused unit tests; the dual-channel dispatch test
  imports whisperx at runtime and is deselected.  tests/test_cli.py is
  run only when present (lands with issue #8).

Closes #9
pretyflaco pushed a commit that referenced this pull request May 4, 2026
PR #4 was reviewed and merged with no automated checks because the repo
had no CI.  Add a minimal MVP workflow so future PRs surface lint and
focused unit-test results.

Scope (intentionally narrow for first iteration):
- Ubuntu latest, Python 3.12 only (matrix expansion is follow-up).
- Minimal dependency install: ruff, pytest, numpy, click, reportlab,
  requests, and the package itself with --no-deps.  Skips the [dev]
  extra to avoid pulling whisperx + torch (~700MB).
- Ruff scoped to the actively-maintained files (transcribe, cli,
  test_transcribe, test_utils).  Wider scope is tracked as a follow-up
  to this MVP.
- Pytest runs the focused unit tests; the dual-channel dispatch test
  imports whisperx at runtime and is deselected.  tests/test_cli.py is
  run only when present (lands with issue #8).

Closes #9
pretyflaco pushed a commit that referenced this pull request May 4, 2026
PR #4 was reviewed and merged with no automated checks because the repo
had no CI.  Add a minimal MVP workflow so future PRs surface lint and
focused unit-test results.

Scope (intentionally narrow for first iteration):
- Ubuntu latest, Python 3.12 only (matrix expansion is follow-up).
- Minimal dependency install: ruff, pytest, numpy, click, reportlab,
  requests, and the package itself with --no-deps.  Skips the [dev]
  extra to avoid pulling whisperx + torch (~700MB).
- Ruff scoped to the actively-maintained files (transcribe, cli,
  test_transcribe, test_utils).  Wider scope is tracked as a follow-up
  to this MVP.
- Pytest runs the focused unit tests; the dual-channel dispatch test
  imports whisperx at runtime and is deselected.  tests/test_cli.py is
  run only when present (lands with issue #8).

Closes #9
pretyflaco added a commit that referenced this pull request May 4, 2026
PR #4 was reviewed and merged with no automated checks because the repo
had no CI.  Add a minimal MVP workflow so future PRs surface lint and
focused unit-test results.

Scope (intentionally narrow for first iteration):
- Ubuntu latest, Python 3.12 only (matrix expansion is follow-up).
- Minimal dependency install: ruff, pytest, numpy, click, reportlab,
  requests, and the package itself with --no-deps.  Skips the [dev]
  extra to avoid pulling whisperx + torch (~700MB).
- Ruff scoped to the actively-maintained files (transcribe, cli,
  test_transcribe, test_utils).  Wider scope is tracked as a follow-up
  to this MVP.
- Pytest runs the focused unit tests; the dual-channel dispatch test
  imports whisperx at runtime and is deselected.  tests/test_cli.py is
  run only when present (lands with issue #8).

Closes #9

Co-authored-by: pretyflaco <kemal@blinkbtc.com>
pretyflaco pushed a commit that referenced this pull request May 4, 2026
Bundles five issue-PR landings:
- #5 Expose ASR backend, torch device, MLX model in GUI
- #6 Always note that MLX backend ignores VAD options
- #7 Validate torch device availability in TranscriptionConfig
- #8 Auto-default device on Apple Silicon
- #9 Minimal CI workflow (ruff + focused pytest)

Built on PR #4 by openoms which added the underlying ASR backend
abstraction and Apple Silicon MLX Whisper integration.
pretyflaco added a commit that referenced this pull request May 5, 2026
…link vezir (#15)

The 'Linux only' framing in the README and REQUIREMENTS.md was accurate
before v0.6.0 but is now misleading.  PR #4 (openoms) and the v0.6.0
follow-ups added MLX Whisper ASR + MPS torch device support, so the
post-capture pipeline (transcribe, label, sync, summarize) now works on
macOS Apple Silicon.  Audio capture (meet record / meet run) still
requires Linux because meet_record.capture shells out to ffmpeg -f pulse
against PulseAudio/PipeWire monitor sources.

README:
- Add CI status and PyPI version badges at the top.
- Rewrite Requirements section to split Linux desktop (full pipeline)
  from macOS Apple Silicon (post-capture pipeline).
- Update transcribe --device option help to note the auto-detected
  default introduced by issue #8.
- Replace 'Linux only' Limitations bullet with a precise per-subcommand
  statement.  Add explicit 'Windows is not supported.'
- FAQ macOS answer rewritten; new FAQ entry cross-links vezir for
  Mac-as-server deployments alongside the Android client.

REQUIREMENTS.md:
- Operating System section bumped to acknowledge macOS Apple Silicon
  support for the post-capture pipeline as of v0.6.0.

No code changes.

Co-authored-by: pretyflaco <kemal@blinkbtc.com>
pretyflaco added a commit that referenced this pull request May 5, 2026
Whisper's auto-detect on sparse / quiet audio (long opening silences,
short initial utterances) is unreliable and can mis-classify English as
ja/zh/ko, producing pages of CJK hallucinations interspersed with the
correctly-transcribed English fragments.  A real Blink dev-sync recorded
on 2026-05-05 was rendered useless this way: the meeting was English but
the .txt/.srt/.json/.summary.md/.pdf came out in Japanese hallucinations.

Letting the user set the meeting language up-front via the GUI avoids
the failure mode without forcing them to drop to the CLI.

Adds a fourth row to the Advanced expander with a Language combobox.
Values cover the alignment-models registry (en/de/fr/es/tr/fa) plus
common Whisper-supported languages (it/pt/nl/ja/zh/ko/ar/ru) and the
'auto' default.  Selection is written back into self._transcribe_kwargs
on change, so the next recording's transcription picks it up.

The CLI --language plumbing (already in place since PR #4) is unchanged;
this PR only surfaces it in the GUI.

Co-authored-by: pretyflaco <kemal@blinkbtc.com>
pretyflaco pushed a commit that referenced this pull request May 24, 2026
The CHANGELOG had a gap between v0.4.2 (2026-04-24) and v0.7.0
(2026-05-08) — three significant releases were never recorded:

  v0.5.0  package split: capture moves to meetscribe-record
  v0.6.0  Apple Silicon support (--asr-backend, --torch-device,
          --mlx-model, auto-defaults).  Includes PR #4 from @openoms.
  v0.6.1  GUI Language dropdown — mitigates Whisper auto-detect
          mis-classifying English audio as CJK on sparse audio

Content synthesized verbatim-where-possible from each GitHub release
body.  Format matches the existing meetscribe CHANGELOG convention
(### Improvements / ### Fixes / ### PRs).

Also: README.md gains a one-line pointer to CHANGELOG.md under the
project intro for discoverability.

No code changes; pure docs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants