mdaiter/rust-node-transcription

test_speech_transcription

This repo wires a macOS Speech framework bridge (Swift) into a Rust library that exposes a Node.js addon for live microphone transcription. The native-live.js script streams mic audio into Rust, forwards it to the Swift analyzer, and prints the transcripts coming back from the native callback.

The flow only targets macOS 26 (Tahoe) and newer because Apple’s asynchronous SpeechAnalyzer APIs require that SDK.

Prerequisites

  • macOS 26 (Tahoe) or newer with the Xcode 26 toolchain (provides the 26.0 SDK referenced below)
  • Rust toolchain (rustup or Homebrew)
  • Node.js 18+
  • Swift compiler (swiftc) from Xcode command-line tools

First install the Node dependencies (only needed once):

npm install

1. Build the Swift speech shim

Compile the Swift bridge against the macOS 26.0 SDK. We drop the artifacts into the Rust target/release directory so the Node addon can dlopen it without extra steps.

mkdir -p target/release swift-module-cache
swiftc -module-cache-path swift-module-cache \
  -emit-library -emit-module -module-name SpeechShim \
  -o target/release/libSpeechShim.dylib swift/SpeechShim.swift \
  -framework Speech -framework AVFoundation -framework CoreMedia \
  -target arm64-apple-macos26.0

That command emits:

  • target/release/libSpeechShim.dylib
  • target/release/SpeechShim.swiftmodule

The Rust layer loads libSpeechShim.dylib at runtime, so keeping it beside the Rust artifacts simplifies things.

2. Build the Rust/Node addon

Build the addon in release mode (it produces target/release/whisper_local.node):

cargo build --release

If you have already built once, rerun this whenever the Rust sources change, or after rebuilding the Swift shim, so the addon picks up the updated dylib.

3. (Optional) Copy products elsewhere

If you want to package the outputs, copy both the Swift dylib and the Node addon together, e.g.:

cp target/release/libSpeechShim.dylib target/release/whisper_local.node /path/to/app/resources/

When running from the repo you can skip this because the next step points DYLD_LIBRARY_PATH at target/release.

4. Run the live transcription script

Point the dynamic loader at the release artifacts and start the Node script. By default it uses the macOS speech analyzer backend (WHISPER_MODE=os).

DYLD_LIBRARY_PATH=target/release \
  WHISPER_MODE=os \
  node native-live.js

  • WHISPER_MODE=os tells Rust to use the Swift Speech bridge.
  • native-live.js captures mono 16 kHz audio from the microphone, converts it to float samples, and pushes them into Rust.
  • The Node console prints mic-buffer diagnostics plus [os] transcription: … lines from the Swift callback.
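The sample conversion step can be sketched as follows. This is an illustrative helper, not the exact code in native-live.js: it assumes the mic delivers little-endian signed 16-bit PCM, which gets scaled into the [-1.0, 1.0] float range before being handed to Rust.

```javascript
// Sketch (assumed detail, not the script's exact code): scale signed
// 16-bit PCM samples into 32-bit floats in the [-1.0, 1.0] range.
function int16ToFloat32(int16Samples) {
  const floats = new Float32Array(int16Samples.length);
  for (let i = 0; i < int16Samples.length; i++) {
    floats[i] = int16Samples[i] / 32768; // 32768 = 2^15, the Int16 magnitude
  }
  return floats;
}

// Example: view a raw mic Buffer as Int16 samples, then convert.
const buf = Buffer.from(new Int16Array([0, 16384, -32768]).buffer);
const pcm = new Int16Array(buf.buffer, buf.byteOffset, buf.length / 2);
console.log(int16ToFloat32(pcm)); // samples become 0, 0.5, -1
```

Dividing by 32768 (rather than 32767) keeps the most negative sample exactly at -1.0, at the cost of positive samples topping out just below 1.0; either convention works as long as the Rust side agrees.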

Troubleshooting

  • If you see sa2_push_pcm_f32 return code=-2, the Swift shim rejected the audio format. The current build expects 16 kHz mono; rebuild the Swift shim and verify the logs ([SpeechShim] expected audio format …) match your mic configuration.
  • If Node cannot load libSpeechShim.dylib, ensure DYLD_LIBRARY_PATH includes the directory containing the dylib or copy it next to whisper_local.node.

Directory map

  • src/lib.rs – Rust entry point; handles the Node bridge and forwards mic buffers to Swift.
  • src/speech_analyzer.rs – Rust FFI bindings to the Swift speech shim.
  • swift/SpeechShim.swift – Async Swift host wrapping Apple’s SpeechAnalyzer/SpeechTranscriber.
  • native-live.js – Demo Node script that captures mic audio and prints transcription callbacks.

Feel free to swap WHISPER_MODE to whisper to experiment with the ONNX model path once macOS transcription is flowing.

About

Transcribe using the macOS speech backend, driven from Rust
