samuelokirby/splitstream
Splitstream

Realtime macOS speech-to-text library for performant microphone and system audio transcription. Supports local model transcription and Deepgram. Perfect for meeting apps, note-takers, or ambient computing.

How it works

Splitstream uses macOS' audio tap API, via Rust bindings from cidre, to capture system output. Basic echo cancellation powered by SpeexDSP keeps the mic channel clean even when audio is playing through the speakers, at negligible performance cost.

The result is multithreaded and non-blocking dual transcription that runs seamlessly on low-end devices.
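The split can be pictured as two independent producers feeding one consumer, so neither stream can block the other. A stdlib-only sketch of that shape (illustrative only; Splitstream's real pipeline uses async tasks and actual capture loops):

```rust
use std::sync::mpsc;
use std::thread;

// Each capture source gets its own producer thread; the consumer
// drains a single channel of labeled messages.
#[derive(Debug, PartialEq)]
enum Source {
    Mic,
    Sys,
}

fn capture_both() -> Vec<(Source, String)> {
    let (tx, rx) = mpsc::channel();

    let mic_tx = tx.clone();
    thread::spawn(move || {
        // Stand-in for the microphone capture loop.
        mic_tx.send((Source::Mic, "hello from the mic".to_string())).unwrap();
    });
    thread::spawn(move || {
        // Stand-in for the system-audio (audio tap) capture loop.
        tx.send((Source::Sys, "hello from system audio".to_string())).unwrap();
    });

    // All senders were moved into the threads, so the receiver
    // terminates once both producers finish and drop them.
    rx.into_iter().collect()
}

fn main() {
    for (source, text) in capture_both() {
        println!("{:?}: {}", source, text);
    }
}
```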

🤖 AI Disclaimer

The crucial pieces of this library (core audio pipeline, audio capture, echo cancellation, transcription backends) were designed and written by a human. The docstrings and the refactoring that turned it into a usable library were done with Claude.

Minimum Requirements

  • macOS 14.2+: the audio tap API used for system audio capture requires macOS 14.2 or later
  • Rust 1.80+
  • Xcode Command Line Tools: xcode-select --install
  • cmake: brew install cmake (required to compile audio dependencies)

Getting Started

🦜 Parakeet (local, free, fast)

Parakeet is the default backend. It needs no API key or cloud service and runs entirely on your machine.

Step 1. Add splitstream to your project:

cargo add splitstream

Step 2. Download the Parakeet model by running download-parakeet (one-time setup, ~500MB):

cargo install splitstream --bin download-parakeet
download-parakeet

This downloads the three model files (encoder.onnx, decoder_joint.onnx, tokenizer.json) into ./models/parakeet-eou/ in your current directory.
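If you want to sanity-check the download before starting a stream, a small stdlib-only helper can confirm all three files are present (missing_model_files is a hypothetical helper for illustration, not part of splitstream's API):

```rust
use std::path::Path;

/// Returns the expected Parakeet model files that are missing from `dir`.
/// An empty result means the download looks complete.
/// (Illustrative helper only; not part of splitstream's API.)
fn missing_model_files(dir: &Path) -> Vec<&'static str> {
    ["encoder.onnx", "decoder_joint.onnx", "tokenizer.json"]
        .into_iter()
        .filter(|f| !dir.join(f).is_file())
        .collect()
}

fn main() {
    let dir = Path::new("models/parakeet-eou");
    let missing = missing_model_files(dir);
    if missing.is_empty() {
        println!("Parakeet model ready in {}", dir.display());
    } else {
        eprintln!("Missing model files {:?}; rerun download-parakeet", missing);
    }
}
```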

Step 3. Use it in your Rust project:

# Cargo.toml
[dependencies]
splitstream = "0.1" # version may vary
tokio = { version = "1.52.3", features = ["full"] } # version may vary

// main.rs
use splitstream::{AudioSource, SplitStreamBuilder};

#[tokio::main]
async fn main() {
    let (handle, mut rx) = SplitStreamBuilder::new()
        .with_parakeet("models/parakeet-eou")
        .echo_cancellation(true)
        .start()
        .await
        .expect("failed to start splitstream");

    tokio::spawn(async move {
        tokio::signal::ctrl_c().await.ok();
        handle.shutdown();
    });

    while let Some(t) = rx.recv().await {
        let label = match t.source {
            AudioSource::Mic => "[🎤   Microphone]",
            AudioSource::Sys => "[🖥️ System Audio]",
        };
        println!("{} {}", label, t.text);
    }
}

☁️ Deepgram (cloud, premium, blazing fast)

Deepgram streams audio to the cloud and returns transcripts with very low latency. Requires a Deepgram API key.

Note: the Deepgram backend pulls in Opus encoding, which requires cmake: brew install cmake

Step 1. Add splitstream with the deepgram feature:

cargo add splitstream --features deepgram

Step 2. Set your API key (add to .env or export in your shell):

DEEPGRAM_API_KEY=your_key_here

Step 3. Use it in your Rust project:

# Cargo.toml
[dependencies]
splitstream = "0.1" # version may vary
tokio = { version = "1.52.3", features = ["full"] } # version may vary

// main.rs
use splitstream::{AudioSource, SplitStreamBuilder};

#[tokio::main]
async fn main() {
    let api_key = std::env::var("DEEPGRAM_API_KEY").expect("DEEPGRAM_API_KEY not set");

    let (handle, mut rx) = SplitStreamBuilder::new()
        .with_deepgram(api_key)
        .echo_cancellation(true)
        .start()
        .await
        .expect("failed to start splitstream");

    tokio::spawn(async move {
        tokio::signal::ctrl_c().await.ok();
        handle.shutdown();
    });

    while let Some(t) = rx.recv().await {
        let label = match t.source {
            AudioSource::Mic => "[🎤   Microphone]",
            AudioSource::Sys => "[🖥️ System Audio]",
        };
        println!("{} {}", label, t.text);
    }
}

Transcription Support

| Backend | Method | Feature flag | Notes |
| --- | --- | --- | --- |
| Parakeet | .with_parakeet(model_dir) | parakeet (default) | Local ONNX, no API key needed |
| Deepgram | .with_deepgram(api_key) | deepgram | Cloud, requires API key + cmake |

To use Deepgram only (skips compiling Parakeet/ONNX):

splitstream = { version = "0.1", default-features = false, features = ["deepgram"] }

Mid-stream controls

The handle returned from .start() lets you toggle settings without stopping capture:

handle.set_mic_muted(true);          // silence the mic channel
handle.set_sys_muted(true);          // silence system audio (compliance mode)
handle.set_echo_cancellation(false); // toggle AEC on the fly
handle.shutdown();                   // stop everything cleanly

All of these take effect on the next 20ms tick and don't block.
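Splitstream's internals aren't shown here, but the usual pattern behind tick-based, non-blocking toggles like these is an atomic flag that the handle flips and the audio loop samples once per tick. A toy sketch of that pattern (assumed, not Splitstream's actual implementation):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// The control handle flips a flag; the audio loop reads it once per
// tick, so a toggle takes effect on the next tick without any locking.
#[derive(Clone)]
struct ToyHandle {
    mic_muted: Arc<AtomicBool>,
}

impl ToyHandle {
    fn set_mic_muted(&self, muted: bool) {
        self.mic_muted.store(muted, Ordering::Relaxed); // never blocks
    }
}

fn process_tick(handle: &ToyHandle, samples: &mut [f32]) {
    // Sample the flag once at the start of the tick.
    if handle.mic_muted.load(Ordering::Relaxed) {
        samples.fill(0.0); // drop the mic audio for this tick
    }
}

fn main() {
    let handle = ToyHandle { mic_muted: Arc::new(AtomicBool::new(false)) };
    let mut tick = [0.5_f32; 4];

    process_tick(&handle, &mut tick); // unmuted: samples pass through
    handle.set_mic_muted(true);
    process_tick(&handle, &mut tick); // next tick: samples are zeroed
    println!("after mute: {:?}", tick);
}
```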


Configuration

Splitstream can also be configured via a settings.toml in the working directory:

transcription_backend = "parakeet"   # "parakeet" | "deepgram"
echo_cancellation = true
compliance_mode_on_start = false     # start with sys audio muted

parakeet_model_dir = "./models/parakeet-eou"

# Required for Deepgram — or set DEEPGRAM_API_KEY in your environment
# api_key = "..."

Feature flags

| Flag | Default | What it gates |
| --- | --- | --- |
| parakeet | ✅ on | Parakeet ONNX inference (pulls in ORT + ONNX Runtime) |
| deepgram | ❌ off | Deepgram cloud backend (pulls in Opus, requires cmake) |
| whisper | ❌ off | WIP |

License

Splitstream is dual-licensed:

  • Open source / individuals: AGPL-3.0. Free to use, modify, and distribute, provided your application is also released under AGPL-3.0.
  • Commercial use: to use Splitstream in a proprietary product without open-sourcing your code, a commercial license is required. Contact retaildiamond@gmail.com.

See LICENSE-COMMERCIAL for details.
