v0.2.0

@fcwu released this 03 May 07:54
· 26 commits to main since this release

Release v0.2.0

Notable changes since v0.1.2:

CLI

  • aureka download: pre-fetch all model weights (Kokoro + faster-whisper)
  • aureka benchmark: measure ASR/TTS/LLM speed and write a Markdown report
  • aureka type --no-streaming: opt out of streaming
  • aureka ui: settings UI via pywebview
  • aureka type / aureka speak: new --speed option

ASR

  • VAD-segmented streaming via silero-vad: partial transcripts arrive while
    you are still talking
  • In refine/translate modes, partial transcripts go to stderr only, and
    only the final refined text is inserted at the cursor
  • Phase events (transcribing / finalizing / refining) so the client can
    show what the daemon is doing
  • faster-whisper model is now configurable via [asr] model in config.toml

TTS / Daemon

  • POST /speak endpoint: shared warm Kokoro pipeline across clients
  • POST /reload: re-read config without restarting the daemon
  • aureka speak is now daemon-aware and streams over HTTP when the daemon
    is up
  • System tray menu (pystray) with daemon controls
  • Cross-platform launch-at-login (macOS launchd, Windows Task Scheduler)
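
A client could talk to the two new endpoints roughly as sketched below. The notes only name POST /speak and POST /reload, so the base URL, the port, and the JSON body with a "text" field are all assumptions made for illustration. Check the daemon's actual request format before relying on this.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the daemon address is not stated in the release notes.
BASE_URL = "http://127.0.0.1:8000"


def build_post(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request to the daemon."""
    if payload is None:
        body, headers = b"", {}
    else:
        body = json.dumps(payload).encode("utf-8")
        headers = {"Content-Type": "application/json"}
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=headers, method="POST"
    )


# Hypothetical payload shape: a JSON object with a "text" field.
speak_request = build_post("/speak", {"text": "Hello from the daemon"})
reload_request = build_post("/reload")

# To actually send a request while the daemon is running:
#   with urllib.request.urlopen(speak_request, timeout=30) as response:
#       audio_or_status = response.read()
```

Building the request separately from sending it keeps the sketch runnable without a live daemon.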

LLM refine

  • Stricter system prompt + few-shot examples to suppress reasoning prelude
  • Detect truncated-mid-thinking responses and fall back to raw transcript
  • max_tokens / thinking_budget knobs in [llm] config
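
The truncation fallback can be pictured with the sketch below. It assumes the model wraps its reasoning in <think>...</think> tags, which these notes do not state, and the function name refined_or_raw is hypothetical. It shows one possible way to detect a response cut off mid-thinking, not aureka's actual check.

```python
import re

# Assumption: reasoning is delimited by <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def refined_or_raw(raw_transcript: str, llm_response: str) -> str:
    """Return the refined text, or the raw transcript if refinement failed."""
    # An unclosed <think> tag means the token budget ran out mid-reasoning,
    # so no refined text was ever produced.
    if llm_response.count("<think>") > llm_response.count("</think>"):
        return raw_transcript
    # Strip any completed reasoning block and keep only the answer.
    refined = THINK_BLOCK.sub("", llm_response).strip()
    # An empty answer is also treated as a failure.
    return refined or raw_transcript


print(refined_or_raw("helo wrld", "<think>fix typos</think>Hello world."))
print(refined_or_raw("helo wrld", "<think>the user probably meant"))
```

Falling back to the raw transcript means a failed refinement still yields usable text rather than leaked reasoning or nothing at all.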

BREAKING

  • TheWhisper backend removed (the PyPI package was a placeholder, and the
    code path never actually loaded). The [asr-thewhisper] extra is dropped.
  • Default ASR model changed from large-v3 to medium. Override with
    [asr] model = "large-v3" if you want the previous accuracy.
  • aureka.models.MODEL_REGISTRY (a dict) replaced by model_registry() (a
    function).
  • aureka.device.resolve_asr_backend removed.
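
Code that must run against both v0.1.x and v0.2.0 could bridge the registry change with a small shim like the one below. It assumes model_registry() takes no arguments and returns a mapping equivalent to the old dict, which the notes do not confirm. The helper name load_registry and the stand-in modules are hypothetical.

```python
from types import SimpleNamespace


def load_registry(models_module):
    """Return the model registry from either the old or the new API."""
    factory = getattr(models_module, "model_registry", None)
    if callable(factory):
        # v0.2.0 and later: the registry is obtained from a function.
        # Assumption: it takes no arguments and returns a mapping.
        return factory()
    # v0.1.x: the registry is a module-level dict.
    return models_module.MODEL_REGISTRY


# Stand-ins for aureka.models so the sketch runs without aureka installed.
old_style = SimpleNamespace(MODEL_REGISTRY={"medium": "old-entry"})
new_style = SimpleNamespace(model_registry=lambda: {"medium": "new-entry"})

print(load_registry(old_style))
print(load_registry(new_style))
```

With aureka installed, the call would be load_registry(aureka.models) after import aureka.models.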