Shared provider abstractions and frontend scaffolding for desktop AI applications.
This package supplies the common primitives consumed by Bard (TTS) and Scribe (STT) so each app can focus on its domain-specific logic rather than re-implementing the shared shell. It is intentionally small, dependency-free at the core, and designed to be pulled in as a git dependency.
- desktop_ai_core.providers: Backend / TTSBackend / STTBackend abstract bases, Voice and LanguageModel dataclasses, a tiny register/probe/get registry shared between apps, and a format_openai_error helper for turning openai SDK exceptions into user-facing (title, message) tuples.
- desktop_ai_core.frontends: AbstractFrontendApp lifecycle base, a terminal menu mini-framework (Menu, Item, SetValueItem), a MultiStateTrayIcon driver for pystray-style icons (plus flag_for, PID-file helpers, and a register_signal_toggle shim), and a thread-safe show_error_dialog built on Tk.
- desktop_ai_core.install: install_desktop_file, an XDG .desktop entry writer for Linux desktop integration.
```sh
pip install desktop-ai-core
```

Or pin from a consumer project's pyproject.toml:

```toml
[project]
dependencies = [
    "desktop-ai-core>=0.1",
]
```

If you need an unreleased commit, you can also pull it straight from git:

```toml
"desktop-ai-core @ git+https://github.com/perrette/desktop-ai-core.git@<sha>",
```

For local development against a checkout, use an editable install:

```sh
pip install -e /path/to/desktop-ai-core
```

Requires Python 3.9+. The core has no runtime dependencies; optional
features pull their dependencies from the consumer (e.g. openai for
format_openai_error, pystray + PIL for MultiStateTrayIcon, tkinter
for show_error_dialog).
TTSBackend and STTBackend are abstract base classes that consumer code
subclasses to wrap a concrete service (OpenAI, ElevenLabs, faster-whisper,
Groq, …). They expose a small, stable surface:
```python
from pathlib import Path
from desktop_ai_core.providers import TTSBackend, STTBackend, Voice


class MyTTS(TTSBackend):
    name = "mytts"
    default_voice = "alloy"
    default_model = "tts-1"
    output_format = "mp3"
    sample_rate = 24_000
    supports_streaming = False

    def synthesize(self, text: str, out_path: Path) -> Path: ...
    def list_voices(self) -> list[str]: ...
    # optional: list_voices_meta() -> list[Voice], list_models(), synthesize_stream(text)


class MySTT(STTBackend):
    name = "mystt"
    default_model = "whisper-1"

    def transcribe(self, audio_path: Path) -> str: ...
```

Class-level is_local: bool and install_hint: str | None let consumers
group local vs. cloud backends in menus and surface install instructions when
a backend is missing.
Voice and LanguageModel are frozen dataclasses that backends can return
from their listing methods to give the UI richer metadata than bare ids.
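As a rough illustration of why frozen dataclasses suit listing methods, the sketch below uses a stand-in class. The name VoiceInfo and its fields (id, label, language) are assumptions made for this example only; the actual fields of Voice and LanguageModel are not described here.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VoiceInfo:
    """Hypothetical stand-in for a voice metadata record (not the real Voice)."""
    id: str
    label: str = ""
    language: Optional[str] = None


def list_voices_meta() -> List[VoiceInfo]:
    # A backend would build these from its service's voice catalogue.
    return [VoiceInfo("alloy", "Alloy", "en-US"), VoiceInfo("nova", "Nova")]


voices = list_voices_meta()
ids = [v.id for v in voices]

# Frozen instances are hashable, so duplicates collapse in a set.
unique = set(voices + voices)

# Frozen instances reject attribute assignment, so UI code cannot
# accidentally mutate a record shared with the backend.
try:
    voices[0].label = "renamed"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

A UI can still fall back to the bare id when the richer fields are left at their defaults.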
A pair of process-global registries — one for TTS, one for STT — let apps discover backends without hard-wiring imports:
```python
# Keeps the `str | None` annotation below valid on Python 3.9.
from __future__ import annotations

from desktop_ai_core.providers import (
    register_tts, get_tts, available_tts, probe_tts,
    register_stt, get_stt, available_stt, probe_stt,
)


def _probe() -> tuple[bool, str | None]:
    """Return (available, install_hint)."""
    try:
        import openai  # noqa
        return True, None
    except ImportError:
        return False, "pip install openai"


register_tts("openai", MyTTS, probe=_probe)
backend = get_tts("openai", api_key=...)  # instantiates with kwargs
names = available_tts()                   # ["openai", ...]
ok, hint = probe_tts("openai")            # availability check
```

The probe callable is optional; backends without one are assumed available.
Registration is typically done at import time inside each backend's module.
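The register/probe/get pattern can be sketched with a plain dict. This is a minimal stand-in, not the package's implementation: the function names, the EchoBackend class, and the choice to have available() filter out backends whose probe fails are all assumptions for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Probe = Callable[[], Tuple[bool, Optional[str]]]

# name -> (backend class, optional probe callable)
_REGISTRY: Dict[str, Tuple[type, Optional[Probe]]] = {}


def register(name: str, cls: type, probe: Optional[Probe] = None) -> None:
    _REGISTRY[name] = (cls, probe)


def probe(name: str) -> Tuple[bool, Optional[str]]:
    _cls, probe_fn = _REGISTRY[name]
    if probe_fn is None:
        return True, None  # no probe: assumed available
    return probe_fn()


def available() -> List[str]:
    # Assumption: only backends whose probe succeeds are listed.
    return [name for name in _REGISTRY if probe(name)[0]]


def get(name: str, **kwargs: Any) -> Any:
    cls, _probe_fn = _REGISTRY[name]
    return cls(**kwargs)  # instantiate with the caller's kwargs


class EchoBackend:
    """Hypothetical backend used only to exercise the registry."""
    def __init__(self, prefix: str = ""):
        self.prefix = prefix


register("echo", EchoBackend)
register("missing", EchoBackend, probe=lambda: (False, "pip install some-extra"))

names = available()
backend = get("echo", prefix="> ")
ok, hint = probe("missing")
```

Because a failed probe returns a hint rather than raising, a menu can grey out the entry and show the install command.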
format_openai_error(exc) maps an openai.*Error to a (title, message)
tuple suited for show_error_dialog. It distinguishes
AuthenticationError, PermissionDeniedError, RateLimitError (with a
dedicated "Credits exhausted" branch for insufficient_quota),
APIConnectionError, and BadRequestError, falling back to a generic
API error (<ClassName>) for everything else.
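The branching described above might look roughly like the sketch below. To stay runnable without the openai SDK it dispatches on the exception's class name and reads a code attribute; that dispatch, the function name describe_api_error, the stand-in RateLimitError class, and every title other than "Credits exhausted" and "API error (<ClassName>)" are assumptions, not the helper's actual behavior.

```python
from typing import Tuple


def describe_api_error(exc: Exception) -> Tuple[str, str]:
    """Map an SDK-style exception to a (title, message) pair."""
    kind = type(exc).__name__
    detail = str(exc)
    if kind == "AuthenticationError":
        return "Authentication failed", detail
    if kind == "PermissionDeniedError":
        return "Permission denied", detail
    if kind == "RateLimitError":
        # Quota exhaustion gets its own branch, separate from throttling.
        if getattr(exc, "code", None) == "insufficient_quota":
            return "Credits exhausted", detail
        return "Rate limit reached", detail
    if kind == "APIConnectionError":
        return "Connection error", detail
    if kind == "BadRequestError":
        return "Bad request", detail
    return f"API error ({kind})", detail


class RateLimitError(Exception):
    """Stand-in with the same class name as the SDK exception."""
    def __init__(self, message: str, code: str = ""):
        super().__init__(message)
        self.code = code


quota = describe_api_error(RateLimitError("quota used up", code="insufficient_quota"))
throttled = describe_api_error(RateLimitError("slow down"))
other = describe_api_error(ValueError("boom"))
```

The resulting tuple can be passed straight to a dialog function taking (title, message).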
AbstractFrontendApp is a minimal lifecycle base that holds a params dict, a
view reference, a logger, and an error_callback. Helpers cover the common
menu-driven patterns: set_param / get_param / checked / callback_toggle_option,
plus notify_error(title, message), which logs the error and fans out to the
callback (wrapped in try/except so a broken UI never takes the app down).
set_audioplayer is a no-op hook that subclasses with audio concerns can override.
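The guarded fan-out in notify_error can be sketched as follows. FrontendAppSketch is a hypothetical stand-in, and the exact signatures (in particular callback_toggle_option returning a zero-argument closure) are assumptions rather than the base class's real API.

```python
import logging
from typing import Any, Callable, Dict, Optional

ErrorCallback = Callable[[str, str], None]


class FrontendAppSketch:
    """Hypothetical stand-in for a frontend-agnostic app object."""

    def __init__(self, params: Optional[Dict[str, Any]] = None,
                 error_callback: Optional[ErrorCallback] = None):
        self.params = dict(params or {})
        self.view = None
        self.error_callback = error_callback
        self.logger = logging.getLogger("frontend-sketch")

    def set_param(self, key: str, value: Any) -> None:
        self.params[key] = value

    def get_param(self, key: str, default: Any = None) -> Any:
        return self.params.get(key, default)

    def callback_toggle_option(self, key: str) -> Callable[[], None]:
        # Assumed shape: returns a closure a menu item can call directly.
        def _toggle() -> None:
            self.params[key] = not self.params.get(key, False)
        return _toggle

    def notify_error(self, title: str, message: str) -> None:
        self.logger.error("%s: %s", title, message)
        if self.error_callback is None:
            return
        try:
            self.error_callback(title, message)
        except Exception:
            # A failing UI callback is logged, never propagated.
            self.logger.exception("error_callback failed")


seen = []
app = FrontendAppSketch({"autoplay": False},
                        error_callback=lambda t, m: seen.append((t, m)))
app.callback_toggle_option("autoplay")()
app.notify_error("Synthesis failed", "network unreachable")


def broken(title: str, message: str) -> None:
    raise RuntimeError("UI is gone")


# Does not raise even though the callback does.
FrontendAppSketch(error_callback=broken).notify_error("Oops", "still alive")
```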
The terminal menu is a small text-mode framework that mirrors the structure of
a tray-icon menu so the same Item graph can drive both:

- Item(name, callback, checked=None, checkable=False, visible=True, help=""): leaf action; checked and visible may be callables for live state.
- SetValueItem(name, callback, value=None, choices=None, type=None, ...): prompts for input, validates against type/choices, then invokes the callback.
- Menu(items, name=None, help=""): renders a numbered list and loops until the user enters q/quit or an item callback returns False.
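The idea of checked and visible being callables for live state can be illustrated with a small rendering sketch. ItemSketch and render are hypothetical names, and the "[x]" line format is an assumption; only the field names come from the Item signature above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union

Flag = Union[bool, Callable[[], bool]]


def _resolve(flag: Optional[Flag], default: bool) -> bool:
    """Evaluate a flag that may be a plain bool, None, or a callable."""
    if callable(flag):
        return bool(flag())
    return default if flag is None else bool(flag)


@dataclass
class ItemSketch:
    name: str
    callback: Callable[[], object]
    checked: Optional[Flag] = None
    checkable: bool = False
    visible: Flag = True


def render(items: List[ItemSketch]) -> List[str]:
    """Number the currently visible items, re-evaluating flags each call."""
    shown = [item for item in items if _resolve(item.visible, True)]
    lines = []
    for number, item in enumerate(shown, start=1):
        mark = ""
        if item.checkable:
            mark = "[x] " if _resolve(item.checked, False) else "[ ] "
        lines.append(f"{number}. {mark}{item.name}")
    return lines


state = {"muted": False, "advanced": False}
items = [
    ItemSketch("Mute", lambda: state.update(muted=not state["muted"]),
               checked=lambda: state["muted"], checkable=True),
    ItemSketch("Debug tools", lambda: None, visible=lambda: state["advanced"]),
    ItemSketch("Quit", lambda: False),
]

before = render(items)
items[0].callback()  # toggles the shared state
after = render(items)
```

Because the flags are evaluated at render time, the same item objects could feed a tray menu that re-reads them on each refresh.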
The tray and process helpers:

- MultiStateTrayIcon(icon, images, get_state, poll_interval=0.1): wraps a pystray-compatible icon and a {state_name: PIL.Image} map; call update() (or start_monitoring(should_continue) from a background thread) and the icon swaps images whenever get_state() returns a new value, then calls icon.update_menu() so visibility predicates re-evaluate. Use None as the dict key for the idle state.
- flag_for(language): emoji flag for a BCP-47 language tag (en-US, fr-FR, …), empty string for unknown.
- write_pidfile(name) / remove_pidfile(name): write/remove $XDG_RUNTIME_DIR/<name>.pid (falling back to /tmp), with 0o600 permissions. remove_pidfile silently ignores a missing file.
- register_signal_toggle(signal_number, callback): installs a signal handler that invokes callback(), degrading to a DEBUG log on platforms where the signal is unavailable (instead of raising).
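The PID-file behavior described above can be approximated with the standard library alone. This is a sketch under assumptions: the function names pidfile_path, write_pid, and remove_pid are hypothetical, and the real helpers may differ in detail. The demo points XDG_RUNTIME_DIR at a temporary directory so nothing is left behind.

```python
import os
import tempfile
from pathlib import Path


def pidfile_path(name: str) -> Path:
    # Prefer $XDG_RUNTIME_DIR, fall back to /tmp when it is unset or empty.
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR") or "/tmp"
    return Path(runtime_dir) / f"{name}.pid"


def write_pid(name: str) -> Path:
    path = pidfile_path(name)
    # Create with owner-only permissions from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    os.chmod(path, 0o600)  # also covers a pre-existing file with looser mode
    return path


def remove_pid(name: str) -> None:
    try:
        pidfile_path(name).unlink()
    except FileNotFoundError:
        pass  # already gone: nothing to do


previous = os.environ.get("XDG_RUNTIME_DIR")
try:
    with tempfile.TemporaryDirectory() as runtime_dir:
        os.environ["XDG_RUNTIME_DIR"] = runtime_dir
        path = write_pid("demo-app")
        recorded_pid = int(path.read_text())
        mode = path.stat().st_mode & 0o777
        remove_pid("demo-app")
        remove_pid("demo-app")  # second call is a silent no-op
        still_there = path.exists()
finally:
    # Restore the caller's environment.
    if previous is None:
        os.environ.pop("XDG_RUNTIME_DIR", None)
    else:
        os.environ["XDG_RUNTIME_DIR"] = previous
```

A second process can read the file and send the PID a signal, which is where a signal-toggle handler would come in.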
show_error_dialog(title, message) pops a modal Tk messagebox.showerror
from a fresh daemon thread, so it is safe to call from anywhere — including
while a pystray/GTK main loop owns the main thread. If Tk is unavailable
the call falls back to a stderr-style print rather than raising.
install_desktop_file(template, name, icon_folder, bin_folder, terminal, startup_wm_class, options="") renders a .desktop entry template and
writes it to $XDG_DATA_HOME/applications/ (default
~/.local/share/applications/). The template is a normal str.format
string with placeholders {icon_folder}, {bin_folder}, {name},
{terminal}, {StartupWMClass}, and {options}. The function raises
NotImplementedError on non-Linux platforms — macOS/Windows packaging is
out of scope for now.
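A template using those placeholders could be rendered as in the sketch below. The template body (the Exec and Icon lines), the helper names, and the rendering of terminal as "true"/"false" are assumptions for illustration; the sketch only builds the text and target path and does not write any file.

```python
import os
from pathlib import Path

# Hypothetical template; only the placeholder names come from the text above.
TEMPLATE = """[Desktop Entry]
Type=Application
Name={name}
Exec={bin_folder}/{name} {options}
Icon={icon_folder}/{name}.png
Terminal={terminal}
StartupWMClass={StartupWMClass}
"""


def applications_dir() -> Path:
    # $XDG_DATA_HOME/applications, defaulting to ~/.local/share/applications.
    data_home = os.environ.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(data_home) / "applications"


def render_desktop_entry(template: str, name: str, icon_folder: str,
                         bin_folder: str, terminal: bool,
                         startup_wm_class: str, options: str = "") -> str:
    return template.format(
        name=name,
        icon_folder=icon_folder,
        bin_folder=bin_folder,
        terminal="true" if terminal else "false",  # assumed bool-to-text mapping
        StartupWMClass=startup_wm_class,
        options=options,
    )


entry = render_desktop_entry(TEMPLATE, "bard", "/opt/bard/icons",
                             "/opt/bard/bin", False, "bard", "--tray")
target = applications_dir() / "bard.desktop"
```

Since the template is an ordinary str.format string, any literal braces it needs would have to be doubled.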
- Scribe: desktop speech-to-text. Registers STT backends (faster-whisper, Groq, OpenAI) against register_stt and subclasses AbstractFrontendApp for the tray app.
- Bard: desktop text-to-speech. Registers TTS backends against register_tts; same frontend scaffolding.
Both projects depend on desktop-ai-core and treat its public surface
(everything re-exported from desktop_ai_core.providers.__init__ and
desktop_ai_core.frontends.__init__) as the stable contract.
- Zero runtime deps in the core. Optional features (openai error formatting, pystray icons, Tk dialogs) import their dependencies lazily so a consumer that does not use them does not pay for them.
- Registry over plugins. Backends self-register at import time; the registry is just a dict, with optional probe callables so the UI can show install hints instead of crashing on missing extras.
- Frontend-agnostic state. AbstractFrontendApp holds parameters and a view reference but knows nothing about pystray, Tk, or the terminal; the same app object drives tray and terminal frontends.
- Linux first. Desktop-file installation and PID/signal helpers target Linux; structures that could in principle work elsewhere (MultiStateTrayIcon, show_error_dialog) do, but are not actively exercised on macOS/Windows.
MIT — see LICENSE.