A Swift package for turning code-heavy, path-heavy, and markdown-heavy developer text into speech-safe text before it reaches a speech model.
TextForSpeech owns the text-conditioning layer that SpeakSwiftly consumes as an external package. It ships one semantic built-in core plus a selectable built-in style, then layers persisted custom profiles on top so callers can tune pronunciation without reimplementing the core normalization pipeline.
The package currently has three main responsibilities:
- normalize mixed text such as markdown, logs, CLI output, and prose with embedded code or identifiers
- normalize whole-source input through an explicit source lane
- persist and edit named custom profiles while keeping the built-in base layer always on
Speech models do poorly with raw developer text such as file paths, identifiers, markdown links, inline code, repeated separators, and repeated-letter runs. TextForSpeech centralizes those cleanup rules so the same behavior can be reused across callers instead of being reimplemented in app code or worker code.
TextForSpeech is a Swift Package Manager library product targeting iOS 17, macOS 14, and Swift 6 language mode.
During local development, add it to another package with a local path dependency:
dependencies: [
    .package(path: "../TextForSpeech"),
],
targets: [
    .executableTarget(
        name: "ExampleApp",
        dependencies: [
            .product(name: "TextForSpeech", package: "TextForSpeech"),
        ]
    ),
]
Then import TextForSpeech in the targets that need normalization or runtime-managed profile state.
Normalize mixed text directly when you want the default built-in .balanced style and an optional path context:
import TextForSpeech
let normalized = TextForSpeech.Normalize.text(
    "stderr: /Users/galew/Workspace/SpeakSwiftly/Sources/SpeakSwiftly/WorkerRuntime.swift",
    context: TextForSpeech.Context(
        cwd: "/Users/galew/Workspace/SpeakSwiftly",
        repoRoot: "/Users/galew/Workspace/SpeakSwiftly"
    )
)
If you omit format, TextForSpeech detects a likely outer text format before running the text normalization pipeline.
If you want a different shipped listening mode, pass style:
import TextForSpeech
let normalized = TextForSpeech.Normalize.source(
    sourceText,
    as: .swift,
    style: .compact
)
The shipped styles now differ in concrete coding-agent ways:
- .compact assumes more visual context and says less. It drops the broad line-based spoken-code expansion and keeps common shapes terse, such as foo() -> foo, #123 -> 123, and --help -> help.
- .balanced is the default general-purpose mode. It keeps spoken-code expansion for code-like lines and speaks common references more explicitly, such as foo() -> foo function, #123 -> issue 123, and WorkerRuntime.swift:42:7 -> Worker Runtime dot swift line 42 column 7.
- .explicit is the audio-first mode. It keeps the same line-based spoken-code expansion as .balanced, but uses more narrated phrasing for common coding-agent shapes, such as foo() -> foo function call, #123 -> issue number 123, and --help -> long flag help.
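As a rough illustration of how the three styles diverge, the sketch below hard-codes two of the shapes listed above: call-like identifiers and issue references. `SketchStyle`, `speakCall`, and `speakIssue` are hypothetical names invented for this example. They are not part of the TextForSpeech API, and the package itself expresses this policy through its built-in profile layers rather than a switch statement.

```swift
// Hypothetical stand-in for the three shipped styles; not the package's own type.
enum SketchStyle {
    case compact, balanced, explicit
}

// Speaks a call-like token such as "foo()" using the phrasing listed for each style.
func speakCall(_ token: String, style: SketchStyle) -> String {
    // Strip a trailing "()" to recover the bare name.
    let name = token.hasSuffix("()") ? String(token.dropLast(2)) : token
    switch style {
    case .compact: return name
    case .balanced: return "\(name) function"
    case .explicit: return "\(name) function call"
    }
}

// Speaks an issue reference such as "#123" using the phrasing listed for each style.
func speakIssue(_ token: String, style: SketchStyle) -> String {
    // Strip a leading "#" to recover the number.
    let number = token.hasPrefix("#") ? String(token.dropFirst()) : token
    switch style {
    case .compact: return number
    case .balanced: return "issue \(number)"
    case .explicit: return "issue number \(number)"
    }
}
```

The point of the sketch is only that the same token yields progressively more narrated output as the style moves from .compact to .explicit.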
For repeated file paths in the same utterance, the text lane now also compacts repeated anchors before the built-in path-speaking pass. The first path still speaks normally, but later repeated mentions can collapse to shorter phrases such as same directory, Worker Runtime dot swift or same path instead of repeating the full spoken prefix.
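The compaction idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the package's implementation: `splitPath` and `compactRepeatedPaths` are hypothetical helpers, the sketch tracks every directory seen so far rather than any specific anchor rule, and it leaves file names unspoken (the real pass would go on to say something like Worker Runtime dot swift).

```swift
// Splits "/a/b/c.swift" into directory "/a/b" and file name "c.swift".
func splitPath(_ path: String) -> (directory: String, file: String) {
    guard let slash = path.lastIndex(of: "/") else { return ("", path) }
    return (String(path[..<slash]), String(path[path.index(after: slash)...]))
}

// Hypothetical compaction pass: the first mention of a path is kept whole,
// later mentions collapse to "same path" or "same directory, <file>".
func compactRepeatedPaths(_ paths: [String]) -> [String] {
    var seenPaths = Set<String>()
    var seenDirectories = Set<String>()
    var spoken: [String] = []
    for path in paths {
        let (directory, file) = splitPath(path)
        if seenPaths.contains(path) {
            spoken.append("same path")
        } else if !directory.isEmpty && seenDirectories.contains(directory) {
            spoken.append("same directory, \(file)")
        } else {
            spoken.append(path)
        }
        seenPaths.insert(path)
        seenDirectories.insert(directory)
    }
    return spoken
}
```

Running the compaction before the path-speaking pass, as the text lane does, means the long spoken prefix is only produced once per utterance.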
When the outer document is mixed text but the embedded code language is known, pass nestedFormat so fenced or inline code can route through the source lane:
import TextForSpeech
let normalized = TextForSpeech.Normalize.text(
    """
    Read this first:
    ```swift
    let sampleRate = profile?.sampleRate ?? 24000
    ```
    """,
    format: .markdown,
    nestedFormat: .swift
)
Use the source lane when the whole input is a source file or editor buffer and the caller already knows the language:
import TextForSpeech
let normalized = TextForSpeech.Normalize.source(
    """
    struct WorkerRuntime {
        let sampleRate: Int
    }
    """,
    as: .swift
)
The source lane is explicit today but still generic. It normalizes whole-source input more consistently than the mixed-text lane, but SwiftSyntax-backed Swift-specific structure is still future roadmap work rather than current behavior.
The package also exposes one small lexical helper when a caller needs natural-language word tokenization:
import TextForSpeech
let words = TextForSpeech.Forensics.words(
    in: "Please read /tmp/Thing and NSApplication.didFinishLaunchingNotification."
)
Use TextForSpeech.Runtime when you need an observable owner for stored custom profiles, one active custom profile id, one selected built-in style, and default-on JSON-backed persistence in Application Support:
import TextForSpeech
let runtime = try TextForSpeech.Runtime(builtInStyle: .balanced)
try runtime.profiles.setBuiltInStyle(.compact)
try runtime.profiles.create(id: "logs", name: "Logs")
try runtime.profiles.add(
    TextForSpeech.Replacement("stderr", with: "standard error", id: "stderr-rule"),
    toProfileID: "logs"
)
try runtime.profiles.activate(id: "logs")
let normalized = TextForSpeech.Normalize.text(
    "stderr and stdout",
    customProfile: runtime.profiles.active(),
    style: runtime.profiles.builtInStyle
)
The runtime model is intentionally explicit:
- TextForSpeech.Profile.semanticCore is the always-on semantic built-in layer.
- TextForSpeech.Profile.builtInStyle(_:) returns one shipped style preset.
- TextForSpeech.Profile.builtInBase(style:) composes semanticCore + style preset.
- TextForSpeech.Profile.base is the default .balanced built-in base for convenience.
- TextForSpeech.Profile.default is the empty default custom profile value.
- runtime.profiles.builtInStyle is the currently selected shipped style preset.
- runtime.profiles.activeID is the stored custom profile id currently selected by the runtime.
- runtime.profiles.active() is the raw active custom profile.
- runtime.profiles.effective() is always builtInBase(style: builtInStyle) + active custom.
- runtime.profiles.stored(id:) reads a named stored custom profile without activating it.
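The layering described above can be pictured with a toy model. The sketch below is not the package's Profile type: `SketchProfile`, its `ruleIDs` field, and every rule identifier except "stderr-rule" are invented for illustration, and modeling `+` as plain concatenation is an assumption about how layers combine.

```swift
// Hypothetical miniature of a profile: only rule identifiers, no matching logic.
struct SketchProfile: Equatable {
    var ruleIDs: [String]

    // Assumption: layering is modeled as concatenation, lower layer first.
    static func + (lower: SketchProfile, upper: SketchProfile) -> SketchProfile {
        SketchProfile(ruleIDs: lower.ruleIDs + upper.ruleIDs)
    }
}

// Placeholder layers; these rule identifiers are invented for the sketch.
let sketchSemanticCore = SketchProfile(ruleIDs: ["core-paths", "core-identifiers"])
let sketchBalancedStyle = SketchProfile(ruleIDs: ["balanced-issue-reference"])

// Mirrors "builtInBase(style: builtInStyle) + active custom" from the list above.
func sketchEffective(activeCustom: SketchProfile) -> SketchProfile {
    let builtInBase = sketchSemanticCore + sketchBalancedStyle
    return builtInBase + activeCustom
}
```

Under this model an empty custom profile yields exactly the built-in base, which matches the idea that the base layer is always on and custom profiles only add to it.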
Persistence is default-on. TextForSpeech.Runtime() writes to Application Support automatically, namespaced by the host bundle identifier when one is available and falling back to TextForSpeech when it is not. The selected built-in style is persisted alongside the active custom profile id and stored custom profiles.
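A minimal sketch of the namespacing rule, assuming only standard Foundation calls. `persistenceNamespace` and `persistenceDirectory` are hypothetical helpers written for this example; the actual file layout and file names used by TextForSpeech.Runtime are not shown here.

```swift
import Foundation

// Hypothetical helper: uses the host bundle identifier when one is available,
// otherwise falls back to the package name.
func persistenceNamespace(bundleIdentifier: String?) -> String {
    bundleIdentifier ?? "TextForSpeech"
}

// Hypothetical helper: resolves a per-host folder inside Application Support.
// Returns nil if the platform reports no Application Support directory.
func persistenceDirectory(
    bundleIdentifier: String? = Bundle.main.bundleIdentifier,
    fileManager: FileManager = .default
) -> URL? {
    guard let base = fileManager.urls(
        for: .applicationSupportDirectory,
        in: .userDomainMask
    ).first else {
        return nil
    }
    let namespace = persistenceNamespace(bundleIdentifier: bundleIdentifier)
    return base.appendingPathComponent(namespace, isDirectory: true)
}
```

Command-line tools and test runners often have no bundle identifier, which is the case the TextForSpeech fallback covers.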
The package source lives under Sources/TextForSpeech and is organized by responsibility:
- API/: Public namespace-first entrypoints such as Normalize and Forensics.
- Models/: Core value types such as Profile, Replacement, Context, and the built-in profiles.
- Normalization/: The text lane, source lane, structural markdown parsing, replacement-rule engine, speech helpers, and format detection.
- Runtime/: Runtime ownership, grouped profile and persistence handles, persisted state, and runtime-facing errors.
The current source split keeps structural normalization logic separate from durable lexical policy:
- structural work such as markdown parsing, code-span extraction, and format detection stays in code
- durable lexical policy such as built-in aliases, identifier speaking, path speaking, URL speaking, repeated-letter-run handling, and style-specific speaking policy lives in the built-in profile layers
Tests live under Tests/TextForSpeechTests and are grouped by role:
- Models/
- Normalization/
- Runtime/
- top-level lexical helper coverage
Use the standard Swift package workflow from the repository root:
swift build
swift test
The baseline verification path for this repository is:
swift build
swift test
For release work or architectural refactors, also review the current roadmap in ROADMAP.md and the maintainer notes under docs/maintainers.
This project is licensed under the Apache License 2.0. See LICENSE for the full text.