Thinking in Code is a local-first system that turns research PDFs into a cited, listener-ready podcast script, with optional narration and audio artifacts. The API registers uploads and creates a job; a worker then processes the job through extraction, chunking, script generation, and optional TTS/audio assembly. All outputs are written to `data/processed/<job_id>/`, and progress is tracked in `data/jobs/<job_id>/status.json` (streamed via SSE).
Key features:
- Evidence-aware script generation with per-segment citations.
- Optional retrieval (local vectors, Chroma, or FAISS) for grounding.
- Pluggable LLM and TTS providers (Ollama, OpenRouter, pyttsx3, Piper, Coqui, Minimax).
- Audio assembly with loudness normalization and QA metadata.
- Web Studio for uploads, status tracking, artifact downloads, and feedback.
The worker pipeline runs these stages in order:
- Extract text from PDFs with page-level citation metadata.
- Normalize and chunk text while preserving citations.
- Generate a structured episode script with guardrails and evidence checks.
- Optionally embed/chunk for retrieval.
- Optionally synthesize audio and assemble the final mix.
Common artifacts written under `data/processed/<job_id>/`:
- `script.md` (narration script)
- `transcript.txt` and `transcript.srt`
- `quality.json` / `quality.log` (evidence + retention checks)
- `job_metrics.json` (stage timings)
- `episode.mp3` (when audio is enabled)
- `audio_metadata.json` / `audio_quality.json` (when audio is enabled)
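For example, once a job has finished you can inspect these artifacts directly on disk (substitute `<job_id>` with the id returned when the job was created):

```bash
# Substitute <job_id> with the id returned by POST /v1/jobs.
ls data/processed/<job_id>/

# Spot-check the evidence/retention report and the stage timings.
cat data/processed/<job_id>/quality.json
cat data/processed/<job_id>/job_metrics.json
```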
- API service (`services/api`): upload, job creation, SSE progress, artifact access.
- Worker service (`services/worker`): pipeline execution and artifact generation.
- Core engine (`libs/podcast`): clean-architecture pipeline + adapters.
- Shared contracts (`libs/contracts`): stable schemas across services.
- Prompts (`prompts/`): LLM prompt templates.
- Data (`data/`): runtime storage (jobs, processed outputs, research).
| Path | Purpose |
|---|---|
| `services/api` | FastAPI service (job submit/status, SSE) |
| `services/worker` | Worker pipeline (PDF -> script -> TTS -> audio) |
| `libs/contracts` | Shared Pydantic models |
| `libs/podcast` | Core engine (domain/application/infrastructure) |
| `prompts/` | Prompt templates |
| `scripts/` | Utilities (e.g., Redis enqueue helper) |
| `data/` | Runtime: jobs, processed outputs, research |
| `Dockerfile.api`, `Dockerfile.worker` | Container builds |
| `docker-compose.yaml` | Local stack (API + worker + Redis + Ollama) |
The Studio UI is served at `/` from the API service. It supports upload, job status, artifact downloads, and feedback submission. If `API_KEY` is set, enter it in the Studio panel to authorize requests.
- `GET /health`
- `POST /v1/jobs` (multipart: `file`, `language`, `style`, `target_minutes`, `target_seconds`)
- `GET /v1/jobs/{job_id}/status`
- `GET /v1/jobs/{job_id}/progress` (SSE)
- `GET /v1/jobs/{job_id}/result`
- `GET /v1/jobs/{job_id}/artifacts` (list artifacts + download URLs)
- `GET /v1/jobs/{job_id}/artifacts/{artifact_name}` (download)
- `GET /v1/metrics/summary` (queue depth, stage timings, audio success)
- `POST /v1/feedback`
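For example, the artifact and metrics endpoints can be exercised with plain curl. The job id is a placeholder, and the artifact name assumes it matches the file names listed earlier (e.g. `script.md`):

```bash
# List artifacts and their download URLs for a job.
curl http://localhost:8000/v1/jobs/<job_id>/artifacts

# Download a specific artifact by name (assumes names match the files above).
curl -o script.md http://localhost:8000/v1/jobs/<job_id>/artifacts/script.md

# Queue depth, stage timings, and audio success summary.
curl http://localhost:8000/v1/metrics/summary
```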
Copy the example env files and keep queue settings aligned between API and worker:
```bash
cp services/api/.env.example services/api/.env
cp services/worker/.env.example services/worker/.env
```
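A quick way to confirm the queue settings stay aligned is to compare the `QUEUE_`-prefixed variables in both files (a simple check, assuming the queue variables use that prefix as listed below):

```bash
# Show queue-related settings from both env files side by side.
grep '^QUEUE_' services/api/.env services/worker/.env
```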
Key settings:
- Queue: `QUEUE_MODE` (`dir`, `file`, `redis`), `QUEUE_REDIS_URL`, `QUEUE_REDIS_KEY`
- LLM: `OLLAMA_BASE_URL`, `OLLAMA_MODEL`, `OLLAMA_NUM_PREDICT`, `MAX_CONTEXT_CHARS`
- Script provider (optional): `SCRIPT_PROVIDER=openrouter`, `OPENROUTER_API_KEY`, `OPENROUTER_MODEL`
- TTS/audio: `ENABLE_TTS`, `ENABLE_AUDIO`, `TTS_PROVIDER`, `AUDIO_FORMAT`, `AUDIO_TARGET_DBFS`
- Narration personalization: `VOICE_PROFILE_MAP`, `SEGMENT_EMPHASIS_MAP`, `PACING_PRESET` (or `PACING_MODE`)
- Voice mapping: `TTS_VOICE_MAP`, `PIPER_SPEAKER_MAP`, `COQUI_SPEAKER_MAP`
- Metrics: `METRICS_ENABLED`, `METRICS_PORT`
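As a minimal sketch of a Redis-backed setup using the variable names above (all values here are illustrative assumptions, not project defaults):

```bash
# Shared queue settings: keep these identical in services/api/.env and services/worker/.env.
QUEUE_MODE=redis
QUEUE_REDIS_URL=redis://localhost:6379/0
QUEUE_REDIS_KEY=podcast:jobs

# Worker-side LLM and audio settings (illustrative values).
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3:instruct
ENABLE_TTS=true
ENABLE_AUDIO=true
TTS_PROVIDER=piper
AUDIO_FORMAT=mp3
```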
Optional retrieval extras:
- Install with `pip install -e "libs/podcast[retrieval]"` to enable Chroma and sentence-transformers.
- Optional lock files: `libs/podcast/requirements.optional.lock` (Python 3.11) and `libs/podcast/requirements.optional.py312.lock` (Python 3.12).
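If you prefer pinned versions over resolving fresh, a hedged sketch (this assumes the lock files are intended to be consumed with `pip install -r`):

```bash
# Pin the optional retrieval dependencies (Python 3.11 lock), then install the extras.
pip install -r libs/podcast/requirements.optional.lock
pip install -e "libs/podcast[retrieval]"
```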
Local-only data:
- Keep runtime outputs under `data/` out of version control (the repo `.gitignore` already covers them).
- Prereqs: Python 3.11+, a running Ollama instance, FFmpeg, and (optionally) Tesseract.
- Install:
  ```bash
  python3 -m venv .venv && source .venv/bin/activate
  pip install -U pip
  pip install -e libs/contracts -e libs/podcast -e services/api -e services/worker
  ```
- Run API:
  ```bash
  uvicorn podcastify_api.main:app --reload --port 8000
  ```
- Run worker:
  ```bash
  python -m podcastify_worker.main
  ```
- Submit a job:
  ```bash
  curl -F "file=@/path/to/paper.pdf" -F "language=en" -F "style=everyday" -F "target_minutes=8" http://localhost:8000/v1/jobs
  ```
- Stream progress:
  ```bash
  curl -N http://localhost:8000/v1/jobs/<job_id>/progress
  ```
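Once the progress stream reports completion, the status and result endpoints return the final state (a sketch; response schemas live in the shared contracts and are not reproduced here):

```bash
# Check the job status without streaming.
curl http://localhost:8000/v1/jobs/<job_id>/status

# Fetch the final result once processing has finished.
curl http://localhost:8000/v1/jobs/<job_id>/result
```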
```bash
docker-compose up --build -d
docker exec -it ollama ollama pull llama3:instruct
# Optional fallback/smaller model: docker exec -it ollama ollama pull llama3:8b
# Optional embeddings: docker exec -it ollama ollama pull nomic-embed-text
```

- API: http://localhost:8000 (set `API_KEY` to require a key)
- Worker: pulls jobs from a Redis sorted set
- Outputs: `data/processed/<job_id>/` (script, quality, metrics, audio if enabled)
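To sanity-check the running stack, hit the health endpoint and tail the worker logs (a sketch; the compose service name `worker` is an assumption, so check `docker-compose.yaml` for the actual name):

```bash
# Confirm the API container is reachable.
curl http://localhost:8000/health

# Follow worker logs while a job is processing (service name assumed to be "worker").
docker-compose logs -f worker
```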
Run the test suite:
```bash
PYTHONPATH=libs/podcast/src:libs/contracts/src pytest -q
```