RunCast Intelligence

Semantic search across thousands of hours of running podcast transcripts. Ask a question in plain English — get answers with exact episode sources and timestamps.

Live: runcast-intelligence.vercel.app · Backend: Railway · Auth: Clerk (Google login)

What it does

Most podcast knowledge is locked behind episode titles and show notes. RunCast transcribes every episode with Whisper, embeds the content into a vector database, and lets you search across everything at once using natural language.

Ask "How do elites taper for a marathon?" and get an answer synthesised from the most relevant passages across 4,151 episodes, each with the exact timestamp so you can jump straight to the source.

┌─────────────────────────────────────────────────────────────────┐
│                        User asks a question                      │
└────────────────────────────┬────────────────────────────────────┘
                             │
                    Embed query (OpenAI)
                             │
                             ▼
                  ┌─────────────────────┐
                  │  Supabase pgvector  │  ← similarity search
                  │  4,151 episodes     │
                  │  chunked + embedded │
                  └──────────┬──────────┘
                             │  top-k chunks with timestamps
                             ▼
                  ┌─────────────────────┐
                  │   LLM (OpenRouter)  │  ← RAG answer synthesis
                  └──────────┬──────────┘
                             │
                             ▼
              Answer + episode sources + timestamps
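
The similarity-search step in the diagram can be sketched in plain Python. This is only an illustration of the idea: in RunCast the ranking runs inside pgvector, and the choice of cosine similarity, the `Chunk` fields, and the function names are assumptions rather than the project's actual code.

```python
import math
from typing import NamedTuple


class Chunk(NamedTuple):
    episode: str            # episode title (hypothetical field name)
    start_seconds: float    # offset of the chunk within the episode audio
    embedding: list[float]  # vector produced by the embedding model


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query, chunk.embedding),
        reverse=True,
    )
    return ranked[:k]
```

Each returned chunk carries its episode and start offset, which is what lets the final answer cite a timestamp.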

Architecture

RSS Feeds (9 podcasts, 4,151 episodes)
    │
    ▼
Crawler (scripts/crawl.py)
    │  stores episode metadata
    ▼
Supabase
    │
    ├── Transcription pipeline
    │       ffmpeg (split >25MB audio) → OpenAI Whisper → raw transcript
    │       stored in Supabase with chunk offsets
    │
    └── Embedding pipeline
            text-embedding-3-small → pgvector
            chunk size: 500 tokens, 50-token overlap

FastAPI (src/api/)
    ├── POST /search  →  embed query → pgvector → LLM → response
    └── GET  /health

Next.js frontend
    ├── Public homepage
    └── Search (Clerk auth required)
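
The chunking parameters in the embedding pipeline (500 tokens with a 50-token overlap) correspond to a sliding window. Below is a minimal sketch, assuming the transcript is already tokenised into a list; the function name and the returned `(start_offset, tokens)` shape are hypothetical, and the real pipeline may tokenise and store chunks differently.

```python
def chunk_tokens(
    tokens: list[str], size: int = 500, overlap: int = 50
) -> list[tuple[int, list[str]]]:
    """Split tokens into windows of `size` that share `overlap` tokens.

    Returns (start_offset, window) pairs so each chunk can be mapped back
    to its position in the transcript.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")

    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append((start, tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reaches the end of the transcript
    return chunks
```

With the defaults, a 1,000-token transcript yields windows starting at offsets 0, 450, and 900, so a sentence that straddles a boundary still appears whole in at least one chunk.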

Podcasts indexed

| Podcast | Host |
| --- | --- |
| The Running Explained Podcast | Elisabeth Scott |
| Ali on the Run Show | Ali Feller |
| The Strength Running Podcast | Jason Fitzgerald |
| The CITIUS MAG Podcast | Chris Chavez |
| The Morning Shakeout Podcast | Mario Fraioli |
| Run to the Top | Runners Connect |
| Some Work, All Play | David & Megan Roche |
| Real Talk Running | (not listed) |
| The Planted Runner | (not listed) |

Tech stack

| Layer | Technology |
| --- | --- |
| Backend | Python · FastAPI |
| Database | Supabase (PostgreSQL + pgvector) |
| Transcription | OpenAI Whisper (with ffmpeg chunking for >25MB files) |
| Embeddings | OpenAI text-embedding-3-small |
| LLM | OpenRouter |
| Frontend | Next.js · TypeScript · Tailwind |
| Auth | Clerk (Google login) |
| Backend hosting | Railway |
| Frontend hosting | Vercel |

Local setup

Prerequisites

- Python 3.11+
- Node 20+
- ffmpeg (brew install ffmpeg)
- A Supabase project (free tier)
- An OpenAI API key
- An OpenRouter API key

1. Clone and install

git clone https://github.com/lmenta/runcast-intelligence
cd runcast-intelligence
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

2. Environment variables

cp .env.example .env

Fill in .env:

SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_KEY=your-service-role-key
OPENAI_API_KEY=sk-...
OPENROUTER_API_KEY=sk-or-...
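
Before running the pipeline it can help to confirm that all four variables are set. The helper below is a hypothetical sketch, not part of the repository; it only inspects the process environment, so it assumes the values from .env have already been exported or loaded by whatever mechanism the project uses.

```python
import os
from typing import Mapping, Optional

# The four variables listed in .env above.
REQUIRED_VARS = (
    "SUPABASE_URL",
    "SUPABASE_SERVICE_KEY",
    "OPENAI_API_KEY",
    "OPENROUTER_API_KEY",
)


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```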

3. Set up the database

Open the Supabase SQL editor and run both migrations in order:

# Copy the contents of each file and run in Supabase SQL editor
supabase/migrations/001_initial_schema.sql
supabase/migrations/002_add_transcript.sql

This creates the episodes, chunks, and podcasts tables with pgvector enabled.

4. Seed podcasts and crawl RSS feeds

make setup        # seeds 9 podcasts and crawls all RSS feeds
make check-feeds  # verify all feeds are reachable

This populates the episodes table with metadata (title, date, audio URL) but no transcripts yet.
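
As a rough illustration of what the crawl extracts, the sketch below pulls the same three fields (title, date, audio URL) out of a standard RSS 2.0 feed using only the standard library. It is an assumption about the general approach, not a copy of scripts/crawl.py; the function name and dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_feed(xml_text: str) -> list[dict[str, Optional[str]]]:
    """Extract title, publication date, and audio URL from each <item>."""
    root = ET.fromstring(xml_text)
    episodes = []
    for item in root.iter("item"):
        # Podcast feeds carry the audio file in the <enclosure> element.
        enclosure = item.find("enclosure")
        episodes.append(
            {
                "title": item.findtext("title"),
                "published": item.findtext("pubDate"),
                "audio_url": enclosure.get("url") if enclosure is not None else None,
            }
        )
    return episodes
```

Items without an enclosure come back with `audio_url` set to `None`, so they can be skipped before transcription.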

5. Transcribe episodes

make transcribe   # transcribes 3 episodes (~$0.15 in OpenAI credits)

Audio files larger than 25MB exceed the upload limit of the OpenAI transcription API, so the pipeline automatically splits them with ffmpeg before sending them to Whisper.
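
One way to pick a split length is sketched below. It assumes a roughly constant bitrate, treats 25MB as 25 × 1024 × 1024 bytes, and leaves a 10% safety margin; all three are assumptions, as are the function names and the output filename pattern. The ffmpeg invocation uses the segment muxer with stream copy, which may differ from what the project's pipeline actually runs.

```python
import math

MAX_BYTES = 25 * 1024 * 1024  # assumed interpretation of the 25MB limit


def segment_seconds(
    file_bytes: int,
    duration_seconds: float,
    max_bytes: int = MAX_BYTES,
    safety: float = 0.9,
) -> float:
    """Segment length expected to keep every part under max_bytes."""
    if file_bytes <= max_bytes:
        return duration_seconds  # small enough to send in one piece
    parts = math.ceil(file_bytes / (max_bytes * safety))
    return duration_seconds / parts


def ffmpeg_split_command(
    src: str, seconds: float, pattern: str = "part_%03d.mp3"
) -> list[str]:
    """Build an ffmpeg command that cuts src into fixed-length segments."""
    return [
        "ffmpeg", "-i", src,
        "-f", "segment", "-segment_time", str(int(seconds)),
        "-c", "copy",  # copy the audio stream instead of re-encoding
        pattern,
    ]
```

For example, a 60MB file lasting one hour would be cut into three 20-minute parts. The resulting list can be passed to `subprocess.run`; each part's start offset then has to be added back to the timestamps Whisper returns for it.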

6. Generate embeddings

make embed   # chunks transcripts and stores embeddings in pgvector

7. Test search

make search
# Query: how do elites taper for a marathon?
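
Search results come back with episode sources and timestamps. The sketch below shows one plausible way to turn retrieved chunks into a numbered context block that an LLM can cite; the `Source` fields, the output layout, and both function names are hypothetical and are not taken from the repository.

```python
from typing import NamedTuple


class Source(NamedTuple):
    episode_title: str    # hypothetical field name
    start_seconds: float  # chunk offset into the episode audio
    text: str             # transcript text of the chunk


def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as H:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def build_context(sources: list[Source]) -> str:
    """Number each chunk and label it with its episode and timestamp."""
    blocks = []
    for index, source in enumerate(sources, start=1):
        stamp = format_timestamp(source.start_seconds)
        blocks.append(f"[{index}] {source.episode_title} @ {stamp}\n{source.text}")
    return "\n\n".join(blocks)
```

Numbering the blocks gives the model a compact way to reference a source, and the same numbers can be mapped back to episode links in the response.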

8. Run locally

make api   # FastAPI on http://localhost:8000
make dev   # Next.js on http://localhost:3000 (in a second terminal)

Deployment

Backend → Railway

- Connect this repo to Railway
- Add all environment variables from .env
- Railway picks up railway.toml automatically, so no extra config is needed

Frontend → Vercel

- Connect the frontend/ directory to Vercel
- Add environment variables:
  - NEXT_PUBLIC_API_URL: your Railway backend URL
  - NEXT_PUBLIC_USE_MOCK=false
  - Clerk keys (NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY, CLERK_SECRET_KEY)

Transcription at scale → Modal

For processing large backlogs without keeping a local machine running:

pip install modal
modal secret create runcast \
  SUPABASE_URL=... \
  SUPABASE_SERVICE_KEY=... \
  OPENAI_API_KEY=...
modal deploy src/transcription/modal_worker.py

This deploys a serverless worker that transcribes new episodes on a GPU, at a cost of roughly $0.05 per hour of audio.

Cost estimate (low traffic)

| Service | Cost |
| --- | --- |
| Supabase | Free tier |
| Railway | ~$5/month |
| Vercel | Free |
| OpenAI (embeddings) | ~$0.02/episode |
| OpenAI (Whisper) | ~$0.006/minute of audio |
| OpenRouter (search) | ~$0.001/query |
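
The per-unit figures above can be combined into a quick estimate. This is a back-of-the-envelope sketch: the rates are copied from the table, while the function names and the average episode length you pass in are assumptions to replace with your own numbers.

```python
# Approximate rates taken from the cost table above (USD).
WHISPER_PER_MINUTE = 0.006
EMBEDDING_PER_EPISODE = 0.02
SEARCH_PER_QUERY = 0.001
RAILWAY_PER_MONTH = 5.0


def backlog_cost(episodes: int, avg_minutes: float) -> float:
    """One-off cost of transcribing and embedding a backlog of episodes."""
    per_episode = avg_minutes * WHISPER_PER_MINUTE + EMBEDDING_PER_EPISODE
    return episodes * per_episode


def monthly_cost(queries: int) -> float:
    """Recurring cost: backend hosting plus LLM calls for searches."""
    return RAILWAY_PER_MONTH + queries * SEARCH_PER_QUERY
```

For instance, 100 episodes at an assumed 60 minutes each would cost about $38 to process, and 1,000 searches in a month would bring the recurring total to about $6.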
