Merged
25 commits
b7f61a1
feat: add multiple video option
natashaannn Apr 13, 2026
4c290a2
chore: reorganize remotion components dir
natashaannn Apr 14, 2026
cc4f5f1
feat: enhance nametitle lower third overlay
natashaannn Apr 14, 2026
029023b
feat: update conceptexplainer and codeblock
natashaannn Apr 14, 2026
644b02c
feat: update textoverlay
natashaannn Apr 14, 2026
1aecde8
fix: fix bugs in multiple video edit
natashaannn Apr 17, 2026
b2167df
feat: add visual video editor to app
natashaannn Apr 18, 2026
bb0e2d2
feat: enhance timeline in video editor
natashaannn Apr 19, 2026
926f29f
feat: add color grading
natashaannn Apr 19, 2026
ec40e66
feat: add guest review transcript generator
natashaannn Apr 19, 2026
9312670
fix: correct speaker diarization manually
natashaannn Apr 20, 2026
ddbc1ac
Merge branch 'feat/overlays' into feat/multiple-video
natashaannn Apr 20, 2026
4067c89
fix: overlay cue occuring on token not word
natashaannn Apr 20, 2026
37a9226
feat: add ragTech overlay
natashaannn Apr 20, 2026
8ae6865
feat: add chaptermarker
natashaannn Apr 20, 2026
ab97ec7
fix: multiangle and chaptermarker cue fixes
natashaannn Apr 20, 2026
b6abeec
feat: add episode images to outro and intro
natashaannn Apr 21, 2026
ef341a9
fix: fix hook overlays
natashaannn Apr 21, 2026
9dba65e
fix: hookoverlay has some overlaps
natashaannn Apr 21, 2026
475dcdd
fix: rendering issue
natashaannn Apr 21, 2026
a686a72
feat: add image and gif overlay with other fixes
natashaannn Apr 21, 2026
0d20bae
feat: add autogenerate thumbnail
natashaannn Apr 22, 2026
ac07be8
feat: enhance podcast thumbnail generator
natashaannn Apr 25, 2026
63701ef
fix: remotion not reading from transcript.json after reorg
natashaannn Apr 25, 2026
5bf487a
fix: build error
natashaannn Apr 25, 2026
41 changes: 29 additions & 12 deletions .gitignore
@@ -15,7 +15,6 @@

# next.js
/.next/
/out/

# production
/build
@@ -41,22 +40,38 @@ yarn-error.log*
*.tsbuildinfo
next-env.d.ts

# Media content
/public/sync/audio/*
/public/sync/output/*
/public/sync/video/*
/public/transcribe/input/*
/public/input/*
/public/sync/*
# ── Raw inputs (drop video/audio here before running the wizard) ───────────────
# Never committed — files are 1–10 GB
/input/

# comment this out if using agent to edit transcript
/public/transcribe/output/*
# ── Pipeline intermediates (large/generated media) ────────────────────────────
# synced video/audio working files
public/sync/
# audio extracted for Whisper
public/transcribe/input/
# raw Whisper JSON + VTT
public/transcribe/output/
# candidate frames, cutouts, manifest
public/thumbnail/
# final rendered .mp4 files
public/renders/
# carousel exports
public/output/

# ── Editable pipeline outputs (text/JSON) ─────────────────────────────────────
# public/edit/ → transcript.doc.txt, transcript.json, SRT exports
# public/camera/ → camera-profiles.json, frame snapshots, detections
#
# These are committed by default so collaborators (and coding agents) can read
# and edit the transcript. Coding agents like Claude access project context via
# git — committing these files lets the agent see, diff, and propose edits to
# the transcript doc without needing filesystem access.
#
# Uncomment to exclude (e.g. for a private or unreleased episode):
# public/edit/*
# public/camera/*

# Whisper.cpp binary and models (downloaded on first run)
/whisper.cpp/

# Claude Code
.claude/settings.local.json

# Python Environments
# Python environments
.envrc
.venv
env/
@@ -65,8 +80,10 @@ ENV/
env.bak/
venv.bak/

# Python Byte-compiled / optimized / DLL files
# Python bytecode
scripts/diarize/__pycache__/
*.py[codz]
*$py.class

# Editor
.vscode/settings.json
268 changes: 166 additions & 102 deletions AGENTS.md

Large diffs are not rendered by default.

91 changes: 67 additions & 24 deletions CLAUDE.md
@@ -1,53 +1,96 @@
# RAG Tech Podcast — Project Context

## Show
**RAG Tech** is a biweekly tech podcast that explores real-life topics in tech. New episodes drop every other week.
**RAG Tech** biweekly tech podcast. Episodes drop every other week.

## Cohosts
| Name | Role | Image path |
|------|------|------------|
| Name | Role | Image |
|------|------|-------|
| Natasha | Software Engineer | `public/assets/team/natasha.PNG` |
| Saloni | Software Developer | `public/assets/team/saloni.PNG` |
| Victoria | Solutions Engineer | `public/assets/team/victoria.PNG` |

All cohost images have transparent backgrounds.

## Mascot
**Techybara** — a capybara mascot. Individual PNGs live in `public/assets/techybara/`.

## Brand
Brand config: `public/brand.json` (colors, typography, logo, shape radius).
Logo: `public/assets/logo/transparent-bg-logo.png`
Font: Nunito (variable, loaded via `remotion/loadFonts.ts`)
- Config: `public/brand.json` (colors, typography, logo, shape radius)
- Logo: `public/assets/logo/transparent-bg-logo.png`
- Font: Nunito (variable, loaded via `remotion/loadFonts.ts`)
- Mascot: **Techybara** (capybara) — PNGs in `public/assets/techybara/`

## Platforms
- **Audio/Video:** Spotify, YouTube, Apple Podcasts
- **Social:** Instagram, TikTok, LinkedIn
- **Handle:** `@ragtechdev` (same on all platforms)

## Tone
Fun and accessible — tech content that doesn't take itself too seriously.
Spotify · YouTube · Apple Podcasts · Instagram · TikTok · LinkedIn — handle `@ragtechdev`

## Key assets
| Asset | Path |
|-------|------|
| Intro/outro music | `public/sounds/intro-outro-music.mp3` |
| Background music (main) | `public/sounds/jazz-cafe-music.mp3` |
| Background music | `public/sounds/jazz-cafe-music.mp3` |
| Techybara images | `public/assets/techybara/` |
| Cohost photos | `public/assets/team/` |
| Logo | `public/assets/logo/` |

## Remotion compositions
| ID | Component | Notes |
|----|-----------|-------|
| `ragTechVodcast` | `MyComposition` | Full episode (hooks → intro → main video) |
| `PodcastIntro` | `PodcastIntroComposition` | Standalone 7 s intro (420 frames @ 60 fps) |
| `ragTechVodcast` | `MyComposition` | Full episode: hooks → intro → main video |
| `PodcastIntro` | `PodcastIntroComposition` | 7 s intro (420 frames @ 60 fps) |

## Pipeline overview
```
[sync] Audio ↔ video alignment → synced-output.mp4
[transcribe] Whisper.cpp → token-level timestamps
[diarize] Speaker turn detection
[assign-speakers] Labels segments with speaker names
[align] WhisperX forced alignment → populates token.t_end
[edit-transcript] Merges phrases into sentences → transcript.doc.txt
Human edits doc (cuts, corrections, hooks, camera cues)
[merge-doc] Applies doc edits → transcript.json
[setup-camera] Face detection + GUI → camera-profiles.json
Remotion transcript.json + camera-profiles.json → composed video
```

Intermediate files: `public/transcribe/output/`. Synced video: `public/sync/output/`.

## transcript.json key schema
```
meta
videoSrc?: string path relative to /public (overrides composition prop)
videoSrcs?: string[] all angle paths for multi-angle shoots
videoStart?: number source seconds — segments before excluded
videoEnd?: number source seconds — segments after excluded
fps: 60
segments[]
id, start, end, speaker, text, cut: boolean
tokens[]: { t_dtw, t_end?, text, cut }
cuts[]: [{ from, to }] intra-segment ranges to skip
hook? hookFrom?, hookTo? hook clip bounds
cameraCues[] explicit camera overrides (> CAM directives)
```
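
The schema listing above can be read as a set of TypeScript types. The following is a minimal sketch, not the project's actual type definitions: the interface names, the `id` type, and the `playableSegments` helper are hypothetical, and it assumes a segment is dropped only when it lies entirely outside the `videoStart`/`videoEnd` window.

```typescript
// Field names follow the schema listing; interface names are hypothetical.
interface Token {
  t_dtw: number;  // word start
  t_end?: number; // word end, present only after forced alignment
  text: string;
  cut: boolean;
}

interface Segment {
  id: number | string; // the id type is not stated in the schema (assumption)
  start: number;
  end: number;
  speaker: string;
  text: string;
  cut: boolean;
  tokens: Token[];
}

interface TranscriptMeta {
  videoSrc?: string;
  videoSrcs?: string[];
  videoStart?: number;
  videoEnd?: number;
  fps: number;
}

interface Transcript {
  meta: TranscriptMeta;
  segments: Segment[];
}

// Hypothetical helper: keep segments that are not cut and that overlap
// the [videoStart, videoEnd] window (assumed semantics).
function playableSegments(t: Transcript): Segment[] {
  const from = t.meta.videoStart ?? -Infinity;
  const to = t.meta.videoEnd ?? Infinity;
  return t.segments.filter((s) => !s.cut && s.end > from && s.start < to);
}
```

Treating the missing bounds as infinite keeps the filter a single expression and leaves a transcript without `videoStart`/`videoEnd` untouched.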

## Video pipeline overview
1. **Hooks** — selected transcript segments play first as teasers, with karaoke captions and the Techybara mascot overlay (`HookOverlay`).
2. **Intro** — `PodcastIntro` plays between hooks and the main episode content.
3. **Main episode** — full edited recording with optional camera punch-ins (`CameraPlayer`).
`token.t_end` is populated only after forced alignment — enables exact cut boundaries; without it, heuristic biases apply.

Forced alignment (`npm run align`) populates `token.t_end` (word-end boundary) alongside `token.t_dtw` (word start), enabling exact cut boundaries when words are marked for removal. Without it, cut boundaries fall back to heuristic bias constants. See `AGENTS.md` for full architecture details.
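
To illustrate the difference between exact and heuristic cut boundaries, here is a rough sketch. It is not the project's implementation: `cutRanges`, `TimedToken`, and `FALLBACK_WORD_DURATION` are hypothetical names, timestamps are assumed to be in seconds, and the fixed fallback duration stands in for whatever bias constants the real code uses.

```typescript
interface TimedToken {
  t_dtw: number;  // word start (assumed seconds)
  t_end?: number; // word end, only present after forced alignment
  cut: boolean;
}

interface CutRange {
  from: number;
  to: number;
}

// Hypothetical stand-in for the heuristic bias constants.
const FALLBACK_WORD_DURATION = 0.25;

function cutRanges(
  tokens: TimedToken[],
  fallback: number = FALLBACK_WORD_DURATION
): CutRange[] {
  const ranges: CutRange[] = [];
  tokens.forEach((tok, i) => {
    if (!tok.cut) return;
    const next = tokens[i + 1];
    let to: number;
    if (tok.t_end !== undefined) {
      // Exact boundary from forced alignment.
      to = tok.t_end;
    } else {
      // Heuristic: assume a fixed word length, but never run into the next word.
      to = tok.t_dtw + fallback;
      if (next) to = Math.min(to, next.t_dtw);
    }
    const last = ranges[ranges.length - 1];
    if (last && tok.t_dtw <= last.to) {
      // Touching or overlapping cut words merge into one range.
      last.to = Math.max(last.to, to);
    } else {
      ranges.push({ from: tok.t_dtw, to });
    }
  });
  return ranges;
}
```

With `t_end` present, the cut ends exactly where the word ends; without it, the end is a guess bounded by the next word's start.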
## camera-profiles.json key schema
```json
{
"sourceWidth": 1920, "sourceHeight": 1080,
"outputWidth": 1920, "outputHeight": 1080,
"wideViewport": { "cx": 0.5, "cy": 0.5, "w": 1, "h": 1 },
"angles": { // multi-angle only
"angle1": { "videoSrc": "sync/output/synced-output-1.mp4",
"sourceWidth": 1920, "sourceHeight": 1080 },
"angle2": { "videoSrc": "sync/output/synced-output-2.mp4",
"sourceWidth": 1920, "sourceHeight": 1080 }
},
"speakers": {
"Natasha": {
"label": "Natasha",
"angleName": "angle1", // multi-angle only
"closeupViewport": { "cx": 0.3, "cy": 0.4, "w": 0.35, "h": 0.35 },
"portraitCx": 0.3
}
}
}
```
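
The viewport values in the example above look like normalized coordinates. As a sketch under that assumption (`cx`/`cy` as the normalized center and `w`/`h` as the normalized size, which is consistent with `wideViewport` covering the full frame), a viewport could be converted to a pixel crop rectangle as follows. `viewportToCrop` and the interface names are hypothetical and not taken from the repository.

```typescript
interface Viewport {
  cx: number; // assumed: normalized horizontal center, 0..1
  cy: number; // assumed: normalized vertical center, 0..1
  w: number;  // assumed: normalized width, 0..1
  h: number;  // assumed: normalized height, 0..1
}

interface CropRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// Hypothetical helper: convert a normalized viewport to integer pixel
// coordinates, keeping the rectangle inside the source frame.
function viewportToCrop(
  v: Viewport,
  sourceWidth: number,
  sourceHeight: number
): CropRect {
  const width = Math.round(v.w * sourceWidth);
  const height = Math.round(v.h * sourceHeight);
  const x = clamp(Math.round(v.cx * sourceWidth - width / 2), 0, sourceWidth - width);
  const y = clamp(Math.round(v.cy * sourceHeight - height / 2), 0, sourceHeight - height);
  return { x, y, width, height };
}
```

Rounding to whole pixels and clamping to the frame avoids sub-pixel crops and rectangles that spill past the source edges when a face sits near the border.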

Transcript editing scripts live in `scripts/`. Transcription pipeline is in `scripts/transcribe/`.
**Multi-angle rendering**: `CameraPlayer` stacks one `SegmentPlayer` per unique angle video, switches visibility via `opacity` at shot boundaries. Non-active layers are `muted`. All angles share the same jump-cut sections (cuts are audio-driven). See `AGENTS.md` for full architecture.
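
The layer-switching idea described above can be sketched as a pure function that, for a given frame, decides which angle layer is visible and which are muted. This is only an illustration: `Shot`, `LayerState`, and `layerStates` are hypothetical names, and the sketch assumes shots are sorted by start frame and that the first angle is shown before any shot begins. The real `CameraPlayer` and `SegmentPlayer` components may work differently.

```typescript
interface Shot {
  fromFrame: number; // first frame of the shot (hypothetical field)
  angleName: string; // angle shown from that frame on
}

interface LayerState {
  angleName: string;
  opacity: 0 | 1;
  muted: boolean;
}

// Every angle stays mounted; only the active one is visible and audible.
function layerStates(
  angles: string[],
  shots: Shot[],
  frame: number
): LayerState[] {
  // Default to the first angle until a shot boundary has been reached.
  let active: string | undefined = angles[0];
  for (const shot of shots) {
    if (shot.fromFrame <= frame) active = shot.angleName;
  }
  return angles.map((angleName) => ({
    angleName,
    opacity: angleName === active ? 1 : 0,
    muted: angleName !== active,
  }));
}
```

Keeping all layers mounted and toggling `opacity` means every angle stays in sync on the same timeline, so a switch at a shot boundary does not need to seek a newly mounted video.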
13 changes: 8 additions & 5 deletions Dockerfile
@@ -48,13 +48,16 @@ COPY . .
RUN pip install --upgrade pip setuptools wheel && \
pip install -r scripts/diarize/requirements.txt && \
pip install whisperx faster-whisper && \
pip install -r scripts/camera/requirements.txt
pip install -r scripts/camera/requirements.txt && \
pip install -r scripts/thumbnail/requirements.txt && \
pip install "coverage>=7.0"

# Ensure public directories exist
RUN mkdir -p public/input/video public/input/audio \
# Ensure directories exist
RUN mkdir -p input/video input/audio \
public/sync/video public/sync/audio public/sync/output \
public/transcribe/input public/transcribe/output/raw public/transcribe/output/edit \
public/transcribe/output/camera
public/transcribe/input public/transcribe/output/raw \
public/edit public/camera public/thumbnail \
public/renders public/output

# Set environment variables
ENV PYTHON_PATH="/usr/local/bin/python"