This document explains how to turn text into a set of audio files delivered through a self-hosted podcast feed using GitHub Pages.
The workflow:
- Extract text from a PDF
- Convert text → speech using macOS `say`
- Convert AIFF → M4A
- Split audio into chapters or segments
- Repair timing/index issues in the audio container
- Build an RSS feed
- Host on GitHub Pages
- Subscribe with any podcast app
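The workflow depends on several command-line tools. The check below prints any of them that are missing from your PATH. It is a sketch, and `missing_tools` is a hypothetical helper. `say` and `afconvert` ship with macOS. `pdftotext` (part of Poppler), `ocrmypdf`, and `ffmpeg` are separate installs, for example via Homebrew (`brew install poppler ocrmypdf ffmpeg`).

```bash
#!/usr/bin/env bash
# Print the name of every tool given as an argument that is not on PATH.
# Hypothetical helper; returns non-zero if anything is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}

# Example: check everything this workflow uses.
# missing_tools pdftotext ocrmypdf say afconvert ffmpeg git || echo "install the tools listed above"
```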
If the PDF already contains selectable text:
```bash
pdftotext input.pdf output.txt
```
If the PDF contains scanned images, run OCR first and then extract the text:
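```bash
ocrmypdf input.pdf ocr_output.pdf
pdftotext ocr_output.pdf output.txt
```
Convert the text to speech with the macOS `say` command, which writes an AIFF file:
```bash
say -v "jamie" -o full.aiff -f output.txt
```
List the installed voices (pass a name exactly as listed to `-v`):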
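```bash
say -v '?'
```
Convert the AIFF file to AAC audio in an M4A container:
```bash
afconvert -f m4af -d aac full.aiff full.m4a
```
Split into even time segments (here 1200 seconds, i.e. 20 minutes each):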
```bash
ffmpeg -i full.m4a -f segment -segment_time 1200 -reset_timestamps 1 -c copy seg_%03d.m4a
```
Specific timestamps:
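```bash
ffmpeg -i full.m4a -ss 00:00:00 -to 00:25:00 -c copy seg_000.m4a
ffmpeg -i full.m4a -ss 00:25:00 -to 00:50:00 -c copy seg_001.m4a
ffmpeg -i full.m4a -ss 00:50:00 -c copy seg_002.m4a
```
Lossless remux (rewrites the container; `+faststart` moves the index to the start of the file so playback can begin before the download finishes):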
```bash
ffmpeg -i seg_000.m4a -c copy -movflags +faststart fixed_000.m4a
```
Batch:
```bash
mkdir -p fixed
for f in seg_*.m4a; do
  ffmpeg -i "$f" -c copy -movflags +faststart "fixed/$f"
done
```
If a segment still misbehaves after remuxing, re-encode it (lossy):
```bash
ffmpeg -i seg_000.m4a -c:a aac -b:a 96k -movflags +faststart fixed_000.m4a
```
Example structure:
```
podcast/
  <obfuscated-id>/
    seg_000.m4a
    seg_001.m4a
    rss.xml
```
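The obfuscated directory name only needs to be hard to guess. The sketch below reads 16 random bytes from `/dev/urandom` and prints them as 32 hex characters. The `random_id` name is hypothetical. This is obscurity, not access control: GitHub Pages sites are generally publicly reachable, and anyone who can browse the repository can see the directory name.

```bash
#!/usr/bin/env bash
# Print a random 32-character hex string for use as an unguessable
# directory name (hypothetical helper; any strong random source works).
random_id() {
  od -An -tx1 -N16 /dev/urandom | tr -d ' \n'
}

# Example:
# id="$(random_id)"
# mkdir -p "podcast/$id"
```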
RSS template:
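```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>My Private Audio Feed</title>
    <link>https://USERNAME.github.io/REPO/podcast/OBFUSCATED_ID/</link>
    <description>Private feed</description>
    <language>en-us</language>
    <item>
      <title>Segment 000</title>
      <enclosure url="https://USERNAME.github.io/REPO/podcast/OBFUSCATED_ID/seg_000.m4a"
                 type="audio/x-m4a"
                 length="12345678" />
      <guid isPermaLink="false">unique-guid-here</guid>
      <pubDate>Tue, 01 Jan 2030 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
```
Replace USERNAME, REPO, and OBFUSCATED_ID (the obfuscated directory name) with your own values. Use the MIME type audio/x-m4a, set `length` to the file's size in bytes, and give each item a `guid` that stays the same across feed updates; `isPermaLink="false"` marks the guid as an identifier rather than a URL.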
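Writing one `<item>` per file by hand gets tedious, and each `length` must match the file's byte size. The sketch below prints a complete feed for every `seg_*.m4a` in a directory. Several parts are illustrative choices, not requirements: the `make_feed` name, the `FEED_DAY` variable, the `ID-filename` guid scheme, and the pubDate scheme. Segments are dated one minute apart, starting from a fixed day, so podcast apps keep them in order. Filenames are assumed to need no XML escaping.

```bash
#!/usr/bin/env bash
# Print an RSS 2.0 feed listing every seg_*.m4a in DIR.
# Usage: make_feed DIR BASE_URL [TITLE]
#   BASE_URL is the public folder URL, e.g.
#   https://USERNAME.github.io/REPO/podcast/OBFUSCATED_ID
make_feed() {
  local dir="$1" base="${2%/}" title="${3:-My Private Audio Feed}"
  local day="${FEED_DAY:-Tue, 01 Jan 2030}"  # hypothetical fixed start day
  local i=0 f name n size hh mm

  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>$title</title>
    <link>$base/</link>
    <description>Private feed</description>
    <language>en-us</language>
EOF

  for f in "$dir"/seg_*.m4a; do          # glob sorts zero-padded names in order
    [ -e "$f" ] || continue              # glob matched nothing
    name="$(basename "$f")"
    n="${name%.m4a}"
    size="$(wc -c < "$f" | tr -d ' ')"   # enclosure length in bytes
    hh=$(( i / 60 )); mm=$(( i % 60 ))   # one minute apart; fine for up to 1440 files
    cat <<EOF
    <item>
      <title>Segment ${n#seg_}</title>
      <enclosure url="$base/$name" type="audio/x-m4a" length="$size" />
      <guid isPermaLink="false">$(basename "$dir")-$name</guid>
      <pubDate>$day $(printf '%02d:%02d' "$hh" "$mm"):00 GMT</pubDate>
    </item>
EOF
    i=$(( i + 1 ))
  done

  printf '  </channel>\n</rss>\n'
}

# Example:
# make_feed podcast/OBFUSCATED_ID https://USERNAME.github.io/REPO/podcast/OBFUSCATED_ID > podcast/OBFUSCATED_ID/rss.xml
```

Because the guid is derived from the directory and file names, regenerating the feed after re-encoding a segment keeps the same guid, so apps should not treat it as a new episode.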
```bash
git init
git add .
git commit -m "initial"
git remote add origin git@github.com:USERNAME/REPO.git
git branch -M main
git push -u origin main
```
Enable Pages:
In the repository, open Settings → Pages and set:
- Source: Deploy from a branch
- Branch: main
- Folder: / (root)
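GitHub rejects pushes that contain a file larger than 100 MB, and a long unsplit `full.aiff` or `full.m4a` can easily exceed that. `git add .` also commits everything in the folder, including the source PDF and extracted text, and anything committed under the published folder can be served by Pages. The sketch below lists files above a size limit before you commit or push; `oversized_files` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# List files under DIR larger than LIMIT_MB megabytes (default 100,
# GitHub's per-file push limit), skipping the .git directory.
# Hypothetical helper; returns non-zero if any file is too large.
oversized_files() {
  local dir="${1:-.}" limit_mb="${2:-100}" found
  found="$(find "$dir" -path "$dir/.git" -prune -o -type f -size +"$limit_mb"M -print)"
  [ -z "$found" ] || { printf '%s\n' "$found"; return 1; }
}

# Example, from the repository root:
# oversized_files . || echo "move or delete the files listed above before pushing"
```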
Feed URL:
https://USERNAME.github.io/REPO/podcast/<id>/rss.xml
In your podcast app, choose the option to add a show by URL and paste the feed URL.
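A newly pushed site can take a few minutes to appear on GitHub Pages. The sketch below uses curl to send HEAD requests, following redirects, and reports each URL's final HTTP status. It succeeds only if every URL returns 200. The `check_urls` name is hypothetical.

```bash
#!/usr/bin/env bash
# Print "STATUS URL" for each URL and succeed only if all return 200.
# HEAD requests (-I) avoid downloading the audio; -L follows redirects.
check_urls() {
  local url code status=0
  for url in "$@"; do
    code="$(curl -sSIL -o /dev/null -w '%{http_code}' "$url")" || code="000"
    printf '%s %s\n' "$code" "$url"
    [ "$code" = "200" ] || status=1
  done
  return "$status"
}

# Example:
# check_urls https://USERNAME.github.io/REPO/podcast/OBFUSCATED_ID/rss.xml \
#            https://USERNAME.github.io/REPO/podcast/OBFUSCATED_ID/seg_000.m4a
```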
If audio doesn’t play:
- Ensure the enclosure MIME type is audio/x-m4a
- Verify the file loads and plays in a browser
- Fix timing by remuxing or re-encoding with ffmpeg (see the repair commands above)
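A local check catches two common enclosure mistakes before pushing: a wrong MIME type, or a `length` that no longer matches the file after re-encoding. The sketch below has limits worth knowing. It extracts attributes with grep and sed rather than a real XML parser. It maps each enclosure URL to a local file by its last path component. The `check_enclosures` name is hypothetical.

```bash
#!/usr/bin/env bash
# Check every <enclosure> in FEED: MIME type must be audio/x-m4a and
# length must equal the size in bytes of the matching local file.
# Usage: check_enclosures FEED [AUDIO_DIR]  (AUDIO_DIR defaults to FEED's folder)
check_enclosures() {
  local feed="$1" dir="${2:-$(dirname "$1")}" status=0
  local tags tag url mime len file actual
  # Join lines so an enclosure split over several lines becomes one tag.
  tags="$(tr '\n' ' ' < "$feed" | grep -o '<enclosure[^>]*>')" ||
    { echo "no enclosures found in $feed"; return 1; }
  while IFS= read -r tag; do
    url="$(printf '%s\n' "$tag" | sed -n 's/.*url="\([^"]*\)".*/\1/p')"
    mime="$(printf '%s\n' "$tag" | sed -n 's/.*type="\([^"]*\)".*/\1/p')"
    len="$(printf '%s\n' "$tag" | sed -n 's/.*length="\([^"]*\)".*/\1/p')"
    file="$dir/${url##*/}"               # map URL to local file by name
    if [ ! -f "$file" ]; then
      echo "missing file: $file"; status=1; continue
    fi
    actual="$(wc -c < "$file" | tr -d ' ')"
    [ "$mime" = "audio/x-m4a" ] || { echo "wrong type '$mime': $url"; status=1; }
    [ "$len" = "$actual" ] || { echo "length $len != $actual bytes: $url"; status=1; }
  done <<EOF
$tags
EOF
  return "$status"
}
```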
This README covers:
- Extracting text from PDFs
- Converting text → audio
- Splitting audio
- Repairing timing issues
- Generating an RSS feed
- Publishing via GitHub Pages
- Subscribing through a podcast app
Each step can be automated with a shell script or a Makefile.
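The steps can be chained into one script. This is a minimal sketch that assumes the file names used above. Everything it introduces is an illustrative choice, not part of any tool:

- the names `build_podcast_audio`, `run`, and `looks_empty`
- the `DRY_RUN`, `VOICE`, and `SEGMENT_SECONDS` variables
- the `parts/` and `fixed/` folders
- the 20-character "no text" threshold

The script falls back to OCR when `pdftotext` finds almost no text. It adds the segment muxer's `-reset_timestamps 1` so each segment starts near zero. It re-encodes a segment only if the lossless remux fails.

```bash
#!/usr/bin/env bash
# Usage (after sourcing this file):
#   DRY_RUN=1 build_podcast_audio input.pdf   # only print the commands
#   build_podcast_audio input.pdf             # actually run them

# Run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Succeed if FILE is missing or has fewer than 20 non-space characters,
# which usually means a scanned PDF (the threshold is arbitrary).
looks_empty() {
  [ ! -s "$1" ] || [ "$(tr -d '[:space:]' < "$1" | wc -c | tr -d ' ')" -lt 20 ]
}

build_podcast_audio() {
  local pdf="${1:-input.pdf}" voice="${VOICE:-jamie}" secs="${SEGMENT_SECONDS:-1200}" f

  # 1. Extract text; fall back to OCR when the PDF has no text layer.
  run pdftotext "$pdf" output.txt || return
  if [ -z "${DRY_RUN:-}" ] && looks_empty output.txt; then
    run ocrmypdf "$pdf" ocr_output.pdf || return
    run pdftotext ocr_output.pdf output.txt || return
  fi

  # 2-3. Text to speech, then AIFF to AAC in an M4A container.
  run say -v "$voice" -o full.aiff -f output.txt || return
  run afconvert -f m4af -d aac full.aiff full.m4a || return

  # 4. Split into even segments; reset timestamps so each starts near 0.
  run mkdir -p parts fixed || return
  run ffmpeg -i full.m4a -f segment -segment_time "$secs" -reset_timestamps 1 \
    -c copy parts/seg_%03d.m4a || return

  # 5. Remux each segment with the index up front; re-encode on failure.
  for f in parts/seg_*.m4a; do
    [ -e "$f" ] || continue
    run ffmpeg -y -i "$f" -c copy -movflags +faststart "fixed/${f##*/}" ||
      run ffmpeg -y -i "$f" -c:a aac -b:a 96k -movflags +faststart "fixed/${f##*/}" ||
      return
  done
}
```

With `DRY_RUN=1` the commands are only printed. The OCR check and the per-segment loop depend on files that earlier steps create, so a dry run skips them. After a real run, copy the files in `fixed/` into `podcast/<obfuscated-id>/` before building the feed.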