This project is a tool for creating useful artifacts for podcast episodes. It automates the following tasks:
- Transcribing audio to text using Whisper
- Extracting links from episode descriptions
- Generating timestamped chapters based on discussed links
- Creating SRT files for subtitles
The project is designed to handle multiple podcasts and episodes, keeping track of processed content to avoid duplication.
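The duplicate-avoidance idea can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the tracking file holds a plain JSON list of GUID strings, and the helper names `load_processed`, `mark_processed`, and `should_process` are hypothetical.

```python
import json
from pathlib import Path


def load_processed(path: Path) -> set[str]:
    """Return the GUIDs already recorded, or an empty set if the file is missing."""
    if not path.exists():
        return set()
    # Assumption: the file is a JSON list of GUID strings.
    return set(json.loads(path.read_text(encoding="utf-8")))


def mark_processed(path: Path, guid: str) -> None:
    """Record a GUID in the tracking file, creating the file if needed."""
    guids = load_processed(path)
    guids.add(guid)
    path.write_text(json.dumps(sorted(guids), indent=2), encoding="utf-8")


def should_process(path: Path, guid: str) -> bool:
    """True when the episode has not been recorded yet."""
    return guid not in load_processed(path)
```

Storing the GUIDs as a set in memory keeps the membership check cheap, and writing a sorted list keeps the file stable between runs.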
- run_this_first.zsh: Initial setup script
- app.py: Main application script
- podcastFetcher.py: Handles podcast episode downloading and processing
- transcriptToSRT.py: Converts transcripts to SRT format
- scrapeLinks.py: Extracts links from episode descriptions
- generateDescriptionChaptersV2.py: Generates timestamped chapters
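To illustrate the transcript-to-SRT step, here is a rough sketch of the conversion. It is not taken from transcriptToSRT.py: it assumes each transcript segment is a dict with `start` and `end` in seconds plus a `text` field (the shape Whisper's segment output typically has), and the function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text: a 1-based index, a time range, then the cue text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

For example, `srt_timestamp(3661.5)` gives `01:01:01,500`. Note that SRT uses a comma, not a period, before the milliseconds.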
- Clone this repository:
    git clone https://github.com/your-username/podcast-processing-project.git
    cd podcast-processing-project
- Install required dependencies:
    pip install -r requirements.txt
- Set up environment variables: create a .env file in the project root and add your Anthropic API key:
    ANTHROPIC_API_KEY=your_api_key_here
- Run the initial setup script:
    chmod +x run_this_first.zsh
    ./run_this_first.zsh
  This adds an allMappedChapterPodcasts.json file containing a default podcast. Replace the details of this default podcast to process a different podcast. NOTE: The script runs continuously until you kill it; a limit value still needs to be added. The processed_episodes.json file in each podcast folder tracks the GUID of each processed episode so that it is not processed again unnecessarily.
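As a quick sanity check on the .env step, the sketch below reads simple KEY=VALUE lines and verifies that the API key has been filled in. It is an illustration only: how the project itself loads .env is not stated here (it may rely on a library such as python-dotenv), and `read_env_file` and `require_api_key` are hypothetical helpers.

```python
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(values: dict[str, str]) -> str:
    """Return the Anthropic key, or fail if it is missing or still the placeholder."""
    key = values.get("ANTHROPIC_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("ANTHROPIC_API_KEY is not set in the .env file")
    return key
```

Calling `require_api_key(read_env_file(Path(".env")))` from the project root fails early with a clear message instead of partway through a run.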
- (a) Run the main application:
    python app.py
- (b) To process a specific podcast episode:
    python podcastFetcher.py <podcast_name> <rss_url> <episode_guid>
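To show what the episode GUID argument refers to, the sketch below looks up a single item in an RSS 2.0 feed by its `<guid>` value. It is an assumption-laden illustration rather than the logic in podcastFetcher.py: `find_episode` is a hypothetical helper, and it works on feed XML that has already been downloaded as a string.

```python
import xml.etree.ElementTree as ET


def find_episode(rss_xml: str, episode_guid: str):
    """Return title and audio URL for the item with the given GUID, or None."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        if (item.findtext("guid") or "").strip() == episode_guid:
            # In RSS 2.0 the audio file is usually carried by <enclosure url="...">.
            enclosure = item.find("enclosure")
            return {
                "title": (item.findtext("title") or "").strip(),
                "guid": episode_guid,
                "audio_url": enclosure.get("url") if enclosure is not None else None,
            }
    return None
```

The GUID to pass on the command line is the text of the `<guid>` element for the episode you want, which you can find by opening the feed URL in a browser or text editor.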
Contributions to this project are welcome! Here's how you can contribute:
- Fork the repository
- Create a new branch (git checkout -b feature/your-feature-name)
- Make your changes
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin feature/your-feature-name)
- Create a new Pull Request
Please ensure your code adheres to the project's coding standards and includes tests for new features.
Dave Norman - @1davidnorman on X.com