Tool to create written transcripts for podcasts and other interview style audio.
See an example podcast transcript here.
Podcasts are becoming one of the most popular forms of long-form media due to their ease of creation and consumption. Originally built to allow for revision and note-taking, the tool's summarisation features now also make it useful for getting a quick understanding of a podcast's contents.
- Audio-to-Text transcription including speaker detection (diarization).
- Automatic topic generation.
- Topic and full-text summarisation.
- Front-end transcript display.
- Easy-to-read transcript with summaries.
- Episode audio player with word-level seeking.
Uses free, publicly available AI models that can be run on a local instance.
- Transcription: whisperx
  - Open-source transcription pipeline combining OpenAI's Whisper speech-recognition model with forced alignment to generate accurate word-level timestamps.
- Diarization: pyannote
  - Open-source model to diarize the transcript (assign speaker labels to segments).
- Topic Modelling:
  - Clustering of embeddings from a public sentence-embedding model, combined with cosine positional embeddings.
- Summarisation:
  - Use of Meta's Llama 2 LLM to generate titles and summaries for topics.
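The topic-modelling step above can be illustrated with a minimal sketch. This is not the repo's exact code: random vectors stand in for the sentence-embedding model's output, the position weighting factor is an assumption, and a tiny hand-rolled k-means stands in for whatever clustering the pipeline actually uses.

```python
import numpy as np

# Stand-in for embeddings from a public sentence-embedding model
# (random vectors here, purely for illustration).
rng = np.random.default_rng(0)
n_sentences, dim = 30, 64
sentence_embs = rng.normal(size=(n_sentences, dim))
sentence_embs /= np.linalg.norm(sentence_embs, axis=1, keepdims=True)

# Cosine positional embedding over the episode timeline, so that
# sentences close together in time tend to land in the same topic.
pos = np.arange(n_sentences) / n_sentences
pos_embs = np.stack([np.cos(np.pi * pos), np.sin(np.pi * pos)], axis=1)

# Concatenate content and position; the 2.0 weight is an assumed knob
# trading topical similarity against temporal locality.
features = np.hstack([sentence_embs, 2.0 * pos_embs])

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means (numpy only) as a stand-in for the clustering step."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each sentence to its nearest centroid.
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(0)
    return labels

topics = kmeans(features, k=3)
print(topics)  # one topic label per sentence
```

Because position is part of the feature vector, the resulting clusters tend to be contiguous runs of sentences, which is what a transcript topic should look like.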
Run the main.py script with the following parameters:
- url
- episode_name
- media_type (podcast/youtube)
- n_speakers (optional)
This will run the pipeline of processes described above, downloading audio from the specified URL and outputting the transcript as an HTML file that can be viewed in a browser.
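An invocation using the parameters above might look like the following. The flag syntax and the placeholder URL are assumptions; check `python main.py --help` for the exact interface.

```shell
# Hypothetical example: transcribe a two-speaker podcast episode.
python main.py \
    --url "https://example.com/feed/episode-1.mp3" \
    --episode_name "episode-1" \
    --media_type podcast \
    --n_speakers 2
```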
Developed first in Jupyter notebooks (see each notebook for additional info and design choices).
These notebooks are each converted into Python modules with the nbdev
library, simply by tagging the required cells with the #|export
directive.
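An exported notebook cell looks like ordinary Python with the directive as its first line; nbdev copies such cells into the generated module. The helper below is hypothetical, shown only to illustrate the tagging.

```python
#|export
def format_timestamp(seconds: float) -> str:
    """Render a float number of seconds as HH:MM:SS for transcript display.

    Hypothetical helper for illustration; any cell tagged #|export is
    written into the module nbdev generates from the notebook.
    """
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"
```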
pip install git+https://github.com/stephankostov/transcriber.git
Whisper also requires ffmpeg and Rust to be installed. See the installation instructions in the Whisper repo for details.
- Pyannote diarization model performs poorly with overlapping speech.
  - Look into other diarization models such as NVIDIA NeMo.
- Speech segments are often wrongly split mid-sentence.
  - Split speech segments on sentence boundaries rather than on words.
- Topic grouping is rather arbitrary.
  - Topics are usually introduced by the interviewer, so account for this in the topic splitting.
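The sentence-boundary split proposed above could be sketched as follows. The word-dict shape is an assumption mimicking word-level alignment output, not the repo's actual data model, and terminal punctuation is used as a crude sentence detector.

```python
import re

def split_on_sentences(words):
    """Regroup word-level timings into sentence-level speech segments.

    Illustrative sketch: a segment ends whenever a word ends with
    sentence-terminating punctuation (., !, ?).
    """
    segments, current = [], []
    for w in words:
        current.append(w)
        if re.search(r"[.!?]$", w["word"]):
            segments.append({
                "text": " ".join(x["word"] for x in current),
                "start": current[0]["start"],
                "end": current[-1]["end"],
            })
            current = []
    if current:  # keep any trailing words with no terminal punctuation
        segments.append({
            "text": " ".join(x["word"] for x in current),
            "start": current[0]["start"],
            "end": current[-1]["end"],
        })
    return segments

# Hypothetical aligned-word output for a short exchange.
words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "there.", "start": 0.4, "end": 0.8},
    {"word": "How", "start": 1.0, "end": 1.2},
    {"word": "are", "start": 1.2, "end": 1.4},
    {"word": "you?", "start": 1.4, "end": 1.7},
]
print(split_on_sentences(words))
```

A production version would want a real sentence tokenizer (abbreviations, decimals, and quotes all defeat the punctuation regex), but the regrouping logic would stay the same.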