Skip to content

🎙️ AI generated subtitles and segmented chapters for podcasts

Notifications You must be signed in to change notification settings

johan-akerman/SpotifyTranscripts

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

63 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spotify Transcripts:
AI generated subtitles and segmented chapters for podcasts.

✨ Key features

  • Transcripts: Speech recognition to convert speech into text and a timestamp.
  • Search: Search the transcript and jump to a particular part of a conversation.
  • Chapters: Break down an episode into auto-generated chapters based on topics.
  • Subtitles: Make podcasts accessible to people with hearing difficulties.

📖 About the project

This project is the result of a combination and continued development of two of my previous projects:

  • Spotify Topics: During the summer of 2020, I participated in Spotify’s summer hackathon and developed a tool that lets you fast forward to timestamps where certain topics were being discussed.
  • Spotify Subtitles: In 2022, I continued to experiment by building subtitles for podcasts based on a feature idea which received 4500+ upvotes on Spotify’s community forum.

In 2023, in the midst of the ChatGPT hype, I got inspired to combine my two previous projects into one podcast player and improve it by utilizing Open AI’s APIs.

FYI: Spotify later released a similar solution for both podcast subtitles and chapters, read more here.

⚙️ Technologies used

The technologies used in this project can be found in the table below.

Technology Use case
React Frontend framework
Tailwind CSS styling library
Python Backend to handle transcription logic
Flask Connects python backend with react frontend
Spotify API To get information about podcast episodes
Google Speech Recognition API Converts speech to text, i.e transcribes the podcast
Open AI's GPT 3.5 API Segment transcript into chapters based on transcript

I wanted to learn how to connect a React frontend to a Python backend so I used this project as a learning opportunity to do that. As a result, I did some overengineering by building my own API to handle transcriptions on a Python backend instead of calling a plug-and-play API in the frontend.

More specifically, the frontend makes a call to the Spotify API and gets the URL of the requested podcast. The URL is sent as a request to the backend that downloads the podcast as an mp3 in order to process it.

The reason that the mp3 needs to be processed is that I need to get timestamps for each sentence in order to display them at the correct time in the subtitles. I identify sentences in the transcript by listening for a silence (< 14 decibels) longer than 500 ms. When a silence is identified, I split the original audio file to create a set of smaller audio files, one for each sentence. By doing this, I was able to calculate the start and end time of each sentence by looking at the length of each smaller audio file, see figure below.

All of the audio files are now sent to Google’s speech recognition API and returns a string of the transcribed audio. The transcription is now being sent back to the frontend who makes a request to Open AI’s API to segment the transcript and identify potential topics to divide the episode into different chapters.

🚫 Limitations

Spotify’s API does not allow you to download full podcast episodes, only 30 second previews. This makes the app very limited to use and it is therefore only a proof of concept.

🚀 Getting started

Step 1: Sign up for API keys

Step 2: Add API keys to .env file

Create a .env file in the root directory and add your API keys:

REACT_APP_SPOTFY_CLIENT_ID=YOUR_SPOTIFY_CLIENT_ID_GOES_HERE
REACT_APP_OPEN_AI_KEY=YOUR_OPEN_AI_KEY_GOES_HERE

Step 3: Run the project

Use the following commands to run the project. Start the frontend in one terminal and the backend in another terminal.

Backend

export FLASK_APP=backend
export FLASK_DEBUG=1
flask run

Frontend

cd frontend
npm start

🎞️ Demo

Watch a 1 min demo of the project here.

📸 Screenshots

Home page with Spotify authentication


Discovery page


Loading screen


Episode screen


Episode screen


Subtitles in fullscreen


Overview of chapters within an episode


Audio player divided by chapters


Search transcript