# Transcribe a YouTube Video in English with AssemblyAI and yt-dlp

### References:

- [AssemblyAI documentation](https://www.assemblyai.com/docs)
- [yt-dlp on GitHub](https://github.com/yt-dlp/yt-dlp)

## Preparation

#### Imports and Globals

In [1]:
import assemblyai as aai
import yt_dlp
import json

from config import aai_key  # config.py holds the AssemblyAI API key

aai.settings.api_key = aai_key
YT_BASE_URL = 'https://www.youtube.com/watch?v='
DST_FOLDER = 'files'

#### Task-specific Variables

In [25]:
v_id = 'wKS2sFRI87w'  # the video's main identifier (required)
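
When only a full link is at hand, the identifier can be pulled out of it with the standard library. This is a minimal sketch: `extract_video_id` is a hypothetical helper, not part of yt-dlp, and it assumes the link is either a `watch?v=` URL or a `youtu.be` short link.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video id from a watch or youtu.be URL, or None if absent."""
    parsed = urlparse(url)
    if parsed.hostname == 'youtu.be':
        # Short links carry the id as the path: https://youtu.be/<id>
        return parsed.path.lstrip('/') or None
    # Regular links carry the id in the "v" query parameter
    values = parse_qs(parsed.query).get('v')
    return values[0] if values else None


print(extract_video_id('https://www.youtube.com/watch?v=wKS2sFRI87w&t=42s'))
```

The returned value can be assigned to `v_id` directly.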

#### Pull and save the soundtrack with yt-dlp

In [26]:
url = f'{YT_BASE_URL}{v_id}'

ydl_opts = {
    'format': 'm4a/bestaudio/best',  # Prefer m4a audio; fall back to the best audio-only stream, then the best overall format
    'outtmpl': f'{DST_FOLDER}/%(title)s_%(id)s.%(ext)s',  # Output path template: title and video id
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info(url)
    audio_file = ydl.prepare_filename(info)

print(f'\n>>> Downloaded to: {audio_file}')

[youtube] Extracting URL: https://www.youtube.com/watch?v=wKS2sFRI87w
[youtube] wKS2sFRI87w: Downloading webpage
[youtube] wKS2sFRI87w: Downloading tv client config
[youtube] wKS2sFRI87w: Downloading player 56511309
[youtube] wKS2sFRI87w: Downloading tv player API JSON
[youtube] wKS2sFRI87w: Downloading ios player API JSON
[youtube] wKS2sFRI87w: Downloading m3u8 information
[info] wKS2sFRI87w: Downloading 1 format(s): 140
[download] files\The Power To OverRule G-d：_wKS2sFRI87w.m4a has already been downloaded
[download] 100% of   23.92MiB

>>> Downloaded to: files\The Power To OverRule G-d：_wKS2sFRI87w.m4a


## Building the transcript via AssemblyAI API

In [27]:
transcriber = aai.Transcriber()

In [None]:
# no speaker differentiation
config = aai.TranscriptionConfig(language_detection=True)
transcript = transcriber.transcribe(audio_file, config)
print(transcript.status)

In [28]:
# with speaker differentiation
config = aai.TranscriptionConfig(language_code='en', speaker_labels=True)
transcript = transcriber.transcribe(audio_file, config)
print(transcript.status, transcript.id)

TranscriptStatus.completed 53283de1-e100-458d-8e80-8cec00ce5241


In [29]:
from pathlib import Path
file_name = Path(audio_file).stem  # base name without folder or extension, on any OS
file_name

'The Power To OverRule G-d：_wKS2sFRI87w'

In [30]:
# Save the full API response as JSON; the context manager closes the file
with open(f'files/transcript_{file_name}_{transcript.id}.json', 'w', encoding='utf-8') as f:
    json.dump(transcript.json_response, f, indent=4, ensure_ascii=False)

with open(f'files/transcript_{file_name}_{transcript.id}.txt', 'w', encoding='utf-8') as f:
    f.write(transcript.text)

In [31]:
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")

Speaker A: Good morning and welcome to Worldwide Wisdom. Folks. Today we're learning about a mission in this world which is to transform darkness into light. A mission in this world which is to transform chaos into chorus, dysfunction into clarity. God gives us darkness in order that we should turn it into light. That's an amazing transformation. You know, there's a story in the Talmud about one of the sages who was asked why it is that goats walk before sheep. Have you ever pondered this question?
Speaker B: Absolutely, before I go to bed.
Speaker A: Well, now you get ready for the answer. You ready? This is it. Just get a set, put it to bed.
Speaker B: It's a bad answer.
Speaker A: So here goes. Good morning, Stacy. So here goes. Why is it. I mean, can you imagine the Talmud, the. The second most important book of Judaism dealing with this? So why do the goats walk in front of the sheep? And the answer given was that the sage responded that go. Now, I don't know this, but goats appar
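
For a long recording, a timestamp in front of each speaker turn makes the dialogue easier to navigate. The sketch below is illustrative only: `format_dialogue` is a hypothetical helper, `Utterance` is a stand-in namedtuple, and it assumes each utterance exposes a `start` time in milliseconds alongside `speaker` and `text`. The timings in the demo data are made up.

```python
from collections import namedtuple

# Stand-in for AssemblyAI utterances (assumed fields: speaker, text, start in ms).
Utterance = namedtuple('Utterance', ['speaker', 'text', 'start'])


def format_dialogue(utterances):
    """Render utterances as '[HH:MM:SS] Speaker X: text' lines."""
    lines = []
    for u in utterances:
        total_seconds = u.start // 1000
        hours, rest = divmod(total_seconds, 3600)
        minutes, seconds = divmod(rest, 60)
        lines.append(f'[{hours:02}:{minutes:02}:{seconds:02}] Speaker {u.speaker}: {u.text}')
    return '\n'.join(lines)


# Demo data with invented start times
demo = [
    Utterance('A', 'Have you ever pondered this question?', 31500),
    Utterance('B', 'Absolutely, before I go to bed.', 3725000),
]
print(format_dialogue(demo))
```

If the assumption about `start` holds, `format_dialogue(transcript.utterances)` would produce the same layout for the real transcript.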

## Working with the transcript

In [2]:
# load the transcript by id if necessary
job_id = '1dc162c7-0114-4698-be0a-8259ad1e7edf'

transcript = aai.Transcript.get_by_id(job_id)
transcript.id

'1dc162c7-0114-4698-be0a-8259ad1e7edf'

### Search for words

In [16]:
def convert_millis(millis):
    seconds = millis // 1000
    hours = seconds // 3600
    minutes = (seconds % 3600) // 60
    seconds = seconds % 60
    return f"{hours:02}:{minutes:02}:{seconds:02}"


def find_words(query):
    matches = transcript.word_search(query.split())
    for match in matches:
        print(f'{match.text}: ', end='')
        print('; '.join([convert_millis(start) for start, end in match.timestamps]))


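def find_sequence(query):
    """Return start times (ms) where the first query word is directly followed by another query word."""
    ordered_words = [word.lower().strip() for word in query.split()]
    matches = transcript.word_search(ordered_words)
    # Separate the match for the first word from the matches for the remaining words
    match0 = next((m for m in matches if m.text == ordered_words[0]), None)
    if match0 is None:
        print(f'No match for "{query}"')
        return []  # always return a list so callers can iterate safely
    others = [m for m in matches if m is not match0]
    starts = []
    for index, timestamp in zip(match0.indexes, match0.timestamps):
        # indexes are treated as word positions in the transcript; index + 1 is the next word
        if any(index + 1 in m.indexes for m in others):
            starts.append(timestamp[0])
    return starts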
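
`find_sequence` only checks that some other query word directly follows the first one, so it is effectively a two-word search. For longer phrases, one option is to scan the word list itself. The following is a sketch under stated assumptions, not part of the AssemblyAI SDK: `find_phrase` and `normalize` are hypothetical helpers, and `Word` is a stand-in namedtuple that mirrors the `text` and `start` fields of the entries in `transcript.words`.

```python
import string
from collections import namedtuple

# Stand-in for the entries of transcript.words (only the fields used here).
Word = namedtuple('Word', ['text', 'start'])


def normalize(token):
    """Lowercase a token and strip surrounding punctuation ('God.' -> 'god')."""
    return token.lower().strip(string.punctuation)


def find_phrase(words, query):
    """Return start times (ms) of every place the query words occur consecutively."""
    target = [normalize(t) for t in query.split()]
    if not target:
        return []
    tokens = [normalize(w.text) for w in words]
    n = len(target)
    # Slide a window of len(target) words over the transcript and compare
    return [
        words[i].start
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == target
    ]


# Invented sample data for the demo
sample = [
    Word('Can', 0), Word('we', 200), Word('overrule', 400), Word('God?', 900),
    Word('Yes,', 1500), Word('overrule', 1800), Word('him.', 2300),
]
print(find_phrase(sample, 'overrule God'))
```

With a loaded transcript, the call would be `find_phrase(transcript.words, query)`; the returned start times can be passed to `convert_millis` for display.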

In [42]:
query = 'pro'
find_words(query)

pro: 00:25:33


In [14]:
query = 'overrule God'
matches = transcript.word_search(query.split())
matches

[WordSearchMatch(text='overrule', count=8, timestamps=[(155652, 156124), (158766, 159158), (323482, 323986), (329482, 329778), (526892, 527460), (1310364, 1310876), (1398862, 1399462), (1421294, 1421766)], indexes=[383, 391, 800, 817, 1438, 3897, 4189, 4249]),
 WordSearchMatch(text='god', count=62, timestamps=[(20664, 20984), (156172, 156800), (159174, 159366), (161486, 161686), (170430, 170646), (190434, 190666), (234720, 234984), (324018, 324274), (329794, 329986), (394814, 395046), (401282, 401466), (405602, 405834), (414562, 414746), (415762, 415946), (416946, 417194), (431460, 431932), (437780, 438044), (492546, 492746), (494514, 494890), (527540, 527844), (538984, 539360), (552664, 552944), (574656, 574856), (600348, 600644), (694018, 694178), (695754, 695986), (696666, 696818), (701882, 702370), (724746, 725106), (730666, 730914), (773528, 773728), (861360, 861664), (864048, 864256), (870208, 870416), (988096, 988296), (991316, 991468), (1200312, 1200448), (1236932, 1237560), (1

In [19]:
starts = find_sequence(query)
starts

[155652, 158766, 323482, 329482, 526892, 1310364]

In [22]:
for i, word in enumerate(transcript.words):
    if word.start in starts:
        print(f'{word.start} {word.text} {transcript.words[i+1].text}')

155652 overrule God.
158766 overrule God?
323482 overrule God,
329482 overrule God,
526892 overrule God
1310364 overrule God.


In [20]:
transcript.words

[Word(text='Good', start=1440, end=1552, confidence=0.99349, speaker=None, channel=None),
 Word(text='morning', start=1552, end=1736, confidence=0.99996, speaker=None, channel=None),
 Word(text='and', start=1760, end=1912, confidence=0.99584, speaker=None, channel=None),
 Word(text='welcome', start=1936, end=2136, confidence=0.67908, speaker=None, channel=None),
 Word(text='to', start=2168, end=2312, confidence=0.99951, speaker=None, channel=None),
 Word(text='Worldwide', start=2336, end=2776, confidence=0.54569, speaker=None, channel=None),
 Word(text='Wisdom.', start=2808, end=3192, confidence=0.98676, speaker=None, channel=None),
 Word(text='Folks.', start=3256, end=3544, confidence=0.99498, speaker=None, channel=None),
 Word(text='Today', start=3592, end=3800, confidence=0.99831, speaker=None, channel=None),
 Word(text="we're", start=3840, end=4008, confidence=0.97077, speaker=None, channel=None),
 Word(text='learning', start=4024, end=4264, confidence=0.99088, speaker=None, channe