Based on:
https://blog.devgenius.io/transcribing-youtube-videos-using-openais-whisper-%EF%B8%8F-%EF%B8%8F-a29d264d6fb1

### Basic Imports

In [1]:
import yt_dlp


In [2]:
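def download(video_id: str) -> str:
    """Download the audio track of a YouTube video and return the local .m4a path."""
    video_url = f'https://www.youtube.com/watch?v={video_id}'
    ydl_opts = {
        # Prefer an m4a stream, then the best audio-only stream, then the best format overall
        'format': 'm4a/bestaudio/best',
        # Save into audio/ as <video id>.<extension>
        'paths': {'home': 'audio/'},
        'outtmpl': {'default': '%(id)s.%(ext)s'},
        # Have FFmpeg convert to m4a if the downloaded stream is in another format
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'm4a',
        }]
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        # download() returns 0 on success
        error_code = ydl.download([video_url])
        if error_code != 0:
            raise RuntimeError(f'Failed to download video {video_id}')

    return f'audio/{video_id}.m4a'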
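`download` expects a bare video ID. If you are starting from a link instead, a small helper built on the standard library's `urllib.parse` can pull the ID out of the two most common URL shapes (`youtube.com/watch?v=<id>` and `youtu.be/<id>`). This is a minimal sketch: `video_id_from_url` is a hypothetical helper, and other forms such as `/shorts/` or `/embed/` links are deliberately not handled.

```python
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> str:
    """Return the video ID from a youtube.com/watch or youtu.be URL (hypothetical helper)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    video_id = ''
    if host == 'youtu.be' or host.endswith('.youtu.be'):
        # Short link: the ID is the first path component, e.g. https://youtu.be/<id>
        video_id = parsed.path.lstrip('/').split('/')[0]
    elif host == 'youtube.com' or host.endswith('.youtube.com'):
        # Watch link: the ID is the "v" query parameter, e.g. /watch?v=<id>&t=30s
        video_id = parse_qs(parsed.query).get('v', [''])[0]
    if not video_id:
        raise ValueError(f'Could not find a video ID in {url!r}')
    return video_id
```

With it, `download(video_id_from_url('https://www.youtube.com/watch?v=CuBzyh4Xmvk'))` is equivalent to the call in the next cell.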

In [3]:
download('CuBzyh4Xmvk')

[youtube] Extracting URL: https://www.youtube.com/watch?v=CuBzyh4Xmvk
[youtube] CuBzyh4Xmvk: Downloading webpage
[youtube] CuBzyh4Xmvk: Downloading ios player API JSON
[youtube] CuBzyh4Xmvk: Downloading android player API JSON
[youtube] CuBzyh4Xmvk: Downloading player d23221b6
[youtube] CuBzyh4Xmvk: Downloading m3u8 information
[info] CuBzyh4Xmvk: Downloading 1 format(s): 140
[download] Destination: audio/CuBzyh4Xmvk.m4a
[download] 100% of   72.31MiB in 00:00:41 at 1.75MiB/s     
[FixupM4a] Correcting container of "audio/CuBzyh4Xmvk.m4a"
[ExtractAudio] Not converting audio audio/CuBzyh4Xmvk.m4a; file is already in target format m4a


'audio/CuBzyh4Xmvk.m4a'

In [4]:
import whisper

In [5]:
whisper_model = whisper.load_model("base.en")


100%|███████████████████████████████████████| 139M/139M [00:03<00:00, 46.6MiB/s]


In [7]:
transcription = whisper_model.transcribe("audio/CuBzyh4Xmvk.m4a", fp16=True, verbose=True)

[00:00.000 --> 00:05.400]  Please look at the code mentioned above and please sign up on the Google Cloud.
[00:05.400 --> 00:08.520]  We've already started making some announcements.
[00:08.520 --> 00:14.240]  You will likely end up missing the announcements and you'll have no one else to play with.
[00:14.240 --> 00:20.080]  The second quick logistical announcement is that we'll have an extra lecture on Saturday,
[00:20.080 --> 00:23.800]  11th Jan at 11am in 1.101.
[00:23.800 --> 00:26.240]  So a lot of ones over there.
[00:26.240 --> 00:32.000]  And I think one or two people still have conflict, but in the larger, in the larger
[00:32.000 --> 00:36.240]  phone we will have almost everyone available, so we'll have to stick with this.
[00:36.240 --> 00:43.960]  FAQ and the projects which were earlier shared on Google Docs, I'll give all of you a comment
[00:43.960 --> 00:48.960]  access on it so that if you have any questions, queries, things like what should be the,
[00:48.960 --> 00

In [8]:
transcription

{'text': " Please look at the code mentioned above and please sign up on the Google Cloud. We've already started making some announcements. You will likely end up missing the announcements and you'll have no one else to play with. The second quick logistical announcement is that we'll have an extra lecture on Saturday, 11th Jan at 11am in 1.101. So a lot of ones over there. And I think one or two people still have conflict, but in the larger, in the larger phone we will have almost everyone available, so we'll have to stick with this. FAQ and the projects which were earlier shared on Google Docs, I'll give all of you a comment access on it so that if you have any questions, queries, things like what should be the, what are the maps, what are the main group size you can ask situations if they're already not there. Also about projects, if you have any questions, like what is the expectation if it's something is not mentioned clearly, you can please comment on the Google Doc and we'll get
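Transcription is the slow step, and `transcribe` returns a plain dictionary (`text`, `segments`, `language`), so it can be cached to disk and reloaded instead of re-running the model. A minimal sketch, assuming the result stays JSON-serializable (Whisper's own JSON output writer also saves it with `json.dump`); `save_transcription`, `load_transcription`, and any path you pass such as `audio/CuBzyh4Xmvk.json` are hypothetical:

```python
import json


def save_transcription(transcription: dict, json_path: str) -> None:
    """Cache a Whisper result dict as JSON (hypothetical helper)."""
    # ensure_ascii=False keeps accented characters readable in the file
    with open(json_path, 'w', encoding='utf-8') as f:
        json.dump(transcription, f, ensure_ascii=False, indent=2)


def load_transcription(json_path: str) -> dict:
    """Reload a cached Whisper result dict (hypothetical helper)."""
    with open(json_path, encoding='utf-8') as f:
        return json.load(f)
```

A reloaded result can be passed straight to `create_srt_from_transcription`, since that function only reads the `segments` list.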

In [19]:
def create_srt_from_transcription(transcription_objects, srt_file_path):
    """Write the segments of a Whisper result to an SRT subtitle file."""
    # UTF-8 so non-ASCII characters in the transcript are written correctly on any OS
    with open(srt_file_path, 'w', encoding='utf-8') as srt_file:
        # SRT cue numbers start at 1
        for index, entry in enumerate(transcription_objects['segments'], start=1):
            # Convert segment boundaries (float seconds) to SRT timestamps
            start_time_str = format_time(entry['start'])
            end_time_str = format_time(entry['end'])

            # Each cue: number, time range, text, then a blank separator line
            srt_file.write(f"{index}\n")
            srt_file.write(f"{start_time_str} --> {end_time_str}\n")
            srt_file.write(f"{entry['text']}\n\n")

def format_time(time_seconds):
    # Convert float seconds to HH:MM:SS,mmm. The fractional part is dropped,
    # so the millisecond field is always 000.
    minutes, seconds = divmod(time_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{int(seconds):02d},000"


In [20]:
create_srt_from_transcription(transcription, "audio/CuBzyh4Xmvk.srt")

In [21]:
!head audio/CuBzyh4Xmvk.srt

1
00:00:00,000 --> 00:00:05,000
 Please look at the code mentioned above and please sign up on the Google Cloud.

2
00:00:05,000 --> 00:00:08,000
 We've already started making some announcements.

3
00:00:08,000 --> 00:00:14,000
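`format_time` keeps only whole seconds and always writes `,000`, which is why the first cue above ends at `00:00:05,000` even though Whisper reported that segment ending at 5.400 s. SRT timestamps carry milliseconds, so keeping them is a small change. A minimal sketch; `format_srt_timestamp` is a hypothetical name:

```python
def format_srt_timestamp(time_seconds: float) -> str:
    """Convert float seconds to an SRT timestamp HH:MM:SS,mmm, keeping milliseconds."""
    # Round once to an integer number of milliseconds, then split with integer
    # arithmetic so float error (e.g. 5.4 * 1000 != 5400 exactly) cannot leak into the fields
    total_ms = round(time_seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"
```

Calling it in place of `format_time` inside `create_srt_from_transcription` should make the first cue read `00:00:00,000 --> 00:00:05,400`, in line with the verbose transcription log.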


### TODO

In [23]:
import openai
import getpass

# Get OpenAI API key from the user without displaying it
openai.api_key = getpass.getpass("Enter your OpenAI API key: ")
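The `OpenAI` client also reads the `OPENAI_API_KEY` environment variable by default, so one option is to prompt only when that variable is unset. A minimal sketch; `get_openai_api_key` is a hypothetical helper name:

```python
import getpass
import os


def get_openai_api_key() -> str:
    """Return OPENAI_API_KEY if it is set, otherwise prompt for it (hypothetical helper)."""
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key
    # Fall back to an interactive prompt that does not echo the key
    return getpass.getpass("Enter your OpenAI API key: ")
```

`openai.api_key = get_openai_api_key()` could then replace the direct `getpass` call above.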

In [39]:
from openai import OpenAI

In [40]:
def generate_summary(transcription_text):
    client = OpenAI(api_key=openai.api_key)

    prompt = f"Summarize the following transcription:\n\n{transcription_text}\n\nSummary:"

    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model="gpt-3.5-turbo",
    )

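    # openai>=1.0 returns a typed response object, so use attribute access;
    # dict-style indexing (chat_completion['choices']) raises a TypeError
    summary = (chat_completion.choices[0].message.content or "").strip()
    return summary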
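A lecture of this length (the audio above is about 72 MiB) can produce a transcript longer than a single `gpt-3.5-turbo` request accepts, in which case `generate_summary(transcription['text'])` would likely be rejected by the API. One common workaround, often called map-reduce summarization, is to summarize fixed-size pieces and then summarize the partial summaries. A minimal sketch of the splitting step, assuming a rough character budget is an acceptable stand-in for an exact token count; `chunk_segments` and the 8000-character default are hypothetical choices:

```python
def chunk_segments(segments, max_chars=8000):
    """Group Whisper segment texts into chunks of at most max_chars characters.

    Hypothetical helper. A single segment longer than max_chars becomes its own
    (oversized) chunk rather than being cut in half.
    """
    chunks = []
    current = []      # segment texts in the chunk being built
    current_len = 0   # length of ' '.join(current)
    for seg in segments:
        text = seg['text'].strip()
        if not text:
            continue
        # Length the chunk would have after appending this text with a separating space
        new_len = len(text) if not current else current_len + 1 + len(text)
        if current and new_len > max_chars:
            # Budget exceeded: close the current chunk and start a new one
            chunks.append(' '.join(current))
            current, current_len = [text], len(text)
        else:
            current.append(text)
            current_len = new_len
    if current:
        chunks.append(' '.join(current))
    return chunks
```

Under that assumption, the two passes could look like `partial = [generate_summary(c) for c in chunk_segments(transcription['segments'])]` followed by `generate_summary('\n\n'.join(partial))`. Splitting on segment boundaries avoids cutting a Whisper segment in the middle.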