# ⚠️ IMPORTANT LEGAL DISCLAIMER

This code is provided for EDUCATIONAL and RESEARCH purposes only.
Song lyrics are the intellectual property of their respective copyright holders.

By using this code, you acknowledge that:
1. You will use the scraped content only for non-commercial, educational analysis
2. You will not redistribute or republish the lyrics
3. You understand that web scraping may violate website terms of service
4. This project falls under Fair Use doctrine for academic research
5. The author of this code is not responsible for any misuse

For production use, please obtain proper licensing through official APIs
or contact the copyright holders directly.

Taylor Swift's lyrics © Taylor Swift and respective publishers

In [1]:
import subprocess
import whisper
import json
import os
import pickle
from datetime import datetime
from yt_dlp import YoutubeDL
from pyannote.audio import Pipeline
from pyannote.core import Annotation, Segment
import glob

Check whether FFmpeg is installed and on the PATH. Both yt-dlp (for the WAV conversion) and Whisper (for decoding audio) call it, so install it first if the check fails.

In [2]:
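try:
    subprocess.run(["ffmpeg", "-version"], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print("FFmpeg is installed and available.")
except FileNotFoundError:
    print("FFmpeg is not installed or not found in PATH.")
except subprocess.CalledProcessError as e:
    # check=True raises this when ffmpeg exists but exits with a non-zero status
    print(f"FFmpeg was found but exited with an error (code {e.returncode}).")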

FFmpeg is installed and available.
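
The same check can be done without spawning a process. This is a minimal sketch that assumes only that the executable is named `ffmpeg`; `find_tool` is a hypothetical helper, not part of the notebook.

```python
import shutil
from typing import Optional


def find_tool(name: str) -> Optional[str]:
    """Return the absolute path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


ffmpeg_path = find_tool("ffmpeg")
if ffmpeg_path is None:
    print("FFmpeg not found on PATH; install it before running the next cells.")
else:
    print(f"FFmpeg found at: {ffmpeg_path}")
```

Unlike the `subprocess` version, this only confirms that the binary exists; it does not confirm that it runs.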


## Getting audio from YouTube link
The first step is to download the podcast episode's audio track from YouTube and convert it to mono 16 kHz WAV.

In [3]:
def download_audio_from_youtube(url: str, out_dir: str = "audio", sample_rate: int = 16000) -> str:
    """
    Download the audio track of a YouTube video and convert it to mono WAV
    at `sample_rate` Hz (16 kHz by default).
    Returns the path to the resulting .wav file inside `out_dir`.
    """
    os.makedirs(out_dir, exist_ok=True)

    ydl_opts = {
        "format": "bestaudio/best",
        "outtmpl": os.path.join(out_dir, "%(id)s.%(ext)s"),
        "postprocessors": [{
            "key": "FFmpegExtractAudio",
            "preferredcodec": "wav",
            "preferredquality": "192",
        }],
        "postprocessor_args": ["-ar", str(sample_rate), "-ac", "1"],
        "prefer_ffmpeg": True,
        "quiet": False,
        "no_warnings": True,
    }

    with YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(url, download=True)

    video_id = info.get("id")
    pattern = os.path.join(out_dir, f"{video_id}*.wav")
    matches = glob.glob(pattern)
    if not matches:
        raise FileNotFoundError(f"No .wav file found for id={video_id}")
    return matches[0]

In [4]:
youtube_url = "https://www.youtube.com/watch?v=M2lX9XESvDE"
audio_path = download_audio_from_youtube(youtube_url)
print(f"Audio saved in: {audio_path}")

[youtube] Extracting URL: https://www.youtube.com/watch?v=M2lX9XESvDE
[youtube] M2lX9XESvDE: Downloading webpage
[youtube] M2lX9XESvDE: Downloading tv client config
[youtube] M2lX9XESvDE: Downloading player 6742b2b9-main
[youtube] M2lX9XESvDE: Downloading tv player API JSON
[youtube] M2lX9XESvDE: Downloading ios player API JSON
[youtube] M2lX9XESvDE: Downloading m3u8 information
[info] Testing format 234
[info] M2lX9XESvDE: Downloading 1 format(s): 234
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 1375
[download] Destination: audio/M2lX9XESvDE.mp4
[download] 100% of  115.45MiB in 00:04:07 at 476.97KiB/s                   
[ExtractAudio] Destination: audio/M2lX9XESvDE.wav
Deleting original file audio/M2lX9XESvDE.mp4 (pass -k to keep)
Audio saved in: audio/M2lX9XESvDE.wav
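
The download function asks FFmpeg for mono audio at `sample_rate` Hz, so it can be worth confirming that the resulting file matches. This is a minimal sketch using only the standard-library `wave` module. `describe_wav` and `check_wav` are hypothetical helpers, and the sketch assumes the file is uncompressed PCM WAV.

```python
import wave
from typing import Tuple


def describe_wav(path: str) -> Tuple[int, int, float]:
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, rate, duration


def check_wav(path: str, expected_rate: int = 16000) -> None:
    """Raise ValueError if the file is not mono at the expected sample rate."""
    channels, rate, duration = describe_wav(path)
    if channels != 1 or rate != expected_rate:
        raise ValueError(f"{path}: got {channels} channel(s) at {rate} Hz")
    print(f"{path}: mono, {rate} Hz, {duration / 60:.1f} min")
```

Calling `check_wav(audio_path)` right after the download cell would surface a wrong sample rate or channel count before the much slower transcription step starts.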


## Getting text from audio file
The second step is to transcribe the audio with Whisper, which returns the text split into timestamped segments.

In [5]:
# Transcribe audio with Whisper
def transcribe_audio_whisper(audio_path: str, model_size="small"):
    print(f"[INFO] Loading Whisper model: {model_size}")
    model = whisper.load_model(model_size)  # "small", "medium", or "large" for better accuracy
    print("[INFO] Starting transcription...")
    result = model.transcribe(audio_path, verbose=True)
    return result  # Contains 'segments' with text, start, end


In [10]:
# Use 'medium': 'small' produced very poor transcripts, and 'large' takes too long to process

transcription = transcribe_audio_whisper(audio_path, 'medium')

[INFO] Loading Whisper model: medium


100%|█████████████████████████████████████| 1.42G/1.42G [00:59<00:00, 25.5MiB/s]


[INFO] Starting transcription...




Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:08.220]  All right, let's get to the part of this show that I think is what everybody is going to
[00:08.220 --> 00:09.220]  be talking about.
[00:09.220 --> 00:10.220]  Do I get to say it?
[00:10.220 --> 00:13.020]  Do I get to say I get to say the two words?
[00:13.020 --> 00:14.020]  Yes.
[00:14.020 --> 00:15.020]  First of all, you can do whatever you want to.
[00:15.020 --> 00:16.020]  I don't know.
[00:16.020 --> 00:17.020]  We're not going to review.
[00:17.020 --> 00:19.740]  This is very much whatever within the framework of the podcast.
[00:19.740 --> 00:21.900]  I'm a fan of the podcast.
[00:21.900 --> 00:25.060]  Typically allow the guests to say new news.
[00:25.060 --> 00:26.340]  Yeah, I want to do it.
[00:26.340 --> 00:34.800]  I think Taylor has a little bit of new news.
[00:34.800 --> 00:36.440]  Welcome back to New Heights.
[00:36

[05:04.680 --> 05:05.680]  It's so funny.
[05:05.680 --> 05:07.720]  It's like, she loves me.
[05:07.720 --> 05:08.720]  She loves me.
[05:08.720 --> 05:09.720]  Not she loves me.
[05:09.720 --> 05:10.720]  She loves me.
[05:10.720 --> 05:17.840]  This is so like, it was such a wild romantic gesture to just be like, I don't want to
[05:17.840 --> 05:21.040]  date you like on, I don't know.
[05:21.040 --> 05:25.200]  It was at first when I looked at it, I was like, that's what it does when you're on the
[05:25.200 --> 05:26.880]  stage and you perform an arrowhead.
[05:26.880 --> 05:27.880]  That's what it did.
[05:27.880 --> 05:28.880]  This dude didn't get a meet and greet.
[05:28.880 --> 05:30.680]  He's making it everyone's problem.
[05:30.680 --> 05:31.680]  That's what I thought it was.
[05:31.680 --> 05:33.640]  You come to arrowhead, I get to meet you.
[05:33.640 --> 05:35.360]  That's the perk of playing for the chiefs.
[05:35.360 --> 05:37.880]  You realize he didn't even reac

[10:10.560 --> 10:13.560]  I didn't know what the chains were.
[10:13.560 --> 10:16.560]  I didn't know what a tight end was.
[10:16.560 --> 10:21.560]  I am forever thankful for you diving into the football world wholeheartedly.
[10:21.560 --> 10:22.560]  Oh, my God. I fell in love with it.
[10:22.560 --> 10:23.560]  I became obsessed with it.
[10:23.560 --> 10:30.560]  I became like a person who was running through the halls of my house screaming, we drafted Xavier worthy.
[10:30.560 --> 10:34.560]  And my friends are like, what is who?
[10:34.560 --> 10:35.560]  Body snatched you.
[10:35.560 --> 10:38.560]  This is this is what do you mean?
[10:38.560 --> 10:40.560]  We drafted Xavier worthy.
[10:40.560 --> 10:45.560]  I forget where I was, but you were the first person to tell me that I was the fastest man in the draft.
[10:45.560 --> 10:46.560]  No, I was screeching.
[10:46.560 --> 10:47.560]  I couldn't believe it.
[10:47.560 --> 10:48.560]  I was freaking out.
[10:48.560 --> 10:

[16:30.560 --> 16:36.560]  she's like, they we you got your music.
[16:36.560 --> 16:42.560]  And so sorry that this is it's literally been so long since this happened.
[16:42.560 --> 16:46.560]  It's every time I talk about it, she's like, you got you got your music.
[16:46.560 --> 16:52.560]  And I just like very dramatically hit the floor for real.
[16:52.560 --> 16:56.560]  Like, honestly, just started a long time bawling my eyes out.
[16:56.560 --> 17:01.560]  And I'm just like, just just weeping and kind of like unable.
[17:01.560 --> 17:04.560]  I was just like, really? Really? Really? What do you mean?
[17:04.560 --> 17:07.560]  What do you mean? I'm like, get yourself together, get your shit together.
[17:07.560 --> 17:10.560]  Like, just go tell Travis in a normal way.
[17:10.560 --> 17:14.560]  And I knock on the door. He's playing video games.
[17:14.560 --> 17:18.560]  And I'm trying to say it in a normal way. And I'm just like.
[17:18.560 --> 17:20.560]  Travis!
[17:20.56

[23:04.560 --> 23:11.560]  And it's not for everyone. Not everyone cares about this artist to artist artists have different priorities.
[23:11.560 --> 23:15.560]  Some artists sell off all their masters because everyone's allowed to have their own priorities.
[23:15.560 --> 23:25.560]  What I wanted, though, is that if I were to put the information out about what I went through, at least it gets artists talking about this to decide whether this is a priority for them.
[23:26.560 --> 23:32.560]  Because you can't know if it's a priority for you unless you know what has come before you and what has happened.
[23:32.560 --> 23:37.560]  And so the master recordings thing, that's your actual ownership of your recordings.
[23:37.560 --> 23:49.560]  To put it in perspective, if I never would have been able to buy back my music, one day someone else would be leaving all of my music from my first six albums to their kids in their will.
[23:49.560 --> 23:51.560]  Right. Yeah.
[23:51.560 --> 23:5

[29:05.560 --> 29:09.560]  to the point where I'm like, do you need to go to the hospital?
[29:09.560 --> 29:13.560]  It was so passionate, the fan response to that song at the Eras Tour.
[29:13.560 --> 29:17.560]  And I remember thinking so many times, what if this never happened?
[29:17.560 --> 29:19.560]  What if I never had to record this?
[29:19.560 --> 29:22.560]  It's one of the most iconic parts of that tour.
[29:22.560 --> 29:23.560]  It was so fun.
[29:23.560 --> 29:28.560]  And so yeah, it really did make me fall back in love with that album specifically.
[29:28.560 --> 29:33.560]  I have a song with Phoebe Bridgers that I really love on that, from the vault.
[29:33.560 --> 29:35.560]  Chris Stapleton was a part of it as well.
[29:35.560 --> 29:42.560]  And I just, I think that one, I've always loved Fearless in 1989 in a very pure way.
[29:42.560 --> 29:45.560]  But Red, I've kind of gone back and forth over the years and been like,
[29:45.560 --> 29:49.560]  you know, like

[35:47.560 --> 36:04.560]  It was a lot of countries. What was really fun about all the countries that we went to is when we first started dating, he was like, I always wanted to go and really vacation in Europe and see Australia and, you know, go to Asia.
[36:04.560 --> 36:10.560]  And I was like, well, I got I got a tour for that. You know, it's coming up.
[36:10.560 --> 36:13.560]  Oh, nice. You got room for a six five guy.
[36:13.560 --> 36:19.560]  We can figure it out. Like, no, that's it. The dimensions are wild, but we'll make some room.
[36:19.560 --> 36:23.560]  I think we'll have to leave some some equipment.
[36:23.560 --> 36:27.560]  Thank you for the size. But but absolutely would be happy to have you.
[36:27.560 --> 36:37.560]  We got to, you know, we got to travel the world and have vacations and adventures when I wasn't on stage, which was really fun because like Europe was so fun.
[36:37.560 --> 36:39.560]  Australia is amazing.
[36:39.560 --> 36:50.560]  Yeah, it was

[42:01.560 --> 42:02.560]  I've actually got a station.
[42:02.560 --> 42:04.920]  We've set it up where it's like, I've got a station and he's got a station and he's
[42:04.920 --> 42:05.920]  done all of it.
[42:05.920 --> 42:06.920]  Okay.
[42:06.920 --> 42:07.920]  So he's actually baked too.
[42:07.920 --> 42:09.480]  I've stretched and folded before.
[42:09.480 --> 42:10.480]  Yeah.
[42:10.480 --> 42:12.480]  But there's just like something always slightly.
[42:12.480 --> 42:13.480]  I've weighed all the flow.
[42:13.480 --> 42:15.600]  No, his was actually higher than mine.
[42:15.600 --> 42:18.560]  His was more delicious than mine.
[42:18.560 --> 42:21.200]  It's like, it's also, you know, no directions over here.
[42:21.200 --> 42:22.640]  I get into the old chance.
[42:22.640 --> 42:24.840]  Jason, you've had it and you said you loved it.
[42:24.840 --> 42:28.920]  I'm not saying it's not good, but it's no way as good as Taylor's.
[42:28.920 --> 42:29.920]  That's not true.


[47:36.760 --> 47:42.520]  Put me, put me on an Olympic one person trying to do some stuff sport.
[47:42.520 --> 47:43.520]  I'm doing this.
[47:43.520 --> 47:44.520]  Yeah, I don't care.
[47:44.520 --> 47:45.520]  I'm getting out of here.
[47:45.520 --> 47:47.680]  Yeah, I'm, I'm doing this.
[47:47.680 --> 47:48.680]  I care.
[47:48.680 --> 47:49.680]  Okay.
[47:49.680 --> 47:50.680]  I'm doing this.
[47:50.680 --> 47:51.680]  I care.
[47:51.680 --> 47:53.280]  This care a lot.
[47:53.280 --> 47:54.280]  We're doing this.
[47:54.280 --> 47:55.280]  We're kneading dough.
[47:55.280 --> 47:56.280]  I don't, I don't care.
[47:56.280 --> 47:57.280]  Stretching.
[47:57.280 --> 47:58.280]  I don't care if this happens.
[47:58.280 --> 48:00.080]  I don't care if this goes here.
[48:00.080 --> 48:01.800]  It's not part of my metrics for my self-worth.
[48:01.800 --> 48:02.800]  All right.
[48:02.800 --> 48:04.480]  That's probably very healthy.
[48:04.480 --> 48:05.480]  To be honest with you

[53:14.360 --> 53:17.280]  I do. You don't you don't need to know what that is.
[53:17.920 --> 53:20.840]  Yeah. But if you want, if you want to look at that.
[53:22.320 --> 53:24.640]  But if you do, then it's there.
[53:24.760 --> 53:28.200]  Yeah. You know what I mean? Like, if you know, you know, you know, you know.
[53:28.400 --> 53:31.640]  Oh, yeah. What's then you know, then you know, let's talk about something I don't know.
[53:31.640 --> 53:34.320]  What is numerology? You threw that phrase out there like that's a comment.
[53:34.360 --> 53:35.360]  What is numerology?
[53:36.400 --> 53:37.720]  You don't know what numerology is.
[53:38.080 --> 53:39.480]  I'm assuming something with numbers.
[53:39.720 --> 53:42.320]  Yeah. Like I'm 87 and she's 13.
[53:42.480 --> 53:43.840]  Yeah. Literally, it's that simple.
[53:44.320 --> 53:46.560]  Just numbers and 100.
[53:47.160 --> 53:49.680]  Yeah. 13 plus 87 equals 100.
[53:49.720 --> 53:54.320]  That's numerology. Like numbers, nu

[58:33.760 --> 58:36.800]  Yeah, I watched him have this moment with his beer where he's just like,
[58:37.160 --> 58:38.360]  but I want to take it.
[58:38.360 --> 58:41.280]  But I know that I I probably should not take it.
[58:41.800 --> 58:44.480]  I watched this happen and it was kind of the most amazing.
[58:44.480 --> 58:47.840]  When I meet, I like to pick up on it because that was exactly what I don't.
[58:47.840 --> 58:51.160]  Like if I don't have my beer, what do I do with this hand now?
[58:52.120 --> 58:53.640]  Disrespectful to have a beer.
[58:53.640 --> 58:57.400]  Yeah. Or am I just like being being authentic by having the beer?
[58:57.400 --> 58:59.920]  I would normally have the beer, wouldn't they want me to be myself?
[58:59.920 --> 59:02.040]  I'm watching you say that in your head.
[59:02.040 --> 59:03.440]  And it was fantastic.
[59:03.440 --> 59:04.920]  I don't know. I always remember that.
[59:04.920 --> 59:07.280]  I was just just like, I'll always remember

[01:03:50.960 --> 01:03:52.120]  Oh, my gosh.
[01:03:52.120 --> 01:03:53.960]  I was just like, oh, she's just in it.
[01:03:53.960 --> 01:03:55.640]  She's down. She's down for the ride.
[01:03:55.640 --> 01:03:57.480]  She's here. She's here for the fun.
[01:03:57.480 --> 01:03:59.800]  She's like, I'll fucking go through the mud.
[01:03:59.800 --> 01:04:01.560]  I'll be a part of a cheese kingdom.
[01:04:01.560 --> 01:04:03.640]  Like this is where we walk in. This is where we walk in.
[01:04:03.640 --> 01:04:06.040]  I don't know what to tell you. I don't have an alternative.
[01:04:06.040 --> 01:04:09.000]  I'm like, you know, we just played here three months ago.
[01:04:09.280 --> 01:04:11.280]  Yeah. And we went a different way.
[01:04:11.280 --> 01:04:12.440]  But I'm not going to say that.
[01:04:12.440 --> 01:04:14.440]  I'm not going to backseat drive this shit.
[01:04:14.440 --> 01:04:17.160]  What was that like? What were people doing?
[01:04:17.200 --> 01:04:22.840]  I re

[01:08:28.160 --> 01:08:31.000]  I always feel like when you lose your shit, you lose your leadership.
[01:08:31.360 --> 01:08:32.840]  You know, for sure.
[01:08:32.840 --> 01:08:36.440]  It's just kind of something I've always kind of tried to administer.
[01:08:36.440 --> 01:08:37.880]  It's always what I'm doing.
[01:08:37.880 --> 01:08:41.360]  But he's a he's a huge role model for that is how he motivates people
[01:08:41.360 --> 01:08:46.280]  and how he does so without flying off the handle and is just very focused on
[01:08:46.600 --> 01:08:48.920]  what the right thing is at the right moment.
[01:08:48.920 --> 01:08:50.480]  You know. Yep.
[01:08:50.480 --> 01:08:53.480]  So what so did Andy tell Scott something?
[01:08:53.800 --> 01:08:55.240]  Is that what I'm trying to do?
[01:08:55.240 --> 01:08:56.200]  What did Andy say, Scott?
[01:08:56.200 --> 01:08:59.920]  I mean, OK, so when you guys did the full send on the podcast.
[01:08:59.920 --> 01:09:01.400]  Yes, it was.
[0

[01:13:12.760 --> 01:13:16.280]  that goes like this. Like it's just surreal, man.
[01:13:16.280 --> 01:13:22.160]  And you're like, you know, we just all moved in with him for the whole summer pretty much.
[01:13:22.160 --> 01:13:25.960]  And just, you know, because you can't you can't really walk on your own.
[01:13:25.960 --> 01:13:30.520]  You got a little harness for my dad, just like walking dad on his harness.
[01:13:30.520 --> 01:13:32.840]  And he was like the loveliest patient ever.
[01:13:32.840 --> 01:13:35.240]  He just kept saying thank you over and over again.
[01:13:35.960 --> 01:13:38.080]  So I was full of life, man. Yeah.
[01:13:38.080 --> 01:13:40.040]  She was appreciative that he that he caught it.
[01:13:40.040 --> 01:13:42.680]  And he still is. Yeah. You know, we had a lot.
[01:13:43.000 --> 01:13:45.160]  We had the face time with him last time we were together.
[01:13:45.160 --> 01:13:46.520]  And I was like, yeah, he hasn't changed a bit.
[01:13:46.520 --> 0

[01:18:27.880 --> 01:18:29.800]  That's what I can do more.
[01:18:29.800 --> 01:18:34.040]  Yes, I've heard about the we've talked about the treadmill walk in while you're doing the
[01:18:34.040 --> 01:18:35.480]  whole show and like all of that.
[01:18:35.480 --> 01:18:36.600]  We can't walk for four hours.
[01:18:36.600 --> 01:18:38.200]  Can't what's the last time you walk for four hours?
[01:18:38.200 --> 01:18:40.200]  Travis, I've never even tried that.
[01:18:41.080 --> 01:18:42.920]  I'm not stupid enough to try something like that.
[01:18:44.920 --> 01:18:45.640]  You kidding?
[01:18:45.640 --> 01:18:46.840]  My knees would be shredded.
[01:18:46.840 --> 01:18:47.640]  Oh my God.
[01:18:47.640 --> 01:18:53.800]  So one of the other things that's been crazy to witness is just the immediate attention,
[01:18:53.800 --> 01:18:54.040]  right?
[01:18:54.040 --> 01:18:56.840]  Like I think you play in the NFL.
[01:18:56.840 --> 01:18:59.720]  You think on that on the two of us.
[0

[01:23:34.280 --> 01:23:37.240]  Now that there's auto correct, I need to get back on Twitter.
[01:23:37.240 --> 01:23:37.720]  Right.
[01:23:37.720 --> 01:23:41.320]  I'm like, I need a pair of scissors to open these scissors.
[01:23:41.320 --> 01:23:42.040]  I need to be good.
[01:23:42.040 --> 01:23:43.320]  We're just thinking things, right?
[01:23:43.320 --> 01:23:45.080]  It's like a different thing now.
[01:23:45.080 --> 01:23:51.640]  And it's kind of about like information is power, I guess, unless all of your information
[01:23:51.640 --> 01:23:55.160]  is geared towards you thinking that everything is about you.
[01:23:55.160 --> 01:24:01.960]  Because not everyone is ever thinking about one person all the time at any point.
[01:24:02.680 --> 01:24:08.360]  If your algorithm is giving you either criticisms of yourself or adulation or praise,
[01:24:10.120 --> 01:24:14.600]  you're creating an ecosystem in which you're the centerpiece of the table.
[01:24:14.600 --> 01:24:16.

[01:28:53.960 --> 01:28:56.120]  And literally living the life of a showgirl.
[01:28:56.120 --> 01:28:56.600]  I was.
[01:28:57.400 --> 01:28:58.120]  While she wrote it.
[01:28:58.120 --> 01:29:00.600]  That's why I called it that.
[01:29:00.600 --> 01:29:01.080]  Nailed it.
[01:29:01.640 --> 01:29:04.440]  So do you want to see the back cover?
[01:29:04.440 --> 01:29:05.480]  I would love to see all of it.
[01:29:05.480 --> 01:29:05.720]  Yes.
[01:29:05.720 --> 01:29:08.760]  Back cover is where we find the 12 tracks for my 12...
[01:29:08.760 --> 01:29:09.800]  12 tracks.
[01:29:09.800 --> 01:29:10.680]  Bangers.
[01:29:10.680 --> 01:29:11.000]  Okay.
[01:29:11.000 --> 01:29:11.640]  So this is...
[01:29:11.640 --> 01:29:12.520]  So we got all...
[01:29:12.520 --> 01:29:13.400]  What are they?
[01:29:13.400 --> 01:29:17.640]  So we got track one, The Fate of Ophelia.
[01:29:17.640 --> 01:29:17.960]  Okay.
[01:29:18.520 --> 01:29:19.080]  Track two.
[01:29:20.280 --> 01:29:21.160]  G

[01:33:41.400 --> 01:33:42.440]  Well, okay.
[01:33:42.440 --> 01:33:49.720]  So it's like, I spent time in the time that we were off doing different projects and
[01:33:49.720 --> 01:33:51.320]  he and Shellback were doing different things.
[01:33:51.320 --> 01:33:56.200]  And I was making albums that were a little bit more esoteric, like folklore.
[01:33:56.200 --> 01:33:58.280]  She's so hot, she says these big words.
[01:33:58.280 --> 01:33:59.560]  You know what esoteric means.
[01:33:59.560 --> 01:34:01.160]  I know, it's for a specific following.
[01:34:01.160 --> 01:34:01.720]  Exactly.
[01:34:01.720 --> 01:34:02.280]  Exactly.
[01:34:02.280 --> 01:34:02.780]  Wait, what?
[01:34:03.880 --> 01:34:05.000]  He knows what that means.
[01:34:05.000 --> 01:34:05.560]  What does esoteric mean?
[01:34:05.560 --> 01:34:07.240]  He doesn't know what these words mean, but he knows what they mean.
[01:34:07.240 --> 01:34:10.680]  Esoteric means for a specific following, like a specific gen

[01:38:25.960 --> 01:38:29.560]  But the photos are done by Merton Marcus, who are two of my favorite photographers.
[01:38:29.560 --> 01:38:33.000]  The last time I, the only time I worked with them for an album
[01:38:33.000 --> 01:38:35.320]  cover shoot was with Reputation with that album.
[01:38:35.320 --> 01:38:37.240]  And I loved what they did with those photos.
[01:38:37.240 --> 01:38:38.600]  So I called them up for this one.
[01:38:38.600 --> 01:38:42.120]  And I'm so happy with the way that the photos came out for this one.
[01:38:42.120 --> 01:38:46.040]  And it just basically was like, I was so proud of the music and so
[01:38:46.040 --> 01:38:50.200]  excited about this project from a creative standpoint that I was just like,
[01:38:50.200 --> 01:38:51.160]  all hands on deck.
[01:38:51.160 --> 01:38:52.040]  We're going all out.
[01:38:52.760 --> 01:38:54.200]  This is a full send.
[01:38:54.680 --> 01:38:59.080]  I care about this record more than I can even overstate.

[01:44:02.360 --> 01:44:09.240]  But you had to jump through 50 million hoops in this obstacle course that is your show.
[01:44:09.960 --> 01:44:12.520]  And you did it.
[01:44:12.520 --> 01:44:14.040]  You got two more in a row.
[01:44:15.080 --> 01:44:16.120]  But you did it tonight.
[01:44:16.920 --> 01:44:17.880]  That's all that matters.
[01:44:17.880 --> 01:44:22.760]  And the reason I wanted to have it sort of like an offstage moment as the main album cover is
[01:44:22.760 --> 01:44:26.920]  because this album isn't really about what happened to me on stage.
[01:44:26.920 --> 01:44:30.600]  It's about what I was going through offstage.
[01:44:31.560 --> 01:44:32.060]  Sure.
[01:44:32.600 --> 01:44:35.960]  It's like, you know, I didn't want to have like the lights are bright.
[01:44:35.960 --> 01:44:37.720]  I'm on the stage as the main album cover.
[01:44:37.720 --> 01:44:44.440]  It's just this to me tells more of what the actual contents lyrically of the album are.
[01:44:44

[01:48:47.320 --> 01:48:48.520]  The cat that we had.
[01:48:48.520 --> 01:48:49.320]  What did you think of?
[01:48:49.320 --> 01:48:51.000]  What did you think of my three cats?
[01:48:51.000 --> 01:48:51.640]  My three perfect.
[01:48:51.640 --> 01:48:53.720]  The only one that would go near me was Benjamin.
[01:48:55.160 --> 01:48:55.320]  Yeah.
[01:48:55.320 --> 01:48:58.120]  Cause the other two consents that you had resting cat.
[01:48:58.120 --> 01:49:00.360]  He turned that might be it.
[01:49:00.920 --> 01:49:01.720]  All right.
[01:49:01.720 --> 01:49:06.600]  If you had to choose to bake one thing to show off your baking skills, what would you choose?
[01:49:06.600 --> 01:49:10.280]  Oh, right now it would be my cinnamon swirl sourdough.
[01:49:11.160 --> 01:49:12.920]  Cinnamon swirl sourdough.
[01:49:12.920 --> 01:49:14.280]  I thought you were going to go pop tarts.
[01:49:14.280 --> 01:49:15.400]  Pop tarts, pop tarts.
[01:49:15.400 --> 01:49:17.640]  It depends on who 

[01:53:31.640 --> 01:53:31.880]  Yeah.
[01:53:31.880 --> 01:53:32.520]  You need to live a little.
[01:53:32.520 --> 01:53:33.560]  You're doing it wrong.
[01:53:33.560 --> 01:53:34.120]  Live a little.
[01:53:35.080 --> 01:53:36.840]  Get into a fake mop cart.
[01:53:36.840 --> 01:53:40.040]  Everyone knows you're in there, but you're in there like, I'm sneaking around.
[01:53:40.600 --> 01:53:41.640]  Nobody knows a minute.
[01:53:41.640 --> 01:53:42.600]  Everyone knows you're in there.
[01:53:45.640 --> 01:53:49.080]  Well, it's like, it's one of those things like I work so hard to try to surprise fans.
[01:53:49.640 --> 01:53:51.880]  And they're like, I don't want to be surprised.
[01:53:51.880 --> 01:53:54.120]  Sometimes they really don't think they want to be surprised.
[01:53:54.120 --> 01:53:55.400]  I want to figure stuff out.
[01:53:55.400 --> 01:53:56.600]  No, they do.
[01:53:56.600 --> 01:54:01.720]  But then I know, I know that when I can really get them and surprise t

[01:58:15.320 --> 01:58:21.320]  So it was not the, like the altitude I was reaching, there was more pressure every step of the way,
[01:58:21.320 --> 01:58:23.960]  but I was getting more creative control and freedom every time.
[01:58:24.920 --> 01:58:30.360]  And that is why it felt so much more special to keep working harder, to get to new levels.
[01:58:30.360 --> 01:58:36.280]  It's not like, oh, like I can afford to buy this or whatever, like it, not to discount that.
[01:58:37.000 --> 01:58:41.400]  But for me, every time I got higher up on a new rung of the ladder,
[01:58:41.400 --> 01:58:46.600]  I could make stuff in a more focused and free and autonomous way.
[01:58:46.600 --> 01:58:47.960]  What else would you want as an artist?
[01:58:47.960 --> 01:58:51.880]  I never really let myself be like, I've made it, but the arrows tour, I was like, oh,
[01:58:54.120 --> 01:58:55.160]  this is different.
[01:58:55.160 --> 01:58:59.080]  Like this is not, this is nothing like what 

[02:03:00.760 --> 02:03:07.880]  All right.
[02:03:07.880 --> 02:03:11.080]  I don't know where to go from here.
[02:03:15.080 --> 02:03:16.200]  There was confetti all over.
[02:03:20.520 --> 02:03:24.120]  Oh, this is like when the cats taste something they don't that they're confused by.
[02:03:25.560 --> 02:03:26.600]  This is my favorite impression.
[02:03:26.600 --> 02:03:27.160]  This is it.
[02:03:27.160 --> 02:03:28.040]  It's a confused.
[02:03:28.040 --> 02:03:28.840]  The cat ate something.
[02:03:28.840 --> 02:03:30.440]  It doesn't know what it is.
[02:03:30.520 --> 02:03:33.080]  This is this that gets gets me every time.
[02:03:33.080 --> 02:03:34.920]  That gets me every time, man.
[02:03:34.920 --> 02:03:36.200]  I will laugh at that.
[02:03:36.200 --> 02:03:40.200]  I will like fall off the couch laughing at that if he'll do it at, you know.


In [7]:
# Create the directory for saved files (if it doesn't exist yet)
save_dir = "saved_diarizations"  # named this way so that all saved files end up in one directory
os.makedirs(save_dir, exist_ok=True)

# Save the transcription result as a JSON file
transcription_file = os.path.join(save_dir, "transcription_medium.json")
with open(transcription_file, "w", encoding="utf-8") as f:
    json.dump(transcription, f, ensure_ascii=False, indent=2)

print(f"[SAVED] Transcription: {transcription_file}")

[SAVED] Transcription: saved_diarizations/transcription_medium.json
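
Once the JSON is saved, a later session can reload it instead of re-running Whisper. This is a minimal sketch. It assumes the file keeps Whisper's layout: a top-level `segments` list whose items have `start`, `end`, and `text`. `load_transcription`, `format_timestamp`, and `preview` are hypothetical helpers written for this illustration.

```python
import json
from typing import Any, Dict, List


def load_transcription(path: str) -> Dict[str, Any]:
    """Load a saved transcription and check that it has a 'segments' list."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict) or not isinstance(data.get("segments"), list):
        raise ValueError(f"{path} has no 'segments' list")
    return data


def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS.mmm, adding an HH: prefix from one hour onwards."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    prefix = f"{hours:02d}:" if hours else ""
    return f"{prefix}{minutes:02d}:{secs:02d}.{ms:03d}"


def preview(segments: List[Dict[str, Any]], limit: int = 5) -> List[str]:
    """Return the first `limit` segments as '[start --> end] text' lines."""
    return [
        f"[{format_timestamp(s['start'])} --> {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments[:limit]
    ]
```

With the notebook's variables, `preview(load_transcription(transcription_file)["segments"])` would print lines in a format similar to Whisper's verbose output above.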


## Defining speakers
The next step is speaker diarization: pyannote detects who speaks when, and each transcript segment is then assigned the speaker whose turn overlaps it the most.

In [29]:
HF_TOKEN = ""  # <- paste your Hugging Face access token here

print("[INFO] Loading speaker diarization pipeline...")

try:
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", 
        use_auth_token=HF_TOKEN
    )
    print("[INFO] Successfully loaded speaker-diarization-3.1 model")
except Exception as e:
    print(f"[ERROR] Could not load v3.1, trying v3.0: {e}")
    try:
        pipeline = Pipeline.from_pretrained(
            "pyannote/speaker-diarization-3.0", 
            use_auth_token=HF_TOKEN
        )
        print("[INFO] Successfully loaded speaker-diarization-3.0 model")
    except Exception as fallback_e:
        print(f"[ERROR] All models failed to load: {fallback_e}")
        raise

print("[INFO] Running diarization on audio file...")
diarization = pipeline(audio_path)

# Print diarization results for debugging
print("[INFO] Diarization results:")
speakers = set()
for turn, _, speaker in diarization.itertracks(yield_label=True):
    speakers.add(speaker)
    print(f"  {turn.start:.2f}s - {turn.end:.2f}s: {speaker}")
print(f"[INFO] Detected {len(speakers)} speakers: {', '.join(sorted(speakers))}")

# Step 4: Merge transcription segments with diarization labels
def merge_transcript_with_speakers(transcript, diarization):
    """
    Merge transcription segments with speaker diarization labels.
    
    Args:
        transcript: Dictionary with 'segments' key containing transcription data
        diarization: pyannote diarization result
    
    Returns:
        List of merged segments with speaker labels
    """
    merged_segments = []
    
    for seg in transcript["segments"]:
        start, end, text = seg["start"], seg["end"], seg["text"]
        speaker_label = None
        max_overlap = 0
        
        # Find speaker label by maximum overlapping diarization segment
        for turn, _, speaker in diarization.itertracks(yield_label=True):
            # Calculate overlap between transcript segment and diarization segment
            overlap_start = max(start, turn.start)
            overlap_end = min(end, turn.end)
            overlap_duration = max(0, overlap_end - overlap_start)
            
            if overlap_duration > max_overlap:
                max_overlap = overlap_duration
                speaker_label = speaker
        
        merged_segments.append({
            "start": start,
            "end": end,
            "text": text.strip(),
            "speaker": speaker_label,
            "overlap_duration": max_overlap
        })
    
    return merged_segments

# Assuming 'transcription' variable exists from previous steps
try:
    merged = merge_transcript_with_speakers(transcription, diarization)
    
    print("[INFO] Example of merged transcript:")
    for i, m in enumerate(merged[:5]):
        overlap_info = f"(overlap: {m['overlap_duration']:.2f}s)" if m['overlap_duration'] > 0 else "(no overlap)"
        print(f"  {i+1}. {m['speaker']}: {m['text']} {overlap_info}")
    
    # Step 5: Show available speakers and filter by chosen speaker
    available_speakers = set(m["speaker"] for m in merged if m["speaker"] is not None)
    print(f"[INFO] Available speakers in merged transcript: {', '.join(sorted(available_speakers))}")
    
    # Choose the most frequent speaker as default, or specify manually
    if available_speakers:
        speaker_counts = {}
        for m in merged:
            if m["speaker"]:
                speaker_counts[m["speaker"]] = speaker_counts.get(m["speaker"], 0) + 1
        
        # Get most frequent speaker
        most_frequent_speaker = max(speaker_counts, key=speaker_counts.get)
        target_speaker = most_frequent_speaker  # or specify manually like "SPEAKER_00"
        
        print(f"[INFO] Speaker statistics:")
        for speaker, count in sorted(speaker_counts.items()):
            print(f"  {speaker}: {count} segments")
        
        print(f"[INFO] Using most frequent speaker: {target_speaker}")
        
        # Filter by chosen speaker
        filtered_segments = [m for m in merged if m["speaker"] == target_speaker]
        filtered_text = [m["text"] for m in filtered_segments]
        
        print(f"[INFO] Filtered text for {target_speaker} ({len(filtered_segments)} segments):")
        combined_text = " ".join(filtered_text)
        
        # Print first 500 characters as preview
        preview_length = 500
        if len(combined_text) > preview_length:
            print(f"  {combined_text[:preview_length]}...")
            print(f"  [Total length: {len(combined_text)} characters]")
        else:
            print(f"  {combined_text}")
            
        # Optional: Save to file
        # with open(f"speaker_{target_speaker}_transcript.txt", "w", encoding="utf-8") as f:
        #     f.write(combined_text)
        # print(f"[INFO] Saved to speaker_{target_speaker}_transcript.txt")
        
    else:
        print("[WARNING] No speakers found in merged transcript!")
        
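except NameError as e:
    # Either 'transcription' or 'diarization' may be missing, so report the actual name
    print(f"[ERROR] Required variable not found ({e}). Make sure the transcription and diarization steps have been run first.")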
except Exception as e:
    print(f"[ERROR] Error during merging: {e}")
    print(f"[ERROR] Error type: {type(e).__name__}")

[INFO] Loading speaker diarization pipeline...


[INFO] Successfully loaded speaker-diarization-3.1 model
[INFO] Running diarization on audio file...


[INFO] Diarization results:
  0.03s - 0.32s: SPEAKER_00
  0.91s - 2.33s: SPEAKER_00
  3.30s - 8.99s: SPEAKER_00
  9.24s - 13.14s: SPEAKER_04
  10.56s - 11.12s: SPEAKER_00
  11.59s - 12.42s: SPEAKER_00
  13.14s - 18.71s: SPEAKER_00
  15.18s - 15.20s: SPEAKER_04
  15.20s - 16.53s: SPEAKER_03
  17.80s - 22.36s: SPEAKER_03
  22.36s - 22.76s: SPEAKER_00
  22.39s - 22.64s: SPEAKER_03
  22.76s - 23.03s: SPEAKER_03
  23.03s - 26.27s: SPEAKER_00
  26.31s - 31.03s: SPEAKER_01
  34.29s - 53.42s: SPEAKER_01
  38.10s - 38.35s: SPEAKER_00
  38.35s - 38.64s: SPEAKER_02
  38.64s - 38.67s: SPEAKER_00
  38.67s - 38.83s: SPEAKER_02
  38.83s - 38.98s: SPEAKER_00
  44.72s - 45.04s: SPEAKER_00
  50.37s - 51.03s: SPEAKER_00
  53.61s - 64.97s: SPEAKER_01
  56.51s - 56.55s: SPEAKER_00
  65.64s - 70.30s: SPEAKER_01
  70.30s - 75.75s: SPEAKER_00
  76.37s - 76.69s: SPEAKER_00
  77.20s - 79.14s: SPEAKER_00
  77.30s - 77.94s: SPEAKER_01
  79.75s - 88.44s: SPEAKER_00
  89.28s - 92.61s: SPEAKER_00
  92.76s - 94.51s: 

[INFO] Example of merged transcript:
  1. SPEAKER_00: All right, let's get to the part of this show that I think is what everybody is going to (overlap: 4.92s)
  2. SPEAKER_00: be talking about. (overlap: 0.77s)
  3. SPEAKER_04: Do I get to say it? (overlap: 0.98s)
  4. SPEAKER_04: Do I get to say I get to say the two words? (overlap: 2.80s)
  5. SPEAKER_00: Yes. (overlap: 0.88s)
[INFO] Available speakers in merged transcript: SPEAKER_00, SPEAKER_01, SPEAKER_02, SPEAKER_03, SPEAKER_04
[INFO] Speaker statistics:
  SPEAKER_00: 588 segments
  SPEAKER_01: 431 segments
  SPEAKER_02: 12 segments
  SPEAKER_03: 1525 segments
  SPEAKER_04: 24 segments
[INFO] Using most frequent speaker: SPEAKER_03
[INFO] Filtered text for SPEAKER_03 (1525 segments):
  This is very much whatever within the framework of the podcast. I'm a fan of the podcast. What's that intro, Jason? Oh my God. I've seen this before. Look, his soul has left his body. No, that was so good. Thank you. Thank you for screaming for li

The output shows that diarization identified 5 speakers, although the podcast has only 3.
SPEAKER_02 and SPEAKER_04 account for just 12 and 24 segments, so they are most likely spurious labels. To decide which real speaker each of them should be merged into, let's print text samples for every speaker label.
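
Before reading the samples, a quick complementary check is the share of total talk time per label, since a label with a tiny share is a likely candidate for merging. The sketch below is only an illustration: `toy_turns`, `talk_time_by_label`, and `share_by_label` are hypothetical names, and the values are made up. In the notebook, the tuples would be built from `diarization.itertracks(yield_label=True)`.

```python
from collections import defaultdict

# Hypothetical toy turns as (start_s, end_s, label); made-up values,
# not taken from the real diarization output.
toy_turns = [
    (0.0, 9.0, "SPEAKER_00"),
    (9.0, 13.0, "SPEAKER_04"),
    (13.0, 30.0, "SPEAKER_03"),
    (30.0, 50.0, "SPEAKER_01"),
    (38.0, 38.5, "SPEAKER_02"),
]


def talk_time_by_label(turns):
    """Sum the duration (in seconds) of all turns for each label."""
    totals = defaultdict(float)
    for start, end, label in turns:
        totals[label] += end - start
    return dict(totals)


def share_by_label(turns):
    """Return each label's fraction of the summed talk time."""
    totals = talk_time_by_label(turns)
    grand_total = sum(totals.values())
    return {label: t / grand_total for label, t in totals.items()}


shares = share_by_label(toy_turns)
# Print labels from smallest to largest share
for label, share in sorted(shares.items(), key=lambda kv: kv[1]):
    print(f"{label}: {share:.1%}")
```

Talk time and segment counts can disagree (many short interjections versus a few long turns), so it is worth looking at both before deciding what to merge.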

In [39]:
# Show text samples for each speaker to manually identify who is who

speaker_texts = {}
speaker_counts = {}

# Collect text for each speaker
for segment in merged:
    speaker = segment["speaker"]
    if speaker is None:
        continue
    
    if speaker not in speaker_texts:
        speaker_texts[speaker] = ""
        speaker_counts[speaker] = 0
    
    speaker_texts[speaker] += " " + segment["text"]
    speaker_counts[speaker] += 1

# Display samples for each speaker
print("=" * 80)
print("SPEAKER TEXT SAMPLES FOR MANUAL IDENTIFICATION")
print("=" * 80)

for speaker in sorted(speaker_texts.keys()):
    text = speaker_texts[speaker].strip()
    count = speaker_counts[speaker]
    
    print(f"\n{speaker} ({count} segments):")
    print("-" * 50)
    
    # Show first 1500 characters
    if len(text) > 1500:
        display_text = text[:1500] + "..."
    else:
        display_text = text
        
    print(display_text)
    print("-" * 50)

print(f"\nSUMMARY:")
for speaker in sorted(speaker_counts.keys()):
    print(f"  {speaker}: {speaker_counts[speaker]} segments")
print("=" * 80)

SPEAKER TEXT SAMPLES FOR MANUAL IDENTIFICATION

SPEAKER_00 (588 segments):
--------------------------------------------------
All right, let's get to the part of this show that I think is what everybody is going to be talking about. Yes. First of all, you can do whatever you want to. I don't know. We're not going to review. Typically allow the guests to say new news. Yeah, I want to do it. That's right. 92 percenters. You may remember when we said New Heights wasn't coming back till August 27th. Well, that was a lie. That was a lie. And hopefully you can forgive us because we got a as Travis said, we have a very special episode today that we just simply could not turn down. That's right. This is a special preseason episode that we decided to bring to you a little early. Our guest today is the singer, songwriter, producer and director from Nashville, Tennessee. That's bullshit. She is from waiting. She is the most awarded artist in the history of the American Music Awards, Billboard Mus

## Fixing speakers [from 5 to 3]
Based on the samples, the two redundant labels (SPEAKER_02 and SPEAKER_04) belong to Taylor Swift (TS), so they should be merged into SPEAKER_03.
SPEAKER_00 is Jason (JK) and SPEAKER_01 is Travis (TK).
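
The correction is a many-to-one relabelling. Here is a minimal sketch of that idea on toy data, independent of pyannote. The names `label_to_name`, `toy_turns`, and `relabel` are hypothetical, and the initials stand in for the real names. Unlike a plain `dict.get(label, label)`, this variant fails loudly if the diarization contains a label that the mapping does not cover, which may help if the pipeline is rerun and produces a different set of labels.

```python
# Hypothetical many-to-one mapping from diarization labels to speaker initials
label_to_name = {
    "SPEAKER_00": "JK",
    "SPEAKER_01": "TK",
    "SPEAKER_02": "TS",
    "SPEAKER_03": "TS",
    "SPEAKER_04": "TS",
}

# Made-up turns as (start_s, end_s, label)
toy_turns = [
    (0.0, 9.0, "SPEAKER_00"),
    (9.0, 13.0, "SPEAKER_04"),
    (13.0, 30.0, "SPEAKER_03"),
    (30.0, 50.0, "SPEAKER_01"),
    (38.0, 38.5, "SPEAKER_02"),
]


def relabel(turns, mapping):
    """Replace each label via the mapping; raise if any label is unmapped."""
    unknown = {label for _, _, label in turns} - set(mapping)
    if unknown:
        raise KeyError(f"labels missing from mapping: {sorted(unknown)}")
    return [(start, end, mapping[label]) for start, end, label in turns]


relabelled = relabel(toy_turns, label_to_name)
final_speakers = sorted({name for _, _, name in relabelled})
print(final_speakers)
```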

In [45]:
# Correct speaker mapping and rename to real names
from pyannote.core import Annotation, Segment

# Create corrected diarization with proper speaker names
corrected_diarization = Annotation()

# Define speaker corrections and renaming
speaker_mapping = {
    "SPEAKER_00": "Jason Kelsey",
    "SPEAKER_01": "Travis Kelsey", 
    "SPEAKER_02": "Taylor Swift",  # merge to Taylor Swift
    "SPEAKER_03": "Taylor Swift",  # already Taylor Swift
    "SPEAKER_04": "Taylor Swift"   # merge to Taylor Swift
}

print("[INFO] Applying speaker corrections...")
print("[INFO] Mapping:")
for old, new in speaker_mapping.items():
    print(f"  {old} -> {new}")

# Apply corrections to diarization
for turn, track, speaker in diarization.itertracks(yield_label=True):
    # Get corrected speaker name (unknown labels are kept unchanged)
    corrected_speaker = speaker_mapping.get(speaker, speaker)
    
    # Keep the original track name so that two turns with identical
    # boundaries are not collapsed into one after the merge
    corrected_diarization[turn, track] = corrected_speaker

# Replace original diarization with corrected version
diarization = corrected_diarization

print("\n[INFO] Corrected diarization results:")
corrected_speakers = set()
corrected_stats = {}

# Show corrected results
for turn, _, speaker in diarization.itertracks(yield_label=True):
    corrected_speakers.add(speaker)
    corrected_stats[speaker] = corrected_stats.get(speaker, 0) + 1
    print(f"  {turn.start:.2f}s - {turn.end:.2f}s: {speaker}")

print(f"\n[INFO] Final speakers: {', '.join(sorted(corrected_speakers))}")
print("[INFO] Final speaker statistics:")
for speaker, count in sorted(corrected_stats.items()):
    print(f"  {speaker}: {count} segments")

print(f"\n[INFO] Total speakers after correction: {len(corrected_speakers)}")
print("[INFO] Diarization variable updated with correct speaker names")

[INFO] Applying speaker corrections...
[INFO] Mapping:
  SPEAKER_00 -> Jason Kelsey
  SPEAKER_01 -> Travis Kelsey
  SPEAKER_02 -> Taylor Swift
  SPEAKER_03 -> Taylor Swift
  SPEAKER_04 -> Taylor Swift

[INFO] Corrected diarization results:
  0.03s - 0.32s: Jason Kelsey
  0.91s - 2.33s: Jason Kelsey
  3.30s - 8.99s: Jason Kelsey
  9.24s - 13.14s: Taylor Swift
  10.56s - 11.12s: Jason Kelsey
  11.59s - 12.42s: Jason Kelsey
  13.14s - 18.71s: Jason Kelsey
  15.18s - 15.20s: Taylor Swift
  15.20s - 16.53s: Taylor Swift
  17.80s - 22.36s: Taylor Swift
  22.36s - 22.76s: Jason Kelsey
  22.39s - 22.64s: Taylor Swift
  22.76s - 23.03s: Taylor Swift
  23.03s - 26.27s: Jason Kelsey
  26.31s - 31.03s: Travis Kelsey
  34.29s - 53.42s: Travis Kelsey
  38.10s - 38.35s: Jason Kelsey
  38.35s - 38.64s: Taylor Swift
  38.64s - 38.67s: Jason Kelsey
  38.67s - 38.83s: Taylor Swift
  38.83s - 38.98s: Jason Kelsey
  44.72s - 45.04s: Jason Kelsey
  50.37s - 51.03s: Jason Kelsey
  53.61s - 64.97s: Travis Kel

### Saving file with fixed speakers
Saving the corrected diarization to disk lets us skip the diarization and manual correction steps each time we want to analyze the data.
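
Since later analysis will rely on the saved file, a cheap round-trip check (save, reload, compare) can be worth doing once. The sketch below illustrates the idea with plain dictionaries and a temporary directory. `toy_segments`, `save_segments`, and `load_segments` are hypothetical helpers, not part of the notebook.

```python
import json
import os
import tempfile

# Made-up segments in the same shape as the JSON export
toy_segments = [
    {"start": 0.03, "end": 0.32, "speaker": "A"},
    {"start": 0.91, "end": 2.33, "speaker": "A"},
    {"start": 9.24, "end": 13.14, "speaker": "B"},
]


def save_segments(path, segments):
    """Write segments to a JSON file under a 'segments' key."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"segments": segments}, f, indent=2, ensure_ascii=False)


def load_segments(path):
    """Read segments back from a JSON file written by save_segments."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)["segments"]


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_diarization.json")
    save_segments(path, toy_segments)
    restored = load_segments(path)

# True when nothing was lost or altered by the save/load cycle
round_trip_ok = restored == toy_segments
print(f"Round trip OK: {round_trip_ok}")
```

JSON is also the safer format to share: a pickle file should only be loaded if you created it yourself, because unpickling can execute arbitrary code.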

In [48]:
# Create directory for saved files
save_dir = "saved_diarizations"
os.makedirs(save_dir, exist_ok=True)

# Base filename
audio_filename = "podcast"  # change to your actual audio filename
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

# 1. Save as pickle (fastest loading for Python)
pickle_file = os.path.join(save_dir, f"{audio_filename}_corrected_diarization.pkl")
with open(pickle_file, 'wb') as f:
    pickle.dump(diarization, f)
print(f"[SAVED] Pickle format: {pickle_file}")

# 2. Save as JSON (universal format)
json_data = []
for turn, _, speaker in diarization.itertracks(yield_label=True):
    json_data.append({
        "start": float(turn.start),
        "end": float(turn.end),
        "speaker": speaker,
        "duration": float(turn.end - turn.start)
    })

json_file = os.path.join(save_dir, f"{audio_filename}_corrected_diarization.json")
with open(json_file, 'w', encoding='utf-8') as f:
    json.dump({
        "audio_file": f"{audio_filename}.wav",
        "created_at": datetime.now().isoformat(),
        "speakers": sorted(list(set(item["speaker"] for item in json_data))),
        "total_segments": len(json_data),
        "segments": json_data
    }, f, indent=2, ensure_ascii=False)
print(f"[SAVED] JSON format: {json_file}")

# 3. Save as readable text
txt_file = os.path.join(save_dir, f"{audio_filename}_corrected_diarization.txt")
with open(txt_file, 'w', encoding='utf-8') as f:
    f.write("Corrected Diarization Results\n")
    f.write(f"Audio: {audio_filename}.wav\n")
    f.write(f"Created: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
    # List the speakers actually present instead of hardcoding their names
    f.write(f"Speakers: {', '.join(sorted({item['speaker'] for item in json_data}))}\n")
    f.write(f"Total segments: {len(json_data)}\n")
    f.write("=" * 60 + "\n\n")
    
    for item in json_data:
        f.write(f"{item['start']:>7.2f}s - {item['end']:>7.2f}s ({item['duration']:>5.2f}s): {item['speaker']}\n")
print(f"[SAVED] Text format: {txt_file}")

print(f"\n[SUCCESS] Diarization saved in 3 formats!")
print(f"[INFO] Files saved in: {save_dir}/")

# ============ LOADING FUNCTIONS ============

def load_diarization_from_pickle(pickle_file):
    """Load diarization from pickle file (fastest)"""
    with open(pickle_file, 'rb') as f:
        return pickle.load(f)

def load_diarization_from_json(json_file):
    """Load diarization from JSON file (universal)"""
    from pyannote.core import Annotation, Segment
    
    with open(json_file, 'r', encoding='utf-8') as f:
        data = json.load(f)
    
    annotation = Annotation()
    for segment_data in data["segments"]:
        segment = Segment(segment_data["start"], segment_data["end"])
        annotation[segment] = segment_data["speaker"]
    
    return annotation

print(f"\n" + "=" * 70)
print("HOW TO LOAD IN FUTURE:")
print("=" * 70)
print("# Instead of running the whole pipeline again, just load:")
print(f"diarization = load_diarization_from_pickle('{pickle_file}')")
print("# Or from JSON:")
print(f"diarization = load_diarization_from_json('{json_file}')")
print("# Loading takes 1 second instead of 2+ hours!")
print("=" * 70)

[SAVED] Pickle format: saved_diarizations/podcast_corrected_diarization.pkl
[SAVED] JSON format: saved_diarizations/podcast_corrected_diarization.json
[SAVED] Text format: saved_diarizations/podcast_corrected_diarization.txt

[SUCCESS] Diarization saved in 3 formats!
[INFO] Files saved in: saved_diarizations/

HOW TO LOAD IN FUTURE:
# Instead of running the whole pipeline again, just load:
diarization = load_diarization_from_pickle('saved_diarizations/podcast_corrected_diarization.pkl')
# Or from JSON:
diarization = load_diarization_from_json('saved_diarizations/podcast_corrected_diarization.json')
# Loading takes 1 second instead of 2+ hours!


### Rebuild merged segments with corrected speaker names

In [51]:
# Rebuild merged segments with corrected speaker names

print("[INFO] Rebuilding merged segments with corrected speakers...")

# Rebuild with the corrected diarization, reusing merge_transcript_with_speakers()
# defined in Step 4 (same maximum-overlap rule)
merged = merge_transcript_with_speakers(transcription, diarization)

print("[INFO] Updated merged segments with corrected speaker names")

# Show new statistics
speaker_stats = {}
for segment in merged:
    if segment["speaker"]:
        speaker_stats[segment["speaker"]] = speaker_stats.get(segment["speaker"], 0) + 1

print("[INFO] New merged statistics:")
for speaker, count in sorted(speaker_stats.items()):
    print(f"  {speaker}: {count} segments")

# Show sample of corrected merged data
print("\n[INFO] Sample of corrected merged segments:")
for i, segment in enumerate(merged[:5]):
    print(f"  {i+1}. {segment['speaker']}: \"{segment['text'][:80]}...\"")

print(f"\n[SUCCESS] Merged variable updated with {len(merged)} segments")

[INFO] Rebuilding merged segments with corrected speakers...
[INFO] Updated merged segments with corrected speaker names
[INFO] New merged statistics:
  Jason Kelsey: 588 segments
  Taylor Swift: 1561 segments
  Travis Kelsey: 431 segments

[INFO] Sample of corrected merged segments:
  1. Jason Kelsey: "All right, let's get to the part of this show that I think is what everybody is ..."
  2. Jason Kelsey: "be talking about...."
  3. Taylor Swift: "Do I get to say it?..."
  4. Taylor Swift: "Do I get to say I get to say the two words?..."
  5. Jason Kelsey: "Yes...."

[SUCCESS] Merged variable updated with 2584 segments
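
The maximum-overlap rule used to build `merged` can be checked on a toy example. This is a minimal sketch with plain tuples instead of pyannote objects. `overlap`, `assign_speaker`, and `toy_turns` are hypothetical names, and the timings are made up.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(seg_start, seg_end, turns):
    """Return (label, overlap) of the single turn overlapping the segment most."""
    best_label, best_overlap = None, 0.0
    for turn_start, turn_end, label in turns:
        current = overlap(seg_start, seg_end, turn_start, turn_end)
        if current > best_overlap:
            best_label, best_overlap = label, current
    return best_label, best_overlap


# Made-up diarization turns as (start_s, end_s, label)
toy_turns = [(0.0, 4.0, "A"), (4.0, 10.0, "B")]

# Segment 3.0-6.0 overlaps A for 1.0 s and B for 2.0 s
print(assign_speaker(3.0, 6.0, toy_turns))    # ('B', 2.0)
# Segment 12.0-13.0 overlaps no turn at all
print(assign_speaker(12.0, 13.0, toy_turns))  # (None, 0.0)
```

The second call shows how a segment ends up with `speaker=None`: it lies entirely outside every diarization turn, which is why the per-speaker counts may not add up to the total number of segments. Note also that the rule compares individual turns. After labels are merged, one alternative would be to sum the overlap per speaker before choosing, which could give a different answer when a speaker has several short turns inside one transcript segment.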
