# Model Validation
In this notebook, I use the MP3 audio of one video each from Wes's and Scott's YouTube channels to validate the speaker prediction model. Wes's video is [CSS GRID: Full Bleed Blog Layout Exercise — 25 of 25](https://www.youtube.com/watch?v=z9p4ctpvmTs) and Scott's video is [Code Blog - How I Fixed A Very Weird Bug](https://www.youtube.com/watch?v=Jt5zb94F-bI). If the model works, it should predict that most of the one-second segments of each video are the video's author speaking.

In this notebook we:
1. Split the video MP3s into one-second segments
2. Make speaker predictions for every segment of the two videos
3. Spot-check that most of the segments are predicted as the author of the video speaking

The model is quite accurate on both videos: counting a segment as correct when it is predicted as the video's author, accuracy is `95%` for Scott and `97%` for Wes.
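Here "accuracy" is simply the share of a video's one-second clips that the model assigns to that video's author. The short sketch below recomputes both figures from the prediction counts printed further down. `author_share` is a hypothetical helper written only for this illustration. The label mapping (0 = Scott, 1 = Wes, 2 = Other) is taken from the comments in the prediction cells.

```python
from collections import Counter


def author_share(counts: Counter, author_label: int) -> float:
    """Fraction of all clips predicted as the video's author."""
    total = sum(counts.values())
    return counts[author_label] / total


# Prediction counts as printed later in this notebook
# (labels: 0 = Scott, 1 = Wes, 2 = Other).
wes_counts = Counter({1: 638, 0: 16, 2: 3})
scott_counts = Counter({0: 608, 1: 32, 2: 1})

print(f"Wes:   {author_share(wes_counts, 1):.2f}")    # Wes:   0.97
print(f"Scott: {author_share(scott_counts, 0):.2f}")  # Scott: 0.95
```

This metric counts intro music and other non-speech clips as errors, so it slightly understates how well the model separates the two speakers.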

This directory contains two folders, `clips` and `mp3-files`. The `mp3-files` folder contains the MP3s downloaded from the two YouTube videos above, and the `clips` folder contains the one-second clips of both videos used for validation.

### Split MP3s

In [1]:
from collections import Counter
import warnings
from pathlib import Path

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
import jsonlines
import librosa
import numpy as np
import pydub
from tqdm import tqdm

warnings.filterwarnings("ignore", category=DeprecationWarning)

In [2]:
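def split_audio(segment, fname):
    """Split an episode segment into 1 s chunks and export each as an MP3.

    Clips that already exist are skipped, so re-running the cell is cheap.
    """
    out_dir = Path("validation-data/clips")
    out_dir.mkdir(parents=True, exist_ok=True)
    # Slicing with a step yields consecutive 1000 ms chunks; the last may be shorter
    slices_1s = segment[::1000]
    for i, chunk in enumerate(slices_1s):
        # Zero-pad the index so clips sort in playback order (wes-0000, wes-0001, ...)
        second_zfill = str(i).zfill(4)
        try:
            # "xb" mode raises FileExistsError instead of overwriting
            with open(out_dir / f"{fname}-{second_zfill}.mp3", "xb") as f:
                chunk.export(f, format="mp3")
        except FileExistsError:
            pass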
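`segment[::1000]` steps through the audio in 1000 ms increments. Every clip is therefore one second long except possibly the last, which holds whatever audio remains. The pure-Python sketch below reproduces the same chunk boundaries and zero-padded file names without pydub. `clip_plan` is a hypothetical helper for illustration only.

```python
def clip_plan(duration_ms: int, prefix: str, chunk_ms: int = 1000):
    """Return (file name, start_ms, end_ms) for each chunk of the audio.

    Mirrors the chunking in split_audio: fixed-size chunks, with a
    shorter final chunk when the duration is not a multiple of chunk_ms.
    """
    plan = []
    for i, start in enumerate(range(0, duration_ms, chunk_ms)):
        end = min(start + chunk_ms, duration_ms)
        # zfill(4) keeps names sortable: wes-0000, wes-0001, ...
        plan.append((f"{prefix}-{str(i).zfill(4)}.mp3", start, end))
    return plan


for name, start, end in clip_plan(2500, "wes"):
    print(name, start, end)
# wes-0000.mp3 0 1000
# wes-0001.mp3 1000 2000
# wes-0002.mp3 2000 2500
```

That short final clip explains why the prediction step needs a fallback for feature vectors that are not the full length.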

In [3]:
wes_mp3 = "./validation-data/mp3-files/CSS GRID Full Bleed Blog Layout Exercise 25 of 25.mp3"
wes_mp3_segment = pydub.AudioSegment.from_mp3(wes_mp3).set_channels(1)
split_audio(wes_mp3_segment, "wes")

In [4]:
scott_mp3 = "./validation-data/mp3-files/Code Blog - How I Fixed A Very Weird Bug.mp3"
scott_mp3_segment = pydub.AudioSegment.from_mp3(scott_mp3).set_channels(1)
split_audio(scott_mp3_segment, "scott")

### Make Predictions

In [5]:
full_model = joblib.load("syntax-speaker-predictor.pkl")

In [6]:
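def get_features(mp3_path):
    """Load a clip and return its flattened mel spectrogram."""
    y, sr = librosa.load(mp3_path)  # resamples to 22,050 Hz by default
    mel_spec = librosa.feature.melspectrogram(y=y, sr=sr)
    return mel_spec.ravel()


def make_prediction(mp3_path):
    """Predict the speaker of one clip: 0 = Scott, 1 = Wes, 2 = Other."""
    try:
        x = get_features(mp3_path)
    except EOFError:
        return 2  # other (clip could not be read)
    if x.shape != (5888,):
        return 2  # other (under 1s; the model expects exactly 5888 features)
    pred = full_model.predict([x])
    return pred[0]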
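The `(5888,)` check depends on librosa's defaults. `load` resamples to 22,050 Hz, and `melspectrogram` uses 128 mel bands, `hop_length=512`, and centered frames. The arithmetic sketch below assumes those defaults and shows which clip lengths pass the check. `mel_feature_length` is a hypothetical helper. By this calculation, a clip of exactly one second would give 5632 features. A clip passes only if it decodes to between 23,040 and 23,551 samples, which is a little over one second. The notebook does not say why the clips decode to that length.

```python
def mel_feature_length(n_samples: int, n_mels: int = 128, hop_length: int = 512) -> int:
    """Length of a flattened mel spectrogram, assuming librosa defaults.

    With center=True (the default) and an even n_fft, librosa produces
    1 + n_samples // hop_length frames, each with n_mels values.
    """
    n_frames = 1 + n_samples // hop_length
    return n_mels * n_frames


sr = 22050  # librosa.load's default sample rate
print(mel_feature_length(sr))  # exactly 1 s of audio -> 5632
print(5888 // 128)             # frames implied by the shape check -> 46
print(mel_feature_length(23040), mel_feature_length(23551))  # 5888 5888
```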

In [7]:
wes_clips = sorted(Path("./validation-data/clips/").glob("wes-*.mp3"))
wes_predictions = []
for clip in tqdm(wes_clips, total=len(wes_clips)):
    wes_predictions.append((clip.stem, make_prediction(str(clip))))

100%|██████████| 657/657 [01:22<00:00,  7.92it/s]


In [8]:
# Should be mostly `1` for Wes
# Some `2` for "Other" Intro Music, etc

speaker_counts_wes = Counter(i[1] for i in wes_predictions).most_common()
for speaker, count in speaker_counts_wes:
    print(speaker, count, f"{count / len(wes_predictions):.2f}")


1 638 0.97
0 16 0.02
2 3 0.00
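The printed labels are bare integers. The sketch below attaches names to them, using the mapping given in the cell comments (0 = Scott, 1 = Wes, 2 = Other), and returns the same count-and-share summary. `summarize` and `LABELS` are hypothetical names, and the input is toy data rather than real model output.

```python
from collections import Counter

# Label names inferred from the comments in this notebook:
# 0 = Scott, 1 = Wes, 2 = Other (intro music, clips under 1 s, etc.)
LABELS = {0: "Scott", 1: "Wes", 2: "Other"}


def summarize(predictions):
    """Return (label name, count, share) tuples, most common first.

    `predictions` is a list of (clip stem, predicted label) pairs,
    the same shape as wes_predictions and scott_predictions.
    """
    counts = Counter(label for _, label in predictions)
    total = len(predictions)
    return [(LABELS.get(label, str(label)), count, count / total)
            for label, count in counts.most_common()]


# Toy input, not real model output
toy = [("wes-0000", 2), ("wes-0001", 1), ("wes-0002", 1), ("wes-0003", 0)]
for name, count, share in summarize(toy):
    print(f"{name:<6}{count:>4}{share:>6.2f}")
```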


In [9]:
scott_clips = sorted(Path("./validation-data/clips/").glob("scott-*.mp3"))
scott_predictions = []
for clip in tqdm(scott_clips, total=len(scott_clips)):
    scott_predictions.append((clip.stem, make_prediction(str(clip))))

100%|██████████| 641/641 [01:11<00:00,  8.92it/s]


In [10]:
# Should be mostly `0` for Scott
# Some `2` for "Other" Intro Music, etc

speaker_counts_scott = Counter(i[1] for i in scott_predictions).most_common()
for speaker, count in speaker_counts_scott:
    print(speaker, count, f"{count / len(scott_predictions):.2f}")


0 608 0.95
1 32 0.05
2 1 0.00
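For the spot-check step, it is useful to know which clips were not assigned to the video's author, so they can be listened to directly. The sketch below pulls those clip stems out of a predictions list. `mismatched_clips` is a hypothetical helper, and the input is toy data.

```python
def mismatched_clips(predictions, author_label):
    """Return clip stems whose prediction is not the video's author.

    Listening to these clips shows whether the errors are genuine
    speaker confusions or non-speech audio such as intro music.
    """
    return [stem for stem, label in predictions if label != author_label]


# Toy input, not real model output; 0 = Scott in this notebook
toy_scott = [("scott-0000", 2), ("scott-0001", 0), ("scott-0002", 1), ("scott-0003", 0)]
print(mismatched_clips(toy_scott, author_label=0))
# ['scott-0000', 'scott-0002']
```

With the real results, `mismatched_clips(scott_predictions, 0)` would list the 33 clips not predicted as Scott, and `mismatched_clips(wes_predictions, 1)` the 19 clips not predicted as Wes.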
