# Voice Recognition Lab 2: Extending SVD/PCA Classification with Spectral Analysis

### EECS 16A: Foundations of Signals, Dynamical Systems, and Information Processing, Fall 2025

Junha Kim, Jessica Fan, Savit Bhat, Jack Kang (Fall 2024).

Sonia Chacon (Spring 2025)

Tanya Deniz Ipek (Fall 2025)

This lab was heavily inspired by previous EECS16B lab 8, written by Nathaniel Mailoa, Emily Naviasky, et al.

* [Task 1: Data Preprocessing](#task1)
* [Task 2: Spectral Analysis](#task2)
* [Task 3: Data Reshaping](#task3)
* [Task 4: PCA via SVD](#task4)
* [Task 5: Testing your Classifier](#task5)
* [Task 6: Live Classify](#task6)

### Before you start, please migrate your recording CSV file from last week. Also, have the notebook from last week handy, as you might be copying code from it to this notebook.

In [None]:
import sounddevice
import pyaudio
import wave
import numpy as np
import csv
from tqdm.notebook import trange
from IPython import display
import time
import matplotlib.pyplot as plt
import scipy.io
import utils
from mpl_toolkits.mplot3d import Axes3D
from scipy.signal import stft, spectrogram
%matplotlib inline

cm = ['blue', 'red', 'green', 'orange', 'black', 'purple']

<a id='task1'></a>
# <span style="color:navy"> Task 1: Data Preprocessing</span>

We will repeat the data preprocessing step from lab 1 below.

In [2]:
# YOUR CODE HERE: If you recorded an additional word for VR1, replace the '...' placeholder with that word.
# If not, please remove the '...' placeholder.
all_words_arr = ['jack', 'jason', 'jessica', 'principalcomponent', '...']

Let's begin by splitting our data into a training and testing set with a 70/30 split. Run the code below to do so.

In [3]:
# Load data from csv
train_test_split_ratio = 0.7
train_dict = {}
test_dict = {}

# Build the dictionary of train and test samples.
for i in range(len(all_words_arr)):
    word_raw = utils.read_csv("{}.csv".format(all_words_arr[i]))
    word_raw_train, word_raw_test = utils.train_test_split(word_raw, train_test_split_ratio)
    train_dict[all_words_arr[i]] = word_raw_train
    test_dict[all_words_arr[i]] = word_raw_test

# Count the minimum number of samples you took across all recorded words. These variables might be useful for you!
num_samples_train = min(list(map(lambda x : np.shape(x)[0], train_dict.values())))
num_samples_test = min(list(map(lambda x : np.shape(x)[0], test_dict.values())))

# Crop the number of samples for each word to the minimum number so all words have the same number of samples.
for key, raw_word in train_dict.items():
    train_dict[key] = raw_word[:num_samples_train,:]

for key, raw_word in test_dict.items():
    test_dict[key] = raw_word[:num_samples_test,:]


Align the recordings as we did last week. Paste in the `align_recording` function you wrote last week.

In [None]:
def align_recording(recording, length, pre_length, threshold, envelope=False):
    """
    Align a single audio sample according to the given parameters.
    
    Args:
        recording (np.ndarray): a single audio sample.
        length (int): The length of each aligned audio snippet.
        pre_length (int): The number of samples to include before the threshold is first crossed.
        threshold (float): Used to find the start of the speech command. The speech command begins where the
            magnitude of the audio sample is greater than (threshold * max(samples)).
        envelope (bool): if True, use enveloping.
    
    Returns:
        aligned recording.
    """
    
    # TODO: PASTE IN YOUR ALIGN_RECORDING FUNCTION FROM LAST WEEK
    
    if envelope:
        recording = utils.envelope(recording, 5400, 100)
    
    # Find the threshold
    recording_threshold = threshold * np.max(recording)

    # TODO: Use recording_threshold, length, and pre_length to cut the snippet to the correct length.
    # We note where the recording magnitude first exceeds recording_threshold.
    # Then, we leave pre_length samples before the threshold crossing, which is where our snippet starts.
    # We cut the recording so that the snippet ends 'length' samples after that start.
    
    i = 0
    while ... : # YOUR CODE HERE
        i += 1

    snippet_start = np.clip(i - pre_length, a_min=0, a_max=len(recording) - length)
    snippet = # YOUR CODE HERE

    # TODO: Normalize the recording.
    # "Normalize" in our case is dividing the signal by the maximum absolute value (different from taking the norm of a vector)
    snippet_normalized = # YOUR CODE HERE
    
    return snippet_normalized

In [None]:
def align_data(data, length, pre_length, threshold, envelope=False):
    """
    Align all audio samples in the dataset. (apply align_recording to all rows of the data matrix)
    
    Args:
        data (np.ndarray): Matrix where each row corresponds to a recording's audio samples.
        length (int): The length of each aligned audio snippet.
        pre_length (int): The number of samples to include before the threshold is first crossed.
        threshold (float): Used to find the start of the speech command. The speech command begins where the
            magnitude of the audio sample is greater than (threshold * max(samples)).
        envelope (bool): If True, use enveloping.
    
    Returns:
        Matrix of aligned recordings.
    """
    assert isinstance(data, np.ndarray) and len(data.shape) == 2, "'data' must be a 2D matrix"
    assert isinstance(length, int) and length > 0, "'length' of snippet must be an integer greater than 0"
    assert 0 <= threshold <= 1, "'threshold' must be between 0 and 1"
    snippets = []

    # Iterate over the rows in data
    for recording in data:
        snippets.append(align_recording(recording, length, pre_length, threshold, envelope))

    return np.vstack(snippets)

In [None]:
def process_data(selected_words_arr, dict_raw, length, pre_length, threshold, plot=True, envelope=False):
    """
    Process the raw data with the given parameters and return it. (wrapper function for align_data)
    
    Args:
        selected_words_arr (list of str): Words in the dataset, used for the plot titles.
        dict_raw (dict): Raw data collected. Maps each word to a matrix where each row corresponds to
            a recording's audio samples.
        length (int): The length of each aligned audio snippet.
        pre_length (int): The number of samples to include before the threshold is first crossed.
        threshold (float): Used to find the start of the speech command. The speech command begins where the
            magnitude of the audio sample is greater than (threshold * max(samples)).
        plot (bool): Plot the dataset if True.
        envelope (bool): If True, use enveloping.
            
    Returns:
        Processed data dictionary.
    """
    processed_dict = {}
    word_number = 0
    for key, word_raw in dict_raw.items():
        word_processed = align_data(word_raw, length, pre_length, threshold, envelope=envelope)
        processed_dict[key] = word_processed
        if plot:
            plt.plot(word_processed.T)
            plt.title('Samples for "{}"'.format(selected_words_arr[word_number]))
            word_number += 1
            plt.show()
            
    return processed_dict

Align your recordings. **NOTE: we want to set `envelope=False` for spectral analysis!**

In [None]:
# TODO: Edit the parameters to get the best alignment.
length = # YOUR CODE HERE
pre_length = 400 # Modify this as necessary
threshold = # YOUR CODE HERE

# align the training data (the test data is aligned in Task 5)
processed_train_dict = process_data(all_words_arr, train_dict, length, pre_length, threshold, envelope=False)

<a id='task2'></a>
# <span style="color:navy">Task 2: Spectral Analysis</span>

You have seen spectrograms previously in the Shazam lab. As a reminder, a spectrogram is a DFT computed on many short, successive snippets of the signal in question. It is a 2D plot that shows how the frequency content of the signal changes over time.
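
As a rough illustration of that reminder, here is a minimal sketch on a synthetic signal rather than on the lab recordings. The two tone frequencies and `nperseg=256` are arbitrary choices made for this demo, and the `demo_` names are hypothetical. The signal is a 500 Hz tone followed by a 1500 Hz tone, so the strongest frequency should change between the first and last time frames.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 5400  # same sample rate as the lab recordings
time_axis = np.arange(fs) / fs  # one second of samples

# Synthetic test signal (an assumption for this demo): 500 Hz tone, then 1500 Hz tone.
tone = np.where(time_axis < 0.5,
                np.sin(2 * np.pi * 500 * time_axis),
                np.sin(2 * np.pi * 1500 * time_axis))

# Each column of demo_Sxx is the power spectrum of one short window of the signal.
demo_f, demo_t, demo_Sxx = spectrogram(tone, fs=fs, nperseg=256)

# Strongest frequency in each time frame.
peak_freqs = demo_f[np.argmax(demo_Sxx, axis=0)]
print(demo_Sxx.shape)                 # (frequency bins, time frames)
print(peak_freqs[0], peak_freqs[-1])  # close to 500 Hz and 1500 Hz
```

A plain DFT of the whole signal would show both tones but not when each one occurs; the per-frame peaks recover that ordering.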

## <span style="color:navy"> Utilizing Spectrograms to extract features</span>

By obtaining the spectrogram, we capture the unique time-varying frequency footprint of the signal. After collecting these coefficients, we can use PCA, a method of low-rank approximation, to project the spectrogram coefficients onto a basis whose directions capture as much of the variance in the data as possible.

Calculating the spectrogram first will allow PCA to identify variations in both frequency and temporal details. 

Generate a spectrogram for each recording below.

In [None]:
# calculate the spectrogram of one recording.
# hint: scipy.signal.spectrogram may be useful here. you can just invoke spectrogram() without writing scipy.signal
def spectrogram_single_recording(data, sample_rate=5400, return_f_t=False):
    """
    calculate spectrogram of one recording.
    
    Args:
        data (np.array): single recording (row)
        sample_rate (int): sampling rate in Hz
        return_f_t (bool): indicate whether to only return Zxx or f,t,Zxx (for plotting)
    Returns:
        f (np.array): array of sample frequencies (only returned if return_f_t is True)
        t (np.array): array of segment times (only returned if return_f_t is True)
        Zxx (np.array): 2D spectrogram of the recording
    """
    f, t, Zxx = # YOUR CODE HERE
    
    if return_f_t:
        return f,t,Zxx
    else:
        return Zxx

def spectrogram_recordings(values, sample_rate=5400, return_f_t=False):
    """
    calculate spectrogram of multiple recordings.
    
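    Args:
        values (iterable): (word, recordings) pairs, e.g. processed_train_dict.items().
            Each recordings matrix holds one recording per row.
        sample_rate (int): sampling rate in Hz
        return_f_t (bool): indicate whether to only return Zxx or f,t,Zxx (for plotting)
    Returns:
        f (np.array): array of sample frequencies (only returned if return_f_t is True)
        t (np.array): array of segment times (only returned if return_f_t is True)
        Zxx (np.array): array of 2D arrays (spectrogram result of each row)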
    """
    spect_vals = []
    for (word, recordings) in values:
        for i in range(recordings.shape[0]):
            f, t, Zxx = spectrogram_single_recording(recordings[i, :], sample_rate=sample_rate, return_f_t=True)
            spect_vals.append(...) # YOUR CODE HERE

    # Convert the list to a numpy array
    if return_f_t:
        return f,t,np.array(spect_vals)
    else:
        return np.array(spect_vals)

In [None]:
f, t, spectrogram_results = spectrogram_recordings(processed_train_dict.items(), return_f_t=True)

In [None]:
# show the first spectrogram result
plt.pcolormesh(t, f, np.abs(spectrogram_results[0]), shading='gouraud')
plt.title('Spectrogram')
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()

## <span style="color:navy"> Mel-scaled Spectrograms</span>

Mel scaling is a frequency scaling that takes advantage of knowledge of the human auditory system. Rather than representing frequencies linearly, it uses this prior knowledge to compress higher frequencies and expand lower frequencies, similar to our ears. In other words, it reshapes the frequency axis to be more similar to the way humans perceive pitch. This has the potential to extract features that are more relevant to human auditory perception.

In a regular spectrogram, the frequency bins are the linearly spaced DFT (Discrete Fourier Transform) bins. In the Mel-scaled equivalent, the frequency axis is rescaled to a unit called 'Mels', and the `n_mels` argument of the functions below sets the number of Mel bins. Thus, the returned spectrogram result would be a 2D array of shape (# timesteps x # Mels) that contains the spectrogram value in each location.

We will provide the functions needed for generating Mel-scaled spectrograms, so you can treat it as a black box if you'd like.
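
For readers who want a peek inside the black box, the sketch below shows one common Hz-to-Mel conversion formula. This is an assumption for illustration only: the provided `utils` functions may use a different convention, and `hz_to_mel` / `mel_to_hz` are hypothetical helper names, not part of `utils`. The example places points evenly in Mels and converts them back to Hz to show that the spacing in Hz grows with frequency.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to Mels (one common convention, assumed here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

nyquist = 5400 / 2  # highest representable frequency at the lab's sample rate

# Points that are evenly spaced in Mels...
mel_points = np.linspace(float(hz_to_mel(0.0)), float(hz_to_mel(nyquist)), 10)
# ...are unevenly spaced in Hz: dense at low frequencies, sparse at high ones.
hz_points = mel_to_hz(mel_points)

print(np.round(hz_points))
print(np.round(np.diff(hz_points)))  # the gaps grow with frequency
```

Under this convention, low frequencies get finer resolution than high ones, which matches the "expand lower, compress higher" description above.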

In [None]:
# run mel-scaled spectrogram on single recording
def mel_spectrogram_single_recording(data, sample_rate=5400, n_fft=256, n_mels=100, return_f_t=False):
    mel_filter = utils.mel_filter_bank(sample_rate, n_fft, n_mels)
    f, t, spectrogram_result = spectrogram_single_recording(data, sample_rate=sample_rate, return_f_t=True)
    mel_spectrogram_result = np.array(utils.apply_mel_filter([spectrogram_result], mel_filter))[0]
    mel_freqs = utils.mel_frequencies(n_mels, sample_rate, n_fft)
    if return_f_t:
        return mel_freqs, t, mel_spectrogram_result
    else:
        return mel_spectrogram_result

# run mel-scaled spectrogram on entire dataset
def mel_spectrogram_recordings(vals, sample_rate=5400, n_fft=256, n_mels=100, return_f_t=False):
    mel_filter = utils.mel_filter_bank(sample_rate, n_fft, n_mels)
    f, t, spectrogram_results = spectrogram_recordings(vals, sample_rate=sample_rate, return_f_t=True)
    mel_spectrogram_results = np.array(utils.apply_mel_filter(spectrogram_results, mel_filter))
    mel_freqs = utils.mel_frequencies(n_mels, sample_rate, n_fft)
    if return_f_t:
        return mel_freqs, t, mel_spectrogram_results
    else:
        return mel_spectrogram_results

In [None]:
mel_freqs, t, mel_spectrogram_results = mel_spectrogram_recordings(processed_train_dict.items(), return_f_t=True)

In [None]:
plt.pcolormesh(t, mel_freqs, mel_spectrogram_results[0], shading='gouraud')
plt.title('Mel Spectrogram')
plt.ylabel('Frequency [Mels]')
plt.xlabel('Time [sec]')
plt.show()

<a id='task3'></a>
# <span style="color:navy"> Task 3: Data Reshaping</span>


Currently, our spectrogram matrices (both regular and Mel-scaled) are 2D (time and frequency). This doesn't lend itself well to PCA, which requires a single vector per measurement. 

Therefore, we must flatten each matrix into a 1D vector. There are many ways to accomplish this; we will explore and try some of the options below.

**Sanity check**: What does each axis in our spectrogram_results array represent?

In [None]:
spectrogram_results.shape

In [None]:
mel_spectrogram_results.shape

## <span style="color:navy"> Flattening</span>

In flattening, we simply take all the elements of the matrix and place them into a single sequence, so no data is lost. Keeping in mind that the two dimensions are time and frequency, we might choose to flatten row-wise or column-wise. We may even decide to use a different approach, such as zig-zagging through the matrix. For now, we will use row-wise flattening, but we encourage you to experiment with other approaches in the later part of the lab.

<img src="images/flatten.png">
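
To make the difference between the two orders concrete, here is a minimal sketch on a toy matrix (the `toy` names and values are made up for illustration). It relies on `np.reshape` with its default row-major order and with `order='F'` for column-major order.

```python
import numpy as np

# Toy "spectrogram" with 2 rows and 3 columns (values chosen to make the order visible).
toy = np.array([[1, 2, 3],
                [4, 5, 6]])

row_wise = toy.reshape(-1)             # reads across each row first
col_wise = toy.reshape(-1, order='F')  # reads down each column first
print(row_wise)  # [1 2 3 4 5 6]
print(col_wise)  # [1 4 2 5 3 6]

# A stack of matrices can be flattened in one call by keeping the first axis.
toy_stack = np.array([toy, 10 * toy])                    # shape (2, 2, 3)
flat_stack = toy_stack.reshape(toy_stack.shape[0], -1)   # shape (2, 6)
print(flat_stack.shape)
```

Both orders keep every element; they only differ in which neighbors end up next to each other in the vector.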

In [None]:
# flatten the matrix row-wise.
def apply_flattening_single_recording(data):
    return # YOUR CODE HERE hint: np.reshape may be helpful here.

# flatten a list of matrices row-wise.
def apply_flattening(vals):
    return # YOUR CODE HERE

Test these functions by running the cells below. Does the shape match what you would expect?

In [None]:
# Test apply_flattening_single_recording
x = np.array([[1,1], [2,2], [3,3]])
y = apply_flattening_single_recording(x)
print("Input:", x)
print("Result:", y)
print("Shape:", y.shape)

In [None]:
# Test apply_flattening
a = np.array([x, 2*x, 3*x]) # create a list of matrices
b = apply_flattening(a)
print("Input:", a)
print("Result:", b)
print("Shape:", b.shape)
print()

## <span style="color:navy"> Aggregation of mean / variance features over frames</span>

Another method to convert our matrix to a vector is to compute the mean and variance over the frequency bins for each time frame. This does lose some information, but very specific frequency information is likely redundant for a simple voice classification scheme. Generally, the memory savings are worth this tradeoff.

<img src="images/aggregate.png">
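
The idea can be sketched on a toy matrix as follows. The axis layout here is an assumption made for this example only (rows are frequency bins, columns are time frames), so check the shapes of your own arrays before choosing an axis. The `toy_` names and values are hypothetical.

```python
import numpy as np

# Assumed layout for this toy example: rows = frequency bins, columns = time frames.
toy_spec = np.array([[1.0, 2.0, 3.0],
                     [3.0, 6.0, 9.0]])  # 2 frequency bins x 3 time frames

# Collapse the frequency axis: one mean and one variance per time frame.
frame_mean = np.mean(toy_spec, axis=0)
frame_var = np.var(toy_spec, axis=0)
toy_features = np.concatenate([frame_mean, frame_var])
print(toy_features)  # [2. 4. 6. 1. 4. 9.]

# With a stack of recordings, the frequency axis moves from 0 to 1.
toy_batch = np.array([toy_spec, 2 * toy_spec])  # shape (2, 2, 3)
batch_features = np.concatenate(
    [toy_batch.mean(axis=1), toy_batch.var(axis=1)], axis=1)
print(batch_features.shape)  # (2, 6)
```

Each recording shrinks from (frequency bins x time frames) values to twice the number of time frames, which is where the memory savings come from.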

In [None]:
# TODO: Compute mean and variance over frequency bins for each time frame

# apply aggregation for one recording
def apply_aggregation_single_recording(data):
    
    # hint 1: use np.mean and np.var
    # hint 2: the time varies down the rows of data, so what axis should you take the mean/var from?
    mean_feature, variance_feature = # YOUR CODE HERE

    # Stack features to get a combined feature set
    return # YOUR CODE HERE hint: use np.concatenate

# apply aggregation for multiple recordings
def apply_aggregation(vals):
    
    # hint: you should use the same functions as above, but how should the axis argument change since
    #       vals is a list of recordings?
    mean_features, variance_features = # YOUR CODE HERE

    # Stack features to get a combined feature set
    return # YOUR CODE HERE (same as above, but you might have to specify a different axis argument)

## <span style="color:navy"> Computing all 4 possible configurations for spectral analysis + reshaping</span>

We have two methods of spectrogram generation (Mel-scaled spectrograms and regular spectrograms) and two methods of data aggregation (regular flattening vs mean & variance aggregation). That gives us 4 possible combinations, so let's generate a processed dataset for all 4 cases so we can compare!

In [None]:
# use the functions we wrote above!
# Remember the data we are using is from either spectrogram_results or mel_spectrogram_results!

# TODO: mel scaled spectrogram + flattening
processed_A_mel_flattening = # YOUR CODE HERE
print(f"processed_A_mel_flattening shape: {processed_A_mel_flattening.shape}")

# TODO: regular spectrogram + flattening
processed_A_spectrogram_flattening = # YOUR CODE HERE
print(f"processed_A_spectrogram_flattening shape: {processed_A_spectrogram_flattening.shape}")

# TODO: mel scaled spectrogram + aggregation
processed_A_mel_aggregated = # YOUR CODE HERE
print(f"processed_A_mel_aggregated shape: {processed_A_mel_aggregated.shape}")

# TODO: regular spectrogram + aggregation
processed_A_spectrogram_aggregated = # YOUR CODE HERE
print(f"processed_A_spectrogram_aggregated shape: {processed_A_spectrogram_aggregated.shape}")

<a id='task4'></a>
# <span style="color:navy">Task 4: PCA via SVD</span>

Now we will be repeating our PCA steps from last week's lab. Refer to the code you've written to complete this section. In the empty `processed_A`, choose one of the four processed datasets from above. Rerun this section for different cases to compare the clustering characteristics and results!
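
As a refresher on the overall recipe, here is a minimal sketch of PCA via SVD on synthetic data. The data, the choice of two principal components, and all `toy_` names are assumptions for illustration; your own code from last week may be organized differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 20 measurements (rows) with 5 features, drawn from two clusters.
toy_A = np.vstack([rng.normal(0, 0.1, (10, 5)) + 1.0,
                   rng.normal(0, 0.1, (10, 5)) - 1.0])

# Demean: subtract the average row from every row.
toy_mean = toy_A.mean(axis=0)
toy_centered = toy_A - toy_mean

# SVD of the demeaned matrix; the rows of toy_Vt are the principal directions.
toy_U, toy_S, toy_Vt = np.linalg.svd(toy_centered, full_matrices=False)

# Keep the first two principal directions as columns of the new basis.
toy_basis = toy_Vt[:2].T              # shape (5, 2)

# Project the demeaned data onto the new basis.
toy_proj = toy_centered @ toy_basis   # shape (20, 2)
print(toy_S)
print(toy_proj.shape)
```

In this toy case the first singular value dominates because the two clusters differ along a single direction, so one component already separates them.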

In [None]:
# choose one of the four processed_A we generated above
processed_A = # YOUR CODE HERE
# demean the matrix A
mean_vec = # YOUR CODE HERE
demeaned_A = # YOUR CODE HERE

# Take the SVD of matrix demeaned_A
U, S, Vt = # YOUR CODE HERE

# Plot out the sigma values
plt.figure()
plt.stem(S)
plt.title("Stem Plot of Sigma Values")

# Plot the principal component(s)
new_basis = # YOUR CODE HERE
plt.figure()
plt.plot(new_basis)
plt.title("New Basis Vectors")

In [None]:
# Project the data onto the new basis
proj = # YOUR CODE HERE hint: np.dot() may help

# Determine the centroids of each cluster
centroids = []
for i in range(len(all_words_arr)):
    centroid = np.mean(proj[i*num_samples_train:(i + 1)* num_samples_train], axis=0)
    centroids.append(centroid)


######################
# Plot the centroids #
######################
centroid_list = np.vstack(centroids)
colors = cm[:(len(centroids))]

if new_basis.shape[1] == 3:
    fig=plt.figure(figsize=(10,7))
    ax = fig.add_subplot(111, projection='3d')
    for i in range(len(all_words_arr)):
        Axes3D.scatter(ax, *proj[i*num_samples_train:num_samples_train*(i+1)].T, c=cm[i], marker = 'o', s=20)
    plt.legend(all_words_arr, loc='center left', bbox_to_anchor=(1.07, 0.5))
    for i in range(len(all_words_arr)):
        Axes3D.scatter(ax, *np.array([centroids[i]]).T, c=cm[i], marker = '*', s=300)
    plt.title("Training Data")
    
    fig, axs = plt.subplots(1, 3, figsize=(15,5))
    for i in range(len(all_words_arr)):
        axs[0].scatter(proj[i*num_samples_train:num_samples_train*(i+1),0], proj[i*num_samples_train:num_samples_train*(i+1),1], c=cm[i], edgecolor='none')
        axs[1].scatter(proj[i*num_samples_train:num_samples_train*(i+1),0], proj[i*num_samples_train:num_samples_train*(i+1),2], c=cm[i], edgecolor='none')
        axs[2].scatter(proj[i*num_samples_train:num_samples_train*(i+1),1], proj[i*num_samples_train:num_samples_train*(i+1),2], c=cm[i], edgecolor='none')
    axs[0].set_title("View 1")
    axs[1].set_title("View 2")
    axs[2].set_title("View 3")
    plt.legend(all_words_arr, loc='center left', bbox_to_anchor=(1, 0.5))
    axs[0].scatter(centroid_list[:,0], centroid_list[:,1], c=colors, marker='*', s=300)
    axs[1].scatter(centroid_list[:,0], centroid_list[:,2], c=colors, marker='*', s=300)
    axs[2].scatter(centroid_list[:,1], centroid_list[:,2], c=colors, marker='*', s=300)

elif new_basis.shape[1] == 2:
    fig=plt.figure(figsize=(10,7))
    for i in range(len(all_words_arr)):
        plt.scatter(proj[i*num_samples_train:num_samples_train*(i+1),0], proj[i*num_samples_train:num_samples_train*(i+1),1], c=colors[i], edgecolor='none')

    plt.scatter(centroid_list[:,0], centroid_list[:,1], c=colors, marker='*', s=300)
    plt.legend(all_words_arr, loc='center left', bbox_to_anchor=(1, 0.5))
    plt.title("Training Data")
    
plt.show()
for i, centroid in enumerate(centroid_list):
    print('Centroid {} is at: {}'.format(i, str(centroid)))

<a id='task5'></a>
# <span style="color:navy"> Task 5: Testing your Classifier</span>

Great! Now that we have the means (centroid) for each word, let's evaluate performance. Recall that we will classify each data point according to the centroid with the least Euclidean distance to it.
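
The nearest-centroid rule can be sketched on made-up 2D points as follows. The centroids, labels, and the `toy_` names are hypothetical and only illustrate the distance comparison.

```python
import numpy as np

# Hypothetical centroids in a 2D projected space, one per word.
toy_centroids = np.array([[0.0, 0.0],
                          [4.0, 0.0],
                          [0.0, 3.0]])
toy_labels = ['word_a', 'word_b', 'word_c']  # hypothetical labels

# A projected data point to classify.
toy_point = np.array([3.0, 1.0])

# Euclidean distance from the point to every centroid (one norm per row).
distances = np.linalg.norm(toy_centroids - toy_point, axis=1)
nearest = int(np.argmin(distances))
print(distances)
print(toy_labels[nearest])  # word_b
```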

Before we perform classification, we need to apply the same preprocessing to the test data that we applied to the training data (aligning, computing the spectrogram, reshaping, demeaning, and projecting onto the PCA basis). You have already written most of the code for this part. However, note the difference in variable names as we are now working with test data.

First let's look at what our raw test data looks like.

## <span style="color:navy"> Test Data Preprocessing</span>

Align and trim our test data, again with `envelope=False`.

In [None]:
processed_test_dict = process_data(all_words_arr, test_dict, length, pre_length, threshold, envelope=False)

In [None]:
# run spectrogram and mel spectrogram on the test set as well
spectrogram_results_test = spectrogram_recordings(processed_test_dict.items(), return_f_t=False)
mel_spectrogram_results_test = mel_spectrogram_recordings(processed_test_dict.items(), return_f_t=False)

# generate the four possible configurations for the test data as well

# TODO: mel scaled spectrogram + flattening
processed_A_mel_flattening_test = # YOUR CODE HERE
print(f"processed_A_mel_flattening_test shape: {processed_A_mel_flattening_test.shape}")

# TODO: regular spectrogram + flattening
processed_A_spectrogram_flattening_test = # YOUR CODE HERE
print(f"processed_A_spectrogram_flattening_test shape: {processed_A_spectrogram_flattening_test.shape}")

# TODO: mel scaled spectrogram + aggregation
processed_A_mel_aggregated_test = # YOUR CODE HERE
print(f"processed_A_mel_aggregated_test shape: {processed_A_mel_aggregated_test.shape}")

# TODO: regular spectrogram + aggregation
processed_A_spectrogram_aggregated_test = # YOUR CODE HERE
print(f"processed_A_spectrogram_aggregated_test shape: {processed_A_spectrogram_aggregated_test.shape}")

Now we will project our processed test dataset the same way as before. As a reminder, we precompute the projected mean vector $ \bar{x}_{\text{proj}} $ to save storage in our test classification and live classification:

$$ (x - \bar{x})P = xP - \bar{x}P = xP - \bar{x}_{\text{proj}} $$ 
$$ \bar{x}_{\text{proj}} = \bar{x}P $$
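
A quick numerical check of this identity, using random stand-ins for the data, mean vector, and basis (all `toy_` names and sizes are made up for this sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
toy_x = rng.normal(size=(4, 6))   # 4 toy data points as rows, 6 features each
toy_x_bar = rng.normal(size=6)    # stand-in for the training mean vector
toy_P = rng.normal(size=(6, 2))   # stand-in for the PCA basis

# Demean first, then project.
lhs = (toy_x - toy_x_bar) @ toy_P

# Project first, then subtract the precomputed projected mean.
toy_x_bar_proj = toy_x_bar @ toy_P
rhs = toy_x @ toy_P - toy_x_bar_proj

print(np.allclose(lhs, rhs))  # True
```

The projected mean has only as many entries as there are principal components, so it is cheaper to store than the full mean vector.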

In [None]:
# choose the processed_A_test combination from above that matches the same combination as the training data
processed_A_test = # YOUR CODE HERE

# precompute the projected mean vector
projected_mean_vec = # YOUR CODE HERE

# Project the data onto the new basis and demean it with the precomputed projected mean vector
proj = # YOUR CODE HERE hint: np.dot() may help.

# Determine the centroids of each test cluster (used for plotting only).
# `centroids` from Task 4 is left unchanged so that classify() below uses the training centroids.
test_centroids = []
for i in range(len(all_words_arr)):
    centroid = np.mean(proj[i*num_samples_test:(i + 1)* num_samples_test], axis=0)
    test_centroids.append(centroid)

######################
# Plot the centroids #
######################
centroid_list = np.vstack(test_centroids)
colors = cm[:(len(test_centroids))]

if new_basis.shape[1] == 3:
    fig=plt.figure(figsize=(10,7))
    ax = fig.add_subplot(111, projection='3d')
    for i in range(len(all_words_arr)):
        Axes3D.scatter(ax, *proj[i*num_samples_test:num_samples_test*(i+1)].T, c=cm[i], marker = 'o', s=20)
    plt.legend(all_words_arr, loc='center left', bbox_to_anchor=(1.07, 0.5))
    for i in range(len(all_words_arr)):
        Axes3D.scatter(ax, *np.array([test_centroids[i]]).T, c=cm[i], marker = '*', s=300)
    plt.title("Test Data")
    
    fig, axs = plt.subplots(1, 3, figsize=(15,5))
    for i in range(len(all_words_arr)):
        axs[0].scatter(proj[i*num_samples_test:num_samples_test*(i+1),0], proj[i*num_samples_test:num_samples_test*(i+1),1], c=cm[i], edgecolor='none')
        axs[1].scatter(proj[i*num_samples_test:num_samples_test*(i+1),0], proj[i*num_samples_test:num_samples_test*(i+1),2], c=cm[i], edgecolor='none')
        axs[2].scatter(proj[i*num_samples_test:num_samples_test*(i+1),1], proj[i*num_samples_test:num_samples_test*(i+1),2], c=cm[i], edgecolor='none')
    axs[0].set_title("View 1")
    axs[1].set_title("View 2")
    axs[2].set_title("View 3")
    plt.legend(all_words_arr, loc='center left', bbox_to_anchor=(1, 0.5))
    axs[0].scatter(centroid_list[:,0], centroid_list[:,1], c=colors, marker='*', s=300)
    axs[1].scatter(centroid_list[:,0], centroid_list[:,2], c=colors, marker='*', s=300)
    axs[2].scatter(centroid_list[:,1], centroid_list[:,2], c=colors, marker='*', s=300)

elif new_basis.shape[1] == 2:
    fig=plt.figure(figsize=(10,7))
    for i in range(len(all_words_arr)):
        plt.scatter(proj[i*num_samples_test:num_samples_test*(i+1),0], proj[i*num_samples_test:num_samples_test*(i+1),1], c=colors[i], edgecolor='none')

    plt.scatter(centroid_list[:,0], centroid_list[:,1], c=colors, marker='*', s=300)
    plt.legend(all_words_arr, loc='center left', bbox_to_anchor=(1, 0.5))
    plt.title("Test Data")
    
plt.show()
for i, centroid in enumerate(centroid_list):
    print('Centroid {} is at: {}'.format(i, str(centroid)))

Implement the classify function.

In [None]:
def classify(data_point, new_basis, projected_mean_vec, centroids):
    """Classifies a new voice recording into a word.
    
    Args:
        data_point: new data point vector before demeaning and projection
        new_basis: the new processed basis to project on
        projected_mean_vec: the same projected_mean_vec as before
        centroids: the centroids of each word, in the same order as all_words_arr
    Returns:
        The classified word, i.e. the entry of all_words_arr whose centroid is closest to the data point
    Hint:
        Remember to use 'projected_mean_vec'!
        np.argmin(), and np.linalg.norm() may also help!
    """
    # TODO: classify the demeaned data point by comparing its distance to the centroids
    projected_data_point = # YOUR CODE HERE
    demeaned = # YOUR CODE HERE
    return all_words_arr[...] # YOUR CODE HERE

In [None]:
# Try out the classification function
print(classify(processed_A_test[0,:], new_basis, projected_mean_vec, centroids)) # Modify the row index of processed_A_test to use other vectors

In [None]:
# Try to classify the whole A matrix
correct_counts = np.zeros(len(all_words_arr))

for (row_num, data) in enumerate(processed_A_test):
    word_num = row_num // num_samples_test
    if classify(data, new_basis, projected_mean_vec, centroids) == all_words_arr[word_num]:
        correct_counts[word_num] += 1
        
for i in range(len(correct_counts)):
    print("Percent correct of word {} = {}%".format(all_words_arr[i], 100 * correct_counts[i] / num_samples_test))

<a id='task6'></a>
# <span style="color:navy">Task 6: Testing the classifier: Real Time</span>

**Once you finish Task 5 with satisfactory accuracies, please come to the Cory 140 lab to try out your classifier!**

Now, we'll be testing the classifier in real time. Run the script below, and when prompted, say one of your chosen words. 

NOTE: Do not worry if your detector does not work as intended for some words. If you recorded your own word in Task 1, your classifier will likely detect only that word, probably because your voice has different frequency content than the recordings we provide.

In [None]:
rate = 5400  # Sample rate
chunk = 1024  # Chunk size
record_seconds = 3  # Record duration in seconds
num_recordings = 50  # Total number of recordings needed
recording_count = 0
word = ""

while True:
    user_input = input("Press Enter to start recording, or type 'stop' and then Enter to stop recording: ")

    if user_input == '':
        # Record audio
        audio_recording = utils.record_audio(seconds=record_seconds, rate=rate, chunk=chunk)

        # TODO: Preprocess the single data
        aligned_audio_recording = # YOUR CODE HERE hint: use align_recording
        
        # TODO: 
        # run your chosen combo of spectrogram vs mel-spectrogram + flattening vs aggregation
        # hint: you might want to use the single_recording version of the functions we wrote above
        result = # YOUR CODE HERE
        processed_audio_recording = # YOUR CODE HERE

        # Run Classify
        print("Classified Word: " + classify(processed_audio_recording, new_basis, projected_mean_vec, centroids))
        time.sleep(2)
       
    if user_input == 'stop':
        display.clear_output()
        break

## <span style="color:green">CHECKOFF</span>

### When you are ready to get checked off, fill out the **[Checkoff Google Form](https://docs.google.com/forms/d/e/1FAIpQLScwrFoYRPPZ7eAhCnDLsIB7FFP2na2CgkW0RTa3A0Ii1Xf5NA/viewform)**

- **Have all questions, code, and plots completed in this notebook.** Your TA will check all your PCA code and plots.
- **Show your GSI that you've achieved 80% accuracy on your test data for all 4 words.**
- Make sure to test **at least two** of the four possible combinations and be able to talk about the differences in accuracy.
- **Show your GSI that you are able to classify live (Please show up to section for this).**
- **Be prepared to answer conceptual questions about the lab.**