# Samson Leung, Naim Youssif 
# Introduction

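Our project attempts to classify the voices of 34 different speakers by running the SciPy library's kmeans function on their training data to build a code book per speaker, then assigning test audio to the speaker whose code book gives the lowest distortion. Each of the 34 speakers has 500 seconds of data.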

# Concatenate Raw Speaker Files
* This cell concatenates the 500 files of each of the 34 speakers into a single continuous file per speaker
* The 34 files are saved in ./Raw Speaker Files; each file takes about a minute to process and save
* The cell output shows when each file is completed, in the format "Saved - %s seconds"
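
The concatenation step can be sketched without touching the disk. The snippet below is a minimal illustration, assuming each clip is already a 1-D NumPy array; `concatenate_clips` is a hypothetical helper and the clips are synthetic, not real speaker audio. Collecting the clips and concatenating once avoids re-copying a growing array on every iteration.

```python
import numpy as np

def concatenate_clips(clips):
    """Join a sequence of 1-D sample arrays into one continuous float64 signal."""
    # One concatenate call copies each clip a single time. Casting to float64
    # mirrors the dtype produced by concatenating onto an empty list.
    return np.concatenate([np.asarray(c, dtype=np.float64) for c in clips])

# Three stand-in "clips" of int16 samples (synthetic values)
clips = [
    np.arange(4, dtype=np.int16),
    np.ones(3, dtype=np.int16),
    np.zeros(2, dtype=np.int16),
]
signal = concatenate_clips(clips)
print(signal.shape, signal.dtype)
```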

In [None]:
##### CONCATENATE RAW SPEAKER FILES #####
import time
import numpy as np
from os import listdir
import scipy.io.wavfile as wav

# load wav files
speakersToLoad = 34
filesToLoad = 500
for speaker in range(1,speakersToLoad+1):
    start_time = time.time()
    path = 'Raw Speaker Files/' + str(speaker)
    files = listdir(path)
    
    # append each clip to one continuous array for this speaker
    folder = []
    for file in range(filesToLoad):
        rate, data = wav.read(path +'/'+ files[file])
        folder = np.concatenate([folder, data])
    np.save('Raw Speaker Files/speaker' + str(speaker), folder)
    
    print(str(speaker)+" Saved - %s seconds" % (time.time() - start_time))
print('SAVED FILES')

# Apply Transformations
* This cell applies the short-time Fourier transform (STFT) to each of the 34 raw speaker files
* The STFT uses a 50 ms window with a 10 ms overlap; the audio is sampled at 25,000 samples/sec
* After the STFT, we take the absolute value and then the natural log. The matrix is transposed so that each row is one spectral vector, as the kmeans function expects, and then whitened. SciPy's whiten scales each frequency feature to unit variance; it does not produce an identity covariance matrix
* The final result is 34 processed matrices of spectral vectors saved to the folder ./spectrals. The output shows when each spectral is made (~10 seconds each)
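
The window settings above translate into sample counts as follows. This is a small arithmetic sketch; the variable names `hop`, `n_bins`, and `bin_width_hz` are illustrative, and the bin count assumes a one-sided spectrum, which is what a real-valued signal gives by default.

```python
fs = 25_000          # sampling rate in samples/sec
window_ms = 50       # STFT window length in milliseconds
overlap_ms = 10      # overlap between consecutive windows in milliseconds

nperseg = fs * window_ms // 1000      # samples per window
noverlap = fs * overlap_ms // 1000    # samples shared by adjacent windows
hop = nperseg - noverlap              # samples between window starts
n_bins = nperseg // 2 + 1             # one-sided FFT bins for a real signal
bin_width_hz = fs / nperseg           # frequency spacing between bins

print(nperseg, noverlap, hop, n_bins, bin_width_hz)
# 1250 250 1000 626 20.0
```

Under these assumptions each row of a transposed spectral matrix has 626 features, and consecutive rows are 1000 samples (40 ms) apart.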


In [13]:
##### APPLY TRANSFORMATIONS #####
from scipy.signal import stft
from scipy.cluster.vq import whiten

for i in range(34):
    start_time = time.time()
    rawFile = np.load('Raw Speaker Files/speaker' + str(i+1) + '.npy')

    fs=25000 # audio is in 25,000 samples/sec
    nperseg=1250 # 1250 samples per segment (50ms)
    noverlap=250 # 250 sample overlap (10ms overlap)

    # apply the short-time Fourier transform, then absolute value, then natural log, then whitening
    start_time = time.time()
    freqs, times, Zxx = stft(x=rawFile, fs=fs, nperseg=nperseg, noverlap=noverlap, boundary=None)  
    spectral = whiten(np.log(np.absolute(Zxx)).T)
    np.save('spectrals/spectral' + str(i+1), spectral)

    print(str(i+1)+" Spectral created - %s seconds" % (time.time() - start_time))

print('FOURIER, ABS, LOG, WHITEN, DONE')

1 Spectral created - 2.812817335128784 seconds
2 Spectral created - 5.156804084777832 seconds
3 Spectral created - 9.047848224639893 seconds
4 Spectral created - 5.04529881477356 seconds
5 Spectral created - 5.9851531982421875 seconds
6 Spectral created - 5.618335723876953 seconds
7 Spectral created - 5.812391519546509 seconds
8 Spectral created - 6.101458311080933 seconds
9 Spectral created - 5.782270431518555 seconds
10 Spectral created - 4.552553415298462 seconds
11 Spectral crea

# Generate Code Books Using Kmeans
* This cell creates a code book of 512 code words for each speaker using the kmeans function from the SciPy library
* First, each speaker's spectral vector matrix is loaded
* Each kmeans run takes around 5 minutes; the cell output reports when each run has completed and how long it took
* Code books are saved to the ./codeBooks folder
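
To make the idea of a code book concrete, here is a simplified stand-in written in plain NumPy. It is a sketch of Lloyd's k-means iteration, not SciPy's implementation; `lloyd_kmeans` is a hypothetical helper, and the data is random rather than real spectral vectors.

```python
import numpy as np

def lloyd_kmeans(obs, k, n_iter=20, seed=0):
    """Return (codebook, history). history holds the mean squared distance
    to the nearest code word measured at each assignment step."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random
    codebook = obs[rng.choice(len(obs), size=k, replace=False)].copy()
    history = []
    for _ in range(n_iter):
        # Assignment step: squared distance from every vector to every code word
        d2 = ((obs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        history.append(d2[np.arange(len(obs)), labels].mean())
        # Update step: move each code word to the mean of its assigned vectors
        for j in range(k):
            members = obs[labels == j]
            if len(members):          # leave empty clusters where they are
                codebook[j] = members.mean(axis=0)
    return codebook, history

# Synthetic stand-in for a spectral matrix: 300 vectors with 4 features
rng = np.random.default_rng(1)
spectral = rng.normal(size=(300, 4))
codebook, history = lloyd_kmeans(spectral, k=8)
print(codebook.shape, history[0] >= history[-1])
```

Each row of `codebook` is one code word. The notebook uses 512 code words per speaker; the sketch uses 8 only to keep it fast.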

In [None]:
#### APPLY Kmeans, SAVE CODEBOOKS #####
print('This cell uses the Kmeans method to create code books for each speaker using the kmeans function from\n' 
      'the scipy libraries. Each kmeans takes around 5 minutes and the output will tell you when each kmeans has\n' 
      'completed and the time it took to complete it. Codebooks will be saved to the ./codeBooks folder')

import time
import numpy as np
from scipy.cluster.vq import kmeans

for i in range(34):
    # load from the folder the Apply Transformations cell saved to
    spectral = np.load('spectrals/spectral' + str(i+1) + '.npy')
    start_time = time.time()
    codebook, distortion = kmeans(obs=spectral, k_or_guess=512)
    np.save('codeBooks/codeBook' + str(i+1), codebook)
    print(str(i+1) + " KMeans complete - %s seconds" % (time.time() - start_time))
print('KMeans ALL DONE')

# Generate Random Test Data
10% of the data from each of the 34 speakers (50 of the 500 files) is selected at random and loaded for testing. The output of this cell shows when the code books and the 34 sets of random test data have been loaded.
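
Drawing indices with `np.random.randint` samples with replacement, so the same file can be picked more than once. If 50 distinct files per speaker are wanted, one option is sketched below; the seed and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded only so the sketch is repeatable
n_files, n_test = 500, 50

# replace=False guarantees 50 distinct indices in the range 0..499
test_idx = rng.choice(n_files, size=n_test, replace=False)

print(len(set(test_idx.tolist())))
```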

In [None]:
##### GENERATE RANDOM TEST DATA #####
import time
import numpy as np
from os import listdir
import scipy.io.wavfile as wav

# Load Code Books into array
# codeBooks[speaker]
codeBooks = []
start_time = time.time()

for i in range(34):
    codeBooks.append(np.load('codeBooks/codeBook' + str(i+1) + '.npy'))
    
print("CodeBooks loaded - %s seconds" % (time.time() - start_time))

# Randomly select 10% of the files of each speaker to use as test data
# testData[speaker]
testData = []

start_time = time.time()

for speaker in range(34):
    rand = np.random.choice(500, size=50, replace=False) # Fifty distinct random ints from 0-499
    path = 'Raw Speaker Files/' + str(speaker+1)
    files = listdir(path)
    
    fiftyFiles = []
    
    for i in rand:
        rate, data = wav.read(path +'/'+ files[i])
        fiftyFiles = np.concatenate([fiftyFiles, data])
        
    testData.append(fiftyFiles)
print("Test Data loaded - %s seconds" % (time.time() - start_time))   

# Process Test Data
As in the 2nd cell, we take the random test data and apply the short-time Fourier transform, the absolute value, and the natural log, then transpose the result and apply the whitening transform.
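
The log-magnitude and whitening steps can be illustrated on a small random matrix. This is a NumPy sketch of the per-feature scaling that `scipy.cluster.vq.whiten` performs (dividing each column by its standard deviation); the matrix `Zxx` here is synthetic, not real STFT output.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for complex STFT output: 6 frequency bins x 200 time frames
Zxx = rng.normal(size=(6, 200)) + 1j * rng.normal(size=(6, 200))

log_mag = np.log(np.abs(Zxx)).T          # rows = frames, columns = frequency bins
scaled = log_mag / log_mag.std(axis=0)   # divide each column by its standard deviation

print(scaled.shape)
print(np.allclose(scaled.std(axis=0), 1.0))
```

Each column ends up with unit variance, while the column means and the correlations between columns are left unchanged.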

In [None]:
##### TRANSFORM RANDOM TEST DATA #####
from scipy.signal import stft
from scipy.cluster.vq import whiten

start_time = time.time()

# Apply Fourier, then absolute, then log, then whiten on testData
for speaker in range(len(testData)):

    fs=25000 # audio is in 25,000 samples/sec
    nperseg=1250 # 1250 samples per segment (50ms)
    noverlap=250 # 250 sample overlap (10ms overlap)

    # apply the short-time Fourier transform, then absolute value, then natural log, then whitening
    freqs, times, Zxx = stft(x=testData[speaker], fs=fs, nperseg=nperseg, noverlap=noverlap, boundary=None)  
    testData[speaker] = whiten(np.log(np.absolute(Zxx)).T)

print("Test data transformed - %s seconds" % (time.time() - start_time))

In [None]:
##### Determine speaker from test data #####
from scipy.cluster.vq import vq

# results = [speaker]
results = []

start_time = time.time()

for speaker in testData:
    
    # distances[speaker]
    distances = []
    
    # Get distortion distance between testData[speaker] and each code book
    for book in codeBooks:
        code, dist = vq(obs=speaker, code_book=book)
        distances.append(np.sum(dist))

    # The predicted speaker (1-based) is the code book with the smallest total distortion;
    # argmin returns exactly one prediction per test set, even if two distances tie
    results.append(int(np.argmin(distances)) + 1)
print("Predictions Complete - %s seconds" % (time.time() - start_time))
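
The decision rule in the cell above is: sum the distance from every test vector to its nearest code word, and pick the code book with the smallest sum. A minimal NumPy sketch of that rule follows; `total_distortion` is a hypothetical helper and the code books and frames are toy values.

```python
import numpy as np

def total_distortion(frames, codebook):
    """Sum over frames of the Euclidean distance to the nearest code word."""
    diffs = frames[:, None, :] - codebook[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))   # shape: (n_frames, n_codewords)
    return dists.min(axis=1).sum()

# Two toy code books with 2 code words each (hypothetical values)
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),    # "speaker 1"
    np.array([[5.0, 5.0], [6.0, 6.0]]),    # "speaker 2"
]
# Test frames that lie close to the second code book
frames = np.array([[5.1, 4.9], [6.2, 5.8], [5.0, 5.2]])

distances = [total_distortion(frames, cb) for cb in codebooks]
predicted = int(np.argmin(distances)) + 1   # 1-based speaker number
print(predicted)
# 2
```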

In [None]:
#### Print results: predicted speaker and whether it matches the true speaker
import pandas as pd

# match[speaker]
match = []
for r in range(len(results)):
    # results[r] is the 1-based prediction for speaker r+1
    if (r+1) == results[r]: match.append('Yes')
    else: match.append('No')

df = pd.DataFrame(data={'Match?':match, 'Prediction':results}, index=range(1,35))
df = df[['Prediction', 'Match?']]
df.index.name = 'Speaker'
df
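
The predictions can also be summarized as an accuracy figure and a confusion matrix. The sketch below uses made-up predictions for 5 speakers, not results from this notebook, and assumes 1-based speaker numbers as in the table above.

```python
import numpy as np

# Hypothetical predictions for 5 speakers (1-based), for illustration only
true_speakers = np.array([1, 2, 3, 4, 5])
predicted = np.array([1, 2, 4, 4, 5])

n = int(true_speakers.max())
confusion = np.zeros((n, n), dtype=int)
# Rows index the true speaker, columns the predicted speaker
np.add.at(confusion, (true_speakers - 1, predicted - 1), 1)

accuracy = np.trace(confusion) / confusion.sum()
print(confusion)
print(accuracy)
```

Correct predictions fall on the diagonal, so any off-diagonal entry shows which speaker was mistaken for which.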