## Principal Component Analysis Network
The dataset "data/sound.csv" contains two sounds recorded by two microphones. The goal of this assignment is to use a PCA network to approximate the first principal component.
- Build a PCA network to reduce the number of features from 2 to 1
- Train the model and generate the processed data
- Save the data into output.wav and output.csv files
- Compare sound_o.wav (the noisy audio) with output.wav (the denoised audio)
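
One common way to build such a network is a single linear unit trained with Oja's rule, which matches the per-sample update applied in `PCA.train` further down. As a sketch, with $\eta$ standing for the learning rate (`lr`), $\mathbf{x}$ for one two-feature sample, and $\mathbf{w}$ for the weight vector (`W`):

```latex
y = \mathbf{w}^\top \mathbf{x}, \qquad
\Delta \mathbf{w} = \eta \left( y\,\mathbf{x} - y^{2}\,\mathbf{w} \right)
```

The $-y^{2}\mathbf{w}$ term keeps the norm of $\mathbf{w}$ close to 1. For zero-mean data and a sufficiently small learning rate, $\mathbf{w}$ tends toward the direction of the first principal component, so $y$ approximates the projection onto it.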

In [1]:
import numpy as np
import pandas as pd
from scipy.io import wavfile

In [2]:
samrate = 8000

In [3]:
# read csv into array
txtData = np.genfromtxt('data/sound.csv', delimiter=',')
txtData.shape

(50000, 2)

In [4]:
# save array to WAV file
scaledData = np.int16(txtData * samrate)
wavfile.write('data/sound_o.wav', samrate, scaledData)

In [5]:
# read WAV file into array
# The data in sound.csv is already scaled to its original range
# The WAV samples were multiplied by samrate when saved, so divide by samrate to recover that range
samrate, wavData = wavfile.read('data/sound_o.wav')
samrate, wavData.shape
wavData = wavData / samrate  # divide by 8000 (the value of samrate)

In [6]:
# save array to csv file
np.savetxt('data/sound_o.csv', txtData, delimiter=',')

In [7]:
# build the PCA model (only NumPy may be used)
class PCA(object):
    def __init__(self, lr, epoch):
        self.lr = lr        # learning rate
        self.epoch = epoch  # number of passes over the data

    def train(self, x, n_components):
        # the network has a single output unit, so n_components is expected to be 1
        numWeights = x.shape[1]  # one weight per input feature (two here)
        self.W = np.random.rand(numWeights,)  # random initial weights
        for i in range(self.epoch):
            for xi in x:
                y = np.dot(xi, self.W)  # output of the single linear unit
                # Oja's rule: delta_w = y * x - y^2 * w
                self.W += self.lr * (y * xi - (y ** 2) * self.W)

    def predict(self, x):
        y_pred = np.matmul(x, self.W)  # project the data onto the learned weight vector
        return y_pred
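
To check that this kind of update really approximates the first principal component, the learned weights can be compared with the leading eigenvector of the covariance matrix. The block below is a minimal sketch on synthetic data, not part of the assignment: `oja_first_component`, the `demo` array, and the chosen learning rate are all assumptions made for illustration, and `np.linalg.eigh` is used only as a reference.

```python
import numpy as np

def oja_first_component(x, lr=0.01, epochs=20, seed=0):
    """Hypothetical helper: estimate the first principal direction with Oja's rule."""
    rng = np.random.default_rng(seed)
    w = rng.random(x.shape[1])          # random initial weights
    for _ in range(epochs):
        for xi in x:
            y = xi @ w                  # output of the linear unit
            w += lr * y * (xi - y * w)  # same update as y*x - y^2*w
    return w

# Synthetic zero-mean data: one source picked up by two "microphones" plus noise.
rng = np.random.default_rng(42)
source = rng.normal(size=2000)
demo = np.column_stack([source, 0.5 * source]) + 0.05 * rng.normal(size=(2000, 2))
demo -= demo.mean(axis=0)

w_oja = oja_first_component(demo)

# Reference direction: eigenvector of the covariance matrix with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(np.cov(demo, rowvar=False))
w_ref = eigvecs[:, -1]

# |cosine| near 1 means both vectors point along the same axis (the sign is arbitrary).
cosine = abs(w_oja @ w_ref) / np.linalg.norm(w_oja)
print("alignment with first principal component:", cosine)
print("norm of learned weights:", np.linalg.norm(w_oja))
```

The same comparison could be run on `wavData` and `model1.W` after training; if the alignment is poor, a smaller learning rate or centering the data first are the usual things to try.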
                


In [8]:
# initialize and train the model
model1 = PCA(0.1, 10)
model1.train(wavData, 1) #set n_components to 1
y_pred = model1.predict(wavData)


In [9]:
# save the data
scaledData = np.int16(y_pred * samrate)
np.savetxt('data/output.csv', scaledData, delimiter=',')
wavfile.write('data/output.wav', samrate, scaledData)
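
One thing to watch when saving: casting with `np.int16` does not reliably handle values outside the int16 range, so a projected signal that is louder than the input can come out distorted. The sketch below shows one possible guard that clips before casting. `to_int16` is a hypothetical helper, not part of the notebook, and the `demo` values are made up.

```python
import numpy as np

def to_int16(signal, scale):
    """Hypothetical helper: scale a float signal and clip it to the int16 range."""
    info = np.iinfo(np.int16)
    scaled = np.asarray(signal, dtype=np.float64) * scale
    # Clip first so out-of-range samples saturate instead of converting unpredictably.
    return np.clip(scaled, info.min, info.max).astype(np.int16)

# Made-up samples: the last two exceed the int16 range after scaling by 8000.
demo = np.array([0.5, -0.25, 5.0, -5.0])
print(to_int16(demo, 8000))
```

With this helper, the save step could be written as `to_int16(y_pred, samrate)`. Clipping is only one option; normalizing by the peak amplitude would also avoid the problem.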