<div style="font-size:38px;color:green"><u>Voice classifier</u></div><br/><br/>

<ul>
<li><b>Voice classifier</b> is an <b>artificial</b> neural network based classifier whose goal is to classify different voices or sounds against a provided labeled training dataset consisting of doctored <b>RAW audio</b> files.</li>
<li>For training, all RAW audio files must be the same size, preferably <b>at least 30 seconds</b> in duration.</li>
<li>RAW audio training data must not contain any <b>silence</b>.</li>
<li>This model is trained and tested on the <b>8-bit unsigned PCM</b> RAW audio format.</li>
</ul>

## Audio Format

<ul>
<li> The RAW audio is sampled at <b>44100 Hz</b>, which means the amplitude of the sound wave is measured 44100 times every second.</li>
<li> The amplitude is quantized into <b>256 levels</b> (in 8-bit PCM format) and stored in the RAW audio file.</li>
</ul>
    <i>NOTE: 44100 Hz is chosen because it satisfies the <b>Nyquist criterion</b> f<sub>c</sub> > 2f<sub>m</sub>, where f<sub>c</sub> is the sampling rate and f<sub>m</sub> = 20,000 Hz is the highest frequency audible to humans.</i>
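
As a rough illustration of the format described above, the sketch below decodes 8-bit unsigned PCM bytes with NumPy and derives the clip duration from the 44100 Hz sampling rate. The helper name `decode_pcm_u8` and the rescaling to [-1, 1) are assumptions made for illustration; the training code in this notebook feeds the unscaled byte values to the network.

```python
import numpy as np

SAMPLE_RATE = 44100  # samples per second, as stated above


def decode_pcm_u8(raw: bytes) -> np.ndarray:
    """Convert 8-bit unsigned PCM bytes to floats in [-1, 1).

    Hypothetical helper: each byte is one of 256 amplitude levels,
    with 128 representing zero amplitude.
    """
    samples = np.frombuffer(raw, dtype=np.uint8)
    return (samples.astype(np.float32) - 128.0) / 128.0


raw = bytes([0, 64, 128, 192, 255] * 8820)  # synthetic stand-in for a .raw file
audio = decode_pcm_u8(raw)
duration_s = len(raw) / SAMPLE_RATE  # one byte per sample
print(audio[:5], duration_s)
```

Because each sample occupies exactly one byte, the file size in bytes divided by 44100 gives the duration in seconds.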

## Working

<ul>
<li> Every voice (say, a human voice) has a distinct <b>spectrum</b>: a characteristic set of <b>harmonic frequencies (Hz)</b>, each with its own <b>loudness (dB)</b>.
<li> It is observed that a window of around <b>1024 samples</b> is optimal for distinguishing several voices.
<li> However, because the sampling rate is 44100 Hz, the window size <i>n</i> should be a divisor of 44100: multiplying the sample rate by the duration <i>t</i> (in seconds) gives the total number of samples, which is divided by <i>n</i> to get the training set size, and that size must be an integer.
    <br/><i><b>(44100t)/n = total_training_set</b></i>
<li> The divisor used here is <b>882</b> (44100/50), which gives exactly 50 windows per second.
</ul>
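
The window-size arithmetic above can be checked with a few lines of Python. This is only a sketch; the names `SAMPLE_RATE`, `WINDOW`, and `windows_for` are illustrative and do not appear in the notebook's code, which calls the window size `INP_CONST`.

```python
SAMPLE_RATE = 44100  # Hz
WINDOW = 882         # samples per training row


def windows_for(duration_s: int, window: int = WINDOW) -> int:
    """Number of whole windows in a clip of duration_s seconds."""
    total_samples = SAMPLE_RATE * duration_s
    if total_samples % window != 0:
        raise ValueError("window size does not divide the clip evenly")
    return total_samples // window


print(SAMPLE_RATE % WINDOW)   # 0, so 882 divides 44100
print(SAMPLE_RATE % 1024)     # non-zero, so 1024 does not
print(windows_for(30))        # rows produced by a 30-second clip
```

With a window of 882 samples, a 30-second recording yields 1500 rows of training data.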


## Spectrum

### CASE-1

<img src = "img/spec1.png">

### CASE-2

<img src = "img/spec2.png">

### CASE-3

<img src = "img/spec3.png">

<small>https://singhroshan1999.github.io/voice-classifier/</small>
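
How the plots above were produced is not shown in this notebook. As a hedged sketch of how a magnitude spectrum could be computed for one 882-sample window, the example below applies NumPy's real FFT to a synthetic 1000 Hz tone quantized to 8-bit unsigned PCM. The tone and all variable names are assumptions for illustration.

```python
import numpy as np

SAMPLE_RATE = 44100
WINDOW = 882  # one training row; frequency resolution is 44100 / 882 = 50 Hz

# Synthetic 1000 Hz tone, quantized to 8-bit unsigned PCM like the training data
n = np.arange(WINDOW)
tone = np.round(127.5 + 127.0 * np.sin(2 * np.pi * 1000 * n / SAMPLE_RATE)).astype(np.uint8)

# Remove the DC offset, then take the magnitude of the real FFT
centered = tone.astype(np.float64) - tone.mean()
magnitude = np.abs(np.fft.rfft(centered))

peak_bin = int(np.argmax(magnitude))
peak_hz = peak_bin * SAMPLE_RATE / WINDOW
print(peak_hz)
```

Each FFT bin of an 882-sample window spans 50 Hz, so harmonics closer together than that cannot be told apart within a single window.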

In [6]:
# importing libraries
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense
INP_CONST = 882  # samples per input window (44100 / 50)

In [7]:
def bytes_from_file(filename, chunksize=8192):
    """ Read a RAW audio file and yield its bytes one at a time
        Parameter(s): filename --> name of file to read
                      chunksize --> number of bytes to read at a time
        Returns: generator yielding each byte as an int (0-255)
    """
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunksize)
            if not chunk:
                break
            yield from chunk
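
def create_audio_data(file_list):
    """Read a list of RAW audio files and return (DataFrame, number of rows).

    Each row holds INP_CONST consecutive samples; trailing samples that do
    not fill a whole row are dropped for each file.
    """
    windows = []
    for path in file_list:
        samples = np.array(list(bytes_from_file(path)))
        # keep only whole windows of INP_CONST samples
        windows.append(samples[:samples.size - samples.size % INP_CONST])
    data = np.concatenate(windows)
    data = data.reshape((data.size // INP_CONST, INP_CONST), order='C')
    return pd.DataFrame(data), data.shape[0]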

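def create_y_sequence(n_data,div):
    """Generate one-hot labels for training.

    Assumes the n_data rows are split evenly, and in order, among div classes.
    """
    # the `categorical_features` argument is not available in newer scikit-learn releases
    onehotencoder = OneHotEncoder()
    n = n_data//div  # rows per class; n_data must be divisible by div
    labels = [np.ones(n)*i for i in range(div)]
    fet = pd.DataFrame(np.concatenate(labels))
    return onehotencoder.fit_transform(fet).toarray()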
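
`create_y_sequence` assumes every file contributes the same number of rows. If the recordings differ in length, one possible alternative is to build the labels from per-file row counts. The sketch below is an illustration using only NumPy; `make_labels` and `rows_per_file` are hypothetical names, not part of this notebook.

```python
import numpy as np


def make_labels(rows_per_file):
    """Return a one-hot label matrix with one class per file.

    rows_per_file[i] is the number of INP_CONST-sized rows taken from file i.
    """
    n_classes = len(rows_per_file)
    class_ids = np.repeat(np.arange(n_classes), rows_per_file)
    return np.eye(n_classes)[class_ids]


labels = make_labels([2, 1, 3])
print(labels.shape)  # (6, 3)
```

Using this would also require `create_audio_data` to report the row count of each file separately, which it does not currently do.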

In [8]:
def train_data(x,y,units,batch_size = 100,epochs = 100,verbose = 0):
    """Train the artificial neural network
    Parameter(s): x --> RAW audio data
                  y --> Label
                  units --> number of outputs of the ANN
                  batch_size --> size of batch (default = 100)
                  epochs --> number of epochs (default = 100)
                  verbose --> verbosity (default = 0) 0 --> none | 1 --> progress bar per epoch | 2 --> one line per epoch
    Return: tuple (classifier,sc) --> classifier --> keras model object
                                      sc --> StandardScaler object
    """
    sc = StandardScaler()
    x = sc.fit_transform(x)
    # initializing ANN
    classifier = Sequential()
    # adding input and hidden layers
    classifier.add(Dense(units = 50,use_bias = True, kernel_initializer = 'random_normal', activation = 'relu', input_dim = INP_CONST))  # first hidden layer
    classifier.add(Dense(units = 40,use_bias = True, kernel_initializer = 'random_normal', activation = 'relu'))  # second hidden layer
    classifier.add(Dense(units = 30,use_bias = True, kernel_initializer = 'random_normal', activation = 'relu'))  # third hidden layer
    classifier.add(Dense(units = 20,use_bias = True, kernel_initializer = 'random_normal', activation = 'relu'))  # fourth hidden layer
    classifier.add(Dense(units = units,use_bias = True, kernel_initializer = 'random_normal', activation = 'sigmoid'))  # output layer
    # compiling ANN
    classifier.compile(optimizer = 'nadam', loss = 'binary_crossentropy', metrics = ['accuracy'])
    classifier.fit(x, y, batch_size = batch_size, epochs = epochs,verbose = verbose)
    return classifier,sc

In [None]:
def train(file_list,name_list = None,verbose = 0,epochs = 100):
    """Train on every file in file_list, one class per file
       Parameter(s): file_list --> list of files as on disk
                     name_list --> list of corresponding voice names
       Return(s): (classifier,sc,name_list) --> classifier --> keras ANN model
                                                sc --> StandardScaler object
                                                name_list --> name_list
    """
    if name_list is None:
        name_list = []
    x,size = create_audio_data(file_list)
    y = create_y_sequence(size,len(file_list))  # one class per file
    classifier,sc = train_data(x,y,len(file_list),verbose = verbose,epochs = epochs)
    return classifier,sc,name_list


# NOTE: this rebinds the name `train` from the function to the tuple it returns
train = train(['anuragD30.raw','anupamD30.raw','animeshD30.raw','amanD30.raw','deepakbD30.raw'],
             ['anu','anup','ani','aman','dee'],verbose = 2)

In [10]:
def predict(filename,classifier,scalar):
    """Predict the voice in the given file
       Parameter(s): filename --> as on disk
                     classifier --> train() classifier
                     scalar --> train() sc
       Return(s): tuple of (DataFrame of predicted values, number of rows)"""
    df,size = create_audio_data([filename])
    predicted_value = classifier.predict(scalar.transform(df))
    return pd.DataFrame(predicted_value),size

def predict_names(filename,train):
    """Return the mean predicted score for each voice name
       Parameter(s): filename --> as on disk
                     train --> tuple returned by train()
       Return(s): DataFrame of voice names and mean scores (fractions, not percentages)"""
    predicted,n_rows = predict(filename,train[0],train[1])  # n_rows avoids shadowing the built-in sum
    return pd.concat([pd.Series(train[2]),pd.Series([x/n_rows for x in predicted.sum()])],axis = 1)
print(predict_names('x.raw',train))
print(predict_names('y.raw',train))
print(predict_names('an.raw',train))
print(predict_names('am.raw',train))
print(predict_names('dee.raw',train))


      0         1
0   anu  0.757456
1  anup  0.006656
2   ani  0.005969
3  aman  0.189908
4   dee  0.041171
      0         1
0   anu  0.026449
1  anup  0.686787
2   ani  0.111648
3  aman  0.034363
4   dee  0.131642
      0         1
0   anu  0.038250
1  anup  0.101881
2   ani  0.560419
3  aman  0.109841
4   dee  0.171051
      0         1
0   anu  0.191873
1  anup  0.044898
2   ani  0.081464
3  aman  0.506209
4   dee  0.175246
      0         1
0   anu  0.042293
1  anup  0.113324
2   ani  0.174360
3  aman  0.103249
4   dee  0.552758
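
Each table above lists the mean score per voice for one test file. To turn such a table into a single answer, one option is to pick the row with the highest score. The helper below is a hypothetical sketch, assuming the two-column layout returned by `predict_names` (column 0 holds the name, column 1 the score); the sample values are copied from the first table.

```python
import pandas as pd


def best_match(scores: pd.DataFrame) -> str:
    """Return the name whose score is highest.

    Assumes column 0 holds voice names and column 1 holds their scores.
    """
    return scores.loc[scores[1].idxmax(), 0]


# Values copied from the first output table above
first = pd.DataFrame({0: ['anu', 'anup', 'ani', 'aman', 'dee'],
                      1: [0.757456, 0.006656, 0.005969, 0.189908, 0.041171]})
print(best_match(first))  # anu
```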
