
## Audio Word Embedding

Word embedding is a technique for representing the words of a vocabulary as vectors of real numbers. A good embedding captures the context in which a word appears in a document, its semantic and syntactic similarity to other words, and its relations with other words.
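One way to picture an embedding is as one row of a matrix with one row per vocabulary word. The sketch below, using a hypothetical three-word vocabulary and a random 3 × 4 matrix (a trained model would learn these values instead), shows that looking a word up is the same as multiplying its sparse one-hot vector by that matrix.

```python
import numpy as np

# Hypothetical vocabulary and a random embedding matrix (one 4-dim row per word)
vocab = ["please", "mind", "music"]
word_to_index = {word: i for i, word in enumerate(vocab)}
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), 4))

def one_hot(word):
    """Sparse representation: a |V|-long vector with a single 1."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

def embed(word):
    """Dense representation: the word's row in the embedding matrix."""
    return embedding_matrix[word_to_index[word]]

# Multiplying a one-hot vector by the matrix selects the same row
print(np.allclose(one_hot("mind") @ embedding_matrix, embed("mind")))  # True
```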

In short, a word embedding is a vector representation of a word.
One popular technique for learning word embeddings is `Word2Vec`, a shallow neural network developed by Tomas Mikolov and colleagues at Google in 2013.
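Word2Vec has two variants: CBOW, which predicts a target word from its surrounding context words, and skip-gram, which predicts the context words from the target. Both learn from pairs extracted with a sliding window (the `window` parameter used later in this notebook). The minimal sketch below builds CBOW-style (context, target) pairs; the helper name `cbow_pairs` is hypothetical, and real implementations such as gensim additionally shrink the window at random and downsample frequent words.

```python
def cbow_pairs(tokens, window=2):
    """Return (context words, target word) pairs as seen by a CBOW model."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        # up to `window` words on each side of the target, excluding the target
        context = tokens[start:i] + tokens[i + 1 : i + 1 + window]
        pairs.append((context, target))
    return pairs

for context, target in cbow_pairs(["the", "cat", "sat", "on", "mat"], window=1):
    print(context, "->", target)
# ['cat'] -> the
# ['the', 'sat'] -> cat
# ['cat', 'on'] -> sat
# ['sat', 'mat'] -> on
# ['on'] -> mat
```

Skip-gram uses the same windows but turns each pair around, producing one (target, context word) example per context word.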

Spoken words can instead be represented by their mel-frequency cepstral coefficients (`MFCCs`). Together, these coefficients make up the mel-frequency cepstrum, a representation of the short-term power spectrum of a sound.

MFCCs are commonly derived as follows:

* Take the Fourier transform of (a windowed excerpt of) a signal.
* Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
* Take the logs of the powers at each of the mel frequencies.
* Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
* The MFCCs are the amplitudes of the resulting spectrum.
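The five steps above can be sketched directly in NumPy. This is an illustrative implementation under stated assumptions, not the notebook's method: it assumes the HTK-style mel formula $m = 2595 \log_{10}(1 + f/700)$, 40 mel filters, a Hann window and uncentered frames. librosa (used below) differs in details such as Slaney-style mel filters, a decibel log scale and centered frames, so its numbers will not match.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)   # assumed HTK-style mel scale

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular, overlapping filters spaced evenly on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    bins = np.clip(bins, 0, n_bins - 1)
    fb = np.zeros((n_mels, n_bins))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):        # rising edge of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge, peak of 1 at center
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def dct_ii(x):
    """Orthonormal DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T

def mfcc_from_scratch(signal, sr, n_fft=2048, hop_length=512, n_mels=40, n_mfcc=13):
    # Step 1: cut the signal into overlapping windowed frames and take the FFT
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    window = np.hanning(n_fft)
    frames = np.stack([signal[t * hop_length : t * hop_length + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2           # (n_frames, n_fft//2 + 1)
    # Step 2: map the power spectrum onto the mel scale
    mel_power = power @ mel_filterbank(n_mels, n_fft, sr).T    # (n_frames, n_mels)
    # Step 3: take the log (a small offset avoids log(0))
    log_mel = np.log(mel_power + 1e-10)
    # Steps 4-5: DCT of the log-mel powers; keep the first n_mfcc amplitudes
    return dct_ii(log_mel)[:, :n_mfcc]                          # (n_frames, n_mfcc)

sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)     # 1 s, 440 Hz test tone
coeffs = mfcc_from_scratch(tone, sr)
print(coeffs.shape)                      # (40, 13)
```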

In [None]:
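import os
import librosa

data_path = "/Volumes/EOS_DIGITAL/Project/sample_audios"

def save_mfcc(file_path, sample_rate=22050):
    """Return a list of MFCC matrices, one per audio file found under `file_path`.

    Each matrix has shape (n_frames, 13): one row of 13 coefficients per frame.
    """
    mfccs_list = []
    for dirpath, _, filenames in os.walk(file_path):
        for f in filenames:
            if f.startswith("."):  # skip hidden files such as .DS_Store
                continue
            # load the audio file, resampled to sample_rate Hz
            file_path_ = os.path.join(dirpath, f)
            signal, sr = librosa.load(file_path_, sr=sample_rate)

            # extract 13 MFCCs per frame (2048-sample FFT window, 512-sample hop);
            # recent librosa versions require the audio as the keyword argument y=
            mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13,
                                        n_fft=2048, hop_length=512)
            # transpose from (n_mfcc, n_frames) to (n_frames, n_mfcc)
            mfccs_list.append(mfcc.T)
    return mfccs_list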
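Each matrix returned by `save_mfcc` has one row per frame, so longer files give taller matrices. Assuming librosa's default `center=True`, which pads the signal by `n_fft // 2` samples on each side, the row count can be predicted from the number of samples. The helper `expected_mfcc_frames` below is hypothetical, written only to make that arithmetic explicit.

```python
def expected_mfcc_frames(n_samples, hop_length=512, n_fft=2048, center=True):
    """Number of rows in mfcc.T for a signal of n_samples samples."""
    if center:
        # padding n_fft // 2 on both sides cancels the window length
        return 1 + n_samples // hop_length
    return 1 + (n_samples - n_fft) // hop_length

print(expected_mfcc_frames(22050))                # 1 s at 22050 Hz -> 44
print(expected_mfcc_frames(22050, center=False))  # without padding -> 40
```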

In [None]:
output = save_mfcc(data_path)


In [None]:
print("Number of MFCC matrices (one per audio file):", len(output))

Number of MFCC matrices (one per audio file): 976


In [None]:

print(output[1])
print('=============================')
print(output[2])
print('=============================')
print(output[3])

[[-512.6782     31.551628   28.06759  ...   11.730126   11.606915
    11.821565]
 [-512.00934    31.889656   26.967669 ...   10.198787    9.207342
    10.214563]
 [-513.1436     30.349157   25.680653 ...    9.199541    8.70477
     9.936244]
 ...
 [-520.11456    22.314564   22.006763 ...   15.924114   14.794645
    13.570598]
 [-519.81366    22.771467   22.546993 ...   16.420572   15.30821
    14.172741]
 [-518.64594    24.386747   24.05541  ...   15.580811   14.198052
    12.843634]]
[[-465.47824     38.446266    24.209316  ...    8.850157    11.720329
    17.371914 ]
 [-464.27512     39.203827    22.941734  ...    7.0304527   10.0044985
    16.802752 ]
 [-462.90906     41.01803     24.16542   ...    9.82336     10.416382
    15.878773 ]
 ...
 [-484.56894     17.42905     17.287022  ...   13.189135    12.400924
    11.578836 ]
 [-485.0208      16.800692    16.690435  ...   13.426166    12.777091
    12.091467 ]
 [-484.1604      17.998173    17.83051   ...   13.144154    12.275062
    

In [None]:
# number of frames (rows) in each MFCC matrix
print([len(mfcc) for mfcc in output])
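Because the audio files can differ in length, the MFCC matrices can have different numbers of rows and cannot be stacked into one array as they are. One simple option, a swapped-in technique rather than part of this notebook, is to summarize each matrix by the per-coefficient mean and standard deviation over time, giving a fixed-length vector of `2 * n_mfcc` values per file. `pool_mfcc` is a hypothetical helper name.

```python
import numpy as np

def pool_mfcc(mfcc):
    """(n_frames, n_mfcc) -> (2 * n_mfcc,): per-coefficient mean, then std, over time."""
    return np.concatenate([mfcc.mean(axis=0), mfcc.std(axis=0)])

# Two "files" of different lengths become vectors of the same size
short = np.array([[1.0, 2.0], [3.0, 4.0]])
longer = np.ones((5, 2))
print(pool_mfcc(short))   # [2. 3. 1. 1.]
print(pool_mfcc(longer))  # [1. 1. 0. 0.]
```

Applied to the list above, `np.stack([pool_mfcc(m) for m in output])` would give a `(976, 26)` array, one 26-dimensional vector per file. Mean pooling discards the order of frames, which is acceptable for coarse comparisons but not for distinguishing words that differ mainly in timing.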


In [None]:
# Python program to generate word vectors using Word2Vec
# importing all necessary modules
import os
import warnings

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from gensim.models import Word2Vec

warnings.filterwarnings(action="ignore")

# sent_tokenize / word_tokenize need the Punkt tokenizer models
# (newer NLTK releases name this resource "punkt_tab")
nltk.download("punkt", quiet=True)

# Reads the text files
data_path = "/Volumes/EOS_DIGITAL/Project/sample_texts"

def save_wordEmbeddings(file_path):
    """Train one Word2Vec (CBOW) model on every text file under `file_path`
    and return a {word: 100-dimensional vector} dictionary."""
    data = []  # list of tokenized, lower-cased sentences
    for dirpath, _, filenames in os.walk(file_path):
        for t in filenames:
            if t.startswith("."):  # skip hidden files such as .DS_Store
                continue
            # load the text file and replace newlines with spaces
            file_path_ = os.path.join(dirpath, t)
            with open(file_path_, "r") as sample:
                text = sample.read().replace("\n", " ")

            # split into sentences, then each sentence into lower-cased words
            for sentence in sent_tokenize(text):
                data.append([word.lower() for word in word_tokenize(sentence)])

    # Create CBOW model (sg=0 is the default) once, on all sentences;
    # gensim >= 4 uses vector_size where older versions used size
    model1 = Word2Vec(data, min_count=1, vector_size=100, window=5)
    # For a Skip-gram model instead:
    # model2 = Word2Vec(data, min_count=1, vector_size=100, window=5, sg=1)

    # The trained word vectors are stored in a KeyedVectors instance in model1.wv
    # (model1.wv.save(path) and KeyedVectors.load(path) persist them)
    my_dict = {key: model1.wv[key] for key in model1.wv.index_to_key}
    return my_dict

In [None]:
word_emb = save_wordEmbeddings(data_path)


In [None]:
word_emb

{'please': array([-4.7416459e-03, -3.5641585e-03,  1.4538092e-03,  2.9771756e-03,
        -5.4323701e-03, -7.1554543e-03,  6.4432673e-04, -5.0734119e-03,
         2.7002117e-03, -3.2663832e-03, -4.3548964e-04,  3.9375578e-03,
        -2.1106119e-03, -3.8705950e-04,  9.7728353e-03,  5.5512697e-03,
        -7.9675876e-03, -6.5082760e-04, -7.6068891e-04,  7.3153260e-03,
         8.4596369e-03,  9.1413967e-04,  6.2175887e-03,  3.5193625e-03,
         2.5847356e-03, -4.8023658e-03,  3.6343986e-03, -3.3300035e-03,
         5.7447650e-03, -2.5287347e-03, -3.7753249e-05, -2.0132677e-03,
        -4.6357247e-03, -1.6197496e-03,  9.4637595e-04,  9.0638510e-05,
        -3.9450768e-03,  1.9282593e-03,  6.8643172e-03,  6.1279894e-03,
        -2.0865933e-03,  3.5675609e-04, -2.2390068e-03,  4.2546275e-03,
         9.0992008e-04, -4.9029239e-03, -5.0154887e-04, -7.7626593e-03,
        -4.3454612e-04, -4.2327330e-03,  1.8541249e-03, -5.3111236e-03,
        -3.2952046e-03,  3.5891701e-03,  8.5299807e-03

In [None]:
len(word_emb)

1636

In [None]:
word_emb['mind']

array([-1.12569928e-02,  1.65838021e-04,  4.09548497e-03,  6.39226148e-03,
       -9.31145437e-03, -1.26105100e-02, -3.00386036e-03,  4.21152159e-04,
        9.65420809e-03,  3.51815042e-03, -1.00719021e-03,  4.62145435e-05,
       -7.58026401e-03, -1.19705428e-03,  1.52305197e-02, -1.66296260e-04,
       -4.59042983e-03, -5.35294414e-04, -2.53588893e-03,  5.24575869e-03,
        4.41886485e-03, -1.84848777e-03, -7.79624970e-04,  1.60101021e-03,
       -1.43618404e-03, -8.23835190e-03,  4.36827587e-03,  5.82287461e-03,
        1.20205795e-02, -1.47113425e-03, -5.99818537e-03, -4.77101421e-03,
        1.09701243e-03, -7.80286035e-03, -3.71237169e-03, -8.34156014e-03,
       -8.66760500e-03,  4.90901002e-04,  1.74459198e-03,  6.18956564e-03,
       -5.04825206e-04,  1.58741733e-03, -6.13722950e-03,  1.77118916e-03,
       -7.11300084e-03, -2.79635144e-03,  1.56854920e-03, -1.11275828e-02,
        8.83124862e-03, -6.88140374e-03,  5.44251641e-03, -1.18149968e-03,
        3.08884494e-03,  

The array above is the 100-dimensional vector representation (embedding) of the word `mind` learned by the model.
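Embeddings like this are usually compared with cosine similarity, which is how "similar words" are found in practice. The sketch below, with hypothetical helpers `cosine_similarity` and `most_similar`, works on any `{word: vector}` dictionary such as `word_emb`; a trained gensim model offers the equivalent `model1.wv.most_similar`. On a corpus this small, the nearest neighbours may not be meaningful.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def most_similar(word, embeddings, topn=5):
    """Rank the other words in an {word: vector} dict by cosine similarity to `word`."""
    query = embeddings[word]
    scores = [(other, cosine_similarity(query, vec))
              for other, vec in embeddings.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Toy 2-D vectors for illustration; with the notebook's data: most_similar("mind", word_emb)
toy = {"a": np.array([1.0, 0.0]), "b": np.array([1.0, 1.0]),
       "c": np.array([0.0, 1.0]), "d": np.array([-1.0, 0.0])}
print([w for w, _ in most_similar("a", toy, topn=2)])  # ['b', 'c']
```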
 
#### Applications: 
* Voice recognition
* Speech generation in chat bots
* Machine Translation 
* Music information retrieval applications such as genre classification
* etc.