# 1. Introduction

### 1.1 What is this about?

In this kernel I'll try to visualise the `look` of a sound: specifically, what spectrograms (and more) of dogs' and cats' voices look like. I'll extract their features and visualise them in 2D and 3D plots.

We will check whether it is possible to tell the voices apart just by looking at a picture. We will `SeeTheSound`.

In [25]:
display(HTML("\
    <ul style = 'list-style-type:none; display:inline-block; text-align:center'>\
    <li style = 'float:left'>\
        <ul style='list-style-type: none'>\
            <li><h3>Cat</h3></li>\
            <li><img src = 'data/images/cat/mfcc_2.png'/></li> \
        </ul>\
    </li>\
     <li style = 'float:left'>\
        <ul style='list-style-type: none'>\
            <li><h3>Dog</h3></li>\
            <li><img src = 'data/images/dog/mfcc_4.png'/></li> \
        </ul>\
    </li>\
    <li style = 'float:left'>\
        <ul style='list-style-type: none'>\
            <li><h3>Cat</h3></li>\
            <li><img src = 'data/images/cat/mfcc_1.png'/></li> \
        </ul>\
    </li>\
    <li style = 'float:left'>\
        <ul style='list-style-type: none'>\
            <li><h3>Dog</h3></li>\
            <li><img src = 'data/images/dog/mfcc_6.png'/></li> \
        </ul>\
    </li>\
    </ul>"
))


HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.''')

# 2. Imports of modules & reading the dataset

### 2.1 Importing libraries

In [52]:
import os
import random
import librosa
import librosa.display
from scipy.io import wavfile
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import HTML, display
import plotly.graph_objs as go
from scipy import signal

### 2.2 Read the dataset

In [26]:
# Build the path strings once for easier reuse later
general_path = os.path.join(os.getcwd(), "data")
dog_path = os.path.join(general_path, "dog")
cat_path = os.path.join(general_path, "cat")

### 2.3 Lists of files

In [27]:
cats_list = os.listdir(cat_path)
dogs_list = os.listdir(dog_path)

In [28]:
def pick_random_files(list_of_files, total):
    """Return `total` distinct file names chosen at random from `list_of_files`."""
    # random.sample draws without replacement, so no file is picked twice;
    # it raises ValueError if `total` exceeds the number of files.
    return random.sample(list_of_files, total)

In [29]:
total = 10
cats_files = pick_random_files(cats_list, total)
dogs_files = pick_random_files(dogs_list, total)  
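
Because the selection is random, the saved image numbers map to different recordings on every run. A small sketch of making the pick repeatable, assuming a fixed seed is acceptable here; `pick_files_seeded`, the seed value, and the example file names are hypothetical:

```python
import random


def pick_files_seeded(list_of_files, total, seed=0):
    """Pick `total` distinct files; the same seed always gives the same pick."""
    rng = random.Random(seed)  # private generator, leaves the global state alone
    return rng.sample(list_of_files, total)


example_files = [f"cat_{i}.wav" for i in range(1, 21)]  # made-up names
first_pick = pick_files_seeded(example_files, 10, seed=42)
second_pick = pick_files_seeded(example_files, 10, seed=42)
```

Both calls return the same ten names, so an image such as `mfcc_2.png` would keep pointing at the same recording between runs.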

# 3. Converting .wav files into images (.PNG)

### 3.1 Creating path vars

In [30]:
dogs_images_path = os.path.join(general_path, "images", "dog")
cats_images_path = os.path.join(general_path, "images", "cat")
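
`plt.savefig` does not create missing folders, so the image directories above must exist before any plot is saved. A minimal sketch of creating them up front, assuming the same `images/dog` and `images/cat` layout; the helper name `ensure_image_dirs` and the temporary base folder are illustrative only:

```python
import os
import tempfile


def ensure_image_dirs(base_path, labels=("dog", "cat")):
    """Create <base_path>/images/<label> for each label and return the paths."""
    created = {}
    for label in labels:
        path = os.path.join(base_path, "images", label)
        # exist_ok=True makes the call safe to repeat on every notebook run
        os.makedirs(path, exist_ok=True)
        created[label] = path
    return created


# Demonstration in a throwaway folder (hypothetical stand-in for general_path)
demo_base = tempfile.mkdtemp()
image_dirs = ensure_image_dirs(demo_base)
```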

### 3.2 Spectrograms of the files and saving them

#### 3.2.1 Dogs

In [78]:
for counter, file in enumerate(dogs_files):
    x, sr = librosa.load(os.path.join(dog_path, file))
    X = librosa.stft(x)                        # complex short-time Fourier transform
    Xdb = librosa.amplitude_to_db(np.abs(X))   # magnitude converted to decibels
    librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
    plt.savefig(os.path.join(dogs_images_path, str(counter)))
    plt.close()

#### 3.2.2 Cats

In [80]:
for counter, file in enumerate(cats_files):
    x, sr = librosa.load(os.path.join(cat_path, file))
    X = librosa.stft(x)                        # complex short-time Fourier transform
    Xdb = librosa.amplitude_to_db(np.abs(X))   # magnitude converted to decibels
    librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
    plt.savefig(os.path.join(cats_images_path, str(counter)))
    plt.close()
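
The `amplitude_to_db` step in both loops puts magnitudes on a logarithmic scale so that quiet and loud components are visible in the same image. A rough pure-Python sketch of the idea, assuming a reference amplitude of 1.0 and an 80 dB dynamic-range limit; it illustrates the formula and is not librosa's implementation:

```python
import math


def amplitude_to_db_sketch(magnitudes, amin=1e-5, top_db=80.0):
    """Convert linear magnitudes to decibels: 20 * log10(magnitude)."""
    # Clamp tiny values so log10 never sees zero
    db = [20.0 * math.log10(max(amin, m)) for m in magnitudes]
    # Limit the dynamic range to `top_db` below the loudest value
    floor = max(db) - top_db
    return [max(value, floor) for value in db]


db_values = amplitude_to_db_sketch([1.0, 10.0, 0.1, 0.0])
```

A tenfold change in amplitude moves the value by 20 dB, and silence is clipped to the floor instead of going to minus infinity.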

### 3.3 Mel-Frequency Cepstral Coefficients Exploration, Plotting & Saving 

#### 3.3.1 Universal function creating MFCC features and saving

In [31]:
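def handle_mfcc(directory, destination, list_of_files):
    """Plot a log-scaled mel spectrogram for each file and save it as an image.

    Note: despite the name, this saves mel spectrograms (the step before MFCCs);
    the "mfcc_" prefix is kept because the intro cell references those file names.
    """
    prefix = "mfcc_"
    for counter, file in enumerate(list_of_files):
        samples, sr = librosa.load(os.path.join(directory, file))
        # Keyword arguments are required by recent librosa versions
        S = librosa.feature.melspectrogram(y=samples, sr=sr, n_mels=128)
        log_S = librosa.power_to_db(S, ref=np.max)  # power -> dB relative to the peak
        librosa.display.specshow(log_S, sr=sr, x_axis='time', y_axis='mel')
        plt.colorbar(format='%+02.0f dB')
        plt.savefig(os.path.join(destination, prefix + str(counter)))
        plt.close()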
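
The `y_axis='mel'` plots space frequencies on the mel scale, which compresses high frequencies roughly the way human hearing does. A sketch of one common conversion, the HTK-style formula; it is shown for intuition only, and librosa's default mel scale (Slaney-style) differs in detail:

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK-style mel conversion: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Equal steps in Hz shrink on the mel axis as frequency grows
low_step = hz_to_mel_htk(1000.0) - hz_to_mel_htk(500.0)
high_step = hz_to_mel_htk(8000.0) - hz_to_mel_htk(7500.0)
```

The same 500 Hz gap covers far more mels at the bottom of the spectrum than at the top, which is why low-frequency detail takes up more of the picture.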

#### 3.3.2 Applying the above function

In [89]:
handle_mfcc(dog_path, dogs_images_path, dogs_files)
handle_mfcc(cat_path, cats_images_path, cats_files)

# 4. 3D visualisation

### 4.1 3D Spectrograms

#### 4.1.1 Function extracting necessary features to plot on 3D diagram

In [76]:
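def log_specgram(audio, sample_rate, window_size=20,
                 step_size=10, eps=1e-10):
    """Return frequencies, times and the log spectrogram (time x frequency).

    window_size and step_size are given in milliseconds.
    """
    nperseg = int(round(window_size * sample_rate / 1e3))
    noverlap = int(round(step_size * sample_rate / 1e3))
    freqs, times, spec = signal.spectrogram(audio,
                                            fs=sample_rate,
                                            window='hann',
                                            nperseg=nperseg,
                                            noverlap=noverlap,
                                            detrend=False)
    # eps avoids log(0) for silent bins; the transpose puts time on the first axis
    return freqs, times, np.log(spec.T.astype(np.float32) + eps)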
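
`log_specgram` takes its window and step in milliseconds and converts them to sample counts, which in turn fix the shape of the returned array. A sketch of that arithmetic, assuming a 16 kHz, one-second clip (the actual recordings may use a different rate) and SciPy's unpadded segmenting; `specgram_shape` is an illustrative helper:

```python
def specgram_shape(n_samples, sample_rate, window_size=20, step_size=10):
    """Return (nperseg, noverlap, n_frames, n_freq_bins) for log_specgram's settings."""
    nperseg = int(round(window_size * sample_rate / 1e3))   # samples per window
    noverlap = int(round(step_size * sample_rate / 1e3))    # samples shared by neighbours
    hop = nperseg - noverlap                                 # advance between windows
    n_frames = (n_samples - noverlap) // hop                 # whole windows that fit
    n_freq_bins = nperseg // 2 + 1                           # one-sided FFT of real input
    return nperseg, noverlap, n_frames, n_freq_bins


shape = specgram_shape(n_samples=16000, sample_rate=16000)
```

Under these assumptions the transposed array returned by `log_specgram` would have 99 rows (time frames) and 161 columns (frequency bins).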

#### 4.1.2 Picking an exemplary file to read

In [85]:
# scipy's wavfile.read returns (sample_rate, data), in that order
sr, audio = wavfile.read(os.path.join(dog_path, "dog_barking_1.wav"))

# 5. To be continued...