## Convert the generated snippets to spectrograms
Spectrograms are created with the librosa library. In the original notebook, a publicly available
audiobook of 'The Time Machine' was used to generate the snippets and the corresponding spectrograms.

In [4]:
import numpy as np
import librosa
import librosa.display
from os import listdir

In [3]:
!pip install librosa

Successfully installed appdirs-1.4.4 audioread-3.0.0 librosa-0.9.2 llvmlite-0.39.1 numba-0.56.4 pooch-1.6.0 resampy-0.4.2


Read the locally saved file and plot the original waveform using matplotlib and SciPy.

In [5]:
from scipy.io.wavfile import read
import matplotlib.pyplot as plt

split_dir = 'time_machine_split'
input_data = read(f'{split_dir}/s_7.wav')
audio = input_data[1]
plt.plot(audio)

FileNotFoundError: [Errno 2] No such file or directory: 'time_machine_split/s_7.wav'
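`scipy.io.wavfile.read` returns a `(rate, data)` tuple, which is why the cell above takes index `[1]`. Because `time_machine_split/s_7.wav` may not exist locally (as the error shows), the sketch below is self-contained: it writes a hypothetical one-second 440 Hz test tone at an assumed 44.1 kHz to a temporary file, reads it back, and unpacks both values instead of indexing.

```python
import os
import tempfile

import numpy as np
from scipy.io.wavfile import read, write

sr = 44100                                    # assumed sample rate of the snippets
t = np.arange(sr) / sr                        # one second of sample times
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_tone.wav")    # hypothetical file name
    write(path, sr, tone)
    rate, audio = read(path)                  # unpack instead of taking [1]

duration_s = len(audio) / rate                # samples / (samples per second)
print(rate, audio.dtype, duration_s)          # 44100 int16 1.0
```

Keeping `rate` around is useful later: it is the value that should be handed to any spectrogram code that needs the true sampling rate.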

#### Librosa and Spectrograms
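Use librosa to load the .wav files. `librosa.load` returns a tuple `(y, sr)`: the signal and its sampling rate.
By default it resamples the audio to 22050 Hz, so to keep the files' real 44.1 kHz rate we pass `sr=None`
(keep the native rate) rather than overwriting `sr` after loading, which would mislabel the resampled signal.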
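Why the label must match the signal: pairing a signal that was resampled to 22050 Hz with a hard-coded `sr=44100` makes every frequency the analysis reports twice too high. The NumPy-only sketch below demonstrates this with a synthetic 440 Hz tone; `peak_frequency` is a hypothetical helper, not part of librosa.

```python
import numpy as np

def peak_frequency(y, sr):
    """Frequency (Hz) of the strongest FFT bin, interpreting y as sampled at sr."""
    spectrum = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    return freqs[np.argmax(spectrum)]

true_sr = 22050                           # librosa.load's default target rate
t = np.arange(true_sr) / true_sr          # one second of samples
y = np.sin(2 * np.pi * 440 * t)           # 440 Hz tone sampled at 22050 Hz

print(peak_frequency(y, true_sr))         # approximately 440 (correct label)
print(peak_frequency(y, 44100))           # approximately 880 (mislabelled as 44.1 kHz)
```

The same error would shift every mel band in the spectrogram, so it is safer to let `librosa.load` report the rate it actually used.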

In [None]:
y, sr = librosa.load("time_machine_split/s_7.wav", sr=None)  # sr=None keeps the native sampling rate

Generating the spectrogram data

In [None]:
S = librosa.feature.melspectrogram(y=y, sr=sr)
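To show roughly what `librosa.feature.melspectrogram` computes, here is a from-scratch NumPy sketch: a windowed short-time Fourier transform gives a power spectrum per frame, and a bank of triangular filters spaced evenly on the mel scale sums that power into `n_mels` bands. It assumes librosa's default `n_fft=2048`, `hop_length=512`, `n_mels=128`, but it is a simplification: it uses the HTK mel formula, unnormalised filters and no centre padding, whereas librosa defaults to the Slaney mel variant with normalised filters and `center=True`, so the numbers will not match librosa exactly.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (librosa's default is the Slaney variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters evenly spaced on the mel scale between 0 Hz and sr/2
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram(y, sr, n_fft=2048, hop_length=512, n_mels=128):
    if len(y) < n_fft:
        raise ValueError("signal shorter than one FFT window")
    # Frame the signal without centre padding (unlike librosa's default)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    window = np.hanning(n_fft)
    frames = np.stack([y[k * hop_length: k * hop_length + n_fft] * window
                       for k in range(n_frames)], axis=1)
    power = np.abs(np.fft.rfft(frames, axis=0)) ** 2      # (1 + n_fft // 2, n_frames)
    return mel_filterbank(sr, n_fft, n_mels) @ power      # (n_mels, n_frames)

sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 1000 * t)          # one second of a 1 kHz tone
S = mel_spectrogram(y, sr)
print(S.shape)                            # (128, 40)
```

For the 1 kHz tone, the band with the most energy is the one whose filter is centred near 1000 Hz; in the notebook itself, `librosa.feature.melspectrogram` remains the function to use.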

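`S_dB`: convert the power spectrogram to the decibel scale. The log scale compresses the wide dynamic range, so quieter components stay visible in the image.
`librosa.display.specshow` then draws the spectrogram inside the current pyplot figure.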

In [None]:
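import os
import matplotlib.pyplot as plt

# Convert power to dB relative to the loudest bin (the maximum becomes 0 dB)
S_dB = librosa.power_to_db(S, ref=np.max)

plt.figure()
librosa.display.specshow(S_dB, sr=sr)

os.makedirs("original_sg", exist_ok=True)  # savefig fails if the directory is missing
plt.savefig("original_sg/s_7.jpg")
plt.show()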
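What `power_to_db(S, ref=np.max)` does can be sketched in a few lines of NumPy: take `10 * log10(S / ref)`, floor tiny values at `amin` to avoid `log(0)`, and clip everything more than `top_db` below the peak. The sketch assumes librosa's documented defaults `amin=1e-10` and `top_db=80.0`; `power_to_db_sketch` is a hypothetical name, and `librosa.power_to_db` is still what the notebook should call.

```python
import numpy as np

def power_to_db_sketch(S, ref=1.0, amin=1e-10, top_db=80.0):
    """Approximation of librosa.power_to_db for non-negative power values."""
    S = np.asarray(S, dtype=float)
    ref_value = ref(S) if callable(ref) else ref    # ref=np.max: loudest bin -> 0 dB
    log_spec = 10.0 * np.log10(np.maximum(amin, S))
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref_value))
    if top_db is not None:
        # Anything more than top_db below the peak is clipped to (peak - top_db)
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec

S_dB_demo = power_to_db_sketch([1.0, 0.1, 1e-12], ref=np.max)
print(S_dB_demo)    # approximately [0, -10, -80]
```

The 80 dB floor is why near-silent parts of a snippet all render in the same darkest colour.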

In [None]:
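def gen_melsg(wav_file_path):
    '''
    Generate a mel spectrogram (power scale) for the .wav file at wav_file_path.
    The file is loaded at its native sampling rate (sr=None) instead of
    librosa's default 22050 Hz resampling.
    '''
    y, sr = librosa.load(wav_file_path, sr=None)
    return librosa.feature.melspectrogram(y=y, sr=sr)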
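The array `gen_melsg` returns has shape `(n_mels, n_frames)`. With librosa's defaults (`n_mels=128`, `hop_length=512`, `center=True`, which pads the signal so there is one frame per hop plus one), the frame count is `1 + n_samples // hop_length`. The hypothetical helper below does that arithmetic for an assumed 5-second snippet, and shows that the same snippet yields a spectrogram about half as wide if it is resampled to 22050 Hz.

```python
def expected_mel_shape(duration_s, sr, hop_length=512, n_mels=128):
    """Shape melspectrogram should return with its defaults (center=True)."""
    n_samples = int(round(duration_s * sr))
    n_frames = 1 + n_samples // hop_length   # centred framing: one frame per hop, plus one
    return (n_mels, n_frames)

print(expected_mel_shape(5.0, 44100))   # (128, 431)  native 44.1 kHz
print(expected_mel_shape(5.0, 22050))   # (128, 216)  librosa.load's default resample
```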

In [None]:
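import os

def gen_save_fig(melsg, target_dir, name='pic_0'):
    '''
    Convert a mel spectrogram to dB, plot it and save it as {target_dir}/{name}.jpg.

    melsg: mel spectrogram, e.g. from gen_melsg()
    target_dir: directory where the image is saved (created if missing)
    name: filename without extension
    '''
    S_dB = librosa.power_to_db(melsg, ref=np.max)
    plt.figure()
    librosa.display.specshow(S_dB)
    os.makedirs(target_dir, exist_ok=True)
    plt.savefig(f'{target_dir}/{name}.jpg')
    plt.close()  # free the figure so batch runs don't accumulate open figures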
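`plt.figure()` plus `specshow` renders the spectrogram inside a default-sized figure, so the saved image includes figure margins and its pixel size is unrelated to the number of mel bands and frames. If the images are meant as model input, one alternative (an assumption, not what this notebook does) is `plt.imsave`, which writes one pixel per array element. The sketch uses a random array in place of a real `S_dB` and PNG output; `save_sg_pixels` is a hypothetical name.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")                         # headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

def save_sg_pixels(S_dB, target_dir, name):
    """Write a dB-scaled spectrogram with one pixel per (mel band, frame)."""
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, f"{name}.png")
    # origin="lower" puts low frequencies at the bottom, as specshow does
    plt.imsave(path, S_dB, cmap="magma", origin="lower")
    return path

rng = np.random.default_rng(0)
fake_S_dB = rng.uniform(-80.0, 0.0, size=(128, 40))   # stand-in for a real S_dB

with tempfile.TemporaryDirectory() as tmp:
    out = save_sg_pixels(fake_S_dB, tmp, "s_0")
    img = plt.imread(out)                     # rows = mel bands, columns = frames

print(img.shape[:2])                          # (128, 40)
```

The trade-off is that the image loses axes and colour bar, which matters for human inspection but usually not for a classifier.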

### Spectrogram based on local data using gen_melsg and gen_save_fig functions

In [None]:
gen_save_fig(gen_melsg('time_machine_split/s_0.wav'),'original_sg','s_0')

#### Batch spectrogram generation

In [None]:
def save_all_sg(source_dir, target_dir):
    '''
    Generate a mel spectrogram for every file in the snippets directory.
    Each spectrogram is saved under the same name as its .wav file, which
    makes it easy to match each snippet to its spectrogram.

    source_dir: directory containing the .wav snippets
    target_dir: directory where the spectrograms are saved
                (required: a None default would save into a folder named "None")
    '''
    for filename in listdir(source_dir):
        wav_path = f'{source_dir}/{filename}'
        S = gen_melsg(wav_path)
        pic_name = filename.replace('.wav', '')
        gen_save_fig(S, target_dir, pic_name)
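A design alternative to repairing an interrupted run afterwards is to make the batch loop resumable: skip any snippet whose image already exists, so re-running the cell only does the remaining work. The sketch below assumes that layout (`<stem>.wav` becomes `<stem>.jpg`); `save_all_resumable` and the `render` callback are hypothetical, and `fake_render` only writes an empty file so the logic can be checked without audio.

```python
import tempfile
from pathlib import Path

def save_all_resumable(source_dir, target_dir, render):
    """Call render(wav_path, out_path) only for snippets whose .jpg does not exist yet.

    In this notebook, render could wrap
    gen_save_fig(gen_melsg(str(wav_path)), target_dir, wav_path.stem).
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    done = []
    for wav in sorted(Path(source_dir).glob("*.wav")):   # ignores non-.wav entries
        out = target / f"{wav.stem}.jpg"
        if out.exists():              # produced by an earlier, interrupted run
            continue
        render(wav, out)
        done.append(wav.stem)
    return done

def fake_render(wav, out):
    out.write_bytes(b"")              # stand-in for the real spectrogram image

with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "split"), Path(tmp, "sg")
    src.mkdir()
    for name in ["s_0.wav", "s_1.wav", "s_2.wav"]:
        (src / name).touch()
    first = save_all_resumable(src, dst, fake_render)
    second = save_all_resumable(src, dst, fake_render)   # nothing left to do
    print(first, second)              # ['s_0', 's_1', 's_2'] []
```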

In [None]:
# save_all_sg('time_machine_split', 'original_sg')

### If spectrogram generation was interrupted, the following functions identify .wav files that have no spectrogram yet and generate the missing ones

In [None]:
def diff_list(wav_dir, jpg_dir):
    '''
    In case something interrupts the saving of the files, compare the two
    directories and return the set of filenames (without extensions) that
    exist in the source directory but not in the target directory.
    '''
    s1 = {file.replace('.wav', '') for file in listdir(wav_dir)}
    s2 = {file.replace('.jpg', '') for file in listdir(jpg_dir)}

    return s1 - s2
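`listdir` returns every entry in a directory, including subfolders such as `.ipynb_checkpoints` or unrelated files, and these would show up in the difference. A sketch of the same comparison that matches on extension and compares file stems with `pathlib` (`missing_stems` is a hypothetical name; the temporary directories stand in for `time_machine_split` and `original_sg`):

```python
import tempfile
from pathlib import Path

def missing_stems(wav_dir, jpg_dir):
    """Stems of .wav files in wav_dir that have no matching .jpg in jpg_dir."""
    wav_stems = {p.stem for p in Path(wav_dir).glob("*.wav")}
    jpg_stems = {p.stem for p in Path(jpg_dir).glob("*.jpg")}
    return wav_stems - jpg_stems

with tempfile.TemporaryDirectory() as tmp:
    wav_dir, jpg_dir = Path(tmp, "wav"), Path(tmp, "jpg")
    wav_dir.mkdir()
    jpg_dir.mkdir()
    for name in ["s_0.wav", "s_1.wav", "s_2.wav", "notes.txt"]:
        (wav_dir / name).touch()
    (jpg_dir / "s_0.jpg").touch()     # only s_0 finished before the interruption
    missing = missing_stems(wav_dir, jpg_dir)
    print(sorted(missing))            # ['s_1', 's_2']
```

The result has the same form as `diff_list`'s output (stems without extensions), so it could be passed on in the same way.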
    

In [None]:
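def save_diff(missing, source_dir, target_dir):
    '''
    Generate and save spectrograms for the files that exist in the source
    directory but have no spectrogram in the target directory.

    missing: filenames without extension, e.g. the output of diff_list()
    source_dir: directory containing the .wav snippets
    target_dir: directory where the spectrograms are saved
    '''
    for filename in missing:
        wav_path = f'{source_dir}/{filename}.wav'
        S = gen_melsg(wav_path)
        gen_save_fig(S, target_dir, filename)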