<h1 style="font-family: Helvetica; font-size: 29px">Notebook: Preprocessing Arabic Music Data for a Variational Autoencoder Generation Model</h1>
<p style="font-size: 16px; font-family: Times New Roman;">This notebook is based on the preprocessing written by Valerio Velardo for a VAE-based model. We edited some parameters to fit our data.
In this notebook, the data is preprocessed through a pipeline: the audio files are segmented into clips of <b>2.97 seconds</b> (the segment_length set in the code below), padded if necessary, converted to log spectrograms for model training, and normalised with a min-max normaliser whose stored minimum and maximum values are reused in post-processing. Finally, all of these steps are called to save the results to the specified paths. As an extra step, we export the log spectrograms as images to observe how the music evolves over frequency and time.
The data being preprocessed is <b>75</b> WAV files, segmented into <b>8756</b> segments.</p>

<h1 style="font-family: Helvetica; font-size: 28px"> Loading Necessary Libraries</h1>

<p style="font-size: 16px; font-family: Times New Roman;">
This cell imports the libraries and modules needed for audio processing.
os is used for file system operations. pickle is used for serializing and de-serializing Python object structures. librosa is a Python package for music and audio analysis. pydub is a high-level audio library that makes it easy to work with audio files. soundfile is a Python library for reading and writing audio files. numpy provides the array operations, and matplotlib is used to plot the spectrograms. io provides Python's core input/output support, and load_model from keras loads a saved model.</p>

In [2]:
import os
import pickle
import librosa
from pydub import AudioSegment
import soundfile as sf
from matplotlib import pyplot as plt
import librosa.display
import numpy as np
import io
from keras.models import load_model

<h1 style="font-family: Helvetica; font-size: 28px"> Segment WAV files</h1>

<p style="font-size: 16px; font-family: Times New Roman;">
The cell below defines a function named 'Segment' that splits audio files into shorter, overlapping segments using a sliding window. Each segment is 2.97 seconds long and the window advances by 2 seconds, so consecutive segments overlap by 0.97 seconds. The function takes an input folder containing WAV files and an output folder where the segmented audio files are saved.
This kind of segmentation is useful for audio-based machine learning tasks such as speech recognition or music genre classification; here we use it as preprocessing for the generation model.</p>



In [None]:
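def Segment(input_folder, output_folder):
    """Split every .wav file in input_folder into overlapping segments."""
    segment_length = 2.97  # segment duration in seconds
    hop_length = 2  # step between consecutive segment starts, in seconds
    os.makedirs(output_folder, exist_ok=True)  # make sure the output folder exists
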
    for filename in os.listdir(input_folder):
        if filename.endswith(".wav"):
            ## Set input and output file paths
            input_path = os.path.join(input_folder, filename)
            output_path = os.path.join(output_folder, filename)

            # Load audio file using librosa
            y, sr = librosa.load(input_path, sr=None)

            # Convert segment and hop lengths from seconds to samples
            segment_frames = int(segment_length * sr)
            hop_frames = int(hop_length * sr)
            total_frames = len(y)
            total_segments = int((total_frames - segment_frames) / hop_frames) + 1

            # Segment audio using a sliding window
            for i in range(total_segments):
                # Calculate start and end frame indices for current segment
                start_frame = i * hop_frames
                end_frame = start_frame + segment_frames

                # Extract audio segment
                y_segment = y[start_frame:end_frame]

                # Set output file path for current segment
                output_segment_path = output_path.replace(".wav", f"_segment{i}.wav")

                # Save audio segment to file with soundfile
                # (librosa.output.write_wav was removed in librosa 0.8)
                sf.write(output_segment_path, y_segment, sr)
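
<p style="font-size: 16px; font-family: Times New Roman;">As a rough check of the sliding-window arithmetic used in 'Segment', the sketch below counts how many segments a file yields and where each one starts. The helper name count_segments and the 10-second example file are illustrative assumptions and are not part of the original notebook.</p>

```python
def count_segments(total_samples, sr, segment_length=2.97, hop_length=2):
    """Return the segment count and start indices, mirroring the arithmetic in Segment."""
    segment_samples = int(segment_length * sr)  # samples per segment
    hop_samples = int(hop_length * sr)          # samples between segment starts
    total_segments = int((total_samples - segment_samples) / hop_samples) + 1
    starts = [i * hop_samples for i in range(total_segments)]
    return total_segments, starts


# Hypothetical example: a 10-second file sampled at 22050 Hz
n_segments, starts = count_segments(total_samples=10 * 22050, sr=22050)
print(n_segments, starts)
```

<p style="font-size: 16px; font-family: Times New Roman;">Because the hop (2 s) is shorter than the segment (2.97 s), consecutive segments share 0.97 s of audio, and any audio left after the last full window is not saved.</p>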

<h1 style="font-family: Helvetica; font-size: 28px"> Load the files</h1>

<p style="font-size: 16px; font-family: Times New Roman;">
The Loader class loads audio files with specific parameters. It takes three arguments: sample_rate (the rate the audio is resampled to), duration (the number of seconds to load), and mono (whether to mix the signal down to a single channel). This class is useful when working with large audio datasets because every file is loaded with the same settings.</p>



In [None]:
class Loader:

    def __init__(self, sample_rate, duration, mono):
        self.sample_rate = sample_rate
        self.duration = duration
        self.mono = mono

    def load(self, file_path):
        signal = librosa.load(file_path,
                              sr=self.sample_rate,
                              duration=self.duration,
                              mono=self.mono)[0]
        return signal

<h1 style="font-family: Helvetica; font-size: 28px"> Padding if necessary </h1>

<p style="font-size: 16px; font-family: Times New Roman;">In the cell below, the Padder class pads an array to bring it to a required size. The class has two methods, left_pad and right_pad, which take an array and the number of missing items as inputs. The mode parameter sets the type of padding passed to np.pad; it defaults to "constant", which pads with zeros. The left_pad method adds padding to the left side of the array, while the right_pad method adds padding to the right side. Both methods return the padded array.</p>

In [None]:
class Padder:

    def __init__(self, mode="constant"):
        self.mode = mode

    def left_pad(self, array, num_missing_items):
        padded_array = np.pad(array,
                              (num_missing_items, 0),
                              mode=self.mode)
        return padded_array

    def right_pad(self, array, num_missing_items):
        padded_array = np.pad(array,
                              (0, num_missing_items),
                              mode=self.mode)
        return padded_array
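
<p style="font-size: 16px; font-family: Times New Roman;">The sketch below shows how right padding could bring a short signal up to the number of samples expected for one segment. The one-second signal of ones is a made-up example; the sample rate and duration are the values used later in this notebook.</p>

```python
import numpy as np

SAMPLE_RATE = 22050
DURATION = 2.97  # seconds
num_expected_samples = int(SAMPLE_RATE * DURATION)

# Hypothetical short signal: one second of ones
signal = np.ones(SAMPLE_RATE, dtype=np.float32)

# Append zeros on the right until the signal has the expected length
num_missing_samples = num_expected_samples - len(signal)
padded_signal = np.pad(signal, (0, num_missing_samples), mode="constant")

print(len(signal), len(padded_signal))
```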


<h1 style="font-family: Helvetica; font-size: 28px">Extracting log_spectrogram</h1>

<p style="font-size: 16px; font-family: Times New Roman;">In the cell below, the LogSpectrogramExtractor class extracts a log-scaled spectrogram from an audio signal. It takes two parameters during initialization, frame_size and hop_length, which determine the size of the short-time Fourier transform (STFT) window and the number of samples by which the window advances, respectively. In the extract method, the signal is first transformed using the STFT, and the last frequency bin is dropped with [:-1] so that the number of bins is frame_size / 2 rather than frame_size / 2 + 1. The magnitude of the complex-valued STFT is then computed and converted to decibel scale using librosa.amplitude_to_db, which applies a logarithmic compression to the spectrogram. The resulting log spectrogram is returned by the method. We used log spectrograms in a previous classification task, and here we use them in the generation model.</p>

In [None]:
class LogSpectrogramExtractor:

    def __init__(self, frame_size, hop_length):
        self.frame_size = frame_size
        self.hop_length = hop_length

    def extract(self, signal):
        stft = librosa.stft(signal,
                            n_fft=self.frame_size,
                            hop_length=self.hop_length)[:-1]
        spectrogram = np.abs(stft)
        log_spectrogram = librosa.amplitude_to_db(spectrogram)
        return log_spectrogram
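
<p style="font-size: 16px; font-family: Times New Roman;">The shape of the extracted spectrogram can be estimated without running librosa. The sketch below assumes librosa's default center=True framing (one frame more than the number of whole hops in the signal) and uses the frame size, hop length, sample rate and duration set later in this notebook; the helper name spectrogram_shape is hypothetical.</p>

```python
def spectrogram_shape(num_samples, frame_size, hop_length):
    """Estimate (frequency bins, time frames) of the log spectrogram."""
    # An STFT with n_fft = frame_size yields frame_size // 2 + 1 bins;
    # the extractor drops the last one with [:-1].
    num_bins = frame_size // 2 + 1 - 1
    # Assumes librosa's default center=True: 1 + num_samples // hop_length frames.
    num_frames = 1 + num_samples // hop_length
    return num_bins, num_frames


num_samples = int(22050 * 2.97)  # samples in one 2.97-second segment
shape = spectrogram_shape(num_samples, frame_size=512, hop_length=256)
print(shape)
```

<p style="font-size: 16px; font-family: Times New Roman;">Under these assumptions, each 2.97-second segment maps to a square 256 x 256 array, which is a convenient input size for a convolutional model.</p>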

<h1 style="font-family: Helvetica; font-size: 28px">Min_Max Normaliser</h1> 

<p style="font-size: 16px; font-family: Times New Roman;">The MinMaxNormaliser class applies min-max normalisation to an input array, scaling its values to the range defined by min_val and max_val. The class contains two main methods: normalise and denormalise. The normalise method takes an input array, calculates its minimum and maximum values, and scales the array to the range specified by min_val and max_val. The resulting normalised array is returned. The denormalise method takes a normalised array together with the original minimum and maximum values of the input array, and scales the normalised array back to the original range. This is useful for our task because we normalise the arrays for generation and then denormalise them in post-processing to recover the original values.</p>

In [None]:
class MinMaxNormaliser:
    """MinMaxNormaliser applies min max normalisation to an array."""

    def __init__(self, min_val, max_val):
        self.min = min_val
        self.max = max_val

    def normalise(self, array):
        norm_array = (array - array.min()) / (array.max() - array.min())
        norm_array = norm_array * (self.max - self.min) + self.min
        return norm_array

    def denormalise(self, norm_array, original_min, original_max):
        array = (norm_array - self.min) / (self.max - self.min)
        array = array * (original_max - original_min) + original_min
        return array
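
<p style="font-size: 16px; font-family: Times New Roman;">To illustrate why the original minimum and maximum have to be kept, the sketch below normalises a small array to the range 0 to 1 and then restores it. The two functions repeat the formulas of MinMaxNormaliser so the example runs on its own, and the decibel values are made up for illustration.</p>

```python
import numpy as np


def normalise(array, new_min=0.0, new_max=1.0):
    """Scale array linearly so its values span [new_min, new_max]."""
    scaled = (array - array.min()) / (array.max() - array.min())
    return scaled * (new_max - new_min) + new_min


def denormalise(norm_array, original_min, original_max, new_min=0.0, new_max=1.0):
    """Invert normalise, given the original minimum and maximum."""
    scaled = (norm_array - new_min) / (new_max - new_min)
    return scaled * (original_max - original_min) + original_min


# Hypothetical log-spectrogram values in dB
log_spec = np.array([[-80.0, -40.0], [-20.0, 0.0]])

norm = normalise(log_spec)
restored = denormalise(norm, log_spec.min(), log_spec.max())
print(norm)
print(restored)
```

<p style="font-size: 16px; font-family: Times New Roman;">Note that the formula divides by the difference between the maximum and the minimum, so an array whose values are all equal (for example, a completely silent segment) would cause a division by zero.</p>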


<h1 style="font-family: Helvetica; font-size: 28px">Saver</h1>  

<p style="font-size: 16px; font-family: Times New Roman;">The Saver class in the cell below saves each feature as a .npy file and stores the min-max values of all features in a single pickle file.</p>

In [None]:
class Saver:

    def __init__(self, feature_save_dir, min_max_values_save_dir):
        self.feature_save_dir = feature_save_dir
        self.min_max_values_save_dir = min_max_values_save_dir

    def save_feature(self, feature, file_path):
        save_path = self._generate_save_path(file_path)
        np.save(save_path, feature)
        # Return the path so the pipeline can key the min/max values by it
        return save_path

    def save_min_max_values(self, min_max_values):
        save_path = os.path.join(self.min_max_values_save_dir,
                                 "min_max_values.pkl")
        self._save(min_max_values, save_path)

    @staticmethod
    def _save(data, save_path):
        with open(save_path, "wb") as f:
            pickle.dump(data, f)

    def _generate_save_path(self, file_path):
        file_name = os.path.split(file_path)[1]
        save_path = os.path.join(self.feature_save_dir, file_name + ".npy")
        return save_path
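
<p style="font-size: 16px; font-family: Times New Roman;">The pickle file written by the Saver can be read back in post-processing to look up the values needed for denormalisation. The sketch below writes and reloads a dictionary of that form in a temporary folder; it assumes the dictionary is keyed by the save path of each feature, and the file name and dB values are hypothetical.</p>

```python
import os
import pickle
import tempfile

# Hypothetical entry, keyed by the path of the saved feature
min_max_values = {
    os.path.join("spectrograms", "song_segment0.wav.npy"): {"min": -80.0, "max": 0.0},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    save_path = os.path.join(tmp_dir, "min_max_values.pkl")

    # Write the dictionary the same way Saver._save does
    with open(save_path, "wb") as f:
        pickle.dump(min_max_values, f)

    # Read it back, as a post-processing step would
    with open(save_path, "rb") as f:
        loaded_values = pickle.load(f)

print(loaded_values)
```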


<h1 style="font-family: Helvetica; font-size: 28px">Preprocessing Pipeline</h1>

<p style="font-size: 16px; font-family: Times New Roman;">The cell below defines the PreprocessingPipeline class, which processes the audio files in a directory. It also records the original minimum and maximum of each spectrogram so that the spectrogram can be denormalised later. The following steps are applied to each file:</p>
<ol>
  <li>Load a file</li>
  <li>Pad the signal (if necessary; this step is commented out in the code below)</li>
  <li>Extract log spectrogram from signal</li>
  <li>Normalise spectrogram</li>
  <li>Save the normalised spectrogram</li>
</ol>
    


In [None]:
class PreprocessingPipeline:
    
    def __init__(self):
        #self.padder = None
        self.extractor = None
        self.normaliser = None
        self.saver = None
        self.min_max_values = {}
        self._loader = None
        self._num_expected_samples = None

    @property
    def loader(self):
        return self._loader

    @loader.setter
    def loader(self, loader):
        self._loader = loader
        self._num_expected_samples = int(loader.sample_rate * loader.duration)

    def process(self, audio_files_dir):
        for root, _, files in os.walk(audio_files_dir):
            for file in files:
                file_path = os.path.join(root, file)
                self._process_file(file_path)
                print(f"Processed file {file_path}")
        self.saver.save_min_max_values(self.min_max_values)

    def _process_file(self, file_path):
        signal = self.loader.load(file_path)
        #if self._is_padding_necessary(signal):
            #signal = self._apply_padding(signal)
        feature = self.extractor.extract(signal)
        norm_feature = self.normaliser.normalise(feature)
        save_path = self.saver.save_feature(norm_feature, file_path)
        self._store_min_max_value(save_path, feature.min(), feature.max())

    #def _is_padding_necessary(self, signal):
        #if len(signal) < self._num_expected_samples:
            #return True
        #return False

    #def _apply_padding(self, signal):
        #num_missing_samples = self._num_expected_samples - len(signal)
        #padded_signal = self.padder.right_pad(signal, num_missing_samples)
        #return padded_signal

    def _store_min_max_value(self, save_path, min_val, max_val):
        self.min_max_values[save_path] = {
            "min": min_val,
            "max": max_val
        }

<h1 style="font-family: Helvetica; font-size: 28px">Call Segment Function</h1>

In [61]:
input_folder = r"E:\E just\Spring 3rd Year\PBL\generation\Generation by one genre\class wav"
seg_out = r"E:\E just\Spring 3rd Year\PBL\generation\Generation by one genre\wav new shape"
Segment(input_folder, seg_out)

<h1 style="font-family: Helvetica; font-size: 28px"> Instantiate all objects</h1>

<p style="font-size: 16px; font-family: Times New Roman;">The code in the cell below instantiates the loader, extractor, normaliser and saver, attaches them to a preprocessing pipeline, and runs the pipeline on the segmented audio files.</p>

In [None]:
FRAME_SIZE = 512
HOP_LENGTH = 256
DURATION = 2.97 # in seconds
SAMPLE_RATE = 22050
MONO = True

SPECTROGRAMS_SAVE_DIR = r"path link"
MIN_MAX_VALUES_SAVE_DIR = r"E:\E just\Spring 3rd Year\PBL\generation\Generation by one genre\min_max"
FILES_DIR = seg_out

# instantiate all objects
loader = Loader(SAMPLE_RATE, DURATION, MONO)
#padder = Padder()
log_spectrogram_extractor = LogSpectrogramExtractor(FRAME_SIZE, HOP_LENGTH)
min_max_normaliser = MinMaxNormaliser(0, 1)
saver = Saver(SPECTROGRAMS_SAVE_DIR, MIN_MAX_VALUES_SAVE_DIR)

preprocessing_pipeline = PreprocessingPipeline()
preprocessing_pipeline.loader = loader
#preprocessing_pipeline.padder = padder
preprocessing_pipeline.extractor = log_spectrogram_extractor
preprocessing_pipeline.normaliser = min_max_normaliser
preprocessing_pipeline.saver = saver

preprocessing_pipeline.process(FILES_DIR)

<h1 style="font-family: Helvetica; font-size: 28px"> Extract Log-Spectrograms images</h1>

<p style="font-size: 16px; font-family: Times New Roman;">The cell below renders the saved log spectrogram numpy arrays as images using matplotlib.</p>

In [None]:
import numpy as np
spectrogram_path = r"E:\E just\Spring 3rd Year\PBL\generation\test generation trial3\log_numpy"
images_path = r"E:\E just\Spring 3rd Year\PBL\generation\test generation trial3\log_images"

for file_name in os.listdir(spectrogram_path):
    file_path = os.path.join(spectrogram_path, file_name)
    img_path = os.path.join(images_path, file_name.replace('.npy', '_mel_spec.png'))
    spectrogram = np.load(file_path)
    
    # Plot the spectrogram as an image
    plt.imshow(spectrogram, origin='lower', aspect='auto')
    plt.colorbar()
    # The saved arrays are linear-frequency log spectrograms (STFT), not mel spectrograms
    plt.xlabel('Time frames')
    plt.ylabel('Frequency bins')
    plt.title('Log Spectrogram')
    plt.savefig(img_path)
    plt.clf()
